"Big Data Normalization"
Episode Synopsis
A discussion of a case study describing a novel technique for efficiently storing and querying large amounts of data in massively parallel processing (MPP) databases. The technique, known as Anchor Modeling, is implemented on the HP Vertica database and is used by Avito, a Russian e-commerce platform, to process terabytes of data for real-time analytics. The paper argues that traditional normalization approaches fall short in Big Data scenarios and highlights the benefits of Anchor Modeling for scalability, performance, and ease of data maintenance. The authors support their claims with theoretical estimates and with experiments comparing Anchor Modeling against a traditional 3NF model, demonstrating its effectiveness on complex ad-hoc queries.
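To make the idea concrete, here is a minimal sketch of what an anchor-modeled schema looks like: entities become narrow "anchor" tables, and each property lives in its own insert-only attribute table. This uses SQLite rather than the paper's HP Vertica setup, and all table and column names are hypothetical illustrations, not the schema from the Avito case study.

```
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()

# Anchor: one row per entity (an ad), holding only its surrogate key.
cur.execute("CREATE TABLE AD_Ad (AD_ID INTEGER PRIMARY KEY)")

# Attributes: one narrow table per property, each carrying the anchor
# key, the value, and a timestamp, so history is kept by inserting
# new rows instead of updating old ones.
cur.execute("""CREATE TABLE AD_TTL_Ad_Title (
    AD_ID INTEGER REFERENCES AD_Ad, Title TEXT, ChangedAt TEXT)""")
cur.execute("""CREATE TABLE AD_PRC_Ad_Price (
    AD_ID INTEGER REFERENCES AD_Ad, Price REAL, ChangedAt TEXT)""")

cur.execute("INSERT INTO AD_Ad VALUES (1)")
cur.executemany(
    "INSERT INTO AD_TTL_Ad_Title VALUES (?, ?, ?)",
    [(1, "Used bike", "2014-01-01"),
     (1, "Used bike, negotiable", "2014-02-01")])
cur.execute("INSERT INTO AD_PRC_Ad_Price VALUES (1, 150.0, '2014-01-01')")

# An ad-hoc query touches only the attribute tables it actually needs,
# so scan cost scales with the attributes referenced, not the width of
# one wide denormalized row.
latest_title = """SELECT t.Title FROM AD_TTL_Ad_Title t
    WHERE t.AD_ID = ? ORDER BY t.ChangedAt DESC LIMIT 1"""
print(cur.execute(latest_title, (1,)).fetchone())
# -> ('Used bike, negotiable',)
```

In an MPP column store the same decomposition lets each attribute table be distributed and scanned independently, which is the property the paper's experiments measure against a 3NF baseline.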
More episodes of the podcast Talking Data
Prepare for Takeoff
12/10/2024
Data Trustworthiness
09/10/2024
Temporal Dimensional Modeling
27/09/2024
The Early Mathematical Journey
27/09/2024
Random Thoughts
22/09/2024
Time in Databases
19/09/2024
The Infinite Decay of Loyalty
19/09/2024
Rescuing the Excluded Middle
19/09/2024
Rethinking the Database
19/09/2024
The Model-Driven Organization
19/09/2024