"Data Lakes"
Episode Synopsis
This week we talk about data lakes. Essentially, a data lake is a mechanism to store large quantities of (typically) raw data, both structured and unstructured, bringing together data from across an organisation.

In a "traditional" data warehouse solution, we tend to think about an "Extract, Transform and Load" process: extracting the data from source, transforming it for analysis, and loading it into the data warehouse. With a data lake, the approach tends to be "Extract, Load and Transform": data is extracted from source, loaded into the data lake, then transformed when needed. This can simplify the process, as there is no need to transform the data for every scenario at build time, so we can speed up implementation. The downside, of course, is that we have to do more work at run time. As such, it is probably not an either/or choice between data lakes and more structured systems.

The flexibility of data lakes makes it tempting to dump anything and everything into the data lake. If this starts to happen without any curation, you are likely to end up with more of a data swamp. Data lakes are not a way to avoid governance.

The main cloud players all offer some sort of data lake:
Azure Data Lake
AWS Data Lake
Google Data Lake

If you already use Power BI, or are considering it, we strongly recommend you join your local Power BI user group.

To find out more about our services and the help we can offer, contact us at one of the websites below:
UK and Europe: https://www.clearlycloudy.co.uk/
North America: https://www.clearlysolutions.net/
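
For readers who want to see the "Extract, Transform and Load" versus "Extract, Load and Transform" contrast from the synopsis in concrete terms, here is a minimal Python sketch. It is only an illustration, not anything discussed in the episode: the sample records, field names, and function names (transform, etl, elt_load, elt_query_total) are all hypothetical, and plain lists stand in for the warehouse and the lake.

```python
import json

# Hypothetical raw records, as they might arrive from a source system.
RAW_RECORDS = [
    '{"customer": "Acme", "amount": "120.50", "currency": "GBP"}',
    '{"customer": "Globex", "amount": "75.00", "currency": "USD"}',
    '{"customer": "Acme", "amount": "30.25", "currency": "GBP"}',
]


def transform(record):
    """Shape one raw record for analysis: parse the JSON and cast the amount."""
    row = json.loads(record)
    return {"customer": row["customer"], "amount": float(row["amount"])}


def etl(raw_records):
    """ETL: transform at build time, so the warehouse holds only shaped rows."""
    return [transform(r) for r in raw_records]


def elt_load(raw_records):
    """ELT, step 1: load the raw data into the lake unchanged."""
    return list(raw_records)


def elt_query_total(lake, customer):
    """ELT, step 2: transform at run time, only for the question being asked."""
    rows = (transform(r) for r in lake)
    return sum(row["amount"] for row in rows if row["customer"] == customer)


warehouse = etl(RAW_RECORDS)      # shaped once, up front
lake = elt_load(RAW_RECORDS)      # stored raw, shaped later
acme_total = elt_query_total(lake, "Acme")
```

In this sketch the warehouse has already dropped the currency field, so a later question about currencies would mean rebuilding the load. The lake still holds the raw records, so that question can be answered, at the cost of parsing them again on every query.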
More episodes of the podcast The Clearly Podcast
The 2024 IT Consulting Job Market
24/06/2024
Azure vs Fabric
17/06/2024
Working Outside the Microsoft Stack
20/05/2024
Choosing a Cloud Provider
13/05/2024
Getting Off the Excel Mindset
06/05/2024