Microsoft Fabric Explained: No Code, No Nonsense

03/10/2025 19 min

Listen "Microsoft Fabric Explained: No Code, No Nonsense"

Episode Synopsis

Here’s a fun corporate trick: Microsoft managed to confuse half the industry by slapping the word “house” on anything with a data label. But here’s what you’ll actually get out of the next few minutes: we’ll nail down what OneLake really is, when to use a Warehouse versus a Lakehouse, and why Delta and Parquet keep your data from turning into a swamp of CSVs. That’s three concrete takeaways in plain English. Want the one-page cheat sheet? Subscribe to the M365.Show newsletter. Now, with the promise clear, let’s talk about Microsoft’s favorite game: naming roulette.

Lakehouse vs Warehouse: Microsoft’s Naming Roulette

When people first hear “Lakehouse” and “Warehouse,” it sounds like two flavors of the same thing. Same word ending, both live inside Fabric, so surely they’re interchangeable—except they’re not. The names are what trip teams up, because they hide the fact that these are different experiences built on the same storage foundation.

Here’s the plain breakdown. A Warehouse is SQL-first. It expects structured tables, defined schemas, and clean data. It’s what you point dashboards at, what your BI team lives in, and what delivers fast query responses without surprises. A Lakehouse, meanwhile, is the more flexible workbench. You can dump in JSON logs, broken CSVs, or Parquet files from another pipeline and not break the system. It’s designed for engineers and data scientists who run Spark notebooks, machine learning jobs, or messy transformations.

If you want a visual, skip the sitcom-length analogy: think of the Warehouse as a labeled pantry and the Lakehouse as a garage with the freezer tucked next to power tools. One is organized and efficient for everyday meals. The other has room for experiments, projects, and overflow. Both store food, but the vibe and workflow couldn’t be more different.

Now, here’s the important part Microsoft’s marketing can blur: neither exists in its own silo. Both Lakehouses and Warehouses in Fabric store their tables in the open Delta Parquet format, both sit on top of OneLake, and both give you consistent access to the underlying files. What’s different is the experience you interact with. Think of Fabric not as separate buildings, but as two different rooms built on the same concrete slab, each furnished for a specific kind of work.

From a user perspective, the divide is real. Analysts love Warehouses because they behave predictably with SQL and BI tools. They don’t want to crawl through raw web logs at 2 a.m.—they want structured tables with clean joins. Data engineers and scientists lean toward Lakehouses because they don’t want to spend weeks normalizing heaps of JSON just to answer “what’s trending in the logs.” They want Spark, Python, and flexibility.

So the decision pattern boils down to this: use a Warehouse when you need SQL-driven, curated reporting; use a Lakehouse when you’re working with semi-structured data, Spark, and exploration-heavy workloads. That single sentence separates successful projects from the ones where teams shout across Slack because no one knows why the “dashboard” keeps choking on raw log files.

And here’s the kicker—mixing up the two doesn’t just waste time, it creates political messes. If management assumes they’re interchangeable, analysts get saddled with raw exports they can’t process, while engineers waste hours building shadow tables that should’ve been Lakehouse assets from day one. The tools are designed to coexist, not to substitute for each other.
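If the pantry-versus-garage picture still feels abstract, here is a minimal sketch of the two working styles side by side. It assumes a Spark session with Delta Lake support, roughly what a Fabric Lakehouse notebook hands you; the file path, table name, and columns (page, status) are placeholders for illustration, not anything from a real project.

```python
# Minimal sketch, assuming Spark with Delta Lake support (e.g. a Fabric
# Lakehouse notebook). Path, table name, and column names are placeholders.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Lakehouse-style work: land messy, semi-structured logs as-is and persist
# them as a Delta table without pre-cleaning anything.
raw_logs = spark.read.json("Files/raw/web_logs/")  # schema is inferred
raw_logs.write.format("delta").mode("overwrite").saveAsTable("weblogs_bronze")

# Warehouse-style work: SQL-first, curated reporting over structured tables.
top_pages = spark.sql("""
    SELECT page, COUNT(*) AS hits
    FROM weblogs_bronze
    WHERE status = 200
    GROUP BY page
    ORDER BY hits DESC
    LIMIT 10
""")
top_pages.show()
```

Same storage underneath either way; the only real question is which of those two halves your team spends its day in.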
So the bottom line: Warehouses serve reporting. Lakehouses serve engineering and exploration. Same OneLake underneath, same Delta Parquet files, different optimizations. Get that distinction wrong, and your project drags. Get it right, and both sides of the data team stop fighting long enough to deliver something useful to the business. And since this all hangs on the same shared layer, it raises the obvious question—what exactly is this OneLake that sits under everything?

OneLake: The Data Lake You Already Own

Picture this: you move into a new house, and surprise—there’s a giant underground pool already filled and ready to use. That’s what OneLake is in Fabric. You don’t install it, you don’t beg IT for storage accounts, and you definitely don’t file a ticket for provisioning. It’s automatically there. OneLake is created once per Fabric tenant, and every workspace, every Lakehouse, every Warehouse plugs into it by default. Under the hood, it actually runs on Azure Data Lake Storage Gen2, so it’s not some mystical new storage type—it’s Microsoft putting a SaaS layer on top of storage you probably already know.

Before OneLake, each department built its own “lake” because why not—storage accounts were cheap, and everyone believed their copy was the single source of truth. Marketing had one. Finance had one. Data science spun one up in another region “for performance.” The result was a swamp of duplicate files, rogue pipelines, and zero coordination. It was SharePoint sprawl, except this time the mistakes showed up in your Azure bill. Teams burned budget maintaining five lakes that didn’t talk to each other, and analysts wasted nights reconciling “final_v2” tables that never matched.

OneLake kills that off by default. Think of it as the single pool everyone has to share instead of each team digging muddy holes in their own backyards. Every object in Fabric—Lakehouses, Warehouses, Power BI datasets—lands in the same logical lake. That means no more excuses about Finance having its “own version” of the data. To make sharing easier, OneLake exposes a single file-system namespace that stretches across your entire tenant. Workspaces sit inside that namespace like folders, giving different groups their place to work without breaking discoverability. It even spans regions seamlessly, which is why shortcuts let you point at other sources without endless duplication. The small print: compute capacity is still regional and billed by assignment, so while your OneLake is global and logical, the engines you run on top of it are tied to regions and budgets.

At its core, OneLake standardizes storage around Delta Parquet files. Translation: instead of ten competing formats where every engine has to spin its own copy, Fabric speaks one language. SQL queries, Spark notebooks, machine learning jobs, Power BI dashboards—they all hit the same tabular store. Columnar layout makes queries faster, transactional support makes updates safe, and that reduces the nightmare of CSV scripts crisscrossing like spaghetti.

The structure is simple enough to explain to your boss in one diagram. At the very top you have your tenant—that’s the concrete slab the whole thing sits on. Inside the tenant are workspaces, like containers for departments, teams, or projects. Inside those workspaces live the actual data items: warehouses, lakehouses, datasets. It’s organized, predictable, and far less painful than juggling dozens of storage accounts and RBAC assignments across three regions.
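That hierarchy is not just a diagram; it shows up directly in how you address data. The sketch below assumes Spark with Delta support, and the workspace, lakehouse, and table names (SalesAnalytics, Orders, orders) are invented; because OneLake exposes an ADLS Gen2-compatible endpoint, the familiar abfss:// style of path applies.

```python
# A sketch of how the tenant / workspace / item hierarchy maps to paths.
# OneLake exposes an ADLS Gen2-compatible endpoint, so abfss:// addressing
# works; the workspace, lakehouse, and table names here are placeholders.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# one endpoint per tenant ...... onelake.dfs.fabric.microsoft.com
# workspace (like a folder) .... SalesAnalytics
# item in that workspace ....... Orders.Lakehouse, Delta tables under /Tables
orders_path = (
    "abfss://SalesAnalytics@onelake.dfs.fabric.microsoft.com/"
    "Orders.Lakehouse/Tables/orders"
)

# Any Delta-capable engine reads the same files; no per-engine copies.
orders = spark.read.format("delta").load(orders_path)
orders.printSchema()
```

Capacity assignment and permissions are still real work, but storage-wise there is one namespace to point every engine at.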
On top of this, Microsoft folds in governance as a default: Purview cataloging and sensitivity labeling are already wired in. That way, OneLake isn’t just raw storage, it also enforces discoverability, compliance, and policy from day one without you building it from scratch.

If you’ve lived the old way, the benefits are obvious. You stop paying to store the same table six different times. You stop debugging brittle pipelines that exist purely to sync finance copies with marketing copies. You stop getting those 3 a.m. calls where someone insists version FINAL_v3.xlsx is “the right one,” only to learn HR already published FINAL_v4. OneLake consolidates that pain into a single source of truth. No heroic intern consolidating files. No pipeline graveyard clogging budgets. Just one layer, one copy, and all the engines wired to it.

It’s not magic, though—it’s just pooled storage. And like any pool, if you don’t manage it, it can turn swampy real fast. OneLake gives you the centralized foundation, but it relies on the Delta format layer to keep data clean, consistent, and usable across different engines. That’s the real filter that turns OneLake into a lake worth swimming in. And that brings us to the next piece of the puzzle—the unglamorous technology that keeps that water clear in the first place.

Delta and Parquet: The Unsexy Heroes

Ever heard someone drop “Delta Parquet” in a meeting and you just nodded along like you totally understood it? Happens to everyone. The truth is, it’s not a secret Microsoft code name or Star Trek tech—it’s just how Fabric stores tabular data under the hood. Every Lakehouse and Warehouse in Fabric writes to **Delta Parquet format**, which sounds dull until you realize it’s the reason your analytics don’t fall apart the second SQL and Spark meet in the same room.

Let’s start with Parquet. Parquet is a file format that stores data in columns instead of rows. That simple shift is a game-changer. Think of it this way: if your data is row-based, every query has to slog through every field in every record, even if all you asked for was “Customer_ID.” It’s like reading every Harry Potter book cover-to-cover just to count how many times “quidditch” shows up. Columnar storage flips that around—you only read the column you need. It’s like going straight to the dictionary index under “Q” and grabbing just the relevant bits. That means queries run faster, fewer bytes are read, and your cloud bill doesn’t explode every time someone slices 200 million rows for a dashboard.

Parquet delivers raw performance and efficiency. Without it, large tables turn into a laggy nightmare and cost far more than they should. With it, analysts can run their reports inside a coffee break instead of during an all-hands meeting. But Parquet alone just gives us efficient files. What it doesn’t give us is control, reliability, or sanity when teams start hammering the same datasets. That’s where Delta Lake comes in. Delta wraps around those Parquet files with a transaction log, which is the piece that adds the control and reliability Parquet alone can’t give you.
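To make the column-versus-row point concrete, here is a tiny self-contained sketch using pandas and pyarrow (a tooling assumption for illustration; Fabric’s own engines do this pruning for you under the hood). The file and column names are made up.

```python
# Small sketch of column pruning with Parquet, assuming pandas and pyarrow
# are installed. The file name and columns are illustrative only.
import pandas as pd
import pyarrow.parquet as pq

# Write a tiny Parquet file so the example stands on its own.
orders = pd.DataFrame({
    "Customer_ID": [101, 102, 103],
    "Order_Total": [250.00, 19.99, 74.50],
    "Notes": ["gift wrap", "", "expedite"],
})
orders.to_parquet("orders.parquet")

# Columnar read: only the Customer_ID column comes off disk,
# instead of every field of every row.
customer_ids = pq.read_table("orders.parquet", columns=["Customer_ID"])
print(customer_ids.to_pandas())
```

Scale that up to the 200-million-row table from earlier and the same trick is the difference between reading one column and scanning every field in every row.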