Microsoft Fabric Q&A

Author: Rishi Sapra

Published: 25 Mar 2024

Following on from our webinar “Microsoft Fabric: What is it and why should you care?”, Rishi Sapra explores some of the questions raised by those who attended the live webinar.
  • This webinar, and others from the Data Analytics Community, are available to watch on-demand.

What exactly is Fabric? Is it a database, an application, an API tool, or something else?

Microsoft Fabric is an all-encompassing online platform for data analytics. Many of the Microsoft data tools that were previously only available in Azure (such as SQL Data Warehouse and Spark notebooks) have now been brought into the same "Software as a Service" environment as the online Power BI service to which reports are published. Power BI is now just one of the product experiences in Microsoft Fabric, alongside more enterprise-level capabilities and tools to effectively clean, shape and manage data from source systems and build reporting, all in one place. So, Microsoft Fabric is all of these things – data stored within the Fabric environment (in OneLake) can have a data warehouse on top of it, can be accessed via APIs, can be visualised with Power BI, and can be queried with any Microsoft Copilot! As its underlying data format, Microsoft have standardised on the lakehouse: files stored in Parquet format with a transaction log that provides "data warehousing" style capabilities such as rollback and transactional integrity. This format is known as Delta and is an open-source format that is common across different clouds and vendors (for example, it is also used in Databricks, which can be hosted on any cloud).
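
To illustrate how open this format is, here is a minimal sketch of querying a Delta table from a Fabric notebook. It assumes a lakehouse is attached to the notebook and contains a table called sales (a hypothetical name); because Delta is open source, the same read would work in any Spark environment, including Databricks.

```python
# Minimal sketch: querying a Delta table from a Fabric notebook.
# Assumes an attached lakehouse containing a table named "sales" (hypothetical)
# and the pre-provisioned Spark session that Fabric notebooks supply.

# Read the Parquet files + transaction log directly as a DataFrame.
df = spark.read.format("delta").load("Tables/sales")
df.show(5)

# Or query the same table with Spark SQL - no data is copied either way.
spark.sql("SELECT COUNT(*) AS row_count FROM sales").show()
```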

Does Fabric do away with the Power BI application? Is Fabric accessed completely online?

No. Power BI is still very much thriving as a leading Business Intelligence tool, both as a standalone application (including Power BI Desktop) and as a core part of the wider Microsoft Fabric ecosystem. For those who wish to continue using Power BI, the core functionality remains unchanged. The difference now is that Fabric brings more seamless connectivity with enterprise lakehouse architecture – in other words, a new, robust route to bring data into the application. It’s likely that some Power BI capabilities, such as Power BI Copilot currently, will only be supported when Power BI is used as part of the Fabric platform. So, whilst there is nothing stopping you from continuing to use Power BI as a standalone application, using it as part of Fabric will allow you to leverage its full capabilities. Fabric itself is completely online, including the Power BI elements; data modelling and report building can already be done online within Fabric/the Power BI Service as an alternative to using Power BI Desktop.

What is the sort of size of organisation Fabric would be optimal for? If I work for a small company/have small data volumes what is the real benefit of Fabric compared to Excel or Power BI?

In the webinar I talked about the challenges of data complexity when working with financial data. These arise because of the transactional nature of General Ledger/ERP data (not an ideal format for analytics!) and the need to combine it with other sources. In larger organisations this is often done across multiple teams (as per the pyramid structure I showed), with the data extraction and transformation typically done within the IT team in platforms such as Azure. In smaller organisations it may all be done within one team or by fewer individuals, but the same challenges of data complexity still exist. While smaller data volumes don’t require the performance benefits of Fabric, the biggest benefit is likely to be the flexibility of working in a platform that supports multiple code-first and low-code experiences (including Power Query, SQL, and Python) to suit different skillsets/process requirements, and the ability to integrate with a wide range of sources of varying complexity. It is important for smaller companies to avoid taking on too much "technical debt" by building manual processes in Excel, utilising limited “legacy” automation solutions such as VBA, or building siloed/non-optimised Power BI semantic models. If (or when) the business grows, these all become harder to maintain and create additional strain on the finance team as the data volumes, complexity, and reporting requirements grow.

Is OneLake different from Fabric and if so, do you need OneLake as an additional tool to make it work?

OneLake is automatically included as part of Microsoft Fabric and is the single data lake storage layer for all product experiences (Data Factory, Data Engineering, Data Science, Data Warehouse, Data Activator, Real-Time Analytics and Power BI). There are multiple ways to get data into OneLake (i.e., into the Fabric environment): you can use Power Query Dataflows to connect to over 200 data sources and transform data using the same low/no-code interface found in Power BI and Excel; trigger pipelines; write notebooks in Python or SQL that connect to data; or even upload/synchronise files directly from your machine! Once it is in OneLake, stored in a compressed file format known as Delta (Parquet), the data can be queried and accessed using any of the Fabric services without the need to copy data between them.
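
To make the notebook route concrete, here is a minimal sketch of landing a small dataset in OneLake as a Delta table from a Fabric Python notebook; the table name fx_rates is hypothetical, and a lakehouse is assumed to be attached.

```python
# Minimal sketch: writing data into OneLake as a Delta table from a notebook.
from pyspark.sql import Row

rows = [Row(currency="GBP", rate=1.00), Row(currency="USD", rate=1.27)]
df = spark.createDataFrame(rows)

# saveAsTable stores the data as Delta (Parquet + transaction log) in the
# attached lakehouse; every other Fabric engine can then query it in place.
df.write.format("delta").mode("overwrite").saveAsTable("fx_rates")
```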

How much does Fabric cost? Is it true that it starts at £5k / $5k month?

The pricing of Fabric (excluding Power BI Pro or Premium licences) is based on storage and processing capacity. Capacity is charged on an hourly basis (it can be stopped and started as required), with the lowest capacity units costing just $0.36/hour – even if that was running 24/7 it would cost only around $263/month. This can flex according to peaks and troughs in processing requirements (and Microsoft allows a level of smoothing across a day). Power BI licensing within Fabric remains unchanged: to create, edit and share Power BI items you need a Pro ($10/month) or Premium Per User ($20/month) licence, and on lower capacity SKUs consuming Power BI items also requires a per-user licence. It gets a little complex when considering Azure and Power BI SKUs – a bit more detail on this can be found here – the starting list price for a Power BI P SKU is $5k/month; however, you would only really use this to leverage an existing investment in Power BI Premium capacity, which can run Fabric workloads at no additional cost.
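
As a quick sanity check on those figures (list prices only; actual rates vary by region):

```python
# Back-of-envelope check on the entry-level capacity cost quoted above.
hourly_rate = 0.36       # USD/hour for the smallest Fabric capacity SKU
hours_per_month = 730    # average hours in a month
print(f"~${hourly_rate * hours_per_month:.0f}/month if running 24/7")  # ~$263
```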

You can find more information on licenses here and pricing options here.

Does Fabric integrate with accounting systems such as Dynamics/Business Central, SAP, QuickBooks, and Xero? How difficult would it be to implement Fabric on top of something like SAP B1 with a SQL database?

Dataflows in Fabric have almost all the same connectors as Power BI, including SAP BW/HANA and Dynamics (Dataverse and also D365 Business Central/Customer Insights). Dataflows – with or without Fabric – have the SQL Server database connector, which can be used for SAP B1 and other sources built on this technology. There isn't an out-of-the-box connector for QuickBooks (though Power BI Desktop has one for QuickBooks Online) or Xero, but these can be accessed through APIs – there is a web/API connector in Dataflows, and pipelines or notebook code can also call APIs. Alternatively, there are third-party tools such as Acterys which have pre-built data models and reports on top of common accounting systems. There is often a lot of financial reporting logic that needs to be built on top of this data (e.g., Chart of Accounts hierarchies/roll-ups, consolidation, adjustments, and calculation logic) – the extent of this determines the effort required to put Fabric on top of these systems for analytics, so it's difficult to estimate what would be involved without delving into specifics!
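
For systems without a native connector, the notebook/API route might look something like the sketch below. The endpoint URL, authentication details, and field names are hypothetical placeholders, not a real accounting vendor's API.

```python
# Minimal sketch: pulling invoices from an accounting system's REST API
# in a Fabric notebook and landing the raw extract in the lakehouse.
import requests
import pandas as pd

resp = requests.get(
    "https://api.example-accounting.com/v1/invoices",    # hypothetical endpoint
    headers={"Authorization": "Bearer <access-token>"},  # token acquisition not shown
    timeout=30,
)
resp.raise_for_status()

# Flatten the JSON payload into a table (field names are illustrative).
invoices = pd.DataFrame(resp.json()["invoices"])

# Write the raw extract to a Delta table in the attached lakehouse.
spark.createDataFrame(invoices).write.format("delta") \
    .mode("append").saveAsTable("raw_invoices")
```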

What would you say the key/crucial roles are when it comes to a successful MS Fabric set-up + deployment? Does there need to be a dedicated Data team/in-house engineer, or can a finance team do it themselves?

As I referred to in the webinar, enablement of a decentralised operating model – e.g., a data mesh – is one of the key benefits of the platform. This is where teams such as finance, who know their own data best, can work directly on the full end-to-end analytics process rather than being reliant on IT teams (data engineers) to prepare and serve curated data to them. One of the main barriers to implementing this is likely to be skills: even though there are low-code options such as Dataflows available in Fabric, it still requires a level of technical familiarity that many accountants may not have had sufficient time to develop. For larger organisations, implementing data mesh requires a significant shift in culture and ways of working to have true data products. New roles such as Data Product Owners, Domain Data Product Developers, and Self-Serve Data Platform (Fabric) Product Owners are needed to make the principles of data mesh work, and finding or training people with the right blend of skills to be successful in these roles is challenging! From a technical perspective, however, the ability to bring data from multiple domains/data products into a single place for reporting and self-service analytics is one of the key capabilities enabled through OneLake shortcuts.

How long would you expect it to take a medium / large company with a couple of ERPs to get the data into the lake, so it is ready for analytics? 1 month, 3 months or 6 months assuming a couple of qualified data engineers assigned?

This completely depends on the ERPs, the type of data that comes out of them, and what you need the data for. Rather than thinking about bringing all of your finance data into Fabric in one go, it is always best to start with the requirements for specific use cases: identify the business questions that need answering and the types of insights that need to be drawn out, work backwards to design a report/semantic data model to answer these, and then identify and source the data specifically for those scenarios. Creating the focused pipelines and processes (e.g., notebooks/dataflows) to clean and shape the ERP data can then be done quickly in Fabric. Unlike the Azure environment, there is no need to set up any infrastructure or platform components to join them together. Items can typically be created and running in hours and days rather than weeks and months!

One challenge facing global regulated companies is differing regulations around sensitive data in the cloud e.g., China, India, Korea, so it needs a hybrid architecture (Cloud+On-prem) - any tips on how to architect this with Fabric to maximise the benefits but maintain compliance? Is it GDPR compliant?

The multi-geo capability of Microsoft Fabric allows you to have separate capacities in different regions, with the corresponding workspaces/content assigned to those capacities, so that the data in those workspaces is contained within that region. Azure in China is operated by a different entity (21Vianet) and there are differences in the functionality available there. A hybrid on-premises and cloud infrastructure model has always been possible in Power BI with the use of on-premises data gateways (or VNet gateways for cloud data behind a corporate firewall) – these carry over into Fabric, though they are currently only supported when connecting to data sources using Dataflows Gen 2. Yes, Azure is compliant with GDPR and many other country/industry regulations – see the Microsoft Trust Centre.

I use Power Query (through Excel) to develop models for use by our business. The data involved is not large by any means and sourced from OneDrive. It is proving to be very slow - would having Fabric improve performance? And/or would it reduce the risk of incorrect code bringing back incomplete or duplicated data?

Yes, running the same Power Query in a Dataflow Gen 2 in Fabric would likely improve performance compared to sourcing data from OneDrive into Excel Power Query. Not only does the Power Query engine run in Microsoft data centres rather than on your PC, but there are also more options to improve the performance of Dataflows (such as splitting them into multiple Dataflows or staged queries). Even without Microsoft Fabric, you could explore running the Power Query code as a standard Dataflow in the Power BI service, and you should see some improvement in performance. However, using files as a data source is typically considerably slower than connecting directly to a system/database. In terms of reducing the risk of incomplete or duplicated data, these data quality checks (and potential fixes) can be brought into Power Query, or you can use other tools in Fabric for this. When writing back from a Dataflow Gen 2 into a lakehouse or warehouse object in Fabric, you can choose whether to append or replace data. A common pattern is to load all your data "raw" into one lakehouse, then use Dataflows/notebooks/SQL to connect to this and clean the data (removing duplicates, dealing with empty values, etc.), loading the cleaned data into a separate lakehouse.
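
A minimal notebook sketch of that raw-then-clean pattern, assuming a raw table raw_sales and a cleaned table clean_sales (both hypothetical names; in practice the cleaned table would typically live in a second lakehouse):

```python
# Minimal sketch: cleaning raw lakehouse data into a separate, curated table.
from pyspark.sql import functions as F

raw = spark.read.format("delta").load("Tables/raw_sales")

clean = (
    raw.dropDuplicates(["transaction_id"])    # remove duplicate rows
       .filter(F.col("amount").isNotNull())   # drop rows missing key values
       .withColumn("amount", F.col("amount").cast("decimal(18,2)"))
)

# Overwrite (rather than append) so re-running the notebook is idempotent.
clean.write.format("delta").mode("overwrite").saveAsTable("clean_sales")
```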

Is Copilot only available as part of Microsoft Fabric or can it be used in a standalone instance of Power BI? Can I use Copilot to ask questions of my data from any tool or only from within Fabric (and what are the risks with this, given that AI is known to "hallucinate")?

Currently Power BI Copilot is only available on a paid Fabric (or Power BI Premium with Fabric enabled) capacity, meaning an F64/P1 SKU or above. The seamless integration of Fabric/OneLake data across every Microsoft Copilot is still evolving, though OneLake data can be brought straight into Excel (which Excel Copilot can work on), and it is possible to bring Fabric data into Copilot Studio (previously Power Virtual Agents) using Link to Fabric. In his announcement of Microsoft Fabric, Satya Nadella called Fabric the "Data platform for the generation of AI", and so the vision is absolutely for Fabric to be the central source of data that Copilot can sit directly on top of. Of course, the data needs to be cleaned, combined, and enriched with appropriate business logic/context in order for Copilot to be able to understand it, so the metadata you build into your items/semantic models (e.g., descriptions and code comments) is a vital part of enabling this. I would expect Copilot on top of your own data to have less risk of hallucination compared to using the whole public internet as a data source, but the best way to reduce this risk is to provide as much context in the prompts and metadata as possible. However, a human review and fact check is always going to be required with any generative AI solution, at least for now!

What is the compatibility of Microsoft Fabric with other tools and platforms? Is the reporting interface only Power BI or can we use other tools such as Tableau? Does it support Mac, and can you use code other than Python and SQL (e.g., C or Julia)?

One of the advantages of Power BI Premium (a capacity or PPU – Power BI Premium Per User – licence) is that semantic models can be programmatically accessed/queried, and are accessible to other tools such as Tableau, via the XMLA endpoint. This XMLA endpoint capability is available by default with Microsoft Fabric (even on low-end SKUs). It allows other reporting interfaces to use the data, relationships and logic that are part of these semantic models. In addition, the data in any lakehouse in Fabric is automatically available via a SQL endpoint, so any application can connect to it as a SQL database and use standard T-SQL queries to return specific views/data. (This connection can also be protected with row/object-level security and dynamic data masking!) The data in a lakehouse can be directly queried and transformed using Python, SQL, R, or Scala in notebooks. The data can also be accessed via standard Azure Data Lake Gen 2 endpoints and can therefore be read into, or written back from, an Azure Function, which supports C# and Java code amongst others. There are no current public plans to make Power BI Desktop available on Mac, but one of the reasons for bringing Power BI report/model creation and editing capabilities into the Power BI/Fabric service is that these features are then available via any supported web browser – including from a Mac!
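
As an illustration of the SQL endpoint route, here is a minimal sketch of querying a lakehouse with T-SQL from plain Python over ODBC; the server and database names are hypothetical placeholders copied from the lakehouse's SQL connection string in the Fabric portal, and the table name reuses the clean_sales example from earlier.

```python
# Minimal sketch: any T-SQL-capable tool can query a lakehouse via its
# SQL endpoint. Server/database values below are hypothetical placeholders.
import pyodbc

conn = pyodbc.connect(
    "Driver={ODBC Driver 18 for SQL Server};"
    "Server=<your-endpoint>.datawarehouse.fabric.microsoft.com;"
    "Database=<your-lakehouse>;"
    "Authentication=ActiveDirectoryInteractive;"  # interactive Entra ID sign-in
)

for row in conn.execute("SELECT TOP 5 * FROM dbo.clean_sales"):
    print(row)
```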