Many organizations struggle to find an appropriate way to store their data: data warehouses, data marts, data lakes, and operational data stores are among the options. Data warehouses and data marts differ in size, scope, and number of data sources.
A data mart is usually limited to holding data for one purpose, such as serving a single department of a company, whereas a data warehouse can process and transform data sets from many different sources. Organizations wanting to make data-driven decisions need to know whether to use a data mart or a data warehouse to analyze and report on data they collect. Here are some of the differences between a data warehouse and a data mart.
The data warehouse
The data warehouse is an enterprise-wide repository of information drawn from many data sources across different areas of the organization. Its size is often in the terabyte range for large organizations. It enables strategic decision-making that affects the whole organization. The cost of a data warehouse is often more than $100,000, but cloud solutions can lower it dramatically.
Characteristics: Bill Inmon first defined the subject-oriented, integrated, time-variant, and non-volatile characteristics of the data warehouse.
Subject-oriented and integrated mean that data is organized around subjects such as products, customers, and sales, and is integrated from various operational sources in a variety of formats.
An operational data store (ODS) addresses one disadvantage of the data warehouse, namely that it does not contain up-to-date data. An ODS holds current operational data and offers the most recent state of each data element.
The data warehouse represents the state of data elements at specific moments in time. Time-variant refers to the fact that the data warehouse stores periodic snapshots.
Non-volatile refers to the fact that data in the warehouse is not frequently updated or deleted over time, unlike the ODS, which constantly overwrites data. Data loading and data retrieval are the warehouse's two most important operations.
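The difference between an ODS that overwrites and a warehouse that keeps periodic snapshots can be sketched in a few lines of Python. This is only an illustration using in-memory structures, not a real storage engine; the names `ods`, `warehouse`, `record_change`, `take_snapshot`, and `value_as_of`, as well as the sample key and dates, are hypothetical.

```python
from datetime import date

# Hypothetical in-memory stand-ins: the ODS keeps only the current state,
# the warehouse keeps an append-only history of dated snapshots.
ods = {}          # key -> latest value (overwritten on every change)
warehouse = []    # rows of (snapshot_date, key, value), never updated


def record_change(key, value):
    """Operational update: the previous value is lost."""
    ods[key] = value


def take_snapshot(snapshot_date):
    """Periodic load: copy the current ODS state into the warehouse."""
    for key, value in sorted(ods.items()):
        warehouse.append((snapshot_date, key, value))


def value_as_of(key, snapshot_date):
    """Retrieve the value a key had in a given snapshot, or None."""
    for snap_date, snap_key, value in warehouse:
        if snap_date == snapshot_date and snap_key == key:
            return value
    return None


record_change("customer_42_balance", 100)
take_snapshot(date(2024, 1, 31))
record_change("customer_42_balance", 250)   # overwrites 100 in the ODS
take_snapshot(date(2024, 2, 29))
```

After these calls the ODS knows only the latest balance, while `value_as_of` can still return the January value from the warehouse history. Loading (`take_snapshot`) and retrieval (`value_as_of`) are the only operations the warehouse side needs here.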
A full-blown data warehouse is often called an EDW (enterprise data warehouse) as this emphasizes its enterprise-wide aspect. Complex analyses that require a high volume of historical and aggregated data are conducted in the data warehouse.
Centralized data warehouse use cases
- A company considering expansion: The data warehouse aggregates data from sales, marketing, the supply chain, store management and customer loyalty. The data obtained from all these different data sources gives decision-makers a more holistic view and enables them to make informed decisions when considering expansion.
- An insurance company: Using the data warehouse for information about all the areas that drive profitability can help the company to report on profits. The data warehouse combines information from the claims department, customer demographics, sales, and other areas.
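As a rough sketch of the insurance example, the snippet below integrates two source extracts that use different formats into one customer-oriented table. The record layouts, field names, figures, and the `integrate` function are all assumptions made for illustration; real integration would normally be done by an ETL or ELT pipeline.

```python
from collections import defaultdict

# Hypothetical extracts from two operational systems, each in its own format.
premiums = [  # from a sales system: (customer_id, premium_paid)
    ("C1", 1200.0),
    ("C2", 800.0),
    ("C1", 300.0),
]
claims = [  # from a claims system: dicts with different field names
    {"cust": "C1", "paid_out": 500.0},
    {"cust": "C2", "paid_out": 1000.0},
]


def integrate(premium_rows, claim_rows):
    """Combine both sources into one table organized by customer."""
    table = defaultdict(lambda: {"premiums": 0.0, "claims": 0.0})
    for customer_id, amount in premium_rows:
        table[customer_id]["premiums"] += amount
    for row in claim_rows:
        table[row["cust"]]["claims"] += row["paid_out"]
    # Derive a simple profitability measure per customer.
    return {
        customer_id: {**totals, "margin": totals["premiums"] - totals["claims"]}
        for customer_id, totals in table.items()
    }


by_customer = integrate(premiums, claims)
```

The point of the sketch is that neither source alone can answer the profitability question; only the integrated, subject-oriented table can.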
The data mart
As data volumes grow and analytics use cases multiply, organizations may be unable to serve every use case without degrading the performance of the data warehouse. They may then export a subset of the data to a data mart for analysis, so a data mart is essentially a scaled-down version of the data warehouse.
A data mart typically contains less than 100 GB, is less expensive to store, and enables faster analysis thanks to its small, specialized design. It typically costs from about $10,000 upwards.
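Exporting a subset of warehouse data into a data mart can be sketched with Python's built-in `sqlite3` module. An in-memory SQLite database stands in for the warehouse here, and the table names, columns, and figures are hypothetical; a production setup would use a real warehouse platform and a scheduled export.

```python
import sqlite3

# In-memory database standing in for the warehouse; table and column
# names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE warehouse_facts (department TEXT, metric TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO warehouse_facts VALUES (?, ?, ?)",
    [
        ("finance", "revenue", 5000.0),
        ("finance", "expenses", 3200.0),
        ("marketing", "ad_spend", 900.0),
        ("hr", "payroll", 2100.0),
    ],
)

# The "data mart" here is just a separate table holding the finance subset.
conn.execute(
    "CREATE TABLE finance_mart AS "
    "SELECT metric, amount FROM warehouse_facts WHERE department = 'finance'"
)

finance_rows = conn.execute(
    "SELECT metric, amount FROM finance_mart ORDER BY metric"
).fetchall()
```

Queries from the finance team can then run against `finance_mart` alone, which is smaller than the full fact table and holds only the rows that team needs.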
Data mart benefits
A data mart is often a repository of information pertinent to a particular use case or business unit (finance, marketing, human resources, logistics, etc.). It enables tactical decision-making for a small group of end-users.
One of the benefits of using data marts is that they provide focused content in a format tailored to a specific user group. Because they can be located nearer to end-users, they can also reduce network traffic and give those users more control.
The data mart approach is useful for reporting and marketing analysis where enterprise-wide data is not necessary. For example, a financial manager could use a finance data mart for financial reporting.
Bill Inmon and Ralph Kimball are two data pioneers who took different approaches to data architecture within an organization and to the relationship between the data mart and the data warehouse.
Bill Inmon’s approach was to first build the data warehouse as the main repository of all enterprise data. Data marts would be created after this to serve specific needs. An advantage of this approach is that the data warehouse acts as a single source of truth for the whole organization.
Ralph Kimball’s approach was to initially create separate data marts holding aggregate data for the most important business processes and to merge them into a data warehouse later on.
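The two build orders can be contrasted in a deliberately simplified sketch. The functions `top_down` and `bottom_up`, the row layout, and the sample records are hypothetical, and the sketch ignores the modeling work each approach involves; it shows only the direction in which data flows between the warehouse and the marts.

```python
# Hypothetical rows: (business_area, record)
def top_down(warehouse_rows):
    """Inmon-style order: start from the warehouse, derive marts from it."""
    marts = {}
    for area, record in warehouse_rows:
        marts.setdefault(area, []).append(record)
    return marts


def bottom_up(marts):
    """Kimball-style order: start from marts, merge them into a warehouse."""
    warehouse_rows = []
    for area in sorted(marts):
        for record in marts[area]:
            warehouse_rows.append((area, record))
    return warehouse_rows


edw_rows = [
    ("finance", "invoice-1"),
    ("sales", "order-7"),
    ("finance", "invoice-2"),
]
derived_marts = top_down(edw_rows)      # warehouse first, marts second
rebuilt_rows = bottom_up(derived_marts) # marts first, warehouse second
```

In this toy case both directions end up with the same rows. In practice the merge step is harder than shown, because separately built marts must agree on keys and definitions before they can be combined.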
Are data marts still relevant in cloud architecture?
On-premises data warehouses cost much more to implement than data marts and take a long time to build. Cloud-based data warehouse services, however, offer many advantages: they are quick to set up and relatively inexpensive to run, and they can support very large volumes of data and large numbers of users.
With cloud-based data warehouse services it is easy to add compute resources to address new use cases without affecting database operations. This makes it largely unnecessary to use separate physical data marts to keep performance at an acceptable level. Nor is there a need to follow Ralph Kimball’s approach of starting small with data marts and integrating them at a later stage to form a data warehouse.
A final word
These data stores serve different purposes, and it is important to understand their differences in order to make the right investment decisions.
A data mart is a scaled-down version of a data warehouse containing a subset of data. It enables tactical decision-making for a specific group of users. A data warehouse is an enterprise-wide repository of data from various sources, enabling strategic decision-making that affects the whole organization.
As cloud-based data warehouses provide a cost-effective and scalable way for businesses to build centralized data warehouses, more organizations of all sizes can take advantage of them.