Skip Ribbon Commands
Skip to main content

Data Warehouse Rejuvenation

Posted 12/18/2009 12:00 AM by Dudley Drummond-Hay

 A look at factors affecting your data warehouse rejuvenation

Data warehousing has reached a maturity level where organisations have moved from wanting a data warehouse, to having one, to wondering what to do with it and ultimately reflecting on how it got broken. Warehouse rejuvenation can take a significant investment in a data warehouse initiative and refresh and revitalise the technology to make it useable again. The rejuvenation process is similar to building a data warehouse correctly from scratch. It involves approaching the business from executive management and following the process through various layers to the technical source system.

This article will discuss all the players in a warehouse and what to consider when evaluating your existing warehouse.

Business buy-in from senior management

The common misconception is that data warehouses are IT initiatives and the business only gets involved in the user-acceptance testing.  Too often, an "us versus them" mentality characterizes the relationship between business and IT sides. Rather than being the enabler, IT becomes simply the gatekeeper to data and technology. IT must change; it must understand what drives business performance. Is it revenue growth? Driving down costs? Providing better customer service? At a minimum, IT needs to be an active listener. Ideally, IT should be a valuable contributor that is involved in business strategy sessions.

Business executives often have no idea what it costs for their analysts to manually gather information - or how vulnerable they may be because they don't know how their analysts get their numbers. Based on what he sees, an executive wonders why it takes a BI project six months to get going when his staff could give him 'answers' within an hour.

The support from senior management usually has to be earned and re-earned. It is unwise to plan a centralized enterprise data warehouse with multiple downstream data marts covering every subject area to roll out a profitability report. The project will require a huge budget, multiple resources and take years before any benefits are realized.
Rather, implement a small pilot project that addresses a problem area in one part of the business to demonstrate the need for BI. A small victory within 3 months will earn the trust and support from senior executives. All future developments from this point should stick to the same approach of small and “easy” to manage releases with immediate benefits. This strategy will allow the project to adapt to the changing business, whereas the large enterprise solution may very well find itself out of alignment with business two years down the line.

BI is one of the more complex solutions to deploy because comparatively little of the challenge is about technology. Lack of support from senior management will result in the data warehouse being another IT initiative that has no support from business, is not used by users and only lives on through the Herculean efforts of one IT champion that still thinks it is a great idea.

Usage of the Data Warehouse

First identify the users of the data warehouse and then find out if they still use it and, equally importantly, why they don’t. The main users of the data warehouse are the people in the organization who really make decisions and who have staff working for them to put together the proposals for new business initiatives. Following this are the users that need the data on a day-to-day basis to complete operational tasks.

Over time, many data warehouses become underutilized. This is usually either the result of incorrect information emanating from the warehouse, or information being generated that is out-of-date. It is often very difficult to address these problems - technically they may be easy to resolve but changing the user’s perception that the data warehouse can be relied is often the hardest challenge.

Correctness and Currency of the Data

Whether there are still users of a data warehouse or not, this question must still be asked. Incorrect or out-of-date data will inevitably result in incorrect business decisions. It is certainly commendable if users are still using the data, even though they cannot actually trust the data and typically this situation is comparatively easy to solve. Resolution typically involves technical intervention such as visiting the source systems, ETL and business requirements which can turn the warehouse around with very little investment.

If the situation exists where the users have “given up” on the warehouse, then a re-education campaign will have to be launched to attract users back to the data warehouse. Since there is a culture of mistrust in the data, a huge effort is required to rebuild that trust which may involve for example, side-by-side reports from the warehouse and source systems to prove that the data is correct. A new way of extracting/ exploiting the data will go a long way to attract users back to the data warehouse.

Architecture of the Data Warehouse

If we take a simplistic look at a data warehouse it has four major components.
 
Source Systems
The first are the source systems which provide the data for the data warehouse. It is important that the data warehouse design is not based on the source system data design and that any data extraction only impacts the source system minimally. Any data warehouse that is based on the source system will have a limited lifespan, and unlikely to adapt if the source system is replaced. In most cases a source system is an 80% fit to a business, which means if your warehouse is based on the source system it is also only an 80% fit. Correspondingly, this makes it only 80% effective.
If the data warehouse has a large impact on the source system it is unlikely that the data warehouse will be kept current as the business will not allow it to refresh the data.

ErrorScript
The data stage area is the engine room and workhorse of the data warehouse. It is a messy, loosely architected area. All the data cleansing, transforms and publishing routines are kept in this area and the data stage area must NEVER be used for reporting. Sometimes this area contains an ODS (operational data store) reflecting the source systems which is perfectly acceptable however some businesses point their reporting to the ODS. From a best practice perspective, if there is data in the source system that is needed for reporting then it makes sense to rather add this to the warehouse.
There must ideally be a clear line between a clean reporting environment and the chaotic environment of the staging area.

Data Presentation
The third component in the data warehouse is the clean, well planned data presentation area. Data fields are descriptively named, the data design is easily understood by business users, and the data is clean and conformed.

Presentation Layer
The final component is the presentation layer. This encompasses the reporting tools, web pages, dashboards, OLAP cubes and ad-hoc query engines. The Presentation Layer accesses the data warehouse only and never the staging area. It is acceptable for the presentation layer to retrieve information from outlying source systems that have not been incorporated in the data warehouse however these elements should preferably be added to the warehouse, since the business may gain significant value in terms of data conformance. Data conformance will allow business to relate disparate data subjects, and allow a level of analysis that was previously unavailable.

To the business user, an unsound architecture may not be apparent. The information may be correct and current however the maintenance, scalability, growth and adaptability of the warehouse is jeopardized. Usually the original designer/developer of this type of warehouse is the only one that can make changes and maintain the environment. This places the business in an extremely vulnerable state.

Model Design and Planning

Dimensional vs. 3NF (Third Normal Form) will always be a debate, but irrespective of your viewpoint there must be a well-planned data model that reflects the business. You can always build dimensional marts from a well planned clean 3NF data area, and you should. A well planned model is adaptable and easy to understand. It meets current requirements and can meet most future requirements in the targeted subject areas. A haphazard data model that has evolved organically over time becomes costly to maintain, and it will eventually be unable to meet new requirements.

If the answer is no to both these questions then I am afraid you don’t have a data warehouse that can be rescued. Now is the time to redesign and to do it properly.
 
Data source of the Report Elements

This comes back to the architecturally sound data warehouse. You have a well designed data model, however due to unrealistic deadlines some report elements are sourced from the ODS or staging area. Now is the time to rectify that problem. Never let undue business pressure compromise the purity of the data warehouse. It will end up costing the business much more in the long run. A good warehouse is adaptable and it does not take weeks to add new data elements. If it is an entirely new subject area or a simple dimensional attribute then it must be planned. Remember the project manager’s mantra for all changes. “There's time, resources and scope.”

If your warehouse is not adaptable enough to add some new elements in very little time then re-evaluate the previous question.

Metadata

Metadata is not something to do after the warehouse project is complete. In fact metadata definition should be the first task after addressing business requirements. The business must define up front if there is going to be a consensus on a common set of labels across disparate source systems, or will each data element be uniquely named for each source system. There is no right or wrong answer but decide on a direction and stick with it - and then  set up the metadata to reflect this decision. Any new users to the business and IT professionals picking up where the warehouse team left off will immediately understand what is contained in the data.

Dimensions Conformity

Do you have one client index? Is there only definition of the product set? Having conformed dimensions allows the user to relate previously disparate subject areas. This is the greatest power of a data warehouse. For example, if the notion of "product" is found in many places across a large enterprise, the structure of the product dimension in each of these places is likely to be similar. If you have multiple product dimensions then there is a serious problem with the design. Make sure that you enforce conformity across the common business dimensions as much as possible.
Surrogate Keys

Surrogate keys are meaningless, auto-generated numeric links between the dimension record and its associated facts/measures. Not only do these links improve performance in a data warehouse, but more importantly, they allow the storage of historical dimensional data. For example a client can have multiple records in the client dimension tracking address and marital status changes. Even if the requirements don’t need historical dimensional data, it is always advisable to design for a change in those requirements.

Data Grain

The grain of a fact table or dimension table is the atomic level of the data stored. If data is stored at its lowest atomic level, then there is no question from business that cannot be answered. For example in a retail store the lowest grain at point of sale is: What was sold? Who bought it? At what time was it bought (down to seconds)? Who sold it? From which store was it bought? How much was sold and at what price?
Store the sale at this level and you cannot ask a question about the sale that can’t be answered. Many designers store data at an aggregate level because they don’t want to deal with larger volumes, or departments want to save a few bucks on hardware. This will prove far more costly when business wants a more atomic report at a later stage.


Next month we will take a fictitious client and test them against the above principles to see if they can be rescued or not...

1st Floor Block B
Granger Bay Court,
Beach Road, V&A
Waterfront
Cape Town 8001
021 425 3430
info@abg.co.za
Microsoft Gold Partner
Microsoft Partner Programme Awards 2009 Winner - Winning Customers Initiative Award
Finalist 2011 - Microsoft Social Responsibility Partner of the Year
Find us on Facebook