|Data Warehouse Design
the design, development, implementation,
and operation of even a single corporate data warehouse
can be a difficult and time consuming task. In this
article, we present the primary steps to ensure
a successful data warehouse
development effort. Here, at
Horsburgh.com, we have used this
approach successfully on our client's
data warehouse and data mart
The definition and recording of the problem to be solved is one
of the most often overlooked step of any development effort. A
problem needs to be solved, so the tendency is to jump right in
a solve it. For small, negligible cost efforts this is fine. For
Data Warehousing design, ignoring this step can lead to disaster.
Write down and widely publish the answers to the following questions
and all other questions that are appropriate for your specific
effort. Remember to keep the questions targeted to DEFINING the
problem NOT solving it.
Do I Need a Data Warehouse?
This is an obvious question, but should be taken seriously. For
some businesses the answer is an easy yes, but for others, there
may be better solutions. It is wise to seek professional advice
when answering this question. Having an outside professional examine
the question may cost some money up front, but they are far less
costly early on in the development.
What specific Problems will it solve?
Write down the four, five, ten, whatever, number of problems that
having a data warehouse will solve. The problems should be clearly
stated, be very specific, and have testable criteria for success.
Make sure you publicize these problems and get user and management
What are my available resources (time, money, and personnel)?
Knowing what your actual resources are at the beginning is critical
for defining the development path. If your budget is low, consider
down-scaling the effort. If time is short, consider using off-the-shelf
products extensively. If your personnel resources are thin, consider
outsourcing. Being realistic about your actual resources will
help you prevent overruns and project disappointments. Promising
a full-blown data warehouse when you only have resources for a
database, will always doom a project. Also, don't be afraid to
tell upper management that the resources are too small for solving
the problem. Believe me, they would rather know up front than
get a surprise during deployment.
What criteria will I use to measure success?
This is an often overlooked step in the problem definition. For
every problem stated, you must define a means for determining
the success of the solution. If you can't think of a success criteria,
then the problem is not defined specifically enough. Stay away
from problem statements such as "The data warehouse must
hold all our accounting data." Restate the problem in quantifiable
terms, like "The data warehouse must handle the current
20GB of accounting data including all metadata and replicated
data with an expected 20% growth per year."
Should I outsource all, some, or none of the development and
If you have in-house personnel that are under-utilized or have
time to be assigned to the development process, then keeping most
of the development in-house makes sense. If not, then you can
either hire additional staff or outsource some or most of the
development. I recommend that you do not outsource all of the
development. You must have some in-house expertise available or
at least strong upper management support. Otherwise you may end
up with a very nice system that does not solve your problems.
Strategic outsourcing makes sense in most medium to large data
warehouse projects. The outsourcing contractor can supply the needed
expertise and personnel at the various development phases. And
when a particular phase is finished, you are not left with a staff
member looking for something to do. You will probably find
the up-front costs of an outsourcing firm to be higher than hiring
in-house personnel. But the long-term savings will be far greater
with a professional outsourcing firm than by retaining in-house personnel.
Remember to make sure you feel comfortable with the outsourcer's
style and abilities. You will working with them very closely.
Don't just choose the largest or best-known source. How you and
your outsourcer "mesh" is far more important than their
list of clients.
Am I upgrading an existing system, converting from a legacy
system, or developing from scratch?
Developing a data warehouse from scratch, as strange as it sounds,
is by far the easiest. If you are in this situation, count your
blessings. If not, upgrading existing databases or converting
from one or more legacy systems will be your lot. Fortunately,
you will have a long list of "things that don't work right"
to begin with. Make sure that you fully understand what systems
will still be in place after the migration and how they will be
integrated into your intranet. If your budget is low, then consider
using middleware and "web-like" products to layer on
top of the existing system. With a more moderate budget, you can
replace inefficient systems with newer and more powerful ones.
Remember that computer hardware, memory, and disk space are cheap.
It's the software and operations that are expensive. Powerful hardware
can make even today's bloated software work faster. With a higher
budget, consider migrating legacy databases to a single powerful
commercial database. If you are not sure what the "latest
and greatest" data warehousing products are, hire a professional
data warehousing consultant. Their fee will be well worth it.
Performing a requirements analysis is critical to the success
of any project. Without a clear goal in mind, success is dubious.
There are a number of different philosophies about requirements
analysis: top down, bottom up, inside out, etc. The method I have
found to work the best is as follows:
- Clearly state the problem(s) you wish to solve.
- Identify all data sources and formats.
- Identify the users of the completed system.
- Formulate a specific budget - time, money, personnel.
- Ask identified users to specifically state what they expect
the system to do.
- Ask management to specifically state their success criteria
- Separate their requirements from their "desirements."
Only design to requirements. The enhancement phase is where you
address the "desirements."
- Group and "bubble-up" requirements.
- Generate a prioritized requirements table listing the requirement,
where it came from, the success criteria, and priority. Keep this
table high-level. A table with a dozen requirements will be much
easier to manage than one with hundreds.
- Produce a detailed development schedule including hardware,
software, personnel, documentation, and reviews. Include outsourcing
requirements and long lead-time items.
- Get a sign-off of the requirements, resource allocation, and
schedule from top management before you go any further.
Note that items 5 & 6 will be asked throughout the development
cycle since their responses will change when they see prototypes
and when they are being trained. Be sure to update items 7 through
10 each time.
Beware of getting caught in the cycle of
It's always best to ask very specific questions. Don't worry if
their responses change each time you ask the question. It will
happen, so plan for it.
You: What are your requirements?
Them: I don't know, what can you do?
Information and Data modeling, along with the definition of the
metadata, is the single most important activity in the design of
a data warehouse. If this step is done correctly, success is almost
ensured. If not, then areas such as flexibility, scalability,
and usability will suffer. Information modeling is best left to
the experts. I recommend that you hire a professional information
modeling firm or consultant. The consultant should be brought
in at the tail end of the requirements analysis and be kept though
the design and prototyping and then be called in as necessary
during the development and documentation phases. Don't skimp on
this phase or you will regret it.
Design & Prototyping
There are many design methodologies. The ones I've used
most successfully are 1. Rapid Prototyping (for small to medium
projects) and 2. Structured Development (for large or very complex
The rapid prototyping method is illustrated in figure X. There
are 5 keys to a successful rapid prototyping methodology:
- Assemble a small, very bright team of database programmers,
hardware technicians, designers, quality assurance technicians,
documentation and decision support specialists, and a single manager.
- Define and involve a small "focus group" consisting
of users (both novice and experienced) and managers (both line
and upper). These are the people who will provide the feedback
necessary to drive the prototyping cycle. Listen to them.
- Generate a user's manual and user interface first. You will
be amazed at what you will find out by producing a user's manual
- Use tools specifically designed for rapid prototyping. Stay
away from C, C++, COBOL, SQL, etc. Instead use the visual development
tools included with the database.
- Remember a prototype is NOT the final application. Prototypes
are meant to be copied into production models. Once the prototypes
are successful, then begin the development processing using development
tools, such as C, C++, Java, SQL, etc.
When a project has more than 10 people involved or when multiple
companies are performing the development, a more structured development
management approach is required. Note that rapid prototyping can
be a subset of the structured development approach. This approach
applies a more disciplined approach to the data warehouse development.
Documentation requirements are larger, quality control is critical,
and the number of reviews increases. While some parts may seem
like overkill at the time, they can save a project from problems,
especially late in the development cycle. For more information
about how the structured development approach works and for detailed
technical and management information, you can contact me via e-mail.
(I've written a handbook on the topic :-)
Development & Documentation
Once the requirements analysis is well underway, the prototypes
are working, and the focus groups are becoming happy, it's time
to begin the development. Coordinating hardware and software purchases
and upgrades, server and hardware installation, software and database
development, documentation guides and manuals, reviews, and testing
can become a full-time job. The key to keeping a handle on all
of this to maintain a good written schedule that everyone can
view and to have periodic "all-hands" reviews. Remember
that working with vendors can be a frustrating experience. Hardware
incompatibilities, data format incompatibilities, software bugs,
late deliveries, etc. are more the norm than the exception. Outsourcing
can help, but you must be continually involved to ensure success.
Test & Review
Testing and Reviews take place throughout the development cycle,
including prototyping, development, deployment, operations, and
enhancements. It never ends. It's wise to place a single individual
in charge of testing and reviews. This is not a popular job, but
it is critical for developing a system that works and meets each
of the requirements. Be sure to empower this person (usually a
quality assurance engineer) with the appropriate authority. Also,
provide them with an appropriately sized staff. Testing is time
consuming, tedious work and preparing for reviews and analyzing
results can take much longer than you might think. Fortunately
this person can save you from being surprised at budget review
time and usually catches most problems before they become too
big. If you outsource this task, make sure that you make it clear
to the others on the team what the outsourcer's role is and what level of
authority they have.
Deployment & Training
OK, the development is complete, quality assurance is satisfied,
the documentation is ready, and all the "off-the-shelf"
products have arrived. Now it's time to put everything together.
This can be a highly disruptive time. Make sure that you have
full management support and that they understand the nature and
effect of the installation and deployment disruption. Scheduling
training sessions concurrently with the installation can be an
effective use of time. Don't skimp on the training. Make sure
you have training in the budget from the beginning and don't dip
into it. The best way to ensure success is to effectively train
the users so that they will actually use the system and possibly
sing its praises. Also remember that training is ongoing. New
employees or employees being moved or promoted will need to be
trained. Each time enhancements are added, new training sessions
must be scheduled.
Data warehouses usually contain two or more servers. Tasks such
as backups, bug fixes, software updates, hardware maintenance
and upgrades, media services, account maintenance, security patches,
and other similar tasks must be performed regularly. Operation
and maintenance of such services requires an operations staff.
It is not enough to "let the users take care of it."
If you are providing these services in-house then you will need
on-site support from either an outsourcing agency or in-house
staff. The current trend is to outsource most of these services.
More and more companies are outsourcing the entire data warehouse
and access it via the Internet or private network. Outsourcing
can result in a substantial savings. Just make sure that your
provider can supply the services you require and is available
when you need them. Also, be sure to discuss security requirements
with them before you hire them.
There is always one thing you can count on: "Requirements
Creep." The more successful the data warehouse, the faster
requirements creep will occur. As your users become more sophisticated
they will want more and more capabilities. If you can respond
quickly and efficiently, your users will again sing your praises
(and upper management will definitely take notice :-). Make sure
that you have designed in the ability to add features from the
very beginning. Remember to design in scalability and flexibility
at all phases of development.
You might think that good manuals and good training would be sufficient
to effectively use your data warehouse. Not so. A knowledgeable,
available, responsive help desk is critical to the overall success
of the project. Users will always find new uses for a well-designed
system and problems will inevitably occur. Without a help desk,
a data warehouse can become dated and under-utilized. In my experience,
deployment of an excellent help desk (with telephone, fax, online,
and e-mail capabilities) is the single most important function
that ensures the continued success of a data warehouse.