In this chapter we will look at the concepts of data warehousing, data marts and data mining.
As companies have created both flat files and databases over the years, there have been many instances where data hasn’t been analyzed, created and stored in a cohesive manner. Many companies have created databases without using the E/R and normalization process. Data from flat files have been simply moved into a database format with no redesign. Thus, much of the data stored in current databases does not give us the results we expect. One way to deal with this data problem is to create data warehouses.
DATA WAREHOUSE – a data warehouse is a subject-oriented, integrated, time-variant, nonupdatable collection of data used in support of management decision-making.
The need for data warehouses arises because: 1) a company requires an integrated, company-wide view of high-quality information, and 2) information systems must separate informational from operational systems to improve performance in managing company data.
PROCESS
DATA MARTS
Data marts may be either logical or physical. A data mart may hold certain data that pertain to a particular area (customers, patients, etc.) pulled from a data warehouse. Logical data marts are views of the main data warehouse.
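The idea of a logical data mart as a view can be sketched with Python's built-in sqlite3 module. This is only a minimal illustration: the table warehouse_records, the view customer_mart, and all rows are hypothetical names and data invented for the example, and a real warehouse would run on a dedicated platform rather than an in-memory SQLite database.

```python
import sqlite3

# Hypothetical warehouse table; names and rows are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE warehouse_records ("
    "record_id INTEGER PRIMARY KEY, subject_area TEXT, "
    "entity_name TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO warehouse_records (subject_area, entity_name, amount) "
    "VALUES (?, ?, ?)",
    [
        ("customer", "Acme Co.", 120.0),
        ("patient", "Patient 17", 80.0),
        ("customer", "Bolt Ltd.", 45.5),
    ],
)

# A logical data mart: a view exposing only the customer subject area.
# No data is copied; the view is evaluated against the warehouse table.
conn.execute(
    "CREATE VIEW customer_mart AS "
    "SELECT record_id, entity_name, amount FROM warehouse_records "
    "WHERE subject_area = 'customer'"
)

customer_rows = conn.execute(
    "SELECT entity_name, amount FROM customer_mart ORDER BY record_id"
).fetchall()
print(customer_rows)
```

A physical data mart would instead copy the selected rows into its own table or database, trading extra storage for independence from the warehouse.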
CHARACTERISTICS OF DATA
SCHEMAS
Star schema is a simple database design in which dimensional data are separated from fact or event data – also called a dimensional model.
Two types of tables are in a star schema: fact tables contain factual or quantitative data that are numerical, continuously valued, and additive; dimension tables hold descriptive data.
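The split between fact and dimension tables may be easier to see in a small sketch. The example below uses Python's sqlite3 module; the tables fact_sales, dim_product, and dim_store, their columns, and the rows are all assumptions made up for illustration, not a schema taken from the chapter.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables: descriptive data (hypothetical names and rows).
CREATE TABLE dim_product (
    product_key INTEGER PRIMARY KEY, product_name TEXT, category TEXT);
CREATE TABLE dim_store (
    store_key INTEGER PRIMARY KEY, city TEXT);

-- Fact table: numeric, additive measures plus keys to each dimension.
CREATE TABLE fact_sales (
    product_key INTEGER REFERENCES dim_product(product_key),
    store_key INTEGER REFERENCES dim_store(store_key),
    units_sold INTEGER,
    dollars_sold REAL);

INSERT INTO dim_product VALUES (1, 'Pen', 'Office'), (2, 'Lamp', 'Home');
INSERT INTO dim_store VALUES (1, 'Austin'), (2, 'Denver');
INSERT INTO fact_sales VALUES
    (1, 1, 10, 15.0), (1, 2, 4, 6.0), (2, 1, 1, 30.0);
""")

# Because the measures are additive, they can be summed along any dimension.
totals = conn.execute(
    "SELECT p.category, SUM(f.dollars_sold) "
    "FROM fact_sales f "
    "JOIN dim_product p ON p.product_key = f.product_key "
    "GROUP BY p.category ORDER BY p.category"
).fetchall()
print(totals)
```

The fact table sits at the center and each dimension table joins to it through a single key, which is what gives the design its star shape.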
Essential Rules of Dimensional Modeling:
OLAP (online analytical processing) tools are a set of query and reporting tools that provide users with multidimensional views of their data. Types of activities include slicing a cube, drilling down, and summarizing more than three dimensions.
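Slicing and drilling down can be sketched in plain Python without any OLAP product. In this rough illustration the cube is just a list of cells, and the dimensions (product, region, quarter, month), the sales figures, and the helper summarize are all hypothetical; real OLAP tools perform the same kind of aggregation over far larger data.

```python
from collections import defaultdict

# Hypothetical cube cells: (product, region, quarter, month, sales).
cells = [
    ("Pen", "East", "Q1", "Jan", 100),
    ("Pen", "East", "Q1", "Feb", 120),
    ("Pen", "West", "Q1", "Jan", 80),
    ("Lamp", "East", "Q1", "Mar", 50),
    ("Pen", "East", "Q2", "Apr", 90),
]


def summarize(rows, key_indexes):
    """Total the sales measure, grouped by the chosen dimension columns."""
    totals = defaultdict(int)
    for row in rows:
        key = tuple(row[i] for i in key_indexes)
        totals[key] += row[4]
    return dict(totals)


# Slice: fix one dimension (region = East) and keep the rest of the cube.
east_slice = [row for row in cells if row[1] == "East"]

# Summary view of the slice: sales by product and quarter.
by_quarter = summarize(east_slice, [0, 2])

# Drill down: the same slice at a finer grain, adding the month level.
by_month = summarize(east_slice, [0, 2, 3])

print(by_quarter)
print(by_month)
```

Drilling down never changes the totals: the monthly figures for a product and quarter add up to the quarterly figure they were expanded from.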
Data Mining allows users to look for patterns and trends in data. It is referred to as knowledge discovery. The goals of data mining are:
Data mining techniques include: regression, decision tree induction, clustering and signal processing, affinity, sequence association, case-based reasoning, rule discovery, fractals and neural nets.
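As one small illustration of these techniques, affinity analysis can be approximated by counting how often pairs of items appear together. The sketch below is an assumption about one simple way to do this (pair support over shopping baskets), not a method prescribed by the chapter; the baskets and item names are invented.

```python
from collections import Counter
from itertools import combinations

# Hypothetical transactions: each basket is the set of items bought together.
baskets = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"butter", "eggs"},
    {"bread", "milk", "eggs"},
]

# Count every unordered pair of items that occurs in the same basket.
pair_counts = Counter()
for basket in baskets:
    for pair in combinations(sorted(basket), 2):
        pair_counts[pair] += 1

# Support: the fraction of all baskets that contain the pair.
support = {pair: count / len(baskets) for pair, count in pair_counts.items()}

# The pair with the highest support is the strongest affinity in this data.
top_pair, top_support = max(support.items(), key=lambda item: item[1])
print(top_pair, top_support)
```

Pairs with high support are candidates for the patterns and trends that data mining looks for, though a fuller analysis would also weigh how common each item is on its own.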