In this chapter we will examine how we create and maintain data quality and how we integrate all types of data and data models.

Once databases are created we need to look at data management and database management. Data governance is a set of processes and procedures to manage the data within an organization that have as goals availability, integrity and compliance with regulations. Risk and security exposures are also a critical set of processes that must be developed.

The Sarbanes-Oxley Act of 2002 said that an organization must undertake actions to ensure data accuracy, timeliness and consistence. One way in which companies deal with managing their data is to create positions of a data steward – a person assigned the responsibility of ensuring that organizational applications properly support the organization’s enterprise goald for data quality. Data quality is important to:

  1. Minimize IT projecdt risk
  2. Make timely business decisions
  3. Ensure regulatory compliance
  4. Expand the customer base

Characteristics of Quality date include:

  1. Uniqueness
  2. Accuracy
  3. Consistency
  4. Completeness
  5. Timeliness
  6. Currency
  7. Conformance
  8. Referential integrity

Reasons for Deteriorated data quality:

  1. External data sources
  2. Redundant data storage and inconsistent metadata
  3. Data entry problems
  4. Lack of organizational commitment

Key Steps in a Daa Quality Program include:

  1. Get the business buy-in
  2. Conduct a data quality audit
  3. Establish a data stewardship program
  4. Improve data capture processes
  5. Apply modern data management principles and technology
  6. Apply TQM (total quality management) principles and practices

General Approach to Data Integration include:

  1. Application integration
  2. Business process integration
  3. User interaction integration

ETL – consolidation of data has the following characteristics:

  1. Detailed
  2. Historical
  3. Normalized
  4. Comprehensive
  5. Timely
  6. Quality controlled

The ETL process is where data is reconciled in two states: the initial Enterprize Data warehouse (EDW) and subsequent updates on a periodic basis.   The initial process includes:

  1. Mapping data from various sources and meta data management
  2. Extracting the relevant data
  3. Cleansing the data (misspelled data, erroneous data, missing data, duplicate data, inconsistencies, etc.
  4. Load and index – also refresh mode is filling in a data warehouse that rewrites target data at periodic intervals; update mode only changes in the source data are written in the data warehouse

Data transformation involves converting data from the format of the source operational systems to the format of the enterprise data wqrehouse. Levels include record-level functions, field-level functions.