In this chaper we will be looking at how we administer databases and data itself.

After we have created our databases through well-designed tables/relations, we then need to examine how to maintain and control the databases. We need to look at the concepts of data administration and database administration. Data administration is a high-level function that is responsible for the overall management of data resources in an organization, including maintaining corporate-wide definitions and standards.

 

The influences of data and database administrators include:

  1. Proliferation of proprietary and open source technologies on diverse platforms
  2. Rapid growth in the size of databases with complex data types
  3. The embedding of business rules in databases in the forms of triggers, stored procedures and user-defined functions
  4. The explosion of e-business applications that link with various databases

 

The traditional roles of data administration include:

  1. Data policies, procedures and standards
  2. Planning
  3. Data conflict resolution
  4. Managing the information repository
  5. Internal marketing

 

The roles of database administration:

  1. Analyzing the database
  2. Selecting the DBMS and related tools
  3. Installing the upgrading the DBMS
  4. Tuning database performance
  5. Improving database query processing performance
  6. Managing data security, privacy and integrity
  7. Performing data backup and recovery

 

One of the problems with keeping databases current is because of the various changing business conditions. Some of the important trends in database administration are:

  1. Increased use of procedural logic in SQL
  2. Proliferation of e-business applications
  3. Increase use of smart phones

 

Data warehouses themselves have also grown thus increasing the need for data warehouse administration. The data warehouse administration include two major areas:

  1. The data warehouse administrator does many of the same roles as a data or database administrator – but with a focus on decision-making database creation.
  2. The emphasis is on integration and coordination of metadata and data across many data sources:
    1. Build and administer an environment supportive of decision support applications
    2. Build a stable architecture for data warehouses
    3. Develop service-loevel ageements with supploiers and consumers of data for the data warehouses

 

Open source software has influenced database management. An open source DBMS is free software that provides core functionality of a DBMS. Open source software has advantages and disadvantages. Some advantages include:

  1. Large pool of volunteer testers and developers create reliable, low cost software in a short period of time
  2. Availability of source code
  3. Not proprietary from one vendor
  4. Multiple versions of the same software to tailor it to your system
  5. No additional costs for licenses for distributed systems

 

Some of the disadvantages include:

  1. Not complete documentation
  2. Special and proprietary systems may not be able to be met with open source
  3. Different types of open source licenses
  4. Open source tool may not have all needed features
  5. Don’t have certification programs

 

Data security is al a priority for data and database administrators. Protecting the data from various types of intrusion – accidental and intentional- and use is critical to make sure that the data is protected. Threats to data security include creation of a data security plan. Such a data plan needs to include areas of:

  1. Accidental losses, including human error, software, and hardware-caused breaches
  2. Theft and fraud
  3. Loss of privacy or confidentiality
  4. Loss of data integritiy
  5. Low of availability

 

Additional areas for security protection include server security and network security. In a three-tiered architecture, we also need to protect our applications. Database software security features include:

  1. Restriction of views and subviews
  2. Integrity controls of domains, assertions and checks
  3. Authorization rules
  4. User-defined procedures
  5. Encryption procedures
  6. Authentication schemes
  7. Back-ups, journals (logs) and check points

 

The Sarbanes-Oxley (SOX) Act regulates the integrity of public databases. The Act addresses three primary areas of protection:

  1. IT change management
  2. Logical access to data (personnel controls, physical access controls)
  3. IT operations

 

Database backup and recovery operations are a primary mechanism to maintain good database use and availability. Recovery facilities include:

  1. Backup facilities
  2. Journalizing facilities (logs – audit trails of database changes and transactions) – includes checkpoint facilities
  3. Recovery manager

 

Recovery and restart procedures are critical when bringing a database back to a steady-state readiness. Some techniques used are:

  1. Disc mirroring
  2. Restore/rerun processes

 

Two types of recovery processes include:

  1. Backward recovery
  2. Forward recovery

 

Data transactions themselves should follow the ACID properties:

  1. Atomic
  2. Consistent
  3. Isolated
  4. Durable

 

Types of database failures and their solutions are:

  1. Aborted transactions                                      rollback recovery

Roll forward recovery

  1. Incorrect data                                                  roll back recovery

Reprocess transaction without bad data

Compensating transactions

  1. System failure                                                 switch to duplicate database

Rollback

Restart from checkpoint

  1. Database destruction                                      switch to duplicate database

Roll forward

Reprocess transactions

 

One of the other important maintenance activities for database administration includes controlling concurrent access with database transactions. One of the major problems with transaction processing includes “Lost updates”. This is where more than one database transaction is accessing the same data at the same time and one of the transactions may be lost. One of the ways to prevent this is to use “serializability” where we prevent one of the transactions from processing until the other transaction finishes. We “lock” the transaction. Locking levels include database locking (lock the whole DB), table locking, block or page locking, record locking, and field locking.

 

Types of locks include: shared locks – or read locks that allow a transaction that is just reading the data to proceed even though the other transaction is also running; exclusive locks only let one transaction run at a time. “Deadlock” is where both transactions are waiting for the other transaction to “release” a lock that is has on the data and nothing can proceed in either transaction. We can manage deadlock two ways: deadlock prevention – a method for resolving deadlocks in which user programs must lock all records they require at the beginning of a transaction (rather than one at a time); and deadlock resolution where we back out of one of the transactions and let the other continue before continuing the other transaction.

 

Versioning is an approach to concurrency control in which each transaction is restricted to a view of the database as of the time the transaction started, and when a transaction modifies a record, then the DBMS creates a new version of the record and overwrites the old record.

 

Data dictionaries and data repositories are the ways in which the DBMS system keeps track of what’s in the database itself. A data dictionary is a repository of information about a database that documents data elements of a database. A system catalog is a sysem-creted database that describes all database objects, including data dictionary information, and also includes user access information. A respository is a component that stores metadata that describe an organization’s data and data processing resources, manages he total information processing environment, and combines information about an organization’s business information and its application portfolio.

 

Database performance tuning has various parts. First installation of the DBMS is critical. Then monitoring and correcting memory and storage space usage, input/output contention, CPU usage and application tuning are all pieces of database performance.