July 25, 2008
Search Minimize

Print  


Articles & VideosPerformance Tuning Data Models    

Average Rating: 
Whole StarWhole StarWhole StarWhole StarHalf Star
Total number of ratings: 3

Subtypes and Performance Tuning Minimize

Performance Tuning Data Models

by Walter Howard
19 October 2005

Introduction

If you’ve made it alive through the technology downturn over the past few years, chances are you’ve had to do more for less.  One of the things I’ve found myself doing more often than before is performance tuning my physical data models instead of handing them off to a DBA team for physical design.  Sure all of the “fox in the hen house” analogies apply, but coming from the logical modeling side of the house, I tend to be more conservative in making tuning changes due to the inevitable loss of data integrity.  Nevertheless, most shops today are still willing to trade off data integrity for performance gains.  As a result, I’ve compiled a list of some data model performance-tuning candidates for consideration when you start your physical data model tuning process.  If you still handoff your logical model to another team for physical design, hopefully, this article will help you (and your DBA teammate) make intelligent decisions.

Now it just makes sense that this article is primarily applicable for OLTP (online transaction processing) data models and not OLAP (online analytical processing) data models.  This is primarily due to two facts; all warehouses that I’ve worked on are not responsible for upholding the enterprise business rules and dimensional models are designed for easy read access, not normalized data.

The other ground rule in this article is that the logical data model always reflects the business rules.  Therefore when we discuss tuning we are really discussing de-normalization or derivations of the physical data model.  

In this article, I’ll discuss Super and Sub type performance considerations.

Where to Start?

image002.gifWell the data model, of course.  More specifically, the normalized logical data model.  This should be the starting point for all your model performance enhancements.  Why?  Because the normalized logical data model leaves the data in a consistent state.  If all performance goals are met with the normalized logical model, why make performance enhancements? 

Ease of query you say?  Certainly some queries on normalized models can be complex.  But as you will see later articles, making reads easier usually makes inserts, updates, and deletes harder.  Furthermore, if you design your code for reuse, you only need to write the SQL once.

For the sake of brevity of this article, the rules of normalization will not be covered.  If you’d like to see some good articles on normalization I suggest reading up on Tom Johnston’s many articles dealing with the topic.

A Few Prerequisites to Performance Tuning

Before we start tuning the physical data model, there are a few things you should first consider.  

First, understand that the data model may not be the source of your performance problems.  Performance problems can be difficult to locate and identify.  Some potential performance bottlenecks are:

  • The application – The source code may be written inefficiently causing bad performance. 
  • The network – Many of today’s applications utilize overworked LAN’s, WAN’s, or even worse, the Internet with its many inherent problems. 
  • The Database – The complexity of some of today’s leading relational database products is astounding.  While brushing up on some Oracle tuning procedures I referenced Oracles 9i Performance Tuning manual weighing in at a whopping 810 pages.

So before you start deriving and de-normalizing the data model, make sure the other parts of the puzzle are put together properly.

Noted modeling expert Graeme Simsion recently stated in a column;

“It’s much easier to build performance into the design than to try to build it on.”1

Which brings me to my second point, understand your performance requirements.  Performance requirements are just as important and valid as other business requirements.  The earlier you identify the performance requirements, the earlier you can make design decisions to improve your performance.  If you don’t have a documented performance requirement you can’t measure your success at the end of the tuning process. 

Third, make sure you have a defined testing methodology.  Your testing methodology should be repeatable, defined, managed, and finally optimized.  Sound familiar?  It should, it’s straight from CMM. 

Don’t Make Your Problem Someone Else’s

If I remove a business rule from my database and put the responsibility for that business rule in the application, have I improved my performance? 

If you have narrow-minded scope, you may be inclined to answer yes to this question.  However, if your scope includes the application, as it should, then the answer may be no.  Simply moving business rule responsibility from one application group to another may seem like a good performance tune, but you maybe making your problem someone else’s. 

Even worse, you maybe removing business rules that belong in the database and thereby inversely impacting your data integrity.   

As the data architect, always keep in mind, you must have enterprise scope when designing your databases.  It is not reasonable to expect the data requirements for an application won’t be needed by many other applications throughout the enterprise.  It is for this important reason that I strive to keep my data structures application independent.

Performance Tune #1 – Sub Types

Sub types are the perfect way to show attribute optionality or relationship rules that vary by entity occurrence.  That is, if I have an attribute or relationship that is mandatory for a subset of entity occurrences, I can reflect those business rules using a Super/Sub type data structure.  Take the Employee model fragment below for example.   For all Employees, I store the Employee name and address.  I also store the Employee Type Code, known as a classifying attribute or discriminator, which tells me the Employee type (Full Time or Temporary).  For Full Time Employees, I store the Hire Date, Job Grade, Salary Amount, and SSN.  For Temporary Employee’s, I store the Contract Identifier, Hourly Pay Rate, and Tax Identifier.  

Sub types allow me to show the attribute (and relationship) optionality by entity occurrence.  I can clearly show that Contract Identifier, Hourly Pay Rate, and Tax Identifier, are only applicable for Temporary Employee’s and furthermore, that these attributes are mandatory (when displayed in the physical model).  If I lump all of these attributes into one entity, I have to make all of the sub type attributes optional.

 

Figure 1 – Employee Super/Sub Type

image006.gif
 


There are three physical implementations for this logical model.

1.      Create a table for the super type and each sub type entity

2.      Roll up the sub type attributes into the super type entity and create one table

3.      Roll down the super type attributes into each sub type and create a table for each sub type entity

Not surprisingly, each solution has advantages and disadvantages.

Option 1 - Create a Table for the Super Type and all Sub Type Entities

This option reflects the logical data structure.  In the figure 1 example, you would create an Employee super type table as well as a Full Time and Temporary sub type table.  As with all super/sub type relationships, the cardinality is one-to-one.  

Here are the advantages of this approach:

Column optionality maintained I can enforce the optionality of the columns by using NOT NULL column constraints.
Only applicable columns are stored
If the Employee is a Full Time Employee, the Temporary Employee table is not populated.
Table scans faster for super type table
This is due to the reduced row length for the Super type table.
Physical data model reflects the true business rule This advantage is never given is just due.  Many modelers are too lazy to maintain the Logical and Physical views of a data model. 
If there are relationships that only apply to one of the sub types of Employee, they can be accurately and easily enforced
Adding a foreign key from (or to) the subtype is all that is necessary
Data integrity can be fully met within the database
This does require the creation of some simple triggers, but the importance of data integrity cannot be overstated—wrong data is still wrong, no matter how fast it is retrieved.

And here are the disadvantages:

Two inserts always required
Assuming a fully enumerated super type, at least one sub type table will always be required for each super type row.
Possible multiple updates If the set of columns to be updated includes super and sub type tables, then two updates (or more if the sub types are not mutually exclusive) are required.
Read from sub and super type table requires a join or a view
If the query requires columns from both super and sub type tables, then a join is required. (Unless a view is created).
This implementation requires one trigger per table to enforce - one in each of the sub type tables to validate the discriminator, plus one in the super type table to handle updated discriminator values. This is code that needs to be maintained as part of the database
Full table scan with super and sub type tables is slow due to join This is the worst performing case scenario; a full table scan requiring a join between the super and sub type tables.

 Option 2 - Roll Up the Sub Type Columns into the Super Type

 This is a popular option implemented for obvious reasons; there are less database objects to maintain and the queries are simple.  In this option, you take all the columns from each of your sub type tables and “roll up” or move them to the super type table.  Figure 2 shows the result of Option 2 from our previous example. 

Figure 2 – Rolled up sub types

image008.gif

As I’ve previously noted, I am now forced to make all of my sub type columns optional (or nullable).  Full Time Employee columns that were NOT NULL are now NULL since I combined Full Time and Temporary Employees in the same table.  I have to use another means (e.g. check constraint, application code) to enforce the column optionality.  Certainly a check constraint can enforce the business rules, but these cannot be shown in a data model.  I’ve lost the ability to reflect some business rules in the physical model.  

Here are the advantages of this approach:

Only a single SQL statement is needed for read, update, and delete actions
In a high transaction environment, this can pay big dividends.
Good performance on full table scans with super and sub type columns
The table row length is the longest of all solutions, but no join is required.
Less database objects to maintain, simpler SQL
Not a database performance issue per se, but worthwhile to mention.

 And here are the disadvantages:

Column optionality lost using NOT NULL constraint
As mentioned, I need to use a database check constraint or program code to enforce column optionality.
Poor performance for full table scans on super type columns Since I added all of the sub type columns to the super type, I’ve degraded the performance due to the increase in row length.
Physical data model doesn’t reflect business rules.
This can be circumvented if the logical model is maintained
If there is a relationship that only applies to one type of Employee, it is impossible to reflect it in the model and difficult to enforce If the sub type is a parent, a foreign key with a trigger on the child table is required where the trigger validates the discriminator value.  If the sub type is a child, an optional non-identifying relationship is required (even if the relationship is mandatory for the sub type) and a check constraint on the employee table is necessary.

  
Roll Down the Super Type Columns into each Sub Type

The last option is to “roll down” the super type entity into each of the sub type entities.  By “roll down”, I mean to take all of the super type attributes and copy them to each of the sub type entities.  Using our Employee example, the resulting data model would look like figure 3.

Figure 3 – Rolled down sub types

image010.gif
 

The benefit of this approach is I get to keep my column optionality.  I don’t need to use database triggers, check constraints, or application code to enforce the NOT NULL column constraints.  The downside of this approach is that I’ve redundantly stored my super type columns in each of my sub type tables.  Instead of storing one fact one place, I now store the same fact for each sub type table.  Add super type relationships that I have to redundantly create for each sub type to the mix and my model has become unnecessarily complex.  For this simple example, the redundancy is easy to manage.  On more complex examples, this could become a maintenance nightmare even with assistance from today’s modeling tools.  That said, there are performance advantages to this model approach. 

Advantages

Only a single statement needed for insert, update, delete, and read Remember, subtypes are normally mutually exclusive, so only one sub type will apply
Fair performance for full table scans on super type columns
Not as bad as all the columns in one table as in Option 2, but not as good as Option 1
Good performance for full table scan with super and sub type columns Smaller row length outperforms Option 2
Column Optionality maintained My personal favorite as the Data Modeler
Easy maintenance For a new sub type simply create a new table

And disadvantages:

Redundant attributes, relationships More opportunity to introduce modeling errors over time
Fair performance for full table scan with super and sub type columns
Since I added all of the sub type columns to the super type, I’ve degraded the performance due to the increase in row length.
Need Union statement to read all Employee types Depending on the number of sub types, this can be significant
If there are relationships that apply to all employees, they must be created redundantly (one for each subtype) This can add an order of magnitude of complexity to the design and implementation.

Conclusion

As you can see, there are advantages and disadvantages to each approach.  The correct approach for your situation requires more thought than simply roll the sub types up to improve performance.  Make sure you thoroughly understand the application access paths and how often they are performed to gain insight to the best approach for your application.

Performance tuning is always about tradeoffs.  Where one option can improve update and delete performance, it likewise can degrade read performance.

Acknowledgments


1 Understanding Data Model Quality – Graeme Simsion

Performance Tuning Oracle 9i – Oracle

Oracle Performance Tuning Tips and Techniques - Richard J. Niemiec

For contributions to this article – Lee Leclair and Jean Anne Brown

About the Author

Walter Howard is president of WallStreet Consulting Services, Inc., a company specializing in managing data assets.  He has ten years of data modeling experience in designing OLTP and OLAP databases from business requirements.  He can be reached at Walter@WallStreet-Consulting-Services.com.

Print  



Social Bookmarks -  Share this page - email email | del.icio.us del.icio.us | digg digg | technorati technorati | facebook facebook


  Home|Groups|About Us|Bookstore|Services|Articles & Videos|Member Profiles|What's New
Copyright 2006-7 InfoAdvisors, Inc. Terms Of Use Privacy Statement