July 25, 2008

Using UML and Rational Rose for Data Modeling

By Walter Howard

Introduction

If you’re like me, you’ve been intrigued by Rational Software’s push to enter the data modeling market. Rational has extended its Rose product to offer data modeling using the UML (Unified Modeling Language) notation, as opposed to traditional notations such as IDEF1X and Information Engineering (IE). Is it worth your time to learn data modeling with UML? I’ll explore this question and others as I take a look at the UML-based Rational Rose Data Modeler.

UML_Ra1.jpg

It’s been hard to miss the UML’s penetration into the marketplace. As Dr. Paul Dorsey of Dulcian states, “UML is the emerging standard. We no longer operate in a development environment where there will be many competing modeling standards.” Rational Software has taken dead aim at the database modeling market by releasing a data modeling product. The Rose Data Modeler offers round-trip engineering of databases using the UML notation.

But what aspects of the UML make it a better selection for data modeling than the traditional IDEF1X or IE notations? For one, most large companies have a database group that is separate from the development groups. This team structure, while good for leveraging Data Analysts' (DA) and Database Administrators' (DBA) time, doesn’t do much for project teamwork. In addition to pitting competing ideas of data quality and application performance against each other, the analysis artifacts produced by the different teams are usually in different CASE tools using different notations. I’ve seen this very scenario at a previous client. The database group had a staff of DAs whose job was to perform data modeling and database design. The DAs primarily used ERwin to document their data requirements in logical and physical data models. The application development teams, on the other hand, used Rational Rose for building their analysis and design artifacts. The application teams understood their class diagrams, but drop an entity relationship diagram in front of them and their faces went blank. The Rose Data Modeler aims to solve this problem by having all models built using the UML notation. Per Rational, when all application and design artifacts are built using the same notation, “communication will flow more freely and open the barriers between teams, improving quality and reducing the risk of error.”

Rose’s second biggest benefit comes from its process modeling capabilities.  Instead of burying the stored procedures, triggers, and other database code in the database, Rose allows you to model your database code just as you would your application code.  Imagine a comprehensive suite of application analysis and design artifacts that include all of your referential integrity triggers and stored procedures to boot!

So Where Do I Start?

Let me start by saying all Data Analysts need to learn the UML. With the multitudes of object-oriented development projects underway, a DA who cannot read the UML artifacts will quickly find their skills and marketability passé. You can download the UML 1.3 specification from the Rational website (www.rational.com). Don’t try reading it unless, of course, you’re suffering from insomnia. What is important to remember is that the UML was designed with process and data in mind, not just data. To that end, the UML is very flexible and extensible so that it can handle the complexities of modeling business processes. In order to model data, only a subset of the UML specification is required. Unfortunately, it’s not obvious which parts are superfluous and which are not, as the following example shows. In data modeling, there are two types of relationships, both of which indicate a meaningful business relationship:

Identifying: The child entity is dependent on the parent entity for its identity and cannot exist without it.

Non-Identifying: The child entity is not dependent on the parent for its identity.
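
In physical terms, the difference between the two shows up in where the parent's key lands in the child table. The following is a minimal sketch using Python's built-in sqlite3 module. The table and column names (parent, dependent_child, independent_child) are hypothetical, chosen only for illustration, and are not output from Rose or ERwin.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE parent (
    parent_id INTEGER NOT NULL PRIMARY KEY
);

-- Identifying: the parent's key is part of the child's primary key.
CREATE TABLE dependent_child (
    parent_id INTEGER NOT NULL REFERENCES parent (parent_id),
    child_seq INTEGER NOT NULL,
    PRIMARY KEY (parent_id, child_seq)
);

-- Non-identifying: the parent's key is an ordinary (here optional) column.
CREATE TABLE independent_child (
    child_id  INTEGER NOT NULL PRIMARY KEY,
    parent_id INTEGER REFERENCES parent (parent_id)
);
""")


def primary_key_columns(table):
    """Return the primary key column names of a table, in key order."""
    rows = conn.execute(f"PRAGMA table_info({table})").fetchall()
    # Each row is (cid, name, type, notnull, dflt_value, pk);
    # pk > 0 gives the column's position within the primary key.
    keyed = sorted((row[5], row[1]) for row in rows if row[5] > 0)
    return [name for _, name in keyed]


print(primary_key_columns("dependent_child"))    # ['parent_id', 'child_seq']
print(primary_key_columns("independent_child"))  # ['child_id']
```

In the identifying case the child cannot be keyed without its parent; in the non-identifying case the child has an identity of its own and merely points at the parent.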

In the UML, there are many flavors of relationships that can show more meaning than just identification. They can also show dependency and navigation. In Figure 1, the CUSTOMER entity is shown to be able to navigate to, or have a reference to, the ORDER entity. The arrow on the end of the relationship indicates navigability.

UML_Ra2.gif

Figure 1 – Class Model Fragment

Figure 2 shows the forward engineered data model. As it turns out, the ORDER table has the reference (i.e. foreign key) to the CUSTOMER table. The navigation arrow shown in the class model has no meaning in a data model and only obfuscates the meaningful objects in the model.

UML_Ra3.gif

Figure 2 – Transformed Data Model

The model fragment shown in Figure 3 is the same model with the ORDER entity shown with the navigation adornment to the CUSTOMER entity.  As you can see in the forward engineered data model (Figure 4), no matter where the relationship navigation adornment is placed, the transformed data model is not impacted.  (And I won’t even nitpick about the way the cardinality symbols changed in the class model from 0..n to 0..* in the data model.)

UML_Ra4.gif

 Figure 3 – Class Model Fragment

UML_Ra5.gif

Figure 4 – Transformed data model
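
The behavior seen in Figures 1 through 4 can be summarized in a few lines of Python. This is only a sketch of the observed outcome, not Rose's actual transformation logic; the Association class and the foreign_key_placement function are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Association:
    one_side: str                       # class at the "one" end
    many_side: str                      # class at the "many" end
    navigable_to: Optional[str] = None  # class the arrow points at, if any


def foreign_key_placement(assoc: Association) -> Tuple[str, str]:
    """Return (table holding the foreign key, table it references)."""
    # Navigability is deliberately ignored: only cardinality decides
    # which table receives the foreign key.
    return (assoc.many_side, assoc.one_side)


# Figure 1: the arrow points from CUSTOMER to ORDER.
figure_1 = Association("CUSTOMER", "ORDER", navigable_to="ORDER")
# Figure 3: the arrow points from ORDER to CUSTOMER.
figure_3 = Association("CUSTOMER", "ORDER", navigable_to="CUSTOMER")

print(foreign_key_placement(figure_1))  # ('ORDER', 'CUSTOMER')
print(foreign_key_placement(figure_3))  # ('ORDER', 'CUSTOMER')
```

Both fragments yield the same placement, which is the point: the navigation adornment carries information for the class model but none for the data model.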

I could live with that problem because it doesn’t affect the accuracy of the data model. But what about structures that are blatantly wrong in the data world yet might be feasible in the process world? The following model fragment (Figure 5) shows two new entities, AIRPLANE and SEAT. This time, there is a composite aggregation relationship instead of an association relationship. In data modeling terms, a composite aggregation transforms to an identifying relationship. It just so happens I inadvertently placed the dependency on the wrong side of the relationship. That is, the AIRPLANE is shown dependent on the SEAT instead of the SEAT being dependent on the AIRPLANE. I was, however, able to get the cardinality of the relationship correct. I could even add the navigation adornment, but you already know what value that adds! Don’t bother trying to draw this structure in ERwin, because it is impossible.

I forward engineered my Rose class model to the data model, and you can see the results in Figure 6. Even though the cardinality indicates that the AIRPLANE entity is the parent entity, the backwards dependency relationship caused the primary key of the SEAT entity (T_SEAT_ID) to migrate “down” the relationship to the AIRPLANE entity. To exacerbate matters, Rose decided to change the SEAT cardinality from 0..n (0, 1, or more) to 0..1 (0 or 1). It could be that Rose recognized that a cardinality of greater than one is against the rules per the UML 1.3 specification, or perhaps it figured out that it was impossible to support the 0..n cardinality via a foreign key. Whichever it was, I am left with a compound primary key for AIRPLANE where one of the columns (T_SEAT_ID) is nullable. Last time I checked, Oracle couldn’t implement a primary key that includes a nullable foreign key column.

UML_Ra6.gif

Figure 6 – Transformed Data Model with Identifying Relationship
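
A simple check would catch the flaw in the Figure 6 result before any DDL reaches the database. The sketch below is an assumption-laden illustration in Python, not a feature of Rose: the Column class and nullable_key_columns function are hypothetical, and T_AIRPLANE_ID is an assumed name for AIRPLANE's own key column (only T_SEAT_ID appears in the model described above).

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Column:
    name: str
    nullable: bool
    in_primary_key: bool


def nullable_key_columns(columns: List[Column]) -> List[str]:
    """Return names of columns that are both nullable and in the primary key."""
    return [c.name for c in columns if c.in_primary_key and c.nullable]


# The AIRPLANE table as described for Figure 6.
airplane = [
    # Hypothetical name for AIRPLANE's own key column.
    Column("T_AIRPLANE_ID", nullable=False, in_primary_key=True),
    # Migrated from SEAT with 0..1 cardinality, hence nullable.
    Column("T_SEAT_ID", nullable=True, in_primary_key=True),
]

print(nullable_key_columns(airplane))  # ['T_SEAT_ID']
```

Any non-empty result means the table cannot be created as modeled, because every column of a primary key must be NOT NULL.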

So, in fact, dependency does matter some of the time and navigation none of the time. In association relationships, dependency is meaningless, but in identifying relationships, dependency indicates the parent entity regardless of what cardinality is specified. I attribute this type of error to Rose trying to be too many things for too many people. I also attribute it to Rose not faithfully implementing the UML specification. Either way, Rose will let you model data structures that can’t be implemented in a relational database.

Where To From Here

The Rose product is an excellent software modeling tool. Unfortunately, when it comes to data modeling, it’s not quite ready for prime time. The feature-rich UML specification is actually a drawback when it comes to data modeling. Although Rose offers some advantages for modeling database code and for team cooperation, my recommendation is to keep building those databases with the tried and true IDEF1X and IE methodologies. I’ve just touched on a few points in this column. I’ll dive into logical and physical modeling differences between Rose and ERwin in my next column. Until then, remember the golden rule, “Data is the asset!”

About the Author

Walter Howard is president of WallStreet Consulting Services, Inc., a company specializing in managing data assets. He has over nine years of data modeling experience, primarily in designing enterprise and application OLTP databases from user and business requirements. He can be reached at Walter@WallStreet-Consulting-Services.com.



Copyright 2006-7 InfoAdvisors, Inc.