Is a Generic Data Model Like Generic Peas?

In a previous article, ISO 15926 – A Decades-long Overnight Success, we told how ISO 15926 was born from the decision to split the large, comprehensive data model of STEP into two parts: Part 2, a generic data model, and Part 4, a library of reference data, the Reference Data Library (usually just called the RDL). We have written several articles about the RDL; here we introduce you to generic data models.

If you are like the author, your first thought is “What on earth is a generic data model?” We are all familiar with generic drugs. In Western Canada, where the author lives, you can even buy generic peas in bright yellow cans. In this context, generic means plain, good enough, and less expensive.

 

[Image: Generic Peas]

But what is a “generic” data model? We might guess (correctly) that it has something to do with databases, but “plain, good enough, and less expensive” are not adjectives we normally associate with a database. And once we figure out what a generic data model is, we still have to figure out why we should care.

We will start by comparing generic data models with conventional data models, then talk about why a generic data model is a good thing for information exchange.

What is a Data Model?

A data model shows the structure of a database, and by studying it you can figure out what the data means. A data model is like the blueprint of a building: even if the blueprint is labeled in a foreign language, you can infer the purpose of the various rooms just by looking at it. Similarly, you can infer what the data in a database means just by looking at its data model. (Download An Introduction to ISO 15926 and read chapter 3 for an example of inferring meaning from a data model.)

Conventional vs. Generic Data Model

Computer science instructors have killed a lot of trees writing about the difference between conventional and generic data models. (If you want a gentle introduction, check out the Wikipedia page.) But to this poor correspondent, it boils down to what the database is optimized for: conventional data models are optimized for query speed within a particular software application; generic data models are optimized for flexibility.

Most commercial software applications use a conventional data model.

Conventional Data Models

When the range of data is tightly constrained, developers can create a data model that is very specific to that set of data. They know that they control the data entered through the software and that any new data will always conform to the data model.

For instance, if they are writing an application for industrial instrumentation design, the developers decide where tag numbers are stored and how to store things like the manufacturer name. Sometimes a particular data item is deliberately duplicated to make queries faster (in database jargon, denormalized). Because the developers have a fixed data structure and know what everything means, they can write fast queries.
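To make this concrete, here is a minimal Python sketch of a conventional, application-specific data model. The class `InstrumentRecord`, its fields, and the sample tags and manufacturer names are hypothetical, made up for illustration rather than taken from any real product; the point is only that every attribute has a fixed, known place, so a query goes straight to it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InstrumentRecord:
    """Hypothetical fixed schema: every attribute is decided at design time."""
    tag_number: str
    instrument_type: str
    manufacturer: str
    # Stored on every record (denormalized) instead of being looked up in a
    # separate area table, so queries by area need no extra lookup.
    area_name: str


def tags_by_manufacturer(records, manufacturer):
    """The query knows exactly which field to read; no interpretation needed."""
    return [r.tag_number for r in records if r.manufacturer == manufacturer]


records = [
    InstrumentRecord("LT-101", "Level Transmitter", "Acme", "Tank Farm"),
    InstrumentRecord("PT-205", "Pressure Transmitter", "Globex", "Compressor House"),
    InstrumentRecord("LT-102", "Level Transmitter", "Acme", "Tank Farm"),
]

print(tags_by_manufacturer(records, "Acme"))  # ['LT-101', 'LT-102']
```

In this sketch, anything that does not fit the four fixed fields requires changing the class and every program that depends on it.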

But when your goal is direct, machine-to-machine information exchange with all of your business partners, the very thing that gives a conventional data model its speed, its highly specific structure, becomes a roadblock. A specific, fixed data model makes it difficult to exchange information with systems designed by others. For instance, process design software, 3D engineering design software, and construction management software all deal with the same real-world equipment and piping systems. But because each suite of software has a different purpose, their data models are quite different, and direct machine-to-machine information exchange is impossible without human intervention. (In fact, there is a whole subculture of system integrators who make a very good living making different databases talk to each other.)

Generic Data Models

A generic data model, on the other hand, assumes as little as possible about the subject matter of the data, more or less letting the data describe itself through reference data in the RDL. So generic in this sense means non-specific: able to describe anything. The main advantage is that when the domain of information expands, we do not have to re-engineer the data model; we simply add entries to the RDL.
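A generic data model can be sketched in the same spirit. In this simplified Python illustration (our own assumption, not the actual ISO 15926 Part 2 entities), the only structures are a classification that points at an RDL class and a property value that points at an RDL property; the `rdl:` identifiers and the dictionary standing in for the RDL are hypothetical. Adding a pressure instrument requires a new RDL entry, not a new table or class.

```python
from dataclasses import dataclass

# Hypothetical miniature reference data library (RDL): identifier -> label.
RDL = {
    "rdl:LevelInstrument": "Level Instrument",
    "rdl:TagNumber": "Tag Number",
    "rdl:Manufacturer": "Manufacturer",
}


@dataclass(frozen=True)
class Classification:
    """Says 'this thing is a member of that RDL class'."""
    thing: str      # identifier of the real-world thing
    rdl_class: str  # e.g. "rdl:LevelInstrument"


@dataclass(frozen=True)
class PropertyValue:
    """Says 'this thing has this RDL-defined property with this value'."""
    thing: str
    rdl_property: str  # e.g. "rdl:Manufacturer"
    value: str


classifications = [Classification("t1", "rdl:LevelInstrument")]
values = [
    PropertyValue("t1", "rdl:TagNumber", "LT-101"),
    PropertyValue("t1", "rdl:Manufacturer", "Acme"),
]

# Expanding the domain: add reference data, reuse the same two structures.
RDL["rdl:PressureInstrument"] = "Pressure Instrument"
classifications.append(Classification("t2", "rdl:PressureInstrument"))
values.append(PropertyValue("t2", "rdl:TagNumber", "PT-205"))


def describe(thing_id):
    """Interpret a thing's data by looking up RDL labels."""
    kinds = [RDL[c.rdl_class] for c in classifications if c.thing == thing_id]
    props = {RDL[v.rdl_property]: v.value for v in values if v.thing == thing_id}
    return kinds, props


print(describe("t2"))  # (['Pressure Instrument'], {'Tag Number': 'PT-205'})
```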

The main disadvantage of a generic data model is query speed. With a conventional data model you know exactly what you are querying for; with a generic data model you may have to run several queries just to find out what you are querying for. For instance, a conventional database for level instruments has a fixed number of attributes (columns, in database jargon) in a fixed relationship with each other. But a generic database holding level instruments might not even know that it holds level instruments until someone selects a value from the RDL that says “Level Instrument”. The attributes may even vary from instrument to instrument, depending on the reference data used. From the end user's point of view, queries respond more slowly.
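The extra query steps can be illustrated with a toy example. Assuming, for simplicity, that every fact is stored as a (thing, relationship, value) triple and that the RDL is a small lookup table (both hypothetical simplifications, not necessarily how any real iRING implementation stores data), finding the level instruments from one manufacturer takes three steps instead of one: resolve the RDL class, find the things classified with it, and only then filter on their attributes.

```python
# Hypothetical miniature RDL: identifier -> label.
RDL_LABELS = {
    "rdl:LevelInstrument": "Level Instrument",
    "rdl:PressureInstrument": "Pressure Instrument",
    "rdl:Manufacturer": "Manufacturer",
}

# Every fact is stored the same way, whatever it is about.
FACTS = [
    ("t1", "classifiedAs", "rdl:LevelInstrument"),
    ("t1", "rdl:Manufacturer", "Acme"),
    ("t2", "classifiedAs", "rdl:PressureInstrument"),
    ("t2", "rdl:Manufacturer", "Acme"),
    ("t3", "classifiedAs", "rdl:LevelInstrument"),
    ("t3", "rdl:Manufacturer", "Globex"),
]


def rdl_id(label):
    """Step 1: find which reference-data identifier carries this label."""
    return next(key for key, text in RDL_LABELS.items() if text == label)


def things_of_class(label):
    """Step 2: find the things classified with that RDL class."""
    cls = rdl_id(label)
    return {s for s, p, o in FACTS if p == "classifiedAs" and o == cls}


def level_instruments_by(manufacturer):
    """Step 3: only now filter on an attribute of those things."""
    prop = rdl_id("Manufacturer")
    candidates = things_of_class("Level Instrument")
    return sorted(s for s, p, o in FACTS
                  if s in candidates and p == prop and o == manufacturer)


print(level_instruments_by("Acme"))  # ['t1']
```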

So Why Use a Generic Data Model?

The founders of iRING (then ISO 15926) chose a generic data model because it makes it much easier to represent different real-world objects: instead of modifying the data model whenever we need to add something new, we simply extend the reference data. (For instance, the iRING data model can be used for real estate transactions simply by using real estate reference data.)

This matters to you, dear reader, because when you exchange information with a business partner, preserving the meaning of the data and easily accommodating the data models of different software systems are more important than query speed.

We can show how a generic data model can adapt to change faster than a conventional model by comparing STEP to iRING.

The STEP Approach

The immediate forebear of iRING (ISO 15926) was STEP (ISO 10303). Its approach was to create a very comprehensive conventional data model that described everything about a product. For instance, if someone needed a temperature range (“This part is rated for -30C to 100C”), he would have to create a data model for a temperature range. If someone else needed a pressure range, she would have to create a separate data model for pressure ranges.


STEP Conventional Data Model
Temperature Range
Pressure Range
Roughness Range
"This" Range
"That" Range

With STEP, every range requires a different data model.
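In programming terms, the STEP approach resembles declaring a purpose-built type for every kind of range. The Python classes below are a hypothetical illustration only, not STEP (EXPRESS) entity definitions; they show that each new kind of range means a new structure, with its fields and units fixed in advance.

```python
from dataclasses import dataclass


# Hypothetical conventional modelling: one purpose-built type per kind of
# range, with the unit baked into the field names.
@dataclass(frozen=True)
class TemperatureRange:
    low_celsius: float
    high_celsius: float


@dataclass(frozen=True)
class PressureRange:
    low_kpa: float
    high_kpa: float


# "This part is rated for -30C to 100C"
part_rating = TemperatureRange(low_celsius=-30.0, high_celsius=100.0)

# Any range not declared above cannot be stored at all until someone adds a
# new type and every exchanging system adopts it.
```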

The STEP approach of representing things with a conventional data model works well for exchanging information about manufactured products with a life cycle measured in years. (In fact, the author has imported “STEP models” from manufacturers into 3D plant models with great success.) But this approach proved unwieldy for exchanging information about industrial plants, with a life cycle measured in decades.

The iRING Approach

In contrast, the iRING approach is to use a generic data model for ranges and populate it with the appropriate property class and related units of measurement from the RDL. Generic in this case means that the data model is identifiable as a range of something, but doesn’t really make sense on its own. To create a “temperature range”, you take the generic model for a range and populate it with the class Temperature Range and the appropriate temperature units from the RDL.


iRING Generic Data Model: Property Range | Scale
RDL Reference Data: Property Class | Unit of Measurement
Temperature Range | Celsius, Fahrenheit, Kelvin
Pressure Range | psig, barg, kPa
Volumetric Flow Range | gpm, m3/hr
"This" Range | "this"
"That" Range | "that"

If you need an entirely new kind of range, all you have to do is add the new property class and unit of measurement to the RDL.
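The iRING approach, by contrast, resembles one generic structure whose meaning comes from reference data. In this minimal sketch, the `RDL` dictionary (property class mapped to allowed units), the `PropertyRange` class, and `make_range()` are our own hypothetical simplifications of the idea, not iRING or ISO 15926 definitions. A range is accepted only if the RDL knows both its property class and its unit.

```python
from dataclasses import dataclass

# Hypothetical miniature RDL: property class -> allowed units of measurement.
# The entries mirror examples in the text; the structure is our simplification.
RDL = {
    "Temperature Range": {"Celsius", "Fahrenheit", "Kelvin"},
    "Pressure Range": {"psig", "barg", "kPa"},
    "Volumetric Flow Range": {"gpm", "m3/hr"},
}


@dataclass(frozen=True)
class PropertyRange:
    """One generic structure for every kind of range."""
    property_class: str  # must name a property class in the RDL
    unit: str            # must be a unit the RDL allows for that class
    low: float
    high: float


def make_range(rdl, property_class, unit, low, high):
    """Create a range only if the RDL knows the class and the unit."""
    if property_class not in rdl:
        raise ValueError(f"unknown property class: {property_class!r}")
    if unit not in rdl[property_class]:
        raise ValueError(f"{unit!r} is not a unit of {property_class!r}")
    if low > high:
        raise ValueError("low must not exceed high")
    return PropertyRange(property_class, unit, low, high)


# "This part is rated for -30C to 100C"
rating = make_range(RDL, "Temperature Range", "Celsius", -30, 100)
flow = make_range(RDL, "Volumetric Flow Range", "m3/hr", 0, 250)
```

The same `PropertyRange` structure serves temperature, pressure, and flow; only the reference data differs.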

[Image: A friendly Martian]

 

Adding a New Range

We can see how this approach keeps up with a fast-paced market when we try to add a new range. For instance, what if Curiosity, the rover currently wandering around on Mars, were to find actual, real, live Martians? (Please stay with us here. This example came to us during a very interesting and informative talk at the recent Fiatech conference in San Antonio, Texas, given by Dr. John Grotzinger, Chief Scientist of the 2012 Curiosity mission.)

If Curiosity were to discover Martians after roaming around as long as it has, our first question would be “How come we haven’t seen any tracks?” So we take a second look and, lo and behold, they don’t have any legs! They’re just floating a few centimeters off the ground! After we get to know them a bit, the Martians tell us they get around simply by deciding to do so. They just use their minds! They haven’t used their legs for millennia, and their legs have atrophied.

But what if humans on Earth, after seeing the Martians, could do it too? What if all we needed was someone to tell us that we could? Pretty soon we would all be floating a few centimeters off the ground, just like them. But being good engineers, we would probably feel a need to quantify this new force of mind, and the first thing we would need is a good name. Someone suggests “Egos”.

So the question is: after all that, how would you implement a range of “Egos” with STEP and with iRING?

Adding “Egos” with STEP

The STEP approach is to add a new data model for a range of “Egos”. To do this, the organizers of STEP would get everyone together and develop a new Application Protocol with a new data model that includes a range for Egos.


STEP Conventional Data Model
Temperature Range
Pressure Range
Roughness Range
"This" Range
"That" Range
EGO Range

Expected duration? Years.

Adding “Egos” with iRING

How would iRING handle this?

Someone would add the new property class, perhaps something like “Telekinesis Range”, and the related new unit of measurement “ego” to the RDL.


iRING Generic Data Model: Property Range | Scale
RDL Reference Data: Property Class | Unit of Measurement
Temperature Range | Celsius, Fahrenheit, Kelvin
Pressure Range | psig, barg, kPa
Volumetric Flow Range | gpm, m3/hr
"This" Range | "this"
"That" Range | "that"
Telekinesis Range | ego

That’s it.

Expected duration? Minutes if someone wanted to, but practically speaking, we’d probably bounce it around for a few weeks first.
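Using a simplified, hypothetical RDL dictionary and a generic `PropertyRange` structure (our own illustration, not iRING definitions), adding Egos comes down to a single new reference-data entry. The structure and the code that creates ranges do not change; only the data does.

```python
from dataclasses import dataclass

# Hypothetical miniature RDL: property class -> allowed units of measurement.
RDL = {
    "Temperature Range": {"Celsius", "Fahrenheit", "Kelvin"},
    "Pressure Range": {"psig", "barg", "kPa"},
}


@dataclass(frozen=True)
class PropertyRange:
    """Generic range whose meaning comes entirely from the RDL."""
    property_class: str
    unit: str
    low: float
    high: float


def make_range(rdl, property_class, unit, low, high):
    """Accept the range only if the RDL defines this class/unit pair."""
    if unit not in rdl.get(property_class, set()):
        raise ValueError(f"RDL has no {property_class!r} measured in {unit!r}")
    return PropertyRange(property_class, unit, low, high)


# Before the RDL is extended, an Ego range cannot be created...
try:
    make_range(RDL, "Telekinesis Range", "ego", 0, 5)
    created_before = True
except ValueError:
    created_before = False

# ...and adding one reference-data entry is the only change needed.
# PropertyRange and make_range() are untouched.
RDL["Telekinesis Range"] = {"ego"}
levitation = make_range(RDL, "Telekinesis Range", "ego", 0, 5)
```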

This shows the flexibility of iRING, which is why its founders chose a generic data model. When we are describing the physical assets of capital projects all over the world, with the increasing pace of technological change, we are always going to run into new things. With iRING we can accommodate them simply by extending the reference data in the RDL. So, for instance, if the color of a common object became important, all we would have to do is extend the definition of the object to include color; we would not have to engage in a worldwide discussion.

Some Questions

Based on this explanation you may have a couple of questions:

1. Is this a criticism of the International Organization for Standardization and its way of doing things?

A. Not at all. ISO exists to get the world cooperating on standard ways of doing things. When you involve literally everyone in the world who wants to be involved, reaching consensus takes a while, as it should. Overall, the process works well for many things, just not for keeping up with new products and terminology as they hit the market. What iRING ends up with, then, is a good combination: the basic data model is created within the framework of the ISO standards, but the actual content is left to industry to manage.

2. If just anyone can add stuff to the RDL, won’t it collect a whole bunch of garbage?

A. Yes, and that is a good thing! (But that’s a subject for another post.)

Stay tuned.
