In an earlier post, Is a Generic Data Model Like Generic Peas?, we said that because anyone can update the Reference Data Library (RDL), it will collect garbage, and that this was a good thing. If you’ve ever been involved with cleaning up a mature database, this was probably quite a surprise. People whose job it is to manage large production databases (a category that happens to include the author) are used to working hard to keep their data reliable and free of garbage, and now someone comes along and says garbage is a good thing. Hmm….
The fear is that if everyone can add definitions to the RDL, it will accumulate so much “stuff” that it will be hard to sort the good from the bad.
- If just anyone can create their own RDL and expose it to the world, won’t there be a proliferation of frivolous and special-purpose RDLs that will be hard to sift through? (Not to mention the possibility of RDLs created with malicious intent.)
- If RDLs are to be immutable (a concept we will discuss shortly), how do we fix things when there is a legitimate change in industry? (Not to mention fixing mistakes made by people on their learning curves.)
The natural thought, then, is to put someone in charge. But with rapidly changing technology and the continual introduction of new things, how can we possibly have a body that is both effective and quick enough to keep up? If you are in the middle of a large information exchange project and need a new term, you need it now, not two years from now.
So it seems that we are creating a world with such a proliferation of RDLs that it will take a PhD to tell the good from the bad.
The answer is actually quite simple.
Let’s put the question into another perspective. How many dictionaries are there in the world? Hundreds? Easily. Thousands? Probably. Perhaps now that anyone can set up a web page there may even be hundreds of thousands. Most of them will have words you won’t understand, let alone use in polite conversation.
But does this bother you, dear reader, when you write a report for your day job? With all the dictionaries in the world, do you have any trouble figuring out which one to use? In Canada, where the author lives, the Oxford English Dictionary is the one that is generally used to settle arguments and break ties. Every country in the world has one of similar stature. You know which one to use and simply ignore the rest.
Now occasionally you might need to use a word that is not in your normal dictionary. For instance, suppose one of your kids brings a Goth friend over after school one day. (If you are like the author you would probably do a double-take, and then remember the things you did in high school yourself, and that the least of these was a hairstyle your parents found shocking!) Perhaps in the course of the visit you overhear a couple of strange words. The next day at work, while writing a report, you remember one that just fits. To make sure you have the spelling correct you search for Goth dictionaries online and find the East Los Angeles Dictionary of Goth Slang. So it is possible that you might use words from a non-standard dictionary in a formal business report, and when you need to you will find the right one.
The same sort of thing will be true of iRING RDLs. In a previous post, Understanding the ISO 15926 RDL, we described the federated nature of the individual RDLs that make up the overall iRING RDL. What this means is that we will likely end up using several RDLs:
- Industry-standard terms from the POSC Caesar Association (PCA) core RDL
- Reliable but specialized RDLs from organizations like the American Petroleum Institute (API) and similar bodies in other countries
- An RDL specific to a particular manufacturer
- Personnel titles and site locations from a corporate RDL
- Tag numbers for equipment and instruments from a site-specific RDL
- A project-specific RDL containing milestones and productivity rates
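To make the idea of using several RDLs at once a little more concrete, here is a minimal sketch in Python. It is not how iRING is implemented; the `RDL` class, the `resolve` function, the search order, and the sample definitions are all assumptions made up for illustration. It simply shows a term being looked up in one RDL after another until some library defines it, much as you would reach for your usual dictionary first and a specialized one second.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class RDL:
    """Hypothetical in-memory stand-in for one reference data library."""
    name: str
    terms: Dict[str, str] = field(default_factory=dict)


def resolve(term: str, federation: List[RDL]) -> Optional[Tuple[str, str]]:
    """Return (rdl_name, definition) from the first RDL that defines the term.

    Returns None when no RDL in the federation knows the term.
    """
    for rdl in federation:
        if term in rdl.terms:
            return rdl.name, rdl.terms[term]
    return None


# Illustrative contents only; these definitions are invented for the sketch.
pca_core = RDL("PCA core", {"PUMP": "industry-standard definition of a pump"})
site_tags = RDL("Site tags", {"P-101": "tag for one pump at this site"})

# Assumed search order: the most widely agreed RDL first, the most local last.
federation = [pca_core, site_tags]

print(resolve("PUMP", federation))
print(resolve("P-101", federation))
print(resolve("GOTH-SLANG", federation))
```

The point of the sketch is that the person configuring the exchange chooses which RDLs go into `federation`. Every other RDL in the world is simply never consulted.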
Each of the RDLs in the list above will be developed and maintained to suit its own constituency. There may even be some Darwinian evolution, since the federated nature of the overall iRING RDL sets up a sort of natural selection of terms. Better, more reliable terminology gets picked up by others and migrates up, eventually to the PCA RDL. Less reliable terminology doesn’t and may even go extinct.
For instance, the PCA’s core RDL will always have only a small group of people that can enter new terms and may never be “on demand”. Individual standards organizations and manufacturers, with smaller constituencies, may be able to adapt faster so new terminology may emerge in their RDLs first. As the industry gets comfortable with a new term it will migrate upwards, whereupon it might be removed from its original RDL.
An RDL does not have to be exposed to the world to be an iRING RDL. Obviously, the top level RDLs from the PCA and standards organizations will have to be public and free to use in order to be useful. But other RDLs, such as the corporate RDL in the list above, will probably only be of interest to that company internally. The fifth one, with the site-specific tag numbers, may have a public façade that external contractors and suppliers might be given read-only access to for a limited time.
One important issue is immutability. If you wish an information exchange to be reliable over time, one requirement is that each term used in the exchange must be immutable; that is, it must always exist and must always mean the same thing. (Imagine what would happen if the Oxford English Dictionary changed the meaning of all the words every year, or arbitrarily left some out from one edition to the next?)
Against the notion that every term must be immutable is the reality that some terminology genuinely has meaning for only a finite period of time. Other terms have one meaning over here and another over there. (For instance, how many “P-101s” do you think there are in the world?)
The immutability of a term will emerge naturally due to the federated nature of the RDL. The lower level private sandboxes can each have a different meaning for a given term, and can change the definition of the term to suit the needs of their individual constituencies. As a consensus of the term’s meaning emerges, it can migrate up the certification scale.
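The difference between a changeable sandbox and an immutable higher-level RDL can be sketched in a few lines of Python. Everything here is hypothetical: `TieredRDL`, `promote`, `ImmutableTermError`, and the sample term are invented for illustration, and a real RDL would have a certification process rather than a single function call. The sketch only shows the rule described above: a sandbox may redefine a term freely, while a term that has migrated up can no longer change its meaning.

```python
class ImmutableTermError(Exception):
    """Raised when someone tries to change a term in an immutable RDL."""


class TieredRDL:
    """Hypothetical RDL that is either a mutable sandbox or immutable."""

    def __init__(self, name, immutable):
        self.name = name
        self.immutable = immutable
        self._terms = {}

    def define(self, term, meaning):
        """Add a term; refuse to change an existing one if immutable."""
        existing = self._terms.get(term)
        if self.immutable and existing is not None and existing != meaning:
            raise ImmutableTermError(
                f"{term!r} is already fixed in {self.name}"
            )
        self._terms[term] = meaning

    def lookup(self, term):
        """Return the current meaning of the term, or None if undefined."""
        return self._terms.get(term)


def promote(term, source, target):
    """Copy a term's current meaning from one RDL up to another."""
    meaning = source.lookup(term)
    if meaning is None:
        raise KeyError(term)
    target.define(term, meaning)


# A private sandbox can revise a definition as its users learn.
sandbox = TieredRDL("manufacturer sandbox", immutable=False)
sandbox.define("WIDGET", "first attempt at a definition")
sandbox.define("WIDGET", "definition the industry settled on")

# Once consensus emerges, the term migrates up and its meaning is frozen.
core = TieredRDL("assumed top-level RDL", immutable=True)
promote("WIDGET", sandbox, core)

try:
    core.define("WIDGET", "a different meaning")
except ImmutableTermError as err:
    print("rejected:", err)
```

Note that the sandbox is still free to change or drop its own copy of the term after promotion; only the copy in the immutable RDL is frozen.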
For instance, in the list of examples above, terms in the PCA RDL will be removed only after careful deliberation, if at all. The RDL of equipment tag numbers might be synchronized to the physical assets of the plant as they change over time. The sixth one might be updated hundreds or thousands of times a day by reporting software and the whole thing might be deleted after the project is turned over and all the contractors paid.
So is Garbage Actually “Good”?
Whether or not garbage is good depends on one’s definition of the words “garbage” and “good”. We are not saying that garbage is “good” in the sense of “That piece of chocolate cake was good! May I have another?” We are saying that the old cliche “One man’s garbage is another man’s treasure” applies. We are saying that if we all tried to eliminate garbage, the cure would be worse than the disease. Who is going to make the decision of garbage/not-garbage? Looking at the examples above, it really depends on who is looking at a given RDL. Some of the examples in the list above would be garbage to all but a very small constituency. And in the last case, the RDL of project-specific metrics, it might be garbage the instant the last cheque is cashed, but be valuable a few years later to someone comparing current metrics with historical metrics.
When we write reports at work we don’t even think of all the dictionaries in the world, like The East Los Angeles Dictionary of Goth Slang; we just use the one or two (or three) that are relevant to us. Similarly, when people configure information exchanges, the choice of whether to use an existing RDL or create a new one will be obvious. If there is controversy over a few items, well, that’s why project managers are paid the big bucks.