Vladimir Patryshev

Generic Data Model

Introduction

 

Four years ago I had to design a database to hold data from various questionnaires. The questions varied slightly every time and had no regular structure, reflecting the weird ways of thinking of marketing people. The solution was to produce something more or less generic, storing “entities” and “attributes” separately.

 

Later, one lucky day, I met XML, now a ubiquitous format able to store virtually everything worth storing in computer-readable form. Well, yes, XML has DTDs. The DTD, I think, was invented by engineers, not by linguists. It is intrinsically anti-diachronic: it fixes a structure once and leaves no room for that structure to change over time. If there were DTDs in real life, we would have to throw away our dear old books, wouldn't we?

 

However, if we discard the DTD, XML is flexible enough to store evolving data without specifying its structure strictly in advance. Still, XML has certain limitations. An element can be a member of only one collection (the children of its parent). As a result, it is hard to distinguish between the key that uniquely identifies the element and the identifier of the element within the collection. The approach used by the Naming Service in CORBA looks simpler and more generic.

 

Tables

The model consists of three tables: ENTITY, ATTRIBUTE, CONTEXT.

ENTITY Table

This is the main table in the database:

 

create table ENTITY (PK long, CLASS varchar, CREATED timestamp);

 

Every entity stored in our database has a representative in the ENTITY table. An entity represents an object of our model. An object has an inalienable property, its class (the term ‘type’ would probably suit better). An entity cannot change its class; in fact, all the fields of an entity record are read-only.
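As an illustration only, here is a minimal sketch of the ENTITY table in use. It assumes SQLite, driven through Python's standard sqlite3 module, as a stand-in database (SQLite happens to accept the DDL above unchanged). The helper new_entity, the trigger ENTITY_READONLY, and the sample classes are hypothetical names invented for this sketch; a trigger is just one possible way to keep entity records read-only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The DDL from the text, unchanged.
conn.execute("create table ENTITY (PK long, CLASS varchar, CREATED timestamp)")
# Assumption: a trigger is one way to enforce "all fields are read-only".
conn.execute("""
    create trigger ENTITY_READONLY before update on ENTITY
    begin
        select raise(abort, 'ENTITY records are read-only');
    end
""")

def new_entity(conn, cls):
    """Insert a new entity of the given class and return its PK (hypothetical helper)."""
    (pk,) = conn.execute("select coalesce(max(PK), 0) + 1 from ENTITY").fetchone()
    conn.execute(
        "insert into ENTITY (PK, CLASS, CREATED) values (?, ?, current_timestamp)",
        (pk, cls),
    )
    return pk

questionnaire = new_entity(conn, "questionnaire")
question = new_entity(conn, "question")

# Trying to change the class of an existing entity is rejected by the trigger.
try:
    conn.execute("update ENTITY set CLASS = 'answer' where PK = ?", (question,))
    class_changed = True
except sqlite3.DatabaseError:
    class_changed = False
```

The class is supplied once, at insertion time, and never touched again; everything else about an entity lives in the other two tables.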

ATTRIBUTE Table

This table contains the attributes of the entities. Each record is a name-value pair that references a row of the ENTITY table:

 

create table ATTRIBUTE (ENTITY long, NAME varchar, VALUE varchar);

 

The usage of this table is probably obvious. Note that a null NAME means that the attribute contains the value of the entity itself. The other two fields, ENTITY and VALUE, are never null.
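A small sketch may make the null-NAME convention concrete. It again assumes SQLite via Python's sqlite3 module; the helper attributes_of and the sample question are made up for illustration and are not part of the model itself.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table ENTITY (PK long, CLASS varchar, CREATED timestamp)")
conn.execute("create table ATTRIBUTE (ENTITY long, NAME varchar, VALUE varchar)")

# Made-up sample data: one entity of class 'question' with three attributes.
conn.execute("insert into ENTITY values (1, 'question', current_timestamp)")
conn.executemany(
    "insert into ATTRIBUTE (ENTITY, NAME, VALUE) values (?, ?, ?)",
    [
        (1, None, "How often do you buy coffee?"),  # null NAME: the entity's own value
        (1, "language", "en"),
        (1, "required", "true"),
    ],
)

def attributes_of(conn, entity):
    """Return {name: value} for one entity; the key None holds the entity's own value."""
    rows = conn.execute(
        "select NAME, VALUE from ATTRIBUTE where ENTITY = ?", (entity,)
    )
    return dict(rows.fetchall())

attrs = attributes_of(conn, 1)
own_value = attrs[None]       # the value of the entity itself
language = attrs["language"]  # an ordinary named attribute
```

Adding a new kind of attribute to a questionnaire then means inserting rows, not altering the schema.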

 

CONTEXT Table

On the one hand, this table can be considered as containing all the collections in the database; on the other hand, its purpose is wider. It contains all the naming contexts, and it can be used for navigation as well as for storing master-detail relationships.

 

create table CONTEXT (NAME varchar, OWNER long, MEMBER long, MEMBERID varchar);

 

Here NAME is the name of the context. If you are interested in collections, it is probably the name of a collection owned by OWNER. OWNER is the entity that owns the context, the “master”. MEMBER points to a member of the collection; in this case MEMBERID is irrelevant. But we can also treat the context as a hash, in which case MEMBERID becomes the key and MEMBER the corresponding value. In the Naming Service paradigm, MEMBERID is the name of the entity MEMBER in the context NAME.
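The two readings of CONTEXT, as a plain collection and as a hash (or naming context), can be sketched as follows. This assumes SQLite via Python's sqlite3 module; the helpers members and lookup, the context names, and the member ids are hypothetical and exist only for this example. Entities are referred to by bare PK values to keep the sketch short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table CONTEXT (NAME varchar, OWNER long, MEMBER long, MEMBERID varchar)"
)

# Made-up sample data: entity 1 is a questionnaire, entities 2..4 are questions.
conn.executemany(
    "insert into CONTEXT (NAME, OWNER, MEMBER, MEMBERID) values (?, ?, ?, ?)",
    [
        ("questions", 1, 2, None),   # plain collection: MEMBERID is not used
        ("questions", 1, 3, None),
        ("by_code", 1, 2, "Q-AGE"),  # hash / naming context: MEMBERID is the key
        ("by_code", 1, 4, "Q-CITY"),
    ],
)

def members(conn, owner, name):
    """Collection view: all members of the context `name` owned by `owner`."""
    rows = conn.execute(
        "select MEMBER from CONTEXT where OWNER = ? and NAME = ? order by MEMBER",
        (owner, name),
    )
    return [member for (member,) in rows]

def lookup(conn, owner, name, member_id):
    """Hash view: resolve `member_id` inside the context, or None if it is unbound."""
    row = conn.execute(
        "select MEMBER from CONTEXT where OWNER = ? and NAME = ? and MEMBERID = ?",
        (owner, name, member_id),
    ).fetchone()
    return row[0] if row else None

collection = members(conn, 1, "questions")
city_question = lookup(conn, 1, "by_code", "Q-CITY")
```

Note that entity 2 belongs to both contexts at once, which is exactly what a single XML tree cannot express.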

 

Import From XML

The XML data model is a submodel of this generic model, so for any XML document there is a monomorphism importing it into the Generic Database. I am currently implementing a Java class that will perform this operation. The details seem obvious: nodes (elements) become entities, with the element type becoming the entity class; attributes go to the ATTRIBUTE table, including the element value, which becomes an anonymous attribute (one with a null NAME). The document structure is reflected in the CONTEXT table; NAME and MEMBERID are irrelevant here (but can be generated by some heuristic).
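The import rules above can be sketched in a few lines. This is not the Java class mentioned in the text; it is an illustrative Python version that assumes SQLite (via sqlite3) as the database and xml.etree.ElementTree as the parser. The function import_xml and the sample document are hypothetical. Two simplifications are assumed: PKs are allocated in document order, and text that follows a child element (mixed content) is ignored.

```python
import sqlite3
import xml.etree.ElementTree as ET
from itertools import count

DDL = [
    "create table ENTITY (PK long, CLASS varchar, CREATED timestamp)",
    "create table ATTRIBUTE (ENTITY long, NAME varchar, VALUE varchar)",
    "create table CONTEXT (NAME varchar, OWNER long, MEMBER long, MEMBERID varchar)",
]

def import_xml(conn, xml_text):
    """Store an XML document in the three tables; return the PK of the root entity."""
    (start,) = conn.execute("select coalesce(max(PK), 0) + 1 from ENTITY").fetchone()
    next_pk = count(start)  # assumption: PKs follow document order

    def store(element, owner):
        pk = next(next_pk)
        # The element type becomes the entity class.
        conn.execute(
            "insert into ENTITY values (?, ?, current_timestamp)", (pk, element.tag)
        )
        for name, value in element.attrib.items():
            conn.execute("insert into ATTRIBUTE values (?, ?, ?)", (pk, name, value))
        text = (element.text or "").strip()
        if text:
            # The element value becomes an anonymous attribute (null NAME).
            conn.execute("insert into ATTRIBUTE values (?, null, ?)", (pk, text))
        if owner is not None:
            # Document structure: NAME and MEMBERID are left null here.
            conn.execute("insert into CONTEXT values (null, ?, ?, null)", (owner, pk))
        for child in element:
            store(child, pk)
        return pk

    return store(ET.fromstring(xml_text), None)

conn = sqlite3.connect(":memory:")
for statement in DDL:
    conn.execute(statement)

root = import_xml(
    conn,
    '<survey year="2001"><question id="q1">Age?</question><question id="q2"/></survey>',
)
```

After the call, ENTITY holds one row per element, ATTRIBUTE holds the XML attributes plus the text of the first question, and CONTEXT holds one row per parent-child link.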

 

Export To XML

 

In general, it is impossible. There is no obvious way to reproduce multiple contexts, context names, and member ids in XML. But data imported from XML can be exported back to the same XML (up to isomorphism).
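For the special case of data that came from XML, the export can be sketched as a recursive walk. The sketch assumes SQLite via Python's sqlite3 module and xml.etree.ElementTree for serialization; export_xml and the hand-inserted sample rows are hypothetical. Since CONTEXT has no ordering column, the sketch also assumes that children can be ordered by PK, which holds only if entities were created in document order.

```python
import sqlite3
import xml.etree.ElementTree as ET

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table ENTITY (PK long, CLASS varchar, CREATED timestamp);
    create table ATTRIBUTE (ENTITY long, NAME varchar, VALUE varchar);
    create table CONTEXT (NAME varchar, OWNER long, MEMBER long, MEMBERID varchar);

    -- Made-up rows, shaped as if they had been imported from a small XML document.
    insert into ENTITY values (1, 'survey', current_timestamp);
    insert into ENTITY values (2, 'question', current_timestamp);
    insert into ATTRIBUTE values (1, 'year', '2001');
    insert into ATTRIBUTE values (2, 'id', 'q1');
    insert into ATTRIBUTE values (2, null, 'Age?');
    insert into CONTEXT values (null, 1, 2, null);
""")

def export_xml(conn, pk):
    """Rebuild the XML element for entity `pk`, including all of its members."""
    (cls,) = conn.execute("select CLASS from ENTITY where PK = ?", (pk,)).fetchone()
    element = ET.Element(cls)
    attributes = conn.execute(
        "select NAME, VALUE from ATTRIBUTE where ENTITY = ?", (pk,)
    ).fetchall()
    for name, value in attributes:
        if name is None:
            element.text = value      # anonymous attribute: the element value
        else:
            element.set(name, value)  # named attribute: an XML attribute
    children = conn.execute(
        "select MEMBER from CONTEXT where OWNER = ? order by MEMBER", (pk,)
    ).fetchall()
    for (child,) in children:
        element.append(export_xml(conn, child))
    return element

xml_text = ET.tostring(export_xml(conn, 1), encoding="unicode")
```

The walk treats every CONTEXT row of an owner as a child link, so an entity that appears in several contexts would be emitted several times; this is the general-case obstacle described above.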

Conclusion

What you are reading is a draft containing only general ideas. I only implore you not to worry about efficiency: physical implementation can be a very different issue. First, all the data that need fast access can be stored in additional tables. Second, these three tables can be views, with the implementation reflecting the actual data structures (this idea was proposed by V.S.).
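One possible reading of the second point is sketched below: the data live in a conventional table, and ENTITY and ATTRIBUTE are views over it. This is only an interpretation of the idea, assuming SQLite via Python's sqlite3 module; the physical table QUESTION and its columns WORDING and LANG are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical physical table with a conventional, fixed structure.
    create table QUESTION (
        ID integer primary key,
        WORDING varchar,
        LANG varchar,
        CREATED timestamp default current_timestamp
    );
    insert into QUESTION (ID, WORDING, LANG) values (1, 'Age?', 'en');
    insert into QUESTION (ID, WORDING, LANG) values (2, 'City?', 'en');

    -- The generic tables, exposed as views over the physical table.
    create view ENTITY as
        select ID as PK, 'question' as CLASS, CREATED from QUESTION;
    create view ATTRIBUTE as
        select ID as ENTITY, null as NAME, WORDING as VALUE from QUESTION
        union all
        select ID as ENTITY, 'language' as NAME, LANG as VALUE from QUESTION;
""")

# Generic code keeps querying ENTITY and ATTRIBUTE as before.
classes = conn.execute("select PK, CLASS from ENTITY order by PK").fetchall()
attrs = dict(
    conn.execute("select NAME, VALUE from ATTRIBUTE where ENTITY = 2").fetchall()
)
```

Code written against the generic model does not need to know that the rows come from QUESTION, while queries that need speed can go to the physical table directly.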

 

 

Monday, April 09, 2001