Advanced Run Time Type Identification in C++, Part I: Requirements

Environment: C++

May 3, 2003

Abstract

Run Time Type Identification (RTTI) provides information about an object at run time, such as the name of its type. The C++ language has built-in RTTI support that fulfills the minimal requirements, but it is not enough for many applications of RTTI, such as object persistency. Other languages (Java and C#) have a richer RTTI system that makes it possible to declare properties for accessing objects, and these languages implement persistency themselves, but they have other disadvantages. C++ programs may also need persistency and the advanced features of such RTTI systems. The C++ language is powerful enough to implement properties and an advanced RTTI system for persistency as a library. The first part of this article summarizes the applications and requirements of an advanced RTTI system, the second part will show how to implement it, and the third part will describe how to use such an RTTI system for persistency.

RTTI Supported by C++

Standard C++ provides the typeid operator for getting type information. Its argument is either an expression or a type name. When the expression refers to an object of a polymorphic class (through a reference or a dereferenced pointer), the result describes the dynamic type of the object. The operator returns a constant reference to a type_info object containing some information about the type.

The type_info class has only a few member functions:

const char *name() for getting a string representation of the type (for example, "int" or "MyClass"; the exact string is implementation-defined)
bool before( const type_info & ) for ordering
operator==() and operator!=() for comparing type_info objects
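
As a minimal sketch of these member functions in use, the following example compares and names types at run time. The Shape and Circle classes and the two helper functions are hypothetical and exist only for illustration.

```cpp
#include <cassert>
#include <string>
#include <typeinfo>

// Hypothetical polymorphic hierarchy (not from the article).
struct Shape { virtual ~Shape() {} };
struct Circle : Shape {};

// Returns true when the dynamic type of s is exactly Circle.
bool IsExactlyCircle(const Shape& s) {
    return typeid(s) == typeid(Circle);
}

// Returns the implementation-defined name of the dynamic type of s.
std::string DynamicTypeName(const Shape& s) {
    return typeid(s).name();
}
```

Because Shape has a virtual function, typeid(s) reports the type of the object actually referred to, not the static type of the reference. The string returned by name() differs between compilers, so it should not be written to files that other builds have to read.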

This information does not help to find the relations of objects and does not solve most of the problems that arise in applications. It is not designed for that. All applications have different requirements and the same standard type description cannot fulfill all the requirements.

However, the type_info class can be used as a key in a map storing more detailed type information [Stroustrup 15.4.4]. This way, every application can define and use its own RTTI system, which is a very flexible solution. The problem is that someone has to define the structure of the RTTI record and fill the map with records for every type. This job is not trivial and some code has to be written for every application for doing that.
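
The map-based approach can be sketched as follows. This is only an illustration under stated assumptions: TypeRecord, Registry(), RegisterType(), and FindRecord() are hypothetical names, and the sketch uses std::type_index, a wrapper added in C++11 that makes type_info usable as a map key. With an older compiler, a small wrapper class built on type_info::before() would play the same role.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <typeindex>
#include <typeinfo>

// Hypothetical application-defined RTTI record.
struct TypeRecord {
    std::string displayName;
    bool isCompound;
};

// The registry maps the standard type_info (wrapped in std::type_index)
// to the application's own, more detailed record.
std::map<std::type_index, TypeRecord>& Registry() {
    static std::map<std::type_index, TypeRecord> registry;
    return registry;
}

// Somebody has to fill the map: one call per type.
template <typename T>
void RegisterType(const std::string& name, bool isCompound) {
    Registry()[std::type_index(typeid(T))] = TypeRecord{name, isCompound};
}

// Returns the record for the type of obj, or nullptr if it was not registered.
template <typename T>
const TypeRecord* FindRecord(const T& obj) {
    auto it = Registry().find(std::type_index(typeid(obj)));
    return it == Registry().end() ? nullptr : &it->second;
}
```

The sketch also shows the drawback mentioned above: nothing is known about a type until some code has called RegisterType() for it.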

Are the applications of RTTI so different? What are the requirements and what are common problems in most applications? How can we implement a useful RTTI system? How and for what can we use it? These questions will be discussed in three parts.

The first part of the article discusses two typical applications of RTTI. It collects and classifies the requirements of a general purpose RTTI system.

The second part describes how to implement an RTTI system that fulfills these requirements. The C++ language is probably the most powerful programming language. The implementation of such an RTTI system is not possible in most other programming languages. This fact demonstrates the power of C++ very well. The implementation uses a lot of advanced programming tricks and design patterns; therefore, it might be interesting even if you are not very much interested in RTTI systems.

The third part of the article adds some further ideas on persistency. It describes a Stream Library that uses the presented RTTI system. The modularity and flexibility of this solution lead to many advantages, which are discussed at the end of the article.

Applications of RTTI

There are many possible applications of RTTI, but two of them, persistency and application generators, are widely known. They are probably the most demanding applications, and they are expected to use most of the services of the RTTI system.

Persistency

The basic task of persistency is simple. The application or some important data of the application is represented as a set of objects with some references to each other. Persistency means that these objects can be saved to permanent storage—a file—and later the program will be able to restore the original state from that file. More precisely, an application or data is persistent if its lifetime is longer than the running time of the application.

When the application reads the file, the type of each object has to be read from the file, and a new object has to be created and initialized with the data read from the file. This is where we need RTTI. The basic process of saving and loading objects seems fairly simple, but once details and robustness are considered, it becomes quite complicated.

The values of every data type should be transformed between its internal representation (binary) and file representation (text or binary). Who is responsible for this conversion? Generally, the stream object has functions or operators for writing and reading every type to and from the stream. These functions or operators have an overloaded version for every type the application may want to save and load. Well, this is some kind of RTTI implemented with overloaded functions, but it does not solve the problem of creating new objects with the required type. We may assume that the object already exists, and the stream object only has to fill the variables.

When all objects are created and all data are loaded, only half of the job is done. Some objects may have variables that are not worth saving because they can be computed from other variables. These variables have to be updated somehow. The references of objects to each other also have to be updated because the objects are loaded to different addresses.

Finally, let’s consider error handling. What happens if the stream is wrong? For example, what if one value is missing, or the order of values is different? This is a common problem because different versions of an application may have different data structures. It may also be required that the file formats of different versions be compatible: a newer version should understand the files saved by previous versions. Moreover, it would be nice if an older version could load a data file created by a newer version, make some changes, and save it again without losing information. The older version cannot process and understand the new types and features, but it should be able to store them somewhere and save them again without knowing anything about their meaning.

There are many solutions and libraries available on the market for persistency, but all of them provide only the basic requirements. They are able to save and load objects, but they do not tolerate any mistakes in the stream. There are two common solutions:

Objects have virtual methods for saving and loading the object. The argument is a reference to a stream. The stream has overloaded operators or functions for writing and reading the value of base types. It is the responsibility of the programmer how the variables are saved and loaded. For example, the Microsoft Foundation Class library provides persistency this way.
Some libraries save and load the object’s memory to a binary file. These solutions require very little programming effort, but special tricks are used to validate memory addresses and virtual method tables. It is not possible to read or edit the data files. Any damage to data files leads to serious problems. Another drawback is that all variables are saved. There is no way to make a distinction between persistent and temporary data.
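
The first of these two solutions can be sketched as follows. This is a generic illustration, not the interface of any particular library: Persistent and Point are hypothetical names, and the standard iostreams stand in for the library's stream class.

```cpp
#include <cassert>
#include <istream>
#include <ostream>
#include <sstream>

// Hypothetical base class: every persistent class overrides Save() and Load().
struct Persistent {
    virtual ~Persistent() {}
    virtual void Save(std::ostream& out) const = 0;
    virtual void Load(std::istream& in) = 0;
};

struct Point : Persistent {
    int x = 0;
    int y = 0;

    // The programmer decides which variables are written, and in which order.
    void Save(std::ostream& out) const override { out << x << ' ' << y << ' '; }

    // Load() has to read the values in exactly the order Save() wrote them.
    void Load(std::istream& in) override { in >> x >> y; }
};
```

The sketch makes the weakness visible: the class drives the reading sequence, so a missing value or a changed order in the stream silently shifts every value that follows.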

The authors have not seen any persistency solution for C++ that provides readable data streams, robustness, and tolerance of changes in the structure of the stream. There are some implementations of properties and persistency in other systems and languages that are worth mentioning here:

CORBA
COM/OLE
Java
C#
Delphi/C++ Builder

A detailed comparison and analysis of these systems is beyond the scope of this article. All of them have advantages and disadvantages, and as far as we know, none of them provides the required flexibility.

Applications of Persistency

Saving and loading an application’s data

Saving documents in a text editor is a typical example.

Distributed applications

When different parts of an application run on different computers or nodes, data objects have to be sent through the network. Before an object is sent, it has to be packed into a message, and when the message has been delivered, the object has to be constructed again on the target node.

Saving and loading the current state of the application

The end user wants to continue working where they left off. Therefore, the program has to save all relevant information when the user exits, and it has to start in the same state when the user starts it the next time.

Nowadays, this problem is generally solved manually by writing the important variables to INI files or to the Registry. Using persistent objects is a more elegant and easier solution.

Configuring applications

The real functionality of the program can be defined in configuration files. When the program is started, persistent objects are loaded from the configuration file. These objects determine how the program looks and what it does. Different configuration files may lead to different applications. This topic leads to the next chapter, where a special program, the Application Generator, is used to create such a configuration file.

Application Generators

What is an Application Generator? There are so many application generator programs that it is not easy to define what it exactly means. Let’s highlight some features that are important for us.

Application Generators are program development tools that make the program development process quick and easy. They provide a set of components and a nice graphical user interface where someone can build an application by adding and configuring components.

The main advantage of such systems is that the developer does not need deep programming knowledge and the application can be built quickly. On the other hand, it has two disadvantages: lower performance and limited capabilities.

Lower performance means that the same application developed in C++ could run much faster because the internal communication between components is not efficient enough, or the developer cannot use a more efficient structure due to the limitation of the system. Nowadays, this is not a big handicap. The speed of computers is growing, and the most expensive resource is the time of the software developers, but in some applications (embedded systems, data acquisition, and control systems), speed is still important.

The limited capabilities are the most important disadvantage. All Application Generators are designed for a specific field such as database applications, data acquisition, or graphical user interfaces. If the application needs a component that is not available, the developer is in big trouble. Sometimes the problem can be worked around, or the system may provide some support for writing application-specific components, but these solutions destroy the original advantages. The worst is that the development of a large application is not predictable. At the beginning, it seems that an Application Generator will fulfill all requirements, but when the application is almost ready, the developers may realize that one important requirement cannot be implemented, just because a component is missing or does not behave as expected.

Application Generator systems are very popular despite the above-mentioned disadvantages. One of the reasons is that they provide the most reliable way of program development, the component based development process. This is probably the most important advantage. The application developer can work on a higher abstraction level and he is not required to go into the details of component implementation. The components are developed by experts and are tested in several applications.

The best Application Generator would combine the advantages of both ways of software development. Let’s imagine a system where the components are objects written in C++ and the Application Generator is a program that represents the objects graphically, adds new objects, and sets their properties. Some properties provide connections between objects, while others describe and determine how the object looks and behaves. The users of the Application Generator can work only with the abstraction level represented by the components, while a small team of C++ developers can make new components as required.

A C++ RTTI system would make it possible to build such an Application Generator. The run time system has a set of objects having detailed RTTI information, and the application is built using these components. Another program, the Application Generator, has access to the same set of components. It can investigate the object hierarchy of the run time system, add new objects, and change their properties. A nice graphical user interface makes it easy to use while the full control over the components makes it very efficient.

There are two possible ways of communication between the Application Generator and the Run Time System:

The Application Generator may have the same set of components and it can build an object hierarchy alone. When the application is ready, it is saved to a file and the Run Time System will load it. This solution requires recompiling the Application Generator when new components are added.
The Run Time System may provide an interface for Application Generator programs for accessing the internal object hierarchy. This makes it possible to develop commercial Application Generator programs while the components stay proprietary. This interface is quite complicated and has to have a mechanism for freezing the application while the Application Generator changes the structure of the program.

Both solutions have almost the same requirements for the RTTI system. Application Generators use persistency for saving and loading the components; therefore, they place all the above-mentioned requirements on the RTTI system. Beyond that, Application Generators need further features:

More and redundant properties may exist. The Application Generator may want to use redundant properties to provide more possibilities for the user, while only the minimal number of properties should be saved into a data stream. For example, a rectangle may have six different properties: the coordinates of the corners, and the height and width of the rectangle. These properties provide 10 numbers, but they are not independent. Four numbers are enough to describe the rectangle and all other properties can be computed.
The Application Generator has to provide a component selection tool for the user. Therefore, it needs to investigate the list of available components, get some information about the available types (type name, inheritance), and display the list of applicable types. There are special components, called containers, for storing other components. (A dialog box window is a trivial example. It stores other control objects such as buttons, input fields, and labels.) The Application Generator has to be able to handle the inheritance of components, and it has to offer only those components whose type is compatible with the container.
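
The rectangle example from the first item can be sketched in a few lines. The Rect class below is hypothetical: it stores four independent numbers, which are the only values that need to be saved, and exposes the width and height as redundant properties computed from them.

```cpp
#include <cassert>

// Hypothetical rectangle with stored and redundant (computed) properties.
struct Rect {
    // Stored state: four independent numbers, enough to describe the rectangle.
    int left = 0;
    int top = 0;
    int right = 0;
    int bottom = 0;

    // Redundant properties, computed from the stored state.
    int Width() const { return right - left; }
    int Height() const { return bottom - top; }

    // Setting a redundant property updates the stored state.
    // The top-left corner stays fixed.
    void SetWidth(int w) { right = left + w; }
    void SetHeight(int h) { bottom = top + h; }
};
```

An Application Generator could show all of these to the user, while a stream would contain only left, top, right, and bottom.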

The requirements of these applications are similar. It is probably clear what is expected, but the next chapter collects and discusses the requirements in detail.

Requirements

The previous chapter gave a short overview of Persistency and Application Generators from the RTTI system’s point of view. Some relations and requirements for RTTI systems were also discussed, as both Persistency and Application Generator programs need run-time type identification. Now these requirements will be described in detail.

In short, the requirements are the following:

The RTTI system has to provide enough information for Persistency and Application Generator programs.
It has to be easy to add the RTTI information to C++ types and classes.
The RTTI system has to be able to describe all features of the C++ language, including multiple inheritance, polymorphism, abstract classes, and template classes.
The RTTI system has to provide some support for using the standard libraries. The most difficult question is how to handle the standard containers.

The compiler could implement an RTTI system, and this would probably be the simplest solution for the users, but it would make the C++ standard more complicated and would probably lead to arguments about which features are necessary. Different applications may need slightly different RTTI systems, and these differences cannot be covered by a standard RTTI system. The C++ language makes it possible to write an RTTI system as a library. This solution requires some additional work and knowledge, but it is more flexible.

Types

The RTTI system describes both user-defined types and the built-in types of the language, including the types defined in libraries and used in the applications, and types defined by the application itself. These types fall into three groups:

Base Types
Compound Types
Container Types

Base Types

Base Types are not the same as the built-in types of the language. All built-in types are Base Types, but there are many other types handled as Base Types. Any type, structure, or class may be described and handled as a Base Type. For example, strings can be handled as an array of characters, but the essence of the string is better represented if it is described as a Base Type.

The main point is that Base Types are the atomic components of the RTTI system. Base Types cannot be described as a composition of other types, in contrast to compound and container types, which integrate several members or elements to make new types and new objects.

Compound Types

C++ structures and classes are Compound Types. The most important difference between Base Types and Compound Types is that the RTTI system has to describe the members of the Compound Types. Instead of writing new data conversion functions for classes, a description of the members is given.

A simple structure for describing colors is a good example. The Color structure has three integer members for the RGB components. The RTTI description of the Color structure states that it is a Compound Type and that it has three integer properties called R, G, and B. This way, it is much simpler to make the type descriptor of a Compound Type than that of a Base Type.
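
A description of the Color structure might look like the following sketch. It is only one possible representation, chosen for brevity: IntProperty, kColorProperties, and FindProperty() are hypothetical names, and a pointer to member is used to locate each variable.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

struct Color {
    int R;
    int G;
    int B;
};

// Hypothetical descriptor of one integer property of Color.
struct IntProperty {
    const char* name;
    int Color::*member;  // pointer to member: locates the variable in any Color
};

// The property table of Color: a description of the members,
// not hand-written conversion code.
const IntProperty kColorProperties[] = {
    {"R", &Color::R},
    {"G", &Color::G},
    {"B", &Color::B},
};
const std::size_t kColorPropertyCount =
    sizeof(kColorProperties) / sizeof(kColorProperties[0]);

// Looks a property up by name; returns nullptr if Color has no such property.
int* FindProperty(Color& c, const std::string& name) {
    for (std::size_t i = 0; i < kColorPropertyCount; ++i) {
        if (name == kColorProperties[i].name) {
            return &(c.*kColorProperties[i].member);
        }
    }
    return nullptr;
}
```

Code that knows nothing about Color at compile time can now read and write its members by name, which is exactly what a stream or an editor needs.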

Container Types

Container Types require special attention. Containers are special types storing several objects. STL containers (vector, list, set, map) and arrays are good examples of Container Types.

In addition, the RTTI system has to be able to:

Insert elements into the container
Delete elements of the container
Iterate through the elements of the container

The definition of a type descriptor for a given container should be as simple as the type descriptor of Compound Types.
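
The three container operations can be hidden behind one small interface, as in the sketch below. The names IntContainerAccess and SequenceAccess are hypothetical, and the element type is fixed to int to keep the example short; a real descriptor would have to work for any element type.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <iterator>
#include <list>
#include <vector>

// Hypothetical type-erased view of a container of int elements.
struct IntContainerAccess {
    virtual ~IntContainerAccess() {}
    virtual void Insert(int value) = 0;         // append an element
    virtual bool Erase(std::size_t index) = 0;  // delete the element at index
    virtual void ForEach(const std::function<void(int)>& visit) const = 0;
};

// One adapter template covers the standard sequence containers of int.
template <typename Container>
class SequenceAccess : public IntContainerAccess {
public:
    explicit SequenceAccess(Container& c) : c_(c) {}

    void Insert(int value) override { c_.insert(c_.end(), value); }

    bool Erase(std::size_t index) override {
        if (index >= c_.size()) return false;
        typename Container::iterator it = c_.begin();
        std::advance(it, static_cast<typename Container::difference_type>(index));
        c_.erase(it);
        return true;
    }

    void ForEach(const std::function<void(int)>& visit) const override {
        for (typename Container::const_iterator it = c_.begin(); it != c_.end(); ++it) {
            visit(*it);
        }
    }

private:
    Container& c_;  // the adapter does not own the container
};
```

With such an adapter, describing a new container comes down to naming its type, for example SequenceAccess<std::list<int>>, which is about as simple as listing the members of a Compound Type.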

Interface

All the above-mentioned categories are different, but they must be described in a consistent framework. Therefore, the RTTI system has to provide a consistent interface for accessing the type information and the members of a given object hierarchy. The interface must be independent of the actual type and structure of the objects and must be able to describe all features and possibilities of the C++ language. On top of that, it should be easy to use.

The RTTI system has to have an interface for getting the description of types, something similar to the typeid operator. For example, a function called GetTypeInfo() returns the address of the type descriptor of the given object. All types, including base, compound, and container types, have one and only one type descriptor record, which has an interface for accessing all information stored by the RTTI system.

Even if this type description record exists for all types, the compound and container types have to store some additional information about their members and they have to provide another interface for iterating through the tree of their members. The well-known iterators can be used here as well. Property Iterators can be created for traversing the object hierarchy, while the actual implementations of the iterators are hidden.

The interface of the RTTI system ensures that the Application Generator or the Persistent Streaming Library does not need to know anything about the C++ classes. They can get all necessary information through the RTTI interface and navigate through the object hierarchy by using Property Iterators.

The RTTI system consists of three parts:

RTTI description of Base Types
Property (member) Description of compound and container types
Property iterators

The following chapters describe these parts.

RTTI Description of Base Types

All types including base, compound, and container types must have a basic type description. This is a static object called Type Info record.

Every type of the application has a Type Info record. The Type Info records are instances of the Type Info classes. Every type has a Type Info class and that class has one and only one instance. The Type Info classes are written manually for Base Types and created automatically for Compound Types. The Type Info record provides the following:

Type Name. The human readable name of the type.
Type Identifier. A unique, binary identifier of the type.
Size. Function for returning the size of the object (in bytes).
Creating objects. Functions for creating an object or an array of objects.
Destroying objects. Functions for destroying an object or an array of objects.
Accessing the object’s value. Functions for accessing the value of the object (e.g. GetVal(), SetVal() )
Functions for getting information of the type:

Is it compound type?
Is it container type?
Is it abstract type?

Functions for iterating through all Type Info records.
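
The list above can be turned into a small class sketch. It is not the implementation presented in the second part of the article: TypeInfo, BaseTypeInfo, and all member names are hypothetical, the binary Type Identifier and the value access functions are left out, and only default-constructible types are handled.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical Type Info record.
class TypeInfo {
public:
    virtual ~TypeInfo() {}
    virtual const char* Name() const = 0;        // human-readable type name
    virtual std::size_t Size() const = 0;        // size of an object in bytes
    virtual void* Create() const = 0;            // creates a new object
    virtual void Destroy(void* obj) const = 0;   // destroys an object made by Create()
    virtual bool IsCompound() const { return false; }
    virtual bool IsContainer() const { return false; }
    virtual bool IsAbstract() const { return false; }

    // The list of all Type Info records, for iterating through the types.
    static std::vector<const TypeInfo*>& All() {
        static std::vector<const TypeInfo*> all;
        return all;
    }

    // Finds a record by type name; returns nullptr if the name is unknown.
    static const TypeInfo* Find(const std::string& name) {
        const std::vector<const TypeInfo*>& all = All();
        for (std::size_t i = 0; i < all.size(); ++i) {
            if (name == all[i]->Name()) return all[i];
        }
        return nullptr;
    }

protected:
    TypeInfo() { All().push_back(this); }  // every record registers itself
};

// Generic record for default-constructible Base Types.
template <typename T>
class BaseTypeInfo : public TypeInfo {
public:
    explicit BaseTypeInfo(const char* name) : name_(name) {}
    const char* Name() const override { return name_; }
    std::size_t Size() const override { return sizeof(T); }
    void* Create() const override { return new T(); }
    void Destroy(void* obj) const override { delete static_cast<T*>(obj); }

private:
    const char* name_;
};

// One and only one record per type.
const BaseTypeInfo<int> kIntInfo("int");
const BaseTypeInfo<double> kDoubleInfo("double");
```

A stream that reads the type name "int" from a file can call TypeInfo::Find("int")->Create() and obtain a new object without any compile-time knowledge of the type.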

The Type Info record makes it possible to use and investigate any given type. It is the base of the RTTI system and it must exist for all types the application wants to access at run time.

Description of Compound and Container Types

Compound and container types require additional information beyond the Type Info record. They are not simple types, where the GetVal() and SetVal() functions can handle a single value. They contain a list of other objects, and the RTTI system has to provide a description of these objects. Compound Types contain members, whereas Container Types contain elements. The types of the members and elements may differ from each other, for example when the elements of a container are pointers to polymorphic objects. These members and elements are called properties.

Note: Not every member variable of a class is a property, and not only member variables but also member functions can be defined as properties.

The number of properties is fixed for Compound Types, but for most Container Types it depends on the number of elements actually stored in the container. Therefore, the RTTI system cannot rely on a fixed number of properties; it has to iterate through them.

The description of the members can be implemented with an array of Property Descriptor records called the Property Descriptor Table. Property Descriptors are simple data records describing a single member of a Compound Type. Members can be anything: a base type, a compound type, or a container.

The Property Descriptor has several sub-types, depending on the type of the property it belongs to, but it provides the following common services:

Access to the Type Info record.
Store a human-readable, unique name of the property.
Provide a binary identifier of the property.
Store some flags for describing the property:

Is it referring to an object? (Is it a pointer or a reference?)
Is the property the owner of the referenced object? (If the property is a pointer and the pointer owns the object, the object has to be deleted when it is replaced with another object.)
Is it possible to extend the number of elements of the property? (If the property is a container, is it possible to insert new elements?)
Public, protected, or private
Readable or writeable

Provide some mechanism for getting the address of the memory storing the member or the element. This may be the offset of the member variable or the address of a gate function.
Creating an iterator to the property (see later).
Functions for getting and setting the value of the property.
Functions for adding new objects to extendable properties.
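
A few of these services are illustrated by the sketch below, which describes a property that is accessed through gate functions (a getter and a setter) rather than through the offset of a member variable. Button, ButtonIntProperty, and the flag names are hypothetical, and the descriptor is tied to one class and one value type to keep it short.

```cpp
#include <cassert>

// Hypothetical flag bits describing a property.
enum PropertyFlags {
    kReadable   = 1 << 0,
    kWriteable  = 1 << 1,
    kIsPointer  = 1 << 2,
    kOwnsObject = 1 << 3,
    kExtendable = 1 << 4
};

class Button {
public:
    int Width() const { return width_; }
    void SetWidth(int w) { width_ = w < 0 ? 0 : w; }  // negative widths become 0

private:
    int width_ = 0;
};

// Hypothetical descriptor of an int property of Button that is reached
// through member functions instead of a member variable.
struct ButtonIntProperty {
    const char* name;
    unsigned flags;
    int (Button::*get)() const;
    void (Button::*set)(int);

    int Get(const Button& b) const { return (b.*get)(); }

    // Returns false and changes nothing if the property is not writeable.
    bool Set(Button& b, int value) const {
        if (!(flags & kWriteable) || set == nullptr) return false;
        (b.*set)(value);
        return true;
    }
};

const ButtonIntProperty kWidthProperty = {
    "Width", kReadable | kWriteable, &Button::Width, &Button::SetWidth};
```

Going through the setter keeps the class invariant intact: a value coming from a stream or an editor passes the same checks as a value set by ordinary code.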

The Property Descriptor is quite tricky. It has to hide all the differences of members and provide a common interface to any possible property type.

Property Iterators

The Type Info records and the Property Descriptor Tables store all information of the Run Time Type Identification system. The Property Iterators do not add new information; they just provide a well known and easy-to-use interface for the RTTI system.

Property Iterators can be created for pointing to any part of the object hierarchy and then the iterator can be used to traverse through the properties. Compound or container properties create new iterators for accessing their properties, so the new iterator opens a branch of the property tree. The member functions of Property Iterator make it possible to access all information and services of the Type Info and the Property Descriptor records.

In addition to the Property Iterators, the Property Interface contains other functions. Some member functions are added to the classes having a property description, and some global, static functions are used to access the list of the Type Info records. These functions are required for getting the first Property Iterator and for accessing the list of available types.

How to Use the RTTI System

Hopefully, we already have some idea about the services and structure of an RTTI system. Before going into the details of implementation, let’s see how it can be used for Persistency and Application Generators.

Persistency

The library implementing the RTTI system is called Property Library because it implements properties for C++ objects. A new component of the system, the Stream Library, has to be introduced for Persistency. The Stream Library is responsible for handling the data stream or file. It uses the services of the Property Library to access the application’s data and uses an internal representation of the data stream. Different implementations of the Stream Library can support different stream formats. The third part of the article describes the details of Stream Library.

One of the main advantages of this solution is that the Property Library (the RTTI system), the Stream Library (the data stream), and the application are independent. Using the interface of the Property Library, the Stream Library can save and load C++ objects without knowing their type. The actual format of the stream depends only on the Stream Library. It may support text, XML, or binary formats, so the application selects the format of the stream when it is opened. Changing the file format does not require any change to the source code of the application’s classes, to their RTTI description, or to the Property Library.

Saving Objects

First, a Property Iterator pointing to the beginning (root) of the object hierarchy is created. It is passed to the Save() function of the Stream Library. The Stream Library does not know the type of the object, but it can use the Property Iterator for getting all necessary information. The Save() function iterates through the properties and checks whether the type of each property is a Base Type. If it is a Base Type, the name, type, and value of the property are written to the stream. If it is a container or a compound type, the name and type of the property are written to the stream, and a new block of values is opened for the list of sub-properties. A text stream may look similar to a C program:

Obj1 = {
    int A = 23;
    int B = 46;
    RGB_c Color = {
        unsigned R = 255;
        unsigned G = 255;
        unsigned B = 255;
    }
}

This file is human readable and can be created or changed with a simple text editor. This feature is very important while the system is being developed and the special editor for the resource or data files is itself still under development. Any bug can be fixed directly in the stream files, and a missing feature of the editor does not delay the development of the application. Later, when the application and the editor are ready and tested, the stream format can be quickly and easily changed to a more efficient binary format.
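
The recursive structure of such a Save() function can be sketched as follows. This is a simplification under stated assumptions: instead of a real Property Iterator, the sketch walks a hypothetical PropertyNode tree that holds the type, name, and value of each property as strings, and a property without children is treated as a Base Type.

```cpp
#include <cassert>
#include <cstddef>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical snapshot of one property, as a Property Iterator might expose it.
struct PropertyNode {
    std::string type;                    // may be empty, as for the root object
    std::string name;
    std::string value;                   // used by Base Types only
    std::vector<PropertyNode> children;  // sub-properties of compound or container types
};

// Writes the property and, recursively, its sub-properties as indented text.
void Save(const PropertyNode& p, std::ostream& out, int depth = 0) {
    const std::string indent(static_cast<std::size_t>(depth) * 4, ' ');
    out << indent;
    if (!p.type.empty()) out << p.type << ' ';
    if (p.children.empty()) {
        // Base Type: name, type, and value on one line.
        out << p.name << " = " << p.value << ";\n";
    } else {
        // Compound or container: open a new block for the sub-properties.
        out << p.name << " = {\n";
        for (std::size_t i = 0; i < p.children.size(); ++i) {
            Save(p.children[i], out, depth + 1);
        }
        out << indent << "}\n";
    }
}
```

Save() contains no knowledge of any application class; everything it writes comes from the property description.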

Loading Objects

The Load() function of the Stream Library reads the type, name, and value of the objects. It creates the object if it has not been created before and reads its properties. Then, it searches for the property by name and sets its value. Please note that the stream drives the reading sequence and not the program or the structure of the class! Therefore, the loading process tolerates it if some value is missing or the order of values is different. The Stream Library may be able to handle unknown properties; this makes it possible to load streams created by a newer version of the program.
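
The idea that the stream drives the loading can be shown with a small sketch. It handles only a flat list of "name = value;" entries for one hypothetical Color structure, and LoadColor() is an illustrative name; it merely counts unknown entries, whereas the Stream Library described in the article may store them.

```cpp
#include <cassert>
#include <istream>
#include <map>
#include <sstream>
#include <string>

struct Color {
    int R = 0;
    int G = 0;
    int B = 0;
};

// Reads "name = value;" entries in whatever order the stream provides them.
// Returns the number of entries that do not match any property of Color.
int LoadColor(std::istream& in, Color& c) {
    std::map<std::string, int Color::*> properties;
    properties["R"] = &Color::R;
    properties["G"] = &Color::G;
    properties["B"] = &Color::B;

    int unknown = 0;
    std::string name;
    std::string equals;
    int value = 0;
    char semicolon = 0;
    while (in >> name >> equals >> value >> semicolon) {
        std::map<std::string, int Color::*>::const_iterator it = properties.find(name);
        if (it == properties.end()) {
            ++unknown;  // property of a newer version: skipped, not an error
            continue;
        }
        c.*(it->second) = value;  // the property is found by name, then set
    }
    return unknown;
}
```

A property that is missing from the stream simply keeps the value given by the constructor, and the order of the entries does not matter.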

References or Pointers

References or pointers require special attention. When the objects are loaded, they are placed at different addresses, while the references (pointers) contain the addresses the objects had when they were saved. The Stream Library has to build an address translation table and replace all addresses with the correct values.

The most critical part of the Stream Library is how it handles pointers. When the objects are saved, each object has to be saved exactly once, even if several pointers reference it. The objects are loaded first, and then the references are resolved by using the address translation table. This algorithm can handle circular references between objects. A circular reference happens when several objects have pointers to each other. For example, object A points to object B, object B points to object C, and object C has a pointer to object A.
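
The two-pass algorithm can be sketched as follows. Node, SavedNode, SaveAll(), and LoadAll() are hypothetical names, the "stream" is simply a vector held in memory, and the caller is assumed to pass every object exactly once.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <vector>

struct Node {
    int value = 0;
    Node* next = nullptr;  // non-owning reference to another Node
};

// Saved form of a Node: the pointer is stored as the old address.
struct SavedNode {
    const void* oldAddress;  // where the object lived when it was saved
    int value;
    const void* oldNext;     // old address of the referenced object, or nullptr
};

// Saves every object once; the caller passes each object exactly once.
std::vector<SavedNode> SaveAll(const std::vector<Node*>& objects) {
    std::vector<SavedNode> out;
    for (const Node* n : objects) {
        out.push_back(SavedNode{n, n->value, n->next});
    }
    return out;
}

std::vector<std::unique_ptr<Node> > LoadAll(const std::vector<SavedNode>& saved) {
    std::vector<std::unique_ptr<Node> > objects;
    std::map<const void*, Node*> translation;  // old address -> new address

    // Pass 1: create all objects and fill the address translation table.
    for (const SavedNode& s : saved) {
        objects.push_back(std::unique_ptr<Node>(new Node));
        objects.back()->value = s.value;
        translation[s.oldAddress] = objects.back().get();
    }

    // Pass 2: replace every old address with the new one.
    for (std::size_t i = 0; i < saved.size(); ++i) {
        std::map<const void*, Node*>::const_iterator it =
            translation.find(saved[i].oldNext);
        objects[i]->next = (it == translation.end()) ? nullptr : it->second;
    }
    return objects;
}
```

Because no pointer is followed until all objects exist, a cycle such as A, B, C, A needs no special treatment.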

Default Values and Validation

When a variable is not a property or the value of a property is missing from the stream, the variable will not be initialized when the object is loaded from the stream. Therefore, the developer of the class must pay special attention to initializing every variable in the constructors.

The Stream Library may support another feature for handling these variables. Every class may have a virtual function for validating the object. The Stream Library can call these Validate() functions at the end of the loading process to check the validity of the objects and to give each object a chance to set its uninitialized variables.
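
A possible shape of this mechanism is sketched below. The Validatable interface, the Box class, and ValidateAll() are hypothetical names; the sketch assumes that the loader keeps a list of the objects it has created.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical interface for objects that can check themselves after loading.
struct Validatable {
    virtual ~Validatable() {}
    virtual bool Validate() = 0;
};

struct Box : Validatable {
    int width = 0;   // property, loaded from the stream
    int height = 0;  // property, loaded from the stream
    int area = -1;   // not a property: computed from the other variables

    bool Validate() override {
        if (width < 0 || height < 0) return false;  // the loaded data is invalid
        area = width * height;                      // update the derived variable
        return true;
    }
};

// Called at the end of loading; returns the number of objects that failed.
std::size_t ValidateAll(const std::vector<Validatable*>& loaded) {
    std::size_t failures = 0;
    for (Validatable* obj : loaded) {
        if (!obj->Validate()) ++failures;
    }
    return failures;
}
```

Calling Validate() only after all objects are loaded and all references are resolved means that an object may safely look at the objects it points to.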

Application Generator

The Application Generator program probably uses the Stream Library for saving and loading the created and edited application. It also uses the Property Interface for iterating through the object hierarchy, displays the properties for the user, and makes it possible to view and edit them. The Application Generator is also able to create new objects and insert them into any extendable container.

To provide the list of available object types, the Application Generator has to access the Type Info records of the application and display the types in a list. A container usually cannot store every kind of object; therefore, the list must be filtered for the element type of the container and its descendants. The list of types can also be displayed as a tree that reflects the inheritance hierarchy of the types.
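
The filtering step can be sketched as follows. TypeNode, IsSameOrDerived(), and InsertableTypes() are hypothetical names, and each type has at most one base type here, although the requirements call for multiple inheritance. Abstract types are left out of the result because no object of such a type can be created.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical, simplified type record with a single base type.
struct TypeNode {
    std::string name;
    const TypeNode* base;  // nullptr for a root type
    bool isAbstract;
};

// Returns true if type is ancestor itself or is derived from it.
bool IsSameOrDerived(const TypeNode* type, const TypeNode* ancestor) {
    for (; type != nullptr; type = type->base) {
        if (type == ancestor) return true;
    }
    return false;
}

// Names of the types whose objects can be created and inserted into a
// container with the given element type.
std::vector<std::string> InsertableTypes(const std::vector<const TypeNode*>& all,
                                         const TypeNode& elementType) {
    std::vector<std::string> names;
    for (std::size_t i = 0; i < all.size(); ++i) {
        if (!all[i]->isAbstract && IsSameOrDerived(all[i], &elementType)) {
            names.push_back(all[i]->name);
        }
    }
    return names;
}
```

For a dialog box that stores controls, the selection tool would call InsertableTypes() with the type record of the control base class.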

The user interface displays a tree representation of the object hierarchy. This is a common representation, but if the Application Generator knows more about the objects and the meaning of their properties, much more sophisticated representations can be provided. For example, graphical objects (windows, buttons, input lines) can be represented in a dialog editor, or some relation of objects represented by pointers can be displayed graphically. These are just some simple examples of the infinite possibilities.

The Application Generator saves the application to a file, and the file can be loaded by the run-time system. This way, the developed application becomes independent of the Application Generator.

Conclusions

The first part introduced the Run Time Type Identification system implemented by the Property Library. The requirements were collected and discussed in detail. It was shown how application data can be saved and loaded by the Property Library and the Stream Library without knowing anything about the application’s classes themselves.

The second part of the article will describe how the Property Library is implemented, and the third part will describe the Stream Library. The following parts of the article go into the details, and describe many programming tricks used for getting a clear system. The reader will need C++ programming knowledge to understand it.

Part II

http://www.rcs.hu/Articles/RTTI_Part2.htm

Part III

Part III will be posted later.

References

Bjarne Stroustrup: The C++ Programming Language Special Edition, AT&T, 2000.
Paul Jakubik: Callback Implementations in C++, http://www.primenet.com/~jakubik/callback.html
Vladimir Batov: Persistency Made Easy, C++ Report, August 2000. Online version dated August 12, 2002: http://www.adtmag.com/joop/crarticle.asp?ID=849
