Reduce Compilation Dependencies in Large Scale C++ Projects: Factory Pattern

by Zeeshan Amjad

Changes are inevitable in large projects. This article introduces some useful techniques to minimize compilation time during development.

1. Introduction

Most large-scale projects start small and grow gradually. Issues that are barely noticeable in a small project are often left unaddressed, and they become serious as the project grows. One such problem in large-scale C++ projects is physical dependencies, also known as compilation dependencies. If they are not managed properly, they increase the compilation time of a project unnecessarily.

Design patterns [2] are usually discussed in terms of the logical design of a project, but they also help manage its physical design. Although the prototype hierarchy [1] was the first design pattern to address compilation dependencies, some techniques and idioms [3] already dealt with this issue. The PImpl principle [4], also known as pointer to implementation, is one of them; it can be regarded as a variant of the Handle/Body idiom [3]. Changes are inevitable in large projects. This article introduces some techniques that help minimize compilation time during development.

2. Separate Compilation

It is common practice for C++ programmers to split code into multiple implementation files (usually with extensions such as .c, .cxx, or .cpp) and definition files, that is, header files (usually with extensions such as .h, .hxx, or .hpp). The preprocessor is responsible for making the contents of all required definition files available in the implementation file before compilation.

We do this to reduce compilation time during development and to reuse code across files. For example, if a project of 10,000 lines lives in a single file, then changing any single line forces the compiler to recompile all 10,000 lines. On today's computers this might not be a big problem, but it becomes a nightmare as projects grow larger and larger. If we instead split the project into several files, say 10 files of roughly 1,000 lines each, then a change in one file ideally should not affect the others. It is also very common for large-scale projects to contain general-purpose classes that are useful in other projects, and the natural way to reuse those classes is to keep them in separate files.

On the other hand, if we do not develop the program carefully, it can be impossible to simply take a class's definition file and implementation file, include them in another project, and use them. A common problem is that the class's definition file includes other definition files that we do not need, and those files may include still others, so in the end we have to include a bunch of files just to use one single class.

From the compiler's perspective, an implementation file with all preprocessor directives expanded is called a translation unit. In other words, a translation unit is an implementation file with all its definition files included and all macros expanded. If we change anything in a definition file, then every file that includes it, whether a definition file or an implementation file, needs to be recompiled.

If one definition file is included in another definition file, then a change in the first file affects every file that includes either the first or the second. The situation becomes even worse when a definition file includes another definition file, which includes yet another, and so on. A change in one file is then no longer limited to recompiling that file; it may mean recompiling the whole project. The following diagram shows this concept.

Figure 1: Physical

It does not matter that our camera class does not include Point.H or ViewPort.H directly; both are still part of the camera translation unit. A change in the point header file will trigger recompilation not only of the camera translation unit, but of all translation units in this example.
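The transitive effect can be sketched in a single file, with comments marking where each definition file would begin. This is only an illustration under assumptions: the figure's exact include chain is not reproduced here, Scene.h is a hypothetical intermediate header, and all members are invented for the example.

```cpp
// ---- Point.h ----
struct Point
{
    int x;
    int y;
};

// ---- ViewPort.h (would contain: #include "Point.h") ----
// ViewPort stores a Point by value, so it needs Point's full definition.
struct ViewPort
{
    Point origin;
    int width;
    int height;
};

// ---- Scene.h (hypothetical; would contain: #include "ViewPort.h") ----
struct Scene
{
    ViewPort viewPort;
};

// ---- Camera.h (would contain only: #include "Scene.h") ----
// Camera never names Point.h or ViewPort.h, yet both end up in its
// translation unit, so editing Point.h forces Camera to be recompiled.
class Camera
{
public:
    explicit Camera(const Scene& scene) : scene_(scene) {}
    int viewWidth() const { return scene_.viewPort.width; }

private:
    Scene scene_;
};
```

In this sketch Camera depends on Point only through two levels of inclusion, which is exactly the kind of hidden dependency the figure describes.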

3. Applying Patterns to Minimize Compilation Dependencies

The above dependencies can be minimized with the help of forward declarations [4]. However, sometimes it is impossible to use a class with only a forward declaration. Let's look at an example to better understand this. It is not unusual for a program to communicate with several databases, such as Oracle, Sybase, and SQL Server, at the same time, and to switch databases at run time. To gain the maximum speed benefit, we can use the native APIs of these databases. To give the client a uniform, polymorphic interface, we create an abstract base class called Database, which contains pure virtual functions for all required operations, and derive all of the database-specific classes from it. We can also keep these classes in separate components, if necessary. To keep things simple, here is our Database class.

class __declspec(dllexport) Database
{
public:
        virtual ~Database(void);
        virtual bool OpenConnection(std::string connectionString) = 0;
        virtual void CloseConnection(void) = 0;
        virtual void ExecuteCommand(std::string command) = 0;
};

We derive three classes from it, for the Oracle, SQL Server, and Sybase implementations. Here is the code of the Oracle class; the others are very similar.

class __declspec(dllexport) Oracle :
        public Database
{
public:
        bool OpenConnection(std::string connectionString);
        void CloseConnection(void);
        void ExecuteCommand(std::string command);
};

In the implementation of these methods, each function simply prints its own name. Here is our implementation.

bool Oracle::OpenConnection(std::string connectionString)
{
        std::cout << "Oracle::OpenConnection" << std::endl;
        return false;
}
void Oracle::CloseConnection(void)
{
        std::cout << "Oracle::CloseConnection" << std::endl;
}
void Oracle::ExecuteCommand(std::string command)
{
        std::cout << "Oracle::ExecuteCommand" << std::endl;
}

This is a class diagram of our classes.

Figure 2: No Factory Pattern

In this design, the client program has to include the definition file of a child class, because without it the client cannot create an object of that class [5]. If the client does not know in advance which database it will communicate with, or wants to leave that choice to the user, then it has to include the definition files of all the child classes. Here is some simple client code to demonstrate this.

Database* pDataBase = NULL;
switch (choice)
{
case 1:
        pDataBase = new Oracle();
        break;
case 2:
        pDataBase = new SQLServer();
        break;
case 3:
        pDataBase = new Sybase();
        break;
}
if (pDataBase != NULL)
{
        pDataBase->OpenConnection("This is connection string");
        pDataBase->ExecuteCommand("This is command");
        delete pDataBase;
}

In addition, if we want to add support for one more database, we need to derive its class from Database and also include its definition file in the client, which results in a lot of recompilation.

We can reduce the dependencies between these classes and their clients by introducing a level of indirection. We introduce a factory method [2] that creates objects of the child classes on the client's behalf. The client now communicates only with the factory method to create instances of the required class. We create a DatabaseFactory class with one static method, CreateObject, which is responsible for creating an object of the appropriate class and returning its address. Here is the code of our factory method (the CreateObject method of the DatabaseFactory class).

Database* DatabaseFactory::CreateObject(int databaseType)
{
        if (databaseType == 1)
               return new Oracle();
        else if (databaseType == 2)
               return new SQLServer();
        else if (databaseType == 3)
               return new Sybase();
        else
               return NULL;
}

Here is a class diagram of this.

Figure 3: Factory Pattern

The client of the database classes creates an instance of the appropriate database by calling the CreateObject method of the DatabaseFactory class, which chooses the class based on the information passed as a parameter. The advantage of this technique is that the client now needs the definition files of only two classes, DatabaseFactory and Database. Here is the client code using the factory method.

Database* pDataBase = NULL;
pDataBase = DatabaseFactory::CreateObject(choice);
if (pDataBase != NULL)
{
        pDataBase->OpenConnection("This is connection string");
        pDataBase->ExecuteCommand("This is command");
        delete pDataBase;
}

In the future, if we want to add support for one more database, such as DB2 or MySQL, we do not need to include its definition file on the client side.

When support for a new database is added, the only thing we need to change is the implementation of the CreateObject function in the DatabaseFactory class. If this function is not inline, that is, if its body lives in the implementation file, the change does not affect the clients of the database classes, which reduces recompilation. It is generally better practice to write function bodies in the implementation file to reduce physical dependencies [6]; if performance is a concern, the function can still be declared inline explicitly. When the body is in the implementation file, a change to it makes the compiler recompile only that translation unit. When the body is in the definition file, on the other hand, a change to it means recompiling every translation unit that includes that definition file.

4. Conclusion

Most of the compile time dependencies can be removed with the proper use of design patterns. Design patterns are not only useful to improve the logical design of the project, but can also make the physical design of a project better to minimize the compilation time of the project. There is a rule written in "The Elements of Style", "Omit needless words" [7]. We can apply a similar rule here, "Omit needless headers".

Most of the things discussed here are used to reduce the compile time dependencies of a project. This work can be further enhanced to minimize the link time dependencies too.

5. References

  1. Large-Scale C++ Software Design
    John Lakos
  2. Design Patterns: Elements of Reusable Object-Oriented Software
    Erich Gamma, Richard Helm, Ralph Johnson, John Vlissides
  3. Advanced C++ Programming Styles and Idioms
    James O. Coplien
  4. Exceptional C++
    Herb Sutter
  5. The C++ Programming Language, 3rd Edition
    Bjarne Stroustrup
  6. Manage Physical Dependencies of a Project to Reduce Compilation
    Zeeshan Amjad
  7. The Elements of Style
    William Strunk Jr., E. B. White, Roger Angell
This article was originally published on Thursday Jun 9th 2011