Using “Thin Templates” to Reduce Application Size

Using “Thin Templates” to Reduce Application Size

A template itself is a very useful C++ concept that allows you to use a single declaration for a number of classes or functions with the same logic but with different types used. The primary usages of templates are type-neutral containers such as “list,” “map,” “queue,” and so on. Once coded, their logic can be applied to any object beginning from simple a “int” to the complex classes. However, templates have one big drawback: They produce large-size executables. Because each declaration of a template will produce a brand-new class or function, the original template code is multiplied numerous times across the application, producing a size boost. Of course, a simple 100-lines-size “list” container used for “int” and “char *” will increase the application size a mere 100-200 bytes, but a complex container used across an application for dozens of different objects can increase an application’s size by many kilobytes.

If it’s preferable to keep application size small, a handy technique called “Thin Templates” can be used to nearly eliminate the templates’ size-boost effect. This technique is based on the fact that most optimized compilers will generate different code only for template functions that actually use different types. For example, if the following template class is defined:


template <class T> class CTest
{
int Test1(void)
{
return 1;
}

T Test2(void)
{
return 1;
}
}

And then used two times with different types:


CTest<char> oClass1;
CTest<int> oClass2;

Only one code will be generated for function Test1(), because this function doesn’t use template argument T and contains the same code in both versions. The compiler’s ability to generate one code for the template with different types is the first and only requirement for successful use of “thin templates.” Today’s C++ compilers, such as “Microsoft C++ compiler” or “GCC,” have such ability.

The main idea of “thin templates” is to move all code that doesn’t use template arguments into separate functions so only one instance of code will be generated for them. Let’s see how it works. For example, we will use a very simple list class:


template<class T> class MyList
{
public:

explicit MyList()
{
m_poFirst=0;
m_poLast=0;
}

~MyList()
{
for(SItem *poCur=m_poFirst; poCur;)
{
SItem *poNext=poCur->m_poNext;
delete poCur;
poCur=poNext;
}
}

MyList(const MyList &); // must not be used
MyList &operator=(MyList &); // must not be used

class iterator
{
public:
iterator()
{
m_poCur=0;
}
private:
friend class MyList<T>;
void *m_poCur;
};

bool Add(const T &a_oSrc)
{
SItem *poNewItem=new SItem();
if(!poNewItem) return false;
poNewItem->m_poPrev=m_poLast ? m_poLast : 0;
poNewItem->m_poNext=0;
poNewItem->m_oData=a_oSrc;
if(m_poLast) m_poLast->m_poNext=poNewItem;
m_poLast=poNewItem;
if(!m_poFirst) m_poFirst=poNewItem;
return true;
}

T *GetFirst(MyList<T>::iterator &a_oIt)
{
a_oIt.m_poCur=(void *)m_poFirst;
return a_oIt.m_poCur ? &(m_poFirst->m_oData) : 0;
}

T *GetNext(MyList<T>::iterator &a_oIt)
{
if(!a_oIt.m_poCur) return 0;
SItem *poNext=((SItem *)a_oIt.m_poCur)->m_poNext;
if(poNext) a_oIt.m_poCur=poNext;
return poNext ? &(poNext->m_oData) : 0;
}

bool Remove(MyList<T>::iterator &a_oIt)
{
SItem *poCur=(SItem *)a_oIt.m_poCur;
if(!poCur) return false;
if(poCur->m_poPrev) poCur->m_poPrev->m_poNext=poCur->m_poNext;
else m_poFirst=poCur->m_poNext;
if(poCur->m_poNext) poCur->m_poNext->m_poPrev=poCur->m_poPrev;
else m_poLast=poCur->m_poPrev;
delete poCur;
a_oIt.m_poCur=0;
return true;
}

private:

struct SItem
{
T m_oData;
SItem *m_poNext,
*m_poPrev;
} *m_poFirst,
*m_poLast;
};

As you can see, nearly all of the functions in this simple “list” container use template argument T. But, if you look closer, you will find that this argument is used in one or two lines of code in each function. So, most of the functions can be divided to separate template-dependent code from template-independent code. For example, let’s divide the Add() function that adds an object into the list and calculates sizes in bytes of template-independent and template-dependent code in both cases:

Before using “thin templates:”


struct SItem
{
T m_oData;
SItem *m_poNext,
*m_poPrev;
} *m_poFirst,
*m_poLast;

bool Add(const T &a_oSrc)
{ // prolog, 9 bytes
SItem *poNewItem=new SItem(); // 17 bytes
if(!poNewItem) return false; // 10 bytes
poNewItem->m_poPrev=m_poLast ? m_poLast : 0; // 33 bytes
poNewItem->m_poNext=0; // 7 bytes
poNewItem->m_oData=a_oSrc; // 10 bytes
if(m_poLast) m_poLast->m_poNext=poNewItem; // 21 bytes
m_poLast=poNewItem; // 6 bytes
if(!m_poFirst) m_poFirst=poNewItem; // 18 bytes
return true; // 2 bytes
} // epilog, 4 bytes

The total function size is 137 bytes and all code is template-dependent. If we use only one template, this function will take 137 bytes of code. Two templates with different templates arguments will require 2*137=274 bytes of code.

After using “thin templates:”


struct SThinItem
{
SThinItem *m_poNext,
*m_poPrev;
} *m_poFirst,
*m_poLast;

struct SItem: public SThinItem
{
T m_oData;
};

void ThinAdd(SThinItem *a_poDst)
{ // prolog, 8 bytes
a_poDst->m_poPrev=m_poLast ? m_poLast : 0; // 33 bytes
a_poDst->m_poNext=0; // 6 bytes
if(m_poLast) m_poLast->m_poNext=a_poDst; // 20 bytes
m_poLast=a_poDst; // 9 bytes
if(!m_poFirst) m_poFirst=a_poDst; // 18 bytes
} // epilog, 4 bytes

bool Add(const T &a_oSrc)
{ // prolog, 9 bytes
SItem *poNewItem=new SItem(); // 17 bytes
if(!poNewItem) return false; // 10 bytes
poNewItem->m_oData=a_oSrc; // 11 bytes
ThinAdd(poNewItem); // 11 bytes
return true; // epilog, 4 bytes
}

The total function size is 160 bytes; that seems to be 23 bytes larger than our original function! But, the magic is that the template-dependent part, Add(), is only 62 bytes long, and main template-independent code moved to ThinAdd() is 98 bytes long. Additional bytes are due to the second function’s prolog and epilog, and actual function call. If we add a second template with different template arguments, this will increase the total code size only on 62 bytes, and total amount of code will be 222 bytes long. And, each new template will increase the total code size by only 62 bytes, instead of 137.

So, the “thin templates” method is useful if the template class is used more than once with different template arguments. There are different implementations of “thin templates.” The simplest implementation was demonstrated in the example above. Code is divided into template-dependent and template-independent sections by adding new functions for template-independent code. A better practise is to use a parent class for template-independent structures and code, and inhenerit a template class from it, adding template-dependent structures and code. For example, I will completely transform our simple template “list” so it will use the “thin template” technique:


class CListThin
{
public:

explicit CListThin()
{
m_poFirst=0;
m_poLast=0;
}

virtual ~CListThin()
{
for(SThinItem *poCur=m_poFirst; poCur;)
{
SThinItem *poNext=poCur->m_poNext;
delete poCur;
poCur=poNext;
}
}

struct SThinItem
{
SThinItem *m_poNext, // pointer to next list item
*m_poPrev; // pointer to previous list item
};

class iterator
{
public:
iterator()
{
m_poCur=0;
}
void *m_poCur;
};

void ThinAdd(SThinItem *a_poDst)
{
a_poDst->m_poPrev=m_poLast ? m_poLast : 0;
a_poDst->m_poNext=0;
if(m_poLast) m_poLast->m_poNext=a_poDst;
m_poLast=a_poDst;
if(!m_poFirst) m_poFirst=a_poDst;
}

bool Remove(CListThin::iterator &a_oIt)
{
SThinItem *poCur=(SThinItem *)a_oIt.m_poCur;
if(!poCur) return false;
if(poCur->m_poPrev) poCur->m_poPrev->m_poNext=poCur->m_poNext;
else m_poFirst=poCur->m_poNext;
if(poCur->m_poNext) poCur->m_poNext->m_poPrev=poCur->m_poPrev;
else m_poLast=poCur->m_poPrev;
delete poCur;
a_oIt.m_poCur=0;
return true;
}

protected:

SThinItem *m_poFirst,
*m_poLast;
};

template<class T> class list: public CListThin
{
public:

list()
{
}

list(const eye::list &); // must not be used
eye::list &operator=(eye::list &); // must not be used

bool Add(const T &a_oSrc)
{
SItem *poNewItem=new SItem();
if(!poNewItem) return false;
poNewItem->m_oData=a_oSrc;
ThinAdd(poNewItem);
return true;
}

T *GetFirst(eye::list<T>::iterator &a_oIt)
{
a_oIt.m_poCur=(void *)m_poFirst;
return a_oIt.m_poCur ? &(((SItem *)m_poFirst)->m_oData) : 0;
}

T *GetNext(eye::list<T>::iterator &a_oIt)
{
if(!a_oIt.m_poCur) return 0;
SItem *poNext=(SItem *)(((SItem *)a_oIt.m_poCur)->m_poNext);
if(poNext) a_oIt.m_poCur=poNext;
return poNext ? &(poNext->m_oData) : 0;
}

private:

struct SItem: public CListThin::SThinItem
{ // list item description
T m_oData; // data, saved in list
};

};

Of course, in the thin simple example, using “thin templates” will give only a few hundreds bytes’ advantage if the template is used 2-3 times with different template arguments. But, in percents, it will be more then 50% code economy. If a project frequently uses complex template containers, the “thin templates” technique can greatly reduce executable size.

Note: Some compilers can be configured to automatically inline class member functions if they are called from class itself. This can eliminate “thin templates” effects. In the “Microsoft C++ compiler,” you can use the /Ob0 command-line switch to disable inline function expansion or use functions with variable number of arguments (for example – void ThinAdd(SThinItem *a_poDst, …) ); such functions can’t be inlined.

More by Author

Get the Free Newsletter!

Subscribe to Developer Insider for top news, trends & analysis

Must Read