Writing Code in a Natural Way with C++/CLI

Introduction

This article is based on a beta release, so its content is subject to change. It presents some of the changes to the Managed Extensions for C++, especially in the syntax of the most important elements.

CLI, which stands for Common Language Infrastructure, is a multi-tiered architecture supporting a dynamic component programming model. It is the specification on which Microsoft .NET is based; Microsoft's implementation of the CLI is the CLR (the Common Language Runtime). C++/CLI is a binding between C++ and the CLI that is meant to provide C++ support for the CLI. Visual C++ 2005 is Microsoft's implementation of C++/CLI. C++/CLI has been undergoing standardization under the European Computer Manufacturers Association (ECMA) since 2003, and the latest working draft was issued in August 2005 (and, according to Herb Sutter, it could become a standard shortly, with the final draft issued on October 11).

The language was designed to be the lowest-level language for the CLR and, at the same time, to make writing code easier by keeping it as natural as C++. It was also intended for writing verifiable code that could, for instance, be hosted as an SQL stored procedure. The result is a more robust and developer-friendly language. The following sections address some issues that are most likely to arise when migrating from Managed C++ in Visual C++ .NET 7.x to C++/CLI in Visual C++ 8.0 (Visual C++ 2005).

Before going further, note that this article assumes familiarity with Managed Extensions for C++ (MC++). C# code is also listed on several occasions, but you should have no trouble understanding it even if you are not familiar with that language.

Keywords

In MC++, you had to use a lot of Microsoft-specific keywords, such as __gc, __nogc, __value, __property, __abstract, __sealed, __box, __pin, and so forth. In C++/CLI, these still exist, but they are obsolete and have been replaced by other keywords. There are two new categories (strictly speaking, only the first consists of keywords):

  • spaced keywords: these are keywords that are actually formed by the use of two keywords separated by a white space. The following are spaced keywords:

    • enum class
    • enum struct
    • interface class
    • interface struct
    • ref class
    • value class
    • ref struct
    • value struct
    • for each

    During translation phase 4, each spaced keyword is replaced by a single token.

  • context-sensitive keywords: these are not actually keywords, but identifiers that bear a special meaning within a certain context. For instance, abstract and sealed have special meaning within a virtual function declaration.

Additional keywords added to C++/CLI are nullptr and gcnew.

By default, the use of the old keywords raises the C4980 warning, which is always issued as an error (but can be turned off with the /wd compiler option). If you still want or need to use the old syntax, you must use the /clr:oldSyntax command-line option (instead of /clr).

Accessibility

In MC++, the access specifiers are the same as in C++. They can be used in pairs, in which case the more restrictive one refers to external access (from outside the assembly) and the more permissive one to internal access (from within the assembly). The order of the two specifiers is not important. When only one access specifier is given, the effect is the same as using that specifier twice. The same applies to C++/CLI, which, to accommodate the notion of assemblies, has introduced several new specifiers (already familiar to C# programmers). The additional specifiers are (according to the standard working draft):

  • internal: name can be used in its parent assembly. This is referred to as assembly access.
  • public protected or protected public: name can be used in its parent assembly or by types derived from the containing class. This is referred to as family or assembly access.
  • private protected or protected private: name can be used only by types derived from the containing class within its parent assembly. This is referred to as family and assembly access.

In MC++, the three member access specifiers could be used in any combination, so there are nine possible combinations. Using the same specifier twice, such as public public, raises a warning in C++/CLI and should be replaced with a single public. Also, the use of public private (or private public) is obsolete, and the compiler warns you to replace it with internal (warning C4376).

Objects and Types

This is one of the areas with the most important (and welcomed) changes. In C++, objects (whether they are instances of classes or structs, which are basically the same in C++) can be created on the stack (a dedicated memory zone for static allocation, that expands and unwinds as functions get called and return) or on the heap (memory reserved for dynamic allocation that is under the direct control of the programmer, who is responsible for allocating and freeing it).


class Point {};

Point pt; // object on the stack

Point *pPt = new Point; // object on the native heap

Because the heap must be managed manually, applications often leak memory because of poor heap management. In .NET, the garbage collector was introduced as a part of the runtime environment; it is responsible for managing the CLR heap (or managed heap), so developers no longer need to take care of heap memory deallocation. The .NET Framework is based on the paradigm that everything is an object, but there are two kinds of types an object can instantiate: value types and reference types. Value types are like C++ built-in types, providing efficient allocation and access. Reference types are like C++ classes, providing all the features required in an object-oriented language. In .NET, instances of value types are typically allocated on the stack (or inline within the object that contains them), whereas instances of reference types are allocated on the managed heap.

In .NET, Point is a value type, and Process is a reference type. Looking at this code in C#:


// value type allocated on the stack
System.Drawing.Point point = new System.Drawing.Point(1, 0);

// reference type allocated on the CLR heap
System.Diagnostics.Process process = new System.Diagnostics.Process();

it’s hard to say which object is on the stack and which is on the heap unless you know how Point and Process are declared, because the declaration determines where the objects are placed. The syntax is transparent; it does not distinguish between the two cases.

In MC++, the above code would look like this:


// value type allocated on the stack
System::Drawing::Point point (1, 0);

// reference type allocated on the CLR heap
System::Diagnostics::Process __gc* process =
__gc new System::Diagnostics::Process();

// object on the native heap
UnmanagedFoo* foo = new UnmanagedFoo();

In this example, process looks like a C++ pointer (especially in situations where __gc can be omitted), but it is not one. That can lead to confusion, especially because the GC can relocate an object on the managed heap during garbage collection. When managed and unmanaged code are mixed, you cannot tell whether a pointer is a real C++ pointer or a reference to a managed object unless you know how the object's type was declared. This is where C++/CLI brings order by adding handle types and null values. Handles point to objects on the managed heap; the garbage collector updates them automatically when the object they point to is moved in memory, and they can be rebound to other objects. Handles are declared with ^ and tracking references with % (it can be said that in C++/CLI % is to ^ what & is to * in C++).


// value type allocated on the stack
System::Drawing::Point point (1, 0);

// reference type allocated on the CLR heap
System::Diagnostics::Process^ process =
gcnew System::Diagnostics::Process();

The changes also involve the allocation operators. Operator new is no longer overloaded for both managed and unmanaged types; it allocates objects only on the native heap. To allocate objects on the managed heap, the gcnew operator is used.


class FooNative
{
public:
   int Value;
};

ref class FooManaged
{
public:
   property int Value;
};

// handle to an object on the managed heap
FooManaged^ mfoo = gcnew FooManaged();
// tracking reference to a gc-lvalue
FooManaged% rmfoo = *mfoo;

// change the object state using the handle
mfoo->Value = 10;
// access the object state using the tracking reference
Console::WriteLine("managed foo value = {0}", rmfoo.Value);

// pointer to an object on the native heap
FooNative* nfoo = new FooNative();
// reference to a native object
FooNative& rnfoo = *nfoo;

// change the object state using the pointer
nfoo->Value = 20;
// access the object state using the reference
Console::WriteLine("native foo value = {0}", rnfoo.Value);

(For the property syntax, see below.)

C++/CLI defines the nullptr, referred to as the null pointer constant, that acts as a universal null and is a compile-time expression that evaluates to zero in accordance with the C++ standard. nullptr is a literal having the null type (but objects of this type cannot be created). The null value constant can be implicitly converted to a pointer or a handle type, becoming a null pointer value or null value.


String^ str = nullptr; // handle str has the null value
char* ch = nullptr; // pointer ch has the null pointer value
if(str == nullptr)
   str = gcnew String("managed heap object");

You can’t apply the typeid operator on nullptr, nor sizeof (because the null type doesn’t have a size), nor can it be thrown using a throw expression.
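One practical benefit of a null constant that has its own type shows up in overload resolution. The sketch below uses native ISO C++11, which later adopted a nullptr literal with comparable pointer-conversion rules; it cannot show the conversion to handles, which exists only in C++/CLI. The describe functions are hypothetical names chosen for illustration.

```cpp
#include <string>

// Hypothetical overload set: one overload takes an integer, one a pointer.
std::string describe(int)         { return "int"; }
std::string describe(const char*) { return "pointer"; }

// The literal 0 is an int first and a null pointer only by conversion,
// so it selects the int overload.
std::string describeZero()    { return describe(0); }

// nullptr never converts to an integer, so it selects the pointer overload.
std::string describeNullptr() { return describe(nullptr); }
```

With the literal 0, a call that was meant to pass a null pointer silently picks the integer overload; nullptr removes that ambiguity.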

The way types are declared in C++/CLI also has changed. The __gc, __nogc, or __value keywords are obsolete and have been replaced with an adjective class form:


struct FooNativeS {}; // native struct
class FooNativeC {}; // native class

ref class FooReferenceC {}; // CLR reference type
ref struct FooReferenceS {}; // CLR reference type

value class FooValueC {}; // CLR value type
value struct FooValueS {}; // CLR value type

interface class FooInterfaceC {}; // CLR interface type
interface struct FooInterfaceS {}; // CLR interface type

enum class FooEnumC {}; // CLR enumeration type
enum struct FooEnumS {}; // CLR enumeration type

Freeing Resources

The CLR does not support the C++ concept of destructors, which are the natural place for freeing resources. Instead, the dispose pattern was promoted. Types that manage resources that must be freed when objects are no longer needed have to implement the IDisposable::Dispose() method, which should be called when the resource is no longer needed. Because this method must be called explicitly (and the caller may forget to do so), such types also have to implement a finalizer, which is called automatically by the garbage collector before it reclaims the memory allocated for the object. When Dispose() is called manually, however, the finalizer should be prevented from running, which is done by calling GC::SuppressFinalize() in the implementation of Dispose(). A typical implementation of the dispose pattern in MC++ looks like this:


__gc class Foo: public IDisposable
{
   // unmanaged resource
   // managed resource

   // indicates whether Dispose has been called
   bool bDisposed;
public:
   Foo(): bDisposed(false)
   {
   }

   // implementation of IDisposable::Dispose()
   void Dispose()
   {
      Dispose(true);
      // make sure the GC won't run the finalizer afterwards
      GC::SuppressFinalize(this);
   }

   void Dispose(bool disposing)
   {
      // Dispose was not yet called
      if(!this->bDisposed)
      {
         // if disposing == true, the call is made from user code:
         // unmanaged and managed resources can be disposed
         //
         // if disposing == false, the call is made from the
         // finalizer: managed resources should not be referenced,
         // only unmanaged resources
         if (disposing)
         {
            // dispose managed resources
         }

         // dispose unmanaged resources
      }

      this->bDisposed = true;
   }

   // not a C++ destructor, but an implementation of the
   // Object::Finalize method
   ~Foo()
   {
      Dispose(false);
   }
};

The Dispose() method gives you deterministic clean-up, whereas Finalize() is non-deterministic because it is up to the garbage collector, not the programmer, to decide when to call it.
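The control flow of the pattern can be modeled without the CLR. The following native C++ sketch is only an illustration: DisposeModel, RunFinalizer, and the log are hypothetical stand-ins (RunFinalizer plays the role of the garbage collector, and a flag plays the role of GC::SuppressFinalize), not part of any real API.

```cpp
#include <string>
#include <vector>

// Native C++ model of the dispose pattern's control flow (illustrative only).
class DisposeModel
{
    bool disposed_ = false;
    bool finalizeSuppressed_ = false;  // stands in for GC::SuppressFinalize
    std::vector<std::string> log_;     // records which clean-up steps ran

    void Dispose(bool disposing)
    {
        if (disposed_) return;          // clean up at most once
        if (disposing)
            log_.push_back("managed");  // safe only when called from user code
        log_.push_back("unmanaged");    // released on both paths
        disposed_ = true;
    }

public:
    // Deterministic path: called explicitly by user code.
    void Dispose()
    {
        Dispose(true);
        finalizeSuppressed_ = true;
    }

    // Non-deterministic path: what the garbage collector would eventually do.
    void RunFinalizer()
    {
        if (!finalizeSuppressed_)
            Dispose(false);
    }

    const std::vector<std::string>& Log() const { return log_; }
};
```

Calling Dispose() and then RunFinalizer() logs "managed" and "unmanaged" once each, because the finalizer has been suppressed. Calling only RunFinalizer() logs "unmanaged" alone, which mirrors the rule that a finalizer must not touch managed resources.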

All this has changed in C++/CLI. If you try to implement IDisposable directly, like this:


ref class Foo: public IDisposable
{
public:
   virtual void Dispose()
   {
      // free resources
   }
};

you get an error message, C2605: ‘void Dispose()’ : this method is reserved. Instead, in C++/CLI you have a destructor that implements the Dispose method, and a finalizer that implements Finalize().


ref class Foo
{
public:
   ~Foo()    // destructor, implements/overrides
             // IDisposable::Dispose
   {
      // free managed and unmanaged resources
   }

   !Foo()    // finalizer, implements/overrides Object::Finalize
   {
      // free only unmanaged resources
   }
};

The finalizer’s syntax is very similar to the C++ syntax for destructors, but instead of using a ~, you use an exclamation mark, !. The compiler takes care of implementing the dispose pattern, as presented above. To call the disposer/destructor, you use the delete operator.


Foo^ foo = gcnew Foo();

delete foo; // calls the destructor (Dispose) through the handle

The finalizer is called automatically by the GC before reclaiming the memory.

Moreover, you can use stack semantics to create objects of reference types.


Foo f; // object on the managed heap

f.DoSomething();

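In this case, you don’t call delete. Although the object actually lives on the managed heap, the compiler calls its destructor (that is, Dispose) automatically when f goes out of scope, so clean-up is deterministic, just as it is for a native object on the stack. The finalizer remains a safety net, run non-deterministically by the garbage collector, for objects that are never disposed.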
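For comparison, this is what the same declaration syntax means for a native C++ class: the destructor runs at the closing brace of the scope, with no delete in sight. The sketch is plain ISO C++; ScopedResource and runScope are hypothetical names, and the log exists only to make the order of events visible.

```cpp
#include <string>
#include <vector>

// Hypothetical resource wrapper that logs its lifetime events.
class ScopedResource
{
    std::vector<std::string>& log_;
public:
    explicit ScopedResource(std::vector<std::string>& log) : log_(log)
    {
        log_.push_back("acquire");
    }
    ~ScopedResource()
    {
        log_.push_back("release");  // runs when the object leaves scope
    }
    void DoSomething() { log_.push_back("use"); }
};

// Returns the order in which the lifetime events happened.
std::vector<std::string> runScope()
{
    std::vector<std::string> log;
    {
        ScopedResource r(log);  // no new, no delete
        r.DoSomething();
    }                           // destructor runs here
    log.push_back("after scope");
    return log;
}
```

The returned log reads acquire, use, release, after scope: the release happens before control leaves the block, which is the behavior stack-style syntax is modeled on.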

Properties

In .NET, properties can be viewed as object-oriented fields that promote encapsulation by hiding the internal representation of data. In C#, a property would look like this:


public int Age
{
   get {return nAge;}    // nAge is a private integer that
                         // represents the age of a person
   set {nAge = value;}
}

In fact, this property is compiled into a pair of methods, get_Age and set_Age, in MSIL. Simple property accessors are inlined by the JIT compiler, which means that, in terms of performance, accessing a field and accessing a property are the same.

Of course, MC++ offered support for properties as well, but writing a property was a little cumbersome. The equivalent property in MC++ would be:


__property int get_Age() {return m_nAge;}
__property void set_Age(int value) {m_nAge = value;}

The biggest problem with this code is the lack of a strong indication that the two methods get_Age and set_Age actually belong together. C++/CLI solves this problem by replacing the Microsoft-specific keyword __property (which indicated that a method implements property semantics) with the keyword property and by grouping the get and set accessors in a single block.


property int Age
{
   int get() {return m_nAge;}
   void set(int value) {m_nAge = value;}
}

This feels much more natural and the compiler is the one that takes care of generating the get_Age and set_Age MSIL methods (just as with the C# compiler) and the other metadata needed.

In this example, the property Age is a trivial property: it only returns or sets the value of an internal variable. For such trivial properties, the compiler can provide the necessary implementation automatically.

property int Age;

For this property, the compiler adds an int private field called <backing_store>Age and the two accessor methods, get_Age and set_Age.
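To make the accessor pair concrete, here is a native C++ sketch of what a property bundles together: a private backing field plus a get/set pair. It is an analogy written by hand, not the code the C++/CLI compiler emits. The class name Person and the validation rule are assumptions added for illustration; the setter shows the kind of logic that would make you write the accessors yourself instead of using a trivial property.

```cpp
#include <stdexcept>

// Hypothetical native class that spells out by hand what a property bundles:
// a private backing field plus a get/set accessor pair.
class Person
{
    int m_nAge = 0;  // backing field
public:
    int get_Age() const { return m_nAge; }

    void set_Age(int value)
    {
        // A non-trivial setter can validate before storing (assumed rule).
        if (value < 0)
            throw std::invalid_argument("age must be non-negative");
        m_nAge = value;
    }
};
```

Callers go through get_Age and set_Age only, so the internal representation can change later without touching the code that uses the class.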

Last but not least, in C++/CLI you can specify different accessibility for the individual property accessors. For instance, the get method can be accessible from anywhere, including outside the assembly, whereas the set method is accessible only from within the assembly or from derived types.


property int Age
{
public:
   int get() {return m_nAge;}
protected public:
   void set(int value) {m_nAge = value;}
}

Boxing and Unboxing

Sometimes, a value type object must be treated as a reference type object (for instance, to pass it to a function that takes a reference type argument). Because value types are allocated on the stack and reference types on the heap, a mechanism was created for building objects on the heap from values on the stack and vice versa; it is called boxing (and, in the opposite direction, unboxing). In C#, it looks transparent:


int i = 44; // value on the stack
object obj = i; // implicit boxing
int j = (int)obj; // explicit unboxing

i is allocated on the stack. Next, an object is created on the heap, and the value from the stack is copied into that object. Of course, if the value of i later changes to 55, the value of the object on the heap remains unchanged. Unboxing is the reverse process: a value type is created on the stack, and the value of the heap object is copied into it.

In MC++, as opposed to C#, boxing and unboxing are done explicitly, using the keyword __box for boxing and dynamic_cast or __try_cast for unboxing, or simply by dereferencing the pointer.


int i = 44;
Object* obj = __box(i); // explicit boxing
int j = *dynamic_cast<__box int*>(obj); // explicit unboxing

In C++/CLI, boxing is done by using handles:


int i = 44;
Object^ h = i; // implicit boxing
int j = (int)h; // explicit unboxing

or


int j = safe_cast<int>(h); // explicit unboxing
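The copy semantics of boxing can be imitated in native C++ to show why the boxed object is unaffected by later changes to the original value. This is only an analogy under stated assumptions: box, unbox, and boxThenModify are hypothetical helpers, a std::unique_ptr stands in for the handle, and a real CLR box also carries type information and is garbage-collected.

```cpp
#include <memory>
#include <utility>

// Native analogy for boxing: copy a stack value into a new heap object.
std::unique_ptr<int> box(int value)
{
    return std::unique_ptr<int>(new int(value));
}

// Native analogy for unboxing: copy the heap value back to the stack.
int unbox(const std::unique_ptr<int>& boxed)
{
    return *boxed;
}

// Boxes i, changes i afterwards, and returns (i, unboxed value).
std::pair<int, int> boxThenModify()
{
    int i = 44;
    std::unique_ptr<int> obj = box(i);  // the box holds its own copy of 44
    i = 55;                             // changes only the stack variable
    int j = unbox(obj);                 // still the original value
    return std::make_pair(i, j);
}
```

The pair returned by boxThenModify holds 55 and 44: the box captured a copy at the moment of boxing, not a reference to i.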

Iterating over Collections

In C#, you can use a foreach statement to iterate over the elements of a collection (that implements the IEnumerable interface). Here is a very simple example:


int [] anArray = new int[5] {0, 1, 2, 3, 4};
foreach (int i in anArray)
{
   Console.WriteLine("{0}", i);
}

Enumerators cannot be used to alter the collection. MC++ did not support a for each statement, but C++/CLI does.


#include <vector>
#include <iostream>

class FooCollection
{
   std::vector<int> m_vec;
public:
   double Average()
   {
      int total = 0;
      for each(int i in m_vec)
         total += i;

      return total/(double)m_vec.size();
   }

   void Add(int n)
   {
      m_vec.push_back(n);
   }
};

int main()
{
   FooCollection foo;
   foo.Add(1);
   foo.Add(2);
   foo.Add(3);
   foo.Add(4);

   std::cout << foo.Average() << std::endl;
   return 0;
}

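The best part here is that this sample code can be compiled natively (without /clr), because in Visual C++ for each works with any STL-compliant container in native code. Keep in mind that for each is a Visual C++ extension rather than part of ISO C++.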
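With compilers other than Visual C++, the same iteration can be expressed with the range-based for statement that ISO C++11 later standardized. The sketch below rewrites the FooCollection class under that assumption; the guard for an empty collection is an addition that the original listing does not have.

```cpp
#include <vector>

class FooCollection
{
    std::vector<int> m_vec;
public:
    double Average() const
    {
        // Added assumption: an empty collection averages to 0
        // instead of dividing by zero.
        if (m_vec.empty())
            return 0.0;

        int total = 0;
        for (int i : m_vec)  // standard C++11 range-based for
            total += i;

        return total / static_cast<double>(m_vec.size());
    }

    void Add(int n)
    {
        m_vec.push_back(n);
    }
};
```

Adding 1, 2, 3, and 4 and calling Average() yields 2.5, the same result as the for each version.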

Conclusions

This article tackled some of the most important changes in the Managed Extensions for C++, a language that is now called C++/CLI. The new syntax is much more developer friendly. Writing code in C++/CLI feels as natural as writing C++, which was one of Microsoft's goals in developing this language. Moreover, C++/CLI can be considered the most powerful language targeting the CLR. For more insights into the language, see the references.

References

To get a complete view on the C++/CLI syntax, you should download the C++/CLI Standard Working Draft.
