Introduction to Using the XML DOM from Visual C++


Assumptions About the Reader

This article assumes that you are familiar with the basics of what XML is and
what it can be used for. If you are new to XML, I would suggest reading one of
the many fine tutorials on the subject first and then returning to this document.

Introducing the XML Document Object Model (DOM)

The XML Document Object Model, or DOM, is a programmatic interface that enables you
to load and parse an XML file, or document, and to traverse its data. Using certain
objects in the DOM, you can even manipulate that data and then save your changes back to the XML
document. A comprehensive look at all of the DOM’s functionality would be impossible
in the space provided here. However, in this article, we’ll hit the high notes of using
the DOM to load an XML document and to iterate through its elements.

The key to understanding how to use the DOM is realizing that the DOM exposes
(through its COM interface) XML documents as a hierarchical tree of nodes.
As an example, take a look at the following sample XML document.

<?xml version="1.0"?>
<autos>
  <manufacturer name="Chevrolet">
    <make name="Corvette">
      <model>2000 Convertible</model>
      <price currency="usd">60,000</price>
      <horsePower>420</horsePower>
      <fuelCapacity units="gallons">18.5</fuelCapacity>
    </make>
  </manufacturer>
  <manufacturer name="Mazda">
    <make name="RX-7">
      <model>test model</model>
      <price currency="usd">30,000</price>
      <horsePower>350</horsePower>
      <fuelCapacity units="gallons">15.5</fuelCapacity>
    </make>
  </manufacturer>
</autos>

The DOM would interpret this document as follows:

  • <autos> – This is a NODE_ELEMENT (more on this later) and is referred to
    as the documentElement.
  • <manufacturer>, <make>, <model>, <price>, <horsePower>
    and <fuelCapacity> –
    Each one of these is also a NODE_ELEMENT. However, please note that only the top-level
    NODE_ELEMENT, or root node, is referred to as the documentElement.
  • currency="usd", units="gallons" – When a NODE_ELEMENT contains an attribute/value pair
    like this, the attribute itself is a NODE_ATTRIBUTE and its value is stored as a NODE_TEXT.
    The text between an element’s tags, such as 2000 Convertible, is likewise a NODE_TEXT.
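
To make the node types concrete, here is a minimal, portable sketch of how the <price> element from the sample might be laid out as a tree. The Node struct and the AttributeValue and MakePriceElement helpers are hypothetical stand-ins written in standard C++. They are not MSXML types, and only the three type names are borrowed from the DOM.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Only the three type names below are borrowed from the DOM.
enum NodeType { NODE_ELEMENT = 1, NODE_ATTRIBUTE = 2, NODE_TEXT = 3 };

// Hypothetical stand-in for a DOM node; this is not an MSXML type.
struct Node
{
    NodeType          type;
    std::string       name;       // tag or attribute name; "#text" for text nodes
    std::string       value;      // character data; used by text nodes only
    std::vector<Node> attributes; // NODE_ATTRIBUTE nodes; used by elements only
    std::vector<Node> children;   // child nodes in document order
};

// Returns the value of an attribute, which is held by its NODE_TEXT child.
std::string AttributeValue(const Node& attribute)
{
    assert(attribute.type == NODE_ATTRIBUTE);
    return attribute.children.empty() ? std::string()
                                      : attribute.children[0].value;
}

// Builds <price currency="usd">60,000</price> by hand.
Node MakePriceElement()
{
    Node usd      = { NODE_TEXT,      "#text",    "usd",    {}, {} };
    Node currency = { NODE_ATTRIBUTE, "currency", "",       {}, { usd } };
    Node amount   = { NODE_TEXT,      "#text",    "60,000", {}, {} };
    Node price    = { NODE_ELEMENT,   "price",    "",       { currency }, { amount } };
    return price;
}
```

In this model the attribute lives in a separate collection from the element's children, which mirrors the way the DOM keeps attributes out of the child list.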

As you will see shortly, there are a number of COM components that are part of the XML DOM.
Here’s a list of some of the more interesting components and their purposes.

  • XMLDOMDocument – The top node of the XML document tree
  • XMLDOMNode – Represents any single node in the XML document tree
  • XMLDOMNodeList – An ordered collection of XMLDOMNode objects, such as the children of a node
  • XMLDOMNamedNodeMap – A collection of attribute nodes that can be accessed by name

Accessing IE5’s XML Support from Visual C++

I’m a firm believer in a tutorial-style, “let’s walk through the code” approach, so let’s get started seeing
just what the DOM can do for us by cranking up the Visual C++ development environment and writing some code to
load an XML document and navigate through its elements.

Create the Visual C++ Project

While we can do this utilizing MFC or ATL, we’ll keep things simple (for me at least 🙂) and use MFC. Therefore,
perform the following steps to create the test project and incorporate IE5 XML support into your application.

  1. Create a new Visual C++ project called XMLDOMFromVC.
  2. In the MFC AppWizard, define the project as being a dialog-based application.
  3. Once the AppWizard has completed its work, add a call to initialize OLE support by inserting a call to
    ::AfxOleInit at the top of the application class’ InitInstance function, so that it runs before the
    dialog is created. Assuming you named your project the same as mine, your code should now look like this:

    BOOL CXMLDOMFromVCApp::InitInstance()
    {
     // Initialize OLE (and with it COM) before the dialog is created
     if (!::AfxOleInit())
     {
      AfxMessageBox("OLE initialization failed.");
      return FALSE;
     }
    
     AfxEnableControlContainer();
    
     // .. other code (creates and displays the dialog)
    
     // Since the dialog has been closed, return FALSE so that we exit the
     //  application, rather than start the application's message pump.
     return FALSE;
    }
    
  4. At this point, you’ll need to import the Microsoft XML Parser typelib (OLE type library). The simplest
    way to do this is to use the C++ #import directive. Simply open your project’s stdafx.h
    file and add the following lines before the file’s closing #endif directive.

    #import <msxml.dll> named_guids
    using namespace MSXML;
    
  5. At this point, we can start declaring some variables to use with the DOM. Open your dialog class’ header file
    (XMLDOMFromVCDlg.h) and add the following smart pointer member
    variables, where the IXMLDOMDocumentPtr is the pointer to the XML document itself and the
    IXMLDOMElementPtr is a pointer to the XML document root (as explained above).

    IXMLDOMDocumentPtr m_plDomDocument;
    IXMLDOMElementPtr m_pDocRoot;
    
  6. Once you’ve declared the XML smart pointers, insert the following code in your dialog class’
    OnInitDialog member function (just before the return statement). This code
    simply initializes the COM runtime and sets up your XML document smart pointer (m_plDomDocument).

    // Initialize COM
    ::CoInitialize(NULL);
    
    HRESULT hr = m_plDomDocument.CreateInstance(CLSID_DOMDocument);
    if (FAILED(hr))
    {
     _com_error er(hr);
     AfxMessageBox(er.ErrorMessage());
     EndDialog(1);
     return TRUE; // the dialog is closing, so skip the rest of the initialization
    }
    

Loading an XML Document

Now that you’ve done the preliminary work to include XML support in your Visual C++ application, let’s
do something useful like actually loading an XML document. To do that, simply add the following code to your
dialog’s OnInitDialog member function (just after the initialization code entered above). I’ve sprinkled
comments through the code to explain what I’m doing each step of the way.

// specify xml file name
CString strFileName ("XMLDOMFromVC.xml");

// convert xml file name string to something COM can handle (BSTR);
// _bstr_t allocates and frees the BSTR itself, so nothing is leaked
_bstr_t bstrFileName ((LPCTSTR)strFileName);

// call the IXMLDOMDocumentPtr's load function to load the XML document
VARIANT_BOOL bLoaded = m_plDomDocument->load(bstrFileName);
if (VARIANT_TRUE == bLoaded) // success!
{
 // now that the document is loaded, we need to initialize the root pointer
 m_pDocRoot = m_plDomDocument->documentElement;
 AfxMessageBox("Document loaded successfully!");
}
else
{
 AfxMessageBox("Document FAILED to load!");
}

Don’t believe it’s that easy? Add the following call to have the contents of your entire XML document displayed
in a message box.

AfxMessageBox(m_plDomDocument->xml);

Now, build and run the application and you should see results similar to Figure 1.

Figure 1: Loading and displaying an XML document can be done from Visual C++ with just a few lines of code using the DOM.

Ok. Ok. This doesn’t really count as reading through an XML document, but I wanted to show you that
you had successfully loaded a document and that you can easily get the entire document’s contents with a single
line of code. In the next section, we’ll see how to manually iterate through XML elements.

Iterating Through an XML Document

In this section, we’ll learn about a couple of properties that you’ll use quite often when iterating
through a document’s elements: IXMLDOMNodePtr::firstChild and IXMLDOMNodePtr::nextSibling.

The following recursive function shows a way by which you can do this quite easily. In fact, if you insert
this code into the dialog’s OK button handler, it will display each element in your document:

void CXMLDOMFromVCDlg::OnOK()
{
 // send the root to the DisplayChildren function
 DisplayChildren(m_pDocRoot);
}

void CXMLDOMFromVCDlg::DisplayChildren(IXMLDOMNodePtr pParent)
{
 // display the current node's name
 DisplayChild(pParent);

 // simple for loop to get all children
 for (IXMLDOMNodePtr pChild = pParent->firstChild;
      NULL != pChild;
      pChild = pChild->nextSibling)
 {
  // for each child, call this function so that we get 
  // its children as well
  DisplayChildren(pChild);
 }
}

void CXMLDOMFromVCDlg::DisplayChild(IXMLDOMNodePtr pChild)
{
 AfxMessageBox(pChild->nodeName);
}

If you were to build and run the project at this point, you would definitely notice something peculiar.
The first few message boxes will appear as you might expect: the first one displays the value “autos”, followed
by “manufacturer”, then “make” and finally “model”. However, at that point (after the message box displaying
the value “model”) things will get a little strange. Instead of a message box displaying the value “price”, the
value “#text” will be displayed! The reason for this is simple.

Let’s look at an excerpt from the XML document:

  ...
  <manufacturer name="Chevrolet">
    <make name="Corvette">
      <model>2000 Convertible</model>
      <price currency="usd">60,000</price>
      <horsePower>420</horsePower>
      <fuelCapacity units="gallons">18.5</fuelCapacity>
    </make>
  </manufacturer>
  ...

As you can see in the <model> line above, a value follows the model tag. These “values” are still treated
as nodes in XML when using the IXMLDOMNodePtr::firstChild and IXMLDOMNodePtr::nextSibling
properties. Therefore, how do you know what type of node you have?

By using the IXMLDOMNodePtr::nodeType property. Simply modify your dialog’s CXMLDOMFromVCDlg::DisplayChild
member function as shown below so that it checks for NODE_TEXT. When you’ve done that and run the code, you will
see the expected values instead of the literal “#text”.

void CXMLDOMFromVCDlg::DisplayChild(IXMLDOMNodePtr pChild)
{
 if (NODE_TEXT == pChild->nodeType)
 {
  AfxMessageBox(pChild->text);
 }
 else
 {
  AfxMessageBox(pChild->nodeName);
 }
}
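
The traversal and the type check can also be tried outside of MFC. The following is a minimal sketch in standard C++ that imitates the firstChild/nextSibling links with a hypothetical LinkedNode struct. The names LinkedNode, MakeText, MakeElement, Label, CollectLabels and BuildSample are all made up for this illustration and are not part of MSXML. Instead of showing message boxes, it collects the labels into a vector.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

enum NodeType { NODE_ELEMENT = 1, NODE_TEXT = 3 };

// Hypothetical stand-in for a DOM node, linked the way the DOM exposes it.
struct LinkedNode
{
    NodeType                    type;
    std::string                 name;        // "#text" for text nodes
    std::string                 text;        // character data of a text node
    std::unique_ptr<LinkedNode> firstChild;
    std::unique_ptr<LinkedNode> nextSibling;
};

typedef std::unique_ptr<LinkedNode> NodePtr;

NodePtr MakeText(const std::string& text)
{
    return NodePtr(new LinkedNode{ NODE_TEXT, "#text", text, nullptr, nullptr });
}

NodePtr MakeElement(const std::string& name, NodePtr firstChild)
{
    assert(!name.empty()); // an element always has a tag name
    return NodePtr(new LinkedNode{ NODE_ELEMENT, name, "", std::move(firstChild), nullptr });
}

// Same decision as DisplayChild: text nodes show their text, others their name.
std::string Label(const LinkedNode& node)
{
    return node.type == NODE_TEXT ? node.text : node.name;
}

// Same walk as DisplayChildren: the node itself first, then each child in turn.
void CollectLabels(const LinkedNode& parent, std::vector<std::string>& out)
{
    out.push_back(Label(parent));
    for (const LinkedNode* child = parent.firstChild.get();
         child != nullptr;
         child = child->nextSibling.get())
    {
        CollectLabels(*child, out);
    }
}

// Builds <make><model>2000 Convertible</model><price>60,000</price></make>
NodePtr BuildSample()
{
    NodePtr model = MakeElement("model", MakeText("2000 Convertible"));
    model->nextSibling = MakeElement("price", MakeText("60,000"));
    return MakeElement("make", std::move(model));
}
```

Calling CollectLabels on the result of BuildSample yields make, model, 2000 Convertible, price and 60,000, in that order. The sketch leaves out attributes on purpose: in the DOM they are not children of an element, so a firstChild/nextSibling walk never reaches them.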

You no doubt also noted the “magic” constant used above (NODE_TEXT). All the node types are defined with an
enum in the msxml.tlh file that was generated by the #import directive you used earlier. This enum is listed below:

enum tagDOMNodeType
{
    NODE_INVALID = 0,
    NODE_ELEMENT = 1,
    NODE_ATTRIBUTE = 2,
    NODE_TEXT = 3,
    NODE_CDATA_SECTION = 4,
    NODE_ENTITY_REFERENCE = 5,
    NODE_ENTITY = 6,
    NODE_PROCESSING_INSTRUCTION = 7,
    NODE_COMMENT = 8,
    NODE_DOCUMENT = 9,
    NODE_DOCUMENT_TYPE = 10,
    NODE_DOCUMENT_FRAGMENT = 11,
    NODE_NOTATION = 12
};
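
While debugging, it can help to see a node type as a name instead of a number. Here is a small sketch of a lookup helper. NodeTypeName is a hypothetical function of my own, not something MSXML provides, and it takes a plain int so that it compiles without the generated msxml.tlh header.

```cpp
#include <cassert>
#include <string>

// Maps a node type value (0 through 12, as in the enum above) to its name.
std::string NodeTypeName(int nodeType)
{
    static const char* const kNames[] = {
        "NODE_INVALID", "NODE_ELEMENT", "NODE_ATTRIBUTE", "NODE_TEXT",
        "NODE_CDATA_SECTION", "NODE_ENTITY_REFERENCE", "NODE_ENTITY",
        "NODE_PROCESSING_INSTRUCTION", "NODE_COMMENT", "NODE_DOCUMENT",
        "NODE_DOCUMENT_TYPE", "NODE_DOCUMENT_FRAGMENT", "NODE_NOTATION"
    };
    const int count = static_cast<int>(sizeof(kNames) / sizeof(kNames[0]));
    assert(count == 13); // one name for each enum value, 0 through 12

    if (nodeType < 0 || nodeType >= count)
        return "UNKNOWN";
    return kNames[nodeType];
}
```

In the dialog, something along the lines of AfxMessageBox(NodeTypeName(pChild->nodeType).c_str()) should work in an ANSI build, although I have not tried it against every version of the parser.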

Summary

In this article, you discovered the XML DOM and learned how to access its features from Visual C++ / COM. The demo
we built illustrated the following basic DOM functions:

  • Loading an XML document
  • Iterating through a document’s nodes
  • Determining a node’s type
  • Displaying NODE_TEXT node values

There is obviously much more to the DOM than what you’ve seen here, but hopefully what you’ve learned will whet your
appetite to dig into the documentation and to see all the great things you can do with XML documents using the DOM.

