Taking Advantage of XML: Serialization in .NET

Welcome to the first installment of my regular column on the .NET framework. Every two weeks, I’ll discuss the core of the .NET framework, addressing topics like the runtime, cross-language compatibility, remoting, and ECMA standardization. XML in .NET is a large topic, with enough surface area to last for at least a dozen columns, so this first column covers just one of the ways you can use XML: serializing your objects into XML documents.

XML and the Common Language Runtime

XML is everywhere in .NET. Whenever data is exchanged, XML is typically used. Whenever data is navigated, the navigation can be performed using XML. Configuration files use XML to store their data. One of the big-ticket items in .NET—Web Services—relies completely on XML and the SOAP protocol. In short, to get the most out of .NET, you’re going to need to be on friendly terms with XML.

What is XML?

First, a review of exactly what XML is.
XML is the Extensible Markup Language. XML is a subset of SGML (Standard Generalized Markup Language), an ISO-standard document markup language that has been in use for a number of years. HTML is also derived from SGML (strictly speaking it is an application of SGML rather than a subset), but XML and HTML differ in a number of fundamental ways:

  • XML describes data, without any regard for how the information is displayed. Display of your XML content can be managed using XSL/XSLT, which will be touched on in a future column. Although XML is not primarily meant to be read directly, an understanding of XML structure helps you use it effectively.
  • XML has rigorous rules defining the syntax of an XML document. Where HTML browsers are notoriously lax about displaying poorly written HTML, a conforming XML parser stops when it encounters a syntax error. Such a document is said to be not “well formed,” and is not even considered an XML document. This serves to simplify the design and improve the efficiency of XML parsers, especially compared to their HTML cousins. The structure of an XML document is discussed below in the section “Basic XML Document Structure.”
  • XML can be validated against an XML schema, which defines the structure required for an XML document. For example, if you have an XML document that tracks vehicles, you can use a schema to require that each automobile have a VIN. An XML document that contains entries without the required VIN will fail validation. This greatly reduces the amount of code that you must write as a developer, because you can let the XML parser handle this work for you.

The above properties make XML an ideal way to exchange information between loosely coupled systems. If I can send you an XML document that is validated against an XML schema, we can
reliably exchange information without regard for cross-platform considerations that often come into play when integrating diverse systems.
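
As a rough sketch of letting the parser enforce a required element, the C# below validates a small vehicle document against an inline schema. The schema itself, the element names (Vehicles, Vehicle, VIN), and the VehicleValidation helper are assumptions invented for illustration, and XmlReaderSettings comes from released versions of the framework rather than Beta 1.

```csharp
using System.IO;
using System.Xml;
using System.Xml.Schema;

public static class VehicleValidation
{
    // Hypothetical schema: every <Vehicle> must contain exactly one <VIN>.
    const string Schema = @"<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>
  <xs:element name='Vehicles'>
    <xs:complexType>
      <xs:sequence>
        <xs:element name='Vehicle' maxOccurs='unbounded'>
          <xs:complexType>
            <xs:sequence>
              <xs:element name='VIN' type='xs:string'/>
            </xs:sequence>
          </xs:complexType>
        </xs:element>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>";

    // Returns true when the document satisfies the schema above.
    public static bool IsValid(string xml)
    {
        XmlReaderSettings settings = new XmlReaderSettings();
        settings.ValidationType = ValidationType.Schema;
        settings.Schemas.Add(null, XmlReader.Create(new StringReader(Schema)));
        try
        {
            using (XmlReader reader = XmlReader.Create(new StringReader(xml), settings))
            {
                // Reading to the end forces the whole document to be validated.
                while (reader.Read()) { }
            }
            return true;
        }
        catch (XmlSchemaException)
        {
            // With no validation event handler set, the first violation throws.
            return false;
        }
    }
}
```

A document such as <Vehicles><Vehicle/></Vehicles> is well formed but fails this check, because the Vehicle entry has no VIN.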

Basic XML Document Structure

Most XML documents are easily understood once you recognize the basic structure that XML documents share. XML documents contain data that is separated primarily into tags and attributes, much like HTML. An example of a tag is <Temperature>, a word enclosed by angle brackets. An attribute has the form units="Celsius" and is always located within a tag, such as:

<Temperature units="Celsius">78</Temperature>

XML documents describe a tree structure, with one root node that usually contains additional nodes, with each node consisting of either
additional nodes or text data:

<Locations>
   <Location>
       <Name>Goleta</Name>
       <Temperatures>
           <Temperature units="Celsius">12</Temperature>
       </Temperatures>
   </Location>
</Locations>

A typical XML document is shown below:

<?xml version="1.0" encoding="UTF-8"?>
<Locations>
   <Location>
        <Name>Goleta</Name>
        <Temperatures>
            <Temperature units="Celsius">12</Temperature>
            <Temperature units="Fahrenheit">78</Temperature>
            <Temperature units="Kelvin">324</Temperature>
            <Temperature units="Celsius">39</Temperature>
        </Temperatures>
   </Location>
   <Location>
       <Name>Laguna</Name>
        <Temperatures>
            <Temperature units="Kelvin">458</Temperature>
            <Temperature units="Celsius">21</Temperature>
        </Temperatures>
   </Location>
</Locations>

The first line of the XML document, <?xml version="1.0" encoding="UTF-8"?>, is known as the XML declaration, which is part of the document’s prolog. Technically, the declaration is not required for a well-formed XML 1.0 document, but its presence is useful, and it’s recommended by the W3C, if for no other reason than it specifies the version of the XML specification that you’re using. The declaration can occur only once in an XML document, and it must be the very first thing in the document.

In the document above, the XML declaration specifies the XML version (which must be 1.0 at the time of this writing), as well as the encoding used by the document. Most documents are UTF-8, a variable-width encoding of Unicode that stores each character in one to four 8-bit bytes, with ASCII characters taking a single byte.
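
To make the encoding concrete: UTF-8 stores each character as one or more 8-bit bytes. The small helper below (Utf8Demo is a name made up for this sketch) asks the framework how many bytes a string needs when encoded as UTF-8.

```csharp
using System.Text;

public static class Utf8Demo
{
    // Number of bytes the text occupies once encoded as UTF-8.
    public static int ByteCount(string text)
    {
        return Encoding.UTF8.GetByteCount(text);
    }
}
```

Utf8Demo.ByteCount("A") returns 1, while an accented e ("\u00E9") needs 2 bytes and the euro sign ("\u20AC") needs 3.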

The next line defines <Locations> as the root element of the document. Each XML document contains exactly one root element. A document
that contains multiple root nodes is not considered well formed, and will be rejected by an XML parser.
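
One way to see this rejection in practice is to hand the text to XmlDocument and watch for an XmlException. The WellFormedCheck helper is a hypothetical name used only for this sketch.

```csharp
using System.Xml;

public static class WellFormedCheck
{
    // Returns true when the text parses as a well-formed XML document.
    public static bool IsWellFormed(string xml)
    {
        try
        {
            new XmlDocument().LoadXml(xml);
            return true;
        }
        catch (XmlException)
        {
            // The parser refuses anything that breaks the syntax rules,
            // including a second root element.
            return false;
        }
    }
}
```

With this helper, "<Locations/>" passes, while "<Location/><Location/>" is rejected because it has two root elements.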

Moving on, the Locations node contains two <Location> elements, which each contain one <Name> element, and multiple <Temperature> elements. Loading the entire document will imply a tree structure where the root of the tree is formed by the Locations node, with the Location nodes forming branches off the main body. Each Location node has further branches for the Name and Temperature elements.
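
The tree can also be walked in code. The sketch below loads a Locations document into an XmlDocument and reports how many Temperature readings hang off each Location branch. The LocationTree class and its Summarize method are hypothetical names, and the XPath expressions assume the element names used in the document above.

```csharp
using System.Collections.Generic;
using System.Xml;

public static class LocationTree
{
    // Returns one "name: count" entry per <Location> element.
    public static List<string> Summarize(string xml)
    {
        XmlDocument doc = new XmlDocument();
        doc.LoadXml(xml);

        List<string> lines = new List<string>();
        // DocumentElement is the root (<Locations>); its Location children
        // are the first level of branches.
        foreach (XmlNode location in doc.DocumentElement.SelectNodes("Location"))
        {
            string name = location.SelectSingleNode("Name").InnerText;
            int readings = location.SelectNodes("Temperatures/Temperature").Count;
            lines.Add(name + ": " + readings);
        }
        return lines;
    }
}
```

For the document shown earlier, this yields "Goleta: 4" and "Laguna: 2".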

.NET Support for Serializing Objects With XML

Writing XML documents manually has a number of drawbacks. Not only is it tedious, but it’s error-prone. A more efficient and robust way to generate your XML documents is to generate them programmatically. In .NET, you can take this a step further, and serialize objects directly using XML. The output may be to a stream, or to an instance of XmlWriter or TextWriter.

In just a few lines of code, you can easily serialize most objects to and from disk files.
With a few more lines of code, you can achieve fine-grained control over the serialization process.

The .NET framework offers several ways that you can serialize objects and data. One way to take advantage of the serialization support in .NET is to use the Serializable attribute:

[Serializable]
class Sailboat
{
 [...]
}

Decorating a class with the Serializable attribute enables all of the class’s fields, including private ones, to be serialized. Another commonly used method is to implement the ISerializable interface, which gives you a higher degree of control over how your object is serialized. The next column will discuss serialization using the Serializable attribute and the ISerializable interface.

There is an even simpler method that you can use to serialize your objects in .NET: you can use the XmlSerializer class to serialize your public properties and fields directly to a stream. First, a quick caveat: the XmlSerializer class will only serialize your public properties and fields, so it’s useful for exposing the public view of your objects in an XML document. If you need to serialize a high-fidelity view of an object including private fields, this type of serialization is not for you, and you should use one of the serialization techniques that will be discussed in the next column.

When using the XmlSerializer class, an instance of XmlSerializer is constructed, passing information about the object type to be serialized. The most common way to create an XmlSerializer object is to simply pass the type of the object to be serialized:

serializer = new XmlSerializer(myType.GetType());

Alternatively, you may want to define a default namespace for the XML document, which can be accomplished by passing the namespace as the second parameter to the constructor:

serializer = new XmlSerializer(myType.GetType(),
                  "http://www.dotnetexperts.com/examples/surfreport");

Once the serializer has been created, you can serialize an object to the XML document by calling the Serialize member function in the
XmlSerializer class:

serializer.Serialize(stream, myBoat);

Serialize has six overloads in Beta 1, which allow you to serialize using the Stream, TextWriter, or XmlWriter classes to handle the output.
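
As an illustration of the TextWriter route, the sketch below serializes into a StringWriter so the XML ends up in a string instead of a file. The cut-down Sailboat class (public fields only) and the TextWriterDemo helper are stand-ins invented for this example, not the full class used elsewhere in this column.

```csharp
using System.IO;
using System.Xml.Serialization;

// Cut-down stand-in for the Sailboat class, for illustration only.
public class Sailboat
{
    public string Name;
    public int Length;
}

public static class TextWriterDemo
{
    // Serializes the boat through the TextWriter overload of Serialize
    // and returns the resulting XML text.
    public static string ToXml(Sailboat boat)
    {
        XmlSerializer serializer = new XmlSerializer(typeof(Sailboat));
        using (StringWriter writer = new StringWriter())
        {
            serializer.Serialize(writer, boat);
            return writer.ToString();
        }
    }
}
```

Because a StringWriter holds UTF-16 text, the XML declaration in the returned string reports utf-16 as its encoding. Serializing to a FileStream avoids that if you need UTF-8 on disk.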

An example of a C# function that writes an object to a disk file in XML format is shown in the example below. If you remove the error-handling code, there are really only three lines
of code that handle the XML serialization—one line to construct an XmlSerializer, a second line to create a file stream, and a final line to serialize the object
to the stream.

// Requires: using System.IO; using System.Xml.Serialization;
public static void WriteToXmlFile(Sailboat aBoat, String filename)
{
    XmlSerializer serializer = null;
    FileStream    stream = null;
    try
    {
        // Create a serializer for the Sailboat type
        serializer = new XmlSerializer(aBoat.GetType());
        // Create a new writable FileStream, using the path passed
        // as a parameter.
        stream = new FileStream(filename,
                                FileMode.Create,
                                FileAccess.Write);
        // Write the object to the stream as an XML document
        serializer.Serialize(stream, aBoat);
    }
    finally
    {
        // Close the file even if serialization throws
        if(stream != null)
            stream.Close();
    }
}

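Note that before using the XmlSerializer class, you’ll need to import the System.Xml.Serialization namespace and add a reference to the assembly that contains it. In Beta 1 that is the System.Xml.Serialization assembly; in released versions of the .NET Framework the class ships in the System.Xml assembly.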

As discussed earlier, the XmlSerializer class only serializes public properties and fields. It needs no additional effort on your part, as it uses the .NET Reflection classes to determine how your object should be serialized. Given the Sailboat class below:

public class Sailboat
{
   private String vesselName;
   private int   hullLength;
   private int   hullBeam;
   /// <summary>
   /// Default constructor required for deserialization
   /// </summary>
   public Sailboat(){}
   /// <summary>
   /// The constructor normally used to create complete
   /// sailboat instances
   /// </summary>
   /// <param name="Name">The name of the sailboat</param>
   /// <param name="Length">The boat length</param>
   /// <param name="Beam">The width of the boat</param>
   public Sailboat(String Name, int Length, int Beam)
   {
       vesselName = Name;
       hullLength = Length;
       hullBeam = Beam;
   }
   /// <summary>
   /// Property used to get/set the name of the boat.
   /// </summary>
   public String Name
   {
       get{ return vesselName; }
       set{ vesselName = value; }
   }
   /// <summary>
   /// Property used to get/set the length of the boat.
   /// </summary>
   public int Length
   {
       get{ return hullLength; }
       set{ hullLength = value; }
   }
   /// <summary>
   /// Property used to get/set the width of the boat.
   /// </summary>
   public int Beam
   {
       get{ return hullBeam; }
       set{ hullBeam = value; }
   }
}

The XmlSerializer class will generate the following XML document by default:

<?xml version="1.0"?>
<Sailboat xmlns:xsi="http://www.w3.org/1999/XMLSchema-instance"
          xmlns:xsd="http://www.w3.org/1999/XMLSchema">
  <Name>AliMack</Name>
  <Length>45</Length>
  <Beam>15</Beam>
</Sailboat>

There are a couple of things to note about the XML document that is generated for you as part of the serialization process. First, the XML declaration is inserted for you automatically. Second, the standard schema namespaces are automatically inserted, even though you don’t need them for this document. They certainly won’t
harm anything, so you can safely leave them in place.

Deserializing an XML Document

So what about the proverbial round-trip? How do you reconstitute an object from an XML document? It turns out that the code required to reconstitute an object graph from an XML document is similar to the code used to serialize the object graph, as shown below:

public static Sailboat ReadFromXmlFile(String filename)
{
   XmlSerializer serializer = null;
   FileStream    stream = null;
   Sailboat      sb = new Sailboat();
   try
   {
       // Create a serializer for the Sailboat type
       serializer = new XmlSerializer(sb.GetType());
       // Create a new readable FileStream, using the path passed as
       // a parameter
       stream = new FileStream(filename,
                               FileMode.Open,
                               FileAccess.Read);
       sb = (Sailboat)serializer.Deserialize(stream);
   }
   finally
   {
       if(stream != null)
           stream.Close();
   }
   return sb;
}

As with the serialization code presented earlier, most of the functionality is performed by just three lines of code:

  • The XmlSerializer instance is constructed, passing the object type to be deserialized as a parameter
  • The XML file is opened for reading
  • The Deserialize function is called in order to create an instance of the Sailboat class from the XML file.
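
The same round trip can be tried without touching the disk. The sketch below swaps the FileStream for a MemoryStream, which is a substitution made here for brevity. The cut-down Sailboat class and the RoundTrip helper are assumptions made for the example.

```csharp
using System.IO;
using System.Xml.Serialization;

// Cut-down stand-in for the Sailboat class, for illustration only.
public class Sailboat
{
    public string Name;
    public int Length;
    public int Beam;
}

public static class RoundTrip
{
    // Serializes the boat to an in-memory stream, then rebuilds a new
    // instance from the XML that was just written.
    public static Sailboat Copy(Sailboat original)
    {
        XmlSerializer serializer = new XmlSerializer(typeof(Sailboat));
        using (MemoryStream stream = new MemoryStream())
        {
            serializer.Serialize(stream, original);
            stream.Position = 0;   // rewind before reading back
            return (Sailboat)serializer.Deserialize(stream);
        }
    }
}
```

The object returned by Copy is a separate instance whose public fields carry the same values as the original.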

Applications for Objects Serialized as XML

As you have seen, with just a few lines of code, an object can be serialized to and from an XML document. This type of pattern is ideal for configuration files where objects need to be easily serializable and possibly consumed by multiple clients.

Configuration files created in this manner are easily shared among various pieces of code in a distributed application. Since the XmlSerializer
class is part of the .NET framework, you don’t need to write special handling components to control the serialization—in fact, any language that can use the .NET framework can produce or consume objects that have been serialized in this manner. The sailboat class could easily be written in Eiffel, and seamlessly reconstituted into object instances by C# or Visual Basic clients—the sort of thing that wasn’t easily achievable less than a year ago.

Controlling the Serialization Process

As shown above, serializing an object to and from XML can be achieved with just a few lines of code. But what if you need to exercise control over the serialization process?
Perhaps you need to specify element attributes in your XML document, or you would prefer element names other than the defaults that XmlSerializer provides. You can also specify namespaces for your elements, or prevent a field or property from being serialized altogether.

Preventing Serialization

To prevent a field or property from being serialized, attach the XmlIgnore attribute to the class member:

[XmlIgnore]
public String Name
{
   get{ return vesselName; }
   set{ vesselName = value; }
}

Use the XmlIgnore attribute when a property or field is not required in order to deserialize an
object successfully. For example, a property that returns transient state information may have no meaning if serialized out to an XML file, and would be ignored when the object is created by deserializing the XML document.
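
A minimal sketch of that transient-state case follows. The IsUnderway field, the cut-down Sailboat class, and the IgnoreDemo helper are all hypothetical and exist only to show the effect of XmlIgnore.

```csharp
using System.IO;
using System.Xml.Serialization;

// Cut-down stand-in for the Sailboat class, for illustration only.
public class Sailboat
{
    public string Name;

    // Transient state: meaningless once written to a file, so leave it out.
    [XmlIgnore]
    public bool IsUnderway;
}

public static class IgnoreDemo
{
    // Returns the XML produced for the boat, as a string.
    public static string ToXml(Sailboat boat)
    {
        XmlSerializer serializer = new XmlSerializer(typeof(Sailboat));
        using (StringWriter writer = new StringWriter())
        {
            serializer.Serialize(writer, boat);
            return writer.ToString();
        }
    }
}
```

The resulting XML contains a Name element but no trace of IsUnderway, whatever its value was at serialization time.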

Controlling the Serialized XML Elements

By default, the XML serializer uses the name of the public property or field as the element name in the XML document. If that name isn’t appropriate, you can use the XmlElement attribute to change the element name used in the XML document. The XmlElement attribute also allows you to specify a namespace for the element:

[XmlElement("VesselName", Namespace="http://www.codeguru.com")]
public String Name
{
   get{ return vesselName; }
   set{ vesselName = value; }
}

You usually don’t need to hand-tune the XML serialization in this way, but the capability is always there if you need it. Changing the default serialization is useful when the names used for properties or fields aren’t appropriate for an XML
document, or when you must specify a namespace due to name clashes. A similar attribute, the XmlRoot attribute, is used to define changes necessary for a root node.
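
To see both attributes working together, the sketch below renames the root element with XmlRoot and one child element with XmlElement. The element names Vessel and VesselName, the cut-down Sailboat class, and the RenameDemo helper are assumptions chosen for the example.

```csharp
using System.IO;
using System.Xml.Serialization;

// Cut-down stand-in for the Sailboat class, for illustration only.
[XmlRoot("Vessel")]                 // root element becomes <Vessel>
public class Sailboat
{
    [XmlElement("VesselName")]      // child element becomes <VesselName>
    public string Name;
}

public static class RenameDemo
{
    // Returns the XML produced for the boat, as a string.
    public static string ToXml(Sailboat boat)
    {
        XmlSerializer serializer = new XmlSerializer(typeof(Sailboat));
        using (StringWriter writer = new StringWriter())
        {
            serializer.Serialize(writer, boat);
            return writer.ToString();
        }
    }
}
```

The class and field keep their names in code; only the names written to the XML document change.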

Creating XML Attributes

The XML serializer generates elements for each of your public properties and fields by default.
However, there are times when a public member logically represents an attribute rather than an element. The XmlAttribute attribute is used to mark a public property or field as an XML attribute rather than an element. For example, if our Sailboat class had a property that indicated the units of measurement, the property could be serialized as an XML attribute like this:

[XmlAttribute]
public String Units
{
   get{ return measurementUnits; }
   set{ measurementUnits = value; }
}
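
A self-contained sketch of the result follows. It uses a public field rather than a property for brevity, and the cut-down Sailboat class and AttributeDemo helper are stand-ins invented for the example.

```csharp
using System.IO;
using System.Xml.Serialization;

// Cut-down stand-in for the Sailboat class, for illustration only.
public class Sailboat
{
    // Written as an attribute on the <Sailboat> element.
    [XmlAttribute]
    public string Units;

    // Written as a child element, which is the default.
    public int Length;
}

public static class AttributeDemo
{
    // Returns the XML produced for the boat, as a string.
    public static string ToXml(Sailboat boat)
    {
        XmlSerializer serializer = new XmlSerializer(typeof(Sailboat));
        using (StringWriter writer = new StringWriter())
        {
            serializer.Serialize(writer, boat);
            return writer.ToString();
        }
    }
}
```

With Units set to "feet", the root element carries Units="feet" as an attribute, while Length still appears as a child element.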

An XML Serialization Example

An example of a Beta 1 C# application that uses the XML serializer is provided with this article. The Serialization project creates an executable that allows you to serialize information about a sailboat to or from an XML file. To serialize the data to an XML file, use this command line:

serialization to <filename>

where filename is the name of the XML file that will hold the XML document.

To deserialize the XML file into a Sailboat instance, use the following command line:

serialization from <filename>

After deserialization is complete, the properties of the Sailboat object will be displayed.

ECMA Standardization

Since this is my first column, I’ll close with some information on the ECMA standardization meetings for C# and the .NET framework. Microsoft submitted both the C# language and portions of the .NET framework to ECMA in October 2000. Two working groups are currently defined: TC39/TG2 is working on the C# submission, and TC39/TG3 is working on the framework. The latest public documents are available on the websites of companies that are participating in the standardization process, including Microsoft, Toshiba, ISE, and several others. I’m part of the .NET Experts team at ISE, and we’re hosting the documents at www.dotnetexperts.com/ecma.
If you’re working with C# and the .NET framework, you should download these free documents, and participate in the ongoing discussions on the various .NET mailing lists and newsgroups.

Future Columns

In my next column I’ll discuss how serialization works in
.NET, including the ISerializable and IFormatter interfaces, the Serializable attribute, and other serialization goodies. Also on tap are more articles on XML support in .NET and an article on .NET memory management.

About the Author

Mickey Williams is the founder of Codev Technologies, a provider of tools and consulting for Windows Developers. He is also on the staff at .NET Experts (www.dotnetexperts.com), where he teaches the .NET Framework course. He has spoken at conferences in the USA and Europe, and has written eight books on Windows programming. Mickey can be reached at mw@codevtech.com.
