Learn to use the EMC Centera SDK and the .NET wrapper being developed as an open source project to store "fixed content" on the EMC Centera storage appliance.
This article is part of a series that illustrates how to use the EMC Centera SDK, together with the .NET wrapper being developed as an open source project, to store "fixed content" on the EMC Centera storage appliance. Before I start, I want to explain what "fixed content" is and give an overview of the reasoning behind the emergence of this type of storage.
Fixed Content Definition
Fixed content is information that never changes after its creation. It is actively referenced, typically shared among users, and must be retained for a long period of time (retention means maintaining a copy of fixed content for a mandatory period of time). Examples include electronic documents, presentations, and e-books; rich media such as movies, videos, digital photographs, and audio files; check images and financial statements; bioinformatics data, X-rays, MRIs, and CAT scans; CAD/CAM diagrams and blueprints; and e-mail messages.
Examples of "Fixed Content":
- An average enterprise (a 250-person organization) generates approximately 1.5TB of e-mails per year
- A picture archive in a large hospital may generate more than 5TB per year in digital X-rays or MRIs
- Banks are scanning millions of check images per year, requiring multiple terabytes of storage
State of the Industry
A large portion of all digital information is fixed content. Fixed content is expected to be the largest portion of the digital content created over the next century, exceeding all dynamic content put together.
In addition, the information life cycle drives toward more fixed content. Enterprises embracing e-mail and electronic documents are increasing the need for fixed content storage exponentially. Finally, emerging regulations in the financial and healthcare industries that require retention are creating a huge need for fixed content storage and fixed content solutions.
The EMC Centera appliance is one of the appliances available on the market today to satisfy that need. Other companies, such as NetApp, have solutions equivalent to the Centera, but this series of articles is specific to showing how to code using the Centera SDK.
What You Will Need to Develop Against the Appliance
- To start writing content to the Centera appliance, you will need the Centera SDK. You will need to register on the EMC site to download the SDK. There are a number of versions of the SDK available for download. Use the 3.1SP1 version. This link will take you to the site to download the SDK.
Note that the only way to save content on most "fixed content" storage devices is through the device-proprietary API(s) that the device manufacturer publishes. Some manufacturers do offer open standard interfaces (CIFS, NFS, HTTP, and WebDAV) to read from and write to their devices, but you usually end up losing a lot of the device's power. Things like WORM (write-once-read-many) functionality or retention capabilities are usually lost with the open standards.
- You also will need the .NET wrapper for the Centera SDK. The latest version of the open source .NET project is on SourceForge. The link is http://sourceforge.net/projects/cosi-dot-net.
- You need access to the "Public Centera" appliances. EMC recognized that the Centera device is not available everywhere and set up appliances on the Internet that developers can develop against. The content of these appliances is purged periodically by EMC. The latest IP addresses can be found on the EMC site. As of this writing, the valid IP addresses are:
- EMEA1 - 220.127.116.11, 18.104.22.168, 22.214.171.124, 126.96.36.199
- EMEA2 - 188.8.131.52, 184.108.40.206, 220.127.116.11, 18.104.22.168
- EMEA3 - 22.214.171.124, 126.96.36.199, 188.8.131.52, 184.108.40.206
- EMEA4 - 220.127.116.11, 18.104.22.168
- EMEA5 - 22.214.171.124, 126.96.36.199
- US1 - 188.8.131.52, 184.108.40.206, 220.127.116.11, 18.104.22.168
- US2 - 22.214.171.124, 126.96.36.199, 188.8.131.52, 184.108.40.206
- US3 - 220.127.116.11, 18.104.22.168, 22.214.171.124, 126.96.36.199
- US4 - 188.8.131.52, 184.108.40.206, 220.127.116.11, 18.104.22.168
- US5 - 22.214.171.124, 126.96.36.199, 188.8.131.52, 184.108.40.206
Special Architecture Knowledge You Need
- The Centera appliance stores content, and each piece of content is stored and retrieved using an address. This content/address combination is called CAS (content addressable storage), a term you will hear and read about often in the industry these days.
- The smallest block of data that can be stored must be housed inside a memory block that the SDK calls a "C-Clip." In other words, you have to create a C-Clip and place your content inside the C-Clip first. Then, you send the C-Clip to Centera to be saved. The C-Clip itself is made of two other components: the Content Descriptor File (CDF for short) and the BLOB.
- The Content Descriptor File (CDF) is an XML file that holds metadata. The CDF contains TAGS and ATTRIBUTES.
- A TAG is:
  - An XML tag in the CDF
  - A user-defined name
  - Example: <Application_Name>ImageStore2004</Application_Name>
- An ATTRIBUTE is:
  - An XML attribute in the CDF
  - A user-defined value
  - Example: <My_App name="ImageStoreServer"/>
The C-Clip also holds a BLOB. The BLOB is usually the content you want to store. BLOBs have the following characteristics:
- They hold objects stored on Centera.
- They are represented as a distinct bit sequence of the object you are trying to store.
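To make the idea of content addressing concrete, here is a minimal in-memory sketch. It is not the Centera SDK: the ToyContentStore class and its methods are hypothetical names invented for illustration, and SHA-256 is only a stand-in for whatever address calculation the appliance performs. The point it illustrates is that the address is derived from the content itself, so the same bytes always map to the same address.

```csharp
using System;
using System.Collections.Generic;
using System.Security.Cryptography;
using System.Text;

// Hypothetical toy store; NOT part of the Centera SDK or the .NET wrapper.
public class ToyContentStore
{
    private readonly Dictionary<string, byte[]> blobs = new Dictionary<string, byte[]>();

    // Number of distinct pieces of content held by the store.
    public int Count
    {
        get { return blobs.Count; }
    }

    // Stores the content and returns the address derived from it.
    public string Write(byte[] content)
    {
        string address = ComputeAddress(content);
        if (!blobs.ContainsKey(address))
        {
            blobs[address] = (byte[])content.Clone();
        }
        return address;
    }

    // Returns a copy of the content stored under the given address.
    public byte[] Read(string address)
    {
        return (byte[])blobs[address].Clone();
    }

    // Assumption: SHA-256 stands in for the appliance's own addressing scheme.
    public static string ComputeAddress(byte[] content)
    {
        using (SHA256 sha = SHA256.Create())
        {
            byte[] hash = sha.ComputeHash(content);
            return BitConverter.ToString(hash).Replace("-", "");
        }
    }
}

public static class ToyContentStoreDemo
{
    public static void Main()
    {
        ToyContentStore store = new ToyContentStore();
        byte[] scan = Encoding.UTF8.GetBytes("check image #1");

        string first = store.Write(scan);
        string second = store.Write(scan); // same bytes, same address

        Console.WriteLine(first == second); // prints True
        Console.WriteLine(store.Count);     // prints 1
    }
}
```

Because the caller never chooses a path or file name, there is no volume or directory to manage; the application only has to keep the returned address.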
Centera runs an OS called "CentraStar." This OS is optimized for writing and reading the C-Clip objects.
Centera objects have metadata. The applications you develop create metadata associated with one or more objects. Then, these objects are stored independent of volume/directory information as in the image below:
Overall Process Overview
Centera's Three Modes
Basic mode
Centera acts like a standard magnetic storage unit. An object marked for deletion is deleted immediately.
Compliance mode
Active retention protection ensures the availability of objects for a configurable period of time. An object marked for deletion is not deleted until the retention period passes.
Compliance plus mode
Similar to compliance mode, compliance plus mode uses retention periods. The default retention period is infinite. Unlike compliance mode, data is never purged.
Benefits of the Compliance Modes
- Retention is set on the clip. This applies to all blobs that are referenced by the clip.
- You cannot delete a clip/blob when retention has not expired.
- Once retention expires, the clip is eligible for deletion.
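The retention rules above can be sketched as a simple date check. This is an illustration only: ToyClip and ToyCluster are hypothetical types, not SDK classes, and the sketch assumes retention is counted from the clip's creation time.

```csharp
using System;

// Hypothetical model of a clip with a retention period; NOT an SDK class.
public class ToyClip
{
    public DateTime CreatedUtc { get; private set; }
    public TimeSpan Retention { get; private set; }

    public ToyClip(DateTime createdUtc, TimeSpan retention)
    {
        CreatedUtc = createdUtc;
        Retention = retention;
    }

    // A clip becomes eligible for deletion once its retention period has passed.
    public bool IsEligibleForDeletion(DateTime nowUtc)
    {
        return nowUtc >= CreatedUtc + Retention;
    }
}

public static class ToyCluster
{
    // Refuses the delete while the clip is still under retention.
    public static void Delete(ToyClip clip, DateTime nowUtc)
    {
        if (!clip.IsEligibleForDeletion(nowUtc))
        {
            throw new InvalidOperationException("Retention period has not expired.");
        }
        // A real cluster would remove the clip and its blobs here.
    }
}

public static class RetentionDemo
{
    public static void Main()
    {
        DateTime created = new DateTime(2007, 10, 13, 0, 0, 0, DateTimeKind.Utc);
        ToyClip clip = new ToyClip(created, TimeSpan.FromDays(365));

        DateTime early = new DateTime(2008, 1, 1, 0, 0, 0, DateTimeKind.Utc);
        DateTime late = new DateTime(2009, 1, 1, 0, 0, 0, DateTimeKind.Utc);

        Console.WriteLine(clip.IsEligibleForDeletion(early)); // prints False
        Console.WriteLine(clip.IsEligibleForDeletion(late));  // prints True
    }
}
```

A retention period of TimeSpan.Zero makes a clip eligible for deletion immediately, which corresponds to a cluster with no retention configured.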
Data deletion enhancements: Shredding
- Overwrites data multiple times with a random bit pattern.
The Centera-supplied Software Development Kit (SDK) contains C callable libraries:
You will need to create an account with EMC to download the full SDK.
Why It Is Needed
- Provides content addressing framework
- No file system and associated drawbacks
- Applications access the Centera via API calls only
A cluster is a logical CAS archive that appears to your application as a single unit. A cluster can be accessed by one or more applications via a set of node IP addresses and access profiles.
A pool is an SDK object that represents one or more clusters. Your application must OPEN a pool by providing a series of node IP addresses and access profile credentials for the desired set of clusters. The first accessible IP address in the list represents the primary cluster; subsequent IP addresses are considered the secondary clusters (assuming that they represent distinct clusters). The pool object also auto-discovers any replica clusters that are configured via the primary or secondary clusters.
The system administrator creates access profiles for applications. Profiles are a means to enforce authentication and authorization. The system administrator can determine which applications have access to a cluster and what operations they can perform. An application can only log into a Centera if a profile for that application has been created on the Centera cluster and the credentials for that profile have been made available to the application server. Once the profiles have been created on the Centera cluster, the system administrator exports the profile information to a Pool Entry Authorization (PEA) file and copies this file to the application server. The system administrator can set an environment variable that points to the PEA file or can leave it to the application to give the path to this file.
So, when you code your application, you can either omit the PEA file path, in which case the SDK uses the PEA file location that the system administrator set in the environment variable, or, if your enterprise has created specific PEA files and distributed them to the development team, give the full path of the PEA file in your code when opening the pool. It is important to note that for these articles, the publicly available ".PEA" profiles will be used. The files have the following naming convention:
For example, "us2_armTest2_rdqeDcwh.pea" translates to:
- Application Profile belongs to Centera Cluster US2
- Profile Test2, Advanced Retention Management (arm) enabled
- Capabilities: All enabled; please refer to the list below
- r: read
- w: write
- d: delete
- q: query
- e: exists
- D: privileged delete
- c: clip copy
- h: retention hold
- monitor: All profiles except "Profile1" are configured to enable the monitor capability
Each profile also comes enabled with a name/secret combination that corresponds to the profile name. Thus, to access a profile defined by us2_armTest2_rdqeDcwh.pea file, the application could alternatively use "name=armTest2,secret=armTest2" in the connect string.
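As a hedged illustration of the naming convention, the sketch below decodes a public PEA file name into its parts. The three-part, underscore-separated layout is inferred from the single example above, and the PeaFileInfo and PeaFileName types are hypothetical helpers, not part of the SDK or the wrapper.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical helper types; NOT part of the Centera SDK or the .NET wrapper.
public class PeaFileInfo
{
    public string Cluster { get; set; }
    public string ProfileName { get; set; }
    public List<string> Capabilities { get; set; }
}

public static class PeaFileName
{
    // Capability letters as listed in the article (case matters: d vs. D).
    private static readonly Dictionary<char, string> CapabilityNames =
        new Dictionary<char, string>
        {
            { 'r', "read" }, { 'w', "write" }, { 'd', "delete" }, { 'q', "query" },
            { 'e', "exists" }, { 'D', "privileged delete" }, { 'c', "clip copy" },
            { 'h', "retention hold" }
        };

    // Assumption: the name has the shape <cluster>_<profile>_<capabilities>.pea.
    public static PeaFileInfo Parse(string fileName)
    {
        string stem = Path.GetFileNameWithoutExtension(fileName);
        string[] parts = stem.Split('_');
        if (parts.Length != 3)
        {
            throw new FormatException("Expected <cluster>_<profile>_<capabilities>.pea");
        }

        PeaFileInfo info = new PeaFileInfo();
        info.Cluster = parts[0].ToUpperInvariant();
        info.ProfileName = parts[1];
        info.Capabilities = new List<string>();

        foreach (char letter in parts[2])
        {
            string name;
            if (!CapabilityNames.TryGetValue(letter, out name))
            {
                throw new FormatException("Unknown capability letter: " + letter);
            }
            info.Capabilities.Add(name);
        }
        return info;
    }

    // The public profiles use the profile name as both name and secret.
    public static string NameSecretFor(PeaFileInfo info)
    {
        return "name=" + info.ProfileName + ",secret=" + info.ProfileName;
    }
}

public static class PeaFileNameDemo
{
    public static void Main()
    {
        PeaFileInfo info = PeaFileName.Parse("us2_armTest2_rdqeDcwh.pea");
        Console.WriteLine(info.Cluster);                         // prints US2
        Console.WriteLine(info.ProfileName);                     // prints armTest2
        Console.WriteLine(string.Join(", ", info.Capabilities));
        Console.WriteLine(PeaFileName.NameSecretFor(info));      // prints name=armTest2,secret=armTest2
    }
}
```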
As "Forrest Gump" said in the movie of the same name, "That's all I am going to say about that."
This introduction should give you enough knowledge to be able to read the SDK and write code to use the Centera appliance. Because this article is one in a series of articles I am writing about different functionalities, each individual article will have this introduction and then will discuss the specific Centera functionality the article will address.
How to Set Up the Development Environment
- In Visual Studio, create a new project called "AdrdProjectCentera1" as in the figure below:
Note: I am creating the project on my "E" drive in the "CAS" directory. The project name in this article is "AdrdProjectCentera1." This will create the directory structure needed by Visual Studio. The directory of interest in this solution structure is the debug directory that Visual Studio creates. In this article, the full path of the directory of interest is as follows: E:\CAS\AdrdProjectCentera1\AdrdProjectCentera1\AdrdProjectCentera1\bin\Debug. Your path will be different, depending on your project's location.
- The next step is to unzip the EMC Centera SDK files. The SDK is delivered from the EMC site as a single zipped file. The default zip file name is "3.1_SDK_Windows_gcc.zip" (as of October 13, 2007). Once the file is unzipped, a number of directories will be created. Copy the files in the "lib" directory to the "debug" directory created by Visual Studio in Step 1. The files that you will copy are "FPLibrary.dll", "fpos32.dll", "fpparser.dll", and "pai_module.dll". There is also an "FPLibrary.jar" file in that "lib" directory. You do not need to copy that file. The "FPLibrary.jar" file is the Java wrapper for the "FPLibrary.dll"; it is the Java equivalent of the .NET wrapper that the SourceForge project provides. The ".lib" files are for development in C or C++, so you can ignore them for this article.
- Next, download all the PEA files to be able to develop against the "public Centera." I will use the "US X" PEA files from the EMC web site (as of October 13, 2007). Make sure you copy the ".pea" files to the debug directory described in Step 1 above.
- The next step is to unzip the .NET wrapper you downloaded from the SourceForge site. The default name of the zipped file is "FPApi.NET.zip". Once it is fully unzipped, the following directories are created:
The zip file from SourceForge does not include the binary file of the wrapper (the compiled version of the code), so you will need to compile the code to generate the final wrapper that you will use in this article's project. To do so, double-click the "Wapper.sln" file that the zip file extraction created. This should start a new instance of Visual Studio, and the solution should look as follows:
- Compile the solution by selecting Build->Build Solution menu options as in the next figure:
- Once the build is complete, copy the files "FPSDK.dll" and "FPSDK.pdb" that are generated as a result of the solution build to the debug directory created in Step 1.
- The final debug directory for the solution should look like this:
- The final step is to set a reference to the "FPSDK.DLL" in your solution. To do so, open the original solution you created in Step 1 (if it is not already open).
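Before running the project, it can help to confirm that the debug directory really contains everything the steps above copied into it. The following is a minimal sketch of such a check; the DebugDirectoryCheck class is a hypothetical helper, and the file list is taken from the steps above ("FPSDK.pdb" is left out because it is only needed for debugging symbols).

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical helper; NOT part of the Centera SDK or the .NET wrapper.
public static class DebugDirectoryCheck
{
    // Native SDK libraries plus the compiled .NET wrapper.
    private static readonly string[] RequiredFiles =
    {
        "FPLibrary.dll", "fpos32.dll", "fpparser.dll", "pai_module.dll", "FPSDK.dll"
    };

    // Returns the names of required files that are not in the directory.
    // "*.pea" is reported when no PEA file is present at all.
    public static List<string> FindMissing(string debugDirectory)
    {
        List<string> missing = new List<string>();
        foreach (string name in RequiredFiles)
        {
            if (!File.Exists(Path.Combine(debugDirectory, name)))
            {
                missing.Add(name);
            }
        }

        if (!Directory.Exists(debugDirectory) ||
            Directory.GetFiles(debugDirectory, "*.pea").Length == 0)
        {
            missing.Add("*.pea");
        }
        return missing;
    }

    public static void Main(string[] args)
    {
        // Pass the debug directory as the first argument,
        // or run the check from inside that directory.
        string directory = args.Length > 0 ? args[0] : Directory.GetCurrentDirectory();
        List<string> missing = FindMissing(directory);

        if (missing.Count == 0)
        {
            Console.WriteLine("All required files are present.");
        }
        else
        {
            Console.WriteLine("Missing: " + string.Join(", ", missing));
        }
    }
}
```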
How To Retrieve the Centera Cluster Capabilities
The following screenshot shows this article's UI.
To actually retrieve the cluster information, you need to make the following API calls:
- Open the Centera Cluster by creating an instance of the wrapper "FPPool" object.
- Use the FPPool instance you created in the previous step to retrieve the cluster capabilities.
- Close the FPPool.
Open the Pool
To open the Centera pool, you will need the cluster "connection string." This is usually a single IP address for a single Centera, or a comma-separated list of IP addresses if Centera is configured as a cluster. Appended to the IP list are a "?" sign and the full path of the ".PEA" file.
In the code associated with this article, the ".PEA" files are included in the "debug" directory.
Sample of the Connection String
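A connection string of the shape described above can be assembled as follows. This is a minimal sketch: the ConnectionStringBuilder helper is hypothetical, the IP addresses are placeholders from the documentation range rather than real Centera nodes, and the PEA path is an assumed location.

```csharp
using System;

// Hypothetical helper; NOT part of the Centera SDK or the .NET wrapper.
public static class ConnectionStringBuilder
{
    // Joins the node addresses with commas and, if a PEA file path is given,
    // appends "?" followed by the full path of the PEA file.
    public static string Build(string[] nodeAddresses, string peaFilePath)
    {
        if (nodeAddresses == null || nodeAddresses.Length == 0)
        {
            throw new ArgumentException("At least one node address is required.");
        }

        string addresses = string.Join(",", nodeAddresses);
        return string.IsNullOrEmpty(peaFilePath)
            ? addresses
            : addresses + "?" + peaFilePath;
    }
}

public static class ConnectionStringDemo
{
    public static void Main()
    {
        // Placeholder addresses (documentation range), not real Centera nodes.
        string[] nodes = { "192.0.2.10", "192.0.2.11" };

        // Assumed PEA location; use the path of your own debug directory.
        string pea = @"E:\CAS\us2_armTest2_rdqeDcwh.pea";

        // Prints 192.0.2.10,192.0.2.11?E:\CAS\us2_armTest2_rdqeDcwh.pea
        Console.WriteLine(ConnectionStringBuilder.Build(nodes, pea));
    }
}
```

Passing null or an empty string for the PEA path yields only the address list, which fits the case where the PEA file location is supplied through the environment variable instead.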
Retrieve the Cluster Capabilities
#region Build the String to display in the UI
// myPool is the open FPPool instance; strPoolInfo is the text shown in the UI.
// SDKVersion is read from the FPPool class itself, not from the instance.
strPoolInfo = "\nPool Information" + "\n================" +
    "\nCluster ID: " + myPool.ClusterID +
    "\nCluster Time: " + myPool.ClusterTime +
    "\nCluster Name: " + myPool.ClusterName +
    "\nCentraStar software version: " + myPool.CentraStarVersion +
    "\nSDK version: " + FPPool.SDKVersion +
    "\nCluster Capacity (Bytes): " + myPool.Capacity +
    "\nCluster Free Space (Bytes): " + myPool.FreeSpace +
    "\nCluster BlobNamingSchemes: " + myPool.BlobNamingSchemes +
    "\nCluster CenteraEdition: " + myPool.CenteraEdition +
    "\nCluster ClipBufferSize: " + myPool.ClipBufferSize.ToString() +
    "\nCluster DeleteAllowed: " + myPool.DeleteAllowed.ToString() +
    "\nCluster DeletionsLogged: " + myPool.DeletionsLogged.ToString() +
    "\nCluster ExistsAllowed: " + myPool.ExistsAllowed.ToString() +
    "\nCluster QueryAllowed: " + myPool.QueryAllowed.ToString() +
    "\nCluster RetentionDefault: " + myPool.RetentionDefault.ToString() +
    "\nCluster ReadAllowed: " + myPool.ReadAllowed.ToString() +
    "\nCluster WriteAllowed: " + myPool.WriteAllowed.ToString();
#endregion
Close the FPPool
In the sample included, I have opened the pool inside a using statement; therefore, when done, the FPPool will be closed. It is possible to use the following statement:
Explaining the Capabilities
- ClusterID: Unique ID of the cluster.
- ClusterTime: Time on the cluster. Note that all Centera clusters keep time in GMT.
- ClusterName: The name given to the cluster. Most of the time, Centera administrators leave this value empty and it is not used.
- CentraStarVersion: The version of the OS running on the Centera.
- SDKVersion: The version of the SDK your application is using. Usually it is the version you downloaded from EMC. Note that newer versions of the SDK can talk to earlier versions of the CentraStar OS.
- Capacity: Total space on the Centera pool you are connecting to.
- FreeSpace: Total available space on the Centera pool you are connecting to.
- CenteraEdition: Is either "basic", "CE", or "CE+". Please see the Centera Modes earlier in this article.
- DeleteAllowed: Is deletion of clips allowed on this pool?
- DeletionsLogged: Is deletion logged? Usually, this is set to true for auditing purposes, especially if the pool/cluster is in "basic" mode.
- RetentionDefault: The default retention period. Most of the public Centera clusters have this value set to 00:00:00. This implies that there is no retention. In other words, C-Clips can be deleted immediately.
For all other capabilities, please see the Centera API reference guide "Centera_SDK_3.1_API_Ref_Guide.pdf" and review the FPPool_GetCapability API.
Also included in the demo code are two classes that are used to serialize the capabilities. The classes are named "AdrdCenteraClusterInfoItem" and "AdrdCenteraRetentionInfoItem", respectively. These classes represent most of the capabilities that you will ever use when developing against the Centera. I will use them in my next two articles on how to write to the Centera and how to read from the Centera.
You can get the Microsoft or a PDF Version of this article from the following link: http://WWW.ADRDWeb.com/Centera/ListOfArticles.htm.
Points of Interest
Centera Storage and Content Addressable Storage methods