
February 28, 2005 Program Summary

Create Once and Deliver Many:
An Overview of XML and Content Management

Summary and photography by Bruce Mills

 

A cornucopia of information management concepts

Speaking to a rapt audience, Jeff Deskins, a content management expert (see speaker details below), drilled deeply into the current challenges many companies face when developing, managing, publishing, and delivering large quantities of information.

Often, these companies produce multiple variations of highly dynamic content for cross-media publication. This combination makes staying on top of the revision cycles that much more frenzied.

After providing insight into these information dilemmas, Jeff discussed and demonstrated how XML and content management — two key tools in the arsenal for managing information — help make today's demanding publishing challenges more manageable.

So, who's facing challenges?

Jeff mentioned several large corporations in various industries, including aerospace, industrial manufacturing, medical records management, and banking. To give an idea of the scope of their publishing challenges, manufacturing companies such as Caterpillar and John Deere manage:

  • millions of pages of documentation
  • in several languages
  • with hundreds of new pages added every day.

What happens in traditional publication processes?

Traditional publication is output-centric, Jeff explained. That is, we typically create content with a specific end product in mind. For example, we might write a training manual from scratch for a specific target audience and format it for a specific page layout and printing process.

If all or part of the content needs to be used elsewhere, say, as an audiovisual module or for a different target audience, it must be extracted from the original publication, then rewritten and reformatted for the new output target.

How complicated can this become? Imagine that when a change occurs:

  • It's usually made first in the original content.
  • It's replicated separately in each publication product that evolved from it, such as in the Web version, Help version, and CD-ROM version.
  • Each version typically has its own revision cycle, timeframe, and implementors.
  • As the number of end-product variations grows, the problem of managing, updating, and republishing accurate content grows even faster. That's why the process is often costly, slow, error-prone, and out-of-date.

What is content management?

Content management is a mission-critical capability for a broad spectrum of publishers — from newspapers to manufacturers to research institutions to government agencies, Jeff told us. In general, any publisher or service organization with high-volume, data-driven, or otherwise complex publishing requirements is either already using some form of content management, or is considering doing so. He highlighted some of the “flavors” of content management, as follows:

File and record management represents a more traditional data processing technology for managing and tracking electronic files. Such systems rely on centralized databases to store and make accessible information about the location and contents of files, but they do not make the contents themselves directly accessible. Jeff referred to these systems, along with DMSs (described next), as BLOB (binary large object) management systems. Although they may be networked, they typically operate on private networks, and access to the files usually involves a proprietary application. They typically do not provide any specific mechanism for publishing.

DMSs, or Document Management Systems, are essentially file management systems that are publication oriented. Rather than tracking database records or electronic files in general, they track, make accessible, and may even print and distribute page-based content. Like file management, however, they are BLOB-based and do not make the content of the documents accessible.

CMSs, or Content Management Systems, can parse what's inside the BLOBs; they're XML-aware (explained further below). Because they are XML-based, not only are the documents accessible, but atomic units of content within the documents can also be made accessible. They can also be accessed from anywhere via the Internet. CMSs include a wide range of systems, including product data management (PDM) systems, technical documentation systems, training systems, collaborative publishing systems, and more.
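To make the BLOB-versus-XML-aware distinction concrete, here is a minimal Python sketch using only the standard library's xml.etree.ElementTree. The manual, its element names (manual, procedure, warning), and the two fetch functions are hypothetical illustrations, not part of any real DMS or CMS product. A BLOB store can hand back only the whole file, while an XML-aware store can reach inside it and return a single unit of content.

```python
import xml.etree.ElementTree as ET

# Hypothetical manual stored as one file; to a BLOB system it is opaque bytes.
MANUAL = b"""<manual id="m-100">
  <procedure id="p-1">
    <title>Replace the filter</title>
    <warning>Disconnect power before servicing.</warning>
    <step>Remove the cover.</step>
  </procedure>
</manual>"""

def blob_fetch(store: dict, key: str) -> bytes:
    """BLOB-style access: the repository knows the file, not what is inside it."""
    return store[key]

def xml_fetch_units(store: dict, key: str, tag: str) -> list[str]:
    """XML-aware access: parse the stored file and return every unit with this tag."""
    root = ET.fromstring(store[key])
    return [el.text for el in root.iter(tag)]

store = {"m-100.xml": MANUAL}
print(len(blob_fetch(store, "m-100.xml")))            # size of the whole file
print(xml_fetch_units(store, "m-100.xml", "warning"))  # just the warning text
```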

How can XML help solve the publishing problem?

How do you spell "XML"?

What if there were a way to define and identify the elements of content — independent of the various ways in which it might be published?

That's where XML comes in. XML is a text-based tagging language with some of the characteristics of the earlier tagging languages, SGML and HTML. Jeff gave us this breakdown of the “alphabet soup”:

  • SGML (Standard Generalized Markup Language) was originally conceived as a Rosetta Stone for passing electronic documents between disparate computing environments. Its roots go back to the late 1960s, when Charles Goldfarb, a lawyer by training, and his colleagues at IBM devised a way to more easily mark up and format legal documents; SGML itself became an ISO standard in 1986.
  • HTML (Hypertext Markup Language) was derived from SGML to make it easy to share linked documents on the Web and display them in any Web browser. It concentrates primarily on the appearance of information.
  • XML (Extensible Markup Language) was conceived with a broader purpose. The goal is not just to make documents electronically portable but to make each element of content identifiable, portable, and accessible, wherever it may be stored and however it might be used, independent of the concept of a page-based document.

How does XML benefit Content Management?

XML makes it possible to manage the elements of content — that is, descriptors, subject matter, illustrations, data, images and audio or video — within unique contextual structures, Jeff told us. Because technologies now exist for porting this content to various publication products, you can produce a dizzying array of outputs from a single source.

And that means...

  • When you change the source information, you can rapidly and accurately deploy that change in each targeted output.
  • Having built a content management system, a publisher like a newspaper could deploy the exact same source for an article to print, to PDF, to the Web, or to a magazine, without re-keying, re-editing or reformatting.
  • A manufacturer could deploy parts, service, or training manuals to a hundred different customers while maintaining the content in a single source. The content will always be current as of the publication date, no matter where or in what format it is published. Create once and deliver many!
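A minimal Python sketch of "create once, deliver many" follows. It assumes a hypothetical news article tagged in XML, and its two small render functions stand in for real publishing pipelines. An edit made once in the source shows up in every output the next time they run.

```python
import html
import xml.etree.ElementTree as ET

# Hypothetical single source for one news article.
SOURCE = """<article>
  <headline>Plant expansion approved</headline>
  <body>The council voted 5-0 to approve the expansion.</body>
</article>"""

def to_html(xml_text: str) -> str:
    """Web target: a minimal HTML fragment (text is escaped for HTML)."""
    root = ET.fromstring(xml_text)
    headline = html.escape(root.findtext("headline"))
    body = html.escape(root.findtext("body"))
    return f"<h1>{headline}</h1>\n<p>{body}</p>"

def to_plain_text(xml_text: str) -> str:
    """Plain-text target, e.g. for an e-mail bulletin."""
    root = ET.fromstring(xml_text)
    return f"{root.findtext('headline').upper()}\n\n{root.findtext('body')}"

RENDERERS = (to_html, to_plain_text)

def publish_all(xml_text: str) -> list[str]:
    """Regenerate every output from the one source."""
    return [render(xml_text) for render in RENDERERS]

# One correction, made once in the source, reaches every output.
corrected = SOURCE.replace("5-0", "4-1")
for output in publish_all(corrected):
    print(output, end="\n---\n")
```

Adding a new delivery channel means adding one more render function. The source itself is never re-keyed.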

What does XML do that's so special?

Jeff informed us that the key difference between XML and SGML or HTML is the way in which tags are used. Rather than identify the structural elements of a document — headings, sub-headings, text, font attributes, etc. — XML tags identify units of content in the context of a topical framework. The tags are based on their meaning within that framework, not on the structure or appearance of the output. Other key points:

  • An author or content architect creates that framework, or Document Type Definition (DTD). A DTD defines, in a sense, a dialect of XML for that specific category of content. It contains the keys to interpreting the tags used to identify elements of content and their attributes.
  • With XML, we have the means of determining what kind of information a given unit of content is, where it came from, when it was last updated, what its metrics are, how it should be used, how to validate it, and so on. The content is now storable, searchable, processable, and pre-qualified.
  • XML therefore makes it possible to develop content that is completely independent of the various publishing products in which it might be used. XML can also be used alongside presentation technologies such as HTML and CSS to determine how the output looks.
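Here is a minimal Python sketch of what tagging by meaning buys you, using the standard library's xml.etree.ElementTree. The parts catalog, its element names, and its source and updated attributes are hypothetical. Because each unit states what it is and when it was last updated, a program can select and process it without caring how it will eventually be published.

```python
import xml.etree.ElementTree as ET
from datetime import date

# Hypothetical parts catalog, tagged by what the content means, not how it looks.
CATALOG = """<catalog>
  <part number="A-17" source="engineering" updated="2005-01-12">
    <name>Hydraulic pump</name>
    <torque units="N-m">45</torque>
  </part>
  <part number="B-02" source="supplier" updated="2004-06-30">
    <name>Seal kit</name>
    <torque units="N-m">12</torque>
  </part>
</catalog>"""

def parts_updated_since(xml_text: str, cutoff: date) -> list[str]:
    """Select units by metadata: each part records when it was last updated."""
    root = ET.fromstring(xml_text)
    return [
        part.get("number")
        for part in root.findall("part")
        if date.fromisoformat(part.get("updated")) >= cutoff
    ]

def torque_values(xml_text: str) -> dict[str, float]:
    """The <torque> tag says what the number is, so it can be processed as data."""
    root = ET.fromstring(xml_text)
    return {p.get("number"): float(p.findtext("torque")) for p in root.findall("part")}

print(parts_updated_since(CATALOG, date(2005, 1, 1)))
print(torque_values(CATALOG))
```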

What are its key components?

XML is more than just a tagging language, Jeff emphasized. It is a language in its own right, as defined by the W3C (the World Wide Web Consortium), and it is also a language for defining new languages or dialects. Finally, it is a collection of technologies that provide the infrastructure for content management systems.

In its simplest form, a valid XML document (one that can be checked against a formal definition of its structure) involves two things:

1) a Document Type Definition (DTD), which defines the tags and context to be used, and

2) the actual content, which uses those tags to mark up elements of content in the proper syntax and hierarchy.

Jeff elaborated that there are two types of document definitions, DTDs and schemas.

  • A DTD is not an XML document but a text file with a strict syntax. It defines the tagging language to be used by documents that conform to its definition.
  • A schema, not unlike schemas used in database development, performs a similar function but is itself an XML document. Unlike DTDs, schemas offer extensive data typing and validation criteria for content. They are also optimized for interacting with databases and data structures. Although DTDs are still widely used, schemas are becoming more and more popular.
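To see the two parts side by side, here is a minimal Python sketch using the standard library's xml.dom.minidom. The recipe vocabulary is a hypothetical example. For compactness, the DTD is embedded in the document as an internal subset. In practice it is often a separate .dtd text file that the DOCTYPE line points to. Python's built-in parsers read the DOCTYPE but do not validate the content against it.

```python
from xml.dom import minidom

# Part 1 (the DTD, here as an internal subset) declares the tags and how they nest.
# Part 2 (the content) is marked up with those tags. Vocabulary is hypothetical.
DOC = """<?xml version="1.0"?>
<!DOCTYPE recipe [
  <!ELEMENT recipe (title, ingredient+, step+)>
  <!ELEMENT title (#PCDATA)>
  <!ELEMENT ingredient (#PCDATA)>
  <!ELEMENT step (#PCDATA)>
]>
<recipe>
  <title>Pancakes</title>
  <ingredient>2 cups flour</ingredient>
  <ingredient>2 eggs</ingredient>
  <step>Mix and cook.</step>
</recipe>"""

doc = minidom.parseString(DOC)

# The document type the content declares it follows.
doctype_name = doc.doctype.name
# The content itself, marked up with the declared tags.
root_tag = doc.documentElement.tagName
ingredients = [n.firstChild.data for n in doc.getElementsByTagName("ingredient")]

print(doctype_name, root_tag, ingredients)
```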

But wait, these are just plain text files. How does XML actually accomplish its magic? Surely there is more to it!

There are a few more pieces to the XML puzzle!

Yes, there are a lot more! Jeff went on to say that to implement XML, you need to validate the XML document against its DTD using these two components:

  • an XML parser
  • a processor

Parsers and processors can be standalone server-based components or may be incorporated into XML-aware applications, he explained. They enforce content structure and syntax, and validate content against the DTD or schema. They also make the content elements accessible to databases and other applications that might generate final output.
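The following Python sketch uses only the standard library to show the two jobs described above. ElementTree's parser (built on expat) enforces XML syntax and rejects malformed input. However, it does not validate content against a DTD. The second function is therefore a hand-coded, hypothetical stand-in for what a validating parser would do with the rule <!ELEMENT recipe (title, ingredient+, step+)>. It illustrates the idea and is not a DTD implementation.

```python
import re
import xml.etree.ElementTree as ET

def is_well_formed(xml_text: str) -> tuple[bool, str]:
    """Syntax check: tags closed and properly nested, one root element."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as err:
        return False, str(err)
    return True, "ok"

# Hypothetical content model mirroring the DTD rule
#   <!ELEMENT recipe (title, ingredient+, step+)>
# expressed as a regular expression over the sequence of child tag names.
RECIPE_MODEL = re.compile(r"title(,ingredient)+(,step)+")

def is_valid_recipe(xml_text: str) -> bool:
    """Stand-in for validation: right root element, children in the declared order."""
    root = ET.fromstring(xml_text)
    children = ",".join(child.tag for child in root)
    return root.tag == "recipe" and RECIPE_MODEL.fullmatch(children) is not None

good = "<recipe><title>Pancakes</title><ingredient>Flour</ingredient><step>Mix.</step></recipe>"
malformed = "<recipe><title>Pancakes</recipe>"  # <title> is never closed
misordered = "<recipe><step>Mix.</step><title>Pancakes</title></recipe>"  # well-formed, invalid

for name, doc in (("good", good), ("malformed", malformed), ("misordered", misordered)):
    ok, message = is_well_formed(doc)
    valid = ok and is_valid_recipe(doc)
    print(f"{name}: well-formed={ok} valid={valid} ({message})")
```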

Note that there is no means of directly publishing XML content — and for good reason, he cautioned. XML was designed to store, manage and make content portable and accessible. It was not intended to define a specific type of output or publication product.

That's the whole point — the separation of content from engineering and presentation layers. So, to publish XML, its content must be converted, or transformed, into the appropriate output format of the target publishing environment.

That's where XSL (Extensible Stylesheet Language) and XSLT (XSL Transformations) come in, he said, as follows:

  • XSL is another family of W3C specifications. Its transformation language, XSLT, provides a structured syntax for writing transformation documents, commonly called XSLT stylesheets or simply XSLTs. This is not to be confused with output formatting such as CSS (Cascading Style Sheets), which comes later in the process.
  • XSLT documents tell an XML processor (or XML-aware application) how to convert the content elements of the XML source document into a syntax that the target output application understands. For example, you can use an XSLT to:
    • convert an XML source document into HTML or XHTML for publication on the Internet
    • convert the same XML content to a printable PDF (typically by way of XSL-FO)
    • include HTML, CSS, or style tags in the XSLT to make the converted output appear as intended.
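Python's standard library has no XSLT processor, so the sketch below only mimics the idea behind an XSLT stylesheet. A table of rules maps each source element to an HTML element, and a recursive function applies those rules to the whole tree, much as XSLT template rules do. The element names and the rule table are hypothetical. A real project would express the same mapping as an XSLT stylesheet and run it through an XSLT processor.

```python
import html
import xml.etree.ElementTree as ET

# Hypothetical "template rules": source element name -> HTML element name.
HTML_RULES = {
    "procedure": "section",
    "title": "h2",
    "warning": "strong",
    "steps": "ol",
    "step": "li",
}

def transform(el: ET.Element, rules: dict[str, str]) -> str:
    """Convert one element and, recursively, its children into HTML text."""
    inner = html.escape(el.text or "")
    for child in el:
        inner += transform(child, rules)        # process children, like apply-templates
        inner += html.escape(child.tail or "")  # keep any text that follows the child
    tag = rules.get(el.tag)
    # An element with no rule passes its content through unwrapped,
    # loosely like XSLT's built-in template rules.
    return f"<{tag}>{inner}</{tag}>" if tag else inner

SOURCE = (
    "<procedure><title>Replace the filter</title>"
    "<warning>Disconnect power first.</warning>"
    "<steps><step>Remove the cover.</step><step>Lift out the filter.</step></steps>"
    "</procedure>"
)

print(transform(ET.fromstring(SOURCE), HTML_RULES))
```

Swapping in a different rule table would retarget the same source without touching it. XSLT formalizes that design choice.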

As you begin to explore XML, Jeff said, you will find dozens if not hundreds of XML-based syntaxes, document types (XML dialects), and XSLTs already in existence. These include SVG (Scalable Vector Graphics), XSL-FO (XSL Formatting Objects), MathML, VoiceXML, WML (Wireless Markup Language), XQuery, XForms, MusicML, SMIL (Synchronized Multimedia Integration Language), and XTM (XML Topic Maps), to name a few.

There are a few related XML technologies worth special mention, he indicated. XPath provides a syntax for addressing and selecting parts of an XML document (XSLT relies on it), while XLink and XPointer are evolving specifications that let XML-aware applications use intelligent links and incorporate dynamic content from multiple sources.
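As a small taste of XPath-style addressing, Python's xml.etree.ElementTree supports a limited subset of XPath syntax in its find and findall methods. A fuller XPath implementation would require a dedicated library. The manual and its id and audience attributes below are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical service manual with two procedures for different audiences.
MANUAL = """<manual>
  <procedure id="p-1" audience="technician">
    <title>Replace the filter</title>
    <step>Remove the cover.</step>
    <step>Lift out the filter.</step>
  </procedure>
  <procedure id="p-2" audience="operator">
    <title>Daily inspection</title>
    <step>Check the pressure gauge.</step>
  </procedure>
</manual>"""

root = ET.fromstring(MANUAL)

# Every procedure title, wherever it appears below the root.
titles = [t.text for t in root.findall(".//procedure/title")]

# Procedures whose audience attribute is "operator".
operator_ids = [p.get("id") for p in root.findall(".//procedure[@audience='operator']")]

# The second <step> of procedure p-1 (positions are 1-based, as in XPath).
second_step = root.find("./procedure[@id='p-1']/step[2]").text

print(titles, operator_ids, second_step)
```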

What tools generate or use XML?

Tools that generate or support XML can be broadly divided into three categories, according to Jeff: tools for 1) authoring, 2) deploying, and 3) displaying XML content.

  • XML creation and authoring: In its simplest form, you can write XML with a simple text editor and validate it with a standalone parser. You can also use XML authoring applications such as Arbortext's Epic Editor, or any of a variety of others, such as XMLSpy. A quick search of the Internet will produce a very long list of XML development tools.
  • XML deployment: A common use of XML is to generate Web pages dynamically. For example, a Web server such as Apache, equipped with an XML parser and processor, can use server-side Java servlets to extract data from a data source, convert it to XML documents, and transform it with an XSLT into XHTML that is dynamically generated for JSP-based (JavaServer Pages) Web sites. The same content could also be formatted using XSL-FO or a PDF processor so that it could be printed. Other XML processors can convert content from one XML dialect to another, or convert a format such as e-mail into a wireless data format that can be delivered directly to a cell phone.
  • XML presentation: Although you can't view XML directly except as text during authoring, you can see it in browsers such as Internet Explorer 6.x and Netscape 7.x, which have built-in parsers and processors to display the content in real time. XML presentation depends on the application or tool that hosts the final output. For example, if a DTD-compliant XML file is imported into InDesign, and the InDesign file has been configured with matching tags that correspond to InDesign formatting styles, the XML content will appear laid out in those styles, ready for print output such as PostScript.
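The first half of the deployment pipeline described above, turning records from a data source into XML, can be sketched with Python's standard library. The rows below are hypothetical stand-ins for a database query result, and the element names and currency attribute are illustrative assumptions. The later XSLT-to-XHTML step is not shown.

```python
import xml.etree.ElementTree as ET

# Hypothetical rows, standing in for the result of a database query.
ROWS = [
    {"sku": "A-17", "name": "Hydraulic pump", "price": "412.00"},
    {"sku": "B-02", "name": "Seal kit & gaskets", "price": "18.50"},
]

def rows_to_xml(rows: list[dict[str, str]]) -> str:
    """Wrap each record in elements named for what the data means."""
    catalog = ET.Element("catalog")
    for row in rows:
        part = ET.SubElement(catalog, "part", sku=row["sku"])
        ET.SubElement(part, "name").text = row["name"]
        # The currency attribute is an illustrative assumption, not from the data.
        ET.SubElement(part, "price", currency="USD").text = row["price"]
    # ElementTree escapes special characters such as "&" on output.
    return ET.tostring(catalog, encoding="unicode")

print(rows_to_xml(ROWS))
```

The serializer handles escaping, so "Seal kit & gaskets" comes out as well-formed XML that any downstream processor can read.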

How can you start using XML?

Jeff clarified that to reap the big productivity benefits from using XML with content management, you would most likely be working in a large organization that is creating and managing thousands or millions of pages — versus tens or hundreds. These organizations put in a lot of planning, systems design, development, and testing to build a full-scale implementation from scratch.

However, XML is text-based and readily accessible to individuals at standalone workstations using nothing more than a text editor! In the simplest terms, he explained, the overall process might be to:

1) Define the architecture: Identify elements of your content, their attributes, and their hierarchical relationships.

2) Define the syntax: Create a DTD or schema that encodes your content architecture.

3) Author the content: Create the actual XML documents using the specified syntax for their respective document definitions.

(Caution: Unless you code directly in XML or use an XML authoring tool, such as Arbortext's Epic Editor, you might not get the results you expect. XML-aware applications such as MS Word, Adobe FrameMaker, Adobe Illustrator, and Adobe InDesign can export XML documents with varying degrees of success. These tools may use proprietary DTDs or schemas that add application-specific elements, which can lead to errors when another application tries to read the output.)

4) Create transforms: Identify and create the XSLTs for your target publishing vehicles.

5) Establish the processing environment: Define and implement an XML-enabled application environment for deployment.

6) Test the system: Simulate the content publication.

7) Deploy the content: Become a content-managed publisher!

Conclusion

Jeff presented the compelling message that XML is an extremely powerful and essential ingredient for solving large-scale content management problems. The concepts and implementations of XML technologies offer an intriguing glimpse into the theories and structure of information and the mechanics of managing timely, cost-effective communication products on a grand scale. It is abundantly clear that XML is an acronym that will find its way into common usage in any discussion involving technical communication. It deserves the full attention of technical communication professionals.

All in all, XML is shaping up to become the worldwide standard for content management and portability. For a complete reference on current and evolving standards and specifications for XML-based markup languages, see the W3C at www.w3.org/XML/.

A few sources for more information

Jeff's presentation is available in PDF (2.7MB) at http://slostc.org/events/feb28_presentation.pdf.

Other resources include:

 
Create Once and Deliver Many:
An Overview of XML and Content Management
Date: Monday evening, February 28, 2005
Speaker:

Jeff Deskins is a Solutions Architect and Principal Consultant formerly with Arbortext, the makers of enterprise systems for creating, managing, publishing, and delivering information. Jeff transferred to the SLO STC chapter after relocating to Paso Robles from the Bay Area, but has just recently taken a new position in the Seattle area.

Description:

“Create Once and Deliver Many: An Overview of XML and Content Management” gave us a look at the current challenges in how we develop, manage, publish, and deliver large quantities or many versions of information. For example, in today's organizations, we may find ourselves needing to:

  • Collect and merge disparate content from several different sources
  • Develop and maintain content with fewer knowledgeable people
  • Format and publish several variations of the same core material
  • Facilitate sharing of information objects among different parties

After providing insight into the current challenges, Jeff discussed and demonstrated tools and technologies that help address these needs, including:

  • “Single sourcing” — creating information in a single form and delivering it in multiple formats
  • Information taxonomies, which help people convert concepts into reusable objects or chunks
  • XML (extensible markup language), which relies on information taxonomies to create highly reusable content
  • Content management tools that simplify creating, storing, and publishing information objects

 
