Speaking to a rapt audience, Jeff Deskins, a content management expert
(see speaker details below), drilled deeply into the current challenges
many companies face when developing, managing, publishing, and delivering
large quantities of information.

Often, these companies produce multiple variations of highly dynamic
content for cross-media publication. This combination makes staying on
top of the revision cycles that much more frenzied.

After providing insight into these information dilemmas, Jeff discussed
and demonstrated how XML and content management, two key tools in the
arsenal for managing information, help make today's demanding publishing
challenges more manageable.

So, who's facing challenges?

Jeff mentioned several large corporations in various industries,
including aerospace, industrial manufacturing, medical records
management, and banking. To give an idea of the scope of their publishing
challenges, manufacturing companies such as Caterpillar and John Deere
manage:
- millions of pages of documentation
- in several languages
- with hundreds of new pages added every day.

What happens in traditional publication processes?

Traditional publication is output-centric, Jeff explained. That is, we
typically create content with a specific end product in mind. For
example, we might write a training manual from scratch for a specific
target audience and format it for a specific page layout and printing
process.

If the content (all or part of it) needs to be used elsewhere, say, as an
audiovisual module or for a different target audience, it must be
extracted from the original publication, then rewritten and reformatted
for the new output target.

How complicated can this become? Imagine that when a change occurs:
- It's replicated separately in each publication product that evolved
from it, such as the Web version, Help version, and CD-ROM version.
- Each version typically has its own revision cycle, timeframe, and
implementors.
- As the number of end-product variations grows, the problem of managing,
updating, and republishing accurate content grows even faster.

That's why the process is often costly, slow, and error-prone, and why
the published content is often out of date.

What is content management?

Content management is a mission-critical capability for a broad spectrum
of publishers, from newspapers to manufacturers to research institutions
to government agencies, Jeff told us. In general, any publisher or
service organization with high-volume, data-driven, or otherwise complex
publishing requirements is either already using some form of content
management or considering doing so. He highlighted some of the flavors of
content management, as follows:

File and record management represents a more traditional data processing
technology for managing and tracking electronic files. Such systems rely
on centralized databases to store and make accessible information about
the location and contents of files, but do not make the contents
themselves directly accessible. Jeff referred to these and DMSs as BLOB
(binary large object) management systems. Although they may be networked,
they typically operate on private networks, and access to the files
usually involves a proprietary application. They typically do not provide
any specific mechanism for publishing.

DMSs, or Document Management Systems, are essentially file management
systems that are publication oriented. Rather than tracking database
records or electronic files in general, they track, make accessible, and
may even print and distribute page-based content. Like file management
systems, however, they are BLOB-based and do not make the content of the
documents accessible.

CMSs, or Content Management Systems, can parse what's inside the BLOBs;
they're XML-aware (explained further below). Because they are XML-based,
not only are the documents accessible, but atomic units of content within
a document can also be made accessible. They can also be universally
accessed via the Internet. CMSs include a wide range of systems,
including product data management (PDM) systems, technical documentation
systems, training systems, collaborative publishing systems, and more.

How can XML help solve the publishing problem?

How do you spell "XML"?

What if there were a way to define and identify the elements of content
independent of the various ways in which it might be published?

That's where XML comes in. XML is a text-based tagging language with some
of the characteristics of the earlier tagging languages, SGML and HTML.
Jeff gave us this breakdown of the alphabet soup:
- SGML (Standard Generalized Markup Language) was originally conceived as
a Rosetta Stone for passing electronic documents between disparate
computing environments. It was invented by a lawyer in the 1960s as a way
to more easily mark up and format legal documents.
- HTML (Hypertext Markup Language) evolved from SGML to facilitate the
sharing of documents in competing Web browsers. It concentrates primarily
on the appearance of information.
- XML (Extensible Markup Language) was conceived with a higher purpose.
The goal is not just to make documents electronically portable but to
make each element of content identifiable, portable, and accessible
wherever it may be stored and however it might be used, independent of
the concept of a page-based document.
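
To make the idea of identifiable elements concrete, here is a minimal
sketch in Python using the standard library's xml.etree.ElementTree
module. The tag names (part, number, name, interval) and the values are
hypothetical, invented for this illustration rather than taken from the
talk. The point is only that a program can ask for a value by what it
means, not by where it sits on a page.

```python
import xml.etree.ElementTree as ET

# Hypothetical content: each tag says what the value IS,
# not how it should look when published.
SOURCE = """
<part>
  <number>HX-204</number>
  <name>Hydraulic filter</name>
  <interval unit="hours">500</interval>
</part>
"""

part = ET.fromstring(SOURCE)

# Because the tags carry meaning, values can be requested by name.
part_number = part.findtext("number")
interval = part.find("interval")

print(part_number)
print(interval.text, interval.get("unit"))
```

The same lookup against a page-oriented format would have to guess which
bold or indented string happens to be the part number.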

How does XML benefit Content Management?

XML makes it possible to manage the elements of content (that is,
descriptors, subject matter, illustrations, data, images, and audio or
video) within unique contextual structures, Jeff told us. Because
technologies now exist for porting this content to various publication
products, you can produce a dizzying array of outputs from a single
source.

And that means...
- When you change the source information, you can rapidly and accurately
deploy that change in each targeted output.
- Having built a content management system, a publisher like a newspaper
could deploy the exact same source for an article to print, to PDF, to
the Web, or to a magazine, without re-keying, re-editing, or
reformatting.
- A manufacturer could deploy parts, service, or training manuals to a
hundred different customers while maintaining the content in a single
source. The content will always be current as of the publication date, no
matter where or in what format it is published. Create once and deliver
many!
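
The "create once and deliver many" idea can be sketched in a few lines of
Python. This is an illustration under stated assumptions, not a
description of any particular content management system. The article
markup and the two renderer functions (to_plain_text and to_html) are
hypothetical names invented for the example.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical single-source article; the tag names are illustrative only.
ARTICLE = """
<article>
  <headline>Plant opens new assembly line</headline>
  <byline>Staff writer</byline>
  <para>The line begins production next month.</para>
  <para>It will build engines &amp; transmissions.</para>
</article>
"""

def to_plain_text(article):
    """Render the article for a text-only channel."""
    lines = [article.findtext("headline").upper(),
             "By " + article.findtext("byline"),
             ""]
    lines += [para.text for para in article.findall("para")]
    return "\n".join(lines)

def to_html(article):
    """Render the same article as an HTML fragment for the Web."""
    parts = ["<h1>%s</h1>" % escape(article.findtext("headline")),
             '<p class="byline">%s</p>' % escape(article.findtext("byline"))]
    parts += ["<p>%s</p>" % escape(para.text)
              for para in article.findall("para")]
    return "\n".join(parts)

article = ET.fromstring(ARTICLE)
print(to_plain_text(article))
print(to_html(article))
```

A correction made once in the source text shows up in both outputs the
next time they are generated, which is the benefit described in the list
above.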

What does XML do that's so special?

Jeff informed us that the key difference between XML and SGML or HTML is
the way in which tags are used. Rather than identify the structural
elements of a document (headings, sub-headings, text, font attributes,
etc.), XML tags identify units of content in the context of a topical
framework. The tags are based on their meaning within that framework, not
on the structure or appearance of the output. Other key points:
- An author or content architect creates that framework, or Document Type
Definition (DTD). A DTD defines, in a sense, a dialect of XML for that
specific category of content. It contains the keys to interpreting the
tags used to identify elements of content and their attributes.
- With XML, we have the means of determining what kind of information a
piece of content is, where it came from, when it was last updated, what
its metrics are, how it should be used, how to validate it, and so on.
The content is now storable, searchable, processable, and pre-qualified.
- XML therefore makes it possible to develop content that is completely
independent of the various publishing products in which it might be used.
XML also can be used with technologies such as SGML or HTML for
determining how to structure the appearance of the output.
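
As a rough illustration of content that is "storable, searchable, and
processable," the sketch below records the kind, origin, and last-update
date of each topic as attributes and then searches on them. The attribute
names (type, source, updated), the sample topics, and the helper
topics_updated_since are all assumptions made for this example.

```python
import xml.etree.ElementTree as ET
from datetime import date

# Hypothetical manual: attributes describe what each topic is,
# where it came from, and when it was last updated.
MANUAL = """
<manual>
  <topic id="t1" type="procedure" source="engineering" updated="2003-01-15">
    <title>Replace the fuel filter</title>
  </topic>
  <topic id="t2" type="warning" source="legal" updated="2003-04-02">
    <title>Hot surfaces</title>
  </topic>
  <topic id="t3" type="procedure" source="engineering" updated="2003-05-20">
    <title>Bleed the fuel system</title>
  </topic>
</manual>
"""

def topics_updated_since(root, cutoff, topic_type=None):
    """Return ids of topics updated on or after cutoff, optionally by type."""
    hits = []
    for topic in root.iter("topic"):
        updated = date.fromisoformat(topic.get("updated"))
        if updated >= cutoff and topic_type in (None, topic.get("type")):
            hits.append(topic.get("id"))
    return hits

root = ET.fromstring(MANUAL)
print(topics_updated_since(root, date(2003, 4, 1)))
print(topics_updated_since(root, date(2003, 4, 1), topic_type="procedure"))
```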

What are its key components?

XML is more than just a tagging language, Jeff emphasized. It is not only
a language in itself, as defined by the W3C (the World Wide Web
Consortium), but also a language for defining new languages or dialects.
It is also a collection of technologies that provide the infrastructure
for content management systems.

In its simplest form, a valid XML document must have two things:
1) a Document Type Definition (DTD), which defines the tags and context
to be used, and
2) the actual content, which uses those tags to mark up elements of
content in the proper syntax and hierarchy.

Jeff elaborated that there are two types of document definitions, DTDs
and schemas.
- A schema, not unlike the schemas used in database development, performs
a function similar to that of a DTD but is itself an XML document. Unlike
DTDs, schemas offer extensive data typing and validation criteria for
content. They are also optimized for interacting with databases and data
structures. Although DTDs are still widely used, schemas are becoming
more and more popular.
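
To show the two parts side by side, here is a small sketch of a document
that carries its DTD inline, followed by the content that uses those
tags. The memo vocabulary is hypothetical. One caveat worth stating
plainly: Python's standard-library parser, used here, reads the DTD but
does not validate against it, so this only demonstrates the layout of a
definition plus content, not enforcement.

```python
import xml.etree.ElementTree as ET

# Part 1 is the DTD (inside <!DOCTYPE ... [ ... ]>), which declares the
# allowed tags and their nesting. Part 2 is the content using those tags.
# The "memo" vocabulary is invented for this illustration.
DOCUMENT = """<?xml version="1.0"?>
<!DOCTYPE memo [
  <!ELEMENT memo (to, from, body)>
  <!ELEMENT to   (#PCDATA)>
  <!ELEMENT from (#PCDATA)>
  <!ELEMENT body (#PCDATA)>
  <!ATTLIST memo priority (low | normal | high) #REQUIRED>
]>
<memo priority="high">
  <to>Service department</to>
  <from>Engineering</from>
  <body>Use the revised torque values.</body>
</memo>
"""

# The standard-library parser is non-validating: it accepts the DTD
# but does not check the content against it.
memo = ET.fromstring(DOCUMENT)
child_tags = [child.tag for child in memo]

print(memo.get("priority"))
print(child_tags)
```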

But wait, these are just ASCII text files. How does XML actually
accomplish its magic? Surely there is more to it!

There are a few more pieces to the XML puzzle!

Yes, there are a lot more! Jeff went on to say that to implement XML, you
need to validate the XML document against its DTD using these two
components:

Parsers and processors can be standalone server-based components or may
be incorporated into XML-aware applications, he explained. They enforce
content structure and syntax, and validate content against the DTD or
schema. They also make the content elements accessible to databases and
other applications that might generate final output.
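
The two jobs described above, enforcing syntax and validating structure,
can be sketched as follows. The syntax check uses Python's built-in
parser. The structure check is a deliberately tiny hand-written stand-in
for real DTD or schema validation, and the CONTENT_MODEL table and the
check function are hypothetical names introduced for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical content model: each listed element must contain exactly
# this sequence of child elements. Elements not listed are not checked.
CONTENT_MODEL = {"memo": ["to", "from", "body"]}

def check(xml_text):
    """Return 'ok', or a short description of the first problem found."""
    # Step 1: syntax. The parser rejects text that is not well-formed.
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as err:
        return "not well-formed: %s" % err
    # Step 2: structure. Compare each element's children with the model.
    for element in root.iter():
        expected = CONTENT_MODEL.get(element.tag)
        if expected is None:
            continue
        found = [child.tag for child in element]
        if found != expected:
            return "invalid <%s>: expected %s, found %s" % (
                element.tag, expected, found)
    return "ok"

print(check("<memo><to>a</to><from>b</from><body>c</body></memo>"))
print(check("<memo><to>a</memo>"))
print(check("<memo><to>a</to><body>c</body></memo>"))
```

A production system would hand the second step to a validating parser
rather than a lookup table, but the division of labor is the same.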

Note that there is no means of directly publishing XML content, and for
good reason, he cautioned. XML was designed to store, manage, and make
content portable and accessible. It was not intended to define a specific
type of output or publication product.

That's the whole point: the separation of content from engineering and
presentation layers. So, to publish XML, its content must be converted,
or transformed, into the appropriate output format of the target
publishing environment.
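
To give a feel for what transforming content means, here is a minimal
sketch in Python. It is a hand-written stand-in, not a real stylesheet
processor: a table of rules maps each source tag to an output template,
and a recursive function applies the rules. The RULES table, the
transform function, and the procedure markup are all hypothetical.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical rules: each source tag maps to an output template.
# {text} is the element's own text; {children} is its transformed children.
RULES = {
    "procedure": "<section>{children}</section>",
    "title": "<h2>{text}</h2>",
    "steps": "<ol>{children}</ol>",
    "step": "<li>{text}</li>",
}

def transform(element):
    """Apply the matching rule to an element and, recursively, its children."""
    # Tags without a rule simply pass their children through.
    template = RULES.get(element.tag, "{children}")
    text = escape((element.text or "").strip())
    children = "".join(transform(child) for child in element)
    return template.format(text=text, children=children)

SOURCE = """
<procedure>
  <title>Change the oil</title>
  <steps>
    <step>Drain the sump.</step>
    <step>Refill &amp; check level.</step>
  </steps>
</procedure>
"""

output = transform(ET.fromstring(SOURCE))
print(output)
```

Swapping in a different rule table would produce a different output
format from the same source, without touching the content itself.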

That's where XSL (Extensible Stylesheet Language) and XSLT (XSL
Transformations) come in, he said.

As you begin to explore XML, Jeff said, you will find dozens if not
hundreds of XML-based syntaxes, document types (XML dialects), and XSLTs
already in existence. These include SVG (Scalable Vector Graphics),
XSL-FO (XSL Formatting Objects), MathML, VoiceXML, WML (Wireless Markup
Language), XQuery, XForms, MusicML, SMIL (Synchronized Multimedia
Integration Language), and XTM (XML Topic Maps), to name a few.

There are a few extended XML technologies worth special mention, he
indicated. XPath, XLink, and XPointer are evolving specifications that
enable XML-aware applications to use intelligent synchronous links and
incorporate dynamic content from multiple sources.
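
Of the three, XPath is the easiest to try out. The sketch below uses the
limited XPath subset supported by Python's standard-library
xml.etree.ElementTree module, so it shows only simple path and predicate
expressions, not the full specification. The catalog markup is
hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical catalog of manuals in several languages.
CATALOG = """
<catalog>
  <manual lang="en"><title>Operator guide</title><part>HX-204</part></manual>
  <manual lang="de"><title>Bedienungsanleitung</title><part>HX-204</part></manual>
  <manual lang="en"><title>Service guide</title><part>HX-310</part></manual>
</catalog>
"""

root = ET.fromstring(CATALOG)

# Select by attribute value: titles of all English manuals.
english_titles = [t.text for t in root.findall("./manual[@lang='en']/title")]

# Select by child content: every manual that covers part HX-204.
manuals_for_part = root.findall(".//manual[part='HX-204']")

print(english_titles)
print([m.get("lang") for m in manuals_for_part])
```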

What tools generate or use XML?

Tools that generate or support XML can be broadly divided into three
categories, according to Jeff: 1) authoring, 2) deploying, and
3) displaying XML content.
- XML creation and authoring: In its simplest form, you can write XML
with a simple text editor and validate it with a standalone parser. You
can also use XML authoring applications such as Arbortext's Epic Editor,
or a variety of others such as XMLSpy. A quick search of the Internet
will produce a very long list of XML development tools.
- XML deployment: A common use of XML is to generate Web pages
dynamically. For example, a Web server like Apache equipped with an XML
parser and processor can use server-side Java servlets to extract data
from a data source, convert it to XML documents, and transform them with
an XSLT into XHTML that is dynamically generated for JSP-based
(JavaServer Pages) Web sites. The same content could also be formatted
using XSL-FO or a PDF processor so that it could be printed. Other XML
processors can convert content from one XML dialect to another, or
convert a format such as e-mail into a wireless data format that can be
delivered directly to a cell phone.
- XML presentation: Although you can't view XML directly except as text
during authoring, you can see it in browsers such as Internet Explorer
6.x and Netscape 7.x, which have built-in parsers and processors to
display the content in real time. XML presentation depends on the
application or tool that is hosting the final output. For example, if a
DTD-compliant XML file is opened in InDesign, and the InDesign file has
been configured with matching tags that correspond to InDesign formatting
styles, the XML content will appear as printable PostScript.
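
The deployment scenario above (extract data, convert it to XML, transform
it for the Web) can be sketched in Python rather than Java. This is a
simplified illustration under stated assumptions: the ROWS list stands in
for the result of a database query, and the functions rows_to_xml and
parts_to_xhtml are hypothetical names, not part of any server product.

```python
import xml.etree.ElementTree as ET

# Hypothetical rows, standing in for the result of a database query.
ROWS = [
    {"number": "HX-204", "name": "Hydraulic filter", "price": "18.50"},
    {"number": "HX-310", "name": "Seal kit", "price": "42.00"},
]

def rows_to_xml(rows):
    """Convert data-source rows into an XML tree of <part> elements."""
    parts = ET.Element("parts")
    for row in rows:
        part = ET.SubElement(parts, "part", number=row["number"])
        ET.SubElement(part, "name").text = row["name"]
        ET.SubElement(part, "price").text = row["price"]
    return parts

def parts_to_xhtml(parts):
    """Transform the <parts> tree into an XHTML table, one row per part."""
    table = ET.Element("table")
    for part in parts.findall("part"):
        tr = ET.SubElement(table, "tr")
        for value in (part.get("number"),
                      part.findtext("name"),
                      part.findtext("price")):
            ET.SubElement(tr, "td").text = value
    return ET.tostring(table, encoding="unicode")

parts = rows_to_xml(ROWS)
xml_text = ET.tostring(parts, encoding="unicode")
xhtml = parts_to_xhtml(parts)
print(xml_text)
print(xhtml)
```

The intermediate XML is the reusable asset here. The table is just one
of the outputs that could be generated from it.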

How can you start using XML?

Jeff clarified that to reap the big productivity benefits from using XML
with content management, you would most likely be working in a large
organization that is creating and managing thousands or millions of pages
versus tens or hundreds. These organizations put in a lot of planning,
systems design, development, and testing to build a full-scale
implementation from scratch.

However, XML is text-based and readily accessible to individuals at
standalone workstations using nothing more than a text editor! In the
simplest terms, he explained that the overall process might be to:
1) Define the architecture: Identify elements of your content, their
attributes, and their hierarchical relationships.
2) Define the syntax: Create a DTD or schema that encodes your content
architecture.
3) Author the content: Create the actual XML documents using the
specified syntax for their respective document definitions. (Caution:
Unless you code directly in XML or use an XML authoring tool, such as
Arbortext's Epic Editor, you might not get the results you expect.
XML-aware applications such as MS Word, Adobe FrameMaker, Adobe
Illustrator, and Adobe InDesign can export XML documents with varying
degrees of success. The XML-aware tools may use proprietary DTDs or
schemas that add application-specific elements. These can lead to errors
when another application tries to read the output.)
4) Create transforms: Identify and create the XSLTs for your target
publishing vehicles.
5) Establish the processing environment: Define and implement an
XML-enabled application environment for deployment.
6) Test the system: Simulate the content publication.
7) Deploy the content: Become a content-managed publisher!

Conclusion

Jeff presented the compelling message that XML is an extremely powerful
and essential ingredient for solving large-scale content management
problems. The concepts and implementations of XML technologies offer an
intriguing glimpse into the theories and structure of information and the
mechanics of managing timely, cost-effective communication products on a
grand scale. It is abundantly clear that XML is an acronym that will find
its way into common usage in any discussion involving technical
communication. It deserves the full attention of technical communication
professionals.

All in all, XML is shaping up to become the worldwide standard for
content management and portability. For a complete reference on current
and evolving standards and specifications for XML-based markup languages,
see the W3C at www.w3.org/XML/.

A few sources for more information