Speaking to a rapt audience, Jeff, a content management expert (see
speaker details below), drilled deeply into the current challenges many
companies face when developing, managing, publishing, and delivering
large quantities of information.
These companies produce multiple variations of highly dynamic content
for cross-media publication. This combination makes staying on top of
the revision cycles that much more frenzied.
Offering insight into these information dilemmas, Jeff discussed and demonstrated
how XML and content management, two key tools in the arsenal for
managing information, help make today's demanding publishing challenges
manageable.
Who's facing challenges?
Several large corporations in various industries, including aerospace,
industrial manufacturing, medical records management, and banking. To
give an idea of the scope of their publishing challenges, manufacturing
companies such as Caterpillar and John Deere manage:
of pages of documentation
- in several
- with hundreds of new pages added every day.
What happens in traditional publication processes?
Traditional publication is output-centric, Jeff explained. That is, we typically
create content with a specific end product in mind. For example, we
might write a training manual from scratch for a specific target audience
and format it for a specific page layout and printing process.
If that content (all or part) needs to be used elsewhere, say, as an audiovisual
module or for a different target audience, it must then be extracted
from the original publication and rewritten and reformatted for the new
use.
How complicated can this become? Imagine that when a change occurs:
- It must be replicated separately in each publication product that evolved
from the original, such as the Web version, Help version, and CD-ROM version.
- Each version typically has its own revision cycle, timeframe, and implementors.
As the number of end-product variations grows, the problem of
managing, updating, and republishing accurate content grows even faster.
That's why the process is often costly, slow, and error-prone.
What is content management?
Content management is a mission-critical capability for a broad spectrum
of publishers, from newspapers to manufacturers to research
institutions to government agencies, Jeff told us. In general, any
publisher or service organization with high-volume, data-driven, or
otherwise complex publishing requirements is either already using
some form of content management or considering doing so. He highlighted
some of the flavors of content management, as follows:
- File and record management represents a more traditional data
technology for managing and tracking electronic files. Such systems
rely on centralized databases to store and make accessible information
about the location and contents of files, but do not make the contents
themselves directly accessible. Jeff referred to these and DMSs as BLOB
(binary large object) management systems. Although they may be networked,
they typically operate on private networks, and access to the files
usually involves a proprietary application. They typically do not
provide any specific mechanism for publishing.
- DMSs, or Document Management Systems, are essentially file management
systems that are publication oriented. Rather than tracking database
records or electronic files in general, they track, make accessible,
and may even print and distribute page-based content. Like file
management, however, they are BLOB-based
and do not make the content of the documents accessible.
- CMSs, or Content Management Systems, can parse what's inside
the BLOBs; they're XML-aware
(explained further below). Because they are XML-based, not only are
the documents accessible, but atomic units of content within a document
can also be made accessible. They can also be universally accessed
via the Internet. CMSs include a wide range of systems, including
product data management (PDM) systems, technical documentation systems,
training systems, collaborative publishing systems, and more.
How can XML help solve the publishing problem?
How do you spell "XML"?
What if there were a way to define and identify the elements of content
independent of the various ways in which it might be published?
That's where XML comes in. XML is a text-based tagging language
with some of the characteristics of the earlier tagging languages, SGML
and HTML. Jeff gave us this breakdown of the alphabet soup:
- SGML (Standard Generalized Markup Language) was originally conceived as
a means of passing electronic documents between disparate computing environments.
It was invented by a lawyer in the 1960s as a way to more easily
mark up and format legal documents.
- HTML (Hypertext Markup Language) evolved from SGML to facilitate the sharing
of documents in competing Web browsers. It concentrates primarily
on the appearance of information.
- XML (Extensible Markup Language) was conceived with a higher purpose. The
goal is not just to make documents electronically portable but to
make each element of content identifiable, portable, and accessible
wherever it may be stored and however it might be used,
independent of the concept of a page-based document.
How does XML benefit content management?
XML makes it possible to manage the elements of content (that is, descriptors,
subject matter, illustrations, data, images, and audio or video)
within unique contextual structures, Jeff told us. Because technologies
now exist for porting this content to various publication products,
you can produce a dizzying array of outputs from a single source.
When you change the source information, you can rapidly and accurately
deploy that change in each targeted output.
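
The single-source idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not anything shown in the talk: the `article`, `headline`, and `body` tags and the two renderer functions are hypothetical, and a production system would more likely use XSLT than hand-written functions.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical source article; the tag names are invented for illustration.
SOURCE = (
    "<article>"
    "<headline>Plant expands</headline>"
    "<body>The plant adds a second line.</body>"
    "</article>"
)

def to_html(article):
    """Render the article as a small HTML fragment for the Web."""
    headline = escape(article.findtext("headline"))
    body = escape(article.findtext("body"))
    return f"<h1>{headline}</h1><p>{body}</p>"

def to_text(article):
    """Render the same article as plain text, for example for e-mail."""
    return f'{article.findtext("headline").upper()}\n{article.findtext("body")}'

article = ET.fromstring(SOURCE)
print(to_html(article))
print(to_text(article))

# Change the source once; both outputs pick up the change when re-rendered.
article.find("headline").text = "Plant expands again"
print(to_html(article))
print(to_text(article))
```

Neither renderer stores its own copy of the wording, so a correction made in the source never has to be repeated per output.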
- Having built a content management system, a publisher like a newspaper could
deploy the exact same source for an article to print, to
PDF, to the Web, or to a magazine, without re-keying or re-editing.
- A manufacturer
could deploy parts, service, or training manuals to a hundred
different customers while maintaining the content from a single
source. The content will always be current as of the publication date
no matter where or in what format it is published. Create once and
What does XML do that's so special?
Jeff told us that the key difference between XML and SGML or HTML is the
way in which tags are used. Rather than identify the structural elements
of a document (headings, sub-headings, text, font attributes,
etc.), XML tags identify units of content in the context
of a topical framework. The tags are based on their meaning
within that framework, not on the structure or appearance of the output.
Other key points:
- An author
or content architect creates that framework, or Document Type Definition
(DTD). A DTD defines, in a sense, a dialect of XML for that specific
category of content. It contains the keys to interpreting the tags
used to identify elements of content and their attributes.
- With XML, we have the means of determining what kind of information
the content is, where it came from, when it was last updated, what its
metrics are, how it should be used, how to validate it, and so on.
The content is now storable, searchable, processable, and pre-qualified.
- XML therefore makes it possible to develop content that is completely
independent of the various publishing products in which it might
be used. XML also can be used with technologies such as SGML or HTML
for determining how to structure the appearance of the output.
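
To make the point about meaning-based tags concrete, here is a minimal Python sketch. The `manual`, `part`, `name`, and `torque` tags, the `id`, `updated`, and `unit` attributes, and all sample values are hypothetical. Because the tags say what the content is and carry metadata such as a last-updated date, a program can select content by meaning rather than by page position.

```python
import xml.etree.ElementTree as ET

# Hypothetical parts-manual fragment. Tags name what the content is
# (a part, a torque value), not how it should look on a page.
MANUAL = """<manual>
  <part id="P-100" updated="2002-03-01">
    <name>Fuel filter</name>
    <torque unit="Nm">25</torque>
  </part>
  <part id="P-200" updated="2002-05-15">
    <name>Oil pump</name>
    <torque unit="Nm">40</torque>
  </part>
</manual>"""

root = ET.fromstring(MANUAL)

def parts_updated_since(root, date):
    """Return names of parts whose 'updated' attribute is on or after date.

    ISO dates (YYYY-MM-DD) compare correctly as plain strings.
    """
    return [
        part.findtext("name")
        for part in root.findall("part")
        if part.get("updated") >= date
    ]

print(parts_updated_since(root, "2002-04-01"))

# Look up one unit of content by its identifier, together with its metadata.
torque = root.find("part[@id='P-100']/torque")
print(torque.text, torque.get("unit"))
```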
What are its key components?
XML is more than just a tagging language, Jeff emphasized.
It is not only a language in itself, as defined by the W3C (the World
Wide Web Consortium), but also a language for defining new
languages or dialects. And it is a collection of technologies
that provide the infrastructure for content management systems.
In its simplest form, an XML document must have two things:
- a Document Type Definition (DTD), which defines the tags and context
to be used, and
- the actual content that uses those tags to assign elements of
content in the proper syntax and hierarchy.
Jeff elaborated that there are two types of document definitions, DTDs
and schemas. A schema, not unlike the schemas used in database development, performs
a function similar to a DTD's but is itself an XML document. Unlike DTDs, schemas
offer extensive data typing and validation criteria for content.
They are also optimized for interacting with databases and data
structures. Although DTDs are still widely used, schemas are becoming
more and more popular.
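
As a rough illustration of the two parts, the Python sketch below holds a DTD (written as an internal subset) and the content that uses its tags in one string. The `memo` vocabulary is hypothetical. Note the assumption in the comments: Python's standard `xml.etree.ElementTree` reads the content but does not validate it against the DTD, so a validating parser would be needed for that step.

```python
import xml.etree.ElementTree as ET

# Part 1: a DTD, here as an internal subset, declaring the allowed tags.
# Part 2: the content that uses those tags.
# The "memo" vocabulary is hypothetical.
DOCUMENT = """<!DOCTYPE memo [
  <!ELEMENT memo (to, body)>
  <!ELEMENT to (#PCDATA)>
  <!ELEMENT body (#PCDATA)>
]>
<memo>
  <to>Service team</to>
  <body>New torque values take effect Monday.</body>
</memo>"""

# ElementTree parses the content and skips over the declarations;
# it does NOT check the content against the DTD.
memo = ET.fromstring(DOCUMENT)
print(memo.tag)
print([child.tag for child in memo])
print(memo.findtext("to"))
```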
But wait, these are just ASCII text files. How does XML actually
accomplish its magic? Surely
there is more to it!
There are a few more pieces to the XML puzzle!
There are a lot more! Jeff went on to say that to implement
XML, you need to validate the XML document against its DTD using
two components: a parser and a processor.
Parsers and processors can be standalone server-based components or may
be incorporated into XML-aware applications, he explained. They
enforce content structure and syntax, and validate content
against the DTD or schema. They also make the content elements accessible
to databases and other applications that might generate final output.
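
The two checks a parser performs, syntax and structure, can be sketched as follows. This is a simplified stand-in under stated assumptions: the `check` function and its `required_children` parameter are hypothetical, the syntax check relies on the standard-library parser rejecting text that is not well-formed, and the structure check is hand-written rather than driven by a real DTD or schema.

```python
import xml.etree.ElementTree as ET

def check(xml_text, required_children):
    """Parse xml_text, then confirm the root has the required child tags in order.

    The second step is a hand-written stand-in for DTD or schema validation,
    not a real validator.
    """
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as err:
        # Syntax problem: for example, a mismatched or unclosed tag.
        return f"not well-formed: {err}"
    found = [child.tag for child in root]
    if found != required_children:
        return f"invalid structure: expected {required_children}, found {found}"
    return "ok"

print(check("<memo><to>A</to><body>B</body></memo>", ["to", "body"]))
print(check("<memo><to>A</body></memo>", ["to", "body"]))
print(check("<memo><body>B</body></memo>", ["to", "body"]))
```

Only content that passes both checks would be handed on to the applications that generate final output.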
Note that there is no means of directly publishing XML content,
and for good reason, he cautioned. XML was designed to store,
manage, and make content portable and accessible. It was not intended
to define a specific type of output or publication product.
That's the whole point: the separation of content from engineering
layers. So, to publish XML, its content must be converted, or
transformed, into the output format appropriate to the target.
That's where XSL (Extensible Stylesheet Language) and XSLT (XSL
Transformations) come in, he said.
As you begin to explore XML, Jeff said, you will find dozens if not hundreds
of XML-based syntaxes, document types (XML dialects), and XSLTs
already in existence. These include SVG (Scalable Vector Graphics),
XSL-FO (XSL Formatting Objects), MathML, VoiceXML, WML (Wireless Markup
Language), XQuery, XForms,
MusicML, SMIL (Synchronized Multimedia Integration Language), and XTM
(XML Topic Maps), to name a few.
There are a few extended XML technologies worth special mention, he indicated.
XPath, XLink, and XPointer are evolving specifications that
enable XML-aware applications to use intelligent synchronous links
and incorporate dynamic content from multiple sources.
What tools generate or use XML?
Tools that generate or support XML can be broadly divided into three
categories: 1) authoring, 2) deploying, and 3) displaying
XML content, according to Jeff.
- XML creation and authoring: In its simplest form, you can write
XML with a simple text editor and validate it with a standalone
parser. You can also use XML authoring applications such as
Arbortext's Epic Editor, or a variety of others such as
XMLSpy. A quick search of the Internet will produce a
very long list of XML development tools.
- XML deployment: A common use of XML is to generate Web pages
dynamically. One scenario: A Web server like
Apache equipped with an XML parser and processor can use server-side
Java servlets to extract data from a data source, convert it
to XML documents, and transform it using an XSLT into XHTML
that is dynamically generated for JSP-based (JavaServer Pages)
Web sites. The same content could also be formatted using XSL-FO
or a PDF processor so that it could be printed. Other XML processors
can convert content from one XML dialect to another, or convert
a format such as e-mail into a wireless data format that can
be delivered directly to a cell phone.
- XML presentation: Although you can't view XML directly except
as text during authoring, you can see it in browsers such as
Internet Explorer 6.x and Netscape 7.x, which have built-in
parsers and processors to display the content in real time.
XML presentation depends on the application or tool that is
hosting the final output. For example, if a DTD-compliant XML
file is opened in InDesign, and the InDesign file has been configured
with matching tags that correspond to InDesign formatting styles,
the XML content will appear as printable PostScript.
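
The deployment scenario above (extract data, convert it to XML, transform it for the Web) can be sketched in Python. Everything here is an assumption made for illustration: the `rows` list stands in for a database query result, the `parts` vocabulary is invented, and the second function is a plain-Python stand-in for the XSLT step, since the standard library has no XSLT processor.

```python
import xml.etree.ElementTree as ET

# Hypothetical rows, standing in for the result of a database query.
rows = [
    {"id": "P-100", "name": "Fuel filter"},
    {"id": "P-200", "name": "Oil pump"},
]

def rows_to_xml(rows):
    """Build a <parts> document with one <part> element per row."""
    root = ET.Element("parts")
    for row in rows:
        part = ET.SubElement(root, "part", id=row["id"])
        ET.SubElement(part, "name").text = row["name"]
    return root

def xml_to_xhtml_list(root):
    """Turn the <parts> document into a <ul> list (stand-in for an XSLT)."""
    ul = ET.Element("ul")
    for part in root.findall("part"):
        li = ET.SubElement(ul, "li")
        li.text = f'{part.get("id")}: {part.findtext("name")}'
    return ul

parts = rows_to_xml(rows)
print(ET.tostring(parts, encoding="unicode"))
print(ET.tostring(xml_to_xhtml_list(parts), encoding="unicode"))
```

Keeping the XML step in the middle means a second transform, for print or for a wireless format, could reuse the same `parts` document without touching the data source again.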
How can you start using XML?
Jeff clarified that to reap the big productivity benefits from using
XML with content management, you would most likely be working in a
large organization that is creating and managing thousands
or millions of pages versus tens or hundreds. These organizations
put in a lot of planning, systems design, development, and testing
to build a full-scale implementation from scratch.
But XML is text-based and readily accessible to individuals at standalone
workstations using nothing more than a text editor! In the simplest
terms, he explained, the overall process might be to:
1) Define the architecture: Identify elements of your content,
their attributes, and their hierarchical relationships.
2) Define the syntax: Create a DTD or schema that encodes your
architecture.
3) Author the content: Create the actual XML documents using the
specified syntax for their respective document definitions.
(Caution: Unless you code directly in XML or use
an XML authoring tool, such as Arbortext's Epic Editor, you
might not get the results you expect. XML-aware applications such
as MS Word, Adobe FrameMaker, Adobe Illustrator, and Adobe InDesign
can export XML documents with varying degrees of success. The XML-aware
tools may use proprietary DTDs or schemas that add application-specific
elements. These can lead to errors when another application tries
to read the output.)
4) Create transforms: Identify and create the XSLTs for your target
outputs.
5) Establish the processing environment: Define and implement an
XML-enabled application environment for deployment.
6) Test the system: Simulate the content publication.
7) Deploy the content: Become a content-managed publisher!
Jeff presented the compelling message that XML is an extremely powerful
and essential ingredient for solving large scale content management
problems. The concepts and implementations of XML technologies offer
an intriguing glimpse into the theories and structure of information
and the mechanics of managing timely cost-effective communication
products on a grand scale. It is abundantly clear that XML is an acronym
that will find its way into common usage in any discussion involving
technical communication. It deserves the full attention of technical
communicators.
All in all, XML is shaping up to become the worldwide standard
for content management and portability. For a complete reference on
current and evolving standards and specifications for XML-based markup
languages, see the W3C at www.w3.org/XML/.
A few sources for more information