[SCL] XML question
pat hayes
phayes at ihmc.us
Thu Jan 22 15:06:19 CST 2004
Does anyone have a good answer to the following question? It's really
about the design principles of XML.
In writing the SCL core syntax, I had in mind that it ought to be
possible to include chunks of SCL core inside XML documents without
the XML barfing. Since XML reserves the characters '<' and '&' and
uses ' " ' for quoting, my first instinct was to simply ban these
characters from appearing anywhere in the SCL core syntax: then one
can take any piece of SCL core text, stick double-quotes on either
side of it, and plonk it down as for example an attribute value in
XHTML, and nothing breaks (this would be neat for example when
attaching SCL as markup to a web page, since the SCL would be
invisible in the HTML but visible to processors).
But what about an SCL string which might contain any character? Well,
XML allows one to include any character in *parsed* character data by
escaping the bad characters using entity references. That handles
going from SCL-in-XML back into SCL. But how about going from SCL
into SCL-in-XML? The use of the XML escaping seems to require that
any software which creates XML - for example, something which wants
to transmit some SCL text between SCL engines using SCL-in-XML -
must perform a kind of XML-unparsing step to replace every occurrence
of '<' or '&' by the entity reference.
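To make the "XML-unparsing" step concrete, here is a minimal sketch in Python using the standard library's xml.sax.saxutils helpers. The SCL-like string and the two function names are assumptions made up for illustration, not part of any SCL specification; the point is only that the escape and unescape steps are mechanical inverses of each other.

```python
from xml.sax.saxutils import escape, unescape

# Hypothetical SCL-like text; the exact syntax is an assumption for illustration.
scl_text = '(forall (x) (if (< x 3) (= (name x) "a&b")))'

# Extra mappings so the result is also safe inside a double-quoted attribute.
QUOTE_OUT = {'"': "&quot;"}
QUOTE_IN = {"&quot;": '"'}


def scl_to_xml_text(s: str) -> str:
    """Replace '&', '<', '>' and '"' with XML entity references."""
    return escape(s, QUOTE_OUT)


def xml_text_to_scl(s: str) -> str:
    """Reverse scl_to_xml_text, restoring the original characters."""
    return unescape(s, QUOTE_IN)


encoded = scl_to_xml_text(scl_text)
decoded = xml_text_to_scl(encoded)

print(encoded)
print(decoded == scl_text)
```

The encoded form contains no '<', no bare '&', and no double-quote, so it could be placed inside a quoted attribute value, and decoding it gives back the original string.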
My question boils down to this. Do I *need* to keep the surface
syntax of SCL Core "XML-safe" in the sense that it is guaranteed to
simply never contain the characters less-than or ampersand (or
double-quote, in fact)? This can be done, but is a pain, and
requires SCL to have its own character-escaping conventions different
from XML's conventions. (It can't use the XML conventions, since then
XML itself will alter the SCL string encodings.)
Or is this being inappropriately fussy, since XML tools are already
capable of handling text which is not "XML-safe" in this way, and
automatically doing the transformations to and from the XML-escaped
forms? In which case I should just ignore XML's character
restrictions when thinking about the SCL syntax itself, and rely on
generic XML tools and conventions to faithfully handle the parsing
and coding in and out of the XML syntax.
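As one data point for this second option, the sketch below uses Python's xml.etree.ElementTree to write a string containing '<', '&' and '"' into both an attribute value and element content, then reads it back. The element name "scl", the attribute name "expr", and the SCL-like string are hypothetical. It suggests, for this one library at least, that the escaping is handled by the tool rather than by the caller.

```python
import xml.etree.ElementTree as ET

# Hypothetical SCL-like text containing all three troublesome characters.
scl_text = '(if (< x 3) (= (name x) "a&b"))'

# Element and attribute names are assumptions for illustration only.
elem = ET.Element("scl", {"expr": scl_text})
elem.text = scl_text

# Serializing escapes the reserved characters automatically.
xml_doc = ET.tostring(elem, encoding="unicode")

# Parsing reverses the escaping and returns the original characters.
parsed = ET.fromstring(xml_doc)
attr_back = parsed.get("expr")
text_back = parsed.text

print(xml_doc)
print(attr_back == scl_text and text_back == scl_text)
```

One caveat worth checking for any real design: XML parsers normalize whitespace in attribute values (literal tabs and newlines become spaces), so SCL text with significant line breaks is probably safer as element content than as an attribute.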
Or, should I use the CDATA feature of XML? This seems to have been
designed for cases like this, but I have the sense that CDATA is
rarely used in XML-based conventions, and wonder if there is any good
reason why not. It rather worries me that an XML processor is
apparently allowed to remove all traces of whether a piece of text
was originally in a CDATA section or not. I would like XML to
transmit any SCL-in-XML faithfully, and if XML parsers may remove
some of the critical encoding information then this seems to
introduce some fragility into the transaction.
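The sketch below, again using Python's xml.etree.ElementTree with a hypothetical "scl" element, illustrates the behaviour described here: a CDATA section and the equivalent entity-escaped text parse to the same character data, and the CDATA wrapper does not survive re-serialization. What is lost is only the choice of encoding, not any of the characters. This shows one parser's behaviour and is not a statement about every XML tool.

```python
import xml.etree.ElementTree as ET

# Hypothetical SCL-like text; the syntax is an assumption for illustration.
scl_text = '(if (< x 3) (= (name x) "a&b"))'

# The same content written two ways: inside CDATA, and entity-escaped.
as_cdata = "<scl><![CDATA[" + scl_text + "]]></scl>"
as_escaped = '<scl>(if (&lt; x 3) (= (name x) "a&amp;b"))</scl>'

from_cdata = ET.fromstring(as_cdata).text
from_escaped = ET.fromstring(as_escaped).text

# Re-serializing the CDATA version produces entity references instead.
reserialized = ET.tostring(ET.fromstring(as_cdata), encoding="unicode")

print(from_cdata == from_escaped == scl_text)
print(reserialized)
```

Note that CDATA is not a complete escape mechanism either: the sequence "]]>" cannot appear inside a CDATA section, so text containing it would still need special handling.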
I'm sure that the XML community has come to an agreement on a suitable
best practice to follow in a case like this, and would appreciate
any guidance or input.
Pat
--
---------------------------------------------------------------------
IHMC (850)434 8903 or (650)494 3973 home
40 South Alcaniz St. (850)202 4416 office
Pensacola (850)202 4440 fax
FL 32501 (850)291 0667 cell
phayes at ihmc.us http://www.ihmc.us/users/phayes