[SCL] XML question

pat hayes phayes at ihmc.us
Thu Jan 22 15:06:19 CST 2004


Does anyone have a good answer to the following question? It's really 
about the design principles of XML.

In writing the SCL core syntax, I had in mind that it ought to be 
possible to include chunks of SCL core inside XML documents without 
the XML barfing. Since XML reserves the characters '<' and '&' and 
uses ' " ' for quoting, my first instinct was to simply ban these 
characters from appearing anywhere in the SCL core syntax: then one 
can take any piece of SCL core text, stick double-quotes on either 
side of it, and plonk it down as for example an attribute value in 
XHTML, and nothing breaks (this would be neat for example when 
attaching SCL as markup to a web page, since the SCL would be 
invisible in the HTML but visible to processors).

But what about an SCL string which might contain any character? Well, 
XML allows one to include any character in *parsed* character data by 
escaping the bad characters using entity references. That handles 
going from SCL-in-XML back into SCL.  But how about going from SCL 
into SCL-in-XML? The use of the XML escaping seems to require that 
any software which creates XML - for example, something which wants 
to transmit some SCL text between SCL engines using SCL-in-XML - 
must perform a kind of XML-unparsing step to replace every occurrence 
of '<' or '&' by the entity reference.
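
To make the two directions concrete, here is a minimal Python sketch 
using the standard library's xml.sax.saxutils helpers. The SCL 
fragment is a made-up example, not taken from the SCL core 
specification.

```python
from xml.sax.saxutils import escape, unescape

# Hypothetical SCL fragment whose string contains XML-reserved characters.
scl_text = '(= (name x) "a < b & c")'

# SCL -> SCL-in-XML: replace '&', '<' and '>' by entity references.
escaped = escape(scl_text)

# SCL-in-XML -> SCL: the substitution an XML parser performs on
# parsed character data when it reads the text back.
restored = unescape(escaped)

print(escaped)
print(restored == scl_text)
```

Note that escape() leaves double-quotes alone by default; for an 
attribute value, the same module's quoteattr() chooses a quoting 
style and escapes quotes where it has to.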

My question boils down to this.  Do I *need* to keep the surface 
syntax of SCL Core "XML-safe" in the sense that it is guaranteed to 
simply never contain the characters less-than or ampersand (or 
double-quote, in fact)? This can be done, but is a pain, and 
requires SCL to have its own character-escaping conventions different 
from XML's conventions. (It can't use the XML conventions, since then 
XML itself will alter the SCL string encodings.)

Or is this being inappropriately fussy, since XML tools are already 
capable of handling text which is not "XML-safe" in this way, and 
automatically doing the transformations to and from the XML-escaped 
forms? In which case I should just ignore XML's character 
restrictions when thinking about the SCL syntax itself, and rely on 
generic XML tools and conventions to faithfully handle the parsing 
and coding in and out of the XML syntax.
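
As a rough illustration of this second option, the sketch below lets 
a generic XML library (Python's xml.etree.ElementTree) do the escaping 
on output and the unescaping on input. The element name "scl" and the 
attribute name "text" are hypothetical, chosen only for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical SCL fragment that is not "XML-safe".
scl_text = '(= (name x) "a < b & c")'

# Build an element carrying the SCL both as an attribute value and as
# element content; no manual escaping is done here.
elem = ET.Element("scl")
elem.set("text", scl_text)
elem.text = scl_text

# The serializer inserts the entity references.
serialized = ET.tostring(elem, encoding="unicode")

# The parser removes them again on the way back in.
parsed = ET.fromstring(serialized)

print(serialized)
print(parsed.get("text") == scl_text, parsed.text == scl_text)
```

In this sketch the SCL text survives the round trip unchanged in both 
positions, which suggests the SCL syntax itself need not avoid the 
reserved characters as long as the XML is always written and read by 
such a tool rather than assembled by string concatenation.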

Or, should I use the CDATA feature of XML? This seems to have been 
designed for cases like this, but I have the sense that CDATA is 
rarely used in XML-based conventions, and wonder if there is any good 
reason why not. It rather worries me that an XML processor is 
apparently allowed to remove all traces of whether a piece of text 
was originally in a CDATA section or not. I would like XML to 
transmit any SCL-in-XML faithfully, and if XML parsers may remove 
some of the critical encoding information then this seems to 
introduce some fragility into the transaction.
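
The sketch below, again using Python's xml.etree.ElementTree and a 
hypothetical "scl" element, shows what is and is not lost: the parser 
reports the same character data whether the text arrived in a CDATA 
section or in entity-escaped form, so only the choice of encoding is 
forgotten, not the SCL text.

```python
import xml.etree.ElementTree as ET

# The same hypothetical SCL fragment, encoded two ways.
cdata_form = '<scl><![CDATA[(= (name x) "a < b & c")]]></scl>'
escaped_form = '<scl>(= (name x) "a &lt; b &amp; c")</scl>'

# After parsing, both yield identical character data.
text_from_cdata = ET.fromstring(cdata_form).text
text_from_escaped = ET.fromstring(escaped_form).text

print(text_from_cdata)
print(text_from_cdata == text_from_escaped)
```

One caveat with CDATA is that the section cannot contain the sequence 
"]]>", so text that may contain arbitrary characters still needs some 
handling for that case.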

I'm sure that the XML community has come to an agreement on a suitable 
best practice to follow in a case like this, and would appreciate 
any guidance or input.

Pat



-- 
---------------------------------------------------------------------
IHMC	(850)434 8903 or (650)494 3973   home
40 South Alcaniz St.	(850)202 4416   office
Pensacola			(850)202 4440   fax
FL 32501			(850)291 0667    cell
phayes at ihmc.us       http://www.ihmc.us/users/phayes



