[SCL] XML question

pat hayes phayes at ihmc.us
Thu Jan 22 19:05:08 CST 2004


>Showing my ignorance of XML, is there not an escape character inside 
>quoted strings that can be used to have XML reserved characters 
>ignored?

Not really. XML is absolute that XML text cannot contain '<' or '&' 
AT ALL except as part of XML markup, or in a special 'CDATA' section 
(which I don't trust, since it can get eliminated by the first XML 
parser). I can see that this is kind of inevitable in any 
notation that uses characters to encode both text and markup.
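
To make the CDATA worry concrete, here is a minimal Python sketch using the 
standard library's xml.etree.ElementTree. The <scl> element name and the 
SCL-like text are made up for illustration and are not meant to be valid 
SCL. It only shows how one common parser behaves, not what every XML 
processor must do.

```python
import xml.etree.ElementTree as ET

# Two hypothetical documents carrying the same SCL-like text, once inside
# a CDATA section and once using entity references.
with_cdata = '<scl><![CDATA[(implies (< x y) (& p q))]]></scl>'
with_entities = '<scl>(implies (&lt; x y) (&amp; p q))</scl>'

text_a = ET.fromstring(with_cdata).text
text_b = ET.fromstring(with_entities).text

# After parsing, the two forms are indistinguishable.
print(text_a == text_b)

# Re-serializing the CDATA version emits entity references; the CDATA
# wrapper itself is not preserved.
reserialized = ET.tostring(ET.fromstring(with_cdata), encoding="unicode")
print(reserialized)
```

The character content survives intact; what is lost is only the record 
that a CDATA section was used.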

>  If so, is not that the easier route to go rather than slaving SCL 
>to what can go in an XML string?

Well, I don't like that either, but I think it's better to swallow a 
little slavery in order to get the benefits of cooperation. A bit 
like marriage, now I come to think of it.

Pat

>On Jan 22, 2004, at 4:06 PM, pat hayes wrote:
>
>>Does anyone have a good answer to the following question? It's 
>>really about the design principles of XML.
>>
>>In writing the SCL core syntax, I had in mind that it ought to be 
>>possible to include chunks of SCL core inside XML documents without 
>>the XML barfing. Since XML reserves the characters '<' and '&' and 
>>uses ' " ' for quoting, my first instinct was to simply ban these 
>>characters from appearing anywhere in the SCL core syntax: then one 
>>can take any piece of SCL core text, stick double-quotes on either 
>>side of it, and plonk it down as for example an attribute value in 
>>XHTML, and nothing breaks (this would be neat, for example, when 
>>attaching SCL as markup to a web page, since the SCL would be 
>>invisible in the HTML but visible to processors).
>>
>>But what about an SCL string which might contain any character? 
>>Well, XML allows one to include any character in *parsed* character 
>>data by escaping the bad characters using entity references. That 
>>handles going from SCL-in-XML back into SCL.  But how about going 
>>from SCL into SCL-in-XML? The use of the XML escaping seems to 
>>require that any software which creates XML - for example, 
>>something which wants to transmit some SCL text between SCL engines 
>>using SCL-in-XML - must perform a kind of XML-unparsing step to 
>>replace every occurrence of '<' or '&' by the entity reference.
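
The "XML-unparsing" step described above can be sketched in Python with the 
standard library's xml.sax.saxutils. The function names, the <span> 
element, its title attribute, and the sample string are all hypothetical, 
chosen only for illustration.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, quoteattr, unescape


def scl_to_xml_text(scl: str) -> str:
    """Escape SCL text for use as XML character data (&, < and > are replaced)."""
    return escape(scl)


def scl_to_xml_attr(scl: str) -> str:
    """Escape SCL text and wrap it in quotes for use as an attribute value."""
    return quoteattr(scl)


def xml_text_to_scl(xml_text: str) -> str:
    """Reverse scl_to_xml_text, as an XML parser would when reading the text."""
    return unescape(xml_text)


# Hypothetical SCL-like string containing all three troublesome characters.
sample = '(= name "AT&T < IBM")'

escaped = scl_to_xml_text(sample)
restored = xml_text_to_scl(escaped)

# Check the attribute form by letting a real parser read it back.
element = ET.fromstring("<span title=%s/>" % scl_to_xml_attr(sample))
attr_restored = element.get("title")

print(escaped)
print(restored == sample, attr_restored == sample)
```

Note that quoteattr chooses the quote character itself, so the caller 
should not add quotes of its own around the result.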
>>
>>My question boils down to this.  Do I *need* to keep the surface 
>>syntax of SCL Core "XML-safe" in the sense that it is guaranteed to 
>>simply never contain the characters less-than or ampersand (or 
>>double-quote, in fact)? This can be done, but is a pain, and 
>>requires SCL to have its own character-escaping conventions 
>>different from XML's conventions. (It can't use the XML 
>>conventions, since then XML itself will alter the SCL string 
>>encodings.)
>>
>>Or is this being inappropriately fussy, since XML tools are already 
>>capable of handling text which is not "XML-safe" in this way, and 
>>automatically doing the transformations to and from the XML-escaped 
>>forms? In which case I should just ignore XML's character 
>>restrictions when thinking about the SCL syntax itself, and rely on 
>>generic XML tools and conventions to faithfully handle the parsing 
>>and coding in and out of the XML syntax.
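
As a rough check on this second option, the following Python sketch builds 
XML through an API (the standard library's xml.etree.ElementTree) instead 
of pasting strings together, and lets the serializer and parser do the 
escaping in both directions. The <div> element, the scl attribute name, and 
the SCL-like text are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical SCL-like text containing '<', '&' and double quotes.
scl = '(forall (x) (implies (< x 0) (= (sign x) "neg & small")))'

# Put the raw text into both an attribute and the element content; no
# manual escaping is done here.
elem = ET.Element("div", {"scl": scl})
elem.text = scl

# The serializer applies the entity escaping on the way out.
xml_out = ET.tostring(elem, encoding="unicode")

# The parser reverses the escaping on the way back in.
parsed = ET.fromstring(xml_out)
round_trip_ok = parsed.get("scl") == scl and parsed.text == scl

print(xml_out)
print(round_trip_ok)
```

This only works when every producer goes through such an API; text 
concatenated by hand into an XML document gets no such protection.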
>>
>>Or, should I use the CDATA feature of XML? This seems to have been 
>>designed for cases like this, but I have the sense that CDATA is 
>>rarely used in XML-based conventions, and wonder if there is any 
>>good reason why not. It rather worries me that an XML processor is 
>>apparently allowed to remove all traces of whether a piece of text 
>>was originally in a CDATA section or not. I would like XML to 
>>transmit any SCL-in-XML faithfully, and if XML parsers may remove 
>>some of the critical encoding information then this seems to 
>>introduce some fragility into the transaction.
>>
>>I'm sure that the XML community has come to an agreement on a 
>>suitable best practice to follow in a case like this, and would 
>>appreciate any guidance or input.
>>
>>Pat
>>
>>
>>
>>--
>>---------------------------------------------------------------------
>>IHMC	(850)434 8903 or (650)494 3973   home
>>40 South Alcaniz St.	(850)202 4416   office
>>Pensacola			(850)202 4440   fax
>>FL 32501			(850)291 0667    cell
>>phayes at ihmc.us       http://www.ihmc.us/users/phayes
>--
>Bill Andersen (andersen at ontologyworks.com)
>Chief Scientist
>Ontology Works, Inc. (www.ontologyworks.com)
>1132 Annapolis Road, Suite 104,
>Odenton, MD 21113
>Office: 410-674-7600
>Cell: 443-858-6444
>Fax: 410-674-6075




