[SCL] XML question
pat hayes
phayes at ihmc.us
Thu Jan 22 20:06:12 CST 2004
>pat hayes wrote:
>>Does anyone have a good answer to the following question? It's
>>really about the design principles of XML.
>>
>>In writing the SCL core syntax, I had in mind that it ought to be
>>possible to include chunks of SCL core inside XML documents without
>>the XML barfing. Since XML reserves the characters '<' and '&' and
>>uses ' " ' for quoting, my first instinct was to simply ban these
>>characters from appearing anywhere in the SCL core syntax: then one
>>can take any piece of SCL core text, stick double-quotes on either
>>side of it, and plonk it down as for example an attribute value in
>>XHTML, and nothing breaks (this would be neat for example when
>>attaching SCL as markup to a web page, since the SCL would be
>>invisible in the HTML but visible to processors).
>
>In document content, which is called PCDATA, you have to escape '<' and
>'&' (and you should escape '>' too in practice). In attribute values,
>which are likely to be declared CDATA, you just escape all markup
>characters, including the single and double quotes.
>
>If we know beforehand that SCL will commonly appear within XML documents,
>in other words, we might consider it as a design goal, then avoiding
>those characters is probably a Very Good Idea.
Yes, that's where I started from.
>The list of XML characters
>to really avoid are '<', '>' and '&'. The single and double quotes are
>in practice not really a big deal, unless you think to put SCL content
>within attribute values.
I would indeed like to be able to do that in XHTML and get away with
it. That is a kind of subsidiary, side-issue goal rather than a
main-track goal, though.
>I generally avoid putting anything like document
>content within attribute values, so single and double quotes are usually
>not an issue. Since you mention specifically the idea of putting SCL
>within an HTML document as attribute values, you're taking a fairly big
>risk, in that a misplaced quote will do an unending amount of harm.
Well, ANY misplaced illegal character will do a lot of harm, right? A
misplaced " can render any piece of XML illegal. So what's special
here?
>The
>skies will fall (and worse still, inconsistently, different on different
>browsers on different platforms).
I've tried it on 5 browsers and 2 platforms and it seems to work
OK. They are all post-2000, but I don't really care about old stuff.
>I'm not sure under what circumstance you'd actually want to include
>SCL markup on a web page. If you did, you could escape *all* the markup
>characters, but again, you're kinda asking for trouble (not from the
>perspective of the specification, but from the users' difficulties,
>i.e., in the real world).
Well, maybe. I'd still like to have a shot at it, if it can be done
without too much pain. The world badly needs a way to put semantic
markup into Web pages without breaking browsers.
>
>>But what about an SCL string which might contain any character?
>>Well, XML allows one to include any character in *parsed* character
>>data by escaping the bad characters using entity references. That
>>handles going from SCL-in-XML back into SCL. But how about going
>>from SCL into SCL-in-XML? The use of the XML escaping seems to
>>require that any software which creates XML - for example,
>>something which wants to transmit some SCL text between SCL engines
>>using SCL-in-XML - must perform a kind of XML-unparsing step to
>>replace every occurrence of '<' or '&' by the entity reference.
>
>If you're trying to make the distinction between PCDATA and CDATA, that's
>not relevant within attribute values. In XML you can't declare an attribute
>value to be PCDATA, only CDATA.
Ah, that seems to be the key point that I had missed. OK, that seems
to settle my question, given my (multiple) ambitions: I need to avoid
all the XML markup characters. Thanks.
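
A minimal sketch of what "avoiding all the XML markup characters" buys, assuming Python and its standard xml.sax.saxutils module. The helper names and the sample SCL-like strings are hypothetical, not part of any SCL specification. Text that contains none of the five characters comes through attribute-style escaping unchanged, so it can be quoted and dropped into an attribute value as-is; anything else has to be rewritten first.

```python
from xml.sax.saxutils import escape

# The five XML markup characters discussed in this thread.
XML_MARKUP_CHARS = set("<>&\"'")

def is_xml_safe(text):
    """Return True if text contains none of the XML markup characters."""
    return XML_MARKUP_CHARS.isdisjoint(text)

def escape_for_attribute(text):
    """Escape all five markup characters, as needed inside an attribute value."""
    return escape(text, {'"': "&quot;", "'": "&apos;"})

safe = "(forall (x) (implies (P x) (Q x)))"   # hypothetical SCL-like text
unsafe = '(R x "a<b & c")'                     # hypothetical text with markup characters

# Safe text is a fixed point of escaping; unsafe text is altered.
print(is_xml_safe(safe), escape_for_attribute(safe) == safe)
print(is_xml_safe(unsafe), escape_for_attribute(unsafe))
```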
>When you say SCL-in-XML, do you mean some sort of XML syntax for SCL,
>such as what I've been calling XCL
I meant that, yes.
>, or do you mean escaped SCL (a non-XML
>syntax)?
>
>>My question boils down to this. Do I *need* to keep the surface
>>syntax of SCL Core "XML-safe" in the sense that it is guaranteed to
>>simply never contain the characters less-than or ampersand (or
>>double-quote, in fact) ? This can be done, but is a pain, and
>>requires SCL to have its own character-escaping conventions
>>different from XML's conventions. (It can't use the XML
>>conventions, since then XML itself will alter the SCL string
>>encodings.)
>
>As I said, this depends on whether you really want SCL to be able to
>live in attribute values, unescaped. That design decision will affect
>everything else you do, so perhaps rather than me answer in a ton of
>detail, we should iron that one out first.
OK, let's say yes.
>
>>Or is this being inappropriately fussy, since XML tools are already
>>capable of handling text which is not "XML-safe" in this way, and
>>automatically doing the transformations to and from the XML-escaped
>>forms? In which case I should just ignore XML's character
>>restrictions when thinking about the SCL syntax itself, and rely on
>>generic XML tools and conventions to faithfully handle the parsing
>>and coding in and out of the XML syntax.
>
>The XML tools (at least compliant ones) only do this kind of "handling"
>when those characters happen in the proper context. Most tools do have
>converters, but you have to also think about authoring, and honestly,
>there's a lot of people who hand author, and there's a lot of broken
>tools.
That's what I was afraid of.
>
>>Or, should I use the CDATA feature of XML? This seems to have been
>>designed for cases like this, but I have the sense that CDATA is
>>rarely used in XML-based conventions, and wonder if there is any
>>good reason why not. It rather worries me that an XML processor is
>>apparently allowed to remove all traces of whether a piece of text
>>was originally in a CDATA section or not. I would like XML to
>>transmit any SCL-in-XML faithfully, and if XML parsers may remove
>>some of the critical encoding information then this seems to
>>introduce some fragility into the transaction.
>
>What you're talking about is called a CDATA section. It can appear
>anywhere within a document, and creates a completely opaque block
>of characters that the XML processor pays no attention to whatsoever,
>except for looking for that final ']]>' string. But while this is
>very good for escaping code, it doesn't really buy you what you're
>looking for, since from the perspective of a document structure,
>CDATA sections don't really exist -- you can't engineer them into
>a schema, for example, they're just a feature available to authors.
>If you were creating a full-blown XML markup language, you could
>create what's called an "application convention", unenforceable by
>the schema/DTD, that says, whenever you see a <foo> element, its
>content will contain a CDATA section containing a special notation.
>This isn't a good idea and isn't generally done, except as a last
>resort (IMO).
That was my sense also, but I'm grateful to have it confirmed by you.
OK, I will not go there.
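
The worry about parsers discarding CDATA information can be illustrated with a small sketch, assuming Python's standard xml.etree.ElementTree parser. The element name "scl" and the sample text are hypothetical. After parsing, the CDATA form and the entity-escaped form yield the same character data, and re-serializing does not bring the CDATA section back.

```python
import xml.etree.ElementTree as ET

# The same character data written two ways: entity-escaped and in a CDATA section.
escaped = ET.fromstring('<scl>(R x "a&lt;b &amp; c")</scl>')
cdata = ET.fromstring('<scl><![CDATA[(R x "a<b & c")]]></scl>')

# Both parse to identical text; nothing records which form was used.
print(escaped.text == cdata.text)

# Serializing the CDATA-sourced element produces the escaped form.
print(ET.tostring(cdata, encoding="unicode"))
```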
>
>>I'm sure that the XML community has come to an agreement on a
>>suitable best practice to follow in a case like this, and would
>>appreciate any guidance or input.
>
>I think in order to answer your question better I'd need to know
>what the design goal is, really.
I have three. The main one is to be able to transmit a variety of
surface syntaxes for SCL inside XML, safely. That is, it ought to be
easy to write small pieces of code which will 'dump' some SCL surface
syntax into a standard XML format which is XML-legal and can be sent
via an XML pipe to an XML parser which will then spit out the
original SCL surface syntax. This ought to be a useful
general-purpose way to use XML to communicate SCL from one place to
another without needing to translate it or do any complicated SCL
parsing of one surface syntax into another. What I have in mind here
is something like a set of attributes which can be used to specify the
syntactic form, then just enclosing the surface syntax as PCDATA text
inside suitable elements. Call this SCL-in-XML.
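
A rough sketch of this first goal, assuming Python's standard xml.etree.ElementTree. The element name "scl-text", the attribute name "syntax", and the function names are invented for illustration and are not a proposed format. The 'dump' side lets the XML library do the escaping, and the parser on the other end hands back the original surface syntax.

```python
import xml.etree.ElementTree as ET

def dump_scl(text, syntax):
    """Wrap SCL surface syntax as PCDATA; the attribute names the syntactic form."""
    elem = ET.Element("scl-text", {"syntax": syntax})   # hypothetical names
    elem.text = text                                     # the library escapes '<' and '&'
    return ET.tostring(elem, encoding="unicode")

def load_scl(xml_string):
    """Parse the wrapper and return (syntax, original surface text)."""
    elem = ET.fromstring(xml_string)
    return elem.get("syntax"), elem.text

original = '(forall (x) (implies (R x "a<b & c") (Q x)))'   # hypothetical SCL-like text
wire = dump_scl(original, "core")
syntax, recovered = load_scl(wire)
print(wire)
print(syntax, recovered == original)
```

Note that this relies on the sender using an XML library rather than pasting text in by hand, which is exactly the authoring concern raised earlier in the thread.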
The second one, related to the first, is to invent something like
your XCL: that is, a 'standard' XML syntax for SCL itself, using XML
elements to exhibit the SCL syntax structure appropriately. Call this
XCL for now. This will be one of the SCL surface syntaxes, so it
ought to be possible to include XCL inside SCL-in-XML, but it also
has a special status in that it can be the 'official' way to describe
the abstract syntax, so all other surface forms are required to be
parsable as XCL. So one way, which may well be the 'official' way, to
transmit SCL is to parse your surface syntax into XCL and then send
that: possibly with some header information saying what surface form
it came from originally.
BTW, this might well be related to the XMI model of the SCL core
syntax that was completed recently. I'll get this up on the website
ASAP.
And then there is the third one, which is a minor goal to be able to
include the SCL core surface syntax (the KIF-like syntax in the
document) inside XHTML attributes without breaking a web browser.
This is kind of independent of the first two (I think) and is a
private hobbyhorse of mine.
>If you want to be able to put SCL
>into an XML attribute value, unescaped, then you want to avoid any
>markup characters if possible, including single and double quotes.
>Since that's really impractical
I don't quite see why you think it is impractical. We are defining
SCL core syntax ourselves, and it's not that hard to define it so that
it doesn't use any of the XML markup characters. So, once so defined,
what is impractical? It's awkward and can be a bit of a bugger to
hand-author when you want to encode arbitrary strings, but lots of
SCL won't be using strings in any case.
>, you (as you say) begin to rely on
>XML authoring tools to escape the contents. This can be a real
>difficulty for authoring, but it certainly is the solution. Now,
>if XCL is an XML markup language in its own right, you'd just put
>the XCL into the document as XML markup, using XML Namespaces.
Right. The reason for the third goal is to be able to link the SCL to
HTML anchors; but like I say, this is a private hobbyhorse. The first
two are more important.
Pat
>The
>current generation of browsers should (in theory) be able to handle
>that, though in my experience you never know what the hell is going
>to come out of Redmond in this regard.
>
>So, in a nutshell, is there an SCL design goal to put it into XML,
>or will there be an XML markup language for SCL that would be used
>instead?
>
>Murray
>
>PS. sorry if I'm rambling tonight. I normally don't respond to
>email after I've been drinking...
>......................................................................
>Murray Altheim http://kmi.open.ac.uk/people/murray/
>Knowledge Media Institute
>The Open University, Milton Keynes, Bucks, MK7 6AA, UK .
>
> "At the Fresno event, even some of the handpicked guests expressed
> skepticism about the state selling $15 billion in bonds to balance
> the budget. A few said the state could look harder for more cuts
> to the government bureaucracy -- but nevertheless said they would
> defer to Schwarzenegger's judgment for now."
>
>http://www.sfgate.com/cgi-bin/article.cgi?file=/c/a/2004/01/21/MNG7L4E7IT1.DTL
>
> Defer to Arnold's judgment?!
--
---------------------------------------------------------------------
IHMC (850)434 8903 or (650)494 3973 home
40 South Alcaniz St. (850)202 4416 office
Pensacola (850)202 4440 fax
FL 32501 (850)291 0667 cell
phayes at ihmc.us http://www.ihmc.us/users/phayes