[SCL] XML question
Murray Altheim
m.altheim at open.ac.uk
Thu Jan 22 20:31:43 CST 2004
I think we've answered most of the questions, including my reply
to John, but I'll cover a few that weren't, below.
pat hayes wrote:
[...]
>>I generally avoid putting anything like document
>>content within attribute values, so single and double quotes are usually
>>not an issue. Since you mention specifically the idea of putting SCL
>>within an HTML document as attribute values, you're taking a fairly big
>>risk, in that a misplaced quote will do an unending amount of harm.
>
> Well, ANY misplaced illegal character will do a lot of harm, right? A
> misplaced " can render any piece of XML illegal. So what's special
> here?
If you consider the two places that you can put document content, within
attribute values, and within element content (i.e., between the start
and end markup tags), there are fewer practical restrictions on the latter.
You don't type ampersands that often, but single and double quotes show
up a lot. Mistakes in attribute content are hard to track, whereas mistakes
in element content are usually pretty simple to locate. A misplaced quote
in element content does no harm, since it's not interpreted.
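To make the difference concrete, here is a minimal sketch using Python's standard-library XML parser. The element name "scl", the attribute name "scl", and the sample expression are made up for illustration and are not part of any SCL specification.

```python
import xml.etree.ElementTree as ET

def is_well_formed(xml_text):
    """Return True if the parser accepts xml_text, False otherwise."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False

# Hypothetical SCL-like text containing double quotes.
scl = '(says Bob "hello")'

# Quotes in element content are ordinary characters.
in_element = "<scl>" + scl + "</scl>"

# The same text pasted raw into a double-quoted attribute value:
# the first inner quote ends the attribute early.
in_attribute = '<a scl="' + scl + '"/>'

print(is_well_formed(in_element))    # True
print(is_well_formed(in_attribute))  # False
```

A strict XML parser at least reports the second case as an error. A forgiving HTML parser may instead recover in its own way, which is where the browser-to-browser inconsistency comes from.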
>>The
>>skies will fall (and worse still, inconsistently, different on different
>>browsers on different platforms).
>
> I've tried it on 5 browsers and 2 platforms and it seems to work
> OK. They are all post-2000, but I don't really care about old stuff.
Yes, but did you try to obtain the contents of the markup from the
browser, as would be necessary in a lot of circumstances when one
wants to *use* the content? I guarantee you, from plenty of
experience, this is not as safe as you make out. I've watched IE
versions go back and forth on not exploding, and I have no idea
what happens behind the scenes to the rest of the content when
that markup hits the browser fan.
But this is kinda a subsidiary issue.
>>I'm not sure under what circumstance you'd actually want to include
>>SCL markup on a web page. If you did, you could escape *all* the markup
>>characters, but again, you're kinda asking for trouble (not from the
>>perspective of the specification, but from the users' difficulties,
>>i.e., in the real world).
>
> Well, maybe. I'd still like to have a shot at it, if it can be done
> without too much pain. The world badly needs a way to put semantic
> markup into Web pages without breaking browsers.
Then do it the right way by embedding actual semantic XML markup,
rather than something that will require a specialized parser. If
you embed an XML version of SCL, the existing browser tools that
don't choke on things they don't understand, and can accept XML,
will be able to process it. That's the way of the future, we all
hope.
[...]
>>>I'm sure that the XML community has come to an agreement on a
>>>suitable best practice to follow in a case like this, and would
>>>appreciate any guidance or input.
>>
>>I think in order to answer your question better I'd need to know
>>what the design goal is, really.
>
>
> I have three. The main one is to be able to transmit a variety of
> surface syntaxes for SCL inside XML, safely. That is, it ought to be
> easy to write small pieces of code which will 'dump' some SCL surface
> syntax into a standard XML format which is XML-legal and can be sent
> via an XML pipe to an XML parser which will then spit out the
> original SCL surface syntax. This ought to be a useful
> general-purpose way to use XML to communicate SCL from one place to
> another without needing to translate it or do any complicated SCL
> parsing of one surface syntax into another. What I have in mind here
> is something like a set of attributes which can be used to specify the
> syntactic form, then just enclosing the surface syntax as PCDATA text
> inside suitable elements. Call this SCL-in-XML.
Apart from escaping '<' and '&' (and while not technically necessary,
'>' also), you can put anything you like within element content. So
you'd only have to escape two or three characters if you don't try
putting things in attribute values. If you're going to put things in
attribute values, I'd recommend that tools do conversions of that
sort in both directions (into attribute values, and out of attribute
values).
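As a rough sketch of the element-content approach, assuming Python's standard library: the element name "scl", the attribute name "syntax", and the value "kif" are placeholders I made up, not an agreed format. The point is only that '<', '&' and '>' are rewritten on the way in, quotes are left alone, and an ordinary XML parser hands back the original text.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

def wrap_scl(surface_text, syntax):
    """Carry SCL surface syntax as element content (hypothetical format)."""
    # escape() rewrites '&', '<' and '>'; quotes pass through untouched.
    # The attribute value gets the double quote escaped as well.
    return '<scl syntax="%s">%s</scl>' % (
        escape(syntax, {'"': "&quot;"}),
        escape(surface_text),
    )

def unwrap_scl(xml_text):
    """Return (syntax, surface_text) recovered by a standard XML parser."""
    element = ET.fromstring(xml_text)
    return element.get("syntax"), element.text or ""

# Made-up sample containing '<', '&' and both kinds of quote.
sample = '(forall (x) (implies (< x 3) (label x "it\'s small & odd")))'
wrapped = wrap_scl(sample, "kif")
print(wrapped)
print(unwrap_scl(wrapped) == ("kif", sample))
```

No SCL parsing happens on either side. The sender only escapes, and the receiver only reads the text node.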
> The second one, related to the first, is to invent something like
> your XCL: that is, a 'standard' XML syntax for SCL itself, using XML
> elements to exhibit the SCL syntax structure appropriately. Call this
> XCL for now. This will be one of the SCL surface syntaxes, so it
> ought to be possible to include XCL inside SCL-in-XML, but it also
> has a special status in that it can be the 'official' way to describe
> the abstract syntax, so all other surface forms are required to be
> parsable as XCL. So one way, which may well be the 'official' way, to
> transmit SCL is to parse your surface syntax into XCL and then send
> that: possibly with some header information saying what surface form
> it came from originally.
I don't exactly follow why you'd want to put XCL within an XML
document in escaped form. I'd just put it in directly as XCL markup.
I'm guessing that's what you really mean here. But as I responded
to John, if the SCL syntax is specified in EBNF grammar, you basically
get a parser for free, since there are EBNF parser generators (for
at least Java, but I'm guessing for perl, python, and probably other
languages as well).
> BTW, this might well be related to the XMI model of the SCL core
> syntax that was completed recently. I'll get this up on the website
> ASAP.
Good.
> And then there is the third one, which is a minor goal to be able to
> include the SCL core surface syntax (the KIF-like syntax in the
> document) inside XHTML attributes without breaking a web browser.
> This is kind of independent of the first two (I think) and is a
> private hobbyhorse of mine.
As I said above, this is probably not too difficult, but you'll
probably want a blanket policy of converting all markup
characters using tools. The possibility of making a mistake in
either hand-authoring or eyeball-interpreting a raw XML document
containing that kind of content is a bit like reading URL query
strings. There's definitely people that can do it (I can), but it's
ugly.
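A blanket policy of that kind might look like the following sketch, again assuming Python's standard library. The helper names, the sample expression, and the choice of the "title" attribute as the carrier are my own assumptions. The apostrophe is written as a numeric reference because &apos; is not defined in HTML 4.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, unescape

# Extra mappings for the two quote characters, in each direction.
TO_REFS = {'"': "&quot;", "'": "&#39;"}
FROM_REFS = {"&quot;": '"', "&#39;": "'"}

def to_attr_text(scl):
    """Convert every markup character (< > & ' ") to a reference."""
    return escape(scl, TO_REFS)

def from_attr_text(attr_text):
    """Reverse to_attr_text() without needing an XML parser."""
    return unescape(attr_text, FROM_REFS)

# Made-up sample using all five markup characters.
scl = '(= (motto x) "O\'Brien & Sons < Co > Ltd")'
encoded = to_attr_text(scl)

# Hypothetical carrier: the title attribute of an XHTML anchor.
page = '<a href="#x" title="%s">x</a>' % encoded

print(encoded)
print(from_attr_text(encoded) == scl)
print(ET.fromstring(page).get("title") == scl)
```

The encoded form is safe inside either kind of attribute delimiter, at the cost of being about as readable as a URL query string.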
>>If you want to be able to put SCL
>>into an XML attribute value, unescaped, then you want to avoid any
>>markup characters if possible, including single and double quotes.
>>Since that's really impractical
>
> I don't quite see why you think it is impractical. We are defining
> SCL core syntax ourselves, and it's not that hard to define it so that
> it doesn't use any of the XML markup characters. So, once so defined,
> what is impractical? It's awkward and can be a bit of a bugger to
> hand-author when you want to encode arbitrary strings, but lots of
> SCL won't be using strings in any case.
If you're going to avoid XML markup characters, it's not impractical.
It's just that defining a syntax that avoids [<>&'"] is going to make
things difficult. Why avoid single and double quotes in your SCL just
to do this? (maybe I'm not understanding you correctly here)
>>, you (as you say) begin to rely on
>>XML authoring tools to escape the contents. This can be a real
>>difficulty for authoring, but it certainly is the solution. Now,
>>if XCL is an XML markup language in its own right, you'd just put
>>the XCL into the document as XML markup, using XML Namespaces.
>
> Right. The reason for the third goal is to be able to link the SCL to
> HTML anchors; but like I say, this is a private hobbyhorse. The first
> two are more important.
There's probably better ways to do this, using the XCL syntax
instead. Do you mean link the HTML-to-SCL, or SCL-to-HTML? If
you want to be able to point from SCL/XCL to HTML anchors (say,
for purposes of obtaining documentation on an SCL entity), this
can be done in SCL proper using its own linking syntax, and in
the XCL using a proper linking syntax like XLink, or by creating
a proprietary linking syntax, like <xcl:link href="uri"/>. The
former has the advantage of having existing and well-known
semantics for simple links. XLink is unfortunately a bit broken
for complex linking structures (for reasons I can explain at some
other time if people are that interested).
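For comparison, here is a small sketch that builds both kinds of link with Python's standard library. The XLink namespace URI is the real one; the XCL namespace URI, the element names "term" and "link", and the target URL are placeholders, since no XCL vocabulary exists yet.

```python
import xml.etree.ElementTree as ET

XLINK_NS = "http://www.w3.org/1999/xlink"
XCL_NS = "http://example.org/xcl"  # hypothetical namespace URI

# Ask the serializer to use readable prefixes.
ET.register_namespace("xlink", XLINK_NS)
ET.register_namespace("xcl", XCL_NS)

def xlink_simple(target):
    """An XCL element carrying an XLink simple link."""
    element = ET.Element("{%s}term" % XCL_NS)
    element.set("{%s}type" % XLINK_NS, "simple")
    element.set("{%s}href" % XLINK_NS, target)
    return element

def proprietary_link(target):
    """The home-grown alternative, in the style of <xcl:link href="uri"/>."""
    element = ET.Element("{%s}link" % XCL_NS)
    element.set("href", target)
    return element

target = "http://example.org/doc.html#Person"  # placeholder anchor
print(ET.tostring(xlink_simple(target), encoding="unicode"))
print(ET.tostring(proprietary_link(target), encoding="unicode"))
```

Generic XLink-aware software can recognise the first form without knowing anything about XCL, whereas the second means something only to tools written for XCL.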
I think that covers everything I can think of right now.
Murray
......................................................................
Murray Altheim http://kmi.open.ac.uk/people/murray/
Knowledge Media Institute
The Open University, Milton Keynes, Bucks, MK7 6AA, UK .
"At the Fresno event, even some of the handpicked guests expressed
skepticism about the state selling $15 billion in bonds to balance
the budget. A few said the state could look harder for more cuts
to the government bureaucracy -- but nevertheless said they would
defer to Schwarzenegger's judgment for now."
http://www.sfgate.com/cgi-bin/article.cgi?file=/c/a/2004/01/21/MNG7L4E7IT1.DTL
Defer to Arnold's judgment?!