[SCL] Re: XML syntax for CL

Murray Altheim m.altheim at open.ac.uk
Thu May 8 20:23:23 CDT 2003


pat hayes wrote:
> Guys, we have too many cooks stirring  pots.
> 
> At a telecon a few weeks ago, Tamel agreed to make a draft XML syntax 
> for CL proposal.  I thought it was understood that he would take Chris 
> M's DTD as a starting point, and maybe he did, or maybe that idea got 
> lost, Im not sure.
> 
> In the meantime, Tamel has produced a document
> 
> http://philebus.tamu.edu/pipermail/scl/2003-April/000125.html
> 
> and nobody is paying it any attention. Well, someone is:
> 
> http://www.altheim.com/specs/xcl/1.0/
> 
> Many thanks, Murray.

Well, don't thank me before we see if the result is just confusion. My
most important point in posting that was probably the Goals section. And
a few points I will hopefully make clear here.

But thanks to you Pat -- I'm happy to hear from you and am hoping that
this conversation leads to a productive and valuable result.

I'm going to now embark on a long journey to let you know why I made
some of the design decisions, why I posted the XCL spec when I did,
why I have no interest in an RDF-based syntax, and other arcane info.
I don't do this out of any particular ego (I don't think), probably
as much as having finally woken up today at 1am after not feeling
too well earlier (bad food I think). So if you can bear with me, I
hope it's worth your time.

BTW, I don't care what we call the beast. I'll use "XCL" instead of
"SCL" in this message, but currently I consider it a community project.
If in the end I want to take my toys and go home, you guys can either
use "SCL" or if the committee plans to use "XCL" I'd let you have it.
(I have the purl.org/xcl domain but I don't consider I have any more
ownership rights than anyone else).

> Clearly there is more than one way to use XML to give a syntax. Before 
> we get down to details of bugs and fixes, can we discuss the overall 
> best way to do it? I have to say, I don't much like either of the ones 
> we have at the present  but lets at least all talk to one another about 
> what we are all doing. For a start, could each of you, Chris M . and 
> Tamel, and Murray if you have time, look at what the others have done 
> and maybe mutually critique, or at least comment? Bearing in mind that 
> we have to actually get to a consensus eventually, of course.

In some ways, my proposal *is* a critique of Tamel's. There are a lot
of features in mine that are completely unnecessary, iff all you want
to do is express what Tamel's expresses. And it would be fairly easy
to create a subset of my proposal that did only that. In looking at the
19 XCL Goals you might see that some are questionable in light of CL
by itself. But in terms of an XML of CL, there are two approaches: one,
CL-in-XML, two, CL-on-the-Web. XCL is the latter in its current form,
but could be the former if you like. A real chameleon. And I can also
create a DTD that has no XML namespace prefixes, or one with. Easier
without. If you don't plan to play on the Web, and only want to express
CL in XML, then you don't need XML namespaces at all because you're
not going to be mixing it with other things, you're not going to be
posting SCL docs on the web and having other SCL docs refer to them,
you're not going to have "live" SCL documents. And that's fine. It's
a completely valid approach, just different than where I'd take XCL
if I were the captain.

In my head I can figure about three or four different syntaxes for
XCL. They could all be delivered in one or two modular DTDs:

    * CL-in-XML, no frills. No web linking, no intra-document linking,
      no PSIs, basic stuff. This is why I think you believe OWL or RDF
      could do the trick, as GXL is just graph theory in XML, just like
      RDF is. But without the namespace mess. This would satisfy XCL
      goals 1, 2, 9, 10, 12, 14, 16, 17, and 18.

    * If you go the route of RDF and create a proprietary linking and
      processing syntax (where custom specialized processors would be
      likely necessary), you could arguably add 5, 6, 7, and 8, but
      only XML processors aware of XCL would know which elements or
      attributes are links, and if you mix XCL with other namespaces
      in the future, you'll have the problem that HTML now has. The
      entry level and potential confusion could be quite high. And
      interchange would probably suffer, as an XML document could have
      all manner of features and various interpretations.

    * #1, but we add the ability to have multiple XML Namespaces, so
      XCL can be mixed with other XML syntaxes. Fine, and those other
      markup languages may have features that make XCL interesting.
      But XCL itself is still just #1. In addition to #1, by adding
      XLink we satisfy goal 4, and we can now satisfy 5, 6, 7, 8
      without creating a proprietary linking syntax. As a byproduct,
      we can now have validatable XML documents using DTDs or other
      types of schemas that use or don't use namespace prefixes. A
      language is a lot easier to read if it doesn't, and I'm
      guessing a lot of XCL applications won't really need 'em.

    * We add XLink for linking, a subset of XHTML for documentation
      and "rich text". We also get XHTML's own link syntax within
      the XHTML namespace (so for example, Cyc's documentation could
      be stored as XHTML nodes within a larger logical framework).

    * Rather than create a new XML syntax, we use GXL 1.1. But we
      no longer have a <predicate> element, we have something like

          <edge type="http://purl.org/xcl/1.0/#pred"/>

      or perhaps simpler:

          <node id="pred" type="http://purl.org/xcl/1.0/#pred"/>

          <edge type="#pred"/>           (so we get a nice shorthand)

      [this might be an interesting and valuable project in its own
       right, and a good conversion from whatever XCL is in the end.]
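
A minimal Python sketch of how the "#pred" shorthand in the last option
might be expanded by a processor. The <graph> wrapper, the extra nodes and
the from/to attributes are invented for illustration, and the rule "look up
the node with that id and take its type" is an assumption about the
intended shorthand, not a description of what GXL 1.1 itself defines.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment modelled on the shorthand in the message; the
# <graph> wrapper and the from/to values are invented for illustration.
DOC = """
<graph>
  <node id="pred" type="http://purl.org/xcl/1.0/#pred"/>
  <node id="a"/>
  <node id="b"/>
  <edge type="#pred" from="a" to="b"/>
</graph>
"""


def resolve_edge_types(xml_text):
    """Return the expanded type of every <edge>, in document order."""
    root = ET.fromstring(xml_text)
    # Map each node id to the type it declares (None if it declares none).
    declared = {n.get("id"): n.get("type") for n in root.iter("node")}
    resolved = []
    for edge in root.iter("edge"):
        edge_type = edge.get("type", "")
        if edge_type.startswith("#"):
            # Local shorthand: use the type of the node with that id,
            # falling back to the raw value if no such node exists.
            edge_type = declared.get(edge_type[1:]) or edge_type
        resolved.append(edge_type)
    return resolved


print(resolve_edge_types(DOC))
```

The point of the sketch is only that the short form costs one dictionary
lookup; the full URI is still recoverable from the document itself.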

> Many thanks to everyone for all the work, but we need to coordinate. 
> Unfortunately (?) this is one area where I do not feel competent to 
> adjudicate between rival criteria for adequacy.
> 
> Like most other XML syntaxes I have been obliged to deal with, these 
> proposals seem to me to be so ugly, long-winded, information-deficient, 
> unreadable and virtually impossible to use that it is hard to believe 
> that anyone would seriously suggest using them for any purpose.  Maybe 
> I'm just missing the point of XML, but I wish someone would tell me what 
> the point is supposed to be. The example used by Tamel uses 61 
> characters in a reasonable logical notation, which becomes approximately 
> 400 characters in the bare XML rendering, and would expand to something 
> closer to 800 characters when embedded in a fully decorated XML document 
> with all its surrounding scaffolding, canonical identifiers and so on. 
> The information density has decreased by about 1.3 orders of magnitude 
> and the document has become unreadable and harder to parse. Can anyone 
> tell me what has been gained by this transformation?

First off: completely, completely, completely ignore how verbose XML is.
It's irrelevant. The largest documents most people in this community are
ever going to deal with are going to be parsed in milliseconds or much
less. And in two years, microseconds. I know John Sowa feels this is a
bogus argument, but I feel arguments about verbosity are bogus in a world
where cheap computers run at over a gigahertz and a 40GB hard drive is
less than $100. I'd say most XCL documents are going to be less than
several megabytes. All of Cyc might be an issue, but how many of us
are dealing with all of Cyc at once? And in five years, we all *could*
deal with all of Cyc on a laptop, in XML, while we watch a football
game full screen too.

XML is human-readable, but *really* it's for processing by computers.
Why expect it should look like those 61 characters? Those 61 characters
could be read by a computer, but you'd not have unambiguous linking
and identification of language components, the ability to add in other
markup languages, and many other features. I could probably express
Tamel's 61 character example in about 120 characters of legal XML,
but we'd have no features, and our elements would be 1 character in
length. The computer wouldn't care.

So verbosity. What do we get? The approach I advocated in XTM (which is
verbose, more verbose than it absolutely needed to be) and what you see
in XCL is that I am deliberately making explicit things that can be
hidden in attributes.

For example, one could express the same thing in various ways:

     <quantifier type="x"/>
     <quant type="x"/>
     <q type="x"/>

From a performance perspective, it's not much different. From a readability
or potential-confusion perspective, the first is best. XCL uses the middle
approach as being mnemonically unambiguous and saving a few keystrokes
(assuming a human would ever type XCL, which is questionable).

Now, there's a big disadvantage to the three examples above if one wants
to add attributes to the 'type' attribute. You can't. So, 'type' must
become <type>. You can take the HTML approach and have

   <a b="x" c="y" d="z"/>

or try this:

   <a>
     <b xlink:href="x"/>
     <c>y</c>
     <d xlink:href="z"/>
   </a>

and now it's pretty clear that 'b' and 'd' were links. And then you
can also add attributes to the links for behaviour or meaning. The computer
doesn't care about this difference, and absent all this whitespace making
it easier for us to read, performance-wise it's immaterial.

   <a><b xlink:href="x"/><c>y</c><d xlink:href="z"/></a>
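
A minimal sketch of the "computer doesn't care" point, using Python's
standard xml.etree.ElementTree. The canonical() helper is a name made up
for this illustration, and the attribute values are placeholders; it simply
compares the indented and the single-line form after ignoring
whitespace-only text.

```python
import xml.etree.ElementTree as ET

XLINK = "http://www.w3.org/1999/xlink"

# The indented form, with the xlink prefix declared so that it parses.
PRETTY = f"""
<a xmlns:xlink="{XLINK}">
  <b xlink:href="x"/>
  <c>y</c>
  <d xlink:href="z"/>
</a>
"""

# The same content on a single line.
COMPACT = (
    f'<a xmlns:xlink="{XLINK}">'
    '<b xlink:href="x"/><c>y</c><d xlink:href="z"/></a>'
)


def canonical(elem):
    """Reduce an element to (tag, attributes, stripped text, children).

    Whitespace between elements (the 'tail' text) is deliberately ignored.
    """
    return (
        elem.tag,
        sorted(elem.attrib.items()),
        (elem.text or "").strip(),
        [canonical(child) for child in elem],
    )


same = canonical(ET.fromstring(PRETTY)) == canonical(ET.fromstring(COMPACT))
print(same)
```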

Now, if type's value of "x" (in the first example) is actually a URI
reference, i.e., it's a link, and if you want references to operate
across the XML Web and have Java XLink libraries processing XCL documents
without requiring proprietary link processors, it'll be an XLink element:

    <quant>
      <type xlink:href="x"/>
    </quant>

or

    <quant>
      <type>
        <xref xlink:href="x"/>
      </type>
    </quant>

The latter might seem like overkill until one realizes that it's
now possible to:

   * add attributes to <type>
   * include multiple links within a <type> element
   * include different linking elements within <type>
   * add an attribute to the <xref> element
   * add a <documentation> element within <quant> or <type>
   etc.
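
A minimal sketch of what "without requiring proprietary link processors"
can mean in practice: a generic routine that finds every link by looking
for the xlink:href attribute, with no knowledge of the XCL element names.
It uses Python's standard library rather than a Java XLink library, and it
is a simplification: a real XLink processor would also consult xlink:type
(often defaulted in the DTD), which this sketch ignores.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes namespaced attributes as "{namespace-uri}localname".
XLINK_HREF = "{http://www.w3.org/1999/xlink}href"

DOC = """
<quant xmlns:xlink="http://www.w3.org/1999/xlink">
  <type>
    <xref xlink:href="x"/>
  </type>
</quant>
"""


def find_links(xml_text):
    """Return (element name, target) for every element carrying xlink:href."""
    root = ET.fromstring(xml_text)
    return [
        (el.tag, el.get(XLINK_HREF))
        for el in root.iter()
        if el.get(XLINK_HREF) is not None
    ]


print(find_links(DOC))
```

Nothing in find_links() mentions quant, type or xref, so the same routine
would work unchanged on any other vocabulary that marks its links this way.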

We see this in XTM:

   <topic id="dog">
     <instanceOf>
       ...
     </instanceOf>
     <subjectIdentity>
       ...
     </subjectIdentity>
   </topic>

where "..." could be one of the three XTM linking elements
(there aren't any other linking elements or attributes in XTM,
making it very easy to read and process link-wise):

   <topicRef xlink:href="x"/>
   <resourceRef xlink:href="x"/>
   <subjectIndicatorRef xlink:href="x"/>

This could also have been done as a single link element
with three possible attribute values on link type.

   <link type="rsrc" xlink:href="x"/>

This would work either way, except the way we did it in XTM we
can limit where <resourceRef> is allowed without having to check
an attribute value, a bit more foolproof.
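
A minimal sketch of that placement check, done purely on element names.
The ALLOWED table is an assumption written from memory of the XTM 1.0
content models and covers only the two parents shown above; in a real
setup the DTD and a validating parser would do this job.

```python
import xml.etree.ElementTree as ET

# Assumed content models (illustrative, not copied from the XTM DTD):
# which linking elements may appear under which parent.
ALLOWED = {
    "instanceOf": {"topicRef", "subjectIndicatorRef"},
    "subjectIdentity": {"topicRef", "resourceRef", "subjectIndicatorRef"},
}

DOC = """
<topic id="dog" xmlns:xlink="http://www.w3.org/1999/xlink">
  <instanceOf>
    <resourceRef xlink:href="x"/>
  </instanceOf>
  <subjectIdentity>
    <resourceRef xlink:href="x"/>
  </subjectIdentity>
</topic>
"""


def misplaced_links(xml_text):
    """Return (parent, child) pairs where the child is not allowed there."""
    root = ET.fromstring(xml_text)
    problems = []
    for parent in root.iter():
        allowed = ALLOWED.get(parent.tag)
        if allowed is None:
            continue  # no rule recorded for this parent
        for child in parent:
            if child.tag not in allowed:
                problems.append((parent.tag, child.tag))
    return problems


print(misplaced_links(DOC))
```

With a single <link type="..."/> element the same check would have to read
and interpret an attribute value on every link instead.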

The XTM syntax is verbose because we use many of these types of
features. We also were very modular. In HTML there are a million
little linking attributes, some elements have five or six. They
all do different things in different contexts, have different names,
and none can have attributes that control how they behave or
what they mean. Sometimes their parent elements have attributes
to control what they do, basically an attribute controlling
another attribute's behaviour. Poor design. And from my experience
on the W3C HTML Working Group, they're paying for it now, as they're
locking into that weird legacy design.

> But no doubt I should avoid such complaining, as I have found that it is 
> not productive when talking to XML enthusiasts.

John thinks I'm an XML enthusiast (I think he used stronger language,
actually), but neither of you knows me very well. I could give squat
about XML. I'm for making things happen. Right now you guys work in
a small and specialized community. If ISO 13250 Topic Maps had stayed
as an SGML meta-DTD, topic maps might have only made it into NewsML
and be used by Reuters. Jon Bosak at Sun knew a good thing when he saw
one, and sent me to work with Steve Newcomb and Michel Biezunski in
creating an XML version of Topic Maps, not because we all love XML
but because we knew that the Web would provide an incredible breeding
ground for development if we could attract the web community. I think
Jon (the "father of XML") probably thinks about XML like I do: it's
just lego. A lot of people like lego and use it to build cool stuff.

I look at the 620,147 registered users on sourceforge.net and think
that if we can get 1% of them interested in working with XCL, we'd
have 6200 developers making playthings. Good and bad playthings, but
lots of attention. This has happened in the Topic Map community. We
have perhaps not 6200 developers, but I bet we have several hundred
actively playing around with XTM, more every day. And I'm betting
that among those playing we'll see a lot of very interesting work.
And we started with a community of about 20 people with no DARPA
or W3C help. We gambled that we could find enthusiasts within the
Web community, and we have.

> Perhaps more to the point, however, is the need for us to take seriously 
> the potential need to interact with the RDF style of using XML. Consider 
> an OWL ontology encoded as an OWL-RDF graph written in RDF/XML.
> http://www.w3.org/2001/sw/WebOnt/ , see particularly 
> http://www.w3.org/TR/owl-semantics/
> The ontology itself has a natural translation into CL (similar to the 
> one in http://www.coginst.uwf.edu/~phayes/OWL2LBASE.html , though it 
> could be done slightly differently). If we were to do that translation, 
> then write the CL in XML, what, if any, relationship might there be 
> between that piece of XML and the OWL/RDF/XML (see 
> http://www.w3.org/TR/rdf-syntax-grammar/) ?

I don't claim to understand OWL, and neither do I have any interest
in working with RDF. I've had plenty of exposure, having seen early
drafts while in the W3C. Guha came to Sun and gave a bunch of us an
entire afternoon of RDF and graph theory. It honestly changed my
thinking about a lot of things, profoundly. But RDF is a different
approach to XML that I don't consider really XML. It doesn't use an
XML schema language, it uses its own. RDF users are to me religious
about RDF. I'm not religious about topic maps, I just happen to have
been intimately involved in developing XTM and am using XTM as one
of the core technologies in my Ph.D. project. I realize this is
sorta like a crazy person trying to claim he's not crazy, but I'm
just the sort of person to try. Frankly, I'm not interested in
working on an RDF-based CL syntax because there are plenty of DARPA
dollars being spent making that work, and the entire RDF community
working on making it work. They don't need (or probably want) my
help. Whether that will all work, whether it will make any sense to
anyone else, whether it'll get traction, whether it will have enormous
holes or work like a charm, it doesn't matter to me. I don't think
it will, but I'm just a voice in the wilderness.

I'm being so blunt about this because we don't have the opportunity
to sit down over a coffee or a beer and all get to know where each
of us is coming from. I try not to be dogmatic about this stuff,
and I'm trying very hard to express that my real goals here are for
success of CL-on-the-Web. If you guys are more interested in
CL-in-XML, then you probably don't need my help, and I'll just be
a distraction. In response to you saying you "don't feel competent
to adjudicate between rival criteria for adequacy", I'd say that
neither do I, nor am I interested in spending a lot of time
arguing or defending myself. This message is basically it. If the
design goals in the current spec plus this message make any sense
(and I don't claim that they do, esp. since the spec is not even
complete), then I only hope to influence the committee to consider
the difference between CL-in-XML and CL-on-the-web, and realize
that they will likely have a different implementation approach
and therefore syntax.

If this long journey has brought any of you only confusion, my
apologies. If I can answer any questions or clarify anything,
I'm certainly willing to try. I'm also willing to adjust the
XCL spec, answer questions on CL-in-XML or CL-on-the-Web, either
approach. I'm also willing to withdraw the XCL proposal entirely.
I'm not wedded to it, it's just a bunch of ideas. I happen to
have a vision for what it could be, but if either I'm not able
to express that or nobody is interested, I'm not going to fight
about it. I've got a lot of pans on the fire already.

Thanks for your patience and forbearance,

Murray

......................................................................
Murray Altheim                    http://kmi.open.ac.uk/people/murray/
Knowledge Media Institute
The Open University, Milton Keynes, Bucks, MK7 6AA, UK               .