[SCL] Re: The second draft for XML syntax for CL
Murray Altheim
m.altheim at open.ac.uk
Sun May 11 10:48:39 CDT 2003
Tanel Tammet wrote:
> Hi,
>
> Attached please find the second version of the
> SCL-in-XML draft.
[...]
Attached are my comments on the second draft. Primarily, I think
before any reasonable discussion can approach these decisions
on logic logically, there needs to be an agreed-upon set of
application requirements upon which each decision can be judged.
Absent that, it's just the lot of us expressing our ideas about
a variety of different applications all called "SCL", each coming
from our differing aesthetics, expectations and conceptions about
what SCL is or could be. Without some metrics we just have opinions,
and not only is that not productive, it usually just leads to
shouting matches, or as I've seen in some working groups, the
tyranny of editor(s).
Murray
......................................................................
Murray Altheim http://kmi.open.ac.uk/people/murray/
Knowledge Media Institute
The Open University, Milton Keynes, Bucks, MK7 6AA, UK .
When you hear the phrase "free trade", don't think it's about the
freedom of people, it's about corporate freedom from government.
http://www.zmag.org/ZMag/articles/march02herman.htm
-------------- next part --------------
> XML syntax for SCL level 1
> --------------------------
>
> SCL workgroup internal draft: second version
> Tanel Tammet
> 11 May 2003
>
> based on:
> - the earlier draft from: 29 april 2003
> - ideas, feedback and discussion after first draft with:
> Geoff Sutcliffe, Murray Altheim, Pat Hayes, Dan Connolly
>
>
> Introduction
> ------------
>
> This draft contains a concrete suggestion for encoding
> SCL in XML, presented with minimal details. Necessary
> additions, explanations, examples, etc will be added after
> the basic details have been agreed upon.
>
>
> Changes and motivations for these after the first draft
> -------------------------------------------------------
>
> This small chapter summarises the main feedback items
> after version 1 of the draft. Each item is briefly commented upon
> and it is indicated whether the current draft contains
> a corresponding change and why.
>
> The current chapter should NOT be a part of the spec:
> it is here only for discussion purposes.
>
> - Pat: XML is ugly, verbose, unreadable, hardly useful.
Having an XML notation for CL will provide a means of incorporating
it with a huge amount of existing software and documents. XML is the
de facto standard for interchanging data nowadays on the Internet.
Whether it's ugly, verbose, unreadable, etc. is a matter of opinion
and irrelevant.
> Comments by Tanel:
>
> - we need alternative concrete syntaxes except
> XML, geared towards human readability. However, once we want a
> concrete XML syntax as well, we cannot escape the obvious
> consequences there. We just have to live with XML if we want to use it.
>
> - the real point is the lack of need for extra parsers. We can
> always assume that XML-represented logic can be parsed by any standard
> XML parser, no need for additional implementation work to parse
> a new syntax. Hence we should NOT attempt to put less-verbose-syntax
> into XML documents: people would need an ADDITIONAL parser in such
> cases, making their life significantly harder.
>
> - Pat: We need to take seriously the potential need to interact
> with the RDF style of using XML.
>
> Comment by Tanel: we need an alternative concrete syntax for
> combining logic in SCL with RDF. This document does not target
> compatibility with RDF. Important issue, but has to be done differently.
> Why differently? Because RDF is a straightjacket for ordinary
> logic: in case we create an RDF-compatible syntax for SCL
> (and we should do this!) it would be very hard to use this
> syntax for "ordinary" logic.
I agree. XML is simply a carrier for data, whereas RDF is a specific
application of XML with its own built-in semantics, basically graph
theory-in-XML. For RDF to act as a carrier for CL data, there must be
a very clear separation between the semantics of CL and the semantics
of RDF, and where there is overlap, they must have equivalence.
> - Geoff: XML is ugly, verbose, unreadable, hardly useful.
see above.
> Comment by Tanel: see above.
>
> - Geoff: need to encode annotations, e.g., names, type, etc, as in the
> TPTP. Suggests to define a wrapper around the logic
> formulae, and allow the annotations to be tag arguments or inner tag
> structures. The TSTP annotation structure, which was designed to be very
> flexible, may be a useful starting point (or ending point!).
>
> Comment by Tanel:
>
> - strictly speaking, we need no wrappers, since one can simply put
> extra tag arguments with arbitrary string values into every tag.
> All the annotations can and should be there.
>
> - However, it may be convenient to have a top-level wrapper providing
> an intuitive location for these annotation tags. Murray has introduced
> a <formula> tag.
In XML there are "element types", "elements" and "attributes".
"Tags" are the things delimiting elements, and what you're calling
parameters are attributes. Just for clarification.
I've also introduced an <xcl> element to use as a semantic-free
wrapper for when people want to deliver multiple formulae in one
document. You could also use any of the other XCL element types
as document elements, as might make sense in some applications.
> Change in version 2 of the draft: <formula> tag introduced.
> Non-obligatory to use (this might change later).
>
> - Geoff: allowing or requiring an arity value, e.g.,
> <scl:connective scl:name="and" scl:arity="2">
> That will make parsing and error reporting more robust.
>
> Comment by Tanel: superfluous. Required arity or the lack
> of it is pre-defined for each connective.
But what about extensions? What if someone wants meet(i,j)? How
will a processor know its correct arity?
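A minimal sketch of the arity question, assuming Python's standard xml.etree.ElementTree as the processor and unprefixed element names. The "arity" attribute is Geoff's suggestion; the table of built-in arities and the function name arity_errors are hypothetical, chosen only for illustration. It shows that an extension connective such as "meet" can only be checked if it declares its arity somewhere:

```python
import xml.etree.ElementTree as ET

# Assumed arities for some predefined connectives; None means "any number".
BUILTIN_ARITY = {"and": None, "or": None, "not": 1, "implies": 2}

def arity_errors(xml_text):
    """Return a list of arity problems found in <connective> elements."""
    errors = []
    for el in ET.fromstring(xml_text).iter("connective"):
        name = el.get("name")
        declared = el.get("arity")
        if declared is not None:
            expected = int(declared)          # explicit declaration wins
        elif name in BUILTIN_ARITY:
            expected = BUILTIN_ARITY[name]    # predefined connective
        else:
            errors.append(f"{name}: unknown connective with no declared arity")
            continue
        if expected is not None and len(el) != expected:
            errors.append(f"{name}: expected {expected} arguments, found {len(el)}")
    return errors

with_arity = '<connective name="meet" arity="2"><term name="i"/><term name="j"/></connective>'
without_arity = '<connective name="meet"><term name="i"/><term name="j"/></connective>'
```

Here arity_errors(with_arity) finds nothing to complain about, while arity_errors(without_arity) can only report that it has no way of knowing what "meet" expects.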
> - Geoff: for quantified formulae the variable should be written outside
> as a <scl:term/>.
>
> Comment by Tanel: superfluous. In our minimalist approach
> each term in the abstract syntax is converted to one tag in the
> XML representation. We try to minimize the nr of tags, to keep
> the structure simpler and flatter. Extra tags complicate the
> structure. Hence keeping variable-as-an-argument-of-quantifier-tag
> approach.
Agreed. You need to firmly separate document content from markup,
i.e., use element and attribute content rather than having people
create their *own* elements and attributes, which will be unknown
by XCL processors.
> - Geoff: unique existence is something useful, the generalized form which the
> TPTP will support is
> ? [1:X,2:Y] : p(X,Y)
> meaning there are exactly one X and two Ys such that p(X,Y).
>
> Comment by Tanel: no clear need to introduce this into minimal core
> (we want to keep the core minimal!). Could be introduced in extensions.
> Easy to give an extra parameter to quantifier tag, like this:
> <quantifier name="exists" variable="x" count="2">
> indicating that we say that there are exactly TWO instances.
> However, bringing this into core creates unnecessary complications.
> Let us see whether it will be in the SCL core - if yes, we can put
> it into the XML concrete syntax.
This could be introduced as a PSI rather than as new syntax. As in my
last comment.
> - Murray: introduce <formula> tag
>
> Comment by Tanel: see above: introduced in this version.
>
> - Murray: shorten some, but not all, tag names,
> for example <quant> instead of <quantifier>
>
> Comment by Tanel: confusing to have some tag names as full
> names in English, some as shortened. Better to keep full
> names consistently. Since XML is very verbose anyway,
> micro-optimisations such as shortening the tag names are useless.
Depends on the application. I've been hearing arguments from John
and others about verbosity being an issue. Is it or isn't it? Saving
half the number of characters used in element type names over a
document of 300,000 elements is significant. The shortening leaves
names that are still unambiguous. Are most CL documents expected
to be large or small?
It's quite simple to store enormous documents in an isomorphic syntax
using very short (1 character) names and transform on the fly to a
standard syntax for interchange.
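As a rough sketch of that on-the-fly transformation, assuming Python's standard xml.etree.ElementTree and a purely hypothetical one-character naming scheme (q, c, p, t are not from either draft), the mapping is a single pass over the tree:

```python
import xml.etree.ElementTree as ET

# Hypothetical short names for storage, mapped to the draft's full names.
SHORT_TO_LONG = {"q": "quantifier", "c": "connective", "p": "predicate", "t": "term"}

def expand_names(xml_text):
    """Rewrite short element type names to the long interchange names."""
    root = ET.fromstring(xml_text)
    for el in root.iter():
        el.tag = SHORT_TO_LONG.get(el.tag, el.tag)  # unknown names pass through
    return ET.tostring(root, encoding="unicode")

stored = '<q name="forall" variable="x"><p name="r"><t name="x"/></p></q>'
interchange = expand_names(stored)
```

Attributes and content are untouched, so the two forms stay isomorphic and the reverse mapping is just as simple.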
> - Murray: use XLINK namespace to refer to parts of other docs
>
> Comment by Tanel: unclear why we need to make other namespaces
> obligatory. Rather, let us introduce a parameter href (as
> xlink:href or just href in html) for referencing. This is subject
> to discussion.
If you use the XLink namespace you must declare it. You can't use xlink:href
without declaring what "xlink" refers to. If you use "href" (or "foo",
for that matter) application developers will have to write their own
code for linking rather than using XLink libraries. The whole purpose
of XLink wasn't simply to provide a linking syntax, it was to provide
a whole methodology (and by extension, software libraries) for linking
in XML. Using "href" will gain you none of that.
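The declaration point can be checked with any namespace-aware parser. A minimal sketch, assuming Python's standard xml.etree.ElementTree (nothing in the draft names a tool) and the draft's "include" element; the helper name parse_or_none is hypothetical:

```python
import xml.etree.ElementTree as ET

XLINK_NS = "http://www.w3.org/1999/xlink"  # namespace name from the XLink spec

def parse_or_none(xml_text):
    """Return the parsed root element, or None if the text is rejected."""
    try:
        return ET.fromstring(xml_text)
    except ET.ParseError:
        return None

undeclared = '<include xlink:href="#bar"/>'
declared = f'<include xmlns:xlink="{XLINK_NS}" xlink:href="#bar"/>'

root = parse_or_none(declared)
# The attribute is reported under its namespace name, not under the prefix.
href = root.get(f"{{{XLINK_NS}}}href")
```

The undeclared form is rejected outright because the prefix is unbound, whereas the declared form exposes the attribute in a way that generic XLink-aware code could recognise.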
OTOH, if CL-in-XML is never expected to link outside of the document,
that's fine too. But you won't scale into the next language (what I'm
calling Level 2) because you've created a proprietary link structure
(relying on understanding the semantics and linking behaviour of SCL)
rather than taking advantage of the existing link semantics and
behaviours.
> - Murray: introduce <doc> tag for arbitrary doc strings
>
> Comment by Tanel: unnecessary, since we have a "comment" parameter
> for free-form documentation strings. As said before, we do not
> want to complicate the term structure by additional tags not
> corresponding to SCL abstract syntax term elements. This
> might change in case SCL abstract appears and demonstrates
> that the mapping to XML term-as-tag would be useful.
If you use an attribute (in XML it's an attribute rather than a parameter)
you can only use ASCII data, unless you want to force your users to write
character references for stuff like ∀x:⊿, etc. in their documentation. And
stuffing long strings in attribute values is prone to errors, such as
when people accidentally put a quote mark within their documentation.
And users won't be able to extend documentation features by adding their
own markup, using XHTML, using links, etc. There's no reason to use
attributes over elements from a markup standpoint. If people are expected
to be typing SCL it might be an issue, but if your requirements expect the
predominant use to be hand-typed SCL, we're working on different projects.
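The quoting hazard is easy to reproduce. A minimal sketch, assuming Python's standard xml.etree.ElementTree; the "comment" attribute is from Tanel's draft and the <doc> element from the earlier proposal, while the <b> child is just an illustrative piece of inline markup:

```python
import xml.etree.ElementTree as ET

def parses(xml_text):
    """Report whether the text is accepted as well-formed XML."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False

# A stray quote mark inside an attribute value ends the value early.
in_attribute = '<term name="x" comment="the "usual" variable"/>'

# The same text as element content is fine, and can carry further markup.
in_element = '<term name="x"><doc>the "usual" <b>variable</b></doc></term>'
```

With these inputs, parses(in_attribute) is False and parses(in_element) is True. The attribute form can be rescued by escaping the quotes, but it can never contain child markup.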
> - Murray: introduce <type> tag to indicate the type of a variable
>
> Comment by Tanel: this would open up a possibility to give
> complex types attached to variables. The current proposal
> of using "type" parameter only allows predicate names to
> be used as types, simplifying the type usage. As said before,
> we want to avoid new tags and complexities in structure.
Currently, it does not open up any such possibility, as in XCL Level 1
the content is declared as PCDATA (character data). In Level 2 the
ability to add a link here is important in that it will allow XCL
documents to reference existing ontologies and their types. Use of
an attribute value here has the same problems as in my last comment.
You won't be able to open this up later (unless you are willing to
have a redundant 'type' attribute and a <type> element).
> - Murray: introduce reference tags, for example, <connRef>,
> <quantRef>, etc
>
> Comment by Tanel: MANY such tags are either superfluous or open
> up a possibility for complex reference information.
They're only in XCL Level 2, not Level 1.
> One is enough. This has the advantage of having simple
> "replacement" semantics, which can be seen as a separate layer.
Except that one must traverse the link to know what is at the other
end. A <predRef> (predicate reference) unambiguously references a
<pred> element, and this is *easily* checked by a processor. It's
also much more apparent during authoring. In the end, one link type
would certainly do, but people would have to be much more careful
when authoring, editing, and interpreting SCL documents.
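To illustrate why a typed reference is easy to check, here is a rough sketch assuming Python's standard xml.etree.ElementTree. The element names <predRef> and <pred> are from the XCL proposal, but the use of an "href" attribute with a "#id" fragment and the function name bad_references are assumptions made only for this example:

```python
import xml.etree.ElementTree as ET

def bad_references(xml_text, ref_tag="predRef", target_tag="pred"):
    """Return the hrefs of references that do not point at the expected element type."""
    root = ET.fromstring(xml_text)
    by_id = {el.get("id"): el for el in root.iter() if el.get("id")}
    bad = []
    for ref in root.iter(ref_tag):
        target = by_id.get(ref.get("href", "").lstrip("#"))
        if target is None or target.tag != target_tag:
            bad.append(ref.get("href"))
    return bad

doc = (
    '<xcl>'
    '<pred id="p1" name="r"/>'
    '<term id="t1" name="x"/>'
    '<predRef href="#p1"/>'   # fine: points at a <pred>
    '<predRef href="#t1"/>'   # wrong: points at a <term>
    '</xcl>'
)
```

With a single generic link type the processor has no expected target type to compare against, so the second reference above could not be flagged without knowing the context it is used in.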
> Change in version 2 of the draft: "include" tag and "href" parameter
> for this tag introduced. Semantics: replace the "include" tag with
> href inside with a text referenced. Example:
>
> <conn name="and">
>   <pred name="p">
>     <term name="1"/>
>     <include href="someuri1"/>
>   </pred>
>   <include href="someuri2"/>
> </conn>
>
> Any include tag (and no other tags) must contain href parameter.
>
> Any include tag has to be replaced by the text referenced by href.
>
> Change in version 2 of the draft: "id" parameter is introduced.
> Any tag may contain "id".
>
> NB! Observe we could use XLINK namespace instead of putting these
> into SCL-in-XML namespace. In the current draft we DO NOT use XLINK
> namespace, just to simplify matters. This may be changed.
Are you proposing that your "Simple CL" (what I've called XCL Level 1)
incorporate linking features? Or is this left to a future version?
> - Dan: about Murray's reference tags like <termRef> etc:
> not sure he groks termRef. Otherwise, it looks pretty straightforward.
>
> Comment by Tanel: see above.
>
>
> Summary of changes in the version 2 of the draft:
>
> - some trivial errors (ending / forgotten at some places) corrected.
> - formula tag introduced.
> - include tag and the href parameter for this tag introduced.
> - id parameter introduced.
> - example modified to demonstrate the usage of "formula", "include"
> "href" and "id".
>
> (as said, "include" and "id" could be taken from other namespaces,
> if we so wish: has to be discussed).
There's no point in "taking from other namespaces" unless they're
XML core. There's a mistaken belief I've noted commonly in XML circles
that if somebody grabs <svg:filter> or <xtm:baseName> that you'll
somehow get all the behaviours for free. I'm not saying that's what
you're stating, but you might as well simply keep your own namespace
clean as much as possible. The only exception to this would be the
XML and XLink namespaces. You get xml:base, xml:space, xml:lang,
and the xlink:* stuff as part of XML (although there's probably a
new xml:* spec out this week... *sigh*). You might use XInclude,
but you're probably better off with XLink in terms of finding
software library support. (You can't ignore that issue if you wish
this to succeed.)
> Approach (explanatory)
> -----------------------
>
> The approach used is a compromise between minimalism and
> the ordinary ways to represent data in XML. In particular,
> we attempt to encode some data in tag structures and some
> data as arguments to tags.
>
> The language should be easily extendable by users by:
> - using connective and type names invented by the users
> - using parameters invented by the users
> - putting free-form XML into SCL terms
"free-form" XML means that SCL documents have ambiguous meaning.
It's sort of like saying, drive that car but ignore all the knobs
and buttons you don't know about. You might be okay, but you might
find out that one of the knobs was the brake. You can't ignore
markup without having documents interpretable differently by
different processors. That's called information loss.
> The language does not follow the example of RDF and OWL
> in the sense that the arity of predicates is not restricted.
> However, we envision that RDF and OWL structures can be
> put into the SCL structures, using the proposed xmlterm tag.
If you have an expectation for an extension, don't allow it
arbitrarily -- create a methodology for extension that allows
people to know what they're getting. Documents that aren't
strictly SCL should be so labeled.
> The arity of associative connectives is not restricted.
> The arity of non-associative connectives is restricted.
>
> The language contains a trivial mechanism for referencing
> other texts and naming terms and formulas.
[...]
What is meant by "trivial"? Intra-document? Or inter-document?
If the latter, it's not trivial. There are a lot of issues to
be dealt with, and these issues are solved for you by using an
existing link spec like XLink, and paying attention to xml:base
and URI issues (you just need to reference those specs and
understand what they mean. And that's admittedly not trivial).
[...]
> NB! Observe that the second (included) file need not be in the SCL-in-XML
> language. We could, if we so wish, use such a file:
> - - - - - - - -
> <h1>Just a header in html</h1>
>
> Now comes a formula part not visible in html browsers,
> see the id parameter and include usage:
You might note that there's very little reliability in assuming
HTML browsers will properly handle non-HTML markup. In many cases
they don't, and in browsers that are "XML-aware" the user will
receive warning messages, empty pages, and all manner of results
when encountering such markup. Hopefully this situation will
improve, but for now this is quite treacherous ground to tread.
For example, there are web pages that appear on MS Internet
Explorer that go completely bonkers on either the Macintosh version
of IE, or on Netscape or Opera.
> <scl:quantifier scl:name="forall" scl:variable="x"
>                 scl:id="foo">
>   <scl:predicate scl:name="r">
>     <scl:term scl:name="x"/>
>     <scl:include scl:href="#bar"/>
>   </scl:predicate>
> </scl:quantifier>
If you use <scl:include scl:href="#bar"/> developers are going to
need to implement specialized linking support for SCL. Even for
this "trivial" example.
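A rough sketch of what that specialized support might look like for the same-document case only, assuming Python's standard xml.etree.ElementTree and unprefixed "include", "href" and "id" names. The function name expand_includes is hypothetical, and external URIs, xml:base, nested includes and cycles are all deliberately left unhandled:

```python
import copy
import xml.etree.ElementTree as ET

def expand_includes(root):
    """Replace each <include href="#id"/> with a copy of the element bearing that id."""
    by_id = {el.get("id"): el for el in root.iter() if el.get("id")}
    for parent in list(root.iter()):
        for index, child in enumerate(list(parent)):
            if child.tag == "include":
                target = by_id[child.get("href").lstrip("#")]  # KeyError if missing
                replacement = copy.deepcopy(target)
                replacement.tail = child.tail   # keep surrounding whitespace
                parent[index] = replacement     # "replacement" semantics
    return root

doc = (
    '<doc>'
    '<quantifier name="forall" variable="x" id="foo">'
    '<predicate name="r"><term name="x"/><include href="#bar"/></predicate>'
    '</quantifier>'
    '<term name="1" id="bar"/>'
    '</doc>'
)
expanded = expand_includes(ET.fromstring(doc))
arguments = [t.get("name") for t in expanded.find("quantifier/predicate")]
```

Even this much leaves open questions, such as the duplicated "id" value after copying, which is the sort of issue an existing linking spec has already had to answer.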
I'm curious as to why you feel the need to namespace prefix SCL but
not HTML. I'd do the opposite: declare SCL as the default XML
namespace (unprefixed) and have all non-SCL markup prefixed. It
saves space, is a lot easier to read, and makes the SCL namespace
prefixes unnecessary. Your example didn't include the declaration
for "scl:", so to compare fairly:
<scl:quantifier xmlns:scl="http://scl_namespace_uri/"
                scl:name="forall" scl:variable="x"
                scl:id="foo">
  <scl:predicate scl:name="r">
    <scl:term scl:name="x"/>
    <scl:include scl:href="#bar"/>
  </scl:predicate>
</scl:quantifier>
or
<quantifier xmlns="http://scl_namespace_uri/"
            name="forall" variable="x"
            id="foo">
  <predicate name="r">
    <term name="x"/>
    <include href="#bar"/>
  </predicate>
</quantifier>
They mean exactly the same thing to an XML processor.
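This can be checked with a namespace-aware parser. A minimal sketch, assuming Python's standard xml.etree.ElementTree and well-formed, cut-down versions of the two fragments (the helper names are hypothetical):

```python
import xml.etree.ElementTree as ET

NS = "http://scl_namespace_uri/"

prefixed = (
    f'<scl:quantifier xmlns:scl="{NS}" scl:name="forall" scl:variable="x">'
    '<scl:predicate scl:name="r"><scl:term scl:name="x"/></scl:predicate>'
    '</scl:quantifier>'
)
default = (
    f'<quantifier xmlns="{NS}" name="forall" variable="x">'
    '<predicate name="r"><term name="x"/></predicate>'
    '</quantifier>'
)

def element_names(xml_text):
    """Expanded element names, in document order."""
    return [el.tag for el in ET.fromstring(xml_text).iter()]

def root_attribute_names(xml_text):
    """Expanded attribute names on the root element."""
    return sorted(ET.fromstring(xml_text).attrib)

same_elements = element_names(prefixed) == element_names(default)
```

The element names come out identical in both forms. One caveat worth knowing: a default namespace declaration does not apply to attributes, so the unprefixed "name" and "variable" attributes are in no namespace, while "scl:name" and "scl:variable" are namespace-qualified. A processor that looks attributes up by qualified name would need to settle on one form.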
You could then add in XHTML markup by providing a DTD that includes the
XHTML modules for what you want. And you can get all of the XHTML
character entities (like the symbols for ∃, ∧, ∨, etc.),
bold and italic, HTML linking, etc.
<quantifier xmlns="http://scl_namespace_uri/"
            xmlns:x="http://www.w3.org/1999/xhtml"
            name="forall" variable="x"
            id="foo">
  <predicate name="r">
    <x:p>
      Here's some documentation, including <x:b>bold</x:b> and
      character entities like &exist; or &auml;, and a
      <x:a href="http://foo.org/opossum.html">link</x:a>
    </x:p>
    <term name="x"/>
    <include href="#bar"/>
  </predicate>
</quantifier>
But I wouldn't advocate this in version 1.
> And here is the included term, also not visible in html browsers:
>
> <scl:term scl:name="1"
> scl:id="bar"/>
>
> End of file.
> - - - - - - -
>
> Principles of encoding
> ----------------------
>
> SCL namespace defines the following tags:
>
> - formula
> - connective
> - quantifier
> - predicate
> - term
> - xmlterm (NB! may occur in arbitrary places in the SCL structure)
> - include (NB! is replaced by referenced text)
>
> and the following parameters for tags:
>
> - name (obligatory for all scl tags, except include)
> - variable (allowed and obligatory only for scl quantifier tags)
> - type (allowed only for scl quantifier tags)
> - comment (allowed for all scl tags)
> - href (allowed and obligatory only in the include tag)
> - id (may be used in any tag, except include tag)
You might want to mention the implications of IDs and the ID
namespace, vs. values within the 'name' attribute.
> SCL further defines the following name parameter
> value strings with predefined meanings (meaning
> defined as such only if the value occurs in a suitable
> SCL tag):
>
> - exists (meaning defined iff in quantifier tag)
> - forall (meaning defined iff in quantifier tag)
> - and (meaning defined iff in connective tag)
> - or (meaning defined iff in connective tag)
> - xor (meaning defined iff in connective tag)
> - implies (meaning defined iff in connective tag)
> - not (meaning defined iff in connective tag)
> - equivalent (meaning defined iff in connective tag)
> - equal (meaning defined iff in predicate tag)
>
>
> What is not prohibited
> -----------------------
>
> SCL does not prohibit using additional tags, parameters and parameter
> values inside SCL text structures.
Why not? Are you defining an interchange syntax or simply providing
a playground for vendors and developers?
> SCL parsers (unless specially extended) should simply
> ignore all unknown tags, parameters and values.
This means that for a given SCL document, different implementations
will have different interpretations. That cannot be a good thing.
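One alternative to silently ignoring unknown markup is for a processor to report it, so the recipient at least knows the document is not strictly SCL. A rough sketch, assuming Python's standard xml.etree.ElementTree and unprefixed names; the set of known element types is taken from the draft's list, while the function name unknown_elements and the <negated> element are made up for illustration:

```python
import xml.etree.ElementTree as ET

# Element types listed in the draft's "Principles of encoding".
KNOWN_ELEMENTS = {"formula", "connective", "quantifier", "predicate",
                  "term", "xmlterm", "include"}

def unknown_elements(element, found=None):
    """Collect element type names a strict processor would not recognise."""
    found = set() if found is None else found
    if element.tag not in KNOWN_ELEMENTS:
        found.add(element.tag)            # report it, do not look inside
    elif element.tag != "xmlterm":        # xmlterm may hold arbitrary XML
        for child in element:
            unknown_elements(child, found)
    return found

doc = (
    '<connective name="and">'
    '<predicate name="p"/>'
    '<negated><predicate name="q"/></negated>'
    '<xmlterm><anything/></xmlterm>'
    '</connective>'
)
report = unknown_elements(ET.fromstring(doc))
```

A lenient parser that skipped <negated> here would read the formula as asserting p alone, while one that understood the extension would read something quite different, which is the information loss in question.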
> Suggestions for extensions
> --------------------------
>
> Language users can easily create richer languages on top of
> SCL, using their own namespaces. Some of the suggested ways:
>
> - using additional tag parameters
Again, these are called "element types", "elements" and "attributes" in XML.
> - using additional connective tag name parameter values
> - using additional quantifier tag name parameter values
> - using arbitrary XML inside the xmlterm tag
>
> In particular, users can invent their own parameters to
> mark formulas as belonging to different parts of the knowledge
> base (for example: axioms for arithmetic, theory of lists,
> knowledge of a particular person, a query, etc).
Are there any rules for this, or do we expect to receive an SCL
document and have to figure out what it means by hand? Will
non-standard SCL documents be labeled differently than ones that
conform to the standard?
The one thing I think necessary in order to determine the answers
to design questions (and the first thing I did) is to define a
set of requirements for SCL and have everyone agree to those
requirements. Then it's a matter of determining whether a specific
option meets or fails the requirements. As it is now, you simply
say you don't think a decision fits your understanding of the
goals and toss it aside. Other people's requirements may be quite
different than yours. And these things shouldn't be decided on
opinion, but on some metric. For example, just how important is
verbosity vs. human readability? What is the expected size of
SCL documents? Otherwise, you and I could just argue until we're
blue about whether it should be <quant> or <quantifier>, which
is stupid (and typical of working groups I've known). If humans
are expected to never read SCL documents, it could be <q>. You
just need to set the requirements and these things tend to shake
out. I can't tell you how many projects I've known that either
didn't have any requirements or paid little attention to them,
instead playing back and forth with committee members' senses
of aesthetics rather than valid engineering criteria.