[SCL] SCL and XML syntax translation questions

pat hayes phayes at ai.uwf.edu
Tue Dec 24 19:45:40 CST 2002


>Hi,
>
>Wanted to ask your opinion on how should the SCL-lisp <-> SCL-XML 
>transformation look like.

Quick aside comment: we ought to think in terms of mappings to and 
from the abstract syntax. Same issues apply, of course.

>I think we should get a basic agreement
>on this before we attempt to fix the specifics of SCL-lisp
>syntax and semantics.

I hope we can do the abstract syntax + MT more or less independently 
of the surface syntaxes.

>Here are some naive alternative solutions for discussion:

I am not an XML expert, but I think it is essential to keep in mind 
that we need to have at least one syntax which allows SCL-XML to be 
used alongside, and integrated with, the XML surface syntax which has 
already been chosen for RDF and (hence) for OWL.

RDF (and all the subsequent languages) have essentially distinct 
syntaxes for binary predications and others. Unary predications are 
subsumed under binary ones by using an explicit 'apply' relation 
(called rdf:type) , and higher arities can be handled by using lists, 
eg R(a, b, c, d) could be glossed as having the structure R(a, L) & 
L=[ b, c, d] . (In fact, the RDF actually translates as
(exists (L)( R(a, L) & L=[ b, c, d] ))   with the list described 
using a conjunction of existential assertions about the dotted-pair 
components of the list, in this case something like

(exists (L, L1, L2)(R(a, L) & first(L,b) & rest(L, L1) & first(L1, c) 
& rest(L1, L2) & first(L2,d) & rest (L2, nil) )

An alternative strategy used in RDF (and recommended by the RDF 
'primer') would be to resort to a case-grammar style, and use 
auxiliary binary relations so that it comes out like this:

(exists (x) (R(a, x) & R1(x, b) & R2(x, c) & R3(x, d) )

I KNOW these are ugly, but for all its faults the use of some 
reduction to the binary case is both widely practiced and an existing 
standard, and we need to be able to integrate it and work with it.

Pat

>To transform a formula
>
>(implies
>    (forall (?x)
>      (P (succ ?x)) )
>    (P (succ 0)) )
>
>we could have smth like (several different encoding methods):
>
>1) Some untyped nested list translation
>    with NO function->tag conversion, but a special
>    treatment for varlist (I'd probably prefer this
>    format, since it is least restrictive and does
>    not convey more information than the lisp syntax):
>
>   <term>
>      <term>implies</term>
>      <term>
>        <term>forall</term>
>        <varlist>
>          <var>?x</var>
>        </varlist>
>        <term>
>          <term>P</term>
>          <term>
>            <term>succ</term>
>            <term>?x</term>
>          </term>
>        </term>
>      </term>
>      <term>
>        <term>P</term>
>        <term>
>          <term>succ</term>
>          <term>0</term>
>        </term>
>      </term>
>   </term>
>
>To be really faithful to list syntax we should perhaps
>drop the <varlist> tag and simply write:
>
>        <term>
>          <term>?x</term>
>        </term>
>
>Otherwise the parser/converter must KNOW which functions
>are quantifiers and which not. This might be hard to
>achieve for exotic logics and specific quantifiers people may
>want to use.
>
>2) General nested list translation with function->tag conversion and
>    exception for varlist. Here the parser/converter again
>    needs to know which functions are quantifiers and which not.
>
>   <implies>
>      <forall>
>        <varlist>
>          <var>?x</var>
>        </varlist>
>        <P>
>          <succ?>?x</succ>
>        </P>
>      </forall>
>      <P>
>        <succ>0</succ>
>      </P>
>   </implies>
>
>3)  Some kind of a rather strongly typed nested list translation
>     which explicitly contains information about the formula structure
>     apparently NOT present in SCL-lisp syntax (observe that
>     it would not be so easy for higher order case where we cannot
>     have <literal fun="P"> anymore, since fun may be a complex term).
>
>     Here the parser/converter needs to know an awful lot, putting
>     serious restrictions on the practical usage of exotic logics:
>
>   <prop fun="implies">
>      <quantifier fun="forall">
>        <varlist>
>          <var>?x</var>
>        </varlist>
>        <literal fun="P">
>          <term fun="succ">
>            <atomicterm term="?x"/>
>          </term>
>        </literal>
>      </quantifier>
>      <literal fun="P">
>         <term fun="succ">
>           <atomicterm term="0"/>
>         </term>
>      </literal>
>   </prop>
>
>
>4) Something completely different?
>
>
>About the varlist and variable/constant encodings:
>
>Varlist:
>
>It seems to me that the variable list is the only set
>construction in the formula syntax: for other lists the first
>element of a list is a distinguished function/predicate/connective/etc
>term and the order of elements normally matters.
>
>This seems to require a special treatment in the XML syntax as well.
>
>The XML syntax must also allowed typed variables to be
>present in the list, preferably a mix of
>typed/untyped like in (?x (int ?y) ?z).
>
>We could, in principle, remove the problem by requiring
>a quantifier to bind a SINGLE variable, but I guess this
>will be cumbersome to use and is not compatible with KIF
>and CL ideology.
>
>The first example above proposes an alternative dumb solution
>of treating (?x (int ?y) ?z) as just another term in XML.
>
>
>Variable/constant encodings:
>
>BTW, is there a reason to REQUIRE that variables start with ?
>or is it just a cozy convention we do not have to force?

The latter. Lexical markers for variables should be considered an 
idiosyncracy of the surface syntax, and we can choose them to suit 
ourselves. The chief considerations to bear in mind are consistency 
with existing standards, ease of writing parsers, and familiarity, in 
that order, I would suggest.

>
>If we prohibit open formulas at top level, then we do not need vars to
>start with ?. In "ordinary" theorem provers which take
>clausal input (hence open formulas) the variable/constant
>distinction is mostly done using different conventions, like
>Prolog-style capital/noncapital or Otter-style first character
>convention of x,y,z,u,v,w.

I don't think either of thse can be usefully extended to a larger 
setting, however, since for example we have to allow users to write 
urirefs as constant names, and there is no prohibition against a 
uriref starting with any ascii symbol. So things like reserved 
alphabets and case distinctions cannot be used as a general mechanism.

>If open formulas at top level are
>prohibited, then we do not need to force any conventions
>upon other people in this respect.

That would be ideal.

Strictly speaking there is no absolute need to even make a syntactic 
distinction between variables and nonvariables: any name that is 
bound by a quantifier is considered to be a 'variable' and otherwise 
it isn't as far as the CL model theory is concerned. However, I don't 
think that implementers will be very happy with such a language. 
(What is your view??)

Pat Hayes

-- 
---------------------------------------------------------------------
IHMC					(850)434 8903 or (650)494 3973   home
40 South Alcaniz St.			(850)202 4416   office
Pensacola              			(850)202 4440   fax
FL 32501           				(850)291 0667    cell
phayes at ai.uwf.edu	          http://www.coginst.uwf.edu/~phayes
s.pam at ai.uwf.edu   for spam




More information about the Scl mailing list