[SCL] Re: SCL syntax
Mark Stickel
stickel at AI.SRI.COM
Tue Mar 2 17:41:24 CST 2004
Concerning SCL lexical syntax, I believe that '=' and '@'
are treated too specially and restrictively. Why can't
we have names like '=time' and '==' and sequence variables
like "@" and "@@x"?
I suggest that "=" be defined as a reserved element
instead of as a special character. Except that it is
a single nonalphabetic character, it is functionally similar
to current reserved elements like "or". We don't disallow
"orthodontist" just because it starts with a reserved element,
while the current syntax does disallow anything starting
with "=" except "=" by itself, for no reason that is clear to me.
I suggest that seqvar be defined by
seqvar = "@", {char};
That is, let any legal token beginning with "@" be a sequence
variable. This includes "@" by itself (perhaps useful)
and sequence variables with additional occurrences of "@"
(ugly but harmless?).
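
To illustrate, here is a minimal Python sketch of the seqvar rule above. It is not part of the SCL draft. It assumes that "char" means any character other than whitespace and parentheses (the draft's actual definition may differ), and the names CHAR, SEQVAR and is_seqvar are made up for this example.

```python
import re

# Assumption: "char" is any character except whitespace and parentheses.
CHAR = r"[^\s()]"

# seqvar = "@", {char};   i.e. "@" followed by zero or more chars.
SEQVAR = re.compile(rf"@{CHAR}*")


def is_seqvar(token: str) -> bool:
    """Return True if the whole token matches the proposed seqvar rule."""
    return SEQVAR.fullmatch(token) is not None


if __name__ == "__main__":
    for tok in ["@", "@x", "@@x", "x@", "a@b"]:
        print(repr(tok), is_seqvar(tok))
```

Under this rule "@" and "@@x" are accepted, while a token that merely contains "@" after its first character is not a seqvar.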
namesequence can then be defined by
namesequence = (char - "'", {char}) - (reservedelement |
numeral | seqvar);
As Lisp does, and unlike previous versions of SCL syntax,
tokens like "123ABC" are now legal namesequences. I'm okay with
this (and it is general), but some may prefer that
namesequences not start with digits.
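
As a rough check of how the proposed rules interact, here is a small Python sketch that classifies a single token. It is only an illustration under stated assumptions: "char" is taken to be any character other than whitespace and parentheses, the reserved set lists only the two elements named in this message ("or" and "="), a numeral is a sequence of ASCII digits, and the function name classify is hypothetical.

```python
import re

# Assumption: only the reserved elements mentioned in this message.
RESERVED = {"or", "="}

# Assumption: "char" is any character except whitespace and parentheses.
TOKEN = re.compile(r"[^\s()]+")
NUMERAL = re.compile(r"[0-9]+")


def classify(token: str) -> str:
    """Classify one token under the proposed lexical rules."""
    if TOKEN.fullmatch(token) is None:
        return "illegal"
    # Reserved elements are matched exactly, never as a prefix.
    if token in RESERVED:
        return "reservedelement"
    if NUMERAL.fullmatch(token):
        return "numeral"
    # Any legal token beginning with "@" is a sequence variable.
    if token.startswith("@"):
        return "seqvar"
    # Tokens starting with a quote belong to the quoted-string rule,
    # which this sketch does not model.
    if token.startswith("'"):
        return "other"
    return "namesequence"


if __name__ == "__main__":
    for tok in ["=", "=time", "==", "or", "orthodontist",
                "123", "123ABC", "@", "@@x"]:
        print(repr(tok), classify(tok))
```

With these assumptions "=time", "==", "orthodontist" and "123ABC" all come out as namesequences, while "=" and "or" remain reserved and "123" remains a numeral.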
Mark
Pat Hayes wrote:
>
> Mark, (and indeed anyone else :-) if you could cast an eye over the
> new draft of the syntax I would be grateful.
>
> http://www.ihmc.us/users/phayes/SCL_current_2004_rf.html
>
> It tries to stick with approach 4.
> I have completely rewritten the lexical stuff. The intention now is
> that the first symbol of a lexical item signals its type:
> ( or ) force a lexeme break and are always themselves;
> = can only be itself if it's the first character, but does not force a
> break;
> @ can only start a seqvar, which cannot contain @;
> ' can only start a quoted string, and (unless preceded by \) must end
> it;
> whitespace starts a whitespace sequence which forces a lexeme break;
> control characters are errors.
> Numerals are sequences of digits.
>
> Other than this, any character can occur inside a name, so for example
> these are all legal names:
> =
> ab2
> 2ab
> abc
> a=b
> ab=
> a'b
> http://www.chowbaby.com/restaurantnameMsa.asp?restid=15622306
> ~!@#$%^*:"{}|+_4567#$%^&,&.**shflsdhjf;asdjkf;jk|;
>
> but not these:
>
> =ab (starts with = and is not just =)
> @a@bc (illegal seqvar)
> 'ab (starts with ' but isn't unquoted)
> ~!@#$%^*:"{}|+_4567(#$%^&,&.**shflsdhjf;asdjkf;jk|; (has '(' in
> it)
>
> This is motivated partly by wanting to allow URIs to be names, by the
> way. (Right now URIs officially even may contain whitespace, but this
> will soon be changed, I gather. In practice browsers don't work with
> whitespace so nobody uses this anyway.)
>
> The expression syntax allows strings and numerals to count as names
> (and this is made explicit). We could change this, but only at the
> cost of making the expression syntax more complicated. I'm inclined
> to just let it happen, since the semantics could handle it, so it is
> legal to write things like
> ('abc' 17)
> which can be satisfied if the character string 'abc' can be mapped
> (folded) into some relational extension with <17> in it. Well, if
> that's what you write, that's what you get :-) And seriously, the
> things that people finish up wanting to do when they write things like
> Web scrapers can actually produce text like this, e.g. you might want to
> treat a piece of NL text as meaning a category of things, i.e. a unary
> predicate.
>
> Pat
>
> --
>
> ---------------------------------------------------------------------
> IHMC (850)434 8903 or (650)494 3973 home
> 40 South Alcaniz St. (850)202 4416 office
> Pensacola (850)202 4440 fax
> FL 32501 (850)291 0667 cell
> phayes at ihmc.us http://www.ihmc.us/users/phayes