[SCL] unrelated fol syntax spec draft
Tanel Tammet
tammet at staff.ttu.ee
Wed Jan 15 05:58:55 CST 2003
Hi,
This message contains a small extended FOL syntax draft
which is NOT related to the SCL project.
Perfectly OK to ignore and delete the message.
I am sending the language draft simply because I am
currently working on a larger system which handles a new yet-another
FOL syntax, and we have a first partial draft of the language
ready.
The language:
- is intended to be human-readable for nonspecialists.
- is meant to be used in a database-oriented context.
- contains some (weak) higher order features and
nonmonotonic operators (not in syntax, just
in semantics).
The language is of course not useful in the SCL context.
However, I thought you might be interested of seeing all kinds of
languages people (in this case me) happen to come up with in specific
application areas. Of course I'd be glad if you'd have any
(non-SCL) comments or observations on the language.
Please do not distribute: too raw and ugly.
The draft spec is an attached text file.
Regards,
Tanel Tammet
-------------- next part --------------
Rule system language
--------------------
An PARTIAL internal spec draft by Tanel,
based on input and ideas from Vello.
15 January 2003
Introduction
------------
This spec draft contains five parts:
1) Overview of the general structure and kinds of
rules and data.
2) Data representation and language (strings, numbers,
predicates, rule language)
3) Specifics of the logic and the workflow of getting data,
deriving new facts, proving or refuting queries.
Kinds of data and rules
-----------------------
A rule solver takes four kinds of information into
a rule&data set to be processed. Each query may
require different rule&data sets.
Briefly, there are following kinds of
data a rule solver takes:
- Raw data from the database
- Terminology rules and algorithms for terminology
and general knowledge.
- Specific rules and queries posed by
the user and checked by the system
Here comes a more detailed description:
1) Data
Raw data from the database. Given as a set of
literal facts a la 'pred(12,"Jaan Mets","XZY")' etc. Normally
given to the prover by an external function using
pre-built views on the database.
This part typically differs from query to query.
User does not directly manipulate that data:
user manipulates that data by changing the database.
2) Terminology
General knowledge about the database structure
and the investment world. Expressed as rules,
not raw data in the database. Rules typically allow
the prover to logically derive new facts from input data.
This part typically remains unchanged from query
to query.
This part is normally not changed by the user,
but it is possible to change it if need arises.
3) Specific rules and queries
Actual rules to be checked. Expressed as rules,
often using special extralogical query functions
in addition to standard logic.
This part may be relatively often changed by the user
(new queries, new constraints, etc)
Data representation and language
--------------------------------
We have several layers of languages in the prover:
- Internal layer (never shown to any users and developers
except prover engine developers)
- Every data item is encoded as an
integer plus additional data structures for information.
- Facts and rules are clauses with internal decorated
pointer structure.
- Raw input language layer (sometimes may be shown for developers) is
used internally by the prover as a result of parsing an external
file given in some specific language. Raw input language
uses scheme lists for representing clauses and additional
information.
Prover engine converts raw input language into the internal
layer before starting proof search.
- Clause language layer is the standard language used
by the Gandalf prover engine for prover competitions,
mathematics and electronics. It is based on the Otter
syntax.
It is a "syntactic sugar" on top of the raw input
layer. A special module parses that language into
the raw input layer language for the prover engine.
Clause language is not used in this document.
- Programmer layer is the layer used in this document.
It is a "syntactic sugar" on top of the raw input
layer, similarly to clause language layer.
Again, a special module parses that language into
the raw input layer language for the prover engine.
Programmer layer presents first order logic similarly to
a programming language and (somewhat) to a natural
language. Programmer layer is designed to be used
by programmers and sophisticated users. It can be
extended and modified as need arises.
NB! In principle there could be various "sugared"
language layers similar to the programmer language
and clause language.
In the following parts of this chapter we describe
the programmer language layer.
Lexical structure
- - - - - - - - -
The lexical components are:
- parenthesis ( and )
- semicolon ; (line comment marker)
- period . (fact and rule end marker)
- strings in double quotes
- integers
- anonymous variables (a single underscore _)
- names:
The following strings qualify
as names:
- a name must start with
a dollar $ symbol or
an alphabetic character a-Z
- a name may contain (after the initial
symbol) only the alphabetic characters
a-Z or numeric digits 0-9
Upper and lower case characters are considered
to be different.
Base data types in facts and rules:
- - - - - - - - - - - - - - - - - -
Initially we have these data types:
- variables are either:
- names which start with a capital letter (like X122a)
- a single underline '_' (prolog-style): each such
underline is automatically treated as a new variable
not occurring anywhere else.
- Strings are represented as ordinary strings: "Jaan Mets"
- Numbers are represented as ordinary integers: 12333
- Special constants:
- Booleans are represented as special names $true and $false
- null value is represented as $null
- Skolem constants (denoting 'abstract' objects) are represented
(again without quotes) with a name starting with a
dollar symbol.
As need arises, we may implement additional data types:
- Dates are represened as a term $date(year,month,day) where
year, month, day are integers.
- Lists are represented as a term $list(el1,el2,el3,...,eln)
where any element can have an arbitrary type.
- Generally any new datatype is represented as a term
$typename(typeelem1,...,typeleemn).
Functional terms
- - - - - - - - -
Functional terms are rarely used. When they are used,
they are normally assumed to be either:
- data types
- computable (that is,
functions can be computed and a functional term
is replaced by a base type value).
Syntax is traditional, a la 'foo(X,"jaan",bar(1,"ants",X))'
Instead of writing (2+3)*5 we will hence write
*(5,+(2,3)) in the current syntax.
A dollar symbol as a first character of the function
means that the function has a special built-in algorithm
or an external program attached.
Logical atoms
- - - - - - -
Logical atoms are either:
- simple propositional atoms
- terms led by a predicate.
Simple atoms are names like 'isok', without
quotes and they can get a logical true or
logical false as values.
Terms led by a predicate traditionally look like
functional terms a la 'father(X,Y)' only that
the function is boolean, ie a predicate.
A dollar symbol as a first character of the predicate
means that the predicate has a special built-in algorithm
or an external program attached.
We use special syntactically sugared forms
for predicate terms of different arities:
- One-argument (unary) predicates a la 'p(10)'
which essentially say that an an argument
has a type/property named by the predicate:
10 is p
"Jaan Mets" is person
+(X,1000) is bignumber
NB! The structure is: '[object] is [predname]'
- Two-argument (binary) predicates a la 'father("jaan", "ants")'
which say that arguments are related to each other:
"jaan" is father of "ants"
20 is bigger than 10
20 is <= than 30
10 is investmentsum of "EEK"
NB! The structure is: '[object1] is [predname] of|than|to [object2]'
where the system makes no difference between elements 'of|than|to'
- General predicates (more than two arguments) have the standard
syntactical form a la: 'p(10,"Jaan",200)'
NB! It is OK to express also unary and binary predicates
in this general boolean-function syntax.
Not
- -
There are two forms of not:
- negation in classical logic (not true is
false and vice versa)
- negation in the Prolog sense:
unknown or cannot find evidence for
Classical negation is expressed by
writing 'not' before a generic atom syntax
or after 'is' in the unary or binary
sugared syntax a la:
not p(10,"Jaan",20)
10 is not p
10 is not investmentsum of "EEK"
NB! In the first version of the language we do not support
writing 'not' before complex formulas.
Negation of a fact P in the Prolog sense means "we do not
know that P holds."
We use 'notknown' for Prolog negation with a similar
syntax to that of classical negation:
notknown p(10,"Jaan",20)
10 is notknown p
10 is notknown investmentsum of "EEK"
Equality
- - - - - -
Equality is represented either in
classical term notation like
equal(X,Y) or
equal(+(2,5),7)
or (since it is a binary predicate)
by writing 'is equal to' between
two objects which are equal, like this:
+(2,5) is equal to 7.
Disequality is written similarly by writing
'is not equal to' between
two objects which are not equal, like this:
+(2,5) is not equal to 7.
Parentheses
- - - - - -
It is always OK to put parentheses around a logical
atom or a complex formula (even if not required).
And and Or
- - - - - -
In the first version of the language And and Or
may only take logical atoms or negated logical atoms
as arguments.
Expressions connected by And/Or must
- always have parenthes surrounding the whole expression:
([atom1] and [atom2] and ... and [atomn])
([atom1] or [atom2] or ... or [atomn])
We write, for example:
((10 is p) and (10 is not investmentsum of "EEK") and p(10,"jaan",50))
(10 is notknown p and not p(10,"jaan",50))
NB! Whitespace before and after the and/or
symbols is obligatory.
In later versions it will be OK to have or-terms
inside and-terms, vice versa, etc etc.
Implication and logical equivalence
- - - - - - - - - - - - - - - - - -
We use if...then... . to express implication.
We use if...onlythen.... . to express logical equivalence.
We use a special sugared form "...is... ." to express
implication between two unary predicates (described later).
The implication expression must end with a period (.).
Examples:
if
(10 is notknown p and not p(10,"jaan",50))
then
p(20,"Ants", 100).
if 10 is not p then 10 is not investmentsum of "EEK".
if
((10 is notknown p) and not p(10,"jaan",50))
onlythen
p(20,"Ants", 100).
Special sugared implication "...is...":
Instead of an ordinary implication
if (X is man) then (X is animal) or equivalently
if (man(X)) then animal(X)
where exactly one positive unary atom is derived
from another positive unary atom and both
have a common variable argument, it is allowed
to write shorthand:
man is animal.
The goal of the shorthand is to simplify writing
terminological hiearchies.
NB! It is not allowed to have nested implications
or logical equivalences: an implication or an
equivalence may occur only at the top level
of the facts/rules file given to the prover.
higher-order functions
- - - - - - - - - - - -
In addition to ordinary logic the language contains
special higher-order query functions not directly representable
in first order logic: $sumforallfacts and $maxforallfacts.
Both functions (internally) loop over all the given or
derivable facts which correspond to a an atomform/criteria given
in the function term and compute sum (or max) of all the
integers in a given place in the atomform/criteria.
The syntax is as follows (same for $maxforallfacts):
$sumforallfacts([variable],[atomform],[criteria])
where:
- [variable] is a variable occurring
in [atomform]
- [atomform] is a single atom or classically negated
atom containing variables and/or constants
- [criteria] is a an atom or an 'and' formula
determining the subset of atoms which have to be looped over.
NB! [atomform] and [criteria] may contain shared variables
or variables shared in the enclosing if-construction.
For example:
$sumforallfacts
(Percentage,
tickerpercentage(Percentage,Ticker),
deposit(Ticker)).
loops over all given and/or derivable atoms
of the form 'tickerpercentage(...,...)' where
the first argument of the 'tickerpercentage(...,...)'
can be proved to be a deposit.
An alternative sugared syntax:
$sumforallfacts
(Percentage,
Percentage is tickerpercentage of Ticker,
Ticker is deposit).
Queries
- - - -
Some of the top-level sentences in the language
may be queries.
A query indicates that the system must compute
specific answer and return it to the user.
Each query sentence starts with a keyword "give".
There are two main types of query sentences:
1) Boolean queries: a truth value of the query is
asked. Essentially, a query is a rule or fact
which has to be checked: does it follow from all
the other facts and rules or not.
Syntax:
give <rule/fact>
Some examples:
give "Jaan" is man.
give man is animal.
2) Data queries: values for the variables in the
query are asked for. A data query is similar to
the SQL "select ... where ...." construction
and Prolog queries, which return tuples of values
for variables in the query.
Syntax:
give <variablelist> where <andlist>
some examples:
give X where X is man.
give X,Y where (X is father of Y and X is man).
A data query returns a list of tuples:
((var1val,...,varnval)
...
(var1val,...,varnval))
where each tuple is one solution instance of the variables.
Each varival is a term.
In the future additional forms of data queries will be
added. The main need is to have data queries which
- return not more than N answes for some N
- return a new subset of the answer tuples each time they
are called
Top level structure of a fact/rules file
- - - - - - - - - - - - - - - - - - - - -
A fact/rules file given to the prover has the
following basic structure:
fact1
...
factn
...
rule1
...
rulen
where:
- each fact is one atom (possibly with a negation)
ending with period (.)
- each rule is an implication or equivalence (which
is always ending with period).
- the order of facts or rules is unimportant (moreover,
rules and facts may be intermingled).
In order to give names to facts and rules (one name
for one fact or rule!) a name is written into the file
before the named object, using syntax where a name is
suffixed by a colon, like this:
somename:
In order to give names to groups of facts or rules:
- the group must be written inside parentheses
- there must be a period (.) after the end of group
- a name must be given before the group.
Example:
people:
(
man is animal.
"Jaan Mets" is man.
"Mihkel" is man.
animal is livingthing.
).
In order to give comments in a file one writes the
semicolon: everything from semicolon to the
end of line is treated as comment. Example:
; some rules about people
people:
(
man is animal.
"Jaan Mets" is man. ; known person
"Mihkel" is man.
animal is livingthing.
).
More information about the Scl
mailing list