
Section 5:
The Lexis

This specification of the lexis for the Sather programming language uses the notation defined in ISO/IEC 14977:1996 Syntactic Meta-Language.

The text of a Sather source file consists of lexical tokens, separated by white space wherever this is needed to distinguish one token from another. Some of the tokens are specified here in terms of a symbol of the form "xxx_SY". Such a symbol may be any culturally defined sequence of one or more character encodings or, in the case of identifiers and comments, a sequence of one or more culturally defined character encodings whose meaning is defined by the programmer.

For the purposes of textual illustration, the representation forms of the xxx_SY tokens are given by the strings defined in the separate Reference specification in Annex D. That annex therefore does not form a necessary part of this lexis; it is given merely for use in the example source text which appears throughout this document.

5.1 Source Text

sather source code = {delimited token}, {separator} ;
delimited token = {separator}, token ;
token = word token | symbol | constant literal ;
NOTE A token is the longest sequence of encodings which satisfies the definitions below. As a consequence, separators (see sub-section 5.8) must be provided after a word token or a literal. Additionally, literals must be separated from other literals where adjacent literals are permissible.
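
The longest-sequence rule in the NOTE above can be illustrated with a small Python sketch. It is not part of the specification: the function name `longest_symbol` and the spellings in `SYMBOLS` are assumptions made for this example only, standing in for whatever representations Annex D assigns to the corresponding xxx_SY symbols.

```python
# Hypothetical spellings standing in for Annex D representations
# (for example Less_Op_SY, LEq_Op_SY, Colon_SY, Assign_Op_SY).
SYMBOLS = ["<", "<=", ":", ":="]

def longest_symbol(text, pos):
    """Return the longest entry of SYMBOLS that starts at text[pos], or None."""
    best = None
    for sym in SYMBOLS:
        # str.startswith accepts a start offset as its second argument.
        if text.startswith(sym, pos) and (best is None or len(sym) > len(best)):
            best = sym
    return best
```

Under these assumed spellings, `longest_symbol("a<=b", 1)` selects the two-character symbol rather than stopping at the one-character symbol which is its prefix.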

5.2 Word Tokens

word token = identifier | keyword | name ;

5.3 Identifiers

identifier = pervasive identifier | full identifier ;
pervasive identifier = Aget_SY | Aset_SY | Assign_SY | Cluster_Count_SY | Div_SY | Exception_SY | Far_SY | Here_SY | Initial_SY | Is_Eq_SY | Is_Lt_SY | Minus_SY | Mod_SY | Near_SY | Negate_SY | New_SY | Not_SY | Plus_SY | Pow_SY | Result_SY | Self_SY | Times_SY | Is_Void_SY | Where_SY ;
NOTE The Cluster_Count_SY is retained in the above table for the current language definition. There is a proposal that it should be deleted in the next language revision.
full identifier = culture defined letter,
{culture defined alphanumeric | low line} ;

low line = Low_Line_SY ;
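
As a rough illustration of the full identifier production, the Python sketch below checks a string against it. Two assumptions are made which the lexis itself does not state: Python's Unicode predicates `str.isalpha` and `str.isdecimal` are used as stand-ins for the 'alpha' and 'digit' groups of a Local Culture Specification, and Low_Line_SY is taken to be "_". The name `is_full_identifier` is hypothetical.

```python
LOW_LINE = "_"  # assumed representation of Low_Line_SY

def is_full_identifier(text):
    """Check text against: culture defined letter, {alphanumeric | low line}."""
    # The first character must be a letter (approximated by str.isalpha).
    if not text or not text[0].isalpha():
        return False
    # The remaining characters may be letters, digits or the low line.
    return all(
        ch.isalpha() or ch.isdecimal() or ch == LOW_LINE
        for ch in text[1:]
    )
```

The sketch only tests the shape of a full identifier; it does not exclude strings which are also pervasive identifiers or keywords.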

5.4 Names

There are three forms of 'name' in the source text of a Sather program, corresponding to the productions below: the name of an iter, the name of an abstract class/type and the name of a concrete class/type.

name = type name | iter name ;
type name = abstract type name | concrete type name ;
NOTE Since there are many world scripts which neither have cases nor even letters, it is impractical to require that class/type names should be all upper case letters. This is merely a matter of programming style where applicable and can have no significance in a program conforming to this specification.
abstract type name = Abstract_Signifier_SY,
(pervasive abstract name | identifier) ;

pervasive abstract name = external reference | Abs_Lock_SY | Attach_SY | Object_SY  ;

concrete type name = pervasive concrete name | identifier ;

pervasive concrete name = external reference | Aval_SY | Aref_SY | Bit_SY | Bool_SY | System_SY | Tuple_SY ;

external reference = Reference_SY ;

iter name = (pervasive iter name | identifier), Iter_Signifier_SY ;

pervasive iter name = Break_SY | Cluster_SY | Until_SY | While_SY ;

5.5 Keywords

keyword = Abstract_SY | Assert_SY | Attr_SY | Bind_SY | Case_SY | Class_SY | Constant_SY | Do_SY | Else_SY | Elsif_SY | End_SY | External_SY | Fork_SY | Guard_SY | If_SY | Immutable_SY | Include_SY | Inout_SY | Is_SY | Iter_SY | Library_SY | Lock_SY | Loop_SY | Once_SY | Out_SY | Parallel_SY | Parloop_SY | Partial_SY | Post_SY | Pre_SY | Private_SY | Protect_SY | Quit_SY | Raise_SY | Readonly_SY | Return_SY | Routine_SY | Same_Type_SY | Shared_SY | Stub_SY | Synchronise_SY | Then_SY | Typecase_SY | Unlock_SY | When_SY | With_SY | Yield_SY ;
NOTES 1. A number of words in previous lists of keywords were actually value expressions, etc., and have been moved to other appropriate places in this document.
2. The keyword Library_SY (shown in red in the original rendering of the definition above) is a proposed addition to the language to enable named libraries to be introduced.

5.6 Symbols and Operators

Sather requires a number of 'punctuation marks'. These are either symbols in the form of punctuation which are needed in parsing, or binary or unary operators which are required to be mapped to associated method calls.

symbol = required symbol | text quote mark
| unary operator | binary operator | binary logical operator  ;

required symbol = At_SY | Bar_SY | Colon_SY | Comma_SY | Do_Attach_SY | Fullstop_SY | Left_Angle_Bracket_SY | Left_Brace_SY | Left_Bracket_SY | Left_Parenthesis_SY | Rename_SY | Right_Angle_Bracket_SY | Right_Brace_SY | Right_Bracket_SY | Right_Parenthesis_SY | Semicolon_SY ;
text quote mark = Single_Quote_SY | Double_Quote_SY ;
unary operator = create op | negate op | not op ;
create op = Number_SY ;
negate op = Negate_Op_SY ;
not op = Tilde_SY ;
binary logical operator = and op | or op ;
and op = And_SY ;
or op = Or_SY ;
binary operator = assign op | divide op | equal op | greater than op | greater than or equal op | less than op | less than or equal op | minus op | modulus op | not equal op | plus op | power op | times op ;
assign op = Assign_Op_SY ;
divide op = Divide_Op_SY ;
equal op = Equal_Op_SY ;
greater than op = Greater_Op_SY ;
greater than or equal op = GEq_Op_SY ;
less than op = Less_Op_SY ;
less than or equal op = LEq_Op_SY ;
minus op = Minus_Op_SY ;
modulus op = Modulus_Op_SY ;
not equal op = NEq_Op_SY ;
plus op = Plus_Op_SY ;
power op = Power_Op_SY ;
times op = Times_Op_SY ;

5.7 Constant Literals

The Sather language permits the expression of five kinds of literal value in the source text of a Sather class.

constant literal = bit literal |
bool literal |
void literal |
numeric literal |
text literal ;

5.7.1 Bit Literals

bit literal = Bitset_SY | Bitclear_SY ;

5.7.2 Boolean Literals

bool literal = True_SY | False_SY ;

5.7.3 Void Literal

void literal = Void_SY ;

5.7.4 Numeric Literals

numeric literal = integer literal | whole number literal | approximate number literal ;
integer literal = sign, whole number literal ;
sign = [Plus_Op_SY | Minus_Op_SY] ;
whole number literal = hexadecimal literal | decimal literal | octal literal ;

hexadecimal literal = base prefix, Hexadecimal_Base_SY, hexadecimal digit,
{hexadecimal digit | separator character} ;

decimal literal = digit, {digit | separator character} ;

octal literal = base prefix, Octal_Base_SY, octal digit,
{octal digit | separator character} ;

base prefix = Digit_Zero_SY ;
NOTE There are culture scripts for which no Digit_Zero_SY exists - hence such cultures cannot represent a base for numeric value literals. This point must be addressed when revising the language specification!
approximate number literal = sign, whole number literal, Decimal_Point_SY, digit sequence,
[Exponent_SY, sign, digit sequence] ;

digit sequence = digit, {digit} ;
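
The three whole number literal forms can be sketched with Python regular expressions. Every concrete spelling here is an assumption made for illustration and is not taken from this lexis: Digit_Zero_SY is taken as "0", Hexadecimal_Base_SY as "x", Octal_Base_SY as "o", the separator character as "_", and the digit groups as the ASCII digits. The name `classify_whole_number` is hypothetical.

```python
import re

# Assumed spellings: base prefix "0", base letters "x" and "o",
# separator character "_", ASCII digits only.
HEXADECIMAL = re.compile(r"0x[0-9A-Fa-f][0-9A-Fa-f_]*")
OCTAL = re.compile(r"0o[0-7][0-7_]*")
DECIMAL = re.compile(r"[0-9][0-9_]*")

def classify_whole_number(text):
    """Return the whole number literal form which text matches, or None."""
    forms = (("hexadecimal", HEXADECIMAL), ("octal", OCTAL), ("decimal", DECIMAL))
    for kind, pattern in forms:
        # fullmatch requires the whole string to satisfy the production.
        if pattern.fullmatch(text):
            return kind
    return None
```

Each pattern requires one digit of the relevant kind before any separator character, as the productions do, so a bare base prefix such as "0x" is rejected.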

5.7.5 Text Literals

In the two production rules for character literal and for string literal given below, the opening text quote mark and the closing text quote mark must be the same character in each application of the rule.

text literal = character literal
| string literal ;

character literal = text quote mark, character literal mark, text quote mark ;

string literal = text quote mark, {character literal mark}, text quote mark ;

character literal mark = formatting character | culture defined character ;
formatting character = Escape_SY, special character ;
special character = format control signifier | text quote mark | Escape_SY | Alert_SY ;

format control signifier = New_Line_Signifier_SY | Carriage_Return_Signifier_SY | Form_Feed_Signifier_SY | Horizontal_Tab_Signifier_SY | Vertical_Tab_Signifier_SY | Backspace_Signifier_SY ;
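
The requirement that the closing text quote mark be the same character as the opening one can be sketched in Python as follows. The representations are assumptions for illustration only: Escape_SY is taken as a backslash, and Single_Quote_SY and Double_Quote_SY as the ASCII quote characters. The name `scan_text_literal` is hypothetical.

```python
ESCAPE = "\\"        # assumed representation of Escape_SY
QUOTES = ("'", '"')  # assumed Single_Quote_SY and Double_Quote_SY

def scan_text_literal(text, pos=0):
    """Return the index just past the closing quote mark, or None if malformed."""
    if pos >= len(text) or text[pos] not in QUOTES:
        return None
    quote = text[pos]       # the closing mark must equal this opening mark
    i = pos + 1
    while i < len(text):
        ch = text[i]
        if ch == ESCAPE:
            i += 2          # skip the escape and the character which follows it
        elif ch == quote:
            return i + 1    # found the matching closing mark
        else:
            i += 1
    return None             # ran out of text before a matching closing mark
```

The sketch does not check that the character after an escape is one of the special characters listed above, nor does it distinguish a character literal from a string literal.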

5.8 Separators

separator = white space | comment ;

5.8.1 White Space

white space = (space | format control code),
{space | format control code} ;

space = Space_SY ;
format control code = new line | horizontal tab | carriage return | new page | backspace | vertical tab ;

new line = "ISO 6429 Line Feed encoding (LF)" ;
horizontal tab = "ISO 6429 Horizontal Tabulation encoding (HT)" ;
carriage return = "ISO 6429 Carriage Return encoding (CR)" ;
new page = "ISO 6429 Form Feed encoding (FF)" ;
backspace = "ISO 6429 Backspace encoding (BS)" ;
vertical tab = "ISO 6429 Vertical Tabulation encoding (VT)" ;

5.8.2 Comment

comment = Hyphen_SY, Hyphen_SY, {comment body}, new line ;
comment body = (space | horizontal tab | backspace | culture defined visible character) ;
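
A minimal Python sketch of the comment production is given below. It assumes that Hyphen_SY is the ASCII hyphen and uses the Line Feed character for new line; the name `scan_comment` is hypothetical. For brevity it does not verify that every character of the body belongs to the permitted comment body set.

```python
HYPHEN = "-"     # assumed representation of Hyphen_SY
NEW_LINE = "\n"  # ISO 6429 Line Feed

def scan_comment(text, pos=0):
    """Return the index just past the terminating new line, or None."""
    # A comment opens with two consecutive hyphens.
    if not text.startswith(HYPHEN * 2, pos):
        return None
    end = text.find(NEW_LINE, pos)
    if end == -1:
        return None  # the production requires a terminating new line
    return end + 1
```

Because the production ends with new line, text which opens with two hyphens but runs to the end of the file without a Line Feed is not accepted as a comment by this sketch.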

5.9 Culture Dependent Definitions

This section of the lexis specifies characters within specific character groups as specified by a Local Culture Specification (LCS) made in accordance with ISO/IEC 14652. Because of this it is defined in terms of natural language strings below.

culture defined character = "Any character in the repertoire of the LCS where the program was written and compiled" ;
culture defined letter = "Any character in the LCS in the group 'alpha'" ;
culture defined alphanumeric = "Any character in the LCS in the group 'alpha' or in the group 'digit'" ;
culture defined visible character = "Any character in the LCS in the group 'print'" ;
separator character = "A local culture defined thousands separator character encoding (defined in the numeric culture section)" ;
digit = "Any character in the LCS in the group 'digit'" ;
hexadecimal digit = "Any character in the LCS in the group 'xdigit'" ;
octal digit = "Any character in the LCS in the group 'digit' which does not represent the decimal value 8 or 9" ;

Comments or enquiries should be made to Keith Hopper.
Page last modified: Tuesday, 24 October 2000.