
Section 8.16.1.4:
$TEXT_STRING

This page defines two generic abstract classes named $TEXT_STRING which differ in the number of class arguments they take.

abstract class $TEXT_STRING{ELT < $IS_EQ} < $STRING{ELT}

Inheritance map $IS_EQ $ELT $HASH $STRINGS

Formal Definitions

This abstract class defines a state component which is a set of all instantiations of objects of any class sub-typing from this class in addition to the vdm model types used wherever this class name is used. Note that SAME has to be an instantiated class, not an abstract one.

types

SAME = object_type ;
$STRING_ELT = set of object_type

state

multi : $STRING_ELT
inv multi_types ==
forall obj in set multi_types & sub_type($STRING_ELT,obj)
NOTE See the important note about vdm state in the notes on vdm-sl usage in this specification.

This abstract class characterises the concept of all forms of simple string whether binary, text or other as sequences of the argument class (elements) which must sub-type from $IS_EQ. Classes which sub-type from this shall have immutable semantics!


create

Every form of text string must be able to create a new string from a single text element. This is independent of the type of element.

create (
val : ELT
) : SAME
Formal Signature
create(val : ELT) res : SAME
Pre-condition

Since this is a creation feature, the pre-condition is vacuously true.

Post-condition
post res = [val]

This creation feature returns a new text string consisting of the single element given.


index_lib

This feature is the culture and coding associated with the string. It need not be the default culture and coding for the environment in which the program is executing, since a program may manipulate culture objects independently of local textual representations.

index_lib : LIBCHARS
Formal Signature
index_lib(self : SAME) res : LIBCHARS
Pre-condition

Since the string has to exist then so does this component. The pre-condition, therefore, is vacuously true.

Post-condition

This is also vacuously true, since it is a component of every string of text.

This feature provides access to all of the cultural and environment dependencies relating to this character string.




abstract class $TEXT_STRING{ELT < $IS_EQ, FTP < $FTEXT_STRING{ELT}, STP < $TEXT_STRING{ELT}} < $TEXT_STRING{ELT}, $SEARCH{ELT}, $BINARY

Inheritance map $IS_EQ $ELT $HASH $FSTRINGS $STRINGS $FTEXT_STRING{ELT} $TEXT_STRING{ELT}

Formal Definitions

This abstract class defines a state component which is a set of all instantiations of objects of any class sub-typing from this class in addition to the vdm model types used wherever this class name is used. Note that SAME has to be an instantiated class, not an abstract one.

types

SAME = object_type ;
$TEXT_STRING_ELT_FTP_STP = set of object_type

state

multi : $TEXT_STRING_ELT_FTP_STP
inv multi_types ==
forall obj in set multi_types & sub_type($TEXT_STRING_ELT_FTP_STP,obj)
NOTE See the important note about vdm state in the notes on vdm-sl usage in this specification.

This abstract class characterises the concept of a text string as a sequence of the argument class (elements) which must sub-type from $IS_EQ. The second and third class arguments are the 'corresponding' mutable ($FTEXT_STRING{ELT}) and immutable (sub-typing from $TEXT_STRING{ELT}) string classes. Classes which sub-type from this class shall have immutable semantics!

Auxiliary Functions

The specification of the strip feature in this class needs the following auxiliary functions.

functions

lmark () res : STP

post res = CHAR_STR.str(LIBCHARS.Line_Mark(STP.index_lib(buffer)))

lm_tail : SAME -> BOOL
lm_tail(str) ==

let test = str((len str - len lmark + 1), ..., len str) in
test = lmark

remove_lm : SAME -> SAME
remove_lm(str) ==

if str = [] then
[]
elseif lm_tail(str) then
remove_lm(str(1, ..., (len str - len lmark)))
else
str
end

build

This feature replaces the one inherited from $BINARY which makes use of the execution environment default repertoire and encoding in building the resultant text string.

build (
cursor : BIN_CURSOR
) : SAME
Formal Signature
build(cursor : BIN_CURSOR) res : SAME
Pre-condition
pre not cursor.is_done
Post-condition
post let width = LIBCHARS.default().my_size in
((BIN_CURSOR.remaining(cursor) mod width > 0) or not exists idx1, idx2 in set inds cursor.buffer & (idx2 = idx1 + width - 1)
and REP_MAP.is_valid_encoding(LIBCHARS.culture(LIBCHARS.default()).charmap, cursor.buffer(idx1, ..., idx2))
and (cursor.index = cursor~.index))

or (cursor.is_done
and let res be st forall idx in set inds res &
let start = (idx - 1) * width + 1,
finish = start + width - 1 in
binstr(res(idx)) = cursor.buffer(start, ..., finish))

This routine builds a new string from the binary string indicated using the encoding and repertoire defined by the external execution environment. If there is not an exact number of character codes in the string then void is returned and the cursor has not been moved.


build

This feature makes use of the given encoding and repertoire rather than the execution environment default in building the resultant text string.

build (
cursor : BIN_CURSOR,
lib : LIBCHARS
) : SAME
Formal Signature
build2(cursor : BIN_CURSOR, lib : LIBCHARS) res : SAME
Pre-condition
pre not cursor.is_done
Post-condition
post let width = lib.my_size in
((BIN_CURSOR.remaining(cursor) mod width > 0) or not exists idx1, idx2 in set inds cursor.buffer & (idx2 = idx1 + width - 1)
and REP_MAP.is_valid_encoding(LIBCHARS.culture(lib).charmap, cursor.buffer(idx1, ..., idx2))
and (cursor.index = cursor~.index))

or (cursor.is_done
and let res be st forall idx in set inds res &
let start = (idx - 1) * width + 1,
finish = start + width - 1 in
binstr(res(idx)) = cursor.buffer(start, ..., finish))

This routine builds a new text string from the binary string indicated using the encoding and repertoire defined by lib. If there is not an exact number of character codes in the string then void is returned and the cursor has not been moved.
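
The width rule described above can be modelled with a short Python sketch. It is an illustration only, not the library's implementation: the function name, the use of bytes for the binary string, and the (result, index) return pair are assumptions made for the example, and the check that each code is a valid encoding is omitted.

```python
from typing import List, Optional, Tuple


def build(buffer: bytes, index: int, width: int) -> Tuple[Optional[List[bytes]], int]:
    """Hypothetical model of build: split the unread bytes into width-sized codes.

    Returns (codes, new_index). If the unread part is not an exact number of
    codes, returns (None, index), i.e. void with the cursor left where it was.
    """
    remaining = len(buffer) - index
    if remaining <= 0:
        # Models the pre-condition "not cursor.is_done".
        raise ValueError("cursor is already done")
    if remaining % width != 0:
        return None, index
    codes = [buffer[i:i + width] for i in range(index, len(buffer), width)]
    return codes, len(buffer)
```

With a width of 2, a six-byte buffer gives three codes and a finished cursor, while a five-byte buffer gives no result and an unmoved cursor.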


strip

This feature removes line marks from the end of a string (if there are any there). Multiple line marks at the end will be removed, irrespective of any escaping mechanism.

strip : SAME
Formal Signature
strip(self : SAME) res : SAME
Pre-condition
pre true
Post-condition

This post-condition uses the auxiliary function remove_lm defined above.

post res = remove_lm(self)

This feature removes as many line marks as are found at the end of the string, returning the result.
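
As an informal illustration of remove_lm, the following Python sketch strips trailing line marks repeatedly. The function name and the default line mark of "\n" are assumptions for the example; in the specification the line mark comes from the string's index_lib.

```python
def strip_line_marks(text: str, line_mark: str = "\n") -> str:
    """Hypothetical model of strip: drop every trailing occurrence of line_mark."""
    if not line_mark:
        raise ValueError("line_mark must be non-empty")
    # Mirrors remove_lm: keep removing while the tail equals the line mark.
    while text.endswith(line_mark):
        text = text[: len(text) - len(line_mark)]
    return text
```

Note that a preceding escape element does not protect a line mark, so a string ending in a backslash followed by a line mark still loses the line mark.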


is_upper

This predicate tests to determine if the string contains all upper-case letters (being defined by the current execution environment cultural specification as being in the class 'upper'). Note that where a script does not define any upper case letters - or has no case distinction at all then the result will be identically false - even though the characters are letters.

is_upper : BOOL
Formal Signature
is_upper(self : SAME) res : BOOL
Pre-condition
pre size(self) > 0
Post-condition
post res =
forall index in set inds self & CHAR_CLASS.kind(self(index)) = CHAR_CLASS.Upper_Case

This predicate returns true if and only if every element of self is upper-case, otherwise false. Where there is no case distinction in the script concerned then this returns identically false.


is_lower

This predicate tests to determine if the string contains all lower-case letters (being defined by the current execution environment cultural specification as being in the class 'lower'). Note that where a script does not define any lower case letters - or has no case distinction at all then the result will be identically false - even though the characters are letters.

is_lower : BOOL
Formal Signature
is_lower(self : SAME) res : BOOL
Pre-condition
pre size(self) > 0
Post-condition
post res =
forall index in set inds self & CHAR_CLASS.kind(self(index)) = CHAR_CLASS.Lower_Case

This predicate returns true if and only if every element of self is lower-case, otherwise false. Where there is no case distinction in the script concerned then this returns identically false.
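
The behaviour of both case predicates on scripts without case can be illustrated in Python. This is only a sketch: it uses the Unicode general categories Lu and Ll as a stand-in for the cultural character classes, which is an assumption and not how the specification defines 'upper' and 'lower'.

```python
import unicodedata


def is_upper(text: str) -> bool:
    """Hypothetical model: true only if every element is an upper-case letter."""
    if not text:
        raise ValueError("pre-condition: size(self) > 0")
    return all(unicodedata.category(ch) == "Lu" for ch in text)


def is_lower(text: str) -> bool:
    """Hypothetical model: true only if every element is a lower-case letter."""
    if not text:
        raise ValueError("pre-condition: size(self) > 0")
    return all(unicodedata.category(ch) == "Ll" for ch in text)
```

A string of Han characters consists of letters, yet both predicates are false for it because the script has no case distinction.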


char

char (
index : CARD
) : ELT
Formal Signature
char(self : SAME, index : CARD) res : ELT
Pre-condition
pre index < size(self)
Post-condition

Note that the index in this post-condition is incremented by one to take account of the indexing difference between Sather and vdm.

post res = self(index + 1)

This routine returns the element to be found at the indicated position in self.


upper

This routine creates a copy of self in which all lower case letters are replaced by an upper case equivalent if one exists. Note that there are scripts (e.g. Armenian) which have lower case letters to which there is no corresponding upper case letter. If no upper case equivalent exists then no change is made to a letter code. Non-letter codes are not changed.

upper : SAME
Formal Signature
upper(self : SAME) res : SAME
Pre-condition
pre size(self) > 0
Post-condition
post let upindices : set of nat1 =
{index | index in set inds self & CHAR_CLASS.kind(self(index)) = CHAR_CLASS.Lower_Case} in
forall idx in set upindices & self(idx) in set UNICODE.Lower_only
or res(idx) = CHAR_MAPPING.to_domain(self(idx))

This routine returns a copy of self in which every lower case character has been converted to its upper case equivalent provided one exists.


lower

This routine creates a copy of self in which all upper case letters are replaced by a lower case equivalent. Non-letter codes are not changed.

lower : SAME
Formal Signature
lower(self : SAME) res : SAME
Pre-condition
pre size(self) > 0
Post-condition
post let upindices : set of nat1 =
{index | index in set inds self & CHAR_CLASS.kind(self(index)) = CHAR_CLASS.Upper_Case} in
forall idx in set upindices & res(idx) = CHAR_MAPPING.to_range(self(idx))

This routine returns a copy of self in which every upper case character has been converted to its lower case equivalent.
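
The two case conversions can be modelled as table lookups. In this Python sketch the mapping tables are hypothetical and deliberately tiny; an element with no entry in TO_UPPER models a lower case letter that has no upper case equivalent, and is left unchanged.

```python
# Hypothetical case mapping, standing in for CHAR_MAPPING in the specification.
TO_UPPER = {"a": "A", "b": "B", "c": "C"}
TO_LOWER = {up: low for low, up in TO_UPPER.items()}


def upper(text: str, mapping: dict = TO_UPPER) -> str:
    """Copy of text with each mapped lower-case element replaced; others kept."""
    return "".join(mapping.get(ch, ch) for ch in text)


def lower(text: str, mapping: dict = TO_LOWER) -> str:
    """Copy of text with each mapped upper-case element replaced; others kept."""
    return "".join(mapping.get(ch, ch) for ch in text)
```

Because the result is always a new string, the original is untouched, which matches the immutable semantics required of these classes.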


capitalize

This routine creates a copy of self in which the first character of each word is converted to its upper case equivalent (if one exists). A word starts at the first character of the string, provided that character is not white space or punctuation, and at any later character which is not white space or punctuation but which immediately follows a white space or punctuation character.

capitalize : SAME
Formal Signature
capitalize(self : SAME) res : SAME
Pre-condition
pre size(self) > 0
Post-condition
post let space : set of ELT = CHAR_TYPES.classes(CHAR_CLASS.Space) union
CHAR_TYPES.classes(CHAR_CLASS.Punctuation) in
let capindices : set of nat1 = {index | index in set inds self &
((index = 1) and self(index) not in set space)
or ((index > 1) and self(index) not in set space
and self(index - 1) in set space)} in
forall idx in set capindices & self(idx) in set UNICODE.Lower_only
or res(idx) = CHAR_MAPPING.to_domain(self(idx))

This routine returns a copy of self in which the first character of every word (from the beginning of the string or after punctuation or a whitespace) is converted to its upper case equivalent if one exists.
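
The word-start rule can be sketched in Python as follows. The helper is_separator and the use of Unicode categories for white space and punctuation are assumptions for the example, as is the use of str.upper for the case mapping (which, unlike the specification's mapping, may expand some characters to more than one).

```python
import unicodedata


def is_separator(ch: str) -> bool:
    """Hypothetical test for the 'space' set: white space or punctuation."""
    return ch.isspace() or unicodedata.category(ch).startswith("P")


def capitalize(text: str) -> str:
    """Copy of text with the first character of each word upper-cased."""
    out = []
    for i, ch in enumerate(text):
        # A word starts at a non-separator that is first or follows a separator.
        word_start = not is_separator(ch) and (i == 0 or is_separator(text[i - 1]))
        out.append(ch.upper() if word_start else ch)
    return "".join(out)
```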


repeat

This feature returns a text string which is the concatenation of self the given number of times.

repeat (
cnt : CARD
) : SAME
Formal Signature
repeat(self : SAME, cnt : CARD) res : SAME
Pre-condition
pre (size(self) > 0)
and (cnt > 0)
Post-condition
post len res = cnt * size(self)
and forall idx in set {1,...,cnt} &
let start : nat = (idx - 1) * size(self) in
forall index in set inds self &
res(start + index) = self(index)

This routine returns a new string which contains the contents of self concatenated cnt times.
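
A small Python model may help in reading the post-condition: the result is cnt copies of self laid end to end, so the element at offset j of copy k equals element j of self. The function names are hypothetical, and satisfies_post is one possible reading of the intended condition, not a transcription of the formal text.

```python
def repeat(text: str, cnt: int) -> str:
    """Hypothetical model of repeat: text concatenated cnt times."""
    if not text or cnt <= 0:
        raise ValueError("pre-condition: size(self) > 0 and cnt > 0")
    return text * cnt


def satisfies_post(text: str, cnt: int, res: str) -> bool:
    """Check that res is exactly cnt back-to-back copies of text."""
    n = len(text)
    return len(res) == cnt * n and all(
        res[k * n + j] == text[j] for k in range(cnt) for j in range(n)
    )
```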


replace

This feature enables arbitrary element substitution to be made over the entire text string.

replace (
old_elt : ELT,
new_elt : ELT
) : SAME
Formal Signature
replace(self : SAME, old_elt : ELT, new_elt : ELT) res : SAME
Pre-condition
pre size(self) > 0
Post-condition
post forall index in set inds self & ((self(index) = old_elt)
and (res(index) = new_elt))
or ((self(index) <> old_elt)
and (res(index) = self(index)))

This routine returns a new string which is a copy of self except that each occurrence of old_elt has been replaced by new_elt.


replace

This second variant of the feature enables simple set substitution to be made: every element of self which occurs in test_set (a string treated as if it were a set of elements) is replaced by the given replacement element.

replace (
test_set : STP,
new_elt : ELT
) : SAME
Formal Signature
replace(self : SAME, test_set : STP, new_elt : ELT) res : SAME
Pre-condition
pre size(self) > 0
and STP.size(test_set) > 0
Post-condition
post forall index in set inds self & ((self(index) in set elems test_set)
and (res(index) = new_elt))
or ((self(index) not in set elems test_set)
and (res(index) = self(index)))

This routine returns a copy of self in which all occurrences of any element in test_set are replaced by new_elt.
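
The two replace variants can be modelled side by side in Python. The names replace_elt and replace_any are hypothetical (Python has no overloading by argument type), and a Python str stands in for both SAME and STP.

```python
def replace_elt(text: str, old_elt: str, new_elt: str) -> str:
    """Copy of text with every occurrence of old_elt replaced by new_elt."""
    return "".join(new_elt if ch == old_elt else ch for ch in text)


def replace_any(text: str, test_set: str, new_elt: str) -> str:
    """Copy of text with every element found in test_set replaced by new_elt."""
    members = set(test_set)  # the string argument is treated as a set
    return "".join(new_elt if ch in members else ch for ch in text)
```

In both cases the result has the same length as self, since the substitution is element for element.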


remove

This feature returns a copy of self in which every occurrence of elt has been deleted.

remove (
elt : ELT
) : SAME
Formal Signature
remove(self : SAME, elt : ELT) res : SAME
Pre-condition
pre size(self) > 0
Post-condition
post res = [self(index) | index in set inds self & self(index) <> elt]

This routine returns a copy of self from which all occurrences of elt have been removed.


remove

This feature returns a copy of self in which every occurrence of an element which is in the test_set argument has been deleted. The string argument is treated as if it were a set of elements.

remove (
test_set : STP
) : SAME
Formal Signature
remove2(self : SAME, test_set : STP) res : SAME
Pre-condition
pre size(self) > 0
and STP.size(test_set) > 0
Post-condition
post res = [self(index) | index in set inds self & self(index) not in set elems test_set]

This routine returns a copy of self from which all elements contained in test_set have been removed.
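
A Python sketch of the two remove variants follows. As before, the function names are hypothetical and a Python str stands in for the string classes; the order of the surviving elements is preserved.

```python
def remove_elt(text: str, elt: str) -> str:
    """Copy of text with every occurrence of elt deleted."""
    return "".join(ch for ch in text if ch != elt)


def remove_any(text: str, test_set: str) -> str:
    """Copy of text with every element found in test_set deleted."""
    members = set(test_set)  # the string argument is treated as a set
    return "".join(ch for ch in text if ch not in members)
```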


escape

This routine converts a text string into one with escape elements inserted. This is frequently useful when the string is to be processed by some external service which may treat the elements in elist specially unless they are preceded by an escape element. The list argument is treated as if it were a set of elements. Note that the list argument may be empty, in which case the only change which occurs is the duplication of every escape element.

escape (
escape : ELT,
elist : STP
) : SAME
Formal Signature
escape(self : SAME, escape : ELT, elist : STP) res : SAME
Pre-condition
pre size(self) > 0
Post-condition
post let test_set : set of ELT = elems elist union {escape} in
res = escaped(self,escape,test_set)


escaped : SAME * ELT * set of ELT -> SAME

escaped(me,escape,test_set) ==
let loc_res : SAME =
let head = hd me in
if head in set test_set then
[escape,head]
else
[head] in
if tl me = [] then
loc_res
else
loc_res ^ escaped(tl me,escape,test_set)

This routine returns a text string which is a copy of self in which all elements occurring in elist - and the escape element itself - are preceded by the escape element.
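
The recursive function escaped can be read as the following iterative Python sketch. The function name and the use of Python str for the string classes are assumptions made for illustration.

```python
def escape(text: str, esc: str, elist: str) -> str:
    """Copy of text with esc inserted before each element in elist or equal to esc."""
    special = set(elist) | {esc}  # the escape element is always escaped itself
    out = []
    for ch in text:
        if ch in special:
            out.append(esc)
        out.append(ch)
    return "".join(out)
```

With an empty elist the only effect is that each escape element in the input is doubled.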


minus

This feature returns a copy of self from which the first occurrence (if any) of str has been removed.

minus (
str : STP
) : SAME
Formal Signature
minus(self : SAME, str : STP) res : SAME
Pre-condition
pre size(self) > 0
and size(self) >= STP.size(str)
Post-condition
post let tmp : [seq of ELT] be st
(head ^ tmp ^ tail = self)
and ((tmp = str)
or (tmp = nil)) in
res = head ^ tail

This routine returns a copy of self from which the first (if any) occurrence of str has been deleted.


minus

This variant of the minus feature returns a copy of self from which the first occurrence after the given index position (if any) of str has been removed.

minus (
str : STP,
start : CARD
) : SAME
Formal Signature
minus2(self : SAME, str : STP, start : CARD) res : SAME
Pre-condition
pre size(self) > 0
and size(self) >= STP.size(str) + start
Post-condition
post let ignored : [seq of ELT] be st ignored ^ self(1,...,(start + 1)) = self in
let tmp : [seq of ELT] be st
(head ^ tmp ^ tail = self)
and ((tmp = str)
or (tmp = nil)) in
res = ignored ^ head ^ tail

This routine returns a copy of self from which the first (if any) occurrence of str after the starting index has been deleted.
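
Both minus variants can be modelled with one Python function taking an optional start. This is a sketch under assumptions: the name is hypothetical, start is taken to be a zero-based index (as in Sather) at which the search begins, and an empty str is treated as "no occurrence".

```python
def minus(text: str, sub: str, start: int = 0) -> str:
    """Copy of text without the first occurrence of sub found at or after start."""
    pos = text.find(sub, start) if sub else -1
    if pos < 0:
        return text  # no occurrence: the result equals self
    return text[:pos] + text[pos + len(sub):]
```

Only one occurrence is removed; later occurrences of str remain in the result.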


rev!

This feature corresponds to the elt! feature. This one yields the values of the individual elements of self starting with the one with the highest index and thereafter successively lower indices.

rev! : ELT
Formal Signature

Note that the formal name of the iter has been changed, replacing the exclamation mark iter symbol with a name acceptable to vdm tools.

rev_iter(self : SAME) yld : ELT
Pre-condition
pre size(self) > 0
Post-condition

This post-condition makes use of the history concept from vdm++ (see the vdm dialect notes).

post yld = self(size(self) - size(history~))
and history = history~ ^ yld
Quit condition

For quit actions see the specification of the quit statement.

errs QUIT : size(history) = size(self) -> quit

This iter yields the elements of self in reverse order of the indices.
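
A Sather iter behaves much like a Python generator, so rev! can be modelled as below. The generator name follows the formal name used above; the correspondence between iters and generators is an analogy for illustration, not a statement about how Sather implements iters.

```python
from typing import Iterator


def rev_iter(text: str) -> Iterator[str]:
    """Yield the elements of text from the highest index down to the lowest."""
    for index in range(len(text) - 1, -1, -1):
        yield text[index]
```

The generator stops once as many elements have been yielded as self contains, which corresponds to the quit condition.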


Codes from text strings

A text string consists of text elements, each of which may have one or more codes (in Telugu or Vietnamese, for example). Internationalising the required library has therefore led to the concept of a character code, the class CHAR_CODE. The routines in this section manipulate these codes for operations such as code/character conversion and substitution.

create

All forms of text string require this form of creation operation in order that composition of characters may be effected. This merely returns a text string containing the element denoted by the single code. Note that this code may not be a combining code (see the class UNICODE for further information on this).

create (
code : CHAR_CODE
) : SAME
Formal Signature
create(code : CHAR_CODE) res : SAME
Pre-condition

Since the code can have any value and the string takes its encoding from that, the pre-condition is vacuously true.

Post-condition
post size(res) = 1

This creation routine returns a single element string formed from the encoding given.


code!

This is the first of a pair of code yielding iters. Do not assume that the number of codes yielded will correspond to the number of elements in the text string. That is only true for text strings in which all elements happen to have a single code!

code! : CHAR_CODE
Formal Signature

Note that the formal name of the iter has been changed, replacing the exclamation mark iter symbol with a name acceptable to vdm tools.

code_iter1(self : SAME) yld : CHAR_CODE
Pre-condition
pre size(self) > 0
Post-condition

This post-condition makes use of the history concept from vdm++ (see the vdm dialect notes).

post let codes : seq of ELT be st codes = self in
yld = codes(card history~ + 1)
and history = history~ ^ yld
Quit condition

For quit actions see the specification of the quit statement.

errs QUIT : let codes : seq of ELT be st codes = self in
card(history) = card(codes) -> quit

This iter yields each individual character encoding in self in sequence using the repertoire and encoding of the text string.


code!

code! (
start : CARD
) : CHAR_CODE
Formal Signature

Note that the formal name of the iter has been changed, replacing the exclamation mark iter symbol with a name acceptable to vdm tools.

code_iter2(self : SAME, start_code : CARD) yld : CHAR_CODE
Pre-condition
pre size(self) > 0
Post-condition

This post-condition makes use of the history concept from vdm++ (see the vdm dialect notes).

post let codes : seq of CHAR_CODE be st codes = self((start_code + 1),..., card self) in
yld = codes(card history~ + 1)
and history = history~ ^ yld
Quit condition

For quit actions see the specification of the quit statement.

errs QUIT : let codes : seq of CHAR_CODE be st codes = self((start_code + 1),..., card self) in
card(history) = card(codes) -> quit

This iter yields individual character encodings in self in sequence beginning with the first code in the element at the given index in the string.
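
The difference between counting elements and counting codes can be shown with a Python model in which each text element is a tuple of integer codes. This representation, and the single generator with an optional start standing in for both code! iters, are assumptions made for the example.

```python
from typing import Iterator, Sequence, Tuple

Element = Tuple[int, ...]  # hypothetical: one text element = one or more codes


def code_iter(elements: Sequence[Element], start: int = 0) -> Iterator[int]:
    """Yield every code of every element, beginning with the element at start."""
    for element in elements[start:]:
        for code in element:
            yield code


# 'a', then 'e' with a combining acute accent (two codes), then 'b'.
sample = [(0x61,), (0x65, 0x301), (0x62,)]
```

Here sample has three elements but yields four codes, which is why the number of codes yielded must not be assumed to equal the number of elements.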


Comments or enquiries should be made to Keith Hopper.
Page last modified: Wednesday, 29 November 2000.
Produced with Amaya