Language Reference

The Rule Specification Language (RSL) operates in two modes: the buffer mode, in which lines of text encountered at input are staged onto a buffer, and the control mode, which controls the buffer. Only one mode is active at a time. The buffer mode is active by default. If the buffer mode encounters a line with the dot character (.) as the first non-whitespace character, the control mode is activated. When the control mode encounters a line break, the buffer mode is turned back on. The program terminates when all input has been processed.
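The mode switching described above can be pictured as a line classifier. The following Python sketch is only an illustrative model, not how any RSL interpreter is implemented; the function name classify_lines is hypothetical, and the treatment of a leading .. follows the rule given in Escaping Special Characters.

```python
def classify_lines(text):
    """Label each input line as 'control' or 'buffer' (illustrative model)."""
    result = []
    for line in text.splitlines():
        stripped = line.lstrip()
        # A leading '..' is an escaped dot in literal text, not a control line.
        if stripped.startswith('.') and not stripped.startswith('..'):
            result.append(('control', line))
        else:
            result.append(('buffer', line))
    return result


sample = '.assign x = 1\nHello\n    .print "hi"\n..literal dot'
for mode, line in classify_lines(sample):
    print(mode, '|', line)
```

Each line is classified on its own, which mirrors the rule that the buffer mode is turned back on at every line break.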

The following sections describe how the language works when operating in the control mode. For more information on the buffer mode, see Buffer Mode.

Note

Keywords and names are case insensitive. Names can be made up of alphabetic (a-z, A-Z) or numeric (0-9) characters, or the underscore (_) character. Names cannot begin with a numeric character, and cannot conflict with keywords.
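As a rough illustration of the naming rules, the following Python sketch checks a candidate name with a regular expression. The KEYWORDS set is a hypothetical, incomplete subset chosen for the example, and is_valid_name is not part of any RSL tool.

```python
import re

# Hypothetical, incomplete keyword list used only for this sketch.
KEYWORDS = {'assign', 'if', 'elif', 'else', 'end', 'while', 'select', 'info'}

NAME_PATTERN = re.compile(r'[A-Za-z_][A-Za-z0-9_]*')


def is_valid_name(name):
    """Return True if name follows the naming rules described above."""
    if NAME_PATTERN.fullmatch(name) is None:
        return False
    # Keywords are case insensitive, so compare in lower case.
    return name.lower() not in KEYWORDS


print(is_valid_name('My_Integer'))  # True
print(is_valid_name('2nd_value'))   # False
print(is_valid_name('ASSIGN'))      # False
```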

Basic Constructs

The following sections describe basic language constructs that share similarities with other general-purpose programming languages.

Core Types

RSL defines five core types: boolean, integer, real, string, and unique_id. The boolean type is limited to two values (true and false), whereas the other core types are unbounded in principle. In practice, however, all types are bounded, and the exact ranges depend on the implementation. Generally, integers are represented by signed 64-bit integers, reals by 64-bit floating point numbers, unique_ids by 128-bit unsigned integers, and strings are bounded by the amount of available RAM.

Note

There are additional types that are used to hold references to instances and fragments. These types are further explained in Model Interactions and Functions and Fragments.

Literal Values

Literal values can be entered for four of the core types. The table below exemplifies how these literal values are specified for each core type.

Core Type   Examples (separated by ,)
---------   -------------------------
boolean     true, false
integer     0, 256, -1
real        0.0, -256.44
string      "Hello world"

Tip

Values of the core type unique_id can be created by reading the global information fragment attribute unique_num. See Global Information Fragment for more information.

Transient Variables

All transient variables are implicitly declared upon the first assignment. Assignments are expressed using the assign keyword as exemplified below:

.assign My_Boolean = true
.assign My_Integer = 42
.assign My_Real    = 3.14
.assign My_String  = "Hello world!"

Any subsequent assignment simply re-assigns the same variable. Re-assigning a variable to a value of a different type is not allowed. A stack execution model is assumed. Variables are pushed onto the stack as they are implicitly declared and are popped off the stack as they fall out of scope. Any variable implicitly declared inside a block falls out of scope when the end of the block is encountered.
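The stack model can be approximated with one dictionary per open block. The following Python sketch is a toy model under that assumption; the class ScopeStack and its method names are hypothetical and do not correspond to any interpreter API.

```python
class ScopeStack:
    """Toy model of the stack of transient variables (illustrative only)."""

    def __init__(self):
        self._frames = [{}]  # one dict of variables per open block

    def enter_block(self):
        self._frames.append({})

    def leave_block(self):
        # Variables declared inside the block fall out of scope here.
        self._frames.pop()

    def assign(self, name, value):
        key = name.lower()  # names are case insensitive
        for frame in reversed(self._frames):
            if key in frame:
                if type(frame[key]) is not type(value):
                    raise TypeError('cannot re-assign %s to a different type' % name)
                frame[key] = value
                return
        # First assignment: implicit declaration in the innermost block.
        self._frames[-1][key] = value

    def is_declared(self, name):
        return any(name.lower() in frame for frame in self._frames)


stack = ScopeStack()
stack.assign('Outer', 1)
stack.enter_block()
stack.assign('Inner', 'text')
stack.assign('Outer', 2)      # re-assigns the existing variable
stack.leave_block()
print(stack.is_declared('Outer'))  # True
print(stack.is_declared('Inner'))  # False
```

In this model, a variable that must survive a block has to be assigned once before the block is entered.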

Comments

Comments can be entered using the comment statement as exemplified below:

.comment My comment

A shorter variant inspired by C-like languages is also available:

.//My other comment

At least one whitespace character must follow the comment keyword. A whitespace character does not need to follow the shorter variant.

Expressions

Variables and values can be combined into expressions using operators. There are three kinds of expressions: unary, binary, and compound expressions. The following sections present operators that are valid for core types.

Note

There are additional operators available that are not valid for core types. These operators are further explained in Instances and Sets and Iterating Sets of Instances.

Unary expressions

Unary expressions consist of one operator and one operand. Below is a table of unary operators that are valid for core types.

Unary Operator   Core Type(s)    Description
--------------   -------------   ----------------
not              any             Logical negation
-                integer, real   Numeric negation

The following example demonstrates how to perform a numeric negation on an integer:

.assign Positive_Integer = 42
.assign Negative_Integer = -Positive_Integer

Binary expressions

Binary expressions consist of one operator and two operands. Below is a table of binary operators valid for core types.

Binary Operator   Description
---------------   --------------------------------------------------------------
and               logical AND
or                logical inclusive OR
+                 arithmetic addition (integer & real) or concatenation (string)
-                 arithmetic subtraction
*                 arithmetic multiplication
/                 quotient from arithmetic division
%                 remainder from arithmetic division
<                 less-than
<=                less-than or equal-to
==                equal-to
!=                not-equal-to
>=                greater-than or equal-to
>                 greater-than

The following example demonstrates how to perform a numeric addition of two integers, concatenation of two strings, and a greater-than comparison between two integers:

.assign My_Addition = 42 + 5
.assign My_Concatenation = "Hello " + "world"
.assign My_Comparison = 5 > My_Addition

Note

In recent versions of the language, the and and or operators have short-circuit semantics. If the left operand of an and operation evaluates to false, the right operand is not evaluated. Likewise if the left operand of an or operation evaluates to true, the right operand is not evaluated.
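Python's own and and or operators happen to short-circuit in the same way, so the effect can be observed with a small experiment. The helper operand below is hypothetical and only records which operands were actually evaluated; it says nothing about how an RSL interpreter implements the rule.

```python
calls = []


def operand(name, value):
    """Record that an operand was evaluated, then return its value."""
    calls.append(name)
    return value


result_and = operand('left_and', False) and operand('right_and', True)
result_or = operand('left_or', True) or operand('right_or', False)

print(result_and, result_or)  # False True
print(calls)                  # ['left_and', 'left_or']
```

The right operands never appear in calls, which is the behaviour the note describes.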

Compound expressions

Compound expressions consist of several operators and operands that are combined using matching parentheses that determine precedence. The following example demonstrates a series of string concatenations.

.assign My_String = ("Hello" + (" " + "world"))

In the example above, " " and "world" are concatenated first. Then, "Hello" and " world" are concatenated.

If, Elif and Else

The keywords if, elif and else can be combined to form a statement that controls the execution of other statements based on the outcome of boolean expressions. The following example demonstrates one way in which the three keywords may be combined.

.if (My_Control_Variable > 0)
    .// Do something
.elif (My_Control_Variable < 0)
    .// Do something else
.else
    .// Do nothing
.end if

Hint

Any number of elif constructs may be present in the same statement, and the else construct is optional.

While Loops

The while statement provides a general-purpose iteration mechanism. The following example demonstrates how to compute the sum of all integers from one to ten (inclusive).

.assign Sum = 0
.assign Counter = 0
.while (Counter < 10)
    .assign Counter = Counter + 1
    .assign Sum = Sum + Counter
.end while

The break while statement provides an alternative way to end iteration. When executed, it transfers control to the statement after the end while statement of the innermost executing while loop. The following example performs the same computation as the previous example, but uses the break while statement to halt iteration.

.assign Sum = 0
.assign Counter = 0
.while (true)
    .if (Counter < 10)
        .assign Counter = Counter + 1
        .assign Sum = Sum + Counter
    .else
        .break while
    .end if
.end while

Quoted Strings

Quoted strings get special handling in the language. Each quoted string is treated as a literal text line and is run through a variable substituter discussed in Substitution Variables. This allows simple string concatenation without using binary expressions. The following example concatenates the variables x and y with a whitespace between them.

.assign x = "Hello"
.assign y = "world"
.assign s = "${x} ${y}"

Note

Since quoted strings get run through a literal text substituter, use $$ to yield one $ character. In addition, use "" to yield one " character. See Substitution Variables for more information.
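The substitution step can be sketched in Python with a single regular expression. This is a simplified model under stated assumptions: the function substitute is hypothetical, it handles only plain ${name} references and the $$ escape, and it ignores format characters, parse keywords and the "" escape.

```python
import re


def substitute(text, variables):
    """Expand ${name} and the $$ escape in text (simplified model)."""
    def replace(match):
        if match.group(0) == '$$':
            return '$'
        # Names are case insensitive, so look them up in lower case.
        return str(variables[match.group(1).lower()])

    return re.sub(r'\$\$|\$\{(\w+)\}', replace, text)


variables = {'x': 'Hello', 'y': 'world'}
print(substitute('${x} ${y}', variables))        # Hello world
print(substitute('Price: $$5 ${X}', variables))  # Price: $5 Hello
```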

Terminal Logging

The print statement can be used to print string literals to the standard output.

.print "Hello world"

Since the print statement only accepts string literals, variables must be quoted before being printed. The following example prints the number 42 to standard output.

.assign My_Integer = 42
.print "${My_Integer}"

Program Termination

The exit statement can be used to terminate a program. Optionally, an integer exit code may also be provided. For example:

.exit 1

Model Interactions

The following sections describe language features that allow interaction with an xtUML model. Below is a class diagram used by examples in these sections.

 -----------------                             ----------------------
| Class     {CLS} |                           | Other Class  {O_CLS} | prev
|-----------------| *         R1         0..1 |----------------------|------
| Number: integer |---------------------------| Name: string         | 0..1 |
 -----------------             |               ----------------------       |
                               |                           0..1 | next   R2 |
                      ---------------------                      -----------
                     | Assoc Class {A_CLS} |
                     |---------------------|
                     | My_Boolean: boolean |
                      ---------------------

There are three classes in the example above: Class, Other Class, and Assoc Class. The text within curly brackets in the upper right corner of each class is called a key letter and is used as the class identifier in RSL. The three classes are associated with each other via the association R1. Furthermore, there is a reflexive association R2 on Other Class. Reflexive associations require a phrase to distinguish the directions of the links (next and prev in the example above). At the end of each link is the cardinality, which specifies how many instances may be connected to the link.

The Assoc Class is a special kind of class called an association class. Such classes are used to add attributes to an association. The cardinality of links to association classes is not explicitly stated; it is implicitly assumed to be exactly one.

Note

The BridgePoint editor allows its users to specify links to association classes with the cardinality 1..*. Such association classes are rarely used and should be avoided. The same semantics may be obtained by introducing a new class associated with the association class.

Instances and Sets

The introduction of instances and links into the language also brings new types, specifically the types inst_ref and inst_ref_set.

The type inst_ref acts as a reference to an instance of a class in the model, and is used to access instance attributes. The following table lists unary operators that are valid for transient variables of the type inst_ref.

Unary Operator   Description
--------------   --------------------------------------------------------------------------
empty            Check if the inst_ref operand does not refer to an instance
not_empty        Logical negation of the empty operator
cardinality      Count the number of instances the inst_ref operand refers to (zero or one)

The type inst_ref_set is used to hold references to several instances. The following table lists unary operators that are valid for transient variables of the type inst_ref_set.

Unary Operator   Description
--------------   --------------------------------------------------------------------
empty            Check if the inst_ref_set operand contains no instance references
not_empty        Logical negation of the empty operator
cardinality      Count the number of instances the inst_ref_set operand refers to

There are also a number of binary operations that accept a mix of inst_ref and inst_ref_set operands. When an operand is of the type inst_ref, it is interpreted as an inst_ref_set that contains the referred-to instance.

Binary Operator   Description
---------------   ----------------------------------------------------------------------------------------------------------
|                 Returns the union of both operands
&                 Returns the intersection of both operands
-                 Returns a set of instance references that are in the left operand, but not in the right operand
^                 Returns a set of instance references that are in the left operand or in the right operand, but not in both
==                Check if both operands refer to the same set of instances (i.e. their symmetric difference is empty)
!=                Logical negation of ==
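The set operators map closely onto Python's built-in set operations, which makes them easy to try out. In the sketch below, strings stand in for instance references and the helper as_set is hypothetical; it models the rule that a single inst_ref is treated as a set containing that one instance.

```python
def as_set(operand):
    """Treat a single instance reference as a one-element set."""
    if isinstance(operand, (set, frozenset)):
        return frozenset(operand)
    return frozenset([operand])


a = 'inst_a'                 # stands in for an inst_ref
s = {'inst_a', 'inst_b'}     # stands in for an inst_ref_set
t = {'inst_b', 'inst_c'}

print(sorted(as_set(a) | as_set(t)))   # ['inst_a', 'inst_b', 'inst_c']
print(sorted(as_set(s) & as_set(t)))   # ['inst_b']
print(sorted(as_set(s) - as_set(t)))   # ['inst_a']
print(sorted(as_set(s) ^ as_set(t)))   # ['inst_a', 'inst_c']
```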

Note

There are additional unary operators for sets that are only valid during set iteration. See Iterating Sets of Instances for more information.

Selecting Instances

Instances may be selected from the model by using the key letter of the class. The following example demonstrates how to select any arbitrary instance of the class with the key letter CLS, and store a reference to the instance in a variable named inst.

.select any inst from instances of CLS

It is also possible to select several instances of some class using the many keyword instead of any. The following example selects all instances of CLS and stores an instance set reference in a variable named inst_set.

.select many inst_set from instances of CLS

Accessing Class Attributes

Class attributes may be accessed using the dot operator (.). The following example selects an arbitrary instance of CLS, and increments its Number attribute by one.

.select any inst from instances of CLS
.assign inst.Number = inst.Number + 1

Iterating Sets of Instances

The for each statement is used to iterate sets of instances. The following example computes the sum of all CLS.Number attributes.

.assign Sum = 0
.select many inst_set from instances of CLS
.for each inst in inst_set
    .assign Sum = Sum + inst.Number
.end for

During iteration, the following unary operators are supported.

Unary Operator   Description
--------------   -----------------------------------------------------------
first            Check if the inst_ref_set operand is on its first iteration
not_first        Logical negation of first
last             Check if the inst_ref_set operand is on its last iteration
not_last         Logical negation of last

The following example demonstrates how to generate a comma-separated list of O_CLS names.

.select many inst_set from instances of O_CLS
.assign s = ""
.for each inst in inst_set
    .assign s = s + inst.Name
    .if (not_last inst_set)
        .assign s = s + ", "
    .end if
.end for

Filtering Selections

Instance selections can be filtered using the where keyword. The selected keyword may be used inside a where-clause to access attributes on the instance currently being selected. The following example demonstrates how to select instances of CLS whose attribute Number is larger than 100.

.select many inst_set from instances of CLS where (selected.Number > 100)

Ordering Selections

Instance selections returned by a select many statement may be ordered by one or more instance attributes. The ordered_by keyword sorts the resulting set in ascending order and the reverse_ordered_by keyword sorts the resulting set in descending order. If multiple attributes are specified, the set is sorted by the first attribute, then within each value of the first attribute by the second attribute, and so on. The following example demonstrates how to select all the instances of PERSON and order them by age first, then alphanumerically by name.

.select many people from instances of PERSON ordered_by (age, name)

The following example demonstrates how to select all the instances of INVOICE and order them by greatest value.

.select many invoices from instances of INVOICE reverse_ordered_by (value)
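The multi-attribute ordering corresponds to sorting on a tuple of keys. The following Python sketch shows the idea with hypothetical data standing in for instances of PERSON; it illustrates the ordering semantics only and is not how selections are implemented.

```python
from operator import itemgetter

# Hypothetical data standing in for instances of PERSON.
people = [
    {'name': 'Bob', 'age': 30},
    {'name': 'Alice', 'age': 30},
    {'name': 'Carol', 'age': 25},
]

# ordered_by (age, name): ascending by age, then by name within each age.
ordered = sorted(people, key=itemgetter('age', 'name'))
print([p['name'] for p in ordered])   # ['Carol', 'Alice', 'Bob']

# reverse_ordered_by (age): descending by age.
reverse_ordered = sorted(people, key=itemgetter('age'), reverse=True)
print([p['age'] for p in reverse_ordered])  # [30, 30, 25]
```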

Creating Instances

The create object instance statement is used to create new instances of a class. The following example creates an instance of CLS and sets its Number attribute to five.

.create object instance cls of CLS
.assign cls.Number = 5

Connecting Instances

Instances can be connected and disconnected across associations using the relate and unrelate statements. The following example creates two instances of O_CLS and connects them across the reflexive association R2.

.create object instance inst1 of O_CLS
.create object instance inst2 of O_CLS
.relate inst1 to inst2 across R2.'next'

The following example disconnects them again.

.unrelate inst1 from inst2 across R2.'next'

Recent versions of the language allow connecting and disconnecting association classes in a single control statement. The following example creates one instance each of CLS, O_CLS and A_CLS and then connects them to each other.

.create object instance cls of CLS
.create object instance other_cls of O_CLS
.create object instance assoc_cls of A_CLS

.relate cls to other_cls across R1 using assoc_cls

The following example disconnects them again, and deletes the association instance.

.unrelate cls from other_cls across R1 using assoc_cls
.delete object instance assoc_cls

Note

Disconnected instances of association classes violate model integrity and must be deleted manually.

Deleting Instances

The delete object instance statement is used to delete instances from the model. The following example selects an arbitrary instance of CLS and deletes it.

.select any inst from instances of CLS
.delete object instance inst

When an instance is deleted, the instance is removed from the class extent, and is unrelated from existing associations. Note that it is up to the user to ensure model integrity, e.g. that the data is not violating association constraints.

Warning

The delete statement only removes instances from the model; transient references may still refer to them. Depending on the language implementation, accessing such references may result in undefined behaviour.

Functions and Fragments

Functions allow reuse of blocks of control statements. All functions return a fragment. A fragment can be thought of as a pseudo-instance that has one or more attributes containing data specified by the function. The intent of functions is to build fragments that can be organized into larger fragments and eventually used to build a whole generated file.

Note

All functions have their own literal buffer and cannot modify any other buffer when they operate in buffer mode.

Defining Functions

Functions are defined using the function statement, and parameters are defined using the param statement. In addition to the core types, three additional types can be used by parameters: inst_ref, inst_ref_set and frag_ref. The following example defines a function f with one parameter of each type.

.function f
    .param boolean      My_Boolean
    .param integer      My_Integer
    .param real         My_Real
    .param unique_id    My_Unique_Id
    .param string       My_String
    .param inst_ref     My_Instance
    .param inst_ref_set My_Set
    .param frag_ref     My_Fragment
.end function

Tip

Recent versions of the language allow specifying the kind of class an inst_ref or inst_ref_set may refer to. The kind of class is specified using angle brackets as exemplified below.

.function Func
    .param inst_ref<Key_Letter>     My_Instance
    .param inst_ref_set<Key_Letter> My_Set
.end function

When the kind of class is specified for an inst_ref or inst_ref_set, arguments are type checked accordingly.

Defining Fragment Attributes

Attributes may be defined for a fragment when the fragment is formed inside the function. The attribute body is always defined. After the invocation of a function, the body attribute contains the literal text buffered within the function while operating in buffer mode.

Additional attributes are defined by declaring transient variables inside the function with a name that starts with attr_. The following example defines a function named Func that returns a fragment with two attributes: body and data.

.function Func
    .assign attr_data = "My Data"
.end function

Note

Be careful to make sure the attr_ variables are in scope when the end function statement is reached. Consider the following example.

.function Func
    .param integer p_value
    .if (p_value < 100)
        .assign attr_data = "Some Data"
    .else
        .assign attr_data = "Some other data"
    .end if
.end function

The example above results in the transient variable attr_data not becoming a fragment attribute since it falls out of scope with the if statement, and is therefore not on the stack when the end function statement is encountered.

A correct solution is the following:

.function Func
    .param integer p_value
    .assign attr_data = ""
    .if (p_value < 100)
        .assign attr_data = "Some Data"
    .else
        .assign attr_data = "Some other data"
    .end if
.end function

Invoking Functions

Functions are invoked using the invoke statement. The following example invokes a function named Func that takes an integer as its parameter, then stores the returned fragment in a transient variable named Frag.

.invoke Frag = Func(4)

Tip

The returned fragment may be omitted from the syntax as exemplified below.

.invoke Func(4)

This may be useful when functions only modify the global scope, e.g. when modifying instances or emitting files to disk.

Available Builtin Functions

The language defines a set of builtin functions. The following two functions can be used to read and modify environment variables in the operating system.

.function get_env_var
    .param string name
.end function

.function put_env_var
    .param string name
    .param string value
.end function

The following function can be used to invoke the operating system shell with an arbitrary command.

.function shell_command
    .param string cmd
.end function

The following two functions can be used to read and write files on disk.

.function file_read
    .param string filename
.end function

.function file_write
    .param string filename
    .param string text
.end function

The following functions can be used to convert values of various core types.

.function string_to_integer
    .param string value
.end function

.function string_to_real
    .param string value
.end function

.function integer_to_string
    .param integer value
.end function

.function real_to_string
    .param real value
.end function

.function boolean_to_string
    .param boolean value
.end function

Global Information Fragment

There is a special fragment named info that is always accessible. The word info is thus a keyword and cannot be used to name a transient variable.

The following table lists all attributes accessible from the info fragment.

Attribute Name         Description
--------------------   -----------
date                   current date and timestamp
user_id                user id of the user running the program
arch_file_name         basename of the rule file currently being executed
arch_file_line         current line number of the executing file
arch_file_path         full path to the executing file
arch_folder_path       full path to the folder containing the executing file
interpreter_version    the name and version of the RSL interpreter
interpreter_platform   the name of the platform on which the interpreter is running
unique_num             returns a unique_id each time it is accessed. For example, the first time it is referenced, it may produce 1, the next time 2, the next time 3, and so on. The order of the unique numbers generated is guaranteed to be exactly the same from one invocation of the program to the next.

The following example creates a string that contains the current date and time.

.assign s = "Current date and time is: " + info.date

Including Files

The include statement can be used to include files. The following example includes a file named my_file.inc.

.include "my_file.inc"

When a file is included, a marker is placed on the stack and the execution continues on the first line of the included file. When all lines in the included file have been processed, all variables pushed onto the stack since the include marker was pushed are considered out of scope (and therefore popped from the stack). The execution then resumes on the line following the include statement.

Note

Transient variables that are accessible just before a file is included are also accessible from within the included file.

Emitting Buffered Text

The emit to file statement can be used to output buffered text to disk. The following example emits the buffer to a file named emit_data.txt into a folder named data located in the current working directory.

.emit to file "data/emit_data.txt"

The emit statement also clears the buffer’s contents.

If an emitted file already exists, the contents of the new file are compared to the existing file. If the files are the same, then the existing file is left undisturbed, so that modification times are left intact. If the files are different, the existing file is replaced with the newly generated file.

Note

Folders leading up to the filename are created automatically.
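The compare-before-replace behaviour can be sketched in Python as follows. This is a minimal model under stated assumptions: the function emit_to_file and its boolean return value are hypothetical, and the comparison is done on decoded text rather than raw bytes.

```python
import os
import tempfile


def emit_to_file(path, text):
    """Write text to path unless the file already has the same contents.

    Returns True if the file was written, False if it was left untouched.
    """
    folder = os.path.dirname(path)
    if folder:
        os.makedirs(folder, exist_ok=True)  # create missing folders

    if os.path.exists(path):
        with open(path, 'r', encoding='utf-8') as f:
            if f.read() == text:
                return False  # identical: keep modification time intact

    with open(path, 'w', encoding='utf-8') as f:
        f.write(text)
    return True


root = tempfile.mkdtemp()
target = os.path.join(root, 'data', 'emit_data.txt')
print(emit_to_file(target, 'Hello\n'))    # True, file created
print(emit_to_file(target, 'Hello\n'))    # False, contents unchanged
print(emit_to_file(target, 'Goodbye\n'))  # True, contents replaced
```

Leaving identical files untouched is useful for build tools that decide what to rebuild from modification times.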

To clear the contents of the buffer without emitting the contents to a file, the clear statement can be used as exemplified below.

.clear

Buffer Mode

The following sections describe how the language behaves in the buffer mode. Specifically, they describe how to access variables defined in the control mode, how to transform strings using formatters and parse keywords, and how to escape special characters.

Substitution Variables

Literal text lines can contain substitution variables, which allow you to access variables defined in the control mode and place their contents in a buffer so they can be emitted to text files. The following example defines a transient variable named Data in the control mode, and puts its value into the buffer surrounded by the HTML tag div.

.assign Data = "Some text"
<div>${Data}</div>

When emitted to a file, the above example would produce the following output.

<div>Some text</div>

Parse Keywords

A parse keyword is a piece of text placed in a string-based variable. Text that follows the parse keyword, up to the next line break character, can be extracted.

.assign Data = "VALUE: Hello world"
${Data:VALUE}

The example above produces the literal text Hello world.
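The extraction can be sketched in Python as a search for the keyword followed by everything up to the next line break. Several details here are assumptions made for illustration: the function name parse_keyword, the colon being part of the marker in the text, the stripping of surrounding whitespace, and the empty result for a missing keyword.

```python
def parse_keyword(value, keyword):
    """Return the text following 'KEYWORD:' up to the end of that line."""
    marker = keyword + ':'
    start = value.find(marker)
    if start < 0:
        return ''  # assumption: a missing keyword yields an empty string
    rest = value[start + len(marker):]
    # Stop at the next line break and drop surrounding whitespace (assumed).
    return rest.split('\n', 1)[0].strip()


data = 'VALUE: Hello world\nOTHER: ignored'
print(parse_keyword(data, 'VALUE'))   # Hello world
```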

Transforming Substitution Variables

Values held by a substitution variable can be transformed by a number of pre-defined format characters, e.g. converting all characters to upper case (the character u), or replacing whitespace with underscores (the underscore character).

.assign Data = "Some text"
<div>$u_{Data}</div>

When the example above is executed, the following literal text is produced.

<div>SOME_TEXT</div>

The table below lists all pre-defined format characters available in the language.

Format Character   Transformation Function
----------------   -----------------------
u                  Upper - make all characters upper case
c                  Capitalize - make the first character of each word capitalized and all other characters of a word lowercase
l                  Lower - make all characters lowercase
_                  Underscore - change all whitespace characters to underscore characters
r                  Remove - remove all whitespace. Note: The removal of whitespace occurs after the capitalization has taken place in the case of the CR or RC combination.
o                  cOrba - make the first word all lowercase, make the first character of each following word capitalized and all other characters of the words lowercase. Characters other than A-Z, a-z and 0-9 are ignored.

The following table lists example input and output for various combinations of pre-defined format characters.

Input             Format   Output
---------------   ------   -------------
Example Text      u        EXAMPLE TEXT
Example Text      u_       EXAMPLE_TEXT
Example Text      ur       EXAMPLETEXT
ExamplE TExt      c        Example Text
ExamplE TExt      c_       Example_Text
ExamplE TExt      cr       ExampleText
ExamplE TExt      l        example text
ExamplE TExt      l_       example_text
ExamplE TExt      lr       exampletext
ExamplE@34 TExt   o        example34Text
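The pre-defined format characters can be modelled in a few lines of Python. The sketch below is an approximation written to reproduce the example table above; the function apply_format is hypothetical, and the order in which it applies the characters (case conversion first, whitespace handling second) is an assumption based on the note for the r character.

```python
import re


def apply_format(value, fmt):
    """Apply pre-defined format characters to value (illustrative model)."""
    if 'o' in fmt:
        # cOrba: ignore characters other than letters, digits and whitespace.
        words = re.sub(r'[^A-Za-z0-9\s]', '', value).split()
        return ''.join(w.lower() if i == 0 else w.capitalize()
                       for i, w in enumerate(words))
    if 'u' in fmt:
        value = value.upper()
    if 'c' in fmt:
        value = re.sub(r'\S+', lambda m: m.group().capitalize(), value)
    if 'l' in fmt:
        value = value.lower()
    # Whitespace handling happens after the case conversion.
    if '_' in fmt:
        value = re.sub(r'\s', '_', value)
    if 'r' in fmt:
        value = re.sub(r'\s', '', value)
    return value


print(apply_format('Example Text', 'u_'))     # EXAMPLE_TEXT
print(apply_format('ExamplE TExt', 'cr'))     # ExampleText
print(apply_format('ExamplE@34 TExt', 'o'))   # example34Text
```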

Defining Custom Format Characters

Users can define their own custom format characters. These format characters must start with the letter t. When using multiple format characters at the same time, the user-defined format character must be specified last. User-defined format characters are applied before any pre-defined format characters (i.e., $ut{…} applies t first, then u). If the user supplies no transformation function, the default leaves the string unchanged.

The following example demonstrates how to define a new format character in pyrsl that removes quotes from strings.

import sys

from rsl import gen_erate
from rsl import bridge
from rsl import string_formatter

@string_formatter('trmquot')
def remove_quot(s):
    # Strip one leading and one trailing quote character, if present.
    QUOTES = "'\""
    first_index = 0
    last_index = len(s)

    if s and s[0] in QUOTES:
        first_index += 1

    if len(s) > 1 and s[-1] in QUOTES:
        last_index -= 1

    return s[first_index:last_index]

print('Running my custom version of gen_erate')
rc = gen_erate.main()
sys.exit(rc)

The following example demonstrates how to use the format character defined above.

.assign s = "'hello world'"
$trmquot{s}

When the example above is executed, the value of s is transformed from 'hello world' into hello world.

Escaping Special Characters

A literal text line with the dot dot character sequence (..) as the first non-whitespace characters results in a single dot character being emitted. A dot character anywhere else in the literal text line results in a dot character being emitted (i.e. no special treatment).

The dollar character ($) is used by the buffer mode to access variables defined in the control mode. Consequently, to stage a dollar character onto the buffer, the character sequence $$ shall be used.

Newline characters at the end of a line of literal text are passed through to the emitted output. If you do not want a newline at the end of an emitted line (presumably due to control statement constraints), then place a backslash character (\) as the last character of the literal text line. The \\ character sequence as the last two characters of the literal text line results in one backslash character and one newline character as the last characters of an emitted line. The \\\ character sequence as the last three characters of a line of literal text results in one backslash character as the last character of an emitted line with no newline character.

The following table summarizes the escaping rules presented above.

Character              Position               To Generate at Position
--------------------   --------------------   -----------------------
.                      First non-whitespace   ..
$                      Any                    $$
\ (with new line)      Last                   \\
\ (without new line)   Last                   \\\
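The escaping rules can be combined into one small function. The following Python sketch is a simplified model under stated assumptions: the name render_literal_line is hypothetical, substitution variables are not expanded, and leading whitespace is assumed to be passed through unchanged.

```python
def render_literal_line(line):
    """Return the text a literal line contributes to the buffer.

    Simplified: substitution variables are not expanded here.
    """
    indent = line[:len(line) - len(line.lstrip())]
    body = line[len(indent):]

    # '..' as the first non-whitespace characters yields a single dot.
    if body.startswith('..'):
        body = body[1:]

    # '$$' yields a single dollar character.
    body = body.replace('$$', '$')

    # Trailing backslashes control the newline at the end of the line.
    if body.endswith('\\\\\\'):
        return indent + body[:-3] + '\\'        # backslash, no newline
    if body.endswith('\\\\'):
        return indent + body[:-2] + '\\\n'      # backslash and newline
    if body.endswith('\\'):
        return indent + body[:-1]               # no newline
    return indent + body + '\n'


print(repr(render_literal_line('..hidden')))      # '.hidden\n'
print(repr(render_literal_line('cost: $$5')))     # 'cost: $5\n'
print(repr(render_literal_line('no newline\\')))  # 'no newline'
```

The longest backslash sequence is tested first so that a line ending in three backslashes is not mistaken for one ending in two or one.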