Tokens

A token is the atomic element of the Malsys grammar. A token cannot contain any whitespace character; however, whitespace characters are often used to separate individual tokens.

Identifier

ID = (ALPHABETIC_CHAR | '_') (ALPHABETIC_CHAR | DIGIT | '_' | ''')*

The formal grammar of an identifier is simplified here to avoid spelling out Unicode character groups: ALPHABETIC_CHAR matches any letter and DIGIT matches any digit. It follows from the definition that an identifier cannot start with a digit or an apostrophe.
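The ID rule can be checked character by character. The following Python sketch is illustrative only and is not taken from the Malsys implementation; the function name `is_identifier` is hypothetical, and it assumes that "any letter" corresponds to `str.isalpha()` and "any digit" to `str.isdecimal()`.

```python
def is_identifier(text: str) -> bool:
    """Return True if text matches the simplified ID rule (hypothetical helper)."""
    if not text:
        return False
    first, rest = text[0], text[1:]
    # First character: a letter or an underscore (never a digit or apostrophe).
    # Assumption: ALPHABETIC_CHAR is approximated by str.isalpha().
    if not (first.isalpha() or first == "_"):
        return False
    # Remaining characters: letters, digits, underscores, or apostrophes.
    # Assumption: DIGIT is approximated by str.isdecimal().
    return all(ch.isalpha() or ch.isdecimal() or ch in "_'" for ch in rest)
```

Under these assumptions, `is_identifier("x'")` and `is_identifier("_a1")` hold, while `is_identifier("1abc")` and `is_identifier("'x")` do not.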

Number

NUMBER =
| [0-9]+ ('.' [0-9]+)? ([eE] ('+'|'-')? [0-9]+)?
| '0'[bB] [01]+
| '0'[oO] [0-7]+
| '0'[xX] ([0-9] | [a-f] | [A-F])+
| '#' ([0-9] | [a-f] | [A-F])+

Malsys supports five number literal formats:

  • Floating-point format
  • Binary format with prefix 0b
  • Octal format with prefix 0o
  • Hexadecimal format with prefix 0x
  • Hexadecimal format with prefix #

All numbers in Malsys are stored in double-precision floating-point format, regardless of the literal format used to write them. Their precision is therefore about 16 significant decimal digits.
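To make the five formats concrete, here is a minimal Python sketch that recognizes each alternative of the NUMBER rule and converts it to a double. It is not the Malsys parser: the name `parse_number` is hypothetical, and the sketch assumes that a `#` literal is read as a plain hexadecimal integer, just like a `0x` literal.

```python
import re

# One pattern per group of alternatives in the NUMBER rule.
_FLOAT = re.compile(r"[0-9]+(\.[0-9]+)?([eE][+-]?[0-9]+)?")
_PREFIXED = re.compile(r"0[bB][01]+|0[oO][0-7]+|0[xX][0-9a-fA-F]+")
_HASH_HEX = re.compile(r"#[0-9a-fA-F]+")


def parse_number(token: str) -> float:
    """Convert a number token to a double (hypothetical helper)."""
    if _PREFIXED.fullmatch(token):
        # The second character selects the base: b, o, or x.
        base = {"b": 2, "o": 8, "x": 16}[token[1].lower()]
        return float(int(token[2:], base))
    if _HASH_HEX.fullmatch(token):
        # Assumption: '#' introduces an ordinary hexadecimal integer.
        return float(int(token[1:], 16))
    if _FLOAT.fullmatch(token):
        return float(token)
    raise ValueError(f"not a number token: {token!r}")
```

Because every value ends up as a double, large integer literals lose precision: in this sketch `parse_number("0x20000000000001")` (2^53 + 1) and `parse_number("0x20000000000000")` (2^53) return the same value. Note also that the rule has no sign and requires digits on both sides of the decimal point, so `-5`, `.5`, and `5.` are not single number tokens.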

Operator

OPERATOR = (first_op_char op_char*) | '==' | '/'
 
first_op_char = '!'|'$'|'%'|'&'|'*'|'+'|'\\'|'<'|'>'|'@'|'^'|'|'|'~'|'?'|':'|'-'
 
op_char = first_op_char | '=' | '/'

Not every combination of these characters is defined as an operator in Malsys; see the list of predefined operators.
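The OPERATOR rule translates directly into a regular expression. The Python sketch below is illustrative and is not the Malsys lexer; the names `OPERATOR` and `is_operator_token` are hypothetical, and it assumes that `'\\'` in the grammar denotes a single backslash character. It checks only the lexical shape of a token, not whether the operator is actually defined.

```python
import re

# Characters allowed at the start of an operator token.
# Assumption: '\\' in the grammar stands for one backslash.
FIRST_OP_CHARS = "!$%&*+\\<>@^|~?:-"
# Characters allowed after the first one.
OP_CHARS = FIRST_OP_CHARS + "=/"

# OPERATOR = (first_op_char op_char*) | '==' | '/'
OPERATOR = re.compile(
    "[{first}][{rest}]*|==|/".format(
        first=re.escape(FIRST_OP_CHARS),
        rest=re.escape(OP_CHARS),
    )
)


def is_operator_token(text: str) -> bool:
    """Return True if text has the shape of an operator token."""
    return OPERATOR.fullmatch(text) is not None
```

With this pattern, `<=`, `!=`, `==`, and `/` are accepted, while a lone `=` and `/=` are rejected, because `=` and `/` may not begin a longer operator.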