The Onyx programming language
version 0.0.0, August 23, 2020
- Foreword
- Introduction
- 1. Scope
- 2. Normative references
- 3. Terms and definitions
- 4. Symbols and abbreviated terms
- 5. Conventions
- 6. Terms
- 7. Design
- 8. Syntax
- 9. Variables
- 10. Functions
- 11. Safety
- 12. Types
- 13. Structs
- 14. Traits
- 15. Core API
- 16. Directives
- 17. Literals
- 18. Interoperability
- Appendix A: Compiled Syntax
- Appendix B: Optimization
- Appendix C: NXAPI
- C.1. Schema
- C.2. Structure
- C.3. Nodes
- C.3.1. Special nodes
- C.3.2.
delayed_macro_expr
- C.3.3.
namespace_decl
- C.3.4.
var_decl
- C.3.5.
function_decl
- C.3.6.
forall_element
- C.3.7.
arg_decl
- C.3.8.
arg_pass
- C.3.9.
type_expr
- C.3.10.
type_expr_op
- C.3.11.
type_ref
- C.3.12.
trait_decl
- C.3.13.
struct_decl
- C.3.14.
enum_decl
- C.3.15.
enum_val_decl
- C.3.16.
annotation_decl
- C.3.17.
annotation_call
- C.3.18.
intrinsic_decl
- C.3.19.
delayed_intrinsic_call
- C.4. Examples
- Appendix D: Core API
- Appendix E: Macros API
- Glossary
Foreword
NXSF (the Onyx Software Foundation) is an international non-profit organization with a mission to develop the Onyx programming language and its ecosystem. More information on NXSF can be found at https://nxsf.org.
This document is developed by the NXSF Onyx Standard Committee. Any feedback or questions on this document should be directed to the NXSF Onyx Standard Committee at [email protected].
A entity willing to become a member of the NXSF Onyx Standard Committee or any other NXSF Committee to obtain voting rights on developing documents should consult to the information found at .
The procedures used to develop this document and those intended for its further maintenance are described in the NXSF Directives found at https.
This document follows the scheme. Multiple major versions of this document may co-exist simultaneously, but every revision replaces previous minor and patch versions of this document.
Attention is drawn to the possibility that some of the elements of this document may be the subject of patent rights. NXSF shall not be held responsible for identifying any or all such patent rights. Details of any patent rights identified during the development of the document will be in the Introduction and/or on the NXSF list of patent declarations received found at .
Any trade name used in this document is information given for the convenience of users and does not constitute an endorsement.
Introduction
The Onyx programming language is a statically-typed general purpose programming language with the following design goals:
-
Performance in mind.
1. Scope
This specification describes the form and establishes the interpretation of programs written in the Onyx programming language. It describes:
-
The representation of Onyx programs;
-
The syntax and constraints of the Onyx language;
-
The semantic rules for interpreting Onyx programs;
-
The restrictions and limits imposed by a conforming implementation of Onyx.
This specification does not describe:
-
The mechanism by which Onyx programs are transformed for use by a data-processing system;
-
The mechanism by which Onyx applications are invoked for use by a data-processing system;
-
The mechanism by which input data are transformed for use by a Onyx application;
-
The mechanism by which output data are transformed after being produced by a Onyx application;
-
The size or complexity of a program and its data that will exceed the capacity of any specific data- processing system or the capacity of a particular processor;
-
All minimal requirements of a data-processing system that is capable of supporting a conforming implementation.
2. Normative references
The following documents are referred to in the text in such a way that some or all of their content constitutes requirements of this document. For dated references, only the edition cited applies. For undated references, the latest edition of the referenced document (including any amendments) applies.
-
ISO 9000:2015, Quality management systems — Fundamentals and vocabulary
-
ISO/IEC 9899:2018, Information technology — Programming languages — C
-
IEEE 754-2008, IEEE Standard for Floating-Point Arithmetic
-
ISO/IEC 10967 (all parts), Information technology — Language independent arithmetic
-
ISO/IEC 10646, Information technology — Universal Coded Character Set (UCS)
3. Terms and definitions
For the purposes of this specification, the following definitions apply. Other terms are defined where they appear in italic type or on the left side of a syntax rule. Terms explicitly defined in this specification are not to be presumed to refer implicitly to similar terms defined elsewhere. Terms not defined in this specification are to be interpreted according to ISO/IEC 2382.1. Mathematical symbols not defined in this specification are to be interpreted according to ISO 80000-2.
5. Conventions
This section describes conventions used in this specification.
This specification may borrow terms and practices from different standards on how to write standards, such as The ISO/IEC Directives, Part 2. However, it does not aim to fully comply with either, unless explicitly stated another.
The specification is written in English language with Oxford spelling.
5.1. Specification units
- Specification unit
-
A specification unit is a paragraph, an example block, a syntax definition listing, a table or a figure.
A specification unit other than an example block may be either normative or informative. An example block is always informative.
A specification unit other than an example block is normative by default. The following notation is used for informative units:
This is an informative paragraph. |
A single paragraph contains a single statement or multiple closely related or logically bound statements. To help clarify the contents of a paragraph, it may contain a number of examples.
A paragraph may not contain the same statements worded differently. In other words, a paragraph shall be written in the most unambiguous and shortest form.
5.2. Examples
An example contained in a specification unit other than an example block is called an inline example and inherits the normative status of the unit.
This example is informative.
An example block must not be indexed. Instead, it is considered related to the previous specification unit, or the containing section.
6. Terms
Contents of a specification unit exist in the context of parent sections, recursively. For example, the term "specification unit" is contained in the same section as the current paragraph, thus considered already known.
A term written in italic with an adjacent index has an explicit definition in either this or another document at the index location. For example, the term grapheme [T1] may be found at index T1 of this document. The term specification [ISO 9000:2015 § 3.8.7]) should be looked up in the ISO document. And this term [1] definition is to be found in the footnotes.
6.1. Syntax
Syntax definitions may contain semantic definitions in comments, but these semantic definitions are NOT normative, and are for convenience use only.
7. Design
This section describes design rationale of the Onyx language and accompanying standards.
Infer as much as possible, unless ambiguous.
Multiple syntaxes to achieve the same task is appropriate in some cases.
It is good to have full-character, short-character and symbolic representation of something so a developer could choose between readability and code size.
For example, equals?
, eq
and ==
.
The language is built on a finite set of simple rule blocks. A lesser set is better. For example, function and generic arguments follow the same semantics and syntax.
7.1. Annotations VS keywords
The less keywords are in the language, the better.
Reuse keywords as much as possible.
If an annotation is very often used, consider replacing it with a keyword.
However, low-level annotations should not be keywords.
For example, @[Align]
and @[Volatile]
are annotations and not variable modifiers.
8. Syntax
This specification uses EBNF for language and other syntax definition, unless mentioned another.
The entire compiled syntax can be found in Appendix A, Compiled Syntax. |
The specification extends EBNF with variables: a nonterminal may have a list of arguments prefixed with $
with optional default values, wrapped in parentheses.
Once referenced within the nonterminal production rule, an argument is immediately replaced with its value.
Recursion and nested variables are allowed.
A nonterminal with arguments must always be used with parentheses, even if there are no actual arguments.
foo($arg = "none") = "bar" | $arg;
a = foo(); (* Expands to `"bar" | "none";`)
b = foo("qux"); (* Expands to `"bar" | "qux";` *)
c = foo(b); (* Expands to `"bar" | "bar" | "qux";` *)
binary = "0" | "1";
octal = binary | "2" | "3" | "4" | "5" | "6" | "7" | "8";
digit = octal | "8" | "9";
(* Design rationale: hexadecimals are in upper-case to clearly
distingusigh them from exponents and prefixes (`Q` and `D`
are exceptions). Multiplier prefixes can not be used in
hexadecimal literals, so they don't mix. *)
hex = digit | "A" | "B" | "C" | "D" | "E" | "F";
(* Any Unicode abstract character, including
those consisting of multiple codepoints. *)
unicode = (? A Unicode character ?);
9. Variables
A variable is a pointer to a region of memory containg a value.
9.1. Declaration
TODO: Declaring means setting an identifier and a type.
Defining is giving it a meaningful value.
Defining is done using the :=
operator.
An Onyx variable always has an either impilcitly or explicitly defined storage.
Accessing an uninitialized variable is undefined behavior, hence unsafe.
It is possible to explicitly assign a variable to an uninitialized T
value, where T
is the variable type; such an assignment is unsafe.
Such an explicitly uninitialized variable would be considered safely initialized in the later code.
let x = unsafe! uninitialized SInt32 # let x : SInt32 = unsafe! uninitialized # Ditto, the type is infered
puts(x) # OK from the point of a compiler view.
Prior to an access from a specialization, a variable must be explicitly declared.
9.2. Variable storage
A variable storage defines its location and lifetime boundaries.
There are three variable storage options: static, instance or local.
A variable declaration within a primitive, trait or enum requires an explicit static
storage modifier.
Variables declared within a struct have implicit instance
storage by default.
An explicit instance
or static
storage modifier can be specified for struct variables.
9.3. Static variables
A static variable lifetime spans to the lifetime of the containing process. Therefore, a static variable is accessible at any point of the program execution.
A final static variable must be both declared and assigned to in the same statement.
A final static variable containing a constant or scalar object may be placed into the read-only executable section.
9.4. Instance variables
An instance variable exists in the state of a particular object instance. All instance variables are finalized after the instance is finalized, in undefined order.
A final instance variable must be defined at the moment of the containing object instance initialization. Therefore, it must be defined either in the definition statement or in any initializer having access to the variable.
Only structs can contain instance variable declarations.
9.5. Local variables
Local variables exist in the scope of a single function call. Once the function returns, its local variables are finalized.
A final local variable can have its definition separate from declaration.
Prior to read, a local variable must be explicitly defined.
Variables declared within functions have local storage by default.
9.6. Final variables
A final variable is declared using the final
keyword.
A final variable can not be re-assigned to another object, but it can still contain a mutable, i.e. a partially changeable, object.
xcite:final-static-var-inline-assignment[]
xcite:final-static-const-var-readonly[]
xcite:final-instance-var-initialization[]
xcite:final-local-var-separate-def[]
A definition of a non-final static variable within a function is guaranteed to happen at most once per this variable specialization.
9.8. Lifetime
Primitives and enums don’t have lifetime, but classes do.
In the end of their lifetime, a variable is finalized.
9.8.1. Moving variables
Only a local variable can be moved.
It is hardly possible to detect use-after-move of a static or an instance variable, hence the restriction. |
Only an lvalue can be moved.
A moved instance is an rvalue.
Moving a variable does not preserve previous its lifetime status: if disabled (for example, with @setalive(var, false)
), it would be enabled again.
A moved variable is treated in the same way as an undefined one. It is thus possible to define a previously moved variable again.
final a := Std::Twine("foo")
let b := unsafe! uninitialized Std::Twine
# If not disabling the lifetime, then `b` would be
# finalized upon moving into it, which is dangerous
unsafe! @setalive(b, false)
b <- a # Moving `a` into `b`; `b` could've been finalized here
# a # Panic! Use-after-move
unsafe! @setalive(b, true) # Make `b` alive again
assert(b.upcase == "FOO")
a := Std::Twine("bar") # Re-defining `a`
# This function accepts an instance
# rather than a pointer to one.
def foo(twine : Std::Twine);
foo(b) # OK, would call a copy-initializer
foo(<- b) # OK, move `b` instead of copying it
# b # Panic! Use-after-move
It is not possible to move a varible into another local variable definition, as it is hardly practical.
Thus, there is only a copy-define operator :=
, but no move-define.
final a = Std::Twine("foo")
final b := a # OK, copy-define
# final c <- a # Panic! `c` is not defined yet
9.9. Assignment
An assignment to a variable is re-writing the memory region it is pointing to.
An assignment of a new value to a variable may be performed by either copying the value or moving it, depending on the assignment operator. The return value of an assignment operation itself may be either a copy of the new variable value or a moved instance of the old variable value, depending on the assignment operator.
Operator | Example | b is…​ |
r is…​ |
---|---|---|---|
|
|
Copied |
A copy of |
|
|
Copied |
Moved old instance of |
|
|
Moved |
A copy of |
|
|
Moved |
Moved old instance of |
For consistency reasons, <= would be a better choice for simple assignment instead of = .
But it is already taken by the "less than or equal" comparison operator, and = is preferable due to familiarity.
|
<< may be read as “push the old value from the variable”.
|
-
sval
(storage value): variables, dereferenced pointers -
mval
(moved value): return values,(← x)
results -
rval
(right value (historical)): everything else
Operation | Allowed? | Example |
---|---|---|
|
Yes, implies copying |
|
|
Yes, implies moving |
|
|
Yes, implies moving |
|
|
Yes, implies moving |
|
|
Yes, implies moving (TODO: Do not allow?) |
|
|
Yes, implies moving (TODO: Do not allow?) |
|
A pair of defined variables of any storage can be swapped using the swap operator <->
.
Swapping implies simulatenous moving of variable values into each other, thus it does neither copy nor finalize nor lead to use-after-move behavior.
The return value of a swap operation itself is a copy of a new value of the left operand.
TODO: Taking an address of an undefined variable is unsafe. It returns a pointer with defined storage, though.
let a := 1
let b := 2
let new_a := a <-> b
assert(new_a == a == 2)
assert(b == 1)
(*
A variable declaration. In different contexts, optional items
may become mandatory; for example, a final static variable
must have its definition in-line.
*)
var_decl =
(* In most cases, it's implicitly `def` *)
["def" | "decl" | "impl"],
(* A mandatory accessibility modifier *)
("final" | "let" | "get" | "set"),
(* An optional storage modifier, otherwise infered *)
["static" | "instance" | "local" | "undefstor"],
(* An optional `var` keyword *)
["var"],
(* A mandatory variable identifier *)
id,
(* An optional type restriction, otherwise infered *)
[":" | "~", type_expr],
(* An optional inline definition of the variable *)
["=", expr];
TODO: let x, y : T === let x : T, y : T; let x : ?, y : T === let x : Undef, y : T
VS. let x, y : T === let x : Undef, y : T
.
TODO: final uninitialized variables MUST not be placed in read-only memory, thus unsafe initialization of them is possible.
10. Functions
Can not user-declare non-operators ending with =
(e.g. name=
is invalid).
10.1. Declaration
closure_arg_decl = (* TODO: *);
(* A closure declaration may be explicitly empty (e.g. `[]`). *)
closure_decl =
"[",
[closure_arg_decl, {",", closure_arg_decl}, [","]],
"]";
(*
```
~> || body # Any closure, types infered
~> |[]| body # Explicitly empty closure
~> |[C] A : R|
```
*)
block_restriction = "=>", [block_proto];
type_restriction = WS, (":" | "~"), WS, type_expr;
(*
For declarations, at least either identifier or restriction
must be specified, whereas aliasing is not possible.
For implementations, at least either of three must be specified.
*)
arg_decl =
(* Optional alias, unapplicable to function declarations *)
[id, NB, ":"],
(* An optional argument identifier *)
[id],
(* An optional argument restriction *)
[type_restriction | block_restriction];
(*
An named argument declaration with restriciton.
If type of the restriction is omitted, it is assumed to be concrete.
So that, `(arg: T)` == `(arg: : T)` != `(arg: ~ T)` == `(arg ~ T)`.
*)
arg_decl =
[id, NB, ":"], (* Optional argument alias *)
[var_decl], (* Optional argument variable declaration *)
[":" | "~"], id;
(*
An anonymous argument declaration
with restriction, e.g. `(: T)` != `(~ T)`.
*)
arg_decl = (":" | "~"), id;
fun_decl = "decl", id, "(", {arg_decl}, ")";
10.2. Storage
Function storage may be static
, instance
or undefstor
.
The latter is only applicable to function declarations with traits.
10.3. Mutability
Function mutability may be mut
, const
or undefmut
.
The latter is only applicable to function declarations with traits.
10.4. Manipulating definitions
10.4.1. Unimplementation
An implementation may be fully or partially erased using an unimpl
clause.
Only an existing implementation may be unimplemented.
def foo(x ~ Int);
# unimpl foo(x ~ FBin) # Panic! Can not find matching implementation
unimpl foo(x : SInt32) # OK, now we don't have an implementation
# for this particular type
# foo(42) # Panic!
foo(42i16) # OK
unimpl foo(x ~ SInt) # OK, now we only have implementation for `UInt`
# unimpl foo(x ~ SInt32) # Panic! Can not find matching implementation
foo(42) # OK
# foo(-1) # Panic! Can not find overload for `foo(x : SInt32)`
10.5. Generators
Generator is like macro, but bound to specific instance.
A function accepting a block argument is called a generator function.
# `=>` is a special "block restriction"
decl foo(=>) # An anonymous block arg
# If block arity > 1, requires all block args to be named
decl bar(success: =>, failure: =>) # Named block args
# Having a label in argument declaration
# always expects it to be a block label.
impl foo(%block: =>) { %block(42) }
# foo(block: => { }) # Panic! Declaration prohibits named argument
foo(=> { }) # OK
# Aliasing
impl foo(block: %blk =>) { %blk(42) }
10.6. Returning
A returned value is always moved.
Variants are treated specially.
If returned value r is rval or mval, then return r
is a shortcut to T?(r)
.
If returned value r is sval, then return r
is a shorcut to T?(<- r)
.
@[Trivial]
class Klass
end
def foo : Klass
final klass := Klass()
return klass # Directly returned, thus implicitly moved
# return Klass() # Ditto
end
def bar : Klass?
if @rand?
final klass := Klass()
return klass # Shortcut to `.(<- klass)`
# return Klass() # Ditto
else
return void # Shortcut to `.(void)`
end
# An absence of `return` is implicitly `return void`
end
def baz : (Klass, Klass)
final klass := Klass()
# Can not imply moving in a
# non-directly returned expression,
# thus by default `klass` is copied
return (klass, <- klass)
end
TODO: Hidden (non-public) functions should be treated as public.
For example, it may be better to mark them unsafe
rather then wrap their bodies in unsafe!
.
TODO: Can overload: visibility, storage, mutability. Can not: safety, throwing-ness, purity.
10.7. Pure functions
This section is to be discussed yet. |
Only a function can be pure, neither generator nor proc.
A function can be marked as pure using the pure
modifier.
Without that modifier, a function is implicitly impure
, even if a compiler can prove it is pure.
-
A pure function does not have side effects:
-
It only writes to local storage;
-
It does not call impure functions.
-
-
A pure function only reads from caller, local or temporal storages.
If any of the statements above is false for a function marked pure
, a compiler panics.
A pure function can be throwing: modifying a backtrace is not considered a side-effect.
pure def sum([]: a, []: b)
return a + b
end
def global_add([]: a)
static let global = 42
return global + a # Reading from static storage, thus impure
end
11. Safety
There are three levels of runtime safety in descending order, namely threadsafe
, fragile
and unsafe
.
The following operations are threadsafe
:
-
Calling a method with at least
fragile
safety modifier on a value with local storage.
The following operations are unsafe
:
-
Coercion, unless special conditions are met;
-
Some of pointer operations, depending on its storage.
All other operations are fragile
by default.
11.1. Safety statements
A program shall not invoke a lesser-safe code from a higher safety environment. Instead, a lower safety statement shall be used to wrap the lesser-safe code.
fragile_statement =
"fragile!",
expr | block();
unsafe_statement =
"unsafe!",
expr | block();
A safety statement itself becomes of the specified safety level.
Safety statements transfer the safety responsibilities from the compiler to a developer.
It is easy then to debug the program by grepping source files with For example, calling an |
11.2. Safety modifiers
There are threadsafe
, fragile
, unsafe
and undefsafe
safety modifiers.
An undefsafe
safety modifier implies that the modified entity has undefined safety.
A type, function or macro declaration has a safety modifier.
safety_modifier =
"threadsafe" |
"fragile" |
"unsafe" |
"undefsafe";
The global namespace implicitly has the fragile
safety modifier.
Declarations other than macro declarations implicitly inherit the containing declaration safety modifier unless overriden.
A macro declaration always has an implicit undefsafe
safety modifier unless overriden.
For example, a Similarly, an instance method declared in a |
A trait function declaration with undefined storage has the undefsafe
safety modifier unless explicitly overriden.
A derive
statement inherits the containing declaration safety modifier unless explicitly overriden.
trait Foo
# Implicitly `undefstor undefsafe`.
decl foo : Void
end
struct Bar
derive Foo
# Implicitly `instance fragile`.
impl foo;
end
threadsafe derive Foo
# Implicitly `instance threadsafe`.
impl foo;
end
end
12. Types
A type is a fundamental qualia of a entity determining its characteristics.
The definition of type given by Wikipedia:
[…​], a data type or simply type is an attribute of data which tells the compiler or interpreter how the programmer intends to use the data.
Data Type at July 20, 2020
is different from that used in this document. Instead, this document has the aforementioned specialized data type definition, which aligns closely with the Wikipedia’s.
A type is uniquely identified with a fully-qualified identifier.
A type is either a namespace, trait, unit, enum, struct, class or an annotation; this classification is known as the type kind. A type kind is a keyword.
A data type may have instances of that type known as objects which are deemed to physically exist in the space-time continuum, including stack memory, processing unit registers or sections within an executable file. info:[In practice, an object existence may be elided during translation for optimization purposes. Also, an object may be present in superposition, e.g. a quantum bit, which is not clearly a definition of existence. However, a human always considers an object existing somewhere and sometime.] Enum, struct and class types are considered data types.
TODO: A struct type may be named or anonymous. Anonymous struct also allow method declarations with reopening.
TODO: A class type has optional mutability. Read more in classes.
TODO: Entity declaration, implementation and definition.
A namespace, trait or unit type shall not be declared, but defined only. info:[An implementation statement requires a prior declaration statement, therefore these types can not be implemented either. The design decision is driven by the fact that these types do not have runtime state to be finalized. The annotation type kind is not included in that list because it may actually be defined with hooks.]
12.1. Members
A type contains zero or more members. A member may be a function, value, macro, or a type. This classification is known as member kind.
Every type is a name-space in sense of that it can have members declared.
value = "variable" | "constant"; (* TODO: Move away *) member_storage = "instance" | "static"; member_kind = "function" | "value" | value | "macro" | "type"; member = [member_storage], member_kind, "member";
Any type may contain type and macro members.
A function or value member has storage. The storage is either static or instance. A static member has lifetime aligned with the program execution. An instance member lifetime spans to the object’s. In a data or trait type, the storage is instance by default, and can be altered with a storage modifier. Only struct and class types may contain instance value members. In a enum type, a value member declaration shall explicitly contain the static storage modifier. In a namespace, annotation, or a unit type, a member storage is always implicitly static and can not be altered.
12.2. Generic types
A generic type is a type containing at least one generic argument declaration.
A generic argument is accessible from within the generic type scope. info:[A generic argument is not accessible from type members nested in the generic type.]
struct Foo<T>
struct Bar
let x : T # Panic! `T` is not declared in `Bar`
struct Bar<T>
let x : T # OK, but distinct from `Foo`'s `T`
Upon identifier qualification, a generic argument has higher precedence over the outer scope.
By convention, generic argument identifiers consist of the smallest reasonable number of uppercase Latin letters, e.g. Array<T>
or Encoding<BitSize: BZ, CodeunitSize: CZ>
.
A generic argument may be queried from a generic type using the t::<name>
accessor, where name
is the name (or alias, with higher precedence) of the argument.
info:[Because of that, <>
is in the list restricted names for a user function declaration.]
decl struct Array<Type: T, Size: S>;
@assert(Array<SInt32>::<Type> == SInt32)
final ary = FBin64[3](1, 2, 3)
@assert(ary::<Size> == 3)
# Also works for variants!
final var = Std@rand(%i[1 2 3], %f[1 2 3 4])
# @assert(var::<Size> == 3) # Would work in 50% of cases
12.3. Type specialization
A type specialization is the process of establishing a distinct type with a unique generic arguments combination from a type definition for the first time when a compiler determines that there is a possibility for either an object or a type instance of the type to exist in runtime.
info:[Simply referencing a type, for example from within a reopen
statement or restriction, does not trigger type specialization.
For the specialization to be triggered, there shall be a possibility of that the type would actually exist in runtime as an object or type instance.]
A non-generic type is considered to have zero generic arity, which is a unique combination; therefore a non-generic type specializes at most once.
struct Foo<T>
val x : T
\{% print "Spec'd with " .. nx.scope.T:dump() %}
end
let f1 = Foo<SInt32>() # => Spec'd with SInt32
let f2 = Foo<SInt32>() # => (already spec'd)
let f3 = Foo<FBin64>() # => Spec'd with FBin64
# let f4 = Foo<Real>() # Panic! Field `Foo:x : Real` has incomplete type
decl bar(foo: Foo<FBin32>) # Do not specialize, it's just a reference
let f5 = \Foo<FBin32> # => Spec'd with FBin32
Type specialization is recursive; that is, if a type
A type specialization triggers specialization of each trait from its traits set, in the same order as in the set.
A specialization of a child type triggers specialization of its parent.
ditto:[Delayed macro calls and blocks] written directly within a type declaration body are evaluated on every containing type specialization. ditto:[] within a trait type declaration are copied into every deriving non-trait type.
trait Container<Type: T>
\{%
-- This macro block would evaluate on every deriving type
-- specialization, unique per deriving type.
print "Spec'd"
-- Would only declare if `T` is of `Int`
if nx.scope.T.type < nx.lookup("Int") then
%}
# Return a sum of all of
# the container's elements.
decl sum() : T
\{% end %}
end
decl struct Array<Type: T, Size: S ~ %z>
derive Container<T>
end
decl struct List<Type: T>
derive Container<T>
end
final ary = Array<SInt32, 3>(1, 2, 3) # => Spec'd
final list1 = List<SInt32>() # => Spec'd
final list2 = List<SInt32>() # => (already spec'ed)
12.4. Type completeness
A complete type is a data type specialization with defined size of occupied space, or a unit type specialization. Only a complete type can be used as a value type in runtime.
An incomplete type is a namespace type, annotation type, or a data type specialization with undefined size.
A data type specialization is deemed to be incomplete if it contains a field of an incomplete type.
info:[A complete type may have an incomplete generic argument, e.g. Pointer<SInt>
.
What matters is the runtime representation of a data type; if a data type does not contain fields of incomplete types, it is complete.]
info:[For Core primitives, there are rules for every primitive regarding on how to infer its completeness.]
It may be possible to define structs, classes and enums with incompl
modifier, which makes the type incomplete and disallows to use it in runtime unless reopened with compl
modifier.
The compl
modifier is default for definitions and reopenings then.
12.5. Type inheritance
TODO: Classes can extend classes and structs; structs can extend structs only.
TODO: Shall only extend a defined type, so we know its layout.
12.6. Reopening
A type may be reopened using the reopen
statement.
reopen Foo<T> forall T
is similar to decl Foo<T>
.
Can reopen Variants, Unions, anonymous structs, tuples, enums, primitives.
reopen
statement syntax(* TODO: Move. *)
compl_mod =
(* !keyword(mod) *) "compl" |
(* !keyword(mod) *) "incompl";
reopen_statement =
[compl_mod],
(* !keyword(statement) *)
"reopen",
type_ref,
";" | block();
A reopen
statement block semantics is similar to the declaration block semantics of the reopened type.
It is not possible to implement fields within a reopen statement.
It is a error to try reopening a non-existing type declaration.
decl struct Foo<T> forall T ~ Int
let x : T
\{% print "Specialized the " .. nx.scope.T:dump() %}
end
# reopen Foo<T: FBin64>; # Panic! `Foo<T>` istype
reopen Foo<SInt32>;
# => Specialized a SInt
# => Specialized the SInt32
Type expressions may also be reopened. info:[This gives a powerful and flexible mechanism for declaring members on trait sums, for example.]
Given these traits and struct:
trait Drawable
decl draw(Canvas)
end
trait Colorized
decl color : UInt8
decl color=(UInt8)
end
struct Line
<~ Drawable, Colorized
let color : UInt8
# impl ~Drawable.draw
impl draw(canvas)
# Draw without considering the color
end
end
One option would be to reopen the traits sum with another-named method declaration.
reopen (Drawable && Colorized)
decl draw_colorized(Canvas)
end
reopen struct Line
# impl ~(Drawable && Colorized).draw_colorized(canvas)
impl draw_colorized(canvas)
# Draw with color!
end
end
final line = Line()
# W/ color
line~(Drawable && Colorized).draw_colorized(canvas)
line.draw_colorized(canvas)
# W/o color
line~Drawable.draw(canvas)
line.draw(canvas)
Another option would be to reopen the sum with the same-named method declaration. But this would require solving the arising collision problem in the struct.
reopen (Drawable && Colorized)
decl draw(Canvas)
end
reopen struct Line
impl ~(Drawable && Colorized).draw(canvas) as draw_colorized
# Draw with color!
end
end
final line = Line()
# W/ color
line~(Drawable && Colorized).draw(canvas)
line.draw_colorized(canvas)
# W/o color
line~(Drawable).draw(colorized)
line.draw(canvas)
Reopening may also happen right within a type; it shall be then referenced with self
.
reopen Drawable
reopen (self && Colorized)
# Draw colorized.
#
# NOTE: It's not a part of `Drawable`,
# but rather of `Drawable && Colorized`,
# hence no collision if deriving `Drawable` only.
decl draw(Canvas)
end
end
# The semantics is similar to the
# second option mentioned above.
line~(Drawable && Colorized).draw(canvas)
The in-type reopening comes in particularly useful when dealing with generic arguments and namespaced types.
trait Container::Indexable<K, V>
reopen (self && Enumerable<V>)
decl each(-> |(index: K, value: V)|)
end
end
# An alternative would be (quite wordy):
#
reopen (
Container::Indexable<K, V> &&
Container::Enumerable<V>
) forall K, V
decl each(-> |(index: K, value: V)|)
end
12.7. Refining
TODO: using refinement
.
12.8. Restriction
Behavioural bias is applied whenever a compiler can prove it. In every scope, a type restriction invests into the bias. The bias is reduced outside of the context. The bias does not exist in runtime.
# In this example, `Drawable` is root trait for both `Drawable2D` and
# `Drawable3D`, causing name collision in a type deriving from both,
# because if we bias `point` to `Drawable` and call `draw` on it,
# which dimension-draw call would it be?
#
# A) Can not bias to `Drawable`, only `p~Drawable2D~Drawable`.
# Abstract traits, which are not deriveable directly?
indirect trait Drawable
decl draw
end
trait Drawable2D <~ Drawable
end
trait Drawable3D <~ Drawable
end
struct Point
derive Drawable2D
impl ~draw as draw2d
<~ Drawable3D
impl ~Drawable2D.draw as draw2d;
impl ~Drawable3D.draw as draw3d;
def do_pointy_stuff;
end
# The argument is required to derive from `Drawable2D`.
# def draw2d(x : T) forall T ~ Drawable2D # Ditto (need e.g. for `sum`)
# def draw2d(x : T) forall T where T <~ Drawable2D # Ditto, allows more complex expressions
# TODO: Have formal algorithm for that.
def draw2d(x ~ Drawable2D)
\{%
# `x` is biased to `Drawable2D` only
nx.assert(nx.scope.x.type.bias ==
nx.typexpr("Drawable2D"))
%}
x.draw() # OK, equivalent to `x~Drawable2D`
end
# An argument is required to derive from at
# least one trait deriving from `Drawable`.
def draw(x ~ Drawable)
\{%
# `x` is biased to `Drawable`
nx.assert(nx.scope.x.type.bias ==
nx.typexpr("Drawable"))
%}
# x.draw() # Panic! `~Drawable.draw()` is indirect declaration
if x ~? Drawable2D
# It may also derive from any other `Drawable`,
# but here we're biased to `Drawable2D` only
# an do not care about other possible traits.
#
\{%
# `x` is biased to `Drawable2D`
nx.assert(nx.scope.x.type.bias ==
nx.typexpr("Drawable2D"))
%}
x.draw() # OK, eq. to `x~Drawable2D.draw()`
end
end
12.9. Float
In computing, floating-point arithmetic (FP) is arithmetic using formulaic representation of real numbers as an approximation to support a trade-off between range and precision.
A particular built-in Float
type specialization maps to a valid IEEE 754-2008 (further referenced as IEEE 754) floating-point interchange format.
A Float
specialization representation in memory is defined in accordance to IEEE 754.
An implementation must implement five basic Float
specializations in accordance to basic formats specified by IEEE 754, either natively or in software.
In addition to that, two non-basic specializations are declared, but are optional to implement.
IEEE 754 format | Bitsize | Alias | Float specialization |
Is basic? |
---|---|---|---|---|
|
16 |
|
|
No |
|
32 |
|
|
Yes |
|
64 |
|
|
Yes |
|
128 |
|
|
Yes |
|
32 |
|
|
No |
|
64 |
|
|
Yes |
|
128 |
|
|
Yes |
If a target natively supports an IEEE 754 format, then the type under the ISA namespace must alias to the according Float
specialization, for example alias SPARC::FQuad = FBin128
.
If a target natively supports a non-basic format (even with a limited operations set), an Onyx compiler implementation must implement that format.
For example, the binary16
format is not basic, but often implemented, including some ARM and NVPTX targets; the FBin16
specialization must be implemented for such targets.
If a target natively supports an extended precision format, then it must be defined by the implementation.
The naming convention for an extended precision format alias is F{Base}E
, where {Base}
is the extended basic format, for example FBin64E
.
For example, on x87 a target, the extended precision floating point format may be aliased as alias X87::Float80 = FBin64E = Float<2, 64, 16383>
, and the Float<2, 64, 16383>
specialization would map natively to that format.
The C18 § F.2 standard states the following:
Therefore, |
An implementation must provide a way to specify mapping from three C floating-point types float
, double
and long double
to Onyx types, for particular target, prior to compilation.
However, an explicit coercion or conversion is still required to interoperate with C types.
The mapping defines according as
methods, for example float <⇒ FBin32
mapping defines $float.as(FBin32)
and FBin32.as($float)
methods.
An implementation should provide a pre-defined default mapping from C floating-point types to Onyx types for every target it supports.
If a target type has the same arithmetic format as an IEEE 754 format, then it must be aliased to an according Float
specialization.
Otherwise, it may be mapped to an extended-precision floating-point as dictated by IEEE 754, iff it matches the extended-precision format requirements; x87 conforms this requirement, for example alias FBin64E = Float<2, 64, 15> = X87::Float80
.
Implementations not conforming to IEEE 754 must not be mapped to a Float
specialization; for example, PowerPC extended precision type does not conform to IEEE 754, as it’s merely a software emulation. Thus, no mapping must exist, even for FBin64E
.
A particular Float
specialization implementation may be ensured during compilation using the @@impl?
built-in compiler intrinsic and the nx.Node:is_implemented
macro method, for example if @@impl?(FBin16)
or {% if nx.lkp("FBin16").is_implemented %}
.
12.10. Fixed-point numbers
In computing, a fixed-point number representation is a real data type for a number that has a fixed number of digits after (and sometimes also before) the radix point (after the decimal point '.' in English decimal notation).
A fixed-point number is defined by its natural base -0b11.101Q8f4 : XBin<Integral: 3, Fractional: 4>
.
A fixed-point number is always signed.
A negative value of -25kD5f-3 : XDec<Integral: 2, Fractional: -3>
is read as 25
and a sign bit, thus occupying exactly 8 bits.
An implementation is required to implement an arbitrary-sized fixed-point number in any of the defined bases with a reazonable maximum bitsize limit.
require "./fractional.nx"
# A fixed-point number.
decl primitive Fixed<
Base: 2 || 10,
# Size of the integral part in `Base`.
Integral: ~%Natural = 0,
# Size of the fractional part in `Base`.
# May be negative for right-shift.
Fractional: ~%Integer = 0
>
end
alias XBin<*> = Fixed<2, *>
alias XDec<*> = Fixed<10, *>
12.10.1. Binary fixed-point number encoding
In a binary fixed-point number,
@[Entry]
export void main() {
let q = 96Q8f-4 : XBin<Integer: 7, Fraction: -4>
assert(q == 0b00000110 << 4)
assert(q == 0b01100000)
assert(q == 96)
}
12.10.2. Decimal fixed-point number encoding
Digits in a decimal fixed-point number are encoded using the Densely Packed Decimal encoding. An additional bit is occupied by the sign.
Given 17.42Q4f2 : XDec<2, 2>
, 0
is for sign, and 1742
is 1001 1000 0011 00
in DPD encoding.
The following table compiles sizes required for the first ten digits in a decimal fixed-point number.
|
12.11. Units
A unit type is an object, but it only exists in a single instance and does not occupy any memory in runtime.
(* A unit type declaration. *)
unit =
type_visibility_mod,
(* !keyword(Unit type declaration) *)
"unit",
id,
[generic_args_decl],
";" | block();
A unit type may derive from a trait.
A unit type shall not extend any other types.
Unit members do not have storage.
Unit members may be accessed both in namespace- and object-access styleinfo:[, for example Unit.foo()
is equivalent to Unit::foo()
].
Akin to instance members in an object, unit functions are also referenced as (unit) methods, and unit values — as (unit) fields.
Within a unit method implementation, this
is equivalent to self
.
TODO: An instance storage bound to a unit always expands to the static storage.
A unit type is a functor with zero arity. A call to a unit type returns itself.
unit Foo
let x = 42
def get_x
return x
end
end
assert(Foo == Foo())
assert(Foo.x == Foo::x)
assert(Foo.get_x == Foo::get_x)
13. Structs
Can only define val
fields, which mutability depends on the containing struct. Struct have default initializer, can not have finalizers.
TODO: : this
in function returns control exactly to this
.
TODO: Returning or throwing an alive struct does not require moving it, as the act the passing an argument to the caller implicitly moves the lifetime responsibility to it. {#lifetime-return-move}
13.1. Members
A member is a struct instance variable.
13.3. Reopening
An already defined struct may be defined again. Instead of a complete redefining, the definition continues as if it was the original definition. Thus, instead of redefinition it is called reopening.
A struct reopening may be either complete or incomplete, set by compl
or incompl
definition modifiers accordingly.
Following the usual definition rules, reopening a struct is implicitly complete by default.
Declaring new members is limited by the reopened struct’s completeness.
13.4. Completeness
A complete struct has all its members declared, thus having a defined memory layout (unless reordered).
Only a complete struct may be initialized (see Section 13.2, “Lifetime”).
Once complete, a struct may not be reopened as incomplete.
A complete reopening of an already complete struct prohibits declaring new members and extending from structs.
By default, a struct definition is implicitly complete.
13.5. Extending
A struct may extend at most one another struct.
An extending struct inherits all of the extended struct declarations, excluding type declarations, in the same order of declaration. Also see Section 13.6.1, “Ordering”.
Extending struct | Extended struct | Extension possible? |
---|---|---|
Complete |
Complete |
Yes |
Complete |
Incomplete |
No |
Incomplete |
Incomplete |
Yes 1 |
An extend
statement shall be placed before any field implementations in a struct.
Regardless of where an extend
statement is placed, the extending struct members follow the extended struct members in memory unless the extending struct is unordered.
struct Foo
let x
end
# Memory layout of `Bar`
# would be `x, y, z`
struct Bar
extend Foo;
let y
let z
end
13.6. Memory layout
13.6.1. Ordering
The order of a struct fields in memory is undefined.
A struct may be reordered. In that case, the order of its members is not defined.
A reordered struct memory layout is a subject to compiler optimizations. |
To make a struct reordered, the built-in zero-arity Reorder
annotation shall be applied to it.
A struct extending from a reordered struct has undefined order of extended struct members only, in the boundaries of memory occupied by extended struct members.
Applying the Reorder
annotation to an extending struct mixes the boundaries of memory occupied by all members, thus allowing for more optimal reordering.
@[Reorder]
struct Foo
let foo_a : SInt32
let foo_b : FBin64
let foo_c : SInt32
end
# The memory layout would be the following:
# `[foo_*]x3, bar_a, bar_b, bar_c`,
# where `foo_*` have undefined order.
struct Bar
extend Foo;
let bar_a : SInt32
let bar_b : FBin64
let bar_c : SInt32
end
# In this case, the order of members
# of `Baz` is totally undefined.
@[Reorder]
struct Baz
extend Foo;
let bar_a : SInt32
let bar_b : FBin64
let bar_c : SInt32
end
13.7. Anonymous structs
An anonymous struct may only declare indexed or named members.
Indexed member declarations following a named or an explicitly indexed member declaration must be explicitly indexed.
An anonymous struct member can not have a default value, thus each member must be restricted to a concrete type.
The memory layout of a non-reordered anonymous struct matches its members declaration order. Thus, the order of members (even named) in a non-reordered anonymous struct matters.
13.7.1. Anonymous struct literals
An anonymous struct literal consists of an arbitrary amount of implicitly or explicitly indexed or explicitly named values, wrapped in parentheses.
Values declaration follows the Explicit indexing rule.
Trailing commas are allowed in anonymous struct literals.
A single implicitly indexed value without a trailing commas is treated as an expression rather than an anonymous struct.
13.7.2. Anonymous struct destruction
An anonymous struct may be destructed using the splat operator.
A destructed anonymous struct may be multi-assigned to a set of variables, or become a varg restriction.
Upon multi-assignment, an anonymous struct member can be referenced by either its implicit or explicit index, or by its name. References follow the Explicit indexing rule.
# Members `[0]` and `[1]` are referenced
# by their implicit indices 0 and 1.
# Members `foo` and `[2]` are referenced explicitly.
let a, b, foo: c, [2]: d = *('a', [1]: 'b', foo: 'c', [2]: 'd')
13.7.3. Anonymous struct member ordering
An anonymous struct may become reordered by setting its Reordered
generic argument to true
(false
by default).
The memory layout of a reordered anonymous struct may not match its members declaration order due to possible reordering of its memebers.
Two reordered anonymous structs containing the same set of members but in different order of declarations are guaranteed to have the same memory layout.
An anonymous struct literal may become reordered if appended with r
.
13.7.4. Packed anonymous structs
An anonymous struct may become packed by setting its Packed
generic argument to true
(false
by default).
A packed struct and its members have alignment of 1.
An anonymous struct literal may become packed if appended with p
.
14. Traits
A trait is a composable unit of behaviour.
The traits implementation is inspired by Traits: A mechanism for fine-grained reuse by Ducasse et al., 2006. |
TODO: Only units and objects may derive from a trait. Namespaces do not have behaviour.
14.1. Declaring a trait
trait_decl =
trait_decl_mod,
["decl"],
"trait",
id,
[nb, generic_arguments_decl],
";" | body();
trait_decl_mod = [type_visibility_mod];
A trait type can not be implemented; instead, it is continuously declared.
A trait T
statement is a shortcut to decl trait T
.
A trait declaration statement may include a type visibility modifier and a generic arguments declaration.
A trait type is always an incomplete type. A trait type declaration statement can not contain a completeness modifier.
A trait type function and value members have storage. The storage is implicitly instance.
A trait type allows method declaration and implementation.
A trait method implementation is specialized for every deriving type separately.
A trait method implementation may use this
and super
keywords, which evaluate in accordance to the context the method is specialzied within.
14.2. Trait arithmetic
An object or trait type conveys a set of traits it derives from, which is empty by default for object types and contains itself for trait types.
Deriving from a trait adds the trait term to the deriving type traits set.
Trait set composition is commutative: the ordering of adding traits to the set does not matter.
Nested traits are equivalent to flattened traits; the trait set composition hierarchy does not affect the traits behaviour.
The same trait may be derived multiple times by the same object; it would still count as a single traits set entry.
Recursive derivations are allowed.
14.3. Deriving from a trait
A type which has a derive T
clause is said to derive from T
.
derive
clause syntaxderive =
["undefstor" | "static" | "instance"],
"derive",
id,
";" | block();
- Deriver, deriving (type), derivative
-
The type deriving from a trait.
- Derivee, derived (type)
-
The trait a type derives from.
A deriving type may contain multiple derive
statements.
Deriving from a trait includes all of its members with undefined storage into the deriving type as if they were defined within the deriving type itself, which may lead to name collisions.
The storage of an included member depends on the storage modifier of the according derive
clause.
Deriving type | Allowed modifiers | Default modifier |
---|---|---|
Namespace |
|
|
Trait |
|
|
Primitive, class, enum |
|
|
Therefore, static derive T
from type U
is identical to reopening a Type<U>
type and then instance-deriving it from T
.
reopen SInt32
static derive Debuggable
impl debug(stream) { stream << "SInt32" }
end
end
reopen Type<SInt32>
derive Debuggable
impl debug(stream) { stream << "SInt32" }
end
end
Only statements allowed within a derive
statement are impl
, reimpl
, alias
and moveimpl
.
A derive
statement may be early-terminated with a semicolon, left without any inner statements.
Lookups within a derive
statement check the trait’s scope at first, and then go up to the deriving type.
A derived trait type may be referenced using the derived
keyword.
The deriving type may still be referenced using the self
keyword.
Technically, it is allowed to reference other traits' members within a derive
statement, e.g. self~OtherTrait:foo
(see behavioral erasure below) or ~OtherTrait:foo
or even foo
(unless there is an ambiguity).
A derive
statement does not require all included members' implementations to be contained within it (the derive
statement).
As any other, derived implementations are lazily checked upon specialization. If a declaration is not implemented at all, the implementation responsibility is passed to the next deriving type, if any.
14.3.1. Behavioral erasure
A derived trait’s members are always accessible via their original identifiers with behavioral erasure achieved with fuzzy restriction.
# In this example, both `Drawable2D` and `Drawable3D`
# have `draw` method declaration. A deriving type has
# to deal with the collision. One way would be to
# alias (both) methods under different names.
trait Drawable2D
decl draw
end
trait Drawable3D
decl draw
end
@[Trivial]
class Point
derive Drawable2D
impl draw;
# Must use `self` here, otherwise
# would attempt to declare a new
# identifier within a `derive` clause,
# which is prohibited.
alias self:draw2d to draw
end
derive Drawable3D
impl draw;
alias self:draw3d to draw
end
end
final p = Point()
p.draw2d # OK
p.draw3d # OK
# p.draw # Panic! Ambiguous choice between
# `Point~Drawable2D.draw` and
# `Point~Drawable3D.draw`
p~Drawable2D.draw # OK
(p ~ Drawable3D).draw # OK
def draw2d(x ~ Drawable2D)
x.draw # OK. Thanks to behavior erasure, exactly
# `Drawable2D~draw` is always referenced
end
draw2d(p) # OK
def draw_or(x ~ Drawable2D || Drawable3D)
# x.draw # Panic! `x` may be BOTH `Drawable2D` and `Drawable3D`,
# hence unable to choose right implementation
if not x ~? Drawable2D
x.draw # OK. The compiler can prove that it can only
# be `Drawable3D`. Even if we add `Drawable4D`
# with the same `draw` declaration,
# the behavior erasure feature would always
# reference `~Drawable3D.draw` exactly.
end
end
draw_or(p) # OK
def draw_and(x ~ Drawable2D ^ Drawable3D)
x.draw # OK. There will never be an ambiguity
end
# draw_and(p) # Panic! Point contains BOTH traits, but they're XORed
14.4. Abstract traits
A trait may be declared as abstract
.
An abstract trait shall not be derived by a non-trait type.
An abstract trait may be extended by a trait type.
info:[Effectively, an abstract trait is merged into the extending trait instead of being treated as a separate term participating in the traits sum.
All instance members of an abstract trait are treated as if they were declared in the deriving trait itself.]
Biasing to an abstract trait does not treat its methods as ever implemented.
abstract trait Drawable
decl draw()
def double_draw
draw()
draw()
end
end
trait Drawable2D < Drawable;
trait Drawable3D < Drawable;
struct Point
derive Drawable2D
impl draw() as self~draw2d;
end
derive Drawable3D
impl draw() as self~draw3d;
end
end
struct Line
derive Drawable2D
impl draw()
end
end
def double_draw2d(x ~ Drawable)
# x.draw() # Panic! Can not call an abstract trait method declaration
if x ~? Drawable2D
x.double_draw() # Eq. to `x~Drawable2D.double_draw()`
else
throw "Not a 2D, rejecting"
end
end
double_draw2d(Point()) # Calls `Point~Drawable2D:double_draw`
double_draw2d(Line()) # Calls `Line:double_draw`
If Drawable
was not an abstract trait, but rather a simple trait, it would participate in the trait sum, leading to unwanted collisions.
trait A
decl foo;
def double_foo
foo()
foo()
end
end
# Effectively, `B` = 0 + `A`.
trait B <~ A;
# Effectively, `C` = 0 + `A`.
trait C <~ A;
struct Foo
# `B + C = 0 + A + 0 + A = A`
<~ B, C
# Either one of these would
# suffice, but exactly one.
impl A~foo;
impl B~foo;
impl C~foo;
# An attempt to implement
# either again would panic.
impl A~foo; # Panic!
impl B~foo; # Panic!
impl C~foo; # Panic!
end
struct Bar
<~ B
impl A~foo;
end
Foo~A:foo # OK
Foo~B:foo # Nope.
reopen (A && B)
decl foo
end
TODO: decl let color : UInt8
vs. decl color : UInt8
&& decl color=(UInt8) : Void
.
line..color=(42).color
is acceptable?
15. Core API
The Onyx Core API is the Core::
namespace including types, functions and macros built into the language.
The Core::
namespace is available from every Onyx source code file.
Criteria for being included in the Core API are:
-
Have a great chance of being implemented on hardware level. For example, integers are ubiquitous, floating points are common, and tensors are typical for GPUs. For the same reason most basic math operations are in the Core API;
-
Have literals built into the language. For example, ranges;
-
Have traits otherwise impossible to express using the language. For example, blocks, lambdas, functions, unions, variants, void etc.
Core API types' memory layout is usually undefined.
Other than types deriving from Scalar
and Aggregate
, the Core API also includes freestanding Void
, Union
, Vector
and Quire
types.
16. Directives
A directive is an instruction to the compiler.
16.1. File dependency directives
File dependency directives are require
and import
directives.
A file dependency directive makes the program depend on a source file at the provided pathname.
The same file may be referenced from multiple file dependency directives, but it is guaranteed to be processed exactly once first time it is referenced.
If the same file is referenced both from a require
and import
directive, then it is processed exactly twice.
The order in which files are referenced both from inside a single file dependency directive and the global order matter. A user should keep in mind the actual entities structure so they do not reference an undeclared yet entity.
A single file dependency directive may contain multiple pathnames, as well as patterns in pathnames.
Filename and pathname resolution rules in a file dependency directive shall conform to sections 4.7, 4.8 and 4.13 of IEEE Std 1003.1-2017. Pathname pattern matching notation shall conform to section 2.13 of IEEE Std 1003.1-2017.
The IEEE Std 1003.1-2017 is known as the POSIX standard. Pathnames in file dependency directives shall conform to the POSIX standard so they stay cross-platform. |
An optional at
clause in a file dependency directive substitutes the provided pathname with every pathname listed in the from
clause in a file dependency directive.
If an at
clause is present, then from
clause shall be present with the explicit from
keyword.
For example, require "bar/foo.nx"
is equivalent to require from "foo.nx" at "bar"
.
An implementation shall provide a way to add to the list of base lookup paths for every file dependency directive class in the manner similar to -I
and -L
flags ubiquitous in C compilers.
The Macro API provides ways to both invoke a file dependency directive as well as alter the base lookup paths.
16.1.1. require
directive
A require
directive makes the program to include an Onyx source file into the program source.
Recursive require
directives are allowed.
require_directive =
"require",
@unord(
(* Path(s) of the required file *)
(["from"], path, {",", path}),
(* The base path *)
["at", path]
);
16.1.2. import
directive
An import
directive makes the program aware of entities contained in the imported file such that those entities are not a part of the program itself, but rather can be referenced from the program.
An imported file may contain source code in a language other than Onyx.
The Standard defines interoperability specifications for the following languages: Onyx, C, Lua. However, an implementation is not limited to those languages defined in the Standard. Read more about interoperability in Interoperability. |
(*
Import a file written possibly
in a language other than Onyx.
```
import from "stdio.h" in "C"
import in "Rust" from "main.rs", "aux.rs" at "/mypath/"
```
*)
import_directive =
"import",
@unord(
(* The source language of an imported file *)
("in", string),
(* Paths to the imported files *)
("from", string, {",", string}),
(* The base path for the imported files *)
["at", string]
);
16.2. using
directive
The using
directive either injects a namespace N
into the local scope, so that members of N
are directly accessible from the local scope; or applies a refinement R
in the local scope; or defines an alias A
to B
effective in the local scope.
using =
using_namespace |
using_refinement |
using_alias;
using_namespace =
"using",
["namespace"],
type_ref;
using_refinement =
"using",
["refinement"],
type_ref;
using_alias =
"using",
["alias"],
decl,
("=", "to"),
ref;
If a using
directive contains =
or to
, it is then implicitly treated as using alias
.
Otherwise, the kind of using
is inferred from the referenced identifier.
An explicit namespace
, reference
or alias
clause may be present to enforce the kind of using
.
For a using
directive, the scope is limited to the containing source file.
namespace Foo
let bar = 42
end
using Foo # `using namespace` is inferred
bar = 43 # OK
using
kindnamespace Foo
end
using namespace Foo # OK
# using refinement Foo # Panic!
namespace Foo
let bar = 42
end
using Baz = Foo # `using alias` inferred
Baz.bar = 43 # OK
Foo.bar = 44 # Still OK
# bar = 45 # Panic! `bar` is not declared in current scope
17. Literals
A literal is a constant value known at compilation time, as opposed to a non-literal value computed at runtime.
Despite of the knowledge of existence of such a literal value, a compiler may decide to change its form or completely remove from a binary for optimization purposes. In other words, for some kinds of literals, there is no guarantee that a literal would be stored anywhere in a resulting binary; but it must be transparent to a user of the program.
A literal is an rvalue.
17.1. Literal constrainment
Each literal conveys a certain amount of information, which determines its level of constrainment.
A literal constrainment determines the set of restrictions it can match.
A literal constrainment may be altered with literal prefixes and suffixes.
A literal without any prefixes or suffixes is said to be unconstrained.
Some literal kinds have default types defined for their unconstrained literals, e.g. an unconstrained integral literal is SInt32
by default.
Even an unconstrained literal still have some constraints. For example, a numeric literal can never become a string. Or a negative whole number can never become an unsigned integer. |
A literal can be instance-restricted (both with concrete and fuzzy restrictions) to a type, e.g. 42 : SInt32
.
Applying an instance restriction to a literal takes into consideration and contributes into its constrainment level.
For example, 42f : SInt32
would be invalid, as the literal is already constrained to a binary floating-point value.
Or, 42 ~ FBin : SInt32
would also fail on the second restriction.
A literal can be type-restricted to either an exact literal value (e.g. 42 :\: @{42}
) or a built-in type having literals (e.g. 42 ~~ @{Size}
).
Such a restriction, where the right operand is wrapped in @{}
, is called a literal restriction.
Literal restrictions are ubiquitous in generic type arguments, e.g. primitive Array<Type: T, Size: S ~ @{Size}> .
|
TODO: Refine exact and fuzzy restrictions, standardize scalar and aggregate literals.
Only a type restriction may be a literal restriction. Only a literal may be the left operand of a literal restriction. A literal restriction returns its left operand, the literal.
An exact literal restriction must match a complete type literal.
For example, @{42i}
and @{FBin}
are invalid restrictions, because neither literal represents a complete type (despite of the fact that i
defaults to SInt32
; in a literal restriction it expands to simply SInt
).
A valid example would be @{42i32}
.
A fuzzy literal restriction may match an incomplete literal kind, but it erasures the left operand behavior, which affects the literal constainment.
# `1` is unconstrained. An unconstrained
# integral literal defaults to `SInt`.
# However, it may be further restricted to other types.
1 ~ Integer
1 ~ FloatingPoint
1 ~ Integer<Signed: false> : UInt8
# Adding `-` to the literal reduces the
# set of restrictions it can match
-1 ~ Integer<Signed: true>
-1 !~ Integer<Signed: false>
-1 ~ FloatingPoint
# Adding a fractional part turns
# a literal into a non-integer
1.0 ~ FloatingPoint
1.0 ~ Fixed
1.0 !~ Integer
1.0f ~ FBin
1.0f64 : FBin64
def foo(x: L) forall L : %i
\{{ nx.scope.L.value }} # `foo(42i) -> 42i`
\{{ nx.scope.L.number }} # `foo(42i) -> 42`
end
def foo(x : L) forall L : %q
\{{ nx.scope.L.value }} # `foo("x"utf8) -> "x"utf8`
\{{ nx.scope.L.string }} # `foo("x"utf8) -> "x"`
end
%i, %si # Integer
%ui # Unsigned integer (natural)
%i32, %z # Limit values
%r, %ri32 # Any-end range
# For clarity with `()`, this is disallowed
# %ri[) # Specific-end range
%q # Any-encoding string, value is
%qucs # Accept `"foo"utf8` and `"foo"ucs2`, reject `"foo"ascii`
%ucs, %c, %cucs # A char
# Array of literals is prohibited?
# Because array becomes a runtime value?
# %c[] # An array of chars
%c(2) # Exactly 2 chars
%c(2..) # At least 2 chars
%c(1..3) # 1 to 3 chars
def foo(*x : L) forall L : %c(2)
foo(('a', 'b'), ('a', 'b'))
def foo(x : L) forall L : %c(2..)
foo('a', 'b', 'c')
def foo(*x : L) forall L : %c
foo('a', 'a', 'a') # Useless
def foo(*x : L) forall L in ('a', 'b', 'c')
foo('a', 'c') # OK
42 ~ @{42} : @{42f} : FBin64
42 ~ @{42i} !: @{42f} # Can not further restrict `i` to `f`
struct Foo<T : @{Size}>
end
Foo<42> # OK
# Foo<-42> # Panic!
struct Bar<T : (@{"foo"} || @{"bar"}) ~ @{String}>
end
Bar<"foo"> # OK
# Bar<"baz"> # Panic!
17.2. Underscores
Underscores can be used in a literal to either visually divide its parts without contributing into its value, or as a part of its value (true for text literals), or both.
1_000_i32 # Underscores do not contribute into the value
"Hello_world" # Underscore is a part of the value
"a_b"_u16 # Both
Literal suffixes and prefixes (see [literal-multiplier-prefix], [literal-encoding-prefix]) are treated as a whole and therefore can not contain underscores.
A preceding underscore is treated as (a part of) an identifier and therefore prohibited in literals, e.g. 1
is an identifier, and 'a'
is a error.
Sequential underscores are allowed, e.g. 1__000
.
Trailing underscores are allowed in numeric and text literals, but only after a prefix.
Due to the fact that suffixes are read from the end, trailing underscores are required in cases when it is desired to emphasize the absence of a suffix, for example: Example 23. Using trailing underscores to emphasize the absence of suffix in literals
|
17.3. Literal suffixes
A literal suffix is a combination of graphemes adjacent to a literal contributing into the literal’s constrainment.
Literal suffixes are only applicable to either scalar or magic literals.
TODO: A scalar literal is an nvalue.
A scalar literal wrapped in parentheses is still a literal instance.
Therefore, literal suffixes are applicable to scalar literals even wrapped in parentheses, e.g. (42)i32
, ("foo")utf16be
.
In a non-magic literal, literal suffixes follow the literal value, e.g. 1i
.
In a magic literal, literal suffixes precede the literal value(s), e.g. %i[1]
The first literal suffix in a literal, if any, is major. The following suffixes are minor.
In a literal, literal suffixes are parsed from right to left. A literal suffix has higher priority than other parts of a literal.
A literal suffix can not contain underscores.
Underscores may be used to visually separate suffixes from other parts of a literal.
Moreover, the fact that a trailing underscore is not allowed after a suffix makes it possible to distinguish between suffixes and, say, prefixes, for example 5f == 5.0f64 but 5f_ ~= 0.5e-15f64 .
|
A literal suffix may be defined for a type using the LiteralSuffix
annotation.
A type with a literal suffix defined must be a functor accepting a literal instance.
Whichever literal kinds are accepted by the functor defines which literal kinds the suffix may be applied to.
A custom literal suffix has higher precedence than a built-in literal suffix.
For example, ascii
would be parsed as ascii
if defined, and not asci + i
.
Examples | Restriction | Default type |
---|---|---|
|
|
|
|
|
|
|
|
|
|
|
|
Regex | Applicable to | Constraint | Default type |
---|---|---|---|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Regex | Type | Accepted literals | Notes |
---|---|---|---|
Integer literal suffixes |
|||
|
|
|
|
|
|
|
|
|
|
|
Turns a literal constrainment into |
|
|
|
Constrains a character literal charset to |
Real literal suffixes |
|||
|
|
|
|
|
|
|
|
|
|
|
Either |
|
|
|
Either |
|
|
|
Turns a literal constrainment from |
|
|
|
|
|
|
|
|
String literal suffixes |
|||
|
|
|
Constrains a string literal encoding |
|
|
|
Constrains a string literal encoding |
C integer literal suffixes |
|||
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
C real literal suffixes |
|||
|
|
|
|
|
|
|
|
|
|
|
to_*
and as_*
lookup algorithmIf a receiver R
does not have a to_*
or as_*
method defined explicitly, a compiler then looks up for a literal suffix identical to the *
part.
If such a suffix exists for a type T
, and R
has method to(:: T)
defined, it is called then.
Otherwise, a compiler panics because of missing method.
For example, if FBin64
had to(:: SInt32)
defined, then all fbin.to(SInt32)
, fbin.to_si32
and even fbin.to_i
would be valid.
Presence of the algoritm effectively allows hyphens and dollar sign in exclusively in function names beginning with to_
and as_
, for example to_Qe-7
and as_$f
.
17.4. Scalar literals
17.4.1. Numeric literals
TODO: Literal structure: base prefix, value: (numeric representation (including multipliers)), suffix.
Literals are designed to not scream until needed: the syntax is almost invariant in regards to casing, and most things are in lower case. Exceptions are the Q number that is widely known by its upper-case name, hexadecimals to distinguish them from other parts, and big multiplier prefixes. |
multiplier_prefix_iec =
"Yi" | "Zi" | "Ei" | "Pi" |
"Ti" | "Gi" | "Mi" | "Ki";
multiplier_prefix_si =
"Y" | "Z" | "E" | "P" | "T" | "G" | "M" | "k" | "h" | "da" |
"d" | "c" | "m" | "u" | "μ" | "n" | "p" | "f" | "a" | "z" | "y";
sint_suffix = ["s"], "i", {digit}; (* 32 bits by default *)
uint_suffix = "u", ["i"], {digit}; (* 32 bits by default *)
size_suffix = ["s" | "u"], "z"; (* Unsigned by default *)
fbin_suffix = "f", ["b"], {digit}; (* 64 bits by default *)
fdec_suffix = ["f"], "d", {digit}; (* 64 bits by default *)
(*
Design rationale: despite of the fact that `p` can only be in
lowercase in Onyx, it'd be confusing to see `42P8` and treat it
other than a "42 * 2 ** 8". Had to pick another symbol therefore.
*)
posit_suffix = "t", {digit};
posit_suffix = "p", {digit};
(*
Design rationale: would not use `B`, because
brain float is "smaller" than usual float.
*)
bfloat_suffix = "bf", {digit};
literal_suffix =
sint_suffix |
uint_suffix |
size_suffix |
fbin_suffix |
fdec_suffix |
xbin_suffix |
xdec_suffix |
posit_suffix |
bfloat_suffix;
sign = "-" | "+";
non_decimal_value ($prefix, $base) =
$prefix, {$base | "_"}, $base,
[".", $base, {$base | "_"}],
["p", [sign], {digit}-]
numeric_literal =
[sign],
(
non_decimal_value("0b", binary) |
non_decimal_value("0o", octal) |
non_decimal_value("0x", hex) |
(
digit, {
(*
Multiplier prefixes allow to
have multiple radix points.
*)
["."], digit,
(* The prefix families must not be mixed. *)
{digit | "_" | multiplier_prefix_iec | multiplier_prefix_si}
},
["e", [sign], {digit}-]
)
),
{"_"},
[literal_suffix];
Underscores may be used to:
|
A recommended chunk size should be preserved when applying underscores (e.g. 10_000_00 is confusing).
|
Numeric literals always have big-endian representation, i.e. most significant bits of data come first.
For example, 0x2a == 0x002a == 0b101010 == 0b0010_1010 == 42
.
TODO: Applying an exponent (e
for decimal, p
for other bases) makes a literal fractional. However, it still can be forced to be integral, if is allowed to.
Regardless of the literal representation endianess, the actual memory layout of a numeric value in runtime may be defined elsewhere. |
Only the decimal base may be used for floating and fixed decimal literals.
TODO: Which bases and literals allow signs?
Base prefix | Base | Recommended chunk size | Example |
---|---|---|---|
|
Binary |
4 (nibble) |
|
|
Octal |
3 |
|
|
Decimal |
3 (a thousand) |
|
|
Hexadecimal |
2 |
|
In decimal literals, e
is used to denote a decimal exponent (* 10 e
).
In literals with other bases, p
is used to denote a binary exponent, i.e. power of two (* 2
p
).
An omitted exponent defaults to 0 (* 2 0
or * 10
0
, which is always * 1
).
TODO: Non-decimal literals can be used for any number. Rules are the same:
0x10.1 == 16 + (1 * 16 ** -1) == 16.0625
0b10000.0001 = 16 + (1 * 2 ** -4) = 16.0625
0o20.04 = 16 + (4 * 8 ** -2) = 16.0625
0b101010f == 0b101010.0f
# , thus non-decimal literals are sometimes not exact representations.
An integral literal can be restricted to an integer type.
A likely-fractional literal still can be restricted to an integer type, but defaults to a non-integer type, e.g. 2e2 : SInt32
would be FBin64
by default.
A likely-fractional literal can be turned into integral by applying an integral suffix, e.g. 2e2i
would be SInt32
by default.
A fractional literal can not be restricted to an integer type, e.g. 2e-2 : FBin64
.
A literal with big multiplier prefixes (including multiplier radix points) and/or positive exponents is likely-fractional.
A literal with at least one small multipler prefix, or negative exponent, or fractional radix point; is fractional.
Multiplier prefixes
Numeric literals support both binary and decimal multiplier prefixes.
- big multiplier prefix
-
One having a positive exponent.
- small multiplier prefix
-
One having a negative exponent.
Prefix | Name | Multiplier |
---|---|---|
Big prefixes |
||
|
Yobi |
1024⁸ |
|
Zebi |
1024⁷ |
|
Exi |
1024⁶ |
|
Pebi |
1024⁵ |
|
Tebi |
1024⁴ |
|
Gibi |
1024³ |
|
Mebi |
1024² |
|
Kibi |
1024¹ |
Prefix | Name | Multiplier |
---|---|---|
Big prefixes |
||
|
Yotta |
10²⁴ |
|
Zetta |
10²¹ |
|
Exa |
10¹⁸ |
|
Peta |
10¹⁵ |
|
Tera |
10¹² |
|
Giga |
10⁹ |
|
Mega |
10⁶ |
|
Kilo |
10³ |
|
Hecto |
10² |
|
Deca |
10¹ |
Small prefixes |
||
|
Deci |
10⁻¹ |
|
Centi |
10⁻² |
|
Milli |
10⁻³ |
|
Micro |
10⁻⁶ |
|
Nano |
10⁻⁹ |
|
Pico |
10⁻¹² |
|
Femto |
10⁻¹⁵ |
|
Atto |
10⁻¹⁸ |
|
Zepto |
10⁻²¹ |
|
Yocto |
10⁻²⁴ |
Multiplier prefixes are case-sensitive.
Multiplier prefixes can only be used in decimal-based literals.
Multiplier prefixes from different standards must not be mixed, i.e. no binary and decimal prefixes in a single literal.
Multiplier prefixes from the same standard may be mixed in a single literal.
Multiple multiplier prefixes in a single literal must be in the same order as defined in the tables below.
For example, 10M300k == 10300000
is valid, but 300k10M
is not, because M
precedes k
in Table 9, “Decimal multiplier prefixes (SI)”.
Due to the suffix priority in literals, it may be desirable to split value and kind with an underscore for readability, e.g. Example 24. Readability of suffixes in literals
|
Integer literals
An unconstrained integral literal default type is SInt32
.
Floating-point literals
An unconstrained likely-fractional and fractional literal default type is FBin64
.
Binary fixed-point literals
(* For consistency, it is still
referenced as "`XBin` suffix". *)
xbin_suffix =
(* An optional signedness. *)
["s" | "u"],
(* The binary fixed-point (a.k.a.
Q number) literal symbol. *)
"Q",
(* An optional bitsize. *)
{digit},
(* An optional Fractional part size in bits. Can also
be read as an amount of bits to left-shiFt by. *)
["f", [sign], {digit}-];
xcite::term-q-number[]
A Q number literal is signed by default.
A Q number literal value may be in any base, contain optional multiplier prefixes, optional exponent and optional fractional radix point.
Either bitsize or _f_ractional part size is required in a Q number literal.
If bitsize is omitted, the nearest greater or equal byte size is inferred, e.g. 0Qe10 == 0Q16e10
, 0Qe-5 == 0Q8e-5
, 0Qe8 == 0Q16e8
.
If fractional part size is omitted, it equals to the bitsize minus one bit if signed, e.g. 0Q8 == 0Q8e7
, 0uQ8 == 0uQ8e8
.
1Qf-8 == 2 ** 8 == 256
0.625uQ8 + 0.0078125uQ8 ==
0b0.101uQ8 + 0b0.0000001uQ8 =
0b0.10100001uQ8 == 0.6328125d
# 1010 0000
# + 0000 0010
# = 1010 0010
# In signed Q numbers, the most
# significant bit is the sign bit,
# if the implementation is 2's complement.
#
# TODO: Come up with a valid example.
#
-0.625Q8 + 0.0078125Q8 ==
-0b0.101Q8e7 + 0b0.0000001Q8e7 =
-0b0.10110010Q8 == −0.60546875d
# 1011 0000
# + 0000 0010
# = 1011 0010
Decimal fixed-point literals
xdec_suffix =
(* An optional signedness. *)
["s" | "u"],
(* The decimal fixed-point literal symbol. *)
"D",
(* An optional total amount of digits. *)
{digit},
(* An optional amount of digits which are fractional. *)
["f", {digit}-];
A decimal fixed-point literal can only be in decimal base.
A decimal fixed-point literal is signed by default.
A decimal fixed-point literal requires either total amount of digits or amount of digits which are fractional, or both, set. In case if only the total amount is set, the fractional amount is deemed to be zero. In case if only the fractional amount is set, the total amount is deemed to be equal to the fractional.
xcite::dpd-bitsize-omitted[]
17.4.2. Text literals
- Abstract character
-
A character from the Unicode character set, which may consist of one or many codepoints.
- Source-encoded abstract character
-
An abstract character encoded in the source file encoding, for example ⟨a⟩ encoded as
a
(U+0061); ⟨å⟩ encoded aså
(U+61, U+030A) orå
(U+00E5).
A text literal is a sequence of abstract characters which are either source-encoded or made up from Section 17.4.2.1, “Numerical codepoints”.
An abstract character is normalized upon parsing. Then, the normalized abstract character is encoded into a single codepoint or a sequence of codepoints in the encoding the literal is constrained to, i.e. the target encoding, with preference given to a variant comprised of a lesser amount of codepoints. When the target encoding supports multiple codepoint sequences for a single abstract character representation, and a specific sequence variant is desired, numerical codepoints shall be used instead of source-encoded abstract characters. If the target encoding does not support the abstract character at all, a compiler panics.
The standard defined a number of escaped abstract character sequences consisting of a backslash (\
, U+005C) followed by a single Latin letter, which expand to a predefined abstract character.
Sequence | Character | Description |
---|---|---|
|
|
Alert (Beep, Bell) |
|
|
Backspace |
|
|
Escape character |
|
|
Formfeed Page Break |
|
|
Line Feed |
|
|
Carriage Return |
|
|
Horizontal Tab |
|
|
Vertical Tab |
|
|
Whitespace |
|
N/A |
Ignores the newline |
The following normalization scenarios are possible, yet to discuss.
|
Numerical codepoints
A numerical codepoint is a codepoint encoded as an explicit numerical value, e.g. ⟨f⟩ encoded as \102
.
A numerical codepoint may be in either octal (\o141
), decimal (\96
) or hexadecimal (\x61
) base.
In either base, regardless of the literal’s encoding, the numerical value has big endianess, for example ф
is encoded as \x84d1
.
A numerical codepoint may map to an invalid (i.e. not defined in) abstract character in the literal’s charset, but its bitsize must not exceed the maximum allowed by the charset.
For example, '\xffffffffffffffff'
would trigger panic for the most of the possible charsets, because no charset defines 64 bits as its codepoint’s size.
A numerical codepoint value may be wrapped in curly brackets to clearly define its boundaries.
For example, "{97}1" == "a1"
.
An unmatched bracket would trigger panic.
Without brackets, a numerical codepoint value is terminated either with a grapheme outside of the base’s allowed graphemes range, or at the end of the containing literal.
For example, in "\97f" == "af"
, the numerical codepoint in decimal base is terminated with a non-decimal grapheme f
.
Numerical codepoint values do not allow neither literal prefixes nor literal suffixes, for example {1ki8}
would trigger panic.
Numerical codepoint values allow non-trailing underscores.
numerical_codepoint =
("\o", {octal, "_"}-, "_"-) |
("\o", "{", {octal, "_"}-, "_"-, "}") |
("\", {digit, "_"}-, "_"-) |
("\", "{", {digit, "_"}-, "_"-, "}") |
("\x", {hex, "_"}-, "_"-) |
("\x", "{", {hex, "_"}-, "_"-, "}");
Character literals
A character literal is a text literal comprised of a single abstract character comprised of a single codepoint wrapped in single quotes ('
), for example 'a' == '\97'
.
A character literal is considered a scalar literal.
An unconstrained character literal has type Char<UCS>
.
UCS
is the language-defined default charset for a character literal.
A charset literal suffix is defined by applying the @[LiteralSuffix]
annotation to a static functor constrained to a single @(Char)
argument.
A charset suffix may be applied to a character literal, which alters the literal’s charset, for example 'a’ucs
.
Depending on the functor implementation, if the target charset does not contain the literal’s abstract character, a compiler should panic.
'a' == 'a'ucs : Char<UCS> # Default
@[LiteralSuffix(/(us)?ascii/, /iso646us/)]
namespace ISO646::US <~ Charset<7>
def (~@(Char)) ~@(Char<self>)
# Basically, return the codepoint if it's less than 128.
# Otherwise, panic!
end
end
'a'ascii : Char<ISO646::US> # Another charset applied
# 'ф'ascii # Panic! `ISO646::US` does not contain this abstract character
A character literal may have an integral literal suffix, which would constrain it to an integral literal of kind defined by the suffix equal to the codepoint’s numerical value.
For example, 'a’u8 == 97u8
, '豈’i16 == (0xF900)i16 == -1792i16
.
character_literal = "'", unicode | {numerical_codepoint}, "'";
String literals
A string literal is a text literal comprised of an arbitrary amount (including zero) of abstract characters wrapped in double quotes ("
) or slashes (/
), for example "f"
, "å\0"
, /bar/
.
A string literal is considered a scalar literal.
The NUL character is NOT appended implicitly to a string literal. Instead, it must be done manually. |
A string literal has type String<E, Z>
, where Z
is the exact amount of codeunits in the literal, for example "få\0" : String<UTF8, 4>
, "foo"ucs2 : String<UCS2, 6>
.
UTF8
is the language-defined default encoding for a string literal.
An encoding literal suffix is defined by applying the @[LiteralSuffix]
annotation to a static functor constrained to a single @(String)
argument.
An encoding suffix may be applied to a string literal, which alters the literal’s encoding, for example 'a’utf8
.
Depending on the functor implementation, if the charset defined by the target encoding does not contain at least one abstract character from contained in the literal, a compiler should panic.
"a" == "a"utf8 : String<UTF8, 1> # Default values
@[LiteralSuffix(/(us)?ascii/, /iso646us/)]
namespace ISO646::US <~ Encoding<self, 7>
def (~@(String)) ~@(String<self>)
# Basically, build a string containing the same
# codepoints while their values are less than 128.
# Otherwise, panic!
end
end
"a"ascii : String<ISO646::US, 1> # Another encoding applied
# "ф"ascii # Panic! `ISO646::US` does not contain this abstract character
A string literal may have an integral suffix, which would constrain it to an array of integral literals of kind defined by the suffix equal to the numerical values of codeunits' making up the literal’s abstract characters.
If the type defined by the literal can not contain a codeunit value in the literal’s encoding, a compiler panics.
For example, "aф"u8 == [0x61u8, 0xd1u8, 0x84u8] : UInt8[3]
.
string_literal = "\"", {unicode | numerical_codepoint}, "\""
string_literal = "/", {unicode}, "/";
Regex literals
A string literal wrapped in //
is called a regex literal.
Neither of the escape sequences work in a regex literal, but \/
.
Therefore, numerical codepoints are not usable in regex literals.
Heredocs
A heredoc is a string literal spanning through multiple lines. It begins and terminates with the same sequence of ASCII characters.
A heredoc literal value (the text) must be wrapped in newlines.
A heredoc opening sequence may be wrapped in parentheses and optionally applied literal suffixes to. Suffixes affect a heredoc literal in the same way as they would affect a simple string literal.
The first token of a heredoc terminating sequence must be the first non-control character in a source line.
Once the terminating sequence is matched, a compiler treates the heredoc literal as complete; it allows, for example, to call a method immediately after the terminating sequence, e.g. SQL.upcase
.
A heredoc is aligned at the least indentation.
Escaping a newline (\⏎
) may affect the least indentation.
The first line of a heredoc does not have a newline character prepended.
The last line of a heredoc does not have a newline character appended.
A good formatting practice is to always make the least indentation to align with the line containing the beginning heredoc sequence. The terminating sequence should be placed on a new line. Sequences should be in uppercase. |
assert(<<-SQL
SELECT foo
FROM bar
SQL == "SELECT foo\n FROM bar")
(
# This is ill-formatted: the line containing `SELECT`
# should be indented by four spaces instead of two.
# Note that the indentation of the
# terminating sequence does not matter.
assert(<<-(sql)ascii
SELECT foo
FROM bar
sql == " SELECT foo\nFROM bar")
)
assert(<<-SQL
SELECT
FROM bar\
WHERE SQL
SQL == "\nSELECT\nFROM bar WHERE SQL\n"
(* NOTE: Opening and closing sequences must be equal! *)
heredoc =
"<<-",
(
"(", seq(en | underscore), ")",
{literal_suffix}
) | seq(en | underscore),
nl,
seq(unicode),
nl,
seq(en | underscore);
17.4.3. Symbol literals
For a type restriction R
, a symbol literal may unambiguously match a type T
satisfying the restriction T ~ R
, whereas the symbol value matches the last element of full path identifier of T
in accordance to the Symbol literal matching algorithm.
The algorithm is defined as follows:
-
The exact value of a symbol is compared directly with the last part of every type path matching the expression, until equal. If matched, success.
-
If the symbol value is not explicit:
-
The symbol value is lower-cased, any non-letter and non-digit abstract character is removed; this is called a normalized value.
-
Last parts of every type path to be compared with which is not an explicit identifier, is also normalized.
-
The normalized symbol value is compared with each normalized last part, until equal. If matched, success.
-
-
No match, thus failure.
If a symbol literal expands to a compile-time constant value, it may be then suffix-casted to another type, e.g. (:max)u8 == 255u8
.
Symbols can only be used as literals and expand during compilation.
There is no built-in runtime Symbol
type.
The following example demonstrates suffix-casting of a symbol literal.
assert((:max)u8 == UInt8::Max)
assert((:Min)i == SInt32::Min)
The following example demonstrates enum values lookup.
enum Foo
val Bar
alias `Бар to Bar
val Baz
end
def is_baz?(foo : Foo)
# `Foo#==` expects `Foo`,
# thus `:bar` properly
# expands to `Foo::Bar`
foo == :bar
end
# Using an explicit symbol value here
assert(not is_baz?(:`Бар))
The following example demonstrates symbol expansion in case branches.
struct A::B::C;
final var = Std@rand(42, C())
case var
when :int # Properly expands to `Int`, which is legal
Std.out << "Is int"
when :c # Expands to `A::B::C`
Std.out << "Is A::B::C"
end
The following example demonstrates symbol ambiguity.
enum Foo
val Bar
val Baz
end
enum Qux
val Bar
val Quux
end
def is_bar?(e)
e == :bar
end
# # It would go through every type matching
# # the restriction (which is none), and panic
# # upon `Foo::Bar` vs. `Qux::Bar` ambiguity.
# assert(is_bar?(:bar)) # Panic!
# This works, because the specialization
# would have `e : Foo` restriction.
assert(is_bar?(Foo::Bar))
# # This would panic upon ambiguity between
# # `e == Foo::Bar` vs. `e == Qux::Bar`.
# assert(is_bar?(Std@rand(Foo::Bar, Qux::Bar))) # Panic!
17.5. Aggregate literals
TODO: Restrictions with aggregate types? ~ \(Array)
, ~ \(Array<Size>)
.
The generic argument determining the element type of the type an aggregate literal is constrained to is inferred to a variant of the literal’s values' types. If an aggregate literal consists of likely-fractional and fractional literals, then the element type is inferred to be a fractional type.
For a tuple literal, the type is inferred for each element separately.
[1, 1] : SInt32[2]
[1i16, 1] : SInt16[2]
[1u, 1] : UInt32[2]
[1u, 1i] : ?<UInt32, SInt32>[2]
[1, 0.5] : FBin64[2]
[1, 0.5d32] : FDec32[2]
[1u, 0.5] : ?<UInt32, FBin64>[2]
[1u, 1f] : ?<UInt8, FBin64>[2]
[1, 1p32] : Posit32[2]
[1p, 1p32] : Posit32[2]
# [1, 1p32, 1u8] # Panic! Can not unambiguosly infer type
# of `1` -- must constrain explicitly
[1u, 1p32, 1u8] : ?<Posit32, UInt8>[2] # OK
17.5.1. Array literals
An array literal is constrained to the Array
type.
17.5.2. Tensor literals
# r, l0 -- row
# c, l1 -- column
# It could work, because `(\SInt32)x2` is
# not allowed (must be a runtime type).
(SInt32)x4 : Vector<SInt32, 4> # (\i)x4
(FBin64)x4x4c : Matrix<FBin64, 4, 4, 0> # (\f)x4x4c
($double)x2x3x4l2 : Tensor<$double, 2, 3, 4, 2> # (\$d)x2x2x4l2
(XDec<2, 2>)x2 # (\Df2)x2
(XBin8)x3 # (\Q8)x3
17.6. Magic literals
# Quoted string
#
%q(), %qucs2[], %qutf16le<>
%utf32be[] # Works if `utf32be` is unambiguously `Charcoding`
%qutf16-EOF
Hello
EOF
# Range
#
%r[1 2], %ri(min 0), %fr[2 5), %ru8(1 max]
# Containers
#
%i[1 2] # Array
%i<1 2> # Vector
%i(1 2) # Tuple, but why?
%i(foo: 1, bar: 2) # That's why
%i|[1 2]|r # Tensor (row-major); `|m0` -- ditto
%c[hi] # Array of chars
%casciiu8<hi> # Vector of chars
%ucs[hello] # Works if `ucs` is unambiguous `Charset`
%w[hello world], %wutf16le(a b) # OK
Magic literals is a unified syntax for:
-
Array, tuple and vector container literals, allowing:
-
To apply literal suffixes on the container itself instead of on each of its elements;
-
To avoid repeated typing of separating commas.
-
A magic literal requires at least one literal suffix to be specified.
A magic literal which is a character container literal or a quoted string literal is a magic text literal.
Elements within a magic literal are separated by spaces, unless it is a magic text literal. In a magic text literal, every character is meaningful, including spaces.
Outside of a magic text literal, a sequence of non-digit characters is considered a symbol literal (assuming that :
is omitted).
magic_literal_values =
[
literal_value,
{" ", literal_value}
];
magic_tuple_literal =
"(", magic_literal_values, ")";
magic_array_literal =
"[", magic_literal_values, "]";
magic_vector_literal =
"<", magic_literal_values, ">";
magic_tensor_literal =
"|", magic_literal_values, "|",
[tensor_literal_dimension_appendix];
(* If literal suffixes are missing, the container
type is inferred from its values. *)
magic_literal =
"%",
{literal_suffix}-,
(
magic_tuple_literal |
magic_array_literal |
magic_vector_literal |
magic_tensor_literal
);
17.6.1. Word literals
A magic literal allows special w
and *w
major suffixes.
Either turns the literal into a container of space-separated strings, i.e. words.
The *w
variant expands to a container of hard string copies instead of pointers to strings.
assert(%wutf8(foo barbaz) ==
("foo", "barbaz") :
Tuple<
String<UTF8, 3>*s,
String<UTF8, 6>*s
>)
assert(%*w[foo barbaz] ==
[*"foo", *"barbaz"] :
Variant<
String<UTF8, 3>,
String<UTF8, 6>
>[2])
17.6.2. Quoted string literals
A qoted string literal begins with the q
suffix.
In a quoted string literal brackets act as the string quotes.
assert(%qutf8["foo"] == %q("foo") == "\"foo\"")
18. Interoperability
Onyx provides native means of interoperability with the C programming language.
TODO: An alternative would be to process C macro output, and if it’s a constant, update it accordingly.
So it’s like foo($SOME_MACRO)
⇒ foo($('a'))
.
Panic if got a non-constant C expression.
In this section, C terms may use smaller italic script; C keywords and types may use smaller monoscript
.
For example, Onyx struct is different from C struct, and Onyx const
is different from C const
.
18.1. C standard
Onyx supports Section 18.2, “Importing” and Section 18.3, “Exporting” a subset of C entities in accordance to C18; herein the C standard.
An Onyx compiler MUST be able to compile imported and exported C code.
C features other than included in the C standard (e.g. a specific C compiler extension) are NOT required by the Onyx standard.
18.2. Importing
A C file may be imported using an import
statement.
(*
The `import` statement shares syntax and semantics
with the `require statement`.
The following statements are equivalent:
```
# NOTE: `.h` is implicitly appended only if
# a file missing its extension is not found
import "foo", "bar" from "baz"
import "foo.h", "bar.h" from "baz"
import "baz/foo.h", "baz/bar.h"
import "baz/foo.h"
import "baz/bar.h"
```
As with `require`, the order of importing matters.
*)
import = "import", path, {",", path}, ["from", path];
The import
statement shares syntax and semantics with the require
statement with the following differences:
-
Instead of compilation-defined require paths it relies on import paths, possibly specified by passing an
-I
flag. -
The default imported file extension is
.h
if missing.
Akin to require
, re-importing the same file at the same path multiple times is allowed, leading to an actual importing happening exactly once at the moment when the first import
statement with the path is met.
Files with the same contents, but at different paths, are treated as distinct imports.
All variable declarations and definitions, function declarations and definitions, enum definitions, union declarations and definitions, struct declarations and definitions, typedef declarations with external linkage and preprocessor text macros are imported from a C header into the Onyx scope.
A C entity declared outside of an imported C header, but availaible from within it (e.g. via a nested #include
directive), is also imported.
An imported entity exists in the top-level Onyx namespace.
An imported entity can be referenced from the Onyx scope by prepending $
to its identifier, e.g. $int
.
Akin to Onyx identifiers, wrapping backticks may be used to reference multi-word entities, e.g. $`unsigned long`
.
The Macro API has means to access an imported entity information.
18.2.1. Type mapping
All C fundamental types are interchangeable with Onyx types.
Some C fundamental types may be target-dependent. For example, int
could map at least to SInt16
or SInt32
depending on the target data layout model.
The following table defines mapping rules between C fundamental types and Onyx types.
C type | Onyx type |
---|---|
|
|
|
TODO: Same as either |
|
TODO: |
|
|
|
|
|
At least 16 bits, usually |
|
At least 16 bits, , usually |
|
At least 16 bits, usually |
|
At least 16 bits, usually |
|
At least 32 bits, usually |
|
At least 32 bits, usually |
|
At least 64 bits, usually |
|
At least 64 bits, usually |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
18.2.2. Importing variables
All imported variables are available in the Onyx scope as static variables mapped in accordance to the Section 18.2.1, “Type mapping” scheme.
An imported variable with const type qualifier is a final and immutable (if applicable) variable in Onyx.
An imported variable with volatile type qualifier is a volatile
variable in Onyx.
The restrict type qualifier is ignored by Onyx.
int i;
volatile const float j = 42;
struct point_t {
int x, y;
};
struct point_t point;
const struct point_t cpoint = { 10, 20 };
char sa[] = "foo"; // The type is completed as `char[4]`
char* sb = "bar";
const char sc[] = "qux"; // The type is completed as `char[4]`
const char* sd = "kax";
import "./variables.h"
@[Entry]
export void main () {
assert($i is? $int)
assert(&$i is? $int*srw)
assert($i == 0)
assert($j is? $float)
assert(&$j is? $float*sr)
assert($j == 42)
assert({{ nx.c.j.is_volatile }})
assert($point is? <mut $point_t>)
assert(&$point is? <mut $point_t>*srw)
assert($point.x += 1 == 1)
assert($cpoint is? <const $point_t>)
assert(&$cpoint is? $point_t*sr)
assert($cpoint.y == 20)
assert($sa is? <mut $char[4]>)
assert(&$sa is? <mut $char[4]>*srw)
assert($sa == %c[foo], $sa[2] == 'o')
assert($sb is? $char*srw)
# assert($sb == "bar") # There is no such a guarantee
assert((unsafe! $sb as String*sr)->eq?("bar"))
assert($sc is? <const $char[4]>)
assert(&$sc is? $char[4]*sr)
assert($sd is? $char*sr)
}
18.2.3. Importing functions
A function definition is compiled upon importing.
Calling an imported function is always unsafe.
An imported function argument can not be referenced by its name upon call.
// A function declaration.
int foo();
// A function definition.
int bar(int a, int b) {
return a + b;
}
import "./functions.h"
@[Entry]
export void main () {
# It's a linker's responsiblity to
# ensure the symbol is actually defined
unsafe! foo()
# Calling an actual definition
assert((unsafe! $bar(1, 2)) == 3)
}
18.2.4. Importing enums
Importing a enum imports its values as macros.
Imported enums can not be reopened.
enum color_t { RED, GREEN = 2, BLUE };
import "./enum.h"
@[Entry]
export void main () {
final color = $GREEN # Would evaluate to literal `2`, hence `$int`
}
18.2.5. Importing unions
An imported union may be initialized using the C struct initializer.
Accessing an imported union’s member is always unsafe.
Imported unions can not be reopened.
union union_t {
int a;
double b;
};
import "./union.h"
@[Entry]
export void main () {
final union = $union_t{ .a = 42 }
assert(unsafe! union.a == 42) catch return 1
}
18.2.6. Importing structs
An imported struct may be initialized using the C struct initializer.
Accessing a non-atomic member of a non-atomic imported struct is fragile. Accessing an atomic member of a non-atomic imported struct is threadsafe. Accessing a member of an atomic struct is unsafe. Accessing a member of a volatile struct is volatile.
Mutability modifers are applicable to imported structs.
By default, an imported struct type is implicitly const
.
Imported structs can not be reopened.
struct struct_t {
int a;
double b;
};
import "./struct.h"
@[Entry]
export void main () {
final strukt = mut $struct_t{ .a = 42, .b = 0.5 }
assert((strukt.a += 1) == 43) catch return 1
}
18.2.7. Importing typedefs
Referencing an imported typedef is the same as referencing the type it aliases.
typedef struct { double hi, lo; } range;
import "./typedef.h"
@[Entry]
export void main () {
final range = $range {
.hi = 0, .lo = 1 }
}
18.2.8. Importing preprocessor macros
An imported preprocessor text macro (hereby macro) may be referenced from the Onyx scope as a regular C entity by prepending the $
symbol to its identifier.
An imported macro reference allows arguments to be passed to it. It is a error to use parentheses on an object-like macro.
Once referenced, a macro is immediately evaluated in accordance to the Section 18.1, “C standard”. The evaluation result is then embedded directly into the source code.
Therefore, the concept of safety is not applicable to a macro reference itself. Instead, code generated by its evaluation is a subject to safety judgement.
Imported macros are also available from the Macro API.
#define FOO "foo"
#define DOUBLE(arg) #arg * 2
import "./macros"
@[Entry]
export void main () {
assert($FOO == "foo")
assert($DOUBLE(2) == 4)
}
18.3. Exporting
A C declaration, definition, preprocessor directive or a block of C code may be exported using an export
statement.
An exported entity is available in the assembly code iff it has external linkage.
After a entity is exported, it is then treated in the same way as if it was imported from a header.
Exported functions are compiled as soon as they’re met by an Onyx compiler.
Exported entities have their identifiers unmangled.
Exporting from within an Onyx namespace does not alter a entity’s identifier in any way.
An export
statement contents may be preprocessed with Onyx macros.
Onyx annotations are applicable to export
statements.
A non-block version of the export
statement is terminated in accordance to the rules defined by the Section 18.1, “C standard”.
For instance, a struct definition must be terminated with a semicolon, but a function definition does not have to.
A preprocessor directive is terminated with a newline unless it (the newline) is preceded by a backslash.
An exported block of code is enclosed in curly brackets and does not require a semicolon.
export
statement syntaxc = (? Raw C code ?)
c_var_decl = (? A C variable declaration ?);
c_var_def = (? A C variable definition ?);
c_struct_decl = (? A C struct declaration ?);
c_struct_def = (? A C struct definition ?);
c_union_decl = (? A C union declaration ?);
c_union_def = (? A C union definition ?);
c_enum = (? A C enumeration definition ?);
c_typedef = (? A C typedef ?);
c_directive = (? A C preprocessor directive ?);
export = "export",
(* An export statement may accept a raw block of C code... *)
("{", c, "}"),
(* or a single C declaration or definition... *)
c_var_decl |
c_var_def |
c_union_def |
c_struct_def |
c_union_decl |
c_struct_decl |
c_enum |
c_typedef |
c_directive |
(* or a function definition with C prototype, but Onyx body *)
nxc_function_def;
An Onyx compiler SHOULD provide a way to generate a C header file from source Onyx files, which MUST contain:
-
Declarations of exported function definitions.
-
Raw contents of any other
export
statement. -
Onyx comments related to
export
statements, as inline C comments.
18.3.1. Exporting functions
- Exported C function
-
A function definition contained in a block of C code, i.e. in the block version of the
export
statement.
An exported C function is written in C and compiled by an Onyx compiler.
- Exported Onyx function
-
A function definition immediately following the
export
keyword.
An exported Onyx function begins with a prototype written in C. The prototype ends with an opening curly bracket.
An exported Onyx function body is written in Onyx.
An exported Onyx function is terminated by a closing curly bracket matching the one the prototype ended with.
An exported Onyx function body has fragile environment.
Calling an exported Onyx function is always unsafe for consistency reasons. |
If an exported Onyx function is within an Onyx namespace, Onyx lookup rules are still applicable within its body.
An exported Onyx function arguments are accessible within its body in accordance to the following table.
C argument declaration | Onyx argument declaration |
---|---|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Other |
|
|
Accessible via |
|
|
|
|
c_proto = (? A C function prototype, e.g. `void main(void)` ?);
nxc_function_def = c_proto, "{", {expr}, "}";
# This is an exported Onyx function.
@[Entry] # An Onyx annotation!
export void main() {
# Onyx body
}
# Has a non-external linkage
export static _Alignas(8) const int i = 42;
# This comment is not going to be exported
# because it is not related to the statement
export struct struct_t {
int a, b;
};
export #ifndef FOO
export #define FOO 42, \
43
export #endif
export {
// All C comments would be exported
// # Onyx comments aren't allowed here
_Atomic _Alignas(double) int j;
int k;
#define OUTPUT(arg) puts ( #arg );
// This is an exported C function.
static void foo () {
// C code
}
}
A possible variant of a generated C header:
// This is an exported Onyx function.
void main();
static _Alignas(8) const int i = 42;
struct struct_t {
int a, b;
};
#ifndef FOO
#define FOO 42, \
43
#endif
// All C comments would be exported
// # Onyx comments aren't allowed here
_Atomic _Alignas(double) int j;
int k;
#define OUTPUT(arg) puts ( #arg );
// This is an exported C function.
static void foo () {
// C code
}
Appendix A: Compiled Syntax
The following is the entire compiled Onyx language syntax.
binary = "0" | "1";
octal = binary | "2" | "3" | "4" | "5" | "6" | "7" | "8";
digit = octal | "8" | "9";
(* Design rationale: hexadecimals are in upper-case to clearly
distingusigh them from exponents and prefixes (`Q` and `D`
are exceptions). Multiplier prefixes can not be used in
hexadecimal literals, so they don't mix. *)
hex = digit | "A" | "B" | "C" | "D" | "E" | "F";
(* Any Unicode abstract character, including
those consisting of multiple codepoints. *)
unicode = (? A Unicode character ?);
fragile_statement =
"fragile!",
expr | block();
unsafe_statement =
"unsafe!",
expr | block();
safety_modifier =
"threadsafe" |
"fragile" |
"unsafe" |
"undefsafe";
(* TODO: Move. *)
compl_mod =
(* !keyword(mod) *) "compl" |
(* !keyword(mod) *) "incompl";
reopen_statement =
[compl_mod],
(* !keyword(statement) *)
"reopen",
type_ref,
";" | block();
(* A unit type declaration. *)
unit =
type_visibility_mod,
(* !keyword(Unit type declaration) *)
"unit",
id,
[generic_args_decl],
";" | block();
trait_decl =
trait_decl_mod,
["decl"],
"trait",
id,
[nb, generic_arguments_decl],
";" | body();
trait_decl_mod = [type_visibility_mod];
derive =
["undefstor" | "static" | "instance"],
"derive",
id,
";" | block();
require_directive =
"require",
@unord(
(* Path(s) of the required file *)
(["from"], path, {",", path}),
(* The base path *)
["at", path]
);
(*
Import a file written possibly
in a language other than Onyx.
```
import from "stdio.h" in "C"
import in "Rust" from "main.rs", "aux.rs" at "/mypath/"
```
*)
import_directive =
"import",
@unord(
(* The source language of an imported file *)
("in", string),
(* Paths to the imported files *)
("from", string, {",", string}),
(* The base path for the imported files *)
["at", string]
);
using =
using_namespace |
using_refinement |
using_alias;
using_namespace =
"using",
["namespace"],
type_ref;
using_refinement =
"using",
["refinement"],
type_ref;
using_alias =
"using",
["alias"],
decl,
("=", "to"),
ref;
multiplier_prefix_iec =
"Yi" | "Zi" | "Ei" | "Pi" |
"Ti" | "Gi" | "Mi" | "Ki";
multiplier_prefix_si =
"Y" | "Z" | "E" | "P" | "T" | "G" | "M" | "k" | "h" | "da" |
"d" | "c" | "m" | "u" | "μ" | "n" | "p" | "f" | "a" | "z" | "y";
sint_suffix = ["s"], "i", {digit}; (* 32 bits by default *)
uint_suffix = "u", ["i"], {digit}; (* 32 bits by default *)
size_suffix = ["s" | "u"], "z"; (* Unsigned by default *)
fbin_suffix = "f", ["b"], {digit}; (* 64 bits by default *)
fdec_suffix = ["f"], "d", {digit}; (* 64 bits by default *)
(*
Design rationale: despite of the fact that `p` can only be in
lowercase in Onyx, it'd be confusing to see `42P8` and treat it
other than a "42 * 2 ** 8". Had to pick another symbol therefore.
*)
posit_suffix = "t", {digit};
posit_suffix = "p", {digit};
(*
Design rationale: would not use `B`, because
brain float is "smaller" than usual float.
*)
bfloat_suffix = "bf", {digit};
literal_suffix =
sint_suffix |
uint_suffix |
size_suffix |
fbin_suffix |
fdec_suffix |
xbin_suffix |
xdec_suffix |
posit_suffix |
bfloat_suffix;
sign = "-" | "+";
non_decimal_value ($prefix, $base) =
$prefix, {$base | "_"}, $base,
[".", $base, {$base | "_"}],
["p", [sign], {digit}-]
numeric_literal =
[sign],
(
non_decimal_value("0b", binary) |
non_decimal_value("0o", octal) |
non_decimal_value("0x", hex) |
(
digit, {
(*
Multiplier prefixes allow to
have multiple radix points.
*)
["."], digit,
(* The prefix families must not be mixed. *)
{digit | "_" | multiplier_prefix_iec | multiplier_prefix_si}
},
["e", [sign], {digit}-]
)
),
{"_"},
[literal_suffix];
(* For consistency, it is still
referenced as "`XBin` suffix". *)
xbin_suffix =
(* An optional signedness. *)
["s" | "u"],
(* The binary fixed-point (a.k.a.
Q number) literal symbol. *)
"Q",
(* An optional bitsize. *)
{digit},
(* An optional Fractional part size in bits. Can also
be read as an amount of bits to left-shiFt by. *)
["f", [sign], {digit}-];
xdec_suffix =
(* An optional signedness. *)
["s" | "u"],
(* The decimal fixed-point literal symbol. *)
"D",
(* An optional total amount of digits. *)
{digit},
(* An optional amount of digits which are fractional. *)
["f", {digit}-];
numerical_codepoint =
("\o", {octal, "_"}-, "_"-) |
("\o", "{", {octal, "_"}-, "_"-, "}") |
("\", {digit, "_"}-, "_"-) |
("\", "{", {digit, "_"}-, "_"-, "}") |
("\x", {hex, "_"}-, "_"-) |
("\x", "{", {hex, "_"}-, "_"-, "}");
character_literal = "'", unicode | {numerical_codepoint}, "'";
string_literal = "\"", {unicode | numerical_codepoint}, "\""
string_literal = "/", {unicode}, "/";
(* NOTE: Opening and closing sequences must be equal! *)
heredoc =
"<<-",
(
"(", seq(en | underscore), ")",
{literal_suffix}
) | seq(en | underscore),
nl,
seq(unicode),
nl,
seq(en | underscore);
magic_literal_values =
[
literal_value,
{" ", literal_value}
];
magic_tuple_literal =
"(", magic_literal_values, ")";
magic_array_literal =
"[", magic_literal_values, "]";
magic_vector_literal =
"<", magic_literal_values, ">";
magic_tensor_literal =
"|", magic_literal_values, "|",
[tensor_literal_dimension_appendix];
(* If literal suffixes are missing, the container
type is inferred from its values. *)
magic_literal =
"%",
{literal_suffix}-,
(
magic_tuple_literal |
magic_array_literal |
magic_vector_literal |
magic_tensor_literal
);
Appendix B: Optimization
Informative.
This section acts as a single source of expectations on when it is possible to rely on a compiler for certain optimizations rather than write optimized code manually. This allows a user to write simpler code but be sure on that this code would be optimized.
Appendix C: NXAPI
An Onyx compiler is able to generate documentation for exported functions only in form of C headers. C standard for header files is applied in such cases.
For Onyx language APIs, the current standard is applied, named NXAPI.
The NXAPI standard defines a schema to document entities declared in an Onyx program. The standard also provides standards for common interchange formats, including binary, JSON, DSON, YAML, XML and TOML.
Herein "API" stays for Onyx language API if not mentioned another.
API documentation is intended to provide the exact information as in the source file. Due to the inferring freedom of the Onyx language, it is often impossible to predetermine the conditions an API is going to be used in.
For example, a Foo
type mentioned from within an API function arguments list could possibly mean another type if the namespace containing the functions gets an type also named Foo
in a program-consumer of the API.
However, full path like ::Foo
is unambiguous and is considered a good practice.
Similarly, while it is possible to infer a function return type in the scope of the API, it (the return type) may turn out to be different when the API is used in a consumer program. Therefore, library API authors are encouraged to provide concrete, unambiguous return types to their functions whenever possible.
In fact, Onyx compilers are expected to have some kind of a strict compilation mode, which would require concrete typing everywhere. {paper-noindex}
NXAPI schema includes documentation for namespaces, types, functions, annotations and macros.
C.1. Schema
NXAPI schema definition describes a set of available NXAPI nodes. {paper-excerpt=25,-1}
<div class="toc" paper-noindex>
{{ paper-toc }}
</div>
C.2. Structure
Each node in the schema definition has a kind
value.
This value is used to determine the kind of a node in an NXAPI document.
In the schema definition, a node also has a set of fields
, which defines the set of properties for the node.
The schema definition uses the DSON format for simplicity.
The implicit root object of a NXAPI document is the anonymous top-level namespace, therefore an NXAPI document is an array of the top-level namespace declarations.
Consider the following example: {paper-noindex}
# That is an actual NXAPI document
[
{
kind = namespace, # Kind of this node is a namespace
path = "::Foo::Bar", # A `path` field with some value
declarations = [] # A `declaration` field with some values
}
]
C.3. Nodes
List of NXAPI nodes. {paper-excerpt=0,-1 paper-noindex}
C.3.1. Special nodes
There are some special nodes which can not be reasonably defined in the schema. {paper-excerpt=16,65 paper-noindex}
A decl
node is one of:
-
[
namespace_decl
](#nxapi-namespace_decl){paper-link}; or -
[
struct_decl
](#nxapi-struct_decl){paper-link}; or -
[
var_decl
](#nxapi-var_decl){paper-link}; or -
[
function_decl
](#nxapi-function_decl){paper-link}; or -
[
annotation_decl
](#nxapi-annotation_decl){paper-link}; or -
[
macro_decl
](#nxapi-macro_decl){paper-link}.
An expr
node…​
C.3.2. delayed_macro_expr
A delayed macro expression node could be used instead of any other node. {paper-excerpt=31}
{
kind = delayed_macro_expr,
fields = [{
name = expr,
type = string,
required = true
}]
}
C.3.3. namespace_decl
A namespace declaration node contains its name and declarations. {paper-excerpt=28}
A namespace declaration can not be virtual.
Namespace and type declarations must have fully resolved paths, e.g. ::Foo::Bar
. {#nxapi-namespace-full-path}
{
kind = namespace_decl,
fields = [{
name = doc,
type = string,
required = false
}, {
name = path,
type = string,
required = true
},{
name = declarations,
type = array<decl>,
required = true # May be empty
}]
}
C.3.4. var_decl
A variable declaration node may be virtual. {paper-excerpt=28}
{
kind = var_decl,
fields = [{
name = doc,
type = string,
required = false
}, {
name = virtual,
type = bool,
required = true
}, {
name = accessibility,
type = enum<let, get, set, final>,
required = true
}, {
name = storage,
type = enum<static, instance>,
required = undefined # Only applicable to struct variables
}, {
name = visibility,
type = enum<public, protected, private>,
required = false # Undefined if missing
}, {
name = name,
type = string,
required = true
}, {
name = restriction,
type = type_ref,
required = false
}, {
name = value,
type = expr,
required = false
}, {
name = annotations,
type = array<annotation_call>,
required = true # May be empty
}]
}
C.3.5. function_decl
A function declaration node may be virtual. {paper-excerpt=27}
{
kind = function_decl,
fields = [{
name = doc,
type = string,
required = false,
}, {
name = virtual,
type = bool,
required = true,
}, {
name = implementation,
type = enum<decl, impl, def, redef>,
required = true,
}, {
name = storage,
type = enum<static, instance>,
required = undefined # Only applicable to trait,
# struct and enum functions
}, {
name = mutability,
type = enum<mut, const>,
required = false # Undefined if missing
}, {
name = visibility,
type = enum<public, protected, private>,
required = false # Undefined if missing
}, {
name = name,
type = string,
required = true
}, {
name = arguments,
type = array<arg_decl>,
required = true # May be empty
}, {
name = return,
type = type_ref,
required = false # Undefined if missing
}, {
name = annotations,
type = array<annotation_call>,
required = true # May be empty
}, {
name = foralls,
# A function declaration may include multiple foralls, e.g.
# `def foo(arg : T) forall T : List<U> forall U : Numeric`
type = array<array<forall_element>>,
required = true # May be empty
}, {
# An optional `where` macro expression,
# e.g. `def foo() where \{{ expr }}`
#
name = where,
type = delayed_macro_expr,
required = false
}]
}
C.3.6. forall_element
A forall
element declaration, e.g. forall T : U
. {paper-noindex}
{
kind = forall_element,
fields = [{
name = name,
type = string,
required = true
}, {
name = restriction,
type = type_ref,
required = false
}]
}
C.3.7. arg_decl
An argument declaration node. {paper-excerpt=0,-1 paper-noindex}
The semantics is shared for both function and generic type argument declarations, hence a single node. {#arg-shared-semantics}
TODO: Move these to main spec, and just reference from here.
Generic type arguments are always type-restricted, unlike function arguments which may be either instance- or type-restricted:
struct Int<Bitsize: S :: %unum>
# `another` is instance-restricted.
def add(another : T) forall T :: Int;
# `target` is type-restricted.
def to(target: :: T) forall T :: Int;
end
{
kind = arg_decl,
fields = [{
name = doc,
type = string,
required = false
}, {
name = annotations,
type = array<annotation_call>,
required = true # May be empty
}, {
name = name,
type = string,
required = undefined # Must be absent if it is a
# `type_restricted` function argument
}, {
name = alias,
type = string,
required = false
}, {
# Determine whether is it a type-restricted argument,
# e.g. `foo :: T`. A type-restricted argument must not have its
# `restriction` field left empty and if it is a function argument,
# it also can not have `name` field set.
name = type_restricted,
type = bool,
required = true,
}, {
name = restriction,
type = type_ref,
required = undefined # Required only if it is
# a type-restricted argument
}, {
name = default_value,
type = expr,
required = false
}]
}
C.3.8. arg_pass
An argument node passed to a callee. {paper-excerpt=0,-1 paper-noindex}
Due to [the shared argument semantics](#arg-shared-semantics){paper-link} argument passing to a callee is defined as a single node.
{
kind = arg_pass,
fields = [{
# An explicit name of an argument, e.g. `call(foo: bar)`
name = explicit_name,
type = string,
required = undefined # Incompatible with `explicit_order`
}, {
# An explicit order of an argument, e.g. `call([0]: bar)`
name = explicit_order,
type = number,
required = undefined # Incompatible with `explicit_name`
}, {
name = value,
type = expr,
required = true
}]
}
C.3.9. type_expr
A valid type expression node. {paper-excerpt paper-noindex}
{
kind = type_expr,
fields = [{
name = expr,
type = array<variant<type_expr_op, type_ref>>,
required = true
}]
}
C.3.10. type_expr_op
A type expression operator node. {paper-excerpt paper-noindex}
{
kind = type_expr_op,
fields = [{
name = op,
# One of `(`, `)`, `&`, `|`, `^` or `!`
type = enum<paren_open, paren_close, and, or, xor, not>,
required = true
}]
}
C.3.11. type_ref
A reference to a type node. {paper-excerpt paper-noindex}
A type reference node path
field may be unresolved.
{
kind = type_ref,
fields = [{
name = path,
type = string,
required = true
}, {
name = generic_arguments,
type = array<arg_pass>,
required = true # May be empty
}]
}
C.3.13. struct_decl
A struct declaration node name
field [must be a fully resolved path](#nxapi-namespace-full-path){paper-link}. {paper-excerpt="0,26,."}
{
kind = struct_decl,
fields = [{
name = doc,
type = string,
required = false
}, {
name = virtual,
type = bool,
required = true
}, {
name = path,
type = string,
required = true
}, {
name = mutability,
type = enum<mut, const>,
required = false # Undefined if missing
}, {
name = visibility,
type = enum<public, private>,
required = false # Undefined if missing
}, {
name = annotations,
type = array<annotation_call>,
required = true # May be empty
}, {
name = declarations,
type = array<decl>,
required = true # May be empty
}]
}
C.3.14. enum_decl
A enum declaration node. {paper-excerpt paper-noindex}
{
kind = enum_decl,
fields = [{
name = statement,
type = enum<virtual, reopen, declare>,
required = true
}, {
name = is_flag,
type = bool,
required = true
}, {
name = doc,
type = string,
required = false
}, {
name = type,
type = type_ref,
required = undefined # Required if `is_flag`,
# otherwise `SInt32` if missing
}, {
name = values,
type = array<enum_val_decl>,
required = true # May be empty for virtual enums
}, {
name = declarations,
type = array<decl>,
required = true # May be empty
}]
}
C.3.15. enum_val_decl
A enum value declaration node. {paper-excerpt paper-noindex}
{
kind = enum_val_decl,
fields = [{
name = doc,
type = string,
required = false
}, {
name = name,
type = string,
required = true
}, {
name = value,
type = expr,
required = false
}]
}
C.3.16. annotation_decl
An annotation declaration node. {paper-noindex}
{
kind = annotation_decl,
fields = [{
name = is_virtual,
type = bool,
required = true
}, {
name = doc,
type = string,
required = false
}, {
name = path, # A fully resolved path
type = string,
required = true
}, {
name = arguments,
type = array<arg_decl>,
required = true # May be empty
}]
}
C.3.17. annotation_call
An annotation call can not have any documentation.
{
kind = annotation_call,
fields = [{
name = annotation,
type = type_ref,
required = true
}, {
name = arguments,
type = array<arg_pass>,
required = true # May be empty
}]
}
C.3.18. intrinsic_decl
An intrinsic (i.e. macro
) declaration node. {paper-excerpt paper-noindex}
{
kind = intrinsic_decl,
fields = [{
name = doc,
type = string,
required = false
}, {
name = annotations,
type = array<annotation_call>,
required = true # May be empty
}, {
name = is_virtual,
type = bool,
required = true
}, {
name = visibility,
type = enum<public, private>,
required = true
}, {
name = name,
type = string, # Without `@`
required = true
}, {
name = arguments,
type = array<argument_decl>,
required = true # May be empty
}]
}
C.3.19. delayed_intrinsic_call
A delayed intrinsic call node may replace any non-declaration node.
{
kind = delayed_intrinsic_call,
fields = [{
name = content,
type = string,
required = true
}]
}
C.4. Examples
require "io" from "std"
# This is the main function.
@[Entry]
def main
@cout << "Hello, world!"
end
[{
kind = function,
implementation = def,
name = main,
documentation = "This is the main function."
annotations = [{
kind = annotation,
type = Entry
}]
}]
Appendix D: Core API
trait Core::Aggregate;
require "./aggregate.nx"
decl primitive Core::Array<Type: T, Size ~ %z>
derive Core::Aggregate;
# Return a reference to an element at *index*.
decl def [](index: Size) : T&ir(w: \mut?) throws Core::SizeError
end
require "./scalar.nx"
# Operations on atomic types may be synchronized, depending on
# provided memory ordering constraints and fences.
# Atomic operations are still `fragile` and require extra attention
# from the developer to become truly `threadsafe`.
#
# The memory layout of an atomic type is undefined.
#
# Theoretically, any `Numeric` type can be wrapped in `Atomic`.
# In practice, only `Int`, `Float`, `Fixed` and their
# `Imaginary` counterparts are expected to so.
#
# `Int` and `Float` variants are expected to be interchangeable
# with their `_Atomic` counterparts, where appropriate
# (see `as` method declarations in according reopenings).
decl primitive Core::Atomic<Type: T ~ (Numeric)>
derive Scalar;
# Non-atomically retreive the underlying value.
# This method is allowed to be called on a `final` variable.
decl raw_get : T
# Non-atomically update the underlying value.
# Does not return anything to avoid confusion with `swap`.
decl raw_set(val : T) : Void
# Atomically fetch the underlying value.
# This method is allowed to be called on a `final` variable.
#
# ```
# final a = Core::Atomic(42)
# Core.assert(a == a.fetch == 42)
# ```
decl fetch(
ordering: Memory::Ordering = :seqcst,
syncscope: SS = ""
) : T forall SS : @~string
alias == to fetch
# Atomically store the underlying value.
# Does not return anything.
# To store and also get the old value, use `swap` instead.
#
# ```
# let a = Core::Atomic(42)
# a.store(43)
# Core.assert(a == 42)
# ```
#
# TODO: Do we really need this method?
decl store(
val: T,
ordering: Memory::Ordering = :seqcst,
syncscope: SS = ""
) : Void forall SS : @~string
# Atomically swap the underlying value.
# Returns the old value.
#
# ```
# let a = Core::Atomic(42)
# Core.assert(a = 43 == 42)
# ```
decl swap(
val: T,
ordering: Memory::Ordering = :seqcst,
syncscope: SS = ""
) : T forall SS : @~string
alias = to swap
# TODO:
decl compare_and_exchange(
old: T,
new: T,
success_ordering: Memory::Ordering = :seqcst,
failure_ordering: Memory::Ordering = :seqcst,
syncscope: SS = ""
) : Bool forall SS : @~string
alias cmpxchg to compare_and_exchange
# Atomically add to the underlying value.
# Returns the old value.
#
# Wraps if called on a `Rational`.
#
# ```
# let a = Core::Atomic(127i8)
# Core.assert(a += 1 == 127)
# Core.assert(a == -128)
# ```
decl add(
val: T,
ordering: Memory::Ordering = :seqcst,
syncscope: SS = ""
) : T forall SS : @~string
alias += to add
# Atomically subtract from the underlying value.
# Returns the old value.
#
# Wraps if called on a `Rational`.
#
# ```
# let a = Core::Atomic(-128i8)
# Core.assert(a -= 1 == -128)
# Core.assert(a == 127)
# ```
decl subtract(
val: T,
ordering: Memory::Ordering = :seqcst,
syncscope: SS = ""
) : T forall SS : @~string
alias sub, -= to subtract
end
reopen Core::Atomic<Int>
# Atomically apply AND operation to the underlying value.
# Returns the old value.
#
# ```
# let a = Core::Atomic(0b1001_0110)
# Core.assert(a &= 0b0011_0011 == 0b1001_0110)
# Core.assert(a == 0b0001_0010)
# ```
decl and(
val: T,
ordering: Memory::Ordering = :seqcst,
syncscope: SS = ""
) : T forall SS : @~string
alias &= to and
# Atomically apply NAND operation to the underlying value.
# Returns the old value.
#
# ```
# let a = Core::Atomic(0b1001_0110)
# Core.assert(a.nand(0b0011_0011) == 0b1001_0110)
# Core.assert(a == 0b1110_1101)
# ```
decl nand(
val: T,
ordering: Memory::Ordering = :seqcst,
syncscope: SS = ""
) : T forall SS : @~string
# Atomically apply OR operation to the underlying value.
# Returns the old value.
#
# ```
# let a = Core::Atomic(0b1001_0110)
# Core.assert(a |= 0b0011_0011 == 0b1001_0110)
# Core.assert(a == 0b1011_0111)
# ```
decl or(
val: T,
ordering: Memory::Ordering = :seqcst,
syncscope: SS = ""
) : T forall SS : @~string
alias |= to or
# Atomically apply XOR operation to the underlying value.
# Returns the old value.
#
# ```
# let a = Core::Atomic(0b1001_0110)
# Core.assert(a ^= 0b0011_0011 == 0b1001_0110)
# Core.assert(a == 0b0100_1010)
# ```
decl xor(
val: T,
ordering: Memory::Ordering = :seqcst,
syncscope: SS = ""
) : T forall SS : @~string
alias ^= to xor
# Convert to an `_Atomic` integer counterpart.
#
# MUST be implemented if the target supports `_Atomic` types,
# i.e. `$__STDC_NO_ATOMICS__` is not defined.
#
# Non-fixed width integer types (e.g. `$int`) MUST be supported.
# Fixed (from `stdint.h`) width integer types SHOULD be supported.
#
# Signednesses of the types must match. A `final` atomic type
# can not be converted to a non-const type.
#
# It is not possible to directly convert to a lesser-bitsize type.
# Target information contained in the Macro API may be used
# to aid in implementing proper conversion logic.
#
# ```
# final a = Core::Atomic(42) : Core::Atomic<$int>
# a as $`const _Atomic int` # OK
# # a as $`_Atomic int` # Panic! Constantness mismatch
# # a as $`const _Atomic unsigned int` # Panic! Signedness mismatch
# a as $`const _Atomic int32_t` # May work, depending on target
#
# let b = Core::Atomic(42u16)
# a as $`const _Atomic unsigned short` # OK, turns into const
# ```
@[MayPanic("If target bitsize is less than the type's")]
@[MayPanic("On signednesses mismatch")]
@[MayPanic("On constantnesses mismatch")]
virtual decl as(:: T) forall T
end
reopen Core::Atomic<Float>
# Convert to an `_Atomic` C floating-point type.
#
# MUST be implemented if the target supports `_Atomic` types,
# i.e. `$__STDC_NO_ATOMICS__` is not defined.
#
# The target memory layout is defined by the
# `nx.c.const.[type].layout` macro constraint.
#
# Direct conversion is only possible to the same
# or greater bitsize floating point type, defined by the
# `nx.c.const.[type].bitsize` constraint.
# Otherwise, the `as!` method should be used.
#
# ```
# let a = Core::Atomic(42f64) : Core::Atomic<FBin64>
#
# # Is likely to succeed, unless `double`
# # has bitsize greater than 64
# a as $double # OK
#
# # as as $float # Most likely to fail
# ```
@[MayPanic("On a possibility of precision loss")]
virtual decl as(:: T) forall T
# Convert to an `_Atomic` C floatin-point type,
# allowing possible precision loss.
#
# ```
# let a = Core::Atomic(42f64) : Core::Atomic<FBin64>
# a as! $float # OK, but lesser precision
# ```
virtual decl as!(:: T) forall T
end
require "./numeric.nx"
require "./hypercomplex.nx"
require "./imaginary.nx"
# The memory layout of a complex number matches such of an array
# of two numbers of the type `T`, whereas the first element
# is the real part, and the second element is the imaginary part.
decl primitive Core::Complex<Type: T ~ Real>
# A complex number is a specialization of a hypercomplex number.
derive Hypercomplex<T>;
# A built-in constructor for a complex number.
#
# ```
# using Core
#
# assert(Complex(1, 2j) : Complex<$int> == Complex(1, .(2)))
# assert(Complex<$double>() == Complex(0.0, .(0))
# ```
static nothrow decl ()(
real : T = 0,
imaginary : Imaginary<T> = .(0))
# Return a reference to the real part of the complex number.
#
# NOTE: Would return a read-only reference if the number is final.
decl nothrow real() : T&irw
# Return a reference to the imaginary part of the complex number.
#
# NOTE: Would return a read-only reference if the number is final.
decl nothrow imaginary() : Imaginary<T>&irw
# Convert the complex number to the same
# or higher-bits representation.
# Would call `as!` on its parts recusively.
@[MayPanic("If data loss is possible")]
@[MayPanic("In case of underlying type mismatch")]
decl nothrow as(~ Complex)
# Convert the complex number to another (including lower-)
# bits representaion, allowing data loss.
# Would call `as!` on its parts recusively.
@[MayPanic("In case of underlying type mismatch")]
decl nothrow as!(~ Complex)
{% for t, _ in ipairs{"float", "double", "long double"} do %}
# Convert to the same or higher-bits complex floating C type.
#
# NOTE: This method is only declared if `T ~ Float`.
@[MayPanic("If data loss is possible")]
virtual decl nothrow as(: $`{{ t }} _Complex`)
# Convert to another (including lower-) bits
# complex floating C type, allowing data loss.
#
# NOTE: This method is only declared if `T ~ Float`.
virtual decl nothrow as!(: $`{{ t }} _Complex`)
{% end %}
end
struct Core::SizeError;
require "./fractional.nx"
# A fixed-point number.
decl primitive Fixed<
Base: 2 || 10,
# Size of the integral part in `Base`.
Integral: ~%Natural = 0,
# Size of the fractional part in `Base`.
# May be negative for right-shift.
Fractional: ~%Integer = 0
>
end
alias XBin<*> = Fixed<2, *>
alias XDec<*> = Fixed<10, *>
require "./fractional.nx"
# An IEEE-754 floating-point number.
decl primitive Float<
Base: ~%Natural,
Precision: ~%Natural,
EMax: ~%Natural>
derive Fractional;
end
alias FBin<
SignificandBits: ~%Natural,
ExponentBits: ~%Natural
> = Float<
2,
Precision: SignificandBits,
EMax: \{{ nx.util.natural(2 ^ (nx.ctx.ExponentBits - 1) - 1) }}
>
alias FBin16 = FBin<11, 5> # Not basic, but often implemented
alias FBin32 = FBin<24, 8>
alias FBin64 = FBin<53, 11>
alias FBin128 = FBin<113, 15>
alias FDec<
Digits: ~%Natural,
ExponentBits: ~%Natural
> = Float<
10,
Precision: Digits,
EMax: \{{ nx.util.natural(3 * 2 ^ (nx.ctx.ExponentBits - 2) / 2) }}
>
alias FDec32 = FDec<7, 8> # Not basic, but often implemented
alias FDec64 = FDec<16, 10>
alias FDec128 = FDec<34, 14>
# It is only implemented for targets supporting the extended
# double-precision floating-point number format natively.
decl type FBin64E;
# The [Brain floating-point number format][1]
# is ubiquitous in AI computing.
#
# [1]: https://en.wikipedia.org/wiki/Bfloat16_floating-point_format.
alias BFloat16 = FBin<8, 8>
require "./real.nx"
# A fractional, i.e. non-whole, rational number.
trait Fractional
derive Rational;
end
require "./numeric.nx"
trait Core::Hypercomplex<Type: T ~ Real>
derive Numeric;
end
require "./real.nx"
# ```
# final i = 5j
# assert(i :? Imaginary<SInt32>)
# assert(i ** 2 == -25)
# ```
@[LiteralSuffix(/j/)]
decl primitive Imaginary<Type: T ~ Real>
derive Numeric;
decl initialize(~ T)
# # Cast a `Real` literal into an `Imaginary` literal.
# @[LiteralSuffix(/j/)]
# static decl (~ %(Real)) : %(Imaginary)
# Coerce to the same or higher bits.
# Would call `as` on the underlying value.
#
# ```
# 1uj.as_u64j : Imaginary<UInt64>
# # 1uj.as_u16j # Panic!
#
# 1ij.as_i32 : SInt32
# # 1ij.as_i16 # Panic!
# ```
@[MayPanic("If data loss is possible")]
decl nothrow as(~ (\Imaginary | \Real))
# Coerce to other (including lower) bits, allowing data loss.
# Would call `as!` on the underlying value.
#
# ```
# 1uj.as_u16j! : Imaginary<UInt16>
# 1j.as_i16! : SInt16
# ```
decl nothrow as!(~ (\Imaginary | \Real))
{% for t, _ in ipairs{"float", "double", "long double"} do %}
# Convert to the same or higher-bits complex floating C type.
@[MayPanic("If data loss is possible")]
decl nothrow as(: $`{{ t }} _Complex`)
# Convert to another (including lower-) bits
# complex floating C type, allowing data loss.
decl nothrow as!(: $`{{ t }} _Complex`)
{% end %}
end
require "./integer.nx"
# You can go from `Natural` to `Integer`, but not vice versa
%!Rational, %!ℚ # -0.5, 1/3, ∞, ∅
%!Integer, %!ℤ # -1, 0, 1
%!Natural, %!ℕ # 0, 1
%!Boolean, %!𝔹 # true, false
%!Range<T> # 0..0.5
%!Tensor<T> # |[0, 0.5]|
%!Vector<T> # <0, 0.5> # Vector can be treated as `Tensor`
%!Character # 'f'
%!String # "f"
%!Symbol # :f
# A binary 2's complement integer.
decl primitive Int<Signed ~ %Bool, Bitsize ~ %Size>
derive Integer;
decl initialize(value : self)
end
alias SInt<*> = Int<true, *>
alias UInt<*> = Int<false, *>
alias SInt8 = SInt(8)
alias SInt16 = SInt(16)
alias SInt32 = SInt(32)
alias SInt64 = SInt(64)
alias SInt128 = SInt(128)
alias UInt8 = UInt(8)
alias UInt16 = UInt(16)
alias UInt32 = UInt(32)
alias UInt64 = UInt(64)
alias UInt128 = UInt(128)
alias Bit, Bool = UInt(1)
alias Byte = UInt8
# # The following aliases are implementation-defined.
# # They must be interchangeable with according C types.
# #
# # Every valid character is valid integer, but not vice versa.
# # Therefore, character types are distinct aliases.
# #
# decl distinct alias UChar # Usually `UInt8`
# decl distinct alias SChar # Usually `SInt8`
# decl distinct alias Char # `$char` (`SChar` or `UChar`)
# decl distinct alias WChar # `$wchar_t` (C17§7.19.2)
# virtual alias SShort # Usually `SInt16`
# virtual alias SInt # Usually `SInt16` or `SInt32`
# virtual alias SLong # Usually `SInt32`
# virtual alias SLongLong # Usually `SInt32` or `SInt64`
# virtual alias UShort # Usually `UInt16`
# virtual alias UInt # Usually `UInt16` or `UInt32`
# virtual alias ULong # Usually `UInt32` or `UInt64`
# virtual alias ULongLong # Usually `UInt64`
# virtual alias Size, USize # `$size_t` (C17§7.19.2)
# virtual alias SSize # Signed version of `Size`
# # The `Fast` family aliases to the implementation-defined
# # fastest types with at least provided bitsize (C17§7.20.1.3).
# # They must be interchangeable with according C types,
# # for example `UFast16` and `$uint_fast16_t`.
# virtual alias SFast8
# virtual alias SFast16
# virtual alias SFast32
# virtual alias SFast64
# virtual alias UFast8
# virtual alias UFast16
# virtual alias UFast32
# virtual alias UFast64
# # The `Least` family aliases to the implementation-defined
# # smallest types with at least provided bitsize (C17§7.20.1.2).
# # They must be interchangeable with according C types,
# # for example `ULeast16` and `$uint_least16_t`.
# virtual alias SLeast8
# virtual alias SLeast16
# virtual alias SLeast32
# virtual alias SLeast64
# virtual alias ULeast8
# virtual alias ULeast16
# virtual alias ULeast32
# virtual alias ULeast64
# # Fixed-width character types
# # must be interchangeable with
# # according C types (C17§7.28).
# #
# decl distinct alias Char16 to ULeast16; # `$char16_t`
# decl distinct alias Char32 to ULeast32; # `$char32_t`
require "./real.nx"
# A whole number.
trait Integer
derive Real;
end
require "./aggregate.nx"
decl primitive Core::Array<Type: T, Size ~ %z>
derive Core::Aggregate;
# Return a reference to an element at *index*.
decl def [](index: Size) : T&ir(w: \mut?) throws Core::SizeError
end
require "./tensor.nx"
# A matrix is a 2-dimensional specialization of `Tensor`.
#
# Matrices have literals:
#
# ```
# let m1 = |[1, 2],
# [3, 4]|r ~ (\%n)x2x2r : (SInt32)x2x2r
#
# let m2 = %i|[1 2]
# [3 4]|l1 : (SInt32)x2x2c
# ```
alias Core::Matrix<
Type: T,
Rows: R,
Columns: C,
Leading: L
> to Tensor<T, R, C, L>
require "./scalar.nx"
trait Core::Numeric
derive Scalar;
decl equals?(~ Numeric) : Bool
alias eq?, == to equals?
decl add(~ Numeric)
alias + to add
decl subtract(~ Numeric)
alias sub, - to subtract
decl multiply(~ Numeric)
alias mul, * to subtract
decl divide(~ Numeric)
alias div, / to divide
# Return the modulo operation result, where the result
# is either zero or has the same sign as the argument.
decl modulo(~ Numeric)
alias mod, % to modulo
# Return the remainder from the division, where the result
# is either zero or has the same sign as the callee.
decl remainder(~ Numeric)
alias rem to remainder
decl power(~ Numeric)
alias pow, ** to power
end
# > The quotient of two directed lines in a three-dimensional space.
# > -- <cite>William Rowan Hamilton</cite>
decl primitive Core::Quaternion<Type: T ~ Real>
derive Hypercomplex<T>;
end
namespace Quantum
primitive Bit
# Set the qubit to a superposition,
# i.e. apply the Hadamard gate.
#
# ```
# let q = QBit()
# q.unset()
# q.get() # Returns either 0 or 1
# ```
decl mut unset()
# Flip the qubit state,
# i.e. apply the Pauli-X gate.
#
# ```
# let q = QBit()
# q = 1
# assert(~q == 0)
# ```
decl mut flip()
alias ~ to flip
# Set the qubit state.
#
# ```
# let q = QBit()
# q = 1
# assert(q == 1)
# ```
#
# NOTE: Change the underlying variable value with `q := QBit()`.
decl mut set(state : ::Bit)
alias = to set
# Measure the qubit value.
# Returns a binary bit: either 0 or 1.
#
# #!see(set)
decl get() : ::Bit
# Measure the qubit value and compare it with another qubit's.
decl equals?(qubit : self) : Bool
# Measure the qubit value and compare it with another value.
decl equals?(value : ::Bit) : Bool
alias eq?, == to equals?
end
end
alias QBit, Qubit to Quantum::Bit
require "./scalar.nx"
# TODO: IEEE-1788.
# 0..3.each(1).to_a == [0, 1, 2, 3] # %R[0, 3], %Rf[0 3]
# 0...3.each(1).to_a == [0, 1, 2] # %R[0, 3), %Ri[0 3)
# 0....3.each(1).to_a == [1, 2] # %R(0, 3), %Ri32j(0 3)
# Range<SInt32, false, true>(0, 3) # %R(0, 3], %RQ8e-6(0 3]
decl primitive Range<
Type: T ~ Numeric,
BoundLower: ~ %Boolean = true,
BoundUpper: ~ %Boolean = true
>
derive Scalar;
decl ()(lower: T,upper: T) : self
decl .lower() : T
decl .upper() : T
decl lower=(self&w, T) : T
decl upper=(self&w, T) : T
decl .includes?(T) : Bool
alias ∋ to includes?
# Is self subset of another?
decl .sub?(self) : Bool
alias ⊆ to sub?
# Is self superset of another?
decl .super?(self) : Bool
alias ⊇ to super?
# Is self subset and not equals to another?
decl .strictsub?(self) : Bool
alias ⊊ to strictsub?
# Is self superset and not equals to another?
decl .strictsuper?(self) : Bool
alias ⊋ to strictsuper?
end
# let range = %R[1, 10]
# range.lower = 0
# trait Foo
# decl .foo()
# end
# struct Point
# let (.x, .y, .z) : FBin64
# let stat : SInt32 = 42
# def .double: self
# self(x * 2, y * 2)
# end
# end
# TODO: A reference (T&) does not neccessarily occupies any memory!
require "./fractional.nx"
# A rational number.
decl primitive Rational<Type: T ~ Integer>
derive Fractional;
# An arithmetic overflow.
struct Overflow;
end
alias Rat<Bitsize: B>, SRat<Bitsize: B> = Rational<SInt<B>>
alias URat<Bitsize: B> = Rational<UInt<B>>
alias Rat8, SRat8 = SRat(8)
alias Rat16, SRat16 = SRat(16)
alias Rat32, SRat32 = SRat(32)
alias Rat64, SRat64 = SRat(64)
alias URat8 = URat(8)
alias URat16 = URat(16)
alias URat32 = URat(32)
alias URat64 = URat(64)
alias Rat, SRat = Rational<SInt>
alias URat = Rational<UInt>
alias ShortRat, SShortRat = Rational<SShort>
alias IntRat, SIntRat = Rational<SInt>
alias LongRat, SLongRat = Rational<SLong>
alias LongLongRat, SLongLongRat = Rational<SLongLong>
alias UShortRat = Rational<UShort>
alias UIntRat = Rational<UInt>
alias ULongRat = Rational<ULong>
alias ULongLongRat = Rational<ULongLong>
alias FastRat<Bitsize: B>, SFastRat<Bitsize: B> = Rational<SFast<B>>
alias UFastRat<Bitsize: B> = Rational<UFast<B>>
alias FastRat8, SFastRat8 = FastRat(8)
alias FastRat16, SFastRat16 = FastRat(16)
alias FastRat32, SFastRat32 = FastRat(32)
alias FastRat64, SFastRat64 = FastRat(64)
alias UFastRat8 = UFastRat(8)
alias UFastRat16 = UFastRat(16)
alias UFastRat32 = UFastRat(32)
alias UFastRat64 = UFastRat(64)
require "./numeric.nx"
# A real number.
trait Real
derive Numeric;
end
trait Core::Scalar;
# Slice maintains a bit-list of defined elements.
#
# ```
# let s = mut Slice<SInt8, 5>()
#
# assert(@bitsizeof(s) == 45) # 5 * 8 + 5
# assert(@sizeof(s) == 6) # (45f / 8).ceil.to_z
# assert(s.size != s<Size>)
#
# assert(s.size == 0)
# # s[0] # Would throw, because no element is defined yet
# assert(s[0] = 42 == 42)
# assert(s.size == 1)
# assert(s.remove(0) == 42)
# # s[0] # Would throw again
# ```
decl primitive Slice<Type: T, Size: S : @{Size}>
# Would throw if element is not defined.
decl mut? get(index: Size): T&i(w: \mut?) throws SizeError
alias [] to get
# Define or rewrite an element at *index*.
decl mut set(index: Size, value: T): T&iw throws SizeError
alias []= to set
# Un-define an element at *index*.
# Further `get` calls would throw.
decl mut remove(index: Size): T throws SizeError
# Return an actual size.
decl size: Size
end
require "./aggregate.nx"
# A growing stack of values.
decl mut primitive Core::Stack<Type: T, Size: S : @~size>
derive Aggregate;
# NOTE: The returned reference is temporal because the stack
# may be shrinked, so the memory region becomes undefined.
decl push(value: T) : T&tw throws SizeError
decl pop() : T throws SizeError
decl pop?() : T?
decl const size : $size
# NOTE: The returned reference is temporal because the stack
# may be shrinked, so the memory region becomes undefined.
decl mut? [](index: Size) : T&t(w = \mut?) throws SizeError
end
decl primitive String<Encoding: E, Size: S : @{Size}>
# Access character at *index*. Depending on the encoding,
# the operation has different complexity.
# For example, `UTF8` has O(N), while `UCS2` has O(1).
decl [](index: Size) : Char<E>
end
trait Char::Encoding<
# A size in bits of a single code unit.
UnitSize: UZ : @{Size},
# How many code units are at most
# required to encode a code point.
PointSize: PZ : @{Size}
>
enum Error
val Encoding
end
# Re-encode *char* in this encoding.
#
# ```
# UCS2.encode('a'ascii_) == 'a'ucs2
# ```
decl encode(char: Char<E>): Char<this> throws Error forall E
# A literal initialize macro for this encoding.
#
# The macro is called implicitly for each char or string literal.
#
# The restriction argument is set in accordance to
# the autocasting rules. A macro implementation is expected
# to accept at least `Char<this>` and `Int` restrictions.
#
# By default, char and string literals have `UTF16` encoding.
# Otherwise, the encoding type is calculated by upcasing
# the literal prefix. A prefix can not contain underscores,
# thus an encoding type name should not as well.
#
# This macro is likely to be implemented natively for built-in
# encodings. Custom encodings, however, are expected to implement
# it in user-code if literal suffix availability is desired.
#
decl macro @(literal ~ @{Char}, restriction :: Char<this> || Int)
end
# Represents a single codepoint in given `Encoding`.
# A char's size in bits equals to the encoding's
# codeunit size times codepoint size.
#
# ```
# assert(@sizeof('f'ascii_) == 1)
# assert(@sizeof('f'utf8) == 4)
# assert(@sizeof('f') == 2) # UCS2 by default
# ```
primitive Char<Encoding: E ~ Encoding>
# Initialize from an explicit
# integral codepoint value.
#
# ```
# 'f' # Calls `.initialize(literal)`
# '\66' # Calls THIS initializer
# ```
decl initialize(codepoint~ Int);
# Initialize from a character literal.
#
# ```
# 'f' # Calls THIS initializer
# '\66' # Calls `.initialize(codepoint)`
# ```
#
# An example implementation:
#
# ```
# reopen Char<Windows::CP1251>
# @[Inline]
# impl initialize(literal ~~ L) forall L ~ @{Char}
# # Within macros, text values are normalized to Unicode.
# \{% if L.value == 'ф' then %}
# return self(0xF4)
# ```
decl initialize(literal~~ @{Char});
# ```
# final a = 'a'ascii_
#
# assert(a.to_utf16 == a to Char<UTF16> ==
# a.to(Char<UTF16>) == 'a'utf16)
# ```
decl to(:: C) : C forall C ~ Char
# Compare to a character with the same encoding.
decl equals?(: self<E>): Bool
alias eq?, == to equals?
end
%utf8["Привет!\0"] : Char<UTF8>[10] # 40 bytes = 320 bits
%<"Ditto"> : Char<UCS2>[5] # 10 bytes = 80 bits
"Привет"utf8u8 # ?
# String and char literals can have encoding prefix and kind suffixes.
"foo" : Array<Char<UTF16>, 3>
"foo"u8 : Array<UInt8, 3> # Only if allowed to do so
# "фу"u8 # Panic!
"фу"utf8u8 : Array<UInt8, 4> # UTF-8 code units take one byte
"фу"utf8 : Array<Char<UTF8>, 4>
"Hello world" : Array<Char<UTF16>, 11>
# Char<ASCII>[12]* can be cast to $char*
# Char<UTF8>[]* as well.
# Any char with encoding unit size == 1 can be?
unsafe! $puts(&"Hello world\0"ascii_) # Address of a passed rvalue
"Привет!\0"utf8
"Привет!\0" # 16 bytes (UCS-2)
module ASCII < Char::Encoding<8, 1>;
module UCS2 < Char::Encoding<16, 1>;
module UCS4 < Char::Encoding<32, 1>;
module UTF8 < Char::Encoding<8, 4>;
module UTF16 < Char::Encoding<16, 2>;
module UTF32 < Char::Encoding<32, 1>;
module UTF16LE < Char::Encoding<16, 2>;
module UTF32LE < Char::Encoding<32, 1>;
module UTF16BE < Char::Encoding<16, 2>;
module UTF32BE < Char::Encoding<32, 1>;
decl primitive Codeunit<Encoding>;
decl primitive Codepoint<Encoding>;
'f' : CPoint<UCS2> # 2 bytes
"f" : Array<CUnit<UCS2>, 1> # 2 bytes
'f'utf8 : CPoint<UTF8> # 4 bytes
"f"utf8 : Array<CUnit<UTF8>, 1> # 1 byte
'ф'.cunits : Stack<CPoint<UCS2>, 1>
'ф'.cunits.each => Std.print(&.to_u16) # 1 time
"ф".each => Std.print(&) # 1 time
"ф"utf8.each => Std.print(&) # 2 times
require "./aggregate.nx"
# A *D*-dimensional tensor of type *T*
# with *L* being the leading dimension.
#
# As an aggregate type, a tensor has optional mutability.
#
# Tensor types can be shortcut in the following form:
#
# ```
# let t = (SInt32)x2x3x4r
# ```
#
# Where `SInt32` is the type, `2`, `3` and `4` are dimensions,
# and `a` is from "aisle", which is shortcut for setting
# the third dimension leading. Explicit `lN` form can be used
# to specify leading dimension, beginning from 0.
# There are also `r` (`l0`) for "row" and `c` (`l1`) for "column".
decl primitive Tensor<
Type: T,
Dimensions: *D ~ %z,
Leading: L ~ %z
>
derive Aggregate;
# TODO: It's not a product, but something else.
# Nonetheless, it demonstrates passing index pairs:
# `t1.product(t2, 1 => 2, 2 => 4)`.
decl product(another: self, *index_pairs[D::Size]: %z => %z)
end
require "./fractional.nx"
# (Type III universal numbers)[1], also known as posits with quires.
#
# Currently, posits are rarely implemented in hardware, therefore
# they shall be used in cases when increased precision is preferable
# over performance loss.
#
# [1]: https://en.wikipedia.org/wiki/Universal_numbers_(data_format).
#
# Type III unums part, a posit.
# Used for floating-point numbers representation.
decl primitive Posit<Bitsize: Z ~ %Size, Exponent: E ~ %Size>
derive Fractional;
decl initialize(value : self)
decl initialize(literal ~ %!Rational)
# For floating-point numbers, precision loss is considered a norm.
# Therefore, there is no need in distinct `to_*!` methods family.
decl to([0]: \T): T forall T ~ Real
end
# The recommended posit sizes and corresponding
# exponent bits and quire sizes:
#
# Posit size (bits) | Number of exponent bits | Quire size (bits)
# --- | --- | ---
# 8 | 0 | 32
# 16 | 1 | 128
# 32 | 2 | 512
# 64 | 3 | 2048
#
# NOTE: 32-bit posit is expected to be sufficient to solve
# almost all classes of applications[citation needed].
alias Posit8<Exponent = 0> = Posit<8, Exponent>
alias Posit16<Exponent = 1> = Posit<16, Exponent>
alias Posit32<Exponent = 2> = Posit<32, Exponent>
alias Posit64<Exponent = 3> = Posit<64, Exponent>
# Type III unums part, a quire.
# Used for floating-point numbers computation.
#
# ```
# let q = Quire<128>()
# assert(q.add(10p, 20p).to_p == 30)
# ```
decl primitive Quire<Bitsize: B ~ %Size>
# Fused-multiply–add, i.e. `a * b + c`.
#
# ```
# let q = Quire<128>()
# assert(q.fmadd(10p, 2p, 5p).to_p == 25)
# ```
decl fmadd(multiplied: P1, factor: P2, added: P3): self
forall P1 ~ Posit, P2 ~ Posit, P3 ~ Posit
# Fused-dot-product add *a* to *b*.
#
# ```
# let q = Quire<128>()
# assert(q.add(10p, 20p).to_p == 30)
# ```
decl add([0]: P1, [0]: P2): self
forall P1 ~ Posit, P2 ~ Posit
# Fused-dot-product subtract *b* from *a*.
#
# ```
# let q = Quire<128>()
# assert(q.sub(30p, 20p).to_p == 10)
# ```
decl subtract(minued: P1, subtrahend: P2): self
forall P1 ~ Posit, P2 ~ Posit
alias sub = subtract
decl to([0]: \T): T forall T ~ Real
end
# TODO: Do we need variadic arguments in the core?
#
# ```
# export void simple_printf(const char* fmt, ...) {
# final varg = CVArg.init
#
# # We assume that the pointer is always valid
# fmt = unsafe! fmt as! $char*cr
#
# while (*fmt != '\0')
# if (*fmt == 'd')
# final i = unsafe! vargs.fetch($int)
# unsafe! $printf("%d\n", i)
# else if (*fmt == 'c')
# final c = unsafe! vargs.fetch($int)
# unsafe! printf("%c\n", c)
# else if (*fmt == 'f')
# final d = unsafe! vargs.fetch($double)
# printf("%f\n", d)
# end
#
# fmt += 1
# end
# }
#
# export int main() {
# unsafe! $simple_printf("dcff", 3, 'a', 1.999, 42.5)
# }
# ```
virtual type CVArg
# Initialize a `CVArg` instance.
# Must panic if called from not within an exported function.
# Can be called at most once from a function; panic othwerwise.
virtual static def init : CVArg
# Unsafely interpret the next variadic argument as *type*.
virtual def unsafe fetch(type: :: T) forall T
# Shallow-copy a `CVArg` instance.
virtual def copy : CVArg
end
# A variant is considered a scalar object rather than a container,
# thus it does not have mutability. "Mutating" methods such as `set`
# are only callable on non-final variant instances. Check methods
# such as `is?` are subject to standard local behavioural erasure technics.
decl primitive Variant<Types: *T>
# ```
# # Would panic if `var` is not local,
# # otherwise behavioural erasure is applied.
# (var : Variant<SInt32, FBin64> = SInt32(2)) += 3
# ```
decl .set(U) : this forall U in T
alias .= to .set
# Check if the actual variant value is of exact *type*.
#
# ```
# final v = @rand(42, "foo")
# if v.is?(SInt32) then (v : SInt32) += 1
# assert(v == 43 || v == "foo")
# ```
virtual def nothrow is?(type :: U) : Bool forall U
# Check if the actual variant value is of *type*,
# i.e. perform a fuzzy match.
#
# ```
# final v = @rand(42, "foo")
# if v.of?(Int) then (v : SInt32) += 1
# assert(v == 43 || v == "foo")
# ```
virtual def nothrow of?(type :: U) : Bool forall U
# Try interpreting the actual variant value as *type* exactly.
# If the value is not the given *type*, returns `Void`.
#
# ```
# final v = @rand(42, "foo")
# final i = v.as?(SInt32)
# if i then (i : SInt32) += 1
# assert(v == 43 || v == "foo")
# ```
virtual def nothrow as?(type :: U) : U? forall U
# Try interpreting the actual variant value as *type* exactly,
# throwing `AssertionError` in case of mismatch.
#
# ```
# final v = @rand > 0 ? 42 : "foo"
# v.as!(SInt32) += 1
# assert(v == 42)
# ```
virtual def as!(type :: U) : U throws AssertionError forall U
# Unsafely intepret actual variant value as *type*.
#
# PERFORMANCE: It is faster due to not checking the switch value.
#
# ```
# final v = @rand > 0 ? 42 : "foo"
# unsafe! v.as!!(SInt32) += 1
# assert(v == 42)
# ```
virtual unsafe def nothrow as!!(type :: U) : U
# Check if the actual variant value is `Void`.
#
# ```
# final v = @rand(42, Void) : Variant<SInt32, Void> : SInt32?
# if not v.void? then (v : SInt32) += 1
# assert(v == 43 || v.void?)
# ```
virtual def nothrow void? : Bool
# Ensure that the actual variant value
# is not `Void`, throwing otherwise.
#
# ```
# final v = @rand > 0 ? 42 : Void
# v.novoid! += 1
# assert(v == 43)
# ```
virtual def novoid! : Undef throws AssertionError
# Check if the actual variant value equals to *value*.
# It delegates `==` call to underlying types, which may throw.
#
# ```
# final v = @rand(42, "foo")
# assert(v == 42 or v == "foo")
# if v == 42 then assert(v :? SInt32)
# ```
virtual def maythrow equals?(value : U) : Bool forall U
alias eq?, == to equals?
# Set the actual variant value to a new *value*.
# The old value is finalized.
# Copy-returns the **new** value if the call has a receiver.
# Acts similarly to (simple) assignment (`=` or `<-`).
# Setting is not applicable to final variant instances.
#
# ```
# let v := @rand(42, "foo")
# assert((v = 43) == v.set(43))
# ```
virtual def set(value : U) : (\recv? ? U : Void)
# Replace the actual variant value with a new *value*.
# Move-returns the **old** value if the call has a receiver.
# Otherwise, finalizes the old value.
# Acts similarly to replace-assignment (`<<=` or `<<-`).
# Replacing is not applicable to final variant instances.
#
# ```
# let v := @rand(42, "foo")
# final old := v <<= 43 # Or `v.replace(43)`, `v.replace(<- 43)`
# assert(old == 42 || old == "foo")
# ```
virtual def replace(value : U) : (\recv? ? (<-U) : Void)
# Swap the actual variant value with target's,
# updating the switches accordingly.
# Acts similarly to swap-assignment (`<=>`).
# Swapping is not applicable to final variant instances.
#
# ```
# let v1 := @rand(42, "foo")
# let v2 := @rand(43, "bar")
# final new := v1 <=> v2 # Or `v1.swap(v2)`, `v1.swap(<- v2)`
# assert(new == 43 || new == "bar")
# ```
virtual def swap(target : self*crw) : (\recv? ? U : Void)
end
require "./tensor.nx"
# A vector is a 1-D specialization of `Tensor`.
#
# ```
# let v : (SInt32)x4 = <1, 2, 3, 4>
# let v = %i<1 2 3 4>
# ```
alias Core::Vector<Type: T, Size: S> to Tensor<T, S, 0>
end
Appendix E: Macros API
-- This file defines the Onyx Macro API.
--
-- The following identifiers: `string`, `nil`, `boolean`, `number` and
-- `table`; represent fundamental Lua types. Additional identifiers
-- `int` and `uint` are used to represent whole numbers. Logical
-- operators may be used to express variativity of possible values.
-- Concrete literals may be used to represent exact possible values.
-- Arrays are represented using the `{ type }` notation,
-- e.g. `{ string }`.
--
-- For example, `{foo = { string } or 42 or nil}` means that the `foo`
-- entry may contain either an array of strings, exact number 42, or nil.
--
-- Sum of tables represents an extensions. For example,
-- `foo = bar + { baz = 42 }` means that the `foo` table
-- includes the contents of the `bar` table.
--
-- Special strings beginning from `@` are local to this documentation
-- and should not be implemented. Purposes of each special string are
-- documentented individually.
-- The global Onyx table accessible from any Onyx source code
-- file without any thread-safety considerations.
nx = {
-- Access current compilation context; the contents may be
-- sensitive to the position within a source file.
ctx = {
-- Access current function implementation, if any.
impl = nx.Node.FunctionDef or nil,
-- Access current type (which may
-- also be the top-level namespace).
type = nx.Node['@Type'],
-- Access information about the source code
-- file currently being compiled.
file = {
path = string
},
-- A function performing Onyx path lookup in accordance to
-- the regular lookup rules, from the current context.
lookup = function (path)
-- Returns `nil` if lookup failed
return nx.Node or nil
end,
},
-- Access current compilation target information.
target = {
-- The target Instruction Set Architecture.
-- It may contain additional fields
-- defined in the Targets Appendix.
isa = {
id = string,
},
-- A list of deviant target Instruction Set Extensions,
-- which may be empty. The list of possible ISEs
-- is defined in the Targets Appendix.
ise = { string },
-- An exact target Processing Unit, which may be unknown.
pu = string or nil,
-- The target Operating System information,
-- which may be empty. It may contain additional
-- fields defined in the Targets Appendix.
os = {
id = { string },
},
-- A list of target Application Binary Interfaces,
-- which may be empty. An ABI may have additional
-- fields defined in the Targets Appendix.
abi = { { id = string } }
},
-- C interoperability constraints and utilities. Note that these
-- do not neccessarily align with the target's capatibilites.
--
-- The table defines some C constraints which are barely express-able
-- by the C standard, e.g. signedness representation of integers.
-- Additionally, this information is useful in cases when a program
-- does not have access to target-specific _hosted_ C headers (§4.6).
--
-- Note that it should always be assumed that a program has access
-- to the target-specific _freestanding_ C headers (§4.6):
-- float.h, iso646.h, limits.h, stdalign.h, stdarg.h, stdbool.h,
-- stddef.h, stdint.h and stdnoreturn.h.
c = {
-- A function performing C path lookup.
lookup = function (path)
-- Returns `nil` if lookup failed
return nx.c.Node or nil
end,
-- Access C integer environment.
integer = {
-- "Sign and magnitude", two's, or one's complement.
signedness = 'SM' or '2C' or '1C',
-- > ... is whether the value with sign bit 1 and all value bits
-- zero (for the first two), or with sign bit and all value bits 1
-- (for ones’ complement), is a trap representation or a normal
-- value. In the case of sign and magnitude and ones’ complement,
-- if this representation is a normal value it is called a
-- negative zero.
--
-- Therefore, `{signedness = 'SM', can_trap = false}` means that
-- negative zero is supported.
can_trap = bool
},
-- TODO:
-- -- Access C floating-point environment.
-- floating_point = {
-- -- Is `INFINITY` supported?
-- infinity = boolean,
-- -- Is negative zero supported?
-- negative_zero = boolean,
-- -- Are NaNs supported? Note that C treats all NaNs as qNaNs.
-- nan = boolean,
-- },
type = {
float = {
format = 'IEEE-754'
},
double = {
format = 'IEEE-754'
},
'long double' = {
format = 'IEEE-754' or 'X87' or 'PPC'
},
char = {
signed = bool -- A char's sign is not defined by the standard
},
short = {
bitsize = int
},
int = {
bitsize = int
},
long = {
bitsize = int
},
'long long' = {
bitsize = int
}
},
Node = {
Macro = {
id = value,
arity = int,
call = function (args)
-- Return the result of macro evaluation
return string
end
},
Constant = {
Integer = {
-- The type may be unknown for freestanding constants
type = nx.c.Node.Type.Int or nil,
base = 'decimal' or 'octal' or 'hex',
value = int,
suffixes = {'unsigned' or ('long' or 'long long')}
},
Char = {
-- The type may be unknown for freestanding constants
type = nx.c.Node.Type.Int or nil,
value = string,
-- TODO: > (byte) If c-char is not representable as a single
-- byte in the execution character set, the value is
-- implementation-defined.
-- TODO: > (16bit, 32bit, wide) If c-char is not
-- or maps to more than one 16-bit character, the behavior is
-- implementation-defined.
-- TODO: > (multichar) has [...] implementation-defined value
kind = 'byte' or -- E.g. `'a'` or `'\n'` or `'\13'`
'utf8' or -- E.g. `u8'a'`
'16bit' or -- E.g. `u'貓'`, but not `u'🍌'`
'32bit' or -- E.g. `U'貓'`, but not `U'🍌'`
'wide' or -- E.g. `L'β'` or `L'貓'`
'multichar' -- E.g. `'AB'`
},
Floating = {
-- The type may be unknown for freestanding constants
type = nx.c.Node.Type.Int or nil,
base = 'decimal' or 'hex',
significand = {
whole = int,
fraction = uint
},
exponent = int or nil,
suffix = 'f' or 'l' or nil
},
String = {
-- The type may be unknown for freestanding constants,
-- e.g. `char* = "foo"` has type, but `"foo"` does not.
type = nx.c.Node.Type.Pointer or nx.c.Node.Type.Array or nil,
value = string,
-- TODO: > (byte) If c-char is not representable as a single
-- byte in the execution character set, the value is
-- implementation-defined.
-- TODO: > (16bit, 32bit, wide) If c-char is not
-- or maps to more than one 16-bit character, the behavior is
-- implementation-defined.
-- TODO: > (multichar) has [...] implementation-defined value
kind = 'byte' or -- E.g. `'a'` or `'\n'` or `'\13'`
'utf8' or -- E.g. `u8'a'`
'16bit' or -- E.g. `u'貓'`, but not `u'🍌'`
'32bit' or -- E.g. `U'貓'`, but not `U'🍌'`
'wide' or -- E.g. `L'β'` or `L'貓'`
'multichar' -- E.g. `'AB'`
}
},
Type = {
Int = {
kind = 'short' or 'int' or 'long' or 'long long',
signed = boolean,
atomic = boolean,
constant = boolean,
volatile = boolean,
alignment = int or nil
},
Float = {
kind = 'float' or 'double' or 'long double',
complex = boolean,
imaginary = boolean,
atomic = boolean,
alignment = int or nil
},
Pointer = {
to = nx.c.Node.Type,
alignment = int
},
Reference = {
to = nx.c.Node.Type,
alignment = int
},
Struct = {
},
Enum = {
},
Union = {
},
Function = {
},
},
TypeDef = {
id = string,
target = nx.c.Node.Type or nil
},
FunctionDecl = {
}
}
},
Node = {
-- Represent any referencable type.
['@Type'] = nx.Node.Namespace or
nx.Node.Trait or
nx.Node.Struct or
nx.Node.Enum or
nx.Node.Primitive or
nx.Node.TypeAlias
-- Represent a declaration.
['@Decl'] = nx.Node['@Type'] or
nx.Node.FunctionDecl or
nx.Node.FunctionDef or
nx.Node.VarDecl or
nx.Node.AnnotationDef or
nx.Node.MacroDef
-- A namespace node. Any other type declaration
-- is also considered a name space.
Namespace = {
name = string, -- E.g. `'Bar'` for `::Foo::Bar`
-- Would be nil for the top-level namespace.
parent = nx.Node['@Type'] or nil,
declarations = { nx.Node['@Decl'] },
-- Return a fully qualified path to the namespace in form of
-- an array of strings, e.g. `{'Foo', 'Bar'}` for `::Foo::Bar`.
path = function ()
return {string}
end
},
-- A trait node
Trait = nx.Node.Namespace + {
annotations = { nx.Node.AnnotationCall },
-- A trait may derive from other traits
derivees = {
[1] = {
trait = nx.Node.TypeRef,
declarations = { nx.Node['@Decl'] }
}
}
},
-- A struct node derives from the Trait node
Struct = nx.Node.Trait,
-- A enum node, which may also be a flag
Enum = nx.Node.Namespace + {
is_flag = true or false,
-- The enum type, which is `$int` by default
type = nx.Node.TypeRef,
elements = { nx.Node.Enum.Element },
annotations = { nx.Node.TypeRef },
-- A enum element
Element = {
name = string,
value = nx.Node.Literal or nil
}
},
Primitive = nx.Node.Namespace,
FunctionDecl = {
parent = nx.Node['@Type'],
visibility = 'public' or 'protected' or 'private' or nil,
storage = 'static' or 'instance' or nil,
safety = 'threadsafe' or 'fragile' or 'unsafe' or nil,
mutability = 'mut' or 'const' or nil,
name = string,
arguments = { nx.Node.ArgDecl },
annotations = { nx.Node.AnnotationCall }
},
FunctionDef = {
parent = nx.Node['@Type'],
proto = nx.Node.FunctionDecl, -- A prototype per specialization
body = { nx.Node.Expr },
annotations = { nx.Node.AnnotationCall }
},
VarDecl = {
parent = nx.Node['@Type'],
-- Would be nil if undefined
is_static = true or false or nil,
-- Would be nil if undefined
visibility = 'public' or 'protected' or 'private' or nil,
-- Always defined
is_final = true or false,
-- Would be false for `set foo`, always defined
has_getter = true or false,
-- Would be false for `get foo` or `final foo`, always defined
has_setter = true or false,
name = 'MyVar',
type = nx.Node.TypeRef or nil,
annotations = { nx.Node.AnnotationCall }
},
-- An annotation definition node
AnnotationDef = {
parent = nx.Node['@Type'],
name = string,
-- An annotation consists of macro code (may be empty)
macros = { nx.Node.Macro },
-- Resolve a full path to the annotation
path = function,
-- Annotations may be annotated themselves!
annotations = { nx.Node.AnnotationCall }
},
-- A macro definition node
MacroDef = {
parent = nx.Node['@Type'],
is_builtin = boolean,
visibility = 'public' or 'protected' or 'private' or nil,
name = string,
arguments = { nx.Node.ArgDecl },
body = { nx.Node.Macro }
}
-- An annotation call (i.e. the act of annotating) node
AnnotationCall = {
annotation = nx.Node.AnnotationDef,
arguments = { nx.Node.ArgCall }
},
TypeAlias = {
parent = nx.Node['@Type'],
name = string,
arguments = { nx.Node.ArgDecl },
target_type = nx.Node['@Type'],
target_arguments = { nx.Node.ArgCall }
},
TypeRef = {
type = nx.Node['@Type'],
arguments = { nx.Node.ArgCall }
},
ArgDecl = {
name = string or nil,
alias = string or nil,
is_static = boolean,
type = nx.Node.TypeRef or nil,
annotations = { nx.Node.AnnotationCall },
has_default_value = boolean
},
ArgCall = {
parent = { nx.Node.TypeRef or nx.Node.FunctionCall },
-- A link to the target argument declaration, if resolved
declaration = nx.Node.ArgDecl or nil,
-- How the argument is referenced.
-- Would be a number for `[0]:` reference,
-- a string for named reference,
-- and nil for a simple sequence
ref = number or string or nil,
value = nx.Node.Expr
}
}
}
Glossary
- Standard
-
A widely accepted specification.
- Specification
A document that states requirements.
- Abstract character
-
A entry in a character set representing a grapheme or a sememe, usually mapped to a single codepoint.
A grapheme may be represented in multiple ways in a single character set, and also as a sequence of multiple abstract characters. For example ⟨å⟩ in Unicode is
å
(U+61 followed by U+030A) and alsoå
(U+00E5). - Allograph
-
One of many glyphs representing the same grapheme.
- Character set
-
A mapping of abstract characters to codepoints. Examples of character set are ISO/IEC 10646, ISO/IEC 646:US.
- Codepoint
-
A numerical value representing an abstract character from a character set.
A codepoint may be encoded in one or multiple codeunits.
- Codeunit
-
A sequence of bits in a certain encoding.
Depending on the encoding, a single or multiple codeunits make up a codepoint.
- Combined grapheme
-
A grapheme containing at least one combining abstract character.
- Combining abstract character
-
An abstract character usually representing a sememe and intended to modify other abstract characters when in a sequence, e.g.
◌̊
(U+030A). - Character encoding
-
A way to encode a codepoint into a sequence of bits.
Character encoding is not to be confused with character set. However, sometimes a character encoding has the name of the character set, for example ISO/IEC 646:US (a.k.a. US-ASCII).
Other examples of character encoding are UTF-8, UCS-2.
- Glyph
-
An elemental printable symbol intended to represent a grapheme.
- Grapheme
-
In the scope of this standard, a grapheme is what Unicode defines as a printable character
A unit of writing which is (1) lexically distinctive, (2) has linguistic value (mostly by referring to phonemes, syllables, morphemes, etc.), and is (3) minimal.
— Meletis, D.
Graphemes include not only letters and digits, but also other printable symbols, such as arrows. A grapheme is notated with angle brackets, e.g. ⟨a⟩, ⟨A⟩, ⟨å⟩, ⟨1⟩, ⟨→⟩.
In a character set, a grapheme is represented by one or multiple abstract characters.
- Sememe
-
An abstract semantic unit of meaning.
In a character set, a sememe is an abstract character.