Specification Reference

ODIN-L 1.0, Open Data Interchange Notation Language

Overview

ODIN is a data interchange format designed for the AI era, combining the token efficiency of CSV, the nesting capability of JSON, the type safety of Protocol Buffers, and the human readability of TOML, all in a single notation that machines and humans can parse, generate, and reason about without ambiguity.

Self-describing typed values

Prefixes (#, ?, @, ^, ~, #$, ##) eliminate parsing ambiguity

Inline modifiers

! required, * confidential, - deprecated: metadata travels with values

Tabular mode

Eliminates key repetition for lists, cutting payload size 20-50% vs JSON

Strict syntax

One valid parse, no bare strings, no leniency, no edge cases

Mixed modes

Flat assignments, nested paths, and typed tables in one document

Token-efficient

30% fewer tokens and 40% smaller than JSON on average, while carrying more semantic information per byte. Less data, more meaning.

ODIN is not a standard. Standards imply committees, politics, and "implementation flexibility" that defeats interoperability. ODIN is a notation: one way to write it, one way to read it.

Design Principles

Self-describing: Types are in the data, not just schema
Line-based: One assignment per line, no wrapping
Flat: Explicit paths, no implicit tree traversal
Diffable: Meaningful diffs in version control
Token-efficient: Minimal structural overhead
Deterministic: Same data produces same output
Composable: Documents can chain and reference each other

Why Symbol Density?

ODIN uses prefix symbols (#, ##, #$, ?, @, ^, ~, !, *, -) rather than keywords.

For machines: Symbols are unambiguous markers. A parser sees # and knows a numeric value follows; at most one more character (#, $, or %) selects integer, currency, or percent. No reserved words to conflict with data values.

For AI/LLM processing: JSON's structural overhead (braces, brackets, colons, quotes around every key) is token waste. Compare:

JSON

{"vehicle": {"year": 2022, "make": "Honda", "model": "Accord"}}

ODIN

vehicle.year = #2022
vehicle.make = "Honda"
vehicle.model = "Accord"

The ODIN version spends more of its tokens on actual data and fewer on syntax. When processing millions of documents through LLMs, this matters.
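
The comparison above can be reproduced mechanically. The following is a minimal sketch, not an official ODIN SDK: the names odin_value and flatten are hypothetical, Python ints are mapped to ## and floats to # as an assumption of this sketch (the hand-written example above uses # for the year), and only lists of objects are handled.

```python
import json

def odin_value(v):
    """Render one Python scalar with an ODIN type prefix (hypothetical helper)."""
    if v is None:
        return "~"
    if isinstance(v, bool):            # check bool before int: bool subclasses int
        return "?true" if v else "?false"
    if isinstance(v, int):
        return f"##{v}"                # assumption: Python int -> ODIN integer
    if isinstance(v, float):
        return f"#{v}"                 # assumption: Python float -> ODIN number
    if isinstance(v, str):
        escaped = v.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
        return f'"{escaped}"'
    raise TypeError(f"unsupported value type: {type(v).__name__}")

def flatten(obj, prefix=""):
    """Yield one 'path = value' line per leaf of a nested dict."""
    if isinstance(obj, dict):
        for key, child in obj.items():
            yield from flatten(child, f"{prefix}.{key}" if prefix else key)
    elif isinstance(obj, list):
        for i, child in enumerate(obj):
            if not isinstance(child, dict):
                raise TypeError("this sketch only handles lists of objects")
            yield from flatten(child, f"{prefix}[{i}]")
    else:
        yield f"{prefix} = {odin_value(obj)}"

doc = json.loads('{"vehicle": {"year": 2022, "make": "Honda", "model": "Accord"}}')
lines = list(flatten(doc))
print("\n".join(lines))
```

Each leaf becomes exactly one line, which is what makes the output diffable line by line.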

Why Flat Paths?

Nested formats like JSON, YAML, and XML have a fundamental problem: merge conflicts. When two branches modify different fields in the same object, the entire object block conflicts because the closing braces don't match up.

Indentation-based nesting (YAML, Python-style) adds another problem: depth is ambiguous. A misplaced indent shifts an entire subtree. Whether a field is nested two or three levels deep is invisible without counting whitespace characters. Flat paths make depth explicit and unambiguous.

ODIN's flat paths eliminate both problems:

Git-friendly flat paths

; Branch A adds:
policy.coverage.collision = #500

; Branch B adds:
policy.coverage.comprehensive = #250

; Git merges cleanly, different lines, no conflict

Headers provide visual grouping for humans while maintaining the flat, diffable structure underneath:

Headers for readability

{policy.coverage}
collision = #500
comprehensive = #250
liability = #100000

Why Self-Describing Types?

This is the single biggest reason ODIN exists. JSON gives you six types: string, number, boolean, null, array, object. XML gives you essentially nothing. Everything is a string until an XSD says otherwise. Both rely on out-of-band documentation to tell you what the data actually means.

ODIN encodes type information directly on the wire. Every value tells you what it is. No schema lookup, no data dictionary, no guessing.

JSON: Documentation-Dependent

Looks reasonable, but...

{
  "policyNumber": "POL-2024-001",
  "premium": 1250.00,
  "discount": 0.125,
  "deductible": 500,
  "effective": "2024-06-15",
  "expires": "2025-06-15",
  "duration": "P1Y",
  "ssn": "123-45-6789",
  "drivers": 12,
  "active": true,
  "lastClaim": null
}

Now answer these questions without a data dictionary:

premium: dollars? cents? what currency? USD or CAD or EUR?
discount: 0.125. Is that 12.5% or 0.125%? A ratio? A factor?
deductible: integer dollars or float dollars?
effective / expires: dates? Or just strings that happen to look like dates?
duration: ISO 8601 duration string? Or some other format?
ssn: just a string. Nothing flags it as PII. Nothing prevents it from leaking to logs.
drivers: integer 12 or float 12.0? Different parsers will give you different answers.
lastClaim: explicitly null, or just missing? The schema can't tell you.

Every field needs documentation. Every parser needs a schema validator to round-trip the types back. Every integration project starts with a 50-page data dictionary that goes stale the moment someone adds a field.

ODIN: The Data Tells You

Same data, fully self-describing

{policy}
number = "POL-2024-001"
premium = #$1250.00:USD
discount = #%0.125
deductible = ##500
effective = 2024-06-15
expires = 2025-06-15
duration = P1Y
ssn = *"123-45-6789"
drivers = ##12
active = ?true
lastClaim = ~

Now read it again:

premium: #$1250.00:USD. Currency, US dollars, period. No ambiguity.
discount: #%0.125. Percent, written as a 0-1 decimal: 12.5%, not 0.125%.
deductible: ##500. Integer. Will never be parsed as 500.0.
effective: 2024-06-15. An actual date, not a string. ISO 8601 syntax baked into ODIN.
duration: P1Y. An actual duration value. Parsers return a typed Duration object.
ssn: *"...". The * modifier marks it as confidential, which covers PII. SDKs can auto-redact it in logs.
drivers: ##12. Integer, every SDK in every language returns an integer.
active: ?true. Explicitly boolean, not a string that happens to be "true".
lastClaim: ~. Explicitly null. Distinct from missing.

Why It Matters

When a Rust parser, a TypeScript parser, a Python parser, and a Java parser all read #$1250.00:USD, they all return a Currency object with amount=1250.00 and code="USD". The type isn't determined by the parser, the schema, the documentation, or the application code. It's determined by the bytes on the wire.

This eliminates an entire class of bugs:

No more "wait, is the API returning cents or dollars this week?"
No more JavaScript silently losing precision on large integer IDs because every number is a double
No more "2024-13-45" sneaking through as a "valid" string
No more PII leaking into logs because nothing flagged it as sensitive
No more "the schema says number, but the data has strings" mismatches

For insurance, healthcare, finance, and legal (industries where ambiguity has real cost), this is foundational. The data documents itself, and every system that reads it gets the same answer.

Value Types

Core Types

Type	Prefix	Format	Examples
String	(none)	Quoted	`"Honda"`, `"Price = $50"`
Boolean	`?`	`?` prefix is the canonical form; the grammar also accepts bare `true` / `false`	`?true`, `?false`
Null	`~`	Literal	`~`
Reference	`@`	Path	`@parties[0]`, `@vehicles[0].garaging`
Binary	`^`	Base64	`^SGVsbG8=`, `^sha256:ABC123...`
Verb	`%`	Expression	`%upper @name`, `%concat @first " " @last`

Strings are normally quoted on a single line. A triple-quoted (""") block is a multiline string: its content is kept verbatim across newlines until the closing """.

Numeric Types

Type	Prefix	Description	Examples
Number	`#`	Any numeric (integer or decimal)	`#2022`, `#-45.50`, `#1.2e10`
Integer	`##`	Whole number only	`##42`, `##-100`, `##0`
Currency	`#$`	Monetary value with precision preserved	`#$100.00`, `#$-50.25`, `#$99.99:USD`, `#$1.00000000:BTC`
Percent	`#%`	Percentage as decimal (0-1 range)	`#%0.15`, `#%1.0`, `#%0.055`

Numeric Rules

# accepts any numeric precision; use it for general numbers
## explicitly marks value as integer; decimal values are invalid
#% marks value as a percentage (stored as 0-1 decimal, where 0.5 = 50%)
Exponential notation (1.2e10) valid with # prefix
Negative values: place - after prefix (#-45, ##-100, #$-50.00)
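
The numeric rules above can be checked with a few regular expressions. This is a minimal sketch under stated assumptions: parse_numeric is a hypothetical name, not part of any ODIN SDK, values are held as Decimal to avoid float rounding, and currency (#$) is deliberately left out.

```python
import re
from decimal import Decimal

# One pattern per numeric prefix; each must match the whole text.
NUMERIC = [
    ("integer", re.compile(r"##(-?\d+)")),
    ("percent", re.compile(r"#%(-?\d+(?:\.\d+)?)")),
    ("number",  re.compile(r"#(-?\d+(?:\.\d+)?(?:[eE][+-]?\d+)?)")),
]

def parse_numeric(text):
    """Return (kind, value) for a #, ## or #% literal; raise ValueError otherwise."""
    for kind, pattern in NUMERIC:
        m = pattern.fullmatch(text)
        if m:
            body = m.group(1)
            return kind, (int(body) if kind == "integer" else Decimal(body))
    raise ValueError(f"not a valid ODIN numeric value: {text!r}")

kind, value = parse_numeric("#%0.15")
print(kind, value * 100)   # a percent is stored as 0-1, so multiply by 100 to display
```

Because ## only matches digits, a literal such as ##4.5 fails to parse instead of being silently truncated.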

Currency Codes

#$ marks a value as currency. Decimal precision is preserved from the source (minimum 2, up to 18 digits), making it suitable for both traditional finance and crypto.

An optional three-letter currency code suffix identifies the denomination. Codes follow ISO 4217 for fiat currencies (USD, EUR, GBP, JPY) and common ticker symbols for digital assets (BTC, ETH):

Currency codes

price = #$99.99:USD          ; US Dollars
cost = #$1234.56:EUR         ; Euros
amount = #$50.00:GBP         ; British Pounds
crypto = #$1.00000000:BTC    ; 8-digit precision for Bitcoin
local = #$100.00             ; No code = local currency assumed
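
A parser can keep the source precision by holding the amount as a decimal rather than a float. The sketch below is illustrative only: the Currency class and parse_currency function are hypothetical names, and the three-letter, uppercased code follows the currency_code rule in the grammar.

```python
import re
from dataclasses import dataclass
from decimal import Decimal
from typing import Optional

CURRENCY = re.compile(r"#\$(-?\d+(?:\.\d+)?)(?::([A-Za-z]{3}))?")

@dataclass(frozen=True)
class Currency:
    amount: Decimal          # Decimal keeps trailing zeros, so precision survives
    code: Optional[str]      # None when the literal carries no code

def parse_currency(text):
    """Parse a #$ literal such as '#$99.99:USD' (hypothetical helper)."""
    m = CURRENCY.fullmatch(text)
    if m is None:
        raise ValueError(f"not a valid ODIN currency value: {text!r}")
    amount, code = m.groups()
    return Currency(Decimal(amount), code.upper() if code else None)

btc = parse_currency("#$1.00000000:BTC")
print(btc.amount, btc.code)
```

Using Decimal means #$1.00000000:BTC round-trips with all eight decimal places, which a float would not guarantee.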

Temporal Types

Type	Prefix	Format	Examples
Date	(none)	`YYYY-MM-DD`	`2024-06-15`
Timestamp	(none)	ISO 8601	`2025-12-06T14:30:00Z`
Time	`T`	ISO 8601 time	`T14:30:00`, `T09:00:00.500`
Duration	`P`	ISO 8601 duration	`P6M`, `P1Y`, `PT30M`, `P1DT12H`
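
Dates and durations can be validated at parse time rather than treated as strings. The following sketch assumes hypothetical helper names (parse_date, parse_duration) and leans on Python's datetime.date for month and day range checks; rejecting a bare P with no components is a choice made here, not something the table states.

```python
import re
from datetime import date

DATE = re.compile(r"(\d{4})-(\d{2})-(\d{2})")
DURATION = re.compile(
    r"P(?:(\d+)Y)?(?:(\d+)M)?(?:(\d+)W)?(?:(\d+)D)?"
    r"(?:T(?:(\d+)H)?(?:(\d+)M)?(?:(\d+)S)?)?"
)
FIELDS = ("years", "months", "weeks", "days", "hours", "minutes", "seconds")

def parse_date(text):
    """Return a date; raise ValueError for bad syntax or an impossible day."""
    m = DATE.fullmatch(text)
    if m is None:
        raise ValueError(f"not a YYYY-MM-DD date: {text!r}")
    year, month, day = map(int, m.groups())
    return date(year, month, day)      # date() itself rejects month 13 or Feb 30

def parse_duration(text):
    """Return the components present in an ISO 8601 duration as a dict."""
    m = DURATION.fullmatch(text)
    if m is None or not any(m.groups()):
        raise ValueError(f"not a valid duration: {text!r}")
    return {name: int(v) for name, v in zip(FIELDS, m.groups()) if v is not None}

print(parse_date("2024-06-15"), parse_duration("P1DT12H"))
```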

Sections & Paths

Headers

Headers set a path prefix for subsequent assignments, providing visual grouping while maintaining flat, diffable structure.

Syntax	Meaning	Example
`{path}`	Set absolute prefix	`{vehicles[0]}`
`{.path}`	Set relative prefix	`{.garaging}`
`{}`	Reset to root	`{}`

Relative headers

{vehicles[0]}
vin = "1HGCM82633A004352"
year = #2022

{.garaging} ; resolves to vehicles[0].garaging
line1 = "123 Main Street"
city = "Columbus"

{.lienholder} ; resolves to vehicles[0].lienholder
name = "First National Bank"

{drivers[0]} ; absolute, resets context
name.first = "John"

Metadata & Root Reset

With no header, assignments are already at the document root. The {$} header lifts the context up to the reserved $ metadata layer, and that context persists like any other header - so you write {} to come back down to root.

Returning to root after metadata

title = "Quarterly Report"   ; no header yet -> root: title

{$}
odin = "1.0.0"
created = 2025-12-06         ; -> $.created (metadata)

{} ; back down to root
author = "Jane Smith"       ; -> author (root)

A blank line does not end a header - only another header or {} changes the context. Without the {} above, author would land in the metadata layer as $.author.
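
The header rules in this section amount to a small amount of state: the current prefix and the most recent absolute header. Here is a minimal sketch of that resolution, assuming a hypothetical function name resolve_paths and ignoring tabular headers and multiline strings.

```python
def resolve_paths(text):
    """Return the absolute path of every assignment, honoring headers."""
    absolute = ""   # most recent absolute header; relative headers hang off this
    context = ""    # prefix applied to the assignments that follow
    resolved = []
    for raw in text.splitlines():
        # Naive comment strip: enough here because only the text before "=" is used.
        line = raw.split(";", 1)[0].strip()
        if not line:
            continue                       # blank lines never change the context
        if line.startswith("{") and line.endswith("}"):
            body = line[1:-1].strip()
            if body.startswith("."):       # relative header, e.g. {.garaging}
                context = f"{absolute}{body}" if absolute else body[1:]
            else:                          # absolute header; "" for {} means root
                absolute = context = body
            continue
        path = line.split("=", 1)[0].strip()
        resolved.append(f"{context}.{path}" if context else path)
    return resolved

sample = """\
title = "Quarterly Report"
{$}
odin = "1.0.0"

{}
author = "Jane Smith"
{vehicles[0]}
vin = "1HGCM82633A004352"
{.garaging} ; resolves to vehicles[0].garaging
city = "Columbus"
{.lienholder}
name = "First National Bank"
"""
paths = resolve_paths(sample)
print(paths)
```

Note that {.lienholder} resolves against vehicles[0], not against vehicles[0].garaging, because relative headers are anchored to the last absolute header.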

Path Reference

Element	Syntax	Example
Simple path	`segment.segment`	`policy.effective`
Array access	`segment[n]`	`vehicles[0]`
Nested array	`segment[n].segment[n]`	`drivers[0].violations[2]`
Extension	`&domain.path`	`&com.acme.custom_field`

Extension paths use reverse domain notation to namespace custom fields:

Extension paths

&com.acme.priority = ##3
&org.opendata.region = "NA"
&io.example.custom = "value"
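
The simple, array, and nested-array forms in the Path Reference table can be split into segments with a small scanner. This is a sketch only: parse_path is a hypothetical name, the identifier pattern follows the rule that identifiers start with a letter or underscore, and extension (&) and metadata ($) paths are not handled.

```python
import re

NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_-]*")
INDEX = re.compile(r"\[(\d+)\]")

def parse_path(path):
    """Split a path into names (str) and array indices (int)."""
    parts, pos = [], 0
    while pos < len(path):
        if path[pos] == "[":
            m = INDEX.match(path, pos)
            if m is None or not parts:
                raise ValueError(f"bad array index at offset {pos} in {path!r}")
            parts.append(int(m.group(1)))
        else:
            if parts:                      # every name after the first needs a dot
                if path[pos] != ".":
                    raise ValueError(f"expected '.' at offset {pos} in {path!r}")
                pos += 1
            m = NAME.match(path, pos)
            if m is None:
                raise ValueError(f"expected identifier at offset {pos} in {path!r}")
            parts.append(m.group(0))
        pos = m.end()
    if not parts:
        raise ValueError("empty path")
    return parts

print(parse_path("drivers[0].violations[2]"))
```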

Arrays & Tables

Arrays are ordered collections accessed by zero-based index. They are created implicitly by assigning to indexed paths.

Array creation

items[0].name = "First"
items[0].price = #10.00
items[1].name = "Second"
items[1].price = #20.00

Array Rules

Zero-based: First element is [0]
Contiguous: Indices must be sequential with no gaps
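Object elements: In indexed-path form, array elements are objects with fields, not primitive values (arrays of primitives use the {path[] : ~} form described under Primitive Arrays)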
Implicit creation: Assigning to field[0] creates the array
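
The contiguity rule can be verified once all paths in a document are known. The following is a rough sketch with a hypothetical function name, array_index_gaps; it reports which indices are missing for each array rather than raising a parser error.

```python
import re
from collections import defaultdict

INDEXED = re.compile(r"\[(\d+)\]")

def array_index_gaps(paths):
    """Map each array path to the indices missing from 0..max (empty if none)."""
    seen = defaultdict(set)
    for path in paths:
        for m in INDEXED.finditer(path):
            # Everything before the bracket identifies the array itself.
            seen[path[:m.start()]].add(int(m.group(1)))
    gaps = {}
    for array, indices in seen.items():
        missing = sorted(set(range(max(indices) + 1)) - indices)
        if missing:
            gaps[array] = missing
    return gaps

print(array_index_gaps(["items[0].name", "items[0].price", "items[2].name"]))
```

An empty result means every array starts at [0] and has no holes.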

Empty and null arrays

items[] = ~                ; explicit empty/null array (no elements)

Tabular Mode

For arrays of objects whose fields are all primitive values, tabular syntax provides a compact representation that eliminates key repetition:

Tabular syntax

{line_items[] : sku, description, qty, price}
"ABC-001", "Widget", ##10, #$5.99
"ABC-002", "Gadget", ##5, #$12.50
"XYZ-100", "Cable, 6ft", ##20, #$3.25

This is equivalent to the expanded form:

Expanded equivalent

line_items[0].sku = "ABC-001"
line_items[0].description = "Widget"
line_items[0].qty = ##10
line_items[0].price = #$5.99
line_items[1].sku = "ABC-002"
line_items[1].description = "Gadget"
line_items[1].qty = ##5
line_items[1].price = #$12.50
; ...

Tabular Rules

Header declares columns: {path[] : col1, col2, col3} defines array path and column names
Rows follow header: Each non-blank, non-comment line is a row
Comma-separated values: Values separated by , with optional whitespace
Standard type prefixes: Values use #, ##, #$, #%, ?, @, ^, ~, dates, etc.
Quote strings with commas: Use "value, with comma" for strings containing commas
Null cells: Use ~ for null values
Absent cells: Empty cell (nothing between commas) means field is absent
Exit tabular mode: Next header {...} or document separator --- ends tabular section

Cell Value Semantics

Syntax	Meaning	Path Created?
`~`	Null value	Yes (with null)
`""`	Empty string	Yes (with "")
(empty)	Absent/missing	No

Relative column names reduce repetition when multiple columns share a common parent path:

Relative column names

; These two headers are equivalent:
{holders[] : name, address.line1, address.city, address.state, address.postal, active}
{holders[] : name, address.line1, .city, .state, .postal, active}
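
The tabular rules, cell semantics, and relative column names can be combined into a small expander that turns rows back into flat assignments. This is a sketch under assumptions: split_cells, expand_columns, and expand_table are hypothetical names, rows are passed in without comments, and cell values are copied through without being type-checked.

```python
def split_cells(row):
    """Split a row on commas that sit outside double-quoted strings."""
    cells, current, in_string, escaped = [], [], False, False
    for ch in row:
        if in_string:
            current.append(ch)
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
        elif ch == ",":
            cells.append("".join(current).strip())
            current = []
        else:
            in_string = ch == '"'
            current.append(ch)
    cells.append("".join(current).strip())
    return cells

def expand_columns(columns):
    """Resolve relative names like '.city' against the last dotted column's parent."""
    resolved, parent = [], ""
    for col in columns:
        if col.startswith("."):
            if not parent:
                raise ValueError(f"relative column {col!r} has no parent")
            col = parent + col
        elif "." in col:
            parent = col.rsplit(".", 1)[0]
        resolved.append(col)
    return resolved

def expand_table(array_path, columns, rows):
    """Return the flat 'path = value' lines equivalent to a tabular section."""
    names, lines = expand_columns(columns), []
    for i, row in enumerate(rows):
        for col, cell in zip(names, split_cells(row)):
            if cell:                       # empty cell: field absent, no path created
                lines.append(f"{array_path}[{i}].{col} = {cell}")
    return lines

rows = ['"ABC-001", "Widget", ##10, #$5.99', '"XYZ-100", "Cable, 6ft", ##20, #$3.25']
table = expand_table("line_items", ["sku", "description", "qty", "price"], rows)
print("\n".join(table))
```

A ~ cell still produces a line (the path exists with a null value), while an empty cell produces none.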

Primitive Arrays

For arrays of primitive values (no object fields), use the special ~ column marker with {path[] : ~} syntax. Each row is one value. Types can be mixed within the same array:

Primitive array syntax

; Integer array
{txIndexes[] : ~}
##8208220048659020
##2830423323628866
##3871696279527106

; String array
{tags[] : ~}
"urgent"
"important"
"reviewed"

; Mixed-type array
{values[] : ~}
"text"
##42
?true
~
#$9.99

Modifiers

Modifier	Symbol	Position	Meaning
Critical	`!`	After `=`	Field is required; absence fails validation
Confidential	`*`	After `=`, `!`, or `-`	Field contains sensitive data; systems should protect accordingly
Deprecated	`-`	After `=` or `!`	Field is obsolete; may be removed in future

Modifier Order (canonical): = [!][-][*][type_prefix]value. The grammar accepts the modifiers in any order, each at most once.

Modifier examples

field = !"value"         ; critical string
field = *"value"         ; confidential string
field = -"value"         ; deprecated string
field = !*"value"        ; critical + confidential string
field = !-"value"        ; critical + deprecated (still required but obsolete)
field = -*"value"        ; deprecated + confidential
field = !-*"value"       ; critical + deprecated + confidential
field = !#100            ; critical number
field = !##42            ; critical integer
field = !#$99.99         ; critical currency

Modifier Semantics

Critical (!): Field must be present and non-null; validation fails if absent
Confidential (*): Signals that the field contains sensitive data (SSN, account numbers, etc.). This is a hint to consuming systems: the value itself is transmitted as-is, but systems should mask it in logs, displays, and non-secure outputs.
Deprecated (-): Field exists for backward compatibility; consumers should migrate to alternatives
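
Because modifiers always precede the type prefix, they can be peeled off the front of a raw value before the value itself is parsed. The sketch below uses hypothetical names (split_modifiers, for_log) and a placeholder redaction string; how a real SDK masks confidential values is not specified here.

```python
MODIFIERS = {"!": "critical", "*": "confidential", "-": "deprecated"}

def split_modifiers(raw):
    """Split a raw value into (set of modifier names, remaining value text)."""
    flags, pos = set(), 0
    while pos < len(raw) and raw[pos] in MODIFIERS:
        name = MODIFIERS[raw[pos]]
        if name in flags:
            raise ValueError(f"modifier {raw[pos]!r} appears more than once")
        flags.add(name)
        pos += 1
    return flags, raw[pos:]

def for_log(raw):
    """Return a log-safe rendering: confidential values are masked."""
    flags, value = split_modifiers(raw)
    return "<redacted>" if "confidential" in flags else value

print(split_modifiers('!*"value"'))
print(for_log('*"123-45-6789"'))
```

A negative number such as #-45 is not affected: its minus sign follows the # prefix, so the scan stops before reaching it.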

Comments & Directives

Comments begin with semicolon (;) and extend to end of line:

Comments

; This is a full-line comment
name = "John Smith"     ; This is an inline comment

; Comments can appear anywhere
{policy} ; even after headers
effective = 2024-06-15  ; and after values

Comment Rules

Comments begin with ; character and extend to end of line only
No block comments: multi-line comments require ; on each line
Stripped in canonical form
Semicolons in quoted strings are not comments: desc = "a; b" is valid
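
The rule about semicolons inside quoted strings means a comment stripper has to track string state. A minimal sketch, assuming a hypothetical function name strip_comment and covering single-line quoted strings only (triple-quoted blocks and header braces are out of scope):

```python
def strip_comment(line):
    """Remove a trailing ; comment, leaving semicolons inside strings alone."""
    in_string = escaped = False
    for i, ch in enumerate(line):
        if in_string:
            if escaped:
                escaped = False        # character after a backslash is literal
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
        elif ch == '"':
            in_string = True
        elif ch == ";":
            return line[:i].rstrip()   # first ; outside a string starts the comment
    return line.rstrip()

print(strip_comment('name = "John Smith"     ; This is an inline comment'))
print(strip_comment('desc = "a; b"'))
```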

Lines starting with @ are directives (import, schema, conditional):

Directives

@import ./other.odin      ; Import directive
@schema https://...       ; Schema directive
@if condition             ; Conditional directive

Document Chaining

ODIN supports composing multiple documents into a single stream or referencing external documents. This enables layered data models where a base document is modified by subsequent documents.

The --- separator divides multiple ODIN documents within a single stream:

Document chaining

{$}
odin = "1.0.0"
id = "policy_base_001"
role = "base"

{policy}
number = "PAP-2024-001"
effective = 2024-06-15
term = P6M

{vehicles[0]}
vin = "1HGCM82633A004352"
year = #2022
make = "Honda"
model = "Accord"

---

{$}
odin = "1.0.0"
id = "endorsement_001"
role = "endorsement"
parent = @policy_base_001
effective = 2024-09-01

{vehicles[1]}
vin = "5YJSA1E26MF123456"
year = #2023
make = "Tesla"
model = "Model 3"
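
A consumer of a chained stream first splits it on the separator and then reads each document's metadata to find its role and parent. The following sketch uses hypothetical helpers (split_documents, read_metadata), keeps values as raw text, and does not handle comments trailing a header.

```python
def split_documents(stream):
    """Split a stream into lists of lines, one list per document."""
    docs, current = [], []
    for line in stream.splitlines():
        if line.strip() == "---":
            docs.append(current)
            current = []
        else:
            current.append(line)
    docs.append(current)
    return docs

def read_metadata(doc_lines):
    """Collect the raw key/value text assigned under the {$} header."""
    meta, in_meta = {}, False
    for raw in doc_lines:
        line = raw.strip()
        if not line or line.startswith(";"):
            continue
        if line.startswith("{"):
            in_meta = line == "{$}"    # any other header leaves the metadata layer
            continue
        if in_meta and "=" in line:
            key, value = (part.strip() for part in line.split("=", 1))
            meta[key] = value
    return meta

stream = """\
{$}
id = "policy_base_001"
role = "base"

{policy}
number = "PAP-2024-001"
---
{$}
id = "endorsement_001"
role = "endorsement"
parent = @policy_base_001
"""
metas = [read_metadata(d) for d in split_documents(stream)]
print(metas)
```

With the metadata in hand, the endorsement's parent reference can be matched against the base document's id to apply the documents in order.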

Document Roles

Roles are free-form strings: any value meaningful to your domain. Common patterns include:

Role	Purpose	Example Domain
`base`	Original document or transaction	Any
`amendment`	Modification to a base document	Contracts, legal
`revision`	Updated version of a prior document	Publishing, regulatory
`header`	Shared entity data that persists across transactions	Insurance, finance
`endorsement`	Mid-term modification	Insurance
`renewal`	New term based on prior agreement	Insurance, leasing
`cancellation`	Terminates an active record	Insurance, subscriptions
`correction`	Fixes errors in a prior document	Finance, healthcare
`supplement`	Adds information to a prior document	Legal, medical records

Encoding & File Metadata

Property	Value
Character set	UTF-8
Line endings	LF (U+000A) preferred; CRLF and bare CR accepted
Line length	Unlimited (no wrapping)
Case sensitivity	Case-sensitive throughout
Reserved words	None
File extension	`.odin`
MIME type	`text/odin-l`

File Naming Conventions

Pattern	Purpose	Example
`*.odin`	General ODIN data documents	`policy.odin`
`*.schema.odin`	ODIN Schema definitions	`auto.schema.odin`
`*.transform.odin`	ODIN Transform definitions	`xml-to-odin.transform.odin`

Document Metadata

The $ path is reserved for document-level metadata:

Document metadata

{$}
odin = "1.0.0"
id = "doc_abc123"
created = 2025-12-06T14:30:00Z
source.format = "al3"
source.version = "2024.1"
hash = ^sha256:e3b0c44298fc1c149afbf4c8996fb924...
signature = ^ed25519:SGVsbG8gV29ybGQhIFRoaXM...

EBNF Grammar

The complete formal grammar for ODIN-L 1.0, written in ISO 14977 EBNF. Every literal terminal is quoted; only { } and [ ] are used for repetition and optional groups. This grammar is the canonical source - it reflects the exact behavior of the Odin parser.

(* ===========================================================================
   ODIN-L 1.0 — Core Notation Grammar
   ---------------------------------------------------------------------------
   Canonical EBNF for the Open Data Interchange Notation Language.

   Notation: ISO 14977 EBNF.
     { x }   means zero or more repetitions of x
     [ x ]   means x is optional (zero or one)
     ( x )   groups
     a | b   alternation
     "lit"   terminal literal
     a , b   concatenation (comma optional in this file for readability)

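   Every terminal is quoted: double quotes by default, single quotes where the
   terminal itself contains a double quote. There are no bare repetition
   operators (no `*`, no `+`). Rules that cannot be spelled out as literals
   use ISO 14977 special sequences, written `? ... ?`.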

   This grammar reflects the exact behavior of the Odin parser. Deviations
   from this grammar are parser bugs, not language extensions.
   =========================================================================== *)


(* --------------------------------------------------------------------------
   1. DOCUMENT STRUCTURE
   -------------------------------------------------------------------------- *)

document          = [ bom ] { document_element } ;
bom               = ? UTF-8 BOM, U+FEFF, stripped if present at offset 0 ? ;

document_element  = blank_line
                  | comment_line
                  | header
                  | assignment
                  | directive
                  | document_separator ;

blank_line        = newline ;
document_separator = "---" newline ;

newline           = "\n" | "\r\n" | "\r" ;


(* --------------------------------------------------------------------------
   2. COMMENTS
   -------------------------------------------------------------------------- *)

(* Comments begin with ";" and consume the rest of the line. They may appear
   on their own line or trailing an assignment, header, or tabular row. They
   may NOT appear inside quoted strings, header braces, or array indices. *)

comment_line      = comment newline ;
comment           = ";" { char_except_newline } ;


(* --------------------------------------------------------------------------
   3. HEADERS
   -------------------------------------------------------------------------- *)

header            = "{" header_body "}" [ trailing_comment ] newline ;

header_body       = ""                              (* {} resets context     *)
                  | [ "." ] header_path [ tabular_clause ] ;

header_path       = meta_path | regular_path ;

meta_path         = "$" [ "." path_segment { "." path_segment | array_index } ] ;

regular_path      = path_segment { "." path_segment | array_index } ;

path_segment      = identifier | "&" identifier { "." identifier } ;

(* Tabular clause turns a header into an array-of-records (or primitive
   array) declaration. Subsequent lines until the next header are data rows. *)

tabular_clause    = ":" [ primitive_marker ] [ column_list ] ;
primitive_marker  = "~" ;
column_list       = column_name { "," column_name } [ "," ] ;
column_name       = "." identifier
                  | identifier [ "." identifier ] ;


(* --------------------------------------------------------------------------
   4. ARRAY INDICES
   -------------------------------------------------------------------------- *)

(* An array index is the bracketed segment that follows a path component. *)

array_index       = "[" array_index_body "]" ;

array_index_body  = ""                                    (* tabular sentinel *)
                  | digits                                (* normal index     *)
                  | jsonpath_filter                       (* filter expr      *)
                  | key_list ;                            (* keyed lookup     *)

jsonpath_filter   = "?" "(" { char_except ( ")" ) } ")" ;
key_list          = identifier { "," identifier } ;


(* --------------------------------------------------------------------------
   5. ASSIGNMENTS
   -------------------------------------------------------------------------- *)

assignment        = path "=" [ modifiers ] value [ trailing_directives ]
                    [ trailing_comment ] newline ;

path              = path_start { path_continuation } ;
path_start        = identifier | "$" | "&" identifier ;
path_continuation = "." path_element
                  | array_index
                  | ".@" identifier ;       (* XML attribute reference *)
path_element      = identifier | "true" | "false" ;

(* Identifiers permit ASCII letters, digits, underscores, and hyphens. They
   must begin with a letter or underscore. *)

identifier        = ident_start { ident_cont } ;
ident_start       = letter | "_" ;
ident_cont        = letter | digit | "_" | "-" ;


(* --------------------------------------------------------------------------
   6. MODIFIERS
   -------------------------------------------------------------------------- *)

(* Modifiers prefix the value. Each may appear at most once. Order is not
   semantically significant; the parser accepts them in any order. *)

modifiers         = { modifier } ;
modifier          = "!"   (* required / critical                  *)
                  | "*"   (* confidential — masked downstream     *)
                  | "-"   (* deprecated                            *) ;


(* --------------------------------------------------------------------------
   7. VALUES
   -------------------------------------------------------------------------- *)

(* Every value carries its own type via a one- or two-character prefix, with
   the exception of quoted strings, bare booleans (true / false), dates, and
   timestamps. Bare unquoted strings are NOT permitted as values. *)

value             = quoted_string
                  | multiline_string
                  | currency
                  | percent
                  | integer
                  | number
                  | boolean
                  | null
                  | reference
                  | binary
                  | timestamp
                  | date
                  | time
                  | duration
                  | extension_value ;


(* 7.1 Strings *)

quoted_string     = '"' { string_char | escape_seq } '"' ;
multiline_string  = '"""' { multiline_char | escape_seq } '"""' ;

string_char       = ? any character except '"', '\', '\n', '\r' ? ;
multiline_char    = ? any character except the closing '"""' or '\' ? ;

escape_seq        = "\" ( "\" | '"' | "n" | "t" | "r" | "0"
                        | "u" hex_digit hex_digit hex_digit hex_digit
                        | "U" hex_digit hex_digit hex_digit hex_digit
                              hex_digit hex_digit hex_digit hex_digit ) ;


(* 7.2 Numeric types *)

number            = "#"  [ "-" ] digits [ "." digits ] [ exponent ] ;
integer           = "##" [ "-" ] digits ;
currency          = "#$" [ "-" ] digits [ "." digits ] [ ":" currency_code ] ;
percent           = "#%" [ "-" ] digits [ "." digits ] ;

exponent          = ( "e" | "E" ) [ "+" | "-" ] digits ;
currency_code     = letter letter letter ;     (* ISO 4217, parser uppercases *)


(* 7.3 Boolean and null *)

boolean           = "?" "true" | "?" "false" | "true" | "false" ;
null              = "~" ;


(* 7.4 References *)

(* References point at another path within the document or its metadata.
   A leading "." denotes a relative path; "$" denotes the metadata root. *)

reference         = "@" reference_target ;
reference_target  = relative_ref | absolute_ref | meta_ref ;
relative_ref      = "." path_element { "." path_element | array_index } ;
absolute_ref      = path_element { "." path_element | array_index } ;
meta_ref          = "$" "." path_element { "." path_element } ;


(* 7.5 Binary *)

binary            = "^" [ algorithm ":" ] base64_content ;
algorithm         = identifier ;
base64_content    = { base64_char } [ "=" [ "=" ] ] ;
base64_char       = letter | digit | "+" | "/" ;


(* 7.6 Temporal types *)

(* Dates use ISO 8601. The parser semantically validates month and day
   ranges; values like 2024-13-01 or 2024-02-30 are rejected at parse
   time, not just at validation time. *)

date              = digit digit digit digit "-" digit digit "-" digit digit ;

timestamp         = date "T" digit digit ":" digit digit ":" digit digit
                    [ "." digits ]
                    [ tz_offset ] ;
tz_offset         = "Z"
                  | ( "+" | "-" ) digit digit [ ":" digit digit ] ;

time              = "T" digit [ digit ]
                    [ ":" digit [ digit ]
                      [ ":" digit [ digit ] [ "." digits ] ] ] ;

duration          = "P" [ digits "Y" ] [ digits "M" ] [ digits "W" ] [ digits "D" ]
                    [ "T" [ digits "H" ] [ digits "M" ] [ digits "S" ] ] ;


(* 7.7 Extension values *)

(* The "&" prefix marks an implementation-defined extension value. The
   payload after the prefix is parsed as identifier-dotted namespace plus
   any of the standard value forms. *)

extension_value   = "&" identifier { "." identifier } [ value ] ;


(* --------------------------------------------------------------------------
   8. TRAILING DIRECTIVES
   -------------------------------------------------------------------------- *)

(* Directives that follow a value attach metadata such as positional info
   for fixed-width input, length bounds, or transform flags. *)

trailing_directives = { ":" directive_name [ directive_value ] } ;
directive_name      = identifier ;
directive_value     = number | integer | quoted_string | identifier ;


(* --------------------------------------------------------------------------
   9. TOP-LEVEL DIRECTIVES
   -------------------------------------------------------------------------- *)

directive         = import_directive
                  | schema_directive
                  | conditional_directive ;

import_directive  = "@import" whitespace import_path
                    [ whitespace "as" whitespace identifier ]
                    [ trailing_comment ] newline ;
import_path       = quoted_string | unquoted_path ;
unquoted_path     = { char_except ( whitespace | newline | ";" ) } ;

schema_directive  = "@schema" whitespace url [ trailing_comment ] newline ;
url               = quoted_string | unquoted_path ;

conditional_directive
                  = "@if" whitespace condition [ trailing_comment ] newline ;
condition         = { char_except_newline } ;


(* --------------------------------------------------------------------------
   10. LEXICAL PRIMITIVES
   -------------------------------------------------------------------------- *)

letter            = "A" | "B" | "C" | "D" | "E" | "F" | "G" | "H" | "I"
                  | "J" | "K" | "L" | "M" | "N" | "O" | "P" | "Q" | "R"
                  | "S" | "T" | "U" | "V" | "W" | "X" | "Y" | "Z"
                  | "a" | "b" | "c" | "d" | "e" | "f" | "g" | "h" | "i"
                  | "j" | "k" | "l" | "m" | "n" | "o" | "p" | "q" | "r"
                  | "s" | "t" | "u" | "v" | "w" | "x" | "y" | "z" ;

digit             = "0" | "1" | "2" | "3" | "4" | "5" | "6" | "7" | "8" | "9" ;
digits            = digit { digit } ;

hex_digit         = digit
                  | "a" | "b" | "c" | "d" | "e" | "f"
                  | "A" | "B" | "C" | "D" | "E" | "F" ;

whitespace        = " " | "\t" ;

char_except_newline = ? any character except "\n" and "\r" ? ;
char_except         = ? any character except the listed exclusions ? ;

trailing_comment  = whitespace comment ;


(* --------------------------------------------------------------------------
   11. SEMANTIC CONSTRAINTS
   --------------------------------------------------------------------------
   These rules constrain otherwise-valid grammar productions. They are
   enforced by the parser even though they cannot be expressed in pure
   context-free EBNF.

   * No path may be assigned more than once within a document.            (P007)
   * Array indices for the same path must be contiguous starting at 0.    (P016)
   * Array indices may not exceed MAX_ARRAY_INDEX.                        (P015)
   * Total path nesting depth may not exceed MAX_NESTING_DEPTH.           (P010)
   * Quoted strings may not contain unescaped newlines.                   (P004)
   * Numbers may not have an exponent without digits.                     (P001)
   * Currency codes are normalized to uppercase.
   * Bare unquoted strings are forbidden as values; quote them.           (P002)
   * Comments are not recognized inside header braces; ";" inside "{ ... }"
     is treated as a literal character.
   * Relative headers (leading ".") resolve against the most recent
     ABSOLUTE header, not the most recent header of any kind.
   * Date and timestamp values are validated for month (01-12) and
     day-of-month at parse time.
   * Modifiers ("!", "*", "-") may only precede a value.
   * The "$" path is reserved for document metadata; assignments under
     "$.xxx" are stored on the document metadata map, not the assignment map.
   -------------------------------------------------------------------------- *)