The Ultimate Guide to TOON
TOON (Token-Oriented Object Notation) is a compact, deterministic, lossless representation of JSON designed for anyone working with LLM pipelines, structured data, or high-volume prompts. This guide explains how TOON works, how to convert between JSON and TOON, and how to validate TOON documents using the format's strict structural guarantees.
1. What Is TOON Format?
TOON is a lossless alternative representation of JSON. It does not invent new data types or alter semantics. It removes structural redundancy and uses a syntax intended to be cheaper for LLMs to read. TOON keeps:
- objects
- arrays
- strings
- numbers
- booleans
- null
The key difference: TOON expresses these elements using indentation and an optional "tabular array" notation that replaces repeated JSON field names with a schema declaration.
2. How TOON Works
TOON has two core mechanisms:
2.1 Indentation Instead of Braces
JSON:
{
  "user": {
    "id": 3,
    "name": "Ada"
  }
}
TOON:
user:
  id: 3
  name: Ada
No braces. No quotes unless required.
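The indentation rule can be sketched in a few lines of Python. This is an illustrative sketch, not a reference encoder: the function names `format_primitive` and `encode_object` are hypothetical, two spaces per nesting level is an assumption, and quoting and arrays are deliberately left out.

```python
def format_primitive(value):
    """Render a JSON primitive as text (quoting rules are omitted in this sketch)."""
    if value is None:
        return "null"
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return "true" if value else "false"
    return str(value)


def encode_object(obj, depth=0):
    """Return TOON-style lines for a dict whose values are dicts or primitives."""
    pad = "  " * depth  # assumption: two spaces per nesting level
    lines = []
    for key, value in obj.items():
        if isinstance(value, dict):
            # A nested object becomes a bare "key:" line followed by deeper lines.
            lines.append(f"{pad}{key}:")
            lines.extend(encode_object(value, depth + 1))
        else:
            lines.append(f"{pad}{key}: {format_primitive(value)}")
    return lines


toon_text = "\n".join(encode_object({"user": {"id": 3, "name": "Ada"}}))
print(toon_text)
```

Nesting depth is carried entirely by the leading spaces, which is why no closing delimiter is needed.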
2.2 Tabular Arrays for Uniform Objects
When JSON contains an array of objects with identical keys:
{
  "users": [
    { "id": 1, "name": "Alice", "role": "admin" },
    { "id": 2, "name": "Bob", "role": "user" }
  ]
}
TOON compresses this:
users[2]{id,name,role}:
  1,Alice,admin
  2,Bob,user
Key features:
- [2] = declared array length
- {id,name,role} = schema
- rows = field-aligned values
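A rough Python sketch of how a uniform array might be turned into the tabular form. The helper name `encode_table` is hypothetical, and the sketch assumes values are plain strings and numbers that need no quoting.

```python
def encode_table(key, rows, delimiter=","):
    """Encode a non-empty list of dicts with identical keys as a tabular block."""
    if not rows:
        raise ValueError("this sketch only handles non-empty arrays")
    fields = list(rows[0].keys())
    for row in rows:
        # Every row must carry the same keys in the same order as the schema.
        if list(row.keys()) != fields:
            raise ValueError("rows are not uniform")
    # Header: key, declared length, then the field list in braces.
    header = f"{key}[{len(rows)}]{{{delimiter.join(fields)}}}:"
    body = ["  " + delimiter.join(str(row[f]) for f in fields) for row in rows]
    return "\n".join([header] + body)


users = [
    {"id": 1, "name": "Alice", "role": "admin"},
    {"id": 2, "name": "Bob", "role": "user"},
]
print(encode_table("users", users))
```

The declared length is computed from the number of rows, so the header cannot drift from the data it describes.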
3. TOON vs JSON: Why Use TOON?
3.1 Efficiency
JSON repeats field names in every row. TOON does not. Token usage typically drops by 30–60%. On time-series workloads, reductions over 59% are common.
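To see where the saving comes from, the sketch below compares the size of the same synthetic data in compact JSON and in tabular form. Character count is used only as a crude stand-in for token count (real numbers depend on the tokenizer), and the 100-row dataset is made up for illustration.

```python
import json

# Synthetic, uniform data: 100 rows sharing the same three keys.
rows = [{"id": i, "name": f"user{i}", "role": "user"} for i in range(1, 101)]

# Compact JSON repeats "id", "name", and "role" in every row.
json_text = json.dumps({"users": rows}, separators=(",", ":"))

# Tabular form states the field names once, in the header.
fields = list(rows[0])
toon_lines = [f"users[{len(rows)}]{{{','.join(fields)}}}:"]
toon_lines += ["  " + ",".join(str(r[f]) for f in fields) for r in rows]
toon_text = "\n".join(toon_lines)

saving = 1 - len(toon_text) / len(json_text)
print(f"JSON: {len(json_text)} chars, tabular: {len(toon_text)} chars, saving: {saving:.0%}")
```

The saving grows with the number of rows and with the length of the field names, since those are exactly the characters the header avoids repeating.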
3.2 Accuracy in LLM outputs
Retrieval benchmarks show:
- TOON accuracy: 73.9%
- JSON accuracy: 69.7%
Models benefit from explicit structure and fewer distractor tokens.
3.3 Built-in Validation
TOON encodes its own schema:
- expected row count
- expected field count
- required field order
A TOON validator catches malformed data instantly.
3.4 Minimal Quoting
Strings require quotes only when they would otherwise be ambiguous. This reduces token noise and helps models focus.
4. TOON Format Documentation (Concise)
4.1 Objects
Indentation defines structure:
settings:
  enabled: true
  retries: 3
4.2 Primitive Arrays
names[3]: mei,alicia,jamal
4.3 Tabular Arrays
items[4]{id,title,price}:
  1,Book,12.50
  2,Pen,1.20
  3,Notebook,4.75
  4,Map,9.00
4.4 Delimiters
Use:
- comma
- tab
- pipe
Tabs tokenize best.
4.5 Quoting Rules
Quote a string only when it:
- is empty
- contains special characters
- starts or ends with whitespace
- resembles a number, a boolean, or null
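These rules translate into a small predicate. The sketch below is one possible reading of them: the function name `needs_quotes` is hypothetical, and the exact set of "special characters" is an assumption chosen for illustration, not taken from a specification.

```python
import re

# Assumption: text that looks like a JSON number must be quoted to stay a string.
_NUMBER = re.compile(r"-?\d+(\.\d+)?([eE][+-]?\d+)?")

# Assumption: an illustrative set of structurally significant characters.
_SPECIAL = set(',:"\\\n\t|[]{}')


def needs_quotes(s, delimiter=","):
    """Return True if the string would be ambiguous when written bare."""
    if s == "":
        return True  # empty string
    if s != s.strip():
        return True  # leading or trailing whitespace
    if s in ("true", "false", "null"):
        return True  # would be read back as a boolean or null
    if _NUMBER.fullmatch(s):
        return True  # would be read back as a number
    if delimiter in s or any(ch in _SPECIAL for ch in s):
        return True  # contains a special character
    return False


for sample in ["Ada", "", " padded", "42", "true", "a,b", "hello world"]:
    print(repr(sample), needs_quotes(sample))
```

Interior spaces alone do not force quoting under this reading, which is what keeps ordinary names and short phrases bare.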
5. TOON Examples
Example: Mixed Structure
config:
  version: 1
  owners[2]: chen,mina
  servers[3]{id,host,port}:
    1,api,443
    2,cache,6379
    3,worker,9000
Example: Log Export
export:
  generatedAt: 2025-02-10
  entries[3]{id,status,latencyMs}:
    1,ok,12
    2,ok,14
    3,fail,200
6. Validation: Using a TOON Validator
A TOON validator examines:
- indentation correctness
- strict delimiter usage
- valid quoting
- correct [N] row count
- correct field count for each row
- valid primitive types
Because TOON encodes its own schema, validation is deterministic.
Common failure cases caught by validators:
- a row missing a column
- an extra column in a row
- missing rows (truncation)
- stray indentation
- malformed field names
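The row-count and field-count checks can be sketched as follows. This is a simplified illustration, not a full validator: `validate_table` is a hypothetical name, it handles a single tabular block only, and it assumes no quoted value contains the delimiter.

```python
import re

# Matches headers such as: entries[3]{id,status,latencyMs}:
HEADER = re.compile(r"^\s*([A-Za-z_]\w*)\[(\d+)\]\{([^}]*)\}:\s*$")


def validate_table(text, delimiter=","):
    """Return a list of error messages; an empty list means the block is valid."""
    lines = text.splitlines()
    if not lines:
        return ["empty input"]
    match = HEADER.match(lines[0])
    if match is None:
        return ["malformed header"]
    declared = int(match.group(2))
    fields = match.group(3).split(delimiter)
    rows = [ln for ln in lines[1:] if ln.strip()]
    errors = []
    # Truncated or padded output shows up as a row-count mismatch.
    if len(rows) != declared:
        errors.append(f"expected {declared} rows, found {len(rows)}")
    # Missing or extra columns show up as a per-row field-count mismatch.
    for number, row in enumerate(rows, start=1):
        count = len(row.strip().split(delimiter))
        if count != len(fields):
            errors.append(f"row {number}: expected {len(fields)} fields, found {count}")
    return errors


truncated = "entries[3]{id,status,latencyMs}:\n  1,ok,12\n  2,ok,14"
print(validate_table(truncated))
```

Because both expectations are stated in the header itself, the check needs no external schema file.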
7. JSON ↔ TOON Conversion
7.1 JSON to TOON
Use:
- CLI converters
- library calls
- online JSON-to-TOON converters
This is commonly done before inserting structured data into an LLM prompt.
7.2 TOON to JSON
Decoders auto-detect the format.
This supports:
- reading LLM output
- reintegrating structured data
- downstream processing
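As an illustration of the reverse direction, the sketch below turns one tabular block back into JSON-compatible Python data. The names `parse_value` and `decode_table` are hypothetical, and the sketch assumes a single block whose quoted values never contain the delimiter.

```python
import json
import re

HEADER = re.compile(r"^\s*([A-Za-z_]\w*)\[(\d+)\]\{([^}]*)\}:\s*$")


def parse_value(token):
    """Map a bare token back to a JSON primitive."""
    if token in ("true", "false"):
        return token == "true"
    if token == "null":
        return None
    if re.fullmatch(r"-?\d+", token):
        return int(token)
    if re.fullmatch(r"-?\d+\.\d+", token):
        return float(token)
    if len(token) >= 2 and token[0] == token[-1] == '"':
        return json.loads(token)  # quoted string, assuming JSON-style escapes
    return token


def decode_table(text, delimiter=","):
    """Decode one tabular block into {key: [row dicts]}."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    if not lines:
        raise ValueError("empty input")
    match = HEADER.match(lines[0])
    if match is None:
        raise ValueError("malformed tabular header")
    key, declared = match.group(1), int(match.group(2))
    fields = match.group(3).split(delimiter)
    rows = [ln.strip().split(delimiter) for ln in lines[1:]]
    if len(rows) != declared:
        raise ValueError("row count does not match declared length")
    return {key: [dict(zip(fields, map(parse_value, row))) for row in rows]}


block = "entries[3]{id,status,latencyMs}:\n  1,ok,12\n  2,ok,14\n  3,fail,200"
print(json.dumps(decode_table(block)))
```

Rejecting a row-count mismatch at decode time is what lets a pipeline detect truncated LLM output before it reaches downstream code.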
7.3 Determinism
For any input x, decode(encode(x)) returns the normalized JSON form of x.
Non-JSON values (NaN, dates, bigints) are normalized to JSON-compatible types during encoding.
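What normalization could look like is sketched below in Python. The specific mappings are assumptions made for illustration (non-finite floats to null, dates to ISO 8601 strings, integers beyond the 53-bit safe range to strings), and `normalize` is a hypothetical helper. A `json` round trip stands in for encode and decode.

```python
import json
import math
from datetime import date, datetime

MAX_SAFE_INT = 2**53 - 1  # assumption: largest integer kept as a number


def normalize(value):
    """Map a Python value onto JSON-compatible types (assumed rules)."""
    if value is None or isinstance(value, (bool, str)):
        return value
    if isinstance(value, int):
        # Assumption: oversized integers become strings to avoid precision loss.
        return value if abs(value) <= MAX_SAFE_INT else str(value)
    if isinstance(value, float):
        # Assumption: NaN and infinities have no JSON form, so they become null.
        return value if math.isfinite(value) else None
    if isinstance(value, (date, datetime)):
        return value.isoformat()
    if isinstance(value, dict):
        return {str(k): normalize(v) for k, v in value.items()}
    if isinstance(value, (list, tuple)):
        return [normalize(v) for v in value]
    raise TypeError(f"unsupported type: {type(value).__name__}")


sample = {"when": date(2025, 2, 10), "ratio": float("nan"), "big": 2**60, "ok": True}
normalized = normalize(sample)
# Once normalized, a round trip through a lossless format changes nothing.
round_tripped = json.loads(json.dumps(normalized))
print(round_tripped == normalized)
```

Normalization is idempotent under these rules: applying it to already-normalized data returns the same data, which is what makes the round trip deterministic.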
8. When TOON Should Be Used
Best-fit scenarios:
- large uniform arrays
- structured tabular data
- datasets intended for LLM ingestion
- reproducible evaluation benchmarks
- applications that require strict schema adherence
Example high-value use cases:
- time-series analytics
- embeddings metadata
- evaluation datasets
- multi-item reasoning workloads
- synthetic dataset generation
9. When TOON Isn't Ideal
Avoid TOON when:
- the data structure is deeply nested
- uniformity is low (mixed objects)
- CSV is sufficient (purely flat tables)
- CPU-bound latency is a higher priority than token count
TOON's sweet spot is uniform arrays with primitive fields.
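A quick way to test whether a given array sits in that sweet spot is to check uniformity up front. The sketch below is one hypothetical check (`is_tabular` is not a library function): it requires a non-empty list of non-empty objects that share the same keys and hold only primitive values.

```python
PRIMITIVES = (str, int, float, bool, type(None))


def is_tabular(items):
    """Return True if items is a uniform array of objects with primitive fields."""
    if not isinstance(items, list) or not items:
        return False
    if not all(isinstance(item, dict) and len(item) > 0 for item in items):
        return False
    keys = set(items[0])
    return all(
        # Same key set in every row, and no nested objects or arrays as values.
        set(item) == keys and all(isinstance(v, PRIMITIVES) for v in item.values())
        for item in items
    )


uniform = [{"id": 1, "host": "api"}, {"id": 2, "host": "cache"}]
mixed = [{"id": 1, "host": "api"}, {"id": 2, "port": 6379}]
nested = [{"id": 1, "tags": ["a", "b"]}]
print(is_tabular(uniform), is_tabular(mixed), is_tabular(nested))
```

A pipeline could use a check like this to choose per array between the tabular form and a fallback such as plain JSON.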
10. Real Engineering Notes
TOON is valuable because:
- LLMs emit it more reliably than JSON
- Validation is stricter than with plain JSON
- Tabular arrays reduce hallucinations
- It avoids "JSON fixing" scripts
- It integrates into pipelines with minimal friction
- It is human-readable but machine-strict
- It acts as a drop-in compression layer over JSON without semantic loss