License: Apache License 2.0
Steward: Hackolade (IntegrIT SA/NV) -- pending governance evolution
Repository: github.com/xdbml/xdbml-spec
Canonical home: xdbml.org
Abstract
xDBML (eXtended Database Markup Language) is an open markup language for describing the shape of structured data across heterogeneous storage technologies -- relational databases, document stores, columnar warehouses, search engines, graph databases, serialization formats, and API contracts. xDBML is a strict superset of DBML 3.13.6 (the Database Markup Language maintained by Holistics under Apache 2.0) and extends it with constructs for nested hierarchical structures, polymorphism, polyglot vocabulary, first-class JSON data, named reusable types (object-shaped and scalar), AI-readiness metadata, explicit namespace-level declarations, precise relationship cardinality, property-bearing edges for graph models, views, and a module system for splitting schemas across multiple files.
xDBML is designed from the ground up for AI-assisted data modeling and AI-mediated schema interchange. The language matches the way modern LLMs already describe schemas: nested structures are first-class, polymorphism uses the same oneOf/anyOf/allOf vocabulary as JSON Schema, paths into nested fields use unambiguous dotted notation, and every construct accepts synonyms:, business_term:, tags:, and granularity: settings that let natural-language queries resolve to canonical schema elements without guesswork. An LLM asked to "find the monthly recurring revenue field" should not have to infer from column naming conventions; xDBML lets the schema declare the synonyms explicitly.
xDBML is designed as the schema layer of a modern data architecture: upstream of physical engine DDLs (SQL, Cassandra CQL, MongoDB validators, Cypher), downstream of contract layers (ODCS) and semantic layers (OSI, dbt MetricFlow). It serves human authors, AI-assisted data modeling workflows, and bidirectional translation with the major serialization formats (Avro, Parquet, JSON Schema, OpenAPI).
xDBML extends DBML into a unified metadata and semantic data modeling language with richer support for validation, semantics, cardinality, annotations, and AI-friendly metadata, while deliberately staying readable and Git-friendly. It is designed for AI-assisted data modeling and AI-mediated schema interchange.
Strict-superset claim: every valid DBML document parses correctly under xDBML rules, and every DBML construct (used in a way valid in DBML) means the same thing in xDBML as in DBML. When a file declares xdbml: 0.2 (or later), it has opted into xDBML's extended semantics; the semantic meaning of constructs in that context may legitimately differ from DBML's, because the file is no longer claiming to be DBML.
What's new in v0.2: the module system (§26), allowing schemas to be split across multiple files via use and reuse directives, with optional inline clone blocks for file autonomy; and scalar Named Types (§14.7), extending Named Types to describe single-shape declarations rather than only object-shaped structures.
Table of contents
- Introduction
- Document structure
- Lexical conventions
- Version declaration
- Project
- Container
- Entity
- Field
- Index
- Checks -- entity-level constraints
- Relationship and cardinality
- Edge -- relationships with properties
- View -- derived shapes with source queries
- Named type
- Enum
- TablePartial and TableGroup
- Diagram View
- Note
- Path syntax
- Polymorphism
- JSON as a first-class data type
- BSON scalar types
- AI-readiness and semantic-interop settings
- Validation constraints
- Records -- sample data
- Module system
- AST representation
- Round-trip semantics
- Relationship to other standards
- Conformance
Appendix A. Reserved keywords
Appendix B. Lexical grammar reference
Appendix C. Worked end-to-end examples
Appendix D. Compatibility with DBML 3.13.6
Appendix E. Sources and acknowledgments
1. Introduction
xDBML is born to solve the same frustrations DBML was designed for (difficulty building a mental "big picture" of database structure, opaque field meanings, ER diagrams and DDL that are hard to read and usually outdated), plus additional pains DBML cannot solve: AI-readiness metadata, LLM-portable schemas, nested structures and polymorphism, schema drift across polyglot stacks, and property-bearing graph edges. For the full positioning relative to DBML, see the FAQ.
1.1 Scope
xDBML describes the structural and semantic layer of data: entities, fields, types, relationships, classifications, validation rules, and AI-readiness metadata -- the shape and meaning of data.
xDBML is the format for humans and AI to exchange schemas with an xDBML tool. It is not the round-trip format between a data modeling tool and a target technology. That tool-to-target round-trip happens in native DDL or schema -- the tool understands the target's complete capability surface (partitioning, sharding, tablespaces, replication, PL/SQL, triggers, identity columns, advanced constraints, refresh schedules) and preserves it in the tool's own canonical model. xDBML carries the parts of that model with meaning across boundaries.
In scope:
- Entity, table, collection, record, and similar structural constructs (§7)
- Fields with their declarative type, default, nullability, and validation constraints (§8)
- Named reusable types and composition (§14)
- Relationships and cardinality, including cross-engine references (§11)
- Nested structures: objects, arrays, maps, sets, tuples, and polymorphism (§20, §21)
- Property-bearing graph edges (§12)
- Views as declarative shape + opaque source query (§13)
- Indexes in declarative form (§9)
- AI-readiness metadata: synonyms, business terms, classification tags, granularity hints (§23)
- Custom properties for organization-specific extensions
- Target name declarations and per-Container target overrides (§5)
- Notes and prose annotations at every level (§18)
Out of scope, by design:
- Engine operational features. Partitioning strategies, sharding configuration, tablespaces, storage models, replication topology, materialized view refresh schedules, clustering keys, time-travel configuration. These stay native to each target.
- Procedural code. PL/SQL, T-SQL, stored procedures, triggers, server-side functions, computed columns with engine-specific functions. xDBML expresses declarative shape and metadata, not behavior.
- Identity and sequencing details. xDBML can declare a field as a primary key with auto-generation; the exact sequence configuration is engine-specific.
- Wire-protocol and evolution rules. Avro schema evolution rules, Protobuf reserved fields, GraphQL federation directives, OpenAPI endpoint operations.
- Query languages. SQL, Cypher, MQL, GraphQL queries. xDBML generates schemas; consumers write queries.
- Computational meaning. Measures, metrics, aggregations belong in OSI, dbt MetricFlow, LookML.
- Contractual meaning. Data quality rules, SLAs, ownership, pricing belong in ODCS.
- Inferential reasoning. OWL and knowledge graph standards operate at a different layer.
- Provisioning and operations. Terraform, operators, security policies handle the infrastructure layer.
Round-trip is lossy by design with respect to target-specific features. An xDBML schema lifted from Oracle DDL preserves the logical content (tables, columns, types, declarative constraints, relationships, indexes in declarative form). It does not preserve operational specifics (partitions, storage configuration, triggers, PL/SQL) -- those are deliberately outside xDBML's scope and remain in the data modeling tool's native representation.
The language is also:
- Polyglot by default -- no construct privileges relational databases over document, columnar, graph, or exchange-format targets.
- Closed-world -- a field not declared in an entity cannot exist in valid data.
- Human-authorable, machine-readable, and AI-generable in equal measure.
xDBML is a format, not a data modeling tool. The playground at xdbml.org demonstrates the language and works well for learning, prototyping, and small schemas, but sustained enterprise data modeling -- live-database reverse-engineering, target-native DDL generation across many engines, schema diffing and impact analysis, lineage and governance integration, and multi-user collaboration -- requires a purpose-built data modeling tool like ER/Studio, Erwin Data Modeler, or Hackolade. xDBML is designed as the textual exchange format and AI interaction surface that those tools can read, write, and round-trip with: complementing them, not replacing them.
1.2 Design principles
The language follows seven principles that govern every construct:
- Strict superset of DBML. Every valid DBML 3.13.6 document is a valid xDBML document with identical semantics.
- Polyglot by default. No construct privileges relational databases over document, columnar, graph, or other paradigms.
- Vocabulary-permissive at every level. Container, Entity, and type-level keywords accept target-native synonyms.
- Type-name pass-through. Scalar type names are accepted as-is and lowered by target-specific generators. BSON-specific types (
objectId,Decimal128,BinData, etc.) are recognized scalar names; see §22. - Explicit is better than implicit, where it matters. Path syntax requires explicit array iteration in relationships; polymorphic paths require explicit alternative selectors.
- Declarative metadata is welcome; computational and contractual meaning are not. xDBML carries the declarative meaning of stored data -- names, descriptions, synonyms, glossary references, classification tags, validation constraints -- and leaves computational meaning (measures, metrics, aggregations) to semantic-layer formats and contractual concerns (SLAs, ownership, pricing) to data-contract formats like ODCS.
- Versioned evolution. Every xDBML document declares the spec version it conforms to.
1.3 A first look
A complete xDBML document spanning Oracle (RDBMS), MongoDB (document), Avro (events), and an LPG graph model:
xdbml: 0.1
Project ecommerce {
targets: [Oracle, MongoDB, Neo4j]
Note: 'Reference example for the xDBML v0.1 specification'
}
Type Address {
street varchar [not null]
city varchar [not null]
zip varchar
country varchar [default: 'US']
}
Type MonetaryAmount {
amount Decimal128 [not null]
currency string [not null, pattern: '^[A-Z]{3}$', minLength: 3, maxLength: 3,
note: 'ISO 4217 currency code']
}
Container core [type: schema, target: Oracle] {
Table customers {
id int [pk]
email varchar [unique, not null, pattern: '^[^@]+@[^@]+$']
display_name varchar [not null]
primary_address Address
created_at timestamp [granularity: second, default: `SYSTIMESTAMP`]
}
}
Container orders_store [type: database, target: MongoDB] {
Collection orders {
_id objectId [pk]
customer_id int32 [not null]
placed_at Date [granularity: second]
total MonetaryAmount
line_items array [
line_item object {
sku string [not null]
quantity int32 [not null, minimum: 1]
unit_price MonetaryAmount
}
]
payment_method oneOf {
card object { last4 string [maxLength: 4], brand string }
bank object { iban string }
wallet object { provider string, account string }
} [discriminator: method_kind]
}
}
Container catalog [type: schema, target: Oracle] {
Table products {
sku varchar [pk]
name varchar [not null, synonyms: ['product name', 'item name']]
price MonetaryAmount
metadata json {
weight_grams int
dimensions object { length int, width int, height int }
tags array [varchar]
}
}
View top_sellers [materialized: true, refresh_schedule: 'daily'] {
source_query: '''
SELECT p.sku, p.name, COUNT(*) AS order_count
FROM products p
JOIN orders_store.orders o ON o.line_items.sku = p.sku
GROUP BY p.sku, p.name
ORDER BY order_count DESC
FETCH FIRST 100 ROWS ONLY
'''
sku varchar [pk]
name varchar
order_count int
}
}
Container social [type: database, target: Neo4j] {
Edge FOLLOWS [source: core.customers, target: core.customers,
source_cardinality: '0..*', target_cardinality: '0..*'] {
since date [not null]
is_close boolean [default: false]
}
}
Ref: orders_store.orders.customer_id > core.customers.id [source: '1..*', target: '1..1']
Ref: orders_store.orders.line_items.[*].sku > catalog.products.skuThis single document expresses an Oracle relational schema, a MongoDB document collection with nested arrays and polymorphism, a catalog with both an entity and a materialized view, an LPG social-graph keyspace with a property-bearing FOLLOWS edge, two reusable named types, and cross-container relationships including one that traverses an array. It would round-trip to Oracle DDL, MongoDB collection validators, JSON Schema, Avro, Neo4j Cypher schema, and the ODCS schema section.
2. Document structure
An xDBML document consists of, in order:
- A version declaration (§4), optionally followed by an
experimental:opt-in clause. - Zero or one
Projectdeclaration (§5). - Any number of top-level declarations in any order:
Container(§6),Entityat project level (§7),Edge(§12),View(§13),Type(§14),Enum(§15),Ref(§11),TablePartial,TableGroup(§16),DiagramView(§17),Note(§18).
Comments may appear anywhere whitespace is permitted.
3. Lexical conventions
3.1 Whitespace and comments
Whitespace is insignificant except as a token separator.
// Single-line comment, runs to end of line
/* Block comment;
may span lines */3.2 Identifiers
A bare identifier matches [A-Za-z_][A-Za-z0-9_]*. Quoted identifiers are written in double quotes and may contain any character: "my-table", "first name".
3.3 String literals
Single-quoted ('a string') with \' and \\ escapes. Triple-quoted multi-line ('''line one\nline two''') normalizes indentation.
3.4 Number literals
Conventional syntax: 42, 3.14, -100, 1.5e10.
3.5 Expression literals
Backtick-quoted strings pass through as engine-native expressions: `SYSTIMESTAMP`, `uuid_generate_v4()`.
3.6 Block delimiters
Curly braces { ... } group block bodies. Square brackets [ ... ] carry settings and array type expressions.
3.7 Setting separator (the colon)
xDBML uses a colon : to separate a setting name from its value. The colon is required wherever a setting has both a name and a value:
Project ecommerce {
targets: Oracle // setting name "targets", value Oracle
Note: 'reference example' // setting name "Note", value "reference example"
}
Container core [type: schema] { // setting name "type", value schema
Table customers {
Note: 'customer master' // setting name "Note", value "customer master"
id int [pk]
email varchar [unique, not null, default: 'unknown'] // setting name "default", value "unknown"
}
}The colon is not used for:
- Construct keywords:
Project ecommerce {,Container core [...] {,Table customers {,Type Address {. The construct keyword and the declared name are juxtaposed without a colon. - Field type declarations:
id int,name varchar. The field name and type are juxtaposed. - Boolean-style flags inside bracket lists:
[pk],[not null],[unique],[increment]. These are markers, not key-value pairs.
One exception: the Ref: statement keyword includes the colon as part of its syntax (Ref: source > target). This is a DBML inheritance preserved for compatibility, even though Ref is a top-level declaration rather than a setting.
3.8 Case sensitivity
xDBML follows a two-tier case-sensitivity policy that balances tolerance (accept variant casings users naturally type) with precision (preserve identifier casing that target databases require).
Keywords are case-insensitive. All language keywords -- construct names (Project, Container, Table, Entity, Type, Edge, View, Note), settings (targets, target, default, pattern, not null, pk), reserved type-related words (int, varchar, objectId, Decimal128), and structural words (oneOf, anyOf, array, object) -- match any casing. The parser accepts note, Note, NOTE, and nOtE as the same keyword.
Identifiers are case-sensitive. Names you give to entities, fields, containers, named types, edges, views, and enums preserve user-supplied case. customer_email and Customer_Email are different identifiers. This is essential because target databases have different casing conventions:
- MongoDB: typically camelCase (
orderId,customerEmail) - Cassandra: lower_snake_case strictly enforced
- Oracle: UPPER_SNAKE_CASE by default
- PostgreSQL: lower_snake_case by convention
A roundtrip from one of these targets to another may require explicit identifier renaming; xDBML does not silently re-case identifiers.
Target name values are case-insensitive and alias-normalized. Values declared in targets: (Project) and target: (Container) settings are matched case-insensitively and resolved through an alias table (§5.1). So targets: postgres, targets: Postgres, targets: PostgreSQL, and targets: POSTGRESQL all normalize to the canonical PostgreSQL.
Documentation convention. This specification and all official examples use a consistent stylistic convention even though the parser is tolerant:
- Construct keywords are capital-initial:
Project,Container,Table,Entity,Type,Edge,View - Block-body settings starting with capital are major annotations:
Note:at Project/Container/Table/Entity level - Bracket-list settings are lowercase:
[pk],[not null],[default: 'x'],[note: 'inline note'] - Type and value keywords are lowercase:
int,varchar,string,array,object,oneOf
This pattern is inherited from DBML and recommended for new schemas. The parser accepts variant casings but emitting tools, formatters, and the spec itself follow this convention.
4. Version declaration
xdbml: 0.1SemVer-style MAJOR.MINOR[.PATCH] numbering. Appears at the top of the file, before any other construct.
4.1 Parser behavior
| Document declares | Parser supports | Behavior |
|---|---|---|
| (no declaration) | any | Treated as DBML 3.13.6 -- no xDBML extensions recognized. |
xdbml: 0.1 | 0.1 | Parsed with v0.1 feature set. |
xdbml: 0.1 | 0.2+ | Parsed successfully (MINOR-version backward compatibility). |
xdbml: 0.2 | 0.1 | Warning + refuse to parse. |
4.2 Experimental features
xdbml: 0.1
experimental: [graph_path_expressions, temporal_validity]Without the opt-in, experimental constructs are not recognized. No backward-compatibility guarantee for experimental features.
5. Project
Project ecommerce {
targets: Oracle
Note: 'Reference example'
}For polyglot schemas spanning multiple targets, targets: accepts an array:
Project polyglot_ecommerce {
targets: [Oracle, MongoDB, Avro]
Note: 'Customer master in Oracle, orders in MongoDB, events in Avro'
}At most one Project may appear. Recognized settings: targets, Note. The legacy database_type: setting is also accepted for DBML compatibility (see §5.3).
5.1 Target name vocabulary
Target names are bare identifiers when they have no spaces or special characters (Oracle, PostgreSQL, MongoDB, Snowflake, Neo4j, Avro, Cassandra, BigQuery, Databricks). They must be quoted when they contain spaces or punctuation ('SQL Server', 'Cosmos DB', 'JSON Schema').
Target name values are matched case-insensitively and resolved through the alias table below to a canonical form. The canonical form is what appears in the normalized AST and what generators key off of.
| Canonical name | Recognized aliases |
|---|---|
Oracle | -- |
PostgreSQL | Postgres, pg |
'SQL Server' | MSSQL, 'Microsoft SQL Server', 'T-SQL' |
MySQL | -- |
MariaDB | -- |
SQLite | -- |
Snowflake | -- |
Databricks | -- |
BigQuery | 'Google BigQuery' |
Redshift | 'Amazon Redshift' |
'Synapse Analytics' | 'Azure Synapse' |
TimescaleDB | -- |
ClickHouse | -- |
MongoDB | Mongo |
DocumentDB | 'AWS DocumentDB' |
'Cosmos DB' | Cosmos, 'Azure Cosmos DB' |
Couchbase | -- |
RavenDB | -- |
DynamoDB | -- |
Cassandra | 'Apache Cassandra' |
ScyllaDB | -- |
Neo4j | -- |
Memgraph | -- |
Neptune | 'Amazon Neptune' |
JanusGraph | -- |
Avro | 'Apache Avro' |
Parquet | 'Apache Parquet' |
Protobuf | 'Protocol Buffers', Proto |
'JSON Schema' | -- |
OpenAPI | Swagger |
GraphQL | -- |
Elasticsearch | -- |
OpenSearch | -- |
The list is non-exhaustive. Target name values outside the table are accepted by the parser but may produce generic output from target-aware generators. The official list grows over time as the ecosystem adds first-class support for new targets.
5.2 Container target resolution
Each Container has an effective target -- the storage technology, exchange format, or API platform it lowers to. The effective target is resolved as follows:
- If the Container declares
target:explicitly, that value is the effective target. - Otherwise, if the Project declares exactly one entry in
targets:, the Container inherits that single target. - Otherwise (Project declares multiple targets but Container declares none), the Container's effective target is undefined and the parser MUST emit an error.
- If the Container declares a
target:value that does not appear in the Project'stargets:list, the parser MUST emit an error.
5.3 DBML compatibility: the database_type: alias
xDBML is a strict superset of DBML 3.13.6, which uses database_type: to declare a single target engine at the Project level. xDBML preserves this keyword as a legacy alias for single-target schemas:
// DBML-compatible (legacy)
Project simple {
database_type: 'PostgreSQL'
}
// xDBML preferred equivalent
Project simple {
targets: PostgreSQL
}Compatibility rules:
database_type:accepts only a single string or bare identifier value. Array values are not permitted (DBML never supported them).- A Project that declares
database_type: Xis semantically equivalent to a Project that declarestargets: X. Parsers MUST normalize the AST to usetargets:with a single-element list. - A Project MUST NOT declare both
database_type:andtargets:. Doing so produces a parser error (ambiguous intent). - New schemas SHOULD prefer
targets:overdatabase_type:. The legacy form remains supported for round-trip fidelity with DBML tooling and for documents authored before xDBML v0.1. - The
database_type:keyword has no Container-level equivalent. Container-level target declarations always usetarget:.
// Single-target project; Container target is implicit
Project simple {
targets: PostgreSQL
}
Container app [type: schema] { // implicit target: PostgreSQL
Table users { ... }
}
// Polyglot project; each Container must declare its target
Project polyglot {
targets: [Oracle, MongoDB]
}
Container core [type: schema, target: Oracle] { // explicit
Table customers { ... }
}
Container orders_store [type: database, target: MongoDB] { // explicit
Collection orders { ... }
}6. Container
The namespace-level construct between Project and Entity.
6.1 Declaration
Container core [type: schema] {
Note: 'Core domain entities'
Entity customers { ... }
Entity accounts { ... }
}6.2 Keyword synonyms
| Keyword | Native to |
|---|---|
Container | xDBML canonical |
Schema | Oracle, PostgreSQL, SQL Server, Snowflake |
Database | MongoDB, Cosmos DB |
Keyspace | Cassandra, ScyllaDB |
Namespace | Avro, ORC, GraphQL |
Dataset | BigQuery |
Bucket | Couchbase |
6.3 The type: setting
Names the target-native flavor (schema, database, keyspace, namespace, dataset, bucket, index).
6.4 Container settings
Container core [type: schema, target: Oracle] { ... }
Container kv_store [type: keyspace,
target: Cassandra,
replication: '{class: NetworkTopologyStrategy, dc1: 3, dc2: 2}',
durable_writes: true] { ... }The target: setting declares the specific target technology for the Container. It is optional in single-target projects (where it is inherited from Project.targets) and required in polyglot projects with more than one declared target. See §5.2 for the resolution rule.
Other Container settings are open-vocabulary key-value pairs. Generators consume what they understand.
6.5 Optional containers
Entities are not required to belong to a container. An entity belongs to at most one container.
6.6 Cross-container references
Relationships, edges, and inline references use dotted notation with explicit container prefix:
Container core [type: schema] {
Table customers { id int [pk] }
}
Container sales [type: schema] {
Table orders {
id int [pk]
customer_id int [ref: > core.customers.id]
}
}
Ref: sales.orders.customer_id > core.customers.idThe dotted path is container_name.entity_name.field_path.
7. Entity
A named field-bearing construct.
7.1 Declaration
Table customers {
id int [pk]
email varchar [unique, not null]
name varchar
}7.2 Keyword synonyms
| Keyword | Native to |
|---|---|
Entity | xDBML canonical |
Table | Oracle, RDBMS, Cassandra, Hive, BigQuery, Databricks, Snowflake,... |
Collection | MongoDB, Cosmos DB, Couchbase |
Record | Avro, Parquet, GraphQL |
7.3 Schema-qualified declaration
Table core.users { ... }Produces an Entity named users inside an implicit container core. Explicit Container { Entity ... } blocks are equivalent and preferred for new xDBML.
7.4 Aliases
Entity very_long_user_table as U { ... }
Ref: U.id < posts.user_id7.5 Entity settings
| Setting | Meaning |
|---|---|
note | Documentation |
headercolor | Visualization color |
synonyms | Alternative names (§23.1) |
business_term | Glossary reference (§23.2) |
8. Field
field_name field_type [field_settings]8.1 Field types
- Scalar type name:
int,varchar,decimal(19,4),timestamp, etc. - BSON scalar type (§22):
string,int32,int64/long,double,bool,null,objectId,Decimal128,BinData,Date,Timestamp,Regex,JavaScript,MinKey,MaxKey,Symbol,DBPointer. - Complex type expression (§8.2):
object,array,map,set. - Polymorphism construct (§20):
union,oneOf,anyOf,allOf. - Named type reference (§14): bare name of a previously declared
Type. - JSON-with-schema (§21):
json,jsonb,variant.
8.2 Complex type keywords
| Concept | Primary | Aliases |
|---|---|---|
| Structured value | object | struct, record |
| Ordered list | array | list |
| Heterogeneous tuple | array (with positional indices) | list |
| Unique collection | set | -- |
| Key-value collection | map | dict, dictionary |
8.3 Object type
address object {
street string [not null]
city string [not null]
zip string
country varchar [default: 'US']
} [not null, note: 'Primary mailing address']8.4 Array of scalars
tags array [varchar]
scores array [int [not null]]8.5 Array of named objects
line_items array [
line_item object {
sku varchar [not null]
quantity int [not null, default: 1]
unit_price decimal(10,2) [not null]
}
] [not null]8.6 Heterogeneous tuple
addresses array [
[0] billing object { street string, city string, zip string }
[1] shipping object { street string, city string, zip string }
]Positions are zero-indexed and must form a contiguous range starting at 0.
8.7 Maps and sets
flags map [string, boolean]
permissions set [varchar]8.8 Common field settings
| Setting | Behavior |
|---|---|
pk / primary key | Primary key |
null / not null / required | Cardinality |
unique | Single-field unique constraint |
default: value | Default value |
increment | Auto-increment |
note: 'string' | Documentation |
ref: <op> target | Inline foreign key (§11) |
check: expr | Field-level check constraint |
| Validation constraints | §24 |
| AI-readiness | §23 |
required is accepted as a synonym for not null and is provided for authors familiar with JSON Schema, OpenAPI, and Avro vocabularies where required is the conventional keyword. Parsers MUST normalize required to not null in the AST. Tools that emit xDBML SHOULD use not null as the canonical form. Both forms are semantically identical: the field must be present and must have a non-null value.
8.9 Cardinality vs. type-level nullability
null / not null settings express cardinality -- whether a field is required to be present and whether null is acceptable. The token null inside a union [ ... ] list (§20.1) expresses type membership -- that the null literal is one of the permitted runtime types. These are distinct concepts.
9. Index
Table bookings {
id int
country varchar
booking_date date
created_at timestamp
indexes {
(id, country) [pk]
created_at [name: 'created_at_index']
booking_date
(country, booking_date) [unique]
booking_date [type: hash]
(`id*2`)
}
}9.1 Index forms
| Form | Example |
|---|---|
| Single field | created_at [name: 'idx_created'] |
| Composite | (col1, col2) |
| Expression | (`expr`) |
9.2 Index settings
| Setting | Values |
|---|---|
type | btree, hash, engine-specific |
name | string |
unique | flag |
pk | flag |
note | string |
9.3 Indexes on nested fields
Index paths use the path syntax of §19:
indexes {
shipping.address.country // object navigation
line_items.sku // implicit array iteration
line_items.[*].sku // explicit array iteration
addresses.[0].city // tuple positional
flags.["dark_mode"] // map literal key
method.card.brand // polymorphic alternative
}For homogeneous arrays, the implicit iteration form is equivalent to the explicit form. For heterogeneous tuples and Ref source paths, explicit syntax is required.
10. Checks -- entity-level constraints
A checks { } block declares multi-column constraint expressions on an entity. It is the entity-level counterpart to the field-level check: setting (§8.8): where the field-level form expresses a constraint on a single field, the entity-level block expresses constraints that span two or more fields.
The block is a peer of the indexes { } block (§9) -- both are entity-level declarations that constrain or annotate the entity's data shape.
10.1 Declaration
Entity users {
id int [pk]
wealth decimal(15,2)
debt decimal(15,2)
checks {
`debt + wealth >= 0` [name: 'chk_positive_net_worth']
`wealth >= 0` [name: 'chk_non_negative_wealth']
`debt >= 0`
}
}Each line inside checks { } is a check expression -- a backtick-wrapped boolean expression in the target engine's expression language. xDBML treats expressions as opaque strings; the parser does not interpret or validate them. Generators emit them verbatim as the target engine's CHECK constraint or equivalent.
10.2 Check settings
Each check expression accepts optional bracket settings:
| Setting | Value | Purpose |
|---|---|---|
name | string | Name of the constraint as it appears in the generated DDL (CONSTRAINT chk_name CHECK (...)). If omitted, the generator chooses a name. |
note | string | Free-text annotation. Carries through to generators that support constraint comments. |
When name: is omitted, generators MAY produce a deterministic synthetic name (e.g., <entity>_chk_<index>) for reproducible DDL.
10.3 Check expressions and the target engine
Check expressions are not portable across targets in general. An expression valid in PostgreSQL may not parse in MongoDB, and a MongoDB document-level validator uses an entirely different expression syntax.
xDBML preserves the expression as written. Round-trip behavior:
- Single-target schemas: the expression is emitted verbatim to the target.
- Polyglot schemas: the expression is emitted to targets where it is syntactically valid; for other targets, generators MAY emit a comment placeholder or omit the constraint with a warning.
For maximum portability, authors are encouraged to use expressions that work across the target families they care about, or to use field-level check: settings (§8.8) which are more often portable.
10.4 Relationship to field-level check:
The field-level check: setting (§8.8) is preferred for single-column constraints because it sits next to the field it constrains:
Entity products {
price decimal(10,2) [check: `price > 0`]
}The entity-level checks { } block is for constraints that cannot be expressed at field level -- typically because they reference two or more fields:
Entity reservations {
start_date date
end_date date
checks {
`start_date <= end_date` [name: 'chk_valid_date_range']
`end_date - start_date <= 30` [name: 'chk_max_30_days']
}
}Both forms generate CHECK constraints in SQL targets. Field-level constraints are scoped to a single column in the generated DDL; entity-level constraints reference multiple columns by name.
10.5 Relationship to validation constraints (§24)
xDBML provides two complementary constraint mechanisms:
Validation constraints (§24) -- declarative shape constraints (
pattern,minLength,maxLength, range, enum membership). Portable across targets. Lower naturally to JSON Schema, OpenAPI, Avro validators, and SQLCHECKfor the constraints SQL can express.Check expressions (this section, plus §8.8 for field-level) -- arbitrary boolean expressions in the target engine's language. Maximum expressiveness, but coupled to the target.
Use validation constraints when the shape constraint can be expressed declaratively (email matches pattern X, age between 0 and 150). Use check expressions when you need cross-column logic or engine-specific predicates (CASE WHEN status = 'shipped' THEN tracking_number IS NOT NULL END).
11. Relationship and cardinality
11.1 Cardinality operators
| Operator | Maximum-cardinality |
|---|---|
< | one-to-many |
> | many-to-one |
- | one-to-one |
<> | many-to-many |
11.2 Declaration forms
// Long form
Ref name_optional {
schema1.entity1.field1 < schema2.entity2.field2
}
// Short form
Ref name_optional: schema1.entity1.field1 < schema2.entity2.field2
// Inline form
Entity schema2.entity2 {
id integer
field2 integer [ref: > schema1.entity1.field1]
}Relationships are declared at project level. They may span containers.
11.3 Cross-container references
Ref: sales.orders.customer_id > core.customers.idDotted notation: container_name.entity_name.field_path. For relationships into nested fields, the path continues into the nested structure.
11.4 Composite foreign keys
Ref: merchant_periods.(merchant_id, country_code) > merchants.(id, country_code)11.5 Relationships into nested fields
Ref: orders.shipping.address.country > countries.iso_code
Ref: customers.addresses.[0].zip > postal_codes.code11.6 Explicit .[*] required for array crossings
When a Ref source path crosses a homogeneous array, the explicit .[*] form is required:
Ref: orders.line_items.[*].sku > products.sku // valid
Ref: orders.line_items.sku > products.sku // rejected11.7 Explicit cardinality
The operators express maximum cardinality with optionality inferred from FK nullability. For precise cardinality on both ends, the source: and target: settings carry UML-style 'min..max' strings:
// Compact form: operators imply defaults
Ref: orders.customer_id > customers.id
// Explicit form: precise cardinality on both ends
Ref: orders.customer_id > customers.id [source: '1..*', target: '1..1']
// Optional parent (each Pet may have no Owner)
Ref: pets.owner_id > people.id [source: '0..*', target: '0..1']Recognized cardinality strings:
| String | Meaning |
|---|---|
'1..1' | Exactly one |
'0..1' | Zero or one |
'1..*' | One or more |
'0..*' | Zero or more |
'N..M' | N to M (non-negative integers; * allowed for M) |
Alternative four-key form:
Ref: pets.owner_id > people.id [min_source: 0, max_source: '*',
min_target: 0, max_target: 1]Both forms produce the same AST. The compact-string form is canonical.
11.8 Default cardinality inference
| Operator | FK nullable? | Source | Target |
|---|---|---|---|
> | no | '1..*' | '1..1' |
> | yes | '0..*' | '1..1' |
< | no | '1..1' | '1..*' |
< | yes | '1..1' | '0..*' |
- | no | '1..1' | '1..1' |
- | yes | '1..1' | '0..1' |
<> | (n/a) | '0..*' | '0..*' |
11.9 Relationship settings
Ref: products.merchant_id > merchants.id [delete: cascade, update: no action]
Ref: legacy.user_id > users.id [inactive, color: '#999999', note: 'historical FK; superseded by audit_log.user_ref']| Setting | Values |
|---|---|
delete / update | cascade, restrict, set null, set default, no action |
color | #rgb or #rrggbb |
source / target | Cardinality strings (§11.7) |
min_source / max_source / min_target / max_target | Cardinality components |
inactive | Flag (no value). Marks the relationship as inactive; visualization tools render it as a dotted line and may exclude it from cardinality computation. The relationship still exists structurally; this is a visualization and documentation hint, not a removal. |
note | Free-text annotation. |
The inactive flag is useful for documenting historical or deprecated foreign keys that remain in the schema but are no longer enforced or relied upon by application code. It has no effect on DDL generation in targets that don't distinguish active from inactive relationships.
Settings are not supported on inline ref: declarations.
12. Edge -- relationships with properties
Labeled Property Graph (LPG) databases -- Neo4j, Memgraph, JanusGraph, Amazon Neptune (LPG mode) -- and RDF-star treat relationships as first-class entities that carry properties.
12.1 Declaration
Edge KNOWS [source: Person, target: Person] {
since date [not null]
intimacy int [minimum: 0, maximum: 10]
context varchar
}
Edge RATED [source: User, target: Movie] {
rating int [not null, minimum: 1, maximum: 5]
rated_at timestamp [granularity: second, not null]
review varchar
}An Edge declaration carries a name, source and target settings pointing to connected entities, and a body of fields.
12.2 Cardinality on edges
Edges use source_cardinality and target_cardinality settings (distinct from source: and target: which name the connected entities):
Edge OWNS [source: Person, target: Pet,
source_cardinality: '0..*', target_cardinality: '0..1'] {
acquired_date date
purchase_price decimal(10,2)
}12.3 Directionality
Edge FRIENDS_WITH [source: Person, target: Person, undirected: true] {
since date
}When undirected: true, source and target are interchangeable.
12.4 Multiple edges between the same entities
Edge LIKES [source: User, target: Post] { liked_at timestamp }
Edge SHARED [source: User, target: Post] { shared_at timestamp, audience varchar }
Edge BLOCKED [source: User, target: Post] { blocked_at timestamp, reason varchar }12.5 Edges with named types and partials
Type RelationshipMetadata {
created_at timestamp [granularity: second]
created_by varchar
source_system varchar
}
Edge KNOWS [source: Person, target: Person] {
~RelationshipMetadata
since date
intimacy int
}12.6 Indexes on edges
Edge RATED [source: User, target: Movie] {
rating int [not null]
rated_at timestamp
indexes {
rated_at
(rated_at, rating)
}
}12.7 Round-trip across targets
| Target | Edge representation |
|---|---|
| Neo4j / LPG | Native relationship type with properties |
| RDF-star | Quoted triple with annotation properties |
| RDBMS | Junction table with FKs plus property columns |
| MongoDB | Array of subdocuments on source-side entity |
| Avro | Edge events with source_id, target_id, property fields |
| RDF (plain) | Reified statements (lossy) |
13. View -- derived shapes with source queries
13.1 Declaration
View active_customers [materialized: false] {
source_query: '''
SELECT id, email, name, created_at
FROM customers
WHERE deleted_at IS NULL
'''
id int [pk]
email varchar
name varchar
created_at timestamp [granularity: second]
}
View monthly_revenue [materialized: true,
refresh_schedule: 'daily',
source_database: 'Oracle'] {
source_query: '''
SELECT TRUNC(placed_at, 'MM') AS month, SUM(total) AS revenue
FROM orders
GROUP BY TRUNC(placed_at, 'MM')
'''
month date [pk]
revenue decimal(15,2) [granularity: month]
}13.2 Materialized vs. virtual
materialized: false(default) -- virtual view; query executes on access.materialized: true-- materialized view; query precomputed and stored.
| Setting | Meaning |
|---|---|
refresh_schedule | Refresh schedule (hourly, daily, weekly, engine-specific) |
refresh_on | Source entities triggering refresh |
storage_options | Engine-specific storage hints |
13.3 Source query as opaque metadata
xDBML does not parse the source_query string -- it's target-specific. The query is captured verbatim and passed to generators. xDBML can express views for non-SQL query languages (Cypher, MongoDB aggregation pipelines, KSQL) by storing the engine-native query and tagging with source_database.
13.4 Views in containers
Container analytics [type: schema] {
Entity orders { ... }
View revenue_by_month [materialized: true] {
source_query: '...'
month date
revenue decimal(15,2)
}
}Cross-container references treat views and entities identically.
13.5 View settings
| Setting | Meaning |
|---|---|
source_query | Defining query (opaque string) |
materialized | true or false (default false) |
refresh_schedule | Refresh schedule |
refresh_on | Source entities |
source_database | Database flavor of source query |
note | Documentation |
synonyms, business_term | AI-readiness (§23) |
13.6 Round-trip across targets
| Target | View representation |
|---|---|
| Oracle / PostgreSQL / SQL Server / Snowflake | CREATE VIEW or CREATE MATERIALIZED VIEW |
| MongoDB | db.createView() for virtual; $merge for materialized |
| BigQuery | CREATE VIEW / CREATE MATERIALIZED VIEW |
| Databricks | CREATE OR REPLACE VIEW / CREATE MATERIALIZED VIEW |
| Elasticsearch | Continuous transforms (materialized) |
14. Named type
A Type declaration defines a named, reusable type expression.
14.1 Declaration
Type Address {
street varchar [not null]
city varchar [not null]
zip varchar
country varchar [default: 'US']
}
Type MonetaryAmount {
amount Decimal128 [not null]
currency string [not null, pattern: '^[A-Z]{3}$', minLength: 3, maxLength: 3]
}14.2 Reference
Table orders {
shipping_address Address
billing_address Address
total MonetaryAmount [not null]
}
Entity customers {
addresses array [address Address]
}Built-in type keywords always take precedence; named types cannot shadow them.
14.3 Composition and recursion
Type ContactInfo {
email varchar
phone varchar
address Address
}
Type TreeNode {
value int
children array [child TreeNode] // self-reference
}14.4 Type settings
Type Address [note: 'Used across customer and order entities'] {
...
}Recognized: note, synonyms, business_term.
14.5 Scope
Project-scoped. Name collisions across Type, Entity, View, Edge, Enum, TablePartial are rejected at parse time.
14.6 Relationship to TablePartial
Type when... | TablePartial when... |
|---|---|
| Reusing a value shape | Reusing a set of entity fields |
| Reference from nested structures | Splice fields at entity top level |
| Cross-references into type's fields | Fields appear as direct columns |
| One canonical definition for many | Each entity gets own copy of fields |
14.7 Scalar named types (new in v0.2)
The Type declarations in §14.1 through §14.6 describe object-shaped reusable structures -- a Type body contains one or more named inner fields. v0.2 extends Named Types to also support scalar shapes -- a Type whose entire content is a single field's worth of declaration.
Syntax:
Type Email varchar [pattern: '^[^@]+@[^@]+$', tags: ['pii'], note: 'Standard enterprise email']
Type CountryCode varchar [pattern: '^[A-Z]{2}$', minLength: 2, maxLength: 2]
Type Percentage decimal(5,2) [minimum: 0, maximum: 100]The Type's base is a built-in type (or another Named Type), and the bracketed settings carry the full field-level surface (validation constraints, notes, AI-readiness metadata, custom properties).
Usage: scalar named types are used identically to object-shaped Types -- as the type of a field declaration:
Type Email varchar [pattern: '^[^@]+@[^@]+$', tags: ['pii']]
Entity users { id int [pk], email Email }
Entity admins { id int [pk], email Email }
Entity contacts { id int [pk], email Email }The three entities each get an email field of type varchar with the pattern validation and tags attached.
Importability: scalar Named Types are importable via the type element type in the module system (§26):
reuse { type Email } from './enterprise-types'
Entity employees {
work_email Email
}Relationship to field-level imports: scalar Named Types and field-level imports (§26.9) both follow the declaration-vs-placement pattern: the construct declares a reusable shape; placement happens by using the shape's name as a field's type. The difference is in where the reusable shape originates: scalar Named Types are explicitly declared as Type; field-level imports clone a shape from an existing entity field. Both mechanisms are valid; authors choose based on context.
15. Enum
enum job_status {
created [note: 'Waiting to be processed']
running
done
failure
}
enum grade {
"A+"
"A"
"A-"
}Values with non-identifier characters must be quoted. Enums may be container-qualified.
16. TablePartial and TableGroup
16.1 TablePartial
TablePartial base_template {
id int [pk, not null]
created_at timestamp [default: `SYSTIMESTAMP`]
updated_at timestamp [default: `SYSTIMESTAMP`]
}
Entity users {
~base_template
name varchar
}Conflict resolution: local entity definitions override partials; among partials, last-injected wins. TablePartials may also be injected into Edge declarations.
16.2 TableGroup
TableGroup e_commerce [color: '#3498DB', note: 'Commerce entities'] {
orders
products
inventory
}TableGroup settings:
| Setting | Values |
|---|---|
color | #rgb or #rrggbb. Visualization color for the group's frame in diagram renderers. |
note | Free-text annotation. |
The color setting controls how renderers visually frame the group's member entities. Implementations that don't support group colors ignore the setting.
16.3 Naming
TablePartial and TableGroup retain their DBML names without Entity* synonyms.
17. Diagram View
DiagramView full_view {
Tables { * }
Notes { * }
TableGroups { * }
Containers { * }
Views { * }
Edges { * }
}
DiagramView sales_view {
Tables { users; orders; products }
}Categories: Tables, Notes, TableGroups, Containers, Views, Edges.
18. Note
Inline notes are settings or blocks on Project, Container, Entity, Field, Index, Edge, View, Type, TableGroup. Standalone notes act as canvas annotations:
Note design_decision {
'''
We chose to denormalize line items into the orders entity
rather than maintaining a separate line_items entity because
read patterns dominate writes by 100:1.
'''
}19. Path syntax
19.1 Segment forms
| Segment | Meaning |
|---|---|
.field_name | Navigate into named field |
.[N] | Specific positional array element (zero-indexed) |
.[*] | All array elements (iteration) |
.["literal_key"] | Map literal key |
.[*] (map context) | All map values |
."quoted name" | Non-identifier name |
.AlternativeName | Polymorphic alternative selector |
19.2 Path examples
customers.addresses.[0].city // tuple positional
orders.line_items.[*].sku // every sku across line items
events.[42].payload.user_id // 43rd event's payload
flags.["dark_mode"] // specific map key
data."user.id" // quoted field name
payments.method.card.brand // polymorphic alternative
sales.orders.line_items.[*].sku // cross-container with array crossing19.3 Implicit array iteration in indexes
Allowed only in index paths. Ref paths and tuple positions require explicit syntax.
19.4 JSONPath alias acceptance
JSONPath-style brackets (no leading dot) are accepted at parse time and normalized to the dot-prefixed canonical form:
addresses[0].city // parsed; normalized to addresses.[0].city19.5 Formatter behavior
Explicit .[*] whenever path depth exceeds two segments; implicit auto-iteration otherwise. Tuple positions and quoted segments always emitted explicitly.
20. Polymorphism
20.1 Scalar type union -- union
Entity records {
score union [int, decimal, null]
legacy_id union [string, int] [not null]
tags array [union [string, int]]
}Element ordering is significant. The runtime type acts as the discriminator.
20.2 Structural polymorphism -- oneOf, anyOf, allOf
Entity payments {
amount decimal(10,2) [not null]
method oneOf {
card object { last4 varchar(4), brand varchar }
bank object { iban varchar }
wallet object { provider varchar, account varchar }
} [discriminator: method_kind]
}20.3 Keywords
| Keyword | Semantics |
|---|---|
oneOf | Exactly one of N alternatives |
anyOf | One or more alternatives may apply |
allOf | Value must satisfy all listed schemas |
20.4 Paths through polymorphism -- strict alternative selector
indexes {
method.card.brand // valid
method.brand // rejected -- selector required even when unambiguous
}Paths through union (scalar) are rejected.
20.5 Polymorphism inside arrays
events array [
event oneOf {
user_event object { type varchar, user_id objectId, action varchar }
item_event object { type varchar, item_id objectId, qty int }
} [discriminator: type]
]21. JSON as a first-class data type
21.1 Opaque JSON
Entity api_logs {
request_body json [note: 'shape varies by endpoint']
response_body jsonb [note: 'large; may exceed 1 MB']
raw_data variant
}21.2 Schema-known JSON
Entity orders {
id int [pk]
payload json {
shipping_address object {
street varchar [not null]
city varchar [not null]
}
items array [
item object {
sku varchar [not null]
quantity int [not null]
}
]
}
}21.3 json vs. object
object describes an abstract structural type -- encoding chosen by target. json, jsonb, variant describe a physical storage choice (JSON-typed column).
21.4 Keyword semantics
| Keyword | Target intent |
|---|---|
json | Generic JSON storage (Oracle 21c+, MySQL, MS SQL Server 2025+, BigQuery) |
jsonb | PostgreSQL-specific binary JSON, indexable |
variant | Snowflake-specific semi-structured with auto-typing |
22. BSON scalar types
MongoDB stores documents in BSON, a JSON superset with additional scalar types. xDBML recognizes BSON type names as ordinary scalar type identifiers -- no special grammar treatment but preserved through the AST and emitted in MongoDB collection validators.
| BSON type | Typical use |
|---|---|
string | UTF-8 string (BSON element 0x02) |
int32 | 32-bit signed integer (BSON element 0x10) |
int64 | 64-bit signed integer (BSON element 0x12); long is a recognized alias |
double | 64-bit IEEE 754 floating point (BSON element 0x01) |
bool | Boolean (BSON element 0x08) |
null | Null type (BSON element 0x0A) |
objectId | 12-byte MongoDB primary key |
Decimal128 | High-precision decimal (IEEE 754-2008) |
BinData | Binary data with subtype |
Timestamp | MongoDB-internal timestamp |
Date | UTC datetime |
Regex | Compiled regular expression |
JavaScript | JavaScript code |
Symbol | Deprecated; preserved for legacy reads |
MinKey / MaxKey | Sentinel comparison values |
DBPointer | Deprecated; preserved for legacy reads |
int64 is canonical; long is accepted at parse time and normalized to int64 in the AST. MongoDB shell output uses long, while the BSON spec uses int64 -- both forms are equivalent.
22.1 Recommended pattern: BSON-native types in MongoDB Containers
When a field lives inside a Container [type: database] targeting MongoDB, use BSON-native type names (string, int32, int64, Decimal128, Date, objectId, etc.) rather than SQL conventions (varchar, int, decimal(p,s), timestamp). The rationale:
- BSON types are exactly what MongoDB's
$jsonSchemavalidators emit -- no translation layer interpreting precision/scale metadata that has no BSON meaning. Decimal128carries its IEEE 754-2008 precision intrinsically and does not accept(p,s)parameters. Writingdecimal(19,4)inside a MongoDB Collection misleadingly suggests precision metadata that BSON cannot represent.intis ambiguous between BSONint32andint64. Explicitint32orint64removes the ambiguity that cross-engineRefvalidation depends on.
Generators accept SQL-named scalars inside MongoDB Containers (§1.2 design principle 4 -- type-name pass-through) and convert them on lowering. The recommendation is for clarity and round-trip fidelity, not parser correctness.
22.2 Cross-engine references
A Ref may cross between Containers whose engines use different type-name vocabularies. The reference is valid if the source and target field types map to a compatible value space per the round-trip table below.
| Relational type | BSON equivalent |
|---|---|
int, integer | int32 (when target stores small IDs) or int64 (when target stores large IDs) |
bigint | int64 |
varchar, varchar(n), text | string (with optional maxLength) |
decimal(p,s), numeric(p,s) | Decimal128 |
boolean, bool | bool |
timestamp, timestamptz, datetime | Date |
A Ref from int32 (BSON) to int (relational) is valid by default. A Ref from int64 to a relational int issues a warning if the relational type is not bigint or wide enough to hold the BSON values.
Reusable Type declarations (§14) used in both relational and MongoDB Containers should use BSON-native types in the Type body; relational generators lower them per the round-trip table:
Type MonetaryAmount {
amount Decimal128 [not null]
currency string [not null, pattern: '^[A-Z]{3}$', minLength: 3, maxLength: 3]
}This Type used in an Oracle Entity lowers amount to NUMBER(38,4) and currency to VARCHAR2(3) CHECK (REGEXP_LIKE(currency, '^[A-Z]{3}$')). The same Type used in a MongoDB Collection emits Decimal128 and string with maxLength: 3 and a pattern in $jsonSchema.
22.3 Round-trip to non-MongoDB targets
Container app_data [type: database] {
Collection users {
_id objectId [pk]
email string [unique, not null, maxLength: 255]
balance Decimal128
last_login Date
avatar BinData
}
}Round-trip to non-MongoDB targets:
| BSON type | Oracle | PostgreSQL | Avro | JSON Schema |
|---|---|---|---|---|
string | varchar2(n) or clob | varchar(n) or text | string | string |
int32 | number(10) | integer | int | integer |
int64 | number(19) | bigint | long | integer |
double | binary_double | double precision | double | number |
bool | number(1) check (col in (0,1)) | boolean | boolean | boolean |
null | null | null | null | null |
objectId | varchar2(24) w/ check | varchar(24) or bytea | string w/ logical type | string w/ pattern |
Decimal128 | number(38,4) | numeric(34,0) | decimal high precision | string |
BinData | blob | bytea | bytes | string w/ format: byte |
Timestamp | timestamp with time zone | timestamptz | long w/ logicalType: timestamp-millis | string w/ format: date-time |
Date | timestamp with time zone | timestamptz | as Timestamp | string w/ format: date-time |
23. AI-readiness and semantic-interop settings
Four optional settings expose declarative metadata for AI consumers, semantic-layer tools (OSI, dbt MetricFlow, Cube), and governance platforms (Atlan, Collibra, DataHub, Alation). A fifth mechanism (the x_ prefix) supports organization-specific custom properties without grammar changes.
23.1 synonyms: -- alternative names
Entity customers {
display_name varchar [synonyms: ['name', 'full name', 'customer name']]
mrr_amount decimal(10,2) [synonyms: ['monthly revenue', 'recurring revenue']]
}23.2 business_term: -- controlled-vocabulary identifier
Entity orders {
total_amount decimal(10,2) [business_term: 'Gross Order Value']
net_amount decimal(10,2) [business_term: 'https://glossary.acme.com/terms/net_order_value']
}23.3 granularity: -- temporal granularity
Entity events {
occurred_at timestamp [granularity: second]
recorded_at timestamp [granularity: microsecond]
}Values: year, quarter, month, week, day, hour, minute, second, millisecond, microsecond, nanosecond.
23.4 tags: -- free-form classification labels
A list of string labels for classification, sensitivity, lifecycle, domain, audience, governance markers, or any other discoverable property. Tags are the dominant pattern in modern data catalogs (Atlan, Collibra, DataHub, Alation, Hackolade Studio).
Entity customers {
email varchar [tags: ['pii', 'contact', 'gdpr-subject']]
ssn varchar [tags: ['pii', 'sensitive', 'gdpr-special', 'restricted']]
internal_id int [tags: ['internal-only']]
}
Entity products [tags: ['catalog', 'public-api']] { ... }
Type MonetaryAmount [tags: ['finance', 'reusable']] { ... }Tags compose freely with synonyms, business_term, and other settings. They differ from synonyms (which name the same thing in multiple ways) and from business terms (which reference an external glossary): tags are classification labels, freeform and multi-valued.
23.5 Custom properties -- the x_ prefix
Settings beyond the recognized vocabulary are accepted as open-vocabulary key-value pairs and passed through to the AST. To distinguish intentional custom properties from possible typos in reserved keywords, xDBML reserves the x_ prefix for explicit custom extensions.
Entity customers {
email varchar [
pattern: '^[^@]+@[^@]+$',
x_sensitivity: 'pii-low',
x_acme_data_steward: 'jane@acme.com',
x_retention_days: 2555
]
}
Container core [type: schema, x_provisioning_template: 'standard-tier-3'] { ... }Conventions:
- The prefix is
x_(underscore-prefix). Dash-prefix is not used because xDBML identifiers do not allow dashes. - The prefix is optional, not required. The parser also accepts non-prefixed unknown properties.
- The parser does not validate names or values of
x_*properties -- they pass through to the AST. - Tooling can distinguish
x_*properties as deliberately custom (versus possibly-misspelled reserved keywords); linters may warn on non-x_-prefixed unknown properties.
A registry of de-facto-standardized custom properties is maintained at xdbml.org/registry. Custom properties that reach widespread adoption may be promoted to first-class settings in a future MINOR version.
23.6 Scope
The four first-class settings and the x_ mechanism carry declarative metadata about stored data -- names, descriptions, classifications, glossary references, AI-readiness labels. xDBML does not carry computational meaning (measures, metrics, aggregations, derived fields, hierarchies, default-aggregation hints) -- that belongs in adjacent semantic-layer formats -- nor contractual obligations (SLAs, ownership, pricing, support hours) -- that belongs in ODCS.
24. Validation constraints
xDBML accepts the full JSON Schema validation vocabulary as field settings:
| Setting | Applies to | Meaning |
|---|---|---|
pattern | string | Regular expression |
format | string | email, uri, uuid, date-time, etc. |
minLength / maxLength | string | Length bounds |
minimum / maximum | number | Value bounds |
exclusiveMinimum / exclusiveMaximum | number | Exclusive bounds |
multipleOf | number | Multiple of N |
enum | any | Fixed list of values |
minItems / maxItems | array | Element count bounds |
uniqueItems | array | All elements distinct |
minProperties / maxProperties | object | Field count bounds |
check | any | Free-form expression |
Example:
Entity users {
email varchar [pattern: '^[^@]+@[^@]+$', maxLength: 255]
age int [minimum: 0, maximum: 150]
username varchar [pattern: '^[a-z0-9_]{3,32}$']
tags array [varchar] [minItems: 1, maxItems: 10, uniqueItems: true]
}24.1 Generator behavior
| Target | Constraint output |
|---|---|
| Oracle / PostgreSQL / SQL Server / Snowflake | CHECK for minimum, maximum, pattern; others as comments |
| MongoDB validator | Native $jsonSchema with all validations |
| JSON Schema | Direct round-trip |
| Avro | pattern → name validation; bounds → logical-type metadata |
| Cassandra | Validations without equivalent dropped with warning |
25. Records -- sample data
A records { } block declares sample data inline in the schema. Records serve documentation, testing, and example-data purposes -- they describe what real rows look like alongside the entity that defines their shape.
Records can be declared in two forms: inside an entity body (implicit column list) or at the top level referencing an entity by name (explicit column list).
25.1 Records inside an entity
When declared inside an entity body, the records { } block uses an implicit column list -- values are assigned to columns in the order the columns are declared on the entity:
Entity users {
id int [pk]
name varchar
email varchar
records {
1, 'Alice', 'alice@example.com'
2, 'Bob', 'bob@example.com'
3, 'Carol', 'carol@example.com'
}
}Each row in the block is a comma-separated list of values matching the column order. When an entity uses ~partial_name injection (§16.1), the implicit column order follows the same precedence: locally declared fields first, then partial-injected fields in injection order.
25.2 Records outside an entity (top-level form)
When declared outside an entity body, the records declaration names the entity and explicitly lists the columns being populated:
records users (id, name, email) {
1, 'Alice', 'alice@example.com'
2, 'Bob', 'bob@example.com'
}Columns not listed default to null (or the column's declared default). The order of values in each row matches the explicit column list, not the entity's field declaration order.
The top-level form is useful when:
- Sample data is maintained separately from the entity definition (e.g., one file declares schemas, another declares fixtures)
- Only a subset of an entity's columns has meaningful sample values
- The same entity has multiple disjoint record sets that benefit from separate declarations
25.3 Cross-container references
To declare records for an entity inside a Container, use the dotted path:
records core.users (id, name, email) {
1, 'Alice', 'alice@example.com'
}25.4 Value forms
Each value in a record row is one of:
| Form | Example | Notes |
|---|---|---|
| String | 'Alice', 'O\'Brien' | Single-quoted. Escape inner single quotes with \'. |
| Multi-line string | '''multi\nline''' | Triple single-quoted. Useful for description text. |
| Number | 42, 3.14, -1.5e-3 | Integer or decimal; standard scientific notation. |
| Boolean | true, false | Lowercase. |
| Null | null | The SQL/JSON null value. |
| Date / Time | '2026-06-10', '10:30:00', '2026-06-10T14:30:00Z' | ISO 8601 inside single quotes. Interpreted per the target column's declared type (date, time, timestamp with or without timezone). |
| Enum value | Status.active | When the column's type is a declared Enum, values may be written as <EnumName>.<value> for clarity. |
| Backtick expression | `gen_random_uuid()` | Backtick-wrapped. Treated as opaque target-engine expression, disabling static type checking; lowered verbatim by generators. Useful for database-side defaults or computed values. |
Each value is type-checked against the target column's declared type at generation time. Implementations SHOULD report a clear error when a value cannot be coerced (e.g., 'not_a_number' for an int column).
25.5 Generator behavior
Records are sample data, not authoritative schema content. Generators MAY emit records as:
- SQL
INSERTstatements (PostgreSQL, Oracle, MySQL, etc.) - MongoDB
insertManycalls or BSON document fixtures - Avro
IndexedRecordinstances for test corpora - JSON document fixtures
- Markdown tables in generated documentation
Generators MAY also choose to skip records entirely when producing DDL-only output. The records' presence in the xDBML document is non-prescriptive about whether they appear in any generated artifact.
26. Module system
A single xDBML file can describe a complete schema, and in many cases that is the appropriate structure. For larger architectures, however, splitting the schema across multiple files becomes valuable -- separating polyglot layers by engine, decomposing by domain, or sharing canonical declarations across data products. The module system lets one xDBML document import declarations from another.
The design is intentionally close to DBML's module system documented at https://dbml.dbdiagram.io/syntax/module-system, with strict-superset semantics: every valid DBML document parses correctly under xDBML and DBML constructs mean the same thing in xDBML as in DBML. xDBML extends DBML's design in four ways, all of which are available only when the file declares xdbml: 0.2 (or later):
- Extended element-type vocabulary covering xDBML-specific constructs (Container, Entity, Collection, Record, Type, Edge, View, DiagramView) in addition to DBML's table, enum, tablepartial, note, schema, tablegroup.
- Field-level imports, allowing an author to clone a single field declaration from another file as a reusable named shape.
- Container-scoped imports, allowing imports to live inside the Container body where they belong.
- Clone blocks, optionally embedding the imported content directly in the importing file so that the file remains parseable even if the referenced file is unavailable.
This phase of v0.2 supports relative-path imports. URL imports are deferred to a later phase.
26.1 Files and projects
Every xDBML file is a complete document and may declare a Project { ... } block describing its standalone view of a schema.
When file B imports from file A, what flows from A to B is the content of A's declarations -- its Containers, Entities, Types, Edges, Views, TablePartials, Enums, Notes, TableGroups, DiagramViews. What does NOT flow is A's Project { ... } declaration. Project declarations are intrinsic to each file; they are never importable.
The same file can be opened standalone (its own Project is the project) or imported by another file (only its non-Project content contributes to the importer's project). A file MAY omit the Project declaration if its sole purpose is to be imported.
26.2 Import in full vs selective import
Two import modes are supported.
Import in full:
reuse * from './path-to-file'Brings every top-level declaration from the source file into the importing file's scope (except for the source's Project block). Cannot be combined with aliasing -- if you need to rename, use the selective form.
Selective import:
reuse {
type name
type name as alias
...
} from './path-to-file'Brings only the listed declarations. Each item may carry an optional as alias to rename the import in the current file's namespace.
Multi-line and single-line comma-separated forms are both supported:
reuse { entity products, entity categories, type Email } from './library'A single directive may import multiple declarations from the same source file. When combined with a clone block (§26.6), the clone block contains all the imported declarations matched by name and element type.
26.3 Importable element types
The element type names in the selective form are case-insensitive (Table, TABLE, table all work).
| Keyword | What is imported |
|---|---|
table | Entity declared with the DBML Table keyword |
entity | Entity declared with Entity (synonym for table) |
collection | Entity declared with Collection (synonym for table) |
record | Entity declared with Record (synonym for table) |
enum | Enum |
tablepartial | TablePartial |
note | Sticky Note |
schema | Container declared with type: schema, including all entities inside it |
container | Container of any type, including all entities inside it |
tablegroup | TableGroup, including its membership |
type | Named Type (object-shaped or scalar -- §14) |
edge | Edge |
view | View |
diagramview | DiagramView |
field | Single field declaration cloned from an entity as a reusable named shape (see §26.8) |
The project element type is NOT importable.
Source paths mirror the source file's actual hierarchy:
| Source structure | Import path |
|---|---|
| Entity at top level (no container) | EntityName |
| Entity inside a Container | ContainerName.EntityName |
| Field on a top-level entity | EntityName.FieldName |
| Field on an entity inside a container | ContainerName.EntityName.FieldName |
| Nested field (object inside an entity) | ContainerName.EntityName.OuterField.InnerField |
| Top-level Type, Enum, TablePartial, etc. | Name |
26.4 The use and reuse directives
Both use and reuse bring declarations from another file into the current file. They differ in one respect: transitivity to further importers.
reuse is the simpler default and is what most authors want. Declarations brought in via reuse are visible to files that further import the current file:
// schemas/index.xdbml -- barrel file
reuse * from './oracle-core'
reuse * from './mongo-orders'
// main.xdbml -- imports the barrel
reuse * from './schemas/index'
// All declarations from oracle-core and mongo-orders are available.use is appropriate only when the author wants the imports to remain private to the importing file. Declarations brought in via use are NOT visible to files that further import the current file.
Recommendation: use reuse by default unless you have a specific reason to prevent transitivity. Most authors will never need the non-transitive form.
Both forms support both * (import all) and { ... } (selective) syntaxes.
26.5 Where the directive goes determines where the import lives
The location of a use/reuse directive in the importing file determines where the imported element lives in the merged AST.
The rule is simple: the directive's location is the placement location. What you see textually is what the merged AST contains. There is no separate "visibility scope" or "placement scope" -- they are the same thing.
For non-field imports, the directive may appear in any location where the imported element type is normally valid:
- Container imports (
container,schema): at file scope (Containers are top-level constructs) - Top-level constructs (
type,enum,tablepartial,edge,view,tablegroup,diagramview,note): at file scope - Entity imports (
table,entity,collection,record): at file scope (top-level uncontained entity) OR inside a Container body (the entity becomes a member of that Container)
For field imports, the directive MUST appear at file scope (see §26.8 for the declaration-vs-placement pattern).
use/reuse directives MAY NOT appear inside Entity, Edge, View, or Type bodies. Imports are declarations that establish structural elements; they don't belong inside the bodies of other structural elements.
Naming:
The imported element is named in the importing file's namespace based on where it is placed, not where it came from. For an entity import:
- Directive at file scope → entity is named by its bare name at file top level (e.g.,
products) - Directive inside
Container ordering→ entity is named within that container (e.g.,ordering.products)
The source's path appears in the directive's from clause (recording where to find the element in the source); the local name reflects the directive's placement.
26.6 Clone blocks
use and reuse directives MAY be followed by an optional clone block containing the imported content embedded directly in the importing file. When the clone is present, the parser uses it as the authoritative content for that import. When the clone is absent, the parser falls back to opening the referenced file at parse time (DBML-compatible behavior).
Syntax:
// Reference-only (DBML behavior, no clone)
reuse { entity core.dim_customer } from './conformed-dimensions'
// With clone (xDBML autonomous behavior)
reuse { entity core.dim_customer } from './conformed-dimensions' {
Entity dim_customer {
id int [pk]
email Email
country CountryCode
}
}
// With clone + tool-managed metadata
reuse { entity core.dim_customer } from './conformed-dimensions' [cloned_at: '2026-06-06T14:30:00Z'] {
Entity dim_customer { ... }
}What the clone block contains:
The clone block contains exactly the imported declaration(s), without any surrounding wrappers from the source file. The shape of the clone follows the granularity of the import:
| Import directive | Clone block content |
|---|---|
reuse { field X.Y.Z } from ... | The field declaration alone (no entity wrapper, no container wrapper) |
reuse { entity X.Y } from ... | The entity declaration alone (no container wrapper) |
reuse { type T } from ... | The Type declaration |
reuse { enum E } from ... | The Enum declaration |
reuse { tablepartial P } from ... | The TablePartial declaration |
reuse { container C } from ... | The Container declaration including its intrinsic entities |
reuse { schema S } from ... | Same as container -- the schema's content travels with it |
reuse { tablegroup G } from ... | The TableGroup declaration including its membership |
reuse { view V } from ... | The View declaration |
reuse { edge E } from ... | The Edge declaration |
reuse { diagramview D } from ... | The DiagramView declaration |
reuse { note N } from ... | The sticky Note declaration |
reuse * from ... | All top-level declarations from the source's non-Project content, preserving their natural shape |
The principle: container wrappers are dropped when extracting an inner element, preserved when the container itself is the imported element.
Multiple imports in one directive with one clone block are supported. The parser matches each clone declaration to its corresponding import item by name and element type; order within the clone block is not significant.
Names in the clone use the importing file's vocabulary. When a directive aliases an import (as), the clone uses the alias. Names referenced inside a clone also use the importing file's vocabulary -- no mediation between source names and importing-file names is required.
Semantics:
- When a clone block is present, the parser MUST use its content as the authoritative declarations for the directive's scope. The referenced path is not opened during parsing.
- When a clone block is absent, the parser MUST open the referenced path and use the source file's content as authoritative. If the path cannot be resolved, the parse fails.
- Clone content is parsed under the importing file's
xdbml:version directive, not the source file's.
Tool-mediated clone refresh: before submitting the file to a parser, tools MAY refresh clones by re-reading their referenced sources and updating clone content (and the cloned_at timestamp). Such refresh workflows are tool-defined and out of scope for this specification.
Autonomy property: a file containing clone blocks for all of its imports is fully self-contained and parses correctly without needing access to any referenced files. Authors choose per-directive whether to include a clone based on workflow needs.
Clone metadata: the directive MAY carry settings before the clone block. The only setting defined in this version is:
cloned_at: <timestamp>-- ISO 8601 timestamp recording when the clone was captured
This setting is informational. The parser does not act on it. Tooling MAY use it to drive drift-detection workflows. Other metadata fields (hashes, source versions, refresh policies) are reserved for future versions.
Drift detection: drift between the clone content and the current state of the referenced file is out of scope for the specification. Tools that author or refresh clones define their own policies.
Nested references within clones: when a clone is captured, it includes the directly-imported declarations only. If those declarations reference further declarations from other files, the clone does NOT recursively capture those further references. For full self-containment, the author must also import (with clone blocks) the transitively referenced declarations.
26.7 Worked example: use vs reuse
Private internal helper (use private to importing file):
// internal-helpers.xdbml
Type InternalAuditFields {
created_at timestamp
created_by varchar
}// orders.xdbml -- keeps the helper private
xdbml: 0.2
use { type InternalAuditFields } from './internal-helpers'
Container ordering [type: schema] {
Entity orders {
id int [pk]
audit InternalAuditFields
customer_id int
}
}// main.xdbml
xdbml: 0.2
Project ecommerce { targets: [PostgreSQL] }
reuse * from './orders'
// 'ordering.orders' is available. 'InternalAuditFields' is NOT (orders.xdbml used it privately).26.8 Field-level imports
Field-level imports clone a single field declaration from another file. They are useful when the importing file wants to reuse an existing field's complete definition without importing the whole entity or asking the library author to extract a Named Type.
Declaration vs placement:
Field imports are different from all other imports in one important respect: they declare a reusable shape without placing a field anywhere.
A field cannot exist on its own outside an entity. So the field import directive declares an available named field shape at file scope, and the author then uses that name as the type of a field in some entity. This makes field imports semantically similar to scalar Named Types (§14.7): the directive declares a reusable shape; the author places fields of that shape elsewhere in the file.
Field imports must appear at file scope, never inside a Container body.
Syntax:
reuse { field core.dim_customer.email } from './conformed-dimensions'The path follows xDBML's standard container.entity.field form. Field paths into nested structures use dot notation: field core.dim_customer.addresses.address.city.
Usage: the imported field's name (or alias) is used as the type of a field in some entity:
reuse { field core.dim_customer.email } from './conformed-dimensions'
Container facts [type: schema] {
Entity fact_sales {
customer_email email // 'email' here is used as the type, not the field name
customer_id int [ref: > core.dim_customer.id]
amount decimal(10,2)
}
}To rename the imported field shape, use as:
reuse { field core.dim_customer.email as canonical_email } from './conformed-dimensions'
Entity fact_sales {
customer_email canonical_email
}What is copied (cloning by value at clone-capture time or parse time):
- Type: copied verbatim, including nested shapes
- Settings (
not null,unique,default,increment,pk): copied verbatim - Validation constraints (
pattern,minLength,maxLength, range, enum membership,check): copied verbatim - Notes (both inline and block): copied verbatim
- AI-readiness metadata (
synonyms,business_term,tags,granularity): copied verbatim - Custom properties (
x_prefixed): copied verbatim - Inline
ref:settings: copied only if the ref's target is resolvable in the importing file's scope; otherwise dropped with a warning
Per-instance overrides:
When using the imported shape as a field type, the field's declaration may override settings:
reuse { field core.dim_customer.email } from './conformed-dimensions'
Entity employees {
email email [unique] // adds 'unique' to the imported email's settings
}Field imports support clone blocks:
reuse { field core.dim_customer.email } from './conformed-dimensions' [cloned_at: '2026-06-06T14:30:00Z'] {
email varchar [unique, not null, pattern: '^[^@]+@[^@]+$', tags: ['pii']]
}26.9 Aliases
When two source files define elements with the same name, use as to rename imports and avoid conflicts:
Container app [type: schema] {
reuse { entity users as auth_users } from './auth'
reuse { entity users as billing_users } from './billing'
}Once aliased, the alias is the only accessible name in the importing file under that import. The clone (if present) uses the alias as the declaration's name.
26.10 Name resolution
Name resolution operates in two passes:
Pass 1 (declaration collection): the parser walks the import graph starting from the parse target. For each use/reuse directive, the parser uses the clone block if present, otherwise opens the referenced file. Declarations are collected with their fully-qualified names in the importing file's namespace (placement applied, aliases applied).
Pass 2 (reference resolution): references are resolved against the declarations collected in pass 1. Forward references within and across files are valid.
Conflict policy: two imports conflict only if their fully-qualified names in the importing file's namespace collide. Two Entity users imported into different Containers (ordering.users and audit.users) do not conflict; two imports of users into the same Container without aliases DO conflict. On conflict, the parser MUST emit an error and require explicit aliasing.
Local declarations always win over imported declarations of the same name.
26.11 Version directive across files
Each file is parsed using its own xdbml: version directive (or as DBML if no directive is present). Different files in an import graph MAY use different version directives:
- A
xdbml: 0.2file MAY import axdbml: 0.1file - A
xdbml: 0.2file MAY import a DBML file (no directive) - A
xdbml: 0.1file MAY import a DBML file - A
xdbml: 0.1file MAY NOT use v0.2-only constructs such as clone blocks, scalar Named Types, or field imports
When a clone is captured from a source file, the clone content is parsed under the importing file's version directive, not the source's.
26.12 Multi-file Project rule
Every xDBML file MAY declare a Project, but only the entry file's Project survives in the merged AST. When file A imports from file B:
- A's Project is part of the merged project AST
- B's Project is ignored (B's declarations contribute, B's Project metadata does not)
The same file plays different roles depending on whether it's the entry point or being imported.
26.13 Path resolution
This phase of v0.2 supports relative paths only. URL imports are deferred to a later phase.
A relative path:
- Begins with
./or../ - Uses forward slashes as path separators on all platforms
- Is resolved relative to the importing file's directory
- Has an optional file extension (
./auth,./auth.xdbml,./auth.dbmlall work)
When the extension is omitted, the parser searches <path>.xdbml then <path>.dbml. If neither exists AND no clone block is present, the parser reports an error.
When a clone block IS present, path resolution failure is NOT an error -- the clone is authoritative. The parser MAY emit a warning.
Paths without a leading ./ or ../ are reserved for future use and rejected in this phase.
Cross-platform compatibility: forward-slash relative paths are accepted by all modern file APIs on Linux, macOS, and Windows. Browser-based renderers using the File System Access API support the same path syntax.
Browser-based renderer considerations: when an xDBML file is parsed in a browser-based renderer, reading referenced files may require explicit user permission via the File System Access API or equivalent. Clone blocks eliminate this concern entirely: a file with clones for all its imports requires no file system access to referenced files. Authors writing files for browser-based consumption are encouraged to include clones.
26.14 Circular imports
Circular imports are allowed, matching DBML's behavior. The two-pass name resolution handles cycles naturally.
Best practice (not a language rule): circular imports are often a signal that file boundaries could be improved.
26.15 Worked examples
Entity import inside a Container:
// catalog.xdbml
xdbml: 0.2
Project catalog { targets: [PostgreSQL] }
Container core [type: schema] {
Entity products {
id int [pk]
sku varchar [unique, not null]
name varchar [not null]
price decimal(10,2)
}
}// order-system.xdbml
xdbml: 0.2
Project order_system { targets: [PostgreSQL] }
Container ordering [type: schema] {
Entity orders {
id int [pk]
customer_id int
placed_at timestamp
}
reuse { entity core.products } from './catalog' [cloned_at: '2026-06-06T14:30:00Z'] {
Entity products {
id int [pk]
sku varchar [unique, not null]
name varchar [not null]
price decimal(10,2)
}
}
Entity order_lines {
order_id int [ref: > ordering.orders.id]
product_id int [ref: > ordering.products.id]
quantity int
}
}The imported products entity becomes ordering.products in the importing file. The source's core container appears in the directive's path but does not survive into the importing file's AST.
Field-level import:
// legacy-customer.xdbml
Container ops [type: schema, target: Oracle] {
Entity customer_legacy {
id number(10)
legacy_email varchar2(255) [pattern: '^[^@]+@[^@]+$', tags: ['pii']]
}
}// new-system.xdbml
xdbml: 0.2
Project new_system { targets: [PostgreSQL] }
reuse { field ops.customer_legacy.legacy_email as contact_email }
from './legacy-customer'
[cloned_at: '2026-06-06T14:30:00Z'] {
contact_email varchar2(255) [pattern: '^[^@]+@[^@]+$', tags: ['pii']]
}
Container app [type: schema] {
Entity new_customer {
id int [pk]
contact_email contact_email
}
}The field import declares contact_email as a reusable shape at file scope. The Entity uses contact_email as the type of one of its fields.
26.16 Parser implementation requirements
A v0.2-conformant parser supporting the module system MUST:
- Accept any xDBML file as the parse target. The target file's
Projectis the merged project's Project. - For each
use/reusedirective: if a clone block is present, use it as authoritative content. If absent, open the referenced path. - Walk the import graph by reading directives at file scope and inside Container bodies, resolving paths relative to the importing file's directory.
- Visit each unique file at most once; circular imports must terminate.
- Parse each referenced file with its own
xdbml:version directive. Parse clone content with the importing file's version directive. - Place each imported element at the directive's location in the importing file's merged AST.
- Reject field imports inside Container bodies; field imports MUST be at file scope.
- Reject
use/reusedirectives inside Entity, Edge, View, or Type bodies. - Implement
usevsreusesemantics correctly. - For field imports, copy the field declaration as a reusable named shape available for use as a field type, with
ref:settings dropped (with warning) when the ref target is not in the importing file's scope. - When extracting a clone for an entity import, strip any surrounding Container wrapper. When extracting a clone for a container import, preserve the Container with its entities. Apply directive aliases in the clone's declaration names.
- When a directive carries multiple imports and a clone block, match each clone declaration to its corresponding import item by name and element type.
- Resolve names in two passes; an imported element's name follows its placement location, not its source path.
- Detect and reject unresolved name conflicts.
- Reject paths not starting with
./or../.
The parser SHOULD:
- Cache parsed file contents during a single parse invocation
- Emit warnings for shadowed-but-unaliased imports
- Emit warnings for circular import chains
- Emit warnings when a clone block is present but the referenced path could not be accessed
- Emit warnings when a captured clone has unresolved external references
Drift detection between clones and their sources is out of scope for the specification.
27. AST representation
Abstract Schema Tree
Project
├── settings (targets, ...)
├── Note (project-level)
├── Container (0..n)
│ ├── settings (type, target, replication, location, ...)
│ ├── Note (container-level)
│ ├── Entity (n per container)
│ │ ├── Field (n; recursively nested)
│ │ ├── Index (n)
│ │ ├── Check (n)
│ │ ├── Records (0..1)
│ │ └── Note
│ ├── View (n per container)
│ │ ├── settings (source_query, materialized, ...)
│ │ ├── Field (n)
│ │ └── Note
│ └── Edge (n per container)
│ ├── settings (source, target, source_cardinality, target_cardinality, undirected, ...)
│ ├── Field (n)
│ ├── Index (n)
│ └── Note
├── Entity (0..n; project-level)
├── View (0..n; project-level)
├── Edge (0..n; project-level)
├── Type (0..n)
├── Enum (n)
├── Ref (n; project-level)
│ └── Cardinality (source/target min and max)
├── TablePartial (n)
├── TableGroup (n)
├── DiagramView (n)
└── Note (n; sticky notes)27.1 Field recursion
A Field node carries a type expression. Complex types contain nested Field nodes; tree depth is unbounded.
27.2 Path segments
Field paths are ordered lists of PathSegment nodes, each tagged by kind: field, array_index, array_iter, map_key, map_iter, alternative. Implicit forms are normalized to explicit forms during parsing.
27.3 AST flavors
- Raw -- references stored as strings, for partial-document analysis.
- Normalized -- references resolved into direct pointers, for traversal and code generation.
28. Round-trip semantics
28.1 Round-trip targets
| Format | Direction | Notes |
|---|---|---|
| DBML 3.13.6 | xDBML ↔ DBML | Strict superset. xDBML-only constructs emit as comments when downgrading. |
| Oracle DDL | xDBML → SQL | CREATE TABLE, CREATE TYPE, CHECK, FKs. JSON via native JSON type (21c+). Views as CREATE VIEW / CREATE MATERIALIZED VIEW. |
| PostgreSQL DDL | xDBML → SQL | JSON via jsonb. |
| MongoDB validators | xDBML ↔ $jsonSchema | Object/array/oneOf/union direct. BSON types preserved. |
| Avro schemas | xDBML ↔ Avro | Named types → Avro named records. Union element ordering preserved. |
| Parquet schemas | xDBML → Parquet | Nested types → Parquet group types. |
| JSON Schema | xDBML ↔ JSON Schema | Direct correspondence for most constructs. Named types → $defs. |
| OpenAPI | xDBML → OpenAPI | Entities and types → components.schemas. |
| Neo4j / Cypher schema | xDBML → Cypher | Entities → node labels; Edges → relationship types with properties. |
| RDF-star | xDBML ↔ RDF-star | Edges → quoted triples with annotations. |
| ODCS (schema section) | xDBML → ODCS | xDBML supplies schema portion; other ODCS sections out of scope. |
| Hackolade PDM | xDBML ↔ PDM | Lossless round-trip. PDM Description fields receive xDBML notes (markdown). |
28.2 Lossiness boundary
Lossless: Container declarations and settings; Entity, View, Edge, Field, Index, Ref structure; nested types; polymorphism; named types (where target supports $ref); validation constraints; cardinality; notes.
Lossy when target lacks the construct:
- Polymorphism to relational targets (lowered to discriminator + checks)
- Named types to formats without
$ref(inlined) - Edges to non-LPG relational targets (lowered to junction tables)
- Views with non-portable source queries (passed verbatim; may fail on target)
Generators emit warnings on lossy lowering.
29. Relationship to other standards
┌──────────────────────────────────────────────────────────────────────┐
│ Data product layer ODPS │
├──────────────────────────────────────────────────────────────────────┤
│ Inferential / reasoning layer OWL, knowledge graphs │
├──────────────────────────────────────────────────────────────────────┤
│ Semantic / business layer OSI, dbt MetricFlow, Cube │
├──────────────────────────────────────────────────────────────────────┤
│ Contract layer ODCS │
├──────────────────────────────────────────────────────────────────────┤
│ Validation layer JSON Schema, SHACL │
├──────────────────────────────────────────────────────────────────────┤
│ Schema / shape layer xDBML ← here │
├──────────────────────────────────────────────────────────────────────┤
│ Engine DDL layer SQL DDL, Cypher, CQL, ... │
└──────────────────────────────────────────────────────────────────────┘| Standard | Layer | xDBML relationship |
|---|---|---|
| SQL DDL | Engine | Forward-engineering target |
| Cypher schema | Engine | Forward-engineering target for LPG |
| JSON Schema | Schema (JSON-only) | Bidirectional translation |
| OpenAPI | Schema (HTTP APIs) | Same shapes |
| Avro / Parquet | Schema (serialization) | Forward-engineering target |
| SHACL | Validation | Bidirectional translation |
| ODCS | Contract | Supplies schema section |
| ODPS | Data product | Sits above ODCS |
| OSI | Semantic | Complementary via §23 shared vocabulary |
| RDFS / OWL | Inferential | Shapes lift into knowledge graphs |
| RDF-star | Inferential | Bidirectional for edges with properties |
xDBML does not attempt to be SQL, OSI, ODCS, OWL, or JSON Schema. Each addresses part of the data-description problem at one layer.
30. Conformance
A conforming xDBML v0.2 implementation:
- Parses every valid xDBML v0.2 document to the AST described in §27.
- Parses every valid xDBML v0.1 document with v0.1 semantics (the version directive selects).
- Rejects malformed documents with informative error messages.
- Honors the version declaration per §4.1.
- Honors the
experimental:opt-in per §4.2. - Normalizes implicit forms to canonical AST representations for paths.
- Preserves declared keyword choices in the raw AST flavor.
- Computes default cardinality per §11.8 when not declared.
- Implements the module system (§26) including relative-path imports, clone blocks, field-level imports, and scoped imports per §26.6.
Generators and importers are not required for spec conformance.
Appendix A. Reserved keywords
Document-level: xdbml, experimental, Project
Container-level: Container, Schema, Database, Keyspace, Namespace, Dataset, Bucket
Entity-level: Entity, Table, Collection, Record
Other top-level constructs: Edge, View, Type, Enum, enum, Ref, TablePartial, TableGroup, DiagramView, Note
Field types: object, struct, record, array, list, map, dict, dictionary, set, union, oneOf, anyOf, allOf, json, jsonb, variant
Block keywords: records, indexes, checks
Module system keywords (new in v0.2): use, reuse, from, as
Module system element-type slot values (new in v0.2): table, entity, collection, record, enum, tablepartial, note, schema, container, tablegroup, type, edge, view, diagramview, field
Module system directive setting (new in v0.2): cloned_at
Relationship operators: <, >, -, <>
Field settings: pk, primary key, null, not null, unique, default, increment, note, ref, check
Cardinality settings: source, target, min_source, max_source, min_target, max_target, source_cardinality, target_cardinality, undirected
Relationship settings (new for v0.2): inactive (flag for visualization-only deactivation), color (#rgb / #rrggbb for the relationship line)
TableGroup settings (new for v0.2): color (#rgb / #rrggbb for the group's frame in diagrams)
Check settings: name (constraint name in generated DDL), note (free-text annotation)
View settings: source_query, materialized, refresh_schedule, refresh_on, source_database, storage_options
AI-readiness settings: synonyms, business_term, granularity, tags
Custom-property prefix: x_ (reserved as a convention for organization-specific extensions; see §23.5)
Quoted identifiers ("keyword") may be used for entities or fields that share a name with a reserved keyword.
Appendix B. Lexical grammar reference
| Token class | Pattern |
|---|---|
| Identifier | [A-Za-z_][A-Za-z0-9_]* |
| Quoted identifier | " any chars with \", \\ escapes " |
| String literal | ' any chars with \', \\ escapes ' |
| Multi-line string | ''' any chars with indentation normalization ''' |
| Number | integer or decimal, optional sign and exponent |
| Expression | ` any chars ` |
| Cardinality string | 'N..M' where N, M are non-negative integers or * |
| Comment (single-line) | // to end of line |
| Comment (block) | /* ... */ |
| Container/entity-qualified name | identifier.identifier |
| Composite reference | identifier.(identifier, identifier, ...) |
| Field path | identifier followed by zero or more .identifier, .[N], .[*], .["literal"], or ."quoted" segments |
A formal ANTLR4 grammar is published at grammar/xDBML.g4 in this repository.
Appendix C. Worked end-to-end examples
C.1 Polyglot model: Oracle + MongoDB + events + graph
xdbml: 0.1
Project polyglot_example {
targets: [Oracle, MongoDB, Avro, Neo4j]
}
Type Address {
street varchar [not null]
city varchar [not null]
country varchar
}
Type MonetaryAmount {
amount Decimal128 [not null]
currency string [not null, pattern: '^[A-Z]{3}$', minLength: 3, maxLength: 3]
}
Container core [type: schema, target: Oracle] {
Table customers {
id int [pk]
email varchar [unique, not null, pattern: '^[^@]+@[^@]+$']
display_name varchar [not null]
primary_address Address
}
}
Container orders_store [type: database, target: MongoDB] {
Collection orders {
_id objectId [pk]
customer_id int32 [not null]
placed_at Date [granularity: second]
total MonetaryAmount
line_items array [
line_item object {
sku string [not null]
quantity int32 [not null, minimum: 1]
unit_price MonetaryAmount
}
]
payment_method oneOf {
card object { last4 string [maxLength: 4], brand string }
bank object { iban string }
wallet object { provider string, account string }
} [discriminator: method_kind]
}
}
Container events [type: namespace, target: Avro] {
Record OrderPlaced {
event_id string [pk]
occurred long [granularity: millisecond, not null,
note: 'Unix epoch milliseconds; Avro logical-type timestamp-millis']
order_id string [not null, note: 'String form of order _id from orders_store']
customer_id int [not null]
total MonetaryAmount
}
}
Container social [type: database, target: Neo4j] {
Edge FOLLOWS [source: core.customers, target: core.customers,
source_cardinality: '0..*', target_cardinality: '0..*'] {
since date [not null]
is_close boolean [default: false]
}
}
Container catalog [type: schema, target: Oracle] {
Entity products {
sku varchar [pk]
name varchar [not null]
price MonetaryAmount
metadata json {
weight_grams int
dimensions object { length int, width int, height int }
tags array [varchar]
}
}
View top_sellers [materialized: true, refresh_schedule: 'daily'] {
source_query: '''
SELECT p.sku, p.name, COUNT(*) AS order_count
FROM catalog.products p
JOIN orders_store.orders o ON o.line_items.sku = p.sku
GROUP BY p.sku, p.name
ORDER BY order_count DESC
FETCH FIRST 100 ROWS ONLY
'''
sku varchar [pk]
name varchar
order_count int
}
}
Ref: orders_store.orders.customer_id > core.customers.id [source: '1..*', target: '1..1']
Ref: orders_store.orders.line_items.[*].sku > catalog.products.sku
Ref: events.OrderPlaced.order_id > orders_store.orders._id
Ref: events.OrderPlaced.customer_id > core.customers.idC.2 Heterogeneous tuple
xdbml: 0.1
Entity customers {
id int [pk]
name varchar
addresses array [
[0] billing object { street varchar, city varchar, zip varchar }
[1] shipping object { street varchar, city varchar, zip varchar }
]
indexes {
addresses.[0].city
addresses.[1].zip
}
}C.3 Recursive named type
xdbml: 0.1
Type TreeNode {
id int
value varchar
children array [child TreeNode]
}
Entity organization_chart {
root_id int [pk]
tree TreeNode
}C.4 Graph model with property edges
xdbml: 0.1
Project social_network {
targets: Neo4j
}
Container social [type: database, target: Neo4j] {
Entity Person {
id int [pk]
name varchar [not null]
email varchar [unique]
}
Entity Movie {
id int [pk]
title varchar [not null]
year int
}
Edge KNOWS [source: Person, target: Person,
source_cardinality: '0..*', target_cardinality: '0..*'] {
since date [not null]
intimacy int [minimum: 0, maximum: 10]
}
Edge ACTED_IN [source: Person, target: Movie] {
role varchar [not null]
billing int [note: 'order in credits']
}
Edge RATED [source: Person, target: Movie] {
rating int [not null, minimum: 1, maximum: 5]
rated_at timestamp [granularity: second]
review varchar
}
}Appendix D. Compatibility with DBML 3.13.6
xDBML is a strict superset of DBML 3.13.6:
- Every valid DBML 3.13.6 document is a valid xDBML document at the lowest compatibility level (no version declaration).
- A document declaring
xdbml: 0.1opts into the v0.1 feature set; all DBML 3.13.6 constructs remain valid. - A document declaring
xdbml: 0.2opts into the v0.2 feature set (which includes the v0.1 feature set plus the module system, scalar Named Types, and field-level imports); all DBML 3.13.6 constructs remain valid. - xDBML extension constructs are not recognized in documents without a version declaration.
- DBML's implicit-schema syntax (
Table core.users { ... }) remains valid; produces an implicit Container. - DBML's
Tablekeyword remains valid as a synonym forEntity. - DBML's
useandreusemodule-system directives (without clone blocks, without xDBML-specific element types) behave identically in xDBML. v0.2 adds clone blocks, field-level imports, the extended element-type vocabulary, and Container-scoped imports -- all available only when the file declaresxdbml: 0.2.
A DBML 3.13.6 document is upgraded to xDBML by adding xdbml: 0.2 at the top (or xdbml: 0.1 for the previous feature set). No structural changes required.
Appendix E. Sources and acknowledgments
xDBML builds on the work of Holistics and contributors to DBML (Apache License 2.0). The DBML language reference at dbml.dbdiagram.io served as the foundation for the constructs documented in §5, §7, §9, §10, §11, §15, §16, §17, §18, §25, §26.
xDBML acknowledges intellectual debt to:
- JSON Schema -- validation vocabulary,
oneOf/anyOf/allOf - Avro -- named records, namespaces, union semantics
- OpenAPI -- discriminator conventions,
components.schemasreuse - ODCS (bitol.io) -- Open Data Contract Standard: schema/contract layering
- OSI -- Open Semantic Interchange: schema/semantic-layer boundary vocabulary
- openCypher / Neo4j -- labeled property graph semantics; edge properties
- RDF-star -- annotation properties on triples
- Mermaid ER code -- cardinality notation principles
- Hackolade Studio -- polyglot data modeling principles, AI-readiness settings, path notation
End of xDBML Specification v0.2.