One Data Model, Many Representations

From PiRho Knowledgebase


Summary: Modern web systems often create unnecessary separation between websites and APIs. In reality, both are simply different representations of the same underlying data. When designed correctly, a website can function as an API, and an API can render a complete website — without semantic loss, duplication, or inconsistency. This article explores that principle and the machine‑readable HTML technologies that make it viable.

The Fundamental Principle

At the heart of a well‑designed web system is a simple idea:

There should be one canonical data model.

HTML, JSON, XML, and other formats are not competing truths — they are **projections** of that model, tailored for different consumers.

Problems arise when:

  • HTML says one thing
  • APIs say another
  • Metadata exists in isolation
  • Meaning is duplicated instead of shared

When this happens, systems drift, documentation lies, and maintenance cost grows quietly but relentlessly.

The Website as an API

A properly structured HTML document is already machine‑readable.

Browsers, crawlers, screen readers, and assistive technologies all parse:

  • Document structure
  • Element relationships
  • Headings and landmarks
  • Links and identifiers

When semantic meaning is embedded directly into HTML — using Microformats or RDFa — the document becomes a self‑describing data source.

In this model:

  • The content is the data
  • The markup expresses meaning
  • Machines consume the same source as humans
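
The idea can be sketched with nothing beyond Python's standard library: a plain semantic document, consumed as data with a generic parser. The document and class names here are illustrative, not taken from any real system.

```python
from html.parser import HTMLParser

# A plain, semantic HTML fragment: readable by humans as-is,
# and parseable as data with only the standard library.
DOC = """
<article>
  <h1>Annual Report</h1>
  <a href="/reports/2023">Previous year</a>
  <a href="/reports/2025">Next year</a>
</article>
"""

class OutlineParser(HTMLParser):
    """Collects headings and link targets from a document."""
    def __init__(self):
        super().__init__()
        self.headings, self.links = [], []
        self._in_heading = False

    def handle_starttag(self, tag, attrs):
        if tag in ("h1", "h2", "h3"):
            self._in_heading = True
        elif tag == "a":
            self.links.append(dict(attrs).get("href"))

    def handle_endtag(self, tag):
        if tag in ("h1", "h2", "h3"):
            self._in_heading = False

    def handle_data(self, data):
        if self._in_heading and data.strip():
            self.headings.append(data.strip())

parser = OutlineParser()
parser.feed(DOC)
print(parser.headings)  # ['Annual Report']
print(parser.links)     # ['/reports/2023', '/reports/2025']
```

No API endpoint, no schema negotiation: the structure of the document is the interface.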

This approach is resilient by design:

  • It works without JavaScript
  • It survives partial rendering
  • It degrades gracefully
  • It remains readable decades later

The document does not pretend to be an API — it simply is one.

The API as a Website

The inverse approach is equally valid.

When an API exposes:

  • Stable identifiers
  • Explicit relationships
  • Meaningful field names
  • A coherent domain model

…then rendering HTML from it becomes a presentation concern, not a data problem.

The same endpoint can legitimately serve:

  • JSON to machines
  • HTML to humans
  • XML to legacy systems
  • Other formats as required

Nothing new is invented — only rendered.
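
A minimal content-negotiation sketch makes this concrete. The record, field names, and renderers below are hypothetical; the point is that both representations are derived from one canonical object.

```python
import json

# One canonical record; every representation below is derived from it.
ARTICLE = {"id": "article-42", "title": "One Data Model",
           "body": "Representations are projections."}

def to_json(record):
    return json.dumps(record)

def to_html(record):
    return (f'<article id="{record["id"]}">'
            f'<h1>{record["title"]}</h1>'
            f'<p>{record["body"]}</p></article>')

def respond(record, accept):
    """Pick a representation from the Accept header; the data never changes."""
    if "text/html" in accept:
        return to_html(record)
    return to_json(record)

print(respond(ARTICLE, "text/html"))
print(respond(ARTICLE, "application/json"))
```

Because both renderers read the same fields from the same record, the HTML and the JSON cannot drift apart.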

This is not duplication. It is **representation**.

Machine‑Readable HTML Technologies in Context

Different technologies support this model in different ways.

Microformats

Microformats embed meaning using existing HTML elements and class names.

Their strengths are simplicity and longevity:

  • No parallel data structures
  • No special parsers required
  • No loss of human readability

If the machine disappears, the document remains correct.

This makes Microformats ideal for:

  • Human‑centric documents
  • Long‑lived content
  • Systems that value resilience
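
As a sketch of that resilience: an h-card (microformats2) is ordinary HTML plus conventional class names such as p-name and p-org, and a few lines of standard-library parsing recover it as data. The person and organisation below are invented for illustration.

```python
from html.parser import HTMLParser

# An h-card: plain HTML plus microformats2 class names.
# If the parser below disappeared, the markup would still read correctly.
CARD = """
<div class="h-card">
  <span class="p-name">Ada Lovelace</span>
  works at <span class="p-org">Analytical Engines Ltd</span>
</div>
"""

class MicroformatParser(HTMLParser):
    """Maps microformat property classes (p-name, p-org) to text content."""
    def __init__(self):
        super().__init__()
        self.properties = {}
        self._current = None

    def handle_starttag(self, tag, attrs):
        classes = dict(attrs).get("class", "").split()
        props = [c for c in classes if c.startswith("p-")]
        if props:
            self._current = props[0]

    def handle_data(self, data):
        if self._current and data.strip():
            self.properties[self._current] = data.strip()
            self._current = None

parser = MicroformatParser()
parser.feed(CARD)
print(parser.properties)
```

The visible text and the extracted data are the same characters; there is nothing separate to fall out of sync.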

RDFa

RDFa extends this idea by allowing richer expression of relationships.

Crucially, it still:

  • Annotates existing content
  • Avoids data duplication
  • Keeps the document authoritative

Edits to content are edits to data — a powerful alignment that reduces drift over time.
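
A small sketch of that alignment, using the standard RDFa attributes vocab, typeof, and property (the person and fragment are invented): editing the visible text below would change the extracted data in the same keystroke.

```python
from html.parser import HTMLParser

# RDFa annotates existing content; the visible text IS the data.
SNIPPET = """
<div vocab="https://schema.org/" typeof="Person">
  <span property="name">Grace Hopper</span>
  worked on <span property="knowsAbout">compilers</span>
</div>
"""

class RDFaParser(HTMLParser):
    """Collects property/value pairs from RDFa-annotated markup."""
    def __init__(self):
        super().__init__()
        self.properties = {}
        self._prop = None

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if "property" in a:
            self._prop = a["property"]

    def handle_data(self, data):
        if self._prop and data.strip():
            self.properties[self._prop] = data.strip()
            self._prop = None

parser = RDFaParser()
parser.feed(SNIPPET)
print(parser.properties)
```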

JSON‑LD

JSON‑LD serves a different purpose.

It exists primarily for automated consumers that:

  • Do not want to parse HTML
  • Prefer fast, predictable extraction
  • Operate at web scale

JSON‑LD works best when treated as:

  • An optimisation layer
  • A reflection of existing truth
  • A convenience for external systems

Problems arise only when JSON‑LD becomes the *primary* source of meaning rather than a projection of it.
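
Treated correctly, JSON-LD is a one-way projection: generated from the canonical record, never edited by hand. A minimal sketch, with an invented record and schema.org's standard @context and @type keywords:

```python
import json

# Canonical record: the same data the HTML page renders.
PERSON = {"name": "Tim Berners-Lee", "url": "https://www.w3.org/People/Berners-Lee/"}

def to_json_ld(record):
    """Project the canonical record into JSON-LD, a derived view,
    never the source of truth."""
    return json.dumps({
        "@context": "https://schema.org",
        "@type": "Person",
        **record,
    })

doc = json.loads(to_json_ld(PERSON))
print(doc["@type"], doc["name"])
```

If the record changes, regenerating the projection is the only maintenance step; there is no second copy of the facts to update.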

Microdata

Microdata, specified alongside HTML5, embeds semantics through dedicated attributes (itemscope, itemtype, itemprop).

In practice, it:

  • Adds verbosity without clarity
  • Introduces new concepts without solving new problems
  • Competes with simpler, more mature approaches

It is supported, but rarely preferred in real‑world systems.
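
To illustrate the verbosity point, here is the same single fact marked up in Microdata and as an h-card, as a rough character-count comparison rather than a benchmark:

```python
# The same fact ("name is Ada Lovelace") in two syntaxes.
microdata = ('<div itemscope itemtype="https://schema.org/Person">'
             '<span itemprop="name">Ada Lovelace</span></div>')
microformat = '<div class="h-card"><span class="p-name">Ada Lovelace</span></div>'

print(len(microdata), len(microformat))
```

Microdata needs three new attributes and a vocabulary URL to state what the microformat states with two class names.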

Avoiding Parallel Realities

The most common architectural failure is semantic duplication.

Examples include:

  • Content updated but metadata forgotten
  • API fields drifting from UI labels
  • SEO data diverging from visible truth
  • Accessibility annotations bolted on late

The cure is not tooling — it is alignment.

When:

  • HTML and API share identifiers
  • Meaning is expressed once
  • Representations are derived, not rewritten

…the system becomes calm and legible.

Progressive Enhancement as Architecture

This approach naturally supports progressive enhancement.

A document‑first system:

  • Works without scripts
  • Improves with them
  • Never depends on them

An API‑first system:

  • Scales cleanly
  • Supports automation
  • Remains renderer‑agnostic

Both are valid — and both can coexist — as long as they project the same underlying model.

Design Guidance

A pragmatic strategy looks like this:

  • **Human‑first content** → Microformats or RDFa
  • **Crawler‑first metadata** → JSON‑LD
  • **Single source of truth** → Canonical identifiers and models
  • **Long‑term systems** → Embedded meaning over external declarations

There is no universal winner — only informed trade‑offs.

Final Thought

The web does not suffer from a lack of standards. It suffers from a lack of honesty.

Systems last when:

  • Meaning is not duplicated
  • Data is not reinvented
  • Documents say exactly what they mean

When the website and the API tell the same story, the web works as it always should have — as a shared space for humans and machines alike.