Is your feature request related to a problem?
Instructor's core value is structured outputs - getting LLMs to return valid, typed data reliably.
The irony: the JSON wire format used to achieve this structure is itself one of the largest token cost drivers in production pipelines.
Every Instructor call serializes:
- The JSON schema in the prompt
- The structured output response
- Retry attempts with validation errors
- Patch history across retries
~44% of tokens in typical Instructor payloads are pure JSON syntax overhead.
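A rough way to sanity-check an overhead figure like this on your own payloads is sketched below. It is only a character-level proxy, not a token count: it treats everything except leaf values (braces, quotes, colons, commas, and field names) as overhead. The function names are made up for illustration and are not part of Instructor or ULMEN; a real measurement would use the model's tokenizer.

```python
import json


def _leaf_texts(node):
    """Yield the bare text of every leaf value (no keys, quotes, or punctuation)."""
    if isinstance(node, dict):
        for child in node.values():
            yield from _leaf_texts(child)
    elif isinstance(node, list):
        for child in node:
            yield from _leaf_texts(child)
    elif isinstance(node, str):
        yield node
    else:
        # Numbers, booleans, and null: use their JSON spelling.
        yield json.dumps(node)


def json_syntax_overhead(payload) -> float:
    """Fraction of characters in the compact JSON encoding that are not leaf values.

    Assumption: characters stand in for tokens, and field names count as overhead.
    """
    encoded = json.dumps(payload, separators=(",", ":"))
    value_chars = sum(len(text) for text in _leaf_texts(payload))
    return 1 - value_chars / len(encoded)
```

Short field values and deeply nested schemas push this ratio up; long free-text values push it down.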
At scale this compounds fast:
- Schema overhead on every single call
- Repeated field names across every retry
- Validation error payloads add more JSON
- Multi-step pipelines multiply the overhead
At 10M Instructor calls on GPT-4o:
~$59K spent on syntax noise. Not intelligence.
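The arithmetic behind a figure like this can be written down as a small helper. This is a sketch only: the issue does not state the tokens per call or the price used for the $59K estimate, so every parameter below is an input you would supply yourself, and the function name is hypothetical.

```python
def overhead_cost_usd(
    calls: int,
    tokens_per_call: float,
    overhead_fraction: float,
    usd_per_million_tokens: float,
) -> float:
    """Dollar cost of the tokens attributed to wire-format overhead.

    All four inputs are assumptions supplied by the caller; none of them
    are taken from the issue's own benchmark.
    """
    overhead_tokens = calls * tokens_per_call * overhead_fraction
    return overhead_tokens / 1_000_000 * usd_per_million_tokens
```

With toy inputs (1,000 calls, 100 tokens per call, half of them overhead, $10 per million tokens) this gives $0.50; input and output tokens are usually priced differently, so a real estimate would run it once per direction.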
The frustration: Instructor already does the hard work of structured validation. The JSON wire format undermines that efficiency at scale.
Describe the solution you'd like
A pluggable serializer interface that allows token-efficient wire formats as an opt-in replacement for JSON in Instructor pipelines.
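One possible shape for such an interface is sketched below. None of these names (`Serializer`, `register_serializer`, `get_serializer`) exist in Instructor today; they are assumptions for illustration, with JSON registered as the default so existing behavior would stay unchanged.

```python
import json
from typing import Any, Protocol


class Serializer(Protocol):
    """Hypothetical contract a wire format would implement."""

    name: str

    def dumps(self, obj: Any) -> str: ...

    def loads(self, text: str) -> Any: ...


class JsonSerializer:
    """Default serializer: compact JSON, matching today's wire format."""

    name = "json"

    def dumps(self, obj: Any) -> str:
        return json.dumps(obj, separators=(",", ":"))

    def loads(self, text: str) -> Any:
        return json.loads(text)


_REGISTRY: dict[str, Serializer] = {}


def register_serializer(serializer: Serializer) -> None:
    """Make a serializer available under its `name`."""
    _REGISTRY[serializer.name] = serializer


def get_serializer(name: str = "json") -> Serializer:
    """Look up a serializer by name, failing loudly on unknown names."""
    try:
        return _REGISTRY[name]
    except KeyError:
        known = sorted(_REGISTRY)
        raise ValueError(f"unknown serializer {name!r}; registered: {known}") from None


register_serializer(JsonSerializer())
```

A third-party format would then plug in by calling `register_serializer` with its own implementation, and a `serializer="..."` argument would resolve through `get_serializer`.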
I built ULMEN specifically for this problem.
Benchmarks on NVIDIA Tesla T4:
The natural fit with Instructor specifically:
Instructor validates structure on the Python side.
ULMEN's Semantic Firewall extends this to the wire:
- Validates structured output schemas
- Rejects malformed responses before retry
- Catches invalid enum states
- Raises structured errors vs silent failures
This aligns with Instructor's core philosophy:
never pass broken structure downstream.
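To make the idea concrete, here is a standard-library sketch of checking a raw response before it reaches the retry loop. It is not ULMEN's Semantic Firewall, whose API is not shown in this issue; it uses JSON as a stand-in wire format, and `Status`, `WireValidationError`, and `check_ticket` are hypothetical names.

```python
import json
from enum import Enum


class Status(Enum):
    OPEN = "open"
    CLOSED = "closed"


class WireValidationError(ValueError):
    """Structured error: records which field failed and why."""

    def __init__(self, field: str, reason: str):
        super().__init__(f"{field}: {reason}")
        self.field = field
        self.reason = reason


def check_ticket(raw: str) -> dict:
    """Validate a raw response for a toy schema: {title: str, status: Status}."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        # Malformed payload: reject before any retry logic sees it.
        raise WireValidationError("<root>", f"malformed payload: {exc.msg}") from exc
    if not isinstance(data, dict):
        raise WireValidationError("<root>", "expected an object")
    if not isinstance(data.get("title"), str):
        raise WireValidationError("title", "missing or not a string")
    try:
        # Enum lookup fails on any value outside the declared states.
        data["status"] = Status(data.get("status"))
    except ValueError:
        raise WireValidationError(
            "status", f"invalid enum value {data.get('status')!r}"
        ) from None
    return data
```

Because the error carries `field` and `reason`, a retry prompt can quote the specific failure instead of resending a generic message.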
How code might look:
    import instructor
    from openai import OpenAI

    # Current
    client = instructor.from_openai(OpenAI())

    # Proposed
    client = instructor.from_openai(
        OpenAI(),
        serializer="ulmen"
    )

    # Per-call override
    response = client.chat.completions.create(
        model="gpt-4o",
        response_model=MyModel,
        serializer="ulmen",
        messages=[...]
    )
Pydantic model definitions unchanged.
ULMEN handles wire format transparently.
Pure Python fallback if Rust unavailable.
BSL license, free under $10M revenue.
Reproducible benchmark notebook:
github.com/makroumi/ulmen
Describe alternatives you've considered
- orjson: Faster but identical token count. Doesn't address context window overhead.
- Manual schema compression: Lossy. Breaks with model changes. Not systematic across pipelines.
- Reducing retry attempts: Trades reliability for cost. Wrong tradeoff for production systems.
- Smaller models: Reduces capability, not just cost. Wrong lever for this specific problem.
ULMEN addresses the root cause: JSON was designed for web APIs, not LLM context windows.
Additional Context
Instructor users running structured extraction pipelines at scale are the most affected by this problem because:
- Schema overhead appears on EVERY call
- Retry payloads add compounding JSON overhead
- Batch extraction pipelines multiply the cost
- Multi-step pipelines chain the overhead
The teams most likely to benefit:
- Document extraction pipelines at scale
- Classification systems with high call volume
- Any production Instructor deployment over 1M calls per month
Reproducible benchmark notebook:
github.com/makroumi/ulmen