Let's start at the end and work backwards. All the code for this example can be found here: https://github.com/ouachitalabs/baml-playground.
At work we sometimes need to extract values from semi-structured documents like 10-Qs or earnings call transcripts. Here's a quick script I wrote while exploring BAML:
$ uv run main.py "Combined Walmart U.S. and Samsclub U.S. net sales" --url "https://stock.walmart.com/sec-filings/all-sec-filings/content/0000104169-25-000137/0000104169-25-000137.pdf"
# Output:
{
  "company": "Walmart Inc.",
  "description": "Combined Walmart U.S. and Samsclub U.S. net sales for the latest quarter",
  "derivation": 144549000000,
  "facts": [
    {
      "value": 120911000000,
      "units": "USD",
      "start_date": "2025-05-01",
      "end_date": "2025-07-31",
      "location": "Page 15, Segment Information table, Walmart U.S. section, 'Net sales' row, '2025' column under 'Three Months Ended July 31,'"
    },
    {
      "value": 23638000000,
      "units": "USD",
      "start_date": "2025-05-01",
      "end_date": "2025-07-31",
      "location": "Page 15, Segment Information table, Sam's Club U.S. section, 'Net sales' row, '2025' column under 'Three Months Ended July 31,'"
    }
  ]
}
The numbers check out. Go verify them for yourself! These facts can sometimes be teased out with existing archaic and very tedious solutions like XBRL tags (made much simpler with the excellent Edgartools library, go check it out), but the tags are managed by the filers, and sometimes what we need out of the document is not tagged. Not to mention that XBRL is really only a standard for US filings, not Canadian/EU filings. Thus, we would like to have one solution to rule them all.
Since this is a document extraction problem, your first instinct might just be to send the PDF to ChatGPT and see if it gets the answer right. Sure enough, it does so pretty often. LLMs are exceedingly good at pulling relevant facts out of deep within documents like this: commit hashes, facts and figures, exact dates and timestamps. The LLM is really good at regurgitating these facts just because of how the underlying transformer models work. Note that I'm no ML scientist, and my hand waving here should be taken with a grain of salt.
Tool calls are neat. They're what break out a single LLM call into a multi-turn agent that can actually interact with something on the outside. Orchestrating tool calls is relatively simple when you use a library like openai - however, with BAML things are a bit more low-level. This is a good thing. Before trying BAML, I never truly understood what was happening under the hood when an LLM "decided" to make a tool call. What was that handoff process like? With BAML everything is nice and transparent. You can see everything as it happens, and the tooling around the language makes it very easy to test each prompt as a function with simple inputs/outputs.
Basically, a BAML function is a plain function interface wrapped around an LLM prompt. Things go in and come out. BAML helps shape the data flowing in and out in a reasonably transparent way. Any real compute done inside a BAML function is purely LLM state transfer. BAML functions are true pure functions, and side effects and mess are handled in the application layer rather than the routing layer.
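To make that handoff concrete, here's a schematic sketch of the pattern using hypothetical stand-in functions (this is not code from the repo; the real versions show up below): the LLM step is a pure call that returns data, and the decision to run a tool is just an ordinary branch in application code.

# Hypothetical stand-ins for illustration only -- not the repo's code.

def llm_pick_tool(message: str) -> str | None:
    """Stand-in for a BAML function: prompt goes in, structured data comes out."""
    # A real call would hit the model; here we fake a "calculator request".
    return "120911000000 + 23638000000" if "net sales" in message else None

def run_calculator(equation: str) -> int:
    """Stand-in for the side-effecting tool (the real version uses sympy)."""
    left, right = (int(part) for part in equation.split("+"))
    return left + right

request = llm_pick_tool("Combined Walmart U.S. and Sam's Club U.S. net sales")
if request is not None:             # the "decision" to use a tool is just data you inspect
    print(run_calculator(request))  # 144549000000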
To do the trick I showed at the beginning of the article, I’ve defined a few datatypes:
class Fact {
  value int @description(#"
    If the value is in currency, round to the nearest whole dollar. For example, always expand "$12.8 million" to 12800000 "USD"
  "#)
  units string
  start_date string
  end_date string?
  location string? @description(#"
    A plain text description of where this fact can be found on the PDF
  "#)
}

class FilingInformation {
  company string
  description string
  derivation int? @description(#"
    If asked for a multi-fact calculation, put the *final calculation* here in whole USD
  "#)
  facts Fact[] @description(#"
    If asked for a multi-fact calculation, put the *base facts* here
  "#)
}

class CalculatorRequest {
  equation string
  @@assert(contains_operation, {{ this|regex_match("[-+*/]") }} )
}
These are my data models. Basically users can extract filing information from an SEC filing (or any document, as you'll see next) and back it up with cold hard facts 😎. These facts contain a value, units, start/end dates, and a location description so you can find the fact in the doc for debugging purposes. Now, onto the functions:
function SearchDoc(document: pdf, query: string) -> FilingInformation {
  client Gemini
  prompt #"
    Extract the information from the document below.
    Always prefer to source a fact from within a table if possible.
    Info: {{query}}
    Document: {{document}}
    {{ ctx.output_format }}
  "#
}
function NeedCalculator(message: string, facts: Fact[]) -> CalculatorRequest {
  client Gpt5Nano
  prompt #"
    Given a message, create an expression that best represents the user's
    request. If you don't have to calculate anything, return null.
    Never give the answer or include the units. Always just raw numbers and
    a symbolic arithmetic operation (+, -, *, /)
    {{ ctx.output_format }}
    {{ _.role("user") }} {{ message }}
    {{ _.role("user") }} {{ facts }}
  "#
}
These are the only two BAML functions we need. SearchDoc is a function that accepts a PDF and a query, returning some sort of filing information. NeedCalculator is a poorly named function that takes a message (better thought of as the original query) and a list of facts and turns it into an equation. If you check the models above, you'll see that the CalculatorRequest output type is literally just a string named "equation." This is both hilarious and deeply concerning. How do we solve an equation as a string? It turns out LLMs are very good at solving these simple arithmetic problems, but they're just not trustworthy (read: deterministic) enough, even though they're correct very often. To remedy this, I thought we could just bring a nuke to a paintball match and import sympy - a full-blown symbolic algebra system 😀. Now, if the LLM ever decides to calculate derivatives or solve differential equations, it has the tools at its disposal.
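As a quick illustration (a standalone sketch, not the repo's main.py, which goes through sympy's Mathematica parser), here's what deterministic evaluation of that equation string looks like, plus the gratuitous firepower:

# A minimal sketch of deterministically evaluating the LLM's equation string.
# main.py uses sympy's Mathematica parser; sympify is a simpler stand-in for
# the same idea.
from sympy import diff, symbols, sympify

equation = "120911000000 + 23638000000"  # the kind of string NeedCalculator returns
print(sympify(equation))                 # 144549000000, exact integer arithmetic

# And the "nuke" part: symbolic calculus is sitting right there if it's ever needed.
x = symbols("x")
print(diff(x**2 + 3 * x, x))             # 2*x + 3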
So how do we orchestrate all of this? It's pretty simple! BAML works kind of like an OpenAPI spec generator if you've ever used one before. Every time you save your work, it regenerates the client super quickly (like in 5ms) - in our case, Python files that use Pydantic for data modeling. Just like with OpenAPI codegen, many languages are supported (Go, TypeScript, Python, etc.). You can then import the client and use the types and functions you've defined in your *.baml files. The code to orchestrate pretty much looks like this:
import argparse

from utils import encode_file_to_base64
from baml_py import Pdf
from baml_client.sync_client import b

if __name__ == '__main__':
    parser = argparse.ArgumentParser(description='Search a PDF document for information')
    parser.add_argument('query', help='Search query for the document')
    parser.add_argument('--url', required=True, help='URL or path to the PDF document')
    args = parser.parse_args()

    # Pure LLM call: PDF + query in, typed FilingInformation out.
    pdf = Pdf.from_base64(encode_file_to_base64(args.url))
    result = b.SearchDoc(document=pdf, query=args.query)

    # If more than one base fact came back, ask for an equation and let sympy
    # do the arithmetic deterministically.
    if len(result.facts) > 1:
        request = b.NeedCalculator(
            message=args.query,
            facts=result.facts
        )
        from sympy.parsing.mathematica import parse_mathematica
        derivation = parse_mathematica(request.equation)
        result.derivation = derivation.doit()

    print(result.model_dump_json(indent=2))
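One side note on the codegen, since it's easy to miss: the BAML classes above land on the Python side as ordinary Pydantic models (under baml_client.types, if I have the generated layout right), so you can build and serialize them yourself with no LLM involved. A minimal sketch:

# A minimal sketch (assumes the generated baml_client from this project):
# the BAML classes are plain Pydantic models on the Python side.
from baml_client.types import Fact, FilingInformation

fact = Fact(
    value=23638000000,
    units="USD",
    start_date="2025-05-01",
    end_date="2025-07-31",
    location=None,  # optional BAML fields (string?) come through as Optional
)
info = FilingInformation(
    company="Walmart Inc.",
    description="Sam's Club U.S. net sales",
    derivation=None,
    facts=[fact],
)
print(info.model_dump_json(indent=2))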
Note how there are no import statements for OpenAI, Gemini, Anthropic, etc. You may not have noticed, but I'm using two totally different model providers here! I've used gemini-2.5-pro for the more complicated document parsing and gpt-5-nano for generating the simple equation. I can mix and match models or run requests to different models concurrently, taking the first one to respond. I can even round-robin them if I want (a concept I first read about in XBOW's post on alloys, which I'm very excited to try out in the future). You can also easily plug in OpenAI-compatible local LLM servers like Ollama and LM Studio.
client Gpt5Nano {
  provider openai
  options {
    model "gpt-5-nano"
    api_key env.OPENAI_API_KEY
  }
}

client Gemini {
  provider google-ai
  options {
    model "gemini-2.5-pro"
    api_key env.GEMINI_API_KEY
  }
}

client Lmstudio {
  provider openai-generic
  options {
    model "google/gemma-3-4b"
    base_url "http://localhost:1234/v1"
  }
}
In conclusion, I don't know what the future holds for programming agents and baking LLMs into software, but this approach is a great glimpse into how the future may look. I didn't even get into the concept of evals, which - if your calls to the LLM are factored out into pure functions - are now trivial to implement in BAML. The tooling around building things with LLMs is getting better every day. There are about a million different very opinionated frameworks, and sometimes it's nice to step back into a more flexible, lower level of the stack for a change.
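For a taste of what "trivial" means here, this is roughly the shape an eval could take (a hedged sketch that isn't in the repo, assuming the generated baml_client from this project and pytest as the runner):

# A hedged sketch of an eval as a plain pytest test: because NeedCalculator is
# a pure function, we can call it with fixed inputs and assert on its typed output.
from baml_client.sync_client import b
from baml_client.types import Fact

def test_need_calculator_produces_an_arithmetic_expression():
    facts = [
        Fact(value=2, units="USD", start_date="2025-05-01", end_date=None, location=None),
        Fact(value=3, units="USD", start_date="2025-05-01", end_date=None, location=None),
    ]
    request = b.NeedCalculator(message="Add the two values together", facts=facts)

    # Mirrors the @@assert on CalculatorRequest: the equation must contain an operator.
    assert any(op in request.equation for op in "+-*/")
    # And it should reference the base facts rather than invent new numbers.
    assert "2" in request.equation and "3" in request.equation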