Introducing PyDough-CE: A simpler, safer path for natural language analytics

Date: October 27, 2025
Author: Rohit Krishnan

Today we’re releasing PyDough Community Edition (PyDough-CE), an open source project that finally makes natural language analytics both practical and safe.

PyDough-CE is built around PyDough, a small, human-readable analytics Domain-Specific Language (DSL) designed to bridge language models and databases. Instead of letting LLMs directly generate raw SQL — which is risky, brittle, prone to prompt injection, and notoriously untrustworthy — PyDough takes a fundamentally different approach.

LLMs write short, auditable PyDough plans against an existing knowledge graph — compact analytics “recipes” that are easy to read, easy to audit, and that compile deterministically into correct, dialect-specific SQL for any backend.

The result is accuracy, transparency, and trust — all without sacrificing the speed and flexibility of natural-language interaction.

Why We Built PyDough

Everyone’s been chasing the ability to “talk to your data.” And now, with LLMs, it seems like the dream should be easy — just ask the database a question, right? But in reality, three big issues keep that dream from being safe or scalable:

  • Prompt Injection: Models can unpredictably generate dangerous commands. How can you trust the SQL your users coax the LLM into generating? What if it inserts malicious statements or lets an attacker exfiltrate data? With raw SQL, the only defense is strict guardrails, which cut into functionality.
  • SQL Dialect Sprawl: Different databases mean different SQL dialects, each needing its own prompts, templates, or fine-tuning.
  • Auditability: SQL generated directly from natural language can be lengthy, hard to review, and prone to subtle errors.

We realized the problem wasn’t the models — it was the interface. We’ve been asking LLMs to do too much: invent logic, reason about structure, and write perfect SQL syntax across multiple dialects. So we flipped the problem around.

Instead of teaching LLMs to write SQL, what if we gave them a safe language — one designed specifically for analytics:

  • Where every operation is valid, reviewable, and guaranteed to compile into correct SQL.
  • That’s expressive enough to describe intent — but constrained enough to guarantee safety and correctness.
  • That’s small enough for models to master and strict enough to compile deterministically across backends.

That’s why we built PyDough — a protective, simplifying layer between natural language and SQL.

The PyDough Way

In PyDough, the LLM never writes SQL. Instead:

  1. It generates short, human-readable PyDough plans — tiny, structured programs that describe the analytics operation.
  2. PyDough then compiles those plans deterministically into dialect-specific SQL for your backend of choice.
  3. If the model tries to go rogue? It can’t — the code it produces can only express valid analytics operations.
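
To make that loop concrete, here is a minimal Python sketch of the control flow. The helper names generate_plan, compile_plan_to_sql, and run_sql are hypothetical placeholders for illustration, not the actual PyDough-CE API; the point is that the model’s output is confined to step 1 and everything downstream is deterministic.

# Illustrative sketch only: the helper functions below are hypothetical
# placeholders, not the real PyDough-CE API.
def answer_question(question: str, knowledge_graph, dialect: str):
    # 1. The LLM sees the knowledge graph plus the question and emits a
    #    short PyDough plan as plain text, never SQL.
    plan_text = generate_plan(question, knowledge_graph)

    # 2. The plan is compiled deterministically. Anything that is not a
    #    valid PyDough analytics operation fails here, before any SQL exists.
    sql = compile_plan_to_sql(plan_text, knowledge_graph, dialect=dialect)

    # 3. Only the compiled, dialect-specific SQL ever reaches the database.
    return run_sql(sql)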

This means that your results are:

  • Trustworthy → Because PyDough is constrained, the LLM literally can’t inject dangerous code.
  • Auditable → PyDough plans are short, readable, and easy to review.
  • Portable → One plan works everywhere — no more babysitting dialects.

And with a Knowledge Graph in the middle, PyDough also understands your schema — which tables exist, how they relate, and which fields are allowed. If you ask for something outside that scope, it simply can’t execute: any PyDough code the LLM generates for it cannot run.
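
To give a feel for what that knowledge graph contains, here is a simplified sketch in Python. The layout and names are hypothetical and purely illustrative, not PyDough-CE’s exact metadata format.

# Hypothetical, simplified knowledge-graph entry; illustrative only,
# not PyDough-CE's exact metadata schema.
SALES_GRAPH = {
    "sales": {
        "fields": ["city", "country", "order_date", "amount"],
        "relations": {
            # e.g. each sale joins back to a customer record
            "customer": {"collection": "customers", "on": "customer_id"},
        },
    },
}
# A plan may only reference collections, fields, and relations declared in
# the graph; anything else fails to compile.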

A Quick Example

Consider the query:
"Top 3 US cities by sales in the last 30 days."

PyDough produces a compact plan:

result = sales.FILTER(country == "US" and order_date >= TODAY() - DAYS(30)) \
    .CALCULATE(city, total=SUM(amount)) \
    .TOP_K(3, by=total.DESC())

This same PyDough plan compiles seamlessly into dialect-specific SQL, such as:

Snowflake:

SELECT city, SUM(amount) AS total
FROM sales
WHERE country = 'US'
  AND order_date >= DATEADD(day, -30, CURRENT_DATE())
GROUP BY city
ORDER BY total DESC
LIMIT 3;

Postgres:

SELECT city, SUM(amount) AS total
FROM sales
WHERE country = 'US'
  AND order_date >= CURRENT_DATE - INTERVAL '30 days'
GROUP BY city
ORDER BY total DESC
LIMIT 3;

You write once, review once, and run anywhere.
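
In code, “run anywhere” could be as little as changing one compile-time argument. The snippet below reuses the hypothetical compile_plan_to_sql helper and SALES_GRAPH sketch from earlier; it is not the literal PyDough-CE API.

# The plan above, held as plain text; only the dialect argument changes.
plan_text = """
result = (
    sales.FILTER(country == "US" and order_date >= TODAY() - DAYS(30))
    .CALCULATE(city, total=SUM(amount))
    .TOP_K(3, by=total.DESC())
)
"""

snowflake_sql = compile_plan_to_sql(plan_text, SALES_GRAPH, dialect="snowflake")
postgres_sql = compile_plan_to_sql(plan_text, SALES_GRAPH, dialect="postgres")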

What's Inside PyDough-CE

PyDough-CE provides a complete, open-source reference workflow:

  • It converts your database schema into a knowledge graph that is passed to the model as context.
  • It translates natural-language questions into PyDough.
  • It compiles PyDough into SQL for your chosen database.
  • It executes the query and previews the results.
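
Strung together, that reference workflow could look roughly like the sketch below, reusing the hypothetical answer_question helper from earlier. The connection string and function names are illustrative assumptions, not the actual PyDough-CE interface.

# Hypothetical end-to-end loop; names and the connection string are
# illustrative, not the actual PyDough-CE interface.
graph = build_knowledge_graph("postgresql://analytics-db/sales")  # 1. schema -> knowledge graph
rows = answer_question(                                           # 2-4. question -> plan -> SQL -> results
    "Top 3 US cities by sales in the last 30 days.",
    knowledge_graph=graph,
    dialect="postgres",
)
print(rows[:10])  # preview the results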

This approach lets you see exactly what's happening at every step, ensuring both transparency and safety.

Who Should Use PyDough?

If your team juggles multiple database dialects, PyDough simplifies query management. It provides a single, clear review step, reduces dialect-specific headaches, and limits prompt injection risks inherent in open-ended SQL generation.

Analysts, data engineers, and developers alike can quickly adopt PyDough to streamline collaboration and maintain consistency across analytics workloads.

Try PyDough-CE Today

PyDough-CE is fully open source and available on GitHub. Check out the PyDough-CE repository, run the quick-start guide on your own data, and experience simpler, safer analytics generation firsthand.

We have an enterprise version that adds deeper integrations, governance controls, and advanced compilation features. (Stay tuned for more!)

We look forward to your feedback and contributions as we expand PyDough’s capabilities further!
