
Today we’re releasing PyDough Community Edition (PyDough-CE), an open-source project that finally makes natural language analytics both practical and safe.
PyDough-CE is built around PyDough, a small, human-readable analytics Domain-Specific Language (DSL) designed to bridge language models and databases. Instead of letting LLMs directly generate raw SQL — which is risky, brittle, prone to prompt injections and notoriously untrustworthy — PyDough takes a fundamentally different approach.
LLMs write short, auditable PyDough plans according to an existing knowledge graph — compact analytics “recipes” that are easy to read, easy to audit, and compile deterministically into correct, dialect-specific SQL for any backend.
The result is accuracy, transparency, and trust — all without sacrificing the speed and flexibility of natural-language interaction.
Everyone’s been chasing the ability to “talk to your data.” And now, with LLMs, it seems like the dream should be easy — just ask the database a question, right? But in reality, three big issues keep that dream from being safe or scalable:
We realized the problem wasn’t the models — it was the interface. We’ve been asking LLMs to do too much: invent logic, reason about structure, and write perfect SQL syntax across multiple dialects. So we flipped the problem around.
Instead of teaching LLMs to write SQL, what if we gave them a safe language — one designed specifically for analytics:
That’s why we built PyDough — a protective, simplifying layer between natural language and SQL.
In PyDough, the LLM never writes SQL. Instead:
This means that your results are:
And with a knowledge graph in the middle, PyDough also understands your schema: which tables exist, how they relate, and which fields are allowed. If you ask for something outside that scope, the LLM literally can’t do it, because the PyDough code it would generate cannot run.
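To make the scoping idea concrete, here is a minimal sketch of a schema check in plain Python. It is not PyDough's actual implementation: the `KNOWLEDGE_GRAPH` dictionary, `OutOfScopeError`, and `validate_plan` are hypothetical names invented for illustration, and a real knowledge graph would also carry relationships between collections.

```python
# Hypothetical, simplified stand-in for a knowledge graph:
# collection name -> set of fields a plan is allowed to reference.
KNOWLEDGE_GRAPH = {
    "sales": {"city", "country", "order_date", "amount"},
}


class OutOfScopeError(Exception):
    """Raised when a plan references something the graph does not define."""


def validate_plan(collection, fields, graph=KNOWLEDGE_GRAPH):
    """Return True if every referenced name exists in the graph, else raise."""
    if collection not in graph:
        raise OutOfScopeError(f"unknown collection: {collection!r}")
    unknown = sorted(set(fields) - graph[collection])
    if unknown:
        raise OutOfScopeError(f"unknown fields on {collection!r}: {unknown}")
    return True


# An in-scope request passes the check.
validate_plan("sales", ["city", "amount"])

# An out-of-scope request is rejected before anything reaches the database.
try:
    validate_plan("sales", ["credit_card_number"])
except OutOfScopeError as err:
    print(f"rejected: {err}")
```

The point of the sketch is the ordering: the check runs before any SQL exists, so an out-of-scope request fails at the plan stage.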
Consider the query:
"Top 3 US cities by sales in the last 30 days."
PyDough produces a compact plan:
result = sales.FILTER(country == "US" and order_date >= TODAY() - DAYS(30)) \
.CALCULATE(city, total = SUM(amount)) \
.TOP_K(3, by=total.DESC())

This same PyDough plan compiles into dialect-specific SQL, such as:
Snowflake:
SELECT city, SUM(amount) AS total
FROM sales
WHERE country = 'US'
AND order_date >= DATEADD(day, -30, CURRENT_DATE())
GROUP BY city
ORDER BY total DESC
LIMIT 3;

Postgres:
SELECT city, SUM(amount) AS total
FROM sales
WHERE country = 'US'
AND order_date >= CURRENT_DATE - INTERVAL '30 days'
GROUP BY city
ORDER BY total DESC
LIMIT 3;

You write once, review once, and run anywhere.
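The "one plan, many dialects" idea can be sketched as a toy compiler in plain Python. This is an illustration under stated assumptions, not how PyDough compiles plans: `TopCitiesPlan`, `DATE_CUTOFF`, and `compile_plan` are hypothetical names, and the sketch only handles the single query shape shown above, where the date arithmetic is the one part that differs between Snowflake and Postgres.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TopCitiesPlan:
    """Hypothetical structured form of the 'top K cities by sales' plan."""
    country: str
    days: int
    k: int


# Per-dialect templates for "today minus N days" (the only differing fragment).
DATE_CUTOFF = {
    "snowflake": "DATEADD(day, -{days}, CURRENT_DATE())",
    "postgres": "CURRENT_DATE - INTERVAL '{days} days'",
}


def compile_plan(plan, dialect):
    """Render the plan as SQL text; the same inputs always give the same output."""
    if dialect not in DATE_CUTOFF:
        raise ValueError(f"unsupported dialect: {dialect!r}")
    # Coerce numeric parameters to int so they cannot carry SQL text.
    cutoff = DATE_CUTOFF[dialect].format(days=int(plan.days))
    # Double any single quotes so the country value stays inside its literal.
    country = plan.country.replace("'", "''")
    return (
        "SELECT city, SUM(amount) AS total\n"
        "FROM sales\n"
        f"WHERE country = '{country}'\n"
        f"AND order_date >= {cutoff}\n"
        "GROUP BY city\n"
        "ORDER BY total DESC\n"
        f"LIMIT {int(plan.k)};"
    )


plan = TopCitiesPlan(country="US", days=30, k=3)
for dialect in ("snowflake", "postgres"):
    print(f"-- {dialect}")
    print(compile_plan(plan, dialect))
```

Because the plan is data rather than free-form SQL, the reviewer reads one small object and the dialect differences stay confined to a lookup table.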
PyDough-CE provides a complete, open-source reference workflow:
This approach lets you see exactly what's happening at every step, ensuring both transparency and safety.
If your team juggles multiple database dialects, PyDough simplifies query management. It provides a single, clear review step, reduces dialect-specific headaches, and limits prompt injection risks inherent in open-ended SQL generation.
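One way to picture why a constrained language narrows the injection surface: a generated plan can be checked against a short allowlist of operations before anything runs. The sketch below uses Python's standard `ast` module. The `ALLOWED_CALLS` set and the `disallowed_calls` helper are assumptions made for illustration, with operation names taken from the example plan earlier in this post; PyDough's own validation may work differently.

```python
import ast

# Assumed allowlist, built from the operations used in the example plan.
ALLOWED_CALLS = {"FILTER", "CALCULATE", "TOP_K", "SUM", "TODAY", "DAYS", "DESC"}


def disallowed_calls(plan_source):
    """Return a sorted list of call names in the plan that are not allowlisted."""
    tree = ast.parse(plan_source)  # parses only; nothing is executed
    found = set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Call):
            func = node.func
            if isinstance(func, ast.Attribute):
                name = func.attr          # e.g. sales.FILTER(...)
            else:
                # Plain names like SUM(...); anything else is flagged as dynamic.
                name = getattr(func, "id", "<dynamic>")
            if name not in ALLOWED_CALLS:
                found.add(name)
    return sorted(found)


good_plan = 'result = sales.FILTER(country == "US").TOP_K(3, by=total.DESC())'
bad_plan = 'result = sales.FILTER(country == "US").DROP_TABLE()'

print(disallowed_calls(good_plan))  # []
print(disallowed_calls(bad_plan))   # ['DROP_TABLE']
```

A check like this is only one layer. It shows that reviewing a small vocabulary is tractable in a way that reviewing arbitrary SQL is not.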
Analysts, data engineers, and developers alike can quickly adopt PyDough to streamline collaboration and maintain consistency across analytics workloads.
PyDough-CE is fully open source and available on GitHub. Check out the PyDough-CE repository, run the quick-start guide on your own data, and experience simpler, safer analytics generation firsthand.
We have an enterprise version that adds deeper integrations, governance controls, and advanced compilation features. (Stay tuned for more!)
We look forward to your feedback and contributions as we expand PyDough’s capabilities further!