Using LLMs at Scale: A Pandas-Native Approach

Date: September 4, 2025
Author: Isaac Warren

Large Language Models (LLMs) are transforming how we process and generate textual data. But for data scientists and data engineers accustomed to the elegance and power of Pandas, applying these models to large-scale datasets often means leaving the familiar comfort of the DataFrame. You're suddenly wrestling with complex scripts, manual batching, and distributed computing boilerplate.

At Bodo, we believe that scaling your workload shouldn't force you to change your workflow. Bodo DataFrames is a drop-in replacement for Pandas that is designed to scale Pandas code to hundreds of nodes without modification. So, we asked ourselves: What if applying an LLM to a column of data felt as natural as calling series.str.lower()?

Today, we're excited to introduce the Bodo DataFrame AI toolkit, a new set of APIs designed to seamlessly integrate LLM inference and embedding directly into your Pandas workflows. It can be as simple as:

import bodo.pandas as pd
prompts = pd.read_csv("prompts.csv")["prompts"]
prompts.ai.llm_generate().to_csv("results.csv")

How It All Began: A Real-World Challenge

The spark came from a collaboration with Pleias—an AI research lab that designs and pretrains highly efficient LLMs purpose-built for complex document processing, RAG, and data harmonization. Their models excel at enterprise, knowledge-intensive work over multimodal, long-context (32k+) documents with high accuracy, and are optimized for fully secure, local deployment on consumer GPUs/CPUs or private clouds. Crucially, their assistants are explainable and auditable, citing exact source passages so teams can verify and reuse enterprise knowledge.

Against that backdrop, we teamed up to tackle a massive pain point: correcting OCR errors across millions of scanned pages—an ideal fit for a fine-tuned LLM.

Our initial solution was a powerful, hand-crafted script. It used Bodo's map_partitions to distribute the workload, bodo.rebalance to move data onto GPU-enabled nodes, and directly invoked a vllm inference server.
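
To give a flavor, here is a minimal sketch of that hand-rolled approach. This is illustrative only: the model name is a placeholder, the partition function is hypothetical, and the bodo.rebalance step for GPU placement is omitted.

import bodo.pandas as pd

def correct_ocr_partition(df):
    # vLLM's offline inference API; one engine instance per worker process
    from vllm import LLM, SamplingParams

    llm = LLM(model="your-ocr-correction-model")  # placeholder model name
    params = SamplingParams(temperature=0.0, max_tokens=1024)
    outputs = llm.generate(df["page_text"].tolist(), params)
    df["corrected"] = [o.outputs[0].text for o in outputs]
    return df

pages = pd.read_parquet("scanned_pages.pq")
corrected = pages.map_partitions(correct_ocr_partition)
corrected.to_parquet("corrected_pages.pq")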

The code worked, but it was complex. It required knowledge of Bodo's distributed architecture and the vllm library. We knew there had to be a better, more user-friendly way. So we set out to build it.

From Pain Points to a New Philosophy

As we designed the public API, we hit a critical decision point. Our first instinct was to bundle an inference engine like vllm or llama-cpp directly into Bodo. However, we quickly realized the pitfalls:

  • Packaging Hell: These libraries have heavy dependencies (e.g., specific CUDA versions) and are very challenging to package reliably across all platforms.
  • Lack of Flexibility: Tying our API to one specific inference engine would limit our users' choices. What if they preferred a different engine or wanted to use a cloud-based service?

So we designed our API to work with any OpenAI-style endpoint, plus Amazon Bedrock (the only major service we know of that doesn't provide an OpenAI-style interface). This approach allows you to:

  1. Use managed cloud services like OpenAI, Anthropic, or Cohere.
  2. Connect to cloud-hosted models through services like Amazon Bedrock.
  3. Run your own open-source models served via tools like vllm, llama-cpp-python, or Ollama, which all provide OpenAI-compatible endpoints.

This pivot from providing the inference engine to connecting to one became the guiding philosophy.
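
In practice, switching providers is mostly a matter of changing the endpoint. A quick sketch (the base_url and model names below are placeholders):

import bodo.pandas as pd

prompts = pd.read_csv("prompts.csv")["prompts"]

# Managed service: as in the examples below, omitting base_url
# targets the default hosted endpoint
hosted = prompts.ai.llm_generate(
    api_key="your_api_key_here",
    model="gpt-3.5-turbo",
)

# Self-hosted server (vllm, llama-cpp-python, Ollama) exposing an
# OpenAI-compatible endpoint; address and model name are placeholders
local = prompts.ai.llm_generate(
    base_url="http://localhost:8000/v1",
    api_key="",
    model="my-finetuned-model",
)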

Introducing the Bodo DataFrame AI Toolkit 🚀

We’ve integrated our new AI capabilities under a simple .ai accessor on every Bodo Series, creating a familiar, Pandas-like experience.

Effortless Text Generation with BodoSeries.ai.llm_generate()

The BodoSeries.ai.llm_generate() function is the workhorse for LLM inference. It takes each element in a Series, sends it to the specified model, and returns the generated text in a new Series.

Here’s how easy it is to ask multiple questions to an OpenAI-compatible model:

import bodo.pandas as pd

# Your DataFrame column of prompts
prompts = pd.Series([
    "What is the capital of France?",
    "Who wrote 'To Kill a Mockingbird'?",
    "What is the largest mammal?",
])

# Generate responses in a single, scalable call
responses = prompts.ai.llm_generate(
    api_key="your_api_key_here",
    model="gpt-3.5-turbo",
)

print(responses)

Output:
0    The capital of France is Paris.
1    'To Kill a Mockingbird' was written by Harper Lee.
2    The largest mammal is the blue whale.
dtype: string[pyarrow]

We also provide first-class support for Amazon Bedrock, allowing you to leverage powerful models like Titan and Nova with the same simple API. For advanced use cases, you can even provide custom request_formatter and response_formatter functions to interact with any model on Bedrock.
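
As a flavor of that advanced path, here is a sketch of calling a Titan text model on Bedrock with custom formatters. The formatter signatures and the exact set of parameters are assumptions here; check the documentation for the precise contract.

import json

import bodo.pandas as pd
from bodo.ai.backend import Backend

prompts = pd.Series(["Summarize: Bodo scales Pandas."])

# Hypothetical formatter signatures; the real contract may differ
def request_formatter(prompt):
    # Build the JSON body the Titan text model expects
    return json.dumps({"inputText": prompt})

def response_formatter(body):
    # Extract the generated text from Titan's response body
    return json.loads(body)["results"][0]["outputText"]

responses = prompts.ai.llm_generate(
    model="amazon.titan-text-express-v1",
    backend=Backend.BEDROCK,
    request_formatter=request_formatter,
    response_formatter=response_formatter,
)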

Simplified Embeddings with BodoSeries.ai.embed()

Creating embeddings for tasks like retrieval-augmented generation (RAG) or semantic search is now just as simple. The BodoSeries.ai.embed() function converts a Series of text into a Series of embedding vectors.

Here's how you can generate embeddings using Amazon Bedrock's Titan model:

import bodo.pandas as pd
from bodo.ai.backend import Backend

# A Series of text to be embedded
documents = pd.Series([
    "bodo.ai will improve your workflows.",
    "This is a professional sentence."
])

# Generate embeddings at scale
embeddings = documents.ai.embed(
    model="amazon.titan-embed-text-v2:0",
    backend=Backend.BEDROCK,
)

print(embeddings)

Output:
0    [0.123, 0.456, 0.789, ...]
1    [0.234, 0.567, 0.890, ...]
dtype: list<item: float64>[pyarrow]
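
Once you have the vectors, downstream tasks like semantic search reduce to plain NumPy. A minimal sketch, assuming the embedding Series from above can be materialized and iterated locally like a Pandas Series:

import numpy as np

# Materialize the embeddings to NumPy arrays
vecs = [np.asarray(v, dtype=np.float64) for v in embeddings]

def cosine_similarity(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Compare the two documents embedded above
print(cosine_similarity(vecs[0], vecs[1]))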

Bringing Inference to Your Data

What if you want to use a custom, open-source model? Our API design makes this especially easy, but you still need to run the inference server. We've streamlined that, too. Bodo now includes a utility function, bodo.spawn_process_on_nodes, to easily launch processes—like an Ollama server—across your cluster.

This completes the picture, enabling a powerful, self-hosted workflow:

  1. Load Data: Read your massive dataset into a Bodo DataFrame.
  2. Launch Server: Use bodo.spawn_process_on_nodes to start an Ollama inference server with your desired open-source model on each node of your Bodo cluster.
  3. Run Inference: Call series.ai.llm_generate, pointing the base_url to your locally running server (e.g., http://localhost:11434/v1).

Bodo handles the parallel requests, data distribution, and aggregation automatically. You get the power of a distributed, self-hosted model with the simplicity of a single API call.

import shutil
import time

import requests

import bodo.pandas as pd
from bodo.spawn.spawner import spawn_process_on_nodes

assert shutil.which("docker") is not None, (
    "Docker is not installed or not found in PATH"
)


def get_ollama_models(url) -> requests.Response:
    # Poll the Ollama tags endpoint until the server responds (up to ~150s)
    for _ in range(50):
        try:
            response = requests.get(f"{url}/api/tags", timeout=5)
            if response.status_code == 200:
                return response
            else:
                time.sleep(3)
        except requests.exceptions.RequestException:
            time.sleep(3)
    raise AssertionError("Ollama server not available yet")


def wait_for_ollama(url):
    # Block until the Ollama server is reachable
    get_ollama_models(url)


def wait_for_ollama_model(url, model_name):
    # Block until the requested model appears in the server's model list
    for _ in range(20):
        models = get_ollama_models(url)
        if model_name in models.text:
            return
        else:
            time.sleep(3)
    raise AssertionError(
        f"Model {model_name} not found in Ollama server at {url} after waiting for 60 seconds"
    )


def main():
    prompts = pd.Series(
        [
            "bodo.ai will improve your workflows.",
            "This is a professional sentence.",
        ]
    )

    # Start an Ollama server in Docker on every node of the cluster
    spawn_process_on_nodes(
        "docker run -v ollama:/root/.ollama -p 11434:11434 --name bodo_test_ollama ollama/ollama:latest".split(
            " "
        )
    )
    wait_for_ollama("http://localhost:11434")

    # Pull and load the model on each node
    spawn_process_on_nodes(
        "docker exec bodo_test_ollama ollama run smollm:135m".split(" ")
    )
    wait_for_ollama_model("http://localhost:11434", "smollm:135m")

    # Run distributed inference against the local OpenAI-compatible endpoint
    results = prompts.ai.llm_generate(
        base_url="http://localhost:11434/v1",
        api_key="",
        model="smollm:135m",
        temperature=0.1,
    )
    pd.DataFrame(
        {
            "prompts": prompts,
            "results": results,
        }
    ).to_parquet("test_ollama.pq")

    # Clean up the Ollama containers
    spawn_process_on_nodes("docker rm bodo_test_ollama -f".split(" "))


if __name__ == "__main__":
    main()

Conclusion

The new Bodo DataFrame AI toolkit is designed to make working with LLMs at scale even simpler. By integrating these capabilities with a Pandas-native feel, we enable data teams to build AI applications without leaving their familiar data processing environment. To scale your AI workflows, check out our documentation to get started. We look forward to seeing what you build!

Ready to see Bodo in action? Schedule a demo with a Bodo expert.