Padwan LLM

Unified client for the OpenAI, Gemini, Mistral, and Grok APIs. Also supports OpenAI-compatible endpoints.

Why

Most LLM client libraries pull in heavy dependencies (pydantic, httpx) and lock you into a single provider's SDK. Padwan LLM takes a different approach:

  • Single runtime dependency — only niquests, no pydantic, no httpx. Zero overhead beyond the HTTP layer.
  • TypedDict-only — all request/response types are plain TypedDicts, no validation framework required. No runtime cost, full editor support.
  • Multi-provider, extensible — supports the major providers (OpenAI, Gemini, Mistral, Grok) with a shared base class that makes adding new ones straightforward.
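
To illustrate the TypedDict-only point, here is a minimal sketch using only the standard library. The `ChatMessage` type and `build_prompt` helper are hypothetical names made up for this example; the library's actual type names may differ. It shows that a TypedDict value is an ordinary dict at runtime, so the type information is used by editors and type checkers only.

```python
from typing import Literal, TypedDict


class ChatMessage(TypedDict):
    """Hypothetical message shape; padwan-llm's real type names may differ."""

    role: Literal["system", "user", "assistant"]
    content: str


def build_prompt(question: str) -> list[ChatMessage]:
    # Type checkers verify the keys and value types; nothing is validated at runtime.
    return [
        {"role": "system", "content": "You are concise."},
        {"role": "user", "content": question},
    ]


messages = build_prompt("Hello!")

# At runtime a TypedDict value is an ordinary dict: no wrapper object, no validation.
print(type(messages[0]) is dict)  # True
print(messages[1]["content"])     # Hello!
```

The trade-off is that malformed data is not rejected at runtime; correctness relies on static checking and on the provider returning the documented shape.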

Features

  • Unified interface - Single API for multiple LLM providers
  • Async-first - Built on async/await for high performance
  • HTTP/2 and HTTP/3 - Automatic protocol negotiation via niquests
  • Fully typed - Complete type hints with Python 3.13+ generics
  • Streaming support - Real-time token streaming for all providers
  • Conversation management - Built-in conversation history handling
  • Agentic loop - AgentSession drives multi-turn conversations with tool dispatch, parallel execution, approval hooks, and snapshot persistence
  • MCP support - Streamable HTTP and stdio transports for Model Context Protocol tool servers
  • Gemini thinking - Stream thought tokens separately via an on_thought callback
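
The streaming and thought-callback features can be pictured with a small, self-contained sketch. It does not use padwan-llm: `fake_stream` is a stand-in async generator, and the `(kind, text)` chunk shape and the `collect` helper are assumptions made for illustration. Only the idea of an `on_thought` callback comes from the feature list above.

```python
import asyncio
from collections.abc import AsyncIterator, Callable


async def fake_stream() -> AsyncIterator[tuple[str, str]]:
    """Stand-in for a provider stream: yields (kind, text) chunks."""
    chunks = [("thought", "User greets me. "), ("text", "Hello"), ("text", ", world!")]
    for chunk in chunks:
        await asyncio.sleep(0)  # yield control, as a network read would
        yield chunk


async def collect(
    stream: AsyncIterator[tuple[str, str]],
    on_thought: Callable[[str], None],
) -> str:
    """Route thought chunks to the callback and join the answer chunks."""
    parts: list[str] = []
    async for kind, text in stream:
        if kind == "thought":
            on_thought(text)
        else:
            parts.append(text)
    return "".join(parts)


thoughts: list[str] = []
answer = asyncio.run(collect(fake_stream(), thoughts.append))
print(answer)    # Hello, world!
print(thoughts)  # ['User greets me. ']
```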

Supported Providers

Provider Chat Streaming Batch Transcription Embeddings
OpenAI
Gemini
Mistral
Grok
OpenAI-Compatible

Quick Example

export OPENAI_API_KEY="sk-..."
uv run padwan-llm "Hello!" -m gpt-4o-mini

import asyncio
from padwan_llm import LLMClient

async def main() -> None:
    # `async with` must run inside a coroutine.
    async with LLMClient(model="gpt-4o") as client:
        response, usage = await client.complete_chat(
            [{"role": "user", "content": "Hello!"}]
        )
        print(response["content"])

asyncio.run(main())

Installation

pip install padwan-llm

Or with uv:

uv add padwan-llm

Agentic loop

AgentSession wraps a conversation in a loop that calls the LLM, dispatches any tool calls, feeds the results back, and repeats until the model produces a plain text answer. It accepts individual McpTool instances and whole McpTransport servers in the same list; transports are entered as part of the session lifecycle:

import asyncio
from padwan_llm import AgentSession, LLMClient, McpStdio

async def main() -> None:
    # The stdio transport is entered and exited together with the session.
    async with AgentSession(
        client=LLMClient(model="gpt-4o"),
        mcp_tools=[McpStdio(command="uvx", args=["weather-mcp"])],
        system="You have access to weather tools.",
    ) as session:
        text = await session.send("What is the weather in Paris?")

asyncio.run(main())

See the Agents page for approval hooks, parallel execution, and snapshot persistence.
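
The loop described above (call the model, dispatch tool calls, feed results back, repeat until a plain text answer) can be sketched without any library. This is not AgentSession's implementation: `fake_model`, `get_weather`, `run_loop`, and the message dictionaries are hypothetical stand-ins chosen for illustration, and the `max_turns` guard is an assumption added to keep the sketch from looping forever.

```python
import asyncio
import json
from collections.abc import Awaitable, Callable
from typing import Any

Message = dict[str, Any]
Tool = Callable[[dict[str, Any]], Awaitable[str]]


async def get_weather(args: dict[str, Any]) -> str:
    """Hypothetical tool; a real one would call a weather service."""
    return json.dumps({"city": args["city"], "forecast": "sunny"})


async def fake_model(history: list[Message]) -> Message:
    """Stand-in for the LLM: requests a tool once, then answers in text."""
    if not any(m["role"] == "tool" for m in history):
        call = {"id": "call-1", "name": "get_weather", "args": {"city": "Paris"}}
        return {"role": "assistant", "content": None, "tool_calls": [call]}
    forecast = json.loads(history[-1]["content"])["forecast"]
    return {"role": "assistant", "content": f"It is {forecast} in Paris."}


async def run_loop(prompt: str, tools: dict[str, Tool], max_turns: int = 5) -> str:
    history: list[Message] = [{"role": "user", "content": prompt}]
    for _ in range(max_turns):
        reply = await fake_model(history)
        history.append(reply)
        calls = reply.get("tool_calls")
        if not calls:
            return reply["content"]  # a plain text answer ends the loop
        # Dispatch all requested tools concurrently, then feed the results back.
        results = await asyncio.gather(*(tools[c["name"]](c["args"]) for c in calls))
        for call, result in zip(calls, results):
            history.append({"role": "tool", "tool_call_id": call["id"], "content": result})
    raise RuntimeError("model did not produce a final answer")


answer = asyncio.run(run_loop("What is the weather in Paris?", {"get_weather": get_weather}))
print(answer)  # It is sunny in Paris.
```

Running the tool calls with `asyncio.gather` is one way to get parallel execution; an approval hook would fit naturally just before that dispatch step.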

MCP (Model Context Protocol)

Connect to MCP tool servers over streamable HTTP or stdio:

import asyncio
from padwan_llm import McpStreamable, McpStdio

async def main() -> None:
    async with McpStreamable(url="https://mcp.example.com/mcp", token="sk-...") as mcp:
        result = await mcp.tools[0].handler({"query": "test"})

asyncio.run(main())

See the MCP page for architecture diagrams, the dual-channel design of McpStreamable, the stdio reader model, and the reconnection flow.
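
For orientation, a tool invocation in MCP is a JSON-RPC 2.0 `tools/call` request. The sketch below builds such a request and reads a result by hand; it is an illustration of the wire format as described by the MCP specification, not padwan-llm code. The helper names, the tool name `search`, and the sample response are hypothetical.

```python
import json
from typing import Any


def tools_call_request(request_id: int, name: str, arguments: dict[str, Any]) -> dict[str, Any]:
    """Build a JSON-RPC 2.0 `tools/call` request in the shape MCP describes."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


def text_from_result(response: dict[str, Any]) -> str:
    """Join the text items of a `tools/call` result, raising on tool errors."""
    result = response["result"]
    text = "".join(item["text"] for item in result["content"] if item["type"] == "text")
    if result.get("isError"):
        raise RuntimeError(text)
    return text


request = tools_call_request(1, "search", {"query": "test"})
print(json.dumps(request))

# A hand-written response in the documented shape, not real server output.
sample = {
    "jsonrpc": "2.0",
    "id": 1,
    "result": {"content": [{"type": "text", "text": "ok"}], "isError": False},
}
print(text_from_result(sample))  # ok
```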

CLI / TUI

The interactive CLI/TUI is available as a separate package: padwan-cli.

# One-shot prompt
uvx padwan-cli "Explain Python decorators" -m gemini-2.5-flash

# Interactive chat
uvx padwan-cli chat -m gpt-4o