Padwan LLM

Unified client for the OpenAI, Gemini, Mistral, and Grok APIs. Also supports OpenAI-compatible endpoints.

Why

Most LLM client libraries pull in heavy dependencies (pydantic, httpx) and lock you into a single provider's SDK. Padwan LLM takes a different approach:

Single runtime dependency — only niquests, no pydantic, no httpx. Zero overhead beyond the HTTP layer.
TypedDict-only — all request/response types are plain TypedDicts, no validation framework required. No runtime cost, full editor support.
Multi-provider, extensible — supports the major providers (OpenAI, Gemini, Mistral, Grok) with a shared base class that makes adding new ones straightforward.

Features

Unified interface - Single API for multiple LLM providers
Async-first - Built on async/await, so many requests can run concurrently
HTTP/2 and HTTP/3 - Automatic protocol negotiation via niquests
Fully typed - Complete type hints with Python 3.13+ generics
Streaming support - Real-time token streaming for all providers
Conversation management - Built-in conversation history handling
Agentic loop - AgentSession drives multi-turn conversations with tool dispatch, parallel execution, approval hooks, and snapshot persistence
MCP support - Streamable HTTP and stdio transports for Model Context Protocol tool servers
Gemini thinking - Stream thought tokens separately via an on_thought callback
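The sketch below illustrates the on_thought pattern. It assumes each streamed chunk arrives tagged as either thought text or answer text. The (is_thought, text) tuple format, fake_stream, and consume are hypothetical stand-ins, not padwan_llm's streaming API. The point is only the routing: thought tokens go to the callback, and answer tokens accumulate separately.

```python
import asyncio
from collections.abc import AsyncIterator, Callable


# Hypothetical stream: (is_thought, text) chunks standing in for provider output.
async def fake_stream() -> AsyncIterator[tuple[bool, str]]:
    for chunk in [(True, "User wants a greeting. "), (False, "Hello"), (False, "!")]:
        await asyncio.sleep(0)  # yield control, as a network stream would
        yield chunk


async def consume(
    stream: AsyncIterator[tuple[bool, str]],
    on_thought: Callable[[str], None],
) -> str:
    """Send thought tokens to on_thought and return the joined answer text."""
    parts: list[str] = []
    async for is_thought, text in stream:
        if is_thought:
            on_thought(text)
        else:
            parts.append(text)
    return "".join(parts)


thoughts: list[str] = []
answer = asyncio.run(consume(fake_stream(), on_thought=thoughts.append))
print(answer)    # Hello!
print(thoughts)  # ['User wants a greeting. ']
```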

Supported Providers

Provider	Chat	Streaming	Batch	Transcription	Embeddings
OpenAI	✅	✅	✅	❌	❌
Gemini	✅	✅	✅	❌	❌
Mistral	✅	✅	❌	✅	✅
Grok	✅	✅	✅	❌	❌
OpenAI-Compatible	✅	✅	➕	➕	➕

Quick Example

export OPENAI_API_KEY="sk-..."
uv run padwan-llm "Hello!" -m gpt-4o-mini

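import asyncio
from padwan_llm import LLMClient

async def main() -> None:
    async with LLMClient(model="gpt-4o") as client:
        response, usage = await client.complete_chat(
            [{"role": "user", "content": "Hello!"}]
        )
        print(response["content"])

asyncio.run(main())  # top-level async with only works in an async REPL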

Installation

pip install padwan-llm

Or with uv:

uv add padwan-llm

Agentic loop

AgentSession wraps a conversation in a loop that calls the LLM, dispatches any tool calls, feeds the results back, and repeats until the model produces a plain-text answer. Its mcp_tools list accepts individual McpTool instances and whole McpTransport servers side by side. Transports are entered as part of the session lifecycle:

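import asyncio
from padwan_llm import AgentSession, LLMClient, McpStdio

async def main() -> None:
    # The McpStdio transport is entered and exited together with the session
    async with AgentSession(
        client=LLMClient(model="gpt-4o"),
        mcp_tools=[McpStdio(command="uvx", args=["weather-mcp"])],
        system="You have access to weather tools.",
    ) as session:
        text = await session.send("What is the weather in Paris?")
        print(text)

asyncio.run(main())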

See the Agents page for approval hooks, parallel execution, and snapshot persistence.
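The loop that AgentSession runs can be approximated in a few dozen lines. This is a minimal sketch, not AgentSession's implementation. fake_model, get_weather, run_loop, the message dict shapes, and the approve hook signature are all hypothetical. The sketch illustrates three ideas from the description above:

- The loop repeats until a reply has no tool calls.
- All tool calls from one turn run in parallel with asyncio.gather.
- An approval hook can veto a call.

```python
import asyncio
import json
from collections.abc import Awaitable, Callable
from typing import Any

# All names below are hypothetical stand-ins, not padwan_llm APIs.
Message = dict[str, Any]
Tool = Callable[[dict[str, Any]], Awaitable[str]]


async def get_weather(args: dict[str, Any]) -> str:
    # Stub tool: returns canned JSON instead of calling a real service.
    return json.dumps({"city": args["city"], "temp_c": 18})


async def fake_model(history: list[Message]) -> Message:
    # Stub model: requests two tool calls first, answers once results exist.
    tool_results = sum(m["role"] == "tool" for m in history)
    if tool_results == 0:
        return {"role": "assistant", "content": "", "tool_calls": [
            {"id": "c1", "name": "get_weather", "arguments": {"city": "Paris"}},
            {"id": "c2", "name": "get_weather", "arguments": {"city": "Lyon"}},
        ]}
    return {"role": "assistant", "content": f"Got {tool_results} reports."}


async def run_loop(
    prompt: str,
    model: Callable[[list[Message]], Awaitable[Message]],
    tools: dict[str, Tool],
    approve: Callable[[Message], bool] = lambda call: True,
    max_turns: int = 8,
) -> str:
    history: list[Message] = [{"role": "user", "content": prompt}]
    for _ in range(max_turns):
        reply = await model(history)
        history.append(reply)
        calls = reply.get("tool_calls") or []
        if not calls:
            return reply["content"]  # a plain-text answer ends the loop

        async def dispatch(call: Message) -> str:
            if not approve(call):  # approval hook vetoes the call
                return "Tool call rejected by the user."
            return await tools[call["name"]](call["arguments"])

        # Run every tool call from this turn concurrently; gather keeps order.
        results = await asyncio.gather(*(dispatch(c) for c in calls))
        for call, result in zip(calls, results):
            history.append(
                {"role": "tool", "tool_call_id": call["id"], "content": result}
            )
    raise RuntimeError(f"no final answer after {max_turns} turns")


text = asyncio.run(
    run_loop("Weather in Paris and Lyon?", fake_model, {"get_weather": get_weather})
)
print(text)  # Got 2 reports.
```

In this sketch, a rejected call is fed back to the model as the tool result instead of aborting the loop. That is one reasonable design choice. The Agents page describes how AgentSession's approval hooks actually behave.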

MCP (Model Context Protocol)

Connect to MCP tool servers over streamable HTTP or stdio:

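import asyncio
from padwan_llm import McpStreamable

async def main() -> None:
    async with McpStreamable(url="https://mcp.example.com/mcp", token="sk-...") as mcp:
        # mcp.tools holds the server's tools; call the first one
        print(await mcp.tools[0].handler({"query": "test"}))

asyncio.run(main())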

See the MCP page for architecture diagrams, the dual-channel design of McpStreamable, the stdio reader model, and the reconnection flow.
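For orientation, MCP messages are JSON-RPC 2.0. A tool invocation is a tools/call request whose params carry the tool name and its arguments. Over the stdio transport, each message is a single line of JSON terminated by a newline. The helpers below (tools_call_request, encode_stdio, decode_stdio, text_of) are hypothetical. They only illustrate the wire format from the MCP specification and are not how McpStdio is implemented.

```python
import json
from itertools import count
from typing import Any

# Hypothetical helpers illustrating the MCP wire format; not padwan_llm code.
_ids = count(1)


def tools_call_request(name: str, arguments: dict[str, Any]) -> dict[str, Any]:
    """Build a JSON-RPC 2.0 tools/call request."""
    return {"jsonrpc": "2.0", "id": next(_ids), "method": "tools/call",
            "params": {"name": name, "arguments": arguments}}


def encode_stdio(message: dict[str, Any]) -> bytes:
    """stdio framing: one JSON object per line (json escapes inner newlines)."""
    return (json.dumps(message, separators=(",", ":")) + "\n").encode()


def decode_stdio(line: bytes) -> dict[str, Any]:
    return json.loads(line)


def text_of(result: dict[str, Any]) -> str:
    """Join the text items of a tools/call result's content list."""
    return "".join(c["text"] for c in result["content"] if c["type"] == "text")


req = tools_call_request("search", {"query": "test"})
wire = encode_stdio(req)
print(wire)  # b'{"jsonrpc":"2.0","id":1,"method":"tools/call","params":{"name":"search","arguments":{"query":"test"}}}\n'

response = {"jsonrpc": "2.0", "id": 1,
            "result": {"content": [{"type": "text", "text": "42 results"}], "isError": False}}
print(text_of(response["result"]))  # 42 results
```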

CLI / TUI

The interactive CLI/TUI is available as a separate package: padwan-cli.

# One-shot prompt
uvx padwan-cli "Explain Python decorators" -m gemini-2.5-flash

# Interactive chat
uvx padwan-cli chat -m gpt-4o