Padwan LLM
Unified client for the OpenAI, Gemini, Mistral, and Grok APIs, with additional support for OpenAI-compatible endpoints.
Why
Most LLM client libraries pull in heavy dependencies (pydantic, httpx) and lock you into a single provider's SDK. Padwan LLM takes a different approach:
- Single runtime dependency: only niquests (no pydantic, no httpx), so there is nothing to install beyond the HTTP layer.
- TypedDict-only: all request and response types are plain TypedDicts, so no validation framework is required. There is no runtime validation cost, and editors still get full type information.
- Multi-provider and extensible: supports the major providers (OpenAI, Gemini, Mistral, Grok) through a shared base class that makes adding new ones straightforward.
Features
- Unified interface - Single API for multiple LLM providers
- Async-first - Built on async/await, so many requests can run concurrently
- HTTP/2 and HTTP/3 - Automatic protocol negotiation via niquests
- Fully typed - Complete type hints with Python 3.13+ generics
- Streaming support - Real-time token streaming for all providers
- Conversation management - Built-in conversation history handling
- Agentic loop - AgentSession drives multi-turn conversations with tool dispatch, parallel execution, approval hooks, and snapshot persistence
- MCP support - Streamable HTTP and stdio transports for Model Context Protocol tool servers
- Gemini thinking - Stream thought tokens separately via an on_thought callback
Supported Providers
| Provider | Chat | Streaming | Batch | Transcription | Embeddings |
|---|---|---|---|---|---|
| OpenAI | ✅ | ✅ | ✅ | ❌ | ❌ |
| Gemini | ✅ | ✅ | ✅ | ❌ | ❌ |
| Mistral | ✅ | ✅ | ❌ | ✅ | ✅ |
| Grok | ✅ | ✅ | ✅ | ❌ | ❌ |
| OpenAI-Compatible | ✅ | ✅ | ➕ | ➕ | ➕ |
Quick Example
```python
import asyncio

from padwan_llm import LLMClient

async def main() -> None:
    async with LLMClient(model="gpt-4o") as client:
        response, usage = await client.complete_chat(
            [{"role": "user", "content": "Hello!"}]
        )
        print(response["content"])

asyncio.run(main())
```
Installation
Or with uv:
Agentic loop
AgentSession wraps a conversation in a loop that calls the LLM, dispatches any tool calls, feeds the results back, and repeats until the model produces a plain text answer.
The mcp_tools list accepts individual McpTool instances and whole McpTransport servers side by side; transports are entered as part of the session lifecycle:
```python
import asyncio

from padwan_llm import AgentSession, LLMClient, McpStdio

async def main() -> None:
    async with AgentSession(
        client=LLMClient(model="gpt-4o"),
        mcp_tools=[McpStdio(command="uvx", args=["weather-mcp"])],
        system="You have access to weather tools.",
    ) as session:
        text = await session.send("What is the weather in Paris?")
        print(text)

asyncio.run(main())
```
See the Agents page for approval hooks, parallel execution, and snapshot persistence.
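The loop described above (call the model, run the requested tools, feed the results back, repeat until plain text) can be sketched in a few dozen lines. Everything here is hypothetical: fake_llm, get_weather, the message and tool-call dict shapes, and the max_turns guard are illustrative assumptions, and approval hooks and snapshots are left out. Tool calls from one turn run concurrently with asyncio.gather, which is one way to implement parallel execution.

```python
import asyncio
from collections.abc import Awaitable, Callable
from typing import Any

Message = dict[str, Any]
Handler = Callable[[dict[str, Any]], Awaitable[str]]


async def fake_llm(history: list[Message]) -> Message:
    # Hypothetical scripted model: first requests two tool calls, then
    # answers in plain text once tool results are in the history.
    tool_results = [m["content"] for m in history if m["role"] == "tool"]
    if not tool_results:
        return {
            "role": "assistant",
            "content": "",
            "tool_calls": [
                {"id": "call_1", "name": "get_weather", "arguments": {"city": "Paris"}},
                {"id": "call_2", "name": "get_weather", "arguments": {"city": "Lyon"}},
            ],
        }
    return {"role": "assistant", "content": "Forecasts: " + "; ".join(tool_results)}


async def get_weather(args: dict[str, Any]) -> str:
    await asyncio.sleep(0.01)  # simulate I/O
    return f"{args['city']}: sunny"


async def run_agent(
    llm: Callable[[list[Message]], Awaitable[Message]],
    tools: dict[str, Handler],
    prompt: str,
    max_turns: int = 8,
) -> str:
    history: list[Message] = [{"role": "user", "content": prompt}]
    for _ in range(max_turns):
        reply = await llm(history)
        history.append(reply)
        calls = reply.get("tool_calls")
        if not calls:
            return reply["content"]  # plain text answer: the loop ends
        # Run every requested tool concurrently; gather keeps argument order.
        outputs = await asyncio.gather(
            *(tools[call["name"]](call["arguments"]) for call in calls)
        )
        for call, output in zip(calls, outputs):
            history.append({"role": "tool", "tool_call_id": call["id"], "content": output})
    raise RuntimeError("no final answer within max_turns")


print(asyncio.run(run_agent(fake_llm, {"get_weather": get_weather}, "Weather in Paris and Lyon?")))
# Forecasts: Paris: sunny; Lyon: sunny
```

Because asyncio.gather returns results in the order of its arguments, each result can be paired back with its call id even though the tools may finish in any order.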
MCP (Model Context Protocol)
Connect to MCP tool servers over streamable HTTP or stdio:
```python
from padwan_llm import McpStreamable

# Run inside an async function, or at the `python -m asyncio` REPL.
async with McpStreamable(url="https://mcp.example.com/mcp", token="sk-...") as mcp:
    result = await mcp.tools[0].handler({"query": "test"})
```
See the MCP page for architecture diagrams, the dual-channel design of McpStreamable, the stdio reader model, and the reconnection flow.
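MCP messages are JSON-RPC 2.0, and the stdio transport frames each message as a single newline-delimited line. The sketch below shows roughly what a tool handler such as mcp.tools[0].handler has to do: wrap the arguments in a tools/call request, send it, and unwrap the result. make_handler, fake_server, and the search tool are hypothetical; the sketch skips the initialize handshake and handles one request at a time, whereas a real client matches concurrent responses by id.

```python
import asyncio
import itertools
import json
from collections.abc import Awaitable, Callable
from typing import Any

# Send one newline-terminated JSON-RPC line, receive one line back.
RoundTrip = Callable[[str], Awaitable[str]]

_ids = itertools.count(1)


def make_handler(
    tool_name: str, round_trip: RoundTrip
) -> Callable[[dict[str, Any]], Awaitable[Any]]:
    """Build an async handler that turns arguments into an MCP tools/call request."""

    async def handler(arguments: dict[str, Any]) -> Any:
        request = {
            "jsonrpc": "2.0",
            "id": next(_ids),
            "method": "tools/call",
            "params": {"name": tool_name, "arguments": arguments},
        }
        # json.dumps escapes newlines inside strings, so one message = one line.
        reply = json.loads(await round_trip(json.dumps(request) + "\n"))
        if reply.get("id") != request["id"]:
            raise RuntimeError("response id does not match request id")
        if "error" in reply:
            raise RuntimeError(reply["error"]["message"])
        return reply["result"]

    return handler


async def fake_server(line: str) -> str:
    # Stand-in for a stdio MCP server: echoes the query back as text content.
    request = json.loads(line)
    query = request["params"]["arguments"]["query"]
    result = {"content": [{"type": "text", "text": f"results for {query}"}], "isError": False}
    return json.dumps({"jsonrpc": "2.0", "id": request["id"], "result": result}) + "\n"


handler = make_handler("search", fake_server)
print(asyncio.run(handler({"query": "test"})))
```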
CLI / TUI
The interactive CLI/TUI is available as a separate package: padwan-cli.