Digio infrastructure

AI models & GPU

Run agents on managed frontier models today—or rent GPU capacity, deploy your own weights, and route Digio tasks to private endpoints in the same workspace.

Claude, GPT, Gemini Per-agent model pick GPU rental & BYOM
Managed models

Models available in Digio today

Assign a default model per agent or override per task. Usage is metered in Digio Tokens from your plan balance—the same wallet whether the agent calls Sonnet, GPT-4o, or Gemini Flash.

Anthropic Claude

  • Claude Opus 4.7 Flagship reasoning, long context, architecture and strategy work.
  • Claude Opus 4.6 Previous-generation Opus for stable, high-quality analysis.
  • Claude Sonnet 4.6 Daily driver—coding, writing, and multi-step agent loops.
  • Claude Sonnet 4.5 / 4 Fast Sonnet tiers with prompt caching on supported workloads.
  • Claude Haiku 4.5 Low-latency drafts, classification, and high-volume subtasks.

OpenAI

  • GPT-5.5 / GPT-5.4 / GPT-5.2 Latest GPT-5 family for general and agentic workloads.
  • GPT-4.1 & GPT-4o Reliable multimodal chat and tool use for production agents.
  • GPT-4o mini Cost-efficient routing for summaries and lightweight steps.
  • o3 / o3-pro / o3-mini / o4-mini Reasoning-focused models for math, planning, and verification.
  • GPT-5.3 Codex & Codex mini Code generation, refactors, and repo-aware agent skills.

Google Gemini

  • Gemini 2.5 Pro Long-context research and structured extraction.
  • Gemini 2.5 Flash High-throughput agent steps with competitive token rates.
  • Gemini 2.0 Flash Ultra-fast passes for parsing, tagging, and batch jobs.

Open & specialist APIs

  • DeepSeek Chat & Reasoner Strong value for chat and chain-of-thought style tasks.
  • Mistral Large European-hosted option for multilingual agent teams.
  • Llama 3.3 70B Open-weights class model via API—pairs well with private GPU.
  • Grok 3 Real-time oriented model for news and social monitoring agents.
  • Sonar Pro Search-grounded answers for research agents.
  • Command R+ RAG-friendly enterprise chat and retrieval workflows.

Model list and token economics evolve with provider releases. Your workspace shows live options when you assign a model to an agent; Digio Tokens debit from the same balance as in pricing.

Usage

How agents pick a model

The Coordinator can recommend Sonnet vs Opus vs a cheaper flash model based on task type. Power users set defaults per agent role—research on Sonnet, final review on Opus, bulk tagging on Haiku or Gemini Flash.

  • Per agent — default model in agent settings; override in To do or chat when needed.

  • Metered fairly — input, output, and cached tokens map to Digio Token charges (see usage in your wallet).

  • Skills stay the same — tools and integrations work across models; only latency and cost profile change.

  • Plan limits — more agents and monthly Digio Tokens on higher tiers; top up anytime on the pricing page.

GPU rental

Rent GPU and run your own models

Need a fine-tune, an air-gapped checkpoint, or predictable inference pricing? Add dedicated GPU capacity to your Digio workspace, install the serving stack you prefer, and point agents at your private endpoint.

Dedicated instances

Hourly or monthly GPU nodes (A100, H100, L40S class) attached to your tenant—isolated from other customers.

Your weights

Upload safetensors, GGUF, or pull from your registry; run Llama, Mistral, Qwen, and custom fine-tunes.

Standard serving

vLLM, TGI, Ollama, or container images you maintain—Digio agents call an OpenAI-compatible base URL.

Same orchestration

To do, team chat, skills, and collaboration unchanged—only the inference backend is yours.

Hybrid routing

Send sensitive steps to private GPU and use Claude or GPT for public research in one workflow.

Enterprise controls

VPC peering, static egress, audit logs, and model allowlists for regulated teams.

Bring your own model

Install and connect a custom model

Typical setup from zero to agents calling your endpoint:

  1. Reserve GPU

    Choose VRAM, region, and uptime (burst vs always-on). Storage for weights ships with the instance or mounts your bucket.

  2. Deploy the stack

    Start a serving image or SSH in, install CUDA drivers, and load checkpoints. Health checks confirm the model is ready.

  3. Register endpoint

    Add base URL, API key, and model id in workspace settings. Digio validates latency and token format before going live.

  4. Assign to agents

    Pick your private model as the default for selected agents; managed Claude/GPT models remain available side by side.

GPU rental is billed separately from Digio plan subscriptions. Contact us for capacity planning, SLAs, and migration from an existing inference cluster.

FAQ

Models & GPU questions

Choosing managed APIs vs self-hosted inference on Digio.

Do I pay twice—plan plus API?

Your Digio subscription covers infrastructure, agents, and included Digio Tokens. Managed model usage debits that token balance by actual input/output tokens. GPU rental is an add-on for the machines you control.

Can different agents use different models?

Yes—each agent can have its own default. Tasks and chats can override for a single run without changing the global default.

What is the difference between Sonnet and Opus?

Opus is tuned for harder reasoning and longer coherent plans; Sonnet is faster and cheaper for everyday agent loops. Haiku and flash-class models are best for volume subtasks.

Can I run only my own model and block cloud APIs?

Enterprise workspaces can restrict outbound model providers and route all agent traffic to your GPU endpoint. Hybrid mode is the default for most teams.

Which GPU sizes are available?

Offerings depend on region and demand—commonly 24–80 GB VRAM tiers for 7B–70B class models and multi-GPU nodes for larger stacks. We help size VRAM from your parameter count and quantization.

Does private GPU usage still consume Digio Tokens?

Orchestration (agents, tasks, storage) stays on your plan. Inference on your GPU is billed as GPU time; you may optionally meter token-shaped usage for internal chargeback.

Choose managed models or bring your GPU

Start on Claude and GPT today, then add dedicated GPU when you are ready to host custom weights—same agents, same tasks, your inference.