Running Codex CLI with TogetherAI Credits – A Real-World, No-MCP Guide

After several hours of debugging, I finally have a stable setup that lets me use Codex CLI (and Gemini CLI) powered entirely by my Together AI credits (plus Replicate AI when needed). No MCP server, no OpenAI billing.

This post captures the exact journey — including all the frustrating errors we hit and how we ultimately solved them by forking LiteLLM.

The Problems We Ran Into

LiteLLM proxy works great in theory, but pairing it with modern OpenAI-style CLIs like Codex revealed some rough edges when routing to Replicate:

“Invalid model name passed” + empty /v1/models list
“OPENAI_BASE_URL is deprecated” warnings
Unsupported parameters (parallel_tool_calls, reasoning_effort, web_search_options) → fixed with drop_params: true
The stubborn “sequence item 1: expected str instance, list found” error
→ Triggered by Received Model Group=replicate/google/gemini-2.5-flash (or replicate/openai/gpt-5, etc.)
Followed by chat template failures and TypeError in default_pt() because Codex sometimes sends complex message content (lists instead of plain strings).

These “Model Group” fallback issues happen when Codex requests a model name that doesn’t exactly match one of your model_name entries in config.yaml. LiteLLM’s router turns unknown names into internal lists, which then breaks the Replicate chat handler.

We tried every clean config trick (num_retries: 0, fallback: false, strict model aliases, etc.), but the errors kept coming back. At that point, forking and patching LiteLLM became the practical solution.

(Note: This is a known pain point — similar “Model Group” and Codex compatibility issues appear in several LiteLLM GitHub issues.)

The Working Setup (After Patching)

1. Fork and Patch LiteLLM

I forked the official repo (BerriAI/litellm), then applied targeted fixes:

Stricter early model name validation for Replicate deployments (prevent phantom Model Groups)
Safer default_pt fallback in the prompt factory to handle list-based content from Codex
Disabled aggressive fallbacks in the router for Replicate models

After the patch:

cd ~/Projects/litellm-fork
pip install -e '.[proxy]'

(Verify with: python -c "import litellm; print(litellm.__file__)" — it should point to your fork.)

2. Clean `config.yaml`

model_list:
  - model_name: qwen-coder
    litellm_params:
      model: replicate/qwen/qwen2.5-coder-32b-instruct
      api_key: os.environ/REPLICATE_API_KEY

  - model_name: llama-70b
    litellm_params:
      model: replicate/meta/meta-llama-3.1-70b-instruct
      api_key: os.environ/REPLICATE_API_KEY

  - model_name: grok4
    litellm_params:
      model: replicate/xai/grok-4
      api_key: os.environ/REPLICATE_API_KEY

litellm_settings:
  drop_params: true
  num_retries: 0
  fallback: false

general_settings:
  master_key: sk-your-super-secret-master-key-change-this-2026

3. Start the Proxy

export REPLICATE_API_KEY=r8_XXXXXXXXXXXXXXXXXXXXXXXX

litellm --config config.yaml --port 4000

4. Codex CLI Configuration

~/.codex/config.toml:

openai_base_url = "http://localhost:4000/v1"
model = "qwen-coder"

Run it:

codex

Switch models anytime with /model qwen-coder, /model llama-70b, or /model grok4.

5. Gemini CLI (Same Proxy)

~/.gemini/settings.json:

{
  "googleGeminiBaseUrl": "http://localhost:4000/gemini",
  "apiKey": "sk-your-super-secret-master-key-change-this-2026"
}

Recommended Models

qwen-coder → Best daily driver for coding (React Native, Expo, etc.)
GLX-4.7 → Reliable for planning and architecture

Key Lessons from This Journey

Use short, clean model_name aliases in your config.
drop_params: true + disabling retries/fallbacks helps a lot.
When clean configs aren’t enough, a small fork/patch makes the setup production-ready for Codex.
Alternative if you want to avoid maintaining a fork: Try Aider — it has native Replicate support and fewer compatibility issues (aider --model replicate/qwen/qwen2.5-coder-32b-instruct).

What I’m Building With This

A React Native Expo app for bulk SMS: upload a CSV of numbers + message template → send campaigns + view delivery reports. Codex is now smoothly editing the codebase using my Replicate credits.

Subtotal	$0.00
Total	$0.00

Running Codex CLI with TogetherAI Credits – A Real-World, No-MCP Guide

The Problems We Ran Into

The Working Setup (After Patching)

1. Fork and Patch LiteLLM

2. Clean `config.yaml`

3. Start the Proxy

4. Codex CLI Configuration

5. Gemini CLI (Same Proxy)

Recommended Models

Key Lessons from This Journey

What I’m Building With This

Meta’s $2 Billion Lobbying Machine Exposed: They’re Pushing Digital ID Straight Into Your Phone’s OS – And Google too

Recent Posts

Recent Comments

Running Codex CLI with TogetherAI Credits – A Real-World, No-MCP Guide

The Problems We Ran Into

The Working Setup (After Patching)

1. Fork and Patch LiteLLM

2. Clean config.yaml

3. Start the Proxy

4. Codex CLI Configuration

5. Gemini CLI (Same Proxy)

Recommended Models

Key Lessons from This Journey

What I’m Building With This

Meta’s $2 Billion Lobbying Machine Exposed: They’re Pushing Digital ID Straight Into Your Phone’s OS – And Google too

Related Articles

Meta’s $2 Billion Lobbying Machine Exposed: They’re Pushing Digital ID Straight Into Your Phone’s OS – And Google too

You need to make money? This message is not a secret.

Tohju Updates: March 2026 Edition

Anthropic vs Pentagon: AI Giant Defies DoD on Weapons & Surveillance – OpenAI Swoops In!

Add Pay-Per-Use Billing to Your Site in Days – Before Your Competitors Do

Recent Posts

Recent Comments

Shopping Cart

2. Clean `config.yaml`