Everything you need to deploy, configure, and use Axecodi — the AI chat interface powered by NVIDIA NIM and GLM-5.1.
Axecodi is a self-hostable AI chat frontend that connects to NVIDIA NIM's API. Deploy it on Cloudflare Pages for free in under 3 minutes.
Axecodi consists of three components working together: a static HTML frontend, a Cloudflare Worker proxy, and the NVIDIA NIM API.
Get Axecodi running in minutes with no backend infrastructure required.
Sign up at build.nvidia.com and navigate to API Keys to generate your key. The free tier includes generous credits to get started.
# Download the project files
git clone https://github.com/yourrepo/axecodi.git
cd axecodi
Go to pages.cloudflare.com → Create a project → Upload the project folder directly. No build step required — it's pure HTML.
In your Cloudflare Pages project settings, navigate to Settings → Environment Variables and add:
NVIDIA_API_KEY=nvapi-xxxxxxxxxxxxxxxxxxxx
Your Axecodi instance is now available at yourproject.pages.dev. Share it with anyone — no accounts needed to chat.
Cloudflare Pages is the easiest and fastest option. Its free tier includes 500 builds per month and unlimited bandwidth.
# Using Wrangler CLI
npm install -g wrangler
wrangler pages deploy . --project-name axecodi
In Cloudflare Pages → Custom Domains, add your domain. HTTPS is automatic via Cloudflare's edge.
Configure Axecodi via environment variables in your Cloudflare Pages settings.
| Variable | Description | Required |
|---|---|---|
| NVIDIA_API_KEY | Your NVIDIA NIM API key from build.nvidia.com | Required |
| DEFAULT_MODEL | Default model ID to use on load (default: glm-4-5) | Optional |
| MAX_TOKENS | Maximum tokens per response (default: 2048) | Optional |
| SYSTEM_PROMPT | Default system prompt injected into all conversations | Optional |
| RATE_LIMIT | Max requests per minute per IP (default: 30) | Optional |
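The variables in the table above could be read inside the Worker roughly as follows. This is a minimal sketch, not Axecodi's actual code: `loadConfig` is a hypothetical helper, and the only details taken from the table are the variable names and their defaults.

```javascript
// Hypothetical helper: turns the Worker's `env` bindings into a config object,
// applying the defaults listed in the configuration table.
function loadConfig(env) {
  if (!env.NVIDIA_API_KEY) {
    // The only required variable; fail early with a clear message.
    throw new Error('NVIDIA_API_KEY is not set');
  }

  // Environment variables arrive as strings, so numeric values are parsed
  // and fall back to the default when missing or not a positive integer.
  const toPositiveInt = (value, fallback) => {
    const parsed = Number.parseInt(value, 10);
    return Number.isInteger(parsed) && parsed > 0 ? parsed : fallback;
  };

  return {
    apiKey: env.NVIDIA_API_KEY,
    defaultModel: env.DEFAULT_MODEL || 'glm-4-5',
    maxTokens: toPositiveInt(env.MAX_TOKENS, 2048),
    systemPrompt: env.SYSTEM_PROMPT || '',
    rateLimit: toPositiveInt(env.RATE_LIMIT, 30),
  };
}
```

Parsing once at the top of the request handler keeps the rest of the code free of string-to-number conversions.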
Type / in the chat input to access powerful commands:
| Command | Description |
|---|---|
| /help | Show all available commands |
| /model [name] | Switch to a different AI model |
| /temp [0-2] | Set temperature for response randomness |
| /system [prompt] | Set a custom system prompt for this conversation |
| /export | Export the current chat as Markdown |
| /clear | Clear the current conversation |
| /retry | Regenerate the last AI response |
| /tokens | Show token count for current conversation |
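One way the chat input could recognise the commands in the table is sketched below. `parseCommand` and `KNOWN_COMMANDS` are hypothetical names, and how Axecodi treats a missing argument is not documented, so this sketch simply passes an empty argument through.

```javascript
// Hypothetical parser for the slash commands listed in the table.
// Returns null for ordinary chat messages.
const KNOWN_COMMANDS = new Set([
  'help', 'model', 'temp', 'system', 'export', 'clear', 'retry', 'tokens',
]);

function parseCommand(input) {
  const text = input.trim();
  if (!text.startsWith('/')) return null; // ordinary message, not a command

  // Split "/name rest of line" into the command name and its argument.
  const match = text.match(/^\/(\S+)\s*([\s\S]*)$/);
  if (!match) return { error: 'Empty command' };

  const name = match[1].toLowerCase();
  const arg = match[2].trim();
  if (!KNOWN_COMMANDS.has(name)) {
    return { error: `Unknown command: /${name}` };
  }

  // /temp takes a number in the range 0 to 2, as stated in the table.
  if (name === 'temp' && arg !== '') {
    const value = Number(arg);
    if (Number.isNaN(value) || value < 0 || value > 2) {
      return { error: '/temp expects a number between 0 and 2' };
    }
    return { name, arg: value };
  }
  return { name, arg };
}
```

For example, `parseCommand('/temp 0.7')` yields a `temp` command with a numeric argument, while `parseCommand('hello')` returns `null` so the text is sent as a normal message.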
Axecodi supports attaching files to your messages. Click the 📎 button or paste an image with Ctrl+V.
Not all models support file inputs. GLM-5.1, Llama 3.3 70B, and Qwen 2.5 72B have vision/document understanding. Check the Models page for compatibility.
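As an illustration of how a pasted image might be attached to a message, the sketch below builds a user message with an inline image. It assumes the target model accepts OpenAI-style content parts (`text` and `image_url`), which is not confirmed by this documentation and can differ between models. `buildUserMessage` and the attachment shape (`mimeType`, `base64`) are hypothetical, and only images are covered.

```javascript
// Assumption: the target model accepts OpenAI-style content parts
// ({ type: 'text' } and { type: 'image_url' }). Images only.
function buildUserMessage(text, attachments = []) {
  if (attachments.length === 0) {
    // No files attached: send a plain string, as for any text message.
    return { role: 'user', content: text };
  }

  const parts = [{ type: 'text', text }];
  for (const file of attachments) {
    if (!file.mimeType.startsWith('image/')) {
      throw new Error(`Unsupported attachment type: ${file.mimeType}`);
    }
    // Embed the image as a data URL so no separate upload is needed.
    parts.push({
      type: 'image_url',
      image_url: { url: `data:${file.mimeType};base64,${file.base64}` },
    });
  }
  return { role: 'user', content: parts };
}
```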
The Cloudflare Worker acts as a secure proxy between the frontend and NVIDIA NIM. It handles authentication, rate limiting, and CORS.
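// Handles POST requests to the proxy route.
export async function onRequestPost(context) {
  const { request, env } = context;
  const body = await request.json();

  // Forward the request upstream; the API key is added here, server-side.
  const response = await fetch(
    'https://integrate.api.nvidia.com/v1/chat/completions',
    {
      method: 'POST',
      headers: {
        'Authorization': `Bearer ${env.NVIDIA_API_KEY}`,
        'Content-Type': 'application/json',
      },
      body: JSON.stringify(body),
    }
  );

  // Stream the upstream body back, preserving its status code so that
  // upstream errors are not reported to the browser as 200 OK.
  return new Response(response.body, {
    status: response.status,
    headers: {
      'Content-Type': 'text/event-stream',
      'Access-Control-Allow-Origin': '*',
    },
  });
}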
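Because the proxy returns a `text/event-stream` response, the frontend has to reassemble the reply from streamed chunks. The sketch below shows one way to do that. It assumes OpenAI-style streaming events (`data: {json}` lines carrying `choices[0].delta.content`, ending with `data: [DONE]`), and `createSseParser` is a hypothetical helper rather than Axecodi's own code.

```javascript
// Hypothetical client-side helper. Assumes OpenAI-style streaming chunks:
// lines of the form `data: {json}` and a final `data: [DONE]`.
function createSseParser(onText) {
  let buffer = '';

  // Feed decoded text as it arrives; returns true once the stream is done.
  return function feed(chunk) {
    buffer += chunk;
    const lines = buffer.split('\n');
    buffer = lines.pop(); // keep a possibly incomplete last line for next call

    for (const rawLine of lines) {
      const line = rawLine.trim();
      if (!line.startsWith('data:')) continue; // skip blank lines and comments
      const payload = line.slice(5).trim();
      if (payload === '[DONE]') return true;
      const delta = JSON.parse(payload).choices?.[0]?.delta?.content;
      if (delta) onText(delta);
    }
    return false;
  };
}
```

In the browser, the chunks would typically come from `response.body.getReader()` and be decoded with a `TextDecoder` before being passed to `feed`. Buffering the trailing partial line matters because network chunks can split an event in the middle of its JSON.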
No. The API key is stored as a Cloudflare environment variable and is never sent to the browser. All requests go through the Cloudflare Worker, which injects the key server-side.
Yes! The proxy worker can be modified to point to any OpenAI-compatible API (OpenRouter, Together AI, Groq, etc.) by changing the endpoint URL and auth format.
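A minimal sketch of that change, assuming the alternative provider also uses Bearer-token authentication: the endpoint and key are read from two hypothetical variables, `UPSTREAM_URL` and `UPSTREAM_API_KEY`, which are not part of Axecodi's documented configuration. When they are unset, the request falls back to the NVIDIA endpoint and key used by the proxy.

```javascript
const NVIDIA_URL = 'https://integrate.api.nvidia.com/v1/chat/completions';

// UPSTREAM_URL and UPSTREAM_API_KEY are hypothetical variables used only for
// this illustration. Providers with a different auth scheme need other headers.
function buildUpstreamRequest(env, body) {
  const url = env.UPSTREAM_URL || NVIDIA_URL;
  const apiKey = env.UPSTREAM_API_KEY || env.NVIDIA_API_KEY;

  return {
    url,
    init: {
      method: 'POST',
      headers: {
        Authorization: `Bearer ${apiKey}`,
        'Content-Type': 'application/json',
      },
      body: JSON.stringify(body),
    },
  };
}
```

Inside the Worker, the result would be passed straight to `fetch(url, init)`, so switching providers becomes a settings change rather than a code change.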
All conversation history is stored in localStorage in the user's browser. Nothing is sent to any server except the actual messages during inference.
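Browser-side persistence of this kind could look like the sketch below. The storage key, the conversation shape, and both function names are hypothetical. The storage object is passed in as a parameter so the same code works with `window.localStorage` in the browser and with a stand-in elsewhere.

```javascript
const STORAGE_KEY = 'axecodi:conversations'; // hypothetical key name

// Reads all saved conversations; returns an empty list if nothing is stored
// or the stored value is not valid JSON.
function loadConversations(storage) {
  try {
    const parsed = JSON.parse(storage.getItem(STORAGE_KEY) ?? '[]');
    return Array.isArray(parsed) ? parsed : [];
  } catch {
    return [];
  }
}

// Appends one message to a conversation, creating the conversation if needed,
// and writes the full list back to storage.
function saveMessage(storage, conversationId, message) {
  const conversations = loadConversations(storage);
  let conversation = conversations.find((c) => c.id === conversationId);
  if (!conversation) {
    conversation = { id: conversationId, messages: [] };
    conversations.push(conversation);
  }
  conversation.messages.push(message);
  storage.setItem(STORAGE_KEY, JSON.stringify(conversations));
  return conversation;
}
```

In the browser this would be called as `saveMessage(window.localStorage, id, { role: 'user', content: 'Hi' })`.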
By default, the Cloudflare Worker enforces 30 requests/minute per IP. This can be adjusted via the RATE_LIMIT environment variable.
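The Worker's actual rate-limiting code is not shown in this documentation. As an illustration of the idea, the sketch below implements a simple fixed-window counter per IP with the documented default of 30 requests per minute. `createRateLimiter` is a hypothetical helper, and the in-memory `Map` is an assumption: Worker instances do not share memory, so a real deployment would likely need shared storage for exact limits.

```javascript
// Fixed-window counter keyed by client IP. `now` is injectable for testing.
function createRateLimiter(limitPerMinute = 30, windowMs = 60_000) {
  const windows = new Map(); // ip -> { start, count }

  return function isAllowed(ip, now = Date.now()) {
    const entry = windows.get(ip);
    if (!entry || now - entry.start >= windowMs) {
      // First request from this IP, or the previous window has expired.
      windows.set(ip, { start: now, count: 1 });
      return true;
    }
    entry.count += 1;
    return entry.count <= limitPerMinute;
  };
}
```

In a Worker, the client IP is available from the `CF-Connecting-IP` request header, and a request for which `isAllowed` returns `false` would normally be answered with HTTP status 429.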