Qwen API User Guide

A comprehensive guide to understanding and using the Qwen API Proxy effectively.

Getting Started
Understanding Authentication
Choosing the Right Model
Feature Guide
File Upload Guidelines
Best Practices
Common Use Cases
Error Handling
Troubleshooting
FAQ

Getting Started

What You Need

Before you begin using the Qwen API, make sure you have:

A Qwen Account - Sign up at chat.qwen.ai if you don't have one
Your Authentication Token - Extract it using the browser script provided in the README
An HTTP Client - Any tool that can make HTTP requests (cURL, Postman, your application, etc.)

First Steps

Get Your Token: Visit the Qwen website, run the token extraction script in your browser console, and copy the generated token
Validate Your Token: Use the /v1/validate endpoint to ensure your token is working
List Available Models: Check /v1/models to see all models you can use
Make Your First Request: Try a simple chat completion to get familiar with the API

API Base URL

All requests should be sent to: https://qwen.aikit.club

Understanding Authentication

What is the Token?

The authentication token is a compressed string that contains your Qwen credentials. It's like a password that proves you have access to use the Qwen AI services.

Token Format

Starts with H4sIAAAAAAAAA...
Is compressed using gzip for security
Contains both your access token and session cookie
Typically several hundred characters long

Token Lifecycle

Valid Token → Make Requests → Token Expires → Refresh Token → Continue Using

When to Refresh

Your token will eventually expire. You'll know it's expired when you start getting authentication errors. Use the /v1/refresh endpoint to get a new token without logging in again.

Security Tips

✅ Store tokens securely (environment variables, secure storage)
✅ Never commit tokens to version control
✅ Use HTTPS only (never HTTP)
✅ Rotate tokens periodically
❌ Don't share tokens publicly
❌ Don't hardcode tokens in your application

Choosing the Right Model

Quick Model Selector

Need general conversation?
→ Use qwen-max-latest (best all-around)

Need fast responses?
→ Use qwen2.5-turbo (optimized for speed)

Need web search?
→ Use qwen2.5-max (includes web search capability)

Need deep analysis?
→ Use QVQ-Max or QWQ-32B (advanced reasoning)

Need comprehensive research?
→ Use qwen-deep-research (multi-phase research with citations)

Need code generation?
→ Use qwen3-coder-plus (specialized for coding)

Need web development?
→ Use qwen-web-dev (frontend UI/UX specialist)

Need full application?
→ Use qwen-full-stack (complete stack development)

Need vision analysis?
→ Use qwen2.5-max or qwen-max-latest (both support images)

Need very long context?
→ Use qwen2.5-14b-1m (supports 1 million tokens)

Need multimodal (audio/video)?
→ Use qwen2.5-omni-7b (handles multiple media types)

Model Capabilities Overview

Each model has different strengths. Check the Model Capabilities table in the README to see which features each model supports:

👁️ Vision - Can analyze images and visual content
💡 Reasoning - Advanced thinking and problem-solving
🌐 Web Search - Can search the web for current information
🔧 Tool Calling - Can use external tools and functions

Feature Guide

1. Basic Chat Conversations

What it does: Have text-based conversations with AI

When to use: General questions, content generation, explanations, creative writing

Best models: qwen-max-latest, qwen2.5-plus, qwen2.5-turbo

Tips:

Keep messages clear and specific
Provide context when needed
Use system messages to set behavior
Enable streaming for real-time responses

2. Web Search Integration

What it does: AI can search the web and use current information

When to use: Questions about current events, recent developments, real-time data

Requirements:

Must use a model with web search capability (check table)
Include tools: [{ type: "web_search" }] in your request

Tips:

Ask about current topics ("What happened today...")
Request recent information ("Latest news about...")
Specify time frames ("In the last week...")

3. Thinking Mode (Reasoning)

What it does: AI shows its reasoning process and thinks through complex problems

When to use: Math problems, logic puzzles, complex analysis, step-by-step solutions

Requirements:

Enable with enable_thinking: true
Set a thinking budget (recommended: 30000 tokens)

Tips:

Use for problems that need careful reasoning
Ask for step-by-step explanations
Works best with reasoning-capable models

4. Vision and Image Analysis

What it does: AI can see and understand images, photos, screenshots, diagrams

When to use: Image description, visual analysis, OCR, chart interpretation

Supported formats: JPG, PNG, GIF, WebP (most common)

Tips:

Provide clear, high-quality images
Ask specific questions about what you want to know
Combine multiple images for comparison (up to 5)
Works with URLs or base64-encoded images

5. Document Analysis

What it does: AI can read and analyze PDF documents, text files, and other documents

When to use: Document summarization, information extraction, content analysis

Supported formats: PDF, TXT, MD (most common), DOC, DOCX

Tips:

Keep documents under 20MB
Can combine with images (e.g., PDF + photo)
Ask specific questions about document content
Request summaries, key points, or specific information

6. Deep Research

What it does: Comprehensive multi-phase research with web search and citations

When to use: Academic research, market analysis, in-depth investigations

Model: Must use qwen-deep-research

Features:

Multi-phase research process
Web search integration
Source validation
Automatic citations
Comprehensive reports
Export to PDF/Markdown

Tips:

Ask broad research questions
Let the AI complete all research phases
Review citations for accuracy
Download reports for offline use

7. Image Generation

What it does: Create new images from text descriptions

When to use: Creating art, illustrations, concept designs, visual content

Options:

Size: 1024x1024, 1792x1024, 1024x1792
Quality: Standard or HD
Style: Natural or vivid

Tips:

Be descriptive in your prompts
Specify style, mood, colors
Mention composition and perspective
Iterate with edits for refinement

8. Image Editing

What it does: Modify existing images using text instructions

When to use: Changing image elements, style adjustments, adding/removing objects

Input: Original image + editing instructions

Tips:

Provide clear editing instructions
Use simple, direct commands
One change at a time works best
Can use generated images as input

9. Video Generation

What it does: Create videos from text descriptions

When to use: Video content creation, animations, visual storytelling

Options:

Size: 1280x720, 1920x1080
Duration: 5-60 seconds

Tips:

Describe motion and action clearly
Specify scene changes
Keep descriptions focused
Longer videos take more time

10. Code Generation

What it does: Generate code, debug, explain programming concepts

Model: Best with qwen3-coder-plus

Features:

Multi-language support
Function/tool calling
Code explanation
Bug fixing
Code optimization

Tips:

Specify programming language
Describe functionality clearly
Ask for comments and documentation
Request error handling

11. Web Development

What it does: Create web components, UI elements, responsive designs

Model: Use qwen-web-dev

Output: HTML, CSS, JavaScript

Features:

Responsive design
Modern CSS
Interactive components
Framework support (React, Vue)
Accessibility considerations

Tips:

Describe desired functionality
Specify design preferences
Mention framework if needed
Ask for responsive design

12. Full-Stack Development

What it does: Build complete applications with frontend, backend, database

Model: Use qwen-full-stack

Output: Complete application code

Features:

Frontend frameworks
Backend APIs
Database schemas
Authentication
Deployment-ready code

Tips:

Describe full requirements
Specify tech stack preferences
Mention scalability needs
Request security features

File Upload Guidelines

Understanding File Categories

Files are grouped into two main categories:

Media Files (Image, Audio, Video):

All considered the same category
Cannot combine different media types
Examples: JPG + MP4 = ❌ Invalid

Documents (PDF, Text files):

Separate category from media
Can combine with media files
Examples: JPG + PDF = ✅ Valid

What You Can Upload

Valid Single File Types:

✅ One or more images (up to 5)
✅ One audio file
✅ One video file
✅ One or more documents (up to 5)

Valid Combinations:

✅ Multiple images only
✅ Image(s) + Document(s)
✅ Audio + Document(s)
✅ Video + Document(s)
✅ Single media file only

Invalid Combinations:

❌ Image + Audio
❌ Image + Video
❌ Audio + Video
❌ Multiple videos
❌ Multiple audio files

Size and Duration Limits

Images:

Maximum: 10MB per image
Count: Up to 5 images
Formats: JPG, PNG, GIF, WebP recommended

Audio:

Maximum: 100MB
Duration: Up to 3 minutes (180 seconds)
Count: 1 file only
Formats: MP3, WAV, M4A, AAC recommended

Video:

Maximum: 500MB
Duration: Up to 10 minutes (600 seconds)
Count: 1 file only
Formats: MP4, MOV, AVI, MKV recommended

Documents:

Maximum: 20MB per document
Count: Up to 5 documents
Formats: PDF, TXT, MD recommended

Upload Methods

Method 1: URL
Provide a direct URL to the file hosted online

Method 2: Base64
Encode the file as base64 data and include it directly

Method 3: File Upload
Use multipart form data to upload the file directly

Best Practices

Request Optimization

Stream for Real-Time:

Enable streaming for chatbot-like experiences
Provides instant feedback to users
Better for long responses

Non-Stream for Complete Data:

Use when you need the full response at once
Better for batch processing
Easier error handling

Message Management

System Messages:

Set AI behavior and personality
Define response format
Establish context

User Messages:

Ask clear, specific questions
Provide necessary context
Keep conversations focused

Assistant Messages:

Include previous AI responses for context
Maintain conversation history
Build on previous exchanges

Context Management

Keep Relevant History:

Include important previous messages
Don't send entire conversation every time
Focus on recent relevant context

Token Awareness:

Longer conversations use more tokens
Summarize old conversations if needed
Reset context when changing topics

Rate Limiting

Respect Limits:

Don't spam requests rapidly
Implement delays between requests
Use queuing for bulk operations

Handle Failures:

Retry with exponential backoff
Don't retry immediately
Log errors for debugging

Error Recovery

Graceful Degradation:

Have fallback options
Inform users of issues
Don't crash on errors

Token Refresh:

Detect expiration automatically
Refresh before making request
Store new token securely

Common Use Cases

1. Building a Chatbot

Goal: Create an interactive conversational AI

Steps:

Initialize with system message
Maintain conversation history
Stream responses for real-time feel
Handle context window limits
Reset conversation when needed

Best Practices:

Keep last 10-20 messages
Use streaming for better UX
Implement typing indicators
Handle connection issues

2. Document Summarization

Goal: Extract key information from documents

Steps:

Upload document (PDF, TXT)
Ask for summary
Request specific sections
Extract key points

Best Practices:

Use clear, focused questions
Specify desired length
Ask for structured output
Verify important information

3. Image Analysis

Goal: Understand and describe visual content

Steps:

Provide image URL or file
Ask specific questions
Request detailed analysis
Compare multiple images

Best Practices:

Use high-quality images
Ask focused questions
Specify what to look for
Combine with text context

4. Research Assistant

Goal: Conduct comprehensive research

Steps:

Use qwen-deep-research model
Provide research topic
Wait for all phases to complete
Review sources and citations
Download report if needed

Best Practices:

Start with clear research question
Allow time for completion
Verify citations
Use non-streaming mode

5. Content Generation

Goal: Create written content

Steps:

Describe content type
Specify tone and style
Set length requirements
Iterate and refine

Best Practices:

Provide examples if possible
Be specific about requirements
Review and edit output
Regenerate if needed

6. Code Assistant

Goal: Help with programming tasks

Steps:

Use qwen3-coder-plus model
Describe functionality
Specify language/framework
Request explanations
Ask for improvements

Best Practices:

Include context about project
Specify error handling needs
Request code comments
Test generated code

7. Visual Content Creation

Goal: Generate images and videos

Steps:

Write detailed prompts
Specify size and style
Generate initial content
Edit and refine
Download final result

Best Practices:

Be descriptive
Iterate on results
Use editing for refinement
Save successful prompts

8. Multi-Language Support

Goal: Translate or work in multiple languages

Steps:

Specify source and target languages
Provide context
Request translation or generation
Verify accuracy

Best Practices:

Specify language explicitly
Provide cultural context
Verify translations
Use native speakers for review

Error Handling

Common Errors and Solutions

Authentication Errors (401)

Symptoms: "Invalid API key" or "Unauthorized"

Solutions:

Validate your token using /v1/validate
Refresh token using /v1/refresh
Regenerate token from browser
Check token format is correct

Invalid Request (400)

Symptoms: "Bad request" or "Invalid parameters"

Solutions:

Check request format matches API spec
Verify all required fields are present
Validate JSON syntax
Check parameter values are valid

Rate Limiting (429)

Symptoms: "Too many requests"

Solutions:

Slow down request rate
Implement request queuing
Add delays between requests
Use exponential backoff

File Too Large (413)

Symptoms: "File size exceeds limit"

Solutions:

Compress images before upload
Use smaller file sizes
Split large documents
Check size limits for file type

Unsupported Format (415)

Symptoms: "Unsupported media type"

Solutions:

Use supported file formats
Convert files to compatible format
Check format is in allowed list
Verify MIME type is correct

Server Error (500)

Symptoms: "Internal server error"

Solutions:

Retry request after delay
Check if service is operational
Report persistent errors
Use different endpoint if available

Error Response Structure

Every error response includes:

message: Human-readable error description
type: Error category
code: Specific error code
param: Which parameter caused error (if applicable)

Retry Strategies

Immediate Retry (Network issues):

Retry 1-2 times immediately
For temporary connection problems
Don't retry on auth errors

Exponential Backoff (Rate limits):

Wait 1s, 2s, 4s, 8s between retries
For 429 or temporary 500 errors
Give up after 3-5 attempts

No Retry (Client errors):

Don't retry 400, 401, 403, 415
Fix the request instead
These indicate client-side problems

Troubleshooting

Token Issues

Problem: Token not working

Diagnostic Steps:

Check token format (starts with H4sIAAAAAAAAA)
Validate using /v1/validate endpoint
Check if token is expired
Verify you're logged into Qwen
Regenerate token from browser

Common Causes:

Token copied incorrectly
Session expired
Account issues
Wrong token format

Streaming Problems

Problem: Stream cuts off or doesn't work

Diagnostic Steps:

Verify stream: true is set
Check you're handling SSE format
Look for [DONE] marker
Test with non-streaming first
Check network/proxy settings

Common Causes:

Incorrect SSE parsing
Network interruptions
Proxy buffering
Client timeout too short

File Upload Failures

Problem: Can't upload files

Diagnostic Steps:

Check file size against limits
Verify file format is supported
Test with smaller file
Check file is not corrupted
Verify URL is accessible (if using URL)

Common Causes:

File too large
Unsupported format
Mixing incompatible file types
Network issues

Model Not Working

Problem: Specific model gives errors

Diagnostic Steps:

Check model name spelling
Verify model exists (/v1/models)
Check model supports requested feature
Try with default model
Check capability matrix

Common Causes:

Typo in model name
Model doesn't support feature
Model deprecated or unavailable
Feature flag not set

Slow Responses

Problem: Requests take too long

Diagnostic Steps:

Check if large file uploads
Verify network connection
Try simpler request
Use faster model (turbo)
Enable streaming

Common Causes:

Large files being processed
Complex thinking/research tasks
Network latency
Server load
Long generation requested

Debug Logs Not Showing

Problem: Can't see Request/Response IDs

Diagnostic Steps:

Check you're looking at response content
Verify logs are at the end
Look for collapsible section
Check streaming vs non-streaming
Verify response completed

Common Causes:

Stream interrupted
Looking in wrong place
Response cut off
Client filtering logs

FAQ

General Questions

Q: Is this service free to use?

A: Yes, the proxy itself is free. However, you need a Qwen account, and any usage limits depend on your Qwen account tier. Free Qwen accounts have certain limitations, while premium accounts have higher limits.

Q: Are there rate limits?

A: The proxy doesn't impose its own rate limits. All limitations come from your Qwen account. Free accounts have lower limits than paid accounts.

Q: Can I use this in production?

A: Yes, but keep in mind this is an unofficial proxy. For production use, implement proper error handling, token refresh logic, and have fallback strategies.

Q: How do I get support?

A: Open an issue on GitHub for bugs or feature requests. Check existing issues and discussions for common questions.

Q: Can I self-host this?

A: Yes! The code is open source. You can deploy it to your own Cloudflare Workers account or adapt it for other platforms.

Authentication

Q: How long do tokens last?

A: Tokens expire based on Qwen's session management. You'll need to refresh or regenerate them periodically.

Q: Can I use the same token from multiple applications?

A: Yes, one token can be used across multiple applications, but be aware of rate limits.

Q: What if my token gets leaked?

A: Regenerate immediately from the Qwen website and update it in your applications.

Q: Do I need to refresh tokens manually?

A: You can implement automatic refresh using the /v1/refresh endpoint when you detect expiration.

Models

Q: Which model should I use?

A: For general use, start with qwen-max-latest. For specific needs, refer to the Model Selection guide above.

Q: Can I switch models mid-conversation?

A: Yes, but you'll need to create a new chat. Models don't share conversation context.

Q: Are all models always available?

A: Availability depends on your Qwen account. Some models may require premium access.

Q: What's the difference between Max and Turbo?

A: Max is more powerful and thorough, Turbo is faster but may be less detailed.

Features

Q: Can I use web search with any model?

A: No, only models with web search capability (check the capabilities table) support this feature.

Q: How do I enable thinking mode?

A: Add enable_thinking: true to your request and use a reasoning-capable model.

Q: Can I combine multiple features?

A: Yes! For example, you can use web search + thinking mode + vision together if the model supports all three.

Q: What's the maximum file size?

A: It varies by type: Images (10MB), Audio (100MB), Video (500MB), Documents (20MB).

Technical

Q: Is streaming faster than non-streaming?

A: Streaming provides faster first response but same total time. It's better for user experience.

Q: Can I cancel a request in progress?

A: You can close the connection, but the server may continue processing.

Q: How do I handle long conversations?

A: Keep only recent relevant messages to avoid hitting context limits. Summarize old conversations if needed.

Q: Can I use this with OpenAI SDKs?

A: Yes! Just change the base URL to https://qwen.aikit.club and use your Qwen token.

Troubleshooting

Q: Why am I getting "Invalid API key" errors?

A: Your token may be expired, invalid, or incorrectly formatted. Try validating or regenerating it.

Q: Files aren't uploading properly

A: Check file size, format, and that you're not mixing incompatible file types (e.g., image + video).

Q: Responses are cut off

A: This could be token limits, connection issues, or streaming problems. Try non-streaming mode first.

Q: Can I get request logs for debugging?

A: Yes, response content includes debug logs with Request ID and Response ID at the end.

Qwen API User Guide

Table of Contents

Getting Started

What You Need

First Steps

API Base URL

Understanding Authentication

What is the Token?

Token Format

Token Lifecycle

When to Refresh

Security Tips

Choosing the Right Model

Quick Model Selector

Model Capabilities Overview

Feature Guide

1. Basic Chat Conversations

2. Web Search Integration

3. Thinking Mode (Reasoning)

4. Vision and Image Analysis

5. Document Analysis

6. Deep Research

7. Image Generation

8. Image Editing

9. Video Generation

10. Code Generation

11. Web Development

12. Full-Stack Development

File Upload Guidelines

Understanding File Categories

What You Can Upload

Size and Duration Limits

Upload Methods

Best Practices

Request Optimization

Message Management

Context Management

Rate Limiting

Error Recovery

Common Use Cases

1. Building a Chatbot

2. Document Summarization

3. Image Analysis

4. Research Assistant

5. Content Generation

6. Code Assistant

7. Visual Content Creation

8. Multi-Language Support

Error Handling

Common Errors and Solutions

Error Response Structure

Retry Strategies

Troubleshooting

Token Issues

Streaming Problems

File Upload Failures

Model Not Working

Slow Responses

Debug Logs Not Showing

FAQ

General Questions

Authentication

Models

Features

Technical

Troubleshooting