Metadata-Version: 2.4
Name: gemini-flux
Version: 1.0.0
Summary: Smart Gemini API key manager with token-aware sliding window scheduling
Author-email: Muhammad Ali <malikasana2810@gmail.com>
License: MIT
Project-URL: Homepage, https://github.com/malikasana/gemini-flux
Project-URL: Repository, https://github.com/malikasana/gemini-flux
Keywords: gemini,api,rate-limit,key-rotation,google-ai
Classifier: Programming Language :: Python :: 3
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Requires-Python: >=3.9
Description-Content-Type: text/markdown
Requires-Dist: google-genai>=0.8.0
Requires-Dist: fastapi>=0.115.0
Requires-Dist: uvicorn>=0.30.0
Requires-Dist: pydantic>=2.0.0
Requires-Dist: pytz>=2024.1
Requires-Dist: requests>=2.31.0
Requires-Dist: python-dotenv>=1.0.0

 

```markdown
# gemini-flux 🔥

> Give it your N Gemini API keys. It manages everything.

**Author:** Muhammad Ali — malikasana2810@gmail.com

---

## Why This Exists

If you've ever used the Gemini API on the free tier, you've hit this wall:

```
429 RESOURCE_EXHAUSTED — You exceeded your current quota.
```

The frustrating part? Google lets you create multiple projects, each with its own independent API key and quota. So the question becomes:

> *"Why am I managing these keys manually when a smart system could do it for me?"*

That's exactly what **gemini-flux** solves.

Most key rotation tools are dumb: they round-robin every X seconds regardless of what's actually happening. gemini-flux is different. It counts the tokens in each request, calculates the cooldown each key needs, and sends each request at the **earliest moment a key has capacity**, with no unnecessary waiting and no wasted quota.

Built originally for a dubbing application that needed to send large translation requests (instructions + transcripts) continuously without hitting rate limits. Works for any use case.

---

## How It Works

### The Core Problem
Gemini free tier limit per project:
- **250,000 tokens per minute (TPM)**

If you send a 500,000-token request, that key needs **2 minutes** to recover:
```
cooldown = token_count / tokens_per_minute
cooldown = 500,000 / 250,000 = 2 minutes
```
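
As a minimal sketch, the same formula can be written in Python with the result in seconds. The function name `cooldown_seconds` is hypothetical and not part of the gemini-flux API; the default of 250,000 TPM is the free tier figure quoted above.

```python
def cooldown_seconds(token_count: int, tokens_per_minute: int = 250_000) -> float:
    """Seconds a key needs to recover after sending `token_count` tokens.

    Hypothetical helper: it only restates cooldown = tokens / TPM, in seconds.
    """
    if tokens_per_minute <= 0:
        raise ValueError("tokens_per_minute must be positive")
    # Multiply before dividing to keep floating-point error small.
    return token_count * 60 / tokens_per_minute


print(cooldown_seconds(500_000))  # 120.0, i.e. the 2 minutes from the example
```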

### Token-Aware Sliding Window Scheduling

Instead of blindly rotating every 30 seconds, gemini-flux:

1. **Counts tokens** (FREE via Google's API) before every request
2. **Calculates exact cooldown** needed per key based on actual token usage
3. **Maintains a sliding window** per key tracking token usage over the last 60 seconds
4. **Picks the key with enough capacity RIGHT NOW** — no unnecessary waiting
5. If no key is ready → waits **exactly** as long as needed for the soonest available key
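
Steps 3 to 5 can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in `scheduler.py`: `KeyWindow` and `pick_key` are hypothetical names, timestamps are passed in explicitly to keep the example deterministic, and a request larger than the per-minute limit is assumed to be allowed once a key's window is empty.

```python
from collections import deque


class KeyWindow:
    """Token usage of one API key over a trailing window (hypothetical helper)."""

    def __init__(self, limit=250_000, window=60.0):
        self.limit = limit      # tokens allowed per window (free tier TPM from the text)
        self.window = window    # window length in seconds
        self.events = deque()   # (timestamp, tokens) pairs, oldest first

    def _prune(self, now):
        # Drop usage that has aged out of the window.
        while self.events and now - self.events[0][0] >= self.window:
            self.events.popleft()

    def record(self, tokens, now):
        self.events.append((now, tokens))

    def wait_for(self, tokens, now):
        """Seconds until `tokens` fit under the limit (0.0 if they fit now)."""
        self._prune(now)
        used = sum(spent for _, spent in self.events)
        if used + tokens <= self.limit or not self.events:
            return 0.0
        for ts, spent in self.events:
            used -= spent  # usage remaining once this event expires
            if used + tokens <= self.limit:
                return ts + self.window - now
        # Assumption: an oversized request may go once the window is empty.
        return self.events[-1][0] + self.window - now


def pick_key(windows, tokens, now):
    """Return (key_index, wait_seconds) for the key that is ready soonest."""
    wait, index = min((w.wait_for(tokens, now), i) for i, w in enumerate(windows))
    return index, wait


windows = [KeyWindow(), KeyWindow()]
windows[0].record(200_000, now=0.0)
windows[1].record(240_000, now=10.0)

print(pick_key(windows, 40_000, now=20.0))   # (0, 0.0): key 0 still has room
print(pick_key(windows, 100_000, now=20.0))  # (0, 40.0): key 0 frees up first
```

Passing `now` as an argument instead of calling `time.monotonic()` inside the class keeps the logic easy to test; a real scheduler would also need locking if requests arrive concurrently.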

### Dynamic Interval Math
```
worst_case_interval = cooldown / n_keys

8 keys, 1M token request:
  cooldown = 1,000,000 / 250,000 = 4 minutes = 240 seconds
  interval = 240 / 8 = 30 seconds between requests

8 keys, 10k token request:
  cooldown = 10,000 / 250,000 = 0.04 minutes = 2.4 seconds
  interval = 2.4 / 8 = 0.3 seconds (nearly instant)
```
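
The interval arithmetic above can be checked with a few lines of Python. This is only a sketch of the formula; `worst_case_interval` is a hypothetical name rather than a documented gemini-flux function.

```python
def worst_case_interval(token_count: int, n_keys: int,
                        tokens_per_minute: int = 250_000) -> float:
    """Seconds between requests when every key is cooling down (hypothetical helper)."""
    if n_keys < 1:
        raise ValueError("need at least one key")
    cooldown = token_count * 60 / tokens_per_minute  # per-key cooldown in seconds
    return cooldown / n_keys


print(worst_case_interval(1_000_000, n_keys=8))  # 30.0
print(worst_case_interval(1_000_000, n_keys=6))  # 40.0
```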

The system adapts automatically. Light requests → fast. Heavy requests → smart cooldown.

### Model Exhaustion Chain

When a model's daily quota is hit on a key, gemini-flux automatically moves to the next model — not because it failed, but because it's **exhausted for the day**:

```
1. gemini-2.5-pro               → smartest (100 RPD)
2. gemini-2.5-flash             → fast + smart (250 RPD) ← main workhorse
3. gemini-2.5-flash-lite        → lightweight (1000 RPD)
4. gemini-3.1-pro-preview       → newest pro generation
5. gemini-3-flash-preview       → newest flash generation
6. gemini-3.1-flash-lite-preview → newest lite generation
```
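
One way to picture the chain is a priority list plus a per-key set of models already exhausted today. The sketch below assumes that representation; `next_model` is a hypothetical helper and the real bookkeeping in `key_pool.py` may differ.

```python
# Priority order copied from the chain above.
CHAIN = [
    "gemini-2.5-pro",
    "gemini-2.5-flash",
    "gemini-2.5-flash-lite",
    "gemini-3.1-pro-preview",
    "gemini-3-flash-preview",
    "gemini-3.1-flash-lite-preview",
]


def next_model(chain, exhausted):
    """Return the first model not exhausted for the day, or None if all are."""
    for model in chain:
        if model not in exhausted:
            return model
    return None


exhausted_on_key_1 = {"gemini-2.5-pro"}         # daily quota already hit
print(next_model(CHAIN, exhausted_on_key_1))    # gemini-2.5-flash
```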

### Smart Policy Fetcher

On startup, gemini-flux spends **1 request** asking Gemini for its current free tier limits, then uses those numbers for all internal math. The result is cached for 7 days, so setup costs 1 request per week rather than 1 per run. If Google changes its limits, gemini-flux picks up the new values on the next refresh.
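
The 7-day caching behaviour could look roughly like this. It is a sketch under assumptions: the cache file layout, the `load_policy` name, and the `fetch` callback are all hypothetical, and the actual request to Gemini is replaced by a stand-in function.

```python
import json
import tempfile
import time
from pathlib import Path

CACHE_TTL_SECONDS = 7 * 24 * 3600  # 7 days, as described above


def load_policy(cache_path, fetch, now=None):
    """Return the cached policy if it is fresh, otherwise call `fetch` and re-cache."""
    now = time.time() if now is None else now
    path = Path(cache_path)
    if path.exists():
        cached = json.loads(path.read_text())
        if now - cached["fetched_at"] < CACHE_TTL_SECONDS:
            return cached["policy"]
    policy = fetch()  # the one request spent on setup
    path.write_text(json.dumps({"fetched_at": now, "policy": policy}))
    return policy


calls = []


def fake_fetch():
    # Stand-in for the real policy request; the values are illustrative.
    calls.append(1)
    return {"tpm": 250_000}


day = 24 * 3600
with tempfile.TemporaryDirectory() as tmp:
    cache = Path(tmp) / "policy.json"
    first = load_policy(cache, fake_fetch, now=0)
    second = load_policy(cache, fake_fetch, now=3 * day)  # served from cache
    third = load_policy(cache, fake_fetch, now=8 * day)   # stale, fetched again

print(len(calls))  # 2
```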

---

## Total Free Capacity (8 keys)

| Model | RPD per key | x 8 keys | Daily total |
|-------|------------|----------|-------------|
| gemini-2.5-pro | 100 | x 8 | 800/day |
| gemini-2.5-flash | 250 | x 8 | 2,000/day |
| gemini-2.5-flash-lite | 1000 | x 8 | 8,000/day |
| Preview models | varies | x 8 | bonus! |
| **TOTAL** | | | **10,800+/day** |

All completely free. No credit card needed.
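
To redo the table for a different number of keys, the arithmetic is a single multiplication per model. The sketch below uses the RPD figures from the table; `daily_capacity` is a hypothetical helper, and preview models are left out because their limits vary.

```python
# Requests per day (RPD) per key, taken from the table above.
RPD_PER_KEY = {
    "gemini-2.5-pro": 100,
    "gemini-2.5-flash": 250,
    "gemini-2.5-flash-lite": 1000,
}


def daily_capacity(n_keys, rpd_per_key=RPD_PER_KEY):
    """Total requests per day for each model across `n_keys` keys."""
    return {model: rpd * n_keys for model, rpd in rpd_per_key.items()}


totals = daily_capacity(8)
print(totals["gemini-2.5-flash"])  # 2000
print(sum(totals.values()))        # 10800
```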

---

## Quick Start

### Option 1: Direct Python (Kaggle, scripts, notebooks)

```bash
git clone https://github.com/malikasana/gemini-flux
cd gemini-flux
pip install -r requirements.txt
cp .env.example .env
```

Fill in your keys in `.env`, then:

```python
from core import GeminiFlux

flux = GeminiFlux(
    keys=["key1", "key2", "key3", "key4", "key5", "key6", "key7", "key8"],
    mode="both",
    log=True
)

response = flux.generate("Translate this transcript to Spanish...")
print(response["response"])
```

### Option 2: Docker Microservice

```bash
docker build -t gemini-flux .
docker run -p 8000:8000 --env-file .env gemini-flux
```

Then from any app:

```python
from client.client import GeminiFluxClient

client = GeminiFluxClient(base_url="http://localhost:8000")
response = client.generate("Translate this transcript to Spanish...")
print(response["response"])
```

### Option 3: Kaggle Notebook

```python
!git clone https://github.com/malikasana/gemini-flux
!pip install google-genai pytz python-dotenv

import sys
sys.path.insert(0, "/kaggle/working/gemini-flux")
from core import GeminiFlux

flux = GeminiFlux(keys=["key1", "key2", ...])
response = flux.generate("your prompt here")
```

---

## Setup — Getting Your API Keys

### Step 1: Create Google Cloud Projects
1. Go to [console.cloud.google.com](https://console.cloud.google.com)
2. Create up to 10 projects — each gets independent quota
3. Name them anything — e.g. `project-1`, `project-2`

> **Pro tip:** Use multiple Google accounts to go beyond 10 projects. Each account allows up to 10 projects independently.

### Step 2: Get API Keys
For each project:
1. Go to **APIs & Services → Credentials**
2. Click **Create Credentials → API Key**
3. Copy the key

### Step 3: Add to .env
Copy `.env.example` to `.env` and fill in your keys:

```env
GEMINI_KEY_1=AIza...your_key_1
GEMINI_KEY_2=AIza...your_key_2
GEMINI_KEY_3=AIza...your_key_3
GEMINI_KEY_4=AIza...your_key_4
GEMINI_KEY_5=AIza...your_key_5
GEMINI_KEY_6=AIza...your_key_6
GEMINI_KEY_7=AIza...your_key_7
GEMINI_KEY_8=AIza...your_key_8

GEMINI_MODE=both
GEMINI_LOG=true
```
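
If you build the `keys` list yourself (for example in a notebook), numbered variables like these can be collected with a short loop. This is a sketch only: `collect_keys` is a hypothetical helper, it assumes the keys are numbered consecutively from 1, and it is not necessarily how gemini-flux reads its own configuration.

```python
import os


def collect_keys(env=None):
    """Collect GEMINI_KEY_1, GEMINI_KEY_2, ... until the first missing number."""
    env = os.environ if env is None else env
    keys = []
    index = 1
    while (key := env.get(f"GEMINI_KEY_{index}")):
        keys.append(key)
        index += 1
    return keys


# With python-dotenv installed, call dotenv.load_dotenv() first so the values
# in .env are present in os.environ. A plain dict works for a quick check:
sample = {"GEMINI_KEY_1": "key1", "GEMINI_KEY_2": "key2", "GEMINI_MODE": "both"}
print(collect_keys(sample))  # ['key1', 'key2']
```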

---

## Usage

### Minimum input (just works):
```python
response = flux.generate("your prompt here")
```

### Full control:
```python
response = flux.generate(
    prompt="Translate this transcript to Urdu with natural dubbing tone...",
    images=["base64_image..."],
    files=["base64_pdf..."],
    mode="flash_only",
    preferred_key=3,
    max_tokens=2000,
    temperature=0.5,
    retry=True
)
```

### Response always includes:
```python
{
    "response": "Gemini's reply...",
    "key_used": 3,
    "model_used": "gemini-2.5-flash",
    "tokens_used": 45231,
    "wait_applied": 1.8,
    "retried": False
}
```

---

## Runtime Controls

```python
flux.set_mode("flash_only")    # change mode anytime
flux.disable_key(3)            # disable key #3
flux.enable_key(3)             # re-enable key #3
flux.refresh_policy()          # force re-fetch Gemini policy
flux.status()                  # full key pool status
```

### Mode options:

| Mode | Description |
|------|-------------|
| `both` | Full exhaustion chain, best to lite (default) |
| `pro_only` | Only Pro models |
| `flash_only` | Only Flash models |
| `flash_lite_only` | Only Flash-Lite models |
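
A mode can be read as a filter over the exhaustion chain. The sketch below is a guess at that mapping based on model names alone; `models_for_mode` is hypothetical, and in particular the assumption that `flash_only` excludes Flash-Lite models is not confirmed by the table.

```python
CHAIN = [
    "gemini-2.5-pro",
    "gemini-2.5-flash",
    "gemini-2.5-flash-lite",
    "gemini-3.1-pro-preview",
    "gemini-3-flash-preview",
    "gemini-3.1-flash-lite-preview",
]


def models_for_mode(chain, mode):
    """Filter the chain by mode, matching on substrings of the model name."""
    if mode == "both":
        return list(chain)
    if mode == "pro_only":
        return [m for m in chain if "-pro" in m]
    if mode == "flash_lite_only":
        return [m for m in chain if "flash-lite" in m]
    if mode == "flash_only":
        # Assumption: Flash-Lite models belong to flash_lite_only, not flash_only.
        return [m for m in chain if "flash" in m and "flash-lite" not in m]
    raise ValueError(f"unknown mode: {mode}")


print(models_for_mode(CHAIN, "flash_only"))
# ['gemini-2.5-flash', 'gemini-3-flash-preview']
```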

---

## HTTP API (Docker Mode)

| Endpoint | Method | Description |
|----------|--------|-------------|
| `/generate` | POST | Send prompt, get response |
| `/status` | GET | Key pool status and usage |
| `/refresh-policy` | POST | Force policy re-fetch |
| `/config` | POST | Change mode, enable/disable keys |
| `/health` | GET | Health check |

### Example:
```bash
curl -X POST http://localhost:8000/generate \
  -H "Content-Type: application/json" \
  -d '{"prompt": "Translate to Spanish: Hello world"}'
```

---

## Console Output

```
==================================================
  gemini-flux 🔥  Starting up with 8 keys
==================================================

[STARTUP] Checking 8 keys...
[KEY 1] ✅ Healthy
[KEY 2] ✅ Healthy
[KEY 3] ⚠️  Exhausted — will reset at midnight PT
[KEY 4] ❌ Invalid — removed from pool
[STARTUP] Pool ready: 6 healthy, 1 exhausted, 1 invalid

[MODELS] Exhaustion chain:
  1. gemini-2.5-pro
  2. gemini-2.5-flash
  3. gemini-2.5-flash-lite
  4. gemini-3.1-pro-preview
  5. gemini-3-flash-preview
  6. gemini-3.1-flash-lite-preview

[POLICY] Using cached policy (1.2 days old)
[STARTUP] Dynamic interval: 240s / 6 keys = 40.0s (worst case)
[STARTUP] ✅ gemini-flux ready! Mode: BOTH

[REQUEST] Incoming — 450,000 tokens detected
[SCHEDULER] Key #2 selected — sending via gemini-2.5-flash
[RESPONSE] ✅ Success via Key #2 (gemini-2.5-flash)
[KEY 2] gemini-2.5-flash: 1/250 requests used today
```

---

## Docker Environment Variables

| Variable | Default | Description |
|----------|---------|-------------|
| `GEMINI_KEY_1` ... `GEMINI_KEY_N` | required | Your API keys |
| `GEMINI_MODE` | `both` | Model mode |
| `GEMINI_LOG` | `true` | Console logging |

---

## Project Structure

```
gemini-flux/
├── core/
│   ├── __init__.py           # Public interface
│   ├── flux.py               # Main GeminiFlux class
│   ├── scheduler.py          # Token-aware sliding window brain
│   ├── key_pool.py           # Key validation and tracking
│   └── policy.py             # Smart policy fetcher
├── service/
│   └── main.py               # FastAPI microservice
├── client/
│   ├── __init__.py
│   └── client.py             # Lightweight HTTP client
├── .env.example              # Environment template
├── Dockerfile
├── requirements.txt
├── test.py
└── README.md
```

---

## Security

- Never commit your `.env` file — it's in `.gitignore` by default
- Use `.env.example` as a template; it contains no real keys
- Each key is validated on startup, and invalid keys are removed from the pool immediately

---

## License

MIT License — free to use, modify, and distribute.

---

## Author

**Muhammad Ali**
malikasana2810@gmail.com

---

*Built out of frustration with rate limits. Powered by math.*
```
