Metadata-Version: 2.4
Name: autourgos-vertexai-modelkit
Version: 0.1.1
Summary: A production-ready, developer-friendly Python wrapper for Google Gemini models on Vertex AI with async support, multi-turn chat, JSON mode, function calling, and MaaS adapters.
Author: DevxJitin
License: MIT
Project-URL: Homepage, https://github.com/DevxJitin/autourgos-vertexai-modelkit
Project-URL: Repository, https://github.com/DevxJitin/autourgos-vertexai-modelkit
Project-URL: Issues, https://github.com/DevxJitin/autourgos-vertexai-modelkit/issues
Keywords: vertex-ai,gemini,google-cloud,genai,llm,ai,machine-learning,model-as-a-service
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Typing :: Typed
Requires-Python: >=3.10
Description-Content-Type: text/markdown
Requires-Dist: google-genai>=1.0.0
Provides-Extra: maas
Requires-Dist: httpx>=0.25.0; extra == "maas"
Requires-Dist: google-auth>=2.20.0; extra == "maas"
Provides-Extra: all
Requires-Dist: httpx>=0.25.0; extra == "all"
Requires-Dist: google-auth>=2.20.0; extra == "all"
Provides-Extra: dev
Requires-Dist: pytest>=7.0; extra == "dev"
Requires-Dist: pytest-asyncio>=0.21; extra == "dev"
Requires-Dist: ruff>=0.4.0; extra == "dev"

<p align="center">
  <picture>
    <source media="(prefers-color-scheme: dark)" srcset="https://raw.githubusercontent.com/DevxJitin/autourgos-vertexai-modelkit/main/README/dark.png">
    <source media="(prefers-color-scheme: light)" srcset="https://raw.githubusercontent.com/DevxJitin/autourgos-vertexai-modelkit/main/README/light.png">
    <img alt="Autourgos Vertex AI ModelKit" src="https://raw.githubusercontent.com/DevxJitin/autourgos-vertexai-modelkit/main/README/dark.png" width="600">
  </picture>
</p>

<h3 align="center">The Developer-Friendly Vertex AI Wrapper</h3>

<p align="center">
  <img src="https://img.shields.io/badge/python-3.10+-blue?logo=python&logoColor=white" alt="Python 3.10+">
  <img src="https://img.shields.io/badge/license-MIT-green" alt="License: MIT">
  <img src="https://img.shields.io/badge/vertex_ai-google_cloud-4285F4?logo=google-cloud&logoColor=white" alt="Vertex AI">
  <img src="https://img.shields.io/badge/version-0.1.1-orange" alt="Version 0.1.1">
</p>

---

## Table of Contents

1.  [Why This Wrapper?](#why-this-wrapper)
2.  [Features](#features)
3.  [Package Structure](#package-structure)
4.  [Installation](#installation)
5.  [Authentication](#authentication)
    - [API Key (Development)](#option-a--api-key-development)
    - [Application Default Credentials (Production)](#option-b--application-default-credentials-production)
    - [MaaS Authentication](#maas-authentication)
6.  [Quick Start](#quick-start)
7.  [VertexTextModel](#vertextextmodel)
    - [Constructor Parameters](#constructor-parameters)
    - [invoke()](#invoke)
    - [System Instructions](#system-instructions)
    - [Multi-turn Chat](#multi-turn-chat)
    - [Async Support](#async-support)
    - [JSON Mode](#json-mode)
    - [Streaming](#streaming)
    - [Structured Output](#structured-output)
    - [Prompt Templates](#prompt-templates)
    - [Batch Processing](#batch-processing)
8.  [VertexMultiModel](#vertexmultimodel)
    - [Supported Media Types](#supported-media-types)
    - [Multi Constructor Parameters](#multi-constructor-parameters)
    - [Image Analysis](#image-analysis)
    - [Video Analysis](#video-analysis)
    - [Audio Transcription](#audio-transcription)
    - [PDF / Document Analysis](#pdf--document-analysis)
    - [Code File Analysis](#code-file-analysis)
    - [Mixed Media Input](#mixed-media-input)
    - [Remote URLs](#remote-urls)
    - [File Input Formats](#file-input-formats)
    - [Async MultiModel](#async-multimodel)
    - [Multi Structured Output](#multi-structured-output)
9.  [VertexMaaSModel](#vertexmaasmodel)
    - [MaaS Constructor Parameters](#maas-constructor-parameters)
    - [Async MaaS](#async-maas)
10. [Error Handling](#error-handling)
    - [Exception Hierarchies](#exception-hierarchies)
11. [Logging](#logging)
12. [Environment Variables Reference](#environment-variables-reference)
13. [Advanced Configuration](#advanced-configuration)
    - [Thinking (Extended Reasoning)](#thinking-extended-reasoning)
    - [Context Manager](#context-manager)
    - [Retry and Backoff](#retry-and-backoff)
    - [Safety Settings](#safety-settings)
    - [MaaS with System Instruction](#maas-with-system-instruction)
14. [Developers](#developers)
15. [License](#license)

---

## Why This Wrapper?

When building on Google Vertex AI, developers typically have to choose between a raw low-level SDK and a heavyweight orchestration framework. **Autourgos ModelKit** sits in between: a high-level, developer-friendly API tailored to Gemini and Vertex AI MaaS models, without the overhead of a full framework.

### Ecosystem Comparison

| Feature / Capability | `autourgos-vertexai-modelkit` (This) | `google-genai` (Raw SDK) | LangChain (`langchain-google-vertexai`) | LlamaIndex (`llama-index-llms-vertex`) |
| :--- | :--- | :--- | :--- | :--- |
| **Primary Focus** | Rapid App & Feature Dev | Low-level Integration | Agent Orchestration & Chaining | Data Ingestion & RAG |
| **Learning Curve** | 🟢 Lowest (One-liners) | 🟡 Medium (Verbose Types) | 🔴 High (LangChain abstractions) | 🟡 Medium (Data-centric) |
| **API Surface** | Unified (`invoke`, `ainvoke`) | Separated by `types` & Configs | Runnables (`invoke`, `stream`) | Query Engines & Indexes |
| **MaaS (Third-Party) Auth**| ✅ Native (`VertexMaaSModel`) | ❌ Requires Vertex AI SDK | ✅ Model Garden Supported | ✅ Model Garden Supported |
| **Multimodal Inputs** | ✅ Any file, URL, mixed-media | 🟡 Verbose `Part` initialization | 🟡 Content-block message dicts | 🟡 Primarily Image-focused |
| **Client Caching** | ✅ Automatic (Low latency) | ❌ Re-instantiated | 🟡 Depends on usage | 🟡 Depends on usage |
| **Retry & Backoff** | ✅ Built-in (Configurable) | ❌ Manual implementation | ✅ Built-in | ✅ Built-in |
| **gRPC Noise Suppression** | ✅ Automatic | ❌ Manual env vars needed | ❌ Manual env vars needed | ❌ Manual env vars needed |

### Eliminating Raw SDK Boilerplate

The raw `google-genai` SDK is **powerful but verbose**. Common tasks like setting system instructions, streaming with retries, or getting token counts require config objects and repeated boilerplate.

**Autourgos ModelKit** fixes all of that out of the box:

| Pain Point (raw SDK) | Autourgos Solution |
|---|---|
| Client setup plus `client.models.generate_content(...)` for a simple call | `llm.invoke("Hello")`, one line |
| System instruction set through a `GenerateContentConfig` object | `system_instruction="You are a pirate"` constructor argument |
| Retry policy configured or written by hand | Built-in exponential backoff |
| No retry for streaming calls | Word-level streaming with auto-retry |
| Client lifetime left to application code (often re-created per call) | Client caching, near-zero overhead |
| Silent failures | Rich error messages showing actual API errors |
| Async through the separate `client.aio` namespace | `await llm.ainvoke()` + `abatch_invoke()` |
| Chat through a separate `client.chats` session object | `llm.chat([...messages...])` |
| JSON mode configured through `GenerateContentConfig` | `response_mime_type="application/json"` |
| No latency tracking | `latency_ms` in every structured response |
| Per-file `Part` construction for multimodal input | `VertexMultiModel`: images, video, audio, PDF, code, URLs |
| Noisy gRPC logs | Suppressed automatically |

---

## Features

- **Unified API** — `VertexTextModel`, `VertexMultiModel`, `VertexMaaSModel` with identical `.invoke()` interface
- **True Multimodal** — images, video, audio, PDFs, code files, and HTTPS URLs in one model
- **System Instructions** — set AI persona with a single parameter
- **Async Support** — `ainvoke()`, `abatch_invoke()` for non-blocking calls
- **Multi-turn Chat** — `chat()` method with conversation history
- **JSON Mode** — enforce structured JSON responses with schemas
- **Function Calling** — pass tool declarations for agentic workflows
- **Streaming** — word-level chunk yielding with automatic retries
- **Structured Output** — response dict with token counts + latency tracking
- **Safety Settings** — configurable content filters
- **Resilient Retries** — exponential backoff on all API calls (including MaaS)
- **Client Caching** — SDK client created once, reused across calls (low latency)
- **Dual Auth** — API key (dev) or ADC (production)
- **Thinking Levels** — Gemini extended reasoning (`minimal` → `high`)
- **Batch Processing** — `batch_invoke()` for multiple prompts
- **Context Manager** — `with VertexTextModel(...) as llm:`
- **Structured Logging** — `set_log_level("DEBUG")` for diagnostics
- **Zero Runtime Noise** — stderr and gRPC diagnostics suppressed automatically

---

## Package Structure

```
src/
└── autourgos_vertexai_modelkit/
    ├── __init__.py              # Top-level exports (models, exceptions, utilities)
    ├── core/
    │   ├── billing.py           # Structured output builder with latency
    │   ├── logging.py           # Structured logging & latency tracking
    │   ├── normalization.py     # Input validation, JSON mode, safety settings
    │   ├── prompting.py         # Template parsing helpers
    │   ├── response.py          # Response + thinking extraction
    │   ├── runtime.py           # Env setup, stderr suppression
    │   └── sdk.py               # google-genai loader & cached client factory
    ├── textmodel/
    │   └── base.py              # VertexTextModel (invoke, ainvoke, chat, batch)
    ├── multimodel/
    │   └── base.py              # VertexMultiModel (images, video, audio, PDF, code, URLs)
    └── maasmodel/
        └── base.py              # VertexMaaSModel (invoke, ainvoke, batch)
```

---

## Installation

```bash
pip install autourgos-vertexai-modelkit
```

With **MaaS support** (third-party models via Vertex AI's OpenAI-compatible endpoint):

```bash
pip install "autourgos-vertexai-modelkit[maas]"
```

Install **all optional runtime extras** (currently the same dependencies as `[maas]`; dev tools live in the separate `[dev]` extra):

```bash
pip install "autourgos-vertexai-modelkit[all]"
```

For **development** (editable install from source):

```bash
git clone https://github.com/DevxJitin/autourgos-vertexai-modelkit.git
cd autourgos-vertexai-modelkit
pip install -e ".[all,dev]"
```

---

## Authentication

### Option A — API Key (Development)

```bash
# PowerShell
$env:GOOGLE_CLOUD_API_KEY = "your-api-key"

# bash / zsh
export GOOGLE_CLOUD_API_KEY="your-api-key"
```

Or pass it directly:

```python
llm = VertexTextModel(model="gemini-3-flash-preview", api_key="your-api-key")
```

### Option B — Application Default Credentials (Production)

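```bash
gcloud auth application-default login
# PowerShell
$env:GOOGLE_CLOUD_PROJECT  = "your-gcp-project-id"
$env:GOOGLE_CLOUD_LOCATION = "us-central1"   # optional, defaults to us-central1
# bash / zsh
export GOOGLE_CLOUD_PROJECT="your-gcp-project-id"
export GOOGLE_CLOUD_LOCATION="us-central1"   # optional, defaults to us-central1
```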

Or pass explicitly:

```python
llm = VertexTextModel(
    model="gemini-3-flash-preview",
    project="your-gcp-project-id",
    location="us-central1",
)
```

### MaaS Authentication

MaaS models use OAuth2 Bearer tokens:

- Pass `access_token="ya29.xxx"` directly, **or**
- Set up ADC — the wrapper auto-refreshes tokens (recommended)
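If you would rather pass `access_token` yourself, the sketch below mints a token from ADC with the `google-auth` package (pulled in by the `[maas]` extra). It uses only documented `google-auth` calls; the helper name `fetch_access_token` is hypothetical and not part of this library.

```python
def fetch_access_token() -> str:
    """Hypothetical helper: mint a short-lived OAuth2 access token from ADC."""
    # Imported lazily so this module loads even when google-auth is not installed.
    import google.auth
    import google.auth.transport.requests

    credentials, _project = google.auth.default(
        scopes=["https://www.googleapis.com/auth/cloud-platform"]
    )
    # default() returns credentials without a token; refresh() fetches one.
    credentials.refresh(google.auth.transport.requests.Request())
    return credentials.token
```

Pass the result as `VertexMaaSModel(..., access_token=fetch_access_token())`. Access tokens are short-lived, so for long-running processes the ADC route above, where the wrapper refreshes tokens for you, is the safer choice.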

---

## Quick Start

```python
from autourgos_vertexai_modelkit import VertexTextModel, VertexMultiModel, VertexMaaSModel

# --- Text ---
llm = VertexTextModel(model="gemini-3.1-flash-lite-preview", api_key="YOUR_KEY")
print(llm.invoke("Explain Vertex AI in one sentence."))

# --- Multimodal (image, video, audio, PDF, code) ---
multi = VertexMultiModel(model="gemini-3-flash-preview", api_key="YOUR_KEY")
print(multi.invoke("Describe this image.", files="photo.jpg"))
print(multi.invoke("Summarize this video.", files="clip.mp4"))
print(multi.invoke("Transcribe this audio.", files="speech.mp3"))
print(multi.invoke("Extract key points.", files="report.pdf"))

# --- MaaS (third-party model) ---
maas = VertexMaaSModel(
    model="moonshotai/kimi-k2-thinking-maas",
    project="your-project-id",
    location="global",
)
print(maas.invoke("Hello, Kimi!"))
```

---

## VertexTextModel

```python
from autourgos_vertexai_modelkit import VertexTextModel
```

### Constructor Parameters

| Parameter | Type | Default | Description |
|---|---|---|---|
| `model` | `str` | **required** | Model ID (e.g. `"gemini-3.1-flash-lite-preview"`) |
| `api_key` | `str \| None` | `None` | API key. Falls back to env vars. |
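| `project` | `str \| None` | `None` | GCP project ID. Falls back to `GOOGLE_CLOUD_PROJECT`, then `GCP_PROJECT`. Ignored when an API key is used. |
| `location` | `str \| None` | `None` | GCP region. Falls back to `GOOGLE_CLOUD_LOCATION`, then `us-central1`. |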
| `system_instruction` | `str \| None` | `None` | System prompt / AI persona. |
| `prompt_template` | `str \| None` | `None` | Python format-string template. |
| `temperature` | `float \| None` | `None` | Sampling temperature (0.0–2.0). |
| `top_p` | `float \| None` | `None` | Nucleus sampling (0.0–1.0). |
| `top_k` | `int \| None` | `None` | Top-k sampling. |
| `max_tokens` | `int \| None` | `None` | Maximum output tokens. |
| `thinking_level` | `str \| None` | `None` | Extended reasoning: `"minimal"`, `"low"`, `"medium"`, `"high"`. |
| `response_mime_type` | `str \| None` | `None` | `"application/json"` for JSON mode. |
| `response_schema` | `Any` | `None` | JSON schema for structured responses. |
| `tools` | `list \| None` | `None` | Function calling declarations. |
| `safety_settings` | `list \| None` | `None` | Content filter configuration. |
| `stop_sequences` | `list[str] \| None` | `None` | Stop generation at these strings. |
| `presence_penalty` | `float \| None` | `None` | Penalize tokens that have already appeared, regardless of how often. |
| `frequency_penalty` | `float \| None` | `None` | Penalize tokens in proportion to how often they have appeared. |
| `seed` | `int \| None` | `None` | Deterministic output seed. |
| `structured_output` | `bool` | `False` | Return dict with token counts + latency. |
| `stream` | `bool` | `False` | Yield word-level chunks. |
| `max_retries` | `int` | `3` | Retry attempts on failure. |
| `timeout` | `float \| None` | `30.0` | Per-request timeout (seconds). |
| `backoff_factor` | `float` | `0.5` | Exponential backoff multiplier. |

### invoke()

```python
result = llm.invoke("What is Vertex AI?")
# Requires a prompt_template containing {topic}; see Prompt Templates below
result = llm.invoke(prompt_variables={"topic": "AI"})
```

### System Instructions

```python
llm = VertexTextModel(
    model="gemini-3.1-flash-lite-preview",
    api_key="YOUR_KEY",
    system_instruction="You are a concise technical writer. Always use bullet points.",
)
print(llm.invoke("Explain microservices architecture."))
```

### Multi-turn Chat

```python
llm = VertexTextModel(model="gemini-3-flash-preview", api_key="YOUR_KEY")

response = llm.chat([
    {"role": "user", "content": "Hi, I'm building a web app."},
    {"role": "model", "content": "Great! What tech stack are you using?"},
    {"role": "user", "content": "FastAPI + React. How should I structure the project?"},
])
print(response)
```
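As the example suggests, `chat()` takes the full message history on every call. The sketch below keeps that list up to date across turns, assuming `chat()` returns the reply as a plain string (as the `print` above implies). `ChatHistory` is a hypothetical helper, not part of the package; you would construct it as `ChatHistory(llm.chat)`.

```python
from collections.abc import Callable

Message = dict[str, str]


class ChatHistory:
    """Hypothetical helper: keeps a running role/content message list."""

    def __init__(self, chat_fn: Callable[[list[Message]], str]) -> None:
        # chat_fn stands in for llm.chat and is assumed to return the reply text.
        self.chat_fn = chat_fn
        self.messages: list[Message] = []

    def send(self, text: str) -> str:
        self.messages.append({"role": "user", "content": text})
        # Pass a copy so the callee cannot see later mutations of the history.
        reply = self.chat_fn(list(self.messages))
        # Record the reply with role "model" so the next call sees the full exchange.
        self.messages.append({"role": "model", "content": reply})
        return reply
```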

### Async Support

```python
import asyncio

llm = VertexTextModel(model="gemini-3.1-flash-lite-preview", api_key="YOUR_KEY")

async def main():
    # Single async call
    result = await llm.ainvoke("Explain quantum computing.")
    print(result)

    # Concurrent batch
    results = await llm.abatch_invoke([
        "What is Python?",
        "What is Rust?",
        "What is Go?",
    ])
    for r in results:
        print(r[:100])

asyncio.run(main())
```
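Whether `abatch_invoke()` caps concurrency is not documented here. If you need an explicit limit (for example, to stay under a quota), a sketch using `asyncio.Semaphore` around any async call such as `llm.ainvoke` looks like this; `gather_limited` is a hypothetical helper.

```python
import asyncio
from collections.abc import Awaitable, Callable


async def gather_limited(
    prompts: list[str],
    call: Callable[[str], Awaitable[str]],
    limit: int = 4,
) -> list[str]:
    """Run call(prompt) for every prompt, at most `limit` at a time."""
    semaphore = asyncio.Semaphore(limit)

    async def run_one(prompt: str) -> str:
        async with semaphore:  # wait for a free slot before calling the model
            return await call(prompt)

    # gather() returns results in the same order as the input prompts.
    return await asyncio.gather(*(run_one(p) for p in prompts))
```

For example, `results = await gather_limited(prompts, llm.ainvoke, limit=4)` inside an `async def`.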

### JSON Mode

```python
llm = VertexTextModel(
    model="gemini-3.1-flash-lite-preview",
    api_key="YOUR_KEY",
    response_mime_type="application/json",
)
result = llm.invoke("List 3 programming languages with their year of creation as JSON.")
print(result)  # Valid JSON string
```

With a schema:

```python
llm = VertexTextModel(
    model="gemini-3.1-flash-lite-preview",
    api_key="YOUR_KEY",
    response_mime_type="application/json",
    response_schema={
        "type": "array",
        "items": {
            "type": "object",
            "properties": {
                "name": {"type": "string"},
                "year": {"type": "integer"},
            },
        },
    },
)
```
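JSON mode returns the JSON as a string, so you still parse it yourself. Here is a sketch that parses the reply and checks it against the schema above using only the standard `json` module. It treats `name` and `year` as required, which is stricter than the schema (that schema lists no `required` fields); `parse_languages` is a hypothetical name.

```python
import json


def parse_languages(raw: str) -> list[dict]:
    """Parse a JSON-mode reply and check each item against the schema above."""
    data = json.loads(raw)  # raises json.JSONDecodeError (a ValueError) on bad JSON
    if not isinstance(data, list):
        raise ValueError("expected a JSON array")
    for item in data:
        if not isinstance(item, dict):
            raise ValueError(f"expected an object, got {item!r}")
        name, year = item.get("name"), item.get("year")
        # Both fields treated as required; bool is excluded because it subclasses int.
        if not isinstance(name, str) or not isinstance(year, int) or isinstance(year, bool):
            raise ValueError(f"item does not match schema: {item!r}")
    return data
```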

### Streaming

```python
llm = VertexTextModel(
    model="gemini-3.1-flash-lite-preview",
    api_key="YOUR_KEY",
    stream=True,
)

for chunk in llm.invoke("Write a haiku about the ocean."):
    print(chunk, end="", flush=True)
```
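Because the example prints chunks with `end=""`, each chunk presumably carries its own whitespace. Under that assumption, this sketch echoes the stream and also returns the assembled text, which is handy for logging or caching the full reply; `print_and_collect` is a hypothetical helper.

```python
from collections.abc import Iterable


def print_and_collect(chunks: Iterable[str]) -> str:
    """Echo streamed chunks as they arrive and return the assembled reply."""
    parts: list[str] = []
    for chunk in chunks:
        print(chunk, end="", flush=True)
        parts.append(chunk)
    print()  # finish the line once the stream ends
    # Assumes each chunk carries its own whitespace, so join without separators.
    return "".join(parts)
```

Usage: `text = print_and_collect(llm.invoke("Write a haiku about the ocean."))` with `stream=True`.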

### Structured Output

With `structured_output=True`, `invoke()` returns a dict with the response text, token counts, and latency:

```python
llm = VertexTextModel(
    model="gemini-3.1-flash-lite-preview",
    api_key="YOUR_KEY",
    structured_output=True,
)

result = llm.invoke("What is 2 + 2?")
# {
#   "model": "gemini-3.1-flash-lite-preview",
#   "response": "4",
#   "input_tokens": 12,
#   "output_tokens": 1,
#   "total_tokens": 13,
#   "latency_ms": 245.67
# }
```
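The dict shape above makes usage easy to aggregate across calls. This sketch sums tokens and averages latency over a list of such dicts, assuming `batch_invoke()` with `structured_output=True` returns one dict per prompt (not confirmed here); `summarize` is a hypothetical helper.

```python
from statistics import mean


def summarize(results: list[dict]) -> dict:
    """Aggregate token counts and latency over a non-empty list of responses."""
    return {
        "calls": len(results),
        "total_tokens": sum(r["total_tokens"] for r in results),
        # statistics.mean raises StatisticsError on an empty list.
        "mean_latency_ms": mean(r["latency_ms"] for r in results),
    }
```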

### Prompt Templates

```python
llm = VertexTextModel(
    model="gemini-3.1-flash-lite-preview",
    api_key="YOUR_KEY",
    prompt_template="Translate the following to {language}:\n\n{text}",
)

result = llm.invoke(prompt_variables={"language": "French", "text": "Hello world"})
```
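The constructor table describes `prompt_template` as a Python format string, so `str.format` semantics presumably apply. The sketch below renders a template the same way and fails early, naming any missing variables. Whether the wrapper performs this check itself is not documented, and `render_template` is hypothetical.

```python
from string import Formatter


def render_template(template: str, variables: dict[str, object]) -> str:
    """Fill a str.format-style template, reporting unfilled placeholders first."""
    # Formatter().parse yields (literal, field_name, spec, conversion) tuples;
    # field_name is None for plain literal text such as escaped {{ }}.
    fields = {name for _, name, _, _ in Formatter().parse(template) if name}
    missing = fields - set(variables)
    if missing:
        raise KeyError(f"missing prompt variables: {sorted(missing)}")
    return template.format(**variables)
```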

### Batch Processing

```python
llm = VertexTextModel(model="gemini-3.1-flash-lite-preview", api_key="YOUR_KEY")

results = llm.batch_invoke([
    "What is Python?",
    "What is JavaScript?",
    "What is Rust?",
])
```

---

## VertexMultiModel

```python
from autourgos_vertexai_modelkit import VertexMultiModel
```

A true multimodal model that processes **any** media type Gemini supports — not just images.

### Supported Media Types

| Category | Formats | Extensions |
|---|---|---|
| **Images** | PNG, JPEG, WebP, GIF, SVG | `.png`, `.jpg`, `.jpeg`, `.webp`, `.gif`, `.svg` |
| **Video** | MP4, MOV, AVI, WebM, MPEG, FLV, 3GPP, WMV | `.mp4`, `.mov`, `.avi`, `.webm`, `.mpeg`, `.flv`, `.3gp`, `.wmv` |
| **Audio** | MP3, WAV, AAC, FLAC, OGG | `.mp3`, `.wav`, `.aac`, `.flac`, `.ogg` |
| **Documents** | PDF | `.pdf` |
| **Text / Code** | Python, JS, TS, Java, Go, Rust, C++, HTML, CSS, CSV, SQL, YAML, JSON, XML, Markdown | `.py`, `.js`, `.ts`, `.java`, `.go`, `.rs`, `.cpp`, `.html`, `.csv`, `.sql`, `.yaml`, `.json`, `.md`, etc. |
| **Remote** | HTTPS URLs | Any URL starting with `https://` |

### Multi Constructor Parameters

All parameters from [VertexTextModel](#constructor-parameters) apply, plus:

| Parameter | Type | Default | Description |
|---|---|---|---|
| `media_resolution` | `str \| None` | `"media_resolution_high"` | Image/video decode quality. |

### Image Analysis

```python
multi = VertexMultiModel(model="gemini-3-flash-preview", api_key="YOUR_KEY")

result = multi.invoke("What objects are in this image?", files="photo.jpg")
print(result)
```

### Video Analysis

```python
multi = VertexMultiModel(model="gemini-3-flash-preview", api_key="YOUR_KEY")

result = multi.invoke("Summarize what happens in this video.", files="presentation.mp4")
print(result)
```

### Audio Transcription

```python
multi = VertexMultiModel(model="gemini-3-flash-preview", api_key="YOUR_KEY")

result = multi.invoke("Transcribe this audio recording.", files="meeting.mp3")
print(result)
```

### PDF / Document Analysis

```python
multi = VertexMultiModel(model="gemini-3-flash-preview", api_key="YOUR_KEY")

result = multi.invoke("Extract the key findings from this research paper.", files="paper.pdf")
print(result)
```

### Code File Analysis

```python
multi = VertexMultiModel(model="gemini-3-flash-preview", api_key="YOUR_KEY")

result = multi.invoke("Review this code for bugs and improvements.", files="main.py")
print(result)
```

### Mixed Media Input

Pass multiple files of **different types** in a single request:

```python
multi = VertexMultiModel(model="gemini-3-flash-preview", api_key="YOUR_KEY")

# Image + Audio
result = multi.invoke(
    "Does the audio narration match this image?",
    files=["photo.jpg", "narration.mp3"],
)

# PDF + Code
result = multi.invoke(
    "Does this implementation follow the spec?",
    files=["spec.pdf", "implementation.py"],
)

# Multiple images
result = multi.invoke(
    "Compare these two screenshots and list the differences.",
    files=["before.png", "after.png"],
)
```

### Remote URLs

Pass HTTPS URLs directly — the model fetches and processes them:

```python
multi = VertexMultiModel(model="gemini-3-flash-preview", api_key="YOUR_KEY")

result = multi.invoke(
    "What is in this image?",
    files="https://storage.googleapis.com/example/photo.jpg",
)
```

### File Input Formats

The `files` parameter accepts any of these formats (and mixed lists of them):

| Format | Example |
|---|---|
| File path string | `"photo.jpg"`, `"report.pdf"`, `"clip.mp4"` |
| `pathlib.Path` | `Path("audio.mp3")` |
| HTTPS URL | `"https://example.com/file.pdf"` |
| Raw `bytes` | `Path("img.jpg").read_bytes()` |
| `(bytes, mime)` tuple | `(data, "audio/mp3")` |
| `dict` | `{"data": b"...", "mime_type": "video/mp4"}` |
| List of any above | `["photo.jpg", "clip.mp4", "notes.pdf"]` |
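To make the accepted shapes concrete, here is a hypothetical sketch of how such inputs could be normalized into `(kind, payload, mime_type)` triples before a request is built. It is not the library's code: it guesses MIME types from file names with the standard `mimetypes` module and leaves raw `bytes` with `None`, whereas the real implementation presumably detects them another way.

```python
from __future__ import annotations

import mimetypes
from pathlib import Path
from typing import Any


def normalize_file(item: Any) -> tuple[str, bytes | str, str | None]:
    """Hypothetical: turn one `files` entry into (kind, payload, mime_type)."""
    if isinstance(item, dict):
        return "bytes", item["data"], item["mime_type"]
    if isinstance(item, tuple):
        data, mime = item
        return "bytes", data, mime
    if isinstance(item, bytes):
        # Raw bytes carry no file name, so no MIME type can be guessed here.
        return "bytes", item, None
    text = str(item)  # covers str paths and pathlib.Path objects
    if text.startswith("https://"):
        # Remote files are passed by URL rather than downloaded.
        return "url", text, mimetypes.guess_type(text)[0]
    path = Path(text)
    return "bytes", path.read_bytes(), mimetypes.guess_type(path.name)[0]


def normalize_files(files: Any) -> list[tuple[str, bytes | str, str | None]]:
    """Accept a single entry or a list of entries, as in the table above."""
    items = files if isinstance(files, list) else [files]
    return [normalize_file(f) for f in items]
```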

### Async MultiModel

```python
import asyncio
result = asyncio.run(multi.ainvoke("Describe this.", files="photo.jpg"))
```

### Multi Structured Output

```python
multi = VertexMultiModel(
    model="gemini-3-flash-preview",
    api_key="YOUR_KEY",
    structured_output=True,
)
result = multi.invoke("Summarize this video.", files="clip.mp4")
# {
#   "model": "gemini-3-flash-preview",
#   "response": "The video shows...",
#   "input_tokens": 1258,
#   "output_tokens": 42,
#   "total_tokens": 1300,
#   "latency_ms": 3420.15,
#   "input_file_count": 1,
#   "input_file_types": ["video"]
# }
```

---

## VertexMaaSModel

```python
from autourgos_vertexai_modelkit import VertexMaaSModel
```

Calls third-party models (Moonshot Kimi, Meta Llama, Mistral, etc.) via Vertex AI's OpenAI-compatible MaaS endpoint.

### MaaS Constructor Parameters

| Parameter | Type | Default | Description |
|---|---|---|---|
| `model` | `str` | **required** | Full model identifier from Model Garden. |
| `project` | `str \| None` | `None` | GCP project ID. |
| `location` | `str` | `"global"` | Vertex AI region. |
| `access_token` | `str \| None` | `None` | Static Bearer token. |
| `system_instruction` | `str \| None` | `None` | System message. |
| `prompt_template` | `str \| None` | `None` | Format-string template. |
| `temperature` | `float` | `0.6` | Sampling temperature. |
| `top_p` | `float` | `0.95` | Nucleus sampling. |
| `max_tokens` | `int` | `8192` | Maximum output tokens. |
| `structured_output` | `bool` | `False` | Return dict with token counts. |
| `stream` | `bool` | `False` | Stream SSE chunks. |
| `timeout` | `float` | `60.0` | HTTP timeout (seconds). |
| `max_retries` | `int` | `3` | Retry attempts with backoff. |
| `backoff_factor` | `float` | `0.5` | Exponential backoff multiplier. |

### Async MaaS

```python
import asyncio

maas = VertexMaaSModel(
    model="moonshotai/kimi-k2-thinking-maas",
    project="my-project",
)

result = asyncio.run(maas.ainvoke("Explain transformers."))
```

---

## Error Handling

Exceptions carry the **underlying error details** rather than a generic message:

```python
from autourgos_vertexai_modelkit import (
    VertexTextModel,
    VertexTextModelAPIError,
    VertexTextModelImportError,
    VertexTextModelConfigError,
)

try:
    llm = VertexTextModel(model="gemini-3-flash-preview", api_key="YOUR_KEY")
    response = llm.invoke("Hello!")
except VertexTextModelImportError as e:
    print(f"SDK problem: {e}")
    # -> "Failed to import google-genai SDK. Install it with: pip install google-genai"
except VertexTextModelAPIError as e:
    print(f"API failure: {e}")
    # -> "Vertex AI text request failed after 3 attempts. Last error: InvalidArgument: ..."
except VertexTextModelConfigError as e:
    print(f"Config error: {e}")
    # -> "structured_output=True is incompatible with stream=True."
```

### Exception Hierarchies

```
VertexTextModelError
├── VertexTextModelImportError     # SDK not installed / client init failed
├── VertexTextModelAPIError        # API failure after all retries
├── VertexTextModelResponseError   # Response present but no text extractable
└── VertexTextModelConfigError     # Incompatible configuration options

VertexMultiModelError
├── VertexMultiModelImportError    # SDK not installed / client init failed
├── VertexMultiModelAPIError       # API failure after all retries
├── VertexMultiModelResponseError  # Response present but no text extractable
└── VertexMultiModelConfigError    # Incompatible configuration options

VertexMaaSModelError
├── VertexMaaSModelAuthError       # GCP token acquisition failed
├── VertexMaaSModelAPIError        # HTTP error or bad response
└── VertexMaaSModelConfigError     # Incompatible configuration
```

---

## Logging

```python
from autourgos_vertexai_modelkit import set_log_level

# See request/response details
set_log_level("DEBUG")

# Only warnings and errors (default)
set_log_level("WARNING")

# Completely silent
set_log_level("CRITICAL")
```

Log output format:

```
2026-05-09 10:30:45 | autourgos_vertexai_modelkit | DEBUG    | MultiModel invoke attempt 1/3 model=gemini-3-flash-preview
2026-05-09 10:30:48 | autourgos_vertexai_modelkit | INFO     | MultiModel invoke completed in 2845.2ms, model=gemini-3-flash-preview, files=2 (audio, image)
```
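The sample output shows records under the logger name `autourgos_vertexai_modelkit`. Assuming the package logs through the standard `logging` module under that name (an inference from the output, not a documented API), you can attach your own handler, for example to keep debug logs in a file with the same line format. `route_modelkit_logs` is a hypothetical helper.

```python
import logging

# Logger name inferred from the sample output above; an assumption, not a documented API.
logger = logging.getLogger("autourgos_vertexai_modelkit")


def route_modelkit_logs(path: str, level: int = logging.DEBUG) -> logging.Handler:
    """Append the package's log records to a file, matching the format above."""
    handler = logging.FileHandler(path)
    handler.setFormatter(
        logging.Formatter(
            "%(asctime)s | %(name)s | %(levelname)-8s | %(message)s",
            datefmt="%Y-%m-%d %H:%M:%S",
        )
    )
    logger.addHandler(handler)
    logger.setLevel(level)
    return handler
```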

---

## Environment Variables Reference

| Variable | Used By | Description |
|---|---|---|
| `GOOGLE_CLOUD_API_KEY` | Text, Multi | API key (highest priority). |
| `GOOGLE_API_KEY` | Text, Multi | API key fallback. |
| `GEMINI_API_KEY` | Text, Multi | API key fallback (lowest priority). |
| `GOOGLE_CLOUD_PROJECT` | Text, Multi, MaaS | GCP project ID. |
| `GCP_PROJECT` | Text, Multi, MaaS | GCP project ID fallback. |
| `GOOGLE_CLOUD_LOCATION` | Text, Multi | GCP region (default: `us-central1`). |
| `GRPC_VERBOSITY` | All | Set to `ERROR` automatically. |
| `GLOG_minloglevel` | All | Set to `2` automatically. |
| `TF_CPP_MIN_LOG_LEVEL` | All | Set to `3` automatically. |
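The API-key and project rows imply a priority order. This sketch resolves settings in that order, assuming an explicit argument always wins and that empty variables count as unset (both assumptions about the wrapper's behavior); `resolve_settings` is hypothetical and reads `os.environ` unless given another mapping.

```python
from __future__ import annotations

import os
from collections.abc import Mapping

API_KEY_VARS = ("GOOGLE_CLOUD_API_KEY", "GOOGLE_API_KEY", "GEMINI_API_KEY")
PROJECT_VARS = ("GOOGLE_CLOUD_PROJECT", "GCP_PROJECT")


def first_set(names: tuple[str, ...], env: Mapping[str, str]) -> str | None:
    """Return the first non-empty value among `names`, in priority order."""
    for name in names:
        value = env.get(name)
        if value:
            return value
    return None


def resolve_settings(api_key: str | None = None,
                     env: Mapping[str, str] = os.environ) -> dict:
    """Explicit arguments win; otherwise fall back through the variables above."""
    return {
        "api_key": api_key or first_set(API_KEY_VARS, env),
        "project": first_set(PROJECT_VARS, env),
        "location": env.get("GOOGLE_CLOUD_LOCATION") or "us-central1",
    }
```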

---

## Advanced Configuration

### Thinking (Extended Reasoning)

```python
llm = VertexTextModel(
    model="gemini-3.1-pro-preview",
    api_key="YOUR_KEY",
    thinking_level="high",   # "minimal" | "low" | "medium" | "high"
    structured_output=True,  # thinking text included in response
)
result = llm.invoke("Prove that root 2 is irrational.")
print(result["thinking"])    # model's internal reasoning
print(result["response"])    # final answer
```

### Context Manager

```python
with VertexTextModel(model="gemini-3.1-flash-lite-preview", api_key="YOUR_KEY") as llm:
    result = llm.invoke("Hello!")
    print(result)
```

### Retry and Backoff

```python
llm = VertexTextModel(
    model="gemini-3.1-flash-lite-preview",
    api_key="YOUR_KEY",
    max_retries=5,
    backoff_factor=1.0,   # waits 1s, 2s, 4s, 8s between attempts
    timeout=60.0,
)
```
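The comment above implies a delay that doubles from `backoff_factor`, with `max_retries` counting total attempts (the default of 3 matches the "failed after 3 attempts" message under Error Handling). Here is a sketch of that policy under those assumptions. The library's actual rules, such as which errors it retries or whether it adds jitter, are not documented here, and both helpers are hypothetical.

```python
import time
from collections.abc import Callable
from typing import TypeVar

T = TypeVar("T")


def backoff_delays(max_retries: int, backoff_factor: float) -> list[float]:
    """Waits between attempts: backoff_factor * 2**n for each gap."""
    return [backoff_factor * 2**n for n in range(max_retries - 1)]


def call_with_retries(fn: Callable[[], T], max_retries: int = 3,
                      backoff_factor: float = 0.5,
                      sleep: Callable[[float], object] = time.sleep) -> T:
    """Try fn() up to max_retries times, sleeping per backoff_delays in between."""
    delays = backoff_delays(max_retries, backoff_factor)
    for attempt in range(max_retries):
        try:
            return fn()
        except Exception:
            if attempt == max_retries - 1:
                raise  # out of attempts: surface the last error
            sleep(delays[attempt])
    raise ValueError("max_retries must be at least 1")
```

With the settings above, `backoff_delays(5, 1.0)` gives the 1s, 2s, 4s, 8s schedule from the comment.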

### Safety Settings

```python
llm = VertexTextModel(
    model="gemini-3.1-flash-lite-preview",
    api_key="YOUR_KEY",
    safety_settings=[
        {"category": "HARM_CATEGORY_HARASSMENT", "threshold": "BLOCK_ONLY_HIGH"},
        {"category": "HARM_CATEGORY_HATE_SPEECH", "threshold": "BLOCK_ONLY_HIGH"},
    ],
)
```

### MaaS with System Instruction

```python
maas = VertexMaaSModel(
    model="moonshotai/kimi-k2-thinking-maas",
    project="my-project",
    system_instruction="You are a helpful coding assistant. Always provide examples.",
    temperature=0.7,
    max_tokens=4096,
    max_retries=3,
)
answer = maas.invoke("How do I use async/await in Python?")
```
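`VertexMaaSModel` talks to an OpenAI-compatible endpoint, so the request body is presumably a chat completions payload. The sketch below builds such a body for the call above, assuming `system_instruction` becomes a leading `system` message; defaults mirror the MaaS constructor table. It only builds the dict and does not send it, and `build_chat_payload` is hypothetical.

```python
from __future__ import annotations


def build_chat_payload(model: str, prompt: str,
                       system_instruction: str | None = None,
                       temperature: float = 0.6, top_p: float = 0.95,
                       max_tokens: int = 8192, stream: bool = False) -> dict:
    """Hypothetical: build an OpenAI-style chat completions request body."""
    messages = []
    if system_instruction:
        # Assumed mapping: the system prompt goes first as a "system" message.
        messages.append({"role": "system", "content": system_instruction})
    messages.append({"role": "user", "content": prompt})
    return {
        "model": model,
        "messages": messages,
        "temperature": temperature,
        "top_p": top_p,
        "max_tokens": max_tokens,
        "stream": stream,
    }
```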

---

## Developers

- **devxjitin**

---

## License

MIT © DevxJitin
