Metadata-Version: 2.3
Name: analyticsflow
Version: 0.1.0
Summary: An agentic tool to fetch data from databases with natural language and get it as pandas DataFrame, Polars, or any other format you like. Extensible and component-based.
Keywords: agent,llm,sql,analytics,dataframe,pandas,polars,database,natural language
Author: Pedro José Mora Gallegos
Author-email: Pedro José Mora Gallegos <developmentbypedrojose@gmail.com>
License: AnalyticsFlow Non-Commercial License (ANCL)
         
         Copyright (c) 2025 [Your Name]
         
         Permission is hereby granted to any individual or organization to use, copy,
         modify, and distribute this software and its documentation for personal,
         research, or educational purposes, provided that:
         
         1. The software may not be used, in whole or in part, for any commercial
            purpose. “Commercial purpose” includes any use intended for or directed
            toward commercial advantage, monetary compensation, or operation within
            a for-profit environment.
         
         2. Any redistributed copies or modifications must include this license in full.
         
         3. No warranty is provided. The software is offered “as is,” without any
            guarantee of fitness for any purpose. The authors are not liable for any
            damages arising from its use.
         
         For commercial licensing inquiries, contact the copyright holder.
Classifier: Programming Language :: Python :: 3
Classifier: License :: Other/Proprietary License
Classifier: Operating System :: OS Independent
Requires-Python: >=3.12
Project-URL: Documentation, https://github.com/pedrojose/analyticsflow#readme
Project-URL: Homepage, https://github.com/pedrojose/analyticsflow
Project-URL: Repository, https://github.com/pedrojose/analyticsflow
Description-Content-Type: text/markdown

# AnalyticsFlow

An agentic tool to fetch data from databases with natural language and get it as pandas DataFrame, Polars, or any other format you like. The complete architecture is extensible and component-based.

## Features

- **Natural Language Queries**: Write queries in plain English, get structured data back
- **Multiple Database Support**: SQLite, PostgreSQL, MySQL, and more
- **Flexible Output Formats**: pandas DataFrame, Polars, Arrow, JSON, CSV
- **LLM Provider Agnostic**: OpenAI, Anthropic, or bring your own
- **Extensible Architecture**: Component-based design for easy customization
- **Type Safe**: Full type hints for better IDE support

## Installation

```bash
pip install analyticsflow
```

### Optional Dependencies

```bash
# For PostgreSQL support
pip install analyticsflow[postgres]

# For MySQL support
pip install analyticsflow[mysql]

# For Polars output
pip install analyticsflow[polars]

# Install everything
pip install analyticsflow[all]
```

## Quick Start

```python
from analyticsflow import SQLAgent, OpenAILLM, SQLiteDatabase

# Initialize the agent
sqlagent = SQLAgent(
    llm=OpenAILLM(
        api_key="YOUR_API_KEY",
        model="gpt-4"
    ),
    database=SQLiteDatabase(
        path="/data/analytics.db"
    )
)

# Query with natural language
df = sqlagent.query(
    "Show me the top 10 users by revenue in 2023",
    type="pandas"
)

print(df)
```

## Examples

### Complex Analytics Query

```python
df = sqlagent.query(
    "Generate a full monthly analytics report for 2023 including "
    "active users, returning users, new signups, churn rate, ARPU, "
    "total revenue, average session duration, and conversion rate. "
    "Group by month and sort chronologically.",
    type="pandas"
)

print(df)
```

**Output:**

```bash
| month    | active_users | returning_users | new_signups | churn_rate | arpu | revenue | avg_session_min | conversion_rate |
|----------|--------------|-----------------|-------------|------------|------|---------|------------------|------------------|
| 2023-01  | 84210        | 71620           | 12590       | 0.037      | 4.98 | 4.19e6  | 14.2             | 0.092           |
| 2023-02  | 85940        | 72830           | 13110       | 0.034      | 5.03 | 4.32e6  | 14.5             | 0.095           |
| 2023-03  | 90320        | 75520           | 14800       | 0.032      | 5.07 | 4.57e6  | 15.1             | 0.098           |
| 2023-04  | 91850        | 76580           | 15270       | 0.030      | 5.10 | 4.69e6  | 15.0             | 0.101           |
| 2023-05  | 94710        | 78670           | 16040       | 0.029      | 5.12 | 4.85e6  | 15.4             | 0.103           |
| 2023-06  | 96280        | 79390           | 16890       | 0.028      | 5.15 | 4.96e6  | 15.8             | 0.106           |
| 2023-07  | 97530        | 80020           | 17510       | 0.028      | 5.17 | 5.05e6  | 16.1             | 0.108           |
| 2023-08  | 98840        | 80840           | 18000       | 0.027      | 5.20 | 5.14e6  | 16.3             | 0.109           |
| 2023-09  | 100120       | 82470           | 17650       | 0.029      | 5.18 | 5.19e6  | 16.0             | 0.107           |
| 2023-10  | 102550       | 84120           | 18320       | 0.027      | 5.21 | 5.34e6  | 16.5             | 0.111           |
| 2023-11  | 103180       | 85340           | 17840       | 0.028      | 5.24 | 5.41e6  | 16.4             | 0.112           |
| 2023-12  | 105300       | 86300           | 19010       | 0.026      | 5.30 | 5.58e6  | 16.9             | 0.114           |
```

### Different Output Formats

```python
# Pandas DataFrame (default)
df = sqlagent.query("Get user counts by country", type="pandas")

# Polars DataFrame
df = sqlagent.query("Get user counts by country", type="polars")

# JSON
data = sqlagent.query("Get user counts by country", type="json")

# Arrow Table
table = sqlagent.query("Get user counts by country", type="arrow")

# CSV string
csv = sqlagent.query("Get user counts by country", type="csv")
```

### Using Different Databases

#### PostgreSQL

```python
from analyticsflow import SQLAgent, OpenAILLM, PostgreSQLDatabase

sqlagent = SQLAgent(
    llm=OpenAILLM(api_key="YOUR_API_KEY"),
    database=PostgreSQLDatabase(
        host="localhost",
        database="analytics",
        user="postgres",
        password="your_password",
        port=5432
    )
)
```

#### MySQL

```python
from analyticsflow import SQLAgent, OpenAILLM, MySQLDatabase

sqlagent = SQLAgent(
    llm=OpenAILLM(api_key="YOUR_API_KEY"),
    database=MySQLDatabase(
        host="localhost",
        database="analytics",
        user="root",
        password="your_password",
        port=3306
    )
)
```

### Using Different LLM Providers

#### Anthropic Claude

```python
from analyticsflow import SQLAgent, AnthropicLLM, SQLiteDatabase

sqlagent = SQLAgent(
    llm=AnthropicLLM(
        api_key="YOUR_ANTHROPIC_KEY",
        model="claude-3-opus-20240229"
    ),
    database=SQLiteDatabase(path="/data/analytics.db")
)
```

#### Custom LLM Provider

```python
from analyticsflow import BaseLLM

class CustomLLM(BaseLLM):
    def generate_sql(self, query: str, schema: str, context=None) -> str:
        # Your custom implementation
        pass
    
    def validate_sql(self, sql: str, schema: str) -> dict:
        # Your custom validation
        pass

sqlagent = SQLAgent(
    llm=CustomLLM(),
    database=SQLiteDatabase(path="/data/analytics.db")
)
```

## Architecture

AnalyticsFlow uses a component-based architecture with three main interfaces:

### 1. LLM Interface (`BaseLLM`)

Handles natural language to SQL conversion:

```python
class BaseLLM(ABC):
    @abstractmethod
    def generate_sql(self, query: str, schema: str, context=None) -> str:
        """Generate SQL from natural language"""
        pass
    
    @abstractmethod
    def validate_sql(self, sql: str, schema: str) -> dict:
        """Validate generated SQL"""
        pass
```

**Implementations:**

- `OpenAILLM` - OpenAI GPT models
- `AnthropicLLM` - Anthropic Claude models
- Custom implementations welcome!

### 2. Database Interface (`BaseDatabase`)

Handles database connections and queries:

```python
class BaseDatabase(ABC):
    @abstractmethod
    def connect(self) -> None:
        """Establish connection"""
        pass
    
    @abstractmethod
    def execute(self, query: str) -> Tuple[List[Tuple], List[str]]:
        """Execute query and return results"""
        pass
    
    @abstractmethod
    def get_schema(self) -> str:
        """Get database schema"""
        pass
```

**Implementations:**

- `SQLiteDatabase` - SQLite databases
- `PostgreSQLDatabase` - PostgreSQL databases
- `MySQLDatabase` - MySQL databases
- Custom implementations welcome!

### 3. Formatter Interface (`BaseFormatter`)

Handles output format conversion:

```python
class BaseFormatter(ABC):
    @abstractmethod
    def format(self, data: List[Tuple], columns: List[str]) -> Any:
        """Format query results"""
        pass
```

**Implementations:**

- `PandasFormatter` - pandas DataFrames
- `PolarsFormatter` - Polars DataFrames
- `JSONFormatter` - JSON objects
- `ArrowFormatter` - Apache Arrow tables
- `CSVFormatter` - CSV strings
- Custom implementations welcome!

### 4. SQLAgent (Orchestrator)

The main agent that coordinates all components:

```python
class SQLAgent:
    def __init__(self, llm: BaseLLM, database: BaseDatabase):
        self.llm = llm
        self.database = database
    
    def query(self, query: str, type: str = "pandas") -> Any:
        """Execute natural language query"""
        pass
```

## Advanced Usage

### Custom Context

```python
context = """
The 'users' table tracks user signups and activity.
The 'sessions' table records user sessions with duration.
The 'transactions' table contains revenue data.
"""

df = sqlagent.query(
    "What's the average session duration for paying users?",
    context=context,
    type="pandas"
)
```

### Query Validation

```python
# Enable strict validation
sqlagent = SQLAgent(
    llm=OpenAILLM(api_key="YOUR_API_KEY"),
    database=SQLiteDatabase(path="/data/analytics.db"),
    validate_queries=True,  # Default: True
    max_retries=3  # Retry failed queries
)
```

### Caching

```python
# Enable query caching
sqlagent = SQLAgent(
    llm=OpenAILLM(api_key="YOUR_API_KEY"),
    database=SQLiteDatabase(path="/data/analytics.db"),
    cache_queries=True,  # Cache SQL generation
    cache_ttl=3600  # Cache for 1 hour
)
```

## License

See [LICENSE](LICENSE) file for license details.
