Metadata-Version: 2.3
Name: semanticslayer
Version: 0.1.0rc1
Summary: A lightweight, agent-first semantic layer for AI agents
License: MIT
Keywords: semantic-layer,sql,ai,mcp,data
Author: MotleyAI
Requires-Python: >=3.11,<4.0
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Programming Language :: Python :: 3.14
Classifier: Topic :: Database
Provides-Extra: all
Provides-Extra: api
Provides-Extra: client
Provides-Extra: docs
Provides-Extra: mcp
Requires-Dist: fastapi (>=0.100) ; extra == "api" or extra == "all"
Requires-Dist: httpx (>=0.24) ; extra == "client" or extra == "all"
Requires-Dist: mcp (>=1.0) ; extra == "mcp" or extra == "all"
Requires-Dist: mkdocs-material (>=9.0) ; extra == "docs"
Requires-Dist: pandas (>=2.0) ; extra == "client" or extra == "all"
Requires-Dist: pydantic (>=2.0)
Requires-Dist: pyyaml (>=6.0)
Requires-Dist: sqlalchemy (>=2.0)
Requires-Dist: sqlglot (>=20.0)
Requires-Dist: uvicorn (>=0.20) ; extra == "api" or extra == "all"
Description-Content-Type: text/markdown

# SLayer — a semantic layer by Motley

[![PyPI](https://img.shields.io/pypi/v/semanticslayer?label=PyPI)](https://pypi.org/project/semanticslayer/)
[![Python](https://img.shields.io/pypi/pyversions/semanticslayer)](https://pypi.org/project/semanticslayer/)
[![Docs](https://img.shields.io/badge/docs-readthedocs-blue)](https://motley-semanticlayer.readthedocs.io/)
[![License](https://img.shields.io/github/license/MotleyAI/slayer)](LICENSE)

A lightweight, open-source semantic layer for AI agents and humans.

## Quick Start

```bash
# Install
pip install "semanticslayer[all]"

# Start the HTTP server
slayer serve --models-dir ./my_models

# Or set up stdio MCP for an agent like Claude Code
# (the agent spawns slayer as a subprocess — you don't run this manually)
claude mcp add slayer -- slayer mcp --models-dir ./my_models
```

## What is SLayer?

SLayer sits between your data and your apps or AI agents. Instead of writing raw SQL, agents describe what data they want — measures, dimensions, filters — and SLayer handles the rest.

**Key features:**
- **Agent-first design** — MCP, Python SDK, and REST API interfaces
- **Datasource-agnostic** — Postgres, MySQL, BigQuery, Snowflake, and more via sqlglot
- **Expressions and transforms** — Derived metrics (arithmetic on measures) and window functions (cumsum, lag, lead, change, rank) built into queries
- **Auto-ingestion with rollup joins** — Connect to a DB, introspect schema, generate denormalized models with FK-based LEFT JOINs automatically
- **Incremental model editing** — Add/remove measures and dimensions without replacing the full model
- **Lightweight** — Minimal dependencies, easy to set up and extend


## Interfaces

### REST API

```bash
# Query
curl -X POST http://localhost:5143/query \
  -H "Content-Type: application/json" \
  -d '{"model": "orders", "measures": [{"name": "count"}], "dimensions": [{"name": "status"}]}'

# List models (returns name + description)
curl http://localhost:5143/models

# Get a single datasource (credentials masked)
curl http://localhost:5143/datasources/my_postgres

# Health check
curl http://localhost:5143/health
```

### MCP Server

SLayer supports two MCP transports:

**Stdio** — the agent spawns SLayer as a subprocess (for Claude Code, Cursor, etc.). You do not run `slayer mcp` manually; instead, register it with your agent:

```bash
# Register with Claude Code (the agent will spawn the process itself)
claude mcp add slayer -- slayer mcp --models-dir ./my_models

# If slayer is in a virtualenv, use the full path to the executable:
#   poetry env info -p   # prints e.g. /home/user/.venvs/slayer-xyz
#   claude mcp add slayer -- /home/user/.venvs/slayer-xyz/bin/slayer mcp --models-dir /path/to/my_models
```

**SSE (Server-Sent Events)** — MCP over HTTP, served alongside the REST API on `/mcp`. You run `slayer serve` yourself, then point the agent at the URL:

```bash
# 1. Start the server (REST API + MCP SSE)
slayer serve --models-dir ./my_models
# REST API at http://localhost:5143/query, /models, etc.
# MCP SSE at http://localhost:5143/mcp/sse

# 2. In a separate terminal, register the remote MCP endpoint with your agent
claude mcp add slayer-remote --transport sse --url http://localhost:5143/mcp/sse
```

Both transports expose the same tools — no duplication.

MCP tools:

| Tool | Description |
|------|-------------|
| `models_summary` | List all models with their schemas (dimensions, measures) |
| `inspect_model` | Detailed model info with sample data |
| `query` | Execute semantic queries |
| `create_model` | Create a new model from table/SQL |
| `add_measures` | Add measures to an existing model |
| `add_dimensions` | Add dimensions to an existing model |
| `delete_measures_dimensions` | Remove measures or dimensions by name |
| `update_model` | Update model metadata (description, data source) |
| `delete_model` | Delete a model |
| `create_datasource` | Configure a database connection (with connection test and auto-ingestion; set `auto_ingest=false` to skip) |
| `list_datasources` | List configured datasources |
| `describe_datasource` | Show datasource details, test connection, list schemas |
| `list_tables` | Explore tables in a database |
| `delete_datasource` | Remove a datasource |
| `ingest_datasource_models` | Auto-generate models from DB schema |

Typical agent workflow:
1. `create_datasource` (auto-ingests models by default) → `models_summary` → `inspect_model` → `query`
2. Or with `auto_ingest=false`: `create_datasource` → `describe_datasource` → `ingest_datasource_models` → `models_summary` → `query`

### Python Client

```python
from slayer.client.slayer_client import SlayerClient
from slayer.core.query import SlayerQuery, ColumnRef

# Remote mode (connects to running server)
client = SlayerClient(url="http://localhost:5143")

# Or local mode (no server needed)
from slayer.storage.yaml_storage import YAMLStorage
client = SlayerClient(storage=YAMLStorage(base_dir="./my_models"))

# Query data
query = SlayerQuery(
    model="orders",
    measures=[ColumnRef(name="count"), ColumnRef(name="revenue_sum")],
    dimensions=[ColumnRef(name="status")],
    limit=10,
)
df = client.query_df(query)
print(df)
```

### CLI Query

```bash
# Run a query directly from the terminal
slayer query '{"model": "orders", "measures": [{"name": "count"}], "dimensions": [{"name": "status"}]}'

# Or from a file
slayer query @query.json --format json
```


## Models

Models are defined as YAML files. Add an optional `description` to help users and agents understand complex models:

```yaml
name: orders
sql_table: public.orders
data_source: my_postgres
description: "Core orders table with revenue metrics"

dimensions:
  - name: id
    sql: id
    type: number
    primary_key: true
  - name: status
    sql: status
    type: string
  - name: created_at
    sql: created_at
    type: time

measures:
  - name: count
    type: count
  - name: revenue_sum
    sql: amount
    type: sum
  - name: revenue_avg
    sql: amount
    type: avg
```


## Expressions and Transforms

Derive new columns from existing measures — no model changes needed.

**Expressions** perform arithmetic on measures, generating a CTE with an outer SELECT:

```json
{
  "model": "orders",
  "measures": [{"name": "count"}, {"name": "revenue_sum"}],
  "dimensions": [{"name": "status"}],
  "expressions": [{"name": "aov", "sql": "revenue_sum / count"}]
}
```

**Transforms** apply window functions over a time dimension:

```json
{
  "model": "orders",
  "measures": [{"name": "revenue_sum"}],
  "time_dimensions": [{"dimension": {"name": "created_at"}, "granularity": "month"}],
  "transforms": [
    {"name": "rev_cumsum", "transform": "cumsum", "measure": "revenue_sum"},
    {"name": "rev_change_pct", "transform": "change_pct", "measure": "revenue_sum"}
  ]
}
```

Available transform types: `cumsum`, `lag`, `lead`, `change`, `change_pct`, `rank`.

Transforms that need ordering over time (`cumsum`, `lag`, `lead`, `change`, `change_pct`) resolve the time dimension in this order: the query's `time_dimensions` (if it contains exactly one) → the model's `default_time_dimension` → error.
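
The resolution order can be sketched as follows (illustrative pseudologic, not the actual implementation; the function name is an assumption):

```python
def resolve_time_dimension(query_time_dimensions, default_time_dimension):
    """Pick the time dimension that ordered transforms sort over.

    Resolution order: the query's time_dimensions (if exactly one),
    then the model's default_time_dimension, otherwise an error.
    """
    if len(query_time_dimensions) == 1:
        return query_time_dimensions[0]          # exactly one in the query: use it
    if default_time_dimension is not None:
        return default_time_dimension            # fall back to the model default
    raise ValueError("cumsum/lag/lead/change require a time dimension")

print(resolve_time_dimension(["created_at"], None))  # created_at
print(resolve_time_dimension([], "created_at"))      # created_at
```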


## Filter Templates

Use `variables` to parameterize queries with `{var_name}` placeholders in filter values:

```json
{
  "model": "orders",
  "measures": [{"name": "count"}, {"name": "revenue_sum"}],
  "filters": [
    {"member": {"name": "status"}, "operator": "equals", "values": ["{status}"]},
    {"member": {"name": "customer_id"}, "operator": "equals", "values": ["{cid}"]}
  ],
  "variables": {"status": "completed", "cid": 42}
}
```

If the entire value is a single `{var}`, the variable's native type is preserved (int, date, etc.). Mixed strings like `"prefix_{var}"` do string substitution. Missing variables are left as-is.
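
The substitution rules can be sketched roughly like this (an illustrative sketch, not the library's actual code):

```python
import re

def substitute(value, variables):
    """Apply {var} templating to one filter value."""
    if not isinstance(value, str):
        return value
    # Entire value is a single placeholder: keep the variable's native type.
    whole = re.fullmatch(r"\{(\w+)\}", value)
    if whole:
        return variables.get(whole.group(1), value)  # missing var: left as-is
    # Mixed string: substitute known variables as text, leave unknown ones alone.
    return re.sub(
        r"\{(\w+)\}",
        lambda g: str(variables[g.group(1)]) if g.group(1) in variables else g.group(0),
        value,
    )

print(substitute("{cid}", {"cid": 42}))         # 42  (int preserved)
print(substitute("prefix_{cid}", {"cid": 42}))  # prefix_42
print(substitute("{missing}", {}))              # {missing}
```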


## Auto-Ingestion

Connect to a database and generate models automatically. SLayer introspects the schema, detects foreign key relationships, and creates denormalized models with rollup-style LEFT JOINs.

For example, given tables `orders → customers → regions` (via FKs), the `orders` model will automatically include:
- Rolled-up dimensions: `customers__name`, `regions__name`, etc.
- Count-distinct measures: `customers__count`, `regions__count`
- A SQL query with transitive LEFT JOINs baked in
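
For the chain above, the shape of the generated query can be sketched like this (a sketch of the join shape only, not SLayer's actual output; it assumes each related table's primary key is `id`):

```python
def rollup_sql(base: str, joins: list[tuple[str, str]]) -> str:
    """Build a denormalizing query from a FK chain.

    joins: (related_table, fk_column_on_previous_table) pairs, in chain order.
    """
    sql = f"SELECT * FROM {base}"
    prev = base
    for table, fk in joins:
        sql += f" LEFT JOIN {table} ON {prev}.{fk} = {table}.id"
        prev = table
    return sql

# orders → customers → regions, assuming customer_id / region_id FK columns
print(rollup_sql("orders", [("customers", "customer_id"), ("regions", "region_id")]))
```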

```bash
# Via CLI
slayer ingest --datasource my_postgres --schema public

# Via API
curl -X POST http://localhost:5143/ingest \
  -d '{"datasource": "my_postgres", "schema_name": "public"}'
```

Via MCP, agents can do this conversationally:
1. `create_datasource(name="mydb", type="postgres", host="localhost", database="app", username="user", password="pass")`
2. `ingest_datasource_models(datasource_name="mydb", schema_name="public")`
3. `models_summary()` → `inspect_model(model_name="orders")` → `query(...)`


## Configuration

Datasources are configured as individual YAML files in the `datasources/` directory:

```yaml
# datasources/my_postgres.yaml
name: my_postgres
type: postgres
host: ${DB_HOST}
port: 5432
database: ${DB_NAME}
username: ${DB_USER}
password: ${DB_PASSWORD}
```

Environment variable references (`${VAR}`) are resolved at read time. Both `username` and `user` field names are accepted.
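
The `${VAR}` resolution could be sketched as follows (illustrative, not the actual implementation; leaving unresolved references intact is an assumption of this sketch):

```python
import os
import re

def resolve_env(value: str) -> str:
    """Replace ${VAR} references with values from the environment."""
    return re.sub(
        r"\$\{(\w+)\}",
        lambda m: os.environ.get(m.group(1), m.group(0)),  # keep ${VAR} if unset
        value,
    )

os.environ["DB_HOST"] = "db.internal"
print(resolve_env("host: ${DB_HOST}"))  # host: db.internal
```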


## Storage Backends

SLayer ships with two storage backends:

- **YAMLStorage** (default) — models and datasources as YAML files on disk. Great for version control.
- **SQLiteStorage** — everything in a single SQLite file. Good for embedded use or when you don't want to manage files.

```python
from slayer.storage.yaml_storage import YAMLStorage
from slayer.storage.sqlite_storage import SQLiteStorage

# YAML files in a directory
storage = YAMLStorage(base_dir="./slayer_data")

# Single SQLite file
storage = SQLiteStorage(db_path="./slayer.db")
```

Both implement the `StorageBackend` protocol, so you can swap them freely or write your own:

```python
from slayer.storage.base import StorageBackend

class MyCustomStorage(StorageBackend):
    def save_model(self, model): ...
    def get_model(self, name): ...
    def list_models(self): ...
    def delete_model(self, name): ...
    # same for datasources
```

Pass any backend to `create_app()`, `create_mcp_server()`, or `SlayerClient(storage=...)`.


## Examples

The `examples/` directory contains runnable examples that also serve as integration tests:

| Example | Description | How to run |
|---------|-------------|------------|
| [embedded](examples/embedded/) | SQLite, no server needed | `python examples/embedded/run.py` |
| [postgres](examples/postgres/) | Docker Compose with Postgres + REST API | `cd examples/postgres && docker compose up -d` |

Each example includes a `verify.py` script that runs assertions against the seeded data.

Both examples use a shared seed dataset (`examples/seed.py`) with a small e-commerce schema: regions, customers, products, and orders (68 orders across 12 months). The embedded example includes expression and transform demo queries.


## Claude Code Skills

SLayer includes Claude Code skills in `.claude/skills/` to help Claude understand the codebase:

- **slayer-overview** — architecture, package structure, MCP tools list
- **slayer-query** — how to construct queries with measures, dimensions, filters, time dimensions
- **slayer-models** — model definitions, datasource configs, auto-ingestion, incremental editing


## Development

```bash
# Install with all extras
poetry install -E all

# Run tests
poetry run pytest

# Lint
poetry run ruff check slayer/ tests/

# Start dev server
poetry run slayer serve
```


## Why SLayer?

### SLayer is embeddable

Besides running as a standalone service, SLayer is a plain Python module: it can be imported and used directly in Python applications, with no network communication involved.

For example, using it in a multi-tenant application could be as simple as:

```python
client = SlayerClient(storage=MyStorage(tenant_id=...))
```

No network or auth setup needed at all.

### SLayer models can be updated on the fly

In SLayer, models are treated as a dynamic part of the process, rather than something that is preconfigured and frozen.
Models can be created or edited at any time and immediately queried.
We expose tools for working with models via the API, CLI, and MCP.

### SLayer is flexible

SLayer is agnostic to the datasource type: it can be a SQL database, a BI tool, or even a REST API.
Adapters for popular databases are included, and adding a new one takes just two straightforward methods: a query-to-datasource-input translator and a result-to-dataframe parser. Please open a pull request if you write one!


## Known limitations

SLayer currently has no caching or pre-aggregation engine.
If you need to process lots of requests to large databases at sub-second latency, consider specialized tools like CubeJS, dbt, AtScale, Looker, etc.


## License

MIT — see [LICENSE](LICENSE).

