Metadata-Version: 2.4
Name: symbolic-lob
Version: 0.2.1
Summary: Symbolic dynamics framework for limit order book analysis
Project-URL: Homepage, https://github.com/xiongyu23/lob-symbolic-dynamics
Project-URL: Repository, https://github.com/xiongyu23/lob-symbolic-dynamics
Author-email: xiongyu <me@xiongyu.org>
License: MIT
License-File: LICENSE
Keywords: information-theory,limit-order-book,market-microstructure,symbolic-dynamics,thermodynamics
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Programming Language :: Python :: 3.14
Classifier: Topic :: Office/Business :: Financial
Classifier: Topic :: Scientific/Engineering :: Information Analysis
Requires-Python: >=3.10
Requires-Dist: numpy>=1.24
Provides-Extra: dev
Requires-Dist: mypy>=1.0; extra == 'dev'
Requires-Dist: pytest>=7.0; extra == 'dev'
Requires-Dist: ruff>=0.1.0; extra == 'dev'
Provides-Extra: live
Requires-Dist: psutil>=5.9; extra == 'live'
Requires-Dist: pyarrow>=12.0; extra == 'live'
Requires-Dist: websockets>=11.0; extra == 'live'
Description-Content-Type: text/markdown

<p align="center">
  <strong>symbolic-lob</strong>
</p>

<p align="center">
  Symbolic Dynamics Framework for Limit Order Book Analysis
</p>

<p align="center">
  <a href="https://pypi.org/project/symbolic-lob/"><img src="https://img.shields.io/pypi/v/symbolic-lob.svg" alt="PyPI"></a>
  <a href="https://pypi.org/project/symbolic-lob/"><img src="https://img.shields.io/pypi/pyversions/symbolic-lob.svg" alt="Python Versions"></a>
  <a href="https://lob-dashboard.onrender.com/"><img src="https://img.shields.io/badge/Live%20Dashboard-lob--dashboard.onrender.com-blue" alt="Live Dashboard"></a>
  <img src="https://img.shields.io/pypi/l/symbolic-lob.svg" alt="License: MIT">
</p>

---

## Theoretical Foundation

This package is the computational companion to the paper:

> **Symbolic Dynamics of the Limit Order Book: An Algebraic Foundation**
>
> Available on [SSRN](https://papers.ssrn.com/) · Author: [xiongyu](https://www.xiongyu.org/)

The paper develops an algebraic framework in which every limit-order-book event is mapped to a symbol drawn from a six-letter alphabet. The resulting symbolic word is then analysed through reduction operators, information-theoretic functionals, and thermodynamic analogues, giving a unified, quantitative description of market microstructure.

## Symbol Alphabet

| Symbol | Code | Event | Side | Sign |
|:------:|:----:|-------|:----:|:----:|
| α | `ALPHA` | New limit sell order placed on the ask side | ask | +1 |
| β | `BETA` | New limit buy order placed on the bid side | bid | +1 |
| μ_b | `MU_BUY` | Market buy order (aggressor takes the ask) | ask | −1 |
| μ_s | `MU_SELL` | Market sell order (aggressor hits the bid) | bid | −1 |
| γ_b | `GAMMA_BID` | Cancellation or partial reduction of a bid | bid | −1 |
| γ_a | `GAMMA_ASK` | Cancellation or partial reduction of an ask | ask | −1 |
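
The table can be mirrored in a few lines of plain Python. The sketch below is illustrative only: `SketchSymbolType` is a hypothetical stand-in, not the package's `SymbolType`, and it simply encodes the Side and Sign columns as attributes.

```python
from enum import Enum


class SketchSymbolType(Enum):
    """Illustrative stand-in for the six-letter alphabet (not the package's enum)."""

    # value = (greek, category, side, sign), copied from the table above
    ALPHA = ("α", "limit", "ask", +1)
    BETA = ("β", "limit", "bid", +1)
    MU_BUY = ("μ_b", "market", "ask", -1)
    MU_SELL = ("μ_s", "market", "bid", -1)
    GAMMA_BID = ("γ_b", "cancel", "bid", -1)
    GAMMA_ASK = ("γ_a", "cancel", "ask", -1)

    def __init__(self, greek, category, side, sign):
        self.greek = greek
        self.category = category
        self.side = side
        self.sign = sign


events = [SketchSymbolType.ALPHA, SketchSymbolType.MU_BUY, SketchSymbolType.BETA]
net_sign = sum(e.sign for e in events)  # +1 - 1 + 1
```

Summing `.sign` over a sequence gives the kind of sign walk that several of the metrics below operate on.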

## Core Concepts

### Θ Reduction

The Θ operator cancels matched (limit, cancel) pairs, that is, a limit order and a cancellation on the same side (α with γ_a, β with γ_b) with identical price and quantity. It partitions the symbolic word into:

- **Alive sub-word** — surviving (un-cancelled) limit orders
- **Dead sub-word** — cancelled limit orders

This decomposition is the foundation for all downstream metrics.
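
A minimal sketch of such a reduction, under stated assumptions: each cancel is paired with the earliest still-open limit order that shares its side, price, and quantity, and market orders and unmatched cancels are ignored. `Event` and `theta_sketch` are hypothetical names for illustration; the package's `SymbolicAnalyzer.theta` may resolve matches differently.

```python
from collections import defaultdict, deque
from typing import NamedTuple


class Event(NamedTuple):
    kind: str        # "limit", "cancel", or "market"
    side: str        # "bid" or "ask"
    price: float
    quantity: float


def theta_sketch(events):
    """Split limit orders into (alive, dead) by pairing each cancel with the
    earliest still-open limit order sharing its (side, price, quantity)."""
    open_limits = defaultdict(deque)   # key -> indices of unmatched limit orders
    dead_idx = set()
    for i, ev in enumerate(events):
        key = (ev.side, ev.price, ev.quantity)
        if ev.kind == "limit":
            open_limits[key].append(i)
        elif ev.kind == "cancel" and open_limits[key]:
            dead_idx.add(open_limits[key].popleft())
        # market orders and unmatched cancels are ignored in this sketch
    alive = [e for i, e in enumerate(events) if e.kind == "limit" and i not in dead_idx]
    dead = [events[i] for i in sorted(dead_idx)]
    return alive, dead


events = [
    Event("limit", "ask", 100.0, 1.0),
    Event("limit", "bid", 99.5, 0.5),
    Event("cancel", "ask", 100.0, 1.0),   # cancels the first ask
]
alive, dead = theta_sketch(events)
```

Here the ask at 100.0 ends up in the dead sub-word and the bid at 99.5 stays alive.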

### Thermodynamic Analogues

| Metric | Formula | Interpretation |
|--------|---------|----------------|
| Temperature *T* | λ_μ / (λ_α + λ_β) | Ratio of market-order intensity to limit-order intensity |
| Internal Energy *U* | Σ \|p − p_mid\| · q | Distance-weighted outstanding liquidity |
| Free Energy *F* | U − T · H(Θ) | Balances energy cost against informational disorder |
| Entropy Production Σ | H(dead) − ΔH(Θ) | Irreversibility of the cancellation process |
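
As a worked example of the first three formulas, the sketch below evaluates T, U, and F on a toy book. It makes two assumptions that may not match the package: event counts over one shared window stand in for the intensities λ, and H(Θ) is taken as the base-2 Shannon entropy of the alive symbol types. All function names are hypothetical.

```python
import math
from collections import Counter


def shannon_entropy(labels):
    """Base-2 Shannon entropy of the empirical distribution of `labels`."""
    counts = Counter(labels)
    n = sum(counts.values())
    return -sum(c / n * math.log2(c / n) for c in counts.values()) if n else 0.0


def temperature(n_market, n_alpha, n_beta):
    """T = λ_μ / (λ_α + λ_β), with counts over one shared window as rates."""
    limits = n_alpha + n_beta
    return n_market / limits if limits else float("inf")  # assumption: inf when no limits


def internal_energy(alive_orders, mid_price):
    """U = Σ |p − p_mid| · q over alive (price, quantity) pairs."""
    return sum(abs(p - mid_price) * q for p, q in alive_orders)


alive_orders = [(100.0, 1.0), (99.5, 0.5)]             # (price, quantity)
alive_types = ["ALPHA", "BETA"]

T = temperature(n_market=1, n_alpha=2, n_beta=2)       # 1 / 4 = 0.25
U = internal_energy(alive_orders, mid_price=99.75)     # 0.25*1.0 + 0.25*0.5 = 0.375
H_theta = shannon_entropy(alive_types)                 # two equally likely types: 1 bit
F = U - T * H_theta                                    # 0.375 - 0.25*1 = 0.125
```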

### Information-Theoretic Measures

| Metric | Description |
|--------|-------------|
| H(W) | Shannon entropy of the full symbolic word |
| H(Θ) | Reduced entropy of the alive sub-word |
| ε = H(W) − H(Θ) − H(dead) | Information-theoretic residual |
| τ_mix | Markov-chain mixing time from the transition matrix |
| Spectral Gap | 1 − λ₂, the gap between the leading eigenvalue (1) and the second eigenvalue of the transition matrix |
| Fano Factor | Variance-to-mean ratio of inter-impact arrival times (burstiness) |
| Transfer Entropy TE(src → tgt) | Directed information flow between symbol types |
| Conditional Entropy H(X\_{t+lag} \| X_t) | Predictability of the next symbol given the current |
| Mutual Information I(X_t; X\_{t+lag}) | Shared information between symbols at different lags |
| KL Divergence D_KL(P ‖ Q) | Asymmetric divergence between the symbol distributions of two sequences |
| Clustering Index | Self-transition probability relative to uniform baseline |
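
The conditional-entropy and mutual-information rows are linked by the identities H(Y | X) = H(X, Y) − H(X) and I(X; Y) = H(Y) − H(Y | X). A minimal plug-in estimator over a sequence of symbol-type labels might look like the following; the function names are hypothetical and the package's estimators may add smoothing or bias corrections.

```python
import math
from collections import Counter


def entropy_bits(counter):
    """Base-2 Shannon entropy of a Counter of observations."""
    n = sum(counter.values())
    return -sum(c / n * math.log2(c / n) for c in counter.values())


def lagged_information(seq, lag=1):
    """Return (H(X_{t+lag} | X_t), I(X_t; X_{t+lag})) in bits; requires lag >= 1."""
    pairs = list(zip(seq[:-lag], seq[lag:]))
    joint = Counter(pairs)
    past = Counter(x for x, _ in pairs)
    future = Counter(y for _, y in pairs)
    h_cond = entropy_bits(joint) - entropy_bits(past)   # H(X, Y) − H(X)
    mi = entropy_bits(future) - h_cond                  # H(Y) − H(Y | X)
    return h_cond, mi


# Strictly alternating sequence: the next symbol is fully determined by the current one.
seq = ["ALPHA", "GAMMA_ASK"] * 4
h_cond, mi = lagged_information(seq, lag=1)
```

For this sequence the conditional entropy vanishes and the mutual information equals the marginal entropy of the lagged symbol.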

### Advanced Metrics (v0.2.0)

| Metric | Description |
|--------|-------------|
| Kolmogorov Complexity | Lempel-Ziv compression ratio as algorithmic complexity estimate |
| Hurst Exponent | R/S analysis on sign walk — detects long-range dependence |
| Autocorrelation | Autocorrelation of the liquidity sign sequence at multiple lags |
| Power-Law Exponent | Hill estimator on market-order size tail distribution |
| Order Flow Imbalance (OFI) | Rolling (limit − cancel − market) / window |
| Variance Ratio Test | VR(lag) detects momentum (>1) or mean reversion (<1) |
| Volume-Weighted Spread | Average distance of limit orders from mid price, weighted by quantity |
| Realized Volatility | Annualized volatility from log-return standard deviation |
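
To make the variance-ratio row concrete, here is a simplified estimator on a ±1 sign sequence. It uses overlapping windows and population variances with no small-sample correction, so it is a sketch of the idea rather than the statistic computed by `var_ratio_test`; `variance_ratio` is a hypothetical name.

```python
from statistics import pvariance


def variance_ratio(signs, lag):
    """VR(lag) = Var(lag-step sums) / (lag · Var(1-step values)), overlapping windows."""
    if lag < 1 or len(signs) <= lag:
        raise ValueError("need lag >= 1 and more observations than lag")
    one_step = pvariance(signs)
    if one_step == 0:
        raise ValueError("sign sequence is constant; VR is undefined")
    sums = [sum(signs[i:i + lag]) for i in range(len(signs) - lag + 1)]
    return pvariance(sums) / (lag * one_step)


alternating = [+1, -1] * 8            # every step is immediately reversed
persistent = [+1, +1, -1, -1] * 4     # signs arrive in runs of two

vr_alt = variance_ratio(alternating, lag=2)   # below 1: mean reversion
vr_run = variance_ratio(persistent, lag=2)    # above 1: momentum
```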

### Market Microstructure

| Metric | Description |
|--------|-------------|
| Mid Price | (best bid + best ask) / 2 |
| Weighted Mid | Volume-weighted mid price across bid/ask depth |
| Spread | best ask − best bid |
| Bid / Ask Depth | Number of surviving bid / ask levels after Θ reduction |
| Imbalance Ratio | (bid depth − ask depth) / (bid depth + ask depth) ∈ [−1, 1] |
| λ_μ | Market-order arrival rate (events / second) |
| Spread-Depth Ratio | Spread normalized by total depth |
| VWAP | Volume-weighted average price per side |

## Live Dashboard

A real-time dashboard powered by `symbolic-lob` is publicly available:

🌐 **[lob-dashboard.onrender.com](https://lob-dashboard.onrender.com/)**

Source code: [github.com/xiongyu23/lob-dashboard](https://github.com/xiongyu23/lob-dashboard)

No registration is required. The dashboard streams live Binance data, computes all metrics in real time, and displays:

```
┌─────────────────────────────────────────────────────────┐
│  Symbol            btcusdt                              │
│  Events Collected  54,707                               │
│  Buffer Size       50,000                               │
│  Uptime            6m 31s                               │
│  Memory            98.56 MB                             │
│  DB Records        10,952                               │
├─────────────────────────────────────────────────────────┤
│  Thermodynamic Metrics                                  │
│  Temperature (T)   0.263923                             │
│  Internal Energy   22,932,413.95                        │
│  Free Energy       22,932,410.66                        │
│  Entropy Prod.     0.000000                             │
├─────────────────────────────────────────────────────────┤
│  Information Metrics                                    │
│  Word Size |W|     50,000                               │
│  Entropy H(W)      15.04                                │
│  Reduced H(Θ)      12.50                                │
│  Epsilon           -11.386                              │
│  Mixing Time       32.03                                │
│  Fano Factor       330.90                               │
├─────────────────────────────────────────────────────────┤
│  Market Microstructure                                  │
│  Mid Price         79,751.32                            │
│  Spread            0.0100                               │
│  Bid Depth         3,384                                │
│  Ask Depth         2,578                                │
│  Imbalance Ratio   0.1352                               │
│  Lambda Mu         196.03                               │
├─────────────────────────────────────────────────────────┤
│  Real-time Charts                                       │
│  • Temperature & Free Energy                            │
│  • Entropy H(W) & H(Θ)                                 │
│  • Mid Price                                            │
│  • Order Book Depth                                     │
├─────────────────────────────────────────────────────────┤
│  Recent Events                                          │
│  3:13:30 PM  μ_s   79751.31   0.0018                   │
│  3:13:30 PM  α     80055.33   0.0810                   │
│  3:13:30 PM  γ_a   80031.85   0.0810                   │
│  3:13:30 PM  γ_a   79913.58   0.1499                   │
│  3:13:30 PM  α     79797.21   1.0753                   │
│  ...                                                    │
└─────────────────────────────────────────────────────────┘
```

## Installation

```bash
pip install symbolic-lob
```

For live streaming (Binance WebSocket, Parquet storage, memory monitoring):

```bash
pip install "symbolic-lob[live]"
```

For development:

```bash
pip install "symbolic-lob[dev]"
```

## Quick Start

### One-Shot Report (v0.2.0)

The easiest way to compute all metrics at once:

```python
from symbolic_lob import Symbol, SymbolType, MetricsReport

symbols = [
    Symbol(SymbolType.ALPHA, 100.0, 1.0, 0.0),
    Symbol(SymbolType.BETA, 99.5, 0.5, 0.0),
    Symbol(SymbolType.MU_BUY, 99.75, 0.3, 0.0),
]

report = MetricsReport.compute(symbols, mid_price=99.75)

print(report.thermodynamic)   # T, U, F, Sigma
print(report.information)     # H(W), H(Θ), ε, τ_mix, TE, ...
print(report.microstructure)  # mid, spread, depth, imbalance, ...
print(report.advanced)        # Kolmogorov, Hurst, MI, OFI, VR, ...

# Flat dict for CSV / DataFrame export
flat = report.flat_dict()
```

### Word Container (v0.2.0)

```python
from symbolic_lob import Symbol, SymbolType, Word

word = Word()
word.append(Symbol(SymbolType.ALPHA, 100.0, 1.0, 1.0))
word.append(Symbol(SymbolType.BETA, 99.0, 0.5, 2.0))

print(len(word))                  # 2
print(word.type_counts())         # {ALPHA: 1, BETA: 1, ...}
print(word.type_frequencies())    # {ALPHA: 0.5, BETA: 0.5, ...}
print(word.alive_count)           # 2
print(word.market_count)          # 0

ts, prices, qtys = word.to_arrays()   # numpy arrays
recent = word.recent(10)              # last 10 symbols
sliced = word.slice_by_time(0.5, 1.5) # filter by timestamp
```

### Binance Factory Functions (v0.2.0)

```python
from symbolic_lob.symbols import symbol_from_binance_trade, symbol_from_binance_depth

# From a Binance trade payload
trade_data = {"E": 1700000000000, "p": "50000.00", "q": "0.5", "m": False}
sym = symbol_from_binance_trade(trade_data)

# From a depth update delta
symbols = symbol_from_binance_depth("bid", 100.0, old_qty=0.0, new_qty=1.0, timestamp=1.0)
```
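
For intuition about what these factories decide, the sketch below shows one plausible mapping. In Binance trade payloads, `m` is true when the buyer is the maker, so the aggressor is the seller. The depth rule (a quantity increase is a new limit order, a decrease is a cancellation) is an assumption; the package's actual logic, for example how it treats decreases caused by trades, may differ. Both function names are hypothetical.

```python
def classify_trade(payload):
    """Map a Binance trade payload to "MU_BUY" or "MU_SELL".

    `m` is True when the buyer is the maker, i.e. the seller was the aggressor.
    """
    return "MU_SELL" if payload["m"] else "MU_BUY"


def classify_depth_delta(side, old_qty, new_qty):
    """Map a quantity change at one price level to a symbol name, or None."""
    if new_qty > old_qty:
        return "BETA" if side == "bid" else "ALPHA"            # liquidity added
    if new_qty < old_qty:
        return "GAMMA_BID" if side == "bid" else "GAMMA_ASK"   # liquidity removed
    return None                                                # no change


trade_kind = classify_trade({"E": 1700000000000, "p": "50000.00", "q": "0.5", "m": False})
depth_kind = classify_depth_delta("bid", old_qty=0.0, new_qty=1.0)
```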

### Manual Analysis

```python
from symbolic_lob import Symbol, SymbolType, SymbolicAnalyzer

symbols = [
    Symbol(SymbolType.ALPHA, 100.0, 1.0, 0.0),
    Symbol(SymbolType.BETA, 99.5, 0.5, 0.0),
    Symbol(SymbolType.GAMMA_ASK, 100.0, 1.0, 0.0),
]

alive, dead = SymbolicAnalyzer.theta(symbols)

h_w = SymbolicAnalyzer.entropy(symbols)
h_alive = SymbolicAnalyzer.entropy(alive)

t = SymbolicAnalyzer.compute_temperature(symbols)
u = SymbolicAnalyzer.compute_internal_energy(alive, mid_price=99.75)
f = SymbolicAnalyzer.compute_free_energy(alive, mid_price=99.75, temperature=t)
sigma = SymbolicAnalyzer.compute_entropy_production(alive, dead, prev_h_alive=None)

tm = SymbolicAnalyzer.transition_matrix(symbols)
tau, gap = SymbolicAnalyzer.mixing_time(tm)
fano = SymbolicAnalyzer.impact_burst_fano(symbols)

te = SymbolicAnalyzer.transfer_entropy(symbols, SymbolType.GAMMA_BID, SymbolType.ALPHA)
clust = SymbolicAnalyzer.clustering_indices(tm)

kc = SymbolicAnalyzer.kolmogorov_complexity_estimate(symbols)
hurst = SymbolicAnalyzer.hurst_exponent(symbols)
mi = SymbolicAnalyzer.mutual_information(symbols)
kl = SymbolicAnalyzer.kullback_leibler(symbols, alive)  # any two symbol sequences; here full word vs. alive sub-word
vr = SymbolicAnalyzer.var_ratio_test(symbols)
ofi = SymbolicAnalyzer.order_flow_imbalance(symbols, window=100)
```

### Live Pipeline

```bash
symbolic-lob --symbol btcusdt --cold-interval 30 --log-level INFO
```

This connects to the Binance WebSocket, streams depth and trade data, emits symbolic events, and periodically writes the full metric suite (including all v0.2.0 advanced metrics) to `metrics.csv`.

```python
from symbolic_lob import LiveConfig
from symbolic_lob.pipeline import run_live_pipeline

config = LiveConfig(symbol="btcusdt", cold_interval=30.0)
run_live_pipeline(config)
```

## API Reference

### `SymbolType`

Enum with six members: `ALPHA`, `BETA`, `MU_BUY`, `MU_SELL`, `GAMMA_BID`, `GAMMA_ASK`.

Properties: `.greek` (e.g. `"α"`), `.category` (`"limit"` / `"market"` / `"cancel"`), `.side` (`"bid"` / `"ask"`), `.sign` (`+1` / `-1`).

### `Symbol(sym_type, price=None, quantity=0.0, timestamp=0.0)`

Frozen dataclass representing a single symbolic event.

### `Word(symbols=None)`

Lightweight container for a symbol sequence with convenience methods:

| Method / Property | Returns |
|-------------------|---------|
| `append(symbol)` | Add a symbol |
| `type_counts()` | `{SymbolType: int}` |
| `type_frequencies()` | `{SymbolType: float}` |
| `slice_by_time(start, end)` | `Word` filtered by timestamp |
| `to_arrays()` | `(timestamps, prices, quantities)` as numpy arrays |
| `recent(n)` | Last n symbols as `Word` |
| `alive_count` | Count of α + β |
| `cancel_count` | Count of γ_b + γ_a |
| `market_count` | Count of μ_b + μ_s |

### `symbol_from_binance_trade(data) → Symbol`

Factory from a Binance trade JSON payload.

### `symbol_from_binance_depth(side, price, old_qty, new_qty, timestamp) → list[Symbol]`

Factory from a Binance depth update delta.

### `SymbolicAnalyzer`

Static analysis toolkit — all methods are stateless:

| Method | Returns |
|--------|---------|
| `theta(symbols)` | `(alive, dead)` — Θ reduction |
| `entropy(symbols)` | Shannon entropy H (base-2) |
| `transition_matrix(symbols)` | `{from: {to: count}}` |
| `mixing_time(tm)` | `(tau, spectral_gap)` |
| `impact_burst_fano(symbols)` | Fano factor of inter-impact times |
| `transfer_entropy(symbols, src, tgt, lag=1)` | TE(src → tgt) in bits |
| `transfer_entropy_lags(symbols, src, tgt, max_lag=5)` | `{lag: TE}` |
| `clustering_indices(tm)` | `{symbol: index}` |
| `impact_quantiles(symbols)` | `{q25, q50, q75, q90, q99}` |
| `compute_temperature(symbols)` | T = λ_μ / (λ_α + λ_β) |
| `compute_internal_energy(alive, mid_price)` | U = Σ \|p − p_mid\| · q |
| `compute_free_energy(alive, mid_price, temperature)` | F = U − T · H(Θ) |
| `compute_entropy_production(alive, dead, prev_h_alive)` | Σ = H(dead) − ΔH(Θ) |
| `realized_volatility(mid_prices)` | Annualized RV from log returns |
| `imbalance_ratio(bid_depth, ask_depth)` | ∈ [−1, 1] |
| `lambda_mu(symbols, time_window_sec)` | Market-order arrival rate |
| `mean_row_entropy(tm)` | Average row entropy of transition matrix |
| `kolmogorov_complexity_estimate(symbols)` | LZ compression ratio |
| `hurst_exponent(symbols)` | R/S analysis on sign walk |
| `conditional_entropy(symbols, lag=1)` | H(X\_{t+lag} \| X_t) |
| `mutual_information(symbols, lag=1)` | I(X_t; X\_{t+lag}) |
| `kullback_leibler(symbols_p, symbols_q)` | D_KL(P ‖ Q) with Laplace smoothing |
| `autocorrelation(symbols, max_lag=10)` | `{lag: ρ}` |
| `power_law_exponent(symbols)` | Hill estimator on market-order sizes |
| `order_flow_imbalance(symbols, window=100)` | Rolling OFI as numpy array |
| `var_ratio_test(symbols, lag=10)` | VR(lag) statistic |
| `spread_depth_ratio(spread, bid_depth, ask_depth)` | spread / total_depth |
| `volume_weighted_spread(symbols, mid_price)` | Volume-weighted avg distance from mid |

### `MetricsReport`

One-shot computation of the full metric suite:

```python
report = MetricsReport.compute(symbols, order_book, mid_price, prev_h_alive, time_window_sec)
report.to_dict()     # nested dict by section
report.flat_dict()   # flat dict for CSV / DataFrame
```
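
A flat dict maps directly onto one CSV row. The sketch below uses a stand-in dict with hypothetical keys in place of a real `report.flat_dict()` result and writes to an in-memory buffer, so it runs without the package installed.

```python
import csv
import io

# Stand-in for report.flat_dict(); the keys here are hypothetical.
flat = {"temperature": 0.25, "internal_energy": 0.375, "free_energy": 0.125}

buffer = io.StringIO()   # swap for open("metrics_sketch.csv", "a", newline="") to write a file
writer = csv.DictWriter(buffer, fieldnames=list(flat))
writer.writeheader()
writer.writerow(flat)
csv_text = buffer.getvalue()
```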

### `RealtimeOrderBook`

Maintains a live order book and emits `Symbol` events on each Binance `depthUpdate`.

| Method / Property | Returns |
|-------------------|---------|
| `apply_update(data)` | `list[Symbol]` |
| `apply_updates(data_list)` | `list[Symbol]` |
| `snapshot()` | Dict with bids, asks, depths, VWAP, imbalance, etc. |
| `depth_at_price(price, side)` | Quantity at a given price level |
| `vwap(side, levels=None)` | Volume-weighted average price |
| `reset()` | Clear the book |
| `mid_price` | Best mid price |
| `spread` | Bid-ask spread |
| `best_bid` | Highest bid price |
| `best_ask` | Lowest ask price |

### `LiveConfig`

Configuration dataclass for the live pipeline (symbol, WebSocket URL, window size, intervals, etc.).

### `EventStore`

Buffered Parquet-backed event storage. Flushes every 5 000 events; rotates files at 2 000 000 events.
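
The flush-and-rotate policy can be pictured with a toy buffer. This is a sketch of the policy only: `BufferedStoreSketch` is a hypothetical class that counts flushes and rotations instead of writing Parquet, and it says nothing about how `EventStore` is actually implemented.

```python
class BufferedStoreSketch:
    """Toy buffer: flush every `flush_every` events, rotate every `rotate_every`."""

    def __init__(self, flush_every=5_000, rotate_every=2_000_000):
        self.flush_every = flush_every
        self.rotate_every = rotate_every
        self.buffer = []
        self.flushes = 0
        self.file_index = 0
        self.events_in_file = 0

    def add(self, event):
        self.buffer.append(event)
        if len(self.buffer) >= self.flush_every:
            self.flush()

    def flush(self):
        # A real store would write self.buffer to the current Parquet file here.
        self.events_in_file += len(self.buffer)
        self.buffer.clear()
        self.flushes += 1
        if self.events_in_file >= self.rotate_every:
            self.file_index += 1      # start a new file
            self.events_in_file = 0


store = BufferedStoreSketch(flush_every=3, rotate_every=6)   # small numbers for the demo
for i in range(7):
    store.add(i)
```

After seven events the demo store has flushed twice, rotated once, and holds one event in memory.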

### `ColdPathAnalyzer`

Background thread that periodically snapshots the symbol window, applies Θ reduction, and writes the full metric suite (including all advanced metrics) to CSV.

## Architecture

```
Binance WebSocket
  │
  ├─ depth stream ──→ RealtimeOrderBook ──→ Symbol events (α, β, γ_b, γ_a)
  │
  └─ trade stream ──────────────────────→ Symbol events (μ_b, μ_s)
                                              │
                                              ▼
                                     ┌───────────────┐
                                     │  Word Buffer   │  (sliding window, default 50 000)
                                     └───────┬───────┘
                                             │
                              ┌──────────────┼──────────────┐
                              ▼              ▼              ▼
                        EventStore    ColdPathAnalyzer   Heartbeat
                        (Parquet)     (Θ → metrics.csv)  (memory monitor)
```
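
The sliding-window behaviour of the word buffer can be illustrated with a bounded `collections.deque`. Whether the package uses a deque internally is an assumption; the sketch only shows the eviction semantics, with a window of 5 instead of the default 50 000.

```python
from collections import deque

window = deque(maxlen=5)       # bounded buffer; the oldest entries are evicted first
for event_id in range(8):
    window.append(event_id)

snapshot = list(window)        # copy handed to analysis, e.g. on each cold-path tick
```

Taking a list copy before analysis keeps the snapshot stable while new events continue to arrive.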

## Changelog

### v0.2.0

- **Advanced metrics**: Kolmogorov complexity, Hurst exponent, conditional entropy, mutual information, KL divergence, autocorrelation, power-law exponent, order flow imbalance, variance ratio test, volume-weighted spread, realized volatility
- **`MetricsReport`**: one-shot computation of the full metric suite with `to_dict()` and `flat_dict()` export
- **`Word` container**: lightweight wrapper with `type_counts()`, `type_frequencies()`, `slice_by_time()`, `to_arrays()`, `recent()`, and count properties
- **`SymbolType` extensions**: `.side` (bid/ask) and `.sign` (+1/−1) properties
- **Binance factory functions**: `symbol_from_binance_trade()` and `symbol_from_binance_depth()`
- **`RealtimeOrderBook` upgrades**: `snapshot()`, `depth_at_price()`, `vwap()`, `best_bid`, `best_ask`, `reset()`, `apply_updates()`
- **`ColdPathAnalyzer`**: now computes all advanced metrics and writes them to CSV
- **CI/CD**: PyPI publish workflow via GitHub Actions with trusted publishing

### v0.1.0

- Initial release: Θ reduction, entropy, transition matrix, mixing time, transfer entropy, thermodynamic analogues, live Binance pipeline

## Links

| Resource | URL |
|----------|-----|
| SSRN Paper | *Symbolic Dynamics of the Limit Order Book: An Algebraic Foundation* |
| PyPI | [pypi.org/project/symbolic-lob](https://pypi.org/project/symbolic-lob/) |
| Dashboard Source | [github.com/xiongyu23/lob-dashboard](https://github.com/xiongyu23/lob-dashboard) |
| Live Dashboard | [lob-dashboard.onrender.com](https://lob-dashboard.onrender.com/) |
| Author | [xiongyu.org](https://www.xiongyu.org/) |

## License

[MIT](LICENSE)
