Metadata-Version: 2.4
Name: simgen-vla
Version: 1.5.2
Summary: SimGen VLA: Exact GPU Computation. 212-bit precision that BEATS FP64.
Home-page: https://simgen.dev
Author: Clouthier Simulation Labs
Author-email: Clouthier Simulation Labs <support@simgen.dev>
License-Expression: LicenseRef-Proprietary
Project-URL: Homepage, https://simgen.dev
Project-URL: Documentation, https://simgen.dev/docs
Keywords: exact-arithmetic,GPU,precision,scientific-computing,machine-learning,simulation,finance,HPC,triton
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Science/Research
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Financial and Insurance Industry
Classifier: Operating System :: POSIX :: Linux
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Scientific/Engineering
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Topic :: Scientific/Engineering :: Mathematics
Classifier: Topic :: Scientific/Engineering :: Physics
Requires-Python: >=3.10
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: torch>=2.0
Requires-Dist: triton>=2.0
Dynamic: author
Dynamic: home-page
Dynamic: license-file
Dynamic: requires-python

# SimGen

**Exact GPU Computation. Every calculation. Zero error.**

SimGen is a proprietary exact-arithmetic library for GPU computing. It targets floating-point rounding error in scientific computing, machine learning, finance, and simulation: reductions and linear algebra are computed exactly, and most other operations are near-exact.

## Installation

```bash
pip install simgen-vla
```

## Requirements

- **Linux** (Ubuntu 20.04+, RHEL 8+, or similar)
- Python 3.10+
- PyTorch 2.0+
- Triton 2.0+
- NVIDIA GPU with CUDA support
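
The requirements above can be checked before installing. The following is a minimal sketch using only the Python standard library; `check_requirements` is a hypothetical helper, not part of SimGen. It only tests whether `torch` and `triton` are importable, and does not verify their versions or the presence of a GPU.

```python
import importlib.util
import platform
import sys


def check_requirements():
    """Return a dict mapping each checked requirement to True or False."""
    return {
        "linux": platform.system() == "Linux",
        "python>=3.10": sys.version_info >= (3, 10),
        # find_spec returns None when a top-level package is not installed
        "torch installed": importlib.util.find_spec("torch") is not None,
        "triton installed": importlib.util.find_spec("triton") is not None,
    }


report = check_requirements()
for name, ok in report.items():
    print(f"{name}: {'ok' if ok else 'MISSING'}")
```

If PyTorch is installed, `torch.cuda.is_available()` can additionally confirm that a CUDA device is visible.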

## Supported GPUs

SimGen includes precompiled kernels for the following NVIDIA architectures:

| Architecture | GPUs |
|--------------|------|
| sm_75 (Turing) | T4, RTX 20xx |
| sm_80 (Ampere) | A10, A30, A100 |
| sm_86 (Ampere) | RTX 30xx |
| sm_89 (Ada) | RTX 40xx |
| sm_90 (Hopper) | H100 |
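
To see which row of the table a given GPU falls into, its compute capability can be compared against the listed architectures. This is a sketch only: `SUPPORTED_ARCHS` and `describe_arch` are hypothetical names that mirror the table and are not part of SimGen's API. Whether the precompiled kernels also run on unlisted architectures is not stated here, so the lookup is an exact match.

```python
# Compute capabilities copied from the table above.
SUPPORTED_ARCHS = {
    (7, 5): "sm_75 (Turing)",
    (8, 0): "sm_80 (Ampere)",
    (8, 6): "sm_86 (Ampere)",
    (8, 9): "sm_89 (Ada)",
    (9, 0): "sm_90 (Hopper)",
}


def describe_arch(capability):
    """Map a (major, minor) compute capability to a table row, or None."""
    return SUPPORTED_ARCHS.get(tuple(capability))


# With PyTorch installed and a GPU present, the capability can be read with:
#   import torch
#   capability = torch.cuda.get_device_capability(0)
capability = (8, 6)  # example value, e.g. an RTX 30xx card
print(describe_arch(capability))  # sm_86 (Ampere)
```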

## Quick Start

```python
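import torch
from simgen import vla

# Example inputs (shapes are illustrative)
x, y = torch.randn(1024), torch.randn(1024)
A, B = torch.randn(64, 768), torch.randn(768, 512)
logits = torch.randn(32, 10)
targets = torch.randint(0, 10, (32,))

# Exact dot product (zero error)
result = vla.dot(x, y)

# Exact matrix multiplication (zero error)
C = vla.matmul(A, B)

# Near-exact cross-entropy loss
loss = vla.cross_entropy(logits, targets)

# Near-exact layer normalization over the last dimension
h = torch.randn(64, 768)
normalized_shape = (768,)
out = vla.layer_norm(h, normalized_shape)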
```

## v1.5.0 Features

### Precompiled GPU Kernels
- 51 optimized kernels precompiled for all supported architectures
- Instant startup - no JIT compilation delay
- Production-ready deployment

### Auto-GPU
Tensors are automatically moved to CUDA:

```python
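# Assumes `import torch` and `from simgen import vla`, as in Quick Start
x = torch.randn(64, 768)   # CPU tensor
w = torch.randn(768, 512)  # CPU tensor
result = vla.matmul(x, w)  # Inputs are moved to the GPU automatically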
```

### Autograd Support
Full PyTorch autograd integration for training:

```python
import torch
from simgen import vla

x = torch.randn(64, 768, requires_grad=True)
w = torch.randn(768, 512, requires_grad=True)

# Forward with exact arithmetic
y = vla.matmul(x, w)
loss = y.sum()

# Backward works automatically
loss.backward()
```

### High-Precision Transcendentals
- `arcsin`, `arccos`, `arctan`, `arctan2` - ~1e-16 precision
- Full FP64 accuracy for scientific applications

## Operations

### Mathematically Exact (Zero Error)
| Category | Operations |
|----------|------------|
| **Reductions** | `sum`, `mean`, `norm`, `dot` |
| **Linear Algebra** | `matmul`, `bmm`, `linear`, `outer` |
| **Convolutions** | `conv1d`, `conv2d` |
| **Pooling** | `avg_pool2d`, `max_pool2d` |
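
To illustrate the kind of error that exact reductions are meant to remove, the sketch below uses only the Python standard library. It does not show SimGen's algorithm, which is not described here. It contrasts ordinary left-to-right floating-point addition with `math.fsum` and with exact rational arithmetic from `fractions.Fraction`.

```python
import math
from fractions import Fraction

values = [1e100, 1.0, -1e100]

# Left-to-right FP64 addition: 1.0 is absorbed when added to 1e100.
naive = sum(values)

# math.fsum tracks the lost low-order bits and rounds once at the end.
compensated = math.fsum(values)

# Exact rational arithmetic: no rounding at any step.
exact = sum(Fraction(v) for v in values)

print(naive)        # 0.0
print(compensated)  # 1.0
print(exact)        # 1
```

The naive result depends on the order of the additions, while the exact result does not.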

### Near-Exact (Millions-Fold Improvement over FP32)
| Category | Operations |
|----------|------------|
| **Attention** | `softmax`, `log_softmax` |
| **Normalization** | `layer_norm`, `rms_norm`, `batch_norm`, `group_norm` |
| **Loss Functions** | `cross_entropy`, `logsumexp` |
| **Activations** | `gelu`, `silu`, `tanh`, `sigmoid`, `relu`, `mish` |
| **Math** | `exp`, `log`, `sqrt`, `sin`, `cos`, `pow` |

### Numerical Methods
- `cumsum`, `cumprod` - Cumulative operations
- `trapz`, `diff` - Integration and differentiation
- `searchsorted`, `histc` - Search and statistics

### ML Training
- `adam_step`, `adamw_step`, `sgd_step` - Optimizers
- `GradientAccumulator` - Exact gradient accumulation
- `KVCache` - Exact KV caching for inference

## Use Cases

| Domain | Benefit |
|--------|---------|
| **Finance** | Penny-perfect calculations, no rounding drift |
| **Scientific Simulation** | Exact conservation laws, reproducible results |
| **Machine Learning** | No gradient drift, exact loss computation |
| **Molecular Dynamics** | Energy conservation over billions of steps |
| **Climate Modeling** | Century-scale predictions without error accumulation |
| **Quantum Computing** | Exact unitary operations, precise state evolution |

## Benchmarks

Tested on NVIDIA T4, RTX 4070, A100:

| Operation | Improvement vs FP32 |
|-----------|---------------------|
| Gradient Accumulation | 4,800,000,000x |
| Matrix Chain | 245,000,000,000,000,000x |
| sigmoid | 780,000,000x |
| softmax | 629,000,000x |
| FFT Roundtrip | 738,000,000x |
| matmul | **EXACT** |
| sum | **EXACT** |
| dot | **EXACT** |
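
The measurement method behind these factors is not described here. One plausible reading, offered only as an assumption, is the ratio of the FP32 error to the error of the improved method, both measured against an exact reference. The sketch below applies that definition to a plain summation using the standard library, with `math.fsum` standing in for a high-accuracy method. The helpers `to_f32`, `fp32_sum`, and `abs_error` are hypothetical, and the code does not call SimGen.

```python
import math
import struct
from fractions import Fraction


def to_f32(x):
    """Round a Python float (FP64) to the nearest FP32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def fp32_sum(values):
    """Accumulate in simulated FP32, rounding after every addition."""
    acc = 0.0
    for v in values:
        acc = to_f32(acc + to_f32(v))
    return acc


def abs_error(result, values):
    """Absolute error of result against an exact rational reference."""
    exact = sum(Fraction(v) for v in values)
    return abs(Fraction(result) - exact)


values = [0.1] * 10_000
err_fp32 = abs_error(fp32_sum(values), values)
err_fsum = abs_error(math.fsum(values), values)

# Improvement factor: how many times smaller the error is than in FP32.
improvement = float(err_fp32 / err_fsum) if err_fsum else math.inf
print(f"FP32 error:  {float(err_fp32):.3e}")
print(f"fsum error:  {float(err_fsum):.3e}")
print(f"improvement: {improvement:.3e}x")
```

Under this definition, an operation whose error is exactly zero has no finite factor, which is consistent with the **EXACT** entries in the table.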

## Version History

- **v1.5.0** - Precompiled kernels for all GPU architectures, precision fixes
- **v1.4.0** - Full Triton kernel suite, quantum simulation support
- **v1.1.0** - Autograd support, auto-GPU

## License

Proprietary. All rights reserved.
Clouthier Simulation Labs.

## Contact

- Website: https://simgen.dev
- Email: kyle@simgen.dev
