Metadata-Version: 2.4
Name: simgen-vla
Version: 1.2.0
Summary: TRUE ZERO Error GPU Arithmetic - Drop-in PyTorch Replacement
Home-page: https://simgen.dev
Author: Clouthier Simulation Labs
Author-email: Clouthier Simulation Labs <kyle@simgen.dev>
License: Proprietary
Project-URL: Homepage, https://simgen.dev
Project-URL: Documentation, https://simgen.dev/docs/vla
Keywords: gpu,arithmetic,precision,exact,pytorch,cuda,zero-error,simgen
Classifier: Development Status :: 5 - Production/Stable
Classifier: Intended Audience :: Science/Research
Classifier: Intended Audience :: Developers
Classifier: Topic :: Scientific/Engineering
Classifier: Topic :: Scientific/Engineering :: Mathematics
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Cython
Classifier: Operating System :: OS Independent
Requires-Python: >=3.9
Description-Content-Type: text/markdown
Dynamic: author
Dynamic: home-page
Dynamic: requires-python

# SimGen VLA - Zero-Error GPU Arithmetic

**Drop-in PyTorch replacement. Eliminates floating-point drift.**

```python
from simgen import vla

x = vla.tensor([1e16, 1.0, -1e16])
print(x.sum())  # 1.0 (exact) - PyTorch returns 0.0
```

## Installation

```bash
pip install simgen-vla
```

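**Requires:** PyTorch 2.0+ with CUDA. Install it separately so pip does not replace your existing PyTorch build, choosing the wheel that matches your CUDA version (the command below is for CUDA 12.1):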
```bash
pip install torch --index-url https://download.pytorch.org/whl/cu121
```
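Before installing `simgen-vla`, it can help to confirm that the PyTorch already on the machine meets the stated requirement. A minimal sketch using only standard PyTorch attributes (`torch.__version__`, `torch.version.cuda`, `torch.cuda.is_available()`); the helper name `torch_cuda_status` is hypothetical and not part of VLA:

```python
import importlib.util

def torch_cuda_status():
    """Report whether PyTorch is installed, is 2.0+, and can see a CUDA GPU."""
    if importlib.util.find_spec("torch") is None:
        return {"installed": False}
    import torch
    major = int(torch.__version__.split(".")[0])  # e.g. "2.1.0+cu121" -> 2
    return {
        "installed": True,
        "version": torch.__version__,
        "is_2_0_plus": major >= 2,
        "cuda_build": torch.version.cuda,          # None on CPU-only wheels
        "gpu_available": torch.cuda.is_available(),
    }

print(torch_cuda_status())
```

A `cuda_build` of `None` means a CPU-only wheel is installed, which is the case the separate install command is meant to avoid.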

## The Problem VLA Solves

Standard floating-point arithmetic rounds after every operation, so small terms are lost next to large ones and errors accumulate:

| Operation | PyTorch (default FP32) | VLA |
|-----------|------------------------|-----|
| `[1e16, 1, -1e16].sum()` | 0.0 | **1.0** |
| Sum of 1,000,000 copies of `1e-16` | ≈1e-10 (inexact) | **1e-10** (exact) |
| `(a + b) - a` where `a=1e15`, `b=1` | 0.0 | **1.0** |

VLA eliminates this drift entirely.
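The drift in `[1e16, 1, -1e16].sum()` and in the FP32 cancellation `(a + b) - a` (with a = 1e15, b = 1) can be reproduced with the standard library alone. This is a sketch under stated assumptions: Python floats are FP64, so FP32 is emulated by rounding through `struct`; the helpers `f32` and `f32_add` are hypothetical, and `fractions.Fraction` only supplies the exact reference value (it is not how VLA computes).

```python
import struct
from fractions import Fraction

def f32(x: float) -> float:
    """Round a Python float (FP64) to the nearest FP32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]

def f32_add(a: float, b: float) -> float:
    # Add in FP64, then round once to FP32. This equals a true FP32 addition
    # because FP64 carries more than twice FP32's 24-bit precision.
    return f32(a + b)

# [1e16, 1, -1e16] summed left to right in FP64: the 1.0 is absorbed by 1e16
vals = [1e16, 1.0, -1e16]
naive64 = (vals[0] + vals[1]) + vals[2]
exact1 = sum((Fraction(v) for v in vals), Fraction(0))

# (a + b) - a in FP32 with a = 1e15, b = 1: FP32 spacing near 1e15 is 2**26
a, b = f32(1e15), f32(1.0)
naive32 = f32_add(f32_add(a, b), -a)
exact3 = (Fraction(a) + Fraction(b)) - Fraction(a)

print(naive64, exact1)   # 0.0 1
print(naive32, exact3)   # 0.0 1
```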

## Quick Start

```python
from simgen import vla

# Basic operations
a = vla.tensor([1.0, 2.0, 3.0])
b = vla.tensor([4.0, 5.0, 6.0])

# Exact arithmetic
c = a + b
d = vla.dot(a, b)
M = vla.matmul(vla.randn(100, 100), vla.randn(100, 100))

# Reproducibility - same SHA256 on any machine
vla.manual_seed(42)
result = vla.randn(10, 10)
print(result.sha256())  # Identical on Windows, Linux, any GPU
```

## API Reference

### Creation
```python
vla.tensor([1, 2, 3])
vla.zeros(3, 3), vla.ones(3, 3), vla.eye(3)
vla.randn(3, 3), vla.rand(3, 3)
vla.arange(10), vla.linspace(0, 1, 10)
```

### Arithmetic (Exact)
```python
a + b, a - b, a * b      # Exact
a / b, a ** 2            # High precision
a // b, a % b            # Floor div, mod
```
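Why `+`, `-`, `*` can be exact while `/` is only high precision: the sum, difference, and product of two binary floating-point numbers is always a finite binary fraction (its denominator is a power of two), so enough bits can hold it exactly, whereas a quotient such as `0.1 / 3` has an infinite binary expansion. A small check with `fractions.Fraction`; `is_dyadic` is a hypothetical helper for illustration:

```python
from fractions import Fraction

def is_dyadic(q: Fraction) -> bool:
    """True if q's denominator is a power of two, i.e. q is a finite binary fraction."""
    d = q.denominator
    return (d & (d - 1)) == 0

x, y = Fraction(0.1), Fraction(3.0)  # the exact values stored in these two doubles

print(is_dyadic(x + y), is_dyadic(x - y), is_dyadic(x * y))  # True True True
print(is_dyadic(x / y))  # False: the quotient's denominator has a factor of 3
```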

### Reductions (Exact)
```python
x.sum(), x.mean(), x.prod()
x.min(), x.max(), x.std(), x.var()
x.argmin(), x.argmax()
vla.norm(x)
```
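One well-known way to reduce floats without any rounding is Shewchuk's partials algorithm, the idea behind Python's `math.fsum`. The sketch below shows that technique, not necessarily what VLA does internally; `exact_partials` and `exact_sum` are illustrative names.

```python
import math
from fractions import Fraction

def exact_partials(values):
    """Return non-overlapping floats whose exact (unrounded) sum equals the sum of values."""
    partials = []
    for x in values:
        i = 0
        for y in partials:
            if abs(x) < abs(y):
                x, y = y, x
            hi = x + y             # rounded sum
            lo = y - (hi - x)      # its exact rounding error (valid because |x| >= |y|)
            if lo:
                partials[i] = lo   # keep the error as a smaller partial
                i += 1
            x = hi
        partials[i:] = [x]
    return partials

def exact_sum(values):
    """Exact sum as a Fraction: add the partials without any rounding."""
    return sum((Fraction(p) for p in exact_partials(values)), Fraction(0))

values = [1e16, 1.0, -1e16]

naive = 0.0
for v in values:  # explicit loop: Python 3.12+'s built-in sum() on floats is compensated
    naive += v

print(naive)               # 0.0
print(exact_sum(values))   # 1
print(math.fsum(values))   # 1.0, the stdlib's correctly rounded version of the same idea
```

The partials list stays short in practice, because each partial covers a different range of magnitudes.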

### Linear Algebra (Exact)
```python
vla.dot(a, b)
vla.matmul(A, B), vla.mm(A, B)
vla.mv(A, v), vla.bmm(A, B)
```
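An exact dot product also needs exact products. A classic approach is Dekker's TwoProduct, which splits each `a * b` into a rounded product plus its exact error term; summing all terms exactly and rounding once then gives a correctly rounded dot product. This is a sketch of that textbook technique, not VLA's implementation; `split`, `two_product`, and `dot_correctly_rounded` are hypothetical names, and it assumes FP64 round-to-nearest with no overflow or underflow.

```python
import math
from fractions import Fraction

SPLITTER = 134217729.0  # 2**27 + 1: Veltkamp's splitting constant for FP64

def split(a):
    """Split a into hi + lo, two half-width floats with hi + lo == a exactly."""
    c = SPLITTER * a
    hi = c - (c - a)
    return hi, a - hi

def two_product(a, b):
    """Dekker's TwoProduct: returns (p, err) with p + err == a * b exactly."""
    p = a * b
    ah, al = split(a)
    bh, bl = split(b)
    err = ((ah * bh - p) + ah * bl + al * bh) + al * bl
    return p, err

def dot_correctly_rounded(xs, ys):
    """Exact products, then a single rounding of their exact sum via math.fsum."""
    terms = []
    for a, b in zip(xs, ys):
        terms.extend(two_product(a, b))
    return math.fsum(terms)

print(dot_correctly_rounded([1e16, 1.0, -1e16], [1.0, 1.0, 1.0]))  # 1.0
```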

### Linear Algebra - Full Suite
```python
from simgen.vla import linalg

x = linalg.solve(A, b)       # Solve Ax = b
A_inv = linalg.inv(A)        # Matrix inverse
d = linalg.det(A)            # Determinant
Q, R = linalg.qr(A)          # QR decomposition
L = linalg.cholesky(A)       # Cholesky decomposition
U, S, Vh = linalg.svd(A)     # SVD
vals, vecs = linalg.eig(A)   # Eigenvalues
x, res = linalg.lstsq(A, b)  # Least squares
r = linalg.matrix_rank(A)    # Matrix rank
c = linalg.cond(A)           # Condition number
```
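To make "exact" concrete for `solve`: every float converts to a rational number without rounding, so Gauss-Jordan elimination over `fractions.Fraction` returns the exact solution. This is a minimal sketch of that idea, not how VLA implements `linalg.solve`; `solve_exact` is a hypothetical helper, and its cost grows quickly with matrix size because numerators and denominators grow.

```python
from fractions import Fraction

def solve_exact(A, b):
    """Solve A x = b exactly over the rationals (Gauss-Jordan elimination).

    A is a list of rows and b a list; entries may be ints, floats, or Fractions.
    Raises ValueError if A is singular.
    """
    n = len(A)
    # Augmented matrix of exact rationals (floats convert without rounding)
    M = [[Fraction(v) for v in row] + [Fraction(bi)] for row, bi in zip(A, b)]
    for col in range(n):
        # With exact arithmetic any nonzero pivot works; no magnitude pivoting needed
        pivot = next((r for r in range(col, n) if M[r][col] != 0), None)
        if pivot is None:
            raise ValueError("matrix is singular")
        M[col], M[pivot] = M[pivot], M[col]
        inv = 1 / M[col][col]
        M[col] = [v * inv for v in M[col]]
        for r in range(n):
            if r != col and M[r][col] != 0:
                f = M[r][col]
                M[r] = [vr - f * vc for vr, vc in zip(M[r], M[col])]
    return [M[r][n] for r in range(n)]

# The 6x6 Hilbert matrix is notoriously ill-conditioned, yet the exact solve recovers all ones
n = 6
H = [[Fraction(1, i + j + 1) for j in range(n)] for i in range(n)]
rhs = [sum(row) for row in H]
print(solve_exact(H, rhs))
```

The only rounding happens if the rational answer is converted back to float at the end.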

### Math Functions
```python
x.exp(), x.log(), x.sqrt(), x.abs()
x.sin(), x.cos(), x.tan(), x.tanh()
x.floor(), x.ceil(), x.round()
x.clamp(min, max)
```

### Neural Network Activations
```python
x.relu(), x.sigmoid()
x.softmax(dim=-1), x.log_softmax(dim=-1)
x.gelu(), x.silu()
```

### Shape Operations
```python
x.reshape(2, 3), x.transpose(0, 1), x.T
x.squeeze(), x.unsqueeze(0), x.flatten()
vla.stack([a, b]), vla.cat([a, b])
vla.split(x, 2), vla.chunk(x, 2)
```

### Reproducibility
```python
vla.manual_seed(42)
x.sha256()       # Full cryptographic hash
x.fingerprint()  # Short 8-char hash
x.verify(digest) # True if x's SHA256 matches digest
```

### Conversion
```python
x.item()         # To Python scalar
x.tolist()       # To Python list
x.numpy()        # To NumPy (FP64)
x.cpu(), x.cuda()
```

## Reproducibility Guarantee

```python
from simgen import vla

# On Machine A
vla.manual_seed(42)
A, B = vla.randn(100, 100), vla.randn(100, 100)
result = vla.matmul(A, B)
digest = result.sha256()  # e.g. 'a1b2c3d4...'

# On Machine B (different GPU, OS, etc.)
vla.manual_seed(42)
A, B = vla.randn(100, 100), vla.randn(100, 100)
result = vla.matmul(A, B)
assert result.verify(digest)  # Always True
```

Identical inputs produce the same SHA256 on every machine, regardless of GPU or operating system.
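A hash comparison only works if both machines produce identical bits *and* serialize them identically. The sketch below illustrates the serialization half with `hashlib`: a fixed big-endian layout that includes the shape. The byte layout, the `-0.0` normalization, and the assumption that a fingerprint is the first 8 hex characters of the SHA256 are all hypothetical; VLA's actual hash format is not documented here.

```python
import hashlib
import struct

def canonical_bytes(shape, values):
    """Serialize shape and FP64 values in a fixed, platform-independent (big-endian) layout."""
    out = [struct.pack(">I", len(shape))]
    out += [struct.pack(">Q", d) for d in shape]
    # v + 0.0 maps -0.0 to +0.0 so that equal values always hash equally
    out += [struct.pack(">d", v + 0.0) for v in values]
    return b"".join(out)

def sha256_hex(shape, values):
    return hashlib.sha256(canonical_bytes(shape, values)).hexdigest()

def fingerprint(shape, values):
    """Short form (assumed): the first 8 hex characters of the SHA256."""
    return sha256_hex(shape, values)[:8]

digest = sha256_hex((2, 2), [1.0, 2.0, 3.0, 4.0])
print(digest == sha256_hex((2, 2), [1.0, 2.0, 3.0, 4.0]))  # True: same bits, same hash
print(digest == sha256_hex((4,), [1.0, 2.0, 3.0, 4.0]))    # False: shape is part of the identity
```

Fixing the endianness explicitly (`>`) is what makes the digest independent of the host CPU's byte order.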

## Memory

VLA uses 8x the memory of PyTorch FP64 (64 bytes per element vs 8), and 16x that of PyTorch's default FP32 (4 bytes per element).
Trade-off: exact precision vs memory.
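For capacity planning, the raw storage of a dense tensor is simply its element count times the per-element size. A quick sketch using the 64-byte figure above; `tensor_bytes` is a hypothetical helper, and the estimate ignores allocator and framework overhead:

```python
import math

# Per-element sizes: 64 bytes for VLA (figure from this README), 8 for FP64, 4 for FP32
BYTES_PER_ELEMENT = {"vla": 64, "float64": 8, "float32": 4}

def tensor_bytes(shape, bytes_per_element):
    """Raw storage for a dense tensor; ignores allocator and framework overhead."""
    return math.prod(shape) * bytes_per_element

shape = (10_000, 10_000)
for name, size in BYTES_PER_ELEMENT.items():
    print(f"{name}: {tensor_bytes(shape, size) / 1e9:.1f} GB")
# vla: 6.4 GB, float64: 0.8 GB, float32: 0.4 GB
```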

## Use Cases

- **Scientific Computing**: Simulations where drift accumulates over millions of steps
- **Financial**: Exact arithmetic for compliance and auditing
- **ML Research**: Bit-exact reproducibility across hardware
- **Verification**: Prove your results match across machines

## License

Free for personal, academic, and research use.
Commercial use: [simgen.dev](https://simgen.dev)

## Support

- Documentation: [simgen.dev/docs/vla](https://simgen.dev/docs/vla)
- Issues: kyle@simgen.dev
