Metadata-Version: 2.4
Name: dual-fw-svm
Version: 0.1.0
Summary: Fast Python reproduction of dual proximal and Frank-Wolfe SVM optimizers.
License-Expression: LicenseRef-Proprietary
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Science/Research
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Programming Language :: Python :: 3.14
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Requires-Python: >=3.10
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: numpy>=1.23
Requires-Dist: scipy>=1.9
Provides-Extra: benchmark
Requires-Dist: pandas>=1.5; extra == "benchmark"
Requires-Dist: scikit-learn>=1.2; extra == "benchmark"
Provides-Extra: test
Requires-Dist: scikit-learn>=1.2; extra == "test"
Dynamic: license-file

# dual-fw-svm

This workspace contains a Python reproduction of the two main algorithms in
`10_New_Optimization_Methods_fo.pdf`.

Implemented components:

- `BinaryL2DualSVM`: Algorithm 1, a proximal-gradient solver for binary L2-SVM
  dual variables with the paper's second-dual projection step.
- `MulticlassFrankWolfeSVM`: Algorithm 2, matrix-wise Frank-Wolfe for
  Crammer-Singer (`formulation="cs"`) and Weston-Watkins (`formulation="ww"`)
  multiclass SVMs.
- `BlockCoordinateFrankWolfeSVM`: a stochastic row-wise Frank-Wolfe baseline in
  the style of Algorithm 3, for comparison with the paper's matrix-wise update.
- `benchmarks/compare_svm.py`: compares the reproduction with common sklearn
  baselines: `LinearSVC`, `LinearSVC(multi_class="crammer_singer")`,
  one-vs-rest `LinearSVC`, and `SGDClassifier`.
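
For readers new to Frank-Wolfe, the basic iteration can be sketched on a toy problem. This is a generic textbook version on the probability simplex with the classic `2 / (k + 2)` step size, not the paper's Algorithm 2 or 3 and not this package's code; the function name `frank_wolfe_simplex` and the toy objective are made up for illustration.

```python
import numpy as np


def frank_wolfe_simplex(grad, x0, n_iter=2000):
    """Generic Frank-Wolfe on the probability simplex (illustrative only)."""
    x = x0.copy()
    for k in range(n_iter):
        g = grad(x)
        # Linear minimization oracle: over the simplex, the minimizer of
        # <g, s> is the vertex at the smallest gradient entry.
        vertex = np.zeros_like(x)
        vertex[np.argmin(g)] = 1.0
        step = 2.0 / (k + 2.0)
        # A convex combination of feasible points stays feasible,
        # so no projection step is needed.
        x = (1.0 - step) * x + step * vertex
    return x


# Toy objective: 0.5 * ||x - target||^2, whose gradient is x - target.
target = np.array([0.2, 0.3, 0.5])
x0 = np.full(3, 1.0 / 3.0)
solution = frank_wolfe_simplex(lambda x: x - target, x0)
print(solution)
```

Every iterate is a convex combination of simplex vertices, which is why Frank-Wolfe methods suit constraint sets where a linear minimization is cheap and a projection is not.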

The default linear solvers never materialize the n-by-n Gram matrix. Binary
training works with `X.T @ (y * alpha)` and multiclass training maintains
`W = X.T @ alpha`, so each iteration needs only matrix products against `X`.
This is the main speed and memory choice for large datasets. For small
custom-kernel experiments, both main solvers also accept
`kernel="precomputed"`, which takes the train-by-train Gram matrix in `fit`
and a test-by-train kernel matrix in `predict`.
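
The Gram-free computation for the binary case can be sketched in NumPy. This illustrates the identity and is not the package's internal code: the function names are hypothetical, and labels are assumed to be encoded as -1 and +1. It compares a product with the label-signed Gram matrix against two matrix-vector products with `X`.

```python
import numpy as np


def signed_gram_matvec_dense(X, y, alpha):
    """Reference version: builds the full n x n Gram matrix (O(n^2) memory)."""
    K = X @ X.T                           # n x n Gram matrix
    Q = (y[:, None] * y[None, :]) * K     # Q[i, j] = y_i * y_j * <x_i, x_j>
    return Q @ alpha


def signed_gram_matvec_linear(X, y, alpha):
    """Gram-free version: two matrix-vector products, O(n * d) work."""
    w = X.T @ (y * alpha)                 # primal weight vector, shape (d,)
    return y * (X @ w)


rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = rng.choice([-1.0, 1.0], size=200)
alpha = rng.random(200)

dense = signed_gram_matvec_dense(X, y, alpha)
linear = signed_gram_matvec_linear(X, y, alpha)
print(np.allclose(dense, linear))
```

The dense version stores `n * n` floats, while the linear version stores only a length-`d` vector beyond the data itself.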

## Quick Start

Run tests:

```powershell
python -m unittest discover -s tests
```

Run the benchmark:

```powershell
python benchmarks/compare_svm.py
```

The benchmark writes:

```text
benchmarks/results_latest.csv
```

## Minimal Usage

```python
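from dual_fw_svm import BinaryL2DualSVM, MulticlassFrankWolfeSVM

# X_train and X_test are 2-D feature arrays and y_train is a 1-D label array,
# assumed to be loaded already.

# Binary L2-SVM (Algorithm 1).
binary = BinaryL2DualSVM(C=1.0, max_iter=1000, tol=1e-5)
binary.fit(X_train, y_train)
binary_pred = binary.predict(X_test)

# Multiclass Crammer-Singer SVM (Algorithm 2); use formulation="ww" for
# Weston-Watkins.
multi = MulticlassFrankWolfeSVM(C=1.0, formulation="cs", max_iter=500)
multi.fit(X_train, y_train)
multi_pred = multi.predict(X_test)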
```

Precomputed kernel usage:

```python
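# Linear kernel as an example; any valid kernel matrix works.
K_train = X_train @ X_train.T  # shape (n_train, n_train)
K_test = X_test @ X_train.T    # shape (n_test, n_train)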

binary_kernel = BinaryL2DualSVM(C=1.0, kernel="precomputed")
binary_kernel.fit(K_train, y_train)
pred = binary_kernel.predict(K_test)
```
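
A non-linear kernel follows the same shape convention. The sketch below builds RBF kernel matrices with SciPy; the helper name `rbf_kernel_matrix` and the `gamma` value are illustrative assumptions, not part of this package.

```python
import numpy as np
from scipy.spatial.distance import cdist


def rbf_kernel_matrix(A, B, gamma=0.5):
    """Return K with K[i, j] = exp(-gamma * ||A[i] - B[j]||^2)."""
    sq_dists = cdist(A, B, metric="sqeuclidean")
    return np.exp(-gamma * sq_dists)


rng = np.random.default_rng(0)
X_train = rng.normal(size=(30, 4))
X_test = rng.normal(size=(10, 4))

K_train = rbf_kernel_matrix(X_train, X_train)  # shape (n_train, n_train)
K_test = rbf_kernel_matrix(X_test, X_train)    # shape (n_test, n_train)
```

`K_train` would then go to `fit` and `K_test` to `predict`, as in the linear-kernel example above. Rows of `K_test` index test samples and columns index training samples.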

## Notes

- `BinaryL2DualSVM.C` follows the paper's scaling:
  `0.5 ||w||^2 + C/2 * sum_i xi_i^2`.
- The multiclass implementations follow the no-bias formulation in the paper.
  Standardizing dense features before fitting is recommended for faster
  convergence and fair comparison.
- The benchmark uses `LinearSVC(C=C/2, loss="squared_hinge")` for the binary
  sklearn baseline because sklearn's squared-hinge objective weights the loss
  term by `C` rather than `C/2`, so halving `C` aligns the two objectives.
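
The `C` rescaling in the last note can be checked numerically. The sketch below assumes the LIBLINEAR-style form of the squared-hinge objective, `0.5 ||w||^2 + C * sum_i xi_i^2`, and ignores the intercept; both function names are hypothetical and neither is part of this package or of sklearn.

```python
import numpy as np


def paper_objective(w, X, y, C):
    """0.5 * ||w||^2 + (C / 2) * sum of squared hinge losses."""
    xi = np.maximum(0.0, 1.0 - y * (X @ w))
    return 0.5 * (w @ w) + 0.5 * C * np.sum(xi**2)


def liblinear_style_objective(w, X, y, C):
    """0.5 * ||w||^2 + C * sum of squared hinge losses (assumed form, no bias)."""
    xi = np.maximum(0.0, 1.0 - y * (X @ w))
    return 0.5 * (w @ w) + C * np.sum(xi**2)


rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
y = rng.choice([-1.0, 1.0], size=50)
w = rng.normal(size=3)
C = 1.0

paper_value = paper_objective(w, X, y, C)
baseline_value = liblinear_style_objective(w, X, y, C / 2)
print(np.isclose(paper_value, baseline_value))
```

Because the two objectives agree for every `w` once `C` is halved, they share the same minimizer, which is what makes the baseline comparison fair.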
