Metadata-Version: 2.4
Name: satisfactoscript
Version: 0.5.2
Summary: An Enterprise-Ready, Declarative Data Engineering Framework for Databricks Lakehouse.
Author: julhouba
Classifier: Programming Language :: Python :: 3
Classifier: Operating System :: OS Independent
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Requires-Python: >=3.8
Description-Content-Type: text/markdown
Requires-Dist: pyspark>=3.3.0
Requires-Dist: delta-spark>=2.0.0
Requires-Dist: pyyaml>=6.0
Requires-Dist: python-dotenv>=1.0.0
Provides-Extra: dev
Requires-Dist: pytest>=7.0; extra == "dev"
Requires-Dist: pytest-cov>=4.0; extra == "dev"
Requires-Dist: ruff>=0.1.0; extra == "dev"
Requires-Dist: pytest-mock>=3.10.0; extra == "dev"

# 🚀 SatisfactoScript Framework (v0.5.2)

> **An Enterprise-Ready, Declarative Data Engineering Framework for Databricks Lakehouse.**

SatisfactoScript transforms complex PySpark pipelines into standardized, readable, and highly maintainable declarative contracts. By strictly decoupling the **What** (JSON Schemas) from the **How** (Python Business Rules), it empowers Data Engineers to build robust Bronze ➔ Silver ➔ Gold pipelines optimized for Power BI Direct Query.

---

## ✨ Key Capabilities

- 📄 **Declarative Pipelines:** Define your data contracts (sources, joins, casts, and output schemas) using a clear, JSON-like dictionary. No more 1,000-line PySpark spaghetti notebooks.
- 🧠 **Business Logic Isolation:** Write pure Python/PySpark business rules and register them dynamically using the `@RuleRegistry` decorator.
- ⚡ **Direct Query Optimized:** Pre-calculate complex logic (e.g., OBT, Last Year shifts, Distinct Counts) in the Gold layer to keep Power BI DAX ultra-light and lightning-fast.
- 🛡️ **Built-in Observability:** Automated, standardized logging at every step (Extraction, Joins, Rule Application, Delta Writes) for instant debugging.
- 🌍 **Environment Aware:** Seamless CI/CD integration with auto-detection of Dev/QA/Prod environments and dynamic catalog routing.
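
The environment auto-detection itself is internal to the framework, but the pattern is straightforward; here is a minimal sketch of how Dev/QA/Prod routing could work (the `SATISFACTO_ENV` variable name and the catalog map are illustrative assumptions, not part of the framework's documented API):

```python
import os

# Hypothetical map: environment name -> target Unity Catalog (assumption).
CATALOGS = {"dev": "lakehouse_dev", "qa": "lakehouse_qa", "prod": "lakehouse_prod"}

def detect_environment(env_var: str = "SATISFACTO_ENV") -> str:
    """Resolve the current environment, defaulting to 'dev' when unset."""
    env = os.environ.get(env_var, "dev").lower()
    if env not in CATALOGS:
        raise ValueError(f"Unknown environment: {env!r}")
    return env

def route_catalog(env: str) -> str:
    """Map an environment name to its target catalog."""
    return CATALOGS[env]
```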

---

## 🏗️ Architecture

The framework is built on a highly modular architecture:

- `core.py`: The central orchestrator and schema parser.
- `registry.py`: The dynamic catalog isolating business rules and data loaders.
- `config.py`: The environment and YAML configuration manager.
- `loaders.py`: The abstraction layer for Databricks I/O and Delta Lake mechanics.
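
At its core, a rule registry like the one in `registry.py` is a name-to-function catalog populated by a decorator. The sketch below illustrates the pattern in plain Python (an illustrative reimplementation, not the framework's actual code):

```python
class RuleRegistry:
    """Minimal name -> function catalog for business rules (illustrative)."""
    _rules = {}

    @classmethod
    def register_rule(cls, name=None):
        """Decorator that registers a function under `name` (or its own name)."""
        def decorator(func):
            cls._rules[name or func.__name__] = func
            return func
        return decorator

    @classmethod
    def get_rule(cls, name):
        """Look up a registered rule by name."""
        try:
            return cls._rules[name]
        except KeyError:
            raise KeyError(f"No business rule registered under {name!r}")
```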

```text
 🥉 Bronze (Raw)  ➔  🥈 Silver (Standardized)  ➔  🥇 Gold (Semantic / OBT)
                             │
       ┌─────────────────────┴─────────────────────┐
       │ ⚙️ SatisfactoScript Engine                │
       │  ├─ 1. Declarative Schema (JSON)          │
       │  ├─ 2. Rule Registry (Python Logic)       │
       │  └─ 3. Delta I/O & Z-Order Optimization   │
       └───────────────────────────────────────────┘
                             │
                             ▼
                    📊 Power BI (Direct Query)
```

## 🚀 Quick Start: Building a Pipeline in 5 Steps
Creating a new data pipeline requires minimal boilerplate. Here is how you build a standard pipeline:

```python
# 1. Load framework dependencies
%run ../../satisfactoscript/init_framework
```

```python
# 2. Instantiate the Engine
engine = SatisfactoEngine()
print(f"Working context: {engine.db}")
```

```python
# 3. Define Parameters
source_layer = engine.get_target_schema("silver")
target_layer = engine.get_target_schema("gold")
```

```python
# 4. Define Schemas
gold_schema = {
    "tables": [
        {
            "name": f"`{engine.db}`.`{source_layer}`.`silver_transactions`",
            "alias": "csa",
            "filter": [{"column": "region", "operator": "eq", "value": "EMEA"}]
        }
    ],
    "business_rules": [
        "my_rule"  # ➔ Points to a registered rule
    ],
    "select_final": [
        ("transaction_id", "transaction_id", []),
        ("transaction_date", "transaction_date", ["cast:date"]),
        ("transaction_amount", "transaction_amount", ["cast:double"]),
        ("clean_status", "status", []),
        ("is_high_value", "is_high_value", ["cast:int"])
    ]
}
```
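
Each `select_final` entry is a `(source, target, transforms)` tuple, where transform directives like `"cast:date"` encode a verb and an argument. How the engine resolves them is internal, but a plausible sketch of the parsing step looks like this (the function names here are hypothetical, not the engine's actual API):

```python
def parse_transform(directive: str):
    """Split a 'verb:arg' directive such as 'cast:date' into its parts."""
    verb, _, arg = directive.partition(":")
    return verb, arg

def build_select_exprs(select_final):
    """Render SQL select expressions from (source, target, transforms) tuples."""
    exprs = []
    for source, target, transforms in select_final:
        expr = source
        for directive in transforms:
            verb, arg = parse_transform(directive)
            if verb == "cast":
                expr = f"CAST({expr} AS {arg.upper()})"
        exprs.append(f"{expr} AS {target}")
    return exprs
```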

```python
# 5. Run the Pipeline
engine.run_follow_schema(
    schema_dict=gold_schema,
    target_layer=target_layer,
    target_table_name="my_new_table"
)
```

## 🧩 Adding Business Rules
Business rules are decoupled from the execution notebook. You can define them in a centralized rules file (e.g., `rules.py`) using the registry decorator:

```python
from pyspark.sql import functions as F
from satisfactoscript.core.registry import RuleRegistry

@RuleRegistry.register_rule()
def enrich_transaction_data(df):
    """Generic business rule to flag high-value transactions and clean statuses."""
    print("   -> [Business Rule] Applying: enrich_transaction_data")

    # 1. Standardize columns
    col_status = F.lower(F.col("transaction_status"))

    # 2. Apply business logic
    df_enriched = df.withColumn(
        "is_high_value",
        F.when(F.col("transaction_amount") >= 1000, 1).otherwise(0)
    ).withColumn(
        "clean_status",
        F.when(col_status.isin(["completed", "done", "success"]), "Paid").otherwise("Pending")
    )

    # 3. Secure outputs
    df_enriched = df_enriched.fillna({"transaction_amount": 0.0})

    return df_enriched
```
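
When the engine encounters the `business_rules` list in a schema, it presumably looks each name up in the registry and applies the matching functions in declaration order. A sketch of that dispatch loop, using a plain dict in place of the real registry (illustrative only):

```python
def apply_business_rules(df, rule_names, registry):
    """Apply each named rule to df in order; each rule maps df -> df."""
    for name in rule_names:
        rule = registry.get(name)
        if rule is None:
            raise KeyError(f"Business rule {name!r} is not registered")
        df = rule(df)
    return df
```

Because each rule takes and returns a DataFrame, rules compose naturally and can be unit-tested in isolation.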
