Metadata-Version: 2.1
Name: module-dataquality
Version: 0.0.1
Summary: data profiling and basic data quality rules check
Author: Balbir
Author-email: Balbir250894@gmail.com
Classifier: Programming Language :: Python :: 3
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Requires-Python: >=3.8
Description-Content-Type: text/markdown

#### `Working Example`
 
```python
import dataqualitycheck as dq
from datetime import date
import time
```

### Applying SingleDatasetQualityCheck
**Step-1:**
Pass the blob connector configuration in `blob_connector_config`, then
add a data source by defining the `data_read_ob` and `data_write_ob` connector objects.

```python
# Never publish real credentials; substitute your own values for the placeholders.
blob_connector_config = {
    "storage_account_name": "rgmdemostorage",
    "storage_account_access_key": "<storage-account-access-key>",
    "container_name": "cooler-images",
    "sas_token": "<sas-token>",
}
```
```python
data_read_ob = dq.AzureBlobDF(
    storage_name=blob_connector_config["storage_account_name"],
    sas_token=blob_connector_config["sas_token"],
)
data_write_ob = dq.AzureBlobDF(
    storage_name=blob_connector_config["storage_account_name"],
    sas_token=blob_connector_config["sas_token"],
)
```
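Because `blob_connector_config` holds an access key and a SAS token, it is usually safer to read those values from the environment than to write them into a notebook or README. The following is a minimal sketch of that idea; the helper `load_blob_connector_config` and the `DQ_*` environment variable names are hypothetical and not part of `dataqualitycheck`.

```python
import os

# Hypothetical environment variable names; choose whatever your deployment uses.
_ENV_KEYS = {
    "storage_account_name": "DQ_STORAGE_ACCOUNT_NAME",
    "storage_account_access_key": "DQ_STORAGE_ACCOUNT_ACCESS_KEY",
    "container_name": "DQ_CONTAINER_NAME",
    "sas_token": "DQ_SAS_TOKEN",
}


def load_blob_connector_config(env=None):
    """Build a dict with the same keys as blob_connector_config from env vars."""
    env = os.environ if env is None else env
    # Collect every variable that is unset or empty so the error names them all.
    missing = [var for var in _ENV_KEYS.values() if not env.get(var)]
    if missing:
        raise KeyError("missing environment variables: " + ", ".join(missing))
    return {key: env[var] for key, var in _ENV_KEYS.items()}
```

With the variables exported, `blob_connector_config = load_blob_connector_config()` yields a dictionary with the same four keys used in Step-1.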
**Step-2:**
`tables_list` is a dictionary that lists the sources, along with the `container_name`, `source_type`, `layer`, `source_name`, `filename`, `read_connector_method` and `latest_file_path` of each table to which the validations are applied.
```python
tables_list = {}
```
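The exact layout that `dataqualitycheck` expects inside `tables_list` is not shown here, so the sketch below only illustrates the seven fields named above. It assumes, purely for illustration, that each table is described by one flat dictionary; the helper `missing_fields` and every placeholder value are hypothetical.

```python
# Field names taken from the Step-2 description; the per-table layout is an assumption.
REQUIRED_FIELDS = (
    "container_name",
    "source_type",
    "layer",
    "source_name",
    "filename",
    "read_connector_method",
    "latest_file_path",
)


def missing_fields(entry):
    """Return the required field names that are absent from one table entry."""
    return [field for field in REQUIRED_FIELDS if field not in entry]


# Placeholder values only; replace them with the details of your own source.
example_entry = {
    "container_name": "<container-name>",
    "source_type": "<source-type>",
    "layer": "<layer>",
    "source_name": "<source-name>",
    "filename": "<filename>",
    "read_connector_method": "<read-connector-method>",
    "latest_file_path": "<latest-file-path>",
}
```

A check such as `missing_fields(example_entry)` can catch an incomplete entry before the quality checks run.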

**Step-3:**
Instantiate `SingleDatasetQualityCheck` by passing `tables_list`, `interaction_between_tables`, `data_read_ob`, `data_write_ob`, `data_right_structure`, `job_id`, `time_zone`, `no_of_partition` and `output_db_name`.
You can also pass `run_engine` to choose the engine that applies the quality checks. Two engines are available:
- `pyspark`
- `polars`

By default, the run_engine is `pyspark`.
```python
dq_ob = dq.SingleDatasetQualityCheck(
    tables_list=tables_list,           # defined in Step-2
    interaction_between_tables=[],
    data_read_ob=data_read_ob,         # connectors from Step-1
    data_write_ob=data_write_ob,
    data_right_structure='file',
    job_id='blob_1',
    time_zone=None,
    output_db_name="data_quality_output",
    no_of_partition=4,
)
```
**Step-4:**
Pass `config_df` and the output folder path `rules_diagnosys_summery_folder_path` to `apply_validation`, which applies the validations to the columns of each table defined in `config_df`.
```python
# Folder that receives the rule diagnosis summary.
rules_diagnosys_summery_folder_path = (
    "abfss://%s@%s.dfs.core.windows.net/processed/data_quality/summary3/"
    % (blob_connector_config["container_name"], blob_connector_config["storage_account_name"])
)

# Assumes `spark` is an already active SparkSession, as in a Databricks notebook.
config_df = spark.read.option("header", True).csv("dbfs:/FileStore/shared_uploads/vaishali.garg@decisionpoint.in/quality_checks_config.csv")

dq_ob.apply_validation(config_df, write_summary_on_database=True, failed_schema_source_list=[], output_summary_folder_path=rules_diagnosys_summery_folder_path)
```
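The summary folder path above follows the `abfss://<container>@<account>.dfs.core.windows.net/<path>/` pattern. As a small sketch, the same string can be built with a helper; `abfss_path` is a hypothetical name introduced here and is not part of `dataqualitycheck`.

```python
def abfss_path(container_name, storage_account_name, relative_path):
    """Build an abfss:// folder URL that ends with a single trailing slash."""
    # Drop leading and trailing slashes so the caller can pass either form.
    relative_path = relative_path.strip("/")
    return (
        f"abfss://{container_name}@{storage_account_name}"
        f".dfs.core.windows.net/{relative_path}/"
    )
```

For example, `abfss_path(blob_connector_config["container_name"], blob_connector_config["storage_account_name"], "processed/data_quality/summary3")` produces the same value as the `%`-formatted string in Step-4.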
