Metadata-Version: 2.1
Name: hypereda
Version: 1.0.0
Summary: Automated and Interactive Exploratory Data Analysis with Visualization and Streamlit Interface
Author: Saurav Bhabak
Author-email: souravbhabak7@gmail.com
Classifier: Programming Language :: Python :: 3
Classifier: Operating System :: OS Independent
Classifier: Intended Audience :: Science/Research
Classifier: Topic :: Scientific/Engineering :: Information Analysis
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Requires-Python: >=3.8
Description-Content-Type: text/markdown
Requires-Dist: pandas==2.2.2
Requires-Dist: numpy==1.26.4
Requires-Dist: matplotlib==3.9.2
Requires-Dist: seaborn==0.13.2
Requires-Dist: plotly==5.24.0
Requires-Dist: tqdm==4.66.4
Requires-Dist: termcolor==2.5.0
Requires-Dist: streamlit==1.38.0

# HyperEDA

HyperEDA is an automated, interactive Exploratory Data Analysis (EDA) framework that turns raw data into insights with minimal manual effort. Built for data scientists, analysts, and developers, it analyzes any dataset, detects its structure, performs statistical exploration, visualizes key patterns, and builds a Streamlit-based graphical interface for interactive exploration. The goal is to help professionals move faster from data collection to decision-making by providing an instant analytical view of the dataset, without requiring complex code or external setup. Beyond simplifying EDA, it aids comprehension through rich visual feedback and a well-structured, dynamic dashboard.

Unlike traditional EDA scripts that require manual coding for every plot and metric, HyperEDA automates the entire workflow: from inspecting missing values and duplicates to generating correlation matrices, univariate and multivariate plots, and statistical summaries. It handles small to very large datasets using optimized sampling and memory-safe data handling, so that even gigabyte-scale datasets can be analyzed interactively.

Once initialized, HyperEDA runs a data preparation pipeline that reports each stage with percentage updates such as `[10%] Loading data... [80%] Generating visual summaries... [100%] Report Ready!`. After completion, it automatically launches a Streamlit dashboard that is interactive and customizable: users control plot types, parameters, and display settings directly from the GUI without editing code.
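The staged progress output described above can be pictured with a small sketch. This is not HyperEDA's actual implementation; `run_stages`, `noop`, and the stage tuples are hypothetical names chosen for illustration, and only the message format is taken from the example log line.

```python
from typing import Callable, List, Tuple

# Hypothetical stage description: (percent shown for the stage, label, work to run).
Stage = Tuple[int, str, Callable[[], None]]


def run_stages(stages: List[Stage]) -> List[str]:
    """Run each stage in order, printing a "[NN%] label" line before its work."""
    messages = []
    for percent, label, work in stages:
        message = f"[{percent}%] {label}"
        print(message)
        messages.append(message)
        work()
    return messages


def noop() -> None:
    """Placeholder for real work such as loading data or building plots."""


demo_messages = run_stages([
    (10, "Loading data...", noop),
    (80, "Generating visual summaries...", noop),
    (100, "Report Ready!", noop),
])
```

Returning the messages as a list, in addition to printing them, keeps the sketch easy to check and would let a caller forward the same lines to a log file or a dashboard widget.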

HyperEDA supports over fifty of the most popular plots and visual analyses used in data science, including histograms, bar charts, scatter plots, pairplots, boxplots, violin plots, correlation heatmaps, distribution comparisons, and feature-wise statistics. Each visualization can be customized through the interface: users can select columns, adjust the number of bins, enable or disable annotations, change the sampling size, compare multiple variables side by side, and view all available metrics in one place. This makes HyperEDA a flexible experimentation hub for data exploration as well as a visual toolkit. The interface adapts to the system's theme (light or dark mode) for readability. It also provides comparison features for visualizing relationships between one, two, or multiple columns at once, including side-by-side comparisons of numerical and categorical attributes.

Beyond visuals, HyperEDA computes dataset-level metadata such as shape, column datatypes, missing value counts, duplicate records, and overall data health indicators. It generates numeric summaries (mean, median, mode, skewness, kurtosis, variance, standard deviation) and categorical summaries (unique value counts, frequency distributions, and category ratios).

At the end of the analysis, HyperEDA produces a summary paragraph: a natural language description of the dataset's characteristics, structure, and key insights. This summary acts as an automated "data story" that helps analysts interpret the dataset before moving on to modeling or feature engineering.
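The numeric and categorical summaries listed above map closely onto standard pandas methods. The following is a minimal sketch under that assumption; `numeric_summary`, `categorical_summary`, and the demo data are hypothetical and do not reflect how HyperEDA computes these values internally.

```python
import pandas as pd


def numeric_summary(series: pd.Series) -> dict:
    """Descriptive statistics for one numeric column (missing values are skipped)."""
    return {
        "mean": series.mean(),
        "median": series.median(),
        "mode": series.mode().tolist(),  # a column can have several modes
        "skewness": series.skew(),
        "kurtosis": series.kurt(),       # excess kurtosis, as pandas defines it
        "variance": series.var(),        # sample variance (ddof=1)
        "std": series.std(),             # sample standard deviation (ddof=1)
    }


def categorical_summary(series: pd.Series) -> dict:
    """Unique count, frequencies, and ratios for one categorical column."""
    counts = series.value_counts()
    return {
        "unique_values": int(series.nunique()),
        "frequencies": counts.to_dict(),
        "ratios": (counts / counts.sum()).to_dict(),
    }


# Made-up demo data, for illustration only.
demo = pd.DataFrame({
    "age": [21, 25, 25, 30, 44],
    "city": ["Pune", "Delhi", "Pune", "Pune", "Delhi"],
})
age_stats = numeric_summary(demo["age"])
city_stats = categorical_summary(demo["city"])
```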

From a technical perspective, HyperEDA is designed with modularity and extensibility in mind. The core module `EDA.py` defines the `EDA` class, which can be imported with `from HyperEDA import EDA`. This class encapsulates the full functionality of the tool: initialization, preprocessing, metadata generation, visualization, and dashboard creation. Users instantiate it with a DataFrame and run it in a single line such as `EDA(df).run()`. It handles sampling, detects column types, and prepares optimized subsets for plotting so that the dashboard stays interactive even with massive datasets.

The dashboard is powered by Streamlit, so users explore the dataset in a browser-based, interactive environment. Its sections are organized for clarity: a dataset overview at the top, univariate analysis panels next, correlation and pairwise sections for relationship exploration, and then detailed insights with an automatically generated textual summary at the end. Every visual section provides controls such as dropdowns, sliders, and checkboxes to adjust parameters in real time without reloading the app. The experience is closer to an intelligent assistant than to a static report.
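To make the sampling and column-type detection steps more concrete, here is a simplified sketch in pandas. It is an illustration under stated assumptions, not the code in `EDA.py`: the function names, the `max_rows` threshold, and the two-way numeric/categorical split are all hypothetical.

```python
import pandas as pd


def detect_column_types(df: pd.DataFrame) -> dict:
    """Split columns into numeric and everything else.

    Simplification: booleans, datetimes, and strings all land in "categorical".
    """
    numeric = df.select_dtypes(include="number").columns.tolist()
    categorical = df.select_dtypes(exclude="number").columns.tolist()
    return {"numeric": numeric, "categorical": categorical}


def sample_for_plotting(df: pd.DataFrame, max_rows: int = 10_000, seed: int = 0) -> pd.DataFrame:
    """Return df unchanged if it is small, else a reproducible random subset."""
    if len(df) <= max_rows:
        return df
    return df.sample(n=max_rows, random_state=seed)


# Made-up demo frame: 100 rows, one numeric and one string column.
frame = pd.DataFrame({"price": range(100), "label": ["a", "b"] * 50})
column_types = detect_column_types(frame)
plot_subset = sample_for_plotting(frame, max_rows=10)
```

Fixing `random_state` makes the subset reproducible, so plots do not change between reruns of the same session.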

In terms of workflow, HyperEDA follows a well-defined pipeline. When initialized, it logs each stage of progress, from data loading, cleaning, sampling, and statistical analysis to the final rendering of the interactive report. Intermediate data, such as the full dataset and the sampled subset, is stored in temporary directories (e.g., `/tmp/hypereda_timestamp/`) for quick access during visualization. These files are managed automatically and deleted after the session to preserve privacy and disk space. The progress messages are modeled on model training logs, giving users real-time insight into what the tool is doing internally and making the data preparation flow easier to follow.

HyperEDA's backend is built purely in Python on open-source libraries such as Pandas, NumPy, Matplotlib, Seaborn, Plotly, and Streamlit, so it relies on no closed-source APIs. Each of these libraries is used in compliance with its open-source terms, and HyperEDA itself is developed as an independent utility suitable for both local and enterprise environments. Users can run it locally, inside Jupyter notebooks, or integrate it into data pipelines where automatic EDA reports are required before model training.
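The pattern of keeping intermediate files in a temporary directory and removing them afterwards can be sketched with the standard library alone. This is an assumption about the general technique, not HyperEDA's actual storage code: `run_with_scratch_dir` and the file name are hypothetical, and `tempfile` appends a random suffix to the `hypereda_` prefix rather than a timestamp.

```python
import tempfile
from pathlib import Path


def run_with_scratch_dir(payload: bytes) -> tuple:
    """Write payload to a scratch directory, then let the directory be removed.

    Returns (file existed during the session, file size, directory exists afterwards).
    """
    with tempfile.TemporaryDirectory(prefix="hypereda_") as scratch:
        path = Path(scratch) / "sample.bin"
        path.write_bytes(payload)
        existed_during = path.exists()
        size = path.stat().st_size
    # Leaving the with-block deletes the directory and everything inside it.
    return existed_during, size, Path(scratch).exists()


scratch_result = run_with_scratch_dir(b"hello")
```

Using the context manager ties cleanup to the end of the block, so the files are removed even if an exception interrupts the session.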

What makes HyperEDA stand out is the level of control and interactivity it provides. While most automated EDA tools produce static HTML reports or notebook outputs, HyperEDA offers a live session that updates plots, summaries, and metrics based on user interaction. Users can view only selected features, change plotting parameters, switch between numeric and categorical analysis, or visualize the correlation heatmap with zoom and annotation options. The interface is designed to feel cohesive, with attention to typography, layout spacing, and responsiveness. Charts are rendered dynamically with Plotly for smooth zooming, panning, and hovering, while Seaborn provides statistical overlays such as regression fits and kernel density estimates.

The dashboard's top section shows dataset statistics cards summarizing record count, feature count, missing data percentage, and duplicate ratio in a minimalist layout. The middle section focuses on variable-level exploration, and the bottom section summarizes overall trends and generates a descriptive interpretation of the findings. Users can view the first few rows (`head`) of the dataset, choose how many rows to preview, or display the entire dataset. Every visualization and table in the app responds to user input.
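The four values on the statistics cards can be derived from a DataFrame in a few lines. The sketch below shows one plausible way to compute them; `overview_cards` and the exact definitions (missing percentage over all cells, duplicate ratio over full rows) are assumptions rather than HyperEDA's documented behavior.

```python
import pandas as pd


def overview_cards(df: pd.DataFrame) -> dict:
    """Compute the headline numbers for a dataset overview."""
    total_cells = df.size  # rows * columns
    missing_pct = 100.0 * df.isna().sum().sum() / total_cells if total_cells else 0.0
    # duplicated() marks every repeat of an earlier, fully identical row.
    duplicate_ratio = float(df.duplicated().mean()) if len(df) else 0.0
    return {
        "records": len(df),
        "features": df.shape[1],
        "missing_pct": float(missing_pct),
        "duplicate_ratio": duplicate_ratio,
    }


# Made-up demo frame: one missing cell out of eight, one repeated row out of four.
cards = overview_cards(pd.DataFrame({
    "a": [1, 1, 2, None],
    "b": ["x", "x", "y", "z"],
}))
```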

HyperEDA is designed for professional analysts as well as students, educators, and researchers who want a guided, interactive approach to data exploration. It bridges the gap between manual analysis and automation by combining automated reporting with the flexibility of user-controlled exploration. The codebase is structured for readability and maintainability, making it easy for developers to extend the tool with new visualization types or analytics modules. In the future, the Streamlit UI layer can be customized further to include machine learning previews, data cleaning modules, or feature selection tools.

HyperEDA's vision is to become the most comprehensive and intuitive open-source platform for exploratory data analysis: one that generates insights automatically but still gives full control to the user when needed. Whether used as a quick preview tool before modeling, a teaching aid in data science courses, or a data validation layer in enterprise workflows, HyperEDA aims to balance automation, insight, and user experience. Its combination of technical robustness, visual clarity, and interactive depth makes it a useful utility for anyone working with data.
