Metadata-Version: 2.1
Name: softhauzpy
Version: 0.0.1
Description-Content-Type: text/markdown
Requires-Dist: requests>=2.32.3
Requires-Dist: beautifulsoup4>=4.12.3
Requires-Dist: nltk>=3.9.4

# SofthauzPy
**SofthauzPy** is a comprehensive Python toolkit built for developers creating intelligent, data-driven web applications. It provides a powerful suite of web utilities including web scraping tools, crawling systems, content extraction pipelines, and search engine components that help developers build fully customizable in-house website search solutions.

Designed for scalability and flexibility, Softhauz enables teams to collect, process, index, and search website content efficiently, all within a clean Python-first development ecosystem.

From lightweight crawlers to a fully customizable in-house search engine, Softhauz helps developers build smarter web applications without relying heavily on external search services.


## Key Features

**Web Scraping & Crawling**

-   High-performance web scraping utilities
-   HTML parsing and structured data extraction
-   Recursive website crawling
-   Sitemap discovery and URL indexing
-   Support for asynchronous scraping workflows
-   Rate limiting and request handling utilities
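
As a rough illustration of the link-discovery step behind recursive crawling, the sketch below uses only the Python standard library. It does not show SofthauzPy's actual API; `LinkCollector` and `same_site_links` are hypothetical names chosen for this example.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collects href values from <a> tags (hypothetical helper, not SofthauzPy API)."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def same_site_links(html, base_url):
    """Return absolute, de-duplicated links that stay on base_url's host."""
    parser = LinkCollector()
    parser.feed(html)
    host = urlparse(base_url).netloc
    seen, links = set(), []
    for href in parser.hrefs:
        # Resolve relative links and drop any #fragment.
        absolute = urljoin(base_url, href).split("#", 1)[0]
        if urlparse(absolute).netloc == host and absolute not in seen:
            seen.add(absolute)
            links.append(absolute)
    return links
```

A crawler would typically call a function like this on each fetched page and push the returned URLs onto a queue, skipping any it has already visited.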

**Search Engine Toolkit**

-   In-house website search engine creation
-   Full-text indexing and querying
-   Custom relevance ranking algorithms
-   Search filtering and query optimization
-   Incremental indexing support
-   Lightweight search infrastructure for internal platforms
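
To make full-text indexing and relevance ranking concrete, here is a minimal in-memory inverted index with a plain term-frequency score. It is a sketch under simple assumptions, not the toolkit's implementation; `TinyIndex` and `tokenize` are hypothetical names.

```python
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


class TinyIndex:
    """Hypothetical inverted index: term -> {doc_id: term count}."""

    def __init__(self):
        self.postings = defaultdict(dict)

    def add(self, doc_id, text):
        # Record how often each term occurs in this document.
        for term, count in Counter(tokenize(text)).items():
            self.postings[term][doc_id] = count

    def search(self, query):
        """Rank documents by the summed frequency of the query terms."""
        scores = Counter()
        for term in tokenize(query):
            for doc_id, count in self.postings.get(term, {}).items():
                scores[doc_id] += count
        return [doc_id for doc_id, _ in scores.most_common()]
```

A custom ranking algorithm would replace the summed term frequency with another score, for example one that weights rare terms more heavily.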

**Content Processing**

-   Text normalization and cleaning
-   Metadata extraction
-   Duplicate content detection
-   Keyword extraction and tagging
-   Content chunking for AI and search applications
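
The normalization, duplicate detection, and chunking steps listed above can be sketched with the standard library alone. The function names and default sizes below are assumptions made for illustration and are not taken from SofthauzPy.

```python
import hashlib
import re


def normalize(text):
    """Collapse whitespace runs, trim the ends, and lowercase the text."""
    return re.sub(r"\s+", " ", text).strip().lower()


def fingerprint(text):
    """SHA-256 of the normalized text; equal values mean exact duplicates
    after normalization (near-duplicates are not detected)."""
    return hashlib.sha256(normalize(text).encode("utf-8")).hexdigest()


def chunk_words(text, size=200, overlap=20):
    """Split text into windows of `size` words, each sharing `overlap`
    words with the previous window."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be >= 0 and smaller than size")
    words = text.split()
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # the last window already reaches the end of the text
    return chunks
```

Overlapping windows keep a sentence that straddles a boundary visible in both neighbouring chunks, which tends to help retrieval at the cost of some repeated text.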

**AI & Semantic Search Ready**

-   Embedding generation helpers
-   Vector database compatibility
-   Semantic similarity search utilities
-   Retrieval-Augmented Generation (RAG) support
-   AI-powered content indexing workflows
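
Semantic similarity search usually comes down to comparing embedding vectors. The sketch below ranks documents by cosine similarity over plain Python lists; the vectors are toy values, and `cosine_similarity` and `top_k` are hypothetical helpers rather than SofthauzPy functions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as similar to nothing
    return dot / (norm_a * norm_b)


def top_k(query_vec, doc_vecs, k=3):
    """Return the ids of the k documents most similar to query_vec."""
    ranked = sorted(
        doc_vecs.items(),
        key=lambda item: cosine_similarity(query_vec, item[1]),
        reverse=True,
    )
    return [doc_id for doc_id, _ in ranked[:k]]
```

In a real pipeline the vectors would come from an embedding model and the ranking would normally be delegated to a vector database; the brute-force scan here is only practical for small collections.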

**Developer Experience**

-   Modular and extensible architecture
-   Framework-friendly design for Flask, Django, and FastAPI
-   Easy API integration
-   Clean, Pythonic interfaces
-   Production-ready utilities for scalable deployments

> This program may incorporate artificial intelligence (AI) tools solely
> to support and enhance development efficiency, code quality, and
> overall performance. All software design, implementation, testing,
> validation, and quality assurance processes are conducted and reviewed
> by a qualified human software professional to ensure accuracy,
> reliability, security, and compliance with applicable standards.

Author:
**Urate, Karen**<br>
*Softhauz Software Architect*<br>
[softhauz.ca](https://softhauz.ca)
