Metadata-Version: 2.4
Name: stocksstrategy
Version: 0.1.0
Summary: A CLI tool for stock selection and backtesting
Author-email: Asachoo <asa_choo@hotmail.com>
License: MIT
Classifier: Programming Language :: Python :: 3
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Requires-Python: >=3.14
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: click>=8.0.0
Requires-Dist: pydantic>=2.10.0
Requires-Dist: tushare>=1.4.0
Requires-Dist: pyarrow>=14.0.0
Requires-Dist: pandas>=2.0.0
Requires-Dist: sqlalchemy>=2.0.0
Requires-Dist: tenacity>=8.0.0
Requires-Dist: pandera>=0.29.0
Dynamic: license-file

# StocksStrategy

A CLI program for stock selection and backtesting.

## Usage

### Backtest

```bash
stocksstrategy backtest --stock 600519 --strategy ma20 --start 2020-01-01 --end 2024-12-31
```

**Note:** Backtest command is not yet fully implemented.

**Options:**

| Option | Required | Default | Description |
|--------|----------|---------|-------------|
| `--stock` | Yes | - | Stock code, supports comma-separated |
| `--strategy` | Yes | - | Strategy name |
| `--start` | Yes | - | Start date (YYYY-MM-DD) |
| `--end` | No | today | End date (YYYY-MM-DD) |
| `--capital` | No | 100000.0 | Initial capital |

### Fetch Data

```bash
stocksstrategy fetch --stock 600519 --start 2020-01-01
```

**Options:**

| Option | Required | Default | Description |
|--------|----------|---------|-------------|
| `--stock` | Yes | - | Stock code, supports comma-separated |
| `--start` | Yes | - | Start date (YYYY-MM-DD) |
| `--end` | No | today | End date (YYYY-MM-DD) |
| `--refresh` | No | False | Force refresh cache |

### List Data

```bash
stocksstrategy list
stocksstrategy list --sync
stocksstrategy list --sync --since 2024-01-01
```

**Options:**

| Option | Required | Default | Description |
|--------|----------|---------|-------------|
| `--sync` | No | False | Sync all cached stocks from Tushare |
| `--since` | No | - | Only sync data since this date (YYYY-MM-DD) |

### Configuration

```bash
# Set Tushare token
stocksstrategy config set token <your_token>

# Show current configuration
stocksstrategy config show

# Clear configuration
stocksstrategy config clear
```

**Commands:**

| Command | Description |
|---------|-------------|
| `config set token` | Set Tushare Pro API token |
| `config show` | Show current configuration (token shown as masked) |
| `config clear` | Clear all configuration |

**Note:** Token can also be set via `TUSHARE_TOKEN` environment variable. Data directory can be customized via `STOCKSSTRATEGY_DATA_DIR` environment variable.

When setting token via `config set token`, the token is saved to your shell configuration file (~/.bashrc or ~/.zshrc). You may need to run `source ~/.bashrc` or restart your terminal to apply changes.


## Architecture

### Design Principles

The data layer follows a **three-layer architecture** with clear separation of concerns:

```
Application
    ↓
Repository (Domain Coordination)
    ↓ depends on
DataCoordinator (Data Orchestration)
    ↓ coordinates
    ├─→ DataAcquisitionStrategy (Acquisition Logic)
    ├─→ DataStorage (Stock Data Persistence)
    ├─→ MetadataStorage (Dataset Metadata Persistence)
    └─→ DataSource (External Data Acquisition)
```

**Core Principles:**
- **Single Responsibility**: Each component has one clear purpose
- **Unidirectional Dependency**: Upper layers depend on lower layers, never the reverse
- **Protocol-Oriented**: All components use Protocol for abstraction and testability
- **Local Data Management**: Local storage is a working dataset, not a cache (external API is source of truth)
- **Separation of Decision and Execution**: Strategy (decision) is separate from Coordinator (execution)

---

### Layer 1: Repository (`stocksstrategy.data.repositories`)

**Role**: Domain Coordination Layer - Provides business-friendly API and domain logic

**Responsibilities:**
- ✅ Provide domain-friendly interface
- ✅ Handle business-level logic (multi-stock aggregation, data standardization)
- ✅ Coordinate multiple data sources when needed
- ✅ Error handling and propagation (not enforcement)
- ✅ Parameter parsing and normalization
- ✅ Request orchestration across components
- ❌ NO data acquisition decisions (delegated to DataAcquisitionStrategy)
- ❌ NO data filtering logic (delegated to DataAcquisitionStrategy)
- ❌ NO consistency guarantees across components (handled by individual components)

**API:**
- `get(stock_code, start_date, end_date) -> pd.DataFrame` - Get stock data (fetch from local or external source)
- `refresh(stock_code, start_date, end_date) -> pd.DataFrame` - Force refresh from external source
- `get_multiple(stock_codes, start_date, end_date) -> dict[str, pd.DataFrame]` - Batch query (sequential execution)
- `get_latest(stock_code, days) -> pd.DataFrame` - Get recent N days of data
- `get_dataset_info(stock_code) -> DatasetMetadata | None` - Get dataset metadata
- `clear_local_data(stock_code) -> None` - Clear local dataset for specific stock
- `exists(stock_code) -> bool` - Check if data exists in storage
- `get_date_range(stock_code) -> tuple[date | None, date | None]` - Get cached date range for a stock
- `list_all_stocks() -> list[str]` - List all cached stock codes

**Dependencies:**
- DataCoordinator (primary)
- (Future: multiple data sources, validators, etc.)

**Key Design:**
- Repository is a **domain façade** - orchestration only, no invariant enforcement
- Simple sequential execution for batch operations
- Domain-specific abstractions (e.g., `get_latest()` instead of manual date calculation)
- Isolates domain logic from data management details

**Note on Consistency:**
- Repository does NOT enforce cross-component consistency
- Consistency guarantees are provided by individual components (DataCoordinator)
- No distributed transactions - single-process file operations only



---

### Layer 2: DataCoordinator (`stocksstrategy.data.coordinator`)

**Role**: Data Orchestration Layer - Manages local dataset lifecycle

**Responsibilities:**
- ✅ Orchestrate data acquisition and storage workflow
- ✅ Coordinate DataAcquisitionStrategy, Storage, and DataSource
- ✅ Execute data workflow (load → check → fetch → merge → save)
- ✅ Ensure data-metadata consistency within this component
- ❌ NO decision logic (delegated to DataAcquisitionStrategy)
- ❌ NO data operations (delegated to DataAcquisitionStrategy)
- ❌ NO direct I/O (delegated to Storage)

**API:**
- `get(key, start_date, end_date) -> DataResult` - Get data with transparent acquisition
- `refresh(key, start_date, end_date) -> DataResult` - Force refresh from DataSource
- `invalidate(key)` - Invalidate local dataset
- `get_metadata(key) -> DatasetMetadata | None` - Get dataset metadata
- `list_all_keys() -> list[str]` - List all cached dataset keys

**Supporting Types:**
- `DataResult` - Result with data, source indicator, and metadata
- `DatasetMetadata` - Metadata (key, date_range, row_count, last_updated, version)

**Dependencies:**
- DataAcquisitionStrategy (for decisions)
- DataStorage (for data persistence)
- MetadataStorage (for dataset metadata)
- DataSource (for fetching missing data)

**Key Design:**
- **Thin orchestration layer** - delegates all decisions to Strategy
- **No business logic** - purely procedural coordination
- **Local dataset manager** - external API is source of truth, local storage is working dataset
- **Protocol-based** - easy to test with mocks

**Orchestration Flow:**
```
get(key, params)
  │
  ├─ Load metadata from MetadataStorage
  │
  ├─ Ask Strategy: is_local_data_valid(metadata, params)?
  │   └─ If valid → Load from DataStorage + Strategy.filter()
  │
  ├─ Ask Strategy: calculate_fetch_ranges(cached, requested)
  │   └─ If ranges needed → Fetch from DataSource
  │
  ├─ If new data → Ask Strategy: merge_data(local, new)
  │
  ├─ Atomic update (consistency guarantee):
  │   ├─ 1. Save merged data to DataStorage (partitioned by year)
  │   ├─ 2. Update MetadataStorage (atomic commit)
  │   └─ 3. On failure: metadata not updated, old dataset remains valid
  │
  └─ Return DataResult(data, source, metadata)
```

**Data Consistency Guarantee:**
- **Write Order**: Data first, then metadata (ensures no phantom references)
- **Failure Handling**: If metadata update fails, old dataset remains valid
- **Partition Isolation**: Updates only affect specific year partitions
- **Scope**: Component-level consistency only (data + metadata within DataCoordinator)

---

### Layer 3: Infrastructure Components

The infrastructure layer consists of four specialized components that CacheCoordinator depends on:

---

#### Component 3.1: DataAcquisitionStrategy (`stocksstrategy.data.strategy`)

**Role**: Decision Engine - Data acquisition logic and operations

**Responsibilities:**
- ✅ **Decide**: Local data validity based on metadata and request
- ✅ **Calculate**: Incremental update ranges
- ✅ **Determine**: Fetch strategy (full vs incremental)
- ✅ **Execute**: Data merging (deduplication, sorting)
- ✅ **Execute**: Data filtering by date range
- ✅ **Define**: Data acquisition policies (validity rules, gap tolerance, etc.)
- ❌ NO I/O operations
- ❌ NO external calls

**API:**

*Decision Methods:*
- `is_local_data_valid(metadata, requested_range)` - Check if local data is valid for request
- `calculate_fetch_ranges(local_range, requested_range)` - Calculate missing date ranges
- `should_use_incremental_update(local_range, requested_range, gap_days)` - Decide incremental vs full fetch

*Data Operation Methods:*
- `merge_data(local_data, new_data)` - Merge local and new data (deduplication, sorting)
- `filter_by_range(data, start_date, end_date)` - Filter data by date range

**Supporting Types:**
- `DateRange` - Date range with start and end dates
- `DatasetMetadata` - See DataCoordinator for definition

**Key Design:**
- **Stateless and pure** - all methods are deterministic
- **Pluggable** - different strategies for different scenarios
- **Testable** - pure functions, easy to unit test

**Strategy Implementations:**
- `DefaultAcquisitionStrategy` - Validity-based with incremental updates (configurable validity_days, max_gap_days)
- `AlwaysFreshStrategy` - *Future Extension:* Always fetch fresh data from external source
- `CustomStrategy` - *Future Extension:* Implement custom acquisition logic for specific use cases

**Decision Logic:**
- **Local Data Valid**: Requested range within local range → return local data
- **Incremental Update**: Partial overlap → fetch only missing ranges (e.g., local 2024-01 to 2024-06, request 2024-01 to 2024-12 → fetch 2024-07 to 2024-12)
- **Full Refresh**: Large gap between local and requested → full fetch more efficient than incremental

**Future Split Trigger** (when scale increases):
- When merge_data() exceeds 50 lines
- When need to add split/dividend adjustment
- When need multiple merge strategies
- Then split into: DataValidityPolicy + DataAcquisitionPlanner + DataTransformer

---

#### Component 3.2: Storage (`stocksstrategy.data.storage`)

**Role**: Persistence Layer - Pure CRUD operations without logic

**Responsibilities:**
- ✅ Save/Load/Delete operations
- ✅ File/Database management
- ❌ NO data validation
- ❌ NO business decisions
- ❌ NO data transformation

**API:**

*DataStorage Protocol:*
- `save(key, data, partition_key)` - Save stock data to specific partition
- `load(key, partition_keys)` - Load stock data from partitions
- `list_partitions(key)` - List available partitions for a key
- `exists(key, partition_key)` - Check if partition exists
- `delete(key, partition_key)` - Delete specific partition

*MetadataStorage Protocol:*
- `save_metadata(key, metadata)` - Atomically save cache metadata
- `get_metadata(key)` - Get cache metadata
- `exists(key)` - Check if metadata exists
- `delete_metadata(key)` - Delete cache metadata
- `list_all_keys() -> list[str]` - List all dataset keys that have metadata

**Implementations:**
- `ParquetStorage` - Columnar data storage with **year-based partitioning**
  ```
  data/
    600519.SH/
      2020.parquet  # Only this partition updated when adding 2020 data
      2021.parquet
      2024.parquet
  ```
- `SqliteMetadataStorage` - Relational storage for dataset metadata with atomic commits

**Key Design:**
- **Partitioning Strategy**: Data partitioned by year to avoid full-file rewrites
- **Incremental Updates**: Only affected year partitions are rewritten
- **Write Performance**: O(partition_size) instead of O(total_size)
- **Metadata Atomicity**: SQLite transaction ensures metadata consistency
- Storage is "dumb" - saves what it receives, returns what it has
- No validation or transformation
- Easy to swap implementations (e.g., Parquet → DuckDB, SQLite → PostgreSQL)

**Performance Characteristics:**
- **Full Dataset**: 10 years × 10 stocks = 100MB (10 partitions per stock)
- **Incremental Update**: Only rewrite affected partition (~1MB)
- **Merge Cost**: O(partition_size) instead of O(dataset_size)
- **Query Performance**: Read only required partitions

---

#### Component 3.3: DataSource (`stocksstrategy.data.sources`)

**Role**: Acquisition Layer - External data fetching and validation

**Responsibilities:**
- ✅ Fetch data from external APIs
- ✅ **Validate data schema** (boundary validation)
- ✅ Format conversion (API format → internal format)
- ✅ Error handling and retry mechanisms
- ❌ NO caching logic

**API:**
- `fetch(stock_code, start_date, end_date)` - Fetch stock data from external API
- `validate(data)` - Validate data schema and format
- `is_available()` - Check if data source is available
- `get_listing_date(stock_code) -> date | None` - Get listing date for a stock

**Current Implementation:**
- `TushareDataSource` - Tushare Pro API integration
  - **This is the source of truth for stock data**
  - Automatic retry for rate limits
  - Stock code formatting (`600519` → `600519.SH`)
  - Schema validation with Pandera
  - Date normalization

**Key Design:**
- DataSource is called by DataCoordinator, not Repository
- Data validation happens at system boundary
- **External API is source of truth, local storage is working dataset**
- Returns validated, normalized data ready for local storage

---

### Dependency Graph

```
┌──────────────────────────────────────┐
│        Repository                    │ ← Layer 1: Domain
│  (Domain Coordination)               │
└────────────────┬─────────────────────┘
                 │ depends on
                 ▼
┌──────────────────────────────────────┐
│     DataCoordinator                  │ ← Layer 2: Orchestration
│  (Data Management)                   │
└──┬────────┬──────────┬──────────┬────┘
   │        │          │          │ depends on
   ▼        ▼          ▼          ▼
┌────────┐ ┌────────┐ ┌────────┐ ┌────────┐
│Strategy│ │Data    │ │Metadata│ │Data    │ ← Layer 3: Infrastructure
│        │ │Storage │ │Storage │ │Source  │    (4 components)
└────────┘ └────────┘ └────────┘ └────────┘
     │                              │
     │                              └─ (Source of Truth)
     └─ (Stateless, no dependencies)
```

**Layering Rationale:**
- **Layer 1 - Repository**: High-level domain abstraction (façade only)
- **Layer 2 - DataCoordinator**: Mid-level orchestration (manages local dataset)
- **Layer 3 - Infrastructure**: Four specialized components (Strategy, DataStorage, MetadataStorage, DataSource)

---

### Responsibility Matrix

| Function | Repository | DataCoordinator | DataAcquisitionStrategy | Storage | DataSource |
|----------|------------|-----------------|------------------------|---------|------------|
| Domain API | ✅ | ❌ | ❌ | ❌ | ❌ |
| Multi-source coordination | ✅ | ❌ | ❌ | ❌ | ❌ |
| Batch operations | ✅ | ❌ | ❌ | ❌ | ❌ |
| Data standardization | ✅ | ❌ | ❌ | ❌ | ❌ |
| Parameter parsing | ✅ | ❌ | ❌ | ❌ | ❌ |
| List all stocks/keys | ✅ | ✅ | ❌ | ❌ | ❌ |
| Data workflow orchestration | ❌ | ✅ | ❌ | ❌ | ❌ |
| Component coordination | ❌ | ✅ | ❌ | ❌ | ❌ |
| Data-metadata consistency | ❌ | ✅ | ❌ | ❌ | ❌ |
| Local data validity check | ❌ | ❌ | ✅ | ❌ | ❌ |
| Incremental range calc | ❌ | ❌ | ✅ | ❌ | ❌ |
| Data filtering | ❌ | ❌ | ✅ | ❌ | ❌ |
| Data merging | ❌ | ❌ | ✅ | ❌ | ❌ |
| Acquisition decisions | ❌ | ❌ | ✅ | ❌ | ❌ |
| CRUD operations | ❌ | ❌ | ❌ | ✅ | ❌ |
| File/DB management | ❌ | ❌ | ❌ | ✅ | ❌ |
| API calls | ❌ | ❌ | ❌ | ❌ | ✅ |
| Schema validation | ❌ | ❌ | ❌ | ❌ | ✅ |
| Retry mechanism | ❌ | ❌ | ❌ | ❌ | ✅ |
| Listing date query | ❌ | ❌ | ❌ | ❌ | ✅ |

---

### Data Flow Example

**User Request**: `repository.get("600519", date(2024, 1, 1), date(2024, 12, 31))`

```
1. Repository (Domain Layer)
   ├─ Normalize stock_code: "600519"
   ├─ Validate date range
   └─ Call: coordinator.get(key="600519", params=...)
   
2. DataCoordinator (Orchestration Layer)
   ├─ Load metadata from MetadataStorage
   │  └─ Found: local range = [2024-01-01, 2024-06-30]
   │
   ├─ Ask Strategy: calculate_fetch_ranges(...)
   │  └─ Returns: [DateRange(2024-07-01, 2024-12-31)]
   │
   ├─ Load local data from DataStorage (partitions: 2024.parquet)
   │  └─ Returns: DataFrame with 120 rows
   │
   ├─ Fetch missing data from DataSource (source of truth)
   │  └─ DataSource.fetch("600519", 2024-07-01, 2024-12-31)
   │     └─ Returns: DataFrame with 123 rows (validated)
   │
   ├─ Ask Strategy: merge_data(local, new)
   │  └─ Returns: DataFrame with 243 rows (deduplicated, sorted)
   │
   ├─ Atomic update with consistency guarantee:
   │  ├─ 1. DataStorage.save("600519", merged_data, partition_key="2024")
   │  │    └─ Writes to 2024.parquet (only this partition updated)
   │  ├─ 2. MetadataStorage.save_metadata("600519", new_metadata)
   │  │    └─ SQLite transaction commits atomically
   │  └─ On failure: old metadata retained, old dataset remains consistent
   │
   ├─ Ask Strategy: filter_by_range(merged, 2024-01-01, 2024-12-31)
   │  └─ Returns: DataFrame with 243 rows
   │
   └─ Return: DataResult(data, source="partial", metadata)

3. Repository
   ├─ (Optional) Post-processing
   └─ Return DataFrame to user
```

---

## Architecture Philosophy

**This is a personal data management tool for backtesting, NOT a production data platform.**

### Conceptual Model

**What This System Is:**
- **Local Data Manager**: Manages working datasets for offline analysis
- **Request-driven Acquisition**: Fetches data on-demand when needed
- **Single-user CLI Tool**: Designed for personal use, not multi-user server

**What This System Is NOT:**
- ❌ NOT a production data lake or data warehouse
- ❌ NOT a real-time streaming system
- ❌ NOT a distributed multi-user platform
- ❌ NOT a "cache" (external API is source of truth, local storage is working dataset)

### Source of Truth

**External API (DataSource) = Source of Truth**
- Tushare API (or other data providers) is the canonical source
- Local storage is a **materialized copy** for offline work
- Data refresh = synchronize with external truth

**Local Storage = Working Dataset**
- Not a "cache" but a persistent working dataset
- Enables offline backtesting without repeated API calls
- Can be cleared and rebuilt from source at any time

### Data Acquisition Model

**Primary Mode: Request-driven**
```python
# User query triggers data fetch if missing
repository.get("600519", start, end)  # Fetches if not locally available
```

**Trade-offs:**
- ✅ Flexible: Query any stock anytime
- ✅ Simple: No scheduling infrastructure
- ❌ Query latency varies (fast if local, slow if fetch needed)
- ❌ API rate limits coupled with user queries

**Future Extension: Scheduled Ingestion**
```python
# Periodic batch updates (when needed)
scheduler = DataIngestionScheduler(repository)
scheduler.schedule_daily_update(stock_list)
```

**When to add Scheduler:**
- When running production backtests (need predictable data)
- When analyzing many stocks regularly
- When query latency variance becomes problematic

### Consistency Model

**Component-level Consistency:**
- ✅ DataCoordinator ensures data-metadata consistency
- ✅ Atomic file operations (write data → commit metadata)
- ✅ Partition isolation (updates only affect changed partitions)

**NO System-level Consistency:**
- ❌ Repository does NOT enforce cross-component invariants
- ❌ No distributed transactions across Storage + DataSource
- ❌ No multi-process concurrency control
- ✅ Single-process, single-writer model only

### Design Assumptions

1. **Stock historical data is append-mostly**
   - Past data rarely changes (except corporate actions)
   - Most updates add new dates, not modify old data

2. **Queries request contiguous date ranges**
   - Typical: "Get 2020-2024 daily data"
   - Not optimized for: "Get every Monday from 2010-2024"

3. **Updates happen periodically, not continuously**
   - Daily or weekly updates, not real-time streaming
   - Acceptable to have slightly stale data

4. **Single-process access**
   - CLI tool run by one user
   - No concurrent writes from multiple processes

### Scalability Limits

**Suitable For:**
- ✅ 10-100 stocks
- ✅ 10-20 years daily data per stock
- ✅ ~1GB total storage
- ✅ Single-user analysis

**NOT Suitable For:**
- ❌ Tick-level data (millions of rows)
- ❌ Real-time streaming updates
- ❌ Multi-user concurrent access
- ❌ Production trading systems

---

### Architectural Constraints and Trade-offs

**Data Consistency Model:**
- **Consistency Level**: Best-effort, not ACID-compliant
- **No Distributed Transactions**: Single-process file operations only
- **Failure Recovery**: Stale cache remains valid on update failure
- **Concurrency Model**: Single-writer, multiple-readers (file system level)

**Performance Trade-offs:**
- **Partitioning**: Year-based partitioning reduces write amplification
  - ✅ Incremental updates: O(partition_size)
  - ❌ Small queries may read multiple partitions
- **Merge Strategy**: In-memory merge limits dataset size
  - ✅ Simple implementation, no external dependencies
  - ❌ Memory usage: O(partition_size) per operation

**Scalability Limits:**
- **Single Stock Dataset**: 10-20 years daily data (~50K rows, ~5-10MB) ✅
- **Total Storage**: 100 stocks × 10MB = ~1GB ✅
- **Concurrent Requests**: Limited by file system locks
- **Not Suitable For**: 
  - Tick-level data (millions of rows)
  - Real-time streaming updates
  - Multi-process concurrent writes

**Design Assumptions:**
- Stock historical data is **append-mostly** (past data rarely changes)
- Queries typically request **contiguous date ranges**
- Updates happen **periodically** (daily/weekly), not continuously
- Single-process access (CLI tool, not multi-user server)

---

### Benefits

**Single Responsibility Principle Enforced:**
- Repository: Domain façade only (no invariant enforcement)
- DataCoordinator: Orchestration and component-level consistency
- DataAcquisitionStrategy: Decisions and data operations only
- Storage: Persistence only
- DataSource: Acquisition only (source of truth)

**Testability:**
- Each component can be tested in isolation
- Strategy is pure functions → trivial to test
- Coordinator can mock Strategy and Storage
- Repository can mock Coordinator
- Protocol-based design enables easy test doubles

**Maintainability:**
- Clear boundaries prevent responsibility leakage
- Changes to acquisition logic only affect DataAcquisitionStrategy
- Changes to storage only affect Storage implementations
- Each component is under 200 lines of focused code

**Extensibility:**
- **Swap Strategies**: Implement `AlwaysFreshStrategy` *(future extension)*, `ValidityBasedStrategy` *(future extension)*, etc.
- **Swap Storage**: Parquet → DuckDB, no changes to Coordinator or Repository
- **Add DataSources**: Support multiple data providers (Tushare, Wind, etc.)
- **Extend Repository**: Add batch operations, data fusion, validation pipelines
- **Add Scheduler**: Implement scheduled ingestion when needed (optional extension)

**Transparency:**
- Users call simple `repository.get()` without knowing:
  - Whether data comes from local storage or external source
  - How incremental updates work
  - What acquisition strategy is used
  - Storage implementation details

**Flexibility:**
- Different strategies for different scenarios
- Strategy can be configured or swapped at runtime
- Repository can coordinate multiple data sources
- Local data can be bypassed for always-fresh queries
