README.md updated
118
README.md
@@ -58,7 +58,7 @@ This project uses [uv](https://github.com/astral-sh/uv) for fast dependency mana
uv venv --python 3.13
```

3. **Activate the environment**
3. **Environment**

- Linux/macOS:

@@ -72,7 +72,7 @@ This project uses [uv](https://github.com/astral-sh/uv) for fast dependency mana
.venv\Scripts\activate
```

4. **Install dependencies**
4. **Dependencies**
This command installs locked dependencies and links the local `semeion` package in editable mode.

```bash
@@ -81,8 +81,6 @@ This project uses [uv](https://github.com/astral-sh/uv) for fast dependency mana
### Running the Application

You can execute the module directly:

```bash
python src/semeion/main.py
```
@@ -93,116 +91,6 @@ python src/semeion/main.py
pytest
```

## Data Flow (subject to change)

### Ingestion Pipeline

```bash
Raw Evidence Sources
├─ Forensic Images (E01, DD, AFF4)
├─ Timeline CSV (Timesketch format)
└─ Loose Files (documents, logs, databases)
        │
        ▼
┌────────────────────────┐
│ Artifact Extraction    │
│ • pytsk3 (images)      │
│ • CSV parser           │
│ • File processors      │
└───────┬────────────────┘
        │
        ▼
┌────────────────────────┐
│ Content Extraction     │
│ • PDF, DOCX, XLSX      │
│ • SQLite databases     │
│ • Text files           │
│ • OCR for images       │
└───────┬────────────────┘
        │
        ▼
┌────────────────────────┐
│ Semantic Enrichment    │
│ • Classify type        │
│ • Extract entities     │
│ • Detect relationships │
│ • Add metadata         │
└───────┬────────────────┘
        │
        ▼
┌────────────────────────┐
│ Embedding Generation   │
│ → Remote/Local Service │
└───────┬────────────────┘
        │
        ▼
┌────────────────────────┐
│ Index in Qdrant        │
│ • Vector + Payload     │
│ • Create indexes       │
│ • Snapshot for audit   │
└────────────────────────┘
```
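
The stages in the diagram above can be read as a chain of functions applied to each artifact in turn. The following is a minimal sketch of that idea only, not the project's actual code: `Artifact`, `extract_content`, `enrich`, and `run_pipeline` are hypothetical names, and the stage bodies are placeholders standing in for real parsers, OCR, and classifiers.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable


@dataclass
class Artifact:
    """Hypothetical record passed from stage to stage."""
    source: str
    text: str = ""
    metadata: dict = field(default_factory=dict)


# A stage takes an artifact and returns the (possibly updated) artifact.
Stage = Callable[[Artifact], Artifact]


def extract_content(artifact: Artifact) -> Artifact:
    # Placeholder: a real stage would parse PDF/DOCX/SQLite or run OCR.
    artifact.text = artifact.metadata.get("raw", "").strip()
    return artifact


def enrich(artifact: Artifact) -> Artifact:
    # Placeholder classification based only on the file suffix.
    artifact.metadata["type"] = "log" if artifact.source.endswith(".log") else "document"
    return artifact


def run_pipeline(artifacts: Iterable[Artifact], stages: list[Stage]) -> list[Artifact]:
    """Apply every stage, in order, to every artifact."""
    processed = []
    for artifact in artifacts:
        for stage in stages:
            artifact = stage(artifact)
        processed.append(artifact)
    return processed
```

Keeping each stage a plain function makes it easy to reorder, skip, or test stages in isolation; embedding and indexing would simply be further entries in the `stages` list.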

Reproducibility: Each ingestion run generates a manifest file containing:

- Source hashes (MD5/SHA256 of evidence)
- Model versions (embedding model, LLM)
- Configuration parameters
- Processing statistics
- Timestamp and operator ID

This manifest allows exact reproduction of the index from the same source data.
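
As an illustration of the manifest described above, here is a minimal sketch using only the Python standard library. The field names (`sources`, `models`, `config`, `stats`, `operator_id`) and the JSON layout are assumptions made for this example; the README does not specify the actual manifest schema.

```python
import hashlib
import json
from datetime import datetime, timezone
from pathlib import Path


def hash_file(path: Path, chunk_size: int = 1 << 20) -> dict[str, str]:
    """Stream a file once and return its MD5 and SHA-256 hex digests."""
    md5, sha256 = hashlib.md5(), hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            md5.update(chunk)
            sha256.update(chunk)
    return {"md5": md5.hexdigest(), "sha256": sha256.hexdigest()}


def build_manifest(sources, models, config, stats, operator_id):
    """Assemble the manifest as a plain dict (hypothetical schema)."""
    return {
        "sources": {str(p): hash_file(Path(p)) for p in sources},
        "models": models,        # e.g. embedding model and LLM versions
        "config": config,        # configuration parameters of the run
        "stats": stats,          # processing statistics
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "operator_id": operator_id,
    }


def write_manifest(manifest: dict, out_path: Path) -> None:
    """Write the manifest as sorted, indented JSON so diffs stay stable."""
    out_path.write_text(json.dumps(manifest, indent=2, sort_keys=True))
```

Hashing in chunks keeps memory use flat even for multi-gigabyte forensic images, and recording both digests matches the "MD5/SHA256" wording in the list above.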

### Query Execution Pipeline

```bash
Natural Language Query
"bitcoin transaction after drug deal"
        │
        ▼
┌────────────────────────┐
│ LLM Query Parser       │
│ → Remote/Local Service │
│ Returns: JSON Plan     │
└───────┬────────────────┘
        │
        ▼
┌────────────────────────┐
│ Query Plan Editor (UI) │
│ • Review plan          │
│ • Adjust parameters    │
│ • Modify steps         │
│ • User approves        │
└───────┬────────────────┘
        │
        ▼
┌────────────────────────┐
│ Search Orchestrator    │
│ • Execute Step 1       │
│ • Extract timestamps   │
│ • Execute Step 2       │
│ • Apply temporal logic │
└───────┬────────────────┘
        │
        ▼
┌────────────────────────┐
│ Correlation Engine     │
│ • Calculate proximity  │
│ • Weight scores        │
│ • Build relationships  │
└───────┬────────────────┘
        │
        ▼
┌────────────────────────┐
│ Results Presentation   │
│ • Timeline view        │
│ • Correlation graph    │
│ • Export options       │
└────────────────────────┘
```
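
To make the "Apply temporal logic" and "Calculate proximity" steps more concrete, the sketch below pairs hits from a first search step with hits from a second step that occur afterwards within a time window, then weights each pair by how close together the two events are. The `Hit` type, the `correlate_after` function, and the linear proximity weighting are assumptions chosen for illustration; the README does not describe the actual scoring used.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Hit:
    """Hypothetical search result: an ID, an event time, and a similarity score."""
    id: str
    timestamp: datetime
    score: float


def correlate_after(anchors, candidates, window: timedelta):
    """Return (anchor_id, candidate_id, weight) for candidates that follow an anchor.

    A candidate qualifies if it occurs strictly after the anchor and no later
    than `window` afterwards. The weight combines both similarity scores with a
    proximity factor that falls linearly from 1.0 to 0.0 across the window.
    """
    pairs = []
    for anchor in anchors:
        for cand in candidates:
            delta = cand.timestamp - anchor.timestamp
            if timedelta(0) < delta <= window:
                proximity = 1.0 - delta / window  # timedelta / timedelta is a float
                pairs.append((anchor.id, cand.id, anchor.score * cand.score * proximity))
    return sorted(pairs, key=lambda pair: pair[2], reverse=True)
```

For the example query in the diagram, the anchors would be hits for the "drug deal" step and the candidates hits for the "bitcoin transaction" step; the nested loop is quadratic and would need an index or sorted merge for large result sets.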

## Technical Stack

### Core Technologies
@@ -249,8 +137,6 @@ Natural Language Query
- TBD, out of scope

---

## Supported Ingestion Formats

### Primary: Specialized Data Objects