llm-eval-forensics/test_suite.md
2026-01-16 09:18:07 +01:00


# IT Forensics Tests
This document provides detailed explanations of the IT Forensics tests in the evaluation suite.
## Overview
The forensics tests are designed to evaluate an AI model's ability to:
1. Interpret raw hex data from various forensic artifacts
2. Apply domain knowledge of file systems, registry, and network protocols
3. Perform accurate byte-order conversions (little-endian for NTFS and registry structures, big-endian for network headers)
4. Correlate events and reconstruct timelines
5. Explain technical concepts clearly
## 🔍 Test Breakdown
### IT Forensics - File Systems
#### Test: forensics_mft_01 - MFT Entry Analysis (Basic)
**Purpose**: Evaluate basic NTFS Master File Table interpretation

**Key Concepts**:
- MFT Signature: "FILE" (46 49 4C 45 in hex, ASCII)
- Entry flags at offset 0x16:
  - 0x01 = In use
  - 0x02 = Directory
- Sequence number: 16-bit value at offset 0x10 (little-endian)

**Example Hex Dump**:
```text
Offset(h) 00 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F
00000000 46 49 4C 45 30 00 03 00 95 1F 23 00 00 00 00 00
00000010 01 00 01 00 38 00 01 00 A0 01 00 00 00 04 00 00
```
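The Expected Analysis below can be reproduced mechanically. This is a minimal Python sketch, assuming the 32 bytes shown are the start of a single MFT record; `parse_mft_header` is an illustrative name. It also decodes the hard link count (0x12) and first attribute offset (0x14), which are present in the dump but not required by this test.

```python
import struct

# MFT entry header, little-endian ("<"), no padding (24 bytes):
# 4s signature | H USA offset | H USA size | Q $LogFile LSN | H sequence
# | H hard link count | H first attribute offset | H flags
MFT_HEADER = struct.Struct("<4sHHQHHHH")

FLAG_IN_USE = 0x01
FLAG_DIRECTORY = 0x02


def parse_mft_header(record: bytes) -> dict:
    """Decode the fixed header fields at the start of an NTFS MFT entry."""
    (sig, usa_off, usa_size, lsn, seq,
     links, first_attr, flags) = MFT_HEADER.unpack_from(record, 0)
    if sig != b"FILE":
        raise ValueError(f"not an MFT entry: signature {sig!r}")
    return {
        "usa_offset": usa_off,
        "usa_size": usa_size,
        "lsn": lsn,
        "sequence": seq,
        "link_count": links,
        "first_attr_offset": first_attr,
        "in_use": bool(flags & FLAG_IN_USE),
        "directory": bool(flags & FLAG_DIRECTORY),
    }


# The two rows of the example dump (bytes.fromhex skips the spaces).
dump = bytes.fromhex(
    "46 49 4C 45 30 00 03 00 95 1F 23 00 00 00 00 00 "
    "01 00 01 00 38 00 01 00 A0 01 00 00 00 04 00 00"
)
print(parse_mft_header(dump))
```

For this dump the LSN decodes to 0x231F95 (2,301,845) and the first attribute starts at offset 0x38.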
**Expected Analysis**:
- Signature: "FILE" (offsets 0x00-0x03)
- Update Sequence Offset: 0x0030 (offsets 0x04-0x05, little-endian)
- Update Sequence Size: 0x0003 (offsets 0x06-0x07, little-endian)
- Sequence Number: 0x0001 (offsets 0x10-0x11, little-endian)
- Flags: 0x0001 at offset 0x16 = In use

**Scoring Criteria**:
- 5 points: Identifies all fields correctly with offset references
- 3-4 points: Identifies most fields, minor errors in interpretation
- 1-2 points: Recognizes MFT but misses key fields
- 0 points: Cannot identify as MFT entry
---
#### Test: forensics_mft_02 - MFT Entry Analysis (Advanced)
**Purpose**: Deep understanding of MFT structure

**Additional Concepts**:
- Update Sequence Array (USA): fixup mechanism that detects torn (partially written) multi-sector records
- $LogFile Sequence Number (LSN): sequence number of the last $LogFile transaction that changed this entry
- First Attribute Offset: where attribute records begin
- MFT Entry Flags: bitfield indicating file properties

**Key Offsets**:
- 0x00-0x03: Signature "FILE"
- 0x04-0x05: Update Sequence Offset
- 0x06-0x07: Update Sequence Size
- 0x08-0x0F: $LogFile Sequence Number (LSN, 64-bit)
- 0x10-0x11: Sequence Number
- 0x12-0x13: Hard Link Count
- 0x14-0x15: First Attribute Offset
- 0x16-0x17: Flags (0x01=in use, 0x02=directory)

**Example Analysis for LSN**:
```text
Offset 08: EA 3F 00 00 00 00 00 00
Little-endian 64-bit: 0x0000000000003FEA = 16362 decimal
```
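The Update Sequence Array has to be applied before any attribute is parsed: NTFS overwrites the last two bytes of each 512-byte block in the record with the update sequence number and keeps the original bytes in the array. A minimal sketch, assuming a 512-byte fixup stride (the common case); `apply_fixups` is a hypothetical helper name.

```python
import struct

STRIDE = 512  # assumption: fixups cover the last 2 bytes of each 512-byte block


def apply_fixups(record: bytes, stride: int = STRIDE) -> bytes:
    """Verify and undo the update-sequence fixups of one MFT record."""
    buf = bytearray(record)
    usa_offset, usa_size = struct.unpack_from("<HH", buf, 0x04)
    # Entry 0 is the update sequence number (USN); entries 1..usa_size-1
    # hold the original last two bytes of each block.
    usn = bytes(buf[usa_offset:usa_offset + 2])
    for i in range(1, usa_size):
        end = i * stride
        if bytes(buf[end - 2:end]) != usn:
            # This block was not written together with the rest: torn write.
            raise ValueError(f"fixup mismatch in block {i}")
        saved = buf[usa_offset + 2 * i:usa_offset + 2 * i + 2]
        buf[end - 2:end] = saved
    return bytes(buf)
```

With the header from forensics_mft_01 (offset 0x0030, size 3), a 1,024-byte record carries one update sequence number plus two saved values, one per 512-byte block.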
**Scoring Criteria**:
- 5 points: All fields correct with little-endian conversion shown
- 3-4 points: Most fields correct, minor calculation errors
- 1-2 points: Understands structure but significant errors
- 0 points: Cannot parse MFT header
---
#### Test: forensics_signature_01 - File Signature Identification
**Purpose**: Recognition of common file magic numbers

**Magic Numbers to Know**:

| Signature | File Type | Notes |
|-----------|-----------|-------|
| FF D8 FF E0 | JPEG | FF D8 = start of image; FF E0 = APP0 segment, usually "JFIF" (FF E1 = Exif) |
| 89 50 4E 47 0D 0A 1A 0A | PNG | `\x89PNG\r\n\x1a\n`; the non-ASCII byte, CR LF and Ctrl-Z detect transfer corruption |
| 25 50 44 46 | PDF | "%PDF" in ASCII |
| 50 4B 03 04 | ZIP | "PK" local file header; also DOCX/XLSX/JAR containers |
| 52 61 72 21 1A 07 | RAR | "Rar!" + 1A 07, then 00 (RAR 1.5-4.x) or 01 00 (RAR 5) |
| 4D 5A | EXE/DLL | DOS "MZ" header |
| 7F 45 4C 46 | ELF | `\x7FELF`; Linux/Unix executables and shared objects |

**Test Example**:
```text
A) FF D8 FF E0 00 10 4A 46 49 46
→ JPEG (FF D8 FF + JFIF marker)
B) 50 4B 03 04 14 00 06 00
→ ZIP/DOCX/XLSX (PKZip format)
```
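Prefix matching against the table above fits in a few lines of Python. The dictionary and `identify` are illustrative; JPEG is matched on FF D8 FF alone so that Exif files (FF E1) are caught too. A real tool should also validate structure past the first bytes, because magic numbers are trivial to forge.

```python
# Magic numbers from the table above (JPEG shortened to FF D8 FF).
SIGNATURES = {
    bytes.fromhex("89 50 4E 47 0D 0A 1A 0A"): "PNG",
    bytes.fromhex("52 61 72 21 1A 07"): "RAR",
    bytes.fromhex("FF D8 FF"): "JPEG",
    bytes.fromhex("25 50 44 46"): "PDF",
    bytes.fromhex("50 4B 03 04"): "ZIP (also DOCX/XLSX/JAR)",
    bytes.fromhex("7F 45 4C 46"): "ELF",
    bytes.fromhex("4D 5A"): "EXE/DLL (MZ)",
}


def identify(header: bytes) -> str:
    """Return the file type whose magic number prefixes `header`."""
    # Longest signatures first, so a more specific match always wins.
    for magic in sorted(SIGNATURES, key=len, reverse=True):
        if header.startswith(magic):
            return SIGNATURES[magic]
    return "unknown"


print(identify(bytes.fromhex("FF D8 FF E0 00 10 4A 46 49 46")))  # JPEG
print(identify(bytes.fromhex("50 4B 03 04 14 00 06 00")))        # ZIP (also DOCX/XLSX/JAR)
```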
**Scoring Criteria**:
- 5 points: All signatures identified with explanations
- 3-4 points: Most correct, understands concept
- 1-2 points: Recognizes some but misses key ones
- 0 points: Cannot identify file signatures
---
### IT Forensics - Registry & Artifacts
#### Test: forensics_registry_01 - Windows Registry Hive Header
**Purpose**: Parse Windows Registry binary format

**Key Structure**:
```text
Offset Field Size
0x00 Signature "regf" 4 bytes
0x04 Primary Seq Number 4 bytes (little-endian)
0x08 Secondary Seq Number 4 bytes (little-endian)
0x0C Timestamp 8 bytes (FILETIME)
0x14 Major Version 4 bytes
0x18 Minor Version 4 bytes
```
**Example**:
```text
Offset(h) 00 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F
00000000 72 65 67 66 E6 07 00 00 E6 07 00 00 00 00 00 00
Analysis:
- Signature: "regf" (72 65 67 66)
- Primary Seq: 0x000007E6 = 2022 decimal
- Secondary Seq: 0x000007E6 = 2022 decimal
- Primary == Secondary: hive was written cleanly (a mismatch marks a dirty hive with pending transaction-log data)
```
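A minimal sketch of parsing the fields listed under Key Structure with Python's `struct` module. The example dump stops partway into the timestamp, so the usage builds a hypothetical 28-byte header: the sequence numbers come from the dump, while the timestamp bytes (borrowed from the FILETIME test below) and version 1.5 are assumptions. `parse_regf_header` is an illustrative name.

```python
import struct
from datetime import datetime, timedelta, timezone

# Base-block fields at 0x00-0x1B, little-endian, no padding:
# 4s signature | I primary seq | I secondary seq | Q last-written FILETIME
# | I major version | I minor version
REGF_HEADER = struct.Struct("<4sIIQII")
FILETIME_EPOCH = datetime(1601, 1, 1, tzinfo=timezone.utc)


def parse_regf_header(data: bytes) -> dict:
    """Decode the first 28 bytes of a registry hive base block."""
    sig, primary, secondary, filetime, major, minor = REGF_HEADER.unpack_from(data, 0)
    if sig != b"regf":
        raise ValueError(f"not a registry hive: signature {sig!r}")
    return {
        "primary_seq": primary,
        "secondary_seq": secondary,
        # Unequal sequence numbers mean the last write did not complete (dirty hive).
        "clean": primary == secondary,
        "last_written": FILETIME_EPOCH + timedelta(microseconds=filetime // 10),
        "version": f"{major}.{minor}",
    }


# Hypothetical header: sequence numbers from the dump, timestamp and version assumed.
header = struct.pack("<4sIIQII", b"regf", 0x7E6, 0x7E6, 0x01D9F37C4B93D801, 1, 5)
print(parse_regf_header(header))
```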
**Scoring Criteria**:
- 5 points: Correct parsing with endianness consideration
- 3-4 points: Identifies structure, minor errors
- 1-2 points: Recognizes registry but inaccurate parsing
- 0 points: Cannot identify registry hive
---
#### Test: forensics_timestamp_01 - FILETIME Conversion
**Purpose**: Convert Windows timestamps to human-readable format

**FILETIME Format**:
- 64-bit value (little-endian)
- Counts 100-nanosecond intervals
- Epoch: January 1, 1601 00:00:00 UTC

**Conversion Process**:
1. Reverse byte order (little-endian to big-endian)
2. Convert to decimal
3. Divide by 10,000,000 to get seconds since 1601-01-01
4. Subtract 11,644,473,600 (the seconds between 1601-01-01 and 1970-01-01) to get a Unix timestamp, then convert it to a UTC date

**Example**:
```text
Hex: 01 D8 93 4B 7C F3 D9 01
Reversed: 01 D9 F3 7C 4B 93 D8 01
Decimal: 133,405,379,153,614,849
Seconds since 1601: 13,340,537,915.36
Unix seconds: 13,340,537,915 - 11,644,473,600 = 1,696,064,315
Date: 2023-09-30 08:58:35 UTC
```
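The conversion maps directly onto integer arithmetic: read the eight bytes little-endian, then either add the tick count to 1601-01-01 or subtract the 11,644,473,600-second gap to reach Unix time. A minimal Python sketch with illustrative function names; integer division avoids floating-point rounding on 18-digit values.

```python
from datetime import datetime, timedelta, timezone

FILETIME_EPOCH = datetime(1601, 1, 1, tzinfo=timezone.utc)
EPOCH_DIFF_SECONDS = 11_644_473_600  # 1601-01-01 -> 1970-01-01


def filetime_from_hex(hex_bytes: str) -> int:
    """Steps 1-2: read 8 little-endian bytes as an unsigned integer."""
    raw = bytes.fromhex(hex_bytes)
    if len(raw) != 8:
        raise ValueError("FILETIME must be exactly 8 bytes")
    return int.from_bytes(raw, "little")


def filetime_to_datetime(ft: int) -> datetime:
    """100 ns ticks since 1601 -> timezone-aware UTC datetime."""
    return FILETIME_EPOCH + timedelta(microseconds=ft // 10)


def filetime_to_unix(ft: int) -> int:
    """Steps 3-4: whole seconds since 1970-01-01 UTC."""
    return ft // 10_000_000 - EPOCH_DIFF_SECONDS


ft = filetime_from_hex("01 D8 93 4B 7C F3 D9 01")
print(ft, filetime_to_unix(ft), filetime_to_datetime(ft).isoformat())
# 133405379153614849 1696064315 2023-09-30T08:58:35.361484+00:00
```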
**Scoring Criteria**:
- 5 points: Correct conversion with methodology explained
- 3-4 points: Understands process, calculation errors acceptable
- 1-2 points: Recognizes FILETIME but significant errors
- 0 points: Cannot explain conversion
---
### IT Forensics - Memory & Network
#### Test: forensics_memory_01 - Memory Artifact Identification
**Purpose**: Extract meaningful data from memory dumps

**Key Artifacts to Identify**:
- HTTP headers (GET/POST requests)
- Session cookies (PHPSESSID, etc.)
- IP addresses and hostnames
- User agents
- Authentication tokens

**Example Analysis**:
```text
GET /admin/login.php HTTP/1.1
Host: 192.168.1.100
Cookie: PHPSESSID=a3f7d8bc9e2a1d5c
Forensic Value:
- Web access to admin panel
- Target: 192.168.1.100
- Session: a3f7d8bc9e2a1d5c
- Timeline: Can correlate with web server logs
```
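A minimal sketch of carving these artifacts from raw memory with byte-level regular expressions. The patterns and `carve_http_artifacts` are illustrative assumptions: they only match ASCII text and do not validate IPv4 octet ranges, whereas real memory images also hold UTF-16 strings and headers split across page boundaries. The sample region wraps the example request in hypothetical binary noise.

```python
import re

# Illustrative byte-level patterns; ASCII only (no UTF-16 handling).
REQUEST_LINE = re.compile(rb"(GET|POST|PUT|DELETE|HEAD) (\S+) HTTP/1\.[01]")
HOST_HEADER = re.compile(rb"Host: ([^\r\n]+)")
PHP_SESSION = re.compile(rb"PHPSESSID=([0-9A-Za-z,-]+)")
IPV4 = re.compile(rb"\b(?:\d{1,3}\.){3}\d{1,3}\b")  # does not check 0-255


def _text(raw: bytes) -> str:
    return raw.decode("ascii", errors="replace")


def carve_http_artifacts(blob: bytes) -> dict:
    """Collect HTTP request artifacts found anywhere in a raw memory region."""
    return {
        "requests": [(_text(m[1]), _text(m[2])) for m in REQUEST_LINE.finditer(blob)],
        "hosts": [_text(m[1]) for m in HOST_HEADER.finditer(blob)],
        "sessions": [_text(m[1]) for m in PHP_SESSION.finditer(blob)],
        "ipv4": sorted({_text(m[0]) for m in IPV4.finditer(blob)}),
    }


# Hypothetical memory region: the example request wrapped in binary noise.
region = (b"\x00\x13\xfe"
          b"GET /admin/login.php HTTP/1.1\r\n"
          b"Host: 192.168.1.100\r\n"
          b"Cookie: PHPSESSID=a3f7d8bc9e2a1d5c\r\n\r\n"
          b"\xff\x00")
print(carve_http_artifacts(region))
```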
**Scoring Criteria**:
- 5 points: All artifacts extracted with forensic significance explained
- 3-4 points: Most artifacts identified, basic analysis
- 1-2 points: Recognizes HTTP but misses key details
- 0 points: Cannot identify artifacts
---
#### Test: forensics_network_01 - TCP Header Analysis
**Purpose**: Parse TCP packet headers

**TCP Header Structure** (first 20 bytes):
```text
Offset Field Size Notes
0-1 Source Port 16 bits Big-endian
2-3 Destination Port 16 bits Big-endian
4-7 Sequence Number 32 bits Big-endian
8-11 Acknowledgment 32 bits Big-endian
12 Data Offset+Reserved 8 bits Upper 4=data offset (32-bit words), lower 4=reserved
13 Flags 8 bits CWR, ECE, URG, ACK, PSH, RST, SYN, FIN
14-15 Window Size 16 bits Big-endian
16-17 Checksum 16 bits
18-19 Urgent Pointer 16 bits
```
**TCP Flags** (byte 13):
- 0x01: FIN
- 0x02: SYN
- 0x04: RST
- 0x08: PSH
- 0x10: ACK
- 0x20: URG
- 0x40: ECE
- 0x80: CWR

**Example**:
```text
Offset(h) 00 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F
00000000 C3 5E 01 BB 6B 8B 9C 41 00 00 00 00 50 02 20 00
Analysis:
- Source Port: 0xC35E = 50014
- Dest Port: 0x01BB = 443 (HTTPS)
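- Sequence: 0x6B8B9C41 = 1804311617
- Acknowledgment: 0x00000000 (not meaningful; ACK flag not set)
- Data Offset: upper nibble of 0x50 = 5 words = 20-byte header (no options)
- Flags: 0x02 = SYN (connection initiation)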
- Window: 0x2000 = 8192 bytes
```
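The field layout above can be decoded with `struct` in network byte order (`!`). This sketch parses only the 16 bytes shown, since the example ends before the checksum and urgent pointer; `parse_tcp_prefix` is an illustrative name, and the ECE/CWR bits (0x40/0x80, from RFC 3168) are included so that ECN-capable SYNs are not misread.

```python
import struct

TCP_FLAGS = {0x01: "FIN", 0x02: "SYN", 0x04: "RST", 0x08: "PSH",
             0x10: "ACK", 0x20: "URG", 0x40: "ECE", 0x80: "CWR"}

# First 16 bytes of the TCP header, big-endian ("!"), no padding:
# H src port | H dst port | I seq | I ack | B offset/reserved | B flags | H window
TCP_PREFIX = struct.Struct("!HHIIBBH")


def parse_tcp_prefix(segment: bytes) -> dict:
    """Decode the TCP header fields up to and including the window size."""
    src, dst, seq, ack, off_byte, flags, window = TCP_PREFIX.unpack_from(segment, 0)
    return {
        "src_port": src,
        "dst_port": dst,
        "seq": seq,
        "ack": ack,
        "header_len": (off_byte >> 4) * 4,  # data offset counts 32-bit words
        "flags": [name for bit, name in TCP_FLAGS.items() if flags & bit],
        "window": window,
    }


example = bytes.fromhex("C3 5E 01 BB 6B 8B 9C 41 00 00 00 00 50 02 20 00")
print(parse_tcp_prefix(example))
```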
**Scoring Criteria**:
- 5 points: All fields correct with protocol understanding
- 3-4 points: Most fields correct, minor errors
- 1-2 points: Basic structure recognized, significant errors
- 0 points: Cannot parse TCP header
---
### IT Forensics - Timeline & Log Analysis
#### Test: forensics_timeline_01 - Event Reconstruction
**Purpose**: Correlate logs to identify attack patterns

**Timeline Analysis Skills**:
1. Chronological ordering
2. Event correlation across sources
3. Anomaly identification
4. Attack pattern recognition
5. Impact assessment

**Example Scenario**:
```text
14:23:15 - Admin login from 10.0.0.5 ✓ Normal
14:23:47 - Access /etc/passwd ⚠️ Suspicious (enumeration)
14:24:12 - Write shell.php to web dir 🚨 Malicious (web shell)
14:24:45 - Netcat listener on 4444 🚨 Malicious (backdoor)
14:25:01 - External connection 🚨 Compromise (C2 callback)
14:26:33 - Admin logout
14:30:00 - Failed login from external 🚨 Follow-on access attempt
```
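A minimal sketch of the first two timeline skills, chronological ordering and cross-source correlation. The three log sources, their split of the scenario's events, and the keyword rules are all hypothetical; a real investigation would first normalise time zones and timestamp formats across sources.

```python
import heapq
from datetime import datetime

# Hypothetical per-source logs (each already in time order), loosely
# modelled on the example scenario.
auth_log = [("14:23:15", "auth", "Admin login from 10.0.0.5"),
            ("14:26:33", "auth", "Admin logout"),
            ("14:30:00", "auth", "Failed login from external host")]
web_log = [("14:23:47", "web", "Access /etc/passwd"),
           ("14:24:12", "web", "Write shell.php to web dir")]
net_log = [("14:24:45", "net", "Netcat listener on 4444"),
           ("14:25:01", "net", "External connection")]

# Hypothetical triage rules: substring -> label (first match wins).
RULES = {"/etc/passwd": "suspicious", "shell.php": "malicious",
         "Netcat": "malicious", "External connection": "malicious",
         "Failed login": "malicious"}


def event_time(event):
    """Sort key: parse the HH:MM:SS timestamp of an event tuple."""
    return datetime.strptime(event[0], "%H:%M:%S")


def merge_timeline(*sources):
    """Merge time-ordered sources into one chronological, tagged timeline."""
    for ts, src, msg in heapq.merge(*sources, key=event_time):
        tag = next((label for needle, label in RULES.items() if needle in msg),
                   "normal")
        yield ts, src, tag, msg


for row in merge_timeline(auth_log, web_log, net_log):
    print(" | ".join(row))
```

`heapq.merge` is used because each source is already sorted, so the merge is linear rather than a full re-sort.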
**Attack Pattern**: Web application compromise → reconnaissance → web shell upload (persistence) → Netcat backdoor → C2 callback → further access attempts

**Scoring Criteria**:
- 5 points: Complete attack narrative with IOCs and recommendations
- 3-4 points: Identifies compromise, basic timeline
- 1-2 points: Recognizes suspicious activity, incomplete analysis
- 0 points: Cannot identify attack pattern
---
## 🎯 Multi-Turn Conversation Tests
### Test: multiturn_01 - Progressive Hex Analysis
**Purpose**: Maintain context across multiple exchanges while building understanding

**Turn 1**: File type identification from initial bytes

**Turn 2**: Structure parsing with offset references

**Turn 3**: Next steps and deeper analysis

**Key Evaluation Points**:
- Remembers initial findings
- Builds on previous responses
- Shows progressive understanding
- Maintains technical accuracy
---
### Test: multiturn_02 - Forensic Investigation Scenario
**Purpose**: Simulate a real investigation workflow

**Stages**:
1. Initial triage (data source identification)
2. Evidence correlation (connecting artifacts)
3. Impact assessment (IOC identification, response planning)

**Scoring Focus**:
- Logical investigation flow
- Context retention across turns
- Practical recommendations
- Complete picture integration
---
### Test: multiturn_03 - Technical Depth Building
**Purpose**: Progress from concept to implementation

**Progression**:
1. Concept explanation (NTFS Alternate Data Streams, ADS)
2. Practical application (attack scenarios)
3. Hands-on implementation (PowerShell commands)

**Expected Depth**:
- Turn 1: Clear conceptual understanding
- Turn 2: Builds on concept with examples
- Turn 3: Demonstrates practical application
---
## 📊 Evaluation Guidelines
### Little-Endian Conversions
**Always verify**:
- Byte order reversal shown
- Decimal conversion provided
- Offset references included

**Example**:
```text
Bytes at offset 0x10: 42 00
Little-endian: 0x0042 = 66 decimal
```
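The three checks are easiest to enforce when every little-endian field is reported in the same shape as the example. A minimal sketch with a hypothetical helper name; `bytes.hex(' ')` requires Python 3.8 or later.

```python
def describe_le(data: bytes, offset: int, size: int) -> str:
    """Report a little-endian field with its offset, raw bytes, the
    byte-reversed hex value and its decimal equivalent."""
    raw = data[offset:offset + size]
    if len(raw) != size:
        raise ValueError("field runs past the end of the buffer")
    value = int.from_bytes(raw, "little")
    return (f"Bytes at offset 0x{offset:02X}: {raw.hex(' ').upper()}\n"
            f"Little-endian: 0x{value:0{size * 2}X} = {value} decimal")


buf = bytes(0x10) + bytes.fromhex("42 00")
print(describe_le(buf, 0x10, 2))
# Bytes at offset 0x10: 42 00
# Little-endian: 0x0042 = 66 decimal
```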
### Hex to ASCII
Common conversions to know:
- 0x20-0x7E: Printable ASCII
- 46 49 4C 45 = "FILE"
- 50 4B = "PK"
- 4D 5A = "MZ"
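A minimal sketch of the printable-range rule, plus a hypothetical `hexdump` helper that reproduces the offset/hex layout of the dumps in this document and appends the ASCII column most hex editors show.

```python
def to_ascii(data: bytes) -> str:
    """Map printable ASCII (0x20-0x7E) to characters and everything else to '.'."""
    return "".join(chr(b) if 0x20 <= b <= 0x7E else "." for b in data)


def hexdump(data: bytes, width: int = 16) -> str:
    """Offset, hex bytes and ASCII column, one line per `width` bytes."""
    lines = []
    for start in range(0, len(data), width):
        chunk = data[start:start + width]
        # Pad short final lines so the ASCII column stays aligned.
        hex_part = " ".join(f"{b:02X}" for b in chunk).ljust(width * 3 - 1)
        lines.append(f"{start:08X} {hex_part} {to_ascii(chunk)}")
    return "\n".join(lines)


print(to_ascii(bytes.fromhex("46 49 4C 45")))  # FILE
print(hexdump(bytes.fromhex("46 49 4C 45 30 00 03 00 95 1F 23 00 00 00 00 00")))
```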
### Forensic Significance
Always ask:
- What does this artifact tell us?
- How can it be used in investigation?
- What are the limitations?
- What other data sources confirm/refute this?
---
## 🎓 Recommended Resources
For deeper understanding:
- NTFS Documentation (Microsoft)
- RFC 9293 (TCP; obsoletes RFC 793)
- File Signatures Database (Gary Kessler)
- Windows Registry Forensics (Harlan Carvey)
- The Art of Memory Forensics (Ligh, Case, Levy, Walters)
---
## ⚖️ Scoring Summary
**Exceptional (4-5)**:
- Accurate hex interpretation
- Correct endianness handling
- Forensic context provided
- Clear explanations

**Pass (2-3)**:
- Basic accuracy
- Some interpretation errors
- Limited context
- Incomplete explanations

**Fail (0-1)**:
- Major misinterpretations
- No endianness consideration
- Missing forensic value
- Incoherent explanations