llm-eval-forensics/test_suite.md
2026-01-16 09:18:07 +01:00

IT Forensics Tests

This document provides detailed explanations of the IT Forensics tests in the evaluation suite.

Overview

The forensics tests are designed to evaluate an AI model's ability to:

  1. Interpret raw hex data from various forensic artifacts
  2. Apply domain knowledge of file systems, registry, and network protocols
  3. Perform accurate byte-order conversions (little-endian for on-disk Windows structures, big-endian for network headers)
  4. Correlate events and reconstruct timelines
  5. Explain technical concepts clearly

🔍 Test Breakdown

IT Forensics - File Systems

Test: forensics_mft_01 - MFT Entry Analysis (Basic)

Purpose: Evaluate basic NTFS Master File Table interpretation

Key Concepts:

  • MFT Signature: "FILE" (46 49 4C 45 in hex, ASCII)
  • Entry flags at offset 0x16:
    • 0x01 = In use
    • 0x02 = Directory
  • Sequence number: 16-bit value at offset 0x10 (little-endian)

Example Hex Dump:

Offset(h) 00 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F
00000000  46 49 4C 45 30 00 03 00 95 1F 23 00 00 00 00 00
00000010  01 00 01 00 38 00 01 00 A0 01 00 00 00 04 00 00

Expected Analysis:

  • Signature: "FILE" (offsets 0x00-0x03)
  • Update Sequence Offset: 0x0030 (offsets 0x04-0x05, little-endian)
  • Update Sequence Size: 0x0003 (offsets 0x06-0x07, little-endian)
  • Sequence Number: 0x0001 (offsets 0x10-0x11, little-endian)
  • Flags: 0x0001 (offsets 0x16-0x17, little-endian) = In use
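
The expected analysis can be cross-checked with a short script. The following is a minimal sketch, assuming Python's standard `struct` module and a hypothetical helper named `parse_mft_header`; it decodes only the header fields discussed in this test, not a full MFT entry.

```python
import struct

# Bytes 0x00-0x1F of the example hex dump.
ENTRY = bytes.fromhex(
    "46494C4530000300951F230000000000"
    "0100010038000100A001000000040000"
)

FLAG_IN_USE = 0x01
FLAG_DIRECTORY = 0x02


def parse_mft_header(data: bytes) -> dict:
    """Decode the fixed header fields of an NTFS FILE record (hypothetical helper)."""
    if data[0:4] != b"FILE":
        raise ValueError("signature is not 'FILE'")
    # '<' = little-endian, H = unsigned 16-bit, Q = unsigned 64-bit.
    usa_offset, usa_size, lsn = struct.unpack_from("<HHQ", data, 0x04)
    # Offset 0x12 holds the hard-link count, which this test does not ask about.
    sequence, _hard_links, first_attr, flags = struct.unpack_from("<HHHH", data, 0x10)
    return {
        "update_sequence_offset": usa_offset,
        "update_sequence_size": usa_size,
        "lsn": lsn,
        "sequence_number": sequence,
        "first_attribute_offset": first_attr,
        "flags": flags,
        "in_use": bool(flags & FLAG_IN_USE),
        "is_directory": bool(flags & FLAG_DIRECTORY),
    }


header = parse_mft_header(ENTRY)
for name, value in header.items():
    print(f"{name}: {value}")
```

Reading the fields with an explicit `<` prefix keeps the byte order visible in the code, which mirrors what a model is expected to show in its answer.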

Scoring Criteria:

  • 5 points: Identifies all fields correctly with offset references
  • 3-4 points: Identifies most fields, minor errors in interpretation
  • 1-2 points: Recognizes MFT but misses key fields
  • 0 points: Cannot identify as MFT entry

Test: forensics_mft_02 - MFT Entry Analysis (Advanced)

Purpose: Deep understanding of MFT structure

Additional Concepts:

  • Update Sequence Array (USA): Fixup values used to detect torn (partially written) sectors
  • $LogFile Sequence Number (LSN): Transaction logging
  • First Attribute Offset: Where attribute records begin
  • MFT Entry Flags: Bitfield indicating file properties

Key Offsets:

  • 0x00-0x03: Signature "FILE"
  • 0x04-0x05: Update Sequence Offset
  • 0x06-0x07: Update Sequence Size
  • 0x08-0x0F: $LogFile Sequence Number (LSN, 64-bit)
  • 0x10-0x11: Sequence Number
  • 0x14-0x15: First Attribute Offset
  • 0x16-0x17: Flags (0x01=in use, 0x02=directory)

Example Analysis for LSN:

Offset 08: EA 3F 00 00 00 00 00 00
Little-endian 64-bit: 0x0000000000003FEA = 16362 decimal

Scoring Criteria:

  • 5 points: All fields correct with little-endian conversion shown
  • 3-4 points: Most fields correct, minor calculation errors
  • 1-2 points: Understands structure but significant errors
  • 0 points: Cannot parse MFT header

Test: forensics_signature_01 - File Signature Identification

Purpose: Recognition of common file magic numbers

Magic Numbers to Know:

Signature                 File Type   Notes
FF D8 FF E0               JPEG        Often followed by "JFIF"
89 50 4E 47 0D 0A 1A 0A   PNG         \x89PNG + line endings
25 50 44 46               PDF         "%PDF" in ASCII
50 4B 03 04               ZIP         "PK" headers (PKZip)
52 61 72 21 1A 07         RAR         "Rar!" + markers
4D 5A                     EXE/DLL     DOS "MZ" header
7F 45 4C 46               ELF         Linux executables

Test Example:

A) FF D8 FF E0 00 10 4A 46 49 46
   → JPEG (FF D8 FF + JFIF marker)
   
B) 50 4B 03 04 14 00 06 00
   → ZIP/DOCX/XLSX (PKZip format)
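
Signature matching of this kind is a prefix comparison against a table of known magic numbers. The sketch below is illustrative only: the `SIGNATURES` list holds just the entries from the table in this section, and `identify` is a hypothetical helper name.

```python
# Longer signatures come first so that a short prefix cannot shadow a longer one.
SIGNATURES = [
    (bytes.fromhex("89504E470D0A1A0A"), "PNG"),
    (bytes.fromhex("526172211A07"), "RAR"),
    (bytes.fromhex("FFD8FFE0"), "JPEG"),
    (bytes.fromhex("25504446"), "PDF"),
    (bytes.fromhex("504B0304"), "ZIP"),
    (bytes.fromhex("7F454C46"), "ELF"),
    (bytes.fromhex("4D5A"), "EXE/DLL"),
]


def identify(data: bytes) -> str:
    """Return the file type whose magic number prefixes `data`, or 'unknown'."""
    for magic, name in SIGNATURES:
        if data.startswith(magic):
            return name
    return "unknown"


print(identify(bytes.fromhex("FFD8FFE000104A464946")))  # sample A
print(identify(bytes.fromhex("504B030414000600")))      # sample B
```

A ZIP match alone does not separate a plain archive from DOCX or XLSX; that distinction requires looking at the archive's contents, which is why the expected answer for sample B lists all three.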

Scoring Criteria:

  • 5 points: All signatures identified with explanations
  • 3-4 points: Most correct, understands concept
  • 1-2 points: Recognizes some but misses key ones
  • 0 points: Cannot identify file signatures

IT Forensics - Registry & Artifacts

Test: forensics_registry_01 - Windows Registry Hive Header

Purpose: Parse Windows Registry binary format

Key Structure:

Offset  Field                   Size
0x00    Signature "regf"        4 bytes
0x04    Primary Seq Number      4 bytes (little-endian)
0x08    Secondary Seq Number    4 bytes (little-endian)
0x0C    Timestamp              8 bytes (FILETIME)
0x14    Major Version          4 bytes
0x18    Minor Version          4 bytes

Example:

Offset(h) 00 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F
00000000  72 65 67 66 E6 07 00 00 E6 07 00 00 00 00 00 00

Analysis:
- Signature: "regf" (72 65 67 66)
- Primary Seq: 0x000007E6 = 2022 decimal
- Secondary Seq: 0x000007E6 = 2022 decimal
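
The same fields can be decoded programmatically. This is a minimal sketch assuming Python's `struct` module; `parse_regf_header` is a hypothetical helper that reads only the signature and the two sequence numbers, because the example dump stops before the timestamp is complete.

```python
import struct

# The 16 bytes of the example dump.
HIVE = bytes.fromhex("72656766E6070000E607000000000000")


def parse_regf_header(data: bytes) -> dict:
    """Decode the signature and sequence numbers of a registry hive header."""
    if data[0:4] != b"regf":
        raise ValueError("signature is not 'regf'")
    # Two consecutive little-endian unsigned 32-bit integers at 0x04 and 0x08.
    primary, secondary = struct.unpack_from("<II", data, 0x04)
    return {
        "primary_seq": primary,
        "secondary_seq": secondary,
        # Equal values suggest the last write to the hive completed.
        "sequences_match": primary == secondary,
    }


print(parse_regf_header(HIVE))
```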

Scoring Criteria:

  • 5 points: Correct parsing with endianness consideration
  • 3-4 points: Identifies structure, minor errors
  • 1-2 points: Recognizes registry but inaccurate parsing
  • 0 points: Cannot identify registry hive

Test: forensics_timestamp_01 - FILETIME Conversion

Purpose: Convert Windows timestamps to human-readable format

FILETIME Format:

  • 64-bit value (little-endian)
  • Counts 100-nanosecond intervals
  • Epoch: January 1, 1601 00:00:00 UTC

Conversion Process:

  1. Reverse byte order (little-endian to big-endian)
  2. Convert to decimal
  3. Divide by 10,000,000 to get seconds
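  4. Subtract 11,644,473,600 seconds (the gap between the 1601 and 1970 epochs) to get a Unix timestamp

Example:

Hex: 01 D8 93 4B 7C F3 D9 01
Reversed: 01 D9 F3 7C 4B 93 D8 01 (0x01D9F37C4B93D801)
Decimal: 133,405,379,153,614,849
Seconds since 1601: 13,340,537,915
Unix timestamp: 13,340,537,915 - 11,644,473,600 = 1,696,064,315
Date: 2023-09-30 08:58:35 UTC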
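
The conversion can be reproduced with Python's standard `datetime` module. The following is a sketch rather than a reference implementation; `filetime_to_datetime` and `filetime_to_unix` are hypothetical helper names, and the input is the example's raw byte sequence as it appears on disk.

```python
from datetime import datetime, timedelta, timezone

FILETIME_EPOCH = datetime(1601, 1, 1, tzinfo=timezone.utc)
# Seconds between 1601-01-01 and 1970-01-01 (the Unix epoch).
EPOCH_DELTA_SECONDS = 11_644_473_600


def filetime_to_datetime(raw: bytes) -> datetime:
    """Convert 8 little-endian FILETIME bytes to an aware UTC datetime."""
    if len(raw) != 8:
        raise ValueError("FILETIME must be exactly 8 bytes")
    ticks = int.from_bytes(raw, "little")  # 100-nanosecond intervals
    # 10 ticks per microsecond; sub-microsecond precision is dropped.
    return FILETIME_EPOCH + timedelta(microseconds=ticks // 10)


def filetime_to_unix(raw: bytes) -> int:
    """Convert 8 little-endian FILETIME bytes to whole Unix seconds."""
    ticks = int.from_bytes(raw, "little")
    return ticks // 10_000_000 - EPOCH_DELTA_SECONDS


sample = bytes.fromhex("01 D8 93 4B 7C F3 D9 01")
print(int.from_bytes(sample, "little"))
print(filetime_to_unix(sample))
print(filetime_to_datetime(sample).isoformat())
```

Using `int.from_bytes(..., "little")` performs the byte reversal and the hex-to-decimal step in one call, so the script is a convenient check on a hand calculation.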

Scoring Criteria:

  • 5 points: Correct conversion with methodology explained
  • 3-4 points: Understands process, calculation errors acceptable
  • 1-2 points: Recognizes FILETIME but significant errors
  • 0 points: Cannot explain conversion

IT Forensics - Memory & Network

Test: forensics_memory_01 - Memory Artifact Identification

Purpose: Extract meaningful data from memory dumps

Key Artifacts to Identify:

  • HTTP headers (GET/POST requests)
  • Session cookies (PHPSESSID, etc.)
  • IP addresses and hostnames
  • User agents
  • Authentication tokens

Example Analysis:

GET /admin/login.php HTTP/1.1
Host: 192.168.1.100
Cookie: PHPSESSID=a3f7d8bc9e2a1d5c

Forensic Value:
- Web access to admin panel
- Target: 192.168.1.100
- Session: a3f7d8bc9e2a1d5c
- Timeline: Can correlate with web server logs
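
Artifacts like these are usually recovered by scanning raw bytes for recognizable patterns. The sketch below assumes Python's `re` module with byte patterns; the padding bytes around the request in `DUMP` are invented to imitate surrounding memory, and `extract_http_artifacts` is a hypothetical helper.

```python
import re

# The example request embedded in made-up non-text bytes.
DUMP = (
    b"\x00\x13\xff\xfeGET /admin/login.php HTTP/1.1\r\n"
    b"Host: 192.168.1.100\r\n"
    b"Cookie: PHPSESSID=a3f7d8bc9e2a1d5c\r\n\r\n\x00\x00\x7f"
)

REQUEST_LINE = re.compile(rb"(GET|POST) (\S+) HTTP/\d\.\d")
HOST = re.compile(rb"Host: ([\w.\-:]+)")
SESSION = re.compile(rb"PHPSESSID=([0-9A-Za-z]+)")


def extract_http_artifacts(blob: bytes) -> dict:
    """Pull the request line, host, and PHP session ID out of a byte blob."""
    found = {}
    request = REQUEST_LINE.search(blob)
    if request:
        found["method"] = request.group(1).decode("ascii")
        found["path"] = request.group(2).decode("ascii", "replace")
    host = HOST.search(blob)
    if host:
        found["host"] = host.group(1).decode("ascii")
    session = SESSION.search(blob)
    if session:
        found["session_id"] = session.group(1).decode("ascii")
    return found


print(extract_http_artifacts(DUMP))
```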

Scoring Criteria:

  • 5 points: All artifacts extracted with forensic significance explained
  • 3-4 points: Most artifacts identified, basic analysis
  • 1-2 points: Recognizes HTTP but misses key details
  • 0 points: Cannot identify artifacts

Test: forensics_network_01 - TCP Header Analysis

Purpose: Parse TCP packet headers

TCP Header Structure (first 20 bytes):

Offset  Field                Size       Notes
0-1     Source Port          16 bits    Big-endian
2-3     Destination Port     16 bits    Big-endian
4-7     Sequence Number      32 bits    Big-endian
8-11    Acknowledgment       32 bits    Big-endian
12      Data Offset+Rsvd     8 bits     Upper 4=offset, lower 4=reserved
13      Flags                8 bits     SYN, ACK, FIN, RST, PSH, URG
14-15   Window Size          16 bits    Big-endian
16-17   Checksum             16 bits
18-19   Urgent Pointer       16 bits

TCP Flags (byte 13):

  • 0x01: FIN
  • 0x02: SYN
  • 0x04: RST
  • 0x08: PSH
  • 0x10: ACK
  • 0x20: URG

Example:

Offset(h) 00 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F
00000000  C3 5E 01 BB 6B 8B 9C 41 00 00 00 00 50 02 20 00

Analysis:
- Source Port: 0xC35E = 50014
- Dest Port: 0x01BB = 443 (HTTPS)
- Sequence: 0x6B8B9C41
- Flags: 0x02 = SYN (connection initiation)
- Window: 0x2000 = 8192 bytes
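
The header layout maps directly onto a `struct` format string. This is a minimal sketch, assuming Python's standard `struct` module and a hypothetical `parse_tcp_header` helper; it reads only the first 16 bytes, since the example dump ends before the checksum and urgent pointer.

```python
import struct

# The 16 bytes of the example dump.
SEGMENT = bytes.fromhex("C35E01BB6B8B9C410000000050022000")

FLAG_NAMES = {
    0x01: "FIN",
    0x02: "SYN",
    0x04: "RST",
    0x08: "PSH",
    0x10: "ACK",
    0x20: "URG",
}


def parse_tcp_header(data: bytes) -> dict:
    """Decode ports, sequence numbers, flags, and window from a TCP header."""
    if len(data) < 16:
        raise ValueError("need at least 16 bytes")
    # '!' = network (big-endian) byte order.
    src, dst, seq, ack, offset_byte, flags, window = struct.unpack_from(
        "!HHIIBBH", data, 0
    )
    return {
        "src_port": src,
        "dst_port": dst,
        "seq": seq,
        "ack": ack,
        # The upper 4 bits count 32-bit words, so multiply by 4 for bytes.
        "header_len": (offset_byte >> 4) * 4,
        "flags": [name for bit, name in FLAG_NAMES.items() if flags & bit],
        "window": window,
    }


print(parse_tcp_header(SEGMENT))
```

Unlike the NTFS and registry examples, every multi-byte field here is big-endian, so the format string starts with `!` instead of `<`.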

Scoring Criteria:

  • 5 points: All fields correct with protocol understanding
  • 3-4 points: Most fields correct, minor errors
  • 1-2 points: Basic structure recognized, significant errors
  • 0 points: Cannot parse TCP header

IT Forensics - Timeline & Log Analysis

Test: forensics_timeline_01 - Event Reconstruction

Purpose: Correlate logs to identify attack patterns

Timeline Analysis Skills:

  1. Chronological ordering
  2. Event correlation across sources
  3. Anomaly identification
  4. Attack pattern recognition
  5. Impact assessment

Example Scenario:

14:23:15 - Admin login from 10.0.0.5 ✓ Normal
14:23:47 - Access /etc/passwd ⚠️ Suspicious (enumeration)
14:24:12 - Write shell.php to web dir 🚨 Malicious (web shell)
14:24:45 - Netcat listener on 4444 🚨 Malicious (backdoor)
14:25:01 - External connection 🚨 Compromise (C2 callback)
14:26:33 - Admin logout
14:30:00 - Failed login from external 🚨 Lateral movement attempt

Attack Pattern: Web application compromise → web shell upload → reverse shell → persistence → lateral movement
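
Chronological ordering and cross-source correlation can be sketched as a merge of per-source logs. In the example below, the split of the scenario's events into `AUTH_LOG`, `WEB_LOG`, and `NET_LOG` is an assumption made for illustration, as is the `build_timeline` helper; the scenario itself does not say which log each event came from.

```python
from datetime import datetime

# Hypothetical per-source logs holding the scenario's events.
AUTH_LOG = [
    ("14:23:15", "Admin login from 10.0.0.5"),
    ("14:26:33", "Admin logout"),
    ("14:30:00", "Failed login from external"),
]
WEB_LOG = [
    ("14:23:47", "Access /etc/passwd"),
    ("14:24:12", "Write shell.php to web dir"),
]
NET_LOG = [
    ("14:24:45", "Netcat listener on 4444"),
    ("14:25:01", "External connection"),
]


def build_timeline(**sources):
    """Merge (time, message) entries from named sources into one sorted list."""
    merged = []
    for name, entries in sources.items():
        for stamp, message in entries:
            merged.append((datetime.strptime(stamp, "%H:%M:%S"), name, message))
    merged.sort(key=lambda event: event[0])
    return merged


timeline = build_timeline(auth=AUTH_LOG, web=WEB_LOG, net=NET_LOG)
for when, source, message in timeline:
    print(f"{when:%H:%M:%S} [{source:4}] {message}")
```

Keeping the source name on every entry makes it easier to see when activity moves from one data source to another, which is where correlation findings tend to come from.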

Scoring Criteria:

  • 5 points: Complete attack narrative with IOCs and recommendations
  • 3-4 points: Identifies compromise, basic timeline
  • 1-2 points: Recognizes suspicious activity, incomplete analysis
  • 0 points: Cannot identify attack pattern

🎯 Multi-Turn Conversation Tests

Test: multiturn_01 - Progressive Hex Analysis

Purpose: Maintain context across multiple exchanges while building understanding

  • Turn 1: File type identification from initial bytes
  • Turn 2: Structure parsing with offset references
  • Turn 3: Next steps and deeper analysis

Key Evaluation Points:

  • Remembers initial findings
  • Builds on previous responses
  • Shows progressive understanding
  • Maintains technical accuracy

Test: multiturn_02 - Forensic Investigation Scenario

Purpose: Simulate real investigation workflow

Stages:

  1. Initial triage (data source identification)
  2. Evidence correlation (connecting artifacts)
  3. Impact assessment (IOC identification, response planning)

Scoring Focus:

  • Logical investigation flow
  • Context retention across turns
  • Practical recommendations
  • Complete picture integration

Test: multiturn_03 - Technical Depth Building

Purpose: Progress from concept to implementation

Progression:

  1. Concept explanation (NTFS ADS)
  2. Practical application (attack scenarios)
  3. Hands-on implementation (PowerShell commands)

Expected Depth:

  • Turn 1: Clear conceptual understanding
  • Turn 2: Builds on concept with examples
  • Turn 3: Demonstrates practical application

📊 Evaluation Guidelines

Little-Endian Conversions

Always verify:

  • Byte order reversal shown
  • Decimal conversion provided
  • Offset references included

Example:

Bytes at offset 0x10: 42 00
Little-endian: 0x0042 = 66 decimal
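
A grader can generate the expected working for any field with a small helper. This sketch assumes Python 3.8 or later (for `bytes.hex` with a separator); `show_little_endian` is a hypothetical name.

```python
def show_little_endian(hex_bytes: str, offset: int) -> str:
    """Render raw bytes, the reversed (big-endian) hex value, and the decimal."""
    raw = bytes.fromhex(hex_bytes)
    value = int.from_bytes(raw, "little")
    reversed_hex = raw[::-1].hex().upper()  # most significant byte first
    return (
        f"Bytes at offset 0x{offset:02X}: {raw.hex(' ').upper()} "
        f"-> 0x{reversed_hex} = {value} decimal"
    )


print(show_little_endian("42 00", 0x10))
print(show_little_endian("EA 3F 00 00 00 00 00 00", 0x08))
```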

Hex to ASCII

Common conversions to know:

  • 0x20-0x7E: Printable ASCII
  • 46 49 4C 45 = "FILE"
  • 50 4B = "PK"
  • 4D 5A = "MZ"
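
The printable range above is the same rule hex editors use for their ASCII column. As a rough sketch, with `hex_to_ascii` as a hypothetical helper, bytes outside 0x20-0x7E are shown as a dot:

```python
def hex_to_ascii(hex_bytes: str) -> str:
    """Render hex bytes as ASCII, replacing non-printable bytes with '.'."""
    raw = bytes.fromhex(hex_bytes)
    return "".join(chr(b) if 0x20 <= b <= 0x7E else "." for b in raw)


for sample_hex in ("46 49 4C 45", "50 4B", "4D 5A", "89 50 4E 47 0D 0A 1A 0A"):
    print(f"{sample_hex} = {hex_to_ascii(sample_hex)!r}")
```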

Forensic Significance

Always ask:

  • What does this artifact tell us?
  • How can it be used in investigation?
  • What are the limitations?
  • What other data sources confirm/refute this?

For deeper understanding:

  • NTFS Documentation (Microsoft)
  • RFC 9293 (TCP, obsoletes RFC 793)
  • File Signatures Database (Gary Kessler)
  • Windows Registry Forensics (Harlan Carvey)
  • The Art of Memory Forensics (Ligh, Case, Levy, Walters)

⚖️ Scoring Summary

Exceptional (4-5):

  • Accurate hex interpretation
  • Correct endianness handling
  • Forensic context provided
  • Clear explanations

Pass (2-3):

  • Basic accuracy
  • Some interpretation errors
  • Limited context
  • Incomplete explanations

Fail (0-1):

  • Major misinterpretations
  • No endianness consideration
  • Missing forensic value
  • Incoherent explanations