This commit is contained in:
overcuriousity 2025-10-15 12:40:56 +02:00
parent 531b8ad48b
commit f721c936e8
2 changed files with 599 additions and 354 deletions

221
README.md
View File

@ -1,87 +1,60 @@
# Browser History to Timesketch Converter # Browser History to Timesketch Converter
Converts browser history from the three major browser engines to Timesketch-compatible CSV format. Converts browser history from Firefox, Chrome, Safari, and all Chromium-based browsers to Timesketch-compatible CSV format.
## Supported Browser Engines ## Requirements
- **Gecko** - Firefox and derivatives (Waterfox, LibreWolf, etc.) - Python 3.6+
- **Chromium** - All Chromium-based browsers (Chrome, Edge, Brave, Opera, Vivaldi, Arc, etc.) - No external dependencies (standard library only)
- **WebKit** - Safari
## Why Only Three Types?
All Chromium-based browsers (Chrome, Edge, Brave, Opera, Vivaldi, etc.) use **identical database schemas**. There's no need to handle them differently - they all use the same History database format with the same table structures and timestamp formats. The only difference is the file location, which you provide as input.
Similarly, all Gecko-based browsers (Firefox forks) use the same places.sqlite format.
## Usage ## Usage
### Simple (Auto-detect browser type)
```bash ```bash
python browser2timesketch.py -b <browser_engine> -i <database_path> -o <output.csv> python browser2timesketch.py -i <database_path>
``` ```
### Arguments ### With Options
```bash
python browser2timesketch.py [OPTIONS] -i <database_path>
```
- `-b, --browser`: Browser engine type ## Command-Line Arguments
- `firefox` or `gecko` - For Firefox and Firefox-based browsers
- `chromium` - For all Chromium-based browsers
- `safari` or `webkit` - For Safari
- `-i, --input`: Path to browser history database file
- `-o, --output`: Output CSV file path (optional, default: browser_history_timesketch.csv)
- `--browser-name`: Custom browser name for the data_type field (optional)
## Database File Locations | Argument | Required | Description |
|----------|----------|-------------|
| `-i`, `--input` | Yes | Path to browser history database file |
| `-b`, `--browser` | No | Browser type: `firefox`, `chromium`, `safari`, or `auto` (default: auto) |
| `-o`, `--output` | No | Output CSV file path (default: auto-generated) |
| `--browser-name` | No | Custom browser name for data_type field (e.g., "Brave", "Edge") |
### How to Find Your Profile Path ## Finding Browser Database Files
### Firefox (all platforms)
#### Gecko / Firefox
1. Open Firefox 1. Open Firefox
2. Type `about:support` in the address bar and press Enter 2. Type `about:support` in address bar
3. Look for **Profile Folder** or **Profile Directory** 3. Look for **Profile Folder** or **Profile Directory**
4. Click "Open Folder" / "Open Directory" button, or note the path shown 4. Click **Open Folder** button
5. The `places.sqlite` file is in this directory 5. Find `places.sqlite` in that folder
Alternative: Type `about:profiles` to see all profiles and their locations.
#### Chromium (Chrome/Edge/Brave/Opera/Vivaldi/etc.)
1. Open your Chromium-based browser
2. Type `chrome://version/` in the address bar and press Enter
3. Look for **Profile Path** - this shows the full path to your profile directory
4. The `History` file (no extension) is in this directory
Note: For browsers based on Chromium, use the same URL even if it's not Chrome:
- Edge: `edge://version/`
- Brave: `brave://version/`
- Opera: `opera://about/`
- Vivaldi: `vivaldi://about/`
#### WebKit / Safari
Safari's history database is always at the same location on macOS:
`~/Library/Safari/History.db`
To view in Finder:
1. Open Finder
2. Press `Cmd + Shift + G` (Go to Folder)
3. Type `~/Library/Safari/`
4. Press Enter
### Standard Profile Locations
If you prefer to navigate directly to the standard locations:
### Gecko / Firefox
**Database file:** `places.sqlite`
**Standard locations:**
- **Linux:** `~/.mozilla/firefox/<profile>/places.sqlite` - **Linux:** `~/.mozilla/firefox/<profile>/places.sqlite`
- **macOS:** `~/Library/Application Support/Firefox/Profiles/<profile>/places.sqlite` - **macOS:** `~/Library/Application Support/Firefox/Profiles/<profile>/places.sqlite`
- **Windows:** `%APPDATA%\Mozilla\Firefox\Profiles\<profile>\places.sqlite` - **Windows:** `%APPDATA%\Mozilla\Firefox\Profiles\<profile>\places.sqlite`
### Chromium (Chrome/Edge/Brave/Opera/Vivaldi/etc.) ### Chrome, Edge, Brave, Opera, Vivaldi (all Chromium browsers)
**Database file:** `History` (no file extension) 1. Open your browser
2. Type `chrome://version/` in address bar
- For Edge: `edge://version/`
- For Brave: `brave://version/`
- For Opera: `opera://about/`
- For Vivaldi: `vivaldi://about/`
3. Look for **Profile Path**
4. Find `History` file (no extension) in that folder
All Chromium browsers use the same database format. Only the location differs: **Standard locations:**
**Google Chrome:** **Google Chrome:**
- **Linux:** `~/.config/google-chrome/Default/History` - **Linux:** `~/.config/google-chrome/Default/History`
@ -108,124 +81,38 @@ All Chromium browsers use the same database format. Only the location differs:
- **macOS:** `~/Library/Application Support/Vivaldi/Default/History` - **macOS:** `~/Library/Application Support/Vivaldi/Default/History`
- **Windows:** `%LOCALAPPDATA%\Vivaldi\User Data\Default\History` - **Windows:** `%LOCALAPPDATA%\Vivaldi\User Data\Default\History`
### WebKit / Safari ### Safari (macOS only)
**Database file:** `History.db` **Location:** `~/Library/Safari/History.db`
- **macOS:** `~/Library/Safari/History.db` **To open in Finder:**
1. Press `Cmd + Shift + G`
2. Type `~/Library/Safari/`
3. Press Enter
## Examples ## Examples
### Firefox (or any Gecko-based browser) ### Auto-detect (simplest)
```bash ```bash
# Linux python browser2timesketch.py -i ~/.mozilla/firefox/abc123.default/places.sqlite
python browser2timesketch.py -b firefox -i ~/.mozilla/firefox/xyz123.default/places.sqlite -o firefox_history.csv python browser2timesketch.py -i ~/.config/google-chrome/Default/History
python browser2timesketch.py -i ~/Library/Safari/History.db
# macOS
python browser2timesketch.py -b gecko -i "~/Library/Application Support/Firefox/Profiles/xyz123.default/places.sqlite" -o firefox_history.csv
# Windows
python browser2timesketch.py -b firefox -i "C:\Users\YourUser\AppData\Roaming\Mozilla\Firefox\Profiles\xyz123.default\places.sqlite" -o firefox_history.csv
``` ```
### Chrome (or any Chromium-based browser) ### Specify browser type
```bash ```bash
# Linux - Chrome python browser2timesketch.py -b firefox -i places.sqlite -o firefox.csv
python browser2timesketch.py -b chromium -i ~/.config/google-chrome/Default/History -o chrome_history.csv python browser2timesketch.py -b chromium -i History -o chrome.csv
python browser2timesketch.py -b safari -i History.db -o safari.csv
# macOS - Chrome
python browser2timesketch.py -b chromium -i "~/Library/Application Support/Google/Chrome/Default/History" -o chrome_history.csv
# Windows - Chrome
python browser2timesketch.py -b chromium -i "C:\Users\YourUser\AppData\Local\Google\Chrome\User Data\Default\History" -o chrome_history.csv
# Linux - Brave with custom label
python browser2timesketch.py -b chromium --browser-name "Brave" -i ~/.config/BraveSoftware/Brave-Browser/Default/History -o brave_history.csv
# Windows - Edge
python browser2timesketch.py -b chromium -i "C:\Users\YourUser\AppData\Local\Microsoft\Edge\User Data\Default\History" -o edge_history.csv
``` ```
### Safari ### With custom browser name
```bash ```bash
# macOS python browser2timesketch.py --browser-name "Brave" -i ~/.config/BraveSoftware/Brave-Browser/Default/History
python browser2timesketch.py -b safari -i ~/Library/Safari/History.db -o safari_history.csv
# Or using the webkit alias
python browser2timesketch.py -b webkit -i ~/Library/Safari/History.db -o safari_history.csv
``` ```
## Output Format ## Notes
The script generates a CSV file with Timesketch-compatible fields: - Close your browser before running to avoid database locks (or the script will use read-only mode)
- Output contains complete browsing history - handle securely
| Field | Description | All Browsers | - On Windows, use quotes around paths with spaces
|-------|-------------|--------------|
| `timestamp` | Unix timestamp in microseconds | ✓ |
| `datetime` | ISO 8601 formatted datetime | ✓ |
| `timestamp_desc` | Description of timestamp | ✓ |
| `message` | Human-readable event description | ✓ |
| `url` | The visited URL | ✓ |
| `title` | Page title | ✓ |
| `data_type` | Source identifier | ✓ |
| `visit_type` | Type of visit | Gecko, Chromium |
| `visit_duration_us` | Visit duration in microseconds | Chromium only |
| `total_visits` | Total visits to this URL | Chromium only |
| `typed_count` | Times URL was typed | Chromium only |
## Browser Engine Details
### Timestamp Formats
Each browser engine uses a different timestamp format:
- **Gecko (Firefox):** Microseconds since Unix epoch (1970-01-01 00:00:00 UTC)
- **Chromium:** Microseconds since Windows epoch (1601-01-01 00:00:00 UTC)
- **WebKit (Safari):** Seconds since Cocoa epoch (2001-01-01 00:00:00 UTC)
The script automatically converts all timestamps to Unix microseconds for Timesketch.
### Database Schemas
- **Gecko:** Uses `moz_historyvisits` and `moz_places` tables in `places.sqlite`
- **Chromium:** Uses `visits` and `urls` tables in `History` database
- **WebKit:** Uses `history_visits` and `history_items` tables in `History.db`
## Important Notes
1. **Close the browser** before running the script to avoid database lock errors
2. **Copy the database file** to a temporary location if you want to avoid potential issues
3. **Handle output carefully** - the CSV contains your complete browsing history
4. Different browsers may have multiple profiles - make sure you're pointing to the correct profile directory
5. On Windows, use quotes around paths that contain spaces
## Troubleshooting
### Database is locked
- Close the browser completely
- Copy the database file to a temporary location and run the script on the copy
### File not found
- Verify the profile directory name (the random string like `xyz123.default`)
- Check that the browser has been used and has history
- On macOS, use tab completion or check the exact path
### Permission denied
- Run with appropriate permissions
- On Linux/macOS, check file permissions with `ls -l`
- On Windows, run as Administrator if needed
## Requirements
- Python 3.6 or higher
- No external dependencies (uses only standard library)
## Privacy and Security
This tool exports your complete browsing history. The output file contains:
- All visited URLs
- Page titles
- Visit timestamps
- Visit types and patterns
Handle the output files appropriately and delete them when no longer needed.

View File

@ -9,11 +9,144 @@ Supports: Gecko (Firefox), Chromium (Chrome/Edge/Brave/etc.), WebKit (Safari)
import sqlite3 import sqlite3
import csv import csv
import argparse import argparse
import sys
from datetime import datetime, timedelta from datetime import datetime, timedelta
from pathlib import Path from pathlib import Path
from typing import Tuple, Optional, Dict, Any, List
def convert_gecko_timestamp(gecko_timestamp): class BrowserDetectionError(Exception):
"""Raised when browser type cannot be detected"""
pass
class DatabaseValidationError(Exception):
"""Raised when database validation fails"""
pass
class TimestampValidationError(Exception):
"""Raised when timestamp validation fails"""
pass
def validate_sqlite_database(db_path: str) -> None:
"""
Validate that the file is a SQLite database and is accessible.
Args:
db_path: Path to database file
Raises:
DatabaseValidationError: If validation fails
"""
path = Path(db_path)
if not path.exists():
raise DatabaseValidationError(f"Database file not found: {db_path}")
if not path.is_file():
raise DatabaseValidationError(f"Path is not a file: {db_path}")
# Try to open as SQLite database
try:
conn = sqlite3.connect(f'file:{db_path}?mode=ro', uri=True)
cursor = conn.cursor()
# Check if it's a valid SQLite database by querying sqlite_master
cursor.execute("SELECT name FROM sqlite_master WHERE type='table' LIMIT 1")
conn.close()
except sqlite3.DatabaseError as e:
raise DatabaseValidationError(f"Not a valid SQLite database: {db_path}. Error: {e}")
except sqlite3.OperationalError as e:
raise DatabaseValidationError(f"Cannot access database (may be locked or corrupted): {db_path}. Error: {e}")
def detect_browser_type(db_path: str) -> str:
"""
Auto-detect browser type by examining database schema.
Args:
db_path: Path to database file
Returns:
Detected browser type: 'gecko', 'chromium', or 'webkit'
Raises:
BrowserDetectionError: If browser type cannot be determined
"""
try:
conn = sqlite3.connect(f'file:{db_path}?mode=ro', uri=True)
cursor = conn.cursor()
# Get all table names
cursor.execute("SELECT name FROM sqlite_master WHERE type='table'")
tables = {row[0] for row in cursor.fetchall()}
conn.close()
# Check for Gecko/Firefox tables
if 'moz_historyvisits' in tables and 'moz_places' in tables:
return 'gecko'
# Check for Chromium tables
if 'visits' in tables and 'urls' in tables:
return 'chromium'
# Check for WebKit/Safari tables
if 'history_visits' in tables and 'history_items' in tables:
return 'webkit'
raise BrowserDetectionError(
f"Cannot determine browser type. Found tables: {', '.join(sorted(tables))}\n"
f"Expected one of:\n"
f" - Gecko/Firefox: moz_historyvisits, moz_places\n"
f" - Chromium: visits, urls\n"
f" - WebKit/Safari: history_visits, history_items"
)
except sqlite3.Error as e:
raise BrowserDetectionError(f"Error reading database schema: {e}")
def validate_timestamp(unix_microseconds: int, browser_type: str) -> None:
"""
Validate that a timestamp is within reasonable bounds.
Args:
unix_microseconds: Timestamp in Unix microseconds
browser_type: Browser type for error messages
Raises:
TimestampValidationError: If timestamp is unreasonable
"""
if unix_microseconds <= 0:
return # Allow 0 for missing timestamps
# Convert to seconds for validation
timestamp_seconds = unix_microseconds / 1000000
# Check if timestamp is reasonable (between 1990 and 2040)
min_date = datetime(1990, 1, 1)
max_date = datetime(2040, 1, 1)
min_seconds = min_date.timestamp()
max_seconds = max_date.timestamp()
if timestamp_seconds < min_seconds:
dt = datetime.utcfromtimestamp(timestamp_seconds)
raise TimestampValidationError(
f"Timestamp appears too old: {dt.strftime('%Y-%m-%d %H:%M:%S')} (before 1990). "
f"This may indicate a timestamp conversion error for {browser_type}."
)
if timestamp_seconds > max_seconds:
dt = datetime.utcfromtimestamp(timestamp_seconds)
raise TimestampValidationError(
f"Timestamp appears to be in the future: {dt.strftime('%Y-%m-%d %H:%M:%S')} (after 2040). "
f"This may indicate a timestamp conversion error for {browser_type}."
)
def convert_gecko_timestamp(gecko_timestamp: Optional[int]) -> Tuple[int, str]:
""" """
Convert Gecko/Firefox timestamp (microseconds since Unix epoch) to ISO format. Convert Gecko/Firefox timestamp (microseconds since Unix epoch) to ISO format.
Firefox stores timestamps as microseconds since 1970-01-01 00:00:00 UTC. Firefox stores timestamps as microseconds since 1970-01-01 00:00:00 UTC.
@ -24,16 +157,19 @@ def convert_gecko_timestamp(gecko_timestamp):
Returns: Returns:
tuple: (microseconds, ISO formatted datetime string) tuple: (microseconds, ISO formatted datetime string)
""" """
if gecko_timestamp is None: if gecko_timestamp is None or gecko_timestamp == 0:
return 0, "" return 0, ""
# Validate
validate_timestamp(gecko_timestamp, "Gecko/Firefox")
# Convert microseconds to seconds # Convert microseconds to seconds
timestamp_seconds = gecko_timestamp / 1000000 timestamp_seconds = gecko_timestamp / 1000000
dt = datetime.utcfromtimestamp(timestamp_seconds) dt = datetime.utcfromtimestamp(timestamp_seconds)
return gecko_timestamp, dt.strftime('%Y-%m-%dT%H:%M:%S+00:00') return gecko_timestamp, dt.strftime('%Y-%m-%dT%H:%M:%S+00:00')
def convert_chromium_timestamp(chromium_timestamp): def convert_chromium_timestamp(chromium_timestamp: Optional[int]) -> Tuple[int, str]:
""" """
Convert Chromium timestamp to Unix microseconds and ISO format. Convert Chromium timestamp to Unix microseconds and ISO format.
Chromium stores timestamps as microseconds since 1601-01-01 00:00:00 UTC (Windows epoch). Chromium stores timestamps as microseconds since 1601-01-01 00:00:00 UTC (Windows epoch).
@ -58,11 +194,14 @@ def convert_chromium_timestamp(chromium_timestamp):
# Convert to Unix microseconds for Timesketch # Convert to Unix microseconds for Timesketch
unix_microseconds = int(timestamp_seconds * 1000000) unix_microseconds = int(timestamp_seconds * 1000000)
# Validate
validate_timestamp(unix_microseconds, "Chromium")
dt = datetime.utcfromtimestamp(timestamp_seconds) dt = datetime.utcfromtimestamp(timestamp_seconds)
return unix_microseconds, dt.strftime('%Y-%m-%dT%H:%M:%S+00:00') return unix_microseconds, dt.strftime('%Y-%m-%dT%H:%M:%S+00:00')
def convert_webkit_timestamp(webkit_timestamp): def convert_webkit_timestamp(webkit_timestamp: Optional[float]) -> Tuple[int, str]:
""" """
Convert WebKit/Safari timestamp to Unix microseconds and ISO format. Convert WebKit/Safari timestamp to Unix microseconds and ISO format.
Safari stores timestamps as seconds (with decimal) since 2001-01-01 00:00:00 UTC (Cocoa/Core Data epoch). Safari stores timestamps as seconds (with decimal) since 2001-01-01 00:00:00 UTC (Cocoa/Core Data epoch).
@ -87,11 +226,82 @@ def convert_webkit_timestamp(webkit_timestamp):
# Convert to Unix microseconds for Timesketch # Convert to Unix microseconds for Timesketch
unix_microseconds = int(timestamp_seconds * 1000000) unix_microseconds = int(timestamp_seconds * 1000000)
# Validate
validate_timestamp(unix_microseconds, "WebKit/Safari")
dt = datetime.utcfromtimestamp(timestamp_seconds) dt = datetime.utcfromtimestamp(timestamp_seconds)
return unix_microseconds, dt.strftime('%Y-%m-%dT%H:%M:%S+00:00') return unix_microseconds, dt.strftime('%Y-%m-%dT%H:%M:%S+00:00')
def extract_chromium_history(db_path, output_csv, browser_name=None): def write_timesketch_csv(output_csv: str, fieldnames: List[str], rows: List[Dict[str, Any]]) -> None:
"""
Write history data to Timesketch-compatible CSV format.
Args:
output_csv: Path to output CSV file
fieldnames: List of CSV field names
rows: List of row dictionaries to write
"""
with open(output_csv, 'w', newline='', encoding='utf-8') as csvfile:
writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
writer.writeheader()
for row in rows:
writer.writerow(row)
def connect_database_readonly(db_path: str) -> sqlite3.Connection:
"""
Connect to database in read-only mode to avoid lock issues.
Args:
db_path: Path to database file
Returns:
SQLite connection object
Raises:
sqlite3.OperationalError: If database is locked or inaccessible
"""
try:
# Use URI with read-only mode to avoid locking issues
conn = sqlite3.connect(f'file:{db_path}?mode=ro', uri=True)
return conn
except sqlite3.OperationalError as e:
raise sqlite3.OperationalError(
f"Cannot open database (it may be locked by the browser): {db_path}\n"
f"Please close {db_path.split('/')[-2] if '/' in db_path else 'the browser'} "
f"and try again, or copy the database file to a temporary location.\n"
f"Original error: {e}"
)
def validate_browser_schema(conn: sqlite3.Connection, expected_tables: List[str], browser_name: str) -> None:
"""
Validate that required tables exist in the database.
Args:
conn: Database connection
expected_tables: List of required table names
browser_name: Browser name for error messages
Raises:
DatabaseValidationError: If required tables are missing
"""
cursor = conn.cursor()
cursor.execute("SELECT name FROM sqlite_master WHERE type='table'")
existing_tables = {row[0] for row in cursor.fetchall()}
missing_tables = set(expected_tables) - existing_tables
if missing_tables:
raise DatabaseValidationError(
f"Database does not appear to be a valid {browser_name} history database.\n"
f"Missing required tables: {', '.join(missing_tables)}\n"
f"Found tables: {', '.join(sorted(existing_tables))}"
)
def extract_chromium_history(db_path: str, output_csv: str, browser_name: Optional[str] = None) -> int:
""" """
Extract browser history from Chromium-based browsers and convert to Timesketch CSV. Extract browser history from Chromium-based browsers and convert to Timesketch CSV.
Works with all Chromium-based browsers: Chrome, Edge, Brave, Chromium, Opera, Vivaldi, etc. Works with all Chromium-based browsers: Chrome, Edge, Brave, Chromium, Opera, Vivaldi, etc.
@ -100,17 +310,23 @@ def extract_chromium_history(db_path, output_csv, browser_name=None):
db_path: Path to Chromium History database db_path: Path to Chromium History database
output_csv: Path to output CSV file output_csv: Path to output CSV file
browser_name: Optional custom name for data_type field (default: "Chromium") browser_name: Optional custom name for data_type field (default: "Chromium")
"""
Returns:
Number of entries processed
Raises:
DatabaseValidationError: If database validation fails
sqlite3.Error: If database query fails
"""
if browser_name is None: if browser_name is None:
browser_name = "Chromium" browser_name = "Chromium"
# Check if database exists # Connect to database
if not Path(db_path).exists(): conn = connect_database_readonly(db_path)
raise FileNotFoundError(f"Chromium database not found: {db_path}")
# Validate schema
validate_browser_schema(conn, ['visits', 'urls'], browser_name)
# Connect to Chromium SQLite database
conn = sqlite3.connect(db_path)
cursor = conn.cursor() cursor = conn.cursor()
# Query to extract history visits with URL information # Query to extract history visits with URL information
@ -129,11 +345,14 @@ def extract_chromium_history(db_path, output_csv, browser_name=None):
ORDER BY visits.visit_time ORDER BY visits.visit_time
""" """
cursor.execute(query) try:
results = cursor.fetchall() cursor.execute(query)
results = cursor.fetchall()
except sqlite3.Error as e:
conn.close()
raise sqlite3.Error(f"Error querying {browser_name} history: {e}")
# Transition type mapping (Chromium transition types) # Transition type mapping (Chromium transition types)
# Core types (bits 0-7)
transition_types = { transition_types = {
0: "Link", 0: "Link",
1: "Typed", 1: "Typed",
@ -148,66 +367,72 @@ def extract_chromium_history(db_path, output_csv, browser_name=None):
10: "Keyword_Generated" 10: "Keyword_Generated"
} }
# Write to Timesketch CSV format rows = []
with open(output_csv, 'w', newline='', encoding='utf-8') as csvfile: validation_errors = []
fieldnames = [
'timestamp',
'datetime',
'timestamp_desc',
'message',
'url',
'title',
'visit_type',
'visit_duration_us',
'total_visits',
'typed_count',
'data_type'
]
writer = csv.DictWriter(csvfile, fieldnames=fieldnames) for idx, row in enumerate(results):
writer.writeheader() chromium_timestamp = row[0]
url = row[1] or ""
title = row[2] or "(No title)"
transition = row[3]
visit_duration = row[4] or 0
visit_count = row[5] or 0
typed_count = row[6] or 0
for row in results: # Extract core transition type (lower 8 bits)
chromium_timestamp = row[0] core_transition = transition & 0xFF
url = row[1] or "" transition_name = transition_types.get(core_transition, f"Unknown({core_transition})")
title = row[2] or "(No title)"
transition = row[3]
visit_duration = row[4] or 0
visit_count = row[5] or 0
typed_count = row[6] or 0
last_visit = row[7]
# Extract core transition type (lower 8 bits) # Convert timestamp
core_transition = transition & 0xFF try:
transition_name = transition_types.get(core_transition, f"Unknown({core_transition})")
# Convert timestamp
unix_microseconds, iso_datetime = convert_chromium_timestamp(chromium_timestamp) unix_microseconds, iso_datetime = convert_chromium_timestamp(chromium_timestamp)
except TimestampValidationError as e:
validation_errors.append(f"Entry {idx + 1}: {e}")
if len(validation_errors) <= 3: # Only store first few errors
continue
else:
break # Too many errors, likely a systematic issue
# Construct message # Construct message
message = f"Visited: {title}" message = f"Visited: {title}"
writer.writerow({ rows.append({
'timestamp': unix_microseconds, 'timestamp': unix_microseconds,
'datetime': iso_datetime, 'datetime': iso_datetime,
'timestamp_desc': 'Visit Time', 'timestamp_desc': 'Visit Time',
'message': message, 'message': message,
'url': url, 'url': url,
'title': title, 'title': title,
'visit_type': transition_name, 'visit_type': transition_name,
'visit_duration_us': visit_duration, 'visit_duration_us': visit_duration,
'total_visits': visit_count, 'total_visits': visit_count,
'typed_count': typed_count, 'typed_count': typed_count,
'data_type': f'{browser_name.lower()}:history:visit' 'data_type': f'{browser_name.lower()}:history:visit'
}) })
conn.close() conn.close()
print(f"Successfully converted {len(results)} history entries from {browser_name}") # Report validation errors if any
print(f"Output saved to: {output_csv}") if validation_errors:
print(f"Warning: Found {len(validation_errors)} timestamp validation errors:", file=sys.stderr)
for error in validation_errors[:3]:
print(f" {error}", file=sys.stderr)
if len(validation_errors) > 3:
print(f" ... and {len(validation_errors) - 3} more errors", file=sys.stderr)
print(f"Continuing with {len(rows)} valid entries...", file=sys.stderr)
# Write CSV
fieldnames = [
'timestamp', 'datetime', 'timestamp_desc', 'message',
'url', 'title', 'visit_type', 'visit_duration_us',
'total_visits', 'typed_count', 'data_type'
]
write_timesketch_csv(output_csv, fieldnames, rows)
return len(rows)
def extract_gecko_history(db_path, output_csv, browser_name=None): def extract_gecko_history(db_path: str, output_csv: str, browser_name: Optional[str] = None) -> int:
""" """
Extract browser history from Gecko-based browsers (Firefox) and convert to Timesketch CSV. Extract browser history from Gecko-based browsers (Firefox) and convert to Timesketch CSV.
Works with Firefox and Firefox derivatives (Waterfox, LibreWolf, etc.) Works with Firefox and Firefox derivatives (Waterfox, LibreWolf, etc.)
@ -216,17 +441,23 @@ def extract_gecko_history(db_path, output_csv, browser_name=None):
db_path: Path to Gecko places.sqlite database db_path: Path to Gecko places.sqlite database
output_csv: Path to output CSV file output_csv: Path to output CSV file
browser_name: Optional custom name for data_type field (default: "Firefox") browser_name: Optional custom name for data_type field (default: "Firefox")
"""
Returns:
Number of entries processed
Raises:
DatabaseValidationError: If database validation fails
sqlite3.Error: If database query fails
"""
if browser_name is None: if browser_name is None:
browser_name = "Firefox" browser_name = "Firefox"
# Check if database exists # Connect to database
if not Path(db_path).exists(): conn = connect_database_readonly(db_path)
raise FileNotFoundError(f"Gecko database not found: {db_path}")
# Validate schema
validate_browser_schema(conn, ['moz_historyvisits', 'moz_places'], browser_name)
# Connect to Firefox SQLite database
conn = sqlite3.connect(db_path)
cursor = conn.cursor() cursor = conn.cursor()
# Query to extract history visits with URL information # Query to extract history visits with URL information
@ -243,8 +474,12 @@ def extract_gecko_history(db_path, output_csv, browser_name=None):
ORDER BY moz_historyvisits.visit_date ORDER BY moz_historyvisits.visit_date
""" """
cursor.execute(query) try:
results = cursor.fetchall() cursor.execute(query)
results = cursor.fetchall()
except sqlite3.Error as e:
conn.close()
raise sqlite3.Error(f"Error querying {browser_name} history: {e}")
# Visit type mapping (Firefox visit types) # Visit type mapping (Firefox visit types)
visit_types = { visit_types = {
@ -259,59 +494,66 @@ def extract_gecko_history(db_path, output_csv, browser_name=None):
9: "Reload" 9: "Reload"
} }
# Write to Timesketch CSV format rows = []
with open(output_csv, 'w', newline='', encoding='utf-8') as csvfile: validation_errors = []
# Timesketch expected fields
fieldnames = [
'timestamp',
'datetime',
'timestamp_desc',
'message',
'url',
'title',
'visit_type',
'data_type'
]
writer = csv.DictWriter(csvfile, fieldnames=fieldnames) for idx, row in enumerate(results):
writer.writeheader() timestamp_us = row[0]
url = row[1] or ""
title = row[2] or "(No title)"
description = row[3] or ""
visit_type_id = row[4]
for row in results: visit_type_name = visit_types.get(visit_type_id, f"Unknown({visit_type_id})")
timestamp_us = row[0] # Firefox timestamp in microseconds
url = row[1] or ""
title = row[2] or "(No title)"
description = row[3] or ""
visit_type_id = row[4]
from_visit = row[5]
visit_type_name = visit_types.get(visit_type_id, f"Unknown({visit_type_id})") # Convert timestamp
try:
# Convert timestamp
unix_microseconds, iso_datetime = convert_gecko_timestamp(timestamp_us) unix_microseconds, iso_datetime = convert_gecko_timestamp(timestamp_us)
except TimestampValidationError as e:
validation_errors.append(f"Entry {idx + 1}: {e}")
if len(validation_errors) <= 3:
continue
else:
break
# Construct message # Construct message
message = f"Visited: {title}" message = f"Visited: {title}"
if description: if description:
message += f" - {description}" message += f" - {description}"
writer.writerow({ rows.append({
'timestamp': unix_microseconds, 'timestamp': unix_microseconds,
'datetime': iso_datetime, 'datetime': iso_datetime,
'timestamp_desc': 'Visit Time', 'timestamp_desc': 'Visit Time',
'message': message, 'message': message,
'url': url, 'url': url,
'title': title, 'title': title,
'visit_type': visit_type_name, 'visit_type': visit_type_name,
'data_type': f'{browser_name.lower()}:history:visit' 'data_type': f'{browser_name.lower()}:history:visit'
}) })
conn.close() conn.close()
print(f"Successfully converted {len(results)} history entries from {browser_name}") # Report validation errors if any
print(f"Output saved to: {output_csv}") if validation_errors:
print(f"Warning: Found {len(validation_errors)} timestamp validation errors:", file=sys.stderr)
for error in validation_errors[:3]:
print(f" {error}", file=sys.stderr)
if len(validation_errors) > 3:
print(f" ... and {len(validation_errors) - 3} more errors", file=sys.stderr)
print(f"Continuing with {len(rows)} valid entries...", file=sys.stderr)
# Write CSV
fieldnames = [
'timestamp', 'datetime', 'timestamp_desc', 'message',
'url', 'title', 'visit_type', 'data_type'
]
write_timesketch_csv(output_csv, fieldnames, rows)
return len(rows)
def extract_webkit_history(db_path, output_csv, browser_name=None): def extract_webkit_history(db_path: str, output_csv: str, browser_name: Optional[str] = None) -> int:
""" """
Extract browser history from WebKit-based browsers (Safari) and convert to Timesketch CSV. Extract browser history from WebKit-based browsers (Safari) and convert to Timesketch CSV.
@ -319,17 +561,23 @@ def extract_webkit_history(db_path, output_csv, browser_name=None):
db_path: Path to Safari History.db database db_path: Path to Safari History.db database
output_csv: Path to output CSV file output_csv: Path to output CSV file
browser_name: Optional custom name for data_type field (default: "Safari") browser_name: Optional custom name for data_type field (default: "Safari")
"""
Returns:
Number of entries processed
Raises:
DatabaseValidationError: If database validation fails
sqlite3.Error: If database query fails
"""
if browser_name is None: if browser_name is None:
browser_name = "Safari" browser_name = "Safari"
# Check if database exists # Connect to database
if not Path(db_path).exists(): conn = connect_database_readonly(db_path)
raise FileNotFoundError(f"WebKit database not found: {db_path}")
# Validate schema
validate_browser_schema(conn, ['history_visits', 'history_items'], browser_name)
# Connect to Safari SQLite database
conn = sqlite3.connect(db_path)
cursor = conn.cursor() cursor = conn.cursor()
# Query to extract history visits with URL information # Query to extract history visits with URL information
@ -344,52 +592,102 @@ def extract_webkit_history(db_path, output_csv, browser_name=None):
ORDER BY history_visits.visit_time ORDER BY history_visits.visit_time
""" """
cursor.execute(query) try:
results = cursor.fetchall() cursor.execute(query)
results = cursor.fetchall()
except sqlite3.Error as e:
conn.close()
raise sqlite3.Error(f"Error querying {browser_name} history: {e}")
# Write to Timesketch CSV format rows = []
with open(output_csv, 'w', newline='', encoding='utf-8') as csvfile: validation_errors = []
fieldnames = [
'timestamp',
'datetime',
'timestamp_desc',
'message',
'url',
'title',
'data_type'
]
writer = csv.DictWriter(csvfile, fieldnames=fieldnames) for idx, row in enumerate(results):
writer.writeheader() webkit_timestamp = row[0]
url = row[1] or ""
title = row[2] or row[3] or "(No title)"
for row in results: # Convert timestamp
webkit_timestamp = row[0] try:
url = row[1] or ""
title = row[2] or row[3] or "(No title)" # Use visit_title as fallback
# Convert timestamp
unix_microseconds, iso_datetime = convert_webkit_timestamp(webkit_timestamp) unix_microseconds, iso_datetime = convert_webkit_timestamp(webkit_timestamp)
except TimestampValidationError as e:
validation_errors.append(f"Entry {idx + 1}: {e}")
if len(validation_errors) <= 3:
continue
else:
break
# Construct message # Construct message
message = f"Visited: {title}" message = f"Visited: {title}"
writer.writerow({ rows.append({
'timestamp': unix_microseconds, 'timestamp': unix_microseconds,
'datetime': iso_datetime, 'datetime': iso_datetime,
'timestamp_desc': 'Visit Time', 'timestamp_desc': 'Visit Time',
'message': message, 'message': message,
'url': url, 'url': url,
'title': title, 'title': title,
'data_type': f'{browser_name.lower()}:history:visit' 'data_type': f'{browser_name.lower()}:history:visit'
}) })
conn.close() conn.close()
print(f"Successfully converted {len(results)} history entries from {browser_name}") # Report validation errors if any
print(f"Output saved to: {output_csv}") if validation_errors:
print(f"Warning: Found {len(validation_errors)} timestamp validation errors:", file=sys.stderr)
for error in validation_errors[:3]:
print(f" {error}", file=sys.stderr)
if len(validation_errors) > 3:
print(f" ... and {len(validation_errors) - 3} more errors", file=sys.stderr)
print(f"Continuing with {len(rows)} valid entries...", file=sys.stderr)
# Write CSV
fieldnames = [
'timestamp', 'datetime', 'timestamp_desc', 'message',
'url', 'title', 'data_type'
]
write_timesketch_csv(output_csv, fieldnames, rows)
return len(rows)
def main(): def generate_default_output_filename(browser_type: str, input_path: str) -> str:
"""
Generate a sensible default output filename based on browser type and input.
Args:
browser_type: Browser type (gecko, chromium, webkit)
input_path: Input database path
Returns:
Generated output filename
"""
# Extract browser name from path if possible
path_lower = input_path.lower()
browser_names = {
'firefox': 'firefox',
'chrome': 'chrome',
'edge': 'edge',
'brave': 'brave',
'opera': 'opera',
'vivaldi': 'vivaldi',
'safari': 'safari',
}
detected_name = None
for name_key, name_value in browser_names.items():
if name_key in path_lower:
detected_name = name_value
break
if detected_name:
return f"{detected_name}_history_timesketch.csv"
else:
return f"{browser_type}_history_timesketch.csv"
def main() -> int:
parser = argparse.ArgumentParser( parser = argparse.ArgumentParser(
description='Convert browser history to Timesketch CSV format', description='Convert browser history to Timesketch CSV format',
formatter_class=argparse.RawDescriptionHelpFormatter, formatter_class=argparse.RawDescriptionHelpFormatter,
@ -398,6 +696,7 @@ Browser Engine Types:
gecko, firefox - Gecko-based browsers (Firefox, Waterfox, LibreWolf, etc.) gecko, firefox - Gecko-based browsers (Firefox, Waterfox, LibreWolf, etc.)
chromium - Chromium-based browsers (Chrome, Edge, Brave, Opera, Vivaldi, etc.) chromium - Chromium-based browsers (Chrome, Edge, Brave, Opera, Vivaldi, etc.)
webkit, safari - WebKit-based browsers (Safari) webkit, safari - WebKit-based browsers (Safari)
auto - Auto-detect browser type from database schema
All Chromium-based browsers (Chrome, Edge, Brave, Opera, Vivaldi) use identical database All Chromium-based browsers (Chrome, Edge, Brave, Opera, Vivaldi) use identical database
schemas and can be processed with the "chromium" option. Use --browser-name to customize schemas and can be processed with the "chromium" option. Use --browser-name to customize
@ -420,17 +719,20 @@ HOW TO FIND YOUR PROFILE PATH:
Always at: ~/Library/Safari/History.db Always at: ~/Library/Safari/History.db
Example usage: Example usage:
# Firefox # Auto-detect browser type
python browser_to_timesketch.py -b firefox -i ~/.mozilla/firefox/xyz.default/places.sqlite -o output.csv python browser2timesketch.py -i ~/.mozilla/firefox/xyz.default/places.sqlite
# Firefox with custom output
python browser2timesketch.py -b firefox -i ~/.mozilla/firefox/xyz.default/places.sqlite -o firefox.csv
# Any Chromium browser (Chrome, Edge, Brave, etc.) # Any Chromium browser (Chrome, Edge, Brave, etc.)
python browser_to_timesketch.py -b chromium -i ~/.config/google-chrome/Default/History -o output.csv python browser2timesketch.py -b chromium -i ~/.config/google-chrome/Default/History -o output.csv
# Chromium browser with custom label # Chromium browser with custom label
python browser_to_timesketch.py -b chromium --browser-name "Brave" -i ~/.config/BraveSoftware/Brave-Browser/Default/History -o output.csv python browser2timesketch.py -b chromium --browser-name "Brave" -i ~/.config/BraveSoftware/Brave-Browser/Default/History
# Safari (macOS) # Safari (macOS)
python browser_to_timesketch.py -b safari -i ~/Library/Safari/History.db -o output.csv python browser2timesketch.py -b safari -i ~/Library/Safari/History.db
Database Locations: Database Locations:
Gecko/Firefox: Gecko/Firefox:
@ -453,16 +755,16 @@ Database Locations:
WebKit/Safari: WebKit/Safari:
macOS: ~/Library/Safari/History.db macOS: ~/Library/Safari/History.db
Note: Close the browser before running this script to avoid database lock issues. Note: The script uses read-only mode to avoid database lock issues, but closing
You may want to copy the database file to a temporary location first. the browser is still recommended for best results.
""" """
) )
parser.add_argument( parser.add_argument(
'-b', '--browser', '-b', '--browser',
required=True, choices=['gecko', 'firefox', 'chromium', 'webkit', 'safari', 'auto'],
choices=['gecko', 'firefox', 'chromium', 'webkit', 'safari'], default='auto',
help='Browser engine type (firefox and gecko are aliases, safari and webkit are aliases)' help='Browser engine type (default: auto-detect)'
) )
parser.add_argument( parser.add_argument(
@ -473,37 +775,93 @@ You may want to copy the database file to a temporary location first.
parser.add_argument( parser.add_argument(
'-o', '--output', '-o', '--output',
default='browser_history_timesketch.csv', help='Output CSV file path (default: auto-generated based on browser type)'
help='Output CSV file path (default: browser_history_timesketch.csv)'
) )
parser.add_argument( parser.add_argument(
'--browser-name', '--browser-name',
default=None,
help='Custom browser name for the data_type field (e.g., "Chrome", "Brave", "Edge")' help='Custom browser name for the data_type field (e.g., "Chrome", "Brave", "Edge")'
) )
args = parser.parse_args() args = parser.parse_args()
try: try:
# Normalize browser type # Validate database file
print(f"Validating database: {args.input}")
validate_sqlite_database(args.input)
print("✓ Database is valid SQLite file")
# Detect or validate browser type
browser_type = args.browser.lower() browser_type = args.browser.lower()
if browser_type in ['gecko', 'firefox']: if browser_type == 'auto':
extract_gecko_history(args.input, args.output, args.browser_name) print("Auto-detecting browser type...")
elif browser_type == 'chromium': browser_type = detect_browser_type(args.input)
extract_chromium_history(args.input, args.output, args.browser_name) print(f"✓ Detected browser type: {browser_type}")
elif browser_type in ['webkit', 'safari']: else:
extract_webkit_history(args.input, args.output, args.browser_name) # Normalize aliases
if browser_type == 'firefox':
browser_type = 'gecko'
elif browser_type == 'safari':
browser_type = 'webkit'
# Validate that the database matches the specified type
detected_type = detect_browser_type(args.input)
if detected_type != browser_type:
print(f"Warning: You specified '{args.browser}' but database appears to be '{detected_type}'",
file=sys.stderr)
response = input("Continue anyway? [y/N]: ")
if response.lower() != 'y':
return 1
# Generate output filename if not provided
if args.output:
output_csv = args.output
else:
output_csv = generate_default_output_filename(browser_type, args.input)
print(f"Using output filename: {output_csv}")
# Extract history based on browser type
print(f"Extracting history from {browser_type} database...")
if browser_type == 'gecko':
num_entries = extract_gecko_history(args.input, output_csv, args.browser_name)
elif browser_type == 'chromium':
num_entries = extract_chromium_history(args.input, output_csv, args.browser_name)
elif browser_type == 'webkit':
num_entries = extract_webkit_history(args.input, output_csv, args.browser_name)
else:
raise ValueError(f"Unknown browser type: {browser_type}")
print(f"\n✓ Successfully converted {num_entries} history entries")
print(f"✓ Output saved to: {output_csv}")
return 0
except DatabaseValidationError as e:
print(f"Database Validation Error: {e}", file=sys.stderr)
return 1
except BrowserDetectionError as e:
print(f"Browser Detection Error: {e}", file=sys.stderr)
return 1
except TimestampValidationError as e:
print(f"Timestamp Validation Error: {e}", file=sys.stderr)
print("This indicates a systematic problem with timestamp conversion.", file=sys.stderr)
return 1
except sqlite3.Error as e:
print(f"Database Error: {e}", file=sys.stderr)
return 1
except FileNotFoundError as e:
print(f"File Error: {e}", file=sys.stderr)
return 1
except KeyboardInterrupt:
print("\nOperation cancelled by user", file=sys.stderr)
return 130
except Exception as e: except Exception as e:
print(f"Error: {e}") print(f"Unexpected Error: {e}", file=sys.stderr)
import traceback import traceback
traceback.print_exc() traceback.print_exc()
return 1 return 1
return 0
if __name__ == "__main__": if __name__ == "__main__":
exit(main()) sys.exit(main())