diff --git a/README.md b/README.md
index f970275..f84f216 100644
--- a/README.md
+++ b/README.md
@@ -1,2 +1,231 @@
-# browser2timesketch
+# Browser History to Timesketch Converter
+Converts browser history from the three major browser engines to Timesketch-compatible CSV format.
+
+## Supported Browser Engines
+
+- **Gecko** - Firefox and derivatives (Waterfox, LibreWolf, etc.)
+- **Chromium** - All Chromium-based browsers (Chrome, Edge, Brave, Opera, Vivaldi, Arc, etc.)
+- **WebKit** - Safari
+
+## Why Only Three Types?
+
+All Chromium-based browsers (Chrome, Edge, Brave, Opera, Vivaldi, etc.) use **identical database schemas**. There's no need to handle them differently - they all use the same History database format with the same table structures and timestamp formats. The only difference is the file location, which you provide as input.
+
+Similarly, all Gecko-based browsers (Firefox forks) use the same places.sqlite format.
+
+## Usage
+
+```bash
+python browser2timesketch.py -b -i -o
+```
+
+### Arguments
+
+- `-b, --browser`: Browser engine type
+  - `firefox` or `gecko` - For Firefox and Firefox-based browsers
+  - `chromium` - For all Chromium-based browsers
+  - `safari` or `webkit` - For Safari
+- `-i, --input`: Path to browser history database file
+- `-o, --output`: Output CSV file path (optional, default: browser_history_timesketch.csv)
+- `--browser-name`: Custom browser name for the data_type field (optional)
+
+## Database File Locations
+
+### How to Find Your Profile Path
+
+#### Gecko / Firefox
+1. Open Firefox
+2. Type `about:support` in the address bar and press Enter
+3. Look for **Profile Folder** or **Profile Directory**
+4. Click "Open Folder" / "Open Directory" button, or note the path shown
+5. The `places.sqlite` file is in this directory
+
+Alternative: Type `about:profiles` to see all profiles and their locations.
+
+#### Chromium (Chrome/Edge/Brave/Opera/Vivaldi/etc.)
+1. Open your Chromium-based browser
+2. Type `chrome://version/` in the address bar and press Enter
+3. Look for **Profile Path** - this shows the full path to your profile directory
+4. The `History` file (no extension) is in this directory
+
+Note: Other Chromium-based browsers expose the same page under their own URL scheme:
+- Edge: `edge://version/`
+- Brave: `brave://version/`
+- Opera: `opera://about/`
+- Vivaldi: `vivaldi://about/`
+
+#### WebKit / Safari
+Safari's history database is always at the same location on macOS:
+`~/Library/Safari/History.db`
+
+To view in Finder:
+1. Open Finder
+2. Press `Cmd + Shift + G` (Go to Folder)
+3. Type `~/Library/Safari/`
+4. Press Enter
+
+### Standard Profile Locations
+
+If you prefer to navigate directly to the standard locations:
+
+### Gecko / Firefox
+
+**Database file:** `places.sqlite`
+
+- **Linux:** `~/.mozilla/firefox//places.sqlite`
+- **macOS:** `~/Library/Application Support/Firefox/Profiles//places.sqlite`
+- **Windows:** `%APPDATA%\Mozilla\Firefox\Profiles\\places.sqlite`
+
+### Chromium (Chrome/Edge/Brave/Opera/Vivaldi/etc.)
+
+**Database file:** `History` (no file extension)
+
+All Chromium browsers use the same database format. Only the location differs:
+
+**Google Chrome:**
+- **Linux:** `~/.config/google-chrome/Default/History`
+- **macOS:** `~/Library/Application Support/Google/Chrome/Default/History`
+- **Windows:** `%LOCALAPPDATA%\Google\Chrome\User Data\Default\History`
+
+**Microsoft Edge:**
+- **Linux:** `~/.config/microsoft-edge/Default/History`
+- **macOS:** `~/Library/Application Support/Microsoft Edge/Default/History`
+- **Windows:** `%LOCALAPPDATA%\Microsoft\Edge\User Data\Default\History`
+
+**Brave:**
+- **Linux:** `~/.config/BraveSoftware/Brave-Browser/Default/History`
+- **macOS:** `~/Library/Application Support/BraveSoftware/Brave-Browser/Default/History`
+- **Windows:** `%LOCALAPPDATA%\BraveSoftware\Brave-Browser\User Data\Default\History`
+
+**Opera:**
+- **Linux:** `~/.config/opera/Default/History`
+- **macOS:** `~/Library/Application Support/com.operasoftware.Opera/History`
+- **Windows:** `%APPDATA%\Opera Software\Opera Stable\History`
+
+**Vivaldi:**
+- **Linux:** `~/.config/vivaldi/Default/History`
+- **macOS:** `~/Library/Application Support/Vivaldi/Default/History`
+- **Windows:** `%LOCALAPPDATA%\Vivaldi\User Data\Default\History`
+
+### WebKit / Safari
+
+**Database file:** `History.db`
+
+- **macOS:** `~/Library/Safari/History.db`
+
+## Examples
+
+### Firefox (or any Gecko-based browser)
+```bash
+# Linux
+python browser2timesketch.py -b firefox -i ~/.mozilla/firefox/xyz123.default/places.sqlite -o firefox_history.csv
+
+# macOS ($HOME is used because the shell does not expand ~ inside quotes)
+python browser2timesketch.py -b gecko -i "$HOME/Library/Application Support/Firefox/Profiles/xyz123.default/places.sqlite" -o firefox_history.csv
+
+# Windows
+python browser2timesketch.py -b firefox -i "C:\Users\YourUser\AppData\Roaming\Mozilla\Firefox\Profiles\xyz123.default\places.sqlite" -o firefox_history.csv
+```
+
+### Chrome (or any Chromium-based browser)
+```bash
+# Linux - Chrome
+python browser2timesketch.py -b chromium -i ~/.config/google-chrome/Default/History -o chrome_history.csv
+
+# macOS - Chrome ($HOME is used because the shell does not expand ~ inside quotes)
+python browser2timesketch.py -b chromium -i "$HOME/Library/Application Support/Google/Chrome/Default/History" -o chrome_history.csv
+
+# Windows - Chrome
+python browser2timesketch.py -b chromium -i "C:\Users\YourUser\AppData\Local\Google\Chrome\User Data\Default\History" -o chrome_history.csv
+
+# Linux - Brave with custom label
+python browser2timesketch.py -b chromium --browser-name "Brave" -i ~/.config/BraveSoftware/Brave-Browser/Default/History -o brave_history.csv
+
+# Windows - Edge
+python browser2timesketch.py -b chromium -i "C:\Users\YourUser\AppData\Local\Microsoft\Edge\User Data\Default\History" -o edge_history.csv
+```
+
+### Safari
+```bash
+# macOS
+python browser2timesketch.py -b safari -i ~/Library/Safari/History.db -o safari_history.csv
+
+# Or using the webkit alias
+python browser2timesketch.py -b webkit -i ~/Library/Safari/History.db -o safari_history.csv
+```
+
+## Output Format
+
+The script generates a CSV file with Timesketch-compatible fields:
+
+| Field | Description | Browsers |
+|-------|-------------|----------|
+| `timestamp` | Unix timestamp in microseconds | All |
+| `datetime` | ISO 8601 formatted datetime | All |
+| `timestamp_desc` | Description of timestamp | All |
+| `message` | Human-readable event description | All |
+| `url` | The visited URL | All |
+| `title` | Page title | All |
+| `data_type` | Source identifier | All |
+| `visit_type` | Type of visit | Gecko, Chromium |
+| `visit_duration_us` | Visit duration in microseconds | Chromium only |
+| `total_visits` | Total visits to this URL | Chromium only |
+| `typed_count` | Times URL was typed | Chromium only |
+
+## Browser Engine Details
+
+### Timestamp Formats
+
+Each browser engine uses a different timestamp format:
+
+- **Gecko (Firefox):** Microseconds since Unix epoch (1970-01-01 00:00:00 UTC)
+- **Chromium:** Microseconds since Windows epoch (1601-01-01 00:00:00 UTC)
+- **WebKit (Safari):** Seconds since Cocoa epoch (2001-01-01 00:00:00 UTC)
+
+The script automatically converts all timestamps to Unix microseconds for Timesketch.
+
+### Database Schemas
+
+- **Gecko:** Uses `moz_historyvisits` and `moz_places` tables in `places.sqlite`
+- **Chromium:** Uses `visits` and `urls` tables in `History` database
+- **WebKit:** Uses `history_visits` and `history_items` tables in `History.db`
+
+## Important Notes
+
+1. **Close the browser** before running the script to avoid database lock errors
+2. **Copy the database file** to a temporary location if you want to avoid potential issues
+3. **Handle output carefully** - the CSV contains your complete browsing history
+4. Different browsers may have multiple profiles - make sure you're pointing to the correct profile directory
+5. On Windows, use quotes around paths that contain spaces
+
+## Troubleshooting
+
+### Database is locked
+- Close the browser completely
+- Copy the database file to a temporary location and run the script on the copy
+
+### File not found
+- Verify the profile directory name (the random string like `xyz123.default`)
+- Check that the browser has been used and has history
+- On macOS, use tab completion or check the exact path
+
+### Permission denied
+- Run with appropriate permissions
+- On Linux/macOS, check file permissions with `ls -l`
+- On Windows, run as Administrator if needed
+
+## Requirements
+
+- Python 3.6 or higher
+- No external dependencies (uses only standard library)
+
+## Privacy and Security
+
+This tool exports your complete browsing history. The output file contains:
+- All visited URLs
+- Page titles
+- Visit timestamps
+- Visit types and patterns
+
+Handle the output files appropriately and delete them when no longer needed.
\ No newline at end of file
diff --git a/browser2timesketch.py b/browser2timesketch.py
new file mode 100755
index 0000000..9f10259
--- /dev/null
+++ b/browser2timesketch.py
@@ -0,0 +1,509 @@
+#!/usr/bin/env python3
+"""
+Browser History to Timesketch CSV Converter
+
+Converts browser history from major browser engines to Timesketch-compatible CSV format.
+Supports: Gecko (Firefox), Chromium (Chrome/Edge/Brave/etc.), WebKit (Safari)
+"""
+
+import sqlite3
+import csv
+import argparse
+from datetime import datetime, timedelta
+from pathlib import Path
+
+
+def convert_gecko_timestamp(gecko_timestamp):
+    """
+    Convert Gecko/Firefox timestamp (microseconds since Unix epoch) to ISO format.
+    Firefox stores timestamps as microseconds since 1970-01-01 00:00:00 UTC.
+
+    Args:
+        gecko_timestamp: Gecko timestamp in microseconds
+
+    Returns:
+        tuple: (microseconds, ISO formatted datetime string)
+    """
+    if gecko_timestamp is None:
+        return 0, ""
+
+    # Convert microseconds to seconds
+    timestamp_seconds = gecko_timestamp / 1000000
+    dt = datetime.utcfromtimestamp(timestamp_seconds)
+    return gecko_timestamp, dt.strftime('%Y-%m-%dT%H:%M:%S+00:00')
+
+
+def convert_chromium_timestamp(chromium_timestamp):
+    """
+    Convert Chromium timestamp to Unix microseconds and ISO format.
+    Chromium stores timestamps as microseconds since 1601-01-01 00:00:00 UTC (Windows epoch).
+
+    Args:
+        chromium_timestamp: Chromium timestamp in microseconds since 1601
+
+    Returns:
+        tuple: (Unix microseconds, ISO formatted datetime string)
+    """
+    if chromium_timestamp is None or chromium_timestamp == 0:
+        return 0, ""
+
+    # Chromium epoch: January 1, 1601
+    # Unix epoch: January 1, 1970
+    # Difference: 11644473600 seconds
+    chromium_epoch_offset = 11644473600
+
+    # Convert to Unix microseconds for Timesketch (integer math avoids float rounding)
+    unix_microseconds = chromium_timestamp - chromium_epoch_offset * 1000000
+
+    # Convert to Unix timestamp (seconds since 1970)
+    timestamp_seconds = unix_microseconds / 1000000
+
+    dt = datetime.utcfromtimestamp(timestamp_seconds)
+    return unix_microseconds, dt.strftime('%Y-%m-%dT%H:%M:%S+00:00')
+
+
+def convert_webkit_timestamp(webkit_timestamp):
+    """
+    Convert WebKit/Safari timestamp to Unix microseconds and ISO format.
+    Safari stores timestamps as seconds (with decimal) since 2001-01-01 00:00:00 UTC (Cocoa/Core Data epoch).
+
+    Args:
+        webkit_timestamp: WebKit timestamp in seconds since 2001
+
+    Returns:
+        tuple: (Unix microseconds, ISO formatted datetime string)
+    """
+    if webkit_timestamp is None or webkit_timestamp == 0:
+        return 0, ""
+
+    # WebKit/Cocoa epoch: January 1, 2001
+    # Unix epoch: January 1, 1970
+    # Difference: 978307200 seconds
+    webkit_epoch_offset = 978307200
+
+    # Convert to Unix timestamp (seconds since 1970)
+    timestamp_seconds = webkit_timestamp + webkit_epoch_offset
+
+    # Convert to Unix microseconds for Timesketch
+    unix_microseconds = int(timestamp_seconds * 1000000)
+
+    dt = datetime.utcfromtimestamp(timestamp_seconds)
+    return unix_microseconds, dt.strftime('%Y-%m-%dT%H:%M:%S+00:00')
+
+
+def extract_chromium_history(db_path, output_csv, browser_name=None):
+    """
+    Extract browser history from Chromium-based browsers and convert to Timesketch CSV.
+    Works with all Chromium-based browsers: Chrome, Edge, Brave, Chromium, Opera, Vivaldi, etc.
+
+    Args:
+        db_path: Path to Chromium History database
+        output_csv: Path to output CSV file
+        browser_name: Optional custom name for data_type field (default: "Chromium")
+    """
+
+    if browser_name is None:
+        browser_name = "Chromium"
+
+    # Check if database exists
+    if not Path(db_path).exists():
+        raise FileNotFoundError(f"Chromium database not found: {db_path}")
+
+    # Connect to Chromium SQLite database
+    conn = sqlite3.connect(db_path)
+    cursor = conn.cursor()
+
+    # Query to extract history visits with URL information
+    query = """
+    SELECT
+        visits.visit_time,
+        urls.url,
+        urls.title,
+        visits.transition,
+        visits.visit_duration,
+        urls.visit_count,
+        urls.typed_count,
+        urls.last_visit_time
+    FROM visits
+    JOIN urls ON visits.url = urls.id
+    ORDER BY visits.visit_time
+    """
+
+    cursor.execute(query)
+    results = cursor.fetchall()
+
+    # Transition type mapping (Chromium transition types)
+    # Core types (bits 0-7)
+    transition_types = {
+        0: "Link",
+        1: "Typed",
+        2: "Auto_Bookmark",
+        3: "Auto_Subframe",
+        4: "Manual_Subframe",
+        5: "Generated",
+        6: "Start_Page",
+        7: "Form_Submit",
+        8: "Reload",
+        9: "Keyword",
+        10: "Keyword_Generated"
+    }
+
+    # Write to Timesketch CSV format
+    with open(output_csv, 'w', newline='', encoding='utf-8') as csvfile:
+        fieldnames = [
+            'timestamp',
+            'datetime',
+            'timestamp_desc',
+            'message',
+            'url',
+            'title',
+            'visit_type',
+            'visit_duration_us',
+            'total_visits',
+            'typed_count',
+            'data_type'
+        ]
+
+        writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
+        writer.writeheader()
+
+        for row in results:
+            chromium_timestamp = row[0]
+            url = row[1] or ""
+            title = row[2] or "(No title)"
+            transition = row[3]
+            visit_duration = row[4] or 0
+            visit_count = row[5] or 0
+            typed_count = row[6] or 0
+            last_visit = row[7]
+
+            # Extract core transition type (lower 8 bits)
+            core_transition = transition & 0xFF
+            transition_name = transition_types.get(core_transition, f"Unknown({core_transition})")
+
+            # Convert timestamp
+            unix_microseconds, iso_datetime = convert_chromium_timestamp(chromium_timestamp)
+
+            # Construct message
+            message = f"Visited: {title}"
+
+            writer.writerow({
+                'timestamp': unix_microseconds,
+                'datetime': iso_datetime,
+                'timestamp_desc': 'Visit Time',
+                'message': message,
+                'url': url,
+                'title': title,
+                'visit_type': transition_name,
+                'visit_duration_us': visit_duration,
+                'total_visits': visit_count,
+                'typed_count': typed_count,
+                'data_type': f'{browser_name.lower()}:history:visit'
+            })
+
+    conn.close()
+
+    print(f"Successfully converted {len(results)} history entries from {browser_name}")
+    print(f"Output saved to: {output_csv}")
+
+
+def extract_gecko_history(db_path, output_csv, browser_name=None):
+    """
+    Extract browser history from Gecko-based browsers (Firefox) and convert to Timesketch CSV.
+    Works with Firefox and Firefox derivatives (Waterfox, LibreWolf, etc.)
+
+    Args:
+        db_path: Path to Gecko places.sqlite database
+        output_csv: Path to output CSV file
+        browser_name: Optional custom name for data_type field (default: "Firefox")
+    """
+
+    if browser_name is None:
+        browser_name = "Firefox"
+
+    # Check if database exists
+    if not Path(db_path).exists():
+        raise FileNotFoundError(f"Gecko database not found: {db_path}")
+
+    # Connect to Firefox SQLite database
+    conn = sqlite3.connect(db_path)
+    cursor = conn.cursor()
+
+    # Query to extract history visits with URL information
+    query = """
+    SELECT
+        moz_historyvisits.visit_date as timestamp,
+        moz_places.url,
+        moz_places.title,
+        moz_places.description,
+        moz_historyvisits.visit_type,
+        moz_historyvisits.from_visit
+    FROM moz_historyvisits
+    JOIN moz_places ON moz_historyvisits.place_id = moz_places.id
+    ORDER BY moz_historyvisits.visit_date
+    """
+
+    cursor.execute(query)
+    results = cursor.fetchall()
+
+    # Visit type mapping (Firefox visit types)
+    visit_types = {
+        1: "Link",
+        2: "Typed",
+        3: "Bookmark",
+        4: "Embed",
+        5: "Redirect_Permanent",
+        6: "Redirect_Temporary",
+        7: "Download",
+        8: "Framed_Link",
+        9: "Reload"
+    }
+
+    # Write to Timesketch CSV format
+    with open(output_csv, 'w', newline='', encoding='utf-8') as csvfile:
+        # Timesketch expected fields
+        fieldnames = [
+            'timestamp',
+            'datetime',
+            'timestamp_desc',
+            'message',
+            'url',
+            'title',
+            'visit_type',
+            'data_type'
+        ]
+
+        writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
+        writer.writeheader()
+
+        for row in results:
+            timestamp_us = row[0]  # Firefox timestamp in microseconds
+            url = row[1] or ""
+            title = row[2] or "(No title)"
+            description = row[3] or ""
+            visit_type_id = row[4]
+            from_visit = row[5]
+
+            visit_type_name = visit_types.get(visit_type_id, f"Unknown({visit_type_id})")
+
+            # Convert timestamp
+            unix_microseconds, iso_datetime = convert_gecko_timestamp(timestamp_us)
+
+            # Construct message
+            message = f"Visited: {title}"
+            if description:
+                message += f" - {description}"
+
+            writer.writerow({
+                'timestamp': unix_microseconds,
+                'datetime': iso_datetime,
+                'timestamp_desc': 'Visit Time',
+                'message': message,
+                'url': url,
+                'title': title,
+                'visit_type': visit_type_name,
+                'data_type': f'{browser_name.lower()}:history:visit'
+            })
+
+    conn.close()
+
+    print(f"Successfully converted {len(results)} history entries from {browser_name}")
+    print(f"Output saved to: {output_csv}")
+
+
+def extract_webkit_history(db_path, output_csv, browser_name=None):
+    """
+    Extract browser history from WebKit-based browsers (Safari) and convert to Timesketch CSV.
+
+    Args:
+        db_path: Path to Safari History.db database
+        output_csv: Path to output CSV file
+        browser_name: Optional custom name for data_type field (default: "Safari")
+    """
+
+    if browser_name is None:
+        browser_name = "Safari"
+
+    # Check if database exists
+    if not Path(db_path).exists():
+        raise FileNotFoundError(f"WebKit database not found: {db_path}")
+
+    # Connect to Safari SQLite database
+    conn = sqlite3.connect(db_path)
+    cursor = conn.cursor()
+
+    # Query to extract history visits with URL information
+    query = """
+    SELECT
+        history_visits.visit_time,
+        history_items.url,
+        history_items.title,
+        history_visits.title as visit_title
+    FROM history_visits
+    JOIN history_items ON history_visits.history_item = history_items.id
+    ORDER BY history_visits.visit_time
+    """
+
+    cursor.execute(query)
+    results = cursor.fetchall()
+
+    # Write to Timesketch CSV format
+    with open(output_csv, 'w', newline='', encoding='utf-8') as csvfile:
+        fieldnames = [
+            'timestamp',
+            'datetime',
+            'timestamp_desc',
+            'message',
+            'url',
+            'title',
+            'data_type'
+        ]
+
+        writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
+        writer.writeheader()
+
+        for row in results:
+            webkit_timestamp = row[0]
+            url = row[1] or ""
+            title = row[2] or row[3] or "(No title)"  # Use visit_title as fallback
+
+            # Convert timestamp
+            unix_microseconds, iso_datetime = convert_webkit_timestamp(webkit_timestamp)
+
+            # Construct message
+            message = f"Visited: {title}"
+
+            writer.writerow({
+                'timestamp': unix_microseconds,
+                'datetime': iso_datetime,
+                'timestamp_desc': 'Visit Time',
+                'message': message,
+                'url': url,
+                'title': title,
+                'data_type': f'{browser_name.lower()}:history:visit'
+            })
+
+    conn.close()
+
+    print(f"Successfully converted {len(results)} history entries from {browser_name}")
+    print(f"Output saved to: {output_csv}")
+
+
+def main():
+    parser = argparse.ArgumentParser(
+        description='Convert browser history to Timesketch CSV format',
+        formatter_class=argparse.RawDescriptionHelpFormatter,
+        epilog="""
+Browser Engine Types:
+  gecko, firefox  - Gecko-based browsers (Firefox, Waterfox, LibreWolf, etc.)
+  chromium        - Chromium-based browsers (Chrome, Edge, Brave, Opera, Vivaldi, etc.)
+  webkit, safari  - WebKit-based browsers (Safari)
+
+All Chromium-based browsers (Chrome, Edge, Brave, Opera, Vivaldi) use identical database
+schemas and can be processed with the "chromium" option. Use --browser-name to customize
+the label in the output if needed.
+
+HOW TO FIND YOUR PROFILE PATH:
+  Firefox:
+    1. Open Firefox and type: about:support
+    2. Look for "Profile Folder" or "Profile Directory"
+    3. Click "Open Folder" button or note the path
+    4. The places.sqlite file is in this directory
+
+  Chromium browsers (Chrome/Edge/Brave/etc.):
+    1. Open browser and type: chrome://version/
+       (or edge://version/, brave://version/, etc.)
+    2. Look for "Profile Path" - this shows the full path
+    3. The History file (no extension) is in this directory
+
+  Safari:
+    Always at: ~/Library/Safari/History.db
+
+Example usage:
+  # Firefox
+  python browser2timesketch.py -b firefox -i ~/.mozilla/firefox/xyz.default/places.sqlite -o output.csv
+
+  # Any Chromium browser (Chrome, Edge, Brave, etc.)
+  python browser2timesketch.py -b chromium -i ~/.config/google-chrome/Default/History -o output.csv
+
+  # Chromium browser with custom label
+  python browser2timesketch.py -b chromium --browser-name "Brave" -i ~/.config/BraveSoftware/Brave-Browser/Default/History -o output.csv
+
+  # Safari (macOS)
+  python browser2timesketch.py -b safari -i ~/Library/Safari/History.db -o output.csv
+
+Database Locations:
+  Gecko/Firefox:
+    Linux:   ~/.mozilla/firefox//places.sqlite
+    macOS:   ~/Library/Application Support/Firefox/Profiles//places.sqlite
+    Windows: %APPDATA%\\Mozilla\\Firefox\\Profiles\\\\places.sqlite
+
+  Chromium (Chrome/Edge/Brave/Opera/Vivaldi):
+    Chrome Linux:   ~/.config/google-chrome/Default/History
+    Chrome macOS:   ~/Library/Application Support/Google/Chrome/Default/History
+    Chrome Windows: %LOCALAPPDATA%\\Google\\Chrome\\User Data\\Default\\History
+
+    Edge Windows:   %LOCALAPPDATA%\\Microsoft\\Edge\\User Data\\Default\\History
+    Edge macOS:     ~/Library/Application Support/Microsoft Edge/Default/History
+
+    Brave Linux:    ~/.config/BraveSoftware/Brave-Browser/Default/History
+    Brave macOS:    ~/Library/Application Support/BraveSoftware/Brave-Browser/Default/History
+    Brave Windows:  %LOCALAPPDATA%\\BraveSoftware\\Brave-Browser\\User Data\\Default\\History
+
+  WebKit/Safari:
+    macOS: ~/Library/Safari/History.db
+
+Note: Close the browser before running this script to avoid database lock issues.
+You may want to copy the database file to a temporary location first.
+        """
+    )
+
+    parser.add_argument(
+        '-b', '--browser',
+        required=True,
+        choices=['gecko', 'firefox', 'chromium', 'webkit', 'safari'],
+        help='Browser engine type (firefox and gecko are aliases, safari and webkit are aliases)'
+    )
+
+    parser.add_argument(
+        '-i', '--input',
+        required=True,
+        help='Path to browser history database'
+    )
+
+    parser.add_argument(
+        '-o', '--output',
+        default='browser_history_timesketch.csv',
+        help='Output CSV file path (default: browser_history_timesketch.csv)'
+    )
+
+    parser.add_argument(
+        '--browser-name',
+        default=None,
+        help='Custom browser name for the data_type field (e.g., "Chrome", "Brave", "Edge")'
+    )
+
+    args = parser.parse_args()
+
+    try:
+        # Normalize browser type
+        browser_type = args.browser.lower()
+
+        if browser_type in ['gecko', 'firefox']:
+            extract_gecko_history(args.input, args.output, args.browser_name)
+        elif browser_type == 'chromium':
+            extract_chromium_history(args.input, args.output, args.browser_name)
+        elif browser_type in ['webkit', 'safari']:
+            extract_webkit_history(args.input, args.output, args.browser_name)
+
+    except Exception as e:
+        print(f"Error: {e}")
+        import traceback
+        traceback.print_exc()
+        return 1
+
+    return 0
+
+
+if __name__ == "__main__":
+    exit(main())
\ No newline at end of file
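
Review note on the epoch arithmetic: the three `convert_*_timestamp` helpers in this diff all reduce to shifting an engine-specific epoch onto the Unix epoch. The sketch below restates that arithmetic on its own, using integer math for the microsecond-based engines so that no precision is lost to floating point. The function and constant names (`chromium_to_unix_us`, `webkit_to_unix_us`, `unix_us_to_iso`, and the `*_OFFSET_S` constants) are hypothetical and are not part of the PR; only the two offsets and the output format string come from the script.

```python
from datetime import datetime, timedelta, timezone

# Offsets taken from the script: seconds between each engine's epoch and 1970-01-01 UTC.
CHROMIUM_EPOCH_OFFSET_S = 11_644_473_600  # 1601-01-01 -> 1970-01-01
COCOA_EPOCH_OFFSET_S = 978_307_200        # 1970-01-01 -> 2001-01-01
US_PER_S = 1_000_000
UNIX_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def chromium_to_unix_us(chromium_us):
    """Chromium stores integer microseconds since 1601; subtract the offset as an integer."""
    return int(chromium_us) - CHROMIUM_EPOCH_OFFSET_S * US_PER_S


def webkit_to_unix_us(webkit_s):
    """Safari stores (possibly fractional) seconds since 2001; shift, then scale to microseconds."""
    return round((webkit_s + COCOA_EPOCH_OFFSET_S) * US_PER_S)


def unix_us_to_iso(unix_us):
    """Format Unix microseconds in the same ISO 8601 shape the script writes to the CSV."""
    dt = UNIX_EPOCH + timedelta(microseconds=unix_us)
    return dt.strftime('%Y-%m-%dT%H:%M:%S+00:00')


if __name__ == "__main__":
    # A Chromium value with an odd trailing microsecond survives the conversion intact.
    unix_us = chromium_to_unix_us(13_000_000_000_000_001)
    print(unix_us, unix_us_to_iso(unix_us))
```

Gecko timestamps are already Unix microseconds, so under the same assumptions they can be passed to `unix_us_to_iso` directly. Building the datetime from `UNIX_EPOCH + timedelta(...)` is one possible design choice; it keeps the whole path in integer microseconds instead of going through a float number of seconds.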
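
Review note on locked databases: the README and the `--help` epilog both recommend copying the database to a temporary location before reading it. A minimal sketch of how that step could be automated is shown below. The helper name `open_history_copy` is hypothetical and not part of the PR, and copying the `-wal` and `-shm` sidecar files is an assumption about how the browser's SQLite database is configured; the sketch only uses standard-library calls (`shutil.copy2`, `tempfile.mkdtemp`, and `sqlite3.connect` with `uri=True`).

```python
import shutil
import sqlite3
import tempfile
from pathlib import Path


def open_history_copy(db_path):
    """Copy a history database to a temp directory and open the copy read-only.

    Returns (connection, temp_dir); the caller closes the connection and
    removes temp_dir when done.
    """
    src = Path(db_path).expanduser()  # also handles a literal "~" in the argument
    if not src.is_file():
        raise FileNotFoundError(f"Database not found: {src}")

    tmp_dir = Path(tempfile.mkdtemp(prefix="history_copy_"))
    dst = tmp_dir / src.name
    shutil.copy2(src, dst)

    # Assumption: if the database uses write-ahead logging, recent visits may
    # still live in sidecar files, so copy them when they exist.
    for suffix in ("-wal", "-shm"):
        sidecar = src.with_name(src.name + suffix)
        if sidecar.exists():
            shutil.copy2(sidecar, tmp_dir / sidecar.name)

    # mode=ro asks SQLite to open the copy read-only.
    conn = sqlite3.connect(f"{dst.as_uri()}?mode=ro", uri=True)
    return conn, tmp_dir
```

A caller would use it roughly as `conn, tmp_dir = open_history_copy(args.input)`, run the existing query on `conn`, then call `conn.close()` and `shutil.rmtree(tmp_dir)`. Working on a copy leaves the browser's own file untouched, at the cost of a brief window in which the copy can be slightly stale.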