The problem

Messages have been clogging up my iphone for a while now and at this point, they are taking up maybe 15Gigs. Most of this is photo attchments. I thought an options was to manually go through the phone UI, deleting photo attachments to save space, but I did a bit of this tedium and the storage space did not seem to clear up. Maybe it is delayed? The other alternative is to hit a button that says “only keep messages up to a year old”. In reality, I don’t need the 15 Gigs of photo attachments, but I don’t want to lose the text data.

Mac laptop based export

Luckily a month or two ago I started syncing imessages through icloud and so I was able to try out ReagentX’s imessage-exporter on github. Looks like this only pulled scattered 1meg ofconversations, for sure only from this last month or two.

I am on macos Ventura so I could not do the itunes backup as stated in the reddit thread , but I tried the unencrypted iphone backup that you can use just when you connect your phone to your laptop. In this backup there was a Manifest.db , which allowed me to find a sms.db in there and also Attachments. I was able to also use imessage-exporter on this sms.db, however the --attachment-root option looks like was not supported for iphone based extraction with the version I tried. Still, now I have an additional 13meg of text data.

Note on using the Manifest.db in the iphone export

The imessage-exporter was not handling the 15gig of attachments I pulled from the iphone export at least now, but just writing out here what I did just in case this is available in the future.

First I used ChatGPT to quickly cook up some sqlite code to inspect this Manifest.db, and then to pull out the table to csv,

import sqlite3
from pathlib import Path


def get_table_schemas(database_file):
    conn = sqlite3.connect(database_file)
    cursor = conn.cursor()

    # Get a list of all tables in the database
    cursor.execute("SELECT name FROM sqlite_master WHERE type='table';")
    tables = cursor.fetchall()

    table_schemas = {}

    for table in tables:
        table_name = table[0]
        cursor.execute(f"PRAGMA table_info({table_name});")
        columns = cursor.fetchall()
        
        schema = {}
        for column in columns:
            column_name = column[1]
            data_type = column[2]
            not_null = column[3]
            default_value = column[4]
            primary_key = column[5]
            schema[column_name] = {
                "data_type": data_type,
                "not_null": not_null,
                "default_value": default_value,
                "primary_key": primary_key
            }
        
        table_schemas[table_name] = schema

    conn.close()
    return table_schemas

def print_schemas(schemas):
    for table_name, schema in schemas.items():
        print(f"Table: {table_name}")
        for column_name, column_info in schema.items():
            print(f"  Column: {column_name}")
            print(f"    Data Type: {column_info['data_type']}")
            print(f"    Not Null: {column_info['not_null']}")
            print(f"    Default Value: {column_info['default_value']}")
            print(f"    Primary Key: {column_info['primary_key']}")

def export_table_to_csv(database_file, table_name, csv_file):
    conn = sqlite3.connect(database_file)
    cursor = conn.cursor()

    cursor.execute(f"SELECT * FROM {table_name};")
    data = cursor.fetchall()

    column_names = [description[0] for description in cursor.description]

    with open(csv_file, 'w', newline='') as file:
        writer = csv.writer(file)
        writer.writerow(column_names)
        writer.writerows(data)

    conn.close()

So looking at the schema, with print_schemas , I saw two tables, Files and Properties. And I extracted both, like,

path = (Path("/Users/michal/Library/Application Support/MobileSync")
       / "Backup/00008020-000E7C903A23002E" / "Manifest.db")

import pandas as pd
import csv
export_table_to_csv(str(path), "Files", 
                    str(path.with_stem(path.stem + "-Files").with_suffix(".csv")))
export_table_to_csv(str(path), "Properties", 
                    str(path.with_stem(path.stem + "-Properties").with_suffix(".csv")))

manifestdf = pd.read_csv(path.with_suffix(".csv"))

I did a search for sms like,

manifestdf[(manifestdf.relativePath.notnull()) 
           & (manifestdf.relativePath.str.contains("sms"))][[
           "fileID","domain","relativePath"]].to_dict(orient="records")

which dug out, this sms.db ,

 {'fileID': '3d0d7e5fb2ce288813306e4d4636395e047a3d28',
  'domain': 'HomeDomain',
  'relativePath': 'Library/SMS/sms.db'},

I did also notice “Library/SMS/Attachments/” , “Library/SMS/Drafts/” in there, so I tried digging out the Attachments like this ,


attachments_records = manifestdf[
    (manifestdf.relativePath.notnull()) 
    & (manifestdf.relativePath.str.contains("SMS/Attachments"))
    ][["fileID","domain","relativePath"]].to_dict(orient="records")

Somehow a lot of the data appeared to be missing, somehow, so I filtered by not missing,

source_dir = (Path("/Users/michal/Library/Application Support/MobileSync")
        / "Backup/00008020-000E7C903A23002E")
destination_dir = (
    Path("/Users/michal/imessage-data")
    / "2023-12-09-1934-from-iphone-hmm"
    / "Attachments"
	)
attachments_exist = []
for x in attachments_records:
    if (source_dir / x["fileID"][:2] / x["fileID"]).exists():
        attachments_exist.append(x)

attachments_exist[:2]
# [{'fileID': '7112c681ea44f8b64609aa943b5b1182a393136a',
#   'domain': 'MediaDomain',
#   'relativePath': 'Library/SMS/Attachments/61/01/72A08FCD-CD0B-4AF7-83CF-F5E94D546A64/IMG_1976.PNG'},
#  {'fileID': '1faae24aef070aebc630d8b315b1630f2ffc5be3',
#   'domain': 'MediaDomain',
#   'relativePath': 'Library/SMS/Attachments/61/01/FA8E24CD-AF3D-4607-B954-347E5DEF57AA/IMG_0714.jpeg'}]

After some trial and error, I ended up with this for the copy,

def copy_attachments(attachments_records, source_dir, destination_dir):
    for record in attachments_records:
        first_two_characters = record["fileID"][:2]
        from_path = source_dir / first_two_characters / record["fileID"]
        if from_path.exists():
            relative_path = Path("/".join(record["relativePath"].split("/")[3:]))
            to_path = destination_dir / relative_path
            if not to_path.parent.exists():
                to_path.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy(from_path, to_path)
            ...
        ...

Though as I mentioned earlier, the imessage export call, basically did not work. The diagnostics option ignores the attachments,

imessage-exporter     --db-path /Users/michal/imessage-data/2023-12-09-1934-from-iphone-hmm/sms.db   --attachment-root /Users/michal/imessage-data/2023-12-09-1934-from-iphone-hmm/Attachments  --diagnostics   
Building cache...
[1/4] Caching chats...
[2/4] Caching chatrooms...
[3/4] Caching participants...
[4/4] Caching reactions...
Cache built!

iMessage Database Diagnostics

Contacts with more than one ID: 49
Message diagnostic data:
    Total messages: 76650
    Messages not associated with a chat: 2112
    Messages belonging to more than one chat: 26
Attachment diagnostic data:
    Total attachments: 16272
        Data referenced in table: 16.15 GB
        Data present on disk: 0.00 B
    Missing files: 16272 (100%)
        No path provided: 189
        No file located: 16083
Global diagnostic data:
    Total database size: 171.45 MB
    Duplicated contacts: 74
    Duplicated chats: 6
Done!

And later also, using the --platform iOS option I saw that the --attachment-root option is only supported for --platform macOS , so I am stuck at this point. But maybe later this can be worked out.

In any case, the good news is that I was able to dig out the text data, which was more important to me and now I feel more comfortable to just go with the “only keep 1 years worth of data " option, moving forward 😀.