Technology Forum, Adobe Photoshop Sub-Category, Decoding the Photoshop Database Thread
A few quick web searches turned up the biggest help: PSE uses an MS Access database, they just rename the file to .psa. So, if you find the catalog file (under documents and settings\all users\application data\adobe\catalogs, or just search for .psa on your machine), make a copy of it, rename the copy to .mdb and, provided you have MS Access (or, as in my case, a work computer that has MS Access), you can look at the tables.
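If you'd rather script the copy-and-rename step than do it by hand, something like the following Python sketch should do. The function name is made up for illustration, and it deliberately works on a copy so the real catalog is never touched.

```python
import shutil
from pathlib import Path


def copy_catalog_as_mdb(psa_path):
    """Copy a .psa catalog to a sibling file with an .mdb extension.

    The original catalog is left untouched; only the copy gets the new name.
    """
    source = Path(psa_path)
    target = source.with_suffix(".mdb")
    if target.exists():
        raise FileExistsError(f"{target} already exists, not overwriting it")
    shutil.copy2(source, target)
    return target
```

Call it with the full path to your own catalog file; it returns the path of the new .mdb copy.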
Great, should be easy from here. No! For some reason (presumably to do with how the program accesses the data) it's not built on a relational database structure. The table “FolderTable” contains the details of the tags you've set up, and the table “ImageTable” contains all the details of your images (where they're stored and a bunch of stuff about the camera). There isn't a table that joins the two and maps tags to images. Instead, there is a 400-character binary field called “fFolderInfoArray” in ImageTable. This field contains the information you need, but it's not that easy to get to.
I couldn't get MS Access to read the field properly. Not sure if it's the version of Access, but the field type for fFolderInfoArray is listed as “Binary”, which isn't a valid field type in Access. After a little while of trying, I gave up and decided to use PHP to connect to the database over ODBC and pull the data.
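For anyone who would rather not use PHP, here is a rough Python sketch of the same ODBC idea. It is not the script used for this post. It assumes the third-party pyodbc package and a Windows machine with an Access ODBC driver installed; the driver name in the connection string is an assumption and may differ on your machine, and both function names are made up. Only the table and field names (ImageTable, fFolderInfoArray) come from the catalog itself.

```python
def access_connection_string(mdb_path):
    """Build an ODBC connection string for an Access .mdb file.

    Assumption: the driver is registered under this name. Older installs
    may only offer "Microsoft Access Driver (*.mdb)".
    """
    return (
        "DRIVER={Microsoft Access Driver (*.mdb, *.accdb)};"
        f"DBQ={mdb_path};"
    )


def fetch_folder_info_arrays(mdb_path):
    """Return the raw fFolderInfoArray value of every row in ImageTable."""
    import pyodbc  # third-party; imported here so the helper above works without it

    connection = pyodbc.connect(access_connection_string(mdb_path))
    try:
        cursor = connection.cursor()
        cursor.execute("SELECT fFolderInfoArray FROM ImageTable")
        # One column per row; the binary value is expected to be bytes-like.
        return [bytes(row[0]) for row in cursor.fetchall() if row[0] is not None]
    finally:
        connection.close()
```

Each item in the returned list is the raw binary field for one image, ready to be decoded.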
First, if you just use Access to dump the data to a txt file and open it up, the field looks like this:
00 00 00 00 00 00 00 00 00 00 - data
A B C D E F G H I J - reference
Each pair of hex digits is one byte, and the tag information sits in the first two bytes of every group of four. Initially I thought it was just the first byte, but then I noticed I had tags numbered over 255, so it must use the second byte as well. The pattern then repeats every four bytes. I think this means each tag takes up a four-byte slot with only the first two bytes used, but I might be wrong about that. So tag 1 is AB, tag 2 is EF, tag 3 is IJ, etc. The weird thing is that the bytes are read back to front (little-endian order), so instead of doing a hexadecimal-to-decimal conversion on AB, you have to do it on BA. In other words, if your data looks like this:
a6 01 00 00 0C 00 00 00 7D 00
You would have tags 422, 12 and 125 assigned to your image. For example, the first two bytes a6 01 are read as 01a6, and the hex-to-decimal conversion of 01a6 is 422.
That's all you need to know to do the conversion.
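The conversion described above can be sketched in a few lines of Python. This is only an illustration of the byte layout as I understand it, not a tested tool. The function names are made up, and skipping zero values is an assumption on my part (that an all-zero slot means "no tag").

```python
def decode_tags(raw):
    """Decode tag numbers from the bytes of an fFolderInfoArray field.

    Each tag occupies a four-byte slot; the first two bytes of the slot hold
    the tag number, least significant byte first.
    """
    tags = []
    for offset in range(0, len(raw), 4):
        pair = raw[offset:offset + 2]
        if len(pair) < 2:
            break  # incomplete trailing slot, nothing to decode
        tag = int.from_bytes(pair, byteorder="little")
        if tag != 0:  # assumption: 0 marks an unused slot
            tags.append(tag)
    return tags


def decode_hex_dump(text):
    """Decode a space-separated hex dump such as 'a6 01 00 00 0C 00'."""
    return decode_tags(bytes.fromhex(text))


print(decode_hex_dump("a6 01 00 00 0C 00 00 00 7D 00"))  # [422, 12, 125]
```

Use decode_hex_dump on text copied from a dump, or decode_tags directly on the raw bytes if you pulled the field over ODBC.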
A helpful tip: the tag table contains heaps of entries relating to uploads of files, printing, scanning, etc., so you don't need all of the tags if you just want to extract your own. You will need to filter these out.
Some 16 years after making this thread and posts, I note that the above is pretty irrelevant. I switched to PSE9, which was released in 2010. That's built on a SQLite database, which can be browsed or edited with a SQLite browser. The tables and fields are pretty self-explanatory and the tags are stored as regular numbers, so there's no need to decode anything.
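If you don't have a SQLite browser handy, Python's built-in sqlite3 module can list what's in the file. This is a minimal sketch with made-up function names; it doesn't assume any particular table or column names in the catalog, it just reports whatever is there. Point it at a copy of the catalog file, not the live one.

```python
import sqlite3


def list_tables(db_path):
    """Return the names of all tables in a SQLite database file."""
    connection = sqlite3.connect(db_path)
    try:
        rows = connection.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
        return [name for (name,) in rows]
    finally:
        connection.close()


def list_columns(db_path, table):
    """Return the column names of one table, using PRAGMA table_info."""
    connection = sqlite3.connect(db_path)
    try:
        # PRAGMA arguments cannot be bound as parameters, so quote the name.
        quoted = '"' + table.replace('"', '""') + '"'
        rows = connection.execute(f"PRAGMA table_info({quoted})").fetchall()
        return [row[1] for row in rows]  # index 1 of each row is the column name
    finally:
        connection.close()
```

Run list_tables first, then list_columns on anything that looks tag-related.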
This post itself is probably out of date as they've quite possibly moved on since then, but I've no intention of upgrading, so no need to find out.
2nd December 2007