This Dust Remembers What It Once Was
article and software by Ben Lincoln
Table of contents
- Introduction
- Components
- Walkthroughs
- Results
- Thank You
- Frequently-Asked Questions
- Known Issues
- Plans for Future Versions
- Update History
- Downloads
Introduction
This Dust Remembers What It Once Was ("TDR") is a reverse-engineering toolkit I wrote for use with the NSA's amazing tool Ghidra. Ghidra is a completely free, open-source binary reverse-engineering toolkit that includes not only a disassembler, but a decompiler that must have been written using black magic. I can't thank its authors and the NSA enough for releasing it last year.
I wanted to use Ghidra to help reverse engineer Soul Reaver, my favourite game of all time, but at least when I started, there were a couple of obstacles in my way: Ghidra doesn't support the proprietary PSX-EXE format used for PlayStation binaries, and it also doesn't support the PsyQ .SYM debug symbol format.
I originally started writing TDR specifically for that one project, but I've tried to generalize it enough to work with any PlayStation title that has PsyQ debug symbols available. The PSX-EXE-to-ELF converter means that any PlayStation binary should be importable into Ghidra, even if it wasn't written using PsyQ at all. It also means that any PlayStation binary can be imported into other tools that support ELF binaries, such as RetDec.
I have some additional components in mind for later that will extend it to other gaming platforms, but I'm not sure when I'll have time to get around to that.
Be warned, the current version of TDR should be considered an alpha release, in the traditional sense: it's feature-complete, but it's probably full of bugs. I don't know how frequently I'll be able to work on it, so I wanted to get it out there in case it was useful to someone even in its current state.
TDR is a highly-specialized reverse-engineering tool. The documentation below is pretty barebones at the moment, and assumes extensive pre-existing knowledge. I'd like to expand it in the future.
TDR itself is open-source, licensed under the GPLv3. Warning: you may regret looking at some of the code. This is a project that grew organically over about eight months. It involved lots of on-the-fly design changes because I was learning about some of the low-level details as I went.
If you just want to be able to load PlayStation games into Ghidra and don't have .SYM files, you can also take a look at DrMefistO's outstanding PSX Loader for Ghidra. A future version of this content will include an alternative walkthrough for using that loader in place of the EXE-to-ELF step, for those who wish to compare.
Components
The current version of TDR is made up of four tools (in addition to Ghidra itself, which you'll need to install separately):
- PlayStationELFConverter.exe - takes a PSX-EXE binary and converts it into the more standard ELF format.
- SymDumpTE.exe - a heavily-customized version of Steffen Ohrendorf's SymDump. Parses PsyQ .SYM files and generates a JSON file describing the content.
- CreateSkeleton.exe - parses the game binary and the JSON output of SymDumpTE.exe to generate a number of files (see below), some of which can be imported into Ghidra along with the ELF.
- PopulateSkeleton.exe - uses all of the previous data, plus a Ghidra-generated monolithic decompilation of the binary and exported data definitions, to attempt to regenerate something like the original set of source code files for the game.
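The PSX-EXE format that PlayStationELFConverter.exe starts from is simple enough to sketch. The C code below is a minimal illustration based on the publicly documented header layout, not TDR's own implementation (TDR is a .NET application): a 0x800-byte header that begins with the ASCII ID "PS-X EXE", followed by the image that the BIOS copies into RAM. The `LoadSegment` struct and all function names are hypothetical. The idea is that an ELF converter can describe the image as one loadable segment at the header's destination address, with the header's initial PC as the ELF entry point.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PSX_EXE_HEADER_SIZE 0x800u

/* Selected fields of the PSX-EXE header (all little-endian 32-bit values). */
typedef struct {
    uint32_t initial_pc;   /* 0x10: entry point */
    uint32_t initial_gp;   /* 0x14: initial global pointer (r28) */
    uint32_t dest_addr;    /* 0x18: RAM address the image is copied to */
    uint32_t image_size;   /* 0x1C: image size in bytes, excluding the header */
    uint32_t memfill_addr; /* 0x28: start of a region zero-filled at load (often BSS) */
    uint32_t memfill_size; /* 0x2C: size of that region */
    uint32_t sp_base;      /* 0x30: initial stack pointer base */
    uint32_t sp_offset;    /* 0x34: initial stack pointer offset */
} PsxExeHeader;

/* Hypothetical description of the single loadable segment an ELF
   converter could emit (roughly one PT_LOAD program header). */
typedef struct {
    uint32_t file_offset;  /* where the image starts in the PSX-EXE file */
    uint32_t vaddr;        /* where it lives in PlayStation memory */
    uint32_t size;         /* number of bytes to load */
    uint32_t entry;        /* value for the ELF header's e_entry */
} LoadSegment;

static uint32_t read_le32(const uint8_t *p) {
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Returns 0 on success, -1 if the buffer is too short or the ID is wrong. */
int parse_psx_exe_header(const uint8_t *buf, size_t len, PsxExeHeader *out) {
    if (len < PSX_EXE_HEADER_SIZE || memcmp(buf, "PS-X EXE", 8) != 0)
        return -1;
    out->initial_pc   = read_le32(buf + 0x10);
    out->initial_gp   = read_le32(buf + 0x14);
    out->dest_addr    = read_le32(buf + 0x18);
    out->image_size   = read_le32(buf + 0x1C);
    out->memfill_addr = read_le32(buf + 0x28);
    out->memfill_size = read_le32(buf + 0x2C);
    out->sp_base      = read_le32(buf + 0x30);
    out->sp_offset    = read_le32(buf + 0x34);
    return 0;
}

/* The image follows the header directly and is loaded at dest_addr. */
LoadSegment psx_load_segment(const PsxExeHeader *h) {
    LoadSegment seg;
    seg.file_offset = PSX_EXE_HEADER_SIZE;
    seg.vaddr = h->dest_addr;
    seg.size = h->image_size;
    seg.entry = h->initial_pc;
    return seg;
}
```

A real converter would also have to write the ELF header itself (32-bit, little-endian, machine type EM_MIPS) plus program and section headers; that part is omitted to keep the sketch short.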
CreateSkeleton.exe does the bulk of the work in the current version of TDR. From the input data, it generates the following:
- A monolithic C header file for the entire project.
- Stub versions of all of the C source files referenced in the debug symbols.
- Two Ghidra scripts to import most of the debug symbol data into Ghidra.
- A Ghidra script to identify additional arrays in global/static variables and other embedded data.
- A Ghidra script to export the decompiled source code in a way that's easier to parse.
- A Ghidra script to export global/static variables and other data embedded in the binary.
Walkthroughs
I've written five walkthroughs to help people jump into the basics of using TDR. Following these walkthroughs will get you a long way, much faster than manually reverse-engineering these games from scratch, but you'll still be doing a lot of manual work in Ghidra if you want to generate code that will compile back to a working game binary. If all you're looking for is mostly-accurate decompiled code to use as a reference for e.g. reverse-engineering file formats, you might not have to do any additional work, though.
The reason there are five walkthroughs instead of three is that with earlier (pre-0.8) versions of TDR, numerous game-specific manual workarounds were necessary just to get basic decompiled code. Game-specific manual work will always be required to get highly-accurate results, but as of version 0.8, three of my five main test cases (Soul Reaver, EDGECASE, and an unnamed prototype) can all have their first pass done without any of the manual workarounds that were previously documented here. Need for Speed 4 requires one minor fix, but I'm hoping to eliminate that in a future version. Biohazard 2, to my surprise, uses PsyQ memory overlays. It's possible to do a basic decompilation of it easily, but that won't include any of the information from the overlays. I'll add a second section later that covers the overlays in that game.
Important:
When a new version is released, be sure to delete (or archive) all of the generated files, start over from scratch, and re-read the walkthrough. This is still alpha-quality software, and the "right" way to use it is changing essentially with every release. Mixing files from different versions will result in poor output or crashes.
- TDR: Practice Using EDGECASE - a basic walkthrough of decompiling a very simple PlayStation PSX-EXE binary which has source code included so you can compare the decompiled code with the corresponding original source.
- TDR: Practice Using OVERLAYS - a basic walkthrough of decompiling a very simple PlayStation PSX-EXE binary (also with source code included) which uses PsyQ's memory overlay features (like Diablo). This type of binary requires more manual work, and the use of some additional TDR features.
- TDR: Soul Reaver - steps that should work with most debug builds of Soul Reaver, but specifically tested with the 1999-06-01 prototype.
- TDR: Need For Speed 4 - tested with the 1999-02-22 (v9.0) version of the game.
- TDR: Biohazard 2 - tested against the 1997-10-30 build of Biohazard 2.
Results
This section will be greatly expanded in the future, but for the most in-depth current look at results, see TDR: Practice Using OVERLAYS.
TDR works really well with all of the debug builds of Soul Reaver I've tested it against.
It does a solid job against the 1997-10-30 beta build of Biohazard 2.
As of version 0.6, it does a pretty phenomenal job with the 1999-02-22 prototype of Need for Speed 4, especially considering the extremely complex codebase for that game.
It does not do so well with the 1996-08-05 prototype version of Wipeout XL, because that build was created without the -g flag for CCPSX.EXE, which means the .SYM file essentially only includes labels, not other types of symbols. It's better than nothing, but a significant amount of additional manual work would be required to decompile it to working code.
Thank You
It wouldn't have been possible for me to build and refine TDR without the following people and organizations. Sincere thanks to:
Known Issues
- DO NOT RUN MULTIPLE INSTANCES OF GHIDRA SIMULTANEOUSLY ON THE SAME MACHINE. It is very likely that if you attempt to run a script, it will run against a different instance of Ghidra than the one you think it will, which will corrupt your other project. This may apply to other Ghidra scripts, or it may not, but it definitely applies to TDR.
- When using the --auto-rename-fakes flag in SymDumpTE.exe, a lot of what are effectively duplicate structs and unions are generated. The explanation as to why is a bit lengthy.
The PsyQ C compiler likes to split up nested structs and unions into the main struct/union plus separate structs/unions named .#fake, where # is a decimal number. Each object file that is linked into the PSX-EXE ends up with its own copies of structs, unions, etc. that are shared between multiple object files. If two object files use exactly the same set, they seem to end up with "fake" definitions that match up, but when this is not the case, one may have a struct named .117fake which contains two ints and a char[16], while another has one named .117fake which contains three shorts and a char.
The --auto-rename-fakes flag prepends the name of the object file (minus any extension) to the names of the fake structs and unions in that object file. This (usually) makes them unique across the entire codebase, but it also means that if they were the same thing originally, there are now multiple copies. This is arguably better than the alternative, as the code should compile and work, but it does make a mess of things.
Making this more automatic and less problematic is a focus area for future versions of TDR.
- The output of all of the command-line tools is extremely verbose at present.
- Code which originated in assembly-language source files (e.g. SNMAIN.S) is inserted as C code in both the stub and decompiled code output files. This is probably better for readability, but less likely to result in perfect bit-for-bit recompiled versions of games. On the other hand, most of these files are library files, and you should be replacing the decompiled versions with the originals from the PsyQ SDK anyway, so addressing this is a low priority.
- Custom variadic functions may not be handled correctly. printf() and other common/standard variadic C functions should be fine.
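The renaming that --auto-rename-fakes performs can be sketched in a few lines of C. This is a hypothetical helper, not TDR's code: it assumes the object file name is available as a string such as DRAW.OBJ, and it follows the naming pattern visible in TDR's output (e.g. DRAW_109fake), where the object file's base name and an underscore replace the leading period, which is not legal in a C identifier.

```c
#include <stdio.h>
#include <string.h>

/* Builds a per-object-file name for a PsyQ "fake" struct/union.
   Returns 0 on success, -1 if the output buffer is too small. */
int rename_fake(const char *object_file, const char *fake_name,
                char *out, size_t out_size) {
    /* Skip any directory or drive prefix in the object file name. */
    const char *base = object_file;
    for (const char *p = object_file; *p; p++)
        if (*p == '/' || *p == '\\' || *p == ':')
            base = p + 1;

    /* Drop the extension (".OBJ" etc.), if there is one. */
    const char *dot = strrchr(base, '.');
    size_t base_len = dot ? (size_t)(dot - base) : strlen(base);

    /* ".109fake" -> "109fake" */
    if (fake_name[0] == '.')
        fake_name++;

    int n = snprintf(out, out_size, "%.*s_%s", (int)base_len, base, fake_name);
    return (n < 0 || (size_t)n >= out_size) ? -1 : 0;
}
```

For example, rename_fake("DRAW.OBJ", ".109fake", ...) produces DRAW_109fake. Because the prefix comes from the object file, two unrelated .117fake types from different object files get different names, and two identical ones do too, which is exactly the duplication described above.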
Plans for Future Versions
Some changes/additional features that I have planned for future versions. Some of these will probably be in the next release. Others may take longer.
- Collapse references to "fake" structs and unions so that they're inlined into their parent structs/unions. This will more closely resemble the original source code, and should make compilation more reliable. This feature was added in version 0.9.
- Build a map of dependencies to help with accurately placing functions and data.
- Generate a makefile and a linker file.
- Generate a separate output file which shows where functions and data were placed in the decompiled source by PopulateSkeleton.exe.
- More accurate placement of functions and data in the decompiled source.
- Automatically add the appropriate #include statements for each decompiled source file based on known library functions, etc.
- Better automatic handling of data types which currently end up in Ghidra as undefined pointers.
- If practical, for source code which was in assembly language, generate assembly language output instead of C/C++.
- Support for other gaming platforms (PlayStation 2, Xbox, etc.). This is likely to take quite a while.
Frequently-Asked Questions
Why write wrappers around Ghidra and Ghidra scripts instead of making TDR a Ghidra extension?
Ghidra extensions need to be recompiled for every Ghidra release. The current model permits users to (in most cases) continue using TDR with whatever the current version of Ghidra is without waiting for someone to recompile TDR.
Additionally, a separate set of tools permits analysis of PlayStation code in other tools, such as RetDec.
Some planned future elements of TDR may be created as Ghidra extensions if that is the best approach for them.
For a game with memory overlays, like Diablo, is there a way to process all of the overlays at once?
You can use SymDumpTE.exe without the --json flag to get low-level information about all of the overlays in text form, and you can include more than one overlay when calling CreateSkeleton.exe and PopulateSkeleton.exe as long as none of the overlays overlap in memory, but you can't process overlapping overlays in the rest of the toolchain. If that doesn't make sense, try going through the TDR: Practice Using OVERLAYS walkthrough.
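The "none of the overlays overlap in memory" condition is a pairwise interval check. The C sketch below assumes a hypothetical `Overlay` record holding each overlay's load address and size (how those values are obtained from the SYM data is not shown here); two half-open ranges overlap when each one starts before the other ends.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of one PsyQ overlay's footprint in RAM. */
typedef struct {
    uint32_t id;
    uint32_t load_addr;
    uint32_t size;
} Overlay;

/* [a, a + size) and [b, b + size) overlap if each starts before the other
   ends. 64-bit ends avoid wrap-around near the top of the address space.
   Empty overlays are treated as never overlapping anything. */
static int overlays_overlap(const Overlay *a, const Overlay *b) {
    if (a->size == 0 || b->size == 0)
        return 0;
    uint64_t a_end = (uint64_t)a->load_addr + a->size;
    uint64_t b_end = (uint64_t)b->load_addr + b->size;
    return a->load_addr < b_end && b->load_addr < a_end;
}

/* Returns 1 if every pair of overlays occupies disjoint memory, i.e. the
   set could be processed together; 0 otherwise. */
int overlays_can_be_combined(const Overlay *ovl, size_t count) {
    for (size_t i = 0; i < count; i++)
        for (size_t j = i + 1; j < count; j++)
            if (overlays_overlap(&ovl[i], &ovl[j]))
                return 0;
    return 1;
}
```

Overlays that share a load address (the usual case, since they are swapped into the same region) will fail this check and have to be processed in separate passes.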
Update History
In reverse chronological order:
Version 0.9 - released 2019-09-25
- Added an --inline-fakes flag to SymDumpTE.exe. It builds on --auto-rename-fakes (and requires that flag) by automatically replacing references to "fake" structs and unions with inline versions of them, then hiding the fake definitions. This should much more closely match the original source code, as the "fake" structs and unions are the result of them being inlined in the source.
Note 1: if a binary is made up of multiple object files, and those object files each refer to identical fake structs/unions, you will end up with some orphans in the decompiled header files. For example, in Soul Reaver, just about every object file makes reference to the Camera struct, which contains a large number of inlined structs and unions. With the way this version of TDR works, you will get a valid definition of Camera in the monolithic header file, but you'll also get some unused duplicates of the fake structs and unions which were referenced in the unused copies of the Camera definition found in the other object files (DRAW_109fake, MORLOCK_114fake, PLAYER_114fake, etc.). You can take them out manually if you like when attempting to recompile the code, but as they're not referenced, they shouldn't hurt anything.
I'm going to try to improve that in a future release so that they're removed entirely, but, again, I didn't want to delay the release.
Note 2: some games seem to define global variables using inline struct/union definitions. E.g. in Soul Reaver, the WarpGateLoadInfo, functionChoiceTable, and gMcmenu global variables. These are not currently inlined even when using --inline-fakes, because I've not found documentation on the proper C syntax for that, and I want to try to avoid situations where TDR generates code that won't compile.
- Greatly improved handling of potentially-duplicated structs and unions.
- Automatic assignment of code elements (where the information is not explicitly present in the SYM file and was being inferred) has been disabled in CreateSkeleton.exe and PopulateSkeleton.exe. It was a little buggy in 0.8. I have a better way of doing this in mind, but didn't want to delay this release until it was ready, as the other improvements will make a big difference for some users.
- More code cleanup. Still a long way to go.
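The difference between the split "fake" form and the inline form that --inline-fakes aims for can be illustrated in plain C. All type, member, and variable names below are hypothetical; they only mimic the shape of the renamed fakes (object file name, underscore, number, "fake"). The last declaration shows that standard C does accept a global variable whose type is an unnamed, inline struct definition, which is the pattern described in Note 2.

```c
#include <stddef.h>

/* Split form, roughly as the debug symbols describe it after
   --auto-rename-fakes: the nested struct is its own named type. */
struct DEMO_117fake {
    short x;
    short y;
    short z;
};

struct CameraSplit {
    int flags;
    struct DEMO_117fake position;
};

/* Inline form, roughly what --inline-fakes aims to produce and most
   likely closer to the original source. */
struct CameraInline {
    int flags;
    struct {
        short x;
        short y;
        short z;
    } position;
};

/* A global whose type is an anonymous, inline struct definition. */
struct {
    int state;
    long timer;
} DemoGlobalLoadInfo;
```

Both camera structs have the same member sequence, so their size and member offsets match; only the way the nested type is spelled differs.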
Version 0.8 - released 2019-09-11
- Added support for PsyQ overlays to PlayStationELFConverter.exe, SymDumpTE.exe, CreateSkeleton.exe, and PopulateSkeleton.exe.
- Added a global variable/data-exporting Ghidra script. This is a data counterpart to the existing decompiled-code-exporting script, and outputs global variables/embedded data as C code. This should save an enormous amount of time when attempting to reconstruct source code.
- Added a custom array-detection Ghidra script. This automatically detects most global arrays and redefines them as such in Ghidra. Combined with the data-exporting script, a huge chunk of manual reverse-engineering effort can generally be avoided.
- Added fully-functional handlers for most remaining SYM file entry types to SymDumpTE.exe, and stub handlers for the rest. This provides a significant amount of additional data about PsyQ-based PlayStation binaries.
- Greatly-improved handling of labels, as the label type is now identified. Labels of type 1 (which generally are relative offsets, memory segment boundaries, or other data that should NOT be treated as function/variable names) are now filtered out from inclusion in the Ghidra scripts where applicable.
- Multiple additional sources of forensic information regarding the original source and/or object files are now parsed from the SYM data and used by TDR to map functions and data to their proper source files.
- Added an --auto-rename-fakes flag to SymDumpTE.exe. This automatically renames "fake" structs and unions to avoid name conflicts and generate more accurate results. Using this option is highly recommended.
- Improved handling of 'fake' structs and unions even if --auto-rename-fakes is not used.
- Added an --output-updated-json option to CreateSkeleton.exe which will create an updated version of the JSON debug symbol file. This updated version will contain additional data either generated by CreateSkeleton.exe, or forensically determined from the newly-analyzed additional symbol data, and can be passed to PopulateSkeleton.exe instead of the original JSON file for a more accurate reconstruction. Optionally, the file can be manually edited based on user knowledge/intentions so that PopulateSkeleton.exe will place certain code elements in locations of the user's choice even if they could not be automatically determined by TDR.
- Added a --map-sld-functions flag to CreateSkeleton.exe. This uses some of the previously-ignored data in the symbol file to map functions without explicit function definitions (library calls, etc.) to original source code files. Using this flag is highly recommended as it should greatly reduce the number of unmapped functions in the decompiled code output.
- Added a --use-gp-base option to CreateSkeleton.exe. Use this to manually specify a global pointer value which should be assumed at the beginning of all functions in Ghidra. This works like --assume-sn-gp-base, but requires that the global pointer value be specified by the user. This option is intended for use with games like Diablo which do not include a __SN_GP_BASE value in their debug symbols, and requires manual effort to determine.
- Added a --ignore-duplicate-definitions flag to SymDumpTE.exe. This will suppress messages about duplicate (identical) definitions of structs, enums, etc. Messages about redefinitions/name conflicts will still be output even when this flag is used.
- When creating stub source code and decompiled code, CreateSkeleton.exe and PopulateSkeleton.exe now include two additional directories in the path to the output files. The first is named PRIMARY for the standard executable code, or after the overlay ID for any additional overlay binaries handled by TDR. The second is named after the base drive letter where the original source code was located, to avoid conflicts in cases where the original developer had identical paths with different files on different drives.
For example, a file generated by TDR 0.7 and earlier might be located at source-stubs/kain2/game/RAZIEL/RAZIEL.C. In version 0.8 and later, that same file would be located at source-stubs/PRIMARY/C/kain2/game/RAZIEL/RAZIEL.C.
- Added a --replace-non-ascii-labels flag to CreateSkeleton.exe as an alternative to the more blunt --ignore-non-ascii-labels.
- Corrected some minor memory segment naming issues.
- SymDumpTE.exe and PopulateSkeleton.exe now correctly display their help messages.
- Added some input validation to help prevent invalid Java code in the generated Ghidra scripts when unusual data is encountered, and to prevent malicious code execution if crafted, malicious PlayStation binaries are processed.
- Externs are now named correctly in the JSON files.
- Additional work on general code cleanup to make the source less horrific (still in progress). Reducing the overall horror did lead to some localized increases in horror.
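The 0.8 output layout can be expressed as a small path-building function. This C sketch assumes the original source path is a DOS-style absolute path such as C:\kain2\game\RAZIEL\RAZIEL.C, and the function name and parameters are hypothetical; it only reproduces the documented pattern of root, then PRIMARY or the overlay ID, then the drive letter, then the rest of the original path.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Builds <root>/<PRIMARY or overlay ID>/<drive letter>/<rest of path>.
   overlay_id is NULL for the main executable. Returns 0 on success, -1 if
   the path has no drive letter or the buffer is too small. */
int build_output_path(const char *root, const char *overlay_id,
                      const char *original_path, char *out, size_t out_size) {
    const char *bucket = overlay_id ? overlay_id : "PRIMARY";

    /* Expect "X:" at the start of the original path. */
    if (!isalpha((unsigned char)original_path[0]) || original_path[1] != ':')
        return -1;
    char drive = original_path[0];

    /* Skip the separator(s) after the drive letter. */
    const char *rest = original_path + 2;
    while (*rest == '\\' || *rest == '/')
        rest++;

    int n = snprintf(out, out_size, "%s/%s/%c/%s", root, bucket, drive, rest);
    if (n < 0 || (size_t)n >= out_size)
        return -1;

    /* Convert DOS path separators to forward slashes. */
    for (char *p = out; *p; p++)
        if (*p == '\\')
            *p = '/';
    return 0;
}
```

Called with the root source-stubs, no overlay ID, and C:\kain2\game\RAZIEL\RAZIEL.C, this yields source-stubs/PRIMARY/C/kain2/game/RAZIEL/RAZIEL.C, matching the example above.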
Version 0.7 - released 2019-08-22
- Added an --assume-sn-gp-base flag to CreateSkeleton.exe. If the debug symbols include an __SN_GP_BASE label (most PsyQ games do), then setting this flag will add an assumption at the beginning of every function that the global pointer register is set to the value of the address associated with the __SN_GP_BASE label. This will significantly improve decompilation results in most cases, so using this option is highly recommended.
- Added "TDR" to the filename of the function-definition Ghidra script to make it easier to find all of the TDR scripts in Ghidra's script manager.
- Enum/struct/union definition calls in the Ghidra scripts are now split out into subroutines like the label- and function-defining calls, so that Biohazard 2 can be processed almost normally again.
- First pass at some general code cleanup to make the source less horrific.
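The lookup behind --assume-sn-gp-base amounts to finding one label by name. In the C sketch below, the `Label` struct is a hypothetical in-memory form of the SYM label data, and the precedence in `choose_gp` (a user-supplied value, in the spirit of the 0.8 --use-gp-base option, wins over the label) is an assumption made for illustration rather than documented TDR behaviour.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical in-memory form of one label from the SYM file. */
typedef struct {
    const char *name;
    uint32_t address;
} Label;

/* Stores the address of __SN_GP_BASE in *gp and returns 1, or returns 0
   if the label is absent (e.g. Diablo, per the --use-gp-base notes). */
int find_sn_gp_base(const Label *labels, size_t count, uint32_t *gp) {
    for (size_t i = 0; i < count; i++) {
        if (strcmp(labels[i].name, "__SN_GP_BASE") == 0) {
            *gp = labels[i].address;
            return 1;
        }
    }
    return 0;
}

/* Assumed precedence: an explicit user value first, then the label.
   Returns 0 when no GP value can be determined. */
int choose_gp(const Label *labels, size_t count, const uint32_t *user_gp,
              uint32_t *gp) {
    if (user_gp != NULL) {
        *gp = *user_gp;
        return 1;
    }
    return find_sn_gp_base(labels, count, gp);
}
```

Whatever value is chosen is then applied as the assumed gp register value at the start of every function, so that gp-relative loads and stores resolve to the right global variables in the decompiled output.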
Version 0.6 - released 2019-08-19
- Fixed a bug in SymDumpTE.exe inherited from the original SymDump in which pointers to arrays did not correctly have parentheses applied. For example, a function with the signature void func(struct someStruct (*someVar)[3]) would be incorrectly output as void func(struct someStruct *someVar[3]). The former is a pointer to an array of 3 someStruct structs, while the latter is an array of 3 pointers to someStruct structs. This was causing issues with datatype importing, function signatures, and parameter storage assignment in Ghidra.
- Disabled explicit assignment of parameter storage by default, because the information in the SYM files appears to be misleading. If you want to try it out anyway, it can be enabled with the --assign-parameter-storage flag for CreateSkeleton.exe, but it is not recommended.
- Fixed handling of function pointer parameters and return types.
- Borrowed most of the memory map/segment-defining code from DrMefistO's outstanding PSX Loader for Ghidra. Passing the --create-playstation-memory flag to CreateSkeleton.exe will create a much more accurate model of PlayStation hardware memory, which may make some decompilation results much easier to read.
- Added a --rename-for-compatibility flag to SymDumpTE.exe which is much better at handling bulk renaming of enums/structs/unions whose names have leading periods. If you aren't manually tweaking the JSON file to address these, using this option is highly recommended.
- Fixed handling of an edge case in CreateSkeleton.exe when functions were defined in a header (.H) file instead of source code. This is not a recommended practice, but it is allowed by the PsyQ C compiler, so it's possible some games may be built this way.
- Fixed buggy generation of stub source code files, where parameter names were repeated.
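The pointer-to-array bug fixed in 0.6 is easy to see with sizeof. In this C sketch (the struct members and function names are made up for illustration), the parenthesized declaration points at a whole 3-element array, while the unparenthesized one is an array of 3 pointers, which as a function parameter decays to a pointer to a pointer.

```c
#include <stddef.h>

struct someStruct {
    int a;
    short b;
};

/* Pointer to an array of 3 someStruct structs: the pointee is the
   whole array. */
size_t pointee_size_ptr_to_array(struct someStruct (*someVar)[3]) {
    return sizeof(*someVar);
}

/* Array of 3 pointers to someStruct: as a parameter this decays to
   struct someStruct **, so the pointee is a single pointer. */
size_t pointee_size_array_of_ptrs(struct someStruct *someVar[3]) {
    return sizeof(*someVar);
}

/* Element access differs as well: (*someVar)[i] indexes the pointed-to
   array, whereas *someVar[i] would dereference the i-th pointer. */
int second_a(struct someStruct (*someVar)[3]) {
    return (*someVar)[1].a;
}
```

Getting this wrong changes the size and layout Ghidra assumes for the parameter, which is why it affected datatype importing, function signatures, and parameter storage.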
Version 0.5 - released 2019-08-14
- Fixed some significant regression bugs regarding parameter storage and the fp register. Should work much better.
Version 0.4 - released 2019-08-14
- Based on feedback from Gh0stBlade, the toolchain now explicitly sets the stack pointer and global pointer values at the entrypoint of the program code in Ghidra. This greatly improves some of the decompilation results.
- CreateSkeleton.exe now translates references to the fp (frame pointer) register to the s8 register, which is how Ghidra interprets that register. This improves automatic parameter storage assignment, and significantly improves the overall decompilation results.
- Nearly-complete rewrite of the function-generation code used to populate the first Ghidra script. It's still a bit ugly, but at least it's less of a monstrosity than it was previously.
- The parameter storage assignment logic no longer tells Ghidra to use ra for the return value of non-void-returning functions by default. If you really want to do this, you can use the new --assign-ra-storage flag.
- Added an --add-stack-pointer-offset flag for what is probably an obscure edge case.
- Fixed several other bugs.
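The fp-to-s8 translation from 0.4 follows from MIPS register naming: general-purpose register 30 is conventionally called either fp (frame pointer) or s8 (a ninth saved register), and Ghidra uses s8. The C sketch below is hypothetical; the table uses the conventional MIPS o32 names, and treating every entry as Ghidra's spelling is an assumption except for s8, which the text above confirms.

```c
#include <stddef.h>
#include <string.h>

/* Conventional MIPS o32 names for registers 0-31, with r30 spelled s8. */
static const char *const kMipsRegNames[32] = {
    "zero", "at", "v0", "v1", "a0", "a1", "a2", "a3",
    "t0",   "t1", "t2", "t3", "t4", "t5", "t6", "t7",
    "s0",   "s1", "s2", "s3", "s4", "s5", "s6", "s7",
    "t8",   "t9", "k0", "k1", "gp", "sp", "s8", "ra"
};

/* Returns the name for a register number, or NULL if out of range. */
const char *mips_register_name(unsigned reg) {
    return reg < 32 ? kMipsRegNames[reg] : NULL;
}

/* Normalizes a register name from the symbol data: "fp" becomes "s8",
   everything else is passed through unchanged. */
const char *ghidra_register_name(const char *sym_name) {
    if (strcmp(sym_name, "fp") == 0)
        return "s8";
    return sym_name;
}
```

Without this step, a parameter or local recorded as fp-relative would refer to a register name Ghidra does not recognize, so the storage assignment would fail or fall back to Ghidra's own guess.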
Version 0.3 - released 2019-08-14
- --externs-to-labels now also sets the data type of externs and static variables, in addition to labeling them.
- Complete rewrite of the decompiled code export/import logic, so that decompiled versions will be matched up 100% accurately with their original source files as long as sufficient data is present in the SYM file.
- TDR now attempts to set the register or stack offset of function parameters. If it fails, it reverts to its previous behaviour of allowing Ghidra to automatically determine these values.
- Added --ignore-non-ascii-labels flag to CreateSkeleton.exe as a workaround for odd labels in the 1997-10-30 beta build of Biohazard 2.
- Fixed a variety of edge-case bugs related to the 1999-02-22 version of Need for Speed 4.
- Numerous other bug-fixes and improvements.
Downloads
File | Size | Version | Release Date | Author
This Dust Remembers What It Once Was | 654 KiB | 0.9 | 2019-09-25 | Ben Lincoln
This Dust Remembers What It Once Was | 644 KiB | 0.8 | 2019-09-11 | Ben Lincoln
This Dust Remembers What It Once Was | 578 KiB | 0.7 | 2019-08-22 | Ben Lincoln
This Dust Remembers What It Once Was | 575 KiB | 0.6 | 2019-08-19 | Ben Lincoln
This Dust Remembers What It Once Was | 571 KiB | 0.5 | 2019-08-14 | Ben Lincoln
This Dust Remembers What It Once Was | 571 KiB | 0.4 | 2019-08-14 | Ben Lincoln
This Dust Remembers What It Once Was | 568 KiB | 0.3 | 2019-08-13 | Ben Lincoln
This Dust Remembers What It Once Was | 558 KiB | 0.2 | 2019-08-06 | Ben Lincoln
This Dust Remembers What It Once Was | 557 KiB | 0.1 | 2019-08-06 | Ben Lincoln
Each file is the .NET executable version of the TDR suite. If you want to use the tool, this is probably what you want to download.
Related Articles:
TDR: Practice Using EDGECASE
TDR: Practice Using OVERLAYS
TDR: Soul Reaver
TDR: Need For Speed 4
TDR: Biohazard 2