TDR: Practice Using OVERLAYS
Introduction
This is a basic walkthrough of decompiling a very simple PlayStation PSX-EXE binary (OVERLAYS.EXE, which you can download at the bottom of this page) using Ghidra and This Dust Remembers What It Once Was. This binary makes use of the PsyQ implementation of memory overlays, which games like the PlayStation version of Diablo and Biohazard 2 use to swap code in and out of RAM, since it won't all fit at once. This can make reverse-engineering trickier, which is why I created a greatly simplified tutorial to introduce them in a controlled setting.
Note that some (maybe many) PlayStation games use other methods of swapping code in and out of RAM. You can see when this has occurred because there will be functions defined in the SYM file, but with no corresponding code (or all zeroes) when the binary is loaded into Ghidra along with the symbol data. You can probably handle these in a similar way.
The practice binaries were compiled using the same PsyQ toolchain as many real PlayStation titles. The source code and PsyQ build instructions are included for reference/reproducibility. They were used to develop the overlay-handling features of TDR version 0.8, as earlier versions did not have that capability.
Decompilation Walkthrough - OVERLAYS.EXE - Primary Binary
This first section is very similar to the process used for games without memory overlays. However, I've included a lot of extra detail and side-notes in this one, so I'd recommend reading through all of it if you want to learn more about using TDR for games I've not documented.
PlayStationELFConverter.exe --exe2elf OVERLAYS.EXE OVERLAYS.ELF > PlayStationELFConverter_Log.txt 2>&1
SymDumpTE.exe --debug --ignore-duplicate-definitions --rename-for-compatibility --auto-rename-fakes --inline-fakes --json OVERLAYS.SYM OVERLAYS.json > SymDumpTE_Log.txt 2>&1
If you examine the end of SymDumpTE_Log.txt, you'll see this message:
Warning: the debug symbols for this project reference 3 PsyQ overlays. Reverse-engineering this type of project requires additional manual effort. Please consult the TDR documentation - for example, the walkthrough of decompiling the OVERLAYS example binary.
The following overlays were found in this set of debug symbols:
Overlay ID: 0x04 (decimal: 4), address 0x8004099C, length 0x00000100 (decimal: 256)
Overlay ID: 0x05 (decimal: 5), address 0x8004099C, length 0x00000248 (decimal: 584)
Overlay ID: 0x06 (decimal: 6), address 0x8004099C, length 0x00000248 (decimal: 584)
In this extremely simple example code, there are three memory overlays. Any one of the three can be loaded into the block of memory starting at 0x8004099C, but only one can be loaded at a time (since all three share that same block of memory). When reverse-engineering this type of project using TDR, you'll generally want to perform a decompilation of just the primary binary, and then start handling the overlays after that. This is partly because in most cases it won't be obvious where the overlay data is stored, and having a decompiled version of the main binary will help you locate it. Additionally, if your goal is to generate C code that can be recompiled successfully, you'll want to have clean delineations between the sets of code, for reasons that will become obvious as you progress through this tutorial if they're not already.
CreateSkeleton.exe --create-playstation-memory --assume-sn-gp-base --map-sld-functions --name OVERLAYS --externs-to-labels --output-updated-json OVERLAYS-Mapped.json --output Output OVERLAYS.json > CreateSkeleton_Log.txt 2>&1
Examine the log file (CreateSkeleton_Log.txt) and make sure it doesn't end with a "Did not find an __SN_GP_BASE value in the debug symbol data" error. It won't for OVERLAYS, but it might for other games, like Diablo and Biohazard 2, so it's good to get in the habit. If you do see that error, you'll need to run this command instead for now, then do another pass later once you know the correct value for the global pointer:
CreateSkeleton.exe --create-playstation-memory --map-sld-functions --name OVERLAYS --externs-to-labels --output-updated-json OVERLAYS-Mapped.json --output Output OVERLAYS.json > CreateSkeleton_Log.txt 2>&1
You always want to include the global pointer information if possible, because it makes Ghidra's decompilation of the code much more accurate.
ProgramCounter: 0x800406F4
...or in the JSON version of the debug symbols for text like this:
"program_counter": 2147747572
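The two values above are the same address written in different bases: 0x800406F4 in hexadecimal is 2147747572 in decimal. The following is a minimal sketch for checking that kind of correspondence. The function name same_address is hypothetical and is not part of TDR.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper (not part of TDR): return 1 if a hexadecimal address
   string, such as the ProgramCounter value, and a decimal value, such as
   "program_counter" in the JSON, refer to the same 32-bit address. */
static int same_address(const char *hex_text, uint32_t decimal_value)
{
    assert(hex_text != NULL);
    /* With base 16, strtoul accepts an optional "0x" prefix. */
    unsigned long parsed = strtoul(hex_text, NULL, 16);
    return parsed == (unsigned long)decimal_value;
}
```

For example, same_address("0x800406F4", 2147747572u) returns 1.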
That address is the entrypoint.
.text:80040770 04 80 1c 3c lui gp,0x8004
.text:80040774 88 09 9c 27 addiu gp,gp,0x988
In MIPS assembly language, lui is the "load upper immediate" instruction[1] (where "upper" is the most significant 16 bits of a 32-bit value). The first line can be read as "set the two most significant bytes of the gp register to 0x8004, and set the two least-significant bytes to 0x0000", or "set the gp register to 0x80040000".
void PrintMessage(char *message)
{
printf((char *)&PTR_DAT_80040988,message);
return;
}
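The lui/addiu pair shown earlier can be worked through numerically. This is a minimal sketch of the arithmetic, not anything TDR or Ghidra runs, and the function names are hypothetical. It relies on the standard MIPS behaviour that addiu sign-extends its 16-bit immediate before adding it. For the two instructions in the listing, the result is 0x80040988, which matches the address embedded in the PTR_DAT_80040988 label above.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: compute the value left in a register by the pair
       lui   reg, upper
       addiu reg, reg, lower
   lui places `upper` in bits 31..16 and clears bits 15..0.
   addiu sign-extends its 16-bit immediate before adding it. */
static uint32_t lui_addiu(uint16_t upper, uint16_t lower)
{
    uint32_t value = (uint32_t)upper << 16;
    /* Sign-extend the 16-bit immediate to 32 bits. */
    uint32_t extended = (lower & 0x8000u) ? (0xFFFF0000u | lower) : lower;
    /* Unsigned arithmetic wraps modulo 2^32, like the register does. */
    return value + extended;
}

/* Usage example with the values from the disassembly above. */
static void lui_addiu_example(void)
{
    /* lui gp,0x8004 ; addiu gp,gp,0x988 */
    assert(lui_addiu(0x8004, 0x0988) == 0x80040988u);
    /* An immediate with bit 15 set is negative and subtracts from the upper half. */
    assert(lui_addiu(0x8004, 0xFFF0) == 0x8003FFF0u);
}
```

The second assertion is why an immediate of 0x8000 or above pairs with an upper half that looks one too large in real disassembly.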
PopulateSkeleton.exe --name OVERLAYS --input-json OVERLAYS-Mapped.json --input-source Output\OVERLAYS.C --input-data Output\XPRTDATA.C --output Output > PopulateSkeleton_Log.txt 2>&1
Comparing the source code to the decompiled output, you can see that the combination of Ghidra and TDR has done a pretty good job of recovering something like the original source code, with the exception of having a spurious function named OverlayAddress() in THISDUST.C and THISDUST.H in addition to the global variable of the same name.[2]
This is what MAIN.C looks like in the original source code:
#include
extern char *OverlayAddress;
extern void overlay1_function_1 (void);
extern void overlay1_function_2 (void);
extern void overlay2_function_1 (char *);
extern void overlay2_function_2 (void);
extern void overlay3_function_1 (char *);
extern void overlay3_function_2 (void);
int GlobalNumber;
void PrintMessage(char *message)
{
printf("%s\n", message);
}
void PrintCurrentGlobalNumber()
{
printf("Current global number value is %i\n", GlobalNumber);
}
static void loadOverlay(char *fileName)
{
int fileHandle;
int fileLength;
fileHandle = PCopen(fileName, 0, 0);
fileLength= PClseek(fileHandle, 0, 2);
PClseek(fileHandle, 0, 0);
PCread(fileHandle, OverlayAddress, fileLength);
PCclose(fileHandle);
FlushCache();
}
int main()
{
PrintMessage("PsyQ Overlay Example");
GlobalNumber = 0;
PrintMessage("Loading OVERLAY1.BIN");
loadOverlay("OVERLAY1.BIN");
PrintMessage("Calling overlay1_function_1()");
overlay1_function_1();
PrintMessage("Calling overlay1_function_2()");
overlay1_function_2();
PrintMessage("Loading OVERLAY2.BIN");
loadOverlay("OVERLAY2.BIN");
PrintMessage("Calling overlay2_function_1(\"Sent to overlay 2\")");
overlay2_function_1("Sent to overlay 2");
PrintMessage("Calling overlay2_function_2()");
overlay2_function_2();
PrintMessage("Loading OVERLAY3.BIN");
loadOverlay("OVERLAY3.BIN");
PrintMessage("Calling overlay3_function_1(\"Sent to overlay 3\")");
overlay3_function_1("Sent to overlay 3");
PrintMessage("Calling overlay3_function_2()");
overlay3_function_2();
PrintMessage("Trying to access an overlay which is no longer loaded should cause unexpected behaviour.");
PrintMessage("Calling overlay1_function_1() even though overlay 1 has been overwritten with overlay 3");
overlay1_function_1();
PrintMessage("Calling overlay1_function_2() even though overlay 1 has been overwritten with overlay 3");
overlay1_function_2();
PrintMessage("Calling overlay2_function_1(\"Sent to overlay 2\") even though overlay 2 has been overwritten with overlay 3");
overlay2_function_1("Sent to overlay 2");
PrintMessage("Calling overlay2_function_2() even though overlay 2 has been overwritten with overlay 3");
overlay2_function_2();
PrintMessage("Done!");
return 0;
}
This is the decompiled version:
#include "THISDUST.H"
#include "MAIN.H"
void PrintMessage(char *message)
{
printf(s_string_placeholder_with_newline,message);
return;
}
void PrintCurrentGlobalNumber(void)
{
printf(s_Current_global_number_value_is___80040000,GlobalNumber);
return;
}
void loadOverlay(char *fileName)
{
undefined4 uVar1;
undefined4 uVar2;
uVar1 = PCopen(fileName,0,0);
uVar2 = PClseek(uVar1,0,2);
PClseek(uVar1,0,0);
PCread(uVar1,OverlayAddress,uVar2);
PCclose(uVar1);
FlushCache();
return;
}
int main(void)
{
__main();
PrintMessage(s_PsyQ_Overlay_Example_80040024);
GlobalNumber = 0;
PrintMessage(s_Loading_OVERLAY1_BIN_8004003c);
loadOverlay(s_OVERLAY1_BIN_80040054);
PrintMessage(s_Calling_overlay1_function_1___80040064);
FUN_80040a24();
PrintMessage(s_Calling_overlay1_function_2___80040084);
FUN_80040a5c();
PrintMessage(s_Loading_OVERLAY2_BIN_800400a4);
loadOverlay(s_OVERLAY2_BIN_800400bc);
PrintMessage(s_Calling_overlay2_function_1__Sen_800400cc);
FUN_80040b28(s_Sent_to_overlay_2_80040100);
PrintMessage(s_Calling_overlay2_function_2___80040114);
FUN_80040b70();
PrintMessage(s_Loading_OVERLAY3_BIN_80040134);
loadOverlay(s_OVERLAY3_BIN_8004014c);
PrintMessage(s_Calling_overlay3_function_1__Sen_8004015c);
FUN_80040b28(s_Sent_to_overlay_3_80040190);
PrintMessage(s_Calling_overlay3_function_2___800401a4);
FUN_80040b70();
PrintMessage(s_Trying_to_access_an_overlay_whic_800401c4);
PrintMessage(s_Calling_overlay1_function_1___ev_80040220);
FUN_80040a24();
PrintMessage(s_Calling_overlay1_function_2___ev_80040278);
FUN_80040a5c();
PrintMessage(s_Calling_overlay2_function_1__Sen_800402d0);
FUN_80040b28(s_Sent_to_overlay_2_80040100);
PrintMessage(s_Calling_overlay2_function_2___ev_8004033c);
FUN_80040b70();
PrintMessage(s_Done__8004098c);
return 0;
}
Some notes:
void PrintMessage(char *message)
{
printf("%s\n", message);
}
void PrintMessage(char *message)
{
printf(s_string_placeholder_with_newline,message);
return;
}
extern string s_string_placeholder_with_newline;
s_string_placeholder_with_newline = "%s\n";
A future version of TDR may allow these to be collapsed inline.
PrintMessage(s_Calling_overlay1_function_1___80040064);
FUN_80040a24();
void FUN_80040a24(void)
{
/* WARNING: Bad instruction - Truncating control flow here */
halt_baddata();
}
The code that should be here is only present in the overlay, so it can't be decompiled or referenced correctly in Ghidra without loading the corresponding overlay data, which will be covered in the next section.
PrintMessage("Calling overlay2_function_2()");
overlay2_function_2();
PrintMessage("Calling overlay3_function_2()");
overlay3_function_2();
PrintMessage(s_Calling_overlay2_function_2___80040114);
FUN_80040b70();
PrintMessage(s_Calling_overlay3_function_2___800401a4);
FUN_80040b70();
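Two different source functions can end up as the same FUN_ placeholder because the overlays share one block of memory. The following is a minimal sketch, using the overlay addresses and lengths from the SymDumpTE warning, that counts how many overlay regions contain a given address. The struct and function names are hypothetical. It only checks address ranges and says nothing about which overlay is actually loaded at run time.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct overlay_region {
    uint8_t  id;
    uint32_t address;
    uint32_t length;
};

/* Values copied from the SymDumpTE warning for OVERLAYS.SYM. */
static const struct overlay_region overlays[] = {
    { 0x04, 0x8004099Cu, 0x00000100u },
    { 0x05, 0x8004099Cu, 0x00000248u },
    { 0x06, 0x8004099Cu, 0x00000248u },
};

/* Hypothetical helper: count the overlays whose memory range contains the address. */
static size_t overlays_containing(uint32_t address)
{
    size_t count = 0;
    for (size_t i = 0; i < sizeof overlays / sizeof overlays[0]; i++) {
        if (address >= overlays[i].address &&
            address - overlays[i].address < overlays[i].length)
            count++;
    }
    return count;
}

/* Usage example with two of the placeholder addresses from MAIN.C. */
static void overlays_containing_example(void)
{
    /* FUN_80040b70 lies past the end of overlay 0x04, so two candidates remain. */
    assert(overlays_containing(0x80040B70u) == 2);
    /* FUN_80040a24 lies inside the range of all three overlays. */
    assert(overlays_containing(0x80040A24u) == 3);
}
```

An address on its own is therefore never enough to identify an overlay function. You also need to know which overlay the symbol data belongs to.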
Decompilation Walkthrough - OVERLAYS.EXE - Overlays
So now you know that the binary references code in overlays. How do you go about reverse-engineering those?
When reverse-engineering a real game, you'd first need to figure out where the overlay binaries were located. If you're lucky, they're all in individual .BIN files whose sizes correspond exactly with the list that's output by TDR. That's the default for PsyQ. If you're unlucky, they're concatenated together into a blob, or stored in a larger archive. You'll need to examine the primary game binary itself to figure out for sure.
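One quick check for the "lucky" case is to compare each candidate file's size against the lengths reported by TDR. This is a minimal sketch using only the standard C library, and the function names are hypothetical. It measures the file by seeking to its end, which appears to be the same idea loadOverlay() in MAIN.C uses with PClseek(fileHandle, 0, 2).

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: return the size of a file in bytes,
   or -1 if it cannot be opened or measured. */
static long file_size(const char *path)
{
    assert(path != NULL);
    FILE *fp = fopen(path, "rb");
    if (fp == NULL)
        return -1;
    long size = -1;
    if (fseek(fp, 0L, SEEK_END) == 0)
        size = ftell(fp);
    fclose(fp);
    return size;
}

/* Hypothetical helper: return 1 if the file exists and is exactly as long
   as the overlay length reported in the debug symbols. */
static int matches_overlay_length(const char *path, long expected_length)
{
    long size = file_size(path);
    return size >= 0 && size == expected_length;
}
```

For this tutorial, a matching OVERLAY1.BIN would make matches_overlay_length("OVERLAY1.BIN", 0x100) return 1. A mismatch suggests the overlay data is padded, concatenated, or stored somewhere else.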
By following the code from MAIN.C to THISDUST.C (or cheating and looking in the original source), you can see that OVERLAYS does the most basic thing possible and references three separate binaries: OVERLAY1.BIN, OVERLAY2.BIN, and OVERLAY3.BIN. Because all three of these overlays occupy the same block of memory, you'll need to follow the steps below separately for each of the three. That is, the steps below will walk you through reverse-engineering OVERLAY1.BIN, but you'll need to do the exact same steps for OVERLAY2.BIN and OVERLAY3.BIN.
Overlay ID: 0x04 (decimal: 4), address 0x8004099C, length 0x00000100 (decimal: 256)
Overlay ID: 0x05 (decimal: 5), address 0x8004099C, length 0x00000248 (decimal: 584)
Overlay ID: 0x06 (decimal: 6), address 0x8004099C, length 0x00000248 (decimal: 584)
You'll need to know the overlay ID (0x04 for the first overlay, for example), the address in memory to load it into, and the length of the overlay for the next step.
PlayStationELFConverter.exe --exe2elf --overlay "0x04,0x8004099C,0x00000100,OVERLAY1.BIN" OVERLAYS.EXE OVERLAYS.ELF > PlayStationELFConverter_Log.txt 2>&1
COPY OVERLAYS.ELF OVERLAYS-OVERLAY1.ELF
SymDumpTE.exe --debug --ignore-duplicate-definitions --rename-for-compatibility --auto-rename-fakes --json OVERLAYS.SYM OVERLAYS.json > SymDumpTE_Log.txt 2>&1
CreateSkeleton.exe --create-playstation-memory --overlay 0x04 --assume-sn-gp-base --map-sld-functions --name OVERLAYS --externs-to-labels --output-updated-json OVERLAYS-Mapped.json --output Output OVERLAYS.json > CreateSkeleton_Log.txt 2>&1
PopulateSkeleton.exe --name OVERLAYS --overlay 0x04 --input-json OVERLAYS-Mapped.json --input-source Output\OVERLAYS.C --input-data Output\XPRTDATA.C --output Output > PopulateSkeleton_Log.txt 2>&1
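Since the same commands have to be repeated for each overlay, it can help to generate the --overlay value rather than retype it. The following is a minimal sketch that reproduces the "ID,address,length,file" format from the PlayStationELFConverter command above. The function name is hypothetical. Note that only the pairing of ID 0x04 with OVERLAY1.BIN appears in this walkthrough. Pairing 0x05 with OVERLAY2.BIN and 0x06 with OVERLAY3.BIN is an assumption based on their order.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: build the value for the --overlay option in the form
   "<id>,<address>,<length>,<file>". Returns the number of characters written,
   or -1 if the buffer is too small. */
static int format_overlay_option(char *buf, size_t size, unsigned int id,
                                 unsigned int address, unsigned int length,
                                 const char *file_name)
{
    assert(buf != NULL && file_name != NULL);
    int written = snprintf(buf, size, "0x%02X,0x%08X,0x%08X,%s",
                           id, address, length, file_name);
    return (written < 0 || (size_t)written >= size) ? -1 : written;
}
```

Calling it with 0x04, 0x8004099C, 0x100, and "OVERLAY1.BIN" produces 0x04,0x8004099C,0x00000100,OVERLAY1.BIN, the same string used in the command above.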
Some key elements of this second phase:
Output/over0x04/source-decompiled/C/OVERLAYS/OVERLAY1.C has been created based on the additional overlay data incorporated into this step:
#include "THISDUST.H"
#include "OVERLAY1.H"
void overlay1_function_1(void)
{
PrintMessage(s_This_is_a_message_from_overlay1__800409a0);
return;
}
void overlay1_function_2(void)
{
PrintMessage(s_I_m_overlay_1__and_I_m_going_to_c_800409cc);
PrintCurrentGlobalNumber();
return;
}
...it's a pretty reasonable facsimile of the original:
#include "globals.h"
void overlay1_function_1()
{
PrintMessage("This is a message from overlay1_function_1.");
}
void overlay1_function_2()
{
PrintMessage("I'm overlay 1, and I'm going to call PrintCurrentGlobalNumber(), which is in main.c.");
PrintCurrentGlobalNumber();
}
Output/PRIMARY/source-decompiled/C/OVERLAYS/MAIN.C is nearly identical to the corresponding file from the first phase, but has had four of the unknown function calls replaced with the actual functions:
int main(void)
{
__main();
PrintMessage(s_PsyQ_Overlay_Example_80040024);
GlobalNumber = 0;
PrintMessage(s_Loading_OVERLAY1_BIN_8004003c);
loadOverlay(s_OVERLAY1_BIN_80040054);
PrintMessage(s_Calling_overlay1_function_1___80040064);
overlay1_function_1();
PrintMessage(s_Calling_overlay1_function_2___80040084);
overlay1_function_2();
PrintMessage(s_Loading_OVERLAY2_BIN_800400a4);
loadOverlay(s_OVERLAY2_BIN_800400bc);
PrintMessage(s_Calling_overlay2_function_1__Sen_800400cc);
FUN_80040b28(s_Sent_to_overlay_2_80040100);
PrintMessage(s_Calling_overlay2_function_2___80040114);
FUN_80040b70();
PrintMessage(s_Loading_OVERLAY3_BIN_80040134);
loadOverlay(s_OVERLAY3_BIN_8004014c);
PrintMessage(s_Calling_overlay3_function_1__Sen_8004015c);
FUN_80040b28(s_Sent_to_overlay_3_80040190);
PrintMessage(s_Calling_overlay3_function_2___800401a4);
FUN_80040b70();
PrintMessage(s_Trying_to_access_an_overlay_whic_800401c4);
PrintMessage(s_Calling_overlay1_function_1___ev_80040220);
overlay1_function_1();
PrintMessage(s_Calling_overlay1_function_2___ev_80040278);
overlay1_function_2();
PrintMessage(s_Calling_overlay2_function_1__Sen_800402d0);
FUN_80040b28(s_Sent_to_overlay_2_80040100);
PrintMessage(s_Calling_overlay2_function_2___ev_8004033c);
FUN_80040b70();
PrintMessage(s_Done__8004098c);
return 0;
}
If you repeat this phase with the other two overlays, you can build up enough information to recover the original source, but you'll need to manually piece MAIN.C back together using the information from all four phases.
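One way to keep track of what each phase tells you is a small table keyed by both overlay ID and address, because an address alone is ambiguous. The following is a minimal sketch with hypothetical names. The address-to-name pairs come from comparing the original and decompiled versions of MAIN.C above. Assigning the overlay2 functions to ID 0x05 and the overlay3 functions to ID 0x06 is an assumption based on the order of the overlay list.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct overlay_symbol {
    uint8_t     overlay_id;
    uint32_t    address;
    const char *name;
};

/* Overlay IDs 0x05 and 0x06 are assumed to be OVERLAY2.BIN and OVERLAY3.BIN. */
static const struct overlay_symbol overlay_symbols[] = {
    { 0x04, 0x80040A24u, "overlay1_function_1" },
    { 0x04, 0x80040A5Cu, "overlay1_function_2" },
    { 0x05, 0x80040B28u, "overlay2_function_1" },
    { 0x05, 0x80040B70u, "overlay2_function_2" },
    { 0x06, 0x80040B28u, "overlay3_function_1" },
    { 0x06, 0x80040B70u, "overlay3_function_2" },
};

/* Hypothetical helper: return the function name for an address within a
   specific overlay, or NULL if that overlay has no function there. */
static const char *resolve_overlay_call(uint8_t overlay_id, uint32_t address)
{
    for (size_t i = 0; i < sizeof overlay_symbols / sizeof overlay_symbols[0]; i++) {
        if (overlay_symbols[i].overlay_id == overlay_id &&
            overlay_symbols[i].address == address)
            return overlay_symbols[i].name;
    }
    return NULL;
}

/* Usage example: the same address resolves differently per overlay. */
static void resolve_overlay_call_example(void)
{
    assert(resolve_overlay_call(0x04, 0x80040A24u) != NULL);
    assert(resolve_overlay_call(0x04, 0x80040B28u) == NULL);
    assert(resolve_overlay_call(0x05, 0x80040B28u) !=
           resolve_overlay_call(0x06, 0x80040B28u));
}
```

A table like this still can't decide which name belongs at each call site in MAIN.C. The original source calls overlay2_function_1() even after overlay 3 has been loaded, so that choice has to be made by hand.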
Download
File | Size | Version | Release Date | Author
OVERLAYS Toy PlayStation PSX-EXE With Source | 13 KiB | 1.0 | 2019-09-09 | Ben Lincoln
A small, custom PsyQ PSX-EXE (with source code included for reference) that can be used to practice with TDR. This one makes use of PsyQ's memory overlay feature, which requires special handling when reverse-engineering.
1. See http://www.mrc.uidaho.edu/mrc/people/jff/digital/MIPSir.html.
2. This extra function definition is an artifact of the --map-sld-functions flag in CreateSkeleton.exe. For some reason, when OVERLAYS is compiled, PsyQ creates a label for the OverlayAddress global variable, but doesn't include it in the list of externs. This means that as far as the current version of TDR is concerned, it may as well be a function, because there is SLD data that points to it, and only a label defined at that address, which is how library functions appear in SYM data. A future version should do a better job of filtering out edge-case bad data like this.
3. Yes, it's theoretically possible to run the code in some sort of emulator, determine which overlay is loaded at a given point in the program execution, and use that information to map to the original function. If you want to write an extension for TDR which does that, be my guest :). I, unfortunately, don't have anywhere near enough time to do that.