SDF File decomposition notes


Postby stAtrill » 27 Jul 2014 06:10

Hey hey, figured it was about time I uploaded these notes so that they wouldn't get lost to time. For starters, to a competent reverser this may be an easy file to crack. However, since this is my first foray into reversing, I am super slow and don't quite know what I am doing. These notes chronicle something of a journey, so beware: they can be contradictory and are a bit stream-of-consciousness-ish.

SDF file decomposition

First 3 bytes are always 52 59 53, RYS (as in Martin Rystrand, the software engineer who designed this file)

Next byte seems to be the version code. It went from 07 (before 5/13/2000) to 87 (as of 5/15/2000) to C7 (after 5/23/2000); all SDFs written now will have C7 as this byte.
The next dword (4 bytes) describes the length of the file. These bytes are in little-endian format, suggesting that the whole file is little-endian. (!!!) The offset of the last byte in the file is this length minus one.

The version code seems to be important: if an 87 file is un-sdfed and re-sdfed, the resulting file is largely dissimilar.
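
A minimal sketch of the header check described above, in Python. The function name parse_sdf_header and the dictionary keys are made up for illustration, and it assumes the length dword equals the total file size, which is only what these notes suggest.

```python
import struct

MAGIC = b"RYS"                      # 52 59 53
KNOWN_VERSIONS = {0x07, 0x87, 0xC7}  # version bytes seen so far in these notes

def parse_sdf_header(data: bytes) -> dict:
    """Read the 8-byte header: 3-byte magic, 1-byte version, 4-byte length."""
    if len(data) < 8:
        raise ValueError("need at least 8 bytes for the header")
    if data[:3] != MAGIC:
        raise ValueError("missing RYS magic")
    version = data[3]
    # "<I" = little-endian unsigned 32-bit integer
    (length,) = struct.unpack_from("<I", data, 4)
    return {
        "version": version,
        "known_version": version in KNOWN_VERSIONS,
        "declared_length": length,
        # Assumption: the dword is the total file size in bytes.
        "length_matches": length == len(data),
    }
```

If length_matches comes back False on a real SDF, the assumption about what the dword counts is probably off by some fixed amount, and that difference would be worth noting.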

In the order it was enumerated, as listed in the SDF man program
=Header
=folder
=>File

If the file is empty (contains no data), it will have the 12 bytes
E8 1C C5 92 68 1F 91 E8 9A E8 10 C4

run length encoding or deflate?

repeating pattern XX 1C C5 92, where XX is some byte. This is true of any internal file structure. The byte XX will also change depending on the version code.

Top of file, after header, is always (this is the top level directory):
XX 1C C5 92   YY 1C C5 92   ZZ 1C C5 92

XX is E0 plus the number of files, if they are all in the parent folder.
YY and ZZ are EE + the length of the name of the first file, if only one file is present and the length is under some threshold
Will be EF if no files are present; decreases as the number of files increases
Seems to be EF + (total filename -1 per file) -
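
To hunt for the XX 1C C5 92 pattern without eyeballing a hex dump, something like the following Python sketch could list every occurrence together with its leading byte. The helper name find_marker_words is hypothetical, and starting the search at offset 8 assumes the 8-byte header described earlier.

```python
MARKER = bytes.fromhex("1CC592")  # the three bytes that follow XX

def find_marker_words(data: bytes, start: int = 8):
    """Return (offset, XX) pairs for every XX 1C C5 92 found at or after start."""
    hits = []
    # Search from start + 1 so that the XX byte itself lies at or after start.
    pos = data.find(MARKER, start + 1)
    while pos != -1:
        hits.append((pos - 1, data[pos - 1]))
        pos = data.find(MARKER, pos + 1)
    return hits
```

Run against the 12 empty-file bytes quoted above (placed after an 8-byte header), it reports a single hit with XX = E8 at offset 8. Comparing the XX values across test archives with different file counts and name lengths is one way to check the E0 and EE/EF guesses.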

*Version code byte confirmed*
Upon changing the version code of a test file built with C7 to 87, the file failed to extract. Upon changing the test file to 07, the extractor printed out the original file as all literal characters. The name was intact, meaning that comparing files across versions may reveal how the names are encoded. Finally, this may show which sections of the file are data sections

From further testing, I confirmed that 07 stores data as literal values (as in, version 07 files are purely uncompressed archives, with only names scrambled). This may mean that Massive was experimenting with ways to store data densely without incurring too much overhead on old systems.

By testing on a compression sample, I learned that SDFs actually are compressed archives, rather than just encoded or scrambled.

Found information on zlib version 1.1.3: http://www.csie.ntu.edu.tw/~piaip/docs/zlib/
Could either be zlib, gzip, or deflate. Don't think it is deflate; may have checked wrong, but bits don't make sense for deflate codes.

From the 1.1.3 documentation: "This version of the library supports only one compression method (deflation) but other algorithms will be added later and will have the same stream interface." Need to get my hands on this version of the library.

There are multiple functions, some stream oriented and some that can operate on entire files if memory mapped. Knowing that GC mmaps its entire data directory, these may be the functions used.
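
One way to settle the zlib / gzip / raw deflate question is to just try all three at a few candidate offsets. This is a rough Python sketch using the standard zlib module; the function name probe_deflate is made up, and the offsets to try are whatever looks like the start of a data section. A hit does not prove that GC uses that framing, and raw deflate can occasionally "succeed" on garbage.

```python
import zlib

# wbits values understood by zlib: 15 = zlib wrapper, 31 = gzip wrapper,
# -15 = raw deflate with no header or checksum.
WBITS = {"zlib": 15, "gzip": 31, "raw deflate": -15}

def probe_deflate(data: bytes, offsets):
    """Try to inflate data[off:] with each framing; return the attempts that worked."""
    results = []
    for off in offsets:
        for name, wbits in WBITS.items():
            d = zlib.decompressobj(wbits)
            try:
                out = d.decompress(data[off:])
            except zlib.error:
                continue
            # Only count streams that ended cleanly and produced some output.
            if d.eof and out:
                results.append((off, name, len(out)))
    return results
```

If nothing inflates at any offset, that fits with the data being transformed in some other way before or after compression.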


Looking back at the file structure, there is a minimum of 8 bytes after the 12 bytes that follow the header. The structure of an SDF with one empty file with a one-letter name is as follows:

[3 byte magic number] [1 byte version code] [4 byte EOF address] [12 byte general identifiers (identifies number and length of file names)] [2 bytes + 1 byte per every letter in name of file] [4 byte local header for data] [8 byte identifier for data] [compressed data]
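
The layout above can be turned into running offsets with a small Python sketch. The field names and the function sdf_single_file_layout are labels invented here, and the sizes are simply the ones listed in the line above for the single-file case.

```python
def sdf_single_file_layout(name_len: int = 1) -> list:
    """Return (field, offset, size) tuples for an SDF holding one file."""
    fields = [
        ("magic", 3),
        ("version", 1),
        ("eof_address", 4),
        ("general_identifiers", 12),
        ("name_block", 2 + name_len),   # 2 bytes + 1 byte per letter
        ("local_data_header", 4),
        ("data_identifier", 8),
    ]
    layout, offset = [], 0
    for name, size in fields:
        layout.append((name, offset, size))
        offset += size
    # Compressed data runs to the end of the file, so its size is unknown here.
    layout.append(("compressed_data", offset, None))
    return layout
```

For a one-letter name this puts the name block at offset 20 and the compressed data at offset 35. If the two fixed bytes of the name block come first (a guess), the name characters would start at 0x16.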

***Entrypoint into sdfextract is 00008706  ***
***Compiled with Microsoft Visual C++ 6.0 ***



Extraction converts uppercase filenames to lowercase. Inside the SDF, names are always represented as uppercase.

NOTES: in AToP and folderZwithAtoP, the strings all seem to be concatenated and then compressed, and there appears to be an offset in both that refers back to offset 0x16 (at 0x6A in AtoP, and 0x7D in folder....toP). Need to look more closely at how names are treated

All names concatenated and encoded?

I will update this file in the future, as I stopped updating my notes once I started decompiling, tracing, using debuggers, etc. There is quite a bit more that I still need to write down: since I started decompiling, I have the SDF and unSDF flow and process down, including the key and algorithm used for what appears to be a simple xor cipher. I left off at trying to figure out when (before or after cipher, etc) and how (on what address ranges, etc) deflate was used.
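
For anyone unfamiliar with the idea, a generic repeating-key xor looks like the Python sketch below. This is not the SDF algorithm: the actual key and the way it is applied are not written down in these notes, so the key in any test run is a placeholder and xor_bytes is just an illustrative name.

```python
from itertools import cycle

def xor_bytes(data: bytes, key: bytes) -> bytes:
    """Xor every byte of data with the key, repeating the key as needed."""
    if not key:
        raise ValueError("key must not be empty")
    return bytes(b ^ k for b, k in zip(data, cycle(key)))
```

Because xor is its own inverse, running the same function twice with the same key gives back the original bytes, which is why one routine can serve both the SDF and unSDF directions in a scheme like this.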

I intend to finish updating this post soon,
-Stat
