How do I find out what archive format is being used?

Postby Joshua5 » Mon 22 Apr 2019 08:55

I'm doing a hobby project on some game data files. I would like to edit some things in them and repackage them so the game accepts the modifications.

The directories themselves were archived in a proprietary format which was easy enough to open up. The files were compressed with zlib. Now I'm stumped, because it seems there is still (at least) one more layer of archiving. The files seem to be serialized, but looking up the most common obvious answers didn't pan out. Google wasn't helpful. I didn't find any magic bytes (doesn't mean there aren't any, I just didn't find any). How do I find out what the serialization format is, if it is commercial? If it is not, how should I approach the problem?

A little background:

The file is read by a Visual C++ application on Windows.
I believe the file was XML-like before serialization.
I've decompiled the .exe, but stepping through the process while the data files were being read didn't work out: it reads in 7 GB of data, and I couldn't locate the start of the file type I wanted to work with. Fishing for helpful strings didn't work out either.
I've tried comparing the files to Python pickle, marshal, VC++ MFC marshal and various archiving program formats. No luck.

Distinctive features of the serialized files:

The file end has a Table of Contents of some sort. Looks like this:

TOC0 4 bytes of offset 4 bytes of length OBJE 8 bytes of offset 8 bytes of length

and so on. The other headings in the TOC are TOPO, CHNK, CLAS, PROP, STRG, TRAN, IMPR and EXPR all followed by offset and length. Offset and length values are big-endian.
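
A minimal sketch of how the TOC entries described above could be pulled out for inspection. The function name `find_toc_entries` and the `field_size` parameter are hypothetical; the description shows 4-byte fields after TOC0 and 8-byte fields after OBJE, so the width is left configurable rather than assumed. It only relies on the tags listed above and on the values being big-endian.

```python
import struct

# Tags quoted in the post. Anything else about the layout is an assumption.
TOC_TAGS = [b"TOC0", b"OBJE", b"TOPO", b"CHNK", b"CLAS", b"PROP",
            b"STRG", b"TRAN", b"IMPR", b"EXPR"]

def find_toc_entries(data, field_size=4):
    """Return (tag, position, offset, length) for every tag occurrence.

    field_size is 4 or 8 bytes; both widths appear in the post, so try both.
    """
    fmt = ">II" if field_size == 4 else ">QQ"  # ">" means big-endian
    entries = []
    for tag in TOC_TAGS:
        pos = data.find(tag)
        while pos != -1:
            start = pos + len(tag)
            end = start + 2 * field_size
            if end <= len(data):
                offset, length = struct.unpack(fmt, data[start:end])
                entries.append((tag.decode("ascii"), pos, offset, length))
            pos = data.find(tag, pos + 1)
    # Sort by where the tag was found in the file.
    return sorted(entries, key=lambda entry: entry[1])
```

A quick sanity check on the result is whether `offset + length` stays within the file size for each entry; if it does for one field width and not the other, that hints at which width is right.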

The file itself seems to be either type-length-value encoded (human-readable strings falling under the CLAS heading) or type-different type-value in 4 byte chunks. There are 4 byte blocks like AA AA AA AA, AB AB AB AB or BB BB BB BB which probably work as delimiters.
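
To test the guess that blocks such as AA AA AA AA act as delimiters, one option is to count every run of four identical bytes and see which values dominate. This is only a sketch: the helper name `count_repeated_words` is made up for illustration, and 00 and FF are ignored by default on the assumption that they are ordinary padding or integer values rather than markers.

```python
import re
from collections import Counter

# Matches four identical bytes in a row, e.g. AA AA AA AA.
# DOTALL is needed so that the byte 0x0A (newline) is matched too.
REPEATED_WORD = re.compile(rb"(.)\1{3}", re.DOTALL)

def count_repeated_words(data, ignore=(0x00, 0xFF)):
    """Count non-overlapping 4-byte runs of one repeated byte value."""
    counts = Counter()
    for match in REPEATED_WORD.finditer(data):
        value = match.group(1)[0]
        if value not in ignore:
            counts[value] += 1
    return counts
```

If a handful of values account for nearly all hits, and the distances between consecutive hits are regular, that supports the delimiter reading.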

There are long parts of data where nothing changes except one byte is increased by 1. Looks like an index of sorts.

The file data may contain various data types.

I had the chance to compare two different versions of the data files. Changing int values in the unserialized file led to very small changes in the serialized file (typically one number changed in the original led to one hex value being changed in the resulting file).
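
The comparison of two versions can be automated with a positional byte diff, sketched below. `diff_bytes` is a hypothetical helper name; it assumes both files have the same length, which fits the observation that one changed number changes one value in place.

```python
def diff_bytes(old, new):
    """Return (offset, old_byte, new_byte) for each differing position."""
    if len(old) != len(new):
        # An insertion or deletion shifts everything after it, so a
        # position-by-position comparison would no longer be meaningful.
        raise ValueError("lengths differ; a positional diff would be misleading")
    return [(i, a, b) for i, (a, b) in enumerate(zip(old, new)) if a != b]
```

Mapping each reported offset back to the known edit (which value was changed, and from what to what) is a cheap way to learn where a given field lives and how it is encoded.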

The format is extremely space inefficient. Most everything is in 4-byte chunks and the file is compressible by a factor of 10. This and human readability of strings have led me to believe the file is not compressed or encrypted in any way. It's just serialized somehow.
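
The "not compressed or encrypted" conclusion can be checked numerically. The sketch below measures the zlib compression ratio and the Shannon entropy in bits per byte; both function names are illustrative. Compressed or encrypted data typically sits close to 8 bits per byte and barely compresses, while loosely packed structured data scores much lower.

```python
import math
import zlib
from collections import Counter

def compression_ratio(data):
    """Original size divided by zlib-compressed size (level 9)."""
    return len(data) / len(zlib.compress(data, 9))

def shannon_entropy(data):
    """Entropy of the byte distribution, in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())
```

Running both on individual sections of the file, rather than on the whole file, can also show whether any single region is packed differently from the rest.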

Any help is greatly appreciated. :lol: :lol: :lol:


Re: How do I find out what archive format is being used?

Postby Mike » Sat 1 Jun 2019 14:33

So you're trying to mod a game?
