Long integers in BT are saved sometimes in big endian and sometimes in little endian format (see [http://en.wikipedia.org/wiki/Endianness]_). In big endian format the most significant byte comes first, e.g. the bytes 01 02 5E AE mean (0x01 << 24) + (0x02 << 16) + (0x5E << 8) + 0xAE in big endian but 0x01 + (0x02 << 8) + (0x5E << 16) + (0xAE << 24) in little endian. Big endian is the order in which humans are used to reading numbers; x86 PCs, however, store integers in little endian. In BT the choice is somewhat arbitrary: in the MSDOS version the file dpics0 starts with 00 00 56 58 57 5a 01 00, i.e. the first long is big endian (0x5658) and the second one little endian (0x15A57), while dpics1 starts with 00 00 56 58 00 01 6a a2, where both longs are big endian (0x5658 and 0x16AA2).
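In BT the following heuristic usually works: decode the long both as big endian and as little endian and take the smaller of the two values. Only where you expect really large values should you specify the endianness explicitly. Code example::

    GUESS, BIG, LITTLE = "guess", "big", "little"  # endianness modes for read_long

    def read_long_big(byte_arr, offset):
        b = byte_arr[offset:offset + 4]
        return b[0] << 24 | b[1] << 16 | b[2] << 8 | b[3]

    def read_long_little(byte_arr, offset):
        b = byte_arr[offset:offset + 4]
        return b[0] | b[1] << 8 | b[2] << 16 | b[3] << 24

    def read_long(byte_arr, offset, endian=GUESS):  # GUESS: take the smaller value
        if endian == BIG:
            return read_long_big(byte_arr, offset)
        if endian == LITTLE:
            return read_long_little(byte_arr, offset)
        return min(read_long_big(byte_arr, offset),
                   read_long_little(byte_arr, offset))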
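To check the guessing heuristic against the dpics0 and dpics1 headers quoted above, here is a minimal self-contained sketch. It uses Python's built-in ``int.from_bytes`` instead of manual shifts; the function name ``guess_long`` is hypothetical.

```python
def guess_long(data: bytes, offset: int) -> int:
    """Decode 4 bytes as big and as little endian; return the smaller value."""
    chunk = data[offset:offset + 4]
    if len(chunk) != 4:
        raise ValueError("need 4 bytes at offset %d" % offset)
    return min(int.from_bytes(chunk, "big"), int.from_bytes(chunk, "little"))


# First 8 bytes of dpics0 and dpics1 (MSDOS version), as quoted in the text
dpics0 = bytes.fromhex("00 00 56 58 57 5a 01 00")
dpics1 = bytes.fromhex("00 00 56 58 00 01 6a a2")

for name, header in (("dpics0", dpics0), ("dpics1", dpics1)):
    print(name, hex(guess_long(header, 0)), hex(guess_long(header, 4)))
```

This prints ``dpics0 0x5658 0x15a57`` and ``dpics1 0x5658 0x16aa2``: taking the smaller value selects big endian for the first long of both files and for the second long of dpics1, and little endian for the second long of dpics0, matching the description above.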
Compression in BT is mostly done with Huffman encoding [http://en.wikipedia.org/wiki/Huffman_coding]_. Any Huffman encoded chunk of data has the following structure:

* a long (4 bytes, big or little endian) describing the number of bytes of decompressed data
* a long (4 bytes, big or little endian) describing the number of `bits` of compressed data, including the Huffman tree
* the Huffman tree
* the compressed data
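Reading the two header longs of a Huffman encoded chunk might look like the following minimal sketch. It assumes that the guessing heuristic is safe for both longs, and that the first long is a byte count. ``HuffmanHeader``, ``parse_huffman_header`` and ``_guess_long`` are hypothetical names. Since the second long counts bits of tree plus data, the payload occupies that many bits rounded up to whole bytes.

```python
from typing import NamedTuple


class HuffmanHeader(NamedTuple):
    decompressed_size: int  # first long; assumed to be a byte count
    compressed_bits: int    # second long: bits of Huffman tree + compressed data
    payload_offset: int     # byte offset where the Huffman tree starts

    @property
    def payload_bytes(self) -> int:
        """Bytes spanned by tree + data (the last byte may be partially used)."""
        return (self.compressed_bits + 7) // 8


def _guess_long(data: bytes, offset: int) -> int:
    # BT heuristic: decode both ways and keep the smaller value
    chunk = data[offset:offset + 4]
    if len(chunk) != 4:
        raise ValueError("truncated header at offset %d" % offset)
    return min(int.from_bytes(chunk, "big"), int.from_bytes(chunk, "little"))


def parse_huffman_header(data: bytes, offset: int = 0) -> HuffmanHeader:
    """Parse the two 4-byte longs that precede a Huffman encoded chunk."""
    size = _guess_long(data, offset)
    bits = _guess_long(data, offset + 4)
    return HuffmanHeader(size, bits, offset + 8)
```

Where a chunk's sizes may be large, replace ``_guess_long`` with an explicit ``int.from_bytes(chunk, "big")`` or ``"little"`` call, as the text advises.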
The Huffman tree and the compressed data must be read bit by bit, always starting with the most significant bit of each byte. The two are directly adjacent.
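The most-significant-bit-first reading order can be sketched with a small bit reader. ``BitReader`` is a hypothetical helper, and since the layout of the Huffman tree is not described here, the sketch covers only the bit order. Assuming "directly adjacent" means the data begins at the bit right after the tree, a single reader can decode the tree and then simply continue into the compressed data.

```python
class BitReader:
    """Reads a byte string bit by bit, most significant bit of each byte first."""

    def __init__(self, data: bytes, offset: int = 0) -> None:
        self.data = data
        self.pos = offset * 8  # absolute bit position in data

    def read_bit(self) -> int:
        byte_index, bit_index = divmod(self.pos, 8)
        if byte_index >= len(self.data):
            raise EOFError("no more bits")
        self.pos += 1
        # bit_index 0 selects the highest bit (0x80) of the current byte
        return (self.data[byte_index] >> (7 - bit_index)) & 1

    def read_bits(self, n: int) -> int:
        """Read n bits MSB first and return them as an unsigned integer."""
        value = 0
        for _ in range(n):
            value = (value << 1) | self.read_bit()
        return value
```

For the byte 0xB0 (binary 10110000) the first four calls to ``read_bit`` return 1, 0, 1, 1.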
Note
Description of Huffman encoding goes here
Description of indexed files goes here
In BT1: