tool_binary_compressor
文件大小: unknow
源码售价: 5 个金币 积分规则     积分充值
资源说明:A tool for compressing binaries
This directory contains a tool to compress an executable XCore binary by 20-25%.

Quickstart
----------

To compile::

   make

To compress an example binary::

   make decompress.xe

To compress your own binary::

   xobjdump --split --strip file.xe
   ./compressor -b 0x10000 -t 0x18000 image_n0c0.bin -o decompress.xe -target=XK-1

This command creates a file ``decompress.xe`` that contains an executable
that will execute at address 0x10000. This executable will create the
binary of ``image_n0c0.bin`` (from ``file.xe`` using xobjdump) at address
0x18000, and then jump to address 0x18000 to execute ``file.xe``. You can
use this to, for example, store a compressed program in OTP.

The detail
----------

The program is inspired by the liblzg project by Marcus Geelnard,
. The compression algorithm is slightly
simplified, and has been written from scratch to make sure that the
decompression is be space-efficient on XMOS processors; not necessarily
speed-efficient. The decompressor program is generated by the compressor
(typically around 120 bytes), together with the compressed binary as a
single assembly file.

The compressed binary is a sequence of bytes that are mostly the original
binary. Where sequences are repeated in the original binary, these sequences are
replaced with a command in the compressed binary to copy the sequence from an
earlier spot. There are three commands to achieve this:

* Command 0: copies a long sequence over a short distance

* Command 1: copies a short sequence over a medium distance

* Command 2: copies a long sequence over a long distance

Command 0 comprises a marker symbol (called marker0), and a byte that
contains the length of the sequence encoded in the top few bits, and the
distance encoded in the bottom few bits. The length encoding subtracts 3
(so bits '000000' means that the length of the sequence is 3, and bits
'000110' means that the length of the sequence is 9), and the distance
encoding divides by 2 (so bits '01' means that the sequence is 2 bytes in
the past, and bits '11' means that the sequence is 6 bytes in the past).
The number of bits for length and distance is determined at compression
time, and encoded in the decompressor binary.

Command 1 is very similar to Command 0, but uses a different trade-off in
the number of bits allocated to length and distance. It comprises a marker symbol
(called marker1), and a byte that contains the length of the sequence
encoded in the top few bits, and the distance encoded in the bottom few
bits. The length encoding subtracts 3 (so bits '00' means that the length
of the sequence is 3, and bits '10' means that the length of the sequence
is 5), and the distance encoding divides by 2 and subtracts the maximum
distance encoding for Command 1 (so, assuming that command 0 can span 6
bytes, bits '000001' means that the sequence is 8 bytes in the past, and
bits '000011' means that the sequence is 12 bytes in the past). As with
command 0, the number of bits for length and distance is determined at
compression time, and encoded in the decompressor binary.

Command 2 comprises a byte with the value marker2, and two bytes for
encoding length and offset. The
second byte is the LSB of the distance. The first byte encodes the most
significant bits of the distance and the
length. The encodings are as in
Command 1, and the split is again determined at compression time.

The values for marker0, marker1, and marker2 are also established at
compression time, and are either values that do not occur in the binary, or
otherwise values that occur infrequently. If any of the marker values occur,
then the marker symbol followed by value '0' is an escape to insert the
marker symbol.

The compressor
--------------

The compression engine tries all sensible bit splits in Command 0, 1, and
2, and searches backwards through the whole binary for longest matching
lengths. Once the compression engine has
explored the space it uses the best bit splits for Command 0, 1, and 2, and
generates a compressed binary. It then creates an assembly file
containing the compressor with the correct marker-values, bit splits, and
compressed binary. The binary is padded with an extra '0' byte if the
number of bytes is an odd number.

The assembly file is then compiled into a ``.xe`` binary, and this binary
is executed using the simulator to verify that it produces the expected
original binary. Any errors are reported.

The process takes a few seconds.

The decompressor
----------------

The decompressor is a piece of assembly code that interprets the binary.
The target address, marker values, and bit splits are all hard coded in the
assembly code.

The decompressor is position independent and can execute at any address.
The target address is the address at which the uncompressed code expects to run.


Further optimisations
---------------------

* A fourth marker could be used to occupy the medium sequences over a medium
  distance - this adds 16 bytes or so the the encoder, and saves 60 bytes or
  so in the binary.

* The compressor could try 1 or more markers, each with 1 or 2 bytes
  sequences, and pick the shortest sequence (including the code for
  decoding, which grows with more markers)

* A test could be made whether any markers occur in the binary - if not,
  the test for escape characters can be omitted.

本源码包内暂不包含可直接显示的源代码文件,请下载源码包。