Sunday, 19 March 2017

EntropyGrapher - file visualization

Now that I have finally found a bit of free time, I was able to build a project that had been on my mind for quite a while.
The idea was simple: write a piece of code that would let me explore the entropy of a file's contents. I wrote a little snippet that yielded this:
With the help of matplotlib, creating such an image is a matter of a few lines of Python. I color-coded defined entropy ranges to make the image more readable.
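
A minimal sketch of how such an entropy graph could be computed. This is not the original snippet: it splits the data into fixed-size chunks and computes the Shannon entropy of each one (0 to 8 bits per byte). The function names, the 256-byte chunk size, and the color thresholds are all assumptions made for illustration.

```python
import math
from collections import Counter


def shannon_entropy(chunk: bytes) -> float:
    """Shannon entropy of a byte string, in bits per byte (0.0 to 8.0)."""
    if not chunk:
        return 0.0
    total = len(chunk)
    counts = Counter(chunk)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def chunk_entropies(data: bytes, chunk_size: int = 256) -> list:
    """Entropy of each consecutive chunk_size-byte slice of data."""
    return [
        shannon_entropy(data[i:i + chunk_size])
        for i in range(0, len(data), chunk_size)
    ]


def entropy_color(value: float) -> str:
    """Hypothetical color coding; the thresholds are illustrative only."""
    if value < 3.0:
        return "blue"
    if value < 6.0:
        return "green"
    return "red"


def plot_entropy(data: bytes, chunk_size: int = 256) -> None:
    """Draw one colored bar per chunk (requires matplotlib)."""
    import matplotlib.pyplot as plt  # imported lazily; only needed for plotting

    values = chunk_entropies(data, chunk_size)
    colors = [entropy_color(v) for v in values]
    plt.bar(range(len(values)), values, color=colors, width=1.0)
    plt.xlabel("chunk index")
    plt.ylabel("entropy (bits per byte)")
    plt.ylim(0, 8)
    plt.show()
```

Calling `plot_entropy(open("some_file", "rb").read())` would then show one bar per chunk, with `some_file` standing in for whatever file you want to inspect.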
Nevertheless, I wasn't really happy with the readability I got. You can't really tell which sections are which, or even whether this is a binary file at all.
It turned out that calculating and drawing graphs based on entropy tells you a lot less than simply drawing the file as it is. So that's exactly what I did!
Here are some examples:



In all of the images above, we should be able to easily identify code (high entropy) and strings (green).

We can confirm that with a color wheel: green covers much of the printable character range (48-126).
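
As a quick check of that claim, here is a small sketch that maps byte values onto the hue circle. It assumes the hue is simply the byte value scaled by 1/256 of a full turn; the exact scaling used for the images is not stated, so treat the numbers as approximate.

```python
def byte_to_hue_degrees(value: int) -> float:
    """Map a byte (0-255) onto the 360-degree hue circle."""
    # Assumption: hue = value / 256 of a full turn.
    return value / 256 * 360


printable = range(48, 127)  # the 48-126 range mentioned above
low = byte_to_hue_degrees(printable[0])
high = byte_to_hue_degrees(printable[-1])
print(f"bytes 48-126 cover hues {low:.1f} to {high:.1f} degrees")
```

Under that assumption the range runs from yellow-green (67.5 degrees) through pure green (120 degrees) to almost cyan (about 177 degrees).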

mkdir.png -> first of the images above

A .NET DLL packed with ConfuserEx 1.0 (we can see entropy similar to that of the PNG file, almost random at the beginning)
After unpacking the binary and deobfuscating the strings, the code lost some entropy, and the strings are now in the ASCII range (green)

As you can see, these images can tell us A LOT more than plain entropy graphs. Admittedly, the image can sometimes get a bit large, but that's nothing we can't control.
Creating the images is really simple. We iterate through the whole file, use each byte as the H input to the HSV color model (with S=V=1), and write the result to an image file with the help of the Pillow Python library.
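
The steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not the project's code: the function names are made up, the hue scaling (byte / 256) is a guess, the 256-pixel width is borrowed from the image sizes mentioned later in this post, and the last row is padded with black.

```python
import colorsys


def byte_to_rgb(value: int) -> tuple:
    """Use the byte as hue (S = V = 1) and return an 8-bit RGB triple."""
    # Assumption: hue = value / 256; the original scaling is not shown.
    r, g, b = colorsys.hsv_to_rgb(value / 256, 1.0, 1.0)
    return (round(r * 255), round(g * 255), round(b * 255))


def file_to_pixels(data: bytes, width: int = 256):
    """Return (pixels, height): one pixel per byte, padded with black."""
    height = max(1, -(-len(data) // width))  # ceiling division
    pixels = [byte_to_rgb(b) for b in data]
    pixels += [(0, 0, 0)] * (width * height - len(pixels))
    return pixels, height


def save_image(in_path: str, out_path: str, width: int = 256) -> None:
    """Render a file as an image (requires Pillow)."""
    from PIL import Image  # imported lazily; only needed when saving

    with open(in_path, "rb") as handle:
        data = handle.read()
    pixels, height = file_to_pixels(data, width)
    image = Image.new("RGB", (width, height))
    image.putdata(pixels)
    image.save(out_path)
```

With a fixed width, the image height grows linearly with the file size, which is why large inputs produce very tall pictures.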
Another thing I tried to visualize was the output of the movfuscator. I expected to see some dots and maybe straight lines, which would represent mov instructions with the same or very similar opcodes. That's what I found, but in addition there were some areas that I couldn't really identify:

Full Image
Later I found out what the purpose of this beautiful mosaic is. Do you know what it is? :)

While having fun with this project, I realized that some programs have a hard time displaying images as large as 256x20000 pixels. The default Ubuntu image viewer (eog) crashes when trying to display such a PNG. The reason is that it tries to allocate a very large amount of memory to display the image and doesn't get it from the OS (reported as a bug). Viewnior was able to handle the images properly.

There are other cases where such big pictures, even when really small in file size, can cause trouble. Messenger browser crashes on some of them consistently.

Project sources:

→ python3 -h
usage: [-h] [-c CHK_SIZE] [-o OUTPUT] [-e | -i] filename

Simple tool for file visualization

positional arguments:
  filename              name of the file to analyze

optional arguments:
  -h, --help            show this help message and exit
  -c CHK_SIZE, --chk_size CHK_SIZE
  -o OUTPUT, --output OUTPUT
  -e, --entropy
  -i, --image
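
The help text above is consistent with an argparse parser along the following lines. This is a reconstruction from the output, not the project's source: the integer type for CHK_SIZE and the function name `build_parser` are assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser matching the usage line shown above."""
    parser = argparse.ArgumentParser(
        description="Simple tool for file visualization"
    )
    parser.add_argument("filename", help="name of the file to analyze")
    # Assumption: the chunk size is an integer number of bytes.
    parser.add_argument("-c", "--chk_size", type=int)
    parser.add_argument("-o", "--output")
    # -e and -i are shown as [-e | -i], i.e. mutually exclusive.
    mode = parser.add_mutually_exclusive_group()
    mode.add_argument("-e", "--entropy", action="store_true")
    mode.add_argument("-i", "--image", action="store_true")
    return parser


args = build_parser().parse_args(["-i", "-c", "128", "sample.bin"])
print(args.filename, args.chk_size, args.image, args.entropy)
```

Here `sample.bin` is just a placeholder file name; passing both `-e` and `-i` makes argparse exit with an error, which matches the `[-e | -i]` notation in the usage line.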