## Thursday, November 24, 2016

### Entropy at a glance

In a hurry? Skip straight to the C# source code - EntropyGlance; Entropy at a glance - A C# WinForms project - https://github.com/AdamRakaska/EntropyGlance

So I wrote an file entropy analysis tool for my friend, who works in infosec. Here it is, hands-down the coolest feature this tool offers is a System.Windows.Forms.DataVisualization.Charting visualization that graphs how the entropy changes across a whole file:

This application provides both Shannon (data) entropy and entropy as a compression ratio.
Get a more intuitive feel for the overall entropy at a glance with by visualizing both measures of entropy as a percentage of a progress bar, instead of just numbers.

However, for those who love numbers, standard measures of entropy are also given as well. Information entropy is expressed both as the quantity of bits/byte (on a range from 0 to 8), and as the 'normalized' value (range 0 to 1). High entropy means it the data is random-looking, like encrypted or compressed information.
The Shannon 'specific' entropy calculation makes no assumptions about the type of message it is measuring. What this means is that while a message consisting of only 2 symbols will get a very low entropy score of 0.9/8, a message of 52 symbols (the alphabet, as lower case first, then upper) repeated in the same sequence one hundred times would be yield a higher-than-average score of 6/8.
This is precisely why I included a compression ratio as a ranking of entropy that is much closer to notion of entropy that takes into account repeated patterns or predictable sequences, in the sense of Shannon's source coding theorem.

Dive deep into the symbol distribution and analysis. This screen gives you the per-symbol entropy value and the ability to sort by rank, symbol, ASCII value, count, entropy, and hex value:

As always, the C# source code is being provided, hosted on my GitHub:
EntropyGlance; Entropy at a glance - A C# WinForms project - https://github.com/AdamRakaska/EntropyGlance