ent is a small, fast command line utility that plots various entropy-related metrics of files or pipe/stdin streams.



# ENT - view string metrics and entropy of arbitrary files
##### *As a command line tool, .NET library or web application (coming soon)*

---

**author**: lo sauer 2011-12; www.lsauer.net   
**website**: https://github.com/lsauer/entropy   
**license**: MIT license   
**description**: quickly plot entropy information and string metrics of arbitrary files or strings via console input/output   

### Analysing a file:




### Cross-Platform:



### Plots as Vector graphics (default: .svg):

![plot from the 1st screencast](https://lsauer.github.io/res/github/project/entropy/json_testfiles.tar.out-Plot%28SHAN%29.svg "json_testfiles.tar")



### Analysing a Twitter live stream






* **Windows Usage**:


```bat
> type file1.ext file2.ext file3.ext | ent -b 2.15

> echo teststringdata | ent -b 2.15 -s
```

* **Linux, Mac Usage**:

```bash
$ cat file1.ext file2.ext file3.ext | ent -b 2.15

$ echo "sometextdata" | ent -b 64 -s
```





### Background


In information theory, *entropy* is a measure of the uncertainty associated with a random variable. In this context, the term usually refers to the **Shannon entropy**, which quantifies the expected value of the information contained in a message (*a specific instance of the random variable*), in units of bits. (See: [Entropy](http://en.wikipedia.org/wiki/Entropy_(information_theory)) )
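For illustration, the zero-order (Shannon) entropy over byte frequencies, H = -Σ p_i log_b(p_i), can be sketched in a few lines of Python. The function `shannon_entropy` and its `base` argument are hypothetical, not ent's own API; `base` plays the role that the `-b` option describes below (2 gives bits per byte in the range 0..8, while 256 scales the result to 0..1).

```python
import math
from collections import Counter


def shannon_entropy(data: bytes, base: float = 2.0) -> float:
    """Zero-order (Shannon) entropy of a byte string, H = -sum(p * log_base(p)).

    base=2 gives bits per byte (0..8); base=256 scales the result to 0..1.
    """
    n = len(data)
    h = 0.0
    for count in Counter(data).values():  # occurrences of each byte value
        p = count / n
        h -= p * math.log(p, base)
    return h


if __name__ == "__main__":
    print(shannon_entropy(b"aaaa"))  # 0.0: a single repeated symbol carries no information
    print(shannon_entropy(b"abab"))  # 1.0: two equally likely symbols = 1 bit
    print(round(shannon_entropy(bytes(range(256)), base=256), 6))  # 1.0: every byte value once
```

An empty input simply returns 0.0, because the loop never runs.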

A file with high entropy shows few repeated patterns and is typically already compressed or optimized. Such a highly compressed data stream will have an entropy greater than *5* bits per byte (log base 2).
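The link between compression and high entropy can be checked with Python's standard `zlib` module. This is a sketch under stated assumptions: the input is hypothetical pseudo-random lowercase "text" (27 symbols, so at most log2(27) ≈ 4.75 bits per byte), and the threshold of 5 is the rule of thumb quoted above rather than a hard limit.

```python
import math
import random
import zlib
from collections import Counter


def bits_per_byte(data: bytes) -> float:
    """Shannon entropy in bits per byte (log base 2, range 0..8)."""
    n = len(data)
    h = 0.0
    for c in Counter(data).values():
        p = c / n
        h -= p * math.log2(p)
    return h


# Hypothetical sample data: 100 000 random letters and spaces (fixed seed for repeatability)
rng = random.Random(42)
text = "".join(rng.choice("abcdefghijklmnopqrstuvwxyz ") for _ in range(100_000)).encode()
packed = zlib.compress(text, 9)  # deflate at maximum compression level

print(f"plain text: {bits_per_byte(text):.2f} bits/byte")
print(f"compressed: {bits_per_byte(packed):.2f} bits/byte")
```

The plain text stays below 5 bits per byte by construction, while the deflate output, having had most redundancy removed, comes out well above it.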

### Purpose

The purpose of this tool is to quickly analyze arbitrary files, for instance *biological sequence files* or *serialized JSON data*. Naturally, more powerful investigation options are available to researchers in the form of the *[R statistical language](http://www.r-project.org/ "The R Project for Statistical Computing")*, *[Matlab](http://www.mathworks.com/products/matlab/)* and *[Wolfram Mathematica](http://www.wolfram.com/mathematica/)*.

**Ad hoc investigation**, however, is much faster with a dedicated command line tool and the **features the console environment** provides, such as **autocompletion** (pressing `Tab`). Virtually **no startup time** is required, and plots can be output as web-compatible **vector graphics**.

### Usage
```bash
    Usage: shantropy [ ...1st param!] [-f fromBy] [-t toBy] [-o ] [-h help]
    [-e efficiency] [-m 1,2.. 1st,2nd order markov] [-b base-alphabet]
    [-w width plot] [-h height plot] [-z zoom%] [-fp fileposition]
    [-p plot permutation entropy] [-s  as last param!]
    Press CTRL+C or Q to Quit!
    Press PAGEUP / PAGEDOWN to zoom in or out of the file
    Press LEFT / RIGHT Arrow to navigate to the next or previous file-segment
```

### Parameters
- **-m 0** zero-order Markov source: *default* (practically identical to the Shannon entropy when the log base is 2)
- **-m 1** first-order Markov source: the number of linked (preceding) characters is one
- **-m 2** second-order Markov source, e.g. `ent -m 2`
- **-m n** n-th order Markov source
- **-b base** "b-ary entropy": a different logarithm base can be set, e.g. `ent -b 2,15`; the default is 256 for ASCII; use 64 for literature text
- **-s** arbitrary string passing: *-s* must be passed as the last argument!
- **-w width** width of the plot
- **-h height** height of the plot
- **-f fromBy, -t toBy** define a file segment in bytes (from/to). Both are optional
- **-z** file-segment zoom in percent
- **-fp** file-segment position (0-n)
- **-o outfile** plot data to the file `outfile`, creating it or appending to it
  - use `ent .... > myfile.out` to capture the entire console output
  - use `ent .... > mygraphics.svg` to plot to an SVG file
- **-e** plot the efficiency of the data
- **-p** plot and compute the permutation entropy

*Note:* files have to be passed as the first arguments. To calculate metrics for several files, list them in sequence, e.g. `ent explain.nfo markdownsharp-20100703-v113.7z -b 3,6`
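The higher-order options can be illustrated with the textbook definitions. The sketch below is an assumption about what `-m` and `-p` measure, not ent's source: `markov_entropy` (hypothetical name) estimates the conditional entropy H(X_i | previous n bytes) from observed context counts, and `permutation_entropy` (hypothetical name) follows Bandt and Pompe's ordinal-pattern definition, normalized by log2(m!) so the result lies in 0..1.

```python
import math
from collections import Counter, defaultdict


def markov_entropy(data: bytes, order: int = 1, base: float = 2.0) -> float:
    """Conditional entropy H(X_i | X_{i-order} .. X_{i-1}) estimated from `data`.

    order=0 reduces to the plain Shannon entropy of the byte frequencies.
    """
    if len(data) <= order:
        return 0.0
    # For every context of `order` preceding bytes, count which byte follows it
    followers = defaultdict(Counter)
    for i in range(order, len(data)):
        followers[data[i - order:i]][data[i]] += 1
    total = len(data) - order
    h = 0.0
    for counts in followers.values():
        n_ctx = sum(counts.values())
        weight = n_ctx / total  # how often this context occurs
        for c in counts.values():
            p_next = c / n_ctx
            h -= weight * p_next * math.log(p_next, base)
    return h


def permutation_entropy(series, m: int = 3) -> float:
    """Bandt-Pompe permutation entropy of order m (m >= 2), normalised to 0..1.

    Each window of m consecutive values is replaced by its ordinal pattern
    (the index order that sorts it); the result is the Shannon entropy of the
    pattern frequencies divided by log2(m!).
    """
    patterns = Counter(
        tuple(sorted(range(m), key=lambda k: series[i + k]))
        for i in range(len(series) - m + 1)
    )
    total = sum(patterns.values())
    h = 0.0
    for c in patterns.values():
        p = c / total
        h -= p * math.log2(p)
    return h / math.log2(math.factorial(m))


if __name__ == "__main__":
    print(markov_entropy(b"abababab", order=0))  # 1.0: 'a' and 'b' equally frequent
    print(markov_entropy(b"abababab", order=1))  # 0.0: each byte determines the next
    print(permutation_entropy(b"abcdefgh", m=3))  # 0.0: one rising pattern only
```

The alternating string shows why a Markov source of order 1 can report far less entropy than order 0: the symbols are evenly distributed, yet each one is fully predictable from its predecessor.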


### Todo:
- make and use a console-argument hash map or a params struct
- code cleanup

### Fixes
- slow -> fixed: the `Readline loop` was replaced by ReadAll; up to 200x speedup
- navigation of the file for chunked data processing
- incorrect results -> fixed: for text files, set -b 64 to get meaningful results

### Example
Example for a typical info (*.nfo*) file: the ordinate (y-axis) shows the entropy and the abscissa (x-axis) shows the file-segment position in percent.

*Result:* the text is highly compressible and clearly shows structure.

```
0,60 |                                  ▓▓▓▓▓▓▓▓▓      ▓▓▓▓▓
0,54 |                                  ▓▓▓▓▓▓▓▓▓    ▓▓▓▓▓▓▓    ▓▓
0,48 |                                  ▓▓▓▓▓▓▓▓▓    ▓▓▓▓▓▓▓    ▓▓▓
0,42 |           ▓    ▓▓  ▓             ▓▓▓▓▓▓▓▓▓▓   ▓▓▓▓▓▓▓    ▓▓▓
0,36 |           ▓    ▓▓▓▓▓    ▓        ▓▓▓▓▓▓▓▓▓▓   ▓▓▓▓▓▓▓    ▓▓▓
0,30 |           ▓▓   ▓▓▓▓▓    ▓ ▓      ▓▓▓▓▓▓▓▓▓▓   ▓▓▓▓▓▓▓    ▓▓▓
0,24 | ▓       ▓ ▓▓ ▓ ▓▓▓▓▓  ▓ ▓ ▓   ▓ ▓▓▓▓▓▓▓▓▓▓▓   ▓▓▓▓▓▓▓ ▓  ▓▓▓
0,18 | ▓       ▓▓▓▓ ▓ ▓▓▓▓▓  ▓▓▓ ▓▓  ▓ ▓▓▓▓▓▓▓▓▓▓▓ ▓ ▓▓▓▓▓▓▓ ▓  ▓▓▓
0,12 | ▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓ ▓▓▓▓▓ ▓▓▓▓▓▓▓▓▓▓▓ ▓ ▓▓▓▓▓▓▓ ▓  ▓▓▓
0,06 | ▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓ ▓▓▓▓▓ ▓▓▓▓▓▓▓▓▓▓▓▓▓ ▓▓▓▓▓▓▓ ▓▓ ▓▓▓
0,00 | ▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓ ▓▓▓▓▓▓▓▓▓▓▓▓▓ ▓▓▓▓▓▓▓ ▓▓▓▓▓▓
------------------------------------------------------------------
     0%        16%        33%        50%        66%        83%
```
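A plot of this kind can be approximated with a per-segment entropy. This is a minimal sketch, assuming each column is the Shannon entropy of one equally sized file segment computed with the default base of 256 (which keeps every value between 0 and 1); `segment_entropies` and its parameters are hypothetical, and the crude `#` bars only mimic the layout above.

```python
import math
import sys
from collections import Counter


def segment_entropies(data: bytes, segments: int = 60, base: float = 256.0) -> list:
    """Split `data` into at most `segments` (> 0) equal chunks and return the
    Shannon entropy (log base `base`) of each chunk."""
    size = max(1, math.ceil(len(data) / segments))
    result = []
    for start in range(0, len(data), size):
        chunk = data[start:start + size]
        n = len(chunk)
        h = 0.0
        for c in Counter(chunk).values():
            p = c / n
            h -= p * math.log(p, base)
        result.append(h)
    return result


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], "rb") as fh:  # path of the file to analyse
        values = segment_entropies(fh.read())
    for i, v in enumerate(values):
        # position in percent, entropy value, and a horizontal bar
        print(f"{100 * i / len(values):5.1f}%  {v:.2f}  " + "#" * round(v * 40))
```

Run it as `python segments.py explain.nfo` (script name assumed). Low values mark repetitive, structured regions; values close to 1 mark compressed or random-looking data.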

### Useful links:
- https://losauer.blogspot.com/2011/09/ent-plotting-entropy-and-computing.html
- http://www.tiem.utk.edu/~mbeals/shannonDI.html
- http://books.google.com/books?id=IxrjpbNH2XAC&pg=PA17&lpg=PA17&dq=second+order+markov+source

### Case studies:
- https://losauer.blogspot.com/2011/10/google-chrome-session-restore-web.html
- https://losauer.blogspot.com/2012/06/permutation-entropy-plot-added-to-ent.html
- https://losauer.blogspot.com/2012/06/linkedin-lastfm-eharmony-analysis-of.html


Fork it on github: https://github.com/lsauer/entropy

Have fun! In fact, don't use this program for anything else yet...
