Zipfus

Online tool that builds a Zipf's law distribution histogram for SEO analysis

This utility is useful in SEO analysis for determining how keywords are distributed in the text of a page.
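A minimal sketch of the kind of rank/frequency histogram described above. It assumes that a word is a run of word characters compared case-insensitively, and it sets each observed count beside the Zipf's law expectation count(1) / rank. Zipfus does not document its own tokenization or scaling, so `zipf_histogram` is a hypothetical helper, not the tool's code.

```python
import re
from collections import Counter


def zipf_histogram(text, top=10):
    """Return (rank, word, count, zipf_expected) rows for the most frequent words.

    Assumption: a word is a run of word characters, compared case-insensitively.
    Zipf's law predicts count(r) ~= count(1) / r for rank r.
    """
    words = re.findall(r"\w+", text.lower())
    ranked = Counter(words).most_common(top)  # ties keep first-seen order
    if not ranked:
        return []
    top_count = ranked[0][1]
    return [
        (rank, word, count, top_count / rank)
        for rank, (word, count) in enumerate(ranked, start=1)
    ]


if __name__ == "__main__":
    sample = "the cat and the dog and the bird"
    for rank, word, count, expected in zipf_histogram(sample):
        # One '#' per occurrence gives a crude text histogram.
        print(f"{rank:>2} {word:<5} {'#' * count:<4} observed={count} zipf~{expected:.1f}")
```

Keywords whose observed counts sit far above the Zipf curve are the ones that stand out in the text.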

Enter the URL of the resource you want a histogram for and submit it.


Enter the URL of the page like this:

e.g. www.domain.com/path/index.html

If the URL points to a directory, you have to add a trailing '/' to the end of it.

e.g. www.domain.com/path/
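In URL terms, the trailing slash is what marks the last path segment as a directory. Python's standard `urllib.parse.urljoin` shows the difference when a relative link is resolved. This only illustrates general URL semantics; the page does not say exactly why Zipfus requires the slash.

```python
from urllib.parse import urljoin

# Without a trailing slash, "path" is the last segment of a resource name
# and gets replaced when a relative reference is resolved against it.
print(urljoin("http://www.domain.com/path", "page.html"))
# → http://www.domain.com/page.html

# With the trailing slash, "path/" is treated as a directory and kept.
print(urljoin("http://www.domain.com/path/", "page.html"))
# → http://www.domain.com/path/page.html
```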

It is also possible to specify the scheme. Supported schemes are http:// (the default) and ftp://.

e.g. ftp://www.domain.com/path/
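The scheme rule above (http:// when none is given, ftp:// also accepted) could be applied roughly as follows. `normalize_url` and `SUPPORTED_SCHEMES` are hypothetical names chosen for illustration; they are not part of Zipfus.

```python
import re
from urllib.parse import urlsplit

SUPPORTED_SCHEMES = {"http", "ftp"}  # per the text above; http is the default
HAS_SCHEME = re.compile(r"^[A-Za-z][A-Za-z0-9+.-]*://")


def normalize_url(user_input):
    """Prepend http:// when no scheme is given and reject unsupported schemes."""
    url = user_input.strip()
    if not HAS_SCHEME.match(url):
        url = "http://" + url
    scheme = urlsplit(url).scheme  # urlsplit already lowercases the scheme
    if scheme not in SUPPORTED_SCHEMES:
        raise ValueError(f"unsupported scheme: {scheme}")
    return url


print(normalize_url("www.domain.com/path/"))        # → http://www.domain.com/path/
print(normalize_url("ftp://www.domain.com/path/"))  # → ftp://www.domain.com/path/
```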

The character set of the source page is detected automatically from the HTML <meta> tag.
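A rough sketch of reading the character set from a <meta> tag, assuming the raw page bytes have already been downloaded. The regular expression is a simplification that covers the two common forms; full HTML parsers handle more cases, and `detect_meta_charset` is a hypothetical name, not the tool's implementation.

```python
import re

# Matches both <meta charset="..."> (HTML5) and the older
# <meta http-equiv="Content-Type" content="text/html; charset=..."> form.
META_CHARSET = re.compile(
    rb"""<meta[^>]+charset\s*=\s*["']?([A-Za-z0-9_.:-]+)""", re.IGNORECASE
)


def detect_meta_charset(raw_html, default=None):
    """Return the charset named in the first matching <meta> tag, or default."""
    match = META_CHARSET.search(raw_html)
    return match.group(1).decode("ascii") if match else default


page = b'<meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1">'
print(detect_meta_charset(page))  # → ISO-8859-1
```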

If the <meta> character set is absent, is detected incorrectly, or you want to override it, add an extra parameter 'cs=character_set', separated from the URL by a space. The URL must come first. The source page text is converted with the GNU libiconv library.

e.g. www.domain.com/path/ cs=ISO-8859-1
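The input format with the optional cs= parameter could be parsed as below. This is a hypothetical sketch: `parse_request` and `decode_page` are illustrative names, and Python's built-in codecs stand in for GNU libiconv, which the tool itself uses.

```python
def parse_request(line):
    """Split 'URL [cs=character_set]' into (url, charset or None).

    The URL must come first; an optional cs= token may follow after a space.
    """
    parts = line.split()
    if not parts:
        raise ValueError("empty input")
    url, charset = parts[0], None
    for extra in parts[1:]:
        if extra.lower().startswith("cs="):
            charset = extra[3:]
    return url, charset


def decode_page(raw, charset):
    # Python codecs stand in for libiconv here; common names such as
    # ISO-8859-1 are known to both, but the full encoding lists differ.
    return raw.decode(charset)


url, cs = parse_request("www.domain.com/path/ cs=ISO-8859-1")
print(url, cs)                      # → www.domain.com/path/ ISO-8859-1
print(decode_page(b"caf\xe9", cs))  # → café
```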

libiconv supports the following encodings:

European languages
ASCII, ISO-8859-{1,2,3,4,5,7,9,10,13,14,15,16}, KOI8-R, KOI8-U, KOI8-RU, CP{1250,1251,1252,1253,1254,1257}, CP{850,866}, Mac{Roman,CentralEurope,Iceland,Croatian,Romania}, Mac{Cyrillic,Ukraine,Greek,Turkish}, Macintosh
Semitic languages
ISO-8859-{6,8}, CP{1255,1256}, CP862, Mac{Hebrew,Arabic}
Japanese
EUC-JP, SHIFT_JIS, CP932, ISO-2022-JP, ISO-2022-JP-2, ISO-2022-JP-1
Chinese
EUC-CN, HZ, GBK, GB18030, EUC-TW, BIG5, CP950, BIG5-HKSCS, ISO-2022-CN, ISO-2022-CN-EXT
Korean
EUC-KR, CP949, ISO-2022-KR, JOHAB
Armenian
ARMSCII-8
Georgian
Georgian-Academy, Georgian-PS
Tajik
KOI8-T
Thai
TIS-620, CP874, MacThai
Laotian
MuleLao-1, CP1133
Vietnamese
VISCII, TCVN, CP1258
Platform specifics
HP-ROMAN8, NEXTSTEP
Full Unicode
UTF-8
UCS-2, UCS-2BE, UCS-2LE
UCS-4, UCS-4BE, UCS-4LE
UTF-16, UTF-16BE, UTF-16LE
UTF-32, UTF-32BE, UTF-32LE
UTF-7
C99, JAVA
Full Unicode, in terms of uint16_t or uint32_t (with machine dependent endianness and alignment)
UCS-2-INTERNAL, UCS-4-INTERNAL
Locale dependent, in terms of `char' or `wchar_t' (with machine dependent endianness and alignment, and with OS and locale dependent semantics)
char, wchar_t
The empty encoding name "" is equivalent to "char": it denotes the locale dependent character encoding.

Extra encodings:

European languages
CP{437,737,775,852,853,855,857,858,860,861,863,865,869,1125}
Semitic languages
CP864
Japanese
EUC-JISX0213, Shift_JISX0213, ISO-2022-JP-3
Turkmen
TDS565
Platform specifics
RISCOS-LATIN1

Common aliases are accepted as well, e.g. windows-1252 instead of cp1252.
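Accepting several names for one encoding is ordinary alias resolution. Python's codec registry behaves in a similar way, as the snippet below shows; it demonstrates Python's aliases, not libiconv's own alias table.

```python
import codecs

# Each alias resolves to the same canonical codec name.
for name in ("windows-1252", "cp1252", "CP1252"):
    print(name, "->", codecs.lookup(name).name)  # → cp1252 in every case
```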

Generated by Zipfus on-line tool at www.insertix.com