This utility is useful in SEO analysis for determining the keyword distribution in a text.
Just enter the URL of the HTTP resource you want to create a histogram for and submit it.
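To illustrate what a keyword histogram is, here is a minimal Python sketch. It is not the utility's own code: the function name keyword_histogram, the word-splitting rule, and the sample text are assumptions made for this example only.

```python
import re
from collections import Counter

def keyword_histogram(text, top=10):
    """Count case-insensitive word occurrences and return the most frequent ones.

    Hypothetical helper: the real utility may tokenize text differently.
    """
    words = re.findall(r"\w+", text.lower())
    return Counter(words).most_common(top)

sample = "SEO tools help SEO analysts; tools count keywords."
for word, count in keyword_histogram(sample, top=3):
    # Draw one '#' per occurrence as a simple text bar.
    print(f"{word:10} {'#' * count} {count}")
```

A real page would first have its HTML markup stripped and its text converted to a single encoding before counting.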
Enter the URL of the page like this:
e.g. http://www.domain.com/path/index.html
If the URL is a directory, you have to add a trailing '/' to the end of it.
e.g. http://www.domain.com/path/
It is also possible to specify the scheme. Supported schemes are http:// (default) and ftp://
e.g. ftp://www.domain.com/path/
The source page's character set is detected automatically from the HTML <meta> tag.
If the tag is absent, the detection is incorrect, or you want to override it,
add the extra parameter 'cs=character_set' after the URL, separated from it by a space.
The URL must always come first. The source page text is converted by the
GNU libiconv library.
e.g. http://www.domain.com/path/ cs=ISO-8859-1
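The input rules above can be sketched in Python as follows. This is an illustration under stated assumptions, not the utility's implementation: parse_request and detect_charset are hypothetical names, and the <meta> lookup is a simple regular expression rather than a full HTML parser.

```python
import re

def parse_request(line, default_scheme="http://"):
    """Split 'URL [cs=charset]' into (url, charset or None). Hypothetical helper."""
    parts = line.split()
    if not parts:
        raise ValueError("empty input: a URL is required")
    url, charset = parts[0], None      # the URL must come first
    for extra in parts[1:]:
        if extra.startswith("cs="):
            charset = extra[3:] or None
    if "://" not in url:               # no scheme given: assume the default
        url = default_scheme + url
    return url, charset

def detect_charset(html_bytes):
    """Return the charset named in a <meta> tag, or None if none is found."""
    match = re.search(rb'<meta[^>]+charset=["\']?([\w-]+)', html_bytes, re.I)
    return match.group(1).decode("ascii") if match else None

url, override = parse_request("www.domain.com/path/ cs=ISO-8859-1")
page = b'<meta http-equiv="Content-Type" content="text/html; charset=KOI8-R">'
# An explicit cs= parameter takes precedence over the detected value.
charset = override or detect_charset(page)
print(url, charset)
```

Here the explicit cs=ISO-8859-1 wins over the KOI8-R declared in the page, which mirrors the override behaviour described above.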
libiconv provides support for the following encodings:
- European languages
  - ASCII, ISO-8859-{1,2,3,4,5,7,9,10,13,14,15,16},
    KOI8-R, KOI8-U, KOI8-RU,
    CP{1250,1251,1252,1253,1254,1257}, CP{850,866},
    Mac{Roman,CentralEurope,Iceland,Croatian,Romania},
    Mac{Cyrillic,Ukraine,Greek,Turkish},
    Macintosh
- Semitic languages
  - ISO-8859-{6,8}, CP{1255,1256}, CP862, Mac{Hebrew,Arabic}
- Japanese
  - EUC-JP, SHIFT_JIS, CP932, ISO-2022-JP, ISO-2022-JP-2, ISO-2022-JP-1
- Chinese
  - EUC-CN, HZ, GBK, GB18030, EUC-TW, BIG5, CP950, BIG5-HKSCS,
    ISO-2022-CN, ISO-2022-CN-EXT
- Korean
  - EUC-KR, CP949, ISO-2022-KR, JOHAB
- Armenian
  - ARMSCII-8
- Georgian
  - Georgian-Academy, Georgian-PS
- Tajik
  - KOI8-T
- Thai
  - TIS-620, CP874, MacThai
- Laotian
  - MuleLao-1, CP1133
- Vietnamese
  - VISCII, TCVN, CP1258
- Platform specifics
  - HP-ROMAN8, NEXTSTEP
- Full Unicode
  - UTF-8
    UCS-2, UCS-2BE, UCS-2LE
    UCS-4, UCS-4BE, UCS-4LE
    UTF-16, UTF-16BE, UTF-16LE
    UTF-32, UTF-32BE, UTF-32LE
    UTF-7
    C99, JAVA
- Full Unicode, in terms of uint16_t or uint32_t
  (with machine dependent endianness and alignment)
  - UCS-2-INTERNAL, UCS-4-INTERNAL
- Locale dependent, in terms of 'char' or 'wchar_t'
  (with machine dependent endianness and alignment, and with OS and
  locale dependent semantics)
  - char, wchar_t
The empty encoding name "" is equivalent to "char": it denotes the
locale dependent character encoding.
Extra encodings:
- European languages
  - CP{437,737,775,852,853,855,857,858,860,861,863,865,869,1125}
- Semitic languages
  - CP864
- Japanese
  - EUC-JISX0213, Shift_JISX0213, ISO-2022-JP-3
- Turkmen
  - TDS565
- Platform specifics
  - RISCOS-LATIN1
It is also acceptable to enter an alias, such as windows-1252 instead of CP1252.
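As an illustration of encoding aliases, the following Python sketch checks that two names refer to the same encoding. Note the assumption: it uses Python's built-in codecs registry as a stand-in, not GNU libiconv, so the set of recognized names and aliases may differ from what this utility accepts. The helper name same_encoding is hypothetical.

```python
import codecs

def same_encoding(name_a, name_b):
    """Return True if both names resolve to the same Python codec."""
    return codecs.lookup(name_a).name == codecs.lookup(name_b).name

# windows-1252 and cp1252 are two names for one encoding in Python's registry.
print(same_encoding("windows-1252", "cp1252"))

# Bytes written under one name decode correctly under the other.
raw = "café".encode("windows-1252")
print(raw.decode("cp1252"))
```

codecs.lookup raises LookupError for a name it does not know, which is a convenient way to validate a user-supplied cs= value before attempting a conversion.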