initial rpm release (#1075662)
TextCat is an implementation of the text categorization algorithm presented in Cavnar, W. B. and J. M. Trenkle, "N-Gram-Based Text Categorization". TextCat uses this technique to implement written language identification. Currently it recognizes 69 natural languages (counting Esperanto as a natural language).
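To make the N-gram technique concrete, here is a minimal Python sketch of the Cavnar and Trenkle approach. It is not TextCat's own code. A text is reduced to a ranked profile of its most frequent character n-grams, and that profile is compared with each language profile using the "out-of-place" distance (the sum of rank differences, with a fixed penalty for n-grams the language profile lacks). The function names, the n-gram lengths 1 to 5, the cutoff of 300 n-grams and the `_` word padding are all assumptions chosen for illustration.

```python
import re
from collections import Counter


def ngram_profile(text, max_n=5, top_k=300):
    """Rank the top_k most frequent character n-grams (n = 1..max_n) of text.

    Returns a dict mapping each n-gram to its rank (0 = most frequent).
    """
    counts = Counter()
    # Split on anything that is not a letter, then pad each word with '_'
    # so that n-grams at word boundaries are distinguishable.
    for word in re.findall(r"[^\W\d_]+", text.lower()):
        padded = f"_{word}_"
        for n in range(1, max_n + 1):
            for i in range(len(padded) - n + 1):
                counts[padded[i:i + n]] += 1
    # Most frequent first; ties broken alphabetically so the result is deterministic.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:top_k]
    return {gram: rank for rank, (gram, _count) in enumerate(ranked)}


def out_of_place(doc_profile, lang_profile):
    """Sum of rank differences; n-grams missing from lang_profile get a maximum penalty."""
    max_penalty = len(lang_profile)
    distance = 0
    for gram, rank in doc_profile.items():
        if gram in lang_profile:
            distance += abs(rank - lang_profile[gram])
        else:
            distance += max_penalty
    return distance


def classify(text, lang_profiles):
    """Return (best_language, scores); a lower score means a closer match."""
    doc_profile = ngram_profile(text)
    scores = {lang: out_of_place(doc_profile, prof) for lang, prof in lang_profiles.items()}
    return min(scores, key=scores.get), scores


if __name__ == "__main__":
    # Toy training samples (assumed); real language models come from much larger corpora.
    samples = {
        "english": "the quick brown fox jumps over the lazy dog and the cat sat on the mat",
        "german": "der schnelle braune fuchs springt über den faulen hund und die katze sitzt auf der matte",
    }
    profiles = {lang: ngram_profile(txt) for lang, txt in samples.items()}
    best, scores = classify("the dog and the cat", profiles)
    print(best, scores)
```

Ranking n-grams rather than comparing raw frequencies keeps the comparison insensitive to text length, which is why a sample of only a few sentences can be enough to identify a language.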
Testing is quite easy: take a sample text of a few sentences in some language and save it as plain text. Then invoke
textcat $yourtext and it should print the name of the language the text is written in to stdout. If it does not know the language, it prints a message saying so. If several languages are possible, it prints the list of candidate languages joined by 'or'.
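The "or" output can be illustrated with a selection rule that keeps every language whose distance is within a fixed factor of the best score, and gives up when too many languages qualify. This is a hypothetical sketch, not TextCat's documented behavior: the function names, the factor 1.05, the cap of 10 answers and the "unknown language" string are all illustrative assumptions.

```python
def candidate_languages(scores, ratio=1.05, max_answers=10):
    """Languages whose distance is within `ratio` of the best, best first.

    Assumed rule for illustration only. Returns None when more than
    max_answers languages qualify (treated as an unknown language).
    """
    best = min(scores.values())
    close = sorted((score, lang) for lang, score in scores.items() if score <= best * ratio)
    if len(close) > max_answers:
        return None
    return [lang for _score, lang in close]


def format_answer(scores, ratio=1.05, max_answers=10):
    """Join the candidate languages with ' or ', as in the output format described above."""
    langs = candidate_languages(scores, ratio, max_answers)
    if langs is None:
        return "unknown language"  # placeholder text, not TextCat's actual message
    return " or ".join(langs)
```

For example, with distances {"english": 1000, "scots": 1040, "german": 5000} the helper returns "english or scots", because 1040 lies within 5% of the best score while 5000 does not.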