initial rpm release (#1075662)
TextCat is an implementation of the text categorization algorithm presented in Cavnar, W. B. and J. M. Trenkle, "N-Gram-Based Text Categorization". TextCat uses this technique to implement written language identification. At the moment, it knows about 69 natural languages (counting Esperanto as a natural language).
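The core of the Cavnar and Trenkle technique works like this. You build a ranked profile of the most frequent character n-grams for each language and for the input. You then pick the language whose profile has the smallest "out-of-place" distance, which is the sum of rank differences between the two profiles.

The following is a minimal Python sketch under stated assumptions, not TextCat's own code:

- The function names are hypothetical.
- N-grams of length 1 to 5 are assumed.
- "_" is assumed as the word padding character.
- A 300-entry profile limit is assumed.
- A missing n-gram is assumed to cost the profile length as its penalty.

```python
import re
from collections import Counter


def ngram_profile(text, max_n=5, top_k=300):
    """Return the top_k most frequent character n-grams (n = 1..max_n), most frequent first."""
    counts = Counter()
    # Keep only runs of letters (digits and punctuation are dropped) and pad each word with "_".
    for word in re.findall(r"[^\W\d_]+", text.lower()):
        padded = f"_{word}_"
        for n in range(1, max_n + 1):
            for i in range(len(padded) - n + 1):
                counts[padded[i:i + n]] += 1
    # Rank by descending frequency; break ties alphabetically so the result is deterministic.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [gram for gram, _ in ranked[:top_k]]


def out_of_place(doc_profile, lang_profile):
    """Sum of rank differences between two profiles (lower means more similar)."""
    lang_rank = {gram: rank for rank, gram in enumerate(lang_profile)}
    max_penalty = len(lang_profile)  # assumed penalty for an n-gram the language profile lacks
    return sum(
        abs(rank - lang_rank[gram]) if gram in lang_rank else max_penalty
        for rank, gram in enumerate(doc_profile)
    )


def classify(text, lang_profiles):
    """Return the language whose stored profile is closest to the text's profile."""
    doc_profile = ngram_profile(text)
    return min(lang_profiles, key=lambda lang: out_of_place(doc_profile, lang_profiles[lang]))
```

In practice, each language profile would be built once from a sizeable training sample, for example profiles = {"english": ngram_profile(english_sample), ...}. You would then call classify(text, profiles) for each input. Ranking n-grams instead of comparing raw counts is what lets a short input be compared against profiles trained on much longer text.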
Testing is quite easy: take a sample text of a few sentences in some language and save it as plain text. Invoke textcat $yourtext and it should print the name of the language the text is written in to stdout. If it doesn't know the language, you will get a message about that, too. If several languages are possible, it will give you the list of possible languages concatenated by 'or'.
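One plausible way to produce the 'or'-joined answer is to report every language whose distance is close to the best one, rather than only the single closest language. The classifier would give up when too many languages qualify.

A small Python sketch of that decision step follows. It assumes a per-language distance map where lower is better. The following details are illustrative assumptions, not TextCat's documented behavior:

- the 1.05 ratio
- the limit of 10 candidates
- the function name pick_candidates
- the "unknown" message

```python
def pick_candidates(scores, ratio=1.05, max_candidates=10):
    """Turn per-language distances (lower is better) into a textcat-style answer string."""
    if not scores:
        return "unknown"  # hypothetical message; the real tool's wording may differ
    best = min(scores.values())
    # Keep every language whose distance is within `ratio` of the best one, best first.
    close = sorted((score, lang) for lang, score in scores.items() if score <= best * ratio)
    if len(close) > max_candidates:
        return "unknown"  # too many near-ties: treat the language as not recognized
    return " or ".join(lang for _, lang in close)
```

For example, distances of 100 for english, 104 for german and 200 for french would yield "english or german".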
This update has been submitted for testing by besser82.
does what it promises :)
Works fine
This update is currently being pushed to the Fedora EPEL 5 testing updates repository.
This update has been pushed to testing
Works
This update has reached 14 days in testing and can be pushed to stable now if the maintainer wishes
This update has been submitted for stable by besser82.
This update is currently being pushed to the Fedora EPEL 5 stable updates repository.
This update has been pushed to stable