Tokenizers
prose (3.1K stars)

Library for text processing that supports tokenization, part-of-speech tagging, named-entity extraction, and more. English only.
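
A minimal sketch of tokenizing, tagging, and entity extraction with prose, assuming the v2 API (NewDocument, Tokens, Entities); check the project README for the current interface:

```go
package main

import (
	"fmt"
	"log"

	prose "github.com/jdkato/prose/v2" // assumed v2 import path
)

func main() {
	// Building a document runs tokenization, POS tagging, and NER.
	doc, err := prose.NewDocument("Go is an open-source language created at Google.")
	if err != nil {
		log.Fatal(err)
	}
	// Print each token with its part-of-speech tag.
	for _, tok := range doc.Tokens() {
		fmt.Println(tok.Text, tok.Tag)
	}
	// Print recognized named entities, e.g. "Google" as an organization.
	for _, ent := range doc.Entities() {
		fmt.Println(ent.Text, ent.Label)
	}
}
```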

gse (2.6K stars)

Efficient text segmentation for Go; supports English, Chinese, Japanese, and other languages.
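
A hedged sketch of basic segmentation with gse, assuming a Segmenter type with LoadDict and Cut methods (the exact signatures may differ by version):

```go
package main

import (
	"fmt"

	"github.com/go-ego/gse"
)

func main() {
	var seg gse.Segmenter
	// Load the default embedded dictionary (error handling omitted for brevity).
	seg.LoadDict()

	// Cut mixed Chinese/English text into words; the second argument
	// enables HMM-based segmentation for out-of-vocabulary words.
	words := seg.Cut("你好世界, hello world.", true)
	fmt.Println(words)
}
```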

gojieba (2.5K stars)

A Go implementation of Jieba, a Chinese word segmentation algorithm.
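
A minimal sketch assuming gojieba's NewJieba/Cut interface (it wraps the C++ cppjieba library, so cgo is required; verify against the repository README):

```go
package main

import (
	"fmt"
	"strings"

	"github.com/yanyiwu/gojieba"
)

func main() {
	// NewJieba loads the bundled dictionaries; Free releases the underlying C++ resources.
	x := gojieba.NewJieba()
	defer x.Free()

	// Cut with HMM enabled handles words not present in the dictionary.
	words := x.Cut("我来到北京清华大学", true)
	fmt.Println(strings.Join(words, "/"))
}
```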

sentences (442 stars)

Sentence tokenizer: converts text into a list of sentences.
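
A hedged sketch using the pre-trained English model that ships with sentences, assuming the english subpackage and its NewSentenceTokenizer helper:

```go
package main

import (
	"fmt"
	"log"

	"github.com/neurosnap/sentences/english"
)

func main() {
	// Load the tokenizer with the bundled English training data.
	tokenizer, err := english.NewSentenceTokenizer(nil)
	if err != nil {
		log.Fatal(err)
	}

	text := "Hi there. Does this split correctly? It should."
	for _, s := range tokenizer.Tokenize(text) {
		fmt.Println(s.Text)
	}
}
```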

segment (89 stars)

Go library for performing Unicode text segmentation as described in Unicode Standard Annex #29.
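
A hedged sketch of iterating over UAX #29 word boundaries with segment, assuming its bufio.Scanner-style NewWordSegmenter interface:

```go
package main

import (
	"fmt"
	"log"
	"strings"

	"github.com/blevesearch/segment"
)

func main() {
	// Walk the input segment by segment, scanner-style.
	segmenter := segment.NewWordSegmenter(strings.NewReader("Hello, 世界!"))
	for segmenter.Segment() {
		// Text returns the current segment; Type classifies it
		// (letter, number, ideograph, or other).
		fmt.Printf("%q type=%d\n", segmenter.Text(), segmenter.Type())
	}
	if err := segmenter.Err(); err != nil {
		log.Fatal(err)
	}
}
```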

textcat (73 stars)

Go package for n-gram-based text categorization, with support for UTF-8 and raw text.

MMSEGO (62 stars)

A Go implementation of MMSEG, a Chinese word segmentation algorithm.

stemmer (53 stars)

Stemmer packages for the Go programming language; includes English and German stemmers.
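
A short sketch of stemming English words, assuming the package exposes a porter2 subpackage with a package-level Stemmer value and a Stem method:

```go
package main

import (
	"fmt"

	"github.com/dchest/stemmer/porter2"
)

func main() {
	// porter2.Stemmer is assumed to be the package-level English (Porter 2) stemmer.
	eng := porter2.Stemmer
	for _, w := range []string{"cooking", "tokenization", "libraries"} {
		fmt.Println(w, "->", eng.Stem(w))
	}
}
```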

gotokenizer (21 stars)

A tokenizer for Go based on dictionary and bigram language models. (Currently supports Chinese segmentation only.)

shamoji (13 stars)

shamoji is a word-filtering package written in Go.