The dataset was compiled in 2016 and is based on the Corpus of the Contemporary Lithuanian Language (the tekstynas.vdu.lt version, 139 million tokens). It contains wordforms (types) with their token counts from this monolingual, general-language corpus of Lithuanian.
See also: http://hdl.handle.net/20.500.11821/