481 research outputs found
Beyond Stemming and Lemmatization: Ultra-stemming to Improve Automatic Text Summarization
In Automatic Text Summarization, preprocessing is an important phase to
reduce the space of textual representation. Classically, stemming and
lemmatization have been widely used for normalizing words. However, even using
normalization on large texts, the curse of dimensionality can disturb the
performance of summarizers. This paper describes a new method for normalization
of words to further reduce the space of representation. We propose to reduce
each word to its initial letters, as a form of Ultra-stemming. The results show
that Ultra-stemming not only preserve the content of summaries produced by this
representation, but often the performances of the systems can be dramatically
improved. Summaries on trilingual corpora were evaluated automatically with
Fresa. Results confirm an increase in the performance, regardless of summarizer
system used.Comment: 22 pages, 12 figures, 9 table
How effective is stemming and decompounding for German text retrieval?
Erworben im Rahmen der Schweizer Nationallizenzen (http://www.nationallizenzen.ch
The head-modifier principle and multilingual term extraction
Advances in Language Engineering may be dependent on theoretical principles originating from linguistics since both share a common object of enquiry, natural language structures. We outline an approach to term extraction that rests on theoretical claims about the structure of words. We use the structural properties of compound words to specifically elicit the sets of terms defined by type hierarchies such as hyponymy and meronymy. The theoretical claims revolve around the head-modifier principle which determines the formation of a major class of compounds. Significantly it has been suggested that the principle operates in languages other than English. To demonstrate the extendibility of our approach beyond English, we present a case study of term extraction in Chinese, a language whose written form is the vehicle of communication for over 1.3 billion language users, and therefore has great significance for the development of language engineering technologies
Proceedings of the Sixth International Conference Formal Approaches to South Slavic and Balkan languages
Proceedings of the Sixth International Conference Formal Approaches to South Slavic and Balkan Languages publishes 22 papers that were presented at the conference organised in Dubrovnik, Croatia, 25-28 Septembre 2008
Theoretical and empirical arguments for the reassessment of the notion of paradigm
The volume discusses the breadth of applications for an extended notion of paradigm. Paradigms in this sense are not only tools of morphological description but constitute the inherent structure of grammar. Grammatical paradigms are structural sets forming holistic, semiotic structures with an informational value of their own. We argue that as such, paradigms are a part of speaker knowledge and provide necessary structuring for grammaticalization processes. The papers discuss theoretical as well as conceptual questions and explore different domains of grammatical phenomena, ranging from grammaticalization, morphology, and cognitive semantics to modality, aiming to illustrate what the concept of grammatical paradigms can and cannot (yet) explain
Valency over Time
The papers collected in this book are devoted to verbal valency, and share a diachronic perspective, by either discussing changes in the behavior of verbs or discussing verbal valency at different historical stages of specific languages. They provide new data for research on valency patterns and on changes in valency orientation, verbal voice, and related constructions
- …