481 research outputs found

    Beyond Stemming and Lemmatization: Ultra-stemming to Improve Automatic Text Summarization

    In Automatic Text Summarization, preprocessing is an important phase that reduces the space of textual representation. Classically, stemming and lemmatization have been widely used to normalize words. However, even when large texts are normalized, the curse of dimensionality can degrade the performance of summarizers. This paper describes a new word-normalization method that further reduces the space of representation. We propose to reduce each word to its initial letters, as a form of Ultra-stemming. The results show that Ultra-stemming not only preserves the content of the summaries produced with this representation, but can often dramatically improve system performance. Summaries on trilingual corpora were evaluated automatically with Fresa. The results confirm an increase in performance, regardless of the summarizer used. Comment: 22 pages, 12 figures, 9 tables
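
The truncation idea in the abstract above can be illustrated with a minimal sketch. The abstract only says that each word is reduced to its initial letters; the function names, the tokenization regex, the lowercasing step, and the prefix length `n` below are assumptions made for illustration, not the authors' implementation.

```python
import re


def ultra_stem(word: str, n: int = 1) -> str:
    """Reduce a word to its first n letters (lowercased).

    n is a hypothetical parameter; the abstract does not say how many
    initial letters are kept.
    """
    return word.lower()[:n]


def ultra_stem_text(text: str, n: int = 1) -> list[str]:
    """Tokenize on runs of letters and ultra-stem every token."""
    # [^\W\d_]+ matches runs of Unicode letters (no digits, no underscore).
    tokens = re.findall(r"[^\W\d_]+", text)
    return [ultra_stem(token, n) for token in tokens]


sample = "Summarizers summarize summaries"
print(ultra_stem_text(sample, n=4))  # ['summ', 'summ', 'summ']
```

With a short prefix, many distinct surface forms collapse into the same symbol, which is how this kind of normalization shrinks the vocabulary further than stemming or lemmatization would.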

    How effective is stemming and decompounding for German text retrieval?

    Acquired under the Swiss National Licences (http://www.nationallizenzen.ch)

    The head-modifier principle and multilingual term extraction

    Advances in Language Engineering may depend on theoretical principles originating from linguistics, since both share a common object of enquiry: natural language structures. We outline an approach to term extraction that rests on theoretical claims about the structure of words. We use the structural properties of compound words to elicit the sets of terms defined by type hierarchies such as hyponymy and meronymy. The theoretical claims revolve around the head-modifier principle, which determines the formation of a major class of compounds. Significantly, it has been suggested that the principle operates in languages other than English. To demonstrate the extendibility of our approach beyond English, we present a case study of term extraction in Chinese, a language whose written form is the vehicle of communication for over 1.3 billion language users and which therefore has great significance for the development of language engineering technologies.

    A grammar of Tawala : an Austronesian language of the Milne Bay area, Papua New Guinea


    Proceedings of the Sixth International Conference Formal Approaches to South Slavic and Balkan languages

    Proceedings of the Sixth International Conference Formal Approaches to South Slavic and Balkan Languages publishes 22 papers that were presented at the conference organised in Dubrovnik, Croatia, 25-28 September 2008.

    Theoretical and empirical arguments for the reassessment of the notion of paradigm

    The volume discusses the breadth of applications for an extended notion of paradigm. Paradigms in this sense are not only tools of morphological description but constitute the inherent structure of grammar. Grammatical paradigms are structural sets forming holistic, semiotic structures with an informational value of their own. We argue that, as such, paradigms are part of speaker knowledge and provide the necessary structuring for grammaticalization processes. The papers discuss theoretical as well as conceptual questions and explore different domains of grammatical phenomena, ranging from grammaticalization, morphology, and cognitive semantics to modality, aiming to illustrate what the concept of grammatical paradigms can and cannot (yet) explain.

    Valency over Time

    The papers collected in this book are devoted to verbal valency and share a diachronic perspective, discussing either changes in the behavior of verbs or verbal valency at different historical stages of specific languages. They provide new data for research on valency patterns and on changes in valency orientation, verbal voice, and related constructions.