8 research outputs found

Multilingual Sentence Categorization according to Language

Author: Giguet Emmanuel
Publication date: 01/01/1995

In this paper, we describe an approach to sentence categorization whose originality lies in relying on natural properties of languages, with no dependency on a training set. The implementation is fast, small, robust and tolerant of textual errors. Tested on discriminating French, English, Spanish and German, the system gives very interesting results, achieving in one test 99.4% correct assignments on real sentences. The resolution power is based on grammatical words (not the most common words) and the alphabet. Having the grammatical words and the alphabet of each language at its disposal, the system computes for each language its likelihood of being selected. The name of the language with the optimum likelihood tags the sentence, but unresolved ambiguities are maintained. We discuss the reasons which led us to use these linguistic facts and present several directions for improving the system's classification performance. Categorizing sentences with linguistic properties shows that difficult problems sometimes have simple solutions. Comment: 4 pages --- LaTe

arXiv.org e-Print Archive

HAL - Normandie Université

CiteSeerX

HAL Descartes

Hal-Diderot
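The scoring idea in the abstract above (grammatical words plus alphabet, with ties kept as ambiguities) can be illustrated with a minimal Python sketch. This is not the author's implementation: the word lists, the accented-letter sets, the additive score and the names `GRAMMATICAL_WORDS`, `EXTRA_LETTERS`, `score_sentence` and `categorize` are all assumptions made for illustration.

```python
import re

# Tiny hand-picked lists of grammatical (function) words.
# Assumption: illustrative only, not the resources used in the paper.
GRAMMATICAL_WORDS = {
    "french": {"le", "la", "les", "des", "et", "est", "dans", "que", "un", "une"},
    "english": {"the", "of", "and", "is", "in", "that", "a", "to", "with"},
    "spanish": {"el", "la", "los", "las", "y", "es", "en", "que", "un", "una"},
    "german": {"der", "die", "das", "und", "ist", "in", "nicht", "ein", "eine"},
}

# Letters beyond a-z that each language's alphabet uses (assumed, simplified).
EXTRA_LETTERS = {
    "french": set("àâçéèêëîïôùûü"),
    "english": set(),
    "spanish": set("áéíñóúü"),
    "german": set("äöüß"),
}


def score_sentence(sentence):
    """Return a score per language: grammatical-word hits plus
    occurrences of letters from that language's extended alphabet."""
    tokens = re.findall(r"[^\W\d_]+", sentence.lower())
    scores = {}
    for lang, words in GRAMMATICAL_WORDS.items():
        word_hits = sum(1 for tok in tokens if tok in words)
        letter_hits = sum(1 for tok in tokens for ch in tok
                          if ch in EXTRA_LETTERS[lang])
        scores[lang] = word_hits + letter_hits
    return scores


def categorize(sentence):
    """Return every language that reaches the best score.
    More than one name means the ambiguity is left unresolved;
    an empty list means no clue was found."""
    scores = score_sentence(sentence)
    best = max(scores.values())
    if best == 0:
        return []
    return sorted(lang for lang, s in scores.items() if s == best)


print(categorize("The cat is in the garden"))
print(categorize("Der Hund ist nicht groß"))
print(categorize("la casa"))
```

The additive score is a deliberate simplification of the "likelihood" mentioned in the abstract. Returning a list rather than a single label is one way to keep unresolved ambiguities, as in the last call, where French and Spanish tie.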

Automatic Identification of Close Languages – Case Study: Malay and Indonesian.

Author: Bali Ranaivo-Malançon
Publication date: 01/11/2006

Identifying the language of an unknown text is not a new problem, but what is new is the task of identifying close languages. Malay and Indonesian, like many other languages, are very similar, and it is therefore genuinely difficult to search, retrieve, classify, and above all translate texts written in one of the two languages

Repository@USM

Language Set Identification in Noisy Synthetic Multilingual Documents

Author: Jauhiainen Heidi Annika
Jauhiainen Tommi Sakari
Linden Krister
Publication venue: Springer International Publishing AG
Publication date: 01/01/2015

Proceeding volume: Part I. In this paper, we reconsider the problem of language identification of multilingual documents. Automated language identification algorithms have been improving steadily from the seventies until recent years. The current state-of-the-art language identifiers are quite efficient even with only a few characters, which gives us enough reason to re-evaluate the possibility of using existing language identifiers for monolingual text to detect the language set of a multilingual document. We use a previously developed language identifier for monolingual documents with the multilingual documents from the WikipediaMulti dataset published in a recent study. Our method outperforms previous methods tested with the same data, achieving an F1-score of 97.6 when classifying between 44 languages. Peer reviewe

Crossref

Helsingin yliopiston digitaalinen arkisto
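To make the F1-score reported in the abstract above more concrete, here is a minimal Python sketch of how an F1-score can be computed when each document has a set of languages rather than a single label. The micro-averaging choice, the function name `language_set_f1` and the toy data are assumptions; the abstract does not say which averaging the authors used.

```python
def language_set_f1(gold_sets, predicted_sets):
    """Micro-averaged F1 over per-document language sets.

    Assumption: micro-averaging over (document, language) pairs;
    the abstract does not specify the averaging scheme.
    """
    true_pos = false_pos = false_neg = 0
    for gold, predicted in zip(gold_sets, predicted_sets):
        true_pos += len(gold & predicted)    # languages correctly detected
        false_pos += len(predicted - gold)   # languages wrongly reported
        false_neg += len(gold - predicted)   # languages missed
    if true_pos == 0:
        return 0.0
    precision = true_pos / (true_pos + false_pos)
    recall = true_pos / (true_pos + false_neg)
    return 2 * precision * recall / (precision + recall)


# Toy data (hypothetical): two documents and their language sets.
gold = [{"en", "fi"}, {"de"}]
predicted = [{"en"}, {"de", "fr"}]
print(language_set_f1(gold, predicted))
```

In this toy case one language is missed and one is wrongly reported, so precision and recall are both 2/3 and the F1-score is 2/3 as well.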

Writer identity, self-other relations and writing strategies in the narratives of Nigerian second language learners

Author: Bello Mohammed
Publication date: 01/01/2019

Language use is important in understanding the identity of the author in a written discourse. Given that the author produces the writing for the other, an in-depth understanding of how students display identities to relate to others in writing is needed. Previous studies mainly examined how identity is displayed in spoken discourse. However, little is known about the construction of the writer's identity and that of the other in second language writing among Nigerian students. Thus, this study explores how public secondary school students in Nigeria narrate their junior year experiences. It also examines the strategies used in constructing ideas and connections with the other in writing. The study seeks to understand how identity and self-other relations are embedded in the students' written language as they communicate their ideas. Data were collected using purposive sampling, comprising forty-five students' written narratives, transcriptions of in-depth interviews and notes from classroom observations, and were interpreted through a discourse analytical approach. The findings reveal varied ways the self relates to others in writing, that is, through loan words, nouns, pronouns and words that convey appreciation, salutation, and care in their narratives. The study creates awareness not only of the self as the writer but also of the relations with the other and the strategies in writing. It contributes to the understanding of writer identity, the strategies used and the relevance of others in writing. It also reinforces the need for educators to give attention to second language writing ability and to see writing as more than a product. Rather, writing speaks volumes about the author's voice, which relates to identity, culture and social background.
Future research should explore how identity and self-other construction are reflected in other written genres, specifically, in academic writing, given the cultural identity of the author, style, positioning and knowledge about writing in English

Universiti Utara Malaysia: UUM eTheses

Argumentative zoning: information extraction from scientific text

Author: Teufel Simone
Publication venue: The University of Edinburgh
Publication date: 01/01/1999

Let me tell you, writing a thesis is not always a barrel of laughs, and strange things can happen, too. For example, at the height of my thesis paranoia, I had a recurrent dream in which my cat Amy gave me detailed advice on how to restructure the thesis chapters, which was awfully nice of her. But I also had a lot of human help throughout this time, whether things were going fine or berserk. Most of all, I want to thank Marc Moens: I could not have had a better or more knowledgeable supervisor. He always took time for me, however busy he might have been, reading chapters thoroughly in two days. He had both the calmness of mind to give me lots of freedom in research and the right judgement to guide me away, tactfully but determinedly, from the occasional catastrophe or other waiting along the way. He was great fun to work with and also became a good friend. My work has profited from the interdisciplinary, interactive and enlightened atmosphere at the Human Communication Centre and the Centre for Cognitive Science (which is now called something else). The Language Technology Group was a great place to work in, as my research was grounded in practical applications develope

CiteSeerX

Edinburgh Research Archive

Multilingual Sentence Categorization according to Language

Author: Giguet Emmanuel
Publication venue: HAL CCSD
Publication date: 27/03/1995

International audience. Issues in sentence categorization according to language are fundamental for NLP, especially in document processing. In fact, with the growing amount of multilingual text corpus data becoming available, sentence categorization, leading to multilingual text structure, opens a wide range of applications in multilingual text analysis such as information retrieval or preprocessing of multilingual syntactic parser

Hal-Diderot

Multilingual Sentence Categorization according to Language

Author: Emmanuel Giguet

Issues in sentence categorization according to language are fundamental for NLP, especially in document processing. In fact, with the growing amount of multilingual text corpus data becoming available, sentence categorization, leading to multilingual text structure, opens a wide range of applications in multilingual text analysis such as information retrieval or preprocessing of multilingual syntactic parser. The major difficulties in sentence categorization are convergence and textual errors. Convergence, because dealing with short entries involves discarding languages on the basis of few clues. Textual errors, because documents coming from different electronic sources may contain spelling and grammatical errors as well as character recognition errors generated by OCR. We describe here an approach to sentence categorization whose originality lies in relying on natural properties of languages, with no dependency on a training set. The implementation is fast, small, robust and tolerant of textual errors. Tested f..

CiteSeerX