
    A Factoid Question Answering System for Vietnamese

    In this paper, we describe the development of an end-to-end factoid question answering system for the Vietnamese language. This system combines both statistical models and ontology-based methods in a chain of processing modules to provide high-quality mappings from natural language text to entities. We present the challenges in the development of such an intelligent user interface for an isolating language like Vietnamese and show that techniques developed for inflectional languages cannot be applied "as is". Our question answering system can answer a wide range of general knowledge questions with promising accuracy on a test set. Comment: In the proceedings of the HQA'18 workshop, The Web Conference Companion, Lyon, France.

    Modal Markers in Japanese: A Study of Learners’ Use before and after Study Abroad

    Japanese discourse requires speakers to index, in a relatively explicit manner, their stance toward the propositional information as well as the hearer. This is done, among other things, by means of a grammaticalized set of modal markers. Although previous research suggests that the use of modal expressions by second language learners differs from that of native users, little is known about “typical” native or non-native behavior. This study aims (a) to delineate native and non-native usage by a quantitative examination of a broad range of Japanese modal categories, and qualitative analyses of a subset of potentially problematic categories among them, and (b) to identify possible developmental trajectories, by means of a longitudinal observation of learners’ verbal production before and after study abroad in Japan. We find that modal categories realized by non-transparent or non-salient markers (e.g., explanatory modality no da, or utterance modality sentence-final particles) pose particular challenges in spite of their relatively high availability in the input, and we discuss this finding in terms of processing constraints that arguably affect learners’ acquisition of the grammaticalized modal markers.

    The Role of Semantic, Pragmatic, and Discourse Factors in the Development of Case


    Exploring the interplay of mode of discourse and proficiency level in ESL writing performance

    Recent theory in discourse and practice in rhetoric has suggested that writers require different skills and strategies when writing for different purposes, and when using different genres and modes (Kinneavy, 1972; Carrell and Connor, 1991). The importance of taking into account these various aspectual skills and forms of writing is recognised in teaching (e.g. Scarcella and Oxford, 1992), and in the assessment of writing (e.g. Odell and Cooper, 1980). For instance, Odell and Cooper argued that no claims about writing ability can be made until students’ performance on a variety of writing tasks has been examined. Thus, the issue of what writing task(s) are to be included in a test is crucial, since a task will be regarded as useless if it does not provide the basis for making generalisations regarding an individual’s writing ability. This paper presents the findings of a study on the effects of mode of discourse on L2 writing performance, as well as the interplay between a learner variable (proficiency level) and a task variable (mode of discourse), amongst Malaysian upper secondary ESL learners. The findings provide some evidence for the need to re-examine issues of reliability and validity in the test practice of manipulating variables in the design of assessment tasks to evaluate ESL writing performance. Given the status and complexity of the writing skill, it stands to reason that studies into this area will continue to shed light on how best the construct can be understood, taught and tested to give language learners a fair chance to exhibit their true ability and have it reliably reported.

    Just Bring It: a case study on code-switching in Japanese contemporary hard rock lyrics

    Abstract. This thesis studies language contact phenomena, and particularly written code-switching and code-mixing in contemporary Japanese hard rock music lyrics. The data of this mixed method case study was drawn from the album Just Bring It (2017) by the Japanese all-female rock band Band-Maid. The album conveys the English pervasion in Japanese popular music quite extensively, as it has substantial language contact, showcases a plethora of artistic liberties, and contains lyrics written by both the band members and third-party lyricists. The lyrical data was collected into a corpus and first examined for how code-switching differs between the lyricists, how the lyrics follow English rock lexicon, and how orthographical experiments occur in the song titles. In the second part of the analysis, the corpus was analyzed to find whether the data had intrasentential (code-mixing) and intersentential (code-switching) occurrences, grammar issues and code-switching effects, such as repetition, doubling or code ambiguity. The study found that code-switching was very prevalent; longer code-switching via clauses and sentences was more common than short one-word or phrase mixing. Moreover, repetition was very prevalent, whereas other code-switching effects occurred rather rarely. Band members used longer code-switches, whereas featured lyricists used shorter, but more creative solutions. In general, in the context of this study, the English used signals cosmopolitan prestige and ambitions of international success to Japanese listeners, but also serves and includes the English-speaking listeners. After compiling all the findings, one could argue that Roman letters and English have established significance in Japanese contemporary hard rock music.
    Just Bring It: a case study on code-switching in present-day Japanese hard rock lyrics. Finnish abstract (translated). This bachelor's thesis studies language contact in present-day Japanese hard rock lyrics. Combining qualitative and quantitative research methods, the thesis focuses primarily on written code-switching and code-mixing between Japanese and English, and its data is the album Just Bring It (2017) by the band Band-Maid. The album reflects well the absorption of English into Japanese popular culture: although it is mainly in Japanese, it contains code-switching in almost all songs, guest lyricists, and artistic liberties in the use of English. To support the analysis, the album's lyrics were collected into a single quantitative corpus. The first part of the analysis examines differences in code-switching between the lyricists, looks for similarities between general English rock vocabulary and the collected lyrical data, and analyzes the grammatical deviations used in the song titles. The second part examines the ratio of intersentential to intrasentential code-switches, grammatical problems, and code-switching effects such as repetition structures and ambiguity. The analysis showed that code-switching between Japanese and English was very common. Long, intersentential code-switches were found more often than short, intrasentential ones. In addition, repetition structures were quite common, unlike ambiguous code-switches, which occurred only twice in the data. Lyrics written by the band members contained more long code-switches, whereas the guest lyricists' lyrics contained shorter and more creative code-mixes. Overall, the English used in the lyrics reflects the pursuit of internationality on the one hand and inclusivity on the other. Based on the results, it can tentatively be argued that English plays a major part in Japanese contemporary rock that aims at an international audience.

    Learning Sentence-internal Temporal Relations

    In this paper we propose a data intensive approach for inferring sentence-internal temporal relations. Temporal inference is relevant for practical NLP applications which either extract or synthesize temporal information (e.g., summarisation, question answering). Our method bypasses the need for manual coding by exploiting the presence of markers like "after", which overtly signal a temporal relation. We first show that models trained on main and subordinate clauses connected with a temporal marker achieve good performance on a pseudo-disambiguation task simulating temporal inference (during testing the temporal marker is treated as unseen and the models must select the right marker from a set of possible candidates). Secondly, we assess whether the proposed approach holds promise for the semi-automatic creation of temporal annotations. Specifically, we use a model trained on noisy and approximate data (i.e., main and subordinate clauses) to predict intra-sentential relations present in TimeBank, a corpus annotated with rich temporal information. Our experiments compare and contrast several probabilistic models differing in their feature space, linguistic assumptions and data requirements. We evaluate performance against gold standard corpora and also against human subjects.
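
The pseudo-disambiguation task described in this abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the toy training triples are invented, the bag-of-words features and the add-one-smoothed naive Bayes scorer are a stand-in for the probabilistic models the paper actually compares, and the names `features`, `train` and `predict` are hypothetical.

```python
import math
from collections import Counter, defaultdict

# Invented toy data: (main clause, temporal marker, subordinate clause).
TRAIN = [
    ("she left the office", "after", "the meeting ended"),
    ("he called home", "after", "the train arrived"),
    ("they checked the map", "before", "the trip started"),
    ("she locked the door", "before", "the guests arrived"),
    ("he read a book", "while", "the kettle was boiling"),
    ("she sang", "while", "the band was playing"),
]

def features(main, sub):
    """Bag-of-words features, tagged by the clause they come from."""
    return [f"main={w}" for w in main.split()] + [f"sub={w}" for w in sub.split()]

def train(examples):
    """Count markers and per-marker feature occurrences."""
    marker_counts = Counter()
    feat_counts = defaultdict(Counter)
    vocab = set()
    for main, marker, sub in examples:
        marker_counts[marker] += 1
        for f in features(main, sub):
            feat_counts[marker][f] += 1
            vocab.add(f)
    return marker_counts, feat_counts, vocab

def predict(model, main, sub):
    """Treat the marker as unseen and pick the best-scoring candidate."""
    marker_counts, feat_counts, vocab = model
    total = sum(marker_counts.values())

    def score(marker):
        # Log prior plus add-one-smoothed log likelihood of each feature.
        s = math.log(marker_counts[marker] / total)
        denom = sum(feat_counts[marker].values()) + len(vocab)
        for f in features(main, sub):
            s += math.log((feat_counts[marker][f] + 1) / denom)
        return s

    return max(marker_counts, key=score)

model = train(TRAIN)
print(predict(model, "he slept", "the band was playing"))
```

At test time the marker is withheld, so the model sees only the two clauses and must choose among the markers observed in training, which mirrors the setup the abstract describes.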

    Parameter Learning of Logic Programs for Symbolic-Statistical Modeling

    We propose a logical/mathematical framework for statistical parameter learning of parameterized logic programs, i.e. definite clause programs containing probabilistic facts with a parameterized distribution. It extends the traditional least Herbrand model semantics in logic programming to distribution semantics, possible world semantics with a probability distribution which is unconditionally applicable to arbitrary logic programs including ones for HMMs, PCFGs and Bayesian networks. We also propose a new EM algorithm, the graphical EM algorithm, that runs for a class of parameterized logic programs representing sequential decision processes where each decision is exclusive and independent. It runs on a new data structure called support graphs describing the logical relationship between observations and their explanations, and learns parameters by computing inside and outside probability generalized for logic programs. The complexity analysis shows that when combined with OLDT search for all explanations for observations, the graphical EM algorithm, despite its generality, has the same time complexity as existing EM algorithms, i.e. the Baum-Welch algorithm for HMMs, the Inside-Outside algorithm for PCFGs, and the one for singly connected Bayesian networks that have been developed independently in each research field. Learning experiments with PCFGs using two corpora of moderate size indicate that the graphical EM algorithm can significantly outperform the Inside-Outside algorithm
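
For readers unfamiliar with the inside probabilities mentioned in this abstract, the following is a minimal sketch of the textbook inside computation for a PCFG in Chomsky normal form. It is not the graphical EM algorithm or the support-graph data structure the paper proposes; the toy grammar, its probabilities, and the name `inside_probability` are assumptions made for illustration.

```python
from collections import defaultdict

# Invented toy PCFG in Chomsky normal form.
# Binary rules: (parent, left child, right child) -> probability.
BINARY = {("S", "NP", "VP"): 1.0, ("VP", "V", "NP"): 1.0}
# Lexical rules: (parent, word) -> probability.
LEXICAL = {("NP", "she"): 0.5, ("NP", "fish"): 0.5, ("V", "eats"): 1.0}

def inside_probability(words, start="S"):
    """Return the probability that `start` derives the whole word sequence."""
    n = len(words)
    # beta[(i, j)][A] = probability that nonterminal A derives words[i:j].
    beta = defaultdict(lambda: defaultdict(float))
    # Spans of length 1 come from lexical rules.
    for i, w in enumerate(words):
        for (a, word), p in LEXICAL.items():
            if word == w:
                beta[(i, i + 1)][a] += p
    # Longer spans sum over every split point and every binary rule.
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):
                for (a, b, c), p in BINARY.items():
                    beta[(i, j)][a] += p * beta[(i, k)][b] * beta[(k, j)][c]
    return beta[(0, n)][start]

print(inside_probability(["she", "eats", "fish"]))
```

The Inside-Outside algorithm named in the abstract combines these inside values with the corresponding outside values to obtain expected rule counts for each EM update.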

    Enumeration as a means of financial markets data representation

    Cognitive linguistics has developed into one of the most attractive and influential frameworks within linguistics at large. It draws linguists’ attention to the study of aspects of language within the cognitive paradigm as reflections of conceptual organization and categorization principles, as well as of the ways of presenting knowledge about the world. This paper focuses on the cognitive analysis of enumeration as an instrument for organizing and conveying domain-specific information. The aim of the analysis is to establish the role of enumeration as a way of arriving at true knowledge of the world. The research tasks are to describe enumeration as a means of knowledge representation and to study the peculiarities of enumeration as a means of representing financial markets information. The novelty of the analysis lies in the study of enumeration as a means of representing financial markets information. The theoretical value of the analysis lies in the study of enumeration as a way of organizing knowledge that reflects the collective experience of a community within a specific environment. The results of the study of enumeration in economic discourse show that it serves to structure and represent domain-specific knowledge.