A Factoid Question Answering System for Vietnamese
In this paper, we describe the development of an end-to-end factoid question
answering system for the Vietnamese language. This system combines both
statistical models and ontology-based methods in a chain of processing modules
to provide high-quality mappings from natural language text to entities. We
present the challenges in the development of such an intelligent user interface
for an isolating language like Vietnamese and show that techniques developed
for inflectional languages cannot be applied "as is". Our question answering
system can answer a wide range of general knowledge questions with promising
accuracy on a test set.
Comment: In the proceedings of the HQA'18 workshop, The Web Conference
Companion, Lyon, France
Modal Markers in Japanese: A Study of Learners’ Use before and after Study Abroad
Japanese discourse requires speakers to index, in a relatively explicit manner, their stance toward the propositional information as well as the hearer. This is done, among other things, by means of a grammaticalized set of modal markers. Although previous research suggests that the use of modal expressions by second language learners differs from that of native users, little is known about “typical” native or non-native behavior. This study aims (a) to delineate native and non-native usage by a quantitative examination of a broad range of Japanese modal categories, and qualitative analyses of a subset of potentially problematic categories among them, and (b) to identify possible developmental trajectories, by means of a longitudinal observation of learners’ verbal production before and after study abroad in Japan. We find that modal categories realized by non-transparent or non-salient markers (e.g., explanatory modality no da, or utterance modality sentence-final particles) pose particular challenges in spite of their relatively high availability in the input, and we discuss this finding in terms of processing constraints that arguably affect learners’ acquisition of the grammaticalized modal markers.
Exploring the interplay of mode of discourse and proficiency level in ESL writing performance
Recent theory in discourse and practice in rhetoric has suggested that writers require different skills and strategies when writing for different purposes and when using different genres and modes (Kinneavy, 1972; Carrell and Connor, 1991). The importance of taking into account these various aspectual skills and forms of writing is recognised in teaching (e.g. Scarcella and Oxford, 1992) and in the assessment of writing (e.g. Odell and Cooper, 1980). For instance, Odell and Cooper argued that no claims about writing ability can be made until students’ performance on a variety of writing tasks has been examined. Thus, the issue of which writing task(s) are to be included in a test is crucial, since a task will be regarded as useless if it does not provide the basis for making generalisations regarding an individual’s writing ability. This paper presents the findings of a study on the effects of mode of discourse on L2 writing performance, as well as the interplay between a learner variable (proficiency level) and a task variable (mode of discourse), among Malaysian upper secondary ESL learners. The findings provide some evidence for the need to re-examine issues of reliability and validity in the testing practice of manipulating variables in the design of assessment tasks to evaluate ESL writing performance. Given the status and complexity of the writing skill, it stands to reason that studies in this area will continue to shed light on how best the construct can be understood, taught and tested, so that language learners have a fair chance to exhibit their true ability and be reliably reported on.
Just Bring It: a case study on code-switching in Japanese contemporary hard rock lyrics
Abstract. This thesis studies language contact phenomena, particularly written code-switching and code-mixing, in contemporary Japanese hard rock music lyrics. The data for this mixed-method case study was drawn from the album Just Bring It (2017) by the Japanese all-female rock band Band-Maid. The album reflects the pervasion of English in Japanese popular music quite extensively, as it has substantial language contact, showcases a plethora of artistic liberties, and contains lyrics written by both the band members and third-party lyricists. The lyrical data was collected into a corpus and first examined for how code-switching differs between the lyricists, how the lyrics follow the English rock lexicon, and how orthographical experiments occur in the song titles. In the second part of the analysis, the corpus was analyzed for intrasentential (code-mixing) and intersentential (code-switching) occurrences, grammar issues, and code-switching effects such as repetition, doubling or code ambiguity. The study found that code-switching was very prevalent; longer code-switching via clauses and sentences was more common than short one-word or phrasal mixing. Moreover, repetition was very prevalent, whereas other code-switching effects occurred rather rarely. Band members used longer code-switches, whereas featured lyricists used shorter but more creative solutions. In general, in the context of this study, the English used signals cosmopolitan prestige and ambitions of international success to Japanese listeners, but also serves and includes English-speaking listeners. After compiling all the findings, one could argue that Roman letters and English have established significance in contemporary Japanese hard rock music.
Just Bring It: a case study of code-switching in present-day Japanese hard rock lyrics. Abstract. This bachelor's thesis examines language contact in present-day Japanese hard rock lyrics. Combining qualitative and quantitative research methods, the thesis focuses primarily on written code-switching and code-mixing between Japanese and English, and its data is the album Just Bring It (2017) by the band Band-Maid. The album illustrates well the absorption of English into Japanese popular culture: although it is mainly in Japanese, it contains code-switching in almost every song, guest lyricists, and artistic liberties in the use of English. To support the analysis, the album's lyrics were collected into a single quantitative corpus. The first part of the analysis examines differences in code-switching between the lyricists, looks for similarities between the general English rock lexicon and the collected lyrical data, and analyzes grammatical deviations used in the song titles. The second part examines the ratio of intersentential to intrasentential code-switches, grammatical problems, and code-switching effects such as repetition structures and ambiguity. The analysis showed that code-switching between Japanese and English was very common. Long, intersentential code-switching was found more often than short, intrasentential switches. In addition, repetition structures were quite common, unlike ambiguous code-switches, which occurred only twice in the data. Lyrics written by the band members contained more long code-switches, whereas the guest lyricists' lyrics contained shorter and more creative code-mixing. Overall, the English used in the lyrics reflects, on the one hand, the pursuit of internationality and, on the other, inclusivity. On the basis of the results, it can tentatively be argued that English plays a major role in Japanese contemporary rock that aims at an international audience.
Learning Sentence-internal Temporal Relations
In this paper we propose a data intensive approach for inferring
sentence-internal temporal relations. Temporal inference is relevant for
practical NLP applications which either extract or synthesize temporal
information (e.g., summarisation, question answering). Our method bypasses the
need for manual coding by exploiting the presence of markers like "after", which
overtly signal a temporal relation. We first show that models trained on main
and subordinate clauses connected with a temporal marker achieve good
performance on a pseudo-disambiguation task simulating temporal inference
(during testing the temporal marker is treated as unseen and the models must
select the right marker from a set of possible candidates). Secondly, we assess
whether the proposed approach holds promise for the semi-automatic creation of
temporal annotations. Specifically, we use a model trained on noisy and
approximate data (i.e., main and subordinate clauses) to predict
intra-sentential relations present in TimeBank, a corpus annotated with rich
temporal information. Our experiments compare and contrast several
probabilistic models differing in their feature space, linguistic assumptions
and data requirements. We evaluate performance against gold standard corpora
and also against human subjects.
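The pseudo-disambiguation setup described in this abstract can be illustrated with a minimal sketch. It assumes a naive Bayes classifier over position-tagged bag-of-words features, which is a stand-in chosen for brevity and not necessarily one of the models the paper compares. The training triples, the function names and the feature scheme are all hypothetical.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy data: (main clause, temporal marker, subordinate clause).
# In the setup the abstract describes, such triples are harvested from
# sentences in which the marker occurs overtly, so no manual coding is needed.
TRAIN = [
    ("she left the office", "after", "the meeting ended"),
    ("he washed the dishes", "after", "dinner ended"),
    ("they locked the door", "before", "they went to sleep"),
    ("she checked the map", "before", "the trip started"),
    ("he read a book", "while", "the baby was sleeping"),
    ("she hummed a tune", "while", "the kettle was boiling"),
]


def features(main, sub):
    """Bag of words, tagged by the clause each word came from (an assumption)."""
    return [f"main:{w}" for w in main.split()] + [f"sub:{w}" for w in sub.split()]


def train(examples):
    """Count markers and per-marker feature occurrences."""
    marker_counts = Counter()
    feat_counts = defaultdict(Counter)
    vocab = set()
    for main, marker, sub in examples:
        marker_counts[marker] += 1
        for f in features(main, sub):
            feat_counts[marker][f] += 1
            vocab.add(f)
    return marker_counts, feat_counts, vocab


def predict(model, main, sub):
    """Pseudo-disambiguation: the marker is hidden, pick the likeliest candidate."""
    marker_counts, feat_counts, vocab = model
    total = sum(marker_counts.values())
    best, best_score = None, -math.inf
    for marker in sorted(marker_counts):
        score = math.log(marker_counts[marker] / total)
        denom = sum(feat_counts[marker].values()) + len(vocab)
        for f in features(main, sub):
            if f in vocab:  # features never seen in training are skipped
                # Add-one smoothing keeps unseen (marker, feature) pairs finite.
                score += math.log((feat_counts[marker][f] + 1) / denom)
        if score > best_score:
            best, best_score = marker, score
    return best


if __name__ == "__main__":
    model = train(TRAIN)
    print(predict(model, "he washed the car", "the meeting ended"))
```

At test time the marker is withheld and the classifier must recover it from the two clauses alone, which is what makes the task a proxy for temporal inference.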
Parameter Learning of Logic Programs for Symbolic-Statistical Modeling
We propose a logical/mathematical framework for statistical parameter
learning of parameterized logic programs, i.e. definite clause programs
containing probabilistic facts with a parameterized distribution. It extends
the traditional least Herbrand model semantics in logic programming to
distribution semantics, possible world semantics with a probability
distribution which is unconditionally applicable to arbitrary logic programs
including ones for HMMs, PCFGs and Bayesian networks. We also propose a new EM
algorithm, the graphical EM algorithm, that runs for a class of parameterized
logic programs representing sequential decision processes where each decision
is exclusive and independent. It runs on a new data structure called support
graphs describing the logical relationship between observations and their
explanations, and learns parameters by computing inside and outside probability
generalized for logic programs. The complexity analysis shows that when
combined with OLDT search for all explanations for observations, the graphical
EM algorithm, despite its generality, has the same time complexity as existing
EM algorithms, i.e. the Baum-Welch algorithm for HMMs, the Inside-Outside
algorithm for PCFGs, and the one for singly connected Bayesian networks that
have been developed independently in each research field. Learning experiments
with PCFGs using two corpora of moderate size indicate that the graphical EM
algorithm can significantly outperform the Inside-Outside algorithm.
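The idea of a probability distribution over possible worlds, each completed by a least-model computation, can be illustrated with a small sketch. It makes several simplifying assumptions that go beyond the abstract: the program is ground, the probabilistic facts are independent, and the query probability is obtained by brute-force enumeration of worlds. It does not implement the graphical EM algorithm, support graphs or any parameter learning, and the facts, rules and function names are hypothetical.

```python
from itertools import product

# Hypothetical probabilistic facts, assumed independent for this sketch.
PROB_FACTS = {"rain": 0.3, "sprinkler": 0.5}

# Hypothetical ground definite clauses as (head, [body atoms]).
RULES = [
    ("wet", ["rain"]),
    ("wet", ["sprinkler"]),
    ("slippery", ["wet"]),
]


def least_model(true_facts, rules):
    """Forward chaining to a fixpoint: the least model of facts plus rules."""
    model = set(true_facts)
    changed = True
    while changed:
        changed = False
        for head, body in rules:
            if head not in model and all(atom in model for atom in body):
                model.add(head)
                changed = True
    return model


def query_probability(query, prob_facts, rules):
    """Sum the weights of all worlds whose least model contains the query."""
    names = sorted(prob_facts)
    total = 0.0
    for values in product([True, False], repeat=len(names)):
        weight = 1.0
        chosen = []
        for name, value in zip(names, values):
            weight *= prob_facts[name] if value else 1.0 - prob_facts[name]
            if value:
                chosen.append(name)
        if query in least_model(chosen, rules):
            total += weight
    return total


if __name__ == "__main__":
    print(query_probability("wet", PROB_FACTS, RULES))
```

Enumeration is exponential in the number of probabilistic facts, which is why a practical system would work from the explanations of each observation instead of from whole worlds.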
Enumeration as a means of financial markets data representation
Cognitive linguistics has developed into one of the most attractive and influential frameworks within linguistics at large. It draws linguists’ attention to the study of aspects of language within the cognitive paradigm as reflections of conceptual organization and categorization principles, as well as of the ways of presenting knowledge about the world. This paper focuses on the cognitive analysis of enumeration as an instrument for organizing and conveying domain-specific information. The aim of the analysis is to establish the role of enumeration as a way of arriving at true knowledge of the world. The research tasks are to describe enumeration as a means of knowledge representation and to study the peculiarities of enumeration as a means of representing financial markets information. The novelty of the analysis lies in its study of enumeration as a means of representing financial markets information. The theoretical value of the analysis lies in the study of enumeration as a way of organizing knowledge that reflects the collective experience of a community within a specific environment. The results of the study of enumeration in economic discourse show that it serves to structure and represent domain-specific knowledge.