Towards a constructional approach to discourse-level phenomena : the case of the Spanish interpersonal epistemic stance construction
This study contributes to a better understanding of how constructional models can be applied to discourse-level phenomena and can complement previous grammaticalization accounts of pragmatic markers. The case study presented concerns the recent development of the interpersonal epistemic stance construction in Spanish. The central argument is that the expanding use of sabes as a pragmatic marker is best understood by taking into account the composite network of related expressions that Spanish speakers have at their disposal when performing a particular speech act. The diachronic analysis is documented with spoken corpus examples collected over recent decades, and is mainly informed by frequency data measuring productivity, as well as by the formal properties of the construction and its instances.
Development of an English-Latvian Statistical Machine Translation System: Methods, Resources and First Results
This paper presents the research and development of English-Latvian Statistical Machine Translation (SMT) prototypes for the legal domain. Several methods have been investigated, including phrase-based models and factored models. Translation quality has been evaluated using an automatic metric (BLEU score) and human evaluation. In automatic evaluation, the best score (46.44 BLEU points) was achieved by a factored model trained on the JRC Acquis corpus (version 3.0), which was also rated best in human evaluation. In addition, an error analysis of the SMT output was performed. This analysis showed that although the output of the best prototype was of reasonable quality, it contained several frequent errors, namely incorrect word forms, missing words and wrong word order. For future work, tree-based SMT and hybrid systems are proposed.
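The abstract reports a BLEU score without detailing how it is computed. As a rough illustration, here is a minimal sketch of corpus-level BLEU in its standard form (clipped n-gram precisions up to 4-grams, uniform weights, brevity penalty), assuming one reference translation per sentence, whitespace tokenization and no smoothing. The scoring tool the authors used is not stated, real toolkits differ in tokenization and smoothing details, and the function names and example sentences are hypothetical.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams (as tuples) in a list of tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses, references, max_n=4):
    """Corpus-level BLEU (0-100) with one reference per hypothesis, uniform weights, no smoothing."""
    matches = [0] * max_n  # clipped n-gram matches, per order n
    totals = [0] * max_n   # n-grams proposed by the hypotheses, per order n
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        hyp_len += len(hyp)
        ref_len += len(ref)
        for n in range(1, max_n + 1):
            hyp_counts, ref_counts = ngrams(hyp, n), ngrams(ref, n)
            # Each hypothesis n-gram is credited at most as often as it occurs in the reference.
            matches[n - 1] += sum(min(c, ref_counts[g]) for g, c in hyp_counts.items())
            totals[n - 1] += sum(hyp_counts.values())
    if min(matches) == 0:
        return 0.0  # the geometric mean is zero if some order has no matches
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # Brevity penalty: penalise output that is shorter than the references overall.
    brevity = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return 100 * brevity * math.exp(log_precision)


hyp = "the regulation shall enter into force".split()
ref = "this regulation shall enter into force".split()
print(round(corpus_bleu([hyp], [ref]), 2))
```

Because the precisions are combined by a geometric mean, a single missing 4-gram order drives the unsmoothed score to zero, which is why BLEU is usually reported at corpus level rather than per sentence.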
Cell line name recognition in support of the identification of synthetic lethality in cancer from text
Motivation: The recognition and normalization of cell line names in text is an important task in biomedical text mining research, facilitating, for instance, the identification of synthetically lethal genes from the literature. While several tools have previously been developed to address cell line recognition, it is unclear whether available systems can perform sufficiently well in realistic, broad-coverage applications such as extracting synthetically lethal genes from the cancer literature. In this study, we revisit the cell line name recognition task, evaluating both existing systems and newly introduced methods on various resources, with the aim of obtaining a reliable tagger that is not tied to any specific subdomain. To support this task, we introduce two text collections manually annotated for cell line names: Gellus, a broad-coverage corpus, and CLL, a focused target-domain corpus.
Results: We find that the best performance is achieved using NERsuite, a machine learning system based on Conditional Random Fields, trained on the Gellus corpus and supported with a dictionary of cell line names. The system achieves an F-score of 88.46% on the test set of Gellus and 85.98% on the independently annotated CLL corpus. It was further applied at large scale to 24,302,102 unannotated articles, resulting in the identification of 5,181,342 cell line mentions, normalized to 11,755 unique cell line database identifiers.
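The reported F-scores are harmonic means of precision and recall over recognized mentions. A minimal sketch of how such a score can be computed, assuming strict exact-span matching and mentions represented as hypothetical (document id, start offset, end offset) tuples; the matching criterion actually used in the study may differ, and the spans below are toy values.

```python
def span_prf(gold, predicted):
    """Exact-match precision, recall and F1 for sets of (doc_id, start, end) mention spans."""
    gold, predicted = set(gold), set(predicted)
    true_positives = len(gold & predicted)  # spans with identical boundaries in both sets
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
    return precision, recall, f1


gold = {("d1", 0, 4), ("d1", 10, 15), ("d2", 3, 8), ("d2", 20, 27)}
predicted = {("d1", 0, 4), ("d1", 10, 15), ("d2", 3, 9)}  # one boundary error, one miss
print(span_prf(gold, predicted))
```

Under exact matching, a boundary error such as ("d2", 3, 9) counts as both a false positive and a false negative; lenient (overlap-based) matching would score it differently.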
Effect of Tuned Parameters on a LSA MCQ Answering Model
This paper presents the current state of a work in progress, whose objective
is to better understand the effects of factors that significantly influence the
performance of Latent Semantic Analysis (LSA). A difficult task, answering
biology Multiple Choice Questions in French, is used to test the semantic
properties of the truncated singular space and to study the relative influence
of the main parameters. Dedicated software has been designed to fine-tune the
LSA semantic space for the Multiple Choice Questions task. With optimal
parameters, the performance of our simple model is, surprisingly, equal or
superior to that of 7th- and 8th-grade students. This indicates that the
semantic spaces were quite good despite their low dimensionality and the small
size of the training data sets. In addition, we present an original global
entropy weighting of the answer terms of each Multiple Choice Question, which
was necessary for the model's success.
Comment: 9 pages
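The abstract does not spell out the model, so the following is only a minimal sketch of a typical LSA pipeline for multiple-choice answering, not the authors' implementation. It weights a term-by-document matrix with the standard log-entropy scheme (a stand-in for the paper's own entropy weighting of answer terms, which the abstract does not define), truncates the SVD to k dimensions, represents each text as the sum of its term vectors, and picks the choice with the highest cosine similarity to the question. The toy corpus, vocabulary and function names are hypothetical.

```python
import numpy as np


def log_entropy(counts):
    """Log-entropy weighting of a terms x documents count matrix (needs at least 2 documents)."""
    n_docs = counts.shape[1]
    gf = counts.sum(axis=1, keepdims=True)  # global frequency of each term
    p = np.divide(counts, gf, out=np.zeros_like(counts, dtype=float), where=gf > 0)
    with np.errstate(divide="ignore", invalid="ignore"):
        plogp = np.where(p > 0, p * np.log(p), 0.0)
    # Global weight: 1 for a term concentrated in one document, 0 for one spread evenly.
    global_weight = 1 + plogp.sum(axis=1) / np.log(n_docs)
    return np.log1p(counts) * global_weight[:, None]


def lsa_space(weighted, k):
    """Truncated SVD: rows of U_k * S_k are the term vectors in the k-dimensional space."""
    u, s, _ = np.linalg.svd(weighted, full_matrices=False)
    return u[:, :k] * s[:k]


def text_vector(words, vocab, term_vecs):
    """Represent a text as the sum of the vectors of its in-vocabulary words."""
    idx = [vocab[w] for w in words if w in vocab]
    return term_vecs[idx].sum(axis=0) if idx else np.zeros(term_vecs.shape[1])


def cosine(a, b):
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom else 0.0


def answer_mcq(question, choices, vocab, term_vecs):
    """Return the index of the choice most similar to the question."""
    q = text_vector(question, vocab, term_vecs)
    scores = [cosine(q, text_vector(c, vocab, term_vecs)) for c in choices]
    return int(np.argmax(scores))


# Hypothetical toy training corpus: two cell-biology and two plant-biology "documents".
docs = [
    "cell nucleus contains dna chromosome",
    "dna chromosome gene heredity",
    "leaf chlorophyll photosynthesis light",
    "photosynthesis light sugar leaf",
]
vocab = {}
for d in docs:
    for w in d.split():
        vocab.setdefault(w, len(vocab))
counts = np.zeros((len(vocab), len(docs)))
for j, d in enumerate(docs):
    for w in d.split():
        counts[vocab[w], j] += 1

term_vecs = lsa_space(log_entropy(counts), k=2)
choices = ["the nucleus".split(), "the chlorophyll".split()]
print(answer_mcq("where is dna found".split(), choices, vocab, term_vecs))
```

In this toy example, k = 2 is enough to separate the cell-biology and plant-biology documents; on real data, k and the weighting scheme are among the parameters one would tune.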
Acquiring Correct Knowledge for Natural Language Generation
Natural language generation (NLG) systems are computer software systems that
produce texts in English and other human languages, often from non-linguistic
input data. NLG systems, like most AI systems, need substantial amounts of
knowledge. However, our experience in two NLG projects suggests that it is
difficult to acquire correct knowledge for NLG systems; indeed, every knowledge
acquisition (KA) technique we tried had significant problems. In general terms,
these problems were due to the complexity, novelty, and poorly understood
nature of the tasks our systems attempted, and were exacerbated by the large
variation in how different people write. In particular, this meant that corpus-based KA
approaches suffered because it was impossible to assemble a sizable corpus of
high-quality consistent manually written texts in our domains; and structured
expert-oriented KA techniques suffered because experts disagreed and because we
could not get enough information about special and unusual cases to build
robust systems. We believe that such problems are likely to affect many other
NLG systems as well. In the long term, we hope that new KA techniques may
emerge to help NLG system builders. In the shorter term, we believe that
understanding how individual KA techniques can fail, and using a mixture of
different KA techniques with different strengths and weaknesses, can help
developers acquire NLG knowledge that is mostly correct.
Weakly-supervised appraisal analysis
This article is concerned with the computational treatment of Appraisal, a Systemic Functional Linguistics theory of the language used to communicate opinion in English. The theory considers aspects such as Attitude (how writers communicate their point of view), Engagement (how writers align themselves with respect to the opinions of others) and Graduation (how writers amplify or diminish their attitudes and engagements). To analyse text according to the theory, we employ a weakly supervised approach to text classification, which compares the similarity of words to prototypical examples of each class. We evaluate the method's performance on a collection of book reviews annotated according to Appraisal theory.
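The abstract describes classifying words by their similarity to prototypical examples of each class. A minimal sketch of that idea, assuming pre-computed word vectors and a handful of seed words per class, assigns a word to the class with the highest average cosine similarity to its seeds. The two-dimensional vectors, the seed lists and the "raise"/"lower" labels (standing in for Graduation's amplifying and diminishing options) are hypothetical toy values, not the article's resources or its actual similarity measure.

```python
import math


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def classify_word(word, prototypes, vectors):
    """Assign `word` to the class whose seed words are, on average, most similar to it."""
    if word not in vectors:
        return None  # no vector, so no evidence for any class
    best_label, best_score = None, float("-inf")
    for label, seeds in prototypes.items():
        sims = [cosine(vectors[word], vectors[s]) for s in seeds if s in vectors]
        if sims:
            score = sum(sims) / len(sims)
            if score > best_score:
                best_label, best_score = label, score
    return best_label


# Hypothetical toy embeddings and seed words for two Graduation classes.
vectors = {
    "very": [0.9, 0.1], "extremely": [1.0, 0.0], "incredibly": [0.95, 0.05],
    "slightly": [0.1, 0.9], "somewhat": [0.2, 0.8], "barely": [0.05, 1.0],
}
prototypes = {"raise": ["very", "extremely"], "lower": ["slightly", "somewhat"]}
print(classify_word("incredibly", prototypes, vectors))
```

Only the seed lists need manual input here, which is what makes this style of approach weakly supervised; the quality of the result depends heavily on the word vectors used.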