192 research outputs found

    Morfessor 2.0: Toolkit for statistical morphological segmentation

    Morfessor is a family of probabilistic machine learning methods for finding the morphological segmentation from raw text data. Recent developments include semi-supervised methods for utilizing annotated data. Morfessor 2.0 is a rewrite of the original, widely used Morfessor 1.0 software, with well-documented command-line tools and a library interface. It includes algorithmic improvements and new features such as semi-supervised learning, online training, and integrated evaluation code.

    Dynamic Topic Adaptation for Phrase-based MT

    Translating text from diverse sources poses a challenge to current machine translation systems which are rarely adapted to structure beyond corpus level. We explore topic adaptation on a diverse data set and present a new bilingual variant of Latent Dirichlet Allocation to compute topic-adapted, probabilistic phrase translation features. We dynamically infer document-specific translation probabilities for test sets of unknown origin, thereby capturing the effects of document context on phrase translations. We show gains of up to 1.26 BLEU over the baseline and 1.04 over a domain adaptation benchmark. We further provide an analysis of the domain-specific data and show additive gains of our model in combination with other types of topic-adapted features.
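
As a rough illustration of the document-specific translation probabilities mentioned in the abstract above, the following Python sketch mixes topic-conditioned phrase translation probabilities according to a document's inferred topic weights. The mixture formula p(t | s, d) = sum over k of p(t | s, k) * p(k | d), the function name, and the toy numbers are all assumptions made for illustration; they are not taken from the paper.

```python
def topic_adapted_prob(p_target_given_source_topic, p_topic_given_doc):
    """Mix per-topic translation probabilities by a document's topic weights.

    Both arguments are hypothetical inputs: one probability per topic for a
    fixed (source phrase, target phrase) pair, and the document's topic mixture.
    """
    if len(p_target_given_source_topic) != len(p_topic_given_doc):
        raise ValueError("one translation probability is needed per topic")
    return sum(
        p_tk * p_k
        for p_tk, p_k in zip(p_target_given_source_topic, p_topic_given_doc)
    )


# Toy values (assumed): the phrase pair is likely under topic 0, unlikely under topic 1.
per_topic = [0.75, 0.25]
# Assumed topic mixture inferred for one test document.
doc_topics = [0.5, 0.5]

adapted = topic_adapted_prob(per_topic, doc_topics)
print(adapted)
```

A document weighted more heavily towards topic 0 would push the adapted probability towards 0.75, which is the sense in which document context can shift a phrase translation score.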

    A Hierarchical Bayesian Model for Unsupervised Induction of Script Knowledge

    Scripts representing common sense knowledge about stereotyped sequences of events have been shown to be a valuable resource for NLP applications. We present a hierarchical Bayesian model for unsupervised learning of script knowledge from crowdsourced descriptions of human activities. Events and constraints on event ordering are induced jointly in one unified framework. We use a statistical model over permutations which captures event ordering constraints in a more flexible way than previous approaches. In order to alleviate the sparsity problem caused by using relatively small datasets, we incorporate in our hierarchical model an informed prior on word distributions. The resulting model substantially outperforms a state-of-the-art method on the event ordering task.

    A Review of Research-Based Automatic Text Simplification Tools

    In the age of knowledge, the democratisation of information facilitated through the Internet may not be as pervasive if written language poses challenges to particular sectors of the population. The objective of this paper is to present an overview of research-based automatic text simplification tools. Consequently, we describe aspects such as the language, language phenomena, language levels simplified, approaches, specific target populations these tools are created for (e.g. individuals with cognitive impairment, attention deficit, elderly people, children, language learners), and accessibility and availability considerations. The review of existing studies covering automatic text simplification tools was conducted by searching two databases: Web of Science and Scopus. The eligibility criteria involve text simplification tools with a scientific background in order to ascertain how they operate. This methodology yielded 27 text simplification tools that are further analysed. Some of the main conclusions reached with this review are the lack of resources accessible to the public, the need for customisation to foster the individual’s independence by allowing the user to select what s/he finds challenging to understand while not limiting the user’s capabilities, and the need for more simplification tools in languages other than English, to mention a few. This research was conducted as part of the Clear-Text project (TED2021-130707B-I00), funded by MCIN/AEI/10.13039/501100011033 and European Union NextGenerationEU/PRTR.