691 research outputs found

    External Lexical Information for Multilingual Part-of-Speech Tagging

    Morphosyntactic lexicons and word vector representations have both proven useful for improving the accuracy of statistical part-of-speech taggers. Here we compare the performance of four systems on datasets covering 16 languages; two of these systems are feature-based (MEMMs and CRFs) and two are neural (bi-LSTMs). We show that, on average, all four approaches perform similarly and reach state-of-the-art results. However, our feature-based models perform better on lexically richer datasets (e.g. for morphologically rich languages), whereas the neural models score higher on datasets with less lexical variability (e.g. for English). These conclusions hold in particular for the MEMM models relying on our system MElt, which benefited from newly designed features. This shows that, under certain conditions, feature-based approaches enriched with morphosyntactic lexicons are competitive with neural methods.

    Persian Semantic Role Labeling Based on Dependency Tree

    Semantic role labeling is the task of attaching semantic tags to words according to the event described in the sentence. Persian semantic role labeling is challenging: most methods so far depend on a large number of handcrafted features and on heavy feature engineering to attain high performance. Moreover, because Persian has free word order and a subject-object-verb structure, the arguments of a verbal predicate are often distant from it, creating long-range dependencies that these methods can hardly model. Our goal is to achieve better performance with minimal feature engineering while also capturing long-range dependencies in a sentence. To these ends, this paper develops a deep model for Persian semantic role labeling that relies on the dependency tree. In the proposed method, the potential arguments of each verbal predicate are identified from dependency relations, and the dependency path for each pair of predicate and candidate argument is then embedded using the information in the dependency tree. Next, a bi-directional recurrent neural network with long short-term memory units transforms word features into semantic role scores. Experiments were carried out on the first Persian semantic role corpus and on a corpus provided by the authors. The macro-averaged F1-measure achieved is 80.01 on the first corpus and 82.48 on the second.
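
    The dependency path between a predicate and a candidate argument, as mentioned in the abstract above, can be illustrated with a minimal sketch. This is not the authors' implementation: the head-index encoding (-1 for the root), the function names, the `:up`/`:down` suffixes, and the toy English sentence are all assumptions made for illustration.

```python
from typing import List

def path_to_root(heads: List[int], node: int) -> List[int]:
    """Return the nodes from `node` up to the root (head index -1 marks the root)."""
    path = [node]
    while heads[path[-1]] != -1:
        path.append(heads[path[-1]])
    return path

def dependency_path(heads: List[int], labels: List[str],
                    predicate: int, argument: int) -> List[str]:
    """Relation labels on the tree path from `argument` to `predicate`.

    The path climbs from the argument to the lowest common ancestor,
    then descends to the predicate.
    """
    up = path_to_root(heads, argument)
    down = path_to_root(heads, predicate)
    ancestors = set(down)
    # First node on the argument's upward path that is also above the predicate.
    lca = next(n for n in up if n in ancestors)
    up_part = [labels[n] + ":up" for n in up[:up.index(lca)]]
    down_part = [labels[n] + ":down" for n in reversed(down[:down.index(lca)])]
    return up_part + down_part

# Toy sentence (hypothetical): "she bought a book"
tokens = ["she", "bought", "a", "book"]
heads = [1, -1, 3, 1]                      # index of each token's head
labels = ["nsubj", "root", "det", "obj"]   # relation of each token to its head

print(dependency_path(heads, labels, predicate=1, argument=3))  # ['obj:up']
print(dependency_path(heads, labels, predicate=1, argument=2))  # ['det:up', 'obj:up']
```

    In a model like the one described, such a label sequence would be mapped to an embedding and combined with word features before the recurrent layer; how that is done in the paper is not stated in the abstract.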

    PersoNER: Persian named-entity recognition

    © 1963-2018 ACL. Named-Entity Recognition (NER) is still a challenging task for languages with low digital resources. The main difficulties arise from the scarcity of annotated corpora and the consequent difficulty of training an effective NER pipeline. To bridge this gap, in this paper we target the Persian language, which is spoken by over a hundred million people worldwide. We first present and provide ArmanPersoNERCorpus, the first manually-annotated Persian NER corpus. Then, we introduce PersoNER, an NER pipeline for Persian that leverages a word embedding and a sequential max-margin classifier. The experimental results show that the proposed approach achieves interesting MUC7 and CoNLL scores while outperforming two alternatives based on a CRF and a recurrent neural network.

    Dynamic Document Annotation for Efficient Data Retrieval

    Document annotation is one of the most popular methods by which metadata present in a document is used to search for documents in a large text-document database. In a few application domains, such as scientific networks and blogs, large amounts of information are shared, usually as unstructured text documents. Manually annotating each document becomes a tedious task. Annotations make it easier to find a document's topic and help the reader quickly overview and understand the document. Dynamic document annotation provides a solution to such problems. It is generally considered a semi-supervised learning task: documents are dynamically assigned to one of a set of predefined classes based on features extracted from their textual content. This paper presents a survey of the Collaborative Adaptive Data Sharing platform (CADS) for document annotation and of the use of the query workload to direct the annotation process. A key novelty of CADS is that it learns over time the most important data attributes of the application and uses this knowledge to guide data insertion and querying.
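
    The idea of letting the query workload direct annotation can be sketched as follows. This is only an illustration under simple assumptions, not the CADS algorithm: attribute importance is taken to be raw frequency in past queries, and the function names and sample workload are hypothetical.

```python
from collections import Counter
from typing import Iterable, List, Set

def attribute_counts(query_workload: Iterable[Set[str]]) -> Counter:
    """Count how often each attribute appears across past queries."""
    counts: Counter = Counter()
    for attrs in query_workload:
        counts.update(attrs)
    return counts

def suggest_attributes(query_workload: Iterable[Set[str]],
                       already_annotated: Set[str],
                       k: int = 3) -> List[str]:
    """Suggest up to k frequently queried attributes the document still lacks."""
    counts = attribute_counts(query_workload)
    # Most frequent first; ties broken alphabetically so the result is deterministic.
    ranked = sorted(counts, key=lambda a: (-counts[a], a))
    return [a for a in ranked if a not in already_annotated][:k]

# Hypothetical workload: each query is the set of attributes it filters on.
workload = [{"author", "year"}, {"author", "topic"}, {"topic"}, {"author"}]

print(suggest_attributes(workload, already_annotated={"author"}, k=2))
# ['topic', 'year']
```

    Recomputing the counts as new queries arrive is one simple way to model a system that learns the important attributes over time.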