106 research outputs found

    Theoretical foundations for illocutionary structure parsing

    Get PDF
    Illocutionary structure in real language use is intricate and complex, and nowhere more so than in argument and debate. Identifying this structure without any theoretical scaffolding is extremely challenging even for humans. New work in Inference Anchoring Theory has provided significant advances in such scaffolding which are helping to allow the analytical challenges of argumentation structure to be tackled. This paper demonstrates how these advances can also pave the way to automated and semi-automated research in understanding the structure of natural debate. This paper is the extended version of the paper presented at the 11th International Conference on Computational Models of Natural Argument (CMNA 2013), 14 June 2013, Rome, Italy. It reports on the initial steps of a project on argument mining from dialogue. Note that since then the corpus size and the annotation scheme have evolved, however, the method presented here is still valid and the project has developed accordingly

    Ensemble Morphosyntactic Analyser for Classical Arabic

    Get PDF
    Classical Arabic (CA) is an influential language for Muslim lives around the world. It is the language of two sources of Islamic laws: the Quran and the Sunnah, the collection of traditions and sayings attributed to the prophet Mohammed. However, classical Arabic in general, and the Sunnah, in particular, is underexplored and under-resourced in the field of computational linguistics. This study examines the possible directions for adapting existing tools, specifically morphological analysers, designed for modern standard Arabic (MSA) to classical Arabic. Morphological analysers of CA are limited, as well as the data for evaluating them. In this study, we adapt existing analysers and create a validation data-set from the Sunnah books. Inspired by the advances in deep learning and the promising results of ensemble methods, we developed a systematic method for transferring morphological analysis that is capable of handling different labelling systems and various sequence lengths. In this study, we handpicked the best four open access MSA morphological analysers. Data generated from these analysers are evaluated before and after adaptation through the existing Quranic Corpus and the Sunnah Arabic Corpus. The findings are as follows: first, it is feasible to analyse under-resourced languages using existing comparable language resources given a small sufficient set of annotated text. Second, analysers typically generate different errors and this could be exploited. Third, an explicit alignment of sequences and the mapping of labels is not necessary to achieve comparable accuracies given a sufficient size of training dataset. Adapting existing tools is easier than creating tools from scratch. The resulting quality is dependent on training data size and number and quality of input taggers. Pipeline architecture performs less well than the End-to-End neural network architecture due to error propagation and limitation on the output format. A valuable tool and data for annotating classical Arabic is made freely available

    On the design of multimedia architectures : proceedings of a one-day workshop, Eindhoven, December 18, 2003

    Get PDF

    Cogitator : a parallel, fuzzy, database-driven expert system

    Get PDF
    The quest to build anthropomorphic machines has led researchers to focus on knowledge and the manipulation thereof. Recently, the expert system was proposed as a solution, working well in small, well understood domains. However these initial attempts highlighted the tedious process associated with building systems to display intelligence, the most notable being the Knowledge Acquisition Bottleneck. Attempts to circumvent this problem have led researchers to propose the use of machine learning databases as a source of knowledge. Attempts to utilise databases as sources of knowledge has led to the development Database-Driven Expert Systems. Furthermore, it has been ascertained that a requisite for intelligent systems is powerful computation. In response to these problems and proposals, a new type of database-driven expert system, Cogitator is proposed. It is shown to circumvent the Knowledge Acquisition Bottleneck and posess many other advantages over both traditional expert systems and connectionist systems, whilst having non-serious disadvantages.KMBT_22

    Cross-Lingual and Low-Resource Sentiment Analysis

    Get PDF
    Identifying sentiment in a low-resource language is essential for understanding opinions internationally and for responding to the urgent needs of locals affected by disaster incidents in different world regions. While tools and resources for recognizing sentiment in high-resource languages are plentiful, determining the most effective methods for achieving this task in a low-resource language which lacks annotated data is still an open research question. Most existing approaches for cross-lingual sentiment analysis to date have relied on high-resource machine translation systems, large amounts of parallel data, or resources only available for Indo-European languages. This work presents methods, resources, and strategies for identifying sentiment cross-lingually in a low-resource language. We introduce a cross-lingual sentiment model which can be trained on a high-resource language and applied directly to a low-resource language. The model offers the feature of lexicalizing the training data using a bilingual dictionary, but can perform well without any translation into the target language. Through an extensive experimental analysis, evaluated on 17 target languages, we show that the model performs well with bilingual word vectors pre-trained on an appropriate translation corpus. We compare in-genre and in-domain parallel corpora, out-of-domain parallel corpora, in-domain comparable corpora, and monolingual corpora, and show that a relatively small, in-domain parallel corpus works best as a transfer medium if it is available. We describe the conditions under which other resources and embedding generation methods are successful, and these include our strategies for leveraging in-domain comparable corpora for cross-lingual sentiment analysis. To enhance the ability of the cross-lingual model to identify sentiment in the target language, we present new feature representations for sentiment analysis that are incorporated in the cross-lingual model: bilingual sentiment embeddings that are used to create bilingual sentiment scores, and a method for updating the sentiment embeddings during training by lexicalization of the target language. This feature configuration works best for the largest number of target languages in both untargeted and targeted cross-lingual sentiment experiments. The cross-lingual model is studied further by evaluating the role of the source language, which has traditionally been assumed to be English. We build cross-lingual models using 15 source languages, including two non-European and non-Indo-European source languages: Arabic and Chinese. We show that language families play an important role in the performance of the model, as does the morphological complexity of the source language. In the last part of the work, we focus on sentiment analysis towards targets. We study Arabic as a representative morphologically complex language and develop models and morphological representation features for identifying entity targets and sentiment expressed towards them in Arabic open-domain text. Finally, we adapt our cross-lingual sentiment models for the detection of sentiment towards targets. Through cross-lingual experiments on Arabic and English, we demonstrate that our findings regarding resources, features, and language also hold true for the transfer of targeted sentiment
    corecore