63 research outputs found

    Self-Supervised and Controlled Multi-Document Opinion Summarization

    Full text link
    We address the problem of unsupervised abstractive summarization of collections of user generated reviews with self-supervision and control. We propose a self-supervised setup that considers an individual document as a target summary for a set of similar documents. This setting makes training simpler than previous approaches by relying only on standard log-likelihood loss. We address the problem of hallucinations through the use of control codes, to steer the generation towards more coherent and relevant summaries.Finally, we extend the Transformer architecture to allow for multiple reviews as input. Our benchmarks on two datasets against graph-based and recent neural abstractive unsupervised models show that our proposed method generates summaries with a superior quality and relevance.This is confirmed in our human evaluation which focuses explicitly on the faithfulness of generated summaries We also provide an ablation study, which shows the importance of the control setup in controlling hallucinations and achieve high sentiment and topic alignment of the summaries with the input reviews.Comment: 18 pages including 5 pages appendi

    Developing resources for sentiment analysis of informal Arabic text in social media

    Get PDF
    Natural Language Processing (NLP) applications such as text categorization, machine translation, sentiment analysis, etc., need annotated corpora and lexicons to check quality and performance. This paper describes the development of resources for sentiment analysis specifically for Arabic text in social media. A distinctive feature of the corpora and lexicons developed are that they are determined from informal Arabic that does not conform to grammatical or spelling standards. We refer to Arabic social media content of this sort as Dialectal Arabic (DA) - informal Arabic originating from and potentially mixing a range of different individual dialects. The paper describes the process adopted for developing corpora and sentiment lexicons for sentiment analysis within different social media and their resulting characteristics. The addition to providing useful NLP data sets for Dialectal Arabic the work also contributes to understanding the approach to developing corpora and lexicons

    Augmentative and alternative communication (AAC) advances: A review of configurations for individuals with a speech disability

    Get PDF
    High-tech augmentative and alternative communication (AAC) methods are on a constant rise; however, the interaction between the user and the assistive technology is still challenged for an optimal user experience centered around the desired activity. This review presents a range of signal sensing and acquisition methods utilized in conjunction with the existing high-tech AAC platforms for individuals with a speech disability, including imaging methods, touch-enabled systems, mechanical and electro-mechanical access, breath-activated methods, and brain–computer interfaces (BCI). The listed AAC sensing modalities are compared in terms of ease of access, affordability, complexity, portability, and typical conversational speeds. A revelation of the associated AAC signal processing, encoding, and retrieval highlights the roles of machine learning (ML) and deep learning (DL) in the development of intelligent AAC solutions. The demands and the affordability of most systems hinder the scale of usage of high-tech AAC. Further research is indeed needed for the development of intelligent AAC applications reducing the associated costs and enhancing the portability of the solutions for a real user’s environment. The consolidation of natural language processing with current solutions also needs to be further explored for the amelioration of the conversational speeds. The recommendations for prospective advances in coming high-tech AAC are addressed in terms of developments to support mobile health communicative applications

    Users' Traces for Enhancing Arabic Facebook Search

    Get PDF
    International audienceThis paper proposes an approach on Facebook search in Arabic, which exploits several users' traces (e.g. comment, share, reactions) left on Facebook posts to estimate their social importance. Our goal is to show how these social traces (signals) can play a vital role in improving Arabic Facebook search. Firstly, we identify polarities (positive or negative) carried by the textual signals (e.g. comments) and non-textual ones (e.g. the reactions love and sad) for a given Facebook post. Therefore, the polarity of each comment expressed on a given Facebook post, is estimated on the basis of a neural sentiment model in Arabic language. Secondly, we group signals according to their complementarity using features selection algorithms. Thirdly, we apply learning to rank (LTR) algorithms to re-rank Facebook search results based on the selected groups of signals. Finally, experiments are carried out on 13,500 Facebook posts, collected from 45 topics in Arabic language. Experiments results reveal that Random Forests combined with ReliefFAttributeEval (RLF) was the most effective LTR approach for this task

    Breathing pattern interpretation as an alternative and effective voice communication solution

    Get PDF
    Augmentative and alternative communication (AAC) systems tend to rely on the interpretation of purposeful gestures for interaction. Existing AAC methods could be cumbersome and limit the solutions in terms of versatility. The study aims to interpret breathing patterns (BPs) to converse with the outside world by means of a unidirectional microphone and researches breathing-pattern interpretation (BPI) to encode messages in an interactive manner with minimal training. We present BP processing work with (1) output synthesized machine-spoken words (SMSW) along with single-channel Weiner filtering (WF) for signal de-noising, and (2) k-nearest neighbor (k-NN) classification of BPs associated with embedded dynamic time warping (DTW). An approved protocol to collect analogue modulated BP sets belonging to 4 distinct classes with 10 training BPs per class and 5 live BPs per class was implemented with 23 healthy subjects. An 86% accuracy of k-NN classification was obtained with decreasing error rates of 17%, 14%, and 11% for the live classifications of classes 2, 3, and 4, respectively. The results express a systematic reliability of 89% with increased familiarity. The outcomes from the current AAC setup recommend a durable engineering solution directly beneficial to the sufferers

    A review on corpus annotation for arabic sentiment analysis

    Get PDF
    Mining publicly available data for meaning and value is an important research direction within social media analysis. To automatically analyze collected textual data, a manual effort is needed for a successful machine learning algorithm to effectively classify text. This pertains to annotating the text adding labels to each data entry. Arabic is one of the languages that are growing rapidly in the research of sentiment analysis, despite limited resources and scares annotated corpora. In this paper, we review the annotation process carried out by those papers. A total of 27 papers were reviewed between the years of 2010 and 2016
    • …
    corecore