8 research outputs found

    Predicting the Type and Target of Offensive Posts in Social Media

    Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)
    As offensive content has become pervasive in social media, there has been much research in identifying potentially offensive messages. However, previous work on this topic did not consider the problem as a whole, but rather focused on detecting very specific types of offensive content, e.g., hate speech, cyberbullying, or cyber-aggression. In contrast, here we target several different kinds of offensive content. In particular, we model the task hierarchically, identifying the type and the target of offensive messages in social media. For this purpose, we compiled the Offensive Language Identification Dataset (OLID), a new dataset with tweets annotated for offensive content using a fine-grained three-layer annotation scheme, which we make publicly available. We discuss the main similarities and differences between OLID and pre-existing datasets for hate speech identification, aggression detection, and similar tasks. We further experiment with and compare the performance of different machine learning models on OLID.
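
The hierarchical scheme described in this abstract (offensive or not, then type, then target) can be illustrated with a small consistency check. This is only a sketch: the label names and the `is_valid` helper are assumptions chosen for illustration, since the abstract does not list the labels, and lower layers are assumed to be filled in only when the layer above calls for them.

```python
from typing import Optional

# Hypothetical label inventories; the abstract does not name the labels.
LEVEL_A = {"OFF", "NOT"}          # layer 1: offensive or not offensive
LEVEL_B = {"TIN", "UNT"}          # layer 2: targeted or untargeted
LEVEL_C = {"IND", "GRP", "OTH"}   # layer 3: individual, group, or other target


def is_valid(a: str, b: Optional[str], c: Optional[str]) -> bool:
    """Return True if a three-layer label respects the assumed hierarchy."""
    if a not in LEVEL_A:
        return False
    if a == "NOT":
        # Non-offensive posts carry no type or target label.
        return b is None and c is None
    if b not in LEVEL_B:
        return False
    if b == "UNT":
        # Untargeted offensive posts carry no target label.
        return c is None
    # Targeted offensive posts need a target label.
    return c in LEVEL_C
```

Under these assumptions, `is_valid("OFF", "TIN", "GRP")` holds, while `is_valid("NOT", "TIN", None)` does not, because a non-offensive post cannot have a type.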

    A Novel Mobile Wireless Sensing System for Real-time Monitoring of Posture and Spine Stress

    Poor posture or extra stress on the spine has been shown to lead to a variety of spinal disorders, including chronic back pain, and to incur numerous health costs to society. For this reason, workplace ergonomics is rapidly becoming indispensable in all major corporations. Making the individual continuously aware of poor posture may reduce out-of-posture tendencies and encourage healthy spinal habits. We have developed a novel wireless mobile sensing system which monitors spine stress in real time by detecting poor back posture and strain on the back due to prolonged sitting or standing. The system provides a new method of measuring spine stress at both the back and the feet by integrating posture sensors with strain sensors. Posture and strain data is collected by means of a posture sensor at the neck and weight sensors at the feet. Data is transmitted wirelessly to a central processing station, and real-time feedback is provided to the user's mobile device when sustained bad posture is detected. Moreover, the position of the patient (sitting, standing, or walking) can be determined by analysis of the weight sensor data and is visualized in real time, along with back posture, at the central station by means of a graphical animation. Finally, data from all sensors is stored in a database to enable post-processing and data analysis, and a summary report of daily posture and physical activity is sent to the user's email. The use of centralized processing allows for high-performance data analysis and storage at the central station, which enables tracking of the individual's progress. We demonstrate the effectiveness of our system in simultaneously monitoring posture and position by testing in numerous situations.

    Reranking with Linguistic and Semantic Features for Arabic Optical Character Recognition

    Optical Character Recognition (OCR) systems for Arabic rely on information contained in the scanned images to recognize sequences of characters and on language models to emphasize fluency. In this paper we incorporate linguistically and semantically motivated features into an existing OCR system. To do so, we follow an n-best list reranking approach that exploits recent advances in learning-to-rank techniques. We achieve 10.1% and 11.4% reductions in recognition word error rate (WER) relative to a standard baseline system on typewritten and handwritten Arabic, respectively.
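
As a rough illustration of n-best list reranking and of what a relative WER reduction means, the sketch below rescores candidate transcriptions with a weighted sum of features. The linear scorer, the feature names, and both function names are assumptions made for this example; the abstract does not specify which learning-to-rank model or features the paper uses.

```python
from typing import Dict, List, Tuple

# A hypothesis is a candidate transcription plus its feature values.
Hypothesis = Tuple[str, Dict[str, float]]


def rerank(nbest: List[Hypothesis], weights: Dict[str, float]) -> List[str]:
    """Order hypotheses by a weighted feature sum, best first.

    A linear scorer stands in for a trained learning-to-rank model.
    """
    def score(feats: Dict[str, float]) -> float:
        return sum(weights.get(name, 0.0) * value for name, value in feats.items())

    ranked = sorted(nbest, key=lambda hyp: score(hyp[1]), reverse=True)
    return [text for text, _ in ranked]


def relative_reduction(baseline_wer: float, new_wer: float) -> float:
    """Relative WER reduction in percent with respect to the baseline."""
    return 100.0 * (baseline_wer - new_wer) / baseline_wer
```

For instance, with a hypothetical baseline WER of 20.0 and a reranked WER of 18.0, `relative_reduction(20.0, 18.0)` gives a 10% relative reduction, although the absolute drop is only 2 points.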

    Large Scale Arabic Error Annotation: Guidelines and Framework

    We present annotation guidelines and a web-based annotation framework developed as part of an effort to create a manually annotated Arabic corpus of errors and corrections for various text types. Such a corpus will be invaluable for developing Arabic error correction tools, both for training models and as a gold standard for evaluating error correction algorithms. We summarize the guidelines we created. We also describe issues encountered during the training of the annotators, as well as Arabic-specific problems that arose during the annotation process. Finally, we present the annotation tool that was developed as part of this project, the annotation pipeline, and the quality of the resulting annotations.

    Exploring Differences in the Impact of Users' Traces on Arabic and English Facebook Search

    This paper proposes an approach to Facebook search in Arabic and English, which exploits several user traces (e.g. comments, shares, reactions) left on Facebook posts to estimate their social importance. Our goal is to show how these social traces (signals) can play a vital role in improving Arabic and English Facebook search. Firstly, we identify the polarities (positive or negative) carried by the textual signals (e.g. comments) and non-textual ones (e.g. the reactions love and sad) for a given Facebook post. To this end, the polarity of each comment expressed in Arabic or in English on a given Facebook post is estimated using a neural sentiment model. Secondly, we group signals according to their complementarity using attribute (feature) selection algorithms. Thirdly, we apply learning-to-rank (LTR) algorithms to re-rank Facebook search results based on the selected groups of signals. Finally, experiments are carried out on 13,500 Facebook posts, collected from 45 topics, for each of the two languages. Experimental results reveal that Random Forests was the most effective LTR approach for this task in both languages. However, the most appropriate feature selection algorithms are ReliefFAttributeEval and InfoGainAttributeEval for the Arabic and English Facebook search tasks, respectively.