594 research outputs found

    Proceedings of the First Workshop on Computing News Storylines (CNewsStory 2015)

    This volume contains the proceedings of the 1st Workshop on Computing News Storylines (CNewsStory 2015), held in conjunction with the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (ACL-IJCNLP 2015) at the China National Convention Center in Beijing, on July 31st, 2015. Narratives are at the heart of information sharing. Ever since people began to share their experiences, they have connected them to form narratives. The study of storytelling and the field of literary theory called narratology have developed complex frameworks and models related to various aspects of narrative such as plot structures, narrative embeddings, characters’ perspectives, reader response, point of view, narrative voice, narrative goals, and many others. These notions from narratology have been applied mainly in Artificial Intelligence and to model formal semantic approaches to narratives (e.g. Plot Units developed by Lehnert (1981)). In recent years, computational narratology has become established as an autonomous field of study and research. Narrative has been the focus of a number of workshops and conferences (AAAI Symposia, Interactive Storytelling Conference (ICIDS), Computational Models of Narrative). Furthermore, reference annotation schemes for narratives have been proposed (NarrativeML by Mani (2013)). The workshop aimed at bringing together researchers from different communities working on representing and extracting narrative structures in news, a text genre which is widely used in NLP but which has received little attention with respect to narrative structure, representation and analysis. Currently, advances in NLP technology have made it feasible to look beyond scenario-driven, atomic extraction of events from single documents and work towards extracting story structures from multiple documents, while these documents are published over time as news streams.
Policy makers, NGOs, information specialists (such as journalists and librarians) and others are increasingly in need of tools that support them in finding salient stories in large amounts of information, so that they can implement policies more effectively, monitor the actions of “big players” in society and check facts. Their tasks often revolve around reconstructing cases either with respect to specific entities (e.g. persons or organizations) or events (e.g. Hurricane Katrina). Storylines represent explanatory schemas that enable us to make better selections of relevant information but also projections to the future. They offer valuable potential for exploiting news data in an innovative way. JRC.G.2-Global security and crisis managemen

    Twitter Analysis to Predict the Satisfaction of Saudi Telecommunication Companies’ Customers

    The flexibility in mobile communications allows customers to quickly switch from one service provider to another, making customer churn one of the most critical challenges for the data and voice telecommunication service industry. In 2019, the percentage of post-paid telecommunication customers in Saudi Arabia decreased; this represents a great deal of customer dissatisfaction and subsequent corporate fiscal losses. Many studies correlate customer satisfaction with customer churn. Telecom companies have depended on historical customer data to measure customer churn. However, historical data does not reveal current customer satisfaction or the future likelihood of switching between telecom companies. Current methods of analysing churn rates are inadequate and face several issues, particularly in the Saudi market. This research was conducted to understand the relationship between customer satisfaction and customer churn, and how to use social media mining to measure customer satisfaction and predict customer churn. This research conducted a systematic review to address the problems of churn prediction models and their relation to Arabic Sentiment Analysis (ASA). The findings show that the current churn models lack integration of structural data frameworks with real-time analytics to target customers in real time. In addition, the findings show that the specific issues in the existing churn prediction models in Saudi Arabia relate to the Arabic language itself, its complexity, and lack of resources. As a result, I have constructed the first gold standard corpus of Saudi tweets related to telecom companies, comprising 20,000 manually annotated tweets. It has been generated as a dialect sentiment lexicon extracted from a larger Twitter dataset collected by me to capture text characteristics in social media. I developed a new ASA prediction model for telecommunication that fills the detected gaps in the ASA literature and fits the telecommunication field.
The proposed model proved its effectiveness for Arabic sentiment analysis and churn prediction. This is the first work to use Twitter mining to predict potential customer loss (churn) in Saudi telecom companies. Different fields, such as education, have different features, which makes applying the proposed model to them interesting because it is based on text mining.

    Knowledge Expansion of a Statistical Machine Translation System using Morphological Resources

    The translation capability of a Phrase-Based Statistical Machine Translation (PBSMT) system depends mostly on parallel data, and phrases that are not present in the training data are not correctly translated. This paper describes a method that efficiently expands the existing knowledge of a PBSMT system without adding more parallel data, using external morphological resources instead. A set of new phrase associations is added to the translation and reordering models; each of them corresponds to a morphological variation of the source phrase, the target phrase, or both phrases of an existing association. New associations are generated using a string similarity score based on morphosyntactic information. We tested our approach on En-Fr and Fr-En translations, and the results showed improved performance in terms of automatic scores (BLEU and Meteor) and a reduction in out-of-vocabulary (OOV) words. We believe that our knowledge expansion framework is generic and could be used to add different types of information to the model. JRC.G.2-Global security and crisis managemen

    Automatic extraction of robotic surgery actions from text and kinematic data

    The latest generation of robotic systems is becoming increasingly autonomous due to technological advancements and artificial intelligence. The medical field, particularly surgery, is also interested in these technologies because automation would benefit surgeons and patients. While the research community is active in this direction, commercial surgical robots do not currently operate autonomously due to the risks involved in dealing with human patients: it is still considered safer to rely on human surgeons' intelligence for decision-making. This means that robots must possess human-like intelligence, including various reasoning capabilities and extensive knowledge, to become more autonomous and credible. Indeed, as demonstrated by current research in the field, one of the most critical aspects in developing autonomous systems is the acquisition and management of knowledge. In particular, a surgical robot must base its actions on solid procedural surgical knowledge to operate autonomously, safely, and expertly. This thesis investigates different possibilities for automatically extracting and managing knowledge from text and kinematic data. In the first part, we investigated the possibility of extracting procedural surgical knowledge from real intervention descriptions available in textbooks and academic papers in the robotic-surgical domain, by exploiting Transformer-based pre-trained language models. In particular, we released SurgicBERTa, a RoBERTa-based pre-trained language model for surgical literature understanding. It has been used to detect procedural sentences in books and extract procedural elements from them. Then, with some use cases, we explored the possibilities of translating written instructions into logical rules usable for robotic planning. Since not all the knowledge required for automating a procedure is written in texts, we introduced the concept of surgical commonsense, showing how it relates to different autonomy levels.
In the second part of the thesis, we analyzed surgical procedures from a lower granularity level, showing how each surgical gesture is associated with a given combination of kinematic data.

    Discovering knowledge structures in mind maps of mental health risks

    This thesis addressed the problem of risk analysis in mental healthcare, with respect to the GRiST project at Aston University. That project provides a risk-screening tool based on the knowledge of 46 experts, captured as mind maps that describe relationships between risks and patterns of behavioural cues. Mind mapping, though, fails to impose control over content, and is not considered to formally represent knowledge. In contrast, this thesis treated GRiST's mind maps as a rich knowledge base in need of refinement; that process drew on existing techniques for designing databases and knowledge bases. Identifying well-defined mind map concepts, though, was hindered by spelling mistakes, and by ambiguity and lack of coverage in the tools used for researching words. A novel use of the Edit Distance overcame those problems, by assessing similarities between mind map texts, and between spelling mistakes and suggested corrections. That algorithm further identified stems, the shortest text string found in related word-forms. As opposed to existing approaches’ reliance on built-in linguistic knowledge, this thesis devised a novel, more flexible text-based technique. An additional tool, Correspondence Analysis, found patterns in word usage that allowed machines to determine likely intended meanings for ambiguous words. Correspondence Analysis further produced clusters of related concepts, which in turn drove the automatic generation of novel mind maps. Such maps underpinned adjuncts to the mind mapping software used by GRiST; one such new facility generated novel mind maps, to reflect the collected expert knowledge on any specified concept. Mind maps from GRiST are stored as XML, which suggested storing them in an XML database. In fact, the entire approach here is “XML-centric”, in that all stages rely on XML as far as possible. An XML-based query language allows users to retrieve information from the mind map knowledge base.
The approach, it was concluded, will prove valuable to mind mapping in general, and to detecting patterns in any type of digital information.

    Finding the online cry for help : automatic text classification for suicide prevention

    Successful prevention of suicide, a serious public health concern worldwide, hinges on the adequate detection of suicide risk. While online platforms are increasingly used for expressing suicidal thoughts, manually monitoring for such signals of distress is practically infeasible, given the information overload suicide prevention workers are confronted with. In this thesis, the automatic detection of suicide-related messages is studied. It presents the first classification-based approach to online suicidality detection, and focuses on Dutch user-generated content. In order to evaluate the viability of such a machine learning approach, we developed a gold standard corpus, consisting of message board and blog posts. These were manually labeled according to a newly developed annotation scheme, grounded in suicide prevention practice. The scheme provides for the annotation of a post's relevance to suicide, and the subject and severity of a suicide threat, if any. This allowed us to derive two tasks: the detection of suicide-related posts, and of severe, high-risk content. In a series of experiments, we sought to determine how well these tasks can be carried out automatically, and which information sources and techniques contribute to classification performance. The experimental results show that both types of messages can be detected with high precision. Therefore, the amount of noise generated by the system is minimal, even on very large datasets, making it usable in a real-world prevention setting. Recall is high for the relevance task, but at around 60%, it is considerably lower for severity. This is mainly attributable to implicit references to suicide, which often go undetected. We found a variety of information sources to be informative for both tasks, including token and character ngram bags-of-words, features based on LSA topic models, polarity lexicons and named entity recognition, and suicide-related terms extracted from a background corpus. 
To improve classification performance, the models were optimized using feature selection, hyperparameter optimization, or a combination of both. A distributed genetic algorithm approach proved successful in finding good solutions for this complex search problem, and resulted in more robust models. Experiments with cascaded classification of the severity task did not reveal performance benefits over direct classification (in terms of F1-score), but its structure allows the use of slower, memory-based learning algorithms that considerably improved recall. At the end of this thesis, we address a problem typical of user-generated content: noise in the form of misspellings, phonetic transcriptions and other deviations from the linguistic norm. We developed an automatic text normalization system, using a cascaded statistical machine translation approach, and applied it to normalize the data for the suicidality detection tasks. Subsequent experiments revealed that, compared to the original data, normalized data resulted in fewer and more informative features, and improved classification performance. This extrinsic evaluation demonstrates the utility of automatic normalization for suicidality detection, and more generally, text classification on user-generated content.