
AXEL: A framework to deal with ambiguity in three-noun compounds

Author: Matadamas Martinez Jorge
Publication venue: Brunel University, School of Information Systems, Computing and Mathematics
Publication date: 01/01/2010

This thesis was submitted for the degree of Doctor of Philosophy and awarded by Brunel University, 6/12/2010. Cognitive Linguistics has been widely used to deal with the ambiguity generated by words in combination. Although this domain offers many solutions to this challenge, not all of them can be implemented in a computational environment. The Dynamic Construal of Meaning framework is argued to have this ability because it describes an intrinsic degree of association of meanings, which in turn can be translated into computer programs. A limitation of this computational approach, however, has been the lack of syntactic parameters. This research argues that this limitation can be overcome with the aid of the Generative Lexicon Theory (GLT). Specifically, this dissertation formulated possible means to marry the GLT and Cognitive Linguistics in a novel rapprochement between the two. This bond between opposing theories provided the means to design a computational template (the AXEL system) that realises syntax and semantics at the software level. An instance of the AXEL system was created using a Design Research approach, with planned iterations during development to improve the artefact's performance. These iterations improved how well the system accounted for the degree of association of meanings in three-noun compounds. This dissertation delivered three major contributions on the brink of a so-called turning point in Computational Linguistics (CL). First, the AXEL system was used to disclose hidden lexical patterns of ambiguity. These patterns are difficult, if not impossible, to identify without automatic techniques. This research claimed that these patterns can help linguists review lexical knowledge from a software-based viewpoint. Building on this linguistic awareness, the second result advocated the adoption of improved resources by reducing the electronic storage space required by Sense Enumerative Lexicons (SELs).
The AXEL system supported the generation of “at the moment of use” interpretations, reducing the space needed for lexical storage. Finally, this research introduced a subsystem of metrics to characterise the degree of association of ambiguous three-noun compounds, enabling ranking methods. Weighting methods provided mechanisms for classifying meanings towards Word Sense Disambiguation (WSD). Overall, these results attempted to tackle difficulties in understanding studies of Lexical Semantics via software tools.

Brunel University Research Archive
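The structural ambiguity of a three-noun compound can be sketched in Python. This is a minimal illustration, not the AXEL system: it ranks the two possible bracketings with the adjacency model, a standard baseline for noun-compound bracketing, and the `ASSOCIATION` scores and function names are hypothetical. Both readings are generated when the compound is encountered rather than stored as separate lexicon entries, loosely mirroring the “at the moment of use” idea.

```python
from typing import Dict, List, Tuple

# Hypothetical modifier-head association scores; a real system would derive
# these from corpus statistics or a lexical resource.
ASSOCIATION: Dict[Tuple[str, str], float] = {
    ("computer", "science"): 0.9,
    ("science", "department"): 0.4,
}


def association(modifier: str, head: str) -> float:
    """Return the association score for a noun pair, or 0.0 if it is unknown."""
    return ASSOCIATION.get((modifier, head), 0.0)


def rank_bracketings(n1: str, n2: str, n3: str) -> List[Tuple[str, float]]:
    """Generate both structural readings of a three-noun compound on demand
    and rank them with the adjacency model (highest score first)."""
    readings = [
        # Left-branching [[n1 n2] n3]: scored by how strongly n1 and n2 associate.
        (f"[[{n1} {n2}] {n3}]", association(n1, n2)),
        # Right-branching [n1 [n2 n3]]: scored by how strongly n2 and n3 associate.
        (f"[{n1} [{n2} {n3}]]", association(n2, n3)),
    ]
    return sorted(readings, key=lambda reading: reading[1], reverse=True)


if __name__ == "__main__":
    for reading, score in rank_bracketings("computer", "science", "department"):
        print(f"{score:.2f}  {reading}")
```

Running it lists the left-branching reading `[[computer science] department]` first, with a score of 0.90.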

Novel Natural Language Processing Models for Medical Terms and Symptoms Detection in Twitter

Author: Golrooy Motlagh Farahnaz
Publication venue: CORE Scholar
Publication date: 01/01/2022

This dissertation focuses on disambiguating language use on Twitter about drug use, drug consumption types, and drug legalization, using ontology-enhanced approaches and data-driven prediction analysis built on novel NLP models. Three technical aims comprise this work: (a) leveraging pattern recognition techniques to improve the quality and quantity of crawled Twitter posts related to drug abuse; (b) using an expert-curated, domain-specific DsOn ontology model that improves knowledge extraction in the form of drug-to-symptom and drug-to-side-effect relations; and (c) modeling the prediction of public perception of drug legalization and the sentiment of drug consumption on Twitter. We collected 7.5 million data points from August 2015 to March 2016. This work leveraged a longstanding, multidisciplinary collaboration between researchers at the Population & Center for Interventions, Treatment, and Addictions Research (CITAR) in the Boonshoft School of Medicine and the Department of Computer Science and Engineering. We also aimed to develop and deploy an innovative prediction analysis algorithm for eDrugTrends, capable of semi-automated processing of Twitter data to identify emerging trends in cannabis and synthetic cannabinoid use in the U.S. In addition, the study included a fourth aim: a use case study analyzing tweet content about people living with HIV (PLWH), medication patterns, and keyword trends in Twitter-based, user-generated content. This case study leveraged a multidisciplinary collaboration between researchers at the Departments of Family Medicine and Population and Public Health Sciences at Wright State University’s Boonshoft School of Medicine and the Department of Computer Science and Engineering. We collected 65K data points on the U.S.-based HIV knowledge domain from February 2022 to July 2022 via the Twitter streaming API.
For knowledge discovery, domain knowledge plays a significant role in powering many intelligent frameworks, such as data analysis, information retrieval, and pattern recognition. Recent NLP and semantic web advances have contributed to extending the domain knowledge of medical terms. These techniques require a bag of seeds for medical knowledge discovery, and poorly chosen initial seeds collect irrelevant, noisy data that degrades prediction performance. The methodology of aim one, the PatRDis classifier, was applied to the problems of noise and ambiguity, and that of aim two, the DsOn ontology model, was applied to semantic parsing and to enriching online medical data in order to classify tweets on HIV care medication engagement and symptom detection. By applying the methodology of aims 2 and 3, we solved the challenges of ambiguity and explored more than 1500 cannabis and cannabinoid slang terms. Sentiment measured before the election showed, for example, that states with high levels of positive sentiment were engaged in advancing their legalization status. We also used the same dataset for prediction analysis of marijuana legalization and consumption trends (Ohio public polling data). In Aim 4, we ran three experiments, with ensemble-learning, RNN-LSTM, and NNBERT-CNN models, and applied five techniques to identify tweets associated with medication adherence and HIV symptoms. The long short-term memory (LSTM) model and the CNN for sentence classification produce accurate results and have recently been used in NLP tasks. CNN models use convolutional layers and max pooling or max-over-time pooling layers to extract higher-level features, while LSTM models can capture long-term dependencies between word sequences and are therefore well suited to text classification. We propose attention-based RNN, MLP, and CNN deep learning models that capitalize on the advantages of LSTM and BERT techniques with an additional attention mechanism.
We trained the model using NNBERT to evaluate the proposed model's performance. The test results showed that the proposed models produce more accurate classification results, and that BERT obtained higher recall and F1 scores than the MLP or LSTM models. In addition, we developed an intelligent tool capable of automated processing of Twitter data to identify emerging trends in HIV disease, HIV symptoms, and medication adherence.

CORE
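Max-over-time pooling, named in the abstract as a CNN building block for sentence classification, can be shown with a single convolutional filter. The sketch below is a toy, pure-Python illustration with made-up embeddings and weights; it is not the thesis's attention-based model, and the function names are hypothetical.

```python
from typing import List

Vector = List[float]


def convolve(sentence: List[Vector], kernel: List[Vector], bias: float = 0.0) -> List[float]:
    """Slide one filter spanning len(kernel) consecutive words over the sentence
    (no padding) and return one ReLU-activated feature per window position."""
    width = len(kernel)
    features: List[float] = []
    for start in range(len(sentence) - width + 1):
        total = bias
        for offset, weights in enumerate(kernel):
            word = sentence[start + offset]
            total += sum(w * x for w, x in zip(weights, word))
        features.append(max(0.0, total))  # ReLU activation
    return features


def max_over_time(features: List[float]) -> float:
    """Keep only the filter's strongest response across all positions, so every
    sentence yields one value per filter regardless of its length. Returns 0.0
    when the sentence is shorter than the filter (no windows)."""
    return max(features, default=0.0)


if __name__ == "__main__":
    # Four words with toy 2-dimensional embeddings, and one filter of width 2.
    sentence = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]
    kernel = [[1.0, 0.0], [0.0, 1.0]]
    features = convolve(sentence, kernel)
    print(features, max_over_time(features))  # [2.0, 1.0, 1.0] 2.0
```

A real model applies many such filters of several widths and concatenates their pooled values into a fixed-length sentence vector before classification.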

Nowcasting user behaviour with social media and smart devices on a longitudinal basis: from macro- to micro-level modelling

Author: Tsakalidis Adam
Publication date: 01/09/2018

The adoption of social media and smart devices by millions of users worldwide over the last decade has created an unprecedented opportunity for NLP and the social sciences. Users publish their thoughts and opinions on everyday issues through social media platforms, while their smart devices record their digital traces. Mining these rich resources offers new opportunities for sensing real-world events and indices (e.g., political preference, mental health indices) longitudinally, at either the macro (population) or the micro (user) level. The current project aims to develop approaches to “nowcast” (predict the current state of) such indices at both levels of granularity. First, we build natural language resources for the static tasks of sentiment analysis, emotion disclosure and sarcasm detection over user-generated content. These are important for large-scale opinion monitoring. Second, we propose a general approach that leverages textual data derived from generic social media streams to nowcast political indices at the macro level. Third, we leverage temporally sensitive and asynchronous information to nowcast the political stance of social media users at the micro level using multiple kernel learning. We then focus further on micro-level modelling, accounting for heterogeneous data sources, such as information derived from users' smartphones, SMS and social media messages, to nowcast time-varying mental health indices of a small cohort of users on a longitudinal basis. Finally, we present the challenges faced when applying such micro-level approaches in a real-world setting and propose directions for future research.

Warwick Research Archives Portal Repository
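Nowcasting, estimating the current value of an index from signals available today, can be sketched with the simplest possible model: an ordinary least-squares line from a daily text-derived signal to an index such as a poll result. This is an assumed baseline for illustration only; the thesis's own macro-level approach and its multiple kernel learning are not reproduced, and the function names and data are hypothetical.

```python
from typing import List, Tuple


def fit_line(xs: List[float], ys: List[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit of y = a + b * x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("the text signal must vary across the training days")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


def nowcast(past_signal: List[float], past_index: List[float], today_signal: float) -> float:
    """Estimate today's index value from today's text signal, using a line fitted
    to earlier days for which the index (e.g. a poll result) is already known."""
    a, b = fit_line(past_signal, past_index)
    return a + b * today_signal


if __name__ == "__main__":
    # Hypothetical daily share of supportive posts and matching poll values (%).
    signal = [0.2, 0.4, 0.6]
    poll = [30.0, 40.0, 50.0]
    print(round(nowcast(signal, poll, 0.5), 2))  # 45.0
```

The point of the sketch is the setup rather than the model: the index is only observed occasionally, while the text signal is available every day, so the fitted mapping fills in the days between observations.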

Real-time context-based sound and color extraction from text

Author: Peng Timothy, M. Eng. Massachusetts Institute of Technology
Publication venue: Massachusetts Institute of Technology
Publication date: 01/01/2013

Thesis: M. Eng., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 2013. Cataloged from PDF version of thesis. Includes bibliographical references (page 69). Narratarium is a system that uses English text or voice input, provided either in real time or offline, to generate context-specific colors and sound effects. It accomplishes this by employing a variety of machine learning approaches, including commonsense reasoning and natural language processing. It can be highly customized to prioritize different performance metrics, most importantly accuracy and latency, and can be used with any tagged sound corpus. The final product allows users to tell a story in an immersive environment that augments the storytelling experience with thematic colors and background sounds. In this thesis, we present the back-end logic that generates best guesses for contextual colors and sounds from text input. We evaluate the performance of these algorithms under different configurations and demonstrate that performance is acceptable for realistic user scenarios. We also discuss Narratarium's overall design. by Timothy Peng. M. Eng.

DSpace@MIT

Sharing is Caring: Using Open Data To Improve Targeting Policies

Author: Faska Matthias
Rößler Jannik
Schoder Detlef
Tilly Roman
Publication venue: AIS Electronic Library (AISeL)
Publication date: 18/06/2022

When it comes to predictive power, companies in a variety of sectors depend on having sufficient data to develop and deploy business analytics applications, for example, to acquire new customers. While there is a vast literature on enriching internal data sets with external data sources, it is still largely unclear whether and how open data can be used to enrich internal data sets to improve business analytics. We choose a particular business analytics problem, designing targeting policies to acquire new customers, to investigate how an internal data set of a German grocery supplier can be enriched with open data to improve targeting policies. Using the enriched data set, we can improve the response rate of several well-established targeting policies by more than 30% in back-testing. Based on these results, we encourage firms and researchers to use, leverage, and share open data to enhance business analytics.

AIS Electronic Library (AISeL)
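The back-testing comparison can be sketched as follows, assuming a top-k targeting policy and hypothetical model scores. A policy selects the highest-scoring prospects, its response rate is the share of those prospects who responded historically, and the improvement is a relative lift (a lift of 0.3 means a 30% higher response rate). The function names and data are illustrative assumptions, not the authors' setup.

```python
from typing import Dict, List, Set


def top_k(scores: Dict[str, float], k: int) -> List[str]:
    """Targeting policy: select the k prospects with the highest model scores."""
    return sorted(scores, key=scores.__getitem__, reverse=True)[:k]


def response_rate(targeted: List[str], responders: Set[str]) -> float:
    """Share of targeted prospects who actually responded (historical outcome)."""
    return sum(1 for prospect in targeted if prospect in responders) / len(targeted)


def relative_lift(baseline_rate: float, enriched_rate: float) -> float:
    """Relative improvement of the enriched policy over the baseline, e.g. 0.3 = +30%."""
    return (enriched_rate - baseline_rate) / baseline_rate


if __name__ == "__main__":
    # Hypothetical back-test: prospects a..h, of whom a, c and e responded.
    responders = {"a", "c", "e"}
    internal_only = {"a": 0.9, "b": 0.8, "c": 0.1, "d": 0.7,
                     "e": 0.2, "f": 0.6, "g": 0.3, "h": 0.05}
    with_open_data = {"a": 0.9, "b": 0.6, "c": 0.8, "d": 0.5,
                      "e": 0.3, "f": 0.2, "g": 0.1, "h": 0.05}
    base = response_rate(top_k(internal_only, 4), responders)
    enriched = response_rate(top_k(with_open_data, 4), responders)
    print(base, enriched, relative_lift(base, enriched))  # 0.25 0.5 1.0
```

Keeping the budget k fixed for both policies is what makes the response rates comparable: only the ranking, and therefore the data behind the scores, changes.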

INVESTIGATING CRIME-TO-TWITTER RELATIONSHIPS IN URBAN ENVIRONMENTS - FACILITATING A VIRTUAL NEIGHBORHOOD WATCH

Author: Bendler Johannes
Brandt Tobias
Neumann Dirk
Wagner Sebastian
Publication venue: AIS Electronic Library (AISeL)
Publication date: 07/06/2014

Social networks offer vast potential for marketing agencies, as members freely provide private information, for instance on their current situation, opinions, tastes, and feelings. The use of social networks to feed into crime platforms has been acknowledged as a way to build a kind of virtual neighborhood watch. Previous attempts to automatically connect news from social networks with crime platforms have concentrated on documenting past events, but neglected the opportunity to use Twitter data as a decision support system to detect future crimes. In this work, we attempt to unleash the wisdom of crowds materialized in tweets from Twitter. This requires looking at tweets that have been sent within the vicinity of each other. We correlate the aggregated tweet traffic with crime types. Crimes such as disturbing the peace or homicide appear to exhibit different tweet patterns before the crime is committed. We show that these tweet patterns can strengthen the explanation of criminal activity in urban areas. Going beyond purely explanatory approaches, we use predictive analytics to provide evidence that Twitter data can improve the prediction of crimes.

AIS Electronic Library (AISeL)
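Aggregating tweets by location and correlating them with crimes can be sketched as below. This assumes a simple latitude/longitude grid (a default cell size of 0.01 degrees, an arbitrary choice) and a Pearson correlation over per-cell counts; it ignores the temporal patterns before a crime that the study examines, and all names are hypothetical.

```python
import math
from collections import Counter
from typing import List, Tuple

Point = Tuple[float, float]  # (latitude, longitude)
Cell = Tuple[int, int]


def to_cell(point: Point, cell_size: float) -> Cell:
    """Map a coordinate to the grid cell (cell_size degrees wide) containing it."""
    lat, lon = point
    return (math.floor(lat / cell_size), math.floor(lon / cell_size))


def counts_per_cell(points: List[Point], cell_size: float) -> Counter:
    """Count how many events fall into each grid cell."""
    return Counter(to_cell(p, cell_size) for p in points)


def pearson(xs: List[float], ys: List[float]) -> float:
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("correlation is undefined for constant counts")
    return cov / (sd_x * sd_y)


def tweet_crime_correlation(tweets: List[Point], crimes: List[Point],
                            cell_size: float = 0.01) -> float:
    """Aggregate tweets and crimes of one type onto the same grid and correlate
    their per-cell counts; only cells with at least one tweet or crime are used."""
    tweet_counts = counts_per_cell(tweets, cell_size)
    crime_counts = counts_per_cell(crimes, cell_size)
    cells = sorted(set(tweet_counts) | set(crime_counts))
    # Counter returns 0 for cells that contain tweets but no crimes (or vice versa).
    return pearson([tweet_counts[c] for c in cells], [crime_counts[c] for c in cells])
```

In practice one would call `tweet_crime_correlation` once per crime type, since the study reports different tweet patterns for different types of crime.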