Search CORE

85 research outputs found

A complete online SVM and case base reasoning in pipe defect dection with multisensory inspection gauge

Author: Le D. Van-Khoa
Publication venue
Publication date: 24/07/2022
Field of study

An in-line inspection (ILI) robot has been considered an inevitable requirement to perform non-destructive testing methods efficiently and economically. The detection of flaws that could lead to leakages in buried concrete pipes has been a great concern to the oil and gas industry and water resource-based industry. The major problem is the difficulty in modeling the detection of cracks due to their irregularity and randomness that cannot be easily detected. Consequently, the use of an advanced modality system has emerged. Common defects detection systems favor non-destructive testing methods, which utilize specific sensory data. Only a few systems focus on fusing different types of sensory data. Moreover, the decision mechanism in this system required heavy-power consumption sensors with the configuration from the expertise domain. In addition, the outcome of the decision system is a consequence of rule-based settings rather than a mixture of learned features. This work covers the study of defect detection of non-destructive testing methods using fusion inspection sensors, light detection and ranging (LiDAR), and Optic sensors. The studies on ILI robots are reviewed to construct an efficient gauge. The prototype robot has been designed and successfully operated in a lab-scale environment. Ultimately, the study proposed a replacement for the standard expert system - in the branch of the CBR system, which is the crucial contribution of this thesis. Recent developments in Case-based Reasoning systems (CBR) have led to an interest in favoring machine learning (ML) approaches to replace traditional weighted distance methods. However, valuable information obtained through a training process was relinquished as transferring to other phases. As a result, the complete SVM-CBR system in this thesis concentrates on solving this gap by presenting an effective transferring mechanism from phase to phase. This thesis proposed a full pipeline integration of CBR using the kernel method designated with support vector machine. SVM technique is the primary classification engine for the combined sensory data. Since the system requires a learning SVM model to be invoked in every phase, the online learning mechanism is nominated to update the model when a new case adjoins effectively. The proposed full SVM-CBR integration has been successfully built into a pipe defect detection. The achieved result indicates a substantial improvement in transferring learning information accurately

Nottingham eTheses

Twitter Analysis to Predict the Satisfaction of Saudi Telecommunication Companies’ Customers

Author: ALMUQREN LATIFAH
Publication venue
Publication date: 01/01/2021
Field of study

The flexibility in mobile communications allows customers to quickly switch from one service provider to another, making customer churn one of the most critical challenges for the data and voice telecommunication service industry. In 2019, the percentage of post-paid telecommunication customers in Saudi Arabia decreased; this represents a great deal of customer dissatisfaction and subsequent corporate fiscal losses. Many studies correlate customer satisfaction with customer churn. The Telecom companies have depended on historical customer data to measure customer churn. However, historical data does not reveal current customer satisfaction or future likeliness to switch between telecom companies. Current methods of analysing churn rates are inadequate and faced some issues, particularly in the Saudi market. This research was conducted to realize the relationship between customer satisfaction and customer churn and how to use social media mining to measure customer satisfaction and predict customer churn. This research conducted a systematic review to address the churn prediction models problems and their relation to Arabic Sentiment Analysis. The findings show that the current churn models lack integrating structural data frameworks with real-time analytics to target customers in real-time. In addition, the findings show that the specific issues in the existing churn prediction models in Saudi Arabia relate to the Arabic language itself, its complexity, and lack of resources. As a result, I have constructed the first gold standard corpus of Saudi tweets related to telecom companies, comprising 20,000 manually annotated tweets. It has been generated as a dialect sentiment lexicon extracted from a larger Twitter dataset collected by me to capture text characteristics in social media. I developed a new ASA prediction model for telecommunication that fills the detected gaps in the ASA literature and fits the telecommunication field. The proposed model proved its effectiveness for Arabic sentiment analysis and churn prediction. This is the first work using Twitter mining to predict potential customer loss (churn) in Saudi telecom companies, which has not been attempted before. Different fields, such as education, have different features, making applying the proposed model is interesting because it based on text-mining

Durham e-Theses

"Is There Choice in Non-Native Voice?" Linguistic Feature Engineering and a Variationist Perspective in Automatic Native Language Identification

Author: Bich Serhiy
Publication venue: Universität Tübingen
Publication date: 01/01/2017
Field of study

Is it possible to infer the native language of an author from a non-native text? Can we perform this task fully automatically? The interest in answers to these questions led to the emergence of a research field called Native Language Identification (NLI) in the first decade of this century. The requirement to automatically identify a particular property based on some language data situates the task in the intersection between computer science and linguistics, or in the context of computational linguistics, which combines both disciplines. This thesis targets several relevant research questions in the context of NLI. In particular, what is the role of surface features and more abstract linguistic cues? How to combine different sets of features, and how to optimize the resulting large models? Do the findings generalize across different data sets? Can we benefit from considering the task in the light of the language variation theory? In order to approach these questions, we conduct a range of quantitative and qualitative explorations, employing different machine learning techniques. We show how linguistic insight can advance technology, and how technology can advance linguistic insight, constituting a fruitful and promising interplay

Publikationsserver der Universität Tübingen

Situation inference and context recognition for intelligent mobile sensing applications

Author: Liono J
Publication venue: RMIT University
Publication date
Field of study

The usage of smart devices is an integral element in our daily life. With the richness of data streaming from sensors embedded in these smart devices, the applications of ubiquitous computing are limitless for future intelligent systems. Situation inference is a non-trivial issue in the domain of ubiquitous computing research due to the challenges of mobile sensing in unrestricted environments. There are various advantages to having robust and intelligent situation inference from data streamed by mobile sensors. For instance, we would be able to gain a deeper understanding of human behaviours in certain situations via a mobile sensing paradigm. It can then be used to recommend resources or actions for enhanced cognitive augmentation, such as improved productivity and better human decision making. Sensor data can be streamed continuously from heterogeneous sources with different frequencies in a pervasive sensing environment (e.g., smart home). It is difficult and time-consuming to build a model that is capable of recognising multiple activities. These activities can be performed simultaneously with different granularities. We investigate the separability aspect of multiple activities in time-series data and develop OPTWIN as a technique to determine the optimal time window size to be used in a segmentation process. As a result, this novel technique reduces need for sensitivity analysis, which is an inherently time consuming task. To achieve an effective outcome, OPTWIN leverages multi-objective optimisation by minimising the impurity (the number of overlapped windows of human activity labels on one label space over time series data) while maximising class separability. The next issue is to effectively model and recognise multiple activities based on the user's contexts. Hence, an intelligent system should address the problem of multi-activity and context recognition prior to the situation inference process in mobile sensing applications. The performance of simultaneous recognition of human activities and contexts can be easily affected by the choices of modelling approaches to build an intelligent model. We investigate the associations of these activities and contexts at multiple levels of mobile sensing perspectives to reveal the dependency property in multi-context recognition problem. We design a Mobile Context Recognition System, which incorporates a Context-based Activity Recognition (CBAR) modelling approach to produce effective outcome from both multi-stage and multi-target inference processes to recognise human activities and their contexts simultaneously. Upon our empirical evaluation on real-world datasets, the CBAR modelling approach has significantly improved the overall accuracy of simultaneous inference on transportation mode and human activity of mobile users. The accuracy of activity and context recognition can also be influenced progressively by how reliable user annotations are. Essentially, reliable user annotation is required for activity and context recognition. These annotations are usually acquired during data capture in the world. We research the needs of reducing user burden effectively during mobile sensor data collection, through experience sampling of these annotations in-the-wild. To this end, we design CoAct-nnotate --- a technique that aims to improve the sampling of human activities and contexts by providing accurate annotation prediction and facilitates interactive user feedback acquisition for ubiquitous sensing. CoAct-nnotate incorporates a novel multi-view multi-instance learning mechanism to perform more accurate annotation prediction. It also includes a progressive learning process (i.e., model retraining based on co-training and active learning) to improve its predictive performance over time. Moving beyond context recognition of mobile users, human activities can be related to essential tasks that the users perform in daily life. Conversely, the boundaries between the types of tasks are inherently difficult to establish, as they can be defined differently from the individuals' perspectives. Consequently, we investigate the implication of contextual signals for user tasks in mobile sensing applications. To define the boundary of tasks and hence recognise them, we incorporate such situation inference process (i.e., task recognition) into the proposed Intelligent Task Recognition (ITR) framework to learn users' Cyber-Physical-Social activities from their mobile sensing data. By recognising the engaged tasks accurately at a given time via mobile sensing, an intelligent system can then offer proactive supports to its user to progress and complete their tasks. Finally, for robust and effective learning of mobile sensing data from heterogeneous sources (e.g., Internet-of-Things in a mobile crowdsensing scenario), we investigate the utility of sensor data in provisioning their storage and design QDaS --- an application agnostic framework for quality-driven data summarisation. This allows an effective data summarisation by performing density-based clustering on multivariate time series data from a selected source (i.e., data provider). Thus, the source selection process is determined by the measure of data quality. Nevertheless, this framework allows intelligent systems to retain comparable predictive results by its effective learning on the compact representations of mobile sensing data, while having a higher space saving ratio. This thesis contains novel contributions in terms of the techniques that can be employed for mobile situation inference and context recognition, especially in the domain of ubiquitous computing and intelligent assistive technologies. This research implements and extends the capabilities of machine learning techniques to solve real-world problems on multi-context recognition, mobile data summarisation and situation inference from mobile sensing. We firmly believe that the contributions in this research will help the future study to move forward in building more intelligent systems and applications

RMIT Research Repository

Intelligent System Based Assessments of Academic Journals

Author: Su Pan
Publication venue
Publication date: 27/08/2015
Field of study

Aberystwyth Research Portal

Natural Language Processing: Emerging Neural Approaches and Applications

Author
Publication venue: 'MDPI AG'
Publication date: 06/05/2022
Field of study

This Special Issue highlights the most recent research being carried out in the NLP field to discuss relative open issues, with a particular focus on both emerging approaches for language learning, understanding, production, and grounding interactively or autonomously from data in cognitive and neural systems, as well as on their potential or real applications in different domains

Directory of Open Access Books (DOAB)

Recommended from our members

The predictive power of stock micro-blogging sentiment in forecasting stock market behaviour

Author: Al Nasseri Alya Ali Mansoor
Publication venue: Brunel University London
Publication date: 01/01/2016
Field of study

This thesis was submitted for the award of Doctor of Philosophy and was awarded by Brunel University LondonOnline stock forums have become a vital investing platform on which to publish relevant and valuable user-generated content (UGC) data such as investment recommendations and other stock-related information that allow investors to view the opinions of a large number of users and share-trading ideas. This thesis applies methods from computational linguistics and text-mining techniques to analyse and extract, on a daily basis, sentiments from stock-related micro-blogging messages called “StockTwits”. The primary aim of this research is to provide an understanding of the predictive ability of stock micro-blogging sentiments to forecast future stock price behavioural movements by investigating the various roles played by investor sentiments in determining asset pricing on the stock market. The empirical analysis in this thesis consists of four main parts based on the predictive power and the role of investor sentiment in the stock market. The first part discusses the findings of the text-mining procedure for extracting and predicting sentiments from stock-related micro-blogging data. The purpose is to provide a comparative textual analysis of different machine learning algorithms for the purpose of selecting the most accurate text-mining techniques for predicting sentiment analysis on StockTwits through the provision of two different applications of feature selection, namely filter and wrapper approaches. The second part of the analysis focuses on investigating the predictive correlations between StockTwits features and the stock market indicators. It aims to examine the explanatory power of StockTwits variables in explaining the dynamic nature of different financial market indicators. The third part of the analysis investigates the role played by noise traders in determining asset prices. The aim is to show that stock returns, volatility and trading volumes are affected by investor sentiment; it also seeks to investigate whether changes in sentiment (bullish or bearish) will have different effects on stock market prices. The fourth part offers an in-depth analysis of some tweet-market relationships which represent an open problem in the empirical literature (e.g. sentiment-return relations and volume-disagreement relations). The results suggest that StockTwits sentiments exhibit explanatory power in explaining the dynamics of stock prices in the U.S. market. Taking different approaches by combining text-mining techniques with feature selection methods has proved successful in predicting StockTwits sentiments. The applications of the approach presented in this thesis offer real-time investment ideas that may provide investors and their peers with a decision support mechanism. Investor sentiment plays a critical role in determining asset prices in capital markets. Overall, the findings suggest that investor sentiment among noise traders is a priced factor. The findings confirm the existence of asymmetric spillover effects of bullish and bearish sentiments on the stock market. They also suggest that sentiment is a significant factor in explaining stock price behaviour in the capital market and imply the positive role of the stock market in the formation of investor sentiment in stock markets. Furthermore, the research findings demonstrate that disagreement is not only an important factor in determining trading volumes but it is also considered a very significant factor in influencing asset prices and returns in capital markets. Overall, the findings of the thesis provide empirical evidence that failure to consider the role of investor sentiment in traditional finance theory could lead to an imperfect picture when explaining the behaviour of stock prices in stock market

Brunel University Research Archive

Contributions on distance-based algorithms, multi-classifier construction and pairwise classification

Author: Mendialdua Beitia Iñigo
Publication venue
Publication date: 01/01/2015
Field of study

179 p.Aurkezten den ikerketa lan honetan saikapen atazak landu dira, non helburua,sailkapen gainbegiratuaren artearen-egoera aberastea izan den. Sailkapengainbegiratuaren zenbait estrategi analizatu dira, beraien ezaugarri etaahuleziak aztertuz. Beraz, ezaugarri positiboak mantenduz, ahuleziak hobetzekosaiakera egin da. Hau burutu ahal izateko, sailkapen gainbegiratuarenzenbait estrategi konbinatzeaz gain, zenbait bilaketa heuristiko ere erabili dira.Sailkapen gainbegiratuko 3 ikerketa lerro desberdinetan burutu dira ekarpenak.Aurkezten diren lehenengo proposamenak, K-NN algoritmoan zentratzendira, honen zenbait bertsio aurkezten direlarik. Ondoren sailkatzaileen konbinaketarekinerlazionatutako beste lan bat aurkezten da. Eta azkenik, binakakosailkapenaren zenbait estrategi berritzaile proposatzen dira. Ekarpenhauek aldizkari edo konferentzi internazionaletan publikatuak edo bidaliakizan dira.Buruturiko experimentuetan, proposatutako algoritmoak artearen-estatuanaurkituriko zenbait algoritmorekin konparatu dira, emaitza interesgarriak lortuaz.Honetaz gain, emaitza hauetatik ondorio esanguratsuak eskuratzeko asmoz,test estatistikoen erabilera ere burutu da

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Archivo Digital para la Docencia y la Investigación

Language modelling for clinical natural language understanding and generation

Author: Zhang Jingqing
Publication venue: Computing, Imperial College London
Publication date: 01/12/2022
Field of study

One of the long-standing objectives of Artificial Intelligence (AI) is to design and develop algorithms for social good including tackling public health challenges. In the era of digitisation, with an unprecedented amount of healthcare data being captured in digital form, the analysis of the healthcare data at scale can lead to better research of diseases, better monitoring patient conditions and more importantly improving patient outcomes. However, many AI-based analytic algorithms rely solely on structured healthcare data such as bedside measurements and test results which only account for 20% of all healthcare data, whereas the remaining 80% of healthcare data is unstructured including textual data such as clinical notes and discharge summaries which is still underexplored. Conventional Natural Language Processing (NLP) algorithms that are designed for clinical applications rely on the shallow matching, templates and non-contextualised word embeddings which lead to limited understanding of contextual semantics. Though recent advances in NLP algorithms have demonstrated promising performance on a variety of NLP tasks in the general domain with contextualised language models, most of these generic NLP algorithms struggle at specific clinical NLP tasks which require biomedical knowledge and reasoning. Besides, there is limited research to study generative NLP algorithms to generate clinical reports and summaries automatically by considering salient clinical information. This thesis aims to design and develop novel NLP algorithms especially clinical-driven contextualised language models to understand textual healthcare data and generate clinical narratives which can potentially support clinicians, medical scientists and patients. The first contribution of this thesis focuses on capturing phenotypic information of patients from clinical notes which is important to profile patient situation and improve patient outcomes. The thesis proposes a novel self-supervised language model, named Phenotypic Intelligence Extraction (PIE), to annotate phenotypes from clinical notes with the detection of contextual synonyms and the enhancement to reason with numerical values. The second contribution is to demonstrate the utility and benefits of using phenotypic features of patients in clinical use cases by predicting patient outcomes in Intensive Care Units (ICU) and identifying patients at risk of specific diseases with better accuracy and model interpretability. The third contribution is to propose generative models to generate clinical narratives to automate and accelerate the process of report writing and summarisation by clinicians. This thesis first proposes a novel summarisation language model named PEGASUS which surpasses or is on par with the state-of-the-art performance on 12 downstream datasets including biomedical literature from PubMed. PEGASUS is further extended to generate medical scientific documents from input tabular data.Open Acces

Spiral - Imperial College Digital Repository

Language and mind

Author
Publication venue: Hrvatsko društvo za primijenjenu lingvistiku
Publication date: 09/05/2018
Field of study

Repository of University of Primorska