145 research outputs found

Analyzing Granger causality in climate data with time series classification methods

Author: Decubber Stijn
Demuzere Matthias
Miralles Diego
Papagiannopoulou Christina
Verhoest Niko
Waegeman Willem
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2017

Attribution studies in climate science aim for scientifically ascertaining the influence of climatic variations on natural or anthropogenic factors. Many of those studies adopt the concept of Granger causality to infer statistical cause-effect relationships, while utilizing traditional autoregressive models. In this article, we investigate the potential of state-of-the-art time series classification techniques to enhance causal inference in climate science. We conduct a comparative experimental study of different types of algorithms on a large test suite that comprises a unique collection of datasets from the area of climate-vegetation dynamics. The results indicate that specialized time series classification methods are able to improve existing inference procedures. Substantial differences are observed among the methods that were tested
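The traditional autoregressive baseline mentioned in the abstract can be sketched as follows. This is a minimal illustration, not the authors' method: the function names (`solve`, `rss`, `granger_f`), the lag count, and the synthetic series are all assumptions made for the example. It compares an autoregressive model of `y` on its own lags against one that also includes lags of `x`, and reports the usual F statistic.

```python
import random

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x

def rss(X, y):
    """Residual sum of squares of the least-squares fit of y on the columns of X."""
    k = len(X[0])
    XtX = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    Xty = [sum(row[i] * t for row, t in zip(X, y)) for i in range(k)]
    beta = solve(XtX, Xty)
    return sum((t - sum(b * v for b, v in zip(beta, row))) ** 2 for row, t in zip(X, y))

def granger_f(x, y, lags=2):
    """F statistic for the hypothesis that x Granger-causes y (hypothetical helper)."""
    rows = range(lags, len(y))
    target = [y[t] for t in rows]
    # Restricted model: intercept plus lags of y only.
    restricted = [[1.0] + [y[t - k] for k in range(1, lags + 1)] for t in rows]
    # Full model: restricted regressors plus lags of x.
    full = [r + [x[t - k] for k in range(1, lags + 1)] for r, t in zip(restricted, rows)]
    rss_r, rss_f = rss(restricted, target), rss(full, target)
    dof = len(target) - (2 * lags + 1)
    return ((rss_r - rss_f) / lags) / (rss_f / dof)

# Synthetic data (an assumption for illustration): y is driven by the previous value of x.
random.seed(0)
x = [random.gauss(0, 1) for _ in range(300)]
y = [0.0]
for t in range(1, 300):
    y.append(0.5 * y[t - 1] + 0.8 * x[t - 1] + random.gauss(0, 0.1))

f_xy = granger_f(x, y)  # large: lags of x help predict y
f_yx = granger_f(y, x)  # small: lags of y do not help predict x
```

A large F statistic in one direction and a small one in the other is the pattern that Granger-style attribution studies look for; the article investigates whether time series classification methods can improve on this kind of linear baseline.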

Ghent University Academic Bibliography

Detection of Software Vulnerability Communication in Expert Social Media Channels: A Data-driven Approach

Author: Queiroz Andrei Lima
Publication venue: Dublin Institute of Technology
Publication date: 01/09/2020

Conceptually, a vulnerability is: A flaw or weakness in a system’s design, implementation, or operation and management that could be exploited to violate the system’s security policy. Some of these flaws can go undetected and be exploited for long periods of time after software release. Although some software providers are making efforts to avoid this situation, inevitably, users are still exposed to vulnerabilities that allow criminal hackers to take advantage. These vulnerabilities are constantly discussed in specialised forums on social media. Therefore, from a cyber security standpoint, the information found in these places can be used for countermeasures against malicious exploitation of software. However, manual inspection of the vast quantity of shared content in social media is impractical. For this reason, in this thesis, we analyse the real applicability of supervised classification models to automatically detect software vulnerability communication in expert social media channels. We cover the following three principal aspects: Firstly, we investigate the applicability of classification models in a range of 5 different datasets collected from 3 Internet Domains: Dark Web, Deep Web and Surface Web. Since supervised models require labelled data, we have provided a systematic labelling process using multiple annotators to guarantee accurate labels to carry out experiments. Using these datasets, we have investigated the classification models with different combinations of learning-based algorithms and traditional feature representations. Also, by oversampling the positive instances, we have achieved an increase of 5% in Positive Recall (on average) in these models. On top of that, we have applied Feature Reduction, Feature Extraction and Feature Selection techniques, which provided a reduction in the dimensionality of these models without damaging the accuracy, thus providing computationally efficient models.
Furthermore, in addition to traditional feature representations, we have investigated the performance of robust language models, such as Word Embedding (WEMB) and Sentence Embedding (SEMB), on the accuracy of classification models. Regarding WEMB, our experiment has shown that this model trained with a small security-vocabulary dataset provides comparable results with WEMB trained on a very large general-vocabulary dataset. Regarding the SEMB model, our experiment has shown that its use outperforms the WEMB model in detecting vulnerability communication, recording 8% of Avg. Class Accuracy and 74% of Positive Recall. In addition, we investigate two Deep Learning algorithms as classifiers, text CNN (Convolutional Neural Network) and RNN (Recurrent Neural Network)-based algorithms, which have improved our model, resulting in the best overall performance for our task

Arrow@TUDublin

Ordinal HyperPlane Loss

Author: Vanderheyden Bob
Publication venue: DigitalCommons@Kennesaw State University
Publication date: 01/01/2003

This research presents the development of a new framework for analyzing ordered class data, commonly called “ordinal class” data. The focus of the work is the development of classifiers (predictive models) that predict classes from available data. Ratings scales, medical classification scales, socio-economic scales, meaningful groupings of continuous data, facial emotional intensity and facial age estimation are examples of ordinal data for which data scientists may be asked to develop predictive classifiers. It is possible to treat ordinal classification like any other classification problem that has more than two classes. Specifying a model with this strategy does not fully utilize the ordering information of classes. Alternatively, the researcher may choose to treat the ordered classes as though they are continuous values. This strategy imposes a strong assumption that the real “distance” between two adjacent classes is equal to the distance between two other adjacent classes (e.g., a rating of ‘0’ versus ‘1,’ on an 11-point scale is the same distance as a ‘9’ versus a ‘10’). For Deep Neural Networks (DNNs), the problem of predicting k ordinal classes is typically addressed by performing k-1 binary classifications. These models may be estimated within a single DNN and require an evaluation strategy to determine the class prediction. Another common option is to treat ordinal classes as continuous values for regression and then adjust the cutoff points that represent class boundaries that differentiate one class from another. This research reviews a novel loss function called Ordinal Hyperplane Loss (OHPL) that is particularly designed for data with ordinal classes. OHPLnet has been demonstrated to be a significant advancement in predicting ordinal classes for industry standard structured datasets. The loss function also enables deep learning techniques to be applied to the ordinal classification problem of unstructured data. 
By minimizing OHPL, a deep neural network learns to map data to an optimal space in which the distance between points and their class centroids is minimized while a nontrivial ordering relationship among classes is maintained. The research reported in this document advances OHPL from a minimally viable loss function to a more complete deep learning methodology. New analysis strategies were developed and tested that improve model performance as well as algorithm consistency in developing classification models. In the applications chapters, a new algorithm variant is introduced that enables OHPLall to be used when large data records cause a severe limitation on batch size when developing a related Deep Neural Network
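The conventional approach described in the abstract, predicting k ordinal classes through k-1 binary classifications, can be sketched as below. This illustrates the baseline only, not OHPL itself; the helper names and the counting rule used to turn the binary outputs back into a class are assumptions chosen for the example (the abstract only says that some evaluation strategy is required).

```python
def encode_ordinal(label, num_classes):
    """Encode a class index as k-1 binary targets: target j is 1 when label > j."""
    return [1 if label > j else 0 for j in range(num_classes - 1)]

def decode_ordinal(probabilities, threshold=0.5):
    """Recover a class index by counting the 'label > j' outputs above the threshold.

    Counting is one possible evaluation strategy, assumed here for illustration.
    """
    return sum(1 for p in probabilities if p > threshold)

# Class 2 on a 5-point scale (classes 0..4) becomes four binary targets.
targets = encode_ordinal(2, 5)

# Hypothetical outputs of the four binary classifiers for one record.
predicted = decode_ordinal([0.9, 0.7, 0.2, 0.1])
```

Because each binary target asks whether the label exceeds a given cut point, the encoding preserves the class ordering without assuming equal distances between adjacent classes.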

Pacific McGeorge School of Law

DigitalCommons@Kennesaw State University

Scholarly Commons

User Attribute Inference via Mining User-Generated Data

Author: Ding Shichang
Publication date: 01/12/2020

Georg-August-University Göttingen

Customer churn prediction using composite deep learning technique

Author: Ahmad Hussain
Asghar Muhammad Usama
Asghar Muhammad Zubair
Khan Aurangzeb
Khattak Asad
Mehak Zartashia
Publication venue: ZU Scholars
Publication date: 01/12/2023

Customer churn, a phenomenon that causes large financial losses when customers leave a business, makes it difficult for modern organizations to retain customers. When dissatisfied customers find their present company's services inadequate, they frequently migrate to another service provider. Machine learning and deep learning (ML/DL) approaches have already been used to successfully identify customer churn. In some circumstances, however, ML/DL-based algorithms fall short of delivering promising results for detecting client churn. Previous research on estimating customer churn revealed unexpected forecasts when utilizing machine learning classifiers and traditional feature encoding methodologies. Deep neural networks were also used in these efforts to extract features without taking into account the sequence information. In view of these issues, the current study provides an effective method for predicting customer churn based on a hybrid deep learning model termed BiLSTM-CNN. The goal is to effectively estimate customer churn using benchmark data and increase the accuracy of the churn prediction process. The experimental results show that when trained, tested, and validated on the benchmark dataset, the proposed BiLSTM-CNN model attained a remarkable accuracy of 81%

ZU Scholars (Zayed University)

Tracking the Temporal-Evolution of Supernova Bubbles in Numerical Simulations

Author: Bunte Kerstin
Canducci Marco
De Rijcke Sven
Mastropietro Michele
Peletier Reynier
Taghribi Albolfazl
Tino Peter
Yin H.
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/11/2021

The study of low-dimensional, noisy manifolds embedded in a higher dimensional space has been extremely useful in many applications, from the chemical analysis of multi-phase flows to simulations of galactic mergers. Building a probabilistic model of the manifolds has helped in describing their essential properties and how they vary in space. However, when the manifold is evolving through time, a joint spatio-temporal modelling is needed, in order to fully comprehend its nature. We propose a first-order Markovian process that propagates the spatial probabilistic model of a manifold at fixed time, to its adjacent temporal stages. The proposed methodology is demonstrated using a particle simulation of an interacting dwarf galaxy to describe the evolution of a cavity generated by a Supernov

Proceedings - University of Groningen

University of Groningen

ARTS repository - University of Groningen

University of Birmingham Research Portal

Dissertations of the University of Groningen

Predicting the Need for Urgent Instructor Intervention in MOOC Environments

Author: ALRAJHI LAILA,MOHAMMED
Publication date: 01/01/2024

In recent years, massive open online courses (MOOCs) have become universal knowledge resources and arguably one of the most exciting innovations in e-learning environments. MOOC platforms comprise numerous courses covering a wide range of subjects and domains. Thousands of learners around the world enrol on these online platforms to satisfy their learning needs (mostly) free of charge. However, the retention rates of MOOC courses (i.e., the share of learners who successfully complete a course of study) are low (around 10% on average); dropout rates tend to be very high (around 90%). The principal channel via which MOOC learners can communicate their difficulties with the learning content and ask for assistance from instructors is by posting in a dedicated MOOC forum. Importantly, in the case of learners who are suffering from burnout or stress, some of these posts require urgent intervention. Given the above, urgent instructor intervention in response to learner requests for assistance posted on MOOC forums has become an important research topic. Timely intervention by MOOC instructors may mitigate dropout issues and make the difference between a learner dropping out or staying on a course. However, due to the typically extremely high learner-to-instructor ratio in MOOCs and the often huge numbers of posts on forums, managing truly urgent posts, rare as they are, can be very challenging, if not sometimes impossible. Instructors can find it challenging to monitor all existing posts and identify which posts require immediate intervention to help learners, encourage retention, and reduce the current high dropout rates. The main objective of this research project, therefore, was to mine and analyse learners’ MOOC posts as a fundamental step towards understanding their need for instructor intervention. To achieve this, the researcher proposed and built comprehensive classification models to predict the need for instructor intervention.
The ultimate goal is to help instructors by guiding them to posts, topics, and learners that require immediate interventions. Given this research aim, the researcher conducted different experiments to fill the gap in the literature, based on datasets from different platforms (the FutureLearn platform and the Stanford MOOCPosts dataset). For the former, three MOOC corpora were prepared: two of them gold-standard MOOC corpora to identify urgent posts, annotated by selected experts in the field; the third is a corpus detailing learner dropout. Based on these datasets, different architectures and classification models based on traditional machine learning and deep learning approaches were proposed. In this thesis, the task of determining the need for instructor intervention was tackled from three perspectives: (i) identifying relevant posts, (ii) identifying relevant topics, and (iii) identifying relevant learners. Posts written by learners were classified into two categories: (i) (urgent) intervention and (ii) (non-urgent) intervention. Also, learners were classified into: (i) requiring instructor intervention (at risk of dropout) and (ii) no need for instructor intervention (completer). In identifying posts, two experiments were used to contribute to this field. The first is a novel classifier based on a deep learning model that integrates novel MOOC post dimensions such as numerical data in addition to textual data; this represents a novel contribution to the literature, as all available models at the time of writing were text-only. The results demonstrate that the combined, multidimensional features model proposed in this project is more effective than the text-only model.
The second contribution relates to creating various simple and hybrid deep learning models by applying plug & play techniques with different types of inputs (word-based or word-character-based) and different ways of representing target input words as vectors. According to the experimental findings, employing Bidirectional Encoder Representations from Transformers (BERT) for word embedding rather than word2vec is more effective at the intervention task across all models. Interestingly, adding word-character inputs with BERT does not improve performance as it does for word2vec. Additionally, on the task of identifying topics, this is the first time in the literature that specific language terms to identify the need for urgent intervention in MOOCs were obtained. This was achieved by analysing learner MOOC posts using latent Dirichlet allocation (LDA), and it offers a visualisation tool for instructors or learners that may assist them and improve instructor intervention. In addition, this thesis contributes to the literature by creating mechanisms for identifying MOOC learners who may need instructor intervention in a new context, i.e., by using their historical online forum posts as a multi-input approach for other deep learning architectures and Transformer models. The findings demonstrate that using the Transformer model is more effective at identifying MOOC learners who require instructor intervention. Next, the thesis sought to expand its methodology to identify posts that relate to learner behaviour, which is also a novel contribution, by proposing a novel priority model, built on learner histories, to identify the urgency of intervention. This model can classify learners into three groups: low risk, mid risk, and high risk. The results show that the completion rates of high-risk learners are very low, which confirms the importance of this model.
Next, as MOOC data on urgent posts tend to be highly unbalanced, the thesis contributes by examining various data balancing methods to spot situations in which MOOC posts urgently require instructor assistance. This included developing learner and instructor models to assist instructors in responding to urgent MOOC posts. The results show that models with undersampling can predict the most urgent cases; 3x augmentation + undersampling usually attains the best performance. Finally, for the first time, this thesis contributes to the literature by applying text classification explainability (eXplainable Artificial Intelligence (XAI)) to an instructor intervention model, demonstrating how a reliable predictor in combination with XAI and colour-coded visualisation could be utilised to assist instructors in deciding when posts require urgent intervention, as well as supporting annotators in creating high-quality, gold-standard datasets that identify cases where urgent intervention is required

Durham e-Theses

Enhancing YouTube Spam Detection

Author: Pesaru Sai Charan
Publication venue: CSUSB ScholarWorks
Publication date: 01/08/2024

This culminating experience project investigated various methods for enhancing spam detection on YouTube, a prevalent issue impacting user experience and platform integrity. The research questions addressed were: Q1) How do different spam detection methods compare regarding robustness, efficiency, and accuracy? Q2) What role do deep learning approaches like RNNs and CNNs play in improving spam comment identification? Q3) What are the unique benefits of using deep learning models for spam comment identification on YouTube? Q4) How can machine learning models be optimized for real-time spam detection on YouTube? The study produced findings that address each research question. In the case of (Q1), while algorithms like Naïve Bayes and Logistic Regression offered precision in identifying spam emails, these models proved ineffectual at adapting to new forms of spam and the constant evolution of spam techniques; deep learning algorithms like CNN and RNN offered high accuracy and robustness thanks to their ability to extract features from the text data independently. The results for (Q2) indicate that RNNs and CNNs are critical in transforming the level of spam detection by addressing the problem of semantic meaning and temporal relationships in comments and surpassing traditional methods. Concerning (Q3), it was pointed out that deep learning models are the most accurate, scalable, and resistant to false negatives when identifying spam comments on videos hosted on YouTube, which helps regain users' trust and enhance the platform's security as the traffic continues to grow. (Q4) focused on advancing machine learning models for real-time processing, using methods such as model pruning and distribution. The findings were as follows: (Q1) found that although conventional approaches are efficient at producing accurate results, deep learning models are highly effective in dealing with the changes in spam strategies.
(Q2) pointed out that RNNs and CNNs contribute immensely to discovering spam on social media platforms due to their raw power in NLP and pattern recognition. (Q3) established that the accuracy, scalability, and adaptability of deep learning models, including CNN and RNN, are beneficial in identifying spam on YouTube due to their effectiveness in tackling ever-evolving spam tactics. (Q4) found that fine-tuning machine learning models is imperative for scaling up the approaches by deploying high-end methodologies for real-time spam detection, which supports the daunting task of training the algorithms to deal with the flood of user-generated content on YouTube. Areas of further study include analyzing other complex natural language processing methods combined with classifiers for better spam identification, improving the computational time of multi-modal learning for spam comment detection, and considering federated learning for real-time spam identification on platforms such as YouTube. These research directions are intended to improve spam detection technologies in Information Systems so that they can be efficient, effective, and highly accurate systems capable of coping with newly emerging spam techniques in flexible, transparent, and effective ways

CSUSB ScholarWorks

Machine learning applications on time series data for systematic investing

Author: Fons Elizabeth
Publication date: 31/12/2022

The University of Manchester - Institutional Repository

A food recipe recommendation system based on nutritional factors in the Finnish food communit

Author: Walpitage A. (Ashan)
Publication venue: University of Oulu
Publication date: 30/06/2023

Abstract. This thesis presents a comprehensive study on the relationships between user feedback, recipe content, and additional factors in the context of a recipe recommendation system. The aim was to investigate the influence of various factors on user ratings and comments related to nutritional variables, while also exploring the potential for personalized recipe suggestions. Statistical analysis, clustering techniques, and sentiment analysis were employed to analyze a dataset of food recipes and user feedback. We determined that user feedback is a complex phenomenon influenced by subjective factors beyond recipe content alone. Cluster analysis identified four distinct clusters within the dataset, highlighting variations in nutritional values and sentiment among recipes. However, due to an imbalanced distribution within the clusters, these relationships were not considered in the recommendation system. To address the absence of user-related data, a content-based filtering approach was implemented, utilizing nutritional factors and a health factor calculation. The system provides personalized recipe recommendations based on nutritional similarity and health considerations. A maximum limit of 20 recommended recipes was set, allowing users to specify the desired number of recommendations. The accompanying API also provides a mean squared error metric to assess recommendation quality. This research contributes to a better understanding of user preferences, recipe content, and the challenges in developing effective recommendation systems for food recipes
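The content-based filtering idea in the abstract can be sketched as follows. The thesis does not state its similarity measure or feature set here, so cosine similarity over four nutritional values is an assumption, as are the recipe names, the numbers, and the function names; the health factor calculation is not modelled. Only the cap of 20 recommendations comes from the abstract.

```python
import math

MAX_RECOMMENDATIONS = 20  # upper limit stated in the abstract

def cosine(a, b):
    """Cosine similarity between two nutrient vectors (assumed similarity measure)."""
    dot = sum(p * q for p, q in zip(a, b))
    norm = math.sqrt(sum(p * p for p in a)) * math.sqrt(sum(q * q for q in b))
    return dot / norm if norm else 0.0

def recommend(query, recipes, n=5):
    """Return up to n recipes most similar to `query`, never more than the cap."""
    n = max(0, min(n, MAX_RECOMMENDATIONS))
    others = [name for name in recipes if name != query]
    ranked = sorted(others, key=lambda name: cosine(recipes[query], recipes[name]), reverse=True)
    return ranked[:n]

# Hypothetical data: [energy kcal, protein g, fat g, carbohydrate g] per serving.
recipes = {
    "lentil soup": [180.0, 12.0, 4.0, 25.0],
    "pea soup": [190.0, 13.0, 5.0, 26.0],
    "butter cake": [420.0, 5.0, 24.0, 48.0],
    "salmon salad": [300.0, 25.0, 18.0, 8.0],
}

top = recommend("lentil soup", recipes, n=2)
```

In practice the nutrient columns would likely need scaling before comparison, since unscaled energy values dominate the similarity score in this toy data.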

University of Oulu Repository - Jultika