20 research outputs found

A Comprehensive Survey on Rare Event Prediction

Author: Sheth Amit
Shyalika Chathurangi
Wickramarachchi Ruwan
Publication date: 20/09/2023

Rare event prediction involves identifying and forecasting events with a low probability using machine learning and data analysis. Due to the imbalanced data distributions, where the frequency of common events vastly outweighs that of rare events, it requires using specialized methods within each step of the machine learning pipeline, i.e., from data processing to algorithms to evaluation protocols. Predicting the occurrences of rare events is important for real-world applications, such as Industry 4.0, and is an active research area in statistical and machine learning. This paper comprehensively reviews the current approaches for rare event prediction along four dimensions: rare event data, data processing, algorithmic approaches, and evaluation approaches. Specifically, we consider 73 datasets from different modalities (i.e., numerical, image, text, and audio), four major categories of data processing, five major algorithmic groupings, and two broader evaluation approaches. This paper aims to identify gaps in the current literature and highlight the challenges of predicting rare events. It also suggests potential research directions, which can help guide practitioners and researchers. Comment: 44 pages

arXiv.org e-Print Archive

A Benchmark Knowledge Graph of Driving Scenes for Knowledge Completion Tasks

Author: Henson Cory
Sheth Amit
Wickramarachchi Ruwan
Publication venue: Scholar Commons
Publication date: 11/11/2024

Knowledge graph completion (KGC) is a problem of significant importance due to the inherent incompleteness in knowledge graphs (KGs). The current approaches for KGC using link prediction (LP) mostly rely on a common set of benchmark datasets that are quite different from real-world industrial KGs. Therefore, the adaptability of current LP methods for real-world KGs and domain-specific applications is questionable. To support the evaluation of current and future LP and KGC methods for industrial KGs, we introduce DSceneKG, a suite of real-world driving scene knowledge graphs that are currently being used across various industrial applications. The DSceneKG is publicly available at: https://github.com/ruwantw/DSceneKG

Scholar Commons - Institutional Repository of the University of South Carolina

An Evaluation of Knowledge Graph Embeddings for Autonomous Driving Data: Experience and Practice

Author: Henson Cory
Sheth Amit
Wickramarachchi Ruwan
Publication date: 29/02/2020

The autonomous driving (AD) industry is exploring the use of knowledge graphs (KGs) to manage the vast amount of heterogeneous data generated from vehicular sensors. The various types of equipped sensors include video, LIDAR and RADAR. Scene understanding is an important topic in AD which requires consideration of various aspects of a scene, such as detected objects, events, time and location. Recent work on knowledge graph embeddings (KGEs) - an approach that facilitates neuro-symbolic fusion - has been shown to improve the predictive performance of machine learning models. With the expectation that neuro-symbolic fusion through KGEs will improve scene understanding, this research explores the generation and evaluation of KGEs for autonomous driving data. We also present an investigation of the relationship between the level of informational detail in a KG and the quality of its derivative embeddings. By systematically evaluating KGEs along four dimensions -- i.e. quality metrics, KG informational detail, algorithms, and datasets -- we show that (1) higher levels of informational detail in KGs lead to higher quality embeddings, (2) type and relation semantics are better captured by the semantic transitional distance-based TransE algorithm, and (3) some metrics, such as coherence measure, may not be suitable for intrinsically evaluating KGEs in this domain. Additionally, we also present an (early) investigation of the usefulness of KGEs for two use-cases in the AD domain. Comment: 11 pages, To appear in AAAI 2020 Spring Symposium on Combining Machine Learning and Knowledge Engineering in Practice (AAAI-MAKE 2020)

arXiv.org e-Print Archive

Scholar Commons - Institutional Repository of the University of South Carolina

Towards Efficient Scoring of Student-generated Long-form Analogies in STEM

Author: Shalin Valerie L.
Sheth Amit P.
Wickramarachchi Ruwan
Wijesiriwardene Thilini
Publication venue: Scholar Commons
Publication date: 12/09/2022

Switching from an analogy pedagogy based on comprehension to an analogy pedagogy based on production raises an impractical manual analogy scoring problem. Conventional symbol-matching approaches to computational analogy evaluation focus on positive cases, and challenge computational feasibility. This work presents the Discriminative Analogy Features (DAF) pipeline to identify the discriminative features of strong and weak long-form text analogies. We introduce four feature categories (semantic, syntactic, sentiment, and statistical) used with supervised vector-based learning methods to discriminate between strong and weak analogies. Using a modestly sized vector of engineered features with SVM attains a 0.67 macro F1 score. While a semantic feature is the most discriminative, out of the top 15 discriminative features, most are syntactic. Combining these engineered features with an ELMo-generated embedding still improves classification relative to an embedding alone. While an unsupervised K-Means clustering-based approach falls short, similar hints of improvement appear when inputs include the engineered features used in supervised learning.

Scholar Commons - Institutional Repository of the University of South Carolina

Tutorial: Knowledge-infused Learning for Autonomous Driving (KL4AD)

Author: Henson Cory
Monka Sebastian
Sheth Amit
Stepanova Daria
Wickramarachchi Ruwan
Publication venue: Scholar Commons
Publication date: 24/10/2022

Autonomous Driving (AD) is considered a testbed for tackling many hard AI problems. Despite the recent advancements in the field, AD is still far from achieving full autonomy due to core technical problems inherent in AD. The emerging field of neuro-symbolic AI and the methods for knowledge-infused learning are showing exciting ways of leveraging external knowledge within machine/deep learning solutions, with potential benefits for interpretability, explainability, robustness, and transferability. In this tutorial, we will examine the use of knowledge-infused learning for three core state-of-the-art technical achievements within the AD domain. With a collaborative team from both academia and industry, we will demonstrate recent innovations using real-world datasets.

Scholar Commons - Institutional Repository of the University of South Carolina

An Evaluation of Knowledge Graph Embeddings for Autonomous Driving Data: Experience and Practice

Author: Henson Cory
Sheth Amit
Wickramarachchi Ruwan
Publication venue: Scholar Commons
Publication date: 01/03/2020

The autonomous driving (AD) industry is exploring the use of knowledge graphs (KGs) to manage the vast amount of heterogeneous data generated from vehicular sensors. The various types of equipped sensors include video, LIDAR and RADAR. Scene understanding is an important topic in AD which requires consideration of various aspects of a scene, such as detected objects, events, time and location. Recent work on knowledge graph embeddings (KGEs) - an approach that facilitates neuro-symbolic fusion - has been shown to improve the predictive performance of machine learning models. With the expectation that neuro-symbolic fusion through KGEs will improve scene understanding, this research explores the generation and evaluation of KGEs for autonomous driving data. We also present an investigation of the relationship between the level of informational detail in a KG and the quality of its derivative embeddings. By systematically evaluating KGEs along four dimensions – i.e. quality metrics, KG informational detail, algorithms, and datasets – we show that (1) higher levels of informational detail in KGs lead to higher quality embeddings, (2) type and relation semantics are better captured by the semantic transitional distance-based TransE algorithm, and (3) some metrics, such as coherence measure, may not be suitable for intrinsically evaluating KGEs in this domain. Additionally, we also present an (early) investigation of the usefulness of KGEs for two use-cases in the AD domain.

Scholar Commons - Institutional Repository of the University of South Carolina