
298 research outputs found

Interactive Visual Labelling versus Active Learning: An Experimental Comparison

Author: A Culotta
A Inselberg
B Höferlin
B Settles
CM Bishop
D Ceneda
D Kottke
F Heimerl
I Jolliffe
J Attenberg
J Bernard
JB Kruskal
L Shao
L van der Maaten
M Chegini
M Hall
T Scheffer
T Schreck
TK Ho
Y LeCun
Y Wu
Publication venue: Zhejiang University Press
Publication date: 01/01/2020

Methods from supervised machine learning allow the classification of new data automatically and are tremendously helpful for data analysis. The quality of supervised machine learning depends not only on the type of algorithm used, but also on the quality of the labelled dataset used to train the classifier. Labelling instances in a training dataset is often done manually, relying on selections and annotations by expert analysts, and is often a tedious and time-consuming process. Active learning algorithms can automatically determine a subset of data instances for which labels would provide useful input to the learning process. Interactive visual labelling techniques are a promising alternative, providing effective visual overviews from which an analyst can simultaneously explore data records and select items to label. By putting the analyst in the loop, higher accuracy can be achieved in the resulting classifier. While initial results of interactive visual labelling techniques are promising in the sense that user labelling can improve supervised learning, many aspects of these techniques are still largely unexplored. This paper presents a study conducted using the mVis tool to compare three interactive visualisations (similarity map, scatterplot matrix (SPLOM), and parallel coordinates) with each other and with active learning for the purpose of labelling a multivariate dataset. The results show that all three interactive visual labelling techniques surpass active learning algorithms in terms of classifier accuracy, and that users subjectively prefer the similarity map over SPLOM and parallel coordinates for labelling. Users also employ different labelling strategies depending on the visualisation used.

Crossref

TUGraz OPEN Library

MPG.PuRe

Recommended from our members

Human-in-the-Loop: Visual Analytics for Building Models Recognising Behavioural Patterns in Time Series

Author: Andrienko G.
Andrienko N.
Artikis A.
Mantenoglou P.
Rinzivillo S.
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 20/03/2024

Results of automated detection of complex patterns in temporal data, such as trajectories of moving objects, may not be good enough due to the use of strict pattern specifications derived from imprecise domain concepts. To address this challenge, we propose a novel visual analytics approach that combines expert knowledge and automated pattern detection results to construct features that effectively distinguish patterns of interest from other types of behaviour. These features are then used to create interactive visualisations enabling a human analyst to generate labelled examples for building a feature-based pattern classifier. We evaluate our approach through a case study focused on detecting trawling activities in fishing vessel trajectories, demonstrating significant improvements in pattern recognition by leveraging domain knowledge and incorporating human reasoning and feedback. Our contribution is a novel framework that integrates human expertise and analytical reasoning with ML or AI techniques, advancing the field of data analytics.

City Research Online

A study on labeling network hostile behavior with Intelligent Interactive tools

Author: Catania Carlos Adrian
Guerra Torres Jorge Luis
Veas Eduardo Enrique
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/01/2020

Labeling a real network dataset is especially expensive in computer security, as an expert has to ponder several factors before assigning each label. This paper describes an interactive intelligent system to support the task of identifying hostile behaviors in network logs. The RiskID application uses visualizations to graphically encode features of network connections and promote visual comparison. In the background, two algorithms are used to actively organize connections and predict potential labels: a recommendation algorithm and a semi-supervised learning strategy. These algorithms, together with interactive adaptations to the user interface, constitute a behavior recommendation. A study is carried out to analyze how the algorithms for recommendation and prediction influence the workflow of labeling a dataset. The results of a study with 16 participants indicate that the behavior recommendation significantly improves the quality of labels. Analyzing interaction patterns, we identify a more intuitive workflow used when behavior recommendation is available.
Affiliation: Guerra Torres, Jorge Luis. Universidad Nacional de Cuyo; Argentina. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Mendoza; Argentina
Affiliation: Veas, Eduardo Enrique. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Mendoza; Argentina. Universidad Nacional de Cuyo; Argentina
Affiliation: Catania, Carlos Adrian. Universidad Nacional de Cuyo; Argentina
2019 IEEE Symposium on Visualization for Cyber Security, Vancouver, Canada. Institute of Electrical and Electronics Engineers

CONICET Digital

A multi-site quad-band radio frequency interference monitoring alerting and reporting system

Author: Bryne T.H.
Hakegard J.E.
Morrison Aiden J.
Ruotsalainen L.
Sokolova N.
Publication venue: IEEE
Publication date: 01/01/2020

This paper reviews the motivation behind and development of a deployable Radio Frequency Interference (RFI) detection, alerting and reporting system which simultaneously monitors all Global Navigation Satellite System (GNSS) L-band signal transmission for disruption, captures interference events, characterizes them, notifies stakeholders of event occurrence and lastly marshals the captured data to cloud storage. Results of a multi-site international deployment program are presented and discussed. © 2020 German Institute of Navigation - DGON. Peer reviewed

Helsingin yliopiston digitaalinen arkisto

The state of the art in integrating machine learning into visual analytics

Author: Díaz Blanco I.
Endert A.
Nabney I.
Ribarsky W.
Rossi F.
Turkay C.
Wong B.L. William
Publication venue: Wiley
Publication date: 01/01/2017

Visual analytics systems combine machine learning or other analytic techniques with interactive data visualization to promote sensemaking and analytical reasoning. It is through such techniques that people can make sense of large, complex data. While progress has been made, the tactful combination of machine learning and data visualization is still under-explored. This state-of-the-art report presents a summary of the progress that has been made by highlighting and synthesizing select research advances. Further, it presents opportunities and challenges to enhance the synergy between machine learning and visual analytics for impactful future research directions.

arXiv.org e-Print Archive

City Research Online

Crossref

Repositorio Institucional de la Universidad de Oviedo

Aston Publications Explorer

Middlesex University Research Repository

HAL-Paris1

Explore Bristol Research

Explainable shared control in assistive robotics

Author: Zolotas Mark
Publication venue: Electrical and Electronic Engineering, Imperial College London
Publication date: 01/02/2021

Shared control plays a pivotal role in designing assistive robots to complement human capabilities during everyday tasks. However, traditional shared control relies on users forming an accurate mental model of expected robot behaviour. Without this accurate mental image, users may encounter confusion or frustration whenever their actions do not elicit the intended system response, forming a misalignment between the respective internal models of the robot and human. The Explainable Shared Control paradigm introduced in this thesis attempts to resolve such model misalignment by jointly considering assistance and transparency. There are two perspectives of transparency to Explainable Shared Control: the human's and the robot's. Augmented reality is presented as an integral component that addresses the human viewpoint by visually unveiling the robot's internal mechanisms. The robot perspective, by contrast, requires an awareness of human "intent", and so a clustering framework composed of a deep generative model is developed for human intention inference. Both transparency constructs are implemented atop a real assistive robotic wheelchair and tested with human users. An augmented reality headset is incorporated into the robotic wheelchair and different interface options are evaluated across two user studies to explore their influence on mental model accuracy. Experimental results indicate that this setup facilitates transparent assistance by improving recovery times from adverse events associated with model misalignment. As for human intention inference, the clustering framework is applied to a dataset collected from users operating the robotic wheelchair. Findings from this experiment demonstrate that the learnt clusters are interpretable and meaningful representations of human intent. This thesis serves as a first step in the interdisciplinary area of Explainable Shared Control.
The contributions to shared control, augmented reality and representation learning contained within this thesis are likely to help future research advance the proposed paradigm, and thus bolster the prevalence of assistive robots. Open Access

Spiral - Imperial College Digital Repository

NetLabeller: architecture with data extraction and labelling framework for beyond 5G networks

Author: Alcaraz-Calero Jose M.
Andrade Hoz Jimena
Wang Qi
Publication date: 29/02/2024

The next generation of network capabilities coupled with artificial intelligence (AI) can provide innovative solutions for network control and self-optimisation. Network control demands a detailed knowledge of the network components to enforce the correct control rules. To this end, an immense number of metrics related to devices, flows, network rules, etc. can be used to describe the state of the network and to gain insights about which rule to enforce depending on the context. However, selection of the most relevant metrics often proves challenging and there is no readily available tool that can facilitate the dataset extraction and labelling for AI model training. This research work therefore first develops an analysis of the most relevant metrics in terms of network control to create a training dataset for future AI development purposes. It then presents a new architecture to allow the extraction of these metrics from a 5G network with a novel dataset visualisation and labelling tool to help perform the exploratory analysis and the labelling process of the resultant dataset. It is expected that the proposed architecture and its associated tools would significantly speed up the training process, which is crucial for the data-driven approach in developing AI-based network control capabilities.

Research Repository and Portal - University of the West of Scotland

Analysing functional genomics data using novel ensemble, consensus and data fusion techniques

Author: Glaab Enrico
Publication date: 15/10/2011

Motivation: A rapid technological development in the biosciences and in computer science in the last decade has enabled the analysis of high-dimensional biological datasets on standard desktop computers. However, in spite of these technical advances, common properties of the new high-throughput experimental data, like small sample sizes in relation to the number of features, high noise levels and outliers, also pose novel challenges. Ensemble and consensus machine learning techniques and data integration methods can alleviate these issues, but often provide overly complex models which lack generalization capability and interpretability. The goal of this thesis was therefore to develop new approaches to combine algorithms and large-scale biological datasets, including novel approaches to integrate analysis types from different domains (e.g. statistics, topological network analysis, machine learning and text mining), to exploit their synergies in a manner that provides compact and interpretable models for inferring new biological knowledge. Main results: The main contributions of the doctoral project are new ensemble, consensus and cross-domain bioinformatics algorithms, and new analysis pipelines combining these techniques within a general framework. This framework is designed to enable the integrative analysis of both large-scale gene and protein expression data (including the tools ArrayMining, Top-scoring pathway pairs and RNAnalyze) and general gene and protein sets (including the tools TopoGSA, EnrichNet and PathExpand), by combining algorithms for different statistical learning tasks (feature selection, classification and clustering) in a modular fashion. Ensemble and consensus analysis techniques employed within the modules are redesigned such that the compactness and interpretability of the resulting models is optimized in addition to the predictive accuracy and robustness.
The framework was applied to real-world biomedical problems, with a focus on cancer biology, providing the following main results: (1) the identification of a novel tumour marker gene in collaboration with the Nottingham Queens Medical Centre, facilitating the distinction between two clinically important breast cancer subtypes (framework tool: ArrayMining); (2) the prediction of novel candidate disease genes for Alzheimer’s disease and pancreatic cancer using an integrative analysis of cellular pathway definitions and protein interaction data (framework tool: PathExpand, collaboration with the Spanish National Cancer Centre); (3) the prioritization of associations between disease-related processes and other cellular pathways using a new rule-based classification method integrating gene expression data and pathway definitions (framework tool: Top-scoring pathway pairs); (4) the discovery of topological similarities between differentially expressed genes in cancers and cellular pathway definitions mapped to a molecular interaction network (framework tool: TopoGSA, collaboration with the Spanish National Cancer Centre). In summary, the framework combines the synergies of multiple cross-domain analysis techniques within a single easy-to-use software package and has provided new biological insights in a wide variety of practical settings.

Nottingham eTheses

Visual Analysis of Large, Time-Dependent, Multi-Dimensional Smart Sensor Tracking Data

Author: James Walker
Publication venue: Swansea University
Publication date: 01/01/2017

Technological advancements over the past decade have increased our ability to collect data to previously unimaginable volumes [Kei02]. Understanding temporal patterns is key to gaining knowledge and insight. However, our capacity to store data now far exceeds the rate at which we are able to understand it [KKEM10]. This phenomenon has led to a growing need for advanced solutions to make sense and use of an ever-increasing data space. Abstract temporal data poses additional challenges in its representation, size, scalability, high dimensionality, and unique structure. One instance of such temporal data is acquired from smart sensor tags attached to freely roaming animals, recording multiple parameters at infra-second rates; such tags are becoming commonplace and are transforming biologists' understanding of the way wild animals behave. The excitement at the potential inherent in sophisticated tracking devices has, however, been limited by a lack of available software to advance research in the field. This thesis introduces methodologies to deal with the problem of the analysis of the large, multi-dimensional, time-dependent data acquired. Interpretation of such data is complex and currently limits the ability of biologists to realise the value of their recorded information. We present several contributions to the field of time-series visualisation, that is, the visualisation of ordered collections of real value data attributes at successive points in time sampled at uniform time intervals. Traditionally, time-series graphs have been used for temporal data. However, screen resolution is small in comparison to the large information space commonplace today. In such cases, we can only render a proportion of the data. It is widely accepted that the effective interpretation of large temporal data sets requires advanced methods and interaction techniques.
In this thesis, we address these issues to enhance the exploration, analysis, and presentation of time-series data for movement ecologists in their smart sensor data analysis.

Crossref

Cronfa at Swansea University

Development of Mining Sector Applications for Emerging Remote Sensing and Deep Learning Technologies

Author: Gallwey J
Publication venue: Camborne School of Mines
Publication date: 24/06/2021

This thesis uses neural networks and deep learning to address practical, real-world problems in the mining sector. The main focus is on developing novel applications in the area of object detection from remotely sensed data. This area has many potential mining applications and is an important part of moving towards data-driven strategic decision making across the mining sector. The scientific contributions of this research are twofold; firstly, each of the three case studies demonstrates new applications which couple remote sensing and neural network based technologies for improved data-driven decision making. Secondly, the thesis presents a framework to guide implementation of these technologies in the mining sector, providing a guide for researchers and professionals undertaking further studies of this type. The first case study builds a fully connected neural network method to locate supporting rock bolts from 3D laser scan data. This method combines input features from the remote sensing and mobile robotics research communities, generating accuracy scores up to 22% higher than those found using either feature set in isolation. The neural network approach is also compared to the widely used random forest classifier and is shown to outperform this classifier on the test datasets. Additionally, the algorithms’ performance is enhanced by adding a confusion class to the training data and by grouping the output predictions using density-based spatial clustering. The method is tested on two datasets, gathered using different laser scanners, in different types of underground mines which have different rock bolting patterns. In both cases the method is found to be highly capable of detecting the rock bolts with recall scores of 0.87-0.96. The second case study investigates modern deep learning for LiDAR data. Here, multiple transfer learning strategies and LiDAR data representations are examined for the task of identifying historic mining remains.
A transfer learning approach based on a lunar crater detection model is used, due to the task similarities between both the underlying data structures and the geometries of the objects to be detected. The relationship between dataset resolution and detection accuracy is also examined, with the results showing that the approach is capable of detecting pits and shafts to a high degree of accuracy with precision and recall scores between 0.80 and 0.92, provided the input data is of sufficient quality and resolution. Alongside resolution, different LiDAR data representations are explored, showing that the precision-recall balance varies depending on the input LiDAR data representation. The third case study creates a deep convolutional neural network model to detect artisanal scale mining from multispectral satellite data. This model is trained from initialisation without transfer learning and demonstrates that accurate multispectral models can be built from a smaller training dataset when appropriate design and data augmentation strategies are adopted. Alongside the deep learning model, novel mosaicing algorithms are developed both to improve cloud cover penetration and to decrease noise in the final prediction maps. When applied to the study area, the results from this model provide valuable information about the expansion, migration and forest encroachment of artisanal scale mining in southwestern Ghana over the last four years. Finally, this thesis presents an implementation framework for these neural network based object detection models, to generalise the findings from this research to new mining sector deep learning tasks. This framework can be used to identify applications which would benefit from neural network approaches; to build the models; and to apply these algorithms in a real world environment.
The case study chapters confirm that the neural network models are capable of interpreting remotely sensed data to a high degree of accuracy on real world mining problems, while the framework guides the development of new models to solve a wide range of related challenges.

Open Research Exeter