Visual and interactive exploration of point data
Point data, such as Unit Postcodes (UPC), can provide very detailed information at fine
scales of resolution. For instance, socio-economic attributes are commonly assigned to
UPC. Hence, they can be represented as points and observed at the postcode level.
Using UPC as a common field allows the concatenation of variables from disparate data
sources that can potentially support sophisticated spatial analysis. However, visualising
UPC in urban areas presents at least three challenges. First, at small scales UPC occurrences
can be very dense, making their visualisation as points difficult, while at large scales
patterns in the associated attribute values are often hard to recognise. Second, because
UPC serve as a common field for concatenating disparate data sources, the resulting
data sets can be highly multivariate. Finally, socio-economic variables assigned to UPC
(such as the ones used here) can be non-Normal in their distributions, owing to a large
proportion of zero values and high variances, which constrains their analysis using
traditional statistics.
This paper discusses a Point Visualisation Tool (PVT), a proof-of-concept system
developed to visually explore point data. Various well-known visualisation techniques
were implemented to enable their interactive and dynamic interrogation. PVT provides
multiple representations of point data to facilitate the understanding of the relations
between attributes or variables as well as their spatial characteristics. Brushing between
alternative views is used to link several representations of a single attribute, as well as
to simultaneously explore more than one variable. PVT’s functionality shows how
visual techniques embedded in an interactive environment enable the exploration
of large amounts of multivariate point data.
Interpretable statistics for complex modelling: quantile and topological learning
As the complexity of our data has increased exponentially over recent decades, so has our
need for interpretable features. This thesis revolves around two paradigms to approach
this quest for insights.
In the first part we focus on parametric models, where the problem of interpretability
can be seen as a “parametrization selection”. We introduce a quantile-centric
parametrization and show the advantages of our proposal in the context of regression,
where it bridges the gap between classical generalized linear (mixed)
models and increasingly popular quantile methods.
The second part of the thesis, concerned with topological learning, tackles the
problem from a non-parametric perspective. As topology can be thought of as a way
of characterizing data in terms of their connectivity structure, it allows complex and
possibly high-dimensional data to be represented through a few features, such as the number of
connected components, loops and voids. We illustrate how the emerging branch of
statistics devoted to recovering topological structures in the data, Topological Data
Analysis, can be exploited both for exploratory and inferential purposes with a special
emphasis on kernels that preserve the topological information in the data.
Finally, we show with an application how these two approaches can borrow strength
from one another in the identification and description of brain activity through fMRI
data from the ABIDE project.
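The quantile-centric parametrization builds on the pinball (check) loss, whose minimiser is the quantile itself. As a minimal illustration of that fact, not the thesis's actual model, the sketch below recovers an empirical quantile by grid-minimising the pinball loss (all names and data are hypothetical):

```python
import numpy as np

def pinball_loss(y, q, tau):
    """Mean check (pinball) loss of candidate value q at quantile level tau."""
    u = y - q
    return np.mean(np.where(u >= 0, tau * u, (tau - 1) * u))

rng = np.random.default_rng(0)
y = rng.normal(loc=2.0, scale=1.0, size=5000)

tau = 0.9
# Grid search: the loss is convex in q, so the grid minimum tracks the true minimiser.
candidates = np.linspace(y.min(), y.max(), 2001)
losses = [pinball_loss(y, q, tau) for q in candidates]
q_hat = candidates[int(np.argmin(losses))]
```

The minimiser `q_hat` coincides with the empirical tau-quantile (up to grid resolution), which is the property quantile regression exploits when the constant `q` is replaced by a regression function.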
AUTOMATED ARTIFACT REMOVAL AND DETECTION OF MILD COGNITIVE IMPAIRMENT FROM SINGLE CHANNEL ELECTROENCEPHALOGRAPHY SIGNALS FOR REAL-TIME IMPLEMENTATIONS ON WEARABLES
Electroencephalography (EEG) is a technique for recording the asynchronous activation of neuronal firing inside the brain with non-invasive scalp electrodes. The EEG signal is widely studied to evaluate cognitive state and to detect brain conditions such as epilepsy, dementia, coma, and autism spectrum disorder (ASD). In this dissertation, the EEG signal is studied for the early detection of Mild Cognitive Impairment (MCI). MCI is the preliminary stage of dementia that may ultimately lead to Alzheimer's disease (AD) in elderly people. Our goal is to develop a minimalistic MCI detection system that could be integrated into wearable sensors. This contribution has three major aspects: 1) cleaning the EEG signal, 2) detecting MCI, and 3) predicting the severity of MCI, using data obtained from a single-channel EEG electrode. Artifacts such as eye-blink activity can corrupt EEG signals. We investigate unsupervised and effective removal of ocular artifacts (OA) from single-channel streaming raw EEG data. Wavelet transform (WT) decomposition was systematically evaluated for effectiveness of OA removal in a single-channel EEG system. The Discrete Wavelet Transform (DWT) and the Stationary Wavelet Transform (SWT) are studied with four WT basis functions: haar, coif3, sym3, and bior4.4. The performance of the artifact removal algorithm was evaluated by correlation coefficients (CC), mutual information (MI), signal-to-artifact ratio (SAR), normalized mean square error (NMSE), and time-frequency analysis. It is demonstrated that the WT can be an effective tool for unsupervised OA removal from single-channel EEG data in real-time applications. For MCI detection from the cleaned EEG data, we collected scalp EEG while the subjects were stimulated with five auditory speech signals. We extracted 590 features from the Event-Related Potential (ERP) of the collected EEG signals, which included time- and spectral-domain characteristics of the response.
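The WT-based artifact removal can be illustrated with a minimal single-level Haar DWT. This is a crude stand-in for the dissertation's DWT/SWT pipeline (which uses haar, coif3, sym3, and bior4.4, in practice via a library such as PyWavelets); the thresholding rule and all parameters here are illustrative assumptions:

```python
import numpy as np

def haar_dwt(x):
    """One-level Haar DWT: approximation and detail coefficients."""
    x = np.asarray(x, dtype=float)
    a = (x[0::2] + x[1::2]) / np.sqrt(2.0)   # low-pass (approximation)
    d = (x[0::2] - x[1::2]) / np.sqrt(2.0)   # high-pass (detail)
    return a, d

def haar_idwt(a, d):
    """Inverse one-level Haar DWT (perfect reconstruction)."""
    x = np.empty(2 * len(a))
    x[0::2] = (a + d) / np.sqrt(2.0)
    x[1::2] = (a - d) / np.sqrt(2.0)
    return x

def remove_spikes(x, k=3.0):
    """Suppress unusually large detail coefficients, a rough proxy for
    large-amplitude ocular artifacts (threshold rule is an assumption)."""
    a, d = haar_dwt(x)
    thr = k * np.median(np.abs(d)) / 0.6745   # robust (MAD-based) scale estimate
    d = np.where(np.abs(d) > thr, 0.0, d)     # zero out artifact-scale spikes
    return haar_idwt(a, d)
```

A real pipeline would decompose over several levels and choose the wavelet family and threshold per level; the metrics mentioned above (CC, MI, SAR, NMSE) would then quantify how much of the underlying EEG survives the cleaning.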
The top 25 features, ranked by the random forest method, were used in classification models to identify subjects with MCI. The robustness of our model was tested using leave-one-out cross-validation while training the classifiers. The best results (leave-one-out cross-validation accuracy 87.9%, sensitivity 84.8%, specificity 95%, and F score 85%) were obtained using the support vector machine (SVM) method with a Radial Basis Function (RBF) kernel (sigma = 10, cost = 102). Similar performance was also observed with logistic regression (LR), further validating the results. Our results suggest that single-channel EEG could provide a robust biomarker for early detection of MCI. We also developed a single-channel EEG-based MCI severity monitoring algorithm by generating Montreal Cognitive Assessment (MoCA) scores from the features extracted from the EEG. We performed multi-trial and single-trial analyses for the development of the MCI severity monitoring algorithm. We studied Multivariate Regression (MR), Ensemble Regression (ER), Support Vector Regression (SVR), and Ridge Regression (RR) for the multi-trial analysis, and deep neural regression for the single-trial analysis. In the multi-trial case, the best result was obtained from the ER. In our single-trial analysis, we constructed a time-frequency image from each trial and fed it to a convolutional neural network (CNN). The performance of the regression models was evaluated by the RMSE and residual analysis. We obtained the best accuracy with the deep neural regression method.
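The leave-one-out protocol used for validation can be sketched as follows. This is not the dissertation's SVM/top-25-feature pipeline; a simple nearest-centroid classifier on toy data stands in for the actual model purely to illustrate the cross-validation loop:

```python
import numpy as np

def nearest_centroid_predict(X_train, y_train, x):
    """Assign x to the class whose training centroid is closest."""
    labels = np.unique(y_train)
    centroids = np.array([X_train[y_train == c].mean(axis=0) for c in labels])
    return labels[np.argmin(np.linalg.norm(centroids - x, axis=1))]

def loocv_accuracy(X, y):
    """Leave-one-out CV: hold out each sample once, train on the rest."""
    n = len(y)
    hits = 0
    for i in range(n):
        mask = np.arange(n) != i          # all samples except the held-out one
        pred = nearest_centroid_predict(X[mask], y[mask], X[i])
        hits += int(pred == y[i])
    return hits / n

# Toy data: two well-separated classes (synthetic, for illustration only).
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 1, (30, 5)), rng.normal(4, 1, (30, 5))])
y = np.array([0] * 30 + [1] * 30)
acc = loocv_accuracy(X, y)
```

Leave-one-out is attractive for small clinical cohorts because every subject serves as a test case exactly once; swapping the stand-in classifier for an RBF SVM would reproduce the protocol reported above.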
Biologically-inspired hierarchical architectures for object recognition
PhD Thesis
Existing methods for machine vision translate three-dimensional
objects in the real world into two-dimensional images. These methods
have achieved acceptable performance in recognising objects. However,
recognition performance drops dramatically when objects are transformed,
for instance in background, orientation, position in the image,
or scale. The human visual cortex has evolved to form an efficient
invariant representation of objects within a scene. The superior
performance of humans can be explained by the feed-forward multi-layer
hierarchical structure of the human visual cortex, together with the
utilisation of different fields of vision depending on the recognition task.
The research community has therefore investigated building systems that
mimic the hierarchical architecture of the human visual cortex as an
ultimate objective.
The aim of this thesis can be summarised as developing hierarchical
models of visual processing that tackle the remaining challenges of
object recognition. To enhance the existing models of object recognition
and to overcome the above-mentioned issues, three major contributions
are made, which can be summarised as follows:
1. building a hierarchical model within an abstract architecture that
achieves good performance on challenging image object datasets;
2. investigating the contribution of each region of vision for object
and scene images, in order to increase recognition performance
and decrease the size of the processed data;
3. further enhancing the performance of existing models of object
recognition by introducing hierarchical topologies that utilise the
context in which an object is found to determine its identity.
Statement of sponsorship: Higher Committee For Education Development in Iraq (HCED).
New approaches for unsupervised transcriptomic data analysis based on Dictionary learning
The era of high-throughput data generation enables new access to biomolecular profiles and exploitation thereof. However, the analysis of such biomolecular data, for example, transcriptomic data, suffers from the so-called "curse of dimensionality". This occurs in the analysis of datasets with a significantly larger number of variables than data points. As a consequence, overfitting and unintentional learning of process-independent patterns can appear. This can lead to insignificant results in the application. A common way of counteracting this problem is the application of dimension reduction methods and subsequent analysis of the resulting low-dimensional representation that has a smaller number of variables.
In this thesis, two new methods for the analysis of transcriptomic datasets are introduced and evaluated. Our methods are based on the concepts of Dictionary learning, which is an unsupervised dimension reduction approach. Unlike many dimension reduction approaches that are widely applied for transcriptomic data analysis, Dictionary learning does not impose constraints on the components that are to be derived. This allows for great flexibility when adjusting the representation to the data. Further, Dictionary learning belongs to the class of sparse methods. The result of sparse methods is a model with few non-zero coefficients, which is often preferred for its simplicity and ease of interpretation. Sparse methods exploit the fact that the analysed datasets are highly structured. Indeed, transcriptomic data are particularly structured, owing, for example, to the connections between genes and pathways. Nonetheless, the application of Dictionary learning in medical data analysis has so far been largely restricted to image analysis. Another advantage of Dictionary learning is that it is an interpretable approach. Interpretability is a necessity in biomolecular data analysis to gain a holistic understanding of the investigated processes.
Our two new transcriptomic data analysis methods are each designed for one main task: (1) identification of subgroups for samples from mixed populations, and (2) temporal ordering of samples from dynamic datasets, also referred to as "pseudotime estimation". Both methods are evaluated on simulated and real-world data and compared to other methods that are widely applied in transcriptomic data analysis. Our methods achieve high performance and overall outperform the comparison methods.
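With a fixed dictionary, the sparse codes central to Dictionary learning solve a lasso problem; full Dictionary learning alternates this step with a dictionary update. The sketch below, a generic ISTA solver rather than either of the thesis's methods, illustrates the sparse-coding step on synthetic data (dictionary, signal, and parameters are all assumptions):

```python
import numpy as np

def soft_threshold(z, t):
    """Proximal operator of the l1 norm."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def ista_sparse_code(D, x, lam=0.05, n_iter=500):
    """Solve min_z 0.5*||x - D z||^2 + lam*||z||_1 via ISTA."""
    L = np.linalg.norm(D, 2) ** 2          # Lipschitz constant of the gradient
    z = np.zeros(D.shape[1])
    for _ in range(n_iter):
        grad = D.T @ (D @ z - x)           # gradient of the quadratic term
        z = soft_threshold(z - grad / L, lam / L)
    return z

# Synthetic overcomplete dictionary (50-dim signals, 100 atoms).
rng = np.random.default_rng(2)
D = rng.normal(size=(50, 100))
D /= np.linalg.norm(D, axis=0)             # unit-norm atoms
z_true = np.zeros(100); z_true[[3, 40, 77]] = [1.5, -2.0, 1.0]
x = D @ z_true                             # signal with a 3-sparse code
z = ista_sparse_code(D, x)
```

The recovered code `z` reconstructs `x` with few active atoms, which is exactly the "few non-zero coefficients" property that makes the derived components interpretable.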
Finding Frustration: a Dive into the EEG of Drivers
Emotion recognition technologies for driving are increasingly used to render automotive travel more
pleasurable and, more importantly, safer. Since emotions such as frustration and anger can lead to an
increase in traffic accidents, this thesis explored the utility of electroencephalogram (EEG) features to
recognize the driver’s frustration level. It therefore sought to find a balance between the ecologically valid
emotion induction of a driving simulator and the noise-sensitive but highly informative measure of the EEG.
Participants’ brain activity was captured with the CGX Quick-30 mobile EEG system. Nineteen participants
completed four different frustration-inducing and two baseline driving scenarios in a 360° driving simulator.
Subsequently, the participants continuously rated their frustration level based on the replay of each scenario.
The resulting subjective measures were used to classify EEG time periods into episodes with or without
frustration. Results showed that the frequently used measure of the Alpha Asymmetry Index (AAI) had, as
hypothesized, significantly more negative indices for high frustration (vs. no frustration). However, a
commingling effect of anger on this result could not be ruled out. The results provided no evidence
for previous, as-yet-unreplicated findings of frustration correlates within narrow-band oscillations (delta,
theta, alpha, and beta) at specified electrode positions (frontal, central, and posterior). This thesis concludes
with suggestions for subsequent research endeavors and forthcoming practical implications in the form of
the insights acquired.
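The Alpha Asymmetry Index reported above is commonly computed as the difference of log alpha-band powers between a right and a left electrode. The following minimal sketch, with an assumed sampling rate and synthetic channels rather than the thesis's CGX recordings, illustrates that computation:

```python
import numpy as np

FS = 250.0  # sampling rate in Hz (an assumption, not the study's value)

def band_power(x, fs, lo=8.0, hi=13.0):
    """Alpha-band (8-13 Hz) power via the periodogram."""
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    psd = np.abs(np.fft.rfft(x)) ** 2 / len(x)
    band = (freqs >= lo) & (freqs <= hi)
    return psd[band].sum()

def alpha_asymmetry(left, right, fs=FS):
    """AAI = ln(right alpha power) - ln(left alpha power)."""
    return np.log(band_power(right, fs)) - np.log(band_power(left, fs))

# Synthetic electrode pair (e.g. a hypothetical F3/F4 pairing):
# stronger 10 Hz alpha on the left channel gives a negative AAI.
t = np.arange(0, 4.0, 1.0 / FS)
rng = np.random.default_rng(3)
left = 3.0 * np.sin(2 * np.pi * 10 * t) + rng.normal(0, 0.5, t.size)
right = 1.0 * np.sin(2 * np.pi * 10 * t) + rng.normal(0, 0.5, t.size)
aai = alpha_asymmetry(left, right)
```

In practice Welch-style averaging over epochs, rather than a raw periodogram, would be used to stabilise the band-power estimates before taking logs.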
Low-Density Cluster Separators for Large, High-Dimensional, Mixed and Non-Linearly Separable Data.
The location of groups of similar observations (clusters) in data is a well-studied problem with many practical applications. There is a wide range of approaches to clustering, which rely on different definitions of similarity and are appropriate for datasets with different characteristics. Despite a rich literature, there remain a number of open problems in clustering, and limitations to existing algorithms. This thesis develops methodology for clustering high-dimensional, mixed datasets with complex clustering structures, using low-density cluster separators that bi-partition datasets with cluster boundaries passing through regions of minimal density, thereby separating the regions of high probability density associated with clusters. The bi-partitions arising from a succession of minimum-density cluster separators are combined using divisive hierarchical and partitional algorithms to locate a complete clustering, while estimating the number of clusters. The proposed algorithms locate cluster separators using one-dimensional, arbitrarily oriented subspaces, circumventing the challenges associated with clustering in high-dimensional spaces. This requires continuous observations; thus, to extend the applicability of the proposed algorithms to mixed datasets, methods for producing an appropriate continuous representation of datasets containing non-continuous features are investigated. The exact evaluation of the density intersected by a cluster boundary is restricted to linear separators. This limitation is lifted by a non-linear mapping of the original observations into a feature space, in which a linear separator permits the correct identification of non-linearly separable clusters in the original dataset. In large, high-dimensional datasets, searching for one-dimensional subspaces that yield a minimum-density separator is computationally expensive.
Therefore, a computationally efficient approach to low-density cluster separation using approximately optimal projection directions is proposed, which searches over a collection of one-dimensional random projections for an appropriate subspace for cluster identification. The proposed approaches produce high-quality partitions that are competitive with well-established and state-of-the-art algorithms.
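The random-projection search described above can be caricatured in a few lines: project the data onto random one-dimensional directions and split where the projected density is lowest. The sketch below uses a largest-gap heuristic as a crude proxy for a minimum-density separator; it is an illustration on synthetic data, not the thesis's algorithm:

```python
import numpy as np

def best_gap_split(X, n_dirs=50, rng=None):
    """Search random 1D projections; bi-partition at the widest gap found."""
    if rng is None:
        rng = np.random.default_rng(0)
    best = (-np.inf, None, None)
    for _ in range(n_dirs):
        u = rng.normal(size=X.shape[1])
        u /= np.linalg.norm(u)             # random unit direction
        p = np.sort(X @ u)                 # projected, ordered observations
        gaps = np.diff(p)
        i = int(np.argmax(gaps))           # widest empty interval = low density
        if gaps[i] > best[0]:
            best = (gaps[i], u, (p[i] + p[i + 1]) / 2.0)  # midpoint threshold
    _, u, thr = best
    return (X @ u) > thr                   # boolean side assignment

# Two well-separated Gaussian clusters in 5 dimensions.
rng = np.random.default_rng(4)
A = rng.normal(0, 0.5, (100, 5))
B = rng.normal(0, 0.5, (100, 5)); B[:, 0] += 10.0
X = np.vstack([A, B])
side = best_gap_split(X, rng=np.random.default_rng(5))
```

Recursing on each side of such a bi-partition yields the divisive hierarchy described above; a density estimate along the projection would replace the gap heuristic in a faithful implementation.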