Incorporating stakeholders’ knowledge in group decision-making
International audience
A survey on pre-processing techniques: relevant issues in the context of environmental data mining
One of the important issues in all types of data analysis, whether statistical data analysis, machine learning, data mining, data science or any other form of data-driven modeling, is data quality. The more complex the reality to be analyzed, the higher the risk of getting low-quality data. Unfortunately, real data often contain noise, uncertainty, errors, redundancies or even irrelevant information. Models built on incorrect or incomplete data will be useless, and as a consequence the quality of decisions based on these models also depends on data quality. This is why pre-processing is one of the most critical steps of data analysis in any of its forms. However, pre-processing has not yet been properly systematized, and little research has focused on it. This paper presents a survey of the most popular pre-processing steps required in environmental data analysis, together with a proposal to systematize them. Rather than providing technical details on specific pre-processing techniques, the paper focuses on providing general ideas to a non-expert user, who, after reading them, can decide which technique is most suitable for his/her problem. Peer Reviewed. Postprint (author's final draft).
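The abstract stays deliberately non-technical. Purely as an illustration, and not as the paper's proposed systematization, the sketch below applies three common pre-processing steps to three of the problems listed above: redundant records, incomplete values, and attributes measured on different scales. The function name `preprocess`, the use of `None` for missing values, median imputation and z-score scaling are all assumptions chosen for brevity.

```python
from statistics import mean, median, pstdev


def preprocess(rows):
    """Hypothetical pre-processing pass over a table of numeric records.

    Missing values are encoded as None. Steps: drop verbatim duplicate
    records, impute missing values with the column median, then standardise
    every column to zero mean and unit (population) standard deviation.
    """
    # 1. Redundancy: keep only the first occurrence of each identical record.
    seen, unique = set(), []
    for row in rows:
        key = tuple(row)
        if key not in seen:
            seen.add(key)
            unique.append(list(row))

    n_cols = len(unique[0])
    # 2. Incompleteness: replace None with the median of the observed values.
    for j in range(n_cols):
        observed = [r[j] for r in unique if r[j] is not None]
        fill = median(observed)
        for r in unique:
            if r[j] is None:
                r[j] = fill

    # 3. Scale differences: z-score each column (constant columns become 0.0).
    for j in range(n_cols):
        col = [r[j] for r in unique]
        mu, sd = mean(col), pstdev(col)
        for r in unique:
            r[j] = (r[j] - mu) / sd if sd > 0 else 0.0
    return unique


# Toy sensor table: (temperature in °C, nitrate in mg/L); values are made up.
readings = [
    [12.0, 1.0],
    [12.0, 1.0],   # exact duplicate
    [14.0, None],  # missing nitrate reading
    [16.0, 3.0],
]
print(preprocess(readings))
```

Median imputation and z-scores are only one possible choice for each step; in line with the abstract, which technique suits a given environmental dataset is left to the user's judgment.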
Acoustic data optimisation for seabed mapping with visual and computational data mining
Oceans cover 70% of Earth’s surface, but little is known about their waters.
While the echosounders often used to explore our oceans have developed at
a tremendous rate since WWII, the methods used to analyse and interpret the data
remain the same. These methods are inefficient, time-consuming, and often
costly when dealing with the large volumes of data that modern echosounders produce.
This PhD project examines the complexity of the de facto seabed mapping technique by
exploring and analysing acoustic data with a combination of data mining and visual
analytic methods.
First, we tested for redundancy in multibeam echosounder (MBES) data
by using the component plane visualisation of a Self-Organising Map (SOM). A total
of 16 visual groups were identified among the 132 statistical data descriptors. The
optimised MBES dataset had 35 attributes from the 16 visual groups, representing a
73% reduction in data dimensionality. A combined Principal Component Analysis
(PCA) + k-means approach was used to cluster both the original and the optimised
datasets. The cluster results were visually compared and internally validated using
four different internal validation methods.
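The abstract does not say how PCA and k-means were combined (for example, how many components were kept). A minimal sketch, assuming the principal components are computed first and k-means is then run on the projected scores, might look like this. `pca_project` and `kmeans` are hypothetical names; pure Python with power iteration is used only to keep the example self-contained, and is suitable for small illustrative data rather than full MBES surveys.

```python
import random
from statistics import mean


def pca_project(X, n_components, n_iter=200, seed=0):
    """Project the rows of X onto the leading principal components.

    Components are found by power iteration with deflation on the sample
    covariance matrix; adequate for small, illustrative data only.
    """
    n, d = len(X), len(X[0])
    mu = [mean(row[j] for row in X) for j in range(d)]
    Xc = [[row[j] - mu[j] for j in range(d)] for row in X]
    C = [[sum(r[a] * r[b] for r in Xc) / (n - 1) for b in range(d)] for a in range(d)]
    rng = random.Random(seed)
    comps = []
    for _ in range(n_components):
        v = [rng.random() - 0.5 for _ in range(d)]
        norm = sum(x * x for x in v) ** 0.5
        v = [x / norm for x in v]
        for _ in range(n_iter):
            w = [sum(C[a][b] * v[b] for b in range(d)) for a in range(d)]
            norm = sum(x * x for x in w) ** 0.5
            if norm == 0:
                break
            v = [x / norm for x in w]
        lam = sum(v[a] * C[a][b] * v[b] for a in range(d) for b in range(d))
        comps.append(v)
        # Deflation: subtract the found component so the next pass finds the next one.
        C = [[C[a][b] - lam * v[a] * v[b] for b in range(d)] for a in range(d)]
    return [[sum(r[j] * c[j] for j in range(d)) for c in comps] for r in Xc]


def kmeans(Z, k, n_iter=100, seed=0):
    """Lloyd's k-means on a list of equal-length points; returns (labels, centroids)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(Z, k)]
    labels = []
    for _ in range(n_iter):
        # Assignment step: nearest centroid by squared Euclidean distance.
        labels = [
            min(range(k), key=lambda c: sum((pj - cj) ** 2 for pj, cj in zip(p, centroids[c])))
            for p in Z
        ]
        # Update step: move each centroid to the mean of its members (keep it if empty).
        new = []
        for c in range(k):
            members = [p for p, lab in zip(Z, labels) if lab == c]
            new.append([mean(col) for col in zip(*members)] if members else centroids[c])
        if new == centroids:
            break
        centroids = new
    return labels, centroids


# Two made-up groups of 3-attribute records.
X = [[0.0, 0.0, 0.0], [0.1, 0.0, 0.1], [0.2, 0.1, 0.0],
     [10.0, 10.0, 10.0], [10.1, 9.9, 10.0], [9.9, 10.2, 10.1]]
Z = pca_project(X, n_components=2)
labels, _ = kmeans(Z, k=2)
print(labels)
```

Running PCA before k-means decorrelates the attributes and lets the clustering work in a lower-dimensional space; the same pair of functions could be applied to both the original and the optimised attribute sets for comparison.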
Next, we tested two novel approaches to singlebeam echosounder (SBES)
data processing and clustering: visual exploration for outlier detection, and
direct clustering of time series echo returns. Visual exploration identified additional
outliers that the automatic procedure was not able to find. The SBES data were then
clustered directly. The internal validation indices suggested that the optimal number of
clusters was three. This is consistent with the assumption that the SBES time series
represented the subsurface classes of the seabed.
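The abstract does not name the internal validation indices used. As one widely used example, the silhouette coefficient can compare candidate partitions of the echo returns, with the higher mean value suggesting the better number of clusters. The sketch assumes each echo return is sampled at the same time steps, so it can be treated as a fixed-length vector under Euclidean distance; `silhouette` is a hypothetical helper name.

```python
from math import dist


def silhouette(points, labels):
    """Mean silhouette coefficient of a partition (needs at least two clusters).

    points: equal-length numeric sequences, e.g. echo returns sampled at the
    same time steps; Euclidean distance is used throughout.
    """
    n = len(points)
    clusters = set(labels)
    scores = []
    for i in range(n):
        same = [dist(points[i], points[j]) for j in range(n)
                if j != i and labels[j] == labels[i]]
        if not same:
            scores.append(0.0)  # common convention for singleton clusters
            continue
        a = sum(same) / len(same)  # mean intra-cluster distance
        b = min(                   # mean distance to the nearest other cluster
            sum(dist(points[i], points[j]) for j in range(n) if labels[j] == c)
            / labels.count(c)
            for c in clusters if c != labels[i]
        )
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / n


# Made-up two-sample "echo returns" forming three groups.
echoes = [(0.0, 1.0), (0.1, 1.1), (0.2, 0.9),
          (5.0, 6.0), (5.1, 6.1), (5.2, 5.9),
          (10.0, 11.0), (10.1, 11.1), (10.2, 10.9)]
two = [0, 0, 0, 0, 0, 0, 1, 1, 1]
three = [0, 0, 0, 1, 1, 1, 2, 2, 2]
print(silhouette(echoes, two), silhouette(echoes, three))
```

In practice the candidate partitions would come from running a clustering algorithm with several values of k, and different indices may disagree, which is one reason to consult more than one.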
Next, the SBES data were joined with the corresponding MBES data by
identifying the closest locations between MBES and SBES. Two algorithms,
PCA + k-means and fuzzy c-means, were tested and the results visualised. On visual
comparison, the cluster boundaries appeared better defined than those obtained
from the MBES data alone. The results seem to indicate that adding SBES did
in fact improve the boundary definitions.
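Fuzzy c-means differs from k-means in giving every location a graded membership in each cluster rather than a single label. A minimal sketch of the standard fuzzy c-means iteration follows; the fuzzifier m = 2, random initial memberships and the hypothetical name `fuzzy_cmeans` are assumptions, since the abstract does not report the settings used.

```python
import random
from math import dist


def fuzzy_cmeans(points, c, m=2.0, n_iter=300, tol=1e-6, seed=0):
    """Standard fuzzy c-means; returns (memberships U, centres).

    U[i][k] is the degree (0..1) to which point i belongs to cluster k;
    each row sums to 1. m > 1 is the fuzzifier (m = 2 is a common default).
    """
    rng = random.Random(seed)
    n, d = len(points), len(points[0])
    # Random initial memberships, each row normalised to sum to 1.
    U = []
    for _ in range(n):
        row = [rng.random() for _ in range(c)]
        total = sum(row)
        U.append([u / total for u in row])
    centres = []
    for _ in range(n_iter):
        # Centre update: mean of the points weighted by membership ** m.
        centres = []
        for k in range(c):
            w = [U[i][k] ** m for i in range(n)]
            sw = sum(w)
            centres.append([sum(w[i] * points[i][j] for i in range(n)) / sw
                            for j in range(d)])
        # Membership update: u_ik = 1 / sum_l (d_ik / d_il) ** (2 / (m - 1)).
        new_U = []
        for i in range(n):
            ds = [dist(points[i], centres[k]) for k in range(c)]
            if 0.0 in ds:
                # Point sits exactly on a centre: share membership among those centres.
                row = [1.0 if dk == 0.0 else 0.0 for dk in ds]
            else:
                row = [1.0 / sum((ds[k] / ds[l]) ** (2 / (m - 1)) for l in range(c))
                       for k in range(c)]
            total = sum(row)
            new_U.append([u / total for u in row])
        shift = max(abs(new_U[i][k] - U[i][k]) for i in range(n) for k in range(c))
        U = new_U
        if shift < tol:
            break
    return U, centres


# Made-up joined records, e.g. [MBES attribute, SBES attribute] per location.
pts = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3), (5.0, 5.0), (5.2, 4.9), (4.9, 5.1)]
U, centres = fuzzy_cmeans(pts, c=2)
hard = [row.index(max(row)) for row in U]  # defuzzify by maximum membership
print(hard)
```

Mapping the memberships directly, instead of only the defuzzified labels, lets transitional locations show partial membership in two classes.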
Next, the cluster results from the analysis chapters were validated against
ground truth data using a confusion matrix and kappa coefficients. For MBES, the
classes derived from the optimised data yielded better accuracy than those derived
from the original data. For SBES, direct clustering provided a relatively reliable
overview of the underlying classes in the survey area. The combined MBES + SBES
data provided by far the best accuracy for mapping, with almost a 10% increase in
overall accuracy compared to the original MBES data.
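Overall accuracy and Cohen's kappa can both be computed from the same confusion matrix: overall accuracy is the observed agreement p_o (the diagonal total over all samples), and kappa = (p_o − p_e) / (1 − p_e) corrects it for the agreement p_e expected by chance from the row and column totals. The sketch below uses a made-up three-class matrix, not data from the thesis, and a hypothetical function name.

```python
def accuracy_and_kappa(cm):
    """Overall accuracy and Cohen's kappa from a square confusion matrix.

    cm[i][j] counts samples with reference (ground-truth) class i that were
    mapped to class j.
    """
    k = len(cm)
    n = sum(sum(row) for row in cm)
    p_o = sum(cm[i][i] for i in range(k)) / n  # observed agreement
    row_tot = [sum(cm[i]) for i in range(k)]
    col_tot = [sum(cm[i][j] for i in range(k)) for j in range(k)]
    p_e = sum(row_tot[i] * col_tot[i] for i in range(k)) / n ** 2  # chance agreement
    kappa = (p_o - p_e) / (1 - p_e)
    return p_o, kappa


# Hypothetical 3-class confusion matrix (not data from the thesis).
cm = [[30, 5, 5],
      [4, 20, 6],
      [2, 3, 25]]
oa, kappa = accuracy_and_kappa(cm)
print(round(oa, 3), round(kappa, 3))  # 0.75 0.623
```

Here p_e = (40·36 + 30·28 + 30·36) / 100² = 0.336, so kappa = 0.414 / 0.664 ≈ 0.623, noticeably lower than the raw 75% accuracy.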
The results proved promising for optimising the acoustic data and
improving the quality of seabed mapping. Furthermore, these approaches have the
potential for significant time and cost savings in the seabed mapping process. Finally,
some future directions are recommended based on the findings of this research project,
with the expectation that they could contribute to the further development of seabed
mapping at mapping agencies worldwide.
An overview of clustering methods with guidelines for application in mental health research
Cluster analyses have been widely used in mental health research to decompose inter-individual heterogeneity
by identifying more homogeneous subgroups of individuals. However, despite advances in new algorithms and
increasing popularity, there is little guidance on model choice, analytical framework and reporting requirements.
In this paper, we aimed to address this gap by introducing the philosophy, design, advantages/disadvantages and
implementation of major algorithms that are particularly relevant in mental health research. Extensions of basic
models, such as kernel methods, deep learning, semi-supervised clustering, and clustering ensembles, are subsequently
introduced. How to choose algorithms to address common issues, as well as methods for pre-clustering
data processing, clustering evaluation and validation, are then discussed. Importantly, we also provide general
guidance on clustering workflow and reporting requirements. To facilitate the implementation of different algorithms,
we provide information on R functions and libraries.
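Clustering ensembles combine several base clusterings into a consensus. One common approach, sketched here in Python rather than R and not necessarily the variant the paper describes, builds a co-association matrix (the fraction of base runs in which two individuals share a cluster) and links individuals whose co-association exceeds a threshold, a simple single-linkage-style cut. The threshold of 0.5 and the function names are assumptions for illustration.

```python
def co_association(partitions):
    """Fraction of base partitions in which each pair of items shares a cluster."""
    n, r = len(partitions[0]), len(partitions)
    counts = [[0] * n for _ in range(n)]
    for labels in partitions:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1
    return [[c / r for c in row] for row in counts]


def consensus_labels(partitions, threshold=0.5):
    """Link items whose co-association exceeds threshold; return component labels."""
    A = co_association(partitions)
    n = len(A)
    labels = [-1] * n
    next_label = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start] = next_label
        stack = [start]
        while stack:  # depth-first search over the thresholded graph
            i = stack.pop()
            for j in range(n):
                if labels[j] == -1 and A[i][j] > threshold:
                    labels[j] = next_label
                    stack.append(j)
        next_label += 1
    return labels


# Three hypothetical base clusterings of six individuals (label values are arbitrary).
runs = [[0, 0, 1, 1, 2, 2],
        [5, 5, 7, 7, 7, 7],
        [0, 0, 1, 1, 2, 3]]
print(consensus_labels(runs))  # [0, 0, 1, 1, 2, 2]
```

Because only co-membership matters, the base runs may use different label values and even different numbers of clusters.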
CLADAG 2021 BOOK OF ABSTRACTS AND SHORT PAPERS
The book collects the short papers presented at the 13th Scientific Meeting of the Classification and Data Analysis Group (CLADAG) of the Italian Statistical Society (SIS). The meeting has been organized by the Department of Statistics, Computer Science and Applications of the University of Florence, under the auspices of the Italian Statistical Society and the International Federation of Classification Societies (IFCS). CLADAG is a member of the IFCS, a federation of national, regional, and linguistically-based classification societies. It is a non-profit, non-political scientific organization, whose aims are to further classification research
Principles and Guidelines for Advancement of Touchscreen-Based Non-visual Access to 2D Spatial Information
Graphical materials such as graphs and maps are often inaccessible to millions of blind and visually impaired (BVI) people, which negatively impacts their educational prospects, ability to travel, and vocational opportunities. To address this longstanding issue, a three-phase research program was conducted that builds on and extends previous work establishing touchscreen-based haptic cuing as a viable alternative for conveying digital graphics to BVI users. Although promising, this approach poses unique challenges that can only be addressed by schematizing the underlying graphical information based on perceptual and spatio-cognitive characteristics pertinent to touchscreen-based haptic access. Towards this end, this dissertation empirically identified a set of design parameters and guidelines through a logical progression of seven experiments.
Phase I investigated perceptual characteristics related to touchscreen-based graphical access using vibrotactile stimuli, with results establishing three core perceptual guidelines: (1) a minimum line width of 1 mm should be maintained for accurate line detection (Exp-1), (2) a minimum interline gap of 4 mm should be used for accurate discrimination of parallel vibrotactile lines (Exp-2), and (3) a minimum angular separation of 4 mm should be used for accurate discrimination of oriented vibrotactile lines (Exp-3). Building on these parameters, Phase II studied the core spatio-cognitive characteristics pertinent to touchscreen-based non-visual learning of graphical information, with results leading to the specification of three design guidelines: (1) a minimum width of 4 mm should be used to support tasks that require tracing vibrotactile lines and judging their orientation (Exp-4), (2) a minimum width of 4 mm should be maintained for accurate line tracing and learning of complex spatial path patterns (Exp-5), and (3) vibrotactile feedback should be used as a guiding cue to support the most accurate line-tracing performance (Exp-6). Finally, Phase III demonstrated that schematizing line-based maps according to these design guidelines leads to the development of an accurate cognitive map. Results from Exp-7 provide theoretical evidence that learning from vision and learning from touch lead to the development of functionally equivalent amodal spatial representations in memory. Findings from all seven experiments contribute to new theories of haptic information processing that can guide the development of new touchscreen-based non-visual graphical access solutions.
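Because the guidelines are stated in millimetres while touchscreen graphics are drawn in pixels, applying them requires the screen's pixel density. A minimal sketch, assuming the pixel density (ppi) is known, converts the width and gap guidelines to whole pixels and flags a layer of parallel lines that falls short. Only the line-width and interline-gap guidelines are encoded; the 264 ppi value and all function names are hypothetical, not taken from the dissertation.

```python
import math

MM_PER_INCH = 25.4

# Minimum sizes in millimetres, taken from the guidelines summarised above.
GUIDELINES_MM = {
    "line_width_detection": 1.0,  # Exp-1: detecting a vibrotactile line
    "interline_gap": 4.0,         # Exp-2: telling parallel lines apart
    "line_width_tracing": 4.0,    # Exp-4/Exp-5: tracing lines and paths
}


def mm_to_px(mm, ppi):
    """Convert a physical length to pixels for a screen with the given pixel density."""
    return mm * ppi / MM_PER_INCH


def minimum_pixels(ppi):
    """Round each guideline up to whole pixels (small epsilon absorbs float error)."""
    return {name: math.ceil(mm_to_px(mm, ppi) - 1e-9) for name, mm in GUIDELINES_MM.items()}


def check_line_layer(width_px, gap_px, ppi, requires_tracing=True):
    """Return a list of guideline violations for a set of parallel lines."""
    req = minimum_pixels(ppi)
    width_key = "line_width_tracing" if requires_tracing else "line_width_detection"
    problems = []
    if width_px < req[width_key]:
        problems.append(f"line width {width_px}px < {req[width_key]}px ({width_key})")
    if gap_px < req["interline_gap"]:
        problems.append(f"gap {gap_px}px < {req['interline_gap']}px (interline_gap)")
    return problems


# Hypothetical 264-ppi tablet: 4 mm is about 41.6 px and 1 mm about 10.4 px.
print(minimum_pixels(264))
print(check_line_layer(width_px=30, gap_px=50, ppi=264))
```

Rounding up rather than to the nearest pixel keeps every rendered line at or above the stated physical minimum.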
- …