Hybrid SOM+k-Means Clustering to Improve Planning, Operation and Management in Water Distribution Systems
With the advance of new technologies and emergence of the concept of the smart city, there has been a
dramatic increase in available information. Water distribution systems (WDSs) in which databases can be
updated every few minutes are no exception. Suitable techniques to evaluate available information and
produce optimized responses are necessary for planning, operation, and management. This can help
identify critical characteristics, such as leakage patterns, pipes to be replaced, and other features. This
paper presents a clustering method based on self-organizing maps coupled with k-means algorithms to
achieve groups that can be easily labeled and used for WDS decision-making. Three case studies are
presented, namely a classification of Brazilian cities in terms of their water utilities; district metered area
creation to improve pressure control; and transient pressure signal analysis to identify burst pipes. In the
three cases, this hybrid technique produces excellent results.
© 2018 Elsevier Ltd. All rights reserved.
This work is partially supported by Capes and CNPq, Brazilian research agencies. The use of English was revised by John Rawlins.
Brentan, BM.; Meirelles, G.; Luvizotto, E.; Izquierdo Sebastián, J. (2018). Hybrid SOM+k-Means Clustering to Improve Planning, Operation and Management in Water Distribution Systems. Environmental Modelling & Software. 106:77-88. https://doi.org/10.1016/j.envsoft.2018.02.013
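As a rough illustration of how a self-organizing map can be coupled with k-means, the sketch below first trains a small SOM and then runs k-means on the SOM's codebook vectors, so that each data point inherits the cluster of its best-matching unit. This is a minimal sketch under stated assumptions, not the authors' implementation: the function names (`train_som`, `kmeans`, `som_kmeans`), the grid size, the learning-rate and neighbourhood schedules, and the toy data are all hypothetical choices made for the example.

```python
import math
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def train_som(data, rows, cols, epochs=30, lr0=0.5, sigma0=1.5, seed=0):
    """Train a small rectangular SOM; returns {(row, col): weight vector}."""
    rng = random.Random(seed)
    dim = len(data[0])
    # Assumption: initialise each unit from a randomly chosen data sample.
    weights = {(r, c): list(rng.choice(data))
               for r in range(rows) for c in range(cols)}
    for epoch in range(epochs):
        frac = epoch / epochs
        lr = lr0 * (1 - frac)                   # linearly decaying learning rate
        sigma = max(sigma0 * (1 - frac), 0.3)   # shrinking neighbourhood radius
        for x in rng.sample(data, len(data)):   # visit samples in random order
            bmu = min(weights, key=lambda u: sq_dist(weights[u], x))
            for u, w in weights.items():
                grid_d2 = (u[0] - bmu[0]) ** 2 + (u[1] - bmu[1]) ** 2
                h = math.exp(-grid_d2 / (2 * sigma ** 2))
                for i in range(dim):
                    w[i] += lr * h * (x[i] - w[i])
    return weights


def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd's k-means; returns (labels, centers)."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda j: sq_dist(p, centers[j]))
                  for p in points]
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old center if a cluster becomes empty
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers


def som_kmeans(data, rows=4, cols=4, k=2, seed=0):
    """Cluster SOM units with k-means, then label data via best-matching unit."""
    som = train_som(data, rows, cols, seed=seed)
    units = list(som)
    unit_labels, _ = kmeans([som[u] for u in units], k, seed=seed)
    unit_to_cluster = dict(zip(units, unit_labels))
    return [unit_to_cluster[min(units, key=lambda u: sq_dist(som[u], x))]
            for x in data]


# Toy usage: two synthetic 2-D groups (hypothetical data, not from the paper).
demo_rng = random.Random(42)
demo = [(demo_rng.gauss(0, 0.3), demo_rng.gauss(0, 0.3)) for _ in range(15)]
demo += [(demo_rng.gauss(5, 0.3), demo_rng.gauss(5, 0.3)) for _ in range(15)]
print(som_kmeans(demo, rows=3, cols=3, k=2))
```

Clustering the codebook vectors rather than the raw samples means k-means works on a small, smoothed summary of the data, which is one common motivation for this kind of two-stage design.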
Development of a R package to facilitate the learning of clustering techniques
This project explores the development of a tool, in the form of an R package, to ease the process of
learning clustering techniques, how they work and what their pros and cons are. This tool should provide
implementations for several different clustering techniques with explanations in order to allow the student
to get familiar with the characteristics of each algorithm by testing them against several different datasets
while deepening their understanding of them through the explanations. Additionally, these explanations
should adapt to the input data, making the tool suitable not only for self-regulated learning but also for
teaching.
Bachelor's Degree in Computer Engineering (Grado en Ingeniería Informática)
Bag-Level Aggregation for Multiple Instance Active Learning in Instance Classification Problems
A growing number of applications, e.g. video surveillance and medical image
analysis, require training recognition systems from large amounts of weakly
annotated data while some targeted interactions with a domain expert are
allowed to improve the training process. In such cases, active learning (AL)
can reduce labeling costs for training a classifier by querying the expert to
provide the labels of most informative instances. This paper focuses on AL
methods for instance classification problems in multiple instance learning
(MIL), where data is arranged into sets, called bags, that are weakly labeled.
Most AL methods focus on single instance learning problems. These methods are
not suitable for MIL problems because they cannot account for the bag structure
of data. In this paper, new methods for bag-level aggregation of instance
informativeness are proposed for multiple instance active learning (MIAL). The
aggregated informativeness method identifies the most informative
instances based on classifier uncertainty, and queries bags incorporating the
most information. The other proposed method, called cluster-based
aggregative sampling, clusters data hierarchically in the instance space. The
informativeness of instances is assessed by considering bag labels, inferred
instance labels, and the proportion of labels that remain to be discovered in
clusters. Both proposed methods significantly outperform reference methods in
extensive experiments using benchmark data from several application domains.
Results indicate that using an appropriate strategy to address MIAL problems
yields a significant reduction in the number of queries needed to achieve the
same level of performance as single instance AL methods.
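To make the idea of bag-level aggregation concrete, the sketch below scores each instance by classifier uncertainty and sums those scores per bag, then selects the bag with the highest total as the next query. The abstract does not specify the uncertainty measure or the aggregation rule, so binary entropy and summation are assumptions made for this example, and the names `entropy` and `most_informative_bag` are hypothetical.

```python
import math


def entropy(p):
    """Binary entropy (in bits) of a positive-class probability p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


def most_informative_bag(bag_probs):
    """Pick the bag whose instances carry the most total uncertainty.

    bag_probs maps a bag id to the classifier's positive-class
    probabilities for the instances in that bag.
    Assumption: instance scores are aggregated by summation.
    """
    scores = {bag: sum(entropy(p) for p in probs)
              for bag, probs in bag_probs.items()}
    best = max(scores, key=scores.get)
    return best, scores


# Toy usage with made-up probabilities: bag "a" holds two maximally
# uncertain instances, bag "b" holds three fairly confident ones.
bags = {"a": [0.5, 0.5], "b": [0.99, 0.01, 0.9]}
best_bag, bag_scores = most_informative_bag(bags)
print(best_bag, bag_scores)
```

Summation favours large bags; averaging or taking the maximum would be alternative aggregation rules with different trade-offs.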
No Pattern, No Recognition: a Survey about Reproducibility and Distortion Issues of Text Clustering and Topic Modeling
Extracting knowledge from unlabeled texts using machine learning algorithms
can be complex. Document categorization and information retrieval are two
applications that may benefit from unsupervised learning (e.g., text clustering
and topic modeling), including exploratory data analysis. However, the
unsupervised learning paradigm poses reproducibility issues. The initialization
can lead to variability depending on the machine learning algorithm.
Furthermore, the distortions can be misleading when regarding cluster geometry.
Amongst the causes, the presence of outliers and anomalies can be a determining
factor. Despite the relevance of initialization and outlier issues for text
clustering and topic modeling, the authors did not find an in-depth analysis of
them. This survey provides a systematic literature review (2011-2022) of these
subareas and proposes a common terminology since similar procedures have
different terms. The authors describe research opportunities, trends, and open
issues. The appendices summarize the theoretical background of the text
vectorization, the factorization, and the clustering algorithms that are
directly or indirectly related to the reviewed works.
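One simple way to observe the initialization variability mentioned above is to run the same clustering algorithm with different random seeds and measure how much the resulting partitions agree. The sketch below does this with a plain k-means and the Rand index. It is only an illustration under stated assumptions: the survey does not prescribe this procedure, and the function names, the toy 2-D points, and the choice of the Rand index as the agreement measure are hypothetical.

```python
import random
from itertools import combinations


def kmeans_labels(points, k, seed, iters=20):
    """Plain Lloyd's k-means with seed-dependent initial centers."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    labels = []
    for _ in range(iters):
        labels = [
            min(range(k),
                key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centers[j])))
            for p in points
        ]
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old center if a cluster becomes empty
                centers[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return labels


def rand_index(a, b):
    """Fraction of point pairs on which two partitions agree.

    A pair counts as agreement when both partitions put the two points
    together, or both put them apart. Label names do not matter.
    """
    pairs = list(combinations(range(len(a)), 2))
    agree = sum((a[i] == a[j]) == (b[i] == b[j]) for i, j in pairs)
    return agree / len(pairs)


# Toy usage: compare partitions obtained from several seeds.
data_rng = random.Random(0)
points = [(data_rng.random(), data_rng.random()) for _ in range(30)]
reference = kmeans_labels(points, k=3, seed=0)
for seed in (1, 2, 3):
    print(seed, rand_index(reference, kmeans_labels(points, k=3, seed=seed)))
```

A Rand index of 1.0 means two runs produced the same partition up to label renaming; lower values indicate that the result depends on the seed.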
Human-assisted self-supervised labeling of large data sets
There is a severe demand for, and shortage of, large accurately labeled datasets to train supervised computational intelligence (CI) algorithms in domains like unmanned aerial systems (UAS) and autonomous vehicles. This has hindered our ability to develop and deploy various computer vision algorithms in/across environments and niche domains for tasks like detection, localization, and tracking. Herein, I propose a new human-in-the-loop (HITL) based growing neural gas (GNG) algorithm to minimize human intervention during the labeling of large UAS data collections over a shared geospatial area. Specifically, I address human-driven events like new class identification and mistake correction. I also address algorithm-centric operations like new pattern discovery and self-supervised labeling. Pattern discovery and identification through self-supervised labeling are made possible through open set recognition (OSR). Herein, I propose a classifier with the ability to say "I don't know" to identify outliers in the data and bootstrap deep learning (DL) models, specifically convolutional neural networks (CNNs), with the ability to classify on N+1 classes. The effectiveness of the algorithms is demonstrated using simulated realistic ray-traced low altitude UAS data from the Unreal Engine. The results show that it is possible to increase speed and reduce mental fatigue compared with hand labeling large image datasets.
Includes bibliographical references.
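The notion of a classifier that can say "I don't know" can be illustrated with a very simple rejection rule: if the highest class probability falls below a threshold, the input is assigned to an extra "unknown" class, giving N+1 possible outputs. The sketch below uses maximum-softmax thresholding, which is a swapped-in baseline chosen for brevity and not the method proposed in the work above. The function names, the threshold value, and the class names are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of raw scores."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_open_set(logits, class_names, threshold=0.7):
    """Return (label, confidence), using "unknown" as the N+1-th class.

    Assumption: an input is rejected when the top softmax probability
    is below `threshold`; the threshold would need tuning on real data.
    """
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    if probs[best] < threshold:
        return "unknown", probs[best]
    return class_names[best], probs[best]


# Toy usage with made-up scores for three known classes.
names = ["car", "tree", "road"]
print(predict_open_set([5.0, 0.0, 0.0], names))  # confident prediction
print(predict_open_set([1.0, 1.0, 1.0], names))  # ambiguous, so rejected
```

Samples routed to "unknown" are natural candidates to show to a human annotator, which is where a human-in-the-loop step could fit.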