3,642 research outputs found
Semi-supervised model-based clustering with controlled clusters leakage
In this paper, we focus on finding clusters in partially categorized data
sets. We propose a semi-supervised version of Gaussian mixture model, called
C3L, which retrieves natural subgroups of given categories. In contrast to
other semi-supervised models, C3L is parametrized by user-defined leakage
level, which controls maximal inconsistency between initial categorization and
resulting clustering. Our method can be implemented as a module in practical
expert systems to detect clusters, which combine expert knowledge with true
distribution of data. Moreover, it can be used for improving the results of
less flexible clustering techniques, such as projection pursuit clustering. The
paper presents extensive theoretical analysis of the model and fast algorithm
for its efficient optimization. Experimental results show that C3L finds high
quality clustering model, which can be applied in discovering meaningful groups
in partially classified data
Improving water network management by efficient division into supply clusters
El agua es un recurso escaso que, como tal, debe ser gestionado de manera eficiente. Así, uno de los propósitos de dicha gestión debiera ser la reducción de pérdidas de agua y la mejora del funcionamiento del abastecimiento. Para ello, es necesario crear un marco de trabajo basado en un conocimiento profundo de la redes de distribución. En los casos reales, llegar a este conocimiento es una tarea compleja debido a que estos sistemas pueden estar formados por miles de nodos de consumo, interconectados entre sí también por miles de tuberías y sus correspondientes elementos de alimentación. La mayoría de las veces, esas redes no son el producto de un solo proceso de diseño, sino la consecuencia de años de historia que han dado respuesta a demandas de agua continuamente crecientes con el tiempo. La división de la red en lo que denominaremos clusters de abastecimiento, permite la obtención del conocimiento hidráulico adecuado para planificar y operar las tareas de gestión oportunas, que garanticen el abastecimiento al consumidor final. Esta partición divide las redes de distribución en pequeñas sub-redes, que son virtualmente independientes y están alimentadas por un número prefijado de fuentes.
Esta tesis propone un marco de trabajo adecuado en el establecimiento de vías eficientes tanto para dividir la red de abastecimiento en sectores, como para desarrollar nuevas actividades de gestión, aprovechando esta estructura dividida. La propuesta de desarrollo de cada una de estas tareas será mediante el uso de métodos kernel y sistemas multi-agente. El spectral clustering y el aprendizaje semi-supervisado se mostrarán como métodos con buen comportamiento en el paradigma de encontrar una red sectorizada que necesite usar el número mínimo de válvulas de corte. No obstante, sus algoritmos se vuelven lentos (a veces infactibles) dividiendo una red de abastecimiento grande.Herrera Fernández, AM. (2011). Improving water network management by efficient division into supply clusters [Tesis doctoral no publicada]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/11233Palanci
FIST: A Feature-Importance Sampling and Tree-Based Method for Automatic Design Flow Parameter Tuning
Design flow parameters are of utmost importance to chip design quality and
require a painfully long time to evaluate their effects. In reality, flow
parameter tuning is usually performed manually based on designers' experience
in an ad hoc manner. In this work, we introduce a machine learning-based
automatic parameter tuning methodology that aims to find the best design
quality with a limited number of trials. Instead of merely plugging in machine
learning engines, we develop clustering and approximate sampling techniques for
improving tuning efficiency. The feature extraction in this method can reuse
knowledge from prior designs. Furthermore, we leverage a state-of-the-art
XGBoost model and propose a novel dynamic tree technique to overcome
overfitting. Experimental results on benchmark circuits show that our approach
achieves 25% improvement in design quality or 37% reduction in sampling cost
compared to random forest method, which is the kernel of a highly cited
previous work. Our approach is further validated on two industrial designs. By
sampling less than 0.02% of possible parameter sets, it reduces area by 1.83%
and 1.43% compared to the best solutions hand-tuned by experienced designers
Machine learning for large-scale wearable sensor data in Parkinson disease:concepts, promises, pitfalls, and futures
For the treatment and monitoring of Parkinson's disease (PD) to be scientific, a key requirement is that measurement of disease stages and severity is quantitative, reliable, and repeatable. The last 50 years in PD research have been dominated by qualitative, subjective ratings obtained by human interpretation of the presentation of disease signs and symptoms at clinical visits. More recently, “wearable,” sensor-based, quantitative, objective, and easy-to-use systems for quantifying PD signs for large numbers of participants over extended durations have been developed. This technology has the potential to significantly improve both clinical diagnosis and management in PD and the conduct of clinical studies. However, the large-scale, high-dimensional character of the data captured by these wearable sensors requires sophisticated signal processing and machine-learning algorithms to transform it into scientifically and clinically meaningful information. Such algorithms that “learn” from data have shown remarkable success in making accurate predictions for complex problems in which human skill has been required to date, but they are challenging to evaluate and apply without a basic understanding of the underlying logic on which they are based. This article contains a nontechnical tutorial review of relevant machine-learning algorithms, also describing their limitations and how these can be overcome. It discusses implications of this technology and a practical road map for realizing the full potential of this technology in PD research and practice
The Dark Side(-Channel) of Mobile Devices: A Survey on Network Traffic Analysis
In recent years, mobile devices (e.g., smartphones and tablets) have met an
increasing commercial success and have become a fundamental element of the
everyday life for billions of people all around the world. Mobile devices are
used not only for traditional communication activities (e.g., voice calls and
messages) but also for more advanced tasks made possible by an enormous amount
of multi-purpose applications (e.g., finance, gaming, and shopping). As a
result, those devices generate a significant network traffic (a consistent part
of the overall Internet traffic). For this reason, the research community has
been investigating security and privacy issues that are related to the network
traffic generated by mobile devices, which could be analyzed to obtain
information useful for a variety of goals (ranging from device security and
network optimization, to fine-grained user profiling).
In this paper, we review the works that contributed to the state of the art
of network traffic analysis targeting mobile devices. In particular, we present
a systematic classification of the works in the literature according to three
criteria: (i) the goal of the analysis; (ii) the point where the network
traffic is captured; and (iii) the targeted mobile platforms. In this survey,
we consider points of capturing such as Wi-Fi Access Points, software
simulation, and inside real mobile devices or emulators. For the surveyed
works, we review and compare analysis techniques, validation methods, and
achieved results. We also discuss possible countermeasures, challenges and
possible directions for future research on mobile traffic analysis and other
emerging domains (e.g., Internet of Things). We believe our survey will be a
reference work for researchers and practitioners in this research field.Comment: 55 page
AI/ML Algorithms and Applications in VLSI Design and Technology
An evident challenge ahead for the integrated circuit (IC) industry in the
nanometer regime is the investigation and development of methods that can
reduce the design complexity ensuing from growing process variations and
curtail the turnaround time of chip manufacturing. Conventional methodologies
employed for such tasks are largely manual; thus, time-consuming and
resource-intensive. In contrast, the unique learning strategies of artificial
intelligence (AI) provide numerous exciting automated approaches for handling
complex and data-intensive tasks in very-large-scale integration (VLSI) design
and testing. Employing AI and machine learning (ML) algorithms in VLSI design
and manufacturing reduces the time and effort for understanding and processing
the data within and across different abstraction levels via automated learning
algorithms. It, in turn, improves the IC yield and reduces the manufacturing
turnaround time. This paper thoroughly reviews the AI/ML automated approaches
introduced in the past towards VLSI design and manufacturing. Moreover, we
discuss the scope of AI/ML applications in the future at various abstraction
levels to revolutionize the field of VLSI design, aiming for high-speed, highly
intelligent, and efficient implementations
Hybrid SOM+k-Means Clustering to Improve Planning, Operation and Management in Water Distribution Systems
[EN] With the advance of new technologies and emergence of the concept of the smart city, there has been a
dramatic increase in available information. Water distribution systems (WDSs) in which databases can be
updated every few minutes are no exception. Suitable techniques to evaluate available information and
produce optimized responses are necessary for planning, operation, and management. This can help
identify critical characteristics, such as leakage patterns, pipes to be replaced, and other features. This
paper presents a clustering method based on self-organizing maps coupled with k-means algorithms to
achieve groups that can be easily labeled and used for WDS decision-making. Three case-studies are
presented, namely a classification of Brazilian cities in terms of their water utilities; district metered area
creation to improve pressure control; and transient pressure signal analysis to identify burst pipes. In the
three cases, this hybrid technique produces excellent results.
© 2018 Elsevier Ltd. All rights reserved.This work is partially supported by Capes and CNPq, Brazilian research agencies. The use of English was revised by John Rawlins.Brentan, BM.; Meirelles, G.; Luvizotto, E.; Izquierdo Sebastián, J. (2018). Hybrid SOM+k-Means Clustering to Improve Planning, Operation and Management in Water Distribution Systems. Environmental Modelling & Software. 106:77-88. https://doi.org/10.1016/j.envsoft.2018.02.013S778810
- …