
    Semi-supervised model-based clustering with controlled clusters leakage

    In this paper, we focus on finding clusters in partially categorized data sets. We propose a semi-supervised version of the Gaussian mixture model, called C3L, which retrieves natural subgroups of given categories. In contrast to other semi-supervised models, C3L is parametrized by a user-defined leakage level, which controls the maximal inconsistency between the initial categorization and the resulting clustering. Our method can be implemented as a module in practical expert systems to detect clusters that combine expert knowledge with the true distribution of the data. Moreover, it can be used to improve the results of less flexible clustering techniques, such as projection pursuit clustering. The paper presents an extensive theoretical analysis of the model and a fast algorithm for its efficient optimization. Experimental results show that C3L finds a high-quality clustering model, which can be applied to discovering meaningful groups in partially classified data.
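    The abstract does not say exactly how leakage is measured. As a hedged illustration only, assume that each cluster refines exactly one category and define leakage as the fraction of labeled points that land in a cluster belonging to a different category; a clustering would then be admissible when this fraction stays at or below the user-defined level. The helper names `leakage` and `is_admissible` are hypothetical, and this is not the C3L optimization algorithm, only a way to check a finished clustering against such a bound.

```python
def leakage(categories, clusters, cluster_to_category):
    """Fraction of labeled points assigned to a cluster of another category.

    Assumed (hypothetical) definition, not taken from the C3L paper.
    categories: category label per point, or None for unlabeled points
    clusters: cluster index per point (same length as categories)
    cluster_to_category: maps each cluster index to the category it refines
    """
    labeled = [(cat, k) for cat, k in zip(categories, clusters) if cat is not None]
    if not labeled:
        return 0.0  # no labeled points, so nothing can leak
    mismatched = sum(1 for cat, k in labeled if cluster_to_category[k] != cat)
    return mismatched / len(labeled)


def is_admissible(categories, clusters, cluster_to_category, max_leakage):
    """True if the clustering respects the user-defined leakage level."""
    return leakage(categories, clusters, cluster_to_category) <= max_leakage


# Usage: clusters 0 and 1 are subgroups of "a", cluster 2 is a subgroup of "b".
cats = ["a", "a", "b", None, "b"]
assigned = [0, 1, 2, 2, 0]  # the last "b" point leaked into an "a" cluster
mapping = {0: "a", 1: "a", 2: "b"}
print(leakage(cats, assigned, mapping))
```

    Unlabeled points (here `None`) are ignored, since only labeled points can contradict the initial categorization.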

    Improving water network management by efficient division into supply clusters

    Water is a scarce resource and, as such, must be managed efficiently. Thus, one of the goals of such management should be to reduce water losses and improve supply performance. To this end, it is necessary to build a framework based on in-depth knowledge of distribution networks. In real cases, acquiring this knowledge is a complex task, because these systems can consist of thousands of consumption nodes, interconnected by thousands of pipes and their corresponding supply elements. Most of the time, these networks are not the product of a single design process but the result of years of history in which they have responded to water demands that keep growing over time. Dividing the network into what we will call supply clusters makes it possible to obtain the hydraulic knowledge needed to plan and carry out the appropriate management tasks that guarantee supply to the end consumer. This partition splits distribution networks into small sub-networks that are virtually independent and are fed by a preset number of sources. This thesis proposes a suitable framework for establishing efficient ways both to divide the supply network into sectors and to develop new management activities that take advantage of this divided structure. Each of these tasks is addressed using kernel methods and multi-agent systems. Spectral clustering and semi-supervised learning are shown to perform well at finding a sectorized network that requires the minimum number of cut-off valves. However, their algorithms become slow (sometimes infeasible) when partitioning a large supply network.
Herrera Fernández, AM. (2011). Improving water network management by efficient division into supply clusters [Unpublished doctoral thesis]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/11233
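    To make the spectral clustering idea concrete, here is a minimal sketch of spectral bisection on a small, unweighted pipe graph: build the graph Laplacian, take the eigenvector of the second-smallest eigenvalue (the Fiedler vector), and split nodes by its sign. Treating each pipe that crosses the split as a place where a cut-off valve would be needed is an assumption for illustration; the function name `spectral_bisection` is hypothetical, and this is not the thesis's kernel or semi-supervised method, which also accounts for hydraulic data and source constraints. It assumes a connected network.

```python
import numpy as np


def spectral_bisection(n_nodes, edges):
    """Split a pipe network into two sectors using the Fiedler vector.

    n_nodes: number of junctions, labeled 0..n_nodes-1
    edges: list of (i, j) pipes, treated as undirected and unweighted
    Returns a 0/1 sector label per node and the list of pipes cut by the split
    (in this sketch, the pipes that would need a cut-off valve).
    """
    adj = np.zeros((n_nodes, n_nodes))
    for i, j in edges:
        adj[i, j] = adj[j, i] = 1.0
    laplacian = np.diag(adj.sum(axis=1)) - adj
    # eigh returns eigenvalues in ascending order; column 1 is the Fiedler vector
    _, vecs = np.linalg.eigh(laplacian)
    fiedler = vecs[:, 1]
    labels = (fiedler >= 0).astype(int)
    cut = [(i, j) for i, j in edges if labels[i] != labels[j]]
    return labels, cut


# Two triangular loops joined by a single pipe (2, 3).
pipes = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
sectors, valves = spectral_bisection(6, pipes)
print(sectors, valves)
```

    The sign of an eigenvector is arbitrary, so which sector is called 0 or 1 may change between runs or libraries; the partition itself does not.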

    FIST: A Feature-Importance Sampling and Tree-Based Method for Automatic Design Flow Parameter Tuning

    Design flow parameters are of utmost importance to chip design quality, and evaluating their effects takes a painfully long time. In practice, flow parameter tuning is usually performed manually, in an ad hoc manner, based on designers' experience. In this work, we introduce a machine learning-based automatic parameter tuning methodology that aims to find the best design quality within a limited number of trials. Instead of merely plugging in machine learning engines, we develop clustering and approximate sampling techniques to improve tuning efficiency. The feature extraction in this method can reuse knowledge from prior designs. Furthermore, we leverage a state-of-the-art XGBoost model and propose a novel dynamic tree technique to overcome overfitting. Experimental results on benchmark circuits show that our approach achieves a 25% improvement in design quality or a 37% reduction in sampling cost compared with the random forest method, which is the kernel of a highly cited previous work. Our approach is further validated on two industrial designs. By sampling less than 0.02% of the possible parameter sets, it reduces area by 1.83% and 1.43% compared with the best solutions hand-tuned by experienced designers.
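    The core constraint here is a small evaluation budget. As a simplified, hedged illustration of spending that budget where it matters, the sketch below runs an importance-weighted random local search: at each trial it resamples one parameter, chosen with probability proportional to an assumed importance score, and keeps the change only if the cost improves. This is not FIST (no XGBoost model, clustering, or dynamic trees); the parameter names, importance scores, and cost function are all hypothetical stand-ins for a real flow run.

```python
import random


def sample_neighbor(config, space, importance, rng):
    """Copy config and resample one parameter, picked by importance weight."""
    names = list(space)
    name = rng.choices(names, weights=[importance[n] for n in names], k=1)[0]
    alternatives = [v for v in space[name] if v != config[name]]
    new = dict(config)
    if alternatives:
        new[name] = rng.choice(alternatives)
    return new


def tune(evaluate, space, importance, budget, seed=0):
    """Importance-weighted random local search with a fixed trial budget.

    evaluate: callable returning a cost to minimize (stands in for a flow run)
    Returns the best configuration, its cost, and the best-so-far history.
    """
    rng = random.Random(seed)
    best = {n: rng.choice(vals) for n, vals in space.items()}
    best_cost = evaluate(best)
    history = [best_cost]
    for _ in range(budget - 1):
        candidate = sample_neighbor(best, space, importance, rng)
        cost = evaluate(candidate)
        if cost < best_cost:
            best, best_cost = candidate, cost
        history.append(best_cost)
    return best, best_cost, history


# Hypothetical knobs and importance scores; the toy cost mimics "area".
space = {"effort": [0, 1, 2, 3], "util": [0.5, 0.6, 0.7, 0.8], "buffer": [0, 1]}
importance = {"effort": 0.5, "util": 0.4, "buffer": 0.1}


def toy_cost(c):
    return (c["effort"] - 2) ** 2 + 10 * (c["util"] - 0.7) ** 2 + 0.1 * c["buffer"]


best, best_cost, history = tune(toy_cost, space, importance, budget=30)
print(best, best_cost)
```

    Weighting by importance concentrates trials on the knobs believed to move quality most, which is the intuition behind feature-importance sampling; how FIST actually derives and uses importance is described in the paper, not here.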

    Machine learning for large-scale wearable sensor data in Parkinson disease: concepts, promises, pitfalls, and futures

    For the treatment and monitoring of Parkinson's disease (PD) to be scientific, a key requirement is that measurement of disease stages and severity is quantitative, reliable, and repeatable. The last 50 years of PD research have been dominated by qualitative, subjective ratings obtained by human interpretation of the presentation of disease signs and symptoms at clinical visits. More recently, “wearable,” sensor-based, quantitative, objective, and easy-to-use systems for quantifying PD signs in large numbers of participants over extended durations have been developed. This technology has the potential to significantly improve both clinical diagnosis and management in PD and the conduct of clinical studies. However, the large-scale, high-dimensional character of the data captured by these wearable sensors requires sophisticated signal processing and machine-learning algorithms to transform it into scientifically and clinically meaningful information. Such algorithms that “learn” from data have shown remarkable success in making accurate predictions for complex problems that have so far required human skill, but they are challenging to evaluate and apply without a basic understanding of the underlying logic on which they are based. This article contains a nontechnical tutorial review of relevant machine-learning algorithms, also describing their limitations and how these can be overcome. It discusses the implications of this technology and a practical road map for realizing its full potential in PD research and practice.
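    As a hedged example of the signal-processing step that turns raw sensor data into a meaningful quantity, the sketch below estimates the dominant frequency of an accelerometer trace and the fraction of its power inside a chosen band. The 4 to 6 Hz default, often cited for parkinsonian rest tremor, is used here only as an example setting; the function names are hypothetical, and this is a generic illustration rather than a pipeline from the article.

```python
import numpy as np


def dominant_frequency(signal, fs):
    """Frequency (Hz) of the largest spectral peak, ignoring the DC offset."""
    x = np.asarray(signal, dtype=float)
    x = x - x.mean()  # remove gravity / sensor offset
    spectrum = np.abs(np.fft.rfft(x))
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    return freqs[np.argmax(spectrum)]


def band_power_fraction(signal, fs, lo=4.0, hi=6.0):
    """Share of total (non-DC) power that falls in the [lo, hi] Hz band."""
    x = np.asarray(signal, dtype=float)
    x = x - x.mean()
    power = np.abs(np.fft.rfft(x)) ** 2
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    in_band = (freqs >= lo) & (freqs <= hi)
    total = power.sum()
    return float(power[in_band].sum() / total) if total > 0 else 0.0


# Synthetic 10 s trace at 100 Hz: gravity offset plus a 5 Hz oscillation.
fs = 100.0
t = np.arange(1000) / fs
trace = 9.81 + 0.3 * np.sin(2 * np.pi * 5.0 * t)
print(dominant_frequency(trace, fs), band_power_fraction(trace, fs))
```

    Features like these are typically computed per time window and then passed to a classifier or regressor; real recordings need filtering and artifact handling that this sketch omits.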

    The Dark Side(-Channel) of Mobile Devices: A Survey on Network Traffic Analysis

    In recent years, mobile devices (e.g., smartphones and tablets) have achieved increasing commercial success and have become a fundamental element of everyday life for billions of people around the world. Mobile devices are used not only for traditional communication activities (e.g., voice calls and messages) but also for more advanced tasks made possible by an enormous number of multi-purpose applications (e.g., finance, gaming, and shopping). As a result, these devices generate significant network traffic (a substantial part of overall Internet traffic). For this reason, the research community has been investigating security and privacy issues related to the network traffic generated by mobile devices, which could be analyzed to obtain information useful for a variety of goals (ranging from device security and network optimization to fine-grained user profiling). In this paper, we review the works that contributed to the state of the art of network traffic analysis targeting mobile devices. In particular, we present a systematic classification of the literature according to three criteria: (i) the goal of the analysis; (ii) the point where the network traffic is captured; and (iii) the targeted mobile platforms. In this survey, we consider capture points such as Wi-Fi access points, software simulation, and inside real mobile devices or emulators. For the surveyed works, we review and compare analysis techniques, validation methods, and achieved results. We also discuss possible countermeasures, challenges, and directions for future research on mobile traffic analysis and other emerging domains (e.g., Internet of Things). We believe our survey will be a reference work for researchers and practitioners in this research field. Comment: 55 pages
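    Many traffic-analysis pipelines start by grouping captured packets into flows before extracting features. The sketch below is a hedged, minimal example of that step: it keys packets by the usual 5-tuple (source, destination, ports, protocol) and computes per-flow packet count, bytes, duration, and mean inter-arrival gap. The `Packet` record and feature names are hypothetical; flows here are unidirectional and have no timeout, both simplifications relative to real tools.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class Packet:
    ts: float    # capture timestamp in seconds
    src: str
    dst: str
    sport: int
    dport: int
    proto: str
    size: int    # bytes


def flow_features(packets):
    """Aggregate packets into unidirectional 5-tuple flows with simple stats."""
    flows = defaultdict(list)
    for p in packets:
        flows[(p.src, p.dst, p.sport, p.dport, p.proto)].append(p)
    features = {}
    for key, pkts in flows.items():
        pkts.sort(key=lambda p: p.ts)
        gaps = [b.ts - a.ts for a, b in zip(pkts, pkts[1:])]
        features[key] = {
            "packets": len(pkts),
            "bytes": sum(p.size for p in pkts),
            "duration": pkts[-1].ts - pkts[0].ts,
            "mean_gap": mean(gaps) if gaps else 0.0,
        }
    return features


# Addresses come from documentation ranges (RFC 5737).
capture = [
    Packet(0.0, "192.0.2.10", "198.51.100.7", 50000, 443, "TCP", 100),
    Packet(0.5, "192.0.2.10", "198.51.100.7", 50000, 443, "TCP", 200),
    Packet(1.5, "192.0.2.10", "198.51.100.7", 50000, 443, "TCP", 300),
    Packet(0.2, "192.0.2.10", "203.0.113.53", 53000, 53, "UDP", 80),
]
for key, stats in flow_features(capture).items():
    print(key, stats)
```

    Feature vectors of this kind are what the classifiers in such studies usually consume; encrypted payloads do not prevent computing them, which is why traffic analysis remains a privacy concern.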

    AI/ML Algorithms and Applications in VLSI Design and Technology

    An evident challenge ahead for the integrated circuit (IC) industry in the nanometer regime is the investigation and development of methods that can reduce the design complexity ensuing from growing process variations and curtail the turnaround time of chip manufacturing. Conventional methodologies employed for such tasks are largely manual and thus time-consuming and resource-intensive. In contrast, the unique learning strategies of artificial intelligence (AI) provide numerous exciting automated approaches for handling complex and data-intensive tasks in very-large-scale integration (VLSI) design and testing. Employing AI and machine learning (ML) algorithms in VLSI design and manufacturing reduces the time and effort needed to understand and process data within and across different abstraction levels. This, in turn, improves IC yield and reduces manufacturing turnaround time. This paper thoroughly reviews the AI/ML automated approaches previously introduced for VLSI design and manufacturing. Moreover, we discuss the future scope of AI/ML applications at various abstraction levels to revolutionize the field of VLSI design, aiming for high-speed, highly intelligent, and efficient implementations.

    Hybrid SOM+k-Means Clustering to Improve Planning, Operation and Management in Water Distribution Systems

    With the advance of new technologies and the emergence of the smart city concept, there has been a dramatic increase in available information. Water distribution systems (WDSs), whose databases can be updated every few minutes, are no exception. Suitable techniques to evaluate available information and produce optimized responses are necessary for planning, operation, and management. Such techniques can help identify critical characteristics, such as leakage patterns, pipes to be replaced, and other features. This paper presents a clustering method based on self-organizing maps coupled with the k-means algorithm to obtain groups that can be easily labeled and used for WDS decision-making. Three case studies are presented: a classification of Brazilian cities in terms of their water utilities; district metered area creation to improve pressure control; and transient pressure signal analysis to identify burst pipes. In all three cases, this hybrid technique produces excellent results. © 2018 Elsevier Ltd. All rights reserved. This work is partially supported by Capes and CNPq, Brazilian research agencies. The use of English was revised by John Rawlins.
Brentan, BM.; Meirelles, G.; Luvizotto, E.; Izquierdo Sebastián, J. (2018). Hybrid SOM+k-Means Clustering to Improve Planning, Operation and Management in Water Distribution Systems. Environmental Modelling & Software. 106:77-88. https://doi.org/10.1016/j.envsoft.2018.02.013
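    The two-stage idea can be sketched as follows, under stated assumptions: a small one-dimensional self-organizing map first compresses the data into prototype vectors, k-means then groups those prototypes, and each data point inherits the group of its best-matching prototype. The grid size, learning-rate and neighborhood schedules, and the farthest-point k-means initialization are illustrative choices, not the paper's settings, and the function names are hypothetical.

```python
import numpy as np


def train_som(data, n_units=10, epochs=50, lr0=0.5, sigma0=3.0, seed=0):
    """Train a 1-D SOM; returns an (n_units, dim) array of prototypes."""
    rng = np.random.default_rng(seed)
    weights = data[rng.choice(len(data), n_units, replace=False)].astype(float)
    grid = np.arange(n_units)
    n_steps, step = epochs * len(data), 0
    for _ in range(epochs):
        for x in data[rng.permutation(len(data))]:
            frac = step / n_steps
            lr = lr0 * (1.0 - frac)              # linearly decaying learning rate
            sigma = sigma0 * (1.0 - frac) + 0.5  # shrinking neighborhood width
            bmu = np.argmin(np.linalg.norm(weights - x, axis=1))
            h = np.exp(-((grid - bmu) ** 2) / (2.0 * sigma ** 2))
            weights += lr * h[:, None] * (x - weights)
            step += 1
    return weights


def kmeans(points, k, iters=100):
    """Lloyd's k-means with deterministic farthest-point initialization."""
    centers = [points[0]]
    for _ in range(1, k):
        d = np.min(np.linalg.norm(points[:, None] - np.array(centers)[None], axis=2), axis=1)
        centers.append(points[np.argmax(d)])
    centers = np.array(centers, dtype=float)
    for _ in range(iters):
        labels = np.argmin(np.linalg.norm(points[:, None] - centers[None], axis=2), axis=1)
        new = np.array([points[labels == j].mean(axis=0) if np.any(labels == j)
                        else centers[j] for j in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    return centers, labels


def som_kmeans(data, n_units=10, k=2, seed=0):
    """Cluster SOM prototypes with k-means, then label points via their BMU."""
    prototypes = train_som(data, n_units=n_units, seed=seed)
    centers, proto_labels = kmeans(prototypes, k)
    bmu = np.argmin(np.linalg.norm(data[:, None] - prototypes[None], axis=2), axis=1)
    return proto_labels[bmu], centers


# Two synthetic, well-separated groups standing in for real WDS records.
rng = np.random.default_rng(1)
data = np.vstack([rng.normal(0.0, 0.5, (50, 2)), rng.normal(10.0, 0.5, (50, 2))])
labels, centers = som_kmeans(data, n_units=10, k=2)
print(np.bincount(labels))
```

    Running k-means on a handful of prototypes rather than on every record keeps the second stage cheap, and the SOM grid offers a compact view of the data that is easier to label.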