2,715 research outputs found

    Interactive search techniques for content-based retrieval from archives of images

    A simple query by file type shows that one of the most popular search engines indexes about 10 billion images. Even if this figure probably underestimates the real number, it immediately conveys how central images are to human communication. Such an enormous number also highlights how difficult images are to manage. Until now, images have been accompanied by textual data (descriptions, tags, labels, ...) that is used to retrieve them from archives. However, the growth in the number of images in recent years makes this type of cataloguing impractical. Furthermore, manual cataloguing is by its nature subjective, partial and prone to error. To overcome this situation, a kind of search based on the intrinsic characteristics of images, such as colors and shapes, has gained a foothold in recent years. This information is converted into numerical vectors, and comparing these vectors makes it possible to find images with similar characteristics. Clearly, a search at this level of representation is far from the user's perception of the images. To enable interaction between users and retrieval systems and to improve performance, the user is involved in the search by giving relevance feedback on the images retrieved so far. In this way the system can learn which kind of images interest the user and improve the results in the next iteration. These techniques, although studied for many years, still present open issues. High-dimensional feature spaces, a lack of relevant training images, and feature spaces with low discriminative capability are just some of the problems encountered.
In this thesis these problems are addressed by proposing innovative solutions both to improve the performance obtained by methods proposed in the literature and to give retrieval systems greater generalization capability. Data fusion techniques, both at the feature space level and at the level of different retrieval techniques, will be presented, showing that the former allow greater discriminative capability while the latter make the system more robust. To overcome the lack of training images, a method to generate synthetic patterns is proposed, allowing more balanced learning. Finally, new methods to measure similarity between images and to explore the feature space more efficiently will be proposed. The presented results show that the proposed approaches are indeed helpful in resolving some of the main problems in content-based image retrieval.
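
The feedback loop described in this abstract (feature vectors, similarity comparison, user relevance judgments, a refined next iteration) can be illustrated with a minimal sketch. It uses the classic Rocchio query update with cosine similarity as a stand-in; the abstract does not say which update rule or similarity measure the thesis uses, and the three-dimensional feature vectors, image names, and weights `alpha`, `beta`, `gamma` below are invented for illustration.

```python
import math


def cosine(a, b):
    """Cosine similarity between two feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def mean(vectors, dim):
    """Component-wise mean; a zero vector when the list is empty."""
    if not vectors:
        return [0.0] * dim
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def rocchio(query, relevant, non_relevant, alpha=1.0, beta=0.75, gamma=0.15):
    """Move the query toward relevant images and away from non-relevant ones."""
    dim = len(query)
    pos = mean(relevant, dim)
    neg = mean(non_relevant, dim)
    return [alpha * q + beta * p - gamma * n for q, p, n in zip(query, pos, neg)]


def rank(query, archive):
    """Image names sorted by decreasing similarity to the query."""
    return sorted(archive, key=lambda name: cosine(query, archive[name]), reverse=True)


# Hypothetical archive: each image reduced to a 3-component feature vector.
archive = {
    "sunset": [0.9, 0.1, 0.0],
    "beach": [0.6, 0.3, 0.1],
    "forest": [0.1, 0.9, 0.0],
    "meadow": [0.2, 0.7, 0.1],
}

query = [0.5, 0.5, 0.0]
first = rank(query, archive)

# The user marks "forest" as relevant and "sunset" as not relevant.
refined = rocchio(query, relevant=[archive["forest"]], non_relevant=[archive["sunset"]])
second = rank(refined, archive)
```

After one round of feedback, the images resembling the one marked relevant move to the top of `second` and the rejected image drops to the bottom.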

    ECHAD: Embedding-Based Change Detection from Multivariate Time Series in Smart Grids

    Smart grids are power grids where clients may actively participate in energy production, storage and distribution. Smart grid management raises several challenges, including possible changes and evolutions in energy consumption and production, which must be taken into account in order to properly regulate energy distribution. In this context, machine learning methods can be fruitfully adopted to support the analysis and to predict the behavior of smart grids, by exploiting the large amount of streaming data generated by sensor networks. In this article, we propose a novel change detection method, called ECHAD (Embedding-based CHAnge Detection), that leverages embedding techniques, one-class learning, and a dynamic detection approach that incrementally updates the learned model to reflect the new data distribution. Our experiments show that ECHAD achieves optimal performance on synthetic data representing challenging scenarios. Moreover, a qualitative analysis of the results obtained on real data from a real power grid reveals the quality of ECHAD's change detection. Specifically, a comparison with state-of-the-art approaches shows that ECHAD can identify additional relevant changes not detected by competitors, while avoiding false positive detections.
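
The general idea of pairing a one-class model with an incrementally updated detector can be sketched as follows. This is not the ECHAD algorithm: the embedding step is omitted, a centroid with a distance radius stands in for the one-class learner, and the window size, `margin`, `outlier_fraction`, and synthetic stream are all assumptions made for illustration.

```python
import math


def fit(window, margin=1.5):
    """Toy one-class model: centroid of the window plus an inflated radius."""
    dim = len(window[0])
    centroid = [sum(x[i] for x in window) / len(window) for i in range(dim)]
    radius = max(math.dist(x, centroid) for x in window) * margin
    return centroid, radius


def detect_changes(stream, window_size=5, outlier_fraction=0.5, margin=1.5):
    """Return start indices of windows where most points fall outside the model."""
    model = fit(stream[:window_size], margin)
    changes = []
    for start in range(window_size, len(stream) - window_size + 1, window_size):
        window = stream[start:start + window_size]
        centroid, radius = model
        outliers = sum(math.dist(x, centroid) > radius for x in window)
        if outliers / len(window) > outlier_fraction:
            changes.append(start)
            # Re-learn the model so it reflects the new data distribution.
            model = fit(window, margin)
    return changes


# Synthetic two-variable stream whose level shifts at index 10.
stable = [(1.0 + 0.1 * (i % 3), 1.0 - 0.1 * (i % 2)) for i in range(10)]
shifted = [(5.0 + 0.1 * (i % 3), 5.0 - 0.1 * (i % 2)) for i in range(10)]
stream = stable + shifted

changes = detect_changes(stream)
```

Because the model is refitted on the window that triggered the detection, later windows drawn from the new regime are no longer flagged, which is the behavior an incremental update is meant to provide.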

    Learning from Multi-Class Imbalanced Big Data with Apache Spark

    With data becoming a new form of currency, its analysis has become a top priority in both academia and industry, furthering advancements in high-performance computing and machine learning. However, these large, real-world datasets come with additional complications such as noise and class overlap. Problems are magnified when multi-class data is presented, especially since many of the popular algorithms were originally designed for binary data. Another challenge arises when the examples are not evenly distributed across all classes in a dataset. This often causes classifiers to favor the majority class over the minority classes, leading to undesirable results, since learning from the rare cases may be the primary goal. Many of the classic machine learning algorithms were not designed for multi-class, imbalanced data or for parallelism, and so their effectiveness has been hindered. This dissertation addresses some of these challenges through in-depth experimentation with novel implementations of machine learning algorithms on Apache Spark, a distributed computing framework based on the MapReduce model and designed to handle very large datasets. Experimentation showed that many of the traditional classifier algorithms do not translate well to a distributed computing environment, indicating the need for a new generation of algorithms targeting modern high-performance computing. A collection of popular oversampling methods, originally designed for small binary-class datasets, has been implemented using Apache Spark for the first time to improve parallelism and add multi-class support. An extensive study on how instance-level difficulty affects learning from large datasets was also performed.
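
To make the imbalance problem concrete, the sketch below applies plain random oversampling to a small multi-class dataset on a single machine. It is not one of the dissertation's Apache Spark implementations, and the abstract does not list which oversampling methods were ported; the function name `random_oversample` and the toy data are assumptions.

```python
import random
from collections import Counter, defaultdict


def random_oversample(samples, labels, seed=0):
    """Duplicate minority-class samples until every class matches the largest one."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for x, y in zip(samples, labels):
        by_class[y].append(x)

    target = max(len(xs) for xs in by_class.values())
    out_x, out_y = [], []
    for y, xs in by_class.items():
        # Draw the missing examples with replacement from the class itself.
        extra = [rng.choice(xs) for _ in range(target - len(xs))]
        for x in xs + extra:
            out_x.append(x)
            out_y.append(y)
    return out_x, out_y


# Three classes with 6, 2 and 1 examples respectively.
samples = [[0.1], [0.2], [0.3], [0.4], [0.5], [0.6], [2.0], [2.1], [5.0]]
labels = ["a"] * 6 + ["b"] * 2 + ["c"]

balanced_x, balanced_y = random_oversample(samples, labels)
counts = Counter(balanced_y)
```

Handling each class independently is what gives multi-class support here; a distributed version would additionally have to partition the per-class sampling across workers.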

    Machine learning methods for the characterization and classification of complex data

    This thesis presents novel methods for the analysis and classification of medical images and, more generally, complex data. First, an unsupervised machine learning method is proposed to order anterior chamber OCT (Optical Coherence Tomography) images according to a patient's risk of developing angle-closure glaucoma. In a second study, two outlier detection techniques are proposed to improve the results of the above-mentioned machine learning algorithm; we also show that they are applicable to a wide variety of data, including fraud detection in credit card transactions. In a third study, the topology of the vascular network of the retina is analyzed by treating it as a complex tree-like network, and we show that structural differences reveal the presence of glaucoma and diabetic retinopathy. In a fourth study, we use a model of a laser with optical injection, which presents extreme events in its intensity time series, to evaluate machine learning methods for forecasting such extreme events. Postprint (published version)

    Activity-controlled annealing of colloidal monolayers.

    Molecular motors are essential to living systems, generating fluctuations that boost transport and assist assembly. Active colloids, which consume energy to move, hold similar potential for man-made materials controlled by forces generated from within. Yet their use as a powerhouse in materials science is still lacking. Here we show a massive acceleration of the annealing of a monolayer of passive beads by moderate addition of self-propelled microparticles. We rationalize our observations with a model of collisions that drive active fluctuations and activate the annealing. The experiment is quantitatively compared with Brownian dynamics simulations that further unveil a dynamical transition in the mechanism of annealing. Active dopants travel uniformly in the system or co-localize at the grain boundaries as a result of the persistence of their motion. Our findings uncover the potential of internal activity to control materials and lay the groundwork for the rise of materials science beyond equilibrium.

    Complexity in Developmental Systems: Toward an Integrated Understanding of Organ Formation

    During animal development, embryonic cells assemble into intricately structured organs by working together in organized groups capable of implementing tightly coordinated collective behaviors, including patterning, morphogenesis and migration. Although many of the molecular components and basic mechanisms underlying such collective phenomena are known, the complexity emerging from their interplay still represents a major challenge for developmental biology. Here, we first clarify the nature of this challenge and outline three key strategies for addressing it: precision perturbation, synthetic developmental biology, and data-driven inference. We then present the results of our effort to develop a set of tools rooted in two of these strategies and to apply them to uncover new mechanisms and principles underlying the coordination of collective cell behaviors during organogenesis, using the zebrafish posterior lateral line primordium as a model system. To enable precision perturbation of migration and morphogenesis, we sought to adapt optogenetic tools to control chemokine and actin signaling. This endeavor proved far from trivial and we were ultimately unable to derive functional optogenetic constructs. However, our work toward this goal led to a useful new way of perturbing cortical contractility, which in turn revealed a potential role for cell surface tension in lateral line organogenesis. Independently, we hypothesized that the lateral line primordium might employ plithotaxis to coordinate organ formation with collective migration. We tested this hypothesis using a novel optical tool that allows targeted arrest of cell migration, finding that contrary to previous assumptions plithotaxis does not substantially contribute to primordium guidance. Finally, we developed a computational framework for automated single-cell segmentation, latent feature extraction and quantitative analysis of cellular architecture. 
We identified the key factors defining shape heterogeneity across primordium cells and went on to use this shape space as a reference for mapping the results of multiple experiments into a quantitative atlas of primordium cell architecture. We also propose a number of data-driven approaches to help bridge the gap from big data to mechanistic models. Overall, this study presents several conceptual and methodological advances toward an integrated understanding of complex multi-cellular systems.