2,760 research outputs found

    JUNIPR: a Framework for Unsupervised Machine Learning in Particle Physics

    Full text link
    In applications of machine learning to particle physics, a persistent challenge is how to go beyond discrimination to learn about the underlying physics. To this end, a powerful tool would be a framework for unsupervised learning, where the machine learns the intricate high-dimensional contours of the data upon which it is trained, without reference to pre-established labels. In order to approach such a complex task, an unsupervised network must be structured intelligently, based on a qualitative understanding of the data. In this paper, we scaffold the neural network's architecture around a leading-order model of the physics underlying the data. In addition to making unsupervised learning tractable, this design actually alleviates existing tensions between performance and interpretability. We call the framework JUNIPR: "Jets from UNsupervised Interpretable PRobabilistic models". In this approach, the set of particle momenta composing a jet is clustered into a binary tree that the neural network examines sequentially. Training is unsupervised and unrestricted: the network could decide that the data bears little correspondence to the chosen tree structure. However, when there is a correspondence, the network's output along the tree has a direct physical interpretation. JUNIPR models can perform discrimination tasks, through the statistically optimal likelihood-ratio test, and they permit visualizations of discrimination power at each branching in a jet's tree. Additionally, JUNIPR models provide a probability distribution from which events can be drawn, providing a data-driven Monte Carlo generator. As a third application, JUNIPR models can reweight events from one (e.g. simulated) data set to agree with distributions from another (e.g. experimental) data set. Comment: 37 pages, 24 figures
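    The discrimination and reweighting applications above reduce to simple operations on per-jet log-probabilities. A minimal sketch of that idea, assuming hypothetical trained models that expose a `log_prob(jet)` method (this interface is an illustrative assumption, not the paper's actual API):

```python
# Hypothetical JUNIPR-style usage: both models are assumed to expose
# log_prob(jet), returning the log-probability the model assigns to the jet's
# clustering sequence. This interface is an assumption for illustration.
import numpy as np

def llr_discriminant(jet, model_sig, model_bkg):
    """Log-likelihood ratio, the statistically optimal test statistic."""
    return model_sig.log_prob(jet) - model_bkg.log_prob(jet)

def reweight(jets, model_target, model_source):
    """Per-jet weights mapping samples drawn from the source distribution
    onto the target distribution."""
    log_w = np.array([model_target.log_prob(j) - model_source.log_prob(j)
                      for j in jets])
    return np.exp(log_w)
```

    In this view, discrimination amounts to thresholding the log-likelihood ratio, and reweighting applies the exponentiated ratio as a per-event weight.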

    The Machine Learning Landscape of Top Taggers

    Full text link
    Based on the established task of identifying boosted, hadronically decaying top quarks, we compare a wide range of modern machine learning approaches. Unlike most established methods, they rely on low-level input such as calorimeter output. While their network architectures are vastly different, their performance is comparatively similar. In general, we find that these new approaches are extremely powerful and great fun. Comment: Yet another tagger included

    Common pulse retrieval algorithm: a fast and universal method to retrieve ultrashort pulses

    Full text link
    We present a common pulse retrieval algorithm (COPRA) that can be used for a broad category of ultrashort laser pulse measurement schemes, including frequency-resolved optical gating (FROG), interferometric FROG, dispersion scan, time-domain ptychography, and pulse-shaper-assisted techniques such as multiphoton intrapulse interference phase scan (MIIPS). We demonstrate its properties in comprehensive numerical tests and show that it is fast, reliable and accurate in the presence of Gaussian noise. For FROG it outperforms retrieval algorithms based on generalized projections and ptychography. Furthermore, we discuss the pulse retrieval problem as a nonlinear least-squares problem and demonstrate the importance of obtaining a least-squares solution for noisy data. These results improve and extend the possibilities of numerical pulse retrieval. COPRA is faster and provides more accurate results than existing retrieval algorithms. Furthermore, it enables full pulse retrieval from measurements for which no retrieval algorithm was known before, e.g., MIIPS measurements.
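    The least-squares view mentioned above can be made concrete with a toy example: parametrize the unknown complex field on a time grid, simulate an SHG-FROG trace from it, and fit it to the measured trace with a generic solver. A hedged sketch, where the grid size, the parametrization and the use of scipy.optimize.least_squares are illustrative assumptions rather than COPRA itself:

```python
# Illustrative sketch of pulse retrieval posed as a nonlinear least-squares fit
# to an SHG-FROG trace; not COPRA, just the generic formulation.
import numpy as np
from scipy.optimize import least_squares

N = 32                      # time-grid points (kept small for the toy example)
t = np.arange(N) - N // 2   # arbitrary units

def shg_frog_trace(field):
    """|FT_t{E(t) E(t - tau)}|^2 for every integer delay tau on the grid."""
    rows = []
    for tau in range(-N // 2, N // 2):
        gated = field * np.roll(field, tau)
        rows.append(np.abs(np.fft.fft(gated)) ** 2)
    return np.array(rows)

def residuals(x, measured):
    field = x[:N] + 1j * x[N:]   # real parametrization of the complex field
    return (shg_frog_trace(field) - measured).ravel()

# Synthetic "measurement" from a chirped Gaussian pulse, plus a random start.
true_field = np.exp(-t**2 / 50) * np.exp(1j * 0.05 * t**2)
measured = shg_frog_trace(true_field)
x0 = np.random.default_rng(0).normal(size=2 * N) * 0.1
fit = least_squares(residuals, x0, args=(measured,))
print("final cost:", fit.cost)
```

    A generic solver like this is slow and sensitive to the starting guess; the point is only to show the least-squares structure of the retrieval problem that dedicated algorithms exploit.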

    Traditional and new principles of perceptual grouping

    Get PDF
    Perceptual grouping refers to the process of determining which regions and parts of the visual scene belong together as parts of higher order perceptual units such as objects or patterns. In the early 20th century, Gestalt psychologists identified a set of classic grouping principles which specified how some image features lead to grouping between elements given that all other factors were held constant. Modern vision scientists have expanded this list to cover a wide range of image features but have also expanded the importance of learning and other non-image factors. Unlike early Gestalt accounts which were based largely on visual demonstrations, modern theories are often explicitly quantitative and involve detailed models of how various image features modulate grouping. Work has also been done to understand the rules by which different grouping principles integrate to form a final percept. This chapter gives an overview of the classic principles, modern developments in understanding them, and new principles and the evidence for them. There is also discussion of some of the larger theoretical issues about grouping such as at what stage of visual processing it occurs and what types of neural mechanisms may implement grouping principles
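    As a toy illustration of what an explicitly quantitative grouping model can look like, one could express the probability that two elements group as a logistic function of proximity and similarity cues; the functional form and cue weights below are made up for illustration and do not come from the chapter:

```python
# Toy quantitative grouping rule: closer and more similar elements receive a
# higher grouping probability. Weights and bias are illustrative only.
import numpy as np

def grouping_probability(distance, color_difference, w_prox=1.5, w_sim=1.0, bias=2.0):
    """Logistic combination of a proximity cue and a similarity cue."""
    strength = bias - w_prox * distance - w_sim * color_difference
    return 1.0 / (1.0 + np.exp(-strength))

print(grouping_probability(distance=0.5, color_difference=0.1))  # near, similar -> high
print(grouping_probability(distance=3.0, color_difference=0.8))  # far, dissimilar -> low
```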

    Automated Pattern Detection and Generalization of Building Groups

    Get PDF
    This dissertation focuses on the generalization of building groups, with an emphasis on the detection of building patterns. Generalization is an important research field in cartography; it is part of map production and the basis for deriving multiple representations. As one of the most important features on a map, buildings occupy a large amount of map space and typically have complex shapes and spatial distributions, which makes building generalization a long-standing, important and challenging task. For social, architectural and geographical reasons, buildings were built following certain rules, which gives rise to different building patterns. Building patterns are crucial structures that should be carefully considered during graphical representation and generalization. Although people can effortlessly perceive these patterns, they are not explicitly described in building datasets. Therefore, to better support the subsequent generalization process, it is important to recognize building patterns automatically. The objective of this dissertation is to develop effective methods to detect building patterns in building groups and, based on the identified patterns, to propose generalization methods that fulfill the task of building generalization. The main contributions of the dissertation are the following five aspects: (1) the terminology and concept of building patterns are clearly explained, and a detailed and relatively complete typology of building patterns is proposed by summarizing previous research and extending it; (2) a stroke-mesh based method is developed to group buildings and detect different patterns within building groups; (3) through an analogy between line simplification and the typification of linear building groups, a typification method based on stroke simplification is developed for generalizing building groups with linear patterns; (4) a mesh-based typification method is developed for generalizing building groups with grid patterns; (5) a method for extracting hierarchical skeleton structures from discrete buildings is proposed; the extracted structures represent the global shape of the entire region and are used to control the generalization process. With these methods, building patterns are detected in building groups and the generalization of the groups is carried out based on those patterns. The thesis also discusses the drawbacks of the methods and suggests potential solutions.
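    One recurring ingredient of such grouping approaches is a proximity graph over building centroids, which the stroke-mesh method refines before deriving strokes and meshes. A minimal sketch, assuming centroid coordinates as input and a simple edge-length threshold (both illustrative simplifications, not the dissertation's exact procedure):

```python
# Sketch of a proximity graph over building centroids: Delaunay triangulation
# pruned by an edge-length threshold. Input format and threshold are assumptions.
import numpy as np
from scipy.spatial import Delaunay

def proximity_graph(centroids, max_edge_length):
    """Return a set of index pairs (i, j) linking nearby buildings."""
    tri = Delaunay(centroids)
    edges = set()
    for simplex in tri.simplices:
        for i, j in [(0, 1), (1, 2), (0, 2)]:
            a, b = sorted((simplex[i], simplex[j]))
            if np.linalg.norm(centroids[a] - centroids[b]) <= max_edge_length:
                edges.add((a, b))
    return edges

centroids = np.random.default_rng(1).uniform(0, 100, size=(20, 2))
print(len(proximity_graph(centroids, max_edge_length=30.0)), "edges")
```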

    Data Mining

    Get PDF
    The availability of big data due to computerization and automation has generated an urgent need for new techniques to analyze and convert big data into useful information and knowledge. Data mining is a promising and leading-edge technology for mining large volumes of data, looking for hidden information, and aiding knowledge discovery. It can be used for characterization, classification, discrimination, anomaly detection, association, clustering, trend or evolution prediction, and much more in fields such as science, medicine, economics, engineering, computers, and even business analytics. This book presents basic concepts, ideas, and research in data mining

    Algorithms and methods for large-scale genome rearrangements identification

    Get PDF
    This thesis by compendium of publications addresses the formal definition of synteny blocks (SB), starting from High-scoring Segment Pairs (HSP), which are well known and accepted. The first objective focused on detecting SBs as combinations of HSPs including repeats, which increased the complexity of the model. The result is a more precise method that improves on the quality of state-of-the-art results. This method applies rules based on the adjacency of SBs and also allows large-scale genome rearrangements (LSGR) to be detected and identified as inversions, translocations or duplications, constituting a framework able to work with LSGRs in single-chromosome organisms.

    Later, in a second article, this framework was used to refine the borders of the SBs. In our novel proposal, the repeats flanking an SB are used to refine its borders by exploiting the redundancy those repeats introduce. Through a multiple alignment of these repeats, identity vectors are computed for the SB and for the consensus sequence of the aligned repeats; a finite-state machine designed to detect transition points in the difference of the two vectors then determines the start and end positions of the refined SB. This method also proved useful for detecting break points (BP), which appear as the region between two adjacent SBs. The method does not force a BP to be either a region or a single point; the outcome depends on the alignments of the repeats and of the SB in question.

    The method is applied in a third work, which addresses a metagenome analysis use case. It is well known that the information stored in databases does not necessarily correspond to the uncultured samples contained in a metagenome, and one can imagine the assignment of a metagenome sample being hindered by a rearrangement event. The article shows that metagenome samples mapping onto the exclusive regions of a genome (those it does not share with other genomes) support the presence of that genome in the metagenome. These exclusive regions are easily derived from a multiple genome comparison, as the regions that are not part of any SB. A definition in a multiple-genome comparison space is more precise than definitions built from pairwise comparisons since, among other things, it allows a refinement following a procedure similar to the one described in the second article (using SBs instead of repeats). It also resolves the contradiction in the definition of BPs (mentioned in the second publication) whereby the same region of a genome can be detected as a BP or as part of an SB depending on the genome it is compared against. This multiple-comparison definition of SBs additionally provides precise information for the reconstruction of LSGRs, with a view to approximating the true common ancestor of the species involved, and it offers a solution to the granularity problem in SB detection: we start from small, well-conserved SBs and, through the reconstruction of LSGRs, gradually increase the size of those blocks.
    The results expected from this line of work point towards the definition of a metric for obtaining more precise inter-genomic distances, combining sequence similarity with LSGR frequencies.

    This thesis is a compendium of three articles recently published in high-impact journals, in which we show the process that led us to propose the definition of Elementary Units of Conservation (regions conserved between genomes that are detected after a multiple comparison), as well as some basic operations such as inversions, transpositions and duplications. The three articles are transversally connected by the detection of synteny blocks (SB) and large-scale genome rearrangements (LSGR) (see section 2), and they support the need for the framework described in the "Systems And Methods" section. Indeed, the intellectual work carried out in this thesis and the conclusions drawn from the publications have been essential to understand that an appropriate SB definition is key to many comparative genomics methods. DNA rearrangement events are one of the main drivers of evolution, and their effects can be observed in new species, new biological functions, and so on. Small-scale rearrangements such as insertions, deletions or substitutions have been widely studied, and accepted models exist for detecting them. However, methods for identifying large-scale rearrangements still suffer from limitations and a lack of precision, mainly because no accepted definition of SB exists yet. The concept of SB refers to regions conserved between two genomes that preserve the same order and strand. Although methods exist to detect them, they avoid dealing with repeats or restrict the search to coding regions for the sake of a simpler model. Refining the borders of these blocks remains an open problem.
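    The border-refinement step described above can be illustrated with a small sketch: compare the per-position identity vector of a synteny block with that of the aligned flanking repeats, and let a simple state machine locate the point where one signal persistently overtakes the other. The threshold and run length below are illustrative assumptions, not the published method's parameters:

```python
# Hedged sketch of identity-vector based border refinement: find the first
# position where the block signal stays above the repeat signal for a run of
# consecutive positions. Thresholds are illustrative assumptions.
import numpy as np

def refine_border(identity_sb, identity_repeat, threshold=0.0, run_length=5):
    """Return the first index where (identity_sb - identity_repeat) stays above
    `threshold` for `run_length` consecutive positions."""
    diff = np.asarray(identity_sb) - np.asarray(identity_repeat)
    run = 0
    for i, d in enumerate(diff):
        run = run + 1 if d > threshold else 0
        if run == run_length:
            return i - run_length + 1
    return None  # no clear transition found

# Toy signals: the repeat dominates the first 40 positions, the block afterwards.
sb = np.concatenate([np.full(40, 0.55), np.full(60, 0.95)])
rep = np.concatenate([np.full(40, 0.90), np.full(60, 0.50)])
print(refine_border(sb, rep))  # -> 40
```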