281 research outputs found

    Data Mining and Machine Learning in Astronomy

    Full text link
    We review the current state of data mining and machine learning in astronomy. 'Data Mining' can have a somewhat mixed connotation from the point of view of a researcher in this field. If used correctly, it can be a powerful approach, holding the potential to fully exploit the exponentially increasing amount of available data, promising great scientific advance. However, if misused, it can be little more than the black-box application of complex computing algorithms that may give little physical insight, and provide questionable results. Here, we give an overview of the entire data mining process, from data collection through to the interpretation of results. We cover common machine learning algorithms, such as artificial neural networks and support vector machines, applications from a broad range of astronomy, emphasizing those where data mining techniques directly resulted in improved science, and important current and future directions, including probability density functions, parallel algorithms, petascale computing, and the time domain. We conclude that, so long as one carefully selects an appropriate algorithm, and is guided by the astronomical problem at hand, data mining can be very much the powerful tool, and not the questionable black box.Comment: Published in IJMPD. 61 pages, uses ws-ijmpd.cls. Several extra figures, some minor additions to the tex

    ARC 2014: a multidimensional FPGA-based parallel DBSCAN architecture

    Get PDF

    Separation logic for high-level synthesis

    Get PDF
    High-level synthesis (HLS) promises a significant shortening of the digital hardware design cycle by raising the abstraction level of the design entry to high-level languages such as C/C++. However, applications using dynamic, pointer-based data structures remain difficult to implement well, yet such constructs are widely used in software. Automated optimisations that leverage the memory bandwidth of dedicated hardware implementations by distributing the application data over separate on-chip memories and parallelise the implementation are often ineffective in the presence of dynamic data structures, due to the lack of an automated analysis that disambiguates pointer-based memory accesses. This thesis takes a step towards closing this gap. We explore recent advances in separation logic, a rigorous mathematical framework that enables formal reasoning about the memory access of heap-manipulating programs. We develop a static analysis that automatically splits heap-allocated data structures into provably disjoint regions. Our algorithm focuses on dynamic data structures accessed in loops and is accompanied by automated source-to-source transformations which enable loop parallelisation and physical memory partitioning by off-the-shelf HLS tools. We then extend the scope of our technique to pointer-based memory-intensive implementations that require access to an off-chip memory. The extended HLS design aid generates parallel on-chip multi-cache architectures. It uses the disjointness property of memory accesses to support non-overlapping memory regions by private caches. It also identifies regions which are shared after parallelisation and which are supported by parallel caches with a coherency mechanism and synchronisation, resulting in automatically specialised memory systems. We show up to 15x acceleration from heap partitioning, parallelisation and the insertion of the custom cache system in demonstrably practical applications.Open Acces

    Theoretical and algorithmic approaches to field-programmable gate array partitioning

    Get PDF
    Many practical problems dealing with the design of Very Large Scale Integrated (VLSI) circuits can be modeled as graphs in which vertices represent components of the circuit and edges represent a relationship between these components. When expressed as graphs, these problems can then often be solved using graph theoretic methods. Unfortunately, many such problems are NP-complete, hence no practical exact solutions are known to exist. In this dissertation, we study NP-complete problems taken from the realm of partitioning for Field-Programmable Gate Arrays (FPGAs). We adopt a two-pronged approach of theory and practice, developing practical heuristics driven by theoretical study. The theoretical approach is motivated by well-quasi-order (WQO) theory, which can be used to show, among other things, that when some hard problems have fixed parameters, polynomial-time solutions exist. This is of significance in the area of FPGA partitioning, in which practical problems are often characterized by fixed parameter instances. WQO techniques are not generally practical, however, and we develop new methods to solve several important problems in VLSI that are not even amenable to WQO techniques. We begin with a representative partitioning problem, Min Degree Graph Partition (MDGP), the fixed-parameter version of which is closed under the immersion order. \Ve show that the obstruction set ( set of immersion minimal elements) for this problem is computable; we prove both upper and lower bounds on the obstruction set size; and we completely characterize all fixed-parameter MDGP simple tree obstructions. WQO theory tells us only that fixed-parameter MDGP is solvable in (high-degree) polynomial time. We attack the problem using what we refer to as kd-candidate subsets, culminating in linear-time decision and search algorithms. The kd-candidate subset method also paves the way for an efficient heuristic for the FPGA Minimization problem. We then broaden our scope to incorporate delay minimization into FPGA partitioning. We develop, analyze and test a novel method called critical path compression, inspired in part by compiler optimization techniques. We then look at a variety of generalizations of MDGP. Some of these problems are not immersion closed; others are not even defined in a way that WQO theory applies. However, almost all of them are efficiently solvable via the kd-candidate subset approach. Interspersed in these results are many refinements of what is known about the complexity of these problems. We also discuss a few other solution strategies, and present many open problems

    Learning from Multi-Class Imbalanced Big Data with Apache Spark

    Get PDF
    With data becoming a new form of currency, its analysis has become a top priority in both academia and industry, furthering advancements in high-performance computing and machine learning. However, these large, real-world datasets come with additional complications such as noise and class overlap. Problems are magnified when with multi-class data is presented, especially since many of the popular algorithms were originally designed for binary data. Another challenge arises when the number of examples are not evenly distributed across all classes in a dataset. This often causes classifiers to favor the majority class over the minority classes, leading to undesirable results as learning from the rare cases may be the primary goal. Many of the classic machine learning algorithms were not designed for multi-class, imbalanced data or parallelism, and so their effectiveness has been hindered. This dissertation addresses some of these challenges with in-depth experimentation using novel implementations of machine learning algorithms using Apache Spark, a distributed computing framework based on the MapReduce model designed to handle very large datasets. Experimentation showed that many of the traditional classifier algorithms do not translate well to a distributed computing environment, indicating the need for a new generation of algorithms targeting modern high-performance computing. A collection of popular oversampling methods, originally designed for small binary class datasets, have been implemented using Apache Spark for the first time to improve parallelism and add multi-class support. An extensive study on how instance level difficulty affects the learning from large datasets was also performed

    Pattern Recognition

    Get PDF
    Pattern recognition is a very wide research field. It involves factors as diverse as sensors, feature extraction, pattern classification, decision fusion, applications and others. The signals processed are commonly one, two or three dimensional, the processing is done in real- time or takes hours and days, some systems look for one narrow object class, others search huge databases for entries with at least a small amount of similarity. No single person can claim expertise across the whole field, which develops rapidly, updates its paradigms and comprehends several philosophical approaches. This book reflects this diversity by presenting a selection of recent developments within the area of pattern recognition and related fields. It covers theoretical advances in classification and feature extraction as well as application-oriented works. Authors of these 25 works present and advocate recent achievements of their research related to the field of pattern recognition

    Fast feature matching for simultaneous localization and mapping

    Get PDF
    Bakalářská práce se zabývá rychlým vyhledáváním lokálních obrazových vlastností v rozsáhlých databázích pro simultánní lokalizaci a mapování prostředí. Součástí práce je krátký přehled detektorů a deskriptorů invariantních vůči rotaci, translaci, změně měřítka a affinitě. Pro řadu aplikací z oblasti počítačového vidění (SLAM, object retrieval, wide–robust baseline stereo, tracking, . . . ) je odezva reálném čase naprosto nezbytná. Jako řešení sublineární časové náročnosti vyhledávání v databázích bylo navrženo použití vícenásobných náhodně generovaných KD–stromů. Dále je předkládán nový způsob dělení dat do vícenásobných KD–stromů. Navíc byl navržen nový, obecně použitelný vyhodnocovací software (podporovány jsou KD–stromy, BBD-stromy a k-means stromy.)The thesis deals with the fast feature matching for simultaneous localization and mapping. A brief description of local features invariant to scale, rotation, translation and affine transformations, their detectors and descriptors are included. In general, real–time response for matching is crucial for various computer vision applications (SLAM, object retrieval, wide–robust baseline stereo, tracking, . . . ). We solve the problem of sub–linear search complexity by multiple randomised KD–trees. In addition, we propose a novel way of splitting dataset into the multiple trees. Moreover, a new evaluation package for general use (KD–trees, BBD–trees, k–means trees) was developed.

    MATLAB

    Get PDF
    A well-known statement says that the PID controller is the "bread and butter" of the control engineer. This is indeed true, from a scientific standpoint. However, nowadays, in the era of computer science, when the paper and pencil have been replaced by the keyboard and the display of computers, one may equally say that MATLAB is the "bread" in the above statement. MATLAB has became a de facto tool for the modern system engineer. This book is written for both engineering students, as well as for practicing engineers. The wide range of applications in which MATLAB is the working framework, shows that it is a powerful, comprehensive and easy-to-use environment for performing technical computations. The book includes various excellent applications in which MATLAB is employed: from pure algebraic computations to data acquisition in real-life experiments, from control strategies to image processing algorithms, from graphical user interface design for educational purposes to Simulink embedded systems

    Photorealistic Rendering Using "Photon Mapping" Method

    Get PDF
    Tato práce se zabýva metodou photon mappingu. Nejprve byla naimplementována jednoduchá metoda photon mappingu a poté jeji progresivní varianta byla naimplementována na procesoru a grafické kartě. Po implementaci progressivní varianty photon mappingu na GPU, několik akceleračních technik bylo navrženo. Na konci této práce byl představený genetický klustrovací algoritmus, který se snaží pomocí vhodnějších clusterů urychlit čas výpočtu photon mappingu na gpu.This master thesis focuses on photon mapping rendering technique. A simple photon mapping was implemented as a baseline and then progressive photon mapping was prepared for CPU and GPU. After implementing progressive photon mapping on GPU, further acceleration techniques were proposed. Finally, in the thesis, genetic clustering algorithm for suitable clusters on GPU was proposed.
    • …
    corecore