36 research outputs found

    Cluster validity in clustering methods

    Incremental Learning of Nonparametric Bayesian Mixture Models

    Clustering is a fundamental task in many vision applications. To date, most clustering algorithms work in a batch setting, in which training examples must be gathered into a large group before learning can begin. Here we explore incremental clustering, in which data can arrive continuously. We present a novel incremental model-based clustering algorithm based on nonparametric Bayesian methods, which we call Memory Bounded Variational Dirichlet Process (MB-VDP). The number of clusters is determined flexibly by the data, and the approach can be used to automatically discover object categories. The computation required to produce each model update is bounded and does not grow with the amount of data processed. The technique is well suited to very large datasets, and we show that our approach outperforms existing online alternatives for learning nonparametric Bayesian mixture models.
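
MB-VDP itself is a variational Bayesian method and is not reproduced here. As a much simpler stand-in, the Python sketch below uses a different, swapped-in technique (a single-pass, DP-means-style rule) to illustrate the two properties the abstract emphasizes: the number of clusters is set by the data, and each update touches only stored cluster summaries, never past points. The class `OnlineDPMeans` and the distance threshold `lam` are hypothetical names chosen for illustration.

```python
import math


class OnlineDPMeans:
    """Single-pass clustering in the spirit of DP-means (hypothetical helper).

    A point joins its nearest centroid unless that centroid is farther than
    `lam`, in which case a new cluster is opened. Only centroids and counts
    are stored, so past points never need to be revisited.
    """

    def __init__(self, lam):
        self.lam = lam          # distance threshold for opening a new cluster
        self.centroids = []     # running mean of each cluster
        self.counts = []        # number of points absorbed by each cluster

    def update(self, p):
        """Assign one incoming point and update the model; returns its cluster index."""
        best, best_d = -1, math.inf
        for k, c in enumerate(self.centroids):
            d = math.dist(p, c)
            if d < best_d:
                best, best_d = k, d
        if best < 0 or best_d > self.lam:
            # no existing cluster is close enough, so the data opens a new one
            self.centroids.append([float(x) for x in p])
            self.counts.append(1)
            return len(self.centroids) - 1
        # incremental mean update: c <- c + (p - c) / n
        self.counts[best] += 1
        n = self.counts[best]
        self.centroids[best] = [c + (x - c) / n for c, x in zip(self.centroids[best], p)]
        return best
```

A larger `lam` yields fewer, broader clusters. The per-update cost grows with the number of clusters, not with the number of points seen; how MB-VDP achieves its memory bound is not modelled by this sketch.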

    Scalable data clustering using GPUs

    The computational demands of multivariate clustering grow rapidly, and therefore processing large data sets, such as flow cytometry data, is very time consuming on a single CPU. Fortunately, these techniques lend themselves naturally to large-scale parallel processing. To address the computational demands, graphics processing units, specifically NVIDIA's CUDA framework and Tesla architecture, were investigated as a low-cost, high-performance solution for a number of clustering algorithms. C-means and Expectation Maximization with Gaussian mixture models were implemented using the CUDA framework. The implementations use a hybrid of CUDA, OpenMP, and MPI to scale to many GPUs on multiple nodes in a high-performance computing environment. This framework is envisioned as part of a larger cloud-based workflow service where biologists can apply multiple algorithms and parameter sweeps to their data sets and quickly receive a thorough set of results that can be further analyzed by experts. Improvements over previous GPU-accelerated implementations range from 1.42x to 21x for C-means and from 3.72x to 5.65x for the Gaussian mixture model on non-trivial data sets. Using a single NVIDIA GTX 260, speedups on flow cytometry files average 90x for C-means and 74x for Gaussian mixture models, compared with optimized C code running on a single core of a modern Intel CPU. On the TeraGrid Lincoln high-performance cluster at NCSA with 128 Tesla C1060 GPUs, C-means achieves 42% parallel efficiency and a 4794x speedup over the CPU. The Gaussian mixture model achieves 72% parallel efficiency and a 6286x speedup over the CPU.
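
To show why these algorithms parallelize well, the following serial Python sketch performs fuzzy C-means iterations, assuming that "C-means" in the abstract refers to fuzzy C-means with fuzzifier `m`. The membership step treats every data point independently, which is the part a GPU implementation can spread across threads; the centre step is a weighted sum over points, i.e. a reduction. Function and parameter names are illustrative assumptions, not the authors' code.

```python
import math


def fuzzy_cmeans(points, centers, m=2.0, iters=20, eps=1e-12):
    """Serial fuzzy C-means: alternate membership and centre updates.

    points  -- list of coordinate tuples
    centers -- initial centres (one per cluster)
    m       -- fuzzifier (> 1); larger values give softer memberships
    """
    centers = [[float(v) for v in c] for c in centers]
    n_clusters, dim = len(centers), len(points[0])
    U = []
    for _ in range(iters):
        # Membership step: each point is handled independently of the others,
        # which is the data-parallel part (e.g. one GPU thread per point).
        U = []
        for x in points:
            d = [max(math.dist(x, c), eps) for c in centers]  # eps avoids division by zero
            U.append([
                1.0 / sum((d[k] / d[j]) ** (2.0 / (m - 1.0)) for j in range(n_clusters))
                for k in range(n_clusters)
            ])
        # Centre step: weighted mean with weights u^m, a reduction over all points.
        for k in range(n_clusters):
            w = [row[k] ** m for row in U]
            total = sum(w)
            centers[k] = [
                sum(wi * x[t] for wi, x in zip(w, points)) / total for t in range(dim)
            ]
    return centers, U
```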

    Constructive spiking neural networks for simulations of neuroplasticity

    Artificial neural networks are important tools in machine learning and neuroscience; however, a difficult step in their implementation is the selection of the neural network size and structure. This thesis develops fundamental theory on algorithms for constructing neurons in spiking neural networks and simulations of neuroplasticity. This theory is applied in the development of a constructive algorithm based on spike-timing-dependent plasticity (STDP) that achieves continual one-shot learning of hidden spike patterns through neuron construction. The theoretical developments in this thesis begin with the proposal of a set of definitions of the fundamental components of constructive neural networks. Disagreement in terminology across the literature and a lack of clear definitions and requirements for constructive neural networks are factors in the poor visibility and fragmentation of research. The proposed definitions are used as the basis for a generalised methodology for decomposing constructive neural networks into components to perform comparisons, design and analysis. Spiking neuron models are uncommon in the constructive neural network literature; however, spiking neurons are common in simulation studies in neuroscience. Spike-timing-dependent construction is proposed as a distinct class of constructive algorithm for spiking neural networks. Past algorithms that perform spike-timing-dependent construction are decomposed into defined components for a detailed critical comparison and found to have limited applicability in simulations of biological neural networks. This thesis develops concepts and principles for designing constructive algorithms that are compatible with simulations of biological neural networks. Simulations often have orders of magnitude fewer neurons than the related biological neural systems; therefore, the neurons in a simulation may be assumed to be a selection or subset of a larger neural system with many neurons not simulated. 
Neuron construction and pruning may therefore be reinterpreted as the transfer of neurons between the set of simulated neurons and the set of hypothetical neurons in the neural system. Constructive algorithms that are functionally equivalent to transferring neurons between these sets allow simulated neural networks to maintain biological plausibility while changing size. The components of a novel constructive algorithm are incrementally developed from the principles for biological plausibility. First, processes for calculating new synapse weights from observed simulation activity and estimates of past STDP are developed and analysed. Second, a method for predicting postsynaptic spike times for synapse weight calculations through the simulation of a proxy for hypothetical neurons is developed. Finally, spike-dependent conditions for neuron construction and pruning are developed, and the processes are combined in a constructive algorithm for simulations of STDP. Repeating hidden spike patterns can be detected by neurons tuned through STDP; this result is reproduced in STDP simulations with neuron construction. Tuned neurons become unresponsive to other activity, which prevents detuning but also prevents them from learning new spike patterns. Continual learning is demonstrated through neuron construction with immediate detection of new spike patterns from one-shot predictions of STDP convergence. Future research may investigate applications of the developed constructive algorithm in neuroscience and machine learning. The developed theory on constructive neural networks and the concepts of selective simulation of neurons also provide new directions for future research. Thesis (Ph.D.) -- University of Adelaide, School of Mechanical Engineering, 201
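
For readers unfamiliar with STDP, a common pair-based form of the rule is sketched below in Python: a presynaptic spike shortly before a postsynaptic spike strengthens the synapse, the reverse order weakens it, and the magnitude decays exponentially with the time gap. The amplitudes `a_plus`/`a_minus`, the time constants and the weight bounds are illustrative assumptions, and the all-to-all spike pairing used here is only one of several conventions; this is background for the abstract, not the thesis's construction algorithm.

```python
import math


def stdp_dw(t_pre, t_post, a_plus=0.01, a_minus=0.012, tau_plus=20.0, tau_minus=20.0):
    """Weight change for one pre/post spike pair (times in ms, illustrative constants)."""
    dt = t_post - t_pre
    if dt > 0:  # pre fires before post: potentiation, decaying with the gap
        return a_plus * math.exp(-dt / tau_plus)
    if dt < 0:  # post fires before pre: depression
        return -a_minus * math.exp(dt / tau_minus)
    return 0.0  # simultaneous spikes: no change in this convention


def apply_all_pairs(w, pre_spikes, post_spikes, w_min=0.0, w_max=1.0, **kw):
    """Sum the pair-based updates over every pre/post pairing, then clip to bounds."""
    for t_pre in pre_spikes:
        for t_post in post_spikes:
            w += stdp_dw(t_pre, t_post, **kw)
    return min(max(w, w_min), w_max)
```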

    Use of collateral information to improve LANDSAT classification accuracies

    There are no author-identified significant results in this report

    Inferring cluster-based networks from differently stimulated multiple time-course gene expression data

    Motivation: Clustering and gene network inference often help to predict the biological functions of gene subsets. Recently, researchers have accumulated a large amount of time-course transcriptome data collected under different treatment conditions to understand the physiological states of cells in response to extracellular stimuli and to identify drug-responsive genes. Although a variety of statistical methods for clustering and inferring gene networks from expression profiles have been proposed, most of them are not designed to handle expression data collected under multiple stimulation conditions simultaneously.

    ALFALFA: fast and accurate mapping of long next generation sequencing reads

    Energy-Efficient Recurrent Neural Network Accelerators for Real-Time Inference

    Over the past decade, Deep Learning (DL) and Deep Neural Networks (DNNs) have developed rapidly. They are now widely used in many applications and have profoundly changed human life. Recurrent Neural Networks (RNNs), an essential class of DNN, are well suited to processing time-sequential data and are widely used in applications such as speech recognition and machine translation. RNNs are expensive to compute because of their massive number of arithmetic operations and large memory footprint. RNN inference workloads have traditionally been executed on general-purpose processors such as Central Processing Units (CPUs) and Graphics Processing Units (GPUs); however, these contain hardware blocks that are unnecessary for RNN computation, such as branch predictors and caches, which makes them suboptimal for RNN processing. To accelerate RNN computation beyond the performance of conventional processors, previous work pursued optimizations in both software and hardware. On the software side, previous works mainly used model compression to reduce the memory footprint and arithmetic operations of RNNs. On the hardware side, previous works designed domain-specific accelerators based on Field Programmable Gate Arrays (FPGAs) or Application Specific Integrated Circuits (ASICs) with customized hardware pipelines optimized for efficient RNN processing. By following this software-hardware co-design strategy, previous works achieved at least 10X speedup over conventional processors. Many previous works focused on achieving high throughput with large batches of input streams. However, in real-time applications such as gaming Artificial Intelligence (AI) and dynamical system control, low latency is more critical. Moreover, there is a trend toward offloading neural network workloads to edge devices to provide a better user experience and privacy protection. 
Edge devices, such as mobile phones and wearable devices, are usually resource-constrained with a tight power budget. They require more energy-efficient RNN hardware to achieve both low-latency inference and long battery life. Brain neurons exhibit sparsity in both the spatial and temporal domains. Inspired by this property of the brain, previous work mainly explored model compression to induce spatial sparsity in RNNs. The delta network algorithm instead induces temporal sparsity in RNNs, and previous works have shown that it can reduce the arithmetic operations of RNNs by over 10X. In this work, we have proposed customized hardware accelerators that exploit temporal sparsity in Gated Recurrent Unit (GRU)-RNNs and Long Short-Term Memory (LSTM)-RNNs to achieve energy-efficient real-time RNN inference. First, we have proposed DeltaRNN, the first RNN accelerator to exploit temporal sparsity in GRU-RNNs. DeltaRNN has achieved 1.2 TOp/s effective throughput with a batch size of 1, which is 15X higher than related works. Second, we have designed EdgeDRNN to accelerate GRU-RNN edge inference. Unlike DeltaRNN, EdgeDRNN does not rely on on-chip memory to store RNN weights; instead, it focuses on reducing off-chip Dynamic Random Access Memory (DRAM) traffic using a more scalable architecture. EdgeDRNN has realized real-time inference of large GRU-RNNs with sub-millisecond latency and only 2.3 W wall-plug power consumption, achieving 4X higher energy efficiency than commercial edge AI platforms such as the NVIDIA Jetson Nano. Third, we have used DeltaRNN to realize the first continuous speech recognition system with the Dynamic Audio Sensor (DAS) as the front end. The DAS is a neuromorphic event-driven sensor that produces a stream of asynchronous events instead of audio data sampled at a fixed rate. 
We have also shown how an RNN accelerator can be integrated with an event-driven sensor on the same chip to realize ultra-low-power Keyword Spotting (KWS) on the extreme edge. Fourth, we have used EdgeDRNN to control a powered robotic prosthesis, using an RNN controller in place of a conventional proportional–derivative (PD) controller. EdgeDRNN has achieved 21 μs latency when running the RNN controller and maintained stable control of the prosthesis. These applications demonstrate the value of DeltaRNN and EdgeDRNN in solving real-world problems. Finally, we have applied the delta network algorithm to LSTM-RNNs and combined it with a customized structured pruning method, called Column-Balanced Targeted Dropout (CBTD), to induce spatio-temporal sparsity in LSTM-RNNs. We have then proposed another FPGA-based accelerator, Spartus, the first RNN accelerator that exploits spatio-temporal sparsity. Spartus achieved 9.4 TOp/s effective throughput with a batch size of 1, the highest among current FPGA-based RNN accelerators with a power budget of around 10 W. Spartus can complete the inference of an LSTM layer with 5 million parameters within 1 μs.
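
The delta network idea mentioned above can be sketched for a single matrix-vector product: each input element is compared with the value last propagated, and only changes larger than a threshold `theta` trigger work on the corresponding weight column. The Python below is a minimal sketch under that assumption; the function name, `theta`, and the operation counter are hypothetical, and in a recurrent layer the same rule would also be applied to the hidden-state vector.

```python
def delta_matvec_stream(W, xs, theta):
    """Track y_t = W @ x_t over a sequence of inputs using only significant changes.

    W     -- weight matrix as a list of rows
    xs    -- sequence of input vectors
    theta -- changes with |delta| <= theta are skipped (temporal sparsity)
    Returns the output at every step and the number of multiply-accumulates done.
    """
    rows, cols = len(W), len(W[0])
    x_ref = [0.0] * cols  # last input value actually propagated, per element
    y = [0.0] * rows      # running output, always equal to W @ x_ref
    outputs, ops = [], 0
    for x in xs:
        for j in range(cols):
            delta = x[j] - x_ref[j]
            if abs(delta) > theta:
                x_ref[j] = x[j]
                # only column j of W is touched for this input element
                for i in range(rows):
                    y[i] += W[i][j] * delta
                    ops += 1
        outputs.append(list(y))
    return outputs, ops
```

With `theta = 0` the running output matches the dense product up to floating-point rounding; a positive `theta` skips more columns at the cost of an approximation error, since each propagated input may lag its true value by at most `theta`. Counting multiply-accumulates in `ops` makes the saving from temporal sparsity visible: for a slowly changing input, most columns are skipped at most steps.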

    Quantitative Methods For Select Problems In Facility Location And Facility Logistics

    This dissertation presents three logistics problems. The first is a parallel machine scheduling problem that considers several distinctive characteristics, including release dates, due dates, limited machine availability and job splitting. The objective is to minimize the total time required to complete the work. A mixed integer programming model is presented, and a heuristic is developed to solve the problem. The second problem extends the first to include two additional practical considerations. The first is a setup time incurred when warehouse staff switch from one type of task to another. The second is a fixed time window for employee breaks. A simulated annealing (SA) heuristic is developed for its solution. The last problem is a new facility location problem variant with application to disaster relief, in which both verified data and unverified user-generated data are available for decision making. Three decision strategies are proposed for an emergency manager facing a point of distribution (POD) location decision with both verified and unverified data available: Consider Only Verified, Consider All and Consider Minimax Regret. The strategies differ in how the uncertain user-generated data are incorporated into the planning process. A computational study comparing the performance of the three strategies across a range of plausible disaster scenarios is presented.
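
The abstract does not detail the Consider Minimax Regret strategy, but the generic minimax regret rule it is named after can be sketched as follows. Given a cost for each candidate decision under each scenario (for example, whether an unverified report turns out to be accurate), the regret of a decision in a scenario is its cost minus the best achievable cost in that scenario, and the rule picks the decision with the smallest worst-case regret. The function name and cost table are hypothetical.

```python
def minimax_regret(costs):
    """Choose the alternative with the smallest worst-case regret.

    costs[a][s] is the cost of alternative a under scenario s (lower is better).
    Returns (index of chosen alternative, list of worst-case regrets).
    """
    n_scenarios = len(costs[0])
    # best achievable cost in each scenario, over all alternatives
    best = [min(row[s] for row in costs) for s in range(n_scenarios)]
    # regret = how much worse an alternative is than the best choice in hindsight
    worst_regret = [max(row[s] - best[s] for s in range(n_scenarios)) for row in costs]
    choice = min(range(len(costs)), key=lambda a: worst_regret[a])
    return choice, worst_regret
```

For the hypothetical costs `[[10, 4], [6, 8], [7, 7]]` (three candidate sites, two scenarios), the worst-case regrets are 4, 4 and 3, so the third site is chosen even though it is the best option in neither scenario.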