Incremental Learning of Nonparametric Bayesian Mixture Models
Clustering is a fundamental task in many vision applications.
To date, most clustering algorithms work in a
batch setting and training examples must be gathered in a
large group before learning can begin. Here we explore
incremental clustering, in which data can arrive continuously.
We present a novel incremental model-based clustering
algorithm based on nonparametric Bayesian methods,
which we call Memory Bounded Variational Dirichlet
Process (MB-VDP). The number of clusters is determined
flexibly by the data, and the approach can be used to automatically
discover object categories. The computation required
to produce model updates is bounded
and does not grow with the amount of data processed. The
technique is well suited to very large datasets, and we show
that our approach outperforms existing online alternatives
for learning nonparametric Bayesian mixture models.
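As a rough illustration of the incremental, data-determined cluster count that MB-VDP provides (not the authors' actual variational update equations), a DP-means-style sequential assignment can be sketched: each arriving point joins its nearest cluster, or opens a new one when no centroid is close enough. The threshold `lam` is a hypothetical parameter standing in for the role of the Dirichlet process concentration.

```python
import numpy as np

def incremental_dp_means(stream, lam):
    """Sequentially cluster a stream of points; a new cluster is opened
    whenever a point is farther than `lam` from every existing centroid.
    A simplified stand-in for nonparametric incremental clustering."""
    centroids, counts, labels = [], [], []
    for x in stream:
        if centroids:
            d = [np.linalg.norm(x - c) for c in centroids]
            k = int(np.argmin(d))
            if d[k] <= lam:
                counts[k] += 1
                centroids[k] += (x - centroids[k]) / counts[k]  # running mean
                labels.append(k)
                continue
        # no sufficiently close centroid: create a new cluster from this point
        centroids.append(np.array(x, dtype=float))
        counts.append(1)
        labels.append(len(centroids) - 1)
    return centroids, labels
```

Note how memory stays bounded by the number of clusters rather than the number of points seen, echoing the memory-bounded property described in the abstract.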
Scalable data clustering using GPUs
The computational demands of multivariate clustering grow rapidly, and therefore processing large data sets, like those found in flow cytometry, is very time consuming on a single CPU. Fortunately, these techniques lend themselves naturally to large-scale parallel processing. To address the computational demands, graphics processing units, specifically NVIDIA's CUDA framework and Tesla architecture, were investigated as a low-cost, high-performance solution for a number of clustering algorithms. C-means and Expectation Maximization with Gaussian mixture models were implemented using the CUDA framework. The algorithm implementations use a hybrid of CUDA, OpenMP, and MPI to scale to many GPUs on multiple nodes in a high performance computing environment. This framework is envisioned as part of a larger cloud-based workflow service where biologists can apply multiple algorithms and parameter sweeps to their data sets and quickly receive a thorough set of results that can be further analyzed by experts. Improvements over previous GPU-accelerated implementations range from 1.42x to 21x for C-means and 3.72x to 5.65x for the Gaussian mixture model on non-trivial data sets. Using a single NVIDIA GTX 260, speedups are on average 90x for C-means and 74x for the Gaussian mixture model with flow cytometry files, compared to optimized C code running on a single core of a modern Intel CPU. Using the TeraGrid Lincoln high performance cluster at NCSA, C-means achieves 42% parallel efficiency and a CPU speedup of 4794x with 128 Tesla C1060 GPUs. The Gaussian mixture model achieves 72% parallel efficiency and a CPU speedup of 6286x.
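The per-point membership and centroid computations that the CUDA kernels above parallelize can be sketched sequentially with one fuzzy C-means iteration in NumPy. This is an illustrative baseline only, not the abstract's GPU implementation; `m` is the standard fuzzifier parameter.

```python
import numpy as np

def cmeans_step(X, C, m=2.0, eps=1e-9):
    """One fuzzy C-means iteration: membership update, then centroid update.
    The per-point distance and membership computations here are the
    data-parallel work a GPU implementation distributes across threads."""
    d = np.linalg.norm(X[:, None, :] - C[None, :, :], axis=2) + eps  # (N, K) distances
    inv = d ** (-2.0 / (m - 1.0))
    U = inv / inv.sum(axis=1, keepdims=True)        # fuzzy memberships, rows sum to 1
    W = U ** m
    C_new = (W.T @ X) / W.sum(axis=0)[:, None]      # membership-weighted centroids
    return U, C_new
```

Iterating this step until the centroids stop moving gives the batch C-means result; the GPU versions keep the same mathematics but execute the N-by-K distance grid in parallel.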
Constructive spiking neural networks for simulations of neuroplasticity
Artificial neural networks are important tools in machine learning and neuroscience;
however, a difficult step in their implementation is the selection of the neural network size and
structure. This thesis develops fundamental theory on algorithms for constructing neurons in
spiking neural networks and simulations of neuroplasticity. This theory is applied in the
development of a constructive algorithm based on spike-timing-dependent plasticity (STDP) that
achieves continual one-shot learning of hidden spike patterns through neuron construction.
The theoretical developments in this thesis begin with the proposal of a set of definitions of
the fundamental components of constructive neural networks. Disagreement in terminology across the
literature and a lack of clear definitions and requirements for constructive neural networks are
factors in the poor visibility and fragmentation of research. The proposed definitions are used as
the basis for a generalised methodology for decomposing constructive neural networks into
components to perform comparisons, design and analysis.
Spiking neuron models are uncommon in constructive neural network literature; however, spiking
neurons are common in simulated studies in neuroscience. Spike-timing-dependent construction is
proposed as a distinct class of constructive algorithm for spiking neural networks. Past algorithms
that perform spike-timing-dependent construction are decomposed into defined components for a
detailed critical comparison and found to have limited applicability in simulations of biological
neural networks.
This thesis develops concepts and principles for designing constructive algorithms that are
compatible with simulations of biological neural networks. Simulations often have orders of
magnitude fewer neurons than related biological neural systems; therefore, the neurons in a
simulation may be assumed to be a selection or subset of a larger neural system with many neurons
not simulated. Neuron construction and pruning may therefore be reinterpreted as the transfer of
neurons between sets of simulated neurons and hypothetical neurons in the neural system.
Constructive algorithms with a functional equivalence to transferring neurons between sets allow
simulated neural networks to maintain biological plausibility while changing size.
The components of a novel constructive algorithm are incrementally developed from the principles
for biological plausibility. First, processes for calculating new synapse weights from observed
simulation activity and estimates of past STDP are developed and analysed. Second, a method for
predicting postsynaptic spike times for synapse weight calculations through the simulation of a proxy for hypothetical neurons is developed. Finally, spike-dependent conditions for neuron construction and pruning are developed and
the processes are combined in a constructive algorithm for simulations of STDP.
Repeating hidden spike patterns can be detected by neurons tuned through STDP; this result is
reproduced in STDP simulations with neuron construction. Tuned neurons become unresponsive to other
activity, preventing detuning but also preventing neurons from learning new spike patterns.
Continual learning is demonstrated through neuron construction with immediate detection of new
spike patterns from one-shot predictions of STDP convergence.
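The pair-based STDP rule underlying the tuning described above can be sketched as follows. The parameter values are illustrative, not the thesis's exact model: a presynaptic spike shortly before a postsynaptic spike potentiates the synapse, while the reverse order depresses it, with exponential decay in the spike-time difference.

```python
import math

def stdp_dw(t_pre, t_post, a_plus=0.01, a_minus=0.012, tau=20.0):
    """Pair-based STDP weight change for one pre/post spike pair
    (illustrative parameters; times and tau in milliseconds)."""
    dt = t_post - t_pre
    if dt > 0:
        return a_plus * math.exp(-dt / tau)    # pre before post -> potentiation
    return -a_minus * math.exp(dt / tau)       # post before (or with) pre -> depression
```

Repeated exposure to a hidden spike pattern drives correlated pre/post pairs, so the potentiation branch accumulates on pattern-related synapses, which is the tuning effect the abstract's neuron-construction scheme builds on.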
Future research may investigate applications of the developed constructive algorithm in
neuroscience and machine learning. The developed theory on constructive neural networks and
concepts of selective simulation of neurons also provide new directions for future research.
Thesis (Ph.D.) -- University of Adelaide, School of Mechanical Engineering, 201
Use of collateral information to improve LANDSAT classification accuracies
There are no author-identified significant results in this report.
Inferring cluster-based networks from differently stimulated multiple time-course gene expression data
Motivation: Clustering and gene network inference often help to predict the biological functions of gene subsets. Recently, researchers have accumulated a large amount of time-course transcriptome data collected under different treatment conditions to understand the physiological states of cells in response to extracellular stimuli and to identify drug-responsive genes. Although a variety of statistical methods for clustering and inferring gene networks from expression profiles have been proposed, most of these are not tailored to simultaneously treat expression data collected under multiple stimulation conditions.
Complex Query Operators on Modern Parallel Architectures
Identifying interesting objects from a large data collection is a fundamental problem for multi-criteria decision making applications. In Relational Database Management Systems (RDBMS), the most popular complex query operators used to solve this type of problem are the Top-K selection operator and the Skyline operator. Top-K selection is tasked with retrieving the k highest-ranking tuples from a given relation, as determined by a user-defined aggregation function. Skyline selection retrieves those tuples with attributes offering (Pareto-)optimal trade-offs in a given relation. Efficient Top-K query processing entails minimizing tuple evaluations by utilizing elaborate processing schemes combined with sophisticated data structures that enable early termination. Skyline query evaluation involves processing strategies geared towards early termination and pruning of incomparable tuples. The rapid increase in memory capacity and decreasing costs have been the main drivers behind the development of main-memory database systems. Although migrating query processing in-memory has created many opportunities to improve query latency, attaining such improvements has been very challenging due to the growing gap between processor and main-memory speeds. Addressing this limitation has been made easier by the rapid proliferation of multi-core and many-core architectures. However, their utilization in real systems has been hindered by the lack of suitable parallel algorithms that focus on algorithmic efficiency. In this thesis, we study in depth the Top-K and Skyline selection operators in the context of emerging parallel architectures. Our ultimate goal is to provide practical guidelines for developing work-efficient algorithms suitable for parallel main-memory processing. We concentrate on multi-core (CPU), many-core (GPU), and processing-in-memory (PIM) architectures, developing solutions optimized for high throughput and low latency. The first part of this thesis focuses on Top-K selection, presenting the specific details of early termination algorithms that we developed specifically for parallel architectures and various types of accelerators (i.e., GPU, PIM). The second part of this thesis concentrates on Skyline selection and the development of a massively parallel, load-balanced algorithm for PIM architectures. Our work consolidates performance results across different parallel architectures using synthetic and real data on variable query parameters and distributions for both of the aforementioned problems. The experimental results demonstrate several orders of magnitude better throughput and query latency, thus validating the effectiveness of our proposed solutions for the Top-K and Skyline selection operators.
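The semantics of the two operators can be illustrated with minimal sequential sketches. These baselines only show what the operators compute; the thesis's contribution is parallel, early-terminating, work-efficient versions of them.

```python
import heapq

def top_k(rows, k, score):
    """Top-K selection: keep a size-k min-heap of the best scores seen so far.
    A full query engine would add early termination over sorted attribute lists."""
    heap = []
    for r in rows:
        s = score(r)
        if len(heap) < k:
            heapq.heappush(heap, (s, r))
        elif s > heap[0][0]:
            heapq.heapreplace(heap, (s, r))   # evict the current k-th best
    return sorted(heap, reverse=True)

def dominates(a, b):
    """Pareto dominance (maximization): a is no worse in every attribute
    and strictly better in at least one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

def skyline(points):
    """Naive O(n^2) skyline: keep the tuples no other tuple dominates."""
    return [p for p in points
            if not any(dominates(q, p) for q in points if q != p)]
```

The pruning opportunities the thesis exploits are visible even here: a tuple whose upper-bound score cannot beat `heap[0][0]` can be skipped in Top-K, and a tuple dominated by any already-kept point can be discarded early in skyline evaluation.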
Energy-Efficient Recurrent Neural Network Accelerators for Real-Time Inference
Over the past decade, Deep Learning (DL) and Deep Neural Networks (DNN) have gone through rapid development. They are now applied to a wide variety of applications and have profoundly changed the lives of human beings. As an essential element of DNNs, Recurrent Neural Networks (RNN) are helpful in processing time-sequential data and are widely used in applications such as speech recognition and machine translation. RNNs are difficult to compute because of their massive arithmetic operations and large memory footprint. RNN inference workloads used to be executed on conventional general-purpose processors, including Central Processing Units (CPU) and Graphics Processing Units (GPU); however, these have hardware blocks unnecessary for RNN computation, such as branch predictors and caching systems, making them suboptimal for RNN processing. To accelerate RNN computations and outperform conventional processors, previous work focused on optimization methods in both software and hardware. On the software side, previous works mainly used model compression to reduce the memory footprint and the arithmetic operations of RNNs. On the hardware side, previous works designed domain-specific hardware accelerators based on Field Programmable Gate Arrays (FPGA) or Application Specific Integrated Circuits (ASIC) with customized hardware pipelines optimized for efficient processing of RNNs. By following this software-hardware co-design strategy, previous works achieved at least 10X speedup over conventional processors. Many previous works focused on achieving high throughput with a large batch of input streams. However, in real-time applications, such as gaming Artificial Intelligence (AI) and dynamical system control, low latency is more critical. Moreover, there is a trend of offloading neural network workloads to edge devices to provide a better user experience and privacy protection.
Edge devices, such as mobile phones and wearable devices, are usually resource-constrained with a tight power budget. They require RNN hardware that is more energy-efficient to realize both low-latency inference and long battery life. Brain neurons exhibit sparsity in both the spatial and temporal domains. Inspired by this biological observation, previous work mainly explored model compression to induce spatial sparsity in RNNs. The delta network algorithm alternatively induces temporal sparsity in RNNs and has been shown in previous works to save over 10X in arithmetic operations.
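The delta network principle described above can be sketched in a few lines: input components whose change since the last processed value falls below a threshold are skipped, turning the dense matrix-vector product into a sparse column update. This is a simplified NumPy illustration of the idea, not the accelerators' hardware datapath.

```python
import numpy as np

def delta_matvec(W, x, x_prev, state, threshold=0.1):
    """Delta-network style update: only input components whose change exceeds
    `threshold` trigger a column update, skipping the corresponding
    multiply-accumulates entirely (the source of temporal-sparsity savings)."""
    delta = x - x_prev
    active = np.abs(delta) > threshold
    state = state + W[:, active] @ delta[active]   # sparse column update
    x_kept = np.where(active, x, x_prev)           # remember which values were processed
    return state, x_kept, int(active.sum())
```

When successive inputs change slowly, as in audio or sensor streams, `active.sum()` stays far below the input dimension, which is where the reported order-of-magnitude reduction in arithmetic operations comes from.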
In this work, we have proposed customized hardware accelerators that exploit temporal sparsity in Gated Recurrent Unit (GRU)-RNNs and Long Short-Term Memory (LSTM)-RNNs to achieve energy-efficient real-time RNN inference. First, we have proposed DeltaRNN, the first-ever RNN accelerator to exploit temporal sparsity in GRU-RNNs. DeltaRNN has achieved 1.2 TOp/s effective throughput with a batch size of 1, which is 15X higher than related works. Second, we have designed EdgeDRNN to accelerate GRU-RNN edge inference. Compared to DeltaRNN, EdgeDRNN does not rely on on-chip memory to store RNN weights and focuses on reducing off-chip Dynamic Random Access Memory (DRAM) data traffic using a more scalable architecture. EdgeDRNN has realized real-time inference of large GRU-RNNs with submillisecond latency and only 2.3 W wall-plug power consumption, achieving 4X higher energy efficiency than commercial edge AI platforms like the NVIDIA Jetson Nano. Third, we have used DeltaRNN to realize the first-ever continuous speech recognition system with the Dynamic Audio Sensor (DAS) as the front-end. The DAS is a neuromorphic event-driven sensor that produces a stream of asynchronous events instead of audio data sampled at a fixed rate. We have also showcased how an RNN accelerator can be integrated with an event-driven sensor on the same chip to realize ultra-low-power Keyword Spotting (KWS) on the extreme edge. Fourth, we have used EdgeDRNN to control a powered robotic prosthesis, using an RNN controller to replace a conventional proportional-derivative (PD) controller. EdgeDRNN has achieved 21 µs latency running the RNN controller and could maintain stable control of the prosthesis. These applications of DeltaRNN and EdgeDRNN demonstrate their value in solving real-world problems.
Finally, we have applied the delta network algorithm to LSTM-RNNs and combined it with a customized structured pruning method, called Column-Balanced Targeted Dropout (CBTD), to induce spatio-temporal sparsity in LSTM-RNNs. We have then proposed another FPGA-based accelerator called Spartus, the first RNN accelerator that exploits spatio-temporal sparsity. Spartus achieved 9.4 TOp/s effective throughput with a batch size of 1, the highest among present FPGA-based RNN accelerators with a power budget around 10 W. Spartus can complete the inference of an LSTM layer having 5 million parameters within 1 µs.
Quantitative Methods For Select Problems In Facility Location And Facility Logistics
This dissertation presents three logistics problems. The first is a parallel machine scheduling problem with multiple unique characteristics, including release dates, due dates, limited machine availability, and job splitting. The objective is to minimize the total amount of time required to complete the work. A mixed integer programming model is presented and a heuristic is developed for solving the problem. The second problem extends the first to include two additional practical considerations: a setup time incurred when warehouse staff change from one type of task to another, and a fixed time window for employee breaks. A simulated annealing (SA) heuristic is developed for its solution. The last problem studied in this dissertation is a new facility location problem variant with application in disaster relief, in which both verified data and unverified user-generated data are available for consideration during decision making. Three decision strategies are proposed for an emergency manager faced with a POD location decision for which both verified and unverified data are available: Consider Only Verified, Consider All, and Consider Minimax Regret. The strategies differ in how the uncertain user-generated data are incorporated in the planning process. A computational study comparing the performance of the three decision strategies across a range of plausible disaster scenarios is presented.
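The Consider Minimax Regret strategy follows the classical minimax-regret criterion for decision making under scenario uncertainty; a minimal sketch of the criterion itself (with a hypothetical cost table, not the dissertation's location model) is:

```python
def minimax_regret(costs):
    """Minimax-regret choice over scenarios.
    `costs[d][s]` is the cost of decision d under scenario s; a decision's
    regret in a scenario is its cost minus the best achievable cost there,
    and we pick the decision whose worst-case regret is smallest."""
    n_scen = len(next(iter(costs.values())))
    best = [min(costs[d][s] for d in costs) for s in range(n_scen)]       # per-scenario optimum
    regret = {d: max(costs[d][s] - best[s] for s in range(n_scen))        # worst-case regret
              for d in costs}
    return min(regret, key=regret.get)
```

In the facility location setting, each scenario would correspond to one plausible truth value of the unverified user-generated reports, and each decision to a candidate set of POD sites; the criterion hedges against the most damaging way the unverified data could turn out.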