984 research outputs found
Superpixel Convolutional Networks using Bilateral Inceptions
In this paper we propose a CNN architecture for semantic image segmentation.
We introduce a new 'bilateral inception' module that can be inserted in
existing CNN architectures and performs bilateral filtering, at multiple
feature-scales, between superpixels in an image. The feature spaces for
bilateral filtering and other parameters of the module are learned end-to-end
using standard backpropagation techniques. The bilateral inception module
addresses two issues that arise with general CNN segmentation architectures.
First, this module propagates information between (super) pixels while
respecting image edges, thus using the structured information of the problem
for improved results. Second, the layer recovers a full resolution segmentation
result from the lower resolution solution of a CNN. In the experiments, we
modify several existing CNN architectures by inserting our inception module
between the last CNN (1x1 convolution) layers. Empirical results on three
different datasets show reliable improvements not only in comparison to the
baseline networks, but also in comparison to several dense-pixel prediction
techniques such as CRFs, while being competitive in time.Comment: European Conference on Computer Vision (ECCV), 201
High-performance time-series quantitative retrieval from satellite images on a GPU cluster
The quality and accuracy of remote sensing instruments continue to increase, allowing geoscientists to perform various quantitative retrieval applications to observe the geophysical variables of land, atmosphere, ocean, etc. The explosive growth of time-series remote sensing (RS) data over large-scales poses great challenges on managing, processing, and interpreting RS ‘‘Big Data.’’ To explore these time-series RS data efficiently, in this paper, we design and implement a high-performance framework to address the time-consuming time-series quantitative retrieval issue on a graphics processing unit cluster, taking the aerosol optical depth (AOD) retrieval from satellite images as a study case. The presented framework exploits the multilevel parallelism for time-series quantitative RS retrieval to promote efficiency. At the coarse-grained level of parallelism, the AOD time-series retrieval is represented as multidirected acyclic graph workflows and scheduled based on a list-based heuristic algorithm, heterogeneous earliest finish time, taking the idle slot and priorities of retrieval jobs into account. At the fine-grained level, the parallel strategies for the major remote sensing image processing algorithms divided into three categories, i.e., the point or pixel-based operations, the local operations, and the global or irregular operations have been summarized. The parallel framework was implemented with message passing interface and compute unified device architecture, and experimental results with the AOD retrieval case verify the effectiveness of the presented framework.N/
Achieving on-Mobile Real-Time Super-Resolution with Neural Architecture and Pruning Search
Though recent years have witnessed remarkable progress in single image
super-resolution (SISR) tasks with the prosperous development of deep neural
networks (DNNs), the deep learning methods are confronted with the computation
and memory consumption issues in practice, especially for resource-limited
platforms such as mobile devices. To overcome the challenge and facilitate the
real-time deployment of SISR tasks on mobile, we combine neural architecture
search with pruning search and propose an automatic search framework that
derives sparse super-resolution (SR) models with high image quality while
satisfying the real-time inference requirement. To decrease the search cost, we
leverage the weight sharing strategy by introducing a supernet and decouple the
search problem into three stages, including supernet construction,
compiler-aware architecture and pruning search, and compiler-aware pruning
ratio search. With the proposed framework, we are the first to achieve
real-time SR inference (with only tens of milliseconds per frame) for
implementing 720p resolution with competitive image quality (in terms of PSNR
and SSIM) on mobile platforms (Samsung Galaxy S20)
Fast Machine Learning Algorithms for Massive Datasets with Applications in the Biomedical Domain
The continuous increase in the size of datasets introduces computational challenges for machine learning algorithms. In this dissertation, we cover the machine learning algorithms and applications in large-scale data analysis in manufacturing and healthcare. We begin with introducing a multilevel framework to scale the support vector machine (SVM), a popular supervised learning algorithm with a few tunable hyperparameters and highly accurate prediction. The computational complexity of nonlinear SVM is prohibitive on large-scale datasets compared to the linear SVM, which is more scalable for massive datasets. The nonlinear SVM has shown to produce significantly higher classification quality on complex and highly imbalanced datasets. However, a higher classification quality requires a computationally expensive quadratic programming solver and extra kernel parameters for model selection. We introduce a generalized fast multilevel framework for regular, weighted, and instance weighted SVM that achieves similar or better classification quality compared to the state-of-the-art SVM libraries such as LIBSVM. Our framework improves the runtime more than two orders of magnitude for some of the well-known benchmark datasets. We cover multiple versions of our proposed framework and its implementation in detail. The framework is implemented using PETSc library which allows easy integration with scientific computing tasks. Next, we propose an adaptive multilevel learning framework for SVM to reduce the variance between prediction qualities across the levels, improve the overall prediction accuracy, and boost the runtime. We implement multi-threaded support to speed up the parameter fitting runtime that results in more than an order of magnitude speed-up. We design an early stopping criteria to reduce the extra computational cost when we achieve expected prediction quality. This approach provides significant speed-up, especially for massive datasets. Finally, we propose an efficient low dimensional feature extraction over massive knowledge networks. Knowledge networks are becoming more popular in the biomedical domain for knowledge representation. Each layer in knowledge networks can store the information from one or multiple sources of data. The relationships between concepts or between layers represent valuable information. The proposed feature engineering approach provides an efficient and highly accurate prediction of the relationship between biomedical concepts on massive datasets. Our proposed approach utilizes semantics and probabilities to reduce the potential search space for the exploration and learning of machine learning algorithms. The calculation of probabilities is highly scalable with the size of the knowledge network. The number of features is fixed and equivalent to the number of relationships or classes in the data. A comprehensive comparison of well-known classifiers such as random forest, SVM, and deep learning over various features extracted from the same dataset, provides an overview for performance and computational trade-offs. Our source code, documentation and parameters will be available at https://github.com/esadr/
Video Processing Acceleration using Reconfigurable Logic and Graphics Processors
A vexing question is `which architecture will prevail as the core feature of the next state of
the art video processing system?' This thesis examines the substitutive and collaborative
use of the two alternatives of the reconfigurable logic and graphics processor architectures.
A structured approach to executing architecture comparison is presented - this includes a
proposed `Three Axes of Algorithm Characterisation' scheme and a formulation of perfor-
mance drivers. The approach is an appealing platform for clearly defining the problem,
assumptions and results of a comparison. In this work it is used to resolve the advanta-
geous factors of the graphics processor and reconfigurable logic for video processing, and
the conditions determining which one is superior. The comparison results prompt the
exploration of the customisable options for the graphics processor architecture. To clearly
define the architectural design space, the graphics processor is first identifed as part of
a wider scope of homogeneous multi-processing element (HoMPE) architectures. A novel
exploration tool is described which is suited to the investigation of the customisable op-
tions of HoMPE architectures. The tool adopts a systematic exploration approach and a
high-level parameterisable system model, and is used to explore pre- and post-fabrication
customisable options for the graphics processor. A positive result of the exploration is the
proposal of a reconfigurable engine for data access (REDA) to optimise graphics processor
performance for video processing-specific memory access patterns. REDA demonstrates
the viability of the use of reconfigurable logic as collaborative `glue logic' in the graphics
processor architecture
- …