
586 research outputs found

Specification and implementation of a data warehousing system for the ATLAS' distributed data management system

Author: Salgado Pedro Emanuel de Castro Faria
Publication date: 01/01/2008

Internship carried out at the ATLAS Distributed Computing Group and supervised by Markus Elsing. Integrated master's thesis. Informatics and Computing Engineering. Faculdade de Engenharia. Universidade do Porto. 200

Repositório Aberto da Universidade do Porto

Monocular 3d Object Recognition

Author: Zhu Menglong
Publication venue: ScholarlyCommons
Publication date: 01/01/2016

Object recognition is one of the fundamental tasks of computer vision. Recent advances in the field enable reliable 2D detections from a single cluttered image. However, many challenges still remain. Object detection needs timely response for real world applications. Moreover, we are genuinely interested in estimating the 3D pose and shape of an object or human for the sake of robotic manipulation and human-robot interaction. In this thesis, a suite of solutions to these challenges is presented. First, Active Deformable Part Models (ADPM) is proposed for fast part-based object detection. ADPM dramatically accelerates the detection by dynamically scheduling the part evaluations and efficiently pruning the image locations. Second, we unleash the power of marrying discriminative 2D parts with an explicit 3D geometric representation. Several methods following this scheme are proposed for recovering rich 3D information of both rigid and non-rigid objects from monocular RGB images. (1) The accurate 3D pose of an object instance is recovered from cluttered images using only the CAD model. (2) A globally optimal solution for simultaneous 2D part localization, 3D pose and shape estimation is obtained by optimizing a unified convex objective function. Both appearance and geometric compatibility are jointly maximized. (3) 3D human pose estimation from an image sequence is realized via an Expectation-Maximization algorithm. The 2D joint location uncertainties are marginalized out during inference and 3D pose smoothness is enforced across frames. By bridging the gap between 2D and 3D, our methods provide an end-to-end solution to 3D object recognition from images. We demonstrate a range of interesting applications using only a single image or a monocular video, including autonomous robotic grasping with a single image, 3D object image pop-up and a monocular human MoCap system. We also show empirical state-of-the-art results on a number of benchmarks for 2D detection and 3D pose and shape estimation.

ScholarlyCommons@Penn

Auto-tuning compiler options for HPC

Author: Jones Jessica
Publication date: 04/09/2019

OPUS

Classification algorithms on the cell processor

Author: Wyganowski Mateusz
Publication venue: RIT Scholar Works
Publication date: 01/08/2008

The rapid advancement in the capacity and reliability of data storage technology has allowed for the retention of virtually limitless quantity and detail of digital information. Massive information databases are becoming more and more widespread among governmental, educational, scientific, and commercial organizations. By segregating this data into carefully defined input (e.g., images) and output (e.g., classification labels) sets, a classification algorithm can be used to develop an internal expert model of the data by employing a specialized training algorithm. A properly trained classifier is capable of predicting the output for future input data from the same input domain that it was trained on. Two popular classifiers are Neural Networks and Support Vector Machines. Both, as with most accurate classifiers, require massive computational resources to carry out the training step and can take months to complete when dealing with extremely large data sets. In most cases, utilizing larger training sets improves the final accuracy of the trained classifier. However, access to the kinds of computational resources required to do so is expensive and out of reach of private or underfunded institutions. The Cell Broadband Engine (CBE), developed by Sony, Toshiba, and IBM, has recently been introduced into the market. The most inexpensive current iteration is available in the Sony PlayStation 3® computer entertainment system. The CBE is a novel multi-core architecture which features many hardware enhancements designed to accelerate the processing of massive amounts of data. These characteristics and the cheap and widespread availability of this technology make the Cell a prime candidate for the task of training classifiers. In this work, the feasibility of using the Cell processor to train Neural Networks and Support Vector Machines was explored. In the Neural Network family of classifiers, the fully connected Multilayer Perceptron and Convolution Network were implemented. In the Support Vector Machine family, a Working Set technique known as the Gradient Projection-based Decomposition Technique, as well as the Cascade SVM, were implemented.

RIT Scholar Works

Scalable Graph Analysis and Clustering on Commodity Hardware

Author: Mhembere Disa N.
Publication venue: 'The Busan Gyeongnam Mathematical Society'
Publication date: 30/10/2019

The abundance of large-scale datasets both in industry and academia today has led to a need for scalable data analysis frameworks and libraries. This need is especially apparent for large-scale graph datasets. The vast majority of existing frameworks focus on distributing computation within a cluster, neglecting to fully utilize each individual node, leading to poor overall performance. This thesis is motivated by the prevalence of Non-Uniform Memory Access (NUMA) architectures within multicore machines and the advancements in the performance of external memory devices like SSDs. This thesis focuses on the development of machine learning frameworks, libraries, and application development principles to enable scalable data analysis with minimal resource consumption. We develop novel optimizations that leverage fine-grain I/O and NUMA-awareness to advance the state-of-the-art within the areas of scalable graph analytics and machine learning. We focus on minimality, scalability and memory parallelism when data reside (i) in memory, (ii) semi-externally, or (iii) in distributed memory. We target two core areas: (i) graph analytics and (ii) community detection (clustering). The semi-external memory (SEM) paradigm is an attractive middle ground for limited resource consumption and near-in-memory performance on a single thick compute node. In recent years, its adoption has steadily risen among framework developers, despite limited adoption by application developers. We address key questions surrounding the development of state-of-the-art applications within an SEM, vertex-centric graph framework. Our target is to lower the barrier for entry to SEM, vertex-centric application development. As such, we develop Graphyti, a library of highly optimized applications in Semi-External Memory (SEM) using the FlashGraph framework. We utilize this library to identify the core principles that underlie the development of state-of-the-art vertex-centric graph applications in SEM. We then address scaling the task of community detection through clustering given arbitrary hardware budgets. We develop the clusterNOR extensible clustering framework and library with facilities for optimized scale-out and scale-up computation. In summation, this thesis develops key SEM design principles for graph analytics and introduces novel algorithmic and systems-oriented optimizations for scalable algorithms that utilize a two-step Majorize-Minimization or Minorize-Maximization (MM) objective function optimization pattern. The optimizations we develop enable the applications and libraries provided to attain state-of-the-art performance in varying memory settings.

JScholarship

KOLAM: human computer interfaces for visual analytics in big data imagery

Author: Haridas Anoop
Publication venue: 'University of Missouri Libraries'

In the present day, we are faced with a deluge of disparate and dynamic information from multiple heterogeneous sources. Among these are the big data imagery datasets that are rapidly being generated via mature acquisition methods in the geospatial, surveillance (specifically, Wide Area Motion Imagery or WAMI) and biomedical domains. The need to interactively visualize these imagery datasets by using multiple types of views (as needed) into the data is common to these domains. Furthermore, researchers in each domain have additional needs: users of WAMI datasets also need to interactively track objects of interest using algorithms of their choice, visualize the resulting object trajectories and interactively edit these results as needed. While software tools that fulfill each of these requirements individually are available and well-used at present, there is still a need for tools that can combine the desired aspects of visualization, human computer interaction (HCI), data analysis, data management, and (geo-)spatial and temporal data processing into a single flexible and extensible system. KOLAM is an open, cross-platform, interoperable, scalable and extensible framework for visualization and analysis that we have developed to fulfil the above needs. 
The novel contributions in this thesis are the following: 1) Spatio-temporal caching for animating both giga-pixel and Full Motion Video (FMV) imagery, 2) Human computer interfaces purposefully designed to accommodate big data visualization, 3) Human-in-the-loop interactive video object tracking - ground-truthing of moving objects in wide area imagery using algorithm assisted human-in-the-loop coupled tracking, 4) Coordinated visualization using stacked layers, side-by-side layers/video sub-windows and embedded imagery, 5) Efficient one-click manual tracking, editing and data management of trajectories, 6) Efficient labeling of image segmentation regions and passing these results to desired modules, 7) Visualization of image processing results generated by non-interactive operators using layers, 8) Extension of interactive imagery and trajectory visualization to multi-monitor wall display environments, 9) Geospatial applications: Providing rapid roam, zoom and hyper-jump spatial operations, interactive blending, colormap and histogram enhancement, spherical projection and terrain maps, 10) Biomedical applications: Visualization and target tracking of cell motility in time-lapse cell imagery, collecting ground-truth from experts on whole-slide imagery (WSI) for developing histopathology analytic algorithms and computer-aided diagnosis for cancer grading, and easy-to-use tissue annotation features. Includes bibliographical references.

University of Missouri: MOspace

Purple Computational Environment With Mappings to ACE Requirements for the General Availability User Environment Capabilities

Publication venue: 'Office of Scientific and Technical Information (OSTI)'

Crossref

Facilitating High Performance Code Parallelization

Author: Abi Saad Maria
Publication venue: SURFACE at Syracuse University
Publication date: 14/05/2017

With the surge of social media on one hand and the ease of obtaining information due to cheap sensing devices and open source APIs on the other hand, the amount of data that can be processed is also increasing vastly. In addition, the world of computing has recently been witnessing a growing shift towards massively parallel distributed systems due to the increasing importance of transforming data into knowledge in today’s data-driven world. At the core of data analysis for all sorts of applications lies pattern matching. Therefore, parallelizing pattern matching algorithms should be made efficient in order to cater to this ever-increasing abundance of data. We propose a method that automatically detects a user’s single threaded function call to search for a pattern using Java’s standard regular expression library, and replaces it with our own data parallel implementation using Java bytecode injection. Our approach facilitates parallel processing on different platforms consisting of shared memory systems (using multithreading and NVIDIA GPUs) and distributed systems (using MPI and Hadoop). The major contributions of our implementation consist of reducing the execution time while at the same time being transparent to the user. In addition to that, and in the same spirit of facilitating high performance code parallelization, we present a tool that automatically generates Spark Java code from minimal user-supplied inputs. Spark has emerged as the tool of choice for efficient big data analysis. However, users still have to learn the complicated Spark API in order to write even a simple application. Our tool is easy to use, interactive and offers Spark’s native Java API performance. To the best of our knowledge and until the time of this writing, such a tool has not yet been implemented.

Syracuse University Research Facility and Collaborative Environment


Scalable Emulation of Heterogeneous Systems

Author: Garcia Cota Emilio
Publication venue: 'Columbia University Libraries/Information Services'
Publication date: 01/01/2019

The breakdown of Dennard's transistor scaling has driven computing systems toward application-specific accelerators, which can provide orders-of-magnitude improvements in performance and energy efficiency over general-purpose processors. To enable the radical departures from conventional approaches that heterogeneous systems entail, research infrastructure must be able to model processors, memory and accelerators, as well as system-level changes---such as operating system or instruction set architecture (ISA) innovations---that might be needed to realize the accelerators' potential. Unfortunately, existing simulation tools that can support such system-level research are limited by the lack of fast, scalable machine emulators to drive execution. To fill this need, in this dissertation we first present a novel machine emulator design based on dynamic binary translation that makes the following improvements over the state of the art: it scales on multicore hosts while remaining memory efficient, correctly handles cross-ISA differences in atomic instruction semantics, leverages the host floating point (FP) unit to speed up FP emulation without sacrificing correctness, and can be efficiently instrumented to---among other possible uses---drive the execution of a full-system, cross-ISA simulator with support for accelerators. We then demonstrate the utility of machine emulation for studying heterogeneous systems by leveraging it to make two additional contributions. First, we quantify the trade-offs in different coupling models for on-chip accelerators. Second, we present a technique to reuse the private memories of on-chip accelerators when they are otherwise inactive to expand the system's last-level cache, thereby reducing the opportunity cost of the accelerators' integration.

Columbia University Academic Commons

Ontology design and management for eCare services

Author: Ongenae Femke
Publication venue: Department of Information technology
Publication date: 01/01/2013

Ghent University Academic Bibliography