9,465 research outputs found

    Depth from Monocular Images using a Semi-Parallel Deep Neural Network (SPDNN) Hybrid Architecture

    Get PDF
    Deep neural networks are applied to a wide range of problems in recent years. In this work, Convolutional Neural Network (CNN) is applied to the problem of determining the depth from a single camera image (monocular depth). Eight different networks are designed to perform depth estimation, each of them suitable for a feature level. Networks with different pooling sizes determine different feature levels. After designing a set of networks, these models may be combined into a single network topology using graph optimization techniques. This "Semi Parallel Deep Neural Network (SPDNN)" eliminates duplicated common network layers, and can be further optimized by retraining to achieve an improved model compared to the individual topologies. In this study, four SPDNN models are trained and have been evaluated at 2 stages on the KITTI dataset. The ground truth images in the first part of the experiment are provided by the benchmark, and for the second part, the ground truth images are the depth map results from applying a state-of-the-art stereo matching method. The results of this evaluation demonstrate that using post-processing techniques to refine the target of the network increases the accuracy of depth estimation on individual mono images. The second evaluation shows that using segmentation data alongside the original data as the input can improve the depth estimation results to a point where performance is comparable with stereo depth estimation. The computational time is also discussed in this study.Comment: 44 pages, 25 figure

    String Matching Problems with Parallel Approaches An Evaluation for the Most Recent Studies

    Get PDF
    In recent years string matching plays a functional role in many application like information retrieval, gene analysis, pattern recognition, linguistics, bioinformatics etc. For understanding the functional requirements of string matching algorithms, we surveyed the real time parallel string matching patterns to handle the current trends. Primarily, in this paper, we focus on present developments of parallel string matching, and the central ideas of the algorithms and their complexities. We present the performance of the different algorithms and their effectiveness. Finally this analysis helps the researchers to develop the better techniques

    Revisiting Matrix Product on Master-Worker Platforms

    Get PDF
    This paper is aimed at designing efficient parallel matrix-product algorithms for heterogeneous master-worker platforms. While matrix-product is well-understood for homogeneous 2D-arrays of processors (e.g., Cannon algorithm and ScaLAPACK outer product algorithm), there are three key hypotheses that render our work original and innovative: - Centralized data. We assume that all matrix files originate from, and must be returned to, the master. - Heterogeneous star-shaped platforms. We target fully heterogeneous platforms, where computational resources have different computing powers. - Limited memory. Because we investigate the parallelization of large problems, we cannot assume that full matrix panels can be stored in the worker memories and re-used for subsequent updates (as in ScaLAPACK). We have devised efficient algorithms for resource selection (deciding which workers to enroll) and communication ordering (both for input and result messages), and we report a set of numerical experiments on various platforms at Ecole Normale Superieure de Lyon and the University of Tennessee. However, we point out that in this first version of the report, experiments are limited to homogeneous platforms

    Forecasting the cost of processing multi-join queries via hashing for main-memory databases (Extended version)

    Full text link
    Database management systems (DBMSs) carefully optimize complex multi-join queries to avoid expensive disk I/O. As servers today feature tens or hundreds of gigabytes of RAM, a significant fraction of many analytic databases becomes memory-resident. Even after careful tuning for an in-memory environment, a linear disk I/O model such as the one implemented in PostgreSQL may make query response time predictions that are up to 2X slower than the optimal multi-join query plan over memory-resident data. This paper introduces a memory I/O cost model to identify good evaluation strategies for complex query plans with multiple hash-based equi-joins over memory-resident data. The proposed cost model is carefully validated for accuracy using three different systems, including an Amazon EC2 instance, to control for hardware-specific differences. Prior work in parallel query evaluation has advocated right-deep and bushy trees for multi-join queries due to their greater parallelization and pipelining potential. A surprising finding is that the conventional wisdom from shared-nothing disk-based systems does not directly apply to the modern shared-everything memory hierarchy. As corroborated by our model, the performance gap between the optimal left-deep and right-deep query plan can grow to about 10X as the number of joins in the query increases.Comment: 15 pages, 8 figures, extended version of the paper to appear in SoCC'1

    Revisiting Actor Programming in C++

    Full text link
    The actor model of computation has gained significant popularity over the last decade. Its high level of abstraction makes it appealing for concurrent applications in parallel and distributed systems. However, designing a real-world actor framework that subsumes full scalability, strong reliability, and high resource efficiency requires many conceptual and algorithmic additives to the original model. In this paper, we report on designing and building CAF, the "C++ Actor Framework". CAF targets at providing a concurrent and distributed native environment for scaling up to very large, high-performance applications, and equally well down to small constrained systems. We present the key specifications and design concepts---in particular a message-transparent architecture, type-safe message interfaces, and pattern matching facilities---that make native actors a viable approach for many robust, elastic, and highly distributed developments. We demonstrate the feasibility of CAF in three scenarios: first for elastic, upscaling environments, second for including heterogeneous hardware like GPGPUs, and third for distributed runtime systems. Extensive performance evaluations indicate ideal runtime behaviour for up to 64 cores at very low memory footprint, or in the presence of GPUs. In these tests, CAF continuously outperforms the competing actor environments Erlang, Charm++, SalsaLite, Scala, ActorFoundry, and even the OpenMPI.Comment: 33 page
    • …
    corecore