5 research outputs found

    Adding Data Parallelism to Streaming Pipelines for Throughput Optimization

    The streaming model is a popular model for writing high-throughput parallel applications. A streaming application is represented by a graph of computation stages that communicate with each other via FIFO channels. In this report, we consider the problem of mapping streaming pipelines (streaming applications where the graph is a linear chain) in order to maximize throughput. In a parallel setting, subsets of stages, called components, can be mapped onto different computing resources. The throughput of an application is determined by the throughput of the slowest component. Therefore, if some stage is much slower than others, it may be useful to replicate the stage's code and divide its workload among two or more replicas in order to increase throughput. However, pipelines may consist of some replicable and some non-replicable stages. In this report, we address the problem of mapping these partially replicable streaming pipelines on both homogeneous and heterogeneous platforms so as to maximize throughput. We consider two types of platforms: homogeneous platforms, where all resources are identical, and heterogeneous platforms, where resources may have different speeds. In both cases, we consider two network topologies: unidirectional chain and clique. We provide polynomial-time algorithms for mapping partially replicable pipelines onto unidirectional chains for both homogeneous and heterogeneous platforms. For homogeneous platforms, the algorithm for unidirectional chains generalizes to clique topologies. However, for heterogeneous platforms, mapping these pipelines onto clique topologies is NP-complete. We provide heuristics to generate solutions for cliques by applying our chain algorithms to a series of chains sampled from the clique. Our empirical results show that these heuristics rapidly converge to near-optimal solutions.
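The bottleneck argument in this abstract can be illustrated with a small sketch. It is not the report's mapping algorithm. It assumes a simplified homogeneous model in which every stage runs on its own unit-speed resources, communication cost is ignored, and a stage with k replicas takes work / k time per item. The names `Stage`, `pipeline_throughput`, and `greedy_replication` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Stage:
    work: float       # time per item on one unit-speed resource (assumed model)
    replicable: bool  # whether the stage's code may be replicated


def pipeline_throughput(stages, replicas):
    """Items per time unit: the reciprocal of the slowest stage's per-item time."""
    period = 0.0
    for stage, k in zip(stages, replicas):
        if k < 1:
            raise ValueError("every stage needs at least one resource")
        if k > 1 and not stage.replicable:
            raise ValueError("a non-replicable stage cannot be replicated")
        period = max(period, stage.work / k)
    return 1.0 / period


def greedy_replication(stages, num_resources):
    """Hand spare resources to the current bottleneck while it is replicable."""
    if num_resources < len(stages):
        raise ValueError("this sketch needs one resource per stage")
    replicas = [1] * len(stages)
    spare = num_resources - len(stages)
    while spare > 0:
        # Index of the stage with the largest per-item time right now.
        slowest = max(range(len(stages)),
                      key=lambda i: stages[i].work / replicas[i])
        if not stages[slowest].replicable:
            break  # extra replicas elsewhere cannot raise the throughput
        replicas[slowest] += 1
        spare -= 1
    return replicas


stages = [Stage(2.0, False), Stage(8.0, True), Stage(3.0, True)]
replicas = greedy_replication(stages, num_resources=6)
print(replicas, pipeline_throughput(stages, replicas))
```

In this toy model the slow middle stage receives most of the spare resources, and replication stops as soon as a non-replicable stage becomes the bottleneck. The report's setting is richer, since it also groups stages into components and considers chain and clique topologies.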

    Run Time Approximation of Non-blocking Service Rates for Streaming Systems

    Stream processing is a compute paradigm that promises safe and efficient parallelism. Modern big-data problems are often well suited for stream processing's throughput-oriented nature. Realization of efficient stream processing requires monitoring and optimization of multiple communications links. Most techniques to optimize these links use queueing network models or network flow models, which require some idea of the actual execution rate of each independent compute kernel within the system. What we want to know is how fast each kernel can process data independent of other communicating kernels. This is known as the "service rate" of the kernel within the queueing literature. Current approaches to divining service rates are static. Modern workloads, however, are often dynamic. Shared cloud systems also present applications with highly dynamic execution environments (multiple users, hardware migration, etc.). It is therefore desirable to continuously re-tune an application during run time (online) in response to changing conditions. Our approach enables online service rate monitoring under most conditions, obviating the need for reliance on steady state predictions for what are probably non-steady state phenomena. First, some of the difficulties associated with online service rate determination are examined. Second, the algorithm to approximate the online non-blocking service rate is described. Lastly, the algorithm is implemented within the open source RaftLib framework for validation using a simple microbenchmark as well as two full streaming applications. Comment: technical report
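To make the notion of a non-blocking service rate concrete, here is a deliberately naive sketch. It is not the algorithm from the report and it does not use the RaftLib API. It assumes a hypothetical monitor that records, for each sampling window, how many items a kernel processed, how long the window lasted, and whether the kernel was blocked on a full or empty queue during that window.

```python
def nonblocking_service_rate(samples):
    """Estimate items per second using only windows in which the kernel ran freely.

    `samples` is an iterable of (items, seconds, blocked) tuples; this record
    layout is an assumption made for illustration.
    """
    items = 0
    seconds = 0.0
    for count, duration, blocked in samples:
        if blocked:
            continue  # a blocked window reflects the neighbours, not this kernel
        items += count
        seconds += duration
    if seconds == 0.0:
        return None  # no non-blocking observation yet
    return items / seconds


samples = [(100, 1.0, False), (20, 1.0, True), (300, 2.0, False)]
rate = nonblocking_service_rate(samples)
naive = sum(s[0] for s in samples) / sum(s[1] for s in samples)
print(rate, naive)
```

The naive average over all windows underestimates how fast the kernel could run on its own, which is why the blocked windows are excluded here.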

    Locality-Aware Concurrency Platforms

    Modern computing systems from all domains are becoming increasingly parallel. Manufacturers are taking advantage of the increasing number of available transistors by packaging more and more computing resources together on a single chip or within a single system. These platforms generally contain many levels of private and shared caches in addition to physically distributed main memory. Therefore, some memory is more expensive to access than other memory, and high-performance software must treat memory locality as a first-level consideration. Memory locality is often difficult for application developers to consider directly, however, since many of these NUMA effects are invisible to the application programmer and only show up as low performance. Moreover, on parallel platforms, performance depends on both locality and load balance, and these two metrics are often at odds with each other. Therefore, directly considering locality and load balance at the application level may make the application much more complex to program. In this work, we develop locality-conscious concurrency platforms for several structured parallel programming models, including streaming applications, task-graphs and parallel for loops. In all of this work, the idea is to minimally disrupt the application programming model so that the application developer is either unimpacted or must only provide high-level hints to the runtime system. The runtime system then schedules the application to provide good locality of access while also providing good load balance. In particular, we address cache locality for streaming applications through static partitioning and develop an extensible platform to execute partitioned streaming applications. For task-graphs, we extend a task-graph scheduling library to guide scheduling decisions towards better NUMA locality with the help of user-provided locality hints.
CilkPlus parallel for loops utilize a randomized dynamic scheduler to distribute work which, in many loop-based applications, results in poor locality at all levels of the memory hierarchy. We address this issue with a novel parallel for loop implementation that achieves good cache and NUMA locality while maintaining good load balance dynamically.

    Bio-Inspired Multi-Spectral Imaging Sensors and Algorithms for Image Guided Surgery

    Image-guided surgery (IGS) utilizes emerging imaging technologies to provide additional structural and functional information to the physician in clinical settings. This additional visual information can help physicians delineate cancerous tissue during resection as well as avoid damage to nearby healthy tissue. Near-infrared (NIR) fluorescence imaging (700 nm to 900 nm wavelengths) is a promising imaging modality for IGS for the following reasons. First, tissue absorption and scattering in the NIR window are very low, which allows for deeper imaging and localization of tumor tissue in the range of several millimeters to a centimeter, depending on the tissue surrounding the tumor. Second, spontaneous tissue fluorescence emission is minimal in the NIR region, allowing for high signal-to-background ratio imaging compared to visible spectrum fluorescence imaging. Third, decoupling the fluorescence signal from the visible spectrum allows for optimization of NIR fluorescence while attaining high-quality color images. Fourth, there are two FDA-approved fluorescent dyes in the NIR region, methylene blue (MB) and indocyanine green, which can help to identify tumor tissue due to passive accumulation in human subjects. These advantages have led to the development of NIR fluorescence imaging systems for a variety of clinical applications, such as sentinel lymph node imaging, angiography, and tumor margin assessment. With these technological advances, secondary surgeries due to positive tumor margins or damage to healthy organs can be largely mitigated, reducing the emotional and financial toll on the patient. Currently, several NIR fluorescence imaging systems (NFIS) are available commercially or are undergoing clinical trials, such as FLARE, SPY, PDE, Fluobeam, and others. These systems capture multi-spectral images using complex optical equipment and are combined with real-time image processing to present an augmented view to the surgeon.
The information is presented on a standard monitor above the operating bed, which requires the physician to stop the surgical procedure and look up at the monitor. The break in the surgical flow sometimes outweighs the benefits of fluorescence-based IGS, especially in time-critical surgical situations. Furthermore, these instruments tend to be very bulky and have a large footprint, which significantly complicates their adoption in an already crowded operating room. In this document, I present the development of a compact and wearable goggle system capable of real-time sensing of both NIR fluorescence and color information. The imaging system is inspired by the ommatidia of the monarch butterfly, in which pixelated spectral filters are integrated with light-sensitive elements. The pixelated spectral filters are fabricated via a carefully optimized nanofabrication procedure and integrated with a CMOS imaging array. The entire imaging system has been optimized for high signal-to-background fluorescence imaging using an analytical approach, and the efficacy of the system has been experimentally verified. The bio-inspired spectral imaging sensor is integrated with an FPGA for compact and real-time signal processing and with a wearable goggle for easy integration in the operating room. The complete imaging system is undergoing clinical trials at Washington University School of Medicine in St. Louis for imaging sentinel lymph nodes in both breast cancer patients and melanoma patients.

    Adding data parallelism to streaming pipelines for throughput optimization
