
    GenPIP: In-Memory Acceleration of Genome Analysis via Tight Integration of Basecalling and Read Mapping

    Nanopore sequencing is a widely used, high-throughput genome sequencing technology that can sequence long fragments of a genome into raw electrical signals at low cost. Nanopore sequencing requires two computationally costly processing steps for accurate downstream genome analysis. The first step, basecalling, translates the raw electrical signals into nucleotide bases (i.e., A, C, G, T). The second step, read mapping, finds the correct location of a read in a reference genome. In existing genome analysis pipelines, basecalling and read mapping are executed separately. We observe in this work that such separate execution of the two most time-consuming steps inherently leads to (1) significant data movement and (2) redundant computations on the data, slowing down the genome analysis pipeline. This paper proposes GenPIP, an in-memory genome analysis accelerator that tightly integrates basecalling and read mapping. GenPIP improves the performance of the genome analysis pipeline with two key mechanisms: (1) in-memory fine-grained collaborative execution of the major genome analysis steps in parallel; (2) a new technique for early rejection of low-quality and unmapped reads, which stops the analysis of such reads in a timely manner and thereby reduces wasted computation. Our experiments show that, for the execution of the genome analysis pipeline, GenPIP provides 41.6X (8.4X) speedup and 32.8X (20.8X) energy savings with negligible accuracy loss compared to state-of-the-art software genome analysis tools executed on a state-of-the-art CPU (GPU). Compared to a design that combines state-of-the-art in-memory basecalling and read mapping accelerators, GenPIP provides 1.39X speedup and 1.37X energy savings. (17 pages, 13 figures)
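
    The early-rejection mechanism can be illustrated in software. The sketch below is a minimal Python approximation, assuming hypothetical basecall_chunk and map_bases helpers and an illustrative quality threshold; GenPIP implements this logic inside a processing-in-memory accelerator, not on a CPU.

        def analyze_read(signal_chunks, basecall_chunk, map_bases, q_min=7.0):
            """Basecall and map a read chunk by chunk, stopping as soon as
            the read proves low-quality or unmappable."""
            hits = []
            for n, chunk in enumerate(signal_chunks, start=1):
                bases, mean_q = basecall_chunk(chunk)  # incremental basecalling
                if mean_q < q_min:
                    return None  # early rejection: low-quality read
                hits = map_bases(bases, hits)  # refine candidate mapping locations
                if n >= 2 and not hits:
                    return None  # early rejection: read maps nowhere
            return hits

    Stopping at the first failed chunk means the remaining signal of a rejected read is never basecalled or mapped, which is where the saved computation comes from.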

    An Exploration of Associative Processing with the RV-Across Simulator

    Advisor: Lucas Francisco Wanner. Master's thesis (Mestrado em Ciência da Computação), Universidade Estadual de Campinas, Instituto de Computação.
    Many works have pointed to a performance bottleneck between processor and memory. This bottleneck stands out when running applications, such as Machine Learning, that process large quantities of data. For these applications, data movement accounts for a significant fraction of both processing time and energy consumption. New multi-core architectures, accelerators, and Graphics Processing Units (GPUs) can improve the performance of these applications through parallel processing. However, these architectures do not eliminate the need to move data, which still traverses the levels of a memory hierarchy to be processed. Our work explores Processing in Memory (PIM), and in particular Associative Processing, as an alternative that accelerates applications by processing their data in parallel inside memory, allowing for better system performance and energy savings. Associative Processing provides high-performance, energy-efficient parallel computation using a Content-Addressable Memory (CAM). A CAM supports parallel comparison and writing; augmented with special control registers and lookup tables, it can perform operations between vectors of data in a small, constant number of cycles per operation. In our work, we analyze the potential of Associative Processing in terms of execution time and energy consumption on different application kernels. To this end, we developed RV-Across, a RISC-V-based Associative Processing simulator for testing, validating, and modeling associative operations. The simulator eases the design of associative and near-memory processing architectures by offering interfaces both for building new operations and for high-level experimentation. We created an architectural model with associative processing for the simulator and compared it against CPU-only and multi-core alternatives, using latency and energy models built from data in the literature. We applied these models to compare different scenarios, varying the input characteristics and the size of the Associative Processor. Our results highlight the direct relation between data size and the potential performance and energy improvement of associative processing. For 2D convolution, the Associative Processing model obtained a relative gain of 2x in latency, 2x in energy, and 13x in the number of load/store operations. For matrix multiplication, the speedup grows linearly with the matrix dimensions, reaching 8x for 200x200-byte matrices and outperforming parallel execution on an 8-core CPU. These advantages make associative processing a compelling alternative for systems that must balance processing and energy expenditure, such as embedded devices. Finally, the simulation and evaluation environment we built can enable further exploration of this alternative across different applications and usage scenarios.
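
    The CAM-plus-lookup-table scheme can be sketched in a few lines of Python. The simulation below is illustrative, assuming a full-adder truth table as the lookup table; RV-Across models such operations through its own interfaces, and a real Associative Processor performs each compare/write pass over all rows simultaneously.

        def associative_add(a, b, bits=8):
            """Element-wise add via bit-serial CAM passes: for each bit
            position, each (a_bit, b_bit, carry) pattern is searched once
            and all matching rows are written at once, so the number of
            passes depends on the bit width, not the vector length."""
            out, carry = [0] * len(a), [0] * len(a)
            # Full-adder lookup table: (a, b, cin) -> (sum, cout)
            lut = {(x, y, c): (x ^ y ^ c, (x & y) | (c & (x ^ y)))
                   for x in (0, 1) for y in (0, 1) for c in (0, 1)}
            for i in range(bits):
                new_carry = [0] * len(a)
                for (x, y, c), (s, cout) in lut.items():
                    for r in range(len(a)):  # the CAM scans all rows in parallel
                        if ((a[r] >> i) & 1, (b[r] >> i) & 1, carry[r]) == (x, y, c):
                            out[r] |= s << i      # parallel masked write
                            new_carry[r] = cout
                carry = new_carry
            return out

        print(associative_add([3, 100, 17], [5, 27, 40]))  # [8, 127, 57]

    Because the number of passes is fixed by the bit width, adding more rows (vector elements) costs no extra cycles, which is what drives the speedups reported above as inputs grow.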

    RawHash: Enabling Fast and Accurate Real-Time Analysis of Raw Nanopore Signals for Large Genomes

    Nanopore sequencers generate raw electrical signals in real time while sequencing long genomic strands. These raw signals can be analyzed as they are generated, providing an opportunity for real-time genome analysis. An important feature of nanopore sequencing, Read Until, can eject strands from sequencers without fully sequencing them, which provides opportunities to computationally reduce the sequencing time and cost. However, existing works utilizing Read Until either 1) require powerful computational resources that may not be available for portable sequencers or 2) lack scalability for large genomes, rendering them inaccurate or ineffective. We propose RawHash, the first mechanism that can accurately and efficiently perform real-time analysis of nanopore raw signals for large genomes using a hash-based similarity search. To enable this, RawHash ensures that signals corresponding to the same DNA content lead to the same hash value, regardless of slight variations in these signals. RawHash achieves an accurate hash-based similarity search via an effective quantization of the raw signals such that signals corresponding to the same DNA content have the same quantized value and, subsequently, the same hash value. We evaluate RawHash on three applications: 1) read mapping, 2) relative abundance estimation, and 3) contamination analysis. Our evaluations show that RawHash is the only tool that can provide high accuracy and high throughput for analyzing large genomes in real time. Compared to the state-of-the-art techniques, UNCALLED and Sigmap, RawHash provides 1) 25.8x and 3.4x better average throughput and 2) an average speedup of 32.1x and 2.1x in mapping time, respectively. Source code is available at https://github.com/CMU-SAFARI/RawHash
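
    The quantize-then-hash idea can be shown with a toy Python sketch. The bucket width, window length, and hash function below are made-up stand-ins (RawHash defines its own quantization and packing); the point is only that nearby signal values collide into one bucket, so slightly different signals for the same DNA content produce identical hash values.

        import hashlib

        def quantize(event, step=0.35):
            # Nearby normalized event values fall into the same bucket.
            return int(event // step)

        def seed_hashes(events, k=6):
            """Hash each window of k consecutive quantized events into a
            seed that can be looked up in a reference index."""
            seeds = []
            for i in range(len(events) - k + 1):
                key = ",".join(str(quantize(e)) for e in events[i:i + k])
                seeds.append(hashlib.sha1(key.encode()).hexdigest()[:12])
            return seeds

        a = [0.10, 0.80, 1.50, 0.33, 0.90, 1.20]
        b = [0.12, 0.79, 1.52, 0.31, 0.93, 1.18]
        print(seed_hashes(a) == seed_hashes(b))  # True: small jitter, same seeds

    Matching seeds between an incoming read and a pre-built reference index then reduces to exact hash lookups, which is what makes the similarity search fast enough for real-time Read Until decisions.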

    New Logic-In-Memory Paradigms: An Architectural and Technological Perspective

    Processing systems are in continuous evolution thanks to constant technological advancement and architectural progress. Over the years, computing systems have become more and more powerful, providing support for applications, such as Machine Learning, that require high computational power. However, the growing complexity of modern computing units and applications has had a strong impact on power consumption. In addition, memory plays a key role in the overall power consumption of the system, especially for data-intensive applications, which require a large amount of data movement between the memory and the computing unit. The consequence is twofold: memory accesses are expensive in terms of energy, and much time is wasted accessing memory rather than processing, because of the performance gap between memories and processing units. This gap is known as the memory wall, or the von Neumann bottleneck, and is due to the different rates of progress of complementary metal-oxide-semiconductor (CMOS) technology and memories. Moreover, CMOS scaling itself is approaching a limit beyond which further progress will not be possible. This work addresses these problems from an architectural and technological point of view by: (1) proposing a novel Configurable Logic-in-Memory Architecture that exploits the in-memory computing paradigm to mitigate the memory wall while also providing high performance thanks to its flexibility and parallelism; (2) exploring a non-CMOS technology as a candidate technology for the Logic-in-Memory paradigm.
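
    A minimal Python sketch of the logic-in-memory idea follows; the class and operations are illustrative, not the proposed Configurable Logic-in-Memory Architecture. The point is that operands combine inside the memory array, so they never cross the processor-memory bus.

        class LiMArray:
            """Toy memory array whose rows can be combined in place."""

            def __init__(self, rows):
                self.rows = [list(r) for r in rows]  # each row is a list of 0/1 bits

            def rowwise(self, i, j, dst, op):
                # The logic lives next to the cells: op(row_i, row_j) is
                # computed inside the array and stored in row dst, so no
                # operand travels to the CPU and back.
                self.rows[dst] = [op(a, b) for a, b in zip(self.rows[i], self.rows[j])]

        mem = LiMArray([[1, 0, 1, 1], [1, 1, 0, 1], [0, 0, 0, 0]])
        mem.rowwise(0, 1, dst=2, op=lambda a, b: a & b)  # bulk bitwise AND
        print(mem.rows[2])  # [1, 0, 0, 1]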

    Exploring New Computing Paradigms for Data-Intensive Applications

    The abstract is in the attachment.

    NASA Tech Briefs, May 2012

    Topics covered include: An "Inefficient Fin" Non-Dimensional Parameter to Measure Gas Temperatures Efficiently; On-Wafer Measurement of a Multi-Stage MMIC Amplifier with 10 dB of Gain at 475 GHz; Software to Control and Monitor Gas Streams; Miniaturized Laser Heterodyne Radiometer (LHR) for Measurements of Greenhouse Gases in the Atmospheric Column; Anomaly Detection in Test Equipment via Sliding Mode Observers; Absolute Position of Targets Measured Through a Chamber Window Using Lidar Metrology Systems; Goldstone Solar System Radar Waveform Generator; Fast and Adaptive Lossless Onboard Hyperspectral Data Compression System; Iridium Interfacial Stack - IrIS; Downsampling Photodetector Array with Windowing; Optical Phase Recovery and Locking in a PPM Laser Communication Link; High-Speed Edge-Detecting Line Scan Smart Camera; Optical Communications Channel Combiner; Development of Thermal Infrared Sensor to Supplement Operational Land Imager; Amplitude-Stabilized Oscillator for a Capacitance-Probe Electrometer; Automated Performance Characterization of DSN System Frequency Stability Using Spacecraft Tracking Data; Histogrammatic Method for Determining Relative Abundance of Input Gas Pulse; Predictive Sea State Estimation for Automated Ride Control and Handling - PSSEARCH; LEGION: Lightweight Expandable Group of Independently Operating Nodes; Real-Time Projection to Verify Plan Success During Execution; Web-Based Customizable Viewer for Mars Network Overflight Opportunities; Fabrication of a Cryogenic Terahertz Emitter for Bolometer Focal Plane Calibrations; Fabrication of an Absorber-Coupled MKID Detector; Graphene Transparent Conductive Electrodes for Next-Generation Microshutter Arrays; Method of Bonding Optical Elements with Near-Zero Displacement; Free-Mass and Interface Configurations of Hammering Mechanisms; Wavefront Compensation Segmented Mirror Sensing and Control; Long-Life, Lightweight, Multi-Roller Traction Drives for Planetary Vehicle Surface Exploration; Reliable Optical Pump Architecture for Highly Coherent Lasers Used in Space Metrology Applications; Electrochemical Ultracapacitors Using Graphitic Nanostacks; Improved Whole-Blood-Staining Device; Monitoring Location and Angular Orientation of a Pill; Molecular Technique to Reduce PCR Bias for Deeper Understanding of Microbial Diversity; Laser Ablation Electrodynamic Ion Funnel for In Situ Mass Spectrometry on Mars; High-Altitude MMIC Sounding Radiometer for the Global Hawk Unmanned Aerial Vehicle; PRTs and Their Bonding for Long-Duration, Extreme-Temperature Environments; Mid- and Long-IR Broadband Quantum Well Photodetector; 3D Display Using Conjugated Multiband Bandpass Filters; Real-Time, Non-Intrusive Detection of Liquid Nitrogen in Liquid Oxygen at High Pressure and High Flow; Method to Enhance the Operation of an Optical Inspection Instrument Using Spatial Light Modulators; Dual-Compartment Inflatable Suitlock; Large-Strain Transparent Magnetoactive Polymer Nanocomposites; Thermodynamic Vent System for an On-Orbit Cryogenic Reaction Control Engine; Time Distribution Using SpaceWire in the SCaN Testbed on ISS; and Techniques for Solution-Assisted Optical Contacting