56 research outputs found

    MASA-SSE : comparação de sequências biológicas utilizando instruções vetoriais

    Undergraduate monograph, Universidade de Brasília, Instituto de Ciências Exatas, Departamento de Ciência da Computação, 2015. Biological sequence comparison is one of the most basic and important operations in Bioinformatics. Exact methods for comparing two biological sequences have quadratic time complexity and, for this reason, parallel solutions are often used to accelerate execution. The MASA framework [3] is a flexible and customizable parallel solution for biological sequence comparison on different hardware and software platforms. It was initially designed for GPU (Graphics Processing Unit) execution but now integrates two CPU solutions: MASA-CPU and MASA-OpenMP. These CPU solutions do not use vector instructions and thus miss a large potential for parallelism. This graduation project proposes and evaluates MASA-SSE, a CPU solution that uses Intel's SSE vector instructions and implements the Farrar algorithm [6], which is considered the state of the art for biological sequence comparison with vector instructions. Experimental results obtained by comparing several real DNA sequences on two different machines show that MASA-SSE, executing with one thread and vector instructions, outperforms MASA-OpenMP executing with four threads.
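To make the quadratic cost mentioned in the abstract concrete, here is a minimal scalar sketch of an exact local-alignment score. It assumes a Smith-Waterman style recurrence with a linear gap penalty. The function name and the scoring values (`match`, `mismatch`, `gap`) are illustrative assumptions, not taken from MASA, and this is not Farrar's striped SIMD algorithm. The two nested loops visit every cell of the score matrix, which is the work that vector instructions parallelize.

```python
def smith_waterman_score(a, b, match=1, mismatch=-1, gap=-2):
    """Return the best local alignment score of strings a and b.

    Hypothetical scoring scheme: linear gap penalty, fixed match/mismatch
    scores. Runs in O(len(a) * len(b)) time and O(len(b)) memory.
    """
    prev = [0] * (len(b) + 1)  # previous row of the score matrix
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            # Score for aligning a[i-1] with b[j-1].
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Local alignment: a cell never drops below zero.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best
```

Each cell depends on its left, upper, and upper-left neighbours, so a vectorized version has to reorder the computation to expose independent cells.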

    An Evaluation of Java for Numerical Computing

    Improving data prefetching efficacy in multimedia applications

    The workload of multimedia applications has a strong impact on cache memory performance, since the locality of memory references embedded in multimedia programs differs from that of traditional programs. In many cases, standard cache memory organization achieves poorer performance when used for multimedia. A widely-explored approach to improve cache performance is hardware prefetching, which allows the pre-loading of data in the cache before they are referenced. However, existing hardware prefetching approaches are unable to exploit the potential improvement in performance, since they are not tailored to multimedia locality. In this paper we propose novel effective approaches to hardware prefetching to be used in image processing programs for multimedia. Experimental results are reported for a suite of multimedia image processing programs including MPEG-2 decoding and encoding, convolution, thresholding, and edge chain coding

    The CSI multimedia architecture

    SIMD-Swift: Improving Performance of Swift Fault Detection

    Fault rates in modern hardware are generally increasing, driven by lower operating voltages and smaller feature sizes. Previously, hardware faults were addressed mainly in high-availability enterprise servers and in safety-critical applications, such as the transport or aerospace domains. These fields generally have very tight requirements, but also higher budgets. However, as fault rates increase, fault tolerance solutions are also starting to be required in applications that have much smaller profit margins. This brings to the fore the idea of software-implemented hardware fault tolerance, that is, the ability to detect and tolerate hardware faults using software-based techniques on commodity CPUs, which makes it possible to obtain resilience almost for free. Current solutions, however, are lacking in performance, even though they show quite good fault tolerance results. This thesis explores the idea of using Single Instruction Multiple Data (SIMD) technology to execute all of a program's operations on two copies of the same data. This idea is based on the observation that SIMD is ubiquitous in modern CPUs and is usually an underutilized resource. It allows us to detect bit-flips in hardware by a simple comparison of the two copies, under the assumption that only one copy is affected by a fault. We implemented this idea as a source-to-source compiler which hardens a program at the source code level. The evaluation of our several implementations shows that the approach is beneficial for applications dominated by arithmetic or logical operations, but applications with more control-flow or memory operations actually perform better with regular instruction replication. For example, we achieved a performance overhead of only 15% on the Fast Fourier Transformation benchmark, which is dominated by arithmetic instructions, whereas the memory-access-dominated Dijkstra algorithm showed a high overhead of 200%.
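The detection principle described above (run every operation on two copies and compare them) can be sketched in a few lines. This is only an illustration under stated assumptions: the two copies are plain variables rather than lanes of a SIMD register, and `hardened_sum`, `FaultDetected`, and the `inject_fault_at` parameter are hypothetical names introduced here to simulate a bit-flip, not part of the thesis.

```python
class FaultDetected(Exception):
    """Raised when the two redundant copies disagree."""


def hardened_sum(values, inject_fault_at=None):
    """Sum values while maintaining a redundant shadow copy.

    inject_fault_at is a hypothetical testing hook: at that index a
    single bit of the primary copy is flipped to simulate a hardware fault.
    """
    primary = 0
    shadow = 0
    for i, v in enumerate(values):
        # Apply the same operation to both copies.
        primary += v
        shadow += v
        if i == inject_fault_at:
            primary ^= 1 << 3  # simulated bit-flip affecting one copy only
        # A mismatch means one copy was corrupted.
        if primary != shadow:
            raise FaultDetected(f"copies diverged at index {i}")
    return primary
```

Without an injected fault, `hardened_sum([1, 2, 3])` returns the ordinary sum. With `inject_fault_at=1` the comparison fails and `FaultDetected` is raised. The check relies on the same assumption as the abstract: a fault affects only one of the two copies.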

    A novel algorithm and hardware architecture for fast video-based shape reconstruction of space debris

    To enable the non-cooperative rendezvous, capture, and removal of large space debris, automatic recognition of the target is needed. Video-based techniques are the most suitable in the strict context of space missions, where low energy consumption is fundamental and sensors should be passive in order to avoid any possible damage to external objects as well as to the chaser satellite. This paper presents a novel fast shape-from-shading (SfS) algorithm and a field-programmable gate array (FPGA)-based system hardware architecture for video-based shape reconstruction of space debris. The FPGA-based architecture, equipped with a pair of cameras, includes a fast image pre-processing module, a core implementing a feature-based stereo-vision approach, and a processor that executes the novel SfS algorithm. Experimental results show the limited amount of logic resources needed to implement the proposed architecture, and the timing improvements with respect to other state-of-the-art SfS methods. The remaining resources available in the FPGA device can be exploited to integrate other vision-based techniques that improve the comprehension of the debris model, allowing a fast evaluation of the associated kinematics in order to select the most appropriate approach for capturing the target space debris.

    Applying the Engineering Statechart Formalism to the evaluation of soft real-time in operating systems : a use case tailored modeling and analysis technique

    Multimedia applications that have emerged in recent years impose unique requirements on the underlying general purpose operating system (GPOS). The suitability of a GPOS for multimedia processing is judged by its soft real-time capabilities. To date, the question of how these capabilities can be assessed has scarcely been addressed: this is a gap in GPOS research. By answering questions about the impact of the Interrupt Handling Facility (IHF) on the overall soft real-time capabilities of a GPOS, this thesis helps to fill that gap. The Engineering Statechart Formalism (ESF), a use case tailored formal method for modeling real-world operating systems that is based on the classical statechart formalism, is syntactically and semantically defined. Models of the IHF of selected real-world operating systems are then created by means of this technique. As no real-time concept suited to the goals of this thesis exists yet, a suitable definition is constructed. By projecting this system-wide idea onto the interrupt subsystem, specific indicators for this subsystem are derived. These indicators are then evaluated by applying formal techniques such as graph-based analysis and temporal logic model checking to the ESF models. Finally, the assertions derived from this evaluation are interpreted with respect to their impact on real-time multimedia processing in different general purpose operating systems.

    Composable accelerator-rich microprocessor enhanced for adaptivity and longevity

    Accelerator-rich platforms demonstrate orders of magnitude improvement in performance and energy efficiency over software, yet they lack adaptivity to new algorithms and can see low accelerator utilization. To address these issues we propose CAMEL: Composable Accelerator-rich Microprocessor Enhanced for Longevity. CAMEL features programmable fabric (PF) to extend the use of ASIC composable accelerators in supporting algorithms that are beyond the scope of the baseline platform. Using a combination of hardware extensions and compiler support, we demonstrate on average 11.6X performance improvement and 13.9X energy savings across benchmarks that deviate from the original domain for our baseline platform.