4,333 research outputs found

    Massively Parallel Computing and the Search for Jets and Black Holes at the LHC

    Full text link
    Massively parallel computing could be the next leap necessary to reach an era of new discoveries at the LHC after the Higgs discovery. Scientific computing is a critical component of the LHC experiments, spanning operations, the trigger, the LHC Computing Grid, simulation, and analysis. One way to improve the physics reach of the LHC is to take advantage of the flexibility of the trigger system by integrating coprocessors based on Graphics Processing Units (GPUs) or the Many Integrated Core (MIC) architecture into its server farm. This cutting-edge technology provides not only the means to accelerate existing algorithms, but also the opportunity to develop new algorithms that select events in the trigger that would previously have evaded detection. In this article we describe new algorithms that would allow the trigger to select new topological signatures, including non-prompt jets and black-hole-like objects in the silicon tracker. Comment: 15 pages, 11 figures, submitted to NIM.
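
    The abstract does not give the selection algorithms themselves; as a purely illustrative sketch of the kind of per-event filtering such a coprocessor farm could run in parallel (the Event structure, the impact-parameter cut, and the keepEvent predicate below are placeholders of ours, not the article's algorithms), consider:

        #include <cstdio>
        #include <vector>

        // Toy event record: per-track transverse impact parameter (d0), in mm.
        struct Event { std::vector<float> trackD0; };

        // Placeholder trigger predicate: keep events with several displaced tracks,
        // a crude stand-in for a non-prompt-jet-like signature.
        static bool keepEvent(const Event& ev, float d0Cut, int minTracks) {
            int displaced = 0;
            for (float d0 : ev.trackD0)
                if (d0 > d0Cut) ++displaced;
            return displaced >= minTracks;
        }

        int main() {
            std::vector<Event> events = {
                {{0.01f, 0.02f, 0.05f}},      // prompt tracks only
                {{2.1f, 3.4f, 0.02f, 1.8f}},  // several displaced tracks
                {{0.03f, 2.5f}}};
            // On a GPU or MIC coprocessor each event (or each track) would be
            // handled by its own thread; here the loop is sequential for clarity.
            for (size_t i = 0; i < events.size(); ++i)
                if (keepEvent(events[i], 1.0f, 2))
                    std::printf("event %zu accepted by the toy trigger\n", i);
            return 0;
        }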

    Scalable collision detection for distributed virtual environments

    Get PDF
    PhD Thesis. Distributed Virtual Environments (DVEs) provide a mechanism whereby dispersed users can interact with one another within a shared virtual world. DVEs commonly allow users to interact with one another in ways analogous to the real world, e.g. mimicking Newtonian physics. A scalable DVE should enable large numbers of users to participate simultaneously, regardless of the geographical location and hardware configuration of individual users. In addition, these users should perceive a mutually consistent virtual world in which each user perceives a consistent series of events in real time. Collision detection and response is a fundamental requirement of most virtual environments and simulations. It is a computationally expensive operation which must be performed at frequent intervals in all virtual environments that simulate the motion of solid objects. Collision detection has received a great deal of research interest, and as a result a number of efficient collision detection algorithms have been proposed. However, these approaches are designed to detect collisions efficiently in simulations run on a single machine, and are not capable of overcoming the problems of scalability and consistency, which are of paramount importance in DVEs. This thesis presents a new collision detection approach, termed distributed collision detection, which provides high levels of scalability, consistency and responsiveness. The thesis presents the algorithms and theory which underpin the distributed collision detection approach and provides experimental results demonstrating its scalability and responsiveness.
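
    The abstract does not describe how the collision-detection work is actually split across machines; one common way to distribute it, shown here only as a hedged illustration (the uniform split of the world along one axis, the AABB broad phase, and all identifiers are assumptions of ours rather than the thesis's scheme), is to shard the broad phase by spatial region so that each server node only tests objects whose bounding volumes fall in its region:

        #include <cstdio>
        #include <vector>

        // Axis-aligned bounding box used for the broad-phase test.
        struct AABB { float minX, maxX, minY, maxY; };

        static bool overlaps(const AABB& a, const AABB& b) {
            return a.minX <= b.maxX && b.minX <= a.maxX &&
                   a.minY <= b.maxY && b.minY <= a.maxY;
        }

        int main() {
            // Toy world: four objects, x-range [0, 100) split into two regions,
            // one region per (hypothetical) server node.
            std::vector<AABB> objects = {
                {5, 15, 0, 10}, {12, 22, 2, 12}, {60, 70, 0, 10}, {68, 78, 5, 15}};
            const int numRegions = 2;
            const float regionWidth = 100.0f / numRegions;

            // Each region collects the objects whose box overlaps it; objects that
            // straddle a boundary are replicated into both regions (a real system
            // would also have to deduplicate the resulting pair reports).
            std::vector<std::vector<int>> regionObjects(numRegions);
            for (int i = 0; i < (int)objects.size(); ++i)
                for (int r = 0; r < numRegions; ++r) {
                    AABB region = {r * regionWidth, (r + 1) * regionWidth, -1e9f, 1e9f};
                    if (overlaps(objects[i], region)) regionObjects[r].push_back(i);
                }

            // Each node only performs the pairwise tests for its own region.
            for (int r = 0; r < numRegions; ++r)
                for (size_t a = 0; a < regionObjects[r].size(); ++a)
                    for (size_t b = a + 1; b < regionObjects[r].size(); ++b) {
                        int i = regionObjects[r][a], j = regionObjects[r][b];
                        if (overlaps(objects[i], objects[j]))
                            std::printf("region %d: objects %d and %d may collide\n", r, i, j);
                    }
            return 0;
        }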

    A Motion Planning Processor on Reconfigurable Hardware

    Get PDF
    Motion planning algorithms enable us to find feasible paths for moving objects. These algorithms utilize feasibility checks to differentiate valid paths from invalid ones. Unfortunately, the computationally expensive nature of such checks reduces the effectiveness of motion planning algorithms. However, by using hardware acceleration to speed up the feasibility checks, we can greatly enhance the performance of the motion planning algorithms. Of course, such acceleration is not limited to feasibility checks; other components of motion planning algorithms can also be accelerated using specially designed hardware. A Field Programmable Gate Array (FPGA) is a great platform to support such acceleration. An FPGA is a collection of digital gates which can be reprogrammed at run time, i.e., it can be used as a CPU that reconfigures itself for a given task. In this paper, we study the feasibility of an FPGA-based motion planning processor and evaluate its performance. In order to leverage the highly parallel nature and modular structure of the FPGA, our processor utilizes the probabilistic roadmap method at its core. The modularity enables us to replace the feasibility criteria with others. The reconfigurability lets us run our processor in different roles, such as a motion planning co-processor, an autonomous motion planning processor, or a dedicated collision detection chip. Our experiments show that such a processor is not only feasible but can also greatly increase the performance of current algorithms.
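
    As a rough illustration of why the probabilistic roadmap method maps well onto such hardware (the edge feasibility checks are mutually independent, so they can run in parallel), here is a minimal software sketch of a 2D point-robot PRM; the straight-line check against a single circular obstacle and all identifiers are illustrative placeholders of ours for the processor's replaceable feasibility module, not the paper's actual design:

        #include <cmath>
        #include <cstdio>
        #include <random>
        #include <utility>
        #include <vector>

        struct Pt { double x, y; };

        // Illustrative feasibility check: the segment a-b is valid if it stays
        // outside one circular obstacle. In the hardware design this module is
        // the replaceable feasibility criterion.
        static bool edgeFeasible(Pt a, Pt b, Pt c, double r) {
            // Distance from the obstacle centre c to the segment a-b.
            double dx = b.x - a.x, dy = b.y - a.y;
            double len2 = dx * dx + dy * dy;
            double t = len2 > 0 ? ((c.x - a.x) * dx + (c.y - a.y) * dy) / len2 : 0.0;
            t = std::fmax(0.0, std::fmin(1.0, t));
            double px = a.x + t * dx - c.x, py = a.y + t * dy - c.y;
            return px * px + py * py > r * r;
        }

        int main() {
            std::mt19937 rng(42);
            std::uniform_real_distribution<double> uni(0.0, 10.0);
            const Pt obstacle{5.0, 5.0};
            const double radius = 1.5, connectDist = 2.5;

            // 1. Sample random configurations (roadmap nodes).
            std::vector<Pt> nodes(200);
            for (auto& n : nodes) n = {uni(rng), uni(rng)};

            // 2. Connect nearby pairs whose edge passes the feasibility check;
            //    each check is independent, which is what the hardware exploits.
            std::vector<std::pair<int, int>> edges;
            for (size_t i = 0; i < nodes.size(); ++i)
                for (size_t j = i + 1; j < nodes.size(); ++j) {
                    double dx = nodes[i].x - nodes[j].x, dy = nodes[i].y - nodes[j].y;
                    if (dx * dx + dy * dy < connectDist * connectDist &&
                        edgeFeasible(nodes[i], nodes[j], obstacle, radius))
                        edges.emplace_back(i, j);
                }
            std::printf("roadmap: %zu nodes, %zu feasible edges\n", nodes.size(), edges.size());
            return 0;
        }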

    FPGA-based High-Performance Collision Detection: An Enabling Technique for Image-Guided Robotic Surgery

    Get PDF
    Collision detection, which refers to the computational problem of finding the relative placement or configuration of two or more objects, is an essential component of many applications in computer graphics and robotics. In image-guided robotic surgery, real-time collision detection is critical for preserving healthy anatomical structures during the surgical procedure. However, the computational complexity of the problem usually results in algorithms that operate at low speed. In this paper, we present a fast and accurate algorithm for collision detection between Oriented Bounding Boxes (OBBs) that is suitable for real-time implementation. Our proposed Sweep and Prune algorithm performs a preliminary filtering to reduce the number of objects that need to be tested by the classical Separating Axis Test algorithm, while the OBB pairs of interest are preserved. These OBB pairs are then re-checked by the Separating Axis Test algorithm to obtain their accurate overlap status. To accelerate the execution, our Sweep and Prune algorithm is tailor-made for the proposed method. Meanwhile, a high-performance scalable hardware architecture is proposed by analyzing the intrinsic parallelism of our algorithm, and is implemented on an FPGA platform. Results show that our hardware design on the FPGA platform achieves around 8X higher running speed than the software design on a CPU platform. As a result, the proposed algorithm achieves a collision frame rate of 1 kHz and fulfills the requirements of the medical surgery scenario of Robot-Assisted Laparoscopy.
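
    The abstract names the two stages but not their mechanics; the self-contained sketch below shows the usual software formulation of the pipeline (the data layout, the choice of a single-axis sweep, and the test scene are assumptions of ours, not the paper's hardware design): a sweep-and-prune pass over x-axis intervals prunes candidate pairs, and surviving pairs are confirmed with the 15-axis Separating Axis Test for OBBs.

        #include <algorithm>
        #include <cmath>
        #include <cstdio>
        #include <vector>

        struct Vec3 { double x, y, z; };
        static Vec3 sub(Vec3 a, Vec3 b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
        static double dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }
        static Vec3 cross(Vec3 a, Vec3 b) {
            return {a.y * b.z - a.z * b.y, a.z * b.x - a.x * b.z, a.x * b.y - a.y * b.x};
        }

        // Oriented bounding box: centre, three orthonormal local axes, half-extents.
        struct OBB { Vec3 c; Vec3 axis[3]; double e[3]; };

        // Narrow phase: Separating Axis Test over the 15 candidate axes.
        static bool obbOverlap(const OBB& A, const OBB& B) {
            Vec3 d = sub(B.c, A.c);
            Vec3 axes[15];
            int n = 0;
            for (int i = 0; i < 3; ++i) axes[n++] = A.axis[i];
            for (int i = 0; i < 3; ++i) axes[n++] = B.axis[i];
            for (int i = 0; i < 3; ++i)
                for (int j = 0; j < 3; ++j) axes[n++] = cross(A.axis[i], B.axis[j]);
            for (int k = 0; k < 15; ++k) {
                Vec3 L = axes[k];
                if (dot(L, L) < 1e-12) continue;  // parallel edges: degenerate axis
                double ra = 0, rb = 0;
                for (int i = 0; i < 3; ++i) ra += A.e[i] * std::fabs(dot(A.axis[i], L));
                for (int i = 0; i < 3; ++i) rb += B.e[i] * std::fabs(dot(B.axis[i], L));
                if (std::fabs(dot(d, L)) > ra + rb) return false;  // separating axis found
            }
            return true;  // no separating axis: the OBBs overlap
        }

        // Broad phase: sweep and prune on the x axis using projected intervals.
        struct Interval { double lo, hi; int id; };

        int main() {
            std::vector<OBB> boxes = {
                {{0, 0, 0}, {{1, 0, 0}, {0, 1, 0}, {0, 0, 1}}, {1, 1, 1}},
                {{1.5, 0.5, 0}, {{0.7071, 0.7071, 0}, {-0.7071, 0.7071, 0}, {0, 0, 1}}, {1, 1, 1}},
                {{10, 0, 0}, {{1, 0, 0}, {0, 1, 0}, {0, 0, 1}}, {1, 1, 1}}};

            std::vector<Interval> iv;
            for (int i = 0; i < (int)boxes.size(); ++i) {
                double r = 0;  // half-length of the OBB's projection onto x
                for (int k = 0; k < 3; ++k) r += boxes[i].e[k] * std::fabs(boxes[i].axis[k].x);
                iv.push_back({boxes[i].c.x - r, boxes[i].c.x + r, i});
            }
            std::sort(iv.begin(), iv.end(),
                      [](const Interval& a, const Interval& b) { return a.lo < b.lo; });

            // Sweep: only intervals overlapping on x become candidate pairs for the SAT.
            for (size_t i = 0; i < iv.size(); ++i)
                for (size_t j = i + 1; j < iv.size() && iv[j].lo <= iv[i].hi; ++j)
                    if (obbOverlap(boxes[iv[i].id], boxes[iv[j].id]))
                        std::printf("OBBs %d and %d collide\n", iv[i].id, iv[j].id);
            return 0;
        }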

    Reducing redundancy of real time computer graphics in mobile systems

    Get PDF
    The goal of this thesis is to propose novel and effective techniques to eliminate the redundant computations that waste energy in real-time computer graphics applications, with a special focus on mobile GPU micro-architecture. Improving the energy efficiency of CPU/GPU systems is not only key to extending their battery life, but also allows their performance to increase because, to avoid overheating beyond thermal limits, SoCs tend to be throttled when the load is high for a long period of time. Prior studies have pointed out that the CPU and especially the GPU are the principal energy consumers in the graphics subsystem, with off-chip main-memory accesses and the processors inside the GPU being the primary contributors.

    First, we focus on reducing redundant fragment-processing computations by improving the culling of hidden surfaces. During real-time graphics rendering, objects are processed by the GPU in the order they are submitted by the CPU, and occluded surfaces are often processed even though they will end up not being part of the final image. By the time the GPU realizes that an object, or part of it, is not going to be visible, all the activity required to compute its color and store it has already been performed. We propose a novel architectural technique for mobile GPUs, Visibility Rendering Order (VRO), which reorders objects front-to-back entirely in hardware to maximize the culling effectiveness of the GPU and minimize overshading, hence reducing execution time and energy consumption. VRO exploits the fact that objects in animated graphics applications tend to keep their relative depth order across consecutive frames (temporal coherence) in order to provide the feeling of smooth transitions. VRO keeps the visibility information of a frame and uses it to reorder the objects of the following frame. VRO only requires adding a small hardware unit to capture the visibility information and use it later to guide the rendering of the following frame. Moreover, VRO works in parallel with the graphics pipeline, so it incurs negligible performance overheads. We illustrate the benefits of VRO using various unmodified commercial 3D applications, for which VRO achieves a 27% speed-up and a 14.8% energy reduction on average.

    Then, we focus on avoiding redundant computations related to CPU Collision Detection (CD). Graphics applications such as 3D games represent a large percentage of the applications downloaded to mobile devices, and the trend is towards more complex and realistic scenes with accurate 3D physics simulations. CD is one of the most important algorithms in any physics kernel, since it identifies the contact points between the objects of a scene and determines when they collide. However, real-time accurate CD is very expensive in terms of energy consumption. We propose Render Based Collision Detection (RBCD), a novel energy-efficient, high-fidelity CD scheme that leverages some intermediate results of the rendering pipeline to perform CD, so that redundant tasks are done just once. Comparing RBCD with a conventional CD executed entirely on the CPU, we show that its execution time is reduced by almost three orders of magnitude (600x speedup), because most of the CD task in our model comes for free by reusing the intermediate results of image rendering. Although not necessarily the case, such a dramatic time improvement may result in a higher frame rate if the physics simulation is on the critical path. However, the most important advantage of our technique is the enormous energy saving that results from eliminating a long and costly CPU computation and converting it into a few simple operations executed by specialized hardware within the GPU. Our results show that the energy consumed by CD is reduced on average by a factor of 448x (i.e., by 99.8%). These dramatic benefits are accompanied by a higher-fidelity CD analysis (i.e., with finer granularity), which improves the quality and realism of the application.
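    VRO performs its reordering entirely in hardware inside the GPU; the short host-side sketch below is only a software analogue of the idea (the DrawItem record, the prevNearestDepth field, and the tie-breaking rules are illustrative choices of ours): objects that were visible in the previous frame are submitted front-to-back, exploiting temporal coherence so that most occluded fragments of the next frame are rejected early.

        #include <algorithm>
        #include <cstdio>
        #include <vector>

        // Per-object record kept across frames: the nearest depth at which the
        // object's fragments survived the depth test in the previous frame.
        struct DrawItem {
            int id;
            float prevNearestDepth;  // smaller = closer to the camera
            bool prevVisible;
        };

        // Reorder this frame's submission using last frame's visibility: objects
        // seen last frame are drawn front-to-back so occluded fragments fail the
        // depth test early; objects not seen last frame are drawn afterwards.
        static void reorderFrontToBack(std::vector<DrawItem>& items) {
            std::stable_sort(items.begin(), items.end(),
                             [](const DrawItem& a, const DrawItem& b) {
                                 if (a.prevVisible != b.prevVisible) return a.prevVisible;
                                 return a.prevNearestDepth < b.prevNearestDepth;
                             });
        }

        int main() {
            // CPU submission order (back-to-front here, the worst case for overshading).
            std::vector<DrawItem> frame = {
                {0, 0.9f, true}, {1, 0.5f, true}, {2, 0.1f, true}, {3, 0.0f, false}};
            reorderFrontToBack(frame);
            for (const DrawItem& d : frame) std::printf("draw object %d\n", d.id);
            // Expected order: 2, 1, 0, then 3 (not seen last frame).
            return 0;
        }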

    Deep Space Network information system architecture study

    Get PDF
    The purpose of this article is to describe an architecture for the Deep Space Network (DSN) information system in the years 2000-2010 and to provide guidelines for its evolution during the 1990s. The study scope is defined to be from the front-end areas at the antennas to the end users (spacecraft teams, principal investigators, archival storage systems, and non-NASA partners). The architectural vision provides guidance for major DSN implementation efforts during the next decade. A strong motivation for the study is an expected dramatic improvement in information-system technologies, such as the following: computer processing, automation technology (including knowledge-based systems), networking and data transport, software and hardware engineering, and human-interface technology. The proposed Ground Information System has the following major features: unified architecture from the front-end area to the end user; open-systems standards to achieve interoperability; DSN production of level 0 data; delivery of level 0 data from the Deep Space Communications Complex, if desired; dedicated telemetry processors for each receiver; security against unauthorized access and errors; and highly automated monitor and control.

    NASA Automated Rendezvous and Capture Review. Executive summary

    Get PDF
    In support of the Cargo Transfer Vehicle (CTV) Definition Studies in FY-92, the Advanced Program Development division of the Office of Space Flight at NASA Headquarters conducted an evaluation and review of United States capabilities and the state of the art in Automated Rendezvous and Capture (AR&C). The review was held in Williamsburg, Virginia, on 19-21 Nov. 1991 and included over 120 attendees from U.S. government organizations, industry, and universities. One hundred abstracts were submitted to the organizing committee for consideration, and forty-two were selected for presentation. The review was structured into five technical sessions, in which the forty-two papers addressed topics in the following five categories: (1) hardware systems and components; (2) software systems; (3) integrated systems; (4) operations; and (5) supporting infrastructure.