
    Research and Education in Computational Science and Engineering

    Over the past two decades the field of computational science and engineering (CSE) has penetrated both basic and applied research in academia, industry, and laboratories to advance discovery, optimize systems, support decision-makers, and educate the scientific and engineering workforce. Informed by centuries of theory and experiment, CSE performs computational experiments to answer questions that neither theory nor experiment alone is equipped to answer. CSE provides scientists and engineers of all persuasions with algorithmic inventions and software systems that transcend disciplines and scales. Carried on a wave of digital technology, CSE brings the power of parallelism to bear on troves of data. Mathematics-based advanced computing has become a prevalent means of discovery and innovation in essentially all areas of science, engineering, technology, and society, and the CSE community is at the core of this transformation. However, a combination of disruptive developments (including the architectural complexity of extreme-scale computing, the data revolution that engulfs the planet, and the specialization required to follow the applications to new frontiers) is redefining the scope and reach of the CSE endeavor. This report describes the rapid expansion of CSE and the challenges to sustaining its bold advances. The report also presents strategies and directions for CSE research and education for the next decade. Comment: Major revision, to appear in SIAM Review

    Improved Compressive Sensing Of Natural Scenes Using Localized Random Sampling

    Compressive sensing (CS) theory demonstrates that uniformly-random sampling often achieves higher quality image reconstructions than uniformly-spaced sampling. Because the structure of the sampling protocol has such a profound impact on reconstruction quality, we formulate a new sampling scheme motivated by physiological receptive field structure, localized random sampling, which yields significantly improved CS image reconstructions. For each set of localized image measurements, our sampling method first randomly selects an image pixel and then measures its nearby pixels with probability depending on their distance from the initially selected pixel. We compare the uniformly-random and localized random sampling methods over a large space of sampling parameters, and show that, for the optimal parameter choices, higher quality image reconstructions can be consistently obtained by using localized random sampling. In addition, we argue that the localized random CS optimal parameter choice is stable with respect to diverse natural images, and scales with the number of samples used for reconstruction. We expect that the localized random sampling protocol helps to explain the evolutionarily advantageous nature of receptive field structure in visual systems and suggests several future research areas in CS theory and its application to brain imaging.
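
The sampling step described in this abstract (pick a random center pixel, then include each other pixel with a probability that depends on its distance from the center) can be sketched as follows. This is only an illustration: the abstract does not give the probability law, so the Gaussian falloff, the `sigma` parameter, and the function names `localized_random_mask` and `measure` are assumptions made here, not the authors' definitions.

```python
import math
import random


def localized_random_mask(height, width, sigma, rng):
    """Build one 0/1 sampling mask around a randomly chosen center pixel.

    Assumption: a pixel at squared distance d2 from the center is included
    with probability exp(-d2 / (2 * sigma**2)). The source only says the
    probability depends on distance, not which function is used.
    """
    ci = rng.randrange(height)  # row of the randomly selected center pixel
    cj = rng.randrange(width)   # column of the randomly selected center pixel
    mask = [[0] * width for _ in range(height)]
    for i in range(height):
        for j in range(width):
            d2 = (i - ci) ** 2 + (j - cj) ** 2
            # rng.random() lies in [0, 1), so the center (d2 == 0) is always kept.
            if rng.random() < math.exp(-d2 / (2.0 * sigma ** 2)):
                mask[i][j] = 1
    return (ci, cj), mask


def measure(image, mask):
    """One compressive measurement: the sum of the pixels selected by the mask."""
    return sum(
        pixel
        for image_row, mask_row in zip(image, mask)
        for pixel, keep in zip(image_row, mask_row)
        if keep
    )
```

Calling `localized_random_mask` once per measurement and stacking the flattened masks would give a sampling matrix. A small `sigma` keeps each measurement tightly localized, while a large `sigma` approaches uniformly-random sampling.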

    Distributed approaches for coverage missions with multiple heterogeneous UAVs for coastal areas.

    This thesis proposes a high-level framework for heterogeneous teams of fixed-wing aerial robots operating in complex coastal areas. Recent advances in the computational capabilities of modern processors, along with the falling manufacturing costs of small-scale aerial platforms, have given researchers the opportunity to propose efficient and low-cost solutions to a wide variety of problems. In marine sciences, and more generally in coastal or sea operations, the use of aerial robots brings a number of advantages, including information redundancy and operator safety. The thesis first deals with the decomposition of complex coastal areas in relation to a vehicle's on-board sensor. This decomposition decreases the computational complexity of planning a flight path while respecting various aerial or ground restrictions. The sensor-based area decomposition also facilitates a team-wide heterogeneous solution for any team of aerial vehicles. The thesis then proposes a novel algorithmic approach for partitioning any given complex area among an arbitrary number of Unmanned Aerial Vehicles (UAVs). This partitioning scheme respects the relative flight autonomy of the robots, assigning each a corresponding region of interest. In addition, a set of algorithms is proposed for obtaining coverage waypoint plans for those areas. These algorithms are designed to accommodate the non-holonomic nature of fixed-wing vehicles and the restrictions their dynamics impose. Moreover, the thesis proposes a variation of a well-known path tracking algorithm that further reduces the flight error of waypoint following by introducing intermediate waypoints and providing an autopilot parametrisation.
Finally, a marine studies test case of buoy information extraction is presented, demonstrating the flexibility and modular nature of the proposed framework.

    Using reconfigurable computing technology to accelerate matrix decomposition and applications

    Matrix decomposition plays an increasingly significant role in many scientific and engineering applications. Among numerous techniques, Singular Value Decomposition (SVD) and Eigenvalue Decomposition (EVD) are widely used as factorization tools to perform Principal Component Analysis for dimensionality reduction and pattern recognition in image processing, text mining and wireless communications, while QR Decomposition (QRD) and sparse LU Decomposition (LUD) are employed to solve dense or sparse linear systems of equations in bioinformatics, power systems and computer vision. Matrix decompositions are computationally expensive and their sequential implementations often fail to meet the requirements of many time-sensitive applications. The emergence of reconfigurable computing has provided a flexible and low-cost opportunity to pursue high-performance parallel designs, and the use of FPGAs has shown promise in accelerating this class of computation. In this research, we have proposed and implemented several highly parallel FPGA-based architectures to accelerate matrix decompositions and their applications in data mining and signal processing. Specifically, in this dissertation we describe the following contributions:
    • We propose an efficient FPGA-based double-precision floating-point architecture for EVD, which can efficiently analyze large-scale matrices.
    • We implement a floating-point Hestenes-Jacobi architecture for SVD, which is capable of analyzing arbitrary sized matrices.
    • We introduce a novel deeply pipelined reconfigurable architecture for QRD, which can be dynamically configured to perform either Householder transformation or Givens rotation in a manner that takes advantage of the strengths of each.
    • We design a configurable architecture for sparse LUD that supports both symmetric and asymmetric sparse matrices with arbitrary sparsity patterns.
    • By further extending the proposed hardware solution for SVD, we parallelize a popular text mining tool, Latent Semantic Indexing, with an FPGA-based architecture.
    • We present a configurable architecture to accelerate Homotopy l1-minimization, in which the modification of the proposed FPGA architecture for sparse LUD is used at its core to parallelize both Cholesky decomposition and rank-1 update.
    Our experimental results using an FPGA-based acceleration system indicate the efficiency of our proposed novel architectures, with application and dimension-dependent speedups over an optimized software implementation that range from 1.5× to 43.6× in terms of computation time.
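
For readers unfamiliar with the Hestenes-Jacobi method named in the contributions, the following is a minimal software sketch of the textbook one-sided Jacobi iteration for singular values. It is not the dissertation's FPGA architecture. The function name, the tolerance, and the sweep limit are choices made here for illustration.

```python
import math


def hestenes_jacobi_singular_values(a, tol=1e-12, max_sweeps=60):
    """Singular values of matrix `a` (a list of rows), sorted in descending order.

    One-sided (Hestenes) Jacobi: repeatedly rotate pairs of columns until all
    columns are mutually orthogonal; the column norms are then the singular values.
    """
    m, n = len(a), len(a[0])
    # Store the matrix by columns, because each rotation mixes two columns.
    cols = [[float(a[i][j]) for i in range(m)] for j in range(n)]
    for _ in range(max_sweeps):
        rotated = False
        for p in range(n - 1):
            for q in range(p + 1, n):
                alpha = sum(x * x for x in cols[p])
                beta = sum(y * y for y in cols[q])
                gamma = sum(x * y for x, y in zip(cols[p], cols[q]))
                # Skip pairs that are already orthogonal to working precision.
                if abs(gamma) <= tol * math.sqrt(alpha * beta):
                    continue
                rotated = True
                # Rotation angle that zeroes the inner product of the two columns.
                zeta = (beta - alpha) / (2.0 * gamma)
                t = math.copysign(1.0, zeta) / (abs(zeta) + math.sqrt(1.0 + zeta * zeta))
                c = 1.0 / math.sqrt(1.0 + t * t)
                s = c * t
                cols[p], cols[q] = (
                    [c * x - s * y for x, y in zip(cols[p], cols[q])],
                    [s * x + c * y for x, y in zip(cols[p], cols[q])],
                )
        if not rotated:
            break  # a full sweep with no rotation means the iteration has converged
    return sorted((math.sqrt(sum(x * x for x in col)) for col in cols), reverse=True)
```

Rotations on disjoint column pairs do not depend on each other, which is one reason this method is often considered for parallel hardware.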

    Domain-specific Architectures for Data-intensive Applications

    Graphs' versatility in representing diverse relationships makes them effective for a wide range of applications. For instance, search engines use graph-based applications to provide high-quality search results. Medical centers use them to aid in patient diagnosis. Most recently, graphs have also been employed to support the management of viral pandemics. Looking forward, they show promise of being critical in unlocking several other opportunities, including combating the spread of fake content in social networks, detecting and preventing fraudulent online transactions in a timely fashion, and ensuring collision avoidance in autonomous vehicle navigation, to name a few. Unfortunately, all these applications require more computational power than conventional computing systems can provide. The key reason is that graph applications present large working sets that fail to fit in the small on-chip storage of existing computing systems, while at the same time they access data in seemingly unpredictable patterns and thus cannot benefit from traditional on-chip storage. In this dissertation, we set out to address the performance limitations of existing computing systems so as to enable emerging graph applications like those described above. To achieve this, we identified three key strategies: 1) specializing the memory architecture, 2) processing data near its storage, and 3) coalescing messages in the network. Based on these strategies, this dissertation develops several solutions: OMEGA, which employs specialized on-chip storage units with co-located specialized compute engines to accelerate the computation; MessageFusion, which coalesces messages in the interconnect; and Centaur, which provides an architecture that optimizes the processing of infrequently-accessed data. Overall, these solutions provide a 2x performance improvement, with negligible hardware overheads, across a wide range of applications.
Finally, we demonstrate the applicability of our strategies to other data-intensive domains by exploring an acceleration solution for MapReduce applications, which achieves a 4x performance speedup, also with negligible area and power overheads. PhD, Computer Science & Engineering, University of Michigan, Horace H. Rackham School of Graduate Studies. http://deepblue.lib.umich.edu/bitstream/2027.42/163186/1/abrahad_1.pd

    Resource-aware scheduling for 2D/3D multi-/many-core processor-memory systems

    This dissertation addresses the complexities of 2D/3D multi-/many-core processor-memory systems, focusing on two key areas: enhancing timing predictability in real-time multi-core processors and optimizing performance within thermal constraints. The integration of an increasing number of transistors into compact chip designs, while boosting computational capacity, presents challenges in resource contention and thermal management. The first part of the thesis improves timing predictability. We enhance shared cache interference analysis for set-associative caches, advancing the calculation of Worst-Case Execution Time (WCET). This development enables accurate assessment of cache interference and the effectiveness of partitioned schedulers in real-world scenarios. We introduce TCPS, a novel task and cache-aware partitioned scheduler that optimizes cache partitioning based on task-specific WCET sensitivity, leading to improved schedulability and predictability. Our research explores various cache and scheduling configurations, providing insights into their performance trade-offs. The second part focuses on thermal management in 2D/3D many-core systems. Recognizing the limitations of Dynamic Voltage and Frequency Scaling (DVFS) in S-NUCA many-core processors, we propose synchronous thread migrations as a thermal management strategy. This approach culminates in the HotPotato scheduler, which balances performance and thermal safety. We also introduce 3D-TTP, a transient temperature-aware power budgeting strategy for 3D-stacked systems, reducing the need for Dynamic Thermal Management (DTM) activation. Finally, we present 3QUTM, a novel method for 3D-stacked systems that combines core DVFS and memory bank Low Power Modes with a learning algorithm, optimizing response times within thermal limits. This research contributes significantly to enhancing performance and thermal management in advanced processor-memory systems.

    Virtual Target Selection for a Multiple-Pursuer Multiple-Evader Scenario

    This paper considers an M-pursuer N-evader scenario involving virtual targets. The virtual targets serve as an intermediary target for the pursuers, allowing the pursuers to delay their final assignment to the evaders. However, upon reaching the virtual target, the pursuers must decide which evader to capture. It is assumed that there are more pursuers than evaders and that the pursuers are faster than the evaders. The objective is two-part: first, assign each pursuer to a virtual target and evader such that the pursuer team's energy is minimized, and second, choose the virtual targets' locations for this minimization problem. The approach taken is to consider the Apollonius geometry between each pursuer's virtual target location and each evader. Using the constructed Apollonius circles, the pursuer's travel distance and maneuver at a virtual target are obtained. These metrics serve as a gauge for the total energy required to capture a particular evader and are used to solve the joint virtual target selection and pursuer-evader assignment problem. This paper provides a mathematical definition of this problem, the solution approach taken, and an example. Comment: AIAA SciTech 2024 Preprint
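
As background for the Apollonius geometry mentioned in this abstract, the sketch below computes the standard Apollonius circle for one pursuer (or virtual target location) and one evader. It assumes constant speeds with the evader slower than the pursuer, as the abstract states. The function name and the `speed_ratio` parameter (evader speed divided by pursuer speed) are illustrative choices made here. The paper's own formulation and energy metrics are not reproduced.

```python
import math


def apollonius_circle(pursuer, evader, speed_ratio):
    """Return (center, radius) of the Apollonius circle for a pursuer-evader pair.

    The circle is the set of points X with |X - evader| = speed_ratio * |X - pursuer|,
    i.e. the points both agents can reach at the same time when moving in straight
    lines at constant speed. Requires 0 < speed_ratio < 1 (pursuer is faster).
    """
    if not 0.0 < speed_ratio < 1.0:
        raise ValueError("speed_ratio must lie strictly between 0 and 1")
    px, py = pursuer
    ex, ey = evader
    mu2 = speed_ratio ** 2
    k = 1.0 - mu2
    center = ((ex - mu2 * px) / k, (ey - mu2 * py) / k)
    radius = speed_ratio * math.hypot(ex - px, ey - py) / k
    return center, radius
```

For a pursuer at the origin, an evader at (1, 0), and a speed ratio of 0.5, the circle is centered at (4/3, 0) with radius 2/3. Every point on it is twice as far from the pursuer as from the evader.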