The Parallelism Motifs of Genomic Data Analysis
Genomic data sets are growing dramatically as the cost of sequencing
continues to decline and small sequencing devices become available. Enormous
community databases store and share this data with the research community, but
some of these genomic data analysis problems require large scale computational
platforms to meet both the memory and computational requirements. These
applications differ from scientific simulations that dominate the workload on
high end parallel systems today and place different requirements on programming
support, software libraries, and parallel architectural design. For example,
they involve irregular communication patterns such as asynchronous updates to
shared data structures. We consider several problems in high performance
genomics analysis, including alignment, profiling, clustering, and assembly for
both single genomes and metagenomes. We identify some of the common
computational patterns or motifs that help inform parallelization strategies
and compare our motifs to some of the established lists, arguing that at least
two key patterns, sorting and hashing, are missing.
Synergizing Roadway Infrastructure Investment with Digital Infrastructure for Infrastructure-Based Connected Vehicle Applications: Review of Current Status and Future Directions
The safety, mobility, environmental and economic benefits of Connected and Autonomous Vehicles (CAVs) are potentially dramatic. However, realization of these benefits largely hinges on the timely upgrading of the existing transportation system. CAVs must be enabled to send and receive data to and from other vehicles and drivers (V2V communication) and to and from infrastructure (V2I communication). Further, infrastructure and the transportation agencies that manage it must be able to collect, process, distribute and archive these data quickly, reliably, and securely. This paper focuses on current digital roadway infrastructure initiatives and highlights the importance of including digital infrastructure investment alongside more traditional infrastructure investment to keep up with the auto industry's push towards this real time communication and data processing capability. Agencies responsible for transportation infrastructure construction and management must collaborate, establishing national and international platforms to guide the planning, deployment and management of digital infrastructure in their jurisdictions. This will help create standardized interoperable national and international systems so that CAV technology is not deployed in a haphazard and uncoordinated manner.
A Comprehensive Survey on Distributed Training of Graph Neural Networks
Graph neural networks (GNNs) have been demonstrated to be a powerful
algorithmic model in broad application fields for their effectiveness in
learning over graphs. To scale GNN training up for large-scale and ever-growing
graphs, the most promising solution is distributed training which distributes
the workload of training across multiple computing nodes. At present, the
volume of related research on distributed GNN training is exceptionally vast,
accompanied by an extraordinarily rapid pace of publication. Moreover, the
approaches reported in these studies exhibit significant divergence. This
situation poses a considerable challenge for newcomers, hindering their ability
to grasp a comprehensive understanding of the workflows, computational
patterns, communication strategies, and optimization techniques employed in
distributed GNN training. As a result, there is a pressing need for a survey to
provide correct recognition, analysis, and comparisons in this field. In this
paper, we provide a comprehensive survey of distributed GNN training by
investigating various optimization techniques used in distributed GNN training.
First, distributed GNN training is classified into several categories according
to their workflows. In addition, their computational patterns and communication
patterns, as well as the optimization techniques proposed by recent work are
introduced. Second, the software frameworks and hardware platforms of
distributed GNN training are also introduced for a deeper understanding. Third,
distributed GNN training is compared with distributed training of deep neural
networks, emphasizing the uniqueness of distributed GNN training. Finally,
interesting issues and opportunities in this field are discussed.
Comment: To appear in Proceedings of the IEEE
Resiliency in numerical algorithm design for extreme scale simulations
This work is based on the seminar titled ‘Resiliency in Numerical Algorithm Design for Extreme Scale Simulations’ held March 1–6, 2020, at Schloss Dagstuhl, that was attended by all the authors. Advanced supercomputing is characterized by very high computation speeds at the cost of involving an enormous amount of resources and costs. A typical large-scale computation running for 48 h on a system consuming 20 MW, as predicted for exascale systems, would consume a million kWh, corresponding to about 100k Euro in energy cost for executing 10^23 floating-point operations. It is clearly unacceptable to lose the whole computation if any of the several million parallel processes fails during the execution. Moreover, if a single operation suffers from a bit-flip error, should the whole computation be declared invalid? What about the notion of reproducibility itself: should this core paradigm of science be revised and refined for results that are obtained by large-scale simulation? Naive versions of conventional resilience techniques will not scale to the exascale regime: with a main memory footprint of tens of Petabytes, synchronously writing checkpoint data all the way to background storage at frequent intervals will create intolerable overheads in runtime and energy consumption. Forecasts show that the mean time between failures could be lower than the time to recover from such a checkpoint, so that large calculations at scale might not make any progress if robust alternatives are not investigated. More advanced resilience techniques must be devised. The key may lie in exploiting both advanced system features as well as specific application knowledge. Research will face two essential questions: (1) what are the reliability requirements for a particular computation and (2) how do we best design the algorithms and software to meet these requirements?
While the analysis of use cases can help understand the particular reliability requirements, the construction of remedies is currently wide open. One avenue would be to refine and improve on system- or application-level checkpointing and rollback strategies in the case an error is detected. Developers might use fault notification interfaces and flexible runtime systems to respond to node failures in an application-dependent fashion. Novel numerical algorithms or more stochastic computational approaches may be required to meet accuracy requirements in the face of undetectable soft errors. These ideas constituted an essential topic of the seminar. The goal of this Dagstuhl Seminar was to bring together a diverse group of scientists with expertise in exascale computing to discuss novel ways to make applications resilient against detected and undetected faults. In particular, participants explored the role that algorithms and applications play in the holistic approach needed to tackle this challenge. This article gathers a broad range of perspectives on the role of algorithms, applications and systems in achieving resilience for extreme scale simulations. The ultimate goal is to spark novel ideas and encourage the development of concrete solutions for achieving such resilience holistically.
Peer reviewed. Article signed by 36 authors: Emmanuel Agullo, Mirco Altenbernd, Hartwig Anzt, Leonardo Bautista-Gomez, Tommaso Benacchio, Luca Bonaventura, Hans-Joachim Bungartz, Sanjay Chatterjee, Florina M. Ciorba, Nathan DeBardeleben, Daniel Drzisga, Sebastian Eibl, Christian Engelmann, Wilfried N. Gansterer, Luc Giraud, Dominik Göddeke, Marco Heisig, Fabienne Jezequel, Nils Kohl, Xiaoye Sherry Li, Romain Lion, Miriam Mehl, Paul Mycek, Michael Obersteiner, Enrique S. Quintana-Ortiz, Francesco Rizzi, Ulrich Rude, Martin Schulz, Fred Fung, Robert Speck, Linda Stals, Keita Teranishi, Samuel Thibault, Dominik Thonnes, Andreas Wagner and Barbara Wohlmuth. Postprint (author's final draft)
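The energy figures quoted in the abstract above can be checked with a short calculation. This is a sketch only: the sustained rate of 10^18 floating-point operations per second (one exaflop/s) and the resulting electricity price are assumptions inferred from the quoted numbers, not values stated in the abstract.

```latex
\begin{align*}
E &= P \cdot t = 20\,\mathrm{MW} \times 48\,\mathrm{h}
   = 960\,\mathrm{MWh} = 9.6 \times 10^{5}\,\mathrm{kWh} \approx 10^{6}\,\mathrm{kWh},\\
c &\approx \frac{100\,000\ \text{Euro}}{10^{6}\,\mathrm{kWh}}
   = 0.10\ \text{Euro per kWh (implied price)},\\
N &\approx 10^{18}\,\tfrac{\mathrm{FLOP}}{\mathrm{s}} \times 48 \times 3600\,\mathrm{s}
   \approx 1.7 \times 10^{23}\ \text{floating-point operations}.
\end{align*}
```

Under these assumptions the run performs on the order of 10^23 operations, which matches the magnitude the abstract gives for the operation count.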
Weiterentwicklung analytischer Datenbanksysteme
This thesis contributes to the state of the art in analytical database systems. First, we identify and explore extensions to better support analytics on event streams. Second, we propose a novel polygon index to enable efficient geospatial data processing in main memory. Third, we contribute a new deep learning approach to cardinality estimation, which is the core problem in cost-based query optimization.
Multi-Factor Authentication: A Survey
Today, digitalization decisively penetrates all aspects of modern society. One of the key enablers for keeping this process secure is authentication. It covers many different areas of a hyper-connected world, including online payments, communications, access right management, etc. This work sheds light on the evolution of authentication systems towards Multi-Factor Authentication (MFA) starting from Single-Factor Authentication (SFA) and through Two-Factor Authentication (2FA). Particularly, MFA is expected to be utilized for human-to-everything interactions by enabling fast, user-friendly, and reliable authentication when accessing a service. This paper surveys the already available and emerging sensors (factor providers) that allow for authenticating a user with the system directly or by involving the cloud. The corresponding challenges from the user as well as the service provider perspective are also reviewed. The MFA system based on reversed Lagrange polynomial within Shamir’s Secret Sharing (SSS) scheme is further proposed to enable more flexible authentication. This solution covers the cases of authenticating the user even if some of the factors are mismatched or absent. Our framework allows for qualifying the missing factors by authenticating the user without disclosing sensitive biometric data to the verification entity. Finally, a vision of the future trends in MFA is discussed.
Peer reviewed
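The threshold idea behind the abstract above, that authentication can still succeed when some factors are absent, can be illustrated with textbook Shamir's Secret Sharing. The sketch below is not the paper's "reversed Lagrange polynomial" construction; it is the standard k-of-n scheme, and the field size, the function names `split_secret` and `recover_secret`, and the mapping of shares to authentication factors are all assumptions made for illustration.

```python
import secrets

# Prime modulus for the finite field (2**127 - 1 is a Mersenne prime).
# The field size is an illustrative assumption, not taken from the paper.
PRIME = 2**127 - 1


def split_secret(secret, n, k, prime=PRIME):
    """Split `secret` into n shares so that any k of them recover it."""
    if not (1 <= k <= n < prime) or not (0 <= secret < prime):
        raise ValueError("need 1 <= k <= n < prime and 0 <= secret < prime")
    # Random polynomial of degree k - 1 whose constant term is the secret.
    coeffs = [secret] + [secrets.randbelow(prime) for _ in range(k - 1)]
    shares = []
    for x in range(1, n + 1):
        y = 0
        for c in reversed(coeffs):  # Horner evaluation modulo prime
            y = (y * x + c) % prime
        shares.append((x, y))
    return shares


def recover_secret(shares, prime=PRIME):
    """Recover the secret by Lagrange interpolation at x = 0."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = (num * (-xj)) % prime
                den = (den * (xi - xj)) % prime
        # pow(den, -1, prime) is the modular inverse (Python 3.8+).
        secret = (secret + yi * num * pow(den, -1, prime)) % prime
    return secret


# Hypothetical usage: five factors, any three suffice.
shares = split_secret(123456789, n=5, k=3)
recovered = recover_secret([shares[0], shares[2], shares[4]])
```

If each share is thought of as being released by one factor, a user who can present any three of the five factors reconstructs the secret, while two factors alone reveal nothing about it.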
Review of Recent Trends
This work was partially supported by the European Regional Development Fund (FEDER), through the Regional Operational Programme of Centre (CENTRO 2020) of the Portugal 2020 framework, through projects SOCA (CENTRO-01-0145-FEDER-000010) and ORCIP (CENTRO-01-0145-FEDER-022141). Fernando P. Guiomar acknowledges a fellowship from “la Caixa” Foundation (ID100010434), code LCF/BQ/PR20/11770015. Houda Harkat acknowledges the financial support of the Programmatic Financing of the CTS R&D Unit (UIDP/00066/2020).
MIMO-OFDM is a key technology and a strong candidate for 5G telecommunication systems. In the literature, there is no convenient survey study that rounds up all the necessary points to be investigated concerning such systems. The current deeper review paper inspects and interprets the state of the art and addresses several research axes related to MIMO-OFDM systems. Two topics have received special attention: MIMO waveforms and MIMO-OFDM channel estimation. The existing MIMO hardware and software innovations, in addition to the MIMO-OFDM equalization techniques, are discussed concisely. In the literature, only a few authors have discussed the MIMO channel estimation and modeling problems for a variety of MIMO systems. However, to the best of our knowledge, there has been until now no review paper specifically discussing the recent works concerning channel estimation and the equalization process for MIMO-OFDM systems. Hence, the current work focuses on analyzing the recently used algorithms in the field, which could be a rich reference for researchers. Moreover, some research perspectives are identified.
A Systematic Review of the State of Cyber-Security in Water Systems
Critical infrastructure systems are evolving from isolated bespoke systems to those that use general-purpose computing hosts, IoT sensors, edge computing, wireless networks and artificial intelligence. Although this move improves sensing and control capacity and gives better integration with business requirements, it also increases the scope for attack from malicious entities that intend to conduct industrial espionage and sabotage against these systems. In this paper, we review the state of the cyber-security research that is focused on improving the security of the water supply and wastewater collection and treatment systems that form part of the critical national infrastructure. We cover the publication statistics of the research in this area, the aspects of security being addressed, and future work required to achieve better cyber-security for water systems.