
263 research outputs found

Efficient openMP over sequentially consistent distributed shared memory systems

Author: Costa Prats Juan José
Publication venue: Universitat Politècnica de Catalunya
Publication date: 01/01/2011
Field of study

Clusters are currently among the most widely used platforms in High Performance Computing, and most programmers use the Message Passing Interface (MPI) library to program applications for these distributed platforms and obtain maximum performance, although this is a complex task. OpenMP, on the other hand, has become the de facto standard for programming shared memory platforms because it is easy to use and achieves good performance with little effort. Is it possible to join both worlds? Could programmers use the simplicity of OpenMP on distributed platforms? Many researchers think so. One of the ideas developed is distributed shared memory (DSM), a software layer on top of a distributed platform that gives applications an abstract shared memory view. Although it seems a good solution, it has drawbacks: memory coherence between the nodes is difficult to maintain (complex management, scalability issues, high overhead and others), and the latency of remote-memory accesses can be orders of magnitude greater than on a shared bus because of the interconnection network. This research therefore improves the performance of OpenMP applications executed on distributed memory platforms using a DSM with sequential consistency, and thoroughly evaluates the results with the NAS Parallel Benchmarks. The vast majority of DSMs use a relaxed consistency model because it avoids some major problems in the area. In contrast, we use a sequential consistency model because we think that exposing these potential problems, which would otherwise remain hidden, may lead to solutions that can then be applied to both models. The main idea behind this work is that both runtimes, OpenMP and the DSM layer, should cooperate to achieve good performance; otherwise they interfere with each other and degrade the final performance of applications.
We develop three contributions to improve the performance of these applications: (a) a technique to avoid false sharing at runtime, (b) a technique to mimic the MPI behaviour, in which produced data is forwarded to its consumers, and (c) a mechanism to avoid network congestion caused by DSM coherence messages. The NAS Parallel Benchmarks are used to test the contributions. The results show that the severity of the false-sharing problem depends on the application. Another result is the importance of moving the data flow off the critical path: techniques that forward data as early as possible, as MPI does, benefit final application performance. Additionally, this data movement is usually concentrated at single points and affects application performance because of the limited bandwidth of the network. It is therefore necessary to provide mechanisms that distribute this data movement across the computation time, using an otherwise idle network. Finally, the results show that the proposed contributions improve the performance of OpenMP applications in this kind of environment.
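To make the false-sharing issue in (a) concrete, here is a minimal Python sketch. It is not the technique developed in the thesis. It assumes a DSM that keeps coherence at page granularity and a static block schedule in which each thread writes one contiguous chunk of an array. The function names, the page size and the element size are all hypothetical choices made for illustration.

```python
from collections import defaultdict


def pages_with_multiple_writers(n_elems, elem_size, page_size, n_threads, chunk=None):
    """Map each page written by more than one thread to its sorted writer ids.

    Assumption: thread t writes elements [t * chunk, (t + 1) * chunk).
    When chunk is None, the array is split evenly (ceiling division).
    """
    if chunk is None:
        chunk = -(-n_elems // n_threads)  # ceiling division
    writers = defaultdict(set)
    for t in range(n_threads):
        start = t * chunk
        end = min(start + chunk, n_elems)
        if start >= end:
            continue  # this thread has no elements to write
        first_page = (start * elem_size) // page_size
        last_page = ((end - 1) * elem_size) // page_size
        for page in range(first_page, last_page + 1):
            writers[page].add(t)
    return {p: sorted(ts) for p, ts in writers.items() if len(ts) > 1}


def page_aligned_chunk(n_elems, elem_size, page_size, n_threads):
    """Round the even chunk size up to a whole number of pages."""
    elems_per_page = page_size // elem_size
    chunk = -(-n_elems // n_threads)
    return -(-chunk // elems_per_page) * elems_per_page


# 1000 doubles (8 bytes each), 4 KiB pages, 4 threads.
shared = pages_with_multiple_writers(1000, 8, 4096, 4)
aligned = page_aligned_chunk(1000, 8, 4096, 4)
shared_after = pages_with_multiple_writers(1000, 8, 4096, 4, chunk=aligned)
```

In this toy configuration, the even split leaves both pages with several writers, while page-aligned chunks remove the sharing at the cost of leaving two threads idle. That trade-off is consistent with the abstract's observation that the severity of false sharing depends on the application.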

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

UPCommons. Portal del coneixement obert de la UPC

Tesis Doctorals en Xarxa

Secretaría de Estado de Cultura

Impacts of DEM Type and Resolution on Deep Learning-Based Flood Inundation Mapping

Author: Alipour Reza Saleh
Burian Steven J.
Esmaeilzadeh Mostafa
Fereshtehpour Mohammad
Publication venue
Publication date: 23/09/2023
Field of study

This paper presents a comprehensive study focusing on the influence of DEM type and spatial resolution on the accuracy of flood inundation prediction. The research employs a state-of-the-art deep learning method using a 1D convolutional neural network (CNN). The CNN-based method employs training input data in the form of synthetic hydrographs, along with target data represented by water depth obtained utilizing a 2D hydrodynamic model, LISFLOOD-FP. The performance of the trained CNN models is then evaluated and compared with the observed flood event. This study examines the use of digital surface models (DSMs) and digital terrain models (DTMs) derived from a LIDAR-based 1m DTM, with resolutions ranging from 15 to 30 meters. The proposed methodology is implemented and evaluated in a well-established benchmark location in Carlisle, UK. The paper also discusses the applicability of the methodology to address the challenges encountered in a data-scarce flood-prone region, exemplified by Pakistan. The study found that DTM performs better than DSM at lower resolutions. Using a 30m DTM improved flood depth prediction accuracy by about 21% during the peak stage. Increasing the resolution to 15m increased RMSE and overlap index by at least 50% and 20% across all flood phases. The study demonstrates that while coarser resolution may impact the accuracy of the CNN model, it remains a viable option for rapid flood prediction compared to hydrodynamic modeling approaches

arXiv.org e-Print Archive

Multigrain shared memory

Author: Yeung Donald, 1968-
Publication venue: Massachusetts Institute of Technology
Publication date: 01/01/1998
Field of study

Thesis (Ph.D.), Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 1998. Includes bibliographical references (p. 197-203). By Donald Yeung.

CiteSeerX

DSpace@MIT

Hardware-conscious query processing for the many-core era

Author: Pohl Constantin
Publication venue
Publication date: 01/01/2020
Field of study

Exploiting the opportunities offered by modern hardware to accelerate query processing is no trivial task. Many DBMS and DSMS from past decades are based on assumptions that have changed over time. For example, today's servers with terabytes of main memory make it possible to avoid spilling data to disk entirely, which paved the way for main memory databases. One of the recent hardware trends is many-core processors with hundreds of logical cores on a single CPU, which provide a high degree of parallelism through multithreading as well as vectorized instructions (SIMD). Their demand for memory bandwidth has led to the development of high-bandwidth memory (HBM) to overcome the memory wall. However, both many-core CPUs and HBM have many pitfalls that can easily nullify any performance gain. In this work, we explore the many-core architecture along with HBM for database and data stream query processing. We demonstrate that a hardware-conscious cost model with a calibration approach allows reliable performance prediction for various query operations. Based on that information, we derive an adaptive partitioning and merging strategy for stream query parallelization, and we find an ideal configuration of parameters for one of the most common tasks in the history of DBMS, join processing. However, not all operations and applications can exploit a many-core processor or HBM.
Stream queries optimized for low latency and quick individual responses usually benefit little from more bandwidth, and they also suffer from penalties such as the low clock frequencies of many-core CPUs. Data structures shared between cores lead to problems with cache coherence as well as high contention. Based on our insights, we give a rule of thumb for which data structures are suitable for parallelization with a focus on HBM usage. In addition, different parallelization schemes and synchronization techniques are evaluated using the example of a multiway stream join operation.

Digitale Bibliothek Thüringen

BALANCING PRIVACY, PRECISION AND PERFORMANCE IN DISTRIBUTED SYSTEMS

Author: Zaki Marian
Publication venue
Publication date: 16/01/2020
Field of study

Privacy, Precision, and Performance (3Ps) are three fundamental design objectives in distributed systems. However, these properties tend to compete with one another and are not considered absolute properties or functions. They must be defined and justified in terms of a system, its resources, stakeholder concerns, and the security threat model. To date, distributed systems research has only considered the trade-offs of balancing privacy, precision, and performance in a pairwise fashion. This dissertation, in contrast, formally explores the space of trade-offs among all 3Ps by examining three representative classes of distributed systems, namely Wireless Sensor Networks (WSNs), cloud systems, and Data Stream Management Systems (DSMSs). These representative systems support a large part of modern and mission-critical distributed systems. WSNs are real-time systems characterized by unreliable network interconnections and highly constrained computational and power resources. The dissertation proposes a privacy-preserving in-network aggregation protocol for WSNs, demonstrating that the 3Ps can be navigated by adopting appropriate algorithms and cryptographic techniques that are not prohibitively expensive. Next, the dissertation highlights the privacy and precision issues that arise in cloud databases due to the eventual consistency models of the cloud. To address these issues, consistency enforcement techniques across cloud servers are proposed, and the trade-offs between the 3Ps are discussed to help guide cloud database users on how to balance these properties. Lastly, the 3Ps are examined in DSMSs, which are characterized by high volumes of unbounded input data streams and strict real-time processing constraints. Within this system, the 3Ps are balanced through a proposed simple and efficient technique that applies access control policies over shared operator networks to achieve privacy and precision without sacrificing the system's performance.
Although this dissertation shows that, with the right set of protocols and algorithms, the desirable 3P properties can co-exist in a balanced way in well-established distributed systems, it also promotes the new 3Ps-by-design concept. This concept is meant to encourage distributed systems designers to proactively consider the interplay among the 3Ps from the initial stages of the system's design lifecycle, rather than treating them as add-on properties.

D-Scholarship@Pitt

Valuing architecture for strategic purposes : comments on applying the dependency structure matrix with real options theory

Author: Sharman David M. (David Maynard), 1966-
Publication venue: Massachusetts Institute of Technology
Publication date: 01/01/2002
Field of study

Thesis (S.M.), Massachusetts Institute of Technology, System Design & Management Program, 2002. Also issued in leaves. Includes bibliographical references (p. 232-236). Analysis of product and organisational architecture using dependency structure matrices to describe a series of domains, in conjunction with real options theory, assists in predicting the strategic capabilities of either existing or potential products and organisations, and likely optimal or maximal rates of change. This assists in predicting the extent to which technologically dependent organisations can realistically create and capture value from a portfolio approach, as a number of technology conglomerates currently seek to do. It also goes some way towards explaining why existing organisations find it difficult to create or exploit new knowledge and thereby helps explain why many synergies remain unrealised. This suggests that strategic leadership of technology conglomerates must be by people who possess either the tacit knowledge of the financial, organisational and technical aspects of the business, or who possess explicit tools to bridge any gaps. Given that explicit financial tools are available, in the absence of unique individuals the strategic planning process needs to incorporate measures designed to check a priori that the proposed strategies will result in technical knowledge creation and organisational value capture. By David M. Sharman.

DSpace@MIT

Compiler and Runtime Optimizations for Fine-Grained Distributed Shared Memory Systems

Author: Veldema R.S.
Publication venue
Publication date: 01/01/2003
Field of study

Bal, H.E. [Promotor]

VU Research Portal

Proceedings Work-In-Progress Session of the 13th Real-Time and Embedded Technology and Applications Symposium

Author: Lu Chenyang
Publication venue: Washington University Open Scholarship
Publication date: 03/04/2007
Field of study

The Work-In-Progress session of the 13th IEEE Real-Time and Embedded Technology and Applications Symposium (RTAS'07) presents papers describing contributions both to the state of the art and the state of the practice in the broad field of real-time and embedded systems. The 17 accepted papers were selected from 19 submissions. This proceedings is also available as Washington University in St. Louis Technical Report WUCSE-2007-17, at http://www.cse.seas.wustl.edu/Research/FileDownload.asp?733. Special thanks go to the General Chairs, Steve Goddard and Steve Liu, and the Program Chairs, Scott Brandt and Frank Mueller, for their support and guidance.

Washington University St. Louis: Open Scholarship

Enabling effective product launch decisions

Author: Akamphon Sappinandana
Publication venue: Massachusetts Institute of Technology
Publication date: 01/01/2008
Field of study

Thesis (Ph.D.), Massachusetts Institute of Technology, Dept. of Mechanical Engineering, 2008. Includes bibliographical references (p. 102-106). The present work looks into the question of optimizing the performance of product launch decisions, in particular the decisions of product development duration and manufacturing ramp-up. It presents an innovative model for measuring product launch performance and optimizing the decisions by integrating a design structure matrix model for product development, a technical cost model for manufacturing, and revenue and warranty models for customer reaction to product quality into one model using net revenue as a metric. The model shows that overlooking the interactions between product development and manufacturing leads to suboptimal decisions. Furthermore, it points out that product quality is apparently the most important driver for product launch performance and that the effects of product launch decisions on resulting product quality need to be considered. Results from case studies demonstrate that improving a firm's tactical strategies will help shorten product launch and improve its performance, while factors such as low reputation or high product failure rate will require lengthening product launch to minimize their impacts. Finally, the model results are analyzed to yield direction for firms on strategies that can be implemented to improve product launch performance. The most effective strategy is one that improves the PD capability (higher ability to find and fix problems), and the second most effective is to improve problem solving in manufacturing ramp-up. By Sappinandana Akamphon.

DSpace@MIT

UAV or Drones for Remote Sensing Applications in GPS/GNSS Enabled and GPS/GNSS Denied Environments

Author
Publication venue: 'MDPI AG'
Publication date: 11/01/2022
Field of study

The design of novel UAV systems and the use of UAV platforms integrated with robotic sensing and imaging techniques, as well as the development of processing workflows and the capacity of ultra-high temporal and spatial resolution data, have enabled a rapid uptake of UAVs and drones across several industries and application domains. This book provides a forum for high-quality peer-reviewed papers that broaden awareness and understanding of single- and multiple-UAV developments for remote sensing applications, and associated developments in sensor technology, data processing and communications, and UAV system design and sensing capabilities in GPS-enabled and, more broadly, Global Navigation Satellite System (GNSS)-enabled and GPS/GNSS-denied environments. Contributions include: UAV-based photogrammetry, laser scanning, multispectral imaging, hyperspectral imaging, and thermal imaging; UAV sensor applications; spatial ecology; pest detection; reef; forestry; volcanology; precision agriculture; wildlife species tracking; search and rescue; target tracking; atmosphere monitoring; chemical, biological, and natural disaster phenomena; fire prevention; flood prevention; volcanic monitoring; pollution monitoring; microclimates; and land use; wildlife and target detection and recognition from UAV imagery using deep learning and machine learning techniques; UAV-based change detection.

Directory of Open Access Books (DOAB)