    Probabilistic data types

    Integrated master's dissertation in Informatics Engineering. Conflict-Free Replicated Data Types (CRDTs) provide deterministic outcomes from concurrent executions. Their conflict-resolution mechanism uses information on the ordering of the most recent operations, which indicates whether a given operation is known by a replica; this is typically done with some variant of version vectors. This thesis will explore the construction of CRDTs that use a novel stochastic mechanism able to track knowledge of recently performed operations with high accuracy and of older operations with lower accuracy. The aim is to obtain better scaling properties and to avoid metadata whose size is linear in the number of replicas.
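    To make the metadata cost concrete, here is a minimal Python sketch of a conventional version vector, the baseline the abstract mentions, not the thesis's stochastic mechanism. It assumes replica identifiers are strings and that each replica numbers its own operations 1, 2, 3, ...; all function names are hypothetical.

```python
from typing import Dict

# A version vector maps each replica id to the number of operations from
# that replica that the holder has seen. It needs one entry per replica,
# which is the linear metadata cost the thesis aims to avoid.
VersionVector = Dict[str, int]


def dominates(a: VersionVector, b: VersionVector) -> bool:
    """True if a has seen every operation that b has seen (missing entries count as 0)."""
    return all(a.get(r, 0) >= n for r, n in b.items())


def compare(a: VersionVector, b: VersionVector) -> str:
    """Classify two version vectors as 'equal', 'after', 'before', or 'concurrent'."""
    a_ge_b = dominates(a, b)
    b_ge_a = dominates(b, a)
    if a_ge_b and b_ge_a:
        return "equal"
    if a_ge_b:
        return "after"
    if b_ge_a:
        return "before"
    return "concurrent"


def merge(a: VersionVector, b: VersionVector) -> VersionVector:
    """Pointwise maximum: what a replica knows after receiving the other's state."""
    return {r: max(a.get(r, 0), b.get(r, 0)) for r in a.keys() | b.keys()}


def knows(vv: VersionVector, replica: str, seq: int) -> bool:
    """Has the holder of vv seen operation number seq issued by replica?"""
    return vv.get(replica, 0) >= seq
```

    For example, `{"A": 2, "B": 1}` and `{"A": 1, "B": 2}` compare as concurrent, and their merge `{"A": 2, "B": 2}` dominates both. The `knows` query is exact here; a stochastic replacement would trade some accuracy on old operations for metadata that does not grow with the replica count.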

    Performance Modeling and Prediction for the Scalable Solution of Partial Differential Equations on Unstructured Grids

    This dissertation studies the sources of poor performance in scientific computing codes based on partial differential equations (PDEs), which on modern architectures with deep memory hierarchies typically run at a computational rate well below that of other scientific simulations (e.g., those dominated by dense linear algebra or N-body kernels). We identify the primary factors responsible for this relatively poor performance as: insufficient available memory bandwidth, a low ratio of work to data size (a consequence of good algorithmic efficiency), and the nonscaling cost of synchronization and gather/scatter operations (under fixed-problem-size scaling). This dissertation also illustrates how to reuse legacy scientific and engineering software within a library framework. Specifically, a three-dimensional unstructured-grid incompressible Euler code from NASA has been parallelized for distributed-memory architectures with the Portable Extensible Toolkit for Scientific Computing (PETSc) library. Using this newly instrumented code (called PETSc-FUN3D) as an example of a typical PDE solver, we demonstrate strategies that are effective in tolerating the latencies arising from the hierarchical memory system and the network. Even on a single processor from each of the major contemporary architectural families, the PETSc-FUN3D code runs 2.5 to 7.5 times faster than the legacy code on a medium-sized data set (with approximately 10⁵ degrees of freedom). The major source of this improvement is the increased locality in data reference patterns achieved through blocking, interlacing, and edge reordering. To explain these performance gains, we provide simple performance models based on memory bandwidth and instruction issue rates. Experimental evidence, in terms of translation lookaside buffer (TLB) and data cache miss rates, achieved memory bandwidth, and graduated floating-point instructions per memory reference, is provided through accurate measurements with hardware counters.
The performance models and experimental results motivate algorithmic and software practices that improve both parallel scalability and per-node performance. We identify the bottlenecks to scalability (algorithmic as well as implementation) for a fixed-size problem when the number of processors grows to several thousand (the expected level of concurrency on terascale architectures). We also evaluate the hybrid programming model (mixed distributed/shared memory) from a performance standpoint.
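    As an illustration only, the sketch below shows a roofline-style memory-bandwidth bound, a common simplification and not the dissertation's own model (which also accounts for instruction issue rates). It estimates the arithmetic intensity of a block-CSR sparse matrix-vector product, assuming 8-byte values and 4-byte column indices, to show why storing several unknowns per node together (blocking) raises the attainable rate. The machine numbers are hypothetical.

```python
def bandwidth_bound_flops(bandwidth_bytes_per_s: float,
                          flops_per_byte: float,
                          peak_flops: float) -> float:
    """Upper bound on the sustained flop rate of a kernel.

    A kernel performing `flops_per_byte` floating-point operations per byte
    moved from memory cannot run faster than memory can feed it
    (bandwidth * intensity), nor faster than the processor's peak rate.
    """
    return min(peak_flops, bandwidth_bytes_per_s * flops_per_byte)


def block_spmv_intensity(block_size: int,
                         bytes_per_value: int = 8,
                         bytes_per_index: int = 4) -> float:
    """Estimated flops per byte for a block-CSR sparse matrix-vector product.

    Each stored b x b block costs 2*b*b flops (one multiply and one add per
    entry) and requires reading b*b values plus a single column index.
    Traffic for the input and output vectors is ignored, so the real
    intensity is somewhat lower than this estimate.
    """
    b2 = block_size * block_size
    return 2.0 * b2 / (bytes_per_value * b2 + bytes_per_index)


# Hypothetical machine: 1 GB/s sustained memory bandwidth, 1 GFLOP/s peak.
for b in (1, 4):
    rate = bandwidth_bound_flops(1e9, block_spmv_intensity(b), 1e9)
    print(f"block size {b}: at most {rate / 1e6:.0f} Mflop/s")
```

    With these assumed numbers the bound rises from about 167 to about 242 Mflop/s, because one column index is amortized over a whole block of values; both are far below the assumed peak, which is the bandwidth-limited behavior the abstract describes.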

    Design and Implementation of SEMAR IoT Server Platform with Applications

    Rapid developments in Internet of Things (IoT) technologies have increased the possibility of realizing smart cities, where collaboration and integration among various IoT application systems are essential. However, IoT application systems have often been designed and deployed independently, without considering standards for devices, logic, and data communications. In this paper, we present the design and implementation of an IoT server platform called Smart Environmental Monitoring and Analytical in Real-Time (SEMAR) for integrating IoT application systems using standards. SEMAR offers Big Data environments with built-in functions for data aggregation, synchronization, and classification with machine learning. Moreover, plug-in functions can be easily implemented. Data from devices with different sensors can be received directly or through network connections, and is used in real time for user interfaces, text files, and access by other systems through Representational State Transfer Application Programming Interface (REST API) services. To evaluate SEMAR, we implemented the platform and integrated five IoT application systems: the air-conditioning guidance system, the fingerprint-based indoor localization system, the water quality monitoring system, the environment monitoring system, and the air quality monitoring system. Compared with existing research on IoT platforms, the proposed SEMAR IoT application server platform offers higher flexibility and interoperability, with functions for IoT device management, data communication, decision making, synchronization, and filtering that can be easily integrated with external programs or IoT applications without changing their code. The results confirm the effectiveness and efficiency of the proposal.

    Design of Scalable Java Communication Middleware for Multi-Core Systems

    This is a post-peer-review, pre-copyedit version of an article published in The Computer Journal. The final authenticated version is available online at: https://doi.org/10.1093/comjnl/bxs122
    This paper presents smdev, a shared-memory communication middleware for multi-core systems. smdev provides a simple and powerful messaging application programming interface that exploits the underlying multi-core architecture by replacing inter-process and network-based communications with threads and shared-memory transfers. The performance evaluation of smdev on several multi-core systems has shown noticeable improvements compared with other Java shared-memory solutions, reaching and even surpassing the performance of natively compiled libraries. In particular, smdev achieved start-up latencies of around 0.76 μs and almost 90 Gbps of bandwidth for point-to-point communications, as well as high performance and scalability for both collective operations and representative messaging kernels. These results motivated the integration of smdev into F-MPJ, our message-passing implementation in Java.
    Funding: Ministerio de Ciencia e Innovación; TIN2010-1673
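    The core idea, exchanging messages between threads of one process through shared memory instead of between processes over sockets, can be sketched in Python as follows. This is a hypothetical illustration, not smdev's Java API: the class and method names are invented, each (source, destination) pair is assumed to get its own FIFO queue, and in CPython the global interpreter lock limits parallel speedup, so the sketch shows the mechanism rather than the performance.

```python
import queue
import threading
from typing import Any, Dict, List, Tuple


class SharedMemoryChannel:
    """Hypothetical point-to-point messaging between threads in one process.

    Each (source, destination) pair has its own FIFO queue, so a message is
    handed over as an in-memory object reference instead of being serialized
    and sent through a socket or another process.
    """

    def __init__(self, num_ranks: int) -> None:
        self._queues: Dict[Tuple[int, int], "queue.Queue[Any]"] = {
            (s, d): queue.Queue() for s in range(num_ranks) for d in range(num_ranks)
        }

    def send(self, src: int, dst: int, msg: Any) -> None:
        self._queues[(src, dst)].put(msg)

    def recv(self, src: int, dst: int) -> Any:
        # Blocks until a message from src to dst is available.
        return self._queues[(src, dst)].get()


def ping_pong(rounds: int) -> List[int]:
    """Rank 0 sends a counter to rank 1, which increments it and sends it back."""
    chan = SharedMemoryChannel(2)
    received: List[int] = []

    def rank1() -> None:
        for _ in range(rounds):
            msg = chan.recv(0, 1)
            chan.send(1, 0, msg + 1)

    worker = threading.Thread(target=rank1)
    worker.start()
    value = 0
    for _ in range(rounds):
        chan.send(0, 1, value)
        value = chan.recv(1, 0)
        received.append(value)
    worker.join()
    return received
```

    A ping-pong loop like `ping_pong` is the usual pattern for measuring start-up latency of point-to-point communication; timing it would give a Python analogue of the latency figures quoted above, though not comparable numbers.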

    Fast and Efficient Classification, Tracking, and Simulation in Wireless Sensor Networks

    Wireless sensor networks are composed of large numbers of resource-lean sensors that collect low-level inputs from the physical world. Their applications present challenges for programmers. On the one hand, lightweight algorithms are required given the limited capacity of the constituent devices. On the other, the algorithms must be scalable to accommodate large networks. In this thesis, we focus on the design and implementation of fast and lean (yet scalable) algorithms for classification, simulation, and target tracking in the context of wireless sensor networks. We briefly consider each of these challenges in turn. The first challenge is to achieve high-precision, in-network classification of high-level events using limited computational and energy resources. We present in-network implementations of a Bayesian classifier and a condensed kd-tree classifier for identifying events of interest on resource-lean embedded sensors. The first approach uses preprocessed sensor readings to derive a multi-dimensional Bayesian classifier that classifies sensor data in real time. The second introduces an innovative condensed kd-tree to represent preprocessed sensor data and uses a fast nearest-neighbor search to determine the likelihood of class membership for incoming samples. Both classifiers consume limited resources and provide high-precision classification. To evaluate each approach, two case studies are considered, in the contexts of human movement and vehicle navigation, respectively. The classification accuracy is above 85% for both classifiers across the two case studies. The second challenge is to achieve high-performance parallel simulation of sensor network hardware. This is achieved by reducing the synchronization overhead among distributed simulation processes. Traditional parallel simulation strategies introduce significant synchronization overhead, reducing the simulation speed.
We present an optimistic simulation algorithm with support for backtracking and re-execution. The algorithm reduces the number of synchronization cycles to the number of transmissions in the network under test. Concretely, we implement SnapSim, an extension to the popular Avrora simulator, based on this algorithm. The experimental results show that our prototype system improves the performance of Avrora by 2 to 10 times for typical network-centric sensor network applications, and by up to three orders of magnitude for applications that use the radio infrequently. The third challenge is to efficiently track a moving target in a network. The difficulty again lies in the conflict between the limited resource capacity of typical sensors and the significant processing requirements of typical tracking algorithms. We introduce an in-network object tracking framework for tracking mobile objects using resource-lean sensors. The framework is based on a distributed, dynamically scoped tracking algorithm that adaptively scopes the event detection region based on object speed. A leader node records the samples across an event region (without the aid of time synchronization) and estimates the object's location in situ. To minimize the number of radio transmissions, the location snapshotting rate is also adjusted based on the object speed. In this dissertation, focusing on the above challenges, we present the design, implementation, and evaluation of these classification, simulation, and tracking contributions.
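    The abstract does not describe how the kd-tree is condensed, so the sketch below uses a plain (uncondensed) kd-tree with 1-nearest-neighbor labeling to show the data structure and the pruned search it relies on. It assumes samples have already been preprocessed into fixed-length numeric feature vectors; the labels and data are hypothetical, and a likelihood estimate (as in the thesis) would need more than the single nearest neighbor returned here.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple

Point = Tuple[float, ...]


@dataclass
class Node:
    point: Point
    label: str
    axis: int
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def build(samples: Sequence[Tuple[Point, str]], depth: int = 0) -> Optional[Node]:
    """Build a kd-tree by splitting at the median along cycling axes."""
    if not samples:
        return None
    axis = depth % len(samples[0][0])
    ordered = sorted(samples, key=lambda s: s[0][axis])
    mid = len(ordered) // 2
    point, label = ordered[mid]
    return Node(point, label, axis,
                build(ordered[:mid], depth + 1),
                build(ordered[mid + 1:], depth + 1))


def sq_dist(a: Point, b: Point) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(node: Optional[Node], query: Point,
            best: Optional[Tuple[float, Node]] = None) -> Optional[Tuple[float, Node]]:
    """Return (squared distance, node) of the stored sample nearest to query."""
    if node is None:
        return best
    d = sq_dist(query, node.point)
    if best is None or d < best[0]:
        best = (d, node)
    diff = query[node.axis] - node.point[node.axis]
    near, far = (node.left, node.right) if diff < 0 else (node.right, node.left)
    best = nearest(near, query, best)
    # Visit the far subtree only if the splitting plane is closer than the best match.
    if diff * diff < best[0]:
        best = nearest(far, query, best)
    return best


def classify(tree: Node, query: Point) -> str:
    """Label an incoming sample with the class of its nearest stored sample."""
    result = nearest(tree, query)
    assert result is not None
    return result[1].label
```

    The pruning test is what keeps the search cheap on a constrained device: on well-distributed data most subtrees are skipped, so far fewer distances are computed than in a brute-force scan of every stored sample.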

    Implicit transactional memory in kilo-instruction multiprocessors

    Although multiprocessors have been the main server technology for many years, they are undergoing a renaissance due to multi-core chips and the attractive scalability properties of combining a number of such multi-core chips into a system. The widespread use of multiprocessor systems will make performance losses due to the consistency models and synchronization styles of popular programming models even more evident than they already are. Known architectural approaches to combat these losses are generally too complex, too specialized, or not transparent to software. In this article, we introduce implicit transactional memory as a generalized architectural concept to remove unnecessary performance losses caused by consistency models and synchronization styles. We show how the concept of implicit transactions can be implemented with low complexity by leveraging the multi-checkpoint mechanism of the Kilo-Instruction Processor. By relying on a general speculation substrate, this method supports even the strictest consistency model, sequential consistency, potentially as effectively as weaker models, and it allows multiple threads to speculatively execute critical sections and to proceed beyond barriers and event synchronizations.
    Postprint (published version)

    Spatial audio in small display screen devices

    Our work addresses the problem of (visual) clutter in mobile device interfaces. The solution we propose involves translating a technique for exploiting space in information representation from the graphical to the audio domain. This article presents an illustrative example in the form of a spatialised audio progress bar. In usability tests, participants performed background monitoring tasks significantly more accurately using this spatialised audio progress bar (as compared with a conventional visual one). Moreover, their performance in a simultaneously running, visually demanding foreground task was significantly improved in the eyes-free monitoring condition. These results have important implications for the design of multi-tasking interfaces for mobile devices.

    Profiling Methodology and Performance Tuning of the Met Office Unified Model for Weather and Climate Simulations

    Global weather and climate modelling is a compute-intensive task that is mission-critical to government departments concerned with meteorology and climate change. The dominant component of these models is a global atmosphere model. One such model, the Met Office Unified Model (MetUM), is widely used in both Europe and Australia for this purpose. This paper describes our experiences in developing an efficient profiling methodology and in analysing the scalability of MetUM version 7.5 at both low-scale and high-scale atmosphere grid resolutions. Variability within the execution of the MetUM and variability in the run-time of identical jobs on a highly shared cluster are taken into account. The methodology uses a lightweight profiler internal to the MetUM, which we enhanced to have minimal overhead and which enables accurate profiling with only relatively modest usage of processor time. At high-scale resolution, the MetUM scaled to 2048 cores, with load imbalance accounting for a significant fraction of the loss from ideal performance. Recent patches have removed two relatively small sources of inefficiency. Tuning internal segment size parameters gave a modest performance improvement at low-scale resolution (such as is used in climate simulation); this, however, was not significant at higher scales. Near-square process grid configurations tended to give the best performance. Byte-swapping optimizations vastly improved I/O performance, which in turn has a large impact on performance in operational runs.
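    The finding that near-square process grids performed best can be turned into a simple selection rule. The helper below is hypothetical and assumes a two-dimensional decomposition over P processes (the abstract does not give the MetUM's decomposition options); it picks the factor pair of P whose sides are closest. One plausible reason such grids help is that, for a fixed subdomain area, a square shape minimizes the perimeter and hence the halo data exchanged with neighbours.

```python
import math
from typing import Tuple


def near_square_grid(num_procs: int) -> Tuple[int, int]:
    """Return (rows, cols) with rows * cols == num_procs and rows <= cols,
    choosing the factor pair whose sides are as close as possible."""
    if num_procs < 1:
        raise ValueError("num_procs must be positive")
    # Start at floor(sqrt(P)) and walk down to the largest divisor not above it.
    rows = math.isqrt(num_procs)
    while num_procs % rows != 0:
        rows -= 1
    return rows, num_procs // rows
```

    For the 2048-core runs mentioned above this yields a 32 × 64 grid; for a prime core count it degenerates to a 1 × P strip, which is one reason to prefer core counts with balanced factors.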