95 research outputs found

    A multi-port 10GbE PCIe NIC featuring UDP offload and GPUDirect capabilities

    Get PDF
    NaNet-10 is a four-ports 10GbE PCIe Network Interface Card designed for low-latency real-time operations with GPU systems. To this purpose the design includes an UDP offload module, for fast and clock-cycle deterministic handling of the transport layer protocol, plus a GPUDirect P2P/RDMA engine for low-latency communication with NVIDIA Tesla GPU devices. A dedicated module (Multi-Stream) can optionally process input UDP streams before data is delivered through PCIe DMA to their destination devices, re-organizing data from different streams guaranteeing computational optimization. NaNet-10 is going to be integrated in the NA62 CERN experiment in order to assess the suitability of GPGPU systems as real-time triggers; results and lessons learned while performing this activity will be reported herein

    Parallel Real-Time Scheduling for Latency-Critical Applications

    Get PDF
    In order to provide safety guarantees or quality of service guarantees, many of today\u27s systems consist of latency-critical applications, e.g. applications with timing constraints. The problem of scheduling multiple latency-critical jobs on a multiprocessor or multicore machine has been extensively studied for sequential (non-parallizable) jobs and different system models and different objectives have been considered. However, the computational requirement of a single job is still limited by the capacity of a single core. To provide increasingly complex functionalities of applications and to complete their higher computational demands within the same or even more stringent timing constraints, we must exploit the internal parallelism of jobs, where individual jobs are parallel programs and can potentially utilize more than one core in parallel. However, there is little work considering scheduling multiple parallel jobs that are latency-critical. This dissertation focuses on developing new scheduling strategies, analysis tools, and practical platform design techniques to enable efficient and scalable parallel real-time scheduling for latency-critical applications on multicore systems. In particular, the research is focused on two types of systems: (1) static real-time systems for tasks with deadlines where the temporal properties of the tasks that need to execute is known a priori and the goal is to guarantee the temporal correctness of the tasks prior to their executions; and (2) online systems for latency-critical jobs where multiple jobs arrive over time and the goal to optimize for a performance objective of jobs during the execution. For static real-time systems for parallel tasks, several scheduling strategies, including global earliest deadline first, global rate monotonic and a novel federated scheduling, are proposed, analyzed and implemented. These scheduling strategies have the best known theoretical performance for parallel real-time tasks under any global strategy, any fixed priority scheduling and any scheduling strategy, respectively. In addition, federated scheduling is generalized to systems with multiple criticality levels and systems with stochastic tasks. Both numerical and empirical experiments show that federated scheduling and its variations have good schedulability performance and are efficient in practice. For online systems with multiple latency-critical jobs, different online scheduling strategies are proposed and analyzed for different objectives, including maximizing the number of jobs meeting a target latency, maximizing the profit of jobs, minimizing the maximum latency and minimizing the average latency. For example, a simple First-In-First-Out scheduler is proven to be scalable for minimizing the maximum latency. Based on this theoretical intuition, a more practical work-stealing scheduler is developed, analyzed and implemented. Empirical evaluations indicate that, on both real world and synthetic workloads, this work-stealing implementation performs almost as well as an optimal scheduler

    Automated anomaly recognition in real time data streams for oil and gas industry.

    Get PDF
    There is a growing demand for computer-assisted real-time anomaly detection - from the identification of suspicious activities in cyber security, to the monitoring of engineering data for various applications across the oil and gas, automotive and other engineering industries. To reduce the reliance on field experts' knowledge for identification of these anomalies, this thesis proposes a deep-learning anomaly-detection framework that can help to create an effective real-time condition-monitoring framework. The aim of this research is to develop a real-time and re-trainable generic anomaly-detection framework, which is capable of predicting and identifying anomalies with a high level of accuracy - even when a specific anomalous event has no precedent. Machine-based condition monitoring is preferable in many practical situations where fast data analysis is required, and where there are harsh climates or otherwise life-threatening environments. For example, automated conditional monitoring systems are ideal in deep sea exploration studies, offshore installations and space exploration. This thesis firstly reviews studies about anomaly detection using machine learning. It then adopts the best practices from those studies in order to propose a multi-tiered framework for anomaly detection with heterogeneous input sources, which can deal with unseen anomalies in a real-time dynamic problem environment. The thesis then applies the developed generic multi-tiered framework to two fields of engineering: data analysis and malicious cyber attack detection. Finally, the framework is further refined based on the outcomes of those case studies and is used to develop a secure cross-platform API, capable of re-training and data classification on a real-time data feed

    Time-bounded distributed QoS-aware service configuration in heterogeneous cooperative environments

    Get PDF
    The scarcity and diversity of resources among the devices of heterogeneous computing environments may affect their ability to perform services with specific Quality of Service constraints, particularly in dynamic distributed environments where the characteristics of the computational load cannot always be predicted in advance. Our work addresses this problem by allowing resource constrained devices to cooperate with more powerful neighbour nodes, opportunistically taking advantage of global distributed resources and processing power. Rather than assuming that the dynamic configuration of this cooperative service executes until it computes its optimal output, the paper proposes an anytime approach that has the ability to tradeoff deliberation time for the quality of the solution. Extensive simulations demonstrate that the proposed anytime algorithms are able to quickly find a good initial solution and effectively optimise the rate at which the quality of the current solution improves at each iteration, with an overhead that can be considered negligible

    Coordinating in dialogue: Using compound contributions to join a party

    Get PDF
    PhDCompound contributions (CCs) – dialogue contributions that continue or complete an earlier contribution – are an important and common device conversational participants use to extend their own and each other’s turns. The organisation of these cross-turn structures is one of the defining characteristics of natural dialogue, and cross-person CCs provide the paradigm case of coordination in dialogue. This thesis combines corpus analysis, experiments and theoretical modelling to explore how CCs are used, their effects on coordination and implications for dialogue models. The syntactic and pragmatic distribution of CCs is mapped using corpora of ordinary and task-oriented dialogues. This indicates that the principal factors conditioning the distribution of CCs are pragmatic and that same- and cross-person CCs tend to occur in different contexts. In order to test the impact of CCs on other conversational participants, two experiments are presented. These systematically manipulate, for the first time, the occurrence of CCs in live dialogue using text-based communication. The results suggest that syntax does not directly constrain the interpretation of CCs, and the primary effect of a cross-person CC on third parties is to suggest to them a strong form of coordination or coalition has formed between the people producing the two parts of the CC. A third experiment explores the conditions under which people will produce a completion for a truncated turn. Manipulations of the structural and contextual predictability of the truncated turn show that while syntax provides a resource for the construction of a CC it does not place significant constraints on where the split point may occur. It also shows that people are more likely to produce continuations when they share common ground. An analysis using the Dynamic Syntax framework is proposed, which extends previous work to account for these findings, and limitations and further research possibilities are outlined

    Optimizing Collective Communication for Scalable Scientific Computing and Deep Learning

    Get PDF
    In the realm of distributed computing, collective operations involve coordinated communication and synchronization among multiple processing units, enabling efficient data exchange and collaboration. Scientific applications, such as simulations, computational fluid dynamics, and scalable deep learning, require complex computations that can be parallelized across multiple nodes in a distributed system. These applications often involve data-dependent communication patterns, where collective operations are critical for achieving high performance in data exchange. Optimizing collective operations for scientific applications and deep learning involves improving the algorithms, communication patterns, and data distribution strategies to minimize communication overhead and maximize computational efficiency. Within the context of this dissertation, the specific focus is on optimizing the alltoall operation in 3D Fast Fourier Transform (FFT) applications and the allreduce operation in parallel deep learning, particularly on High-Performance Computing (HPC) systems. Advanced communication algorithms and methods are explored and implemented to improve communication efficiency, consequently enhancing the overall performance of 3D FFT applications. Furthermore, this dissertation investigates the identification of performance bottlenecks during collective communication over Horovod on distributed systems. These bottlenecks are addressed by proposing an optimized parallel communication pattern specifically tailored to alleviate the aforementioned limitations during the training phase in distributed deep learning. The objective is to achieve faster convergence and improve the overall training efficiency. Moreover, this dissertation proposes fault tolerance and elastic scaling features for distributed deep learning by leveraging the User-Level Failure Mitigation (ULFM) from Message Passing Interface (MPI). By incorporating ULFM MPI, the dissertation aims to enhance the elastic capabilities of distributed deep learning systems. This approach enables graceful and lightweight handling of failures while facilitating seamless scaling in dynamic computing environments

    Knowledge-Intensive Processes: Characteristics, Requirements and Analysis of Contemporary Approaches

    Get PDF
    Engineering of knowledge-intensive processes (KiPs) is far from being mastered, since they are genuinely knowledge- and data-centric, and require substantial flexibility, at both design- and run-time. In this work, starting from a scientific literature analysis in the area of KiPs and from three real-world domains and application scenarios, we provide a precise characterization of KiPs. Furthermore, we devise some general requirements related to KiPs management and execution. Such requirements contribute to the definition of an evaluation framework to assess current system support for KiPs. To this end, we present a critical analysis on a number of existing process-oriented approaches by discussing their efficacy against the requirements

    Direct occlusion handling for high level image processing algorithms

    Get PDF
    Many high-level computer vision algorithms suffer in the presence of occlusions caused by multiple objects overlapping in a view. Occlusions remove the direct correspondence between visible areas of objects and the objects themselves by introducing ambiguity in the interpretation of the shape of the occluded object. Ignoring this ambiguity allows the perceived geometry of overlapping objects to be deformed or even fractured. Supplementing the raw image data with a vectorized structural representation which predicts object completions could stabilize high-level algorithms which currently disregard occlusions. Studies in the neuroscience community indicate that the feature points located at the intersection of junctions may be used by the human visual system to produce these completions. Geiger, Pao, and Rubin have successfully used these features in a purely rasterized setting to complete objects in a fashion similar to what is demonstrated by human perception. This work proposes using these features in a vectorized approach to solving the mid-level computer vision problem of object stitching. A system has been implemented which is able extract L and T-junctions directly from the edges of an image using scale-space and robust statistical techniques. The system is sensitive enough to be able to isolate the corners on polygons with 24 sides or more, provided sufficient image resolution is available. Areas of promising development have been identified and several directions for further research are proposed

    Euclidean distance geometry and applications

    Full text link
    Euclidean distance geometry is the study of Euclidean geometry based on the concept of distance. This is useful in several applications where the input data consists of an incomplete set of distances, and the output is a set of points in Euclidean space that realizes the given distances. We survey some of the theory of Euclidean distance geometry and some of the most important applications: molecular conformation, localization of sensor networks and statics.Comment: 64 pages, 21 figure
    corecore