641 research outputs found

    A practical approach to dynamic load balancing

    Full text link

    Weighted Bayesian Gaussian Mixture Model for Roadside LiDAR Object Detection

    Full text link
    Background modeling is widely used for intelligent surveillance systems to detect moving targets by subtracting the static background components. Most roadside LiDAR object detection methods filter out foreground points by comparing new data points to pre-trained background references based on descriptive statistics over many frames (e.g., voxel density, number of neighbors, maximum distance). However, these solutions are inefficient under heavy traffic, and parameter values are hard to transfer from one scenario to another. In early studies, the probabilistic background modeling methods widely used for the video-based system were considered unsuitable for roadside LiDAR surveillance systems due to the sparse and unstructured point cloud data. In this paper, the raw LiDAR data were transformed into a structured representation based on the elevation and azimuth value of each LiDAR point. With this high-order tensor representation, we break the barrier to allow efficient high-dimensional multivariate analysis for roadside LiDAR background modeling. The Bayesian Nonparametric (BNP) approach integrates the intensity value and 3D measurements to exploit the measurement data using 3D and intensity info entirely. The proposed method was compared against two state-of-the-art roadside LiDAR background models, computer vision benchmark, and deep learning baselines, evaluated at point, object, and path levels under heavy traffic and challenging weather. This multimodal Weighted Bayesian Gaussian Mixture Model (GMM) can handle dynamic backgrounds with noisy measurements and substantially enhances the infrastructure-based LiDAR object detection, whereby various 3D modeling for smart city applications could be created

    Accelerating MPI collective communications through hierarchical algorithms with flexible inter-node communication and imbalance awareness

    Get PDF
    This work presents and evaluates algorithms for MPI collective communication operations on high performance systems. Collective communication algorithms are extensively investigated, and a universal algorithm to improve the performance of MPI collective operations on hierarchical clusters is introduced. This algorithm exploits shared-memory buffers for efficient intra-node communication while still allowing the use of unmodified, hierarchy-unaware traditional collectives for inter-node communication. The universal algorithm shows impressive performance results with a variety of collectives, improving upon the MPICH algorithms as well as the Cray MPT algorithms. Speedups average 15x - 30x for most collectives with improved scalability up to 65536 cores.^ Further novel improvements are also proposed for inter-node communication. By utilizing algorithms which take advantage of multiple senders from the same shared memory buffer, an additional speedup of 2.5x can be achieved. The discussion also evaluates special-purpose extensions to improve intra-node communication. These extensions return a shared memory or copy-on-write protected buffer from the collective, which reduces or completely eliminates the second phase of intra-node communication.^ The second part of this work improves the performance of MPI collective communication operations in the presence of imbalanced processes arrival times. High performance collective communications are crucial for the performance and scalability of applications, and imbalanced process arrival times are common in these applications. A micro-benchmark is used to investigate the nature of process imbalance with perfectly balanced workloads, and understand the nature of inter- versus intra-node imbalance. These insights are then used to develop imbalance tolerant reduction, broadcast, and alltoall algorithms, which minimize the synchronization delay observed by early arriving processes. These algorithms have been implemented and tested on a Cray XE6 using up to 32k cores with varying buffer sizes and levels of imbalance. Results show speedups over MPICH averaging 18.9x for reduce, 5.3x for broadcast, and 6.9x for alltoall in the presence of high, but not unreasonable, imbalance

    Load-Balance and Fault-Tolerance for Massively Parallel Phylogenetic Inference

    Get PDF

    Artificial neural networks and their applications to intelligent fault diagnosis of power transmission lines

    Get PDF
    Over the past thirty years, the idea of computing based on models inspired by human brains and biological neural networks emerged. Artificial neural networks play an important role in the field of machine learning and hold the key to the success of performing many intelligent tasks by machines. They are used in various applications such as pattern recognition, data classification, stock market prediction, aerospace, weather forecasting, control systems, intelligent automation, robotics, and healthcare. Their architectures generally consist of an input layer, multiple hidden layers, and one output layer. They can be implemented on software or hardware. Nowadays, various structures with various names exist for artificial neural networks, each of which has its own particular applications. Those used types in this study include feedforward neural networks, convolutional neural networks, and general regression neural networks. Increasing the number of layers in artificial neural networks as needed for large datasets, implies increased computational expenses. Therefore, besides these basic structures in deep learning, some advanced techniques are proposed to overcome the drawbacks of original structures in deep learning such as transfer learning, federated learning, and reinforcement learning. Furthermore, implementing artificial neural networks in hardware gives scientists and engineers the chance to perform high-dimensional and big data-related tasks because it removes the constraints of memory access time defined as the von Neuman bottleneck. Accordingly, analog and digital circuits are used for artificial neural network implementations without using general-purpose CPUs. In this study, the problem of fault detection, identification, and location estimation of transmission lines is studied and various deep learning approaches are implemented and designed as solutions. This research work focuses on the transmission lines’ datasets, their faults, and the importance of identification, detection, and location estimation of them. It also includes a comprehensive review of the previous studies to perform these three tasks. The application of various artificial neural networks such as feedforward neural networks, convolutional neural networks, and general regression neural networks for identification, detection, and location estimation of transmission line datasets are also discussed in this study. Some advanced methods based on artificial neural networks are taken into account in this thesis such as the transfer learning technique. These methodologies are designed and applied on transmission line datasets to enable the scientist and engineers with using fewer data points for the training purpose and wasting less time on the training step. This work also proposes a transfer learning-based technique for distinguishing faulty and non-faulty insulators in transmission line images. Besides, an effective design for an activation function of the artificial neural networks is proposed in this thesis. Using hyperbolic tangent as an activation function in artificial neural networks has several benefits including inclusiveness and high accuracy

    Graph Processing on GPUs:A Survey

    Get PDF

    Circuit simulation using distributed waveform relaxation techniques

    Get PDF
    Simulation plays an important role in the design of integrated circuits. Due to high costs and large delays involved in their fabrication, simulation is commonly used to verify functionality and to predict performance before fabrication. This thesis describes analysis, implementation and performance evaluation of a distributed memory parallel waveform relaxation technique for the electrical circuit simulation of MOS VLSI circuits. The waveform relaxation technique exhibits inherent parallelism due to the partitioning of a circuit into a number of sub-circuits. These subcircuits can be concurrently simulated on parallel processors. Different forms of parallelism in the direct method and the waveform relaxation technique are studied. An analysis of single queue and distributed queue approaches to implement parallel waveform relaxation on distributed memory machines is performed and their performance implications are studied. The distributed queue approach selected for exploiting the coarse grain parallelism across sub-circuits is described. Parallel waveform relaxation programs based on Gauss-Seidel and Gauss-Jacobi techniques are implemented using a network of eight Transputers. Static and dynamic load balancing strategies are studied. A dynamic load balancing algorithm is developed and implemented. Results of parallel implementation are analyzed to identify sources of bottlenecks. This thesis has demonstrated the applicability of a low cost distributed memory multi-computer system for simulation of MOS VLSI circuits. Speed-up measurements prove that a five times improvement in the speed of calculations can be achieved using a full window parallel Gauss-Jacobi waveform relaxation algorithm. Analysis of overheads shows that load imbalance is the major source of overhead and that the fraction of the computation which must be performed sequentially is very low. Communication overhead depends on the nature of the parallel architecture and the design of communication mechanisms. The run-time environment (parallel processing framework) developed in this research exploits features of the Transputer architecture to reduce the effect of the communication overhead by effectively overlapping computation with communications, and running communications processes at a higher priority. This research will contribute to the development of low cost, high performance workstations for computer-aided design and analysis of VLSI circuits

    Data Challenges and Data Analytics Solutions for Power Systems

    Get PDF
    L'abstract è presente nell'allegato / the abstract is in the attachmen

    Filtering geometric imperfection patterns for analysis and design of composite shell structures

    Get PDF
    [no abstract
    • …
    corecore