187 research outputs found

    Anatomy of High-Performance GEMM with Online Fault Tolerance on GPUs

    General Matrix Multiplication (GEMM) is a crucial algorithm for applications such as machine learning and scientific computing, and an efficient GEMM implementation is essential for the performance of these systems. While researchers often pursue faster performance by using large compute platforms, the increased scale of these systems raises concerns about hardware and software reliability. In this paper, we present a design for a high-performance GEMM with algorithm-based fault tolerance for use on GPUs. We describe fault-tolerant designs for GEMM at the thread, warp, and threadblock levels, and also provide a baseline GEMM implementation that is competitive with or faster than the state-of-the-art, proprietary cuBLAS GEMM. We present a kernel fusion strategy that overlaps the memory latency introduced by fault tolerance with the original GEMM computation, thereby mitigating it. To support a wide range of input matrix shapes and reduce development costs, we present a template-based approach for automatic code generation for both fault-tolerant and non-fault-tolerant GEMM implementations. We evaluate our work on NVIDIA Tesla T4 and A100 server GPUs. Experimental results demonstrate that our baseline GEMM delivers comparable or superior performance compared to the closed-source cuBLAS. The fault-tolerant GEMM incurs only a minimal overhead (8.89% on average) compared to cuBLAS even with hundreds of errors injected per minute. For irregularly shaped inputs, the kernels produced by the code generator show speedups of 160% to 183.5% and 148.55% to 165.12% for fault-tolerant and non-fault-tolerant GEMMs, respectively, outperforming cuBLAS by up to 41.40%. Comment: 11 pages, 2023 International Conference on Supercomputing
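
    The abstract does not spell out the fault-tolerance scheme, so the following is only a minimal sketch of the classic checksum idea behind algorithm-based fault tolerance, written in plain Python rather than as a GPU kernel. It assumes at most one corrupted element in the result; the function names (matmul, abft_check, abft_correct) and the tolerance parameter are illustrative and do not come from the paper.

```python
# Checksum-based ABFT for C = A @ B (illustrative sketch, not the paper's GPU kernels).

def matmul(a, b):
    """Plain matrix product of two lists of lists."""
    n, k, m = len(a), len(b), len(b[0])
    return [[sum(a[i][p] * b[p][j] for p in range(k)) for j in range(m)]
            for i in range(n)]


def abft_check(a, b, c, tol=1e-9):
    """Locate a single corrupted element of c using row and column checksums.

    Returns None if c is consistent with a @ b, otherwise (row, col, delta),
    where delta is the amount by which c[row][col] exceeds the correct value.
    Assumption: at most one element of c is wrong.
    """
    n, k, m = len(a), len(b), len(b[0])
    # Expected column sums of C, computed as (column sums of A) times B.
    col_sum_a = [sum(a[i][p] for i in range(n)) for p in range(k)]
    expected_col = [sum(col_sum_a[p] * b[p][j] for p in range(k)) for j in range(m)]
    # Expected row sums of C, computed as A times (row sums of B).
    row_sum_b = [sum(b[p][j] for j in range(m)) for p in range(k)]
    expected_row = [sum(a[i][p] * row_sum_b[p] for p in range(k)) for i in range(n)]

    bad_rows = [i for i in range(n) if abs(sum(c[i]) - expected_row[i]) > tol]
    bad_cols = [j for j in range(m)
                if abs(sum(c[i][j] for i in range(n)) - expected_col[j]) > tol]
    if not bad_rows and not bad_cols:
        return None
    if len(bad_rows) != 1 or len(bad_cols) != 1:
        raise ValueError("checksums are inconsistent with a single-element error")
    i, j = bad_rows[0], bad_cols[0]
    return i, j, sum(c[i]) - expected_row[i]


def abft_correct(a, b, c):
    """Repair a single corrupted element of c in place; return True if a fix was applied."""
    fault = abft_check(a, b, c)
    if fault is None:
        return False
    i, j, delta = fault
    c[i][j] -= delta
    return True


if __name__ == "__main__":
    a = [[1, 2], [3, 4]]
    b = [[5, 6], [7, 8]]
    c = matmul(a, b)
    c[1][0] += 5                 # inject a fault
    print(abft_check(a, b, c))   # (1, 0, 5)
    abft_correct(a, b, c)
    print(c == matmul(a, b))     # True
```

    The checksums cost two matrix-vector products on top of the full product, which is why this style of checking is cheap relative to recomputing C.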

    Design of Hypervelocity-Impact Damage Evaluation Technique Based on Bayesian Classifier of Transient Temperature Attributes

    With the rapid increase of space debris in Earth orbit, hypervelocity impacts (HVI) of debris can cause serious damage to spacecraft, affecting their operational security and reliability. Damage detection for spacecraft has therefore become an urgent problem. In this paper, a method is proposed to detect spacecraft damage. Firstly, a variable-interval method is proposed to extract the effective information from the infrared image sequence. Secondly, in order to mine the physical meaning of the thermal image sequence, five attributes are used to construct a feature space. After that, a Naive Bayesian classifier is established to mine the information of different damaged areas. Then, a maximum interclass distance function is used to choose the representative of each class. Finally, in order to visualize damaged areas, the Canny operator is used to extract the edge of the damage. In the experiment, ground tests are used to simulate hypervelocity impacts in space. Historical data of naturally damaged material and artificially damaged material are used to build different classifiers. After that, the effectiveness of the classifiers is illustrated by accuracy, F-score, and AUC. Then, two different types of materials are examined with the proposed method, Independent Component Analysis (ICA), and Fuzzy C-means (FCM). The results show that the proposed method is more accurate than the other methods.

    Information Dissemination Model Based on User Attitude and Public Opinion Environment

    Modeling the information dissemination process in social networks is a challenging problem. Despite numerous attempts to address this issue, existing studies often assume that a user's attitude has only one opportunity to change during the information dissemination process. Additionally, these studies tend to treat the transformation of user attitudes as influenced by a single user only, overlooking the dynamic and evolving nature of user attitudes and the impact of the public opinion environment. In this paper, we propose a novel model, UAPE, which considers the influence of these factors on the information dissemination process. Specifically, UAPE regards the user's attitude towards the topic as dynamically changing, with the change jointly affected by multiple users simultaneously. Furthermore, the joint influence of multiple users can be considered as the impact of the public opinion environment. Extensive experimental results demonstrate that the model achieves an accuracy between 91.62% and 94.01%, surpassing the performance of existing research.

    Knowledge-aware Alert Aggregation in Large-scale Cloud Systems: a Hybrid Approach

    Due to the scale and complexity of cloud systems, a system failure can trigger an "alert storm", i.e., a massive number of correlated alerts. Although these alerts can be traced back to a few root causes, their overwhelming number makes manual handling infeasible. Alert aggregation is thus critical to help engineers concentrate on the root cause and facilitate failure resolution. Existing methods typically rely on semantic similarity or statistics to aggregate alerts. However, semantic similarity-based methods overlook the causal rationale of alerts, while statistical methods can hardly handle infrequent alerts. To tackle these limitations, we propose leveraging external knowledge, i.e., the Standard Operation Procedure (SOP) of alerts, as a supplement. We propose COLA, a novel hybrid approach based on correlation mining and LLM (Large Language Model) reasoning for online alert aggregation. The correlation mining module effectively captures the temporal and spatial relations between alerts, measuring their correlations in an efficient manner. Subsequently, only uncertain pairs with low confidence are forwarded to the LLM reasoning module for detailed analysis. This hybrid design harnesses both statistical evidence for frequent alerts and the reasoning capabilities of computationally intensive LLMs, ensuring the overall efficiency of COLA in handling large volumes of alerts in practical scenarios. We evaluate COLA on three datasets collected from the production environment of a large-scale cloud platform. The experimental results show COLA achieves F1-scores from 0.901 to 0.930, outperforming state-of-the-art methods and achieving comparable efficiency. We also share our experience in deploying COLA in our real-world cloud system, Cloud X. Comment: Accepted by Proceedings of the 46th International Conference on Software Engineering: Software Engineering in Practice (ICSE SEIP 2024)
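
    The hybrid routing described in this abstract can be sketched roughly as follows. This is only an illustration of the general pattern (cheap statistical score first, expensive reasoning only for the uncertain middle band), not COLA's implementation: the function name, the two thresholds, and the llm_judge callable are all hypothetical stand-ins, and the abstract does not say how confidence is actually computed.

```python
def aggregate_alert_pairs(scored_pairs, llm_judge, low=0.3, high=0.7):
    """Decide which alert pairs to merge.

    scored_pairs: iterable of ((alert_a, alert_b), score), where score in [0, 1]
                  is a correlation score from some statistical module (assumed).
    llm_judge:    callable taking a pair and returning True if the two alerts
                  should be merged; a stand-in for an LLM reasoning step.
    low, high:    hypothetical thresholds bounding the "uncertain" band.

    Returns (merged_pairs, llm_calls).
    """
    merged = []
    llm_calls = 0
    for pair, score in scored_pairs:
        if score >= high:
            # Strong statistical evidence: merge without further analysis.
            merged.append(pair)
        elif score <= low:
            # Strong evidence against correlation: keep the alerts separate.
            continue
        else:
            # Uncertain pair: pay for the expensive reasoning step.
            llm_calls += 1
            if llm_judge(pair):
                merged.append(pair)
    return merged, llm_calls
```

    With most pairs falling outside the uncertain band, the expensive judge is invoked only for a small fraction of inputs, which is the efficiency argument the abstract makes.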

    High sensitivity multi-axes rotation sensing using large momentum transfer point source atom interferometry

    A point source interferometer (PSI) is a device where atoms are split and recombined by applying a temporal sequence of Raman pulses during the expansion of a cloud of cold atoms behaving approximately as a point source. The PSI can work as a sensitive multi-axes gyroscope that can automatically filter out the signal from accelerations. The phase shift arising from rotations is proportional to the momentum transferred to each atom from the Raman pulses. Therefore, by increasing the momentum transfer, it should be possible to enhance the sensitivity of the PSI. Here, we investigate the degree of enhancement in sensitivity that could be achieved by augmenting the PSI with large momentum transfer (LMT) employing a sequence of many Raman pulses with alternating directions. Contrary to typical approaches used for describing a PSI, we employ a model under which the motion of the center of mass of each atom is described quantum mechanically. We show how increasing Doppler shifts lead to imperfections, thereby limiting the visibility of the signal fringes, and identify ways to suppress this effect by increasing the effective, two-photon Rabi frequencies of the Raman pulses. Taking into account the effect of spontaneous emission, we show that, for a given value of the one-photon Rabi frequency, there is an optimum value for the number of pulses employed, beyond which the net enhancement in sensitivity begins to decrease. For a one-photon Rabi frequency of 200 MHz, for example, the peak value of the factor of enhancement in sensitivity is ~39, for a momentum transfer that is ~69 times as large as that for a conventional PSI. We also find that this peak value scales as the one-photon Rabi frequency to the power of 4/5.

    Prism: Revealing Hidden Functional Clusters from Massive Instances in Cloud Systems

    Ensuring the reliability of cloud systems is critical for both cloud vendors and customers. Cloud systems often rely on virtualization techniques to create instances of hardware resources, such as virtual machines. However, virtualization hinders the observability of cloud systems, making it challenging to diagnose platform-level issues. To improve system observability, we propose to infer functional clusters of instances, i.e., groups of instances having similar functionalities. We first conduct a pilot study on a large-scale cloud system, i.e., Huawei Cloud, demonstrating that instances having similar functionalities share similar communication and resource usage patterns. Motivated by these findings, we formulate the identification of functional clusters as a clustering problem and propose a non-intrusive solution called Prism. Prism adopts a coarse-to-fine clustering strategy. It first partitions instances into coarse-grained chunks based on communication patterns. Within each chunk, Prism further groups instances with similar resource usage patterns to produce fine-grained functional clusters. Such a design reduces noise in the data and allows Prism to process massive instances efficiently. We evaluate Prism on two datasets collected from the real-world production environment of Huawei Cloud. Our experiments show that Prism achieves a v-measure of ~0.95, surpassing existing state-of-the-art solutions. Additionally, we illustrate the integration of Prism within monitoring systems for enhanced cloud reliability through two real-world use cases. Comment: The paper was accepted by the 38th IEEE/ACM International Conference on Automated Software Engineering (ASE 2023)

    Go Static: Contextualized Logging Statement Generation

    Logging practices have been extensively investigated to assist developers in writing appropriate logging statements for documenting software behaviors. Although numerous automatic logging approaches have been proposed, their performance remains unsatisfactory because they are constrained to single-method input, without informative programming context outside the method. Specifically, we identify three inherent limitations with single-method context: limited static scope of logging statements, inconsistent logging styles, and missing type information of logging variables. To tackle these limitations, we propose SCLogger, the first contextualized logging statement generation approach with inter-method static contexts. First, SCLogger extracts inter-method contexts with static analysis to construct the contextualized prompt for language models to generate a tentative logging statement. The contextualized prompt consists of an extended static scope and sampled similar methods, ordered by the chain-of-thought (COT) strategy. Second, SCLogger refines the access of logging variables by formulating a new refinement prompt for language models, which incorporates detailed type information of variables in the tentative logging statement. The evaluation results show that SCLogger surpasses the state-of-the-art approach by 8.7% in logging position accuracy, 32.1% in level accuracy, 19.6% in variable precision, and 138.4% in text BLEU-4 score. Furthermore, SCLogger consistently boosts the performance of logging statement generation across a range of large language models, thereby showcasing the generalizability of this approach. Comment: This paper was accepted by The ACM International Conference on the Foundations of Software Engineering (FSE 2024)