Anatomy of High-Performance GEMM with Online Fault Tolerance on GPUs
General Matrix Multiplication (GEMM) is a crucial algorithm for various
applications such as machine learning and scientific computing, and an
efficient GEMM implementation is essential for the performance of these
systems. While researchers often strive for faster performance by using large
compute platforms, the increased scale of these systems can raise concerns
about hardware and software reliability. In this paper, we present a design for
a high-performance GEMM with algorithm-based fault tolerance for use on GPUs.
We describe fault-tolerant designs for GEMM at the thread, warp, and
threadblock levels, and also provide a baseline GEMM implementation that is
competitive with or faster than the state-of-the-art, proprietary cuBLAS GEMM.
We present a kernel fusion strategy to overlap and mitigate the memory latency
due to fault tolerance with the original GEMM computation. To support a wide
range of input matrix shapes and reduce development costs, we present a
template-based approach for automatic code generation for both fault-tolerant
and non-fault-tolerant GEMM implementations. We evaluate our work on NVIDIA
Tesla T4 and A100 server GPUs. Experimental results demonstrate that our
baseline GEMM presents comparable or superior performance compared to the
closed-source cuBLAS. The fault-tolerant GEMM incurs only a minimal overhead
(8.89% on average) compared to cuBLAS even with hundreds of errors injected
per minute. For irregularly shaped inputs, the kernels produced by the code
generator show remarkable speedups for both fault-tolerant and
non-fault-tolerant GEMMs, outperforming cuBLAS.
Comment: 11 pages, 2023 International Conference on Supercomputing
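The abstract above names algorithm-based fault tolerance (ABFT) without saying which encoding the paper uses. As a minimal sketch, assuming the classic row and column checksum scheme and plain Python lists instead of GPU kernels, a single corrupted entry of C = A·B can be located and repaired as follows. The function names `matmul` and `abft_check_and_correct` are hypothetical and not taken from the paper.

```python
def matmul(a, b):
    """Plain triple-loop product of two list-of-lists matrices."""
    n, k, m = len(a), len(b), len(b[0])
    return [[sum(a[i][p] * b[p][j] for p in range(k)) for j in range(m)]
            for i in range(n)]


def abft_check_and_correct(a, b, c, tol=1e-9):
    """Verify c against checksums of a and b; repair at most one bad entry.

    Returns the (row, col) that was corrected, or None if c passed the check.
    Assumption: the classic checksum scheme, not the paper's fused kernels.
    """
    k = len(b)
    # Expected column sums of C, computed as (1^T A) B without forming C.
    col_sum_a = [sum(a[i][p] for i in range(len(a))) for p in range(k)]
    expected_col = [sum(col_sum_a[p] * b[p][j] for p in range(k))
                    for j in range(len(b[0]))]
    # Expected row sums of C, computed as A (B 1).
    row_sum_b = [sum(b[p]) for p in range(k)]
    expected_row = [sum(a[i][p] * row_sum_b[p] for p in range(k))
                    for i in range(len(a))]

    bad_rows = [i for i, row in enumerate(c)
                if abs(sum(row) - expected_row[i]) > tol]
    bad_cols = [j for j in range(len(c[0]))
                if abs(sum(row[j] for row in c) - expected_col[j]) > tol]

    if not bad_rows and not bad_cols:
        return None
    if len(bad_rows) == 1 and len(bad_cols) == 1:
        i, j = bad_rows[0], bad_cols[0]
        # The row-sum mismatch equals the size of the injected error.
        c[i][j] -= sum(c[i]) - expected_row[i]
        return (i, j)
    raise ValueError("more than one corrupted entry; recompute instead")
```

The checksums cost two matrix-vector products, which is small next to the full product. That asymmetry is why such checks are commonly applied to GEMM.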
Design of Hypervelocity-Impact Damage Evaluation Technique Based on Bayesian Classifier of Transient Temperature Attributes
With the rapid increase of space debris in Earth orbit, hypervelocity impacts (HVI) from debris can seriously damage spacecraft and compromise their operational safety and reliability. Damage detection for spacecraft has therefore become an urgent problem. In this paper, a method is proposed to detect spacecraft damage. First, a variable-interval method is proposed to extract the effective information from the infrared image sequence. Second, to mine the physical meaning of the thermal image sequence, five attributes are used to construct a feature space. A Naive Bayesian classifier is then established to mine the information of the different damaged areas, and a maximum interclass distance function is used to choose the representative of each class. Finally, to visualize the damaged areas, the Canny operator is used to extract the edges of the damage. In the experiments, ground tests are used to simulate hypervelocity impacts in space. Historical data from naturally damaged and artificially damaged material are used to build different classifiers, whose effectiveness is illustrated by accuracy, F-score, and AUC. Two different types of material are then inspected with the proposed method, Independent Component Analysis (ICA), and Fuzzy C-means (FCM). The results show that the proposed method is more accurate than the other methods.
Information Dissemination Model Based on User Attitude and Public Opinion Environment
Modeling the information dissemination process in social networks is a
challenging problem. Despite numerous attempts to address this issue, existing
studies often assume that user attitudes have only one opportunity to alter
during the information dissemination process. Additionally, these studies tend
to consider the transformation of user attitudes as solely influenced by a
single user, overlooking the dynamic and evolving nature of user attitudes and
the impact of the public opinion environment. In this paper, we propose a novel
model, UAPE, which considers the influence of the aforementioned factors on the
information dissemination process. Specifically, UAPE regards the user's
attitude towards the topic as dynamically changing, with the change jointly
affected by multiple users simultaneously. Furthermore, the joint influence of
multiple users can be considered as the impact of the public opinion
environment. Extensive experimental results demonstrate that the model achieves
an accuracy range of 91.62% to 94.01%, surpassing the performance of existing
research.
Knowledge-aware Alert Aggregation in Large-scale Cloud Systems: a Hybrid Approach
Due to the scale and complexity of cloud systems, a system failure would
trigger an "alert storm", i.e., massive correlated alerts. Although these
alerts can be traced back to a few root causes, the overwhelming number makes
it infeasible for manual handling. Alert aggregation is thus critical to help
engineers concentrate on the root cause and facilitate failure resolution.
Existing methods typically utilize semantic similarity-based methods or
statistical methods to aggregate alerts. However, semantic similarity-based
methods overlook the causal rationale of alerts, while statistical methods can
hardly handle infrequent alerts.
To tackle these limitations, we leverage external knowledge, i.e., the
Standard Operation Procedure (SOP) of alerts, as a supplement. We propose
COLA, a novel hybrid approach based on correlation mining and LLM (Large
Language Model) reasoning for online alert aggregation. The correlation mining
module effectively captures the temporal and spatial relations between alerts,
measuring their correlations in an efficient manner. Subsequently, only
uncertain pairs with low confidence are forwarded to the LLM reasoning module
for detailed analysis. This hybrid design harnesses both statistical evidence
for frequent alerts and the reasoning capabilities of computationally intensive
LLMs, ensuring the overall efficiency of COLA in handling large volumes of
alerts in practical scenarios. We evaluate COLA on three datasets collected
from the production environment of a large-scale cloud platform. The
experimental results show COLA achieves F1-scores from 0.901 to 0.930,
outperforming state-of-the-art methods and achieving comparable efficiency. We
also share our experience in deploying COLA in our real-world cloud system,
Cloud X.
Comment: Accepted by Proceedings of the 46th International Conference on
Software Engineering: Software Engineering in Practice (ICSE SEIP 2024)
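The hybrid routing described in the abstract above (cheap statistics first, expensive reasoning only for uncertain pairs) can be illustrated with a small sketch. Everything concrete here is an assumption made for illustration: the co-occurrence score, the `low` and `high` thresholds, the 60-second window, and the `reason_fn` callback standing in for the LLM reasoning module. None of it is COLA's actual implementation.

```python
from itertools import combinations


def cooccurrence_score(times_a, times_b, window=60.0):
    """Fraction of A's firings that have a B firing within `window` seconds."""
    if not times_a:
        return 0.0
    hits = sum(any(abs(ta - tb) <= window for tb in times_b) for ta in times_a)
    return hits / len(times_a)


def aggregate_pairs(alert_times, reason_fn, low=0.3, high=0.8, window=60.0):
    """Decide, for every alert pair, whether the two alerts are correlated.

    alert_times maps an alert name to its firing timestamps (seconds).
    reason_fn(a, b) -> bool is a hypothetical stand-in for the costly
    reasoning step and is only called for pairs in the uncertain band.
    """
    decisions = {}
    for a, b in combinations(sorted(alert_times), 2):
        # Use the weaker direction so a frequent alert cannot dominate.
        score = min(
            cooccurrence_score(alert_times[a], alert_times[b], window),
            cooccurrence_score(alert_times[b], alert_times[a], window),
        )
        if score >= high:
            decisions[(a, b)] = ("correlated", "statistics")
        elif score <= low:
            decisions[(a, b)] = ("unrelated", "statistics")
        else:
            verdict = "correlated" if reason_fn(a, b) else "unrelated"
            decisions[(a, b)] = (verdict, "reasoning")
    return decisions
```

Keeping the reasoning step behind a callback means the number of expensive calls is bounded by the width of the uncertain band, which is the efficiency argument the abstract makes.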
High sensitivity multi-axes rotation sensing using large momentum transfer point source atom interferometry
A point source interferometer (PSI) is a device where atoms are split and
recombined by applying a temporal sequence of Raman pulses during the expansion
of a cloud of cold atoms behaving approximately as a point source. The PSI can
work as a sensitive multi-axes gyroscope that can automatically filter out the
signal from accelerations. The phase shift arising from rotations is
proportional to the momentum transferred to each atom from the Raman pulses.
Therefore, by increasing the momentum transfer, it should be possible to
enhance the sensitivity of the PSI. Here, we investigate the degree of
enhancement in sensitivity that could be achieved by augmenting the PSI with
large momentum transfer (LMT) employing a sequence of many Raman pulses with
alternating directions. Contrary to typical approaches used for describing a
PSI, we employ a model under which the motion of the center of mass of each
atom is described quantum mechanically. We show how increasing Doppler shifts
lead to imperfections, thereby limiting the visibility of the signal fringes,
and identify ways to suppress this effect by increasing the effective,
two-photon Rabi frequencies of the Raman pulses. Taking into account the effect
of spontaneous emission, we show that, for a given value of the one-photon Rabi
frequency, there is an optimum value for the number of pulses employed, beyond
which the net enhancement in sensitivity begins to decrease. For a one-photon
Rabi frequency of 200 MHz, for example, the peak value of the factor of
enhancement in sensitivity is ~39, for a momentum transfer that is ~69 times as
large as that for a conventional PSI. We also find that this peak value scales
as the one-photon Rabi frequency to the power of 4/5.
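The scaling stated above can be written out as a short worked relation. It only restates the abstract's numbers. The symbols $E_{\text{peak}}$ (peak sensitivity enhancement) and $\Omega$ (one-photon Rabi frequency) are introduced here for illustration and are not the paper's notation.

```latex
E_{\text{peak}}(\Omega) \propto \Omega^{4/5}
\quad\Longrightarrow\quad
\frac{E_{\text{peak}}(\Omega_2)}{E_{\text{peak}}(\Omega_1)}
  = \left(\frac{\Omega_2}{\Omega_1}\right)^{4/5},
\qquad
E_{\text{peak}}(200\ \text{MHz}) \approx 39 .
```

Under this relation, doubling the one-photon Rabi frequency would raise the peak enhancement by a factor of $2^{4/5} \approx 1.74$, assuming the power law holds over that range.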
Prism: Revealing Hidden Functional Clusters from Massive Instances in Cloud Systems
Ensuring the reliability of cloud systems is critical for both cloud vendors
and customers. Cloud systems often rely on virtualization techniques to create
instances of hardware resources, such as virtual machines. However,
virtualization hinders the observability of cloud systems, making it
challenging to diagnose platform-level issues. To improve system observability,
we propose to infer functional clusters of instances, i.e., groups of instances
having similar functionalities. We first conduct a pilot study on a large-scale
cloud system, i.e., Huawei Cloud, demonstrating that instances having similar
functionalities share similar communication and resource usage patterns.
Motivated by these findings, we formulate the identification of functional
clusters as a clustering problem and propose a non-intrusive solution called
Prism. Prism adopts a coarse-to-fine clustering strategy. It first partitions
instances into coarse-grained chunks based on communication patterns. Within
each chunk, Prism further groups instances with similar resource usage patterns
to produce fine-grained functional clusters. Such a design reduces noise in
the data and allows Prism to process massive instances efficiently. We evaluate
Prism on two datasets collected from the real-world production environment of
Huawei Cloud. Our experiments show that Prism achieves a v-measure of ~0.95,
surpassing existing state-of-the-art solutions. Additionally, we illustrate the
integration of Prism within monitoring systems for enhanced cloud reliability
through two real-world use cases.
Comment: The paper was accepted by the 38th IEEE/ACM International Conference
on Automated Software Engineering (ASE 2023)
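The coarse-to-fine strategy described above can be sketched in a few lines. This is an illustration under stated assumptions, not Prism's algorithm: the coarse step is taken to be connected components of the communication graph, and the fine step is a greedy distance-threshold grouping of resource-usage vectors. The names `coarse_chunks`, `fine_clusters`, `coarse_to_fine`, and the `radius` parameter are hypothetical.

```python
from collections import defaultdict
from math import dist


def coarse_chunks(instances, edges):
    """Coarse step: connected components of the communication graph."""
    parent = {i: i for i in instances}

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for u, v in edges:
        parent[find(u)] = find(v)
    chunks = defaultdict(list)
    for i in instances:
        chunks[find(i)].append(i)
    return list(chunks.values())


def fine_clusters(chunk, usage, radius):
    """Fine step: join the first cluster whose seed is within `radius`."""
    clusters = []
    for inst in chunk:
        for cluster in clusters:
            if dist(usage[inst], usage[cluster[0]]) <= radius:
                cluster.append(inst)
                break
        else:
            clusters.append([inst])
    return clusters


def coarse_to_fine(instances, edges, usage, radius=0.1):
    """Cluster by usage only inside each communication chunk."""
    result = []
    for chunk in coarse_chunks(instances, edges):
        result.extend(fine_clusters(chunk, usage, radius))
    return result
```

Because the fine step only compares instances inside one chunk, two instances with identical resource usage but no communication path end up in different clusters, and the pairwise comparisons stay limited to chunk-sized groups.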
Go Static: Contextualized Logging Statement Generation
Logging practices have been extensively investigated to assist developers in
writing appropriate logging statements for documenting software behaviors.
Although numerous automatic logging approaches have been proposed, their
performance remains unsatisfactory due to the constraint of the single-method
input, without informative programming context outside the method.
Specifically, we identify three inherent limitations with single-method
context: limited static scope of logging statements, inconsistent logging
styles, and missing type information of logging variables. To tackle these
limitations, we propose SCLogger, the first contextualized logging statement
generation approach with inter-method static contexts. First, SCLogger extracts
inter-method contexts with static analysis to construct the contextualized
prompt for language models to generate a tentative logging statement. The
contextualized prompt consists of an extended static scope and sampled similar
methods, ordered by the chain-of-thought (COT) strategy. Second, SCLogger
refines the access of logging variables by formulating a new refinement prompt
for language models, which incorporates detailed type information of variables
in the tentative logging statement. The evaluation results show that SCLogger
surpasses the state-of-the-art approach by 8.7% in logging position accuracy,
32.1% in level accuracy, 19.6% in variable precision, and 138.4% in text BLEU-4
score. Furthermore, SCLogger consistently boosts the performance of logging
statement generation across a range of large language models, thereby
showcasing the generalizability of this approach.
Comment: This paper was accepted by The ACM International Conference on the
Foundations of Software Engineering (FSE 2024)