ESRO: Experience Assisted Service Reliability against Outages
Modern cloud services are prone to failures due to their complex
architecture, making diagnosis a critical process. Site Reliability Engineers
(SREs) spend hours leveraging multiple sources of data, including alerts, error logs, and domain expertise drawn from past experiences, to locate the root
cause(s). These experiences are documented as natural language text in outage
reports for previous outages. However, utilizing the raw yet rich
semi-structured information in the reports systematically is time-consuming.
Structured information, on the other hand, such as alerts that are often used
during fault diagnosis, is voluminous and requires expert knowledge to discern.
Several strategies have been proposed to use each source of data separately for
root cause analysis. In this work, we build a diagnostic service called ESRO
that recommends root causes and remediation for failures by utilizing
structured as well as semi-structured sources of data systematically. ESRO
constructs a causal graph using alerts and a knowledge graph using outage
reports, and merges them in a novel way to form a unified graph during
training. A retrieval-based mechanism is then used to search the unified graph
and rank the likely root causes and remediation techniques based on the alerts
fired during an outage at inference time. The recommendation accounts not only for the individual alerts but also for their relative importance in predicting an outage group. We evaluated our model on several cloud service outages of a large SaaS enterprise over the course of ~2 years, and obtained an average improvement of 27% in ROUGE scores when comparing the likely root causes
against the ground truth over state-of-the-art baselines. We further establish
the effectiveness of ESRO through qualitative analysis on multiple real outage
examples.
Comment: Accepted to the 38th IEEE/ACM International Conference on Automated Software Engineering (ASE 2023).
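The retrieval step described above can be sketched in a few lines. This is a minimal illustration, not ESRO's actual algorithm: the alert names, importance weights, and outage records below are made up, and the importance-weighted Jaccard similarity is one plausible way to rank past outages by alert overlap.

```python
# Sketch of ESRO-style retrieval: rank past outages (and their documented
# root causes) by weighted overlap between the alerts fired now and the
# alerts recorded for each past outage. All data here is illustrative.

def rank_root_causes(fired_alerts, past_outages, importance):
    """Score each past outage by the importance-weighted Jaccard
    similarity of its alert set to the currently fired alerts."""
    fired = set(fired_alerts)
    scored = []
    for outage in past_outages:
        alerts = set(outage["alerts"])
        inter = sum(importance.get(a, 1.0) for a in fired & alerts)
        union = sum(importance.get(a, 1.0) for a in fired | alerts)
        score = inter / union if union else 0.0
        scored.append((score, outage["root_cause"]))
    return sorted(scored, reverse=True)

importance = {"db_latency_high": 3.0, "cpu_high": 1.0, "disk_full": 2.0}
past_outages = [
    {"alerts": ["db_latency_high", "cpu_high"],
     "root_cause": "connection pool exhaustion"},
    {"alerts": ["disk_full"],
     "root_cause": "log rotation misconfigured"},
]
ranking = rank_root_causes(["db_latency_high"], past_outages, importance)
print(ranking[0][1])  # top-ranked root cause
```

Weighting the overlap by per-alert importance is what lets an informative alert dominate the ranking even when many generic alerts also fire.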
Answering Complex Questions Using Open Information Extraction
While there has been substantial progress in factoid question-answering (QA),
answering complex questions remains challenging, typically requiring both a
large body of knowledge and inference techniques. Open Information Extraction
(Open IE) provides a way to generate semi-structured knowledge for QA, but to
date such knowledge has only been used to answer simple questions with
retrieval-based methods. We overcome this limitation by presenting a method for
reasoning with Open IE knowledge, allowing more complex questions to be
handled. Using a recently proposed support graph optimization framework for QA,
we develop a new inference model for Open IE, in particular one that can work
effectively with multiple short facts, noise, and the relational structure of
tuples. Our model significantly outperforms a state-of-the-art structured
solver on complex questions of varying difficulty, while also removing the
reliance on manually curated knowledge.
Comment: Accepted as a short paper at ACL 2017.
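The idea of answering a question by aggregating support from many short Open IE facts can be illustrated with a toy scorer. This is a crude stand-in for the paper's support-graph optimization: the tuples, question, and answer options below are invented, and the scoring is a simple term-overlap count rather than an ILP-based inference model.

```python
# Toy sketch of multiple-choice QA over Open IE tuples: score each
# answer option by how many (arg1, relation, arg2) facts connect a
# question term to that option. Tuples and question are made up.

tuples = [
    ("moths", "are attracted to", "light"),
    ("light", "comes from", "the sun"),
    ("bats", "hunt", "moths"),
]

def score_option(question_terms, option, facts):
    """Count facts mentioning both a question term and the option."""
    score = 0
    for fact in facts:
        text = " ".join(fact)
        if option in text and any(t in text for t in question_terms):
            score += 1
    return score

question_terms = ["moths"]
options = ["light", "water"]
best = max(options, key=lambda o: score_option(question_terms, o, tuples))
print(best)
```

A real support-graph solver would additionally reward chains of facts and penalize noisy tuples, which is where the inference model in the paper goes beyond simple counting.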
Structural Regularities in Text-based Entity Vector Spaces
Entity retrieval is the task of finding entities such as people or products
in response to a query, based solely on the textual documents they are
associated with. Recent semantic entity retrieval algorithms represent queries
and experts in finite-dimensional vector spaces, where both are constructed
from text sequences.
We investigate entity vector spaces and the degree to which they capture
structural regularities. Such vector spaces are constructed in an unsupervised
manner without explicit information about structural aspects. For concreteness,
we address these questions for a specific type of entity: experts in the
context of expert finding. We discover how clusterings of experts correspond to
committees in organizations, the ability of expert representations to encode
the co-author graph, and the degree to which they encode academic rank. We
compare latent, continuous representations created using methods based on
distributional semantics (LSI), topic models (LDA) and neural networks
(word2vec, doc2vec, SERT). Vector spaces created using neural methods, such as
doc2vec and SERT, systematically perform better at clustering than LSI, LDA and
word2vec. When it comes to encoding entity relations, SERT performs best.Comment: ICTIR2017. Proceedings of the 3rd ACM International Conference on the
Theory of Information Retrieval. 201
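One structural regularity the abstract mentions, whether clusters of entity vectors line up with known organizational groups, can be checked with a standard clustering-agreement measure. The sketch below uses cluster purity on toy data; the entities, groups, and cluster assignments are invented, and purity is one common choice, not the paper's specific evaluation.

```python
# Measure how well a clustering of entity vectors agrees with known
# group labels (e.g., committees), using cluster purity. The cluster
# assignments and labels are illustrative toy data.

from collections import Counter

def purity(clusters, labels):
    """Fraction of entities whose cluster's majority label matches theirs."""
    correct = 0
    total = 0
    for members in clusters:
        majority = Counter(labels[m] for m in members).most_common(1)[0][1]
        correct += majority
        total += len(members)
    return correct / total

# Hypothetical clusters produced by some embedding + clustering pipeline.
clusters = [["ann", "bob"], ["carol", "dave", "eve"]]
labels = {"ann": "search", "bob": "search",
          "carol": "ml", "dave": "ml", "eve": "search"}
print(round(purity(clusters, labels), 2))
```

Higher purity for one embedding method over another is exactly the kind of evidence behind the abstract's claim that doc2vec and SERT cluster better than LSI, LDA, and word2vec.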
Investigation into Indexing XML Data Techniques
The rapid development of XML technology has benefited the WWW: XML data has many advantages and has become a common format for transferring data across the Internet. The objective of this research is therefore to investigate and study XML indexing techniques in terms of their structures. The main goal of this investigation is to identify the main limitations of these techniques and any other open issues.
Furthermore, this research surveys the most common XML indexing techniques and compares them, and from this comparison identifies their limitations. To conclude, the main problem shared by all XML indexing techniques is the trade-off between the size and the efficiency of the indexes: indexes must grow large in order to perform well, and none of them suits all users' requirements. However, each of these techniques has its own advantages in certain respects.
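The size/efficiency trade-off can be made concrete with a minimal path index: recording every root-to-node tag path makes path lookups a dictionary access instead of a tree traversal, but the index grows with the document's structural variety. The document and index design below are illustrative, not any specific technique from the survey.

```python
# Minimal path index over an XML document: map each root-to-node tag
# path to the elements reachable along it. Lookup becomes O(1) at the
# cost of an index whose size grows with the number of distinct paths.

import xml.etree.ElementTree as ET
from collections import defaultdict

def build_path_index(root):
    index = defaultdict(list)
    def walk(elem, path):
        path = path + "/" + elem.tag
        index[path].append(elem)
        for child in elem:
            walk(child, path)
    walk(root, "")
    return index

doc = ET.fromstring(
    "<library><book><title>XML</title></book>"
    "<book><title>Indexing</title></book></library>")
index = build_path_index(doc)
titles = [e.text for e in index["/library/book/title"]]
print(titles)      # direct path lookup instead of tree traversal
print(len(index))  # index size grows with distinct paths
```

Even on this tiny document the index holds an entry per distinct path; on deeply nested, heterogeneous XML that overhead is exactly the trade-off the survey identifies.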
Term-Specific Eigenvector-Centrality in Multi-Relation Networks
Fuzzy matching and ranking are two information retrieval techniques widely used in web search. Their application to structured data, however, remains an open problem. This article investigates how eigenvector-centrality can be used for approximate matching in multi-relation graphs, that is, graphs where connections of many different types may exist. Based on an extension of the PageRank matrix, eigenvectors representing the distribution of a term after propagating term weights between related data items are computed. The result is an index which takes the document structure into account and can be used with standard document retrieval techniques. As the scheme takes the shape of an index transformation, all necessary calculations are performed during index time.
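The term-weight propagation described above can be sketched as a personalized-PageRank-style power iteration, where the restart vector holds the term's initial weights in each node. The graph, relation weights, and damping factor below are illustrative assumptions, not the article's actual construction.

```python
# Sketch of term-weight propagation over a small multi-relation graph:
# power iteration x <- (1-d)*term_weights + d * W^T x, where W carries
# per-relation edge weights. All data here is made up for illustration.

def propagate(term_weights, edges, damping=0.85, iters=50):
    """Iteratively spread a term's weight along weighted edges."""
    nodes = list(term_weights)
    x = dict(term_weights)
    for _ in range(iters):
        nxt = {n: (1 - damping) * term_weights[n] for n in nodes}
        for src, dst, w in edges:
            nxt[dst] += damping * w * x[src]
        x = nxt
    return x

# "doc1" contains the term; two relation types link it to doc2/doc3,
# with per-relation weights summing to 1 for the source node.
term_weights = {"doc1": 1.0, "doc2": 0.0, "doc3": 0.0}
edges = [("doc1", "doc2", 0.7),   # e.g. a "cites" relation
         ("doc1", "doc3", 0.3)]   # e.g. a "links-to" relation
scores = propagate(term_weights, edges)
print(scores["doc2"] > scores["doc3"])
```

Because the whole computation depends only on the graph and the term's initial weights, it can be run once per term at index time, which matches the abstract's framing of the scheme as an index transformation.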