Detecting (Un)Important Content for Single-Document News Summarization
We present a robust approach for detecting intrinsic sentence importance in
news, by training on two corpora of document-summary pairs. When used for
single-document summarization, our approach, combined with the "beginning of
document" heuristic, outperforms a state-of-the-art summarizer and the
beginning-of-article baseline in both automatic and manual evaluations. These
results represent an important advance because in the absence of cross-document
repetition, single document summarizers for news have not been able to
consistently outperform the strong beginning-of-article baseline.
Comment: Accepted by EACL 201
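The beginning-of-article baseline mentioned above can be sketched as follows. This is a minimal illustration only, not the authors' system: the naive regex sentence splitter, the word budget `max_words`, and the function names are all assumptions introduced here.

```python
import re


def split_sentences(text):
    """Naive splitter (assumption): break after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def lead_baseline(text, max_words=100):
    """Return the leading sentences of `text` that fit within `max_words` words."""
    summary, used = [], 0
    for sentence in split_sentences(text):
        n_words = len(sentence.split())
        if used + n_words > max_words:
            break  # the next sentence would exceed the budget
        summary.append(sentence)
        used += n_words
    return " ".join(summary)


article = (
    "Storm hits coast. Thousands lose power. "
    "Officials urge caution. Repairs may take days."
)
print(lead_baseline(article, max_words=6))
# prints "Storm hits coast. Thousands lose power."
```

A content-based importance model would replace the "take sentences in order" rule with a learned score; the baseline is hard to beat because news articles tend to front-load their key facts.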
Robust causal inference using directed acyclic graphs: the R package ‘dagitty’
Directed acyclic graphs (DAGs), which offer systematic representations of causal relationships, have become an established framework for the analysis of causal inference in epidemiology, often being used to determine covariate adjustment sets for minimizing confounding bias. DAGitty is a popular web application for drawing and analysing DAGs. Here we introduce the R package ‘dagitty’, which provides access to all of the capabilities of the DAGitty web application within the R platform for statistical computing, and also offers several new functions. We describe how the R package ‘dagitty’ can be used to: evaluate whether a DAG is consistent with the dataset it is intended to represent; enumerate ‘statistically equivalent’ but causally different DAGs; and identify exposure-outcome adjustment sets that are valid for causally different but statistically equivalent DAGs. This functionality enables epidemiologists to detect causal misspecifications in DAGs and make robust inferences that remain valid for a range of different DAGs. The R package ‘dagitty’ is available through the Comprehensive R Archive Network (CRAN) at [https://cran.r-project.org/web/packages/dagitty/]. The source code is available on GitHub at [https://github.com/jtextor/dagitty]. The web application ‘DAGitty’ is free software, licensed under the GNU General Public Licence (GPL) version 2, and is available at [http://dagitty.net/].
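Checking a DAG against data rests on the conditional independencies the DAG implies, which are read off by d-separation. The sketch below is a plain-Python illustration of that test using the standard moralized-ancestral-graph construction. It does not call the ‘dagitty’ package, and the example graph and all names are hypothetical.

```python
from itertools import combinations


def ancestors(dag, nodes):
    """Return `nodes` plus all their ancestors. `dag` maps child -> set of parents."""
    seen, stack = set(), list(nodes)
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(dag.get(node, ()))
    return seen


def d_separated(dag, x, y, z):
    """True if x and y are d-separated given the set z."""
    keep = ancestors(dag, {x, y} | set(z))
    # Moralize the ancestral subgraph: connect each child to its parents
    # and connect every pair of parents that share a child.
    adj = {n: set() for n in keep}
    for child in keep:
        parents = [p for p in dag.get(child, ()) if p in keep]
        for p in parents:
            adj[child].add(p)
            adj[p].add(child)
        for a, b in combinations(parents, 2):
            adj[a].add(b)
            adj[b].add(a)
    # Delete the conditioning set, then test undirected reachability.
    blocked, seen, stack = set(z), set(), [x]
    while stack:
        node = stack.pop()
        if node == y:
            return False
        if node in seen or node in blocked:
            continue
        seen.add(node)
        stack.extend(adj[node])
    return True


# Hypothetical DAG: C -> X, C -> Y, X -> M, M -> Y (C confounds X and Y).
dag = {"X": {"C"}, "M": {"X"}, "Y": {"C", "M"}}

print(d_separated(dag, "M", "C", {"X"}))       # True: implied independence, testable in data
print(d_separated(dag, "X", "Y", {"C", "M"}))  # True
print(d_separated(dag, "X", "Y", set()))       # False: X and Y are dependent
```

Each `True` result is a testable implication: if the corresponding conditional independence fails in the dataset, the DAG is misspecified.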
Probabilistic Riemannian submanifold learning with wrapped Gaussian process latent variable models
Latent variable models (LVMs) learn probabilistic models of data manifolds
lying in an ambient Euclidean space. In a number of applications, a
priori known spatial constraints can shrink the ambient space into a
considerably smaller manifold. Additionally, in these applications the
Euclidean geometry might induce a suboptimal similarity measure, which could be
improved by choosing a different metric. Euclidean models ignore such
information and assign probability mass to data points that can never appear as
data, and vastly different likelihoods to points that are similar under the
desired metric. We propose the wrapped Gaussian process latent variable model
(WGPLVM), which extends Gaussian process latent variable models to take values
strictly on a given ambient Riemannian manifold, making the model blind to
impossible data points. This allows non-linear, probabilistic inference of
low-dimensional Riemannian submanifolds from data. Our evaluation on diverse
datasets shows that we improve performance on several tasks, including encoding,
visualization and uncertainty quantification.
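To make "values strictly on a manifold" concrete, the sketch below draws samples from a wrapped Gaussian on the unit sphere: a Gaussian is sampled in the tangent plane at a base point and pushed onto the sphere with the exponential map. This is a common textbook construction used here as an illustration. The choice of the sphere, the isotropic covariance, and the function names are assumptions, and this is not the WGPLVM itself.

```python
import math
import random


def exp_map_sphere(p, v):
    """Exponential map on the unit sphere: start at p, move along tangent vector v."""
    norm_v = math.sqrt(sum(c * c for c in v))
    if norm_v < 1e-12:
        return list(p)  # zero step: stay at the base point
    return [
        math.cos(norm_v) * pc + math.sin(norm_v) * vc / norm_v
        for pc, vc in zip(p, v)
    ]


def sample_wrapped_gaussian(p, sigma, rng):
    """Sample an isotropic Gaussian in the tangent plane at p and wrap it onto the sphere."""
    g = [rng.gauss(0.0, sigma) for _ in p]
    dot = sum(gc * pc for gc, pc in zip(g, p))
    v = [gc - dot * pc for gc, pc in zip(g, p)]  # remove the component along p
    return exp_map_sphere(p, v)


rng = random.Random(0)
north_pole = [0.0, 0.0, 1.0]
samples = [sample_wrapped_gaussian(north_pole, 0.3, rng) for _ in range(5)]
for s in samples:
    # Every sample has unit norm, so no probability mass leaves the sphere.
    print(round(math.sqrt(sum(c * c for c in s)), 9))
```

A Euclidean Gaussian centred at the same point would place almost all of its mass off the sphere, which is the mismatch the abstract describes.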
Efficient and Parsimonious Agnostic Active Learning
We develop a new active learning algorithm for the streaming setting satisfying three important properties: 1) It provably works for any classifier representation and classification problem including those with severe noise. 2) It is efficiently implementable with an ERM oracle. 3) It is more aggressive than all previous approaches satisfying 1 and 2. To do this, we create an algorithm based on a newly defined optimization problem and analyze it. We also conduct the first experimental analysis of all efficient agnostic active learning algorithms, evaluating their strengths and weaknesses in different settings.
An Adaptive Resilience Testing Framework for Microservice Systems
Resilience testing, which measures the ability to minimize service
degradation caused by unexpected failures, is crucial for microservice systems.
The current practice for resilience testing relies on manually defining rules
for different microservice systems. Due to the diverse business logic of
microservices, there are no one-size-fits-all microservice resilience testing
rules. As the number and dynamics of microservices and failures grow, manual
configuration runs into scalability and adaptivity issues.
To overcome the two issues, we empirically compare the impacts of common
failures in the resilient and unresilient deployments of a benchmark
microservice system. Our study demonstrates that the resilient deployment can
block the propagation of degradation from system performance metrics (e.g.,
memory usage) to business metrics (e.g., response latency). In this paper, we
propose AVERT, the first AdaptiVE Resilience Testing framework for microservice
systems. AVERT first injects failures into microservices and collects available
monitoring metrics. Then AVERT ranks all the monitoring metrics according to
their contributions to the overall service degradation caused by the injected
failures. Lastly, AVERT produces a resilience index based on how much the degradation
in system performance metrics propagates to the degradation in business
metrics. The higher the degradation propagation, the lower the resilience of
the microservice system. We evaluate AVERT on two open-source benchmark
microservice systems. The experimental results show that AVERT can accurately
and efficiently test the resilience of microservice systems.
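The idea of scoring resilience by degradation propagation can be illustrated with a toy calculation. Everything below is a hypothetical sketch: the abstract does not give AVERT's actual formula, so the relative-degradation measure, the ratio-based index, the metric names, and the numbers are all assumptions made for illustration.

```python
def relative_degradation(baseline, under_failure):
    """Mean relative worsening across metrics, assuming larger values are worse."""
    changes = [
        max(0.0, (under_failure[name] - baseline[name]) / baseline[name])
        for name in baseline
    ]
    return sum(changes) / len(changes)


def resilience_index(sys_base, sys_fail, biz_base, biz_fail):
    """Hypothetical index in [0, 1]: 1 minus the share of system degradation
    that shows up as business degradation."""
    sys_deg = relative_degradation(sys_base, sys_fail)
    biz_deg = relative_degradation(biz_base, biz_fail)
    if sys_deg == 0.0:
        return 1.0  # assumption: no system-level degradation means nothing propagated
    propagation = min(1.0, biz_deg / sys_deg)
    return 1.0 - propagation


# Made-up measurements before and during an injected failure.
sys_base, sys_fail = {"memory_mb": 400.0}, {"memory_mb": 800.0}
biz_base = {"latency_ms": 100.0}

resilient = resilience_index(sys_base, sys_fail, biz_base, {"latency_ms": 110.0})
unresilient = resilience_index(sys_base, sys_fail, biz_base, {"latency_ms": 180.0})
print(resilient > unresilient)  # True: less propagation gives a higher index
```

The sketch only captures the monotone relationship stated above: the more system-level degradation reaches business metrics, the lower the score.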