Right for the Right Reason: Training Agnostic Networks
We consider the problem of a neural network being requested to classify
images (or other inputs) without making implicit use of a "protected concept",
that is a concept that should not play any role in the decision of the network.
Typically these concepts include information such as gender or race, or other
contextual information such as image backgrounds that might be implicitly
reflected in unknown correlations with other variables, making it insufficient
to simply remove them from the input features. In other words, making accurate
predictions is not good enough if those predictions rely on information that
should not be used: predictive performance is not the only important metric for
learning systems. We apply a method developed in the context of domain
adaptation to address this problem of "being right for the right reason", where
we request a classifier to make a decision in a way that is entirely 'agnostic'
to a given protected concept (e.g. gender, race, background etc.), even if this
could be implicitly reflected in other attributes via unknown correlations.
After defining the concept of an 'agnostic model', we demonstrate how the
Domain-Adversarial Neural Network can remove unwanted information from a model
using a gradient reversal layer. Comment: Author's original versio
An XML Query Engine for Network-Bound Data
XML has become the lingua franca for data exchange and integration across administrative and enterprise boundaries. Nearly all data providers are adding XML import or export capabilities, and standard XML Schemas and DTDs are being promoted for all types of data sharing. The ubiquity of XML has removed one of the major obstacles to integrating data from widely disparate sources, namely the heterogeneity of data formats.
However, general-purpose integration of data across the wide area also requires a query processor that can query data sources on demand, receive streamed XML data from them, and combine and restructure the data into new XML output -- while providing good performance for both batch-oriented and ad-hoc, interactive queries. This is the goal of the Tukwila data integration system, the first system that focuses on network-bound, dynamic XML data sources. In contrast to previous approaches, which must read, parse, and often store entire XML objects before querying them, Tukwila can return query results even as the data is streaming into the system. Tukwila is built with a new system architecture that extends adaptive query processing and relational-engine techniques into the XML realm, as facilitated by a pair of operators that incrementally evaluate a query’s input path expressions as data is read. In this paper, we describe the Tukwila architecture and its novel aspects, and we experimentally demonstrate that Tukwila provides better overall query performance and faster initial answers than existing systems, and has excellent scalability.
Detecting Inspiring Content on Social Media
Inspiration moves a person to see new possibilities and transforms the way
they perceive their own potential. Inspiration has received little attention in
psychology, and has not been researched before in the NLP community. To the
best of our knowledge, this work is the first to study inspiration through
machine learning methods. We aim to automatically detect inspiring content from
social media data. To this end, we analyze social media posts to tease out what
makes a post inspiring and what topics are inspiring. We release a dataset
containing the unique IDs of 5,800 inspiring and 5,800 non-inspiring
English-language public posts, collected from a dump of Reddit public posts
made available by a third party, and use linguistic heuristics to automatically
detect which English-language social media posts are inspiring. Comment: accepted at ACII 202
Piazza: Data Management Infrastructure for Semantic Web Applications
The Semantic Web envisions a World Wide Web in which data is described with rich semantics and applications can pose complex queries. To this point, researchers have defined new languages for specifying meanings for concepts and developed techniques for reasoning about them, using RDF as the data model. To flourish, the Semantic Web needs to be able to accommodate the huge amounts of existing data and the applications operating on them. To achieve this, we are faced with two problems. First, most of the world's data is available not in RDF but in XML; XML and the applications consuming it rely not only on the domain structure of the data, but also on its document structure. Hence, to provide interoperability between such sources, we must map between both their domain structures and their document structures. Second, data management practitioners often prefer to exchange data through local point-to-point data translations, rather than mapping to common mediated schemas or ontologies. This paper describes the Piazza system, which addresses these challenges. Piazza offers a language for mediating between data sources on the Semantic Web, which maps both the domain structure and document structure. Piazza also enables interoperation of XML data with RDF data that is accompanied by rich OWL ontologies. Mappings in Piazza are provided at a local scale between small sets of nodes, and our query answering algorithm is able to chain sets of mappings together to obtain relevant data from across the Piazza network. We also describe an implemented scenario in Piazza and the lessons we learned from it.
Absence of Magnetism in Hcp Iron-Nickel at 11 K
Synchrotron Mössbauer spectroscopy (SMS) was performed on an hcp-phase alloy of composition Fe92Ni8 at a pressure of 21 GPa and a temperature of 11 K. Density functional theoretical calculations predict antiferromagnetism in both hcp Fe and hcp Fe-Ni. For hcp Fe, these calculations predict no hyperfine magnetic field, consistent with previous experiments. For hcp Fe-Ni, however, substantial hyperfine magnetic fields are predicted, but these were not observed in the SMS spectra. Two possible explanations are suggested. First, small but significant errors in the generalized gradient approximation density functional may lead to an erroneous prediction of magnetic order or of erroneous hyperfine magnetic fields in antiferromagnetic hcp Fe-Ni. Alternatively, quantum fluctuations with periods much shorter than the lifetime of the nuclear excited state would prohibit the detection of moments by SMS.
Network-wide Configuration Synthesis
Computer networks are hard to manage. Given a set of high-level requirements
(e.g., reachability, security), operators have to manually figure out the
individual configuration of potentially hundreds of devices running complex
distributed protocols so that they, collectively, compute a compatible
forwarding state. Not surprisingly, operators often make mistakes which lead to
downtimes. To address this problem, we present a novel synthesis approach that
automatically computes correct network configurations that comply with the
operator's requirements. We capture the behavior of existing routers along with
the distributed protocols they run in stratified Datalog. Our key insight is to
reduce the problem of finding correct input configurations to the task of
synthesizing inputs for a stratified Datalog program. To solve this synthesis
task, we introduce a new algorithm that synthesizes inputs for stratified
Datalog programs. This algorithm is applicable beyond the domain of networks.
We leverage our synthesis algorithm to construct the first network-wide
configuration synthesis system, called SyNET, that supports multiple interacting
routing protocols (OSPF and BGP) and static routes. We show that our system is
practical and can infer correct input configurations, in a reasonable amount of
time, for networks of realistic size (> 50 routers) that forward packets for
multiple traffic classes. Comment: 24 Pages, short version published in CAV 201
Eliciting Risk Preferences using Choice Lists
We study the effect of embedding pairwise choices between lotteries
within a choice list on measured risk attitude. Using an experiment with online
workers, we find that subjects choose the risky lottery rather than a sure payment
significantly more often when responding to a choice list. This behavior can be rationalized
by the interaction between non-expected utility and the random incentive
system, as suggested by Karni and Safra (1987).
Dynamics of iron atoms across the pressure-induced Invar transition in Pd_3Fe
The ^(57)Fe phonon partial density of states (PDOS) in L1_2-ordered Pd_3Fe was studied at high pressures by nuclear resonant inelastic x-ray scattering (NRIXS) measurements and density functional theory (DFT) calculations. The NRIXS spectra showed that the stiffening of the ^(57)Fe PDOS with decreasing volume was slower from 12 to 24 GPa owing to the pressure-induced Invar transition in Pd_3Fe, with a change from a high-moment ferromagnetic (FM) state to a low-moment (LM) state observed by nuclear forward scattering. Force constants obtained from fitting to a Born–von Kármán model showed a relative softening of the first-nearest-neighbor (1NN) Fe-Pd longitudinal force constants at the magnetic transition. For the FM low-pressure state, the DFT calculations gave a PDOS and 1NN longitudinal force constants in good agreement with experiment, but discrepancies for the high-pressure LM state suggest the presence of short-range magnetic order.
TimelineQA: A Benchmark for Question Answering over Timelines
Lifelogs are descriptions of experiences that a person had during their life.
Lifelogs are created by fusing data from the multitude of digital services,
such as online photos, maps, shopping and content streaming services. Question
answering over lifelogs can offer personal assistants a critical resource when
they try to provide advice in context. However, obtaining answers to questions
over lifelogs is beyond the current state of the art of question answering
techniques for a variety of reasons, the most pronounced of which is that
lifelogs combine free text with some degree of structure such as temporal and
geographical information.
We create and publicly release TimelineQA, a benchmark for accelerating
progress on querying lifelogs. TimelineQA generates lifelogs of imaginary
people. The episodes in the lifelog range from major life episodes such as high
school graduation to those that occur on a daily basis such as going for a run.
We describe a set of experiments on TimelineQA with several state-of-the-art QA
models. Our experiments reveal that for atomic queries, an extractive QA system
significantly outperforms a state-of-the-art retrieval-augmented QA system.
For multi-hop queries involving aggregates, we show that the best result is
obtained with a state-of-the-art table QA technique, assuming the ground truth
set of episodes for deriving the answer is available.