Right for the Right Reason: Training Agnostic Networks

Author: A Halevy
Aylin Caliskan
J Li
MD Zeiler
N Cristianini
O Russakovsky
W Chu
Y Ganin
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 16/06/2018

We consider the problem of a neural network being requested to classify images (or other inputs) without making implicit use of a "protected concept", that is, a concept that should not play any role in the decision of the network. Typically these concepts include information such as gender or race, or other contextual information such as image backgrounds that might be implicitly reflected in unknown correlations with other variables, making it insufficient to simply remove them from the input features. In other words, making accurate predictions is not good enough if those predictions rely on information that should not be used: predictive performance is not the only important metric for learning systems. We apply a method developed in the context of domain adaptation to address this problem of "being right for the right reason", where we request a classifier to make a decision in a way that is entirely 'agnostic' to a given protected concept (e.g. gender, race, background, etc.), even if this could be implicitly reflected in other attributes via unknown correlations. After defining the concept of an 'agnostic model', we demonstrate how the Domain-Adversarial Neural Network can remove unwanted information from a model using a gradient reversal layer. Comment: Author's original version

arXiv.org e-Print Archive

Crossref

Explore Bristol Research

An XML Query Engine for Network-Bound Data

Author: Halevy Alon Y
Ives Zachary G
Weld Daniel S
Publication venue: ScholarlyCommons
Publication date: 01/01/2001

XML has become the lingua franca for data exchange and integration across administrative and enterprise boundaries. Nearly all data providers are adding XML import or export capabilities, and standard XML Schemas and DTDs are being promoted for all types of data sharing. The ubiquity of XML has removed one of the major obstacles to integrating data from widely disparate sources -- namely, the heterogeneity of data formats. However, general-purpose integration of data across the wide area also requires a query processor that can query data sources on demand, receive streamed XML data from them, and combine and restructure the data into new XML output -- while providing good performance for both batch-oriented and ad-hoc, interactive queries. This is the goal of the Tukwila data integration system, the first system that focuses on network-bound, dynamic XML data sources. In contrast to previous approaches, which must read, parse, and often store entire XML objects before querying them, Tukwila can return query results even as the data is streaming into the system. Tukwila is built with a new system architecture that extends adaptive query processing and relational-engine techniques into the XML realm, as facilitated by a pair of operators that incrementally evaluate a query’s input path expressions as data is read. In this paper, we describe the Tukwila architecture and its novel aspects, and we experimentally demonstrate that Tukwila provides better overall query performance and faster initial answers than existing systems, and has excellent scalability

CiteSeerX

ScholarlyCommons@Penn

Detecting Inspiring Content on Social Media

Author: Boureau Y-Lan
Halevy Alon
Ignat Oana
Yu Jane A.
Publication date: 29/05/2023

Inspiration moves a person to see new possibilities and transforms the way they perceive their own potential. Inspiration has received little attention in psychology, and has not been researched before in the NLP community. To the best of our knowledge, this work is the first to study inspiration through machine learning methods. We aim to automatically detect inspiring content from social media data. To this end, we analyze social media posts to tease out what makes a post inspiring and what topics are inspiring. We release a dataset of 5,800 inspiring and 5,800 non-inspiring English-language public post unique ids collected from a dump of Reddit public posts made available by a third party and use linguistic heuristics to automatically detect which social media English-language posts are inspiring. Comment: accepted at ACII 202

arXiv.org e-Print Archive

Piazza: Data Management Infrastructure for Semantic Web Applications

Author: Halevy Alon Y
Ives Zachary G
Mork Peter
Tatarinov Igor
Publication venue: ScholarlyCommons
Publication date: 20/05/2003

The Semantic Web envisions a World Wide Web in which data is described with rich semantics and applications can pose complex queries. To this point, researchers have defined new languages for specifying meanings for concepts and developed techniques for reasoning about them, using RDF as the data model. To flourish, the Semantic Web needs to be able to accommodate the huge amounts of existing data and the applications operating on them. To achieve this, we are faced with two problems. First, most of the world's data is available not in RDF but in XML; XML and the applications consuming it rely not only on the domain structure of the data, but also on its document structure. Hence, to provide interoperability between such sources, we must map between both their domain structures and their document structures. Second, data management practitioners often prefer to exchange data through local point-to-point data translations, rather than mapping to common mediated schemas or ontologies. This paper describes the Piazza system, which addresses these challenges. Piazza offers a language for mediating between data sources on the Semantic Web, which maps both the domain structure and document structure. Piazza also enables interoperation of XML data with RDF data that is accompanied by rich OWL ontologies. Mappings in Piazza are provided at a local scale between small sets of nodes, and our query answering algorithm is able to chain sets of mappings together to obtain relevant data from across the Piazza network. We also describe an implemented scenario in Piazza and the lessons we learned from it

CiteSeerX

ScholarlyCommons@Penn

Absence of Magnetism in Hcp Iron-Nickel at 11 K

Author: Chow P.
Cohen R. E.
Fultz B.
Halevy I.
Hu M. Y.
Lucas M. S.
Papandrew A. B.
Somayazulu M.
Stevens R.
Publication venue: 'American Physical Society (APS)'
Publication date: 25/08/2006

Synchrotron Mössbauer spectroscopy (SMS) was performed on an hcp-phase alloy of composition Fe92Ni8 at a pressure of 21 GPa and a temperature of 11 K. Density functional theoretical calculations predict antiferromagnetism in both hcp Fe and hcp Fe-Ni. For hcp Fe, these calculations predict no hyperfine magnetic field, consistent with previous experiments. For hcp Fe-Ni, however, substantial hyperfine magnetic fields are predicted, but these were not observed in the SMS spectra. Two possible explanations are suggested. First, small but significant errors in the generalized gradient approximation density functional may lead to an erroneous prediction of magnetic order or of erroneous hyperfine magnetic fields in antiferromagnetic hcp Fe-Ni. Alternately, quantum fluctuations with periods much shorter than the lifetime of the nuclear excited state would prohibit the detection of moments by SMS

Caltech Authors

Network-wide Configuration Synthesis

Author: AY Halevy
B Fortz
C Cadar
D Kroening
E Clarke
EK Jackson
IS Mumick
J Doyle
J Gottlieb
JD Ullman
S Knight
S Narain
Y Smaragdakis
Publication date: 30/05/2017

Computer networks are hard to manage. Given a set of high-level requirements (e.g., reachability, security), operators have to manually figure out the individual configuration of potentially hundreds of devices running complex distributed protocols so that they, collectively, compute a compatible forwarding state. Not surprisingly, operators often make mistakes which lead to downtimes. To address this problem, we present a novel synthesis approach that automatically computes correct network configurations that comply with the operator's requirements. We capture the behavior of existing routers along with the distributed protocols they run in stratified Datalog. Our key insight is to reduce the problem of finding correct input configurations to the task of synthesizing inputs for a stratified Datalog program. To solve this synthesis task, we introduce a new algorithm that synthesizes inputs for stratified Datalog programs. This algorithm is applicable beyond the domain of networks. We leverage our synthesis algorithm to construct the first network-wide configuration synthesis system, called SyNET, that supports multiple interacting routing protocols (OSPF and BGP) and static routes. We show that our system is practical and can infer correct input configurations, in a reasonable amount of time, for networks of realistic size (> 50 routers) that forward packets for multiple traffic classes. Comment: 24 Pages, short version published in CAV 201

arXiv.org e-Print Archive

Crossref

Eliciting Risk Preferences using Choice Lists

Author: Freeman D
Halevy Y
Kneeland TL
Publication date: 01/01/2019

We study the effect of embedding pairwise choices between lotteries within a choice list on measured risk attitude. Using an experiment with online workers, we find that subjects choose the risky lottery rather than a sure payment significantly more often when responding to a choice list. This behavior can be rationalized by the interaction between non-expected utility and the random incentive system, as suggested by Karni and Safra (1987)

UCL Discovery

Dynamics of iron atoms across the pressure-induced Invar transition in Pd_3Fe

Author: Alp E. E.
Chow P.
Fultz B.
Halevy I.
Lucas M. S.
Mauger L.
Muñoz J. A.
Sturhahn W.
Tan Hongjin
Toellner T. S.
Winterrose M. L.
Xiao Y.
Yue A. F.
Publication venue: 'American Physical Society (APS)'
Publication date: 20/04/2011

The ^(57)Fe phonon partial density of states (PDOS) in L1_2-ordered Pd_3Fe was studied at high pressures by nuclear resonant inelastic x-ray scattering (NRIXS) measurements and density functional theory (DFT) calculations. The NRIXS spectra showed that the stiffening of the ^(57)Fe PDOS with decreasing volume was slower from 12 to 24 GPa owing to the pressure-induced Invar transition in Pd_3Fe, with a change from a high-moment ferromagnetic (FM) state to a low-moment (LM) state observed by nuclear forward scattering. Force constants obtained from fitting to a Born–von Kármán model showed a relative softening of the first-nearest-neighbor (1NN) Fe-Pd longitudinal force constants at the magnetic transition. For the FM low-pressure state, the DFT calculations gave a PDOS and 1NN longitudinal force constants in good agreement with experiment, but discrepancies for the high-pressure LM state suggest the presence of short-range magnetic order

Caltech Authors

TimelineQA: A Benchmark for Question Answering over Timelines

Author: Dwivedi-Yu Jane
Halevy Alon Y.
Li Yuliang
Mathias Lambert
Saeidi Marzieh
Tan Wang-Chiew
Yan Jing Nathan
Publication date: 01/06/2023

Lifelogs are descriptions of experiences that a person had during their life. Lifelogs are created by fusing data from the multitude of digital services, such as online photos, maps, shopping and content streaming services. Question answering over lifelogs can offer personal assistants a critical resource when they try to provide advice in context. However, obtaining answers to questions over lifelogs is beyond the current state of the art of question answering techniques for a variety of reasons, the most pronounced of which is that lifelogs combine free text with some degree of structure such as temporal and geographical information. We create and publicly release TimelineQA, a benchmark for accelerating progress on querying lifelogs. TimelineQA generates lifelogs of imaginary people. The episodes in the lifelog range from major life episodes such as high school graduation to those that occur on a daily basis such as going for a run. We describe a set of experiments on TimelineQA with several state-of-the-art QA models. Our experiments reveal that for atomic queries, an extractive QA system significantly outperforms a state-of-the-art retrieval-augmented QA system. For multi-hop queries involving aggregates, we show that the best result is obtained with a state-of-the-art table QA technique, assuming the ground truth set of episodes for deriving the answer is available

arXiv.org e-Print Archive