32,396 research outputs found
Truth discovery via exploiting implications from multi-source data
Data veracity is a grand challenge for various tasks on the Web. Since the web data sources are inherently unreliable and may provide con icting information about the same real-world entities, truth discovery is emerging as a counter- measure of resolving the con icts by discovering the truth, which conforms to the reality, from the multi-source data. A major challenge related to truth discovery is that different data items may have varying numbers of true values (or multi-truth), which counters the assumption of existing truth discovery methods that each data item should have exactly one true value. In this paper, we address this challenge by exploiting and leveraging the implications from multi-source data. In particular, we exploit three types of implications, namely the implicit negative claims, the distribution of positive/negative claims, and the co-occurrence of values in sources' claims, to facilitate multi-truth discovery. We propose a probabilistic approach with improvement measures that incorporate the three implications in all stages of truth discovery process. In particular, incorporating the negative claims enables multi-truth discovery, considering the distribution of positive/negative claims relieves truth discovery from the impact of sources' behavioral features in the specific datasets, and considering values' co-occurrence relationship compensates the information lost from evaluating each value in the same claims individually. Experimental results on three real-world datasets demonstrate the effectiveness of our approach.Xianzhi Wang, Quan Z. Sheng, Lina Yao, Xue Li, Xiu Susie Fang, Xiaofei Xu, and Boualem Benatalla
Invariant Models for Causal Transfer Learning
Methods of transfer learning try to combine knowledge from several related
tasks (or domains) to improve performance on a test task. Inspired by causal
methodology, we relax the usual covariate shift assumption and assume that it
holds true for a subset of predictor variables: the conditional distribution of
the target variable given this subset of predictors is invariant over all
tasks. We show how this assumption can be motivated from ideas in the field of
causality. We focus on the problem of Domain Generalization, in which no
examples from the test task are observed. We prove that in an adversarial
setting using this subset for prediction is optimal in Domain Generalization;
we further provide examples, in which the tasks are sufficiently diverse and
the estimator therefore outperforms pooling the data, even on average. If
examples from the test task are available, we also provide a method to transfer
knowledge from the training tasks and exploit all available features for
prediction. However, we provide no guarantees for this method. We introduce a
practical method which allows for automatic inference of the above subset and
provide corresponding code. We present results on synthetic data sets and a
gene deletion data set
The benefits of in silico modeling to identify possible small-molecule drugs and their off-target interactions
Accepted for publication in a future issue of Future Medicinal Chemistry.The research into the use of small molecules as drugs continues to be a key driver in the development of molecular databases, computer-aided drug design software and collaborative platforms. The evolution of computational approaches is driven by the essential criteria that a drug molecule has to fulfill, from the affinity to targets to minimal side effects while having adequate absorption, distribution, metabolism, and excretion (ADME) properties. A combination of ligand- and structure-based drug development approaches is already used to obtain consensus predictions of small molecule activities and their off-target interactions. Further integration of these methods into easy-to-use workflows informed by systems biology could realize the full potential of available data in the drug discovery and reduce the attrition of drug candidates.Peer reviewe
SALSA: A Novel Dataset for Multimodal Group Behavior Analysis
Studying free-standing conversational groups (FCGs) in unstructured social
settings (e.g., cocktail party ) is gratifying due to the wealth of information
available at the group (mining social networks) and individual (recognizing
native behavioral and personality traits) levels. However, analyzing social
scenes involving FCGs is also highly challenging due to the difficulty in
extracting behavioral cues such as target locations, their speaking activity
and head/body pose due to crowdedness and presence of extreme occlusions. To
this end, we propose SALSA, a novel dataset facilitating multimodal and
Synergetic sociAL Scene Analysis, and make two main contributions to research
on automated social interaction analysis: (1) SALSA records social interactions
among 18 participants in a natural, indoor environment for over 60 minutes,
under the poster presentation and cocktail party contexts presenting
difficulties in the form of low-resolution images, lighting variations,
numerous occlusions, reverberations and interfering sound sources; (2) To
alleviate these problems we facilitate multimodal analysis by recording the
social interplay using four static surveillance cameras and sociometric badges
worn by each participant, comprising the microphone, accelerometer, bluetooth
and infrared sensors. In addition to raw data, we also provide annotations
concerning individuals' personality as well as their position, head, body
orientation and F-formation information over the entire event duration. Through
extensive experiments with state-of-the-art approaches, we show (a) the
limitations of current methods and (b) how the recorded multiple cues
synergetically aid automatic analysis of social interactions. SALSA is
available at http://tev.fbk.eu/salsa.Comment: 14 pages, 11 figure
Multi-Target Prediction: A Unifying View on Problems and Methods
Multi-target prediction (MTP) is concerned with the simultaneous prediction
of multiple target variables of diverse type. Due to its enormous application
potential, it has developed into an active and rapidly expanding research field
that combines several subfields of machine learning, including multivariate
regression, multi-label classification, multi-task learning, dyadic prediction,
zero-shot learning, network inference, and matrix completion. In this paper, we
present a unifying view on MTP problems and methods. First, we formally discuss
commonalities and differences between existing MTP problems. To this end, we
introduce a general framework that covers the above subfields as special cases.
As a second contribution, we provide a structured overview of MTP methods. This
is accomplished by identifying a number of key properties, which distinguish
such methods and determine their suitability for different types of problems.
Finally, we also discuss a few challenges for future research
- …