151,226 research outputs found
On Reasoning with RDF Statements about Statements using Singleton Property Triples
The Singleton Property (SP) approach has been proposed for representing and
querying metadata about RDF triples such as provenance, time, location, and
evidence. In this approach, one singleton property is created to uniquely
represent a relationship in a particular context, and in general, generates a
large property hierarchy in the schema. It has become the subject of important
questions from Semantic Web practitioners. Can an existing reasoner recognize
the singleton property triples? And how? If the singleton property triples
describe a data triple, then how can a reasoner infer this data triple from the
singleton property triples? Or would the large property hierarchy affect the
reasoners in some way? We address these questions in this paper and present our
study about the reasoning aspects of the singleton properties. We propose a
simple mechanism to enable existing reasoners to recognize the singleton
property triples, as well as to infer the data triples described by the
singleton property triples. We evaluate the effect of the singleton property
triples in the reasoning processes by comparing the performance on RDF datasets
with and without singleton properties. Our evaluation uses as benchmark the
LUBM datasets and the LUBM-SP datasets derived from LUBM with temporal
information added through singleton properties
Automated analysis of quantitative image data using isomorphic functional mixed models, with application to proteomics data
Image data are increasingly encountered and are of growing importance in many
areas of science. Much of these data are quantitative image data, which are
characterized by intensities that represent some measurement of interest in the
scanned images. The data typically consist of multiple images on the same
domain and the goal of the research is to combine the quantitative information
across images to make inference about populations or interventions. In this
paper we present a unified analysis framework for the analysis of quantitative
image data using a Bayesian functional mixed model approach. This framework is
flexible enough to handle complex, irregular images with many local features,
and can model the simultaneous effects of multiple factors on the image
intensities and account for the correlation between images induced by the
design. We introduce a general isomorphic modeling approach to fitting the
functional mixed model, of which the wavelet-based functional mixed model is
one special case. With suitable modeling choices, this approach leads to
efficient calculations and can result in flexible modeling and adaptive
smoothing of the salient features in the data. The proposed method has the
following advantages: it can be run automatically, it produces inferential
plots indicating which regions of the image are associated with each factor, it
simultaneously considers the practical and statistical significance of
findings, and it controls the false discovery rate.Comment: Published in at http://dx.doi.org/10.1214/10-AOAS407 the Annals of
Applied Statistics (http://www.imstat.org/aoas/) by the Institute of
Mathematical Statistics (http://www.imstat.org
Empirical Bayesian Approach to Testing Multiple Hypotheses with Separate Priors for Left and Right Alternatives
We consider a multiple hypotheses problem with directional alternatives in a decision theoretic framework. We obtain an empirical Bayes rule subject to a constraint on mixed directional false discovery rate (mdFDR≤α) under the semiparametric setting where the distribution of the test statistic is parametric, but the prior distribution is nonparametric. We proposed separate priors for the left tail and right tail alternatives as it may be required for many applications. The proposed Bayes rule is compared through simulation against rules proposed by Benjamini and Yekutieli and Efron. We illustrate the proposed methodology for two sets of data from biological experiments: HIV-transfected cell-line mRNA expression data, and a quantitative trait genome-wide SNP data set. We have developed a user-friendly web-based shiny App for the proposed method which is available through URL https://npseb.shinyapps.io/npseb/. The HIV and SNP data can be directly accessed, and the results presented in this paper can be executed
Social Fingerprinting: detection of spambot groups through DNA-inspired behavioral modeling
Spambot detection in online social networks is a long-lasting challenge
involving the study and design of detection techniques capable of efficiently
identifying ever-evolving spammers. Recently, a new wave of social spambots has
emerged, with advanced human-like characteristics that allow them to go
undetected even by current state-of-the-art algorithms. In this paper, we show
that efficient spambots detection can be achieved via an in-depth analysis of
their collective behaviors exploiting the digital DNA technique for modeling
the behaviors of social network users. Inspired by its biological counterpart,
in the digital DNA representation the behavioral lifetime of a digital account
is encoded in a sequence of characters. Then, we define a similarity measure
for such digital DNA sequences. We build upon digital DNA and the similarity
between groups of users to characterize both genuine accounts and spambots.
Leveraging such characterization, we design the Social Fingerprinting
technique, which is able to discriminate among spambots and genuine accounts in
both a supervised and an unsupervised fashion. We finally evaluate the
effectiveness of Social Fingerprinting and we compare it with three
state-of-the-art detection algorithms. Among the peculiarities of our approach
is the possibility to apply off-the-shelf DNA analysis techniques to study
online users behaviors and to efficiently rely on a limited number of
lightweight account characteristics
Temporal fuzzy association rule mining with 2-tuple linguistic representation
This paper reports on an approach that contributes towards the problem of discovering fuzzy association rules that exhibit a temporal pattern. The novel application of the 2-tuple linguistic representation identifies fuzzy association rules in a temporal context, whilst maintaining the interpretability of linguistic terms. Iterative Rule Learning (IRL) with a Genetic Algorithm (GA) simultaneously induces rules and tunes the membership functions. The discovered rules were compared with those from a traditional method of discovering fuzzy association rules and results demonstrate how the traditional method can loose information because rules occur at the intersection of membership function boundaries. New information can be mined from the proposed approach by improving upon rules discovered with the traditional method and by discovering new rules
Empirical Bayesian Approach to Testing Multiple Hypotheses with Separate Priors for Left and Right Alternatives
We consider a multiple hypotheses problem with directional alternatives in a decision theoretic framework. We obtain an empirical Bayes rule subject to a constraint on mixed directional false discovery rate (mdFDR≤α) under the semiparametric setting where the distribution of the test statistic is parametric, but the prior distribution is nonparametric. We proposed separate priors for the left tail and right tail alternatives as it may be required for many applications. The proposed Bayes rule is compared through simulation against rules proposed by Benjamini and Yekutieli and Efron. We illustrate the proposed methodology for two sets of data from biological experiments: HIV-transfected cell-line mRNA expression data, and a quantitative trait genome-wide SNP data set. We have developed a user-friendly web-based shiny App for the proposed method which is available through URL https://npseb.shinyapps.io/npseb/. The HIV and SNP data can be directly accessed, and the results presented in this paper can be executed
Negative Statements Considered Useful
Knowledge bases (KBs), pragmatic collections of knowledge about notable entities, are an important asset in applications such as search, question answering and dialogue. Rooted in a long tradition in knowledge representation, all popular KBs only store positive information, while they abstain from taking any stance towards statements not contained in them. In this paper, we make the case for explicitly stating interesting statements which are not true. Negative statements would be important to overcome current limitations of question answering, yet due to their potential abundance, any effort towards compiling them needs a tight coupling with ranking. We introduce two approaches towards compiling negative statements. (i) In peer-based statistical inferences, we compare entities with highly related entities in order to derive potential negative statements, which we then rank using supervised and unsupervised features. (ii) In query-log-based text extraction, we use a pattern-based approach for harvesting search engine query logs. Experimental results show that both approaches hold promising and complementary potential. Along with this paper, we publish the first datasets on interesting negative information, containing over 1.1M statements for 100K popular Wikidata entities
DNA-inspired online behavioral modeling and its application to spambot detection
We propose a strikingly novel, simple, and effective approach to model online
user behavior: we extract and analyze digital DNA sequences from user online
actions and we use Twitter as a benchmark to test our proposal. We obtain an
incisive and compact DNA-inspired characterization of user actions. Then, we
apply standard DNA analysis techniques to discriminate between genuine and
spambot accounts on Twitter. An experimental campaign supports our proposal,
showing its effectiveness and viability. To the best of our knowledge, we are
the first ones to identify and adapt DNA-inspired techniques to online user
behavioral modeling. While Twitter spambot detection is a specific use case on
a specific social media, our proposed methodology is platform and technology
agnostic, hence paving the way for diverse behavioral characterization tasks
- …