168 research outputs found
Etude de la Maximisation de l'Influence dans les Réseaux Sociaux
National audience. Influence maximization is an NP-hard problem that depends on how information diffuses in social networks. The greedy hill-climbing algorithm has been proved to give a good approximation if the influence function being optimized is submodular, which is the case for standard diffusion models. We present a diffusion model, not equivalent to the standard models, for which the influence function is not submodular. Then, using toy graphs and a real social network, we study different influence maximization algorithms on this model and on the standard IC model: some basic heuristics, the greedy hill-climbing method, a generalization of the greedy method, and an optimization method for submodular functions. We show that even if the influence function is not submodular, the greedy algorithm obtains good results while scaling efficiently
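As an illustration of the greedy hill-climbing method mentioned in this abstract, here is a minimal sketch under the standard Independent Cascade (IC) model. It is not the authors' implementation: the function names, the uniform activation probability `p`, the Monte Carlo estimate of the spread, and the adjacency-dict graph format are all assumptions made for the example.

```python
import random

def simulate_ic(graph, seeds, p, rng):
    """Run one Independent Cascade simulation and return the number of active nodes."""
    active = set(seeds)
    frontier = list(seeds)
    while frontier:
        next_frontier = []
        for u in frontier:
            for v in graph.get(u, ()):
                # Each newly active node gets one chance to activate each neighbor.
                if v not in active and rng.random() < p:
                    active.add(v)
                    next_frontier.append(v)
        frontier = next_frontier
    return len(active)

def estimate_spread(graph, seeds, p=0.1, runs=200, seed=0):
    """Monte Carlo estimate of the expected number of nodes activated from `seeds`."""
    rng = random.Random(seed)
    return sum(simulate_ic(graph, seeds, p, rng) for _ in range(runs)) / runs

def greedy_seeds(graph, k, p=0.1, runs=200):
    """Greedy hill climbing: repeatedly add the node with the best estimated spread."""
    nodes = set(graph) | {v for targets in graph.values() for v in targets}
    chosen = []
    for _ in range(min(k, len(nodes))):
        candidates = sorted(nodes - set(chosen))
        best = max(candidates,
                   key=lambda n: estimate_spread(graph, chosen + [n], p, runs))
        chosen.append(best)
    return chosen
```

On a small star graph such as `{"hub": ["a", "b", "c"]}`, `greedy_seeds(graph, 1, p=1.0)` selects the hub, since it reaches every node. Each greedy step re-estimates the spread for every remaining candidate, which is the main cost of the method.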
Un modèle de RI basé sur des critères d'obligation et de certitude
International audience. There are many information retrieval models, each aiming to best meet users' expectations. The model we propose relies on a precise formulation of the query that reflects the user's need: each query term is augmented with two criteria, one expressing whether or not the term must appear in the documents, and the other expressing the user's certainty about the term used. Experiments allowed us to verify that such a formulation improves precision
Learning Multiple Temporal Matching for Time Series Classification
International audience. In real applications, time series are generally of complex structure, exhibiting different global behaviors within classes. To discriminate such challenging time series, we propose a multiple temporal matching approach that reveals the commonly shared features within classes, and the most differential ones across classes. For this, we rely on a new framework based on the variance/covariance criterion to strengthen or weaken matched observations according to the induced variability within and between classes. The experiments performed on real and synthetic datasets demonstrate the ability of the multiple temporal matching approach to capture fine-grained distinctions between time series
Uncertain Trees: Dealing with Uncertain Inputs in Regression Trees
Tree-based ensemble methods, such as Random Forests and Gradient Boosted Trees,
have been successfully used for regression in many applications and research
studies. Furthermore, these methods have been extended in order to deal with
uncertainty in the output variable, using for example a quantile loss in Random
Forests (Meinshausen, 2006). To the best of our knowledge, no extension has
been provided yet for dealing with uncertainties in the input variables, even
though such uncertainties are common in practical situations. We propose here
such an extension by showing how standard regression trees optimizing a
quadratic loss can be adapted and learned while taking into account the
uncertainties in the inputs. By doing so, one no longer assumes that an
observation lies in a single region of the regression tree, but rather that
it belongs to each region with a certain probability. Experiments conducted on
several data sets illustrate the good behavior of the proposed extension. Comment: 9 pages
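The idea that an observation belongs to each region with a certain probability can be sketched for prediction as follows. This is an illustrative simplification and not the paper's method: it assumes a single input feature, a Gaussian uncertainty `N(mu, sigma^2)` on that input, and an already-learned tree whose regions are the intervals between the sorted `thresholds`. All names are hypothetical, and learning the tree under uncertainty is not covered.

```python
import math

def normal_cdf(x, mu, sigma):
    """Cumulative distribution function of N(mu, sigma^2) evaluated at x."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def region_probabilities(thresholds, mu, sigma):
    """Probability that the uncertain input falls in each interval of a 1-D tree."""
    edges = [-math.inf] + sorted(thresholds) + [math.inf]
    cdf = [normal_cdf(e, mu, sigma) for e in edges]
    return [upper - lower for lower, upper in zip(cdf, cdf[1:])]

def soft_predict(thresholds, leaf_values, mu, sigma):
    """Average the leaf values, weighted by the probability of each region."""
    probs = region_probabilities(thresholds, mu, sigma)
    return sum(p * v for p, v in zip(probs, leaf_values))
```

With one split at 0 and leaf values 1 and 3, an input centered exactly on the split is assigned to both regions with probability 0.5, so the prediction lies halfway between the two leaves. As `sigma` shrinks, the prediction approaches that of the ordinary hard-assignment tree.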
Terminology-based Text Embedding for Computing Document Similarities on Technical Content
We propose in this paper a new, hybrid document embedding approach in order
to address the problem of document similarities with respect to the technical
content. To do so, we employ state-of-the-art graph techniques to first
extract the keyphrases (composite keywords) of documents and, then, use them to
score the sentences. Using the ranked sentences, we propose two approaches to
embed documents and show their performances with respect to two baselines. With
domain expert annotations, we illustrate that the proposed methods can find
more relevant documents and outperform the baselines by up to 27% in terms of
NDCG
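For readers unfamiliar with the metric, NDCG (normalized discounted cumulative gain) can be computed as in the sketch below. The abstract does not say which variant was used; this example assumes the common form with linear gain and a `log2(rank + 1)` discount, and the function names are illustrative.

```python
import math

def dcg(relevances):
    """Discounted cumulative gain: relevance at rank i is divided by log2(i + 1)."""
    return sum(rel / math.log2(rank + 1)
               for rank, rel in enumerate(relevances, start=1))

def ndcg(relevances, k=None):
    """DCG of the given ranking divided by the DCG of the ideal (sorted) ranking."""
    ranked = relevances if k is None else relevances[:k]
    ideal = sorted(relevances, reverse=True)
    ideal = ideal if k is None else ideal[:k]
    ideal_dcg = dcg(ideal)
    return dcg(ranked) / ideal_dcg if ideal_dcg > 0 else 0.0
```

Here `relevances` lists the graded relevance of the returned documents in ranked order, for example expert annotations. A ranking already sorted by relevance scores 1.0, and placing relevant documents lower in the list reduces the score.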
Predicting Information Diffusion in Social Networks using Content and User's Profiles
International audience. Predicting the diffusion of information on social networks is a key problem for applications like Opinion Leader Detection, Buzz Detection or Viral Marketing. Many recent diffusion models are direct extensions of the Cascade and Threshold models, initially proposed for epidemiology and social studies. In such models, the diffusion process is based on the dynamics of interactions between neighbor nodes in the network (the social pressure), and largely ignores important dimensions such as the content of the piece of information diffused. We propose here a new family of probabilistic models that aims at predicting how a content diffuses in a network by making use of additional dimensions: the content of the piece of information diffused, the user's profile and willingness to diffuse. These models are illustrated and compared with other approaches on two blog datasets. The experimental results obtained on these datasets show that taking into account the content of the piece of information diffused is important to accurately model the diffusion process
- …