The role of research in public relations
Thesis (M.S.)--Boston University
A maximum entropy approach to multiple classifiers combination
In this paper, we present a maximum entropy (maxent) approach to the problem of
fusing experts' opinions, or classifiers' outputs. The maxent approach is quite
versatile and allows us to express, in a clear and rigorous way, the a priori
knowledge that is available on the problem. For instance, our knowledge about the
reliability of the experts and the correlations between these experts can be easily
integrated: each piece of knowledge is expressed in the form of a linear constraint.
An iterative scaling algorithm is used to compute the maxent solution
of the problem. The maximum entropy method seeks the joint probability density
of a set of random variables that has maximum entropy while satisfying the
constraints. It is therefore the “most honest” characterization of our knowledge
given the available facts (constraints). In the case of conflicting constraints, we
propose to minimise the “lack of constraint satisfaction” or to relax some constraints
and recompute the maximum entropy solution. The maxent fusion rule
is illustrated by simulations.
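The abstract does not say which iterative scaling variant is used. As a minimal sketch, assuming Generalized Iterative Scaling (Darroch and Ratcliff) on a finite state space, each piece of knowledge becomes a non-negative feature whose expected value is pinned to a target. The function name `gis`, the toy problem (two experts with hypothetical reliabilities 0.8 and 0.7, and a balanced class prior) and all numbers are illustrative assumptions, not taken from the paper.

```python
import math
from itertools import product


def gis(states, features, targets, iters=1000):
    """Generalized Iterative Scaling for the maximum-entropy distribution over a
    finite set of states subject to E_p[f_i] = targets[i], all f_i >= 0."""
    F = [[float(f(s)) for f in features] for s in states]
    # GIS needs every state to carry the same total feature mass C,
    # so a non-negative slack feature absorbs the difference.
    C = max(sum(row) for row in F)
    F = [row + [C - sum(row)] for row in F]
    tgt = list(targets) + [C - sum(targets)]  # must stay > 0 for the log below
    lam = [0.0] * len(tgt)  # lam = 0 is the uniform (unconstrained maxent) start

    def dist():
        w = [math.exp(sum(l * v for l, v in zip(lam, row))) for row in F]
        z = sum(w)
        return [x / z for x in w]

    for _ in range(iters):
        p = dist()
        # Current expectation of each feature under the model.
        expct = [sum(pi * row[i] for pi, row in zip(p, F)) for i in range(len(tgt))]
        # Simultaneous scaling step: lam_i += log(target_i / E_p[f_i]) / C.
        lam = [l + math.log(t / e) / C for l, t, e in zip(lam, tgt, expct)]
    return dict(zip(states, dist()))


# Hypothetical toy problem: a state is (true class y, vote of expert 1, vote of expert 2).
states = list(product([0, 1], repeat=3))
features = [
    lambda s: s[1] == s[0],  # expert 1 is correct
    lambda s: s[2] == s[0],  # expert 2 is correct
    lambda s: s[0] == 1,     # class prior
]
p = gis(states, features, [0.8, 0.7, 0.5])

# Fused decision: posterior of y = 1 when expert 1 votes 1 and expert 2 votes 0.
post = p[(1, 1, 0)] / (p[(0, 1, 0)] + p[(1, 1, 0)])
```

With only these three constraints, the maxent solution treats the experts as conditionally independent given the class, so the fused posterior here is 12/19 ≈ 0.63. Adding a feature for "both experts correct" with its own target would be one way to encode a correlation between them.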
Personalized PageRank with Node-dependent Restart
Personalized PageRank is an algorithm for ranking the importance of web pages
on a user-dependent basis. We introduce two generalizations of Personalized
PageRank with node-dependent restart. The first generalization is based on the
proportion of visits to nodes before the restart, whereas the second
generalization is based on the probability of the node visited just before the
restart. In the original case of constant restart probability, the two measures
coincide. We discuss interesting particular cases of restart probabilities and
restart distributions. We show that both generalizations of Personalized
PageRank have an elegant expression connecting the so-called direct and reverse
Personalized PageRanks, which yields a symmetry property of these Personalized
PageRanks.
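The abstract gives no formulas, so the following is one plausible formalization, assumed here rather than taken from the paper: at node i the walk restarts with probability alpha[i], otherwise it moves along the row-stochastic matrix P; it starts from the restart distribution v. With N_j the expected number of visits to j before the restart, the first measure is N_j / sum(N) and the second is alpha[j] * N_j. When alpha is constant, sum(N) = 1/alpha, so both equal alpha * N_j, which matches the statement that the two measures coincide. The function name and example graph are hypothetical.

```python
def ppr_node_restart(P, alpha, v, tol=1e-13, max_steps=100_000):
    """P: row-stochastic transition matrix (list of lists); alpha[i] > 0: restart
    probability at node i; v: restart distribution.
    Returns (visit-share measure, last-node-before-restart measure)."""
    n = len(P)
    mass = list(v)      # probability that the walk is still running and sits at node j
    visits = [0.0] * n  # expected number of visits to node j before the restart
    for _ in range(max_steps):
        visits = [c + m for c, m in zip(visits, mass)]
        nxt = [0.0] * n
        for i in range(n):
            keep = mass[i] * (1.0 - alpha[i])  # the walk does not restart at i ...
            for j in range(n):
                nxt[j] += keep * P[i][j]       # ... and moves along an edge
        mass = nxt
        if sum(mass) < tol:
            break
    total = sum(visits)
    pi_visits = [c / total for c in visits]           # share of visits before restart
    pi_last = [c * a for c, a in zip(visits, alpha)]  # restart fires while at node j
    return pi_visits, pi_last


# Hypothetical 3-node example with a uniform restart distribution.
P = [[0.0, 0.5, 0.5], [1.0, 0.0, 0.0], [0.5, 0.5, 0.0]]
v = [1 / 3, 1 / 3, 1 / 3]
same_a, same_b = ppr_node_restart(P, [0.15, 0.15, 0.15], v)  # constant restart
diff_a, diff_b = ppr_node_restart(P, [0.1, 0.5, 0.3], v)     # node-dependent restart
```

The constant-restart call returns two (numerically) identical vectors, while the node-dependent call returns two different probability vectors.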
SciRecSys: A Recommendation System for Scientific Publication by Discovering Keyword Relationships
In this work, we propose a new approach for discovering various relationships
among keywords in scientific publications, based on a Markov chain model.
This is an important problem, since keywords are the basic elements for
representing abstract objects such as documents, user profiles, topics and many
other things. Our model is effective because it combines four important
factors in scientific publications: content, publicity, impact and randomness.
In particular, we present a recommendation system (called SciRecSys) that
helps users efficiently find relevant articles.
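The abstract names the four factors but not how they enter the chain. A hypothetical sketch: content, publicity and impact each supply a non-negative keyword-by-keyword score matrix, each is row-normalized into transition probabilities, and the results are mixed with weights, while "randomness" is modelled as a uniform jump (in the spirit of PageRank teleportation). The mixture form, the weights, the matrices and all function names are illustrative assumptions.

```python
def row_normalize(M):
    """Turn non-negative scores into transition probabilities, row by row."""
    n = len(M)
    out = []
    for row in M:
        s = sum(row)
        # A keyword with no signal in this factor falls back to a uniform row.
        out.append([x / s for x in row] if s > 0 else [1.0 / n] * n)
    return out


def keyword_chain(factor_matrices, weights):
    """Hypothetical mixture P = sum_f w_f * normalize(M_f) + w_rand * uniform.
    `weights` has one entry per factor matrix plus a last entry for randomness."""
    n = len(factor_matrices[0])
    mats = [row_normalize(M) for M in factor_matrices]
    mats.append([[1.0 / n] * n for _ in range(n)])  # randomness: uniform jump
    total = sum(weights)
    return [[sum(w * M[i][j] for w, M in zip(weights, mats)) / total
             for j in range(n)] for i in range(n)]


def stationary(P, iters=500):
    """Power iteration pi <- pi P; it converges because the uniform jump makes P ergodic."""
    n = len(P)
    pi = [1.0 / n] * n
    for _ in range(iters):
        pi = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
    return pi


# Illustrative data for three keywords (all numbers are made up).
keywords = ["markov chain", "pagerank", "clustering"]
content = [[0, 5, 1], [5, 0, 2], [1, 2, 0]]    # e.g. co-occurrence in abstracts
publicity = [[0, 3, 3], [4, 0, 1], [2, 2, 0]]  # e.g. co-occurrence weighted by venue
impact = [[0, 8, 1], [8, 0, 1], [1, 1, 0]]     # e.g. co-occurrence weighted by citations
P = keyword_chain([content, publicity, impact], [0.4, 0.2, 0.25, 0.15])
importance = stationary(P)  # a global importance score per keyword
# Row P[1] ranks the other keywords by (hypothetical) relatedness to "pagerank".
related = sorted(((P[1][j], kw) for j, kw in enumerate(keywords) if j != 1), reverse=True)
```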
Large Scale Spectral Clustering Using Approximate Commute Time Embedding
Spectral clustering is a clustering method that can detect clusters with complex
shapes. However, it requires the eigendecomposition of the graph Laplacian
matrix, whose cost is proportional to O(n^3) for n data points, so it is not
suitable for large-scale systems. Recently, many methods have been proposed to
reduce the computational time of spectral clustering. These approximate
methods usually rely on sampling techniques, through which much information about
the original data may be lost. In this work, we propose a fast and accurate
spectral clustering approach using an approximate commute time embedding, which
is similar to the spectral embedding. The method requires neither a sampling
technique nor the computation of any eigenvector. Instead, it uses random
projection and a linear-time solver to find the approximate embedding.
Experiments on several synthetic and real datasets show that the proposed
approach achieves better clustering quality and is faster than state-of-the-art
approximate spectral clustering methods.
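The abstract does not spell out the construction. A minimal sketch, assuming the standard random-projection approach from the effective-resistance literature: the commute time between i and j equals vol(G) (e_i - e_j)^T L^+ (e_i - e_j), which is the squared distance between columns i and j of W^{1/2} B L^+, where B is the signed edge-vertex incidence matrix and W the diagonal matrix of edge weights. Projecting with a random k×m matrix Q of ±1/sqrt(k) entries approximately preserves these distances (Johnson-Lindenstrauss), and each of the k rows of Z = Q W^{1/2} B L^+ costs one Laplacian solve. Plain conjugate gradient stands in here for a fast Laplacian solver; the function names, k, and the two-triangle graph are illustrative assumptions.

```python
import math
import random


def conjugate_gradient(matvec, b, rel_tol=1e-24, max_iter=1000):
    """Solve A x = b for symmetric positive semi-definite A with b in range(A)."""
    x = [0.0] * len(b)
    r = list(b)
    p = list(r)
    rs = rs0 = sum(ri * ri for ri in r)
    for _ in range(max_iter):
        if rs <= rel_tol * rs0:
            break
        Ap = matvec(p)
        pAp = sum(pi * api for pi, api in zip(p, Ap))
        if pAp <= 0.0:  # guard: direction has no component outside the null space
            break
        step = rs / pAp
        x = [xi + step * pi for xi, pi in zip(x, p)]
        r = [ri - step * api for ri, api in zip(r, Ap)]
        rs_new = sum(ri * ri for ri in r)
        p = [ri + (rs_new / rs) * pi for ri, pi in zip(r, p)]
        rs = rs_new
    return x


def commute_time_embedding(n, edges, k=400, seed=0):
    """edges: list of (u, v, weight). Returns Z as k rows of length n."""
    rng = random.Random(seed)

    def laplacian_times(x):
        y = [0.0] * n
        for u, v, w in edges:
            d = w * (x[u] - x[v])
            y[u] += d
            y[v] -= d
        return y

    Z = []
    for _ in range(k):
        # b = (q^T W^{1/2} B)^T for one random row q of Q with entries +-1/sqrt(k).
        b = [0.0] * n
        for u, v, w in edges:
            q = rng.choice((-1.0, 1.0)) * math.sqrt(w / k)
            b[u] += q
            b[v] -= q
        Z.append(conjugate_gradient(laplacian_times, b))  # one row of Z is L^+ b
    return Z


def approx_commute_time(Z, edges, i, j):
    vol = 2.0 * sum(w for _, _, w in edges)  # sum of weighted degrees
    return vol * sum((z[i] - z[j]) ** 2 for z in Z)


# Two unit-weight triangles joined by the bridge (2, 3).
edges = [(0, 1, 1.0), (1, 2, 1.0), (0, 2, 1.0),
         (3, 4, 1.0), (4, 5, 1.0), (3, 5, 1.0), (2, 3, 1.0)]
Z = commute_time_embedding(6, edges)
```

For this graph the exact commute times are 28/3 between vertices 0 and 1 and 98/3 between vertices 0 and 5, so the embedding should place the two triangles far apart; a clustering step such as k-means would then run on the columns of Z.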
Spanning Forests and the Golden Ratio
For a graph G, let f_{ij} be the number of spanning rooted forests in which
vertex j belongs to a tree rooted at i. In this paper, we show that for a path,
the f_{ij}'s can be expressed as the products of Fibonacci numbers; for a
cycle, they are products of Fibonacci and Lucas numbers. The doubly
stochastic graph matrix is the matrix F=(f_{ij})/f, where f is the total
number of spanning rooted forests of G. F
provides a proximity measure for graph vertices. By the matrix forest theorem,
F^{-1}=I+L, where L is the Laplacian matrix of G. We show that for paths
and the so-called T-caterpillars, some diagonal entries of F (which provide a
measure of the self-connectivity of vertices) converge to \phi^{-1} or to
1-\phi^{-1}, where \phi is the golden ratio, as the number of vertices goes to
infinity. Thus, asymptotically, the corresponding vertices can be
metaphorically considered as "golden introverts" and "golden extroverts,"
respectively. This metaphor is reinforced by a Markov chain interpretation of
the doubly stochastic graph matrix, according to which F equals the overall
transition matrix of a random walk with a random number of steps on G.
Comment: 12 pages, 2 figures, 25 references. As accepted by Disc. Appl. Math.
(2007)
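A small exact check of the path case, relying only on the definitions above: by the matrix forest theorem, F can be computed as (I+L)^{-1}, here with exact rational arithmetic. For a path on n vertices, the endpoint entry F[0][0] equals F_{2n-1}/F_{2n}, a ratio of consecutive Fibonacci numbers, which tends to \phi^{-1} ≈ 0.618. The helper names `forest_matrix`, `path_edges` and `fib` are hypothetical.

```python
from fractions import Fraction


def forest_matrix(n, edges):
    """Return F = (I + L)^{-1} exactly; by the matrix forest theorem F = (f_ij) / f."""
    A = [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]
    for u, v in edges:  # build I + L for an unweighted graph
        A[u][u] += 1
        A[v][v] += 1
        A[u][v] -= 1
        A[v][u] -= 1
    # Gauss-Jordan elimination on the augmented matrix [A | I].
    M = [A[i] + [Fraction(int(i == j)) for j in range(n)] for i in range(n)]
    for c in range(n):
        piv = next(r for r in range(c, n) if M[r][c] != 0)
        M[c], M[piv] = M[piv], M[c]
        pv = M[c][c]
        M[c] = [x / pv for x in M[c]]
        for r in range(n):
            if r != c and M[r][c] != 0:
                factor = M[r][c]
                M[r] = [a - factor * b for a, b in zip(M[r], M[c])]
    return [row[n:] for row in M]


def path_edges(n):
    return [(i, i + 1) for i in range(n - 1)]


def fib(k):
    """Fibonacci numbers with fib(1) = fib(2) = 1."""
    a, b = 0, 1
    for _ in range(k):
        a, b = b, a + b
    return a


F = forest_matrix(10, path_edges(10))
endpoint = F[0][0]  # equals Fraction(fib(19), fib(20)) = 4181/6765
```

Every row and column of F sums to 1, which is the doubly stochastic property, because (I+L) maps the all-ones vector to itself.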
Do logarithmic proximity measures outperform plain ones in graph clustering?
We consider a number of graph kernels and proximity measures including
commute time kernel, regularized Laplacian kernel, heat kernel, exponential
diffusion kernel (also called "communicability"), etc., and the corresponding
distances as applied to clustering nodes in random graphs and several
well-known datasets. In the random graph model, the probability of an edge
depends on whether its two nodes belong to the same predefined class or to
different classes. It turns out that in most cases, logarithmic
measures (i.e., measures obtained by taking the logarithm of the proximities)
distinguish the underlying classes better than the "plain"
measures. A comparison in terms of reject curves of inter-class and intra-class
distances confirms this conclusion. A similar conclusion can be made for
several well-known datasets. A possible origin of this effect is that most
kernels have a multiplicative nature, while the distances used in
clustering algorithms are additive (cf. the triangle inequality). The
logarithmic transformation converts the former into the latter, since
\log(xy) = \log x + \log y. Moreover, some distances corresponding to the logarithmic measures
possess a meaningful cutpoint additivity property. In our experiments, the
leader is usually the logarithmic Communicability measure. However, we indicate
some more complicated cases in which other measures, typically, Communicability
and plain Walk, can be the winners.
Comment: 11 pages, 5 tables, 9 figures. Accepted for publication in the
Proceedings of the 6th International Conference on Network Analysis, May 26-28,
2016, Nizhny Novgorod, Russia
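To illustrate the additivity argument, a minimal sketch assuming the regularized Laplacian kernel K = (I + tL)^{-1} (one of the kernels listed above) and the proximity-to-distance map d_ij = (s_ii + s_jj)/2 - s_ij, applied both to S = K (plain) and to S = log K taken entrywise (logarithmic). The entries of K are positive for a connected graph, so the logarithm is defined. On a path, every interior vertex is a cutpoint. The graph, t = 1 and the function names are illustrative assumptions.

```python
import math


def regularized_laplacian_kernel(n, edges, t=1.0):
    """K = (I + t L)^{-1}, inverted by Gauss-Jordan elimination with partial pivoting."""
    M = [[float(i == j) for j in range(n)] + [float(i == j) for j in range(n)]
         for i in range(n)]
    for u, v in edges:  # add t * L to the left block
        M[u][u] += t
        M[v][v] += t
        M[u][v] -= t
        M[v][u] -= t
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        pv = M[c][c]
        M[c] = [x / pv for x in M[c]]
        for r in range(n):
            if r != c:
                factor = M[r][c]
                M[r] = [a - factor * b for a, b in zip(M[r], M[c])]
    return [row[n:] for row in M]


def distances(S):
    """d_ij = (s_ii + s_jj) / 2 - s_ij, the same map for both variants."""
    n = len(S)
    return [[(S[i][i] + S[j][j]) / 2 - S[i][j] for j in range(n)] for i in range(n)]


# Path 0-1-2-3: vertices 1 and 2 are cutpoints.
K = regularized_laplacian_kernel(4, [(0, 1), (1, 2), (2, 3)])
plain = distances(K)
logd = distances([[math.log(x) for x in row] for row in K])
```

For this path, the plain distance from 0 to 3 is 24/42 while the sum of plain distances along the path is 38/42, whereas the logarithmic distances satisfy d(0,3) = d(0,1) + d(1,2) + d(2,3) = \ln 13 up to rounding.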
Semantic distillation: a method for clustering objects by their contextual specificity
Techniques for data mining, latent semantic analysis, contextual search of
databases, etc. were developed long ago by computer scientists working on
information retrieval (IR). Experimental scientists from all disciplines who
have to analyse large collections of raw experimental data (astronomical,
physical, biological, etc.) have developed powerful methods for their
statistical analysis and for clustering, categorising, and classifying objects.
Finally, physicists have developed a theory of quantum measurement, unifying
the logical, algebraic, and probabilistic aspects of queries into a single
formalism. The purpose of this paper is twofold: first, to show that when
formulated at an abstract level, problems from IR, from statistical data
analysis, and from physical measurement theories are very similar and hence can
profitably be cross-fertilised; and second, to propose a novel method of
fuzzy hierarchical clustering, termed semantic distillation, which is strongly
inspired by the theory of quantum measurement and which we developed to
analyse raw data coming from various types of experiments on DNA arrays. We
illustrate the method by analysing DNA array experiments and clustering the
genes of the array according to their specificity.
Comment: Accepted for publication in Studies in Computational Intelligence,
Springer-Verlag