Neural Networks retrieving Boolean patterns in a sea of Gaussian ones
Restricted Boltzmann Machines are key tools in Machine Learning and are
described by the energy function of bipartite spin-glasses. From a statistical
mechanical perspective, they share the same Gibbs measure of Hopfield networks
for associative memory. In this equivalence, weights in the former play the role of
patterns in the latter. As Boltzmann machines usually require real weights to
be trained with gradient-descent-like methods, while Hopfield networks
typically store binary patterns to be able to retrieve them, the investigation of a
mixed Hebbian network, equipped with both real (e.g., Gaussian) and discrete
(e.g., Boolean) patterns naturally arises. We prove that, in the challenging
regime of a high storage of real patterns, where retrieval is forbidden, an
extra load of Boolean patterns can still be retrieved, as long as the ratio
between the overall load and the network size does not exceed a critical
threshold, which turns out to be the same as that of the standard
Amit-Gutfreund-Sompolinsky theory. Assuming replica symmetry, we study the case
of a low load of Boolean patterns combining the stochastic stability and
Hamilton-Jacobi interpolating techniques. The result can be extended to the
high load by a non-rigorous but standard replica computation argument.
Comment: 16 pages, 1 figure
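As an informal illustration of the retrieval idea in this abstract, the following sketch builds a small Hebbian network that stores both Boolean and Gaussian patterns and tries to recover one Boolean pattern from a corrupted copy. It is a toy, zero-temperature simulation under assumptions of our own (the function name, network size, pattern counts, noise level, and asynchronous update rule are all illustrative), and it uses a low overall load rather than the high-storage regime analysed in the paper.

```python
import random


def retrieve_boolean_pattern(n=200, n_boolean=3, n_gaussian=5,
                             flip_fraction=0.1, sweeps=5, seed=0):
    """Return the final Mattis overlap with the first Boolean pattern.

    All parameter values are illustrative assumptions, not taken from the paper.
    """
    rng = random.Random(seed)
    # Mixed pattern set: Boolean (+1/-1) and real-valued (standard Gaussian).
    boolean = [[rng.choice((-1, 1)) for _ in range(n)] for _ in range(n_boolean)]
    gaussian = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(n_gaussian)]
    patterns = boolean + gaussian
    target = boolean[0]

    # Start from a corrupted copy of the target Boolean pattern.
    state = [-x if rng.random() < flip_fraction else x for x in target]
    # Overlap of the current state with every stored pattern.
    overlaps = [sum(p[i] * state[i] for i in range(n)) / n for p in patterns]

    for _ in range(sweeps):
        for i in rng.sample(range(n), n):  # asynchronous updates in random order
            # Hebbian local field, written through the overlaps.
            field = sum(p[i] * m for p, m in zip(patterns, overlaps))
            # Remove the self-coupling contribution.
            field -= state[i] * sum(p[i] ** 2 for p in patterns) / n
            new_spin = 1 if field >= 0 else -1
            if new_spin != state[i]:
                # Keep the overlaps consistent with the flipped spin.
                for mu, p in enumerate(patterns):
                    overlaps[mu] += 2 * new_spin * p[i] / n
                state[i] = new_spin

    return sum(t * s for t, s in zip(target, state)) / n


if __name__ == "__main__":
    print(retrieve_boolean_pattern())
```

With these settings the overall load is 8/200 = 0.04, well below the Amit-Gutfreund-Sompolinsky critical value of roughly 0.138, so the overlap with the target pattern is expected to end close to one.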
Sound ranking algorithms for XML search
Ranking algorithms for XML should reflect the actual combined content and structure constraints of queries, while at the same time producing equal rankings for queries that are semantically equal. Ranking algorithms that produce different rankings for queries that are semantically equal are easily detected by tests on large databases: We call such algorithms not sound. We report the behavior of different approaches to ranking content-and-structure queries on pairs of queries for which we expect equal ranking results from the query semantics. We show that most of these approaches are not sound. Of the remaining approaches, only 3 adhere to the W3C XQuery Full-Text standard.
A probabilistic justification for using tf.idf term weighting in information retrieval
This paper presents a new probabilistic model of information retrieval. The most important modeling assumption made is that documents and queries are defined by an ordered sequence of single terms. This assumption is not made in well-known existing models of information retrieval, but is essential in the field of statistical natural language processing. Advances already made in statistical natural language processing will be used in this paper to formulate a probabilistic justification for using tf.idf term weighting. The paper shows that the new probabilistic interpretation of tf.idf term weighting might lead to a better understanding of statistical ranking mechanisms, for example by explaining how they relate to coordination level ranking. A pilot experiment on the TREC collection shows that the linguistically motivated weighting algorithm outperforms the popular BM25 weighting algorithm.
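For readers unfamiliar with tf.idf term weighting, here is a minimal sketch of one common textbook variant: each query term contributes its frequency in the document times the logarithm of the inverse document frequency. This is not the probabilistic derivation proposed in the paper; the function name `tfidf_scores`, the whitespace tokenisation, and the exact weighting formula are assumptions made for illustration.

```python
import math
from collections import Counter


def tfidf_scores(docs, query):
    """Score each document against the query with a plain tf * log(N / df) weight.

    Textbook variant used for illustration only (an assumption, not the paper's model).
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents containing each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    query_terms = query.lower().split()

    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = sum(
            tf[term] * math.log(n_docs / df[term])
            for term in query_terms
            if df[term] > 0  # ignore query terms absent from the collection
        )
        scores.append(score)
    return scores


if __name__ == "__main__":
    collection = ["xml retrieval ranking", "spam filtering email", "xml xml search"]
    print(tfidf_scores(collection, "xml ranking"))
```

In this small collection the first document matches both query terms, including the rarer term "ranking", so it scores higher than the third document, which only repeats the more common term "xml".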
Information Retrieval Models
Many applications that handle information on the internet would be completely
inadequate without the support of information retrieval technology. How would
we find information on the world wide web if there were no web search engines?
How would we manage our email without spam filtering? Much of the development
of information retrieval technology, such as web search engines and spam
filters, requires a combination of experimentation and theory. Experimentation
and rigorous empirical testing are needed to keep up with increasing volumes of
web pages and emails. Furthermore, experimentation and constant adaptation
of technology is needed in practice to counteract the effects of people that deliberately
try to manipulate the technology, such as email spammers. However,
if experimentation is not guided by theory, engineering becomes trial and error.
New problems and challenges for information retrieval come up constantly.
They cannot possibly be solved by trial and error alone. So, what is the theory
of information retrieval?
There is not one convincing answer to this question. There are many theories,
here called formal models, and each model is helpful for the development of
some information retrieval tools, but not so helpful for the development of others.
In order to understand information retrieval, it is essential to learn about these
retrieval models. In this chapter, some of the most important retrieval models
are gathered and explained in a tutorial style.
The Relativistic Hopfield network: rigorous results
The relativistic Hopfield model constitutes a generalization of the standard
Hopfield model that is derived by the formal analogy between the
statistical-mechanic framework embedding neural networks and the Lagrangian
mechanics describing a fictitious single-particle motion in the space of the
tuneable parameters of the network itself. In this analogy the cost-function of
the Hopfield model plays the role of the standard kinetic-energy term and its related
Mattis overlap (naturally bounded by one) plays the role of the velocity. The
Hamiltonian of the relativistic model, once Taylor-expanded, results in a
P-spin series with alternating signs: the attractive contributions enhance the
information-storage capabilities of the network, while the repulsive
contributions allow for an easier unlearning of spurious states, conferring
overall more robustness to the system as a whole. Here we do not explore the
information-processing capabilities of this generalized Hopfield network; rather, we
focus on its statistical mechanical foundation. In particular, relying on
Guerra's interpolation techniques, we prove the existence of the infinite
volume limit for the model free-energy and we give its explicit expression in
terms of the Mattis overlaps. By extremizing the free energy over the latter we
get the generalized self-consistent equations for these overlaps, as well as a
picture of criticality that is further corroborated by a fluctuation analysis.
These findings are in full agreement with the available previous results.
Comment: 11 pages, 1 figure
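The alternating-sign series mentioned in this abstract can be made concrete with a short worked expansion. The abstract does not state the Hamiltonian, so the square-root form below is an assumption based on the kinetic-energy analogy it describes; the symbols $N$ (network size), $\xi_i^\mu$ (patterns), $\sigma_i$ (spins) and $m_\mu$ (Mattis overlaps) are likewise introduced here for illustration.

```latex
% Assumed form of the relativistic cost function (not given in the abstract):
H_N(\sigma) = -N \sqrt{1 + \sum_{\mu} m_\mu^2},
\qquad
m_\mu = \frac{1}{N} \sum_{i=1}^{N} \xi_i^\mu \sigma_i .

% Using \sqrt{1+x} = 1 + \frac{x}{2} - \frac{x^2}{8} + \frac{x^3}{16} - \dots
% with x = \sum_\mu m_\mu^2:
H_N(\sigma) = -N
  - \frac{N}{2} \sum_{\mu} m_\mu^2
  + \frac{N}{8} \Big( \sum_{\mu} m_\mu^2 \Big)^{2}
  - \frac{N}{16} \Big( \sum_{\mu} m_\mu^2 \Big)^{3}
  + \dots
```

Under this assumption, the quadratic term is the standard Hopfield energy, the quartic term enters with the opposite sign and is therefore repulsive, and the sixth-order term is attractive again, which matches the alternating pattern the abstract describes.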
Regression and Learning to Rank Aggregation for User Engagement Evaluation
User engagement refers to the amount of interaction an instance (e.g., tweet,
news, and forum post) achieves. Ranking the items in social media websites
based on the amount of user participation in them can be used in different
applications, such as recommender systems. In this paper, we consider a tweet
containing a rating for a movie as an instance and focus on ranking the
instances of each user based on their engagement, i.e., the total number of
retweets and favorites they will gain.
For this task, we define several features which can be extracted from the
meta-data of each tweet. The features are partitioned into three categories:
user-based, movie-based, and tweet-based. We show that in order to obtain good
results, features from all categories should be considered. We exploit
regression and learning to rank methods to rank the tweets and propose to
aggregate the results of regression and learning to rank methods to achieve
better performance. We have run our experiments on an extended version of
the MovieTweeting dataset provided by ACM RecSys Challenge 2014. The results show
that the learning to rank approach outperforms most of the regression models and
that the combination can improve the performance significantly.
Comment: In Proceedings of the 2014 ACM Recommender Systems Challenge,
RecSysChallenge '1
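The abstract does not say how the regression and learning to rank outputs are aggregated. As one possible illustration only, the sketch below combines several score lists with a Borda count, a standard rank-aggregation technique that we substitute here as an assumption; the function name `borda_aggregate` and the example scores are hypothetical.

```python
def borda_aggregate(*score_lists):
    """Combine several score lists over the same items into one ranking.

    Borda count: in each list an item earns (n - 1 - position) points, and
    items are then ranked by total points. This is an assumed stand-in for
    the aggregation step, which the abstract does not specify.
    """
    n = len(score_lists[0])
    if any(len(scores) != n for scores in score_lists):
        raise ValueError("all score lists must cover the same items")

    points = [0] * n
    for scores in score_lists:
        # Item indices ordered from highest to lowest score in this list.
        order = sorted(range(n), key=lambda i: scores[i], reverse=True)
        for position, item in enumerate(order):
            points[item] += n - 1 - position

    # Final ranking: item indices sorted by total Borda points, best first.
    return sorted(range(n), key=lambda i: points[i], reverse=True)


if __name__ == "__main__":
    regression_scores = [0.9, 0.5, 0.2]      # hypothetical predicted engagement
    learning_to_rank_scores = [0.8, 0.1, 0.7]  # hypothetical ranker scores
    print(borda_aggregate(regression_scores, learning_to_rank_scores))
```

Working on rank positions rather than raw scores avoids having to put the regression outputs and the ranker scores on a common scale, which is one reason a positional method is a convenient choice for a sketch like this.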
A Database Approach to Content-based XML retrieval
This paper describes a first prototype system for content-based retrieval from XML data. The system's design supports both XPath queries and complex information retrieval queries based on a language modelling approach to information retrieval. Evaluation using the INEX benchmark shows that it is beneficial if the system is biased to retrieve large XML fragments over small fragments.
Data modelling for emergency response
Emergency response is one of the most demanding phases in disaster management. The fire brigade, paramedics, police and municipality are the organisations involved in the first response to the incident. They coordinate their work based on well-defined policies and procedures, but they also need the most complete and up-to-date information about the incident, which would allow reliable decision-making.
There is a variety of systems answering the needs of different emergency responders, but they have many drawbacks: the systems are developed for a specific sector; it is difficult to exchange information between systems; the systems offer too much or too little information, etc. Several systems have been developed to share information during emergencies but usually they maintain the information that is coming from field operations in an unstructured way.
This report presents a data model for organisation of dynamic data (operational and situational data) for emergency response. The model is developed within the RGI-239 project ‘Geographical Data Infrastructure for Disaster Management’ (GDI4DM).