Hybrid Search: Effectively Combining Keywords and Semantic Searches
This paper describes hybrid search, a search method supporting both document and knowledge retrieval via the flexible combination of ontology-based search and keyword-based matching. Hybrid search copes smoothly with the lack of semantic coverage of document content, which is one of the main limitations of current semantic search methods. In this paper we define hybrid search formally, discuss its compatibility with the current semantic trends and present a reference implementation: K-Search. We then show how the method outperforms both keyword-based search and pure semantic search in terms of precision and recall in a set of experiments performed on a collection of about 18,000 technical documents. Experiments carried out with professional users show that users understand the paradigm and consider it very powerful and reliable. K-Search has been ported to two applications released at Rolls-Royce plc for searching technical documentation about jet engines.
Describing Papers and Reviewers' Competences by Taxonomy of Keywords
This article focuses on the importance of the precise calculation of
similarity factors between papers and reviewers for performing a fair and
accurate automatic assignment of reviewers to papers. It suggests that papers
and reviewers' competences should be described by a taxonomy of keywords so that
the implied hierarchical structure allows similarity measures to take into
account not only the number of exactly matching keywords but also, for
non-matching ones, how semantically close they are. The paper also
suggests a similarity measure derived from the well-known and widely used
Dice's coefficient, adapted so that it can also be applied between sets
whose elements are semantically related to each other (as concepts in a taxonomy
are). It allows a non-zero similarity factor to be accurately calculated
between a paper and a reviewer even if they do not share any keyword.
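
The idea of a Dice-style measure that tolerates non-matching but related keywords can be illustrated with a minimal sketch. This is not the measure defined in the paper: the toy taxonomy `PARENT`, the path-based `node_similarity`, and the best-match aggregation in `soft_dice` are all assumptions chosen for illustration. With a similarity that is 1 for identical keywords and 0 otherwise, `soft_dice` reduces to the classical Dice coefficient 2|A∩B| / (|A| + |B|).

```python
from typing import Dict, List, Optional, Set

# Hypothetical toy taxonomy: child -> parent (None marks the root).
PARENT: Dict[str, Optional[str]] = {
    "computer science": None,
    "databases": "computer science",
    "machine learning": "computer science",
    "query optimization": "databases",
    "neural networks": "machine learning",
    "clustering": "machine learning",
}


def path_to_root(node: str) -> List[str]:
    """Return the nodes from `node` up to the root, `node` included."""
    path = []
    current: Optional[str] = node
    while current is not None:
        path.append(current)
        current = PARENT[current]
    return path


def node_similarity(a: str, b: str) -> float:
    """Assumed similarity: 1 / (1 + number of tree edges between a and b)."""
    steps_from_b = {n: i for i, n in enumerate(path_to_root(b))}
    for steps_from_a, n in enumerate(path_to_root(a)):
        if n in steps_from_b:  # first shared node is the lowest common ancestor
            return 1.0 / (1 + steps_from_a + steps_from_b[n])
    return 0.0  # only reachable if the taxonomy is not a single tree


def soft_dice(xs: Set[str], ys: Set[str]) -> float:
    """Dice-like score where each keyword is credited with its best match."""
    if not xs or not ys:
        return 0.0
    best_x = sum(max(node_similarity(x, y) for y in ys) for x in xs)
    best_y = sum(max(node_similarity(y, x) for x in xs) for y in ys)
    return (best_x + best_y) / (len(xs) + len(ys))


paper = {"neural networks"}
close_reviewer = {"clustering"}          # sibling concept
far_reviewer = {"query optimization"}    # different branch
print(soft_dice(paper, close_reviewer), soft_dice(paper, far_reviewer))
```

In this sketch a paper and a reviewer with no keyword in common still receive a non-zero score, and the score is higher when their keywords sit closer together in the taxonomy.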
Beyond Keywords
The potential of social media to give insight into the dynamic evolution of public conversations, and into their reactive and constitutive role in political activities, has to date been underdeveloped. While topic modeling can give static insight into the structure of a conversation, and keyword volume tracking can show how engagement with a specific idea varies over time, there is a need for a method of analysis able to understand how conversations about societal values evolve and react to events in the world by incorporating new ideas and relating them to existing themes. In this article, we propose a method for analyzing social media messages that formalizes the structure of public conversations and allows the sociologist to study the evolution of public discourse in a rigorous, replicable, and data-driven fashion. This approach may be useful to those studying the social construction of meaning, the origins of factionalism and internecine conflict, or boundary-setting and group-identification exercises and has potential implications. Keywords: social media, framing, public conversation, analysis tools, visualization
Comparing the hierarchy of keywords in on-line news portals
The tagging of on-line content with informative keywords is a widespread
phenomenon from scientific article repositories through blogs to on-line news
portals. In most cases, the tags on a given item are free words chosen
by the authors independently. Therefore, relations among keywords in a
collection of news items are unknown. However, in most cases the topics and
concepts described by these keywords form a latent hierarchy, with the
more general topics and categories at the top, and more specialised ones at the
bottom. Here we apply a recent, co-occurrence-based tag hierarchy extraction
method to sets of keywords obtained from four different on-line news portals.
The resulting hierarchies show substantial differences not just in the topics
rendered as important (being at the top of the hierarchy) or of less interest
(categorised low in the hierarchy), but also in the underlying network
structure. This reveals discrepancies between the plausible keyword association
frameworks in the studied news portals.
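
To make the notion of a co-occurrence-based tag hierarchy concrete, here is a minimal sketch. The abstract does not specify the extraction method, so this is not the method used in the study: it is a simple subsumption heuristic in which a tag is attached to the least frequent tag that is more frequent than itself and co-occurs with it on at least a fraction `threshold` of its items. The function name `tag_hierarchy`, the threshold value, and the sample data are all hypothetical.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, List, Set


def tag_hierarchy(items: List[Set[str]], threshold: float = 0.8) -> Dict[str, str]:
    """Return a child -> parent mapping built from tag co-occurrence counts."""
    count: Counter = Counter()   # number of items carrying each tag
    pair: Counter = Counter()    # number of items carrying both tags of a pair
    for tags in items:
        count.update(tags)
        for a, b in combinations(sorted(tags), 2):
            pair[(a, b)] += 1

    def co(a: str, b: str) -> int:
        # Pairs are stored with their tags in sorted order.
        return pair[(min(a, b), max(a, b))]

    parents: Dict[str, str] = {}
    for child in count:
        candidates = [
            c for c in count
            if count[c] > count[child]
            and co(child, c) / count[child] >= threshold
        ]
        if candidates:
            # Prefer the most specific qualifying tag; break ties by name.
            parents[child] = min(candidates, key=lambda c: (count[c], c))
    return parents


news_items = [
    {"sport", "football"},
    {"sport", "football", "transfer"},
    {"sport", "tennis"},
    {"sport"},
    {"politics", "election"},
    {"politics"},
]
print(tag_hierarchy(news_items))
```

Tags without a qualifying parent (here `sport` and `politics`) end up as roots, which mirrors the picture of general topics at the top and specialised ones below.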
TK: The Twitter Top-K Keywords Benchmark
Information retrieval from textual data focuses on the construction of
vocabularies that contain weighted term tuples. Such vocabularies can then be
exploited by various text analysis algorithms to extract new knowledge, e.g.,
top-k keywords, top-k documents, etc. Top-k keywords are casually used for
various purposes, are often computed on-the-fly, and thus must be efficiently
computed. To compare competing weighting schemes and database implementations,
benchmarking is customary. To the best of our knowledge, no benchmark currently
addresses these problems. Hence, in this paper, we present a top-k keywords
benchmark, TK, which features a real tweet dataset and queries with
various complexities and selectivities. TK helps evaluate weighting
schemes and database implementations in terms of computing performance. To
illustrate TK's relevance and genericity, we successfully performed
tests on the TF-IDF and Okapi BM25 weighting schemes, on the one hand, and on
different relational (Oracle, PostgreSQL) and document-oriented (MongoDB)
database implementations, on the other hand.
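
The two weighting schemes named above can be sketched in a few lines of plain Python. This is an in-memory illustration only, not the benchmark's queries or database implementations. The whitespace tokenizer, the choice to rank keywords by their weight summed over all documents, the BM25 parameters `k1=1.2` and `b=0.75`, and the non-negative IDF variant `log(1 + (N - df + 0.5) / (df + 0.5))` are assumptions.

```python
import math
from collections import Counter
from typing import List, Tuple


def tokenize(text: str) -> List[str]:
    """Naive whitespace tokenizer (an assumption, not part of the benchmark)."""
    return text.lower().split()


def document_frequencies(docs: List[List[str]]) -> Counter:
    """Count in how many documents each term appears."""
    df: Counter = Counter()
    for tokens in docs:
        df.update(set(tokens))
    return df


def rank(scores: Counter, k: int) -> List[Tuple[str, float]]:
    """Sort by descending score, then alphabetically, and keep k entries."""
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:k]


def top_k_tfidf(texts: List[str], k: int) -> List[Tuple[str, float]]:
    docs = [tokenize(t) for t in texts]
    n = len(docs)
    df = document_frequencies(docs)
    scores: Counter = Counter()
    for tokens in docs:
        for term, tf in Counter(tokens).items():
            scores[term] += tf * math.log(n / df[term])
    return rank(scores, k)


def top_k_bm25(texts: List[str], k: int, k1: float = 1.2, b: float = 0.75) -> List[Tuple[str, float]]:
    docs = [tokenize(t) for t in texts]
    n = len(docs)
    if n == 0:
        return []
    avg_len = sum(len(d) for d in docs) / n
    if avg_len == 0:
        return []
    df = document_frequencies(docs)
    scores: Counter = Counter()
    for tokens in docs:
        length_norm = k1 * (1 - b + b * len(tokens) / avg_len)
        for term, tf in Counter(tokens).items():
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            scores[term] += idf * tf * (k1 + 1) / (tf + length_norm)
    return rank(scores, k)


tweets = [
    "storm hits the coast",
    "the storm weakens",
    "the election results",
]
print(top_k_tfidf(tweets, 3))
print(top_k_bm25(tweets, 3))
```

A term that occurs in every document, such as `the` in the sample data, gets a TF-IDF weight of zero and therefore drops out of the top-k list.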
Predicting financial markets with Google Trends and not so random keywords
We check the claims that data from Google Trends contain enough data to
predict future financial index returns. We first discuss the many subtle (and
less subtle) biases that may affect the backtest of a trading strategy,
particularly when based on such data. Expectedly, the choice of keywords is
crucial: by using an industry-grade backtesting system, we verify that random
finance-related keywords do not contain more exploitable predictive
information than random keywords related to illnesses, classic cars and arcade
games. We however show that other keywords applied on suitable assets yield
robustly profitable strategies, thereby confirming the intuition of Preis et
al. (2013). Comment: 8 pages, 4 figures. First names and last names swapped
