Can Google Trends search queries contribute to risk diversification?
Portfolio diversification and active risk management are essential parts of
financial analysis which became even more crucial (and questioned) during and
after the years of the Global Financial Crisis. We propose a novel approach to
portfolio diversification using the information of searched items on Google
Trends. The diversification is based on the idea that the popularity of a stock,
measured by search queries, is correlated with the stock's riskiness. We penalize
the popular stocks by assigning them lower portfolio weights and we bring
forward the less popular, or peripheral, stocks to decrease the total riskiness
of the portfolio. Our results indicate that such a strategy dominates both the
benchmark index and the uniformly weighted portfolio, in-sample and
out-of-sample.
Comment: 11 pages, 3 figures
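The weighting idea in this abstract can be illustrated with a minimal sketch. The abstract does not give the weighting formula, so the power-law form `volume ** (-alpha)`, the parameter `alpha`, the function name, and the ticker symbols below are all assumptions made for illustration, not the authors' method.

```python
def popularity_penalized_weights(search_volumes, alpha=1.0):
    """Return weights proportional to volume ** (-alpha).

    Assumption: a power-law penalty; alpha = 0 gives uniform weights,
    larger alpha shifts more weight towards less-searched stocks.
    """
    if any(v <= 0 for v in search_volumes.values()):
        raise ValueError("search volumes must be positive")
    raw = {ticker: v ** (-alpha) for ticker, v in search_volumes.items()}
    total = sum(raw.values())
    # Normalize so the weights sum to one.
    return {ticker: r / total for ticker, r in raw.items()}


# Hypothetical search volumes for three hypothetical tickers.
volumes = {"AAA": 100.0, "BBB": 50.0, "CCC": 25.0}
weights = popularity_penalized_weights(volumes)
```

With these made-up volumes, the least-searched ticker receives the largest weight and the most-searched ticker the smallest.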
A Vertical PRF Architecture for Microblog Search
In microblog retrieval, query expansion can be essential to obtain good
search results due to the short size of queries and posts. Since information in
microblogs is highly dynamic, an up-to-date index coupled with pseudo-relevance
feedback (PRF) with an external corpus has a higher chance of retrieving more
relevant documents and improving ranking. In this paper, we focus on the
research question: how can we reduce the query expansion computational cost
while maintaining the same retrieval precision as standard PRF? Therefore, we
propose to accelerate the query expansion step of pseudo-relevance feedback.
The hypothesis is that using an expansion corpus organized into verticals for
expanding the query will lead to a more efficient query expansion process and
improved retrieval effectiveness. Thus, the proposed query expansion method
uses a distributed search architecture and resource selection algorithms to
provide an efficient query expansion process. Experiments on the TREC Microblog
datasets show that the proposed approach can match or outperform standard PRF
in MAP and NDCG@30, with a computational cost that is three orders of magnitude
lower.
Comment: To appear in ICTIR 201
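For readers unfamiliar with the baseline this abstract compares against, the following is a toy sketch of standard pseudo-relevance feedback only. It does not implement the vertical architecture or resource selection the paper proposes. The term-count scoring, the function names, and the three-document corpus are assumptions chosen to keep the example short.

```python
from collections import Counter


def tokenize(text):
    return text.lower().split()


def search(query_terms, corpus, k):
    """Rank documents by total query-term occurrences; return the top k ids."""
    scored = []
    for doc_id, text in corpus.items():
        counts = Counter(tokenize(text))
        score = sum(counts[t] for t in query_terms)
        if score > 0:
            scored.append((score, doc_id))
    # Highest score first; ties broken by document id for determinism.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for _, doc_id in scored[:k]]


def expand_query(query, corpus, feedback_docs=2, expansion_terms=2):
    """Append the most frequent non-query terms from the top-ranked documents."""
    query_terms = tokenize(query)
    top_docs = search(query_terms, corpus, feedback_docs)
    counts = Counter()
    for doc_id in top_docs:
        counts.update(t for t in tokenize(corpus[doc_id]) if t not in query_terms)
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return query_terms + [term for term, _ in ranked[:expansion_terms]]


# Hypothetical expansion corpus.
corpus = {
    "d1": "earthquake japan tsunami warning",
    "d2": "earthquake tsunami coast evacuation",
    "d3": "football match tonight",
}
expanded = expand_query("earthquake", corpus)
```

In a vertical setup as described in the abstract, the expansion corpus would be partitioned and only selected partitions searched. That step is deliberately left out here.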
Sub-word indexing and blind relevance feedback for English, Bengali, Hindi, and Marathi IR
The Forum for Information Retrieval Evaluation (FIRE) provides document collections, topics, and relevance assessments for information retrieval (IR) experiments on Indian languages. Several research questions are explored in this paper: (1) how to create a simple, language-independent corpus-based stemmer; (2) how to identify sub-words and which types of sub-words are suitable as indexing units; and (3) how to apply blind relevance feedback on sub-words and how feedback term selection is affected by the type of the indexing unit. More than 140 IR experiments are conducted using the BM25 retrieval model on the topic titles and descriptions (TD) for the FIRE 2008 English, Bengali, Hindi, and Marathi document collections. The major findings are: The corpus-based stemming approach is effective as a knowledge-light
term conflation step and useful in case of few language-specific resources. For English, the corpus-based
stemmer performs nearly as well as the Porter stemmer and significantly better than the baseline of indexing words when combined with query expansion. In combination with blind relevance feedback, it also performs significantly better than the baseline for Bengali and Marathi IR.
Sub-words such as consonant-vowel sequences and word prefixes can yield similar or better performance in comparison to word indexing. There is no best performing method for all languages. For English, indexing using the Porter stemmer performs best; for Bengali and Marathi, overlapping 3-grams obtain the best result; and for Hindi, 4-prefixes yield the highest MAP. However, in combination with blind relevance feedback using 10 documents and 20 terms, 6-prefixes for English and 4-prefixes for Bengali, Hindi, and Marathi IR yield the highest MAP. Sub-word identification is a general case of decompounding. It results in one or more index terms for a single word form and increases the number of index terms but decreases their average length. The corresponding retrieval experiments show that relevance feedback on sub-words benefits from
selecting a larger number of index terms in comparison with retrieval on word forms. Similarly, selecting the number of relevance feedback terms depending on the ratio of word vocabulary size to sub-word vocabulary size almost always slightly increases information retrieval effectiveness
compared to using a fixed number of terms for different languages.
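Two of the sub-word indexing units named in this abstract, overlapping character n-grams and k-prefixes, can be sketched as follows. The function names and the handling of short words are assumptions. The sketch also slices by Unicode code point, which may not match how the paper segments Bengali, Hindi, or Marathi text (for example, by grapheme cluster).

```python
def char_ngrams(word, n=3):
    """Overlapping character n-grams; words no longer than n are kept whole."""
    if len(word) <= n:
        return [word]
    return [word[i:i + n] for i in range(len(word) - n + 1)]


def k_prefix(word, k=4):
    """First k characters of the word (the whole word if it is shorter)."""
    return word[:k]


trigrams = char_ngrams("retrieval", 3)
prefix = k_prefix("retrieval", 4)
```

A single word form yields several shorter index terms under n-gram indexing and exactly one under prefix indexing. This is consistent with the abstract's remark that sub-word indexing increases the number of index terms while decreasing their average length.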
Incremental View Maintenance For Collection Programming
In the context of incremental view maintenance (IVM), delta query derivation
is an essential technique for speeding up the processing of large, dynamic
datasets. The goal is to generate delta queries that, given a small change in
the input, can update the materialized view more efficiently than via
recomputation. In this work we propose the first solution for the efficient
incrementalization of positive nested relational calculus (NRC+) on bags (with
integer multiplicities). More precisely, we model the cost of NRC+ operators
and classify queries as efficiently incrementalizable if their delta has a
strictly lower cost than full re-evaluation. Then, we identify IncNRC+, a large
fragment of NRC+ that is efficiently incrementalizable, and we provide a
semantics-preserving translation that takes any NRC+ query to a collection of
IncNRC+ queries. Furthermore, we prove that incremental maintenance for NRC+ is
within the complexity class NC0 and we showcase how recursive IVM, a technique
that has provided significant speedups over traditional IVM in the case of flat
queries [25], can also be applied to IncNRC+.
Comment: 24 pages (12 pages plus appendix)
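The general idea of a delta query on bags with integer multiplicities can be illustrated with a small sketch. It covers only a flat equi-join where one input changes, not the nested NRC+ setting or the translation to IncNRC+ described in the abstract. The dictionary representation and all function names are assumptions.

```python
def bag_union(a, b):
    """Add multiplicities; entries whose multiplicity reaches zero are dropped."""
    out = dict(a)
    for item, mult in b.items():
        total = out.get(item, 0) + mult
        if total:
            out[item] = total
        else:
            out.pop(item, None)
    return out


def join(r, s):
    """Equi-join on the first component; multiplicities multiply."""
    out = {}
    for (k1, x), m1 in r.items():
        for (k2, y), m2 in s.items():
            if k1 == k2:
                key = (k1, x, y)
                out[key] = out.get(key, 0) + m1 * m2
    return out


def delta_join_left(delta_r, s):
    """Delta of join(r, s) when only r changes: join the change with s."""
    return join(delta_r, s)


# Bags map tuples to integer multiplicities; negative values encode deletions.
R = {("a", 1): 2, ("b", 2): 1}
S = {("a", "x"): 1, ("b", "y"): 3}
delta_R = {("a", 1): -1, ("b", 5): 1}

view = join(R, S)
incremental = bag_union(view, delta_join_left(delta_R, S))
recomputed = join(bag_union(R, delta_R), S)
```

Here the incremental path touches only the changed tuples of `R`, while the recomputed path re-joins the whole updated input. Both produce the same bag.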
Echinobase: an expanding resource for echinoderm genomic information
Echinobase, a web-accessible information system of diverse genomics and biological data for the echinoderm clade, grew out of SpBase, the first echinoderm genome project for the sea urchin Strongylocentrotus purpuratus. Sea urchins and their relatives are utilitarian research models in fields ranging from marine biology to developmental biology and gene regulatory systems. Echinobase is a user-friendly web interface that links an array of biological data that would otherwise have been tedious and frustrating for researchers to extract and organize. The system hosts a powerful gene search engine, a genomics browser and other bioinformatics tools to investigate genomics and high-throughput data. The Echinobase information system now serves genomic information for eight echinoderm species: S. purpuratus, Strongylocentrotus franciscanus, Allocentrotus fragilis, Lytechinus variegatus, Patiria miniata, Parastichopus parvimensis, Ophiothrix spiculata and Eucidaris tribuloides. Herein lies a description of the web information system, genomics data types and content hosted by Echinobase.org. The goal of Echinobase is to connect genomic information to various experimental data and accelerate research in the fields of molecular biology, developmental processes, gene regulatory networks and, more recently, engineering biological systems.
Database URL: http://www.echinobase.or