Design Patterns for Fusion-Based Object Retrieval
We address the task of ranking objects (such as people, blogs, or verticals)
that, unlike documents, do not have direct term-based representations. To be
able to match them against keyword queries, evidence needs to be amassed from
documents that are associated with the given object. We present two design
patterns, i.e., general reusable retrieval strategies, which are able to
encompass most existing approaches. One strategy combines
evidence at the term level (early fusion), while the other does so at the
document level (late fusion). We demonstrate the generality of these patterns
by applying them to three different object retrieval tasks: expert finding,
blog distillation, and vertical ranking.
Comment: Proceedings of the 39th European Conference on Advances in
Information Retrieval (ECIR '17), 201
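The difference between the two fusion strategies can be illustrated with a minimal Python sketch. It is not the authors' implementation: the toy documents, the object associations, and the relative-frequency scoring function (a stand-in for a real retrieval model) are all assumptions made for illustration.

```python
from collections import Counter

# Hypothetical toy data: documents and the objects they are associated with.
DOCS = {
    "d1": "fusion retrieval models for expert finding",
    "d2": "expert finding with language models",
    "d3": "cooking pasta at home",
}
ASSOC = {"alice": ["d1", "d2"], "bob": ["d3"]}


def tokenize(text):
    return text.lower().split()


def score_terms(query_terms, term_counts):
    """Sum of relative frequencies of the query terms (stand-in for a real model)."""
    total = sum(term_counts.values())
    if total == 0:
        return 0.0
    return sum(term_counts[t] / total for t in query_terms)


def early_fusion(query, obj):
    """Term level: pool the terms of all associated documents, then score once."""
    pooled = Counter()
    for doc_id in ASSOC[obj]:
        pooled.update(tokenize(DOCS[doc_id]))
    return score_terms(tokenize(query), pooled)


def late_fusion(query, obj):
    """Document level: score each associated document, then sum the scores."""
    q = tokenize(query)
    return sum(score_terms(q, Counter(tokenize(DOCS[d]))) for d in ASSOC[obj])


def rank(query, fusion):
    """Rank all objects by the chosen fusion strategy, best first."""
    return sorted(ASSOC, key=lambda obj: fusion(query, obj), reverse=True)
```

Calling `rank("expert finding", early_fusion)` or `rank("expert finding", late_fusion)` orders the hypothetical objects by the respective strategy. The two functions differ only in where aggregation happens: before scoring (early) or after scoring (late).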
MultiLingMine 2016: Modeling, Learning and Mining for Cross/Multilinguality. In: Advances in Information Retrieval
The final publication is available at Springer via http://dx.doi.org/10.1007/978-3-319-30671-1_83
The increasing availability of text information coded in many different languages poses new challenges to modern information retrieval and mining systems in order to discover and exchange knowledge at a larger world-wide scale. The 1st International Workshop on Modeling, Learning and Mining for Cross/Multilinguality (dubbed MultiLingMine 2016) provides a venue to discuss research advances in cross-/multilingual related topics, focusing on new multidisciplinary research questions that have not been deeply investigated so far (e.g., in CLEF and related events relevant to CLIR). This includes ongoing theoretical and experimental work on novel representation models, learning algorithms, and knowledge-based methodologies for emerging trends and applications, such as cross-view cross-/multilingual information retrieval and document mining, (knowledge-based) translation-independent cross-/multilingual corpora, applications in social network contexts, and more.
Ienco, D.; Roche, M.; Romeo, S.; Rosso, P.; Tagarelli, A. (2016). MultiLingMine 2016: Modeling, Learning and Mining for Cross/Multilinguality. In: Advances in Information Retrieval. Springer Verlag (Germany). 869-873. doi:10.1007/978-3-319-30671-1_83
Third International Workshop on Gamification for Information Retrieval (GamifIR'16)
Stronger engagement and greater participation are often crucial
to reaching a goal or solving an issue, such as the emerging
employee engagement crisis, insufficient knowledge sharing,
and chronic procrastination. In many cases we need and
search for tools to beat procrastination or to change people’s
habits. Gamification is an approach that learns from games, which
are often fun, creative and engaging. In principle, it is about understanding
games and applying game design elements in
non-gaming environments. This offers possibilities for wide-ranging
improvements, for example more accurate work, better
retention rates and more cost-effective solutions, by making
the motivation to participate more intrinsic than with conventional
methods. In the context of Information Retrieval (IR)
it is not hard to imagine that many tasks could benefit from
gamification techniques. Besides several manual annotation
tasks of data sets for IR research, user participation is important
in order to gather implicit or even explicit feedback
to feed the algorithms. Gamification, however, comes with
its own challenges and its adoption in IR is still in its infancy.
Given the enormous response to the first and second
GamifIR workshops that were both co-located with ECIR,
and the broad range of topics discussed, we organized
the third workshop at SIGIR 2016 to address a range of
emerging challenges and opportunities.
A Vertical PRF Architecture for Microblog Search
In microblog retrieval, query expansion can be essential to obtain good
search results due to the short size of queries and posts. Since information in
microblogs is highly dynamic, an up-to-date index coupled with pseudo-relevance
feedback (PRF) with an external corpus has a higher chance of retrieving more
relevant documents and improving ranking. In this paper, we focus on the
research question: how can we reduce the query expansion computational cost
while maintaining the same retrieval precision as standard PRF? To this end, we
propose to accelerate the query expansion step of pseudo-relevance feedback.
The hypothesis is that expanding the query with an expansion corpus organized
into verticals will lead to a more efficient query expansion process and
improved retrieval effectiveness. Thus, the proposed query expansion method
uses a distributed search architecture and resource selection algorithms to
provide an efficient query expansion process. Experiments on the TREC Microblog
datasets show that the proposed approach can match or outperform standard PRF
in MAP and NDCG@30, with a computational cost that is three orders of magnitude
lower.
Comment: To appear in ICTIR 201
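The idea of restricting pseudo-relevance feedback to a few selected verticals can be sketched in Python as follows. This is only an illustration under stated assumptions: the vertical names, the toy posts, the term-overlap resource selection, and the frequency-based choice of expansion terms are all hypothetical and are not taken from the paper.

```python
from collections import Counter

# Hypothetical expansion corpus, already partitioned into verticals.
VERTICALS = {
    "sports": ["world cup final goal", "cup final penalty shootout"],
    "tech": ["new phone launch event", "phone battery review"],
}


def tokens(text):
    return text.lower().split()


def select_verticals(query, verticals, k=1):
    """Resource selection (assumed heuristic): keep the k verticals whose
    documents contain the most query-term occurrences."""
    q = set(tokens(query))

    def overlap(name):
        return sum(1 for doc in verticals[name] for t in tokens(doc) if t in q)

    return sorted(verticals, key=overlap, reverse=True)[:k]


def expand_query(query, verticals, k=1, n_docs=2, n_terms=2):
    """PRF restricted to the selected verticals instead of the full corpus."""
    q = tokens(query)
    qset = set(q)
    # Only documents from the selected verticals are searched.
    candidates = [
        doc for name in select_verticals(query, verticals, k)
        for doc in verticals[name]
    ]
    # Pseudo-relevant set: top documents by query-term overlap.
    top = sorted(
        candidates,
        key=lambda d: sum(t in qset for t in tokens(d)),
        reverse=True,
    )[:n_docs]
    # Expansion terms: most frequent non-query terms in the pseudo-relevant set.
    counts = Counter(t for d in top for t in tokens(d) if t not in qset)
    return q + [t for t, _ in counts.most_common(n_terms)]
```

For instance, `expand_query("cup final", VERTICALS)` searches only the selected vertical, so the feedback step touches a fraction of the expansion corpus. That restriction is where the efficiency gain would come from in this sketch.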
Identifying Clickbait: A Multi-Strategy Approach Using Neural Networks
Online media outlets, in a bid to expand their reach and subsequently
increase revenue through ad monetisation, have begun adopting clickbait
techniques to lure readers into clicking on articles that fail to fulfill
the promise made by the headline. Traditional methods for clickbait detection
have relied heavily on feature engineering which, in turn, is dependent on the
dataset it is built for. The application of neural networks for this task has
only been explored partially. We propose a novel approach considering all
information found in a social media post. We train a bidirectional LSTM with an
attention mechanism to learn the extent to which a word contributes to the
post's clickbait score in a differential manner. We also employ a Siamese net
to capture the similarity between source and target information. Information
gleaned from images has not been considered in previous approaches. We learn
image embeddings from large amounts of data using Convolutional Neural Networks
to add another layer of complexity to our model. Finally, we concatenate the
outputs from the three separate components and feed them as input to a fully
connected layer. We conduct experiments over a test corpus of 19538 social
media posts, attaining an F1 score of 65.37% on the dataset, bettering the
previous state of the art as well as other proposed approaches, feature
engineering or otherwise.
Comment: Accepted at SIGIR 2018 as Short Paper
Application and evaluation of multi-dimensional diversity
Traditional information retrieval (IR) systems mostly focus on finding documents relevant to queries without considering the other documents in the search results. This approach works quite well in general; however, it also means that the documents returned in a result list can be very similar to each other, which can be an undesirable system property from a user's perspective. Creating IR systems that support search result diversification presents many challenges; indeed, current evaluation measures and methodologies are still unclear with regard to specific search domains and dimensions of diversity. In this paper, we highlight various issues in relation to image search diversification for the ImageCLEF 2009 collection and tasks. Furthermore, we discuss the problem of defining clusters/subtopics by mixing diversity dimensions regardless of which dimension is important in relation to the information need or circumstances. We also introduce possible applications and evaluation metrics for diversity-based retrieval.
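As one example of what a diversity-oriented measure can look like, the Python sketch below computes cluster recall at rank k, i.e. the fraction of distinct clusters (subtopics) covered by the top k results. The abstract does not say which metrics the paper adopts, so this choice, along with the image identifiers and cluster labels, is an assumption made for illustration.

```python
def cluster_recall_at_k(ranking, doc_clusters, k):
    """Fraction of all known clusters that appear among the top-k results.

    ranking      -- list of document ids, best first
    doc_clusters -- dict mapping document id to the set of clusters it covers
    """
    all_clusters = set().union(*doc_clusters.values())
    if not all_clusters:
        return 0.0
    covered = set()
    for doc_id in ranking[:k]:
        covered |= doc_clusters.get(doc_id, set())
    return len(covered) / len(all_clusters)


# Hypothetical image results labelled along a single diversity dimension.
CLUSTERS = {
    "img1": {"beach"},
    "img2": {"beach"},
    "img3": {"city"},
    "img4": {"forest"},
}

relevance_only = ["img1", "img2", "img3", "img4"]  # near-duplicates on top
diversified = ["img1", "img3", "img4", "img2"]     # one image per cluster first
```

Comparing `cluster_recall_at_k(relevance_only, CLUSTERS, 2)` with `cluster_recall_at_k(diversified, CLUSTERS, 2)` shows how a ranking with the same documents can cover more clusters early in the list. Note that the measure depends entirely on how clusters are defined, which is the issue the abstract raises about mixing diversity dimensions.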