15 research outputs found

    A latent variable ranking model for content-based retrieval

    Get PDF
    34th European Conference on IR Research, ECIR 2012, Barcelona, Spain, April 1-5, 2012. Proceedings. Since their introduction, ranking SVM models [11] have become a powerful tool for training content-based retrieval systems. All we need to train a model are retrieval examples in the form of triplet constraints, i.e. examples specifying that, relative to some query, a database item a should be ranked higher than database item b. Such constraints can be obtained from feedback of users of the retrieval system. Most previous ranking models learn either a global combination of elementary similarity functions or a combination defined with respect to a single database item. Instead, we propose a “coarse to fine” ranking model: given a query, we first compute a distribution over “coarse” classes and then use the linear combination that has been optimized for queries of that class. These coarse classes are hidden and need to be induced by the training algorithm. We propose a latent variable ranking model that induces both the latent classes and the weights of the linear combination for each class from ranking triplets. Our experiments on two large image datasets and a text retrieval dataset show the advantages of our model over learning a global combination as well as a combination for each test point (i.e. the transductive setting). Furthermore, compared to the transductive approach, our model has a clear computational advantage since it does not need to be retrained for each test query. Spanish Ministry of Science and Innovation (JCI-2009-04240); EU PASCAL2 Network of Excellence (FP7-ICT-216886)
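    The triplet constraints described above are typically enforced with a margin-based hinge loss. The following is a minimal sketch of that objective, not the paper's latent-class model; the elementwise query-item feature construction is an illustrative assumption.

```python
import numpy as np

def triplet_hinge_loss(w, q, a, b, margin=1.0):
    """Hinge loss for one ranking constraint: given query q, item a
    should score higher than item b under weight vector w.
    Scores are linear combinations of query-item similarity features
    (here illustrated as the elementwise product of the vectors)."""
    s_a = np.dot(w, q * a)  # score of item a for query q
    s_b = np.dot(w, q * b)  # score of item b for query q
    return max(0.0, margin - (s_a - s_b))

# Toy example with three-dimensional similarity features.
w = np.array([0.5, 1.0, 0.2])
q = np.array([1.0, 0.0, 1.0])
a = np.array([0.9, 0.1, 0.8])  # item that should rank higher
b = np.array([0.1, 0.9, 0.1])  # item that should rank lower
loss = triplet_hinge_loss(w, q, a, b)
```

    A loss of zero means the constraint is satisfied with the required margin; violated constraints contribute a positive penalty that a learner would minimize over all observed triplets.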

    The Information Needs of Mobile Searchers: A Framework

    Get PDF
    The growing use of Internet-connected mobile devices demands that we reconsider search user interface design in light of the context and information needs specific to mobile users. In this paper, the authors present a framework of mobile information needs, juxtaposing search motives (casual, lookup, learn, and investigate) with search types (informational, geographic, personal information management, and transactional)

    Podify: a podcast streaming platform with automatic logging of user behaviour for academic research

    Get PDF
    Podcasts are spoken documents that, in recent years, have gained widespread popularity. Despite the growing research interest in this domain, conducting user studies remains challenging due to the lack of datasets that include user behaviour. In particular, there is a need for a podcast streaming platform that reduces the overhead of conducting user studies. To address these issues, in this work, we present Podify, the first web-based platform for podcast streaming and consumption specifically designed for research. The platform closely resembles existing streaming systems to give users a high level of familiarity on both desktop and mobile. A catalogue of podcast episodes can be easily created via RSS feeds. The platform also offers Elasticsearch-based indexing and search that is highly customisable, allowing research and experimentation in podcast search. Users can manually curate playlists of podcast episodes for consumption. In addition to mechanisms for collecting explicit feedback from users (i.e., liking and disliking behaviour), Podify automatically collects implicit feedback (i.e., all user interactions). Users' behaviour can be easily exported to a readable format for subsequent experimental analysis. A demonstration of the platform is available at https://youtu.be/k9Z5w_KKHr8, with the code and documentation available at https://github.com/NeuraSearch/Podify

    E-Learning Courses Evaluation on the Basis of Trainees' Feedback on Open Questions Text Analysis

    Get PDF
    Life-long learning is a necessity associated with the requirements of the fourth industrial revolution. Although distance online education played a major role in the evolution of the modern education system, its share grew dramatically because of the COVID-19 pandemic outbreak and the social distancing measures that were imposed. However, the quick and extensive adoption of online learning tools also highlighted the multidimensional weaknesses of online education and the needs that arise when considering such practices. To this end, the ease of collecting digital data, as well as the overall evolution of data analytics, enables researchers, and by extension educators, to systematically evaluate the pros and cons of such systems. For instance, advanced data mining methods can be used to find potential areas of concern or to confirm elements of excellence. In this work, we used text analysis methods on data that emerged from participants' feedback in online lifelong learning programmes for professional development. We analysed 1890 Greek text-based answers of participants to open evaluation questions using standard text analysis processes. We produced 7-gram tokens from the words in the texts, from which we constructed meaningful sentences and characterised them as positive or negative. We introduced a new metric, called acceptance grade, to quantify how positively or negatively each sentence evaluates the online courses. We then based our evaluation on the top 10 sentences of each category (positive, negative). Validation of the results via two external experts and data triangulation showed an accuracy of 80%
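    The 7-gram tokens mentioned above are contiguous seven-word windows over the answer texts. A minimal sketch of that tokenization step (the paper's exact preprocessing pipeline is not specified, so this is illustrative only):

```python
def word_ngrams(text, n=7):
    """Split text into words and return every contiguous n-word
    sequence; the paper's 7-gram tokens correspond to n=7."""
    words = text.split()
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]

# Toy English stand-in for the Greek free-text answers.
sample = "the course material was clear and the instructor answered questions promptly"
grams = word_ngrams(sample, n=7)
```

    Each resulting 7-gram is a candidate fragment from which sentences can be reconstructed and then labelled positive or negative.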

    Retrieval Enhancements for Task-Based Web Search

    Get PDF
    The task-based view of web search implies that retrieval should take the user perspective into account. Going beyond merely retrieving the most relevant result set for the current query, the retrieval system should aim to surface results that are actually useful to the task that motivated the query. This dissertation explores how retrieval systems can better understand and support their users’ tasks from three main angles: First, we study and quantify search engine user behavior during complex writing tasks, and how task success and behavior are associated in such settings. Second, we investigate search engine queries formulated as questions, and explore patterns in a query log of nearly one billion such queries that may help search engines to better support this increasingly prevalent interaction pattern. Third, we propose a novel approach to reranking the search result lists produced by web search engines, taking into account retrieval axioms that formally specify properties of a good ranking
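    One common way to operationalize axiom-based reranking is to treat each axiom as a pairwise preference between documents and rerank by aggregated preference votes. The sketch below illustrates that general idea under simplified assumptions; the axiom shown is a toy term-frequency preference, not one taken from the dissertation.

```python
def axiomatic_rerank(ranking, axioms):
    """Rerank a result list by counting, for each document, how many
    axiom preferences favour it over the other documents. Each axiom
    is a function (d1, d2) -> True if d1 should precede d2, False if
    d2 should precede d1, or None if it is indifferent."""
    def votes(doc):
        total = 0
        for other in ranking:
            if other == doc:
                continue
            for axiom in axioms:
                pref = axiom(doc, other)
                if pref is True:
                    total += 1
                elif pref is False:
                    total -= 1
        return total
    # Stable sort: ties keep the original (baseline) ranking order.
    return sorted(ranking, key=votes, reverse=True)

# Toy axiom: prefer the document with the higher query-term frequency.
docs = {"d1": 3, "d2": 5, "d3": 1}  # doc id -> query-term frequency
tf_axiom = lambda a, b: (True if docs[a] > docs[b]
                         else (False if docs[a] < docs[b] else None))
reranked = axiomatic_rerank(["d1", "d2", "d3"], [tf_axiom])
```

    Using the baseline order to break ties keeps the reranking conservative: documents move only when the axioms actively prefer them.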

    Cheap IR Evaluation: Fewer Topics, No Relevance Judgements, and Crowdsourced Assessments

    Get PDF
    To evaluate Information Retrieval (IR) effectiveness, a possible approach is to use test collections, which are composed of a collection of documents, a set of descriptions of information needs (called topics), and, for each topic, a set of relevant documents. Test collections are modelled in a competition scenario: for example, in the well-known TREC initiative, participants run their own retrieval systems over a set of topics and provide a ranked list of retrieved documents; some of the retrieved documents (usually the highest-ranked) constitute the so-called pool, and their relevance is evaluated by human assessors; the judged documents are then used to compute effectiveness metrics and rank the participant systems. Private web search companies also run their own in-house evaluation exercises; although the details are mostly unknown and the aims are somewhat different, the overall approach shares several issues with the test collection approach. The aim of this work is to: (i) develop and improve some state-of-the-art work on the evaluation of IR effectiveness while saving resources, and (ii) propose a novel, more principled and engineered, overall approach to test collection based effectiveness evaluation. [...]
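    The pooling step described above, forming the set of documents to be judged, is simply the union of the top-ranked results across all participating systems. A minimal sketch:

```python
def build_pool(runs, depth=100):
    """Form the judgement pool for one topic: the union of the
    top-`depth` documents from each participating system's ranked
    list. Only pooled documents are shown to human assessors;
    everything outside the pool is typically treated as unjudged."""
    pool = set()
    for ranked_list in runs:
        pool.update(ranked_list[:depth])
    return pool

# Two toy system runs for the same topic, pooled to depth 2.
runs = [
    ["d1", "d2", "d3", "d4"],
    ["d2", "d5", "d1", "d6"],
]
pool = build_pool(runs, depth=2)
```

    The pool depth trades assessment cost against completeness of the relevance judgements, which is exactly the resource-saving tension this work addresses.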

    Relevance-based language models : new estimations and applications

    Get PDF
    [Abstract] Relevance-Based Language Models introduced into the Language Modelling framework the concept of relevance, which is explicit in other retrieval models such as the Probabilistic models. Relevance Models have mainly been used for a specific task within Information Retrieval called Pseudo-Relevance Feedback, a kind of local query expansion technique where relevance is assumed over the top documents from the initial retrieval, and where those documents are used to select expansion terms for the original query and produce a second, hopefully more effective, retrieval. In this thesis we investigate new estimations for Relevance Models, both for Pseudo-Relevance Feedback and for tasks beyond retrieval, particularly constrained text clustering and item recommendation in Recommender Systems. We study the benefits of our proposals for those tasks in comparison with existing estimations. These new models not only improve the effectiveness of the existing estimations and methods but also improve their robustness, a critical factor when dealing with Pseudo-Relevance Feedback methods. These objectives are pursued by different means: promoting divergent terms in the estimation of the Relevance Models, presenting new cluster-based retrieval models, introducing new methods for automatically determining the size of the pseudo-relevant set on a per-query basis, and deriving new models under the Relevance-Based Language Modelling framework for the constrained text clustering and item recommendation problems
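    The classic relevance model estimation (RM1) that this line of work builds on weights each term by its probability in the pseudo-relevant documents, with each document weighted by its query likelihood. A simplified sketch with maximum-likelihood (unsmoothed) document models, for illustration only; practical implementations smooth the document language models:

```python
from collections import Counter

def rm1(query, top_docs):
    """Estimate a relevance model (RM1) from pseudo-relevant docs:
    P(w|R) is proportional to the sum over documents of
    P(w|d) * P(q|d), where P(q|d) is the query likelihood under
    the document's unigram model. No smoothing, for illustration."""
    term_scores = Counter()
    for doc in top_docs:
        counts = Counter(doc)
        length = len(doc)
        p_w_d = {w: c / length for w, c in counts.items()}
        q_lik = 1.0  # query likelihood P(q|d)
        for t in query:
            q_lik *= p_w_d.get(t, 0.0)
        for w, p in p_w_d.items():
            term_scores[w] += p * q_lik
    total = sum(term_scores.values())
    return {w: s / total for w, s in term_scores.items()} if total else {}

# Toy pseudo-relevant set (tokenized documents) for the query "rank".
docs = [["learning", "to", "rank", "rank"],
        ["rank", "retrieval", "models", "rank"]]
model = rm1(["rank"], docs)
expansion = sorted(model, key=model.get, reverse=True)
```

    The highest-weighted terms of the estimated model serve as expansion terms for the second retrieval; the thesis's contributions modify this estimation, e.g. by promoting divergent terms.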