Organization and Usage of Learning Objects within Personal Computers
Research report of the ProLearn Network of Excellence (IST 507310), Deliverable 7.6. To promote the integration of desktop-related Knowledge Management and Technology Enhanced Learning, this deliverable aims at increasing the awareness of Desktop research within the Professional Learning community and at familiarizing e-Learning researchers with the state of the art in the relevant areas of Personal Information Management (PIM), as well as with the currently ongoing activities and some of the regular PIM publication venues.
Approximate information filtering in structured peer-to-peer networks
Today's content providers are naturally distributed and produce large amounts of information every day, making peer-to-peer data management a promising approach offering scalability, adaptivity to dynamics, and failure resilience. In such systems, subscribing with a continuous query is as important as one-time querying, since it allows the user to cope with the high rate of information production and avoid the cognitive overload of repeated searches. In the information filtering setting, users specify continuous queries, thus subscribing to newly appearing documents that satisfy the query conditions. Contrary to existing approaches providing exact information filtering functionality, this doctoral thesis introduces the concept of approximate information filtering, where users subscribe to only a few selected sources most likely to satisfy their information demand. This way, efficiency and scalability are enhanced by trading a small reduction in recall for lower message traffic. This thesis contains the following contributions: (i) the first architecture to support approximate information filtering in structured peer-to-peer networks, (ii) novel strategies to select the most appropriate publishers by taking into account correlations among keywords, (iii) a prototype implementation for approximate information retrieval and filtering, and (iv) a digital library use case to demonstrate the integration of retrieval and filtering in a unified system.
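The trade-off between recall and message traffic in approximate information filtering can be illustrated with a minimal sketch. It assumes each publisher already has a predicted likelihood of producing matching documents; the function names, publisher names, and numbers are hypothetical, and the simple top-k selection stands in for the thesis's own selection strategies, which also take keyword correlations into account.

```python
from typing import Dict, List


def select_publishers(predicted: Dict[str, float], k: int) -> List[str]:
    """Pick the k publishers with the highest predicted score (ties broken by name)."""
    return sorted(predicted, key=lambda p: (-predicted[p], p))[:k]


def recall(selected: List[str], actual_matches: Dict[str, int]) -> float:
    """Fraction of all matching documents published by the selected publishers."""
    total = sum(actual_matches.values())
    received = sum(actual_matches[p] for p in selected)
    return received / total if total else 0.0


# Hypothetical predictions and hypothetical ground truth (matching documents per publisher).
predicted = {"pubA": 0.9, "pubB": 0.6, "pubC": 0.1, "pubD": 0.05}
actual = {"pubA": 50, "pubB": 40, "pubC": 6, "pubD": 4}

chosen = select_publishers(predicted, k=2)
achieved_recall = recall(chosen, actual)
print(chosen, achieved_recall)
```

In this toy setting the subscriber contacts two publishers instead of four and still receives most of the matching documents, which is the kind of exchange of recall for traffic the abstract describes.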
Cheap IR Evaluation: Fewer Topics, No Relevance Judgements, and Crowdsourced Assessments
To evaluate Information Retrieval (IR) effectiveness, a possible approach is
to use test collections, which are composed of a collection of documents, a set
of descriptions of information needs (called topics), and a set of relevant
documents for each topic. Test collections are modelled in a competition
scenario: for example, in the well-known TREC initiative, participants run
their own retrieval systems over a set of topics and they provide a ranked list
of retrieved documents; some of the retrieved documents (usually the first
ranked) constitute the so-called pool, and their relevance is evaluated by
human assessors; the document list is then used to compute effectiveness
metrics and rank the participant systems. Private Web Search companies also run
their in-house evaluation exercises; although the details are mostly unknown,
and the aims are somewhat different, the overall approach shares several issues
with the test collection approach.
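The pooling workflow described above can be sketched in a few lines. This is a minimal illustration for a single topic, assuming binary relevance judgements and precision at k as the effectiveness metric; the system names, document identifiers, and judgements are hypothetical and are not taken from TREC.

```python
from typing import Dict, List, Set


def build_pool(runs: Dict[str, List[str]], depth: int) -> Set[str]:
    """Union of the top-`depth` documents of every submitted run."""
    pool: Set[str] = set()
    for ranking in runs.values():
        pool.update(ranking[:depth])
    return pool


def precision_at_k(ranking: List[str], relevant: Set[str], k: int) -> float:
    """Fraction of the top-k retrieved documents judged relevant."""
    return sum(1 for doc in ranking[:k] if doc in relevant) / k


# Hypothetical ranked lists submitted by two participants for one topic.
runs = {
    "sysA": ["d1", "d2", "d3", "d4"],
    "sysB": ["d3", "d1", "d5", "d6"],
}

pool = build_pool(runs, depth=2)

# Assessors judge only the pooled documents; assume d1 and d3 are judged relevant.
relevant = {"d1", "d3"}

scores = {name: precision_at_k(ranking, relevant, k=2) for name, ranking in runs.items()}
system_ranking = sorted(scores, key=lambda name: (-scores[name], name))
print(pool, scores, system_ranking)
```

A real evaluation would average the metric over many topics; the cost of judging the pool for each topic is what motivates the cheaper alternatives the abstract goes on to study.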
The aim of this work is to: (i) develop and improve some state-of-the-art
work on the evaluation of IR effectiveness while saving resources, and (ii)
propose a novel, more principled and engineered, overall approach to test
collection based effectiveness evaluation.
[...
Expert Finding in Disparate Environments
Providing knowledge workers with access to experts and communities of practice is central to expertise sharing, and crucial to effective organizational performance, adaptation, and even survival. However, in complex work environments, it is difficult to know who knows what across heterogeneous groups, disparate locations, and asynchronous work. As such, where expert finding has traditionally been a manual operation, there is increasing interest in policy and technical infrastructure that makes work visible and supports automated tools for locating expertise.
Expert finding is a multidisciplinary problem that cross-cuts knowledge management, organizational analysis, and information retrieval. Recently, a number of expert finders have emerged; however, many tools are limited in that they are extensions of traditional information retrieval systems and primarily exploit artifact information. This thesis explores a new class of expert finders that use organizational context as a basis for assessing expertise and for conferring trust in the system. The hypothesis here is that expertise can be inferred through assessments of work behavior and work derivatives (e.g., artifacts).
The Expert Locator, developed within a live organizational environment, is a model-based prototype that exploits organizational work context. The system associates expertise ratings with an expert's signaling behavior and is extensible so that signaling behavior from multiple activity space contexts can be fused into aggregate retrieval scores. Post-retrieval analysis supports evidence review and personal network browsing, aiding users in both detection and selection. During operational evaluation, the prototype generated high-precision searches across a range of topics and was sensitive to organizational role, ranking true experts (i.e., authorities) higher than brokers providing referrals. Precision increased with the number of activity spaces used in the model, but varied across queries. The highest-performing queries are characterized by high-specificity terms and low organizational diffusion amongst retrieved experts; essentially, the highest-rated experts are situated within organizational niches.
Linking archival data to location: A case study at the UK National Archives
Purpose
The National Archives (TNA) is the UK Government's official archive. It stores and maintains records spanning over 1,000 years in both physical and digital form. Much of the information held by TNA includes references to place, and user queries to TNA's online catalogue frequently involve searches for location. The purpose of this paper is to illustrate how TNA have extracted the geographic references in their historic data to improve access to the archives.
Design/methodology/approach
To be able to quickly enhance the existing archival data with geographic information, existing technologies from Natural Language Processing (NLP) and Geographical Information Retrieval (GIR) have been utilised and adapted to historical archives.
Findings
Enhancing the archival records with geographic information has enabled TNA to quickly develop a number of case studies highlighting how geographic information can improve access to large-scale archival collections. The use of existing methods from the GIR domain and technologies, such as OpenLayers, enabled one to quickly implement this process in a way that is easily transferable to other institutions.
Practical implications
The methods and technologies described in this paper can be adapted by other archives to similarly enhance access to their historic data. The data-sharing methods described can also be used to enable the integration of knowledge held at different archival institutions.
Originality/value
Place is one of the core dimensions for TNA's archival data. Many of the records which are held make reference to place data (wills, legislation, court cases), and approximately one fifth of users' searches involve place names. However, there are still a number of open questions regarding the adaptation of existing GIR methods to the history domain. This paper presents an overview of available GIR methods and the challenges in applying them to historical data.
Pretrained Transformers for Text Ranking: BERT and Beyond
The goal of text ranking is to generate an ordered list of texts retrieved
from a corpus in response to a query. Although the most common formulation of
text ranking is search, instances of the task can also be found in many natural
language processing applications. This survey provides an overview of text
ranking with neural network architectures known as transformers, of which BERT
is the best-known example. The combination of transformers and self-supervised
pretraining has been responsible for a paradigm shift in natural language
processing (NLP), information retrieval (IR), and beyond. In this survey, we
provide a synthesis of existing work as a single point of entry for
practitioners who wish to gain a better understanding of how to apply
transformers to text ranking problems and researchers who wish to pursue work
in this area. We cover a wide range of modern techniques, grouped into two
high-level categories: transformer models that perform reranking in multi-stage
architectures and dense retrieval techniques that perform ranking directly.
There are two themes that pervade our survey: techniques for handling long
documents, beyond typical sentence-by-sentence processing in NLP, and
techniques for addressing the tradeoff between effectiveness (i.e., result
quality) and efficiency (e.g., query latency, model and index size). Although
transformer architectures and pretraining techniques are recent innovations,
many aspects of how they are applied to text ranking are relatively well
understood and represent mature techniques. However, there remain many open
research questions, and thus in addition to laying out the foundations of
pretrained transformers for text ranking, this survey also attempts to
prognosticate where the field is heading.
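The multi-stage reranking architecture mentioned in this abstract can be sketched as a cheap first-stage retriever followed by a more expensive scorer applied only to the candidates. This is a minimal illustration under stated assumptions: the term-overlap retriever, the `toy_scorer` function, and the corpus are all hypothetical, and `toy_scorer` is only a stand-in for where a pretrained transformer would score each query-text pair.

```python
from typing import Callable, List


def first_stage(query: str, corpus: List[str], k: int) -> List[int]:
    """Cheap candidate generation: rank documents by number of shared query terms."""
    q_terms = set(query.lower().split())
    scored = [(len(q_terms & set(doc.lower().split())), i) for i, doc in enumerate(corpus)]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [i for overlap, i in scored[:k] if overlap > 0]


def rerank(query: str, corpus: List[str], candidates: List[int],
           scorer: Callable[[str, str], float]) -> List[int]:
    """Second stage: reorder only the candidates with a more expensive scorer."""
    return sorted(candidates, key=lambda i: scorer(query, corpus[i]), reverse=True)


def toy_scorer(query: str, text: str) -> float:
    """Hypothetical stand-in for a transformer scorer: share of tokens that are query terms."""
    q_terms = set(query.lower().split())
    tokens = text.lower().split()
    return sum(token in q_terms for token in tokens) / len(tokens)


corpus = [
    "neural text ranking with transformers for search engines and question answering systems",
    "text ranking",
    "cooking pasta at home",
    "ranking football teams by goals",
]
query = "text ranking"

candidates = first_stage(query, corpus, k=3)
reranked = rerank(query, corpus, candidates, toy_scorer)
print(candidates, reranked)
```

Because the expensive scorer runs on only k candidates rather than the whole corpus, the value of k is one place where the effectiveness and efficiency trade-off discussed in the abstract shows up.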