
    Modeling Documents as Mixtures of Persons for Expert Finding

    In this paper we address the problem of searching for knowledgeable persons within the enterprise, known as the expert finding (or expert search) task. We present a probabilistic algorithm based on the assumption that terms in documents are produced by the people mentioned in them. We represent documents retrieved for a query as mixtures of candidate experts' language models. Two methods for extracting personal language models are proposed, as well as a way of combining them with other evidence of expertise. Experiments conducted on the TREC Enterprise collection demonstrate the superiority of our approach over the best existing solutions.
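
    The abstract does not give the estimation formulas, but the core idea of pooling retrieved documents into per-candidate language models can be sketched roughly as below; the function name, the uniform document-candidate association, and the Dirichlet smoothing parameter mu are illustrative assumptions of this sketch, not details from the paper.

```python
from collections import Counter, defaultdict
from math import log

def score_candidates(query_terms, retrieved_docs, doc_candidates, mu=2000):
    """Rank candidate experts by query likelihood under a personal language
    model pooled from the documents that mention them (Dirichlet smoothing).

    retrieved_docs: {doc_id: list of tokens}
    doc_candidates: {doc_id: list of candidate ids mentioned in the doc}
    """
    # Collection statistics used for smoothing.
    collection = Counter()
    for tokens in retrieved_docs.values():
        collection.update(tokens)
    coll_len = sum(collection.values())

    # Personal language model: pool term counts over documents mentioning the candidate.
    person_counts = defaultdict(Counter)
    for doc_id, tokens in retrieved_docs.items():
        for cand in doc_candidates.get(doc_id, []):
            person_counts[cand].update(tokens)

    scores = {}
    for cand, counts in person_counts.items():
        cand_len = sum(counts.values())
        score = 0.0
        for term in query_terms:
            p_coll = (collection[term] + 1) / (coll_len + len(collection))
            p = (counts[term] + mu * p_coll) / (cand_len + mu)
            score += log(p)
        scores[cand] = score
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```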

    Being Omnipresent To Be Almighty: The Importance of The Global Web Evidence for Organizational Expert Finding

    Modern expert finding algorithms are developed under the assumption that all possible expertise evidence for a person is concentrated in the company that currently employs that person. Evidence that can be acquired outside the enterprise is traditionally ignored. At the same time, the Web is full of personal information that is sufficiently detailed to judge a person's skills and knowledge. In this work, we review various sources of expertise evidence outside an organization and experiment with rankings built on data acquired from six different sources, accessible through the APIs of two major web search engines. We show that these rankings and their combinations are often more realistic and of higher quality than rankings built on organizational data only.
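
    The abstract does not say how the per-source rankings are combined; one standard way to fuse rankings from several evidence sources is reciprocal rank fusion, sketched below purely for illustration (the function name and the constant k=60 are assumptions of this sketch, not the paper's method).

```python
from collections import defaultdict

def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several candidate rankings (one per evidence source) into a single
    ranking; each ranking is an ordered list of candidate ids."""
    fused = defaultdict(float)
    for ranking in rankings:
        for rank, cand in enumerate(ranking, start=1):
            fused[cand] += 1.0 / (k + rank)
    return sorted(fused, key=fused.get, reverse=True)
```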

    Using the Global Web as an Expertise Evidence Source

    This paper describes the details of our participation in the expert search task of the TREC 2007 Enterprise track. The study demonstrates the predictive potential of expertise evidence that can be found outside the organization. We discovered that combining the ranking built solely on the enterprise data with a Global Web-based ranking may produce significant increases in performance. Our main goal, however, was to explore whether this result can be further improved by using various quality measures to distinguish among web result items. While it was indeed beneficial to use some of these measures, especially those measuring the relevance of URL strings and titles, it remained unclear whether they are decisively important.
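
    As a rough illustration of the kind of combination described here (not the authors' actual formula), the sketch below linearly mixes an enterprise-based candidate score with a Web-based score in which each result item is weighted by a simple URL/title quality measure; all names, the weighting scheme, and the mixing parameter alpha are assumptions of this sketch.

```python
def combine_enterprise_and_web(enterprise_scores, web_results, alpha=0.5):
    """Combine an enterprise-based candidate score with a Web-based score in
    which each result item is weighted by a quality measure (here: fraction of
    query terms appearing in the item's URL or title).

    web_results: {candidate: [{'url': ..., 'title': ..., 'query': ...}, ...]}
    """
    def item_quality(item):
        q = item['query'].lower().split()
        url, title = item['url'].lower(), item['title'].lower()
        return sum(t in url or t in title for t in q) / max(len(q), 1)

    combined = {}
    for cand, ent_score in enterprise_scores.items():
        web_score = sum(item_quality(it) for it in web_results.get(cand, []))
        combined[cand] = alpha * ent_score + (1 - alpha) * web_score
    return sorted(combined.items(), key=lambda kv: kv[1], reverse=True)
```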

    The Mirror DBMS at TREC-8

    The database group at the University of Twente participates in TREC-8 using the Mirror DBMS, a prototype database system designed especially for multimedia and web retrieval. From a database perspective, the purpose has been to check whether we can achieve sufficient performance, and to prepare for the Very Large Corpus track, in which we plan to participate next year. From an IR perspective, the experiments have been designed to learn more about the effect of global statistics on the ranking.
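
    The abstract does not give the ranking formula used in the Mirror DBMS, but the role of global statistics can be illustrated with a plain tf-idf scorer in which the document frequencies may come either from a local subcollection or from the full corpus; this is a sketch for illustration only, with names of my own choosing.

```python
from math import log

def tf_idf_score(query_terms, doc_tf, doc_freq, num_docs):
    """Score one document with tf-idf. Passing document frequencies and the
    document count from the full (global) corpus rather than from a local
    subcollection is the kind of choice whose effect on the ranking such
    experiments examine."""
    return sum(doc_tf.get(t, 0) * log(num_docs / (1 + doc_freq.get(t, 0)))
               for t in query_terms)
```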

    Nonlinear Vibrations of the Webb Suspension

    Entity Ranking on Graphs: Studies on Expert Finding

    Today's web search engines try to offer services for finding various kinds of information in addition to simple web pages, such as showing locations or answering simple fact queries. Understanding the association between named entities and documents is one of the key steps towards such semantic search tasks. This paper addresses the ranking of entities and models it in a graph-based relevance propagation framework. In particular, we study the problem of expert finding as an example of an entity ranking task. Entity containment graphs are introduced that represent the relationship between text fragments on the one hand and the entities they contain on the other. The paper shows how these graphs can be used to propagate relevance information from the pre-ranked text fragments to their entities. We use this propagation framework to model existing approaches to expert finding based on the entity's in-degree and extend them by recursive relevance propagation based on a probabilistic random walk over the entity containment graphs. Experiments on the TREC expert search task compare the retrieval performance of the different graph and propagation models.
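
    The exact transition probabilities are not given in the abstract; the sketch below shows one plausible reading of relevance propagation on a bipartite entity containment graph, where a single propagation step mirrors the in-degree-style models and iterating a random walk with restart gives the recursive variant. The function name, the restart probability, and the integer entity indexing are assumptions of this sketch.

```python
import numpy as np

def random_walk_expert_scores(doc_scores, containment, n_entities,
                              restart=0.15, iters=100):
    """Score entities by a random walk with restart on the bipartite entity
    containment graph: step from a text fragment to one of its entities, then
    back to a fragment containing that entity, restarting to the query-based
    fragment relevance distribution.

    doc_scores: {doc_id: relevance of the retrieved text fragment}
    containment: iterable of (doc_id, entity_id) edges, entity_id in [0, n_entities)
    """
    docs = sorted(doc_scores)
    d_idx = {d: i for i, d in enumerate(docs)}

    # Raw containment adjacency, then row-stochastic transition matrices.
    D2E = np.zeros((len(docs), n_entities))
    for d, e in containment:
        if d in d_idx:
            D2E[d_idx[d], e] = 1.0
    E2D = D2E.T.copy()
    D2E /= np.maximum(D2E.sum(axis=1, keepdims=True), 1e-12)
    E2D /= np.maximum(E2D.sum(axis=1, keepdims=True), 1e-12)

    prior = np.array([doc_scores[d] for d in docs], dtype=float)
    prior /= prior.sum()

    e_prob = prior @ D2E              # one step: in-degree-style propagation
    for _ in range(iters):            # recursive random walk with restart
        d_prob = restart * prior + (1 - restart) * (e_prob @ E2D)
        e_prob = d_prob @ D2E
    return e_prob
```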

    The SIKS/BiGGrid Big Data Tutorial

    The School for Information and Knowledge Systems (SIKS) and the Dutch e-science grid BiG Grid organized a new two-day tutorial on Big Data at the University of Twente on 30 November and 1 December 2011, just preceding the Dutch-Belgian Database Day. The tutorial builds on some exciting new developments in large-scale data processing and data centers, initiated by Google and followed by many others such as Yahoo, Amazon, Microsoft, and Facebook. The course teaches how to process terabytes of data on large clusters and discusses several core computer science topics adapted for big data, such as new file systems (Google File System and Hadoop FS), new programming paradigms (MapReduce), new programming and query languages (Sawzall, Pig Latin), and new 'NoSQL' databases (BigTable, Cassandra, and Dynamo).
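
    To make the MapReduce paradigm mentioned above concrete, here is a minimal, sequential word-count sketch in Python; on a real Hadoop cluster the map and reduce functions run in parallel over distributed file-system blocks, and the function names here are illustrative, not part of the tutorial material.

```python
from collections import defaultdict

def map_phase(doc_id, text):
    """Map: emit (word, 1) for every word in a document."""
    for word in text.lower().split():
        yield word, 1

def reduce_phase(word, counts):
    """Reduce: sum the partial counts for one word."""
    yield word, sum(counts)

def mapreduce_wordcount(documents):
    """Sequential simulation of the classic MapReduce word-count example."""
    groups = defaultdict(list)                    # shuffle/sort stage
    for doc_id, text in documents.items():
        for word, count in map_phase(doc_id, text):
            groups[word].append(count)
    return dict(kv for word, counts in groups.items()
                   for kv in reduce_phase(word, counts))
```

    For example, mapreduce_wordcount({'d1': 'big data big clusters'}) returns {'big': 2, 'data': 1, 'clusters': 1}.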

    SnowEx 2017 Community Snow Depth Measurements: A Quality-Controlled, Georeferenced Product

    Snow depth was one of the core ground measurements required to validate remotely-sensed data collected during SnowEx Year 1, which occurred in Colorado. The use of a single, common protocol was fundamental to producing a community reference dataset of high quality. Most of the nearly 100 Grand Mesa and Senator Beck Basin SnowEx ground crew participants contributed to this crucial dataset during 6-25 February 2017. Snow depths were measured along ~300 m transects, whose locations were determined according to a random-stratified approach using snowfall and tree-density gradients. Two-person teams used snowmobiles, skis, or snowshoes to travel to staked transect locations and to conduct measurements. Depths were measured with a probe graduated in 1-cm increments every 3 meters along the transects. In shallow areas of Grand Mesa, depth measurements were also collected with GPS snow-depth probes (a.k.a. MagnaProbes) at ~1-m intervals. During summer 2017, all reference stake positions were surveyed with <10 cm accuracy to improve overall snow depth location accuracy. During the campaign, 193 transects were measured over three weeks at Grand Mesa and 40 were collected over two weeks in Senator Beck Basin, representing more than 27,000 depth values. Each day of the campaign, depth measurements were written in waterproof field books and photographed by National Snow and Ice Data Center (NSIDC) participants. The data were later transcribed and prepared for extensive quality assessment and control. Common issues such as protocol errors (e.g., surveying in the reverse direction), notebook image issues (e.g., a halo in the center of a digitized picture), and data-entry errors (sloppy writing and transcription errors) were identified and fixed on a point-by-point basis. In addition, we strove to produce a georeferenced product of high quality, so we calculated and interpolated coordinates for every depth measurement based on the surveyed stakes and the number of measurements made per transect. The product has been submitted to NSIDC in CSV format. To educate data users, we present the study design and processing steps that have improved the quality and usability of this product. We also address measurement and design uncertainties, which differ between open and forested areas.
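
    The exact georeferencing procedure is not spelled out in the abstract; the sketch below shows the simplest version of the idea, linearly interpolating coordinates between two surveyed stake positions under the assumption of evenly spaced measurements along the transect. The function name and signature are illustrative, not part of the published product.

```python
def interpolate_transect_coords(start, end, n_measurements):
    """Assign an (easting, northing) coordinate to each depth measurement on a
    transect by linear interpolation between the surveyed start and end stakes,
    assuming measurements are evenly spaced (nominally every 3 m).

    start, end: (easting, northing) tuples for the surveyed stakes.
    """
    x0, y0 = start
    x1, y1 = end
    coords = []
    for i in range(n_measurements):
        f = i / (n_measurements - 1) if n_measurements > 1 else 0.0
        coords.append((x0 + f * (x1 - x0), y0 + f * (y1 - y0)))
    return coords
```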