On the Impact of Entity Linking in Microblog Real-Time Filtering
Microblogging is a model of content sharing in which the temporal locality of posts with respect to important events, whether of foreseeable or unforeseeable nature, makes applications of real-time filtering of great practical interest. We propose the use of Entity Linking (EL) to improve retrieval effectiveness by enriching the representation of microblog posts and filtering queries. EL is the process of recognizing, in an unstructured text, mentions of relevant entities described in a knowledge base. EL of short pieces of text is a difficult task, but it is also a scenario in which the information EL adds to the text can have a substantial impact on the retrieval process. We implement a state-of-the-art filtering method, based on the best systems from the TREC Microblog track real-time ad hoc retrieval and filtering tasks, and extend it with a Wikipedia-based EL method. Results show that the use of EL significantly improves over non-EL-based versions of the filtering methods.
Comment: 6 pages, 1 figure, 1 table. SAC 2015, Salamanca, Spain, April 13-17, 2015
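The enrichment idea in the abstract can be sketched in a few lines: link mentions in a post against a knowledge base and append the linked entities to the post's term representation. The tiny in-memory "KB" and the naive token-match linker below are illustrative assumptions only, not the paper's Wikipedia-based method.

```python
# Hedged sketch: enrich a microblog post's representation with
# entities recognized against a knowledge base. The KB contents and
# the single-token linker are invented for illustration.

KB = {
    "messi": "Lionel_Messi",
    "salamanca": "Salamanca",
}

def link_entities(text):
    # naive linker: any token that exactly matches a KB mention
    # yields the corresponding entity identifier
    return [KB[t] for t in text.lower().split() if t in KB]

def enrich(post):
    # enriched representation = original terms + linked entity ids
    return post.lower().split() + link_entities(post)

print(enrich("Messi scores again"))
# → ['messi', 'scores', 'again', 'Lionel_Messi']
```

A real linker would handle multi-word mentions and disambiguation, but even this toy version shows how EL adds terms that a filtering query can match against.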
The Feasibility of Brute Force Scans for Real-Time Tweet Search
The real-time search problem requires making ingested documents immediately searchable, which presents architectural challenges for systems built around inverted indexing. In this paper, we explore a radical proposition: what if we abandon document inversion and instead adopt an architecture based on brute force scans of document representations? In such a design, “indexing” simply involves appending the parsed representation of an ingested document to an existing buffer, which is simple and fast. Quite surprisingly, experiments with TREC Microblog test collections show that query evaluation with brute force scans is feasible and performance compares favorably to a traditional search architecture based on an inverted index, especially if we take advantage of vectorized SIMD instructions and multiple cores in modern processor architectures. We believe that such a novel design is worth further exploration by IR researchers and practitioners
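The scan-based design described above can be sketched minimally: ingestion is a constant-time append to a buffer, and querying is a linear scan over every stored representation. The buffer layout and the term-frequency scoring below are simplifying assumptions, not the paper's actual implementation (which relies on SIMD and multi-core parallelism).

```python
# Hedged sketch of a scan-based architecture: "indexing" appends a
# parsed document to a buffer; search brute-force scans the buffer.
# Scoring (sum of query-term frequencies) is illustrative only.

buffer = []  # one (doc_id, term-frequency map) per ingested document

def ingest(doc_id, text):
    tf = {}
    for term in text.lower().split():
        tf[term] = tf.get(term, 0) + 1
    # O(1) append: the document is searchable immediately
    buffer.append((doc_id, tf))

def search(query, k=10):
    terms = query.lower().split()
    scored = []
    for doc_id, tf in buffer:  # brute-force scan of every document
        score = sum(tf.get(t, 0) for t in terms)
        if score > 0:
            scored.append((score, doc_id))
    scored.sort(reverse=True)
    return [doc_id for _, doc_id in scored[:k]]

ingest("d1", "real time tweet search")
ingest("d2", "inverted index architecture")
print(search("tweet search"))  # → ['d1']
```

The trade-off is clear in the sketch: ingestion avoids all posting-list maintenance, at the cost of query time linear in collection size, which is what vectorization and multiple cores are needed to offset.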
Objective and automated protocols for the evaluation of biomedical search engines using No Title Evaluation protocols
Background: The evaluation of information retrieval techniques has traditionally relied on human judges to determine which documents are relevant to a query and which are not. This protocol is used in the Text REtrieval Conference (TREC), organized annually for the past 15 years, to support the unbiased evaluation of novel information retrieval approaches. The TREC Genomics Track has recently been introduced to measure the performance of information retrieval for biomedical applications.
Results: We describe two protocols for evaluating biomedical information retrieval techniques without human relevance judgments. We call these protocols No Title Evaluation (NT Evaluation). The first protocol measures performance for focused searches, where only one relevant document exists for each query. The second protocol measures performance for queries expected to have potentially many relevant documents per query (high-recall searches). Both protocols take advantage of the clear separation of titles and abstracts found in Medline. We compare the performance obtained with these evaluation protocols to results obtained by reusing the relevance judgments produced in the 2004 and 2005 TREC Genomics Track and observe significant correlations between the performance rankings generated by our approach and by TREC. Spearman's correlation coefficients in the range of 0.79–0.92 are observed comparing bpref measured with NT Evaluation or with TREC evaluations. For comparison, coefficients in the range 0.86–0.94 can be observed when evaluating the same set of methods with data from two independent TREC Genomics Track evaluations. We discuss the advantages of NT Evaluation over the TRels and the data fusion evaluation protocols introduced recently.
Conclusion: Our results suggest that the NT Evaluation protocols described here could be used to optimize some search engine parameters before human evaluation. Further research is needed to determine if NT Evaluation or variants of these protocols can fully substitute for human evaluations.
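The comparison reported above boils down to correlating two rankings of the same systems, one per evaluation protocol, with Spearman's rank correlation. The sketch below computes the coefficient from scratch on invented bpref scores; the system scores and names are illustrative, not data from the paper.

```python
# Hedged sketch: Spearman's rank correlation between two evaluation
# protocols' rankings of the same systems. Scores are invented.

def rankdata(scores):
    # rank 1 = highest score; ties are assumed absent for simplicity
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    ranks = [0] * len(scores)
    for rank, i in enumerate(order, start=1):
        ranks[i] = rank
    return ranks

def spearman(xs, ys):
    # rho = 1 - 6 * sum(d_i^2) / (n * (n^2 - 1)), d_i = rank difference
    n = len(xs)
    rx, ry = rankdata(xs), rankdata(ys)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))

# hypothetical bpref of five systems under each protocol
nt_eval = [0.42, 0.35, 0.51, 0.28, 0.46]
trec    = [0.40, 0.30, 0.49, 0.33, 0.44]
print(round(spearman(nt_eval, trec), 3))  # → 0.9
```

A coefficient near the paper's 0.79–0.92 range, as here, means the cheap protocol largely preserves the ordering of systems produced by human judgments.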
Cervical lymph node metastasis in adenoid cystic carcinoma of the larynx: a collective international review
Adenoid cystic carcinoma (AdCC) of the head and neck is a well-recognized pathologic entity that rarely occurs in the larynx. Although the 5-year locoregional control rates are high, distant metastasis has a tendency to appear more than 5 years post treatment. Because AdCC of the larynx is uncommon, it is difficult to standardize a treatment protocol. One of the controversial points is the decision whether or not to perform an elective neck dissection on these patients. Because there is contradictory information about this issue, we have critically reviewed the literature from 1912 to 2015 on all reported cases of AdCC of the larynx in order to clarify this issue. During the most recent period of our review (1991-2015) with a more exact diagnosis of the tumor histology, 142 cases were observed of AdCC of the larynx, of which 91 patients had data pertaining to lymph node status. Eleven of the 91 patients (12.1%) had nodal metastasis and, based on this low proportion of patients, routine elective neck dissection is therefore not recommended
Humans optional? Automatic large-scale test collections for entity, passage, and entity-passage retrieval
Manually creating test collections is a time-, effort-, and cost-intensive process. This paper describes a fully automatic alternative for deriving large-scale test collections, where no human assessments are needed. The empirical experiments confirm that automatic test collection and manual assessments agree on the best performing systems. The collection includes relevance judgments for both text passages and knowledge base entities. Since test collections with relevance data for both entity and text passages are rare, this approach provides a cost-efficient way for training and evaluating ad hoc passage retrieval, entity retrieval, and entity-aware text retrieval methods
Problems with Kendall's Tau
This poster describes a potential problem with a relatively well used measure in Information Retrieval research: Kendall's Tau rank correlation coefficient. The coefficient is best known for its use in determining the similarity of test collections when ranking sets of retrieval runs. Threshold values for the coefficient have been defined and used in a number of published studies in information retrieval. However, this poster presents results showing that basing decisions on such thresholds is not as reliable as has been assumed
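The coefficient in question compares two rankings of the same retrieval runs by counting concordant versus discordant pairs. The sketch below computes it from scratch; the run names and ranks are invented for illustration, and ties are assumed absent.

```python
from itertools import combinations

# Hedged sketch: Kendall's tau between two rankings of retrieval
# runs, e.g. from a full test collection and a reduced one. Run
# names and rank positions are invented; no tied ranks assumed.

def kendall_tau(r1, r2):
    # r1, r2: dicts mapping run name -> rank position
    concordant = discordant = 0
    for a, b in combinations(r1, 2):
        s1 = r1[a] - r1[b]
        s2 = r2[a] - r2[b]
        if s1 * s2 > 0:
            concordant += 1  # pair ordered the same way in both rankings
        else:
            discordant += 1  # pair ordered differently
    n = len(r1)
    return (concordant - discordant) / (n * (n - 1) / 2)

full    = {"runA": 1, "runB": 2, "runC": 3, "runD": 4, "runE": 5}
reduced = {"runA": 1, "runB": 3, "runC": 2, "runD": 4, "runE": 5}
print(kendall_tau(full, reduced))  # → 0.8
```

A single swapped pair among five runs already drops tau to 0.8, below the 0.9 threshold often treated as "equivalent rankings", which illustrates why fixed cutoffs can be a fragile basis for decisions.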
Bibliotòpics
This infographic report seeks to check how much truth is hidden behind librarian stereotypes. The bun, the glasses, the antipathy, the introversion... Are Catalan librarians like this in the 21st century? Do they share any personal traits that led them to the profession? To answer these questions, more than 500 qualified professionals of working age from Catalonia (Spain) were surveyed. The results are presented in an informal but rigorous way