
    Filtered-page ranking: uma abordagem para ranqueamento de documentos HTML previamente filtrados

    Dissertation (Master's) - Universidade Federal de Santa Catarina, Centro Tecnológico, Programa de Pós-Graduação em Ciência da Computação, Florianópolis, 2016. Web page ranking algorithms can be built with techniques based on the structural elements of the Web page, on page segmentation, or on personalized search. This research addresses a user-search-based method for ranking previously filtered documents: it segments the Web page into blocks of three categories and uses a query-based filtering step to eliminate irrelevant content before ranking. The proposed method, called Filtered-Page Ranking (FPR), has two main steps: (i) Web page segmentation with irrelevant-content removal and (ii) Web page ranking. The removal step eliminates content unrelated to the user's query by means of the proposed Query-Based Blocks Mining (QBM) algorithm, producing a filtered block tree so that only relevant content is considered during ranking. The ranking step computes how relevant each Web page is to a given query, using criteria drawn from information retrieval studies that give weight to specific parts of the document and to the highlighted features of some HTML elements. The proposal is compared with two baselines, the classic vector space model and the CETR noise-removal algorithm; the results indicate that QBM removes irrelevant content effectively and that the proposed relevance criteria yield better average ranking quality than the classic vector space model.
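
    As a rough illustration of the two-step idea described in the abstract (query-based block filtering followed by ranking over the retained content), the Python sketch below segments an HTML page into coarse blocks, drops blocks that share no terms with the query, and scores what remains. The block categories, the QBM criteria, and the ranking weights from the dissertation are not reproduced; the tag set, the term-overlap test, and the log term-frequency score are simplifying assumptions.

```python
# Minimal sketch of the filtered-page-ranking idea, NOT the FPR/QBM
# implementation from the dissertation: segmentation, the three block
# categories, and the ranking criteria are simplified assumptions.
from html.parser import HTMLParser
import math
import re

class BlockExtractor(HTMLParser):
    """Collects the text of coarse-grained blocks (one per block-level tag)."""
    BLOCK_TAGS = {"p", "div", "td", "li", "h1", "h2", "h3"}

    def __init__(self):
        super().__init__()
        self.blocks = []
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag in self.BLOCK_TAGS:
            self._flush()

    def handle_data(self, data):
        self._buffer.append(data)

    def _flush(self):
        text = " ".join(self._buffer).strip()
        if text:
            self.blocks.append(text)
        self._buffer = []

    def close(self):
        super().close()
        self._flush()

def tokenize(text):
    return re.findall(r"[a-z0-9]+", text.lower())

def filter_blocks(blocks, query_terms):
    """Keep only blocks that share at least one term with the query
    (a stand-in for the query-based filtering step)."""
    return [b for b in blocks if set(tokenize(b)) & query_terms]

def score_page(html, query):
    """Score the page using only the content kept by the filtering step."""
    query_terms = set(tokenize(query))
    parser = BlockExtractor()
    parser.feed(html)
    parser.close()
    kept = filter_blocks(parser.blocks, query_terms)
    # Log-dampened term-frequency score over the filtered content.
    tokens = tokenize(" ".join(kept))
    return sum(math.log1p(tokens.count(t)) for t in query_terms)

if __name__ == "__main__":
    page = ("<html><body><div>Web page ranking with block segmentation</div>"
            "<div>Unrelated advertisement text</div></body></html>")
    print(score_page(page, "block ranking"))
```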

    Implementing Passage Retrieval in an Information Retrieval System

    Information retrieval of concise and topically coherent text passages is called passage retrieval. Passages can be used in an information retrieval system to improve its user interface and its performance. This thesis compares passage retrieval with other forms of information retrieval and discusses how passage retrieval can be implemented as a feature of an information retrieval system. Various passage retrieval methods from the information retrieval literature, their practical implementations, and their efficiency are reviewed. Two implementations of passage retrieval are evaluated and compared experimentally: direct passage retrieval and combined passage retrieval. In the comparison, combined passage retrieval turned out to be the more efficient approach.
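
    The sketch below illustrates one plausible reading of the two strategies named in the abstract, assuming that "direct passage retrieval" ranks a document by its best-scoring passage alone and that "combined passage retrieval" mixes that passage score with a whole-document score; the thesis' exact definitions, passage boundaries, and scoring functions may differ.

```python
# Hedged sketch of passage-level scoring; the passage windows, the
# term-frequency score, and the combination weight are assumptions.
import math
import re
from collections import Counter

def tokenize(text):
    return re.findall(r"[a-z0-9]+", text.lower())

def split_passages(tokens, size=50, overlap=25):
    """Fixed-size, overlapping word windows used as passages."""
    step = size - overlap
    return [tokens[i:i + size]
            for i in range(0, max(len(tokens) - overlap, 1), step)]

def tf_score(tokens, query_terms):
    counts = Counter(tokens)
    return sum(math.log1p(counts[t]) for t in query_terms)

def direct_passage_score(doc_tokens, query_terms):
    # Rank the document by its single best passage.
    return max(tf_score(p, query_terms) for p in split_passages(doc_tokens))

def combined_passage_score(doc_tokens, query_terms, alpha=0.5):
    # Mix the best-passage score with the whole-document score.
    return (alpha * direct_passage_score(doc_tokens, query_terms)
            + (1 - alpha) * tf_score(doc_tokens, query_terms))

if __name__ == "__main__":
    doc = "passage retrieval returns short coherent text passages " * 20
    query_terms = set(tokenize("passage retrieval"))
    tokens = tokenize(doc)
    print(direct_passage_score(tokens, query_terms))
    print(combined_passage_score(tokens, query_terms))
```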

    Analysis and study on text representation to improve the accuracy of the Normalized Compression Distance

    The huge amount of information stored in text form makes methods that deal with texts particularly interesting. This thesis focuses on processing texts using compression distances. More specifically, it takes a small step towards understanding both the nature of texts and the nature of compression distances. Broadly speaking, this is done by exploring the effects that several distortion techniques have on one of the most successful distances in the family of compression distances, the Normalized Compression Distance (NCD). Comment: PhD thesis; 202 pages.
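
    For reference, the Normalized Compression Distance between two strings x and y is commonly defined as NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)), where C(.) is the length of the compressed representation. The snippet below computes it with zlib as a stand-in compressor; the thesis' choice of compressors and its distortion techniques are not reproduced here.

```python
# NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)),
# with zlib used here only as a convenient stand-in compressor.
import zlib

def compressed_len(data: bytes) -> int:
    return len(zlib.compress(data, 9))

def ncd(x: str, y: str) -> float:
    bx, by = x.encode(), y.encode()
    cx, cy = compressed_len(bx), compressed_len(by)
    cxy = compressed_len(bx + by)
    return (cxy - min(cx, cy)) / max(cx, cy)

if __name__ == "__main__":
    a = "the quick brown fox jumps over the lazy dog " * 50
    b = "the quick brown fox jumps over the lazy cat " * 50
    c = "completely unrelated text about compression distances " * 50
    print(ncd(a, b))  # similar texts -> value closer to 0
    print(ncd(a, c))  # dissimilar texts -> value closer to 1
```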

    A comparison of statistical machine learning methods in heartbeat detection and classification

    In health care, patients with heart problems require quick responsiveness in a clinical setting or in the operating theatre. Towards that end, automated classification of heartbeats is vital, as some heartbeat irregularities are time-consuming to detect. Therefore, analysis of electrocardiogram (ECG) signals is an active area of research. The methods proposed in the literature depend on the structure of a heartbeat cycle. In this paper, we use interval- and amplitude-based features together with a few samples from the ECG signal as a feature vector. We study a variety of classification algorithms, focusing especially on a type of arrhythmia known as the ventricular ectopic beat (VEB). We compare the performance of the classifiers against algorithms proposed in the literature and make recommendations regarding features, sampling rate, and choice of classifier for a real-time clinical setting. The extensive study is based on the MIT-BIH arrhythmia database. Our main contributions are the evaluation of existing classifiers over a range of sampling rates, the recommendation of a detection methodology to employ in a practical setting, and the extension of the notion of a mixture of experts to a larger class of algorithms.
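
    The sketch below shows the general shape of such a pipeline: a per-beat feature vector built from RR intervals, the R-peak amplitude, and a few raw samples, evaluated across several off-the-shelf classifiers. The feature definitions, the synthetic data, and the classifier set are placeholders; they are not the paper's features nor its MIT-BIH evaluation.

```python
# Illustrative heartbeat-classification pipeline on synthetic data;
# features, data, and classifiers are placeholders, not the paper's method.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

def beat_features(signal, r_peaks, i, n_samples=8):
    """Features for beat i: RR intervals, R amplitude, and raw samples."""
    pre_rr = r_peaks[i] - r_peaks[i - 1]      # interval to previous beat
    post_rr = r_peaks[i + 1] - r_peaks[i]     # interval to next beat
    amplitude = signal[r_peaks[i]]            # R-peak amplitude
    window = signal[r_peaks[i] - n_samples // 2 : r_peaks[i] + n_samples // 2]
    return np.concatenate(([pre_rr, post_rr, amplitude], window))

# Synthetic stand-in data (random), only so the pipeline runs end to end.
rng = np.random.default_rng(0)
signal = rng.normal(size=5000)
r_peaks = np.arange(50, 4950, 60)             # evenly spaced fake beats
X = np.array([beat_features(signal, r_peaks, i)
              for i in range(1, len(r_peaks) - 1)])
y = rng.integers(0, 2, size=len(X))           # fake normal/VEB labels

for name, clf in [("logreg", LogisticRegression(max_iter=1000)),
                  ("svm", SVC()),
                  ("forest", RandomForestClassifier(n_estimators=100))]:
    scores = cross_val_score(clf, X, y, cv=5)
    print(f"{name}: mean accuracy {scores.mean():.3f}")
```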