The Effect of the Multi-Layer Text Summarization Model on the Efficiency and Relevancy of the Vector Space-based Information Retrieval

Author: Ababneh Ahmad Hussein
Lu Joan
Xu Qiang
Publication date: 01/03/2020

The massive volume of text uploaded to the internet creates huge inverted indexes in information retrieval (IR) systems, which hurts their efficiency. The purpose of this research is to measure the effect of the Multi-Layer Similarity model of automatic text summarization on building an informative and condensed inverted index in IR systems. To achieve this purpose, we summarized a considerable number of documents using the Multi-Layer Similarity model and built the inverted index from the automatic summaries generated by this model. A series of experiments was conducted to test performance in terms of efficiency and relevancy. The experiments include comparisons with three existing text summarization models: the Jaccard Coefficient Model, the Vector Space Model, and the Latent Semantic Analysis model. The experiments examined three groups of queries with manual and automatic relevancy assessment. The positive effect of the Multi-Layer Similarity model on the efficiency of the IR system was clear, without noticeable loss in the relevancy results. However, the evaluation showed that the traditional statistical models without semantic investigation failed to improve information retrieval efficiency. Compared with previous publications that addressed the use of summaries as a source of the index, the relevancy assessment of our work was higher, and the Multi-Layer Similarity retrieval constructed an inverted index that was 58% smaller than the inverted index of the main corpus.
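The size comparison described in this abstract can be illustrated with a minimal sketch. It assumes whitespace tokenization, uses the number of (term, document) postings as the size measure, and substitutes hand-written toy summaries for the output of the Multi-Layer Similarity model, which is not reproduced here. All function and variable names are hypothetical.

```python
from collections import defaultdict

def build_inverted_index(docs):
    """Map each term to the sorted list of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}

def posting_count(index):
    """Total number of (term, document) postings, used here as a size measure."""
    return sum(len(ids) for ids in index.values())

# Hypothetical toy corpus and hand-written stand-ins for automatic summaries.
full_docs = {
    1: "the inverted index maps each term to the documents that contain the term",
    2: "summaries keep the most informative sentences of the documents",
}
summaries = {
    1: "inverted index maps term to documents",
    2: "summaries keep informative sentences",
}

full_index = build_inverted_index(full_docs)
summary_index = build_inverted_index(summaries)

# Relative reduction in index size when indexing summaries instead of full text.
reduction = 1 - posting_count(summary_index) / posting_count(full_index)
print(f"postings: {posting_count(full_index)} -> {posting_count(summary_index)}")
```

The trade-off the abstract evaluates is visible even in this toy case: terms that appear only in the full text can no longer be retrieved from the summary index.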

arXiv.org e-Print Archive

Huddersfield Research Portal

HII: Histogram Inverted Index For Fast Images Retrieval

Author: Eko Minarno Agus
Munarko Yuda
Publication venue: 'Institute of Advanced Engineering and Science'
Publication date: 01/10/2018

This work aims to improve search speed by creating an indexing structure in a CBIR system. We used a modified version of the inverted index structure that is usually used in text retrieval. The modified inverted index is built from histogram data generated using Multi Texton Histogram (MTH) and Multi Texton Co-Occurrence Descriptor (MTCD) from 10,000 images of the Corel dataset. When building the inverted index, we normalised the value of each feature into a real number and considered only the feature-value pairs owned by a particular number of images. Based on our investigation, on the MTCD histograms of 5,000 test data, we found that by considering histogram variable values owned by at most 12% of images, the number of comparisons for each query can be reduced by 67.47%, the precision is 82.2%, and the rate of access to disk is 32.83%. We named our approach Histogram Inverted Index (HII).
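The idea of indexing only feature-value pairs owned by a limited share of images can be sketched as follows. This is not the authors' implementation: the quantization into `levels` buckets, the `max_share` threshold of 0.5, and the toy three-bin histograms are all assumptions made for illustration.

```python
from collections import defaultdict

def quantize(value, levels):
    """Map a normalised value in [0, 1] to one of `levels` integer buckets."""
    return min(int(value * levels), levels - 1)

def build_histogram_index(histograms, levels=4, max_share=0.5):
    """Index (bin, bucket) pairs, keeping only pairs owned by at most
    `max_share` of all images (an illustrative threshold)."""
    postings = defaultdict(set)
    for image_id, hist in histograms.items():
        for bin_no, value in enumerate(hist):
            postings[(bin_no, quantize(value, levels))].add(image_id)
    limit = max_share * len(histograms)
    return {key: ids for key, ids in postings.items() if len(ids) <= limit}

def candidates(index, query_hist, levels=4):
    """Collect the images that share at least one indexed pair with the query."""
    found = set()
    for bin_no, value in enumerate(query_hist):
        found |= index.get((bin_no, quantize(value, levels)), set())
    return found

# Hypothetical normalised histograms for four images.
histograms = {
    "a": [0.9, 0.1, 0.0],
    "b": [0.8, 0.1, 0.1],
    "c": [0.1, 0.1, 0.8],
    "d": [0.1, 0.2, 0.7],
}
index = build_histogram_index(histograms)
shortlist = candidates(index, [0.85, 0.1, 0.05])
```

Only the images in `shortlist` would then need a full histogram comparison, which is where the reduction in comparisons per query comes from. Pairs shared by too many images are dropped because they do little to narrow the shortlist.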

IAES journal

ZENODO

UMM Institutional Repository

Institute of Advanced Engineering and Science

The Spectrum and Variability of Circular Polarization in Sagittarius A* from 1.4 to 15 GHz

Author: Donald C. Backer
Duschl W. J.
Falcke H.
Geoffrey C. Bower
Heino Falcke
Krichbaum T. P.
Robert J. Sault
Publication venue: 'University of Chicago Press'
Publication date: 01/01/2002

We report here multi-epoch, multi-frequency observations of the circular polarization in Sagittarius A*, the compact radio source in the Galactic Center. Data taken from the VLA archive indicate that the fractional circular polarization at 4.8 GHz was -0.31% with an rms scatter of 0.13% from 1981 to 1998, in spite of a factor of 2 change in the total intensity. The sign remained negative over the entire time range, indicating a stable magnetic field polarity. In the Summer of 1999 we obtained 13 epochs of VLA A-array observations at 1.4, 4.8, 8.4 and 15 GHz. In May, September and October of 1999 we obtained 11 epochs of Australia Telescope Compact Array observations at 4.8 and 8.5 GHz. In all three of the data sets, we find no evidence for linear polarization greater than 0.1% in spite of strong circular polarization detections. Both VLA and ATCA data sets support three conclusions regarding the fractional circular polarization: the average spectrum is inverted with a spectral index ~0.5 +/- 0.2; the degree of variability is roughly constant on timescales of days to years; and the degree of variability increases with frequency. We also observed that the largest increase in fractional circular polarization was coincident with the brightest flare in total intensity. Significant variability in the total intensity and fractional circular polarization on a timescale of 1 hour was observed during this flare, indicating an upper limit to the size of 70 AU at 15 GHz. The fractional circular polarization at 15 GHz reached -1.1% and the spectral index is strongly inverted during this flare. We conclude that the spectrum has two components that match the high and low frequency total intensity components. (abridged)

Comment: Accepted for publication in ApJ, 40 pages, 18 figures

arXiv.org e-Print Archive

CiteSeerX

Crossref

CERN Document Server

Packing and Padding: Coupled Multi-index for Accurate Image Retrieval

Author: Liu Ziqiong
Tian Qi
Wang Shengjin
Zheng Liang
Publication date: 01/01/2014

In Bag-of-Words (BoW) based image retrieval, the SIFT visual word has low discriminative power, so false positive matches occur prevalently. Apart from the information loss during quantization, another cause is that the SIFT feature only describes the local gradient distribution. To address this problem, this paper proposes a coupled Multi-Index (c-MI) framework to perform feature fusion at the indexing level. Basically, complementary features are coupled into a multi-dimensional inverted index. Each dimension of c-MI corresponds to one kind of feature, and the retrieval process votes for images similar in both SIFT and other feature spaces. Specifically, we exploit the fusion of a local color feature into c-MI. While the precision of visual matching is greatly enhanced, we adopt Multiple Assignment to improve recall. The joint cooperation of SIFT and color features significantly reduces the impact of false positive matches. Extensive experiments on several benchmark datasets demonstrate that c-MI improves the retrieval accuracy significantly, while consuming only half of the query time compared to the baseline. Importantly, we show that c-MI is well complementary to many prior techniques. Assembling these methods, we have obtained an mAP of 85.8% and N-S score of 3.85 on Holidays and Ukbench datasets, respectively, which compare favorably with the state of the art.

Comment: 8 pages, 7 figures, 6 tables. Accepted to CVPR 201
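A rough sketch of voting in a two-dimensional index, contrasted with a SIFT-only index, is given below. It assumes each keypoint has already been quantized to an integer SIFT word and an integer color word. The paper's actual color feature and matching rule, as well as Multiple Assignment, are not modeled, and all names and data are hypothetical.

```python
from collections import Counter, defaultdict

def sift_only_key(sift_word, color_word):
    """Baseline key: ignore color and index by SIFT visual word alone."""
    return sift_word

def coupled_key(sift_word, color_word):
    """Coupled key: a keypoint matches only if both words agree."""
    return (sift_word, color_word)

def build_index(features, key_fn):
    """Inverted index from key_fn(sift_word, color_word) to image ids."""
    index = defaultdict(list)
    for image_id, keypoints in features.items():
        for sift_word, color_word in keypoints:
            index[key_fn(sift_word, color_word)].append(image_id)
    return index

def vote(index, query_keypoints, key_fn):
    """Count, per image, how many query keypoints find a match in the index."""
    votes = Counter()
    for sift_word, color_word in query_keypoints:
        for image_id in index.get(key_fn(sift_word, color_word), []):
            votes[image_id] += 1
    return votes

# Hypothetical data: img2 shares SIFT words with the query but not colors.
features = {
    "img1": [(5, 2), (9, 1)],
    "img2": [(5, 7), (9, 4)],
}
query = [(5, 2), (9, 1)]

sift_votes = vote(build_index(features, sift_only_key), query, sift_only_key)
coupled_votes = vote(build_index(features, coupled_key), query, coupled_key)
```

In this toy case the SIFT-only index gives `img2` as many votes as `img1`, while the coupled index gives it none, which mirrors the reduction in false positive matches that the abstract attributes to feature fusion at the indexing level.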

arXiv.org e-Print Archive

CiteSeerX

Crossref

OPUS - University of Technology Sydney

Distance-Aware Document Ranking in Full-Text Search Using Indexes with Multi-Component Keys

Author: Veretennikov A. B.
Publication venue: 'Udmurt State University'
Publication date: 01/01/2021

The problem of proximity full-text search is considered. If a search query contains frequently occurring words, then multi-component key indexes improve the search speed in comparison with ordinary inverted indexes. It was shown that the search speed can be increased up to 130 times when queries consist of frequently occurring words. In this paper, we investigate how the multi-component key index architecture affects the quality of the search. We consider several well-known relevance ranking methods proposed by different authors. Using these methods, we perform the search first in the ordinary inverted index and then in the index enhanced with multi-component key indexes. The results show that with multi-component key indexes we obtain search results that are very close, in terms of relevance ranking, to the search results obtained by means of ordinary inverted indexes. © 2021 Udmurt State University. All rights reserved.
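The contrast between an ordinary positional index and an index with multi-component keys can be sketched in a simplified form. The sketch assumes two-component keys made of ordered word pairs that occur within `max_distance` positions of each other. The indexes studied in the paper are more elaborate, and every name below is hypothetical.

```python
from collections import defaultdict

def build_positional_index(docs):
    """Ordinary inverted index: word -> list of (doc_id, position)."""
    index = defaultdict(list)
    for doc_id, text in docs.items():
        for pos, word in enumerate(text.lower().split()):
            index[word].append((doc_id, pos))
    return index

def build_pair_index(docs, max_distance=3):
    """Two-component keys: (first, second) -> list of (doc_id, pos1, pos2)
    for every ordered pair of words at most max_distance positions apart."""
    index = defaultdict(list)
    for doc_id, text in docs.items():
        words = text.lower().split()
        for i, first in enumerate(words):
            for j in range(i + 1, min(i + max_distance + 1, len(words))):
                index[(first, words[j])].append((doc_id, i, j))
    return index

def search_ordinary(index, a, b, max_distance=3):
    """Scan both posting lists and check the distance for every combination."""
    hits = set()
    for doc_a, pos_a in index.get(a, []):
        for doc_b, pos_b in index.get(b, []):
            if doc_a == doc_b and 0 < pos_b - pos_a <= max_distance:
                hits.add(doc_a)
    return hits

def search_pairs(pair_index, a, b):
    """A single key lookup already encodes the proximity constraint."""
    return {doc_id for doc_id, _, _ in pair_index.get((a, b), [])}

docs = {
    1: "to be or not to be",
    2: "who are you and who is he",
    3: "to sleep and dream of what may be",
}
ordinary_hits = search_ordinary(build_positional_index(docs), "to", "be")
pair_hits = search_pairs(build_pair_index(docs), "to", "be")
```

Both searches return the same documents here. The pair index avoids scanning the long posting lists of frequent words such as "to" and "be", at the cost of a larger index.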

Institutional repository of Ural Federal University named after the first President of Russia B.N.Yeltsin

Parallel methods for the generation of partitioned inverted files

Author: MacFarlane A.
McCann J. A.
Robertson S. E.
Publication venue: 'Emerald'
Publication date: 01/10/2005

Purpose – The generation of inverted indexes is one of the most computationally intensive activities for information retrieval systems: indexing large multi‐gigabyte text databases can take many hours or even days to complete. We examine the generation of partitioned inverted files in order to speed up the process of indexing. Two types of index partitions are investigated: TermId and DocId. Design/methodology/approach – We apply standard parallel computing measures, such as speedup and efficiency, to examine the computing results and also the space costs of our trial indexing experiments. Findings – The results from runs on both partitioning methods are compared and contrasted, concluding that DocId is the more efficient method. Practical implications – The practical implications are that the DocId partitioning method would in most circumstances be used for distributing inverted file data in a parallel computer, particularly if indexing speed is the primary consideration. Originality/value – The paper is of value to database administrators who manage large‐scale text collections, and who need to use parallel computing to implement their text retrieval services.
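The two partitioning schemes and the two measures named in this abstract can be sketched as follows. The sketch builds the partitions serially to show only how the data is laid out, uses the standard definitions of speedup (serial time divided by parallel time) and efficiency (speedup divided by processor count), and uses timings that are invented for illustration. None of it reproduces the paper's experiments.

```python
from collections import defaultdict

def build_index(docs):
    """Inverted index: term -> sorted list of ids of documents containing it."""
    index = defaultdict(list)
    for doc_id in sorted(docs):
        for term in sorted(set(docs[doc_id].lower().split())):
            index[term].append(doc_id)
    return dict(index)

def partition_by_doc_id(docs, n):
    """DocId partitioning: each partition indexes its own subset of documents."""
    subsets = [{} for _ in range(n)]
    for doc_id, text in docs.items():
        subsets[doc_id % n][doc_id] = text
    return [build_index(subset) for subset in subsets]

def partition_by_term_id(docs, n):
    """TermId partitioning: each partition holds the complete posting lists
    for its own subset of terms."""
    full = build_index(docs)
    partitions = [{} for _ in range(n)]
    for i, term in enumerate(sorted(full)):
        partitions[i % n][term] = full[term]
    return partitions

def speedup(t_serial, t_parallel):
    return t_serial / t_parallel

def efficiency(t_serial, t_parallel, n_processors):
    return speedup(t_serial, t_parallel) / n_processors

docs = {0: "a b", 1: "b c", 2: "c d", 3: "a d"}
doc_parts = partition_by_doc_id(docs, 2)
term_parts = partition_by_term_id(docs, 2)

# Hypothetical timings: 120 s on one processor, 40 s on four.
s = speedup(120.0, 40.0)
e = efficiency(120.0, 40.0, 4)
```

Under DocId partitioning a term such as "a" appears in several partitions with partial posting lists, whereas under TermId partitioning it lives in exactly one partition with its complete list.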

City Research Online

Crossref