The Effect of the Multi-Layer Text Summarization Model on the Efficiency and Relevancy of the Vector Space-based Information Retrieval
The massive upload of text on the internet creates a huge inverted index in
information retrieval systems, which hurts their efficiency. The purpose of
this research is to measure the effect of the Multi-Layer Similarity model of
automatic text summarization on building an informative and condensed
inverted index in IR systems. To achieve this purpose, we summarized a
considerable number of documents using the Multi-Layer Similarity model, and we
built the inverted index from the automatic summaries that were generated from
this model. A series of experiments was run to test the performance in terms
of efficiency and relevancy. The experiments included comparisons with three
existing text summarization models: the Jaccard Coefficient Model, the Vector
Space Model, and the Latent Semantic Analysis model. The experiments examined
three groups of queries with manual and automatic relevancy assessment. The
positive effect of the Multi-Layer Similarity model on the efficiency of the IR
system was clear, without noticeable loss in the relevancy results. However, the
evaluation showed that the traditional statistical models without semantic
investigation failed to improve the information retrieval efficiency. Compared
with previous publications that addressed the use of summaries as a source
of the index, the relevancy assessment of our work was higher, and the
Multi-Layer Similarity retrieval constructed an inverted index that was 58%
smaller than the main corpus inverted index.
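The idea of indexing summaries instead of full documents can be illustrated with a minimal sketch. The summarizer below is a deliberately trivial placeholder (it keeps the first sentence) and is NOT the Multi-Layer Similarity model; the function names and the toy documents are assumptions made for illustration only.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each term to the sorted ids of the documents that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in text.lower().split():
            term = token.strip(".,;:!?")
            if term:
                index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}


def lead_sentence_summary(text):
    """Hypothetical stand-in summarizer: keep only the first sentence."""
    return text.split(".")[0] + "."


def posting_count(index):
    """Total number of (term, document) postings stored in the index."""
    return sum(len(postings) for postings in index.values())


# Toy corpus, invented for this example.
docs = {
    1: "Inverted indexes map terms to documents. They grow with the corpus. "
       "Large indexes slow retrieval.",
    2: "Summaries condense documents. A shorter text yields fewer postings. "
       "Retrieval gets faster.",
}

full_index = build_inverted_index(docs)
summary_index = build_inverted_index(
    {doc_id: lead_sentence_summary(text) for doc_id, text in docs.items()}
)

# Fraction of postings saved by indexing summaries instead of full text.
reduction = 1 - posting_count(summary_index) / posting_count(full_index)
print(f"postings: {posting_count(full_index)} -> {posting_count(summary_index)}")
print(f"reduction: {reduction:.0%}")
```

The size reduction on this toy corpus says nothing about the 58% reported in the abstract; it only shows how the comparison between the two indexes can be set up.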
HII: Histogram Inverted Index For Fast Images Retrieval
This work aims to improve search speed by creating an indexing structure in a CBIR system. We use a modified version of the inverted index structure that is usually used in text retrieval. The modified inverted index is built from histogram data generated using the Multi Texton Histogram (MTH) and the Multi Texton Co-Occurrence Descriptor (MTCD) from 10,000 images of the Corel dataset. When building the inverted index, we normalised the value of each feature into a real number and considered the feature-value pairs that are owned by a particular number of images. In our investigation of the MTCD histogram on 5,000 test data, we found that by considering histogram variable values owned by at most 12% of images, the number of comparisons for each query can be reduced by 67.47%, the precision is 82.2%, and the rate of access to disk is 32.83%. We named our approach the Histogram Inverted Index (HII).
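A rough sketch of an inverted index keyed by (feature, value) pairs is given below. It assumes each image is already described by a histogram, quantizes normalised bin values into a few levels, and drops pairs shared by more than a chosen share of the images. The quantization scheme, the `levels` and `max_share` parameters, and the toy histograms are assumptions; the abstract does not specify these details, and MTH/MTCD extraction is not reproduced here.

```python
from collections import defaultdict


def quantize(count, total, levels):
    """Normalise a bin count and map it to one of `levels` discrete values."""
    return min(int(count / total * levels), levels - 1)


def build_histogram_index(histograms, levels=4, max_share=0.5):
    """Index images by (bin, level) pairs, keeping only selective pairs."""
    index = defaultdict(set)
    for image_id, hist in histograms.items():
        total = sum(hist) or 1
        for bin_id, count in enumerate(hist):
            index[(bin_id, quantize(count, total, levels))].add(image_id)
    # Pairs owned by too many images barely narrow the search, so drop them.
    limit = max_share * len(histograms)
    return {key: ids for key, ids in index.items() if len(ids) <= limit}


def candidates(index, query_hist, levels=4):
    """Collect the images sharing at least one indexed pair with the query."""
    total = sum(query_hist) or 1
    found = set()
    for bin_id, count in enumerate(query_hist):
        found |= index.get((bin_id, quantize(count, total, levels)), set())
    return found


# Toy 3-bin histograms, invented for this example.
histograms = {"a": [8, 1, 1], "b": [1, 8, 1], "c": [1, 1, 8], "d": [7, 2, 1]}
hii = build_histogram_index(histograms)
shortlist = candidates(hii, [9, 1, 0])
print(shortlist)
```

Only the images in `shortlist` would then need a full histogram comparison, which is where the reduction in comparisons per query comes from.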
The Spectrum and Variability of Circular Polarization in Sagittarius A* from 1.4 to 15 GHz
We report here multi-epoch, multi-frequency observations of the circular
polarization in Sagittarius A*, the compact radio source in the Galactic
Center. Data taken from the VLA archive indicate that the fractional circular
polarization at 4.8 GHz was -0.31% with an rms scatter of 0.13% from 1981 to
1998, in spite of a factor of 2 change in the total intensity. The sign
remained negative over the entire time range, indicating a stable magnetic
field polarity. In the Summer of 1999 we obtained 13 epochs of VLA A-array
observations at 1.4, 4.8, 8.4 and 15 GHz. In May, September and October of 1999
we obtained 11 epochs of Australia Telescope Compact Array observations at 4.8
and 8.5 GHz. In all three of the data sets, we find no evidence for linear
polarization greater than 0.1% in spite of strong circular polarization
detections. Both VLA and ATCA data sets support three conclusions regarding the
fractional circular polarization: the average spectrum is inverted with a
spectral index ~0.5 +/- 0.2; the degree of variability is roughly constant on
timescales of days to years; and the degree of variability increases with
frequency. We also observed that the largest increase in fractional circular
polarization was coincident with the brightest flare in total intensity.
Significant variability in the total intensity and fractional circular
polarization on a timescale of 1 hour was observed during this flare,
indicating an upper limit to the size of 70 AU at 15 GHz. The fractional
circular polarization at 15 GHz reached -1.1% and the spectral index is
strongly inverted during this flare. We conclude that the spectrum has two
components that match the high and low frequency total intensity components.
(abridged) Comment: Accepted for publication in ApJ, 40 pages, 18 figure
Packing and Padding: Coupled Multi-index for Accurate Image Retrieval
In Bag-of-Words (BoW) based image retrieval, the SIFT visual word has a low
discriminative power, so false positive matches occur prevalently. Apart from
the information loss during quantization, another cause is that the SIFT
feature only describes the local gradient distribution. To address this
problem, this paper proposes a coupled Multi-Index (c-MI) framework to perform
feature fusion at indexing level. Basically, complementary features are coupled
into a multi-dimensional inverted index. Each dimension of c-MI corresponds to
one kind of feature, and the retrieval process votes for images similar in both
SIFT and other feature spaces. Specifically, we exploit the fusion of local
color feature into c-MI. While the precision of visual match is greatly
enhanced, we adopt Multiple Assignment to improve recall. The joint cooperation
of SIFT and color features significantly reduces the impact of false positive
matches.
Extensive experiments on several benchmark datasets demonstrate that c-MI
improves the retrieval accuracy significantly, while consuming only half of the
query time compared to the baseline. Importantly, we show that c-MI
complements many prior techniques well. Assembling these methods, we have
obtained an mAP of 85.8% and N-S score of 3.85 on Holidays and Ukbench
datasets, respectively, which compare favorably with the state of the art. Comment: 8 pages, 7 figures, 6 tables. Accepted to CVPR 201
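The coupling described in this abstract can be sketched as a two-dimensional inverted index whose keys combine a SIFT visual word with a color word, so that a vote is cast only when both agree. This is a simplified illustration under assumed integer word ids; feature extraction, quantization, and Multiple Assignment are left out, and the function names are hypothetical.

```python
from collections import Counter, defaultdict


def build_coupled_index(features):
    """Index image ids under (sift_word, color_word) keys.

    `features` is an iterable of (image_id, sift_word, color_word) tuples.
    """
    index = defaultdict(list)
    for image_id, sift_word, color_word in features:
        index[(sift_word, color_word)].append(image_id)
    return index


def vote(index, query_features):
    """Count, per image, the query features matched in BOTH feature spaces."""
    scores = Counter()
    for sift_word, color_word in query_features:
        for image_id in index.get((sift_word, color_word), []):
            scores[image_id] += 1
    return scores


# Toy features, invented for this example: both images share SIFT words 5
# and 9, but img2 has a different color word for SIFT word 5.
features = [
    ("img1", 5, 2), ("img1", 9, 1),
    ("img2", 5, 7), ("img2", 9, 1),
]
coupled = build_coupled_index(features)
scores = vote(coupled, [(5, 2), (9, 1)])
print(scores.most_common())
```

With a SIFT-only index both images would receive two votes; the color dimension removes the false positive match on word 5 for `img2`.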
Document ranking in proximity full-text search using indexes with multi-component keys
The problem of proximity full-text search is considered. If a search query contains frequently occurring words, then multi-component key indexes improve the search speed in comparison with ordinary inverted indexes. It was shown that the search speed can increase up to 130 times when queries consist of frequently occurring words. In this paper, we investigate how the multi-component key index architecture affects the quality of the search. We consider several well-known relevance ranking methods by different authors. Using these methods, we perform the search in the ordinary inverted index and then in the index enhanced with multi-component key indexes. The results show that with multi-component key indexes we obtain search results that are very close, in terms of relevance ranking, to the search results obtained by means of ordinary inverted indexes. © 2021 Udmurt State University. All rights reserved
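One way to picture an index with multi-component keys is an index whose keys are pairs of words that occur close to each other, so that a proximity query on two frequent words is answered by one short posting list instead of by intersecting two long ones. The sketch below is an assumption-laden illustration: the two-word key, the `max_distance` parameter, and the posting layout are invented here and are not taken from the paper.

```python
from collections import defaultdict


def build_pair_index(docs, max_distance=3):
    """Map ordered (word_a, word_b) keys to (doc_id, pos_a, pos_b) postings.

    A pair is indexed only if word_b follows word_a within `max_distance`
    positions in the same document.
    """
    index = defaultdict(list)
    for doc_id, text in docs.items():
        words = text.lower().split()
        for i, word_a in enumerate(words):
            last = min(i + max_distance + 1, len(words))
            for j in range(i + 1, last):
                index[(word_a, words[j])].append((doc_id, i, j))
    return index


# Toy documents made of frequent words, invented for this example.
docs = {1: "to be or not to be", 2: "not to worry"}
pair_index = build_pair_index(docs)

# A proximity query for "not ... to" is a single dictionary lookup.
hits = pair_index[("not", "to")]
print(hits)
```

The stored positions let a ranking function use the distance between the two words, which is the kind of information that relevance ranking for proximity search relies on.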
Parallel methods for the generation of partitioned inverted files
Purpose
– The generation of inverted indexes is one of the most computationally intensive activities for information retrieval systems: indexing large multi‐gigabyte text databases can take many hours or even days to complete. We examine the generation of partitioned inverted files in order to speed up the process of indexing. Two types of index partitions are investigated: TermId and DocId.
Design/methodology/approach
– We use standard measures used in parallel computing such as speedup and efficiency to examine the computing results and also the space costs of our trial indexing experiments.
Findings
– The results from runs on both partitioning methods are compared and contrasted, concluding that DocId is the more efficient method.
Practical implications
– The practical implications are that the DocId partitioning method would in most circumstances be used for distributing inverted file data in a parallel computer, particularly if indexing speed is the primary consideration.
Originality/value
– The paper is of value to database administrators who manage large‐scale text collections, and who need to use parallel computing to implement their text retrieval services
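The two partitioning schemes and the two parallel-computing measures named in this abstract can be sketched as follows. Speedup and efficiency use their standard definitions (serial time over parallel time, and speedup divided by the number of workers). The round-robin assignment by modulo, the function names, and the timings are assumptions made for illustration and are not the authors' implementation or results.

```python
def speedup(serial_time, parallel_time):
    """Standard speedup: how many times faster the parallel run is."""
    return serial_time / parallel_time


def efficiency(serial_time, parallel_time, workers):
    """Standard efficiency: speedup divided by the number of workers."""
    return speedup(serial_time, parallel_time) / workers


def partition_by_doc_id(postings, workers):
    """DocId partitioning: each worker indexes its own subset of documents."""
    parts = [dict() for _ in range(workers)]
    for term, doc_ids in postings.items():
        for doc_id in doc_ids:
            parts[doc_id % workers].setdefault(term, []).append(doc_id)
    return parts


def partition_by_term_id(postings, workers):
    """TermId partitioning: each worker holds whole posting lists for its terms."""
    parts = [dict() for _ in range(workers)]
    for term_id, term in enumerate(sorted(postings)):
        parts[term_id % workers][term] = list(postings[term])
    return parts


# Toy posting lists and timings, invented for this example.
postings = {"index": [1, 2, 3, 4], "parallel": [2, 4], "text": [1, 3]}
doc_parts = partition_by_doc_id(postings, workers=2)
term_parts = partition_by_term_id(postings, workers=2)

print(doc_parts)
print(term_parts)
print(speedup(120.0, 40.0), efficiency(120.0, 40.0, workers=4))
```

Under DocId partitioning a term's posting list is split across workers, whereas under TermId partitioning each list lives whole on one worker; this difference is what drives the indexing cost comparison in the paper.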