Search CORE

86,299 research outputs found

A comparison of a novel neural spell checker and standard spell checking algorithms

Author: Cherkassky
Hodge
Jim Austin
Kukich
Turner
Ullman
Victoria J. Hodge
Wu
Publication venue: 'Elsevier BV'
Publication date: 01/11/2002
Field of study

In this paper, we propose a simple and flexible spell checker using efficient associative matching in the AURA modular neural system. Our approach aims to provide a pre-processor for an information retrieval (IR) system allowing the user's query to be checked against a lexicon and any spelling errors corrected, to prevent wasted searching. IR searching is computationally intensive so much so that if we can prevent futile searches we can minimise computational cost. We evaluate our approach against several commonly used spell checking techniques for memory-use, retrieval speed and recall accuracy. The proposed methodology has low memory use, high speed for word presence checking, reasonable speed for spell checking and a high recall rate

Crossref

White Rose Research Online

AN OPTIMIZED FEATURE EXTRACTION TECHNIQUE FOR CONTENT BASED IMAGE RETRIEVAL

Author: MARY.I THUSNAVIS BELLA
SIVAN SAVITHA
Publication venue: Institute for Project Management Pvt. Ltd
Publication date: 26/08/2020
Field of study

Content-based image retrieval (CBIR) is an active research area with the development of multimedia technologies and has become a source of exact and fast retrieval. The aim of CBIR is to search and retrieve images from a large database and find out the best match for the given query. Accuracy and efficiency for high dimensional datasets with enormous number of samples is a challenging arena. In this paper, Content Based Image Retrieval using various features such as color, shape, texture is made and a comparison is made among them. The performance of the retrieval system is evaluated depending upon the features extracted from an image. The performance was evaluated using precision and recall rates. Haralick texture features were analyzed at 0 o, 45 o, 90 o, 180 o using gray level co-occurrence matrix. Color feature extraction was done using color moments. Structured features and multiple feature fusion are two main technologies to ensure the retrieval accuracy in the system. GIST is considered as one of the main structured features. It was experimentally observed that combination of these techniques yielded superior performance than individual features. The results for the most efficient combination of techniques have also been presented and optimized for each class of query

Interscience Research Network

Configurable indexing and ranking for XML information retrieval

Author: Qinghua Zou
Shaorong Liu
Wesley W. Chu
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2004
Field of study

Indexing and ranking are two key factors for efficient and effective XML information retrieval. Inappropriate indexing may result in false negatives and false positives, and improper ranking may lead to low precisions. In this paper, we propose a configurable XML information retrieval system, in which users can configure appropriate index types for XML tags and text contents. Based on users ’ index configurations, the system transforms XML structures into a compact tree representation, Ctree, and indexes XML text contents. To support XML ranking, we propose the concepts of “weighted term frequency ” and “inverted element frequency, ” where the weight of a term depends on its frequency and location within an XML element as well as its popularity among similar elements in an XML dataset. We evaluate the effectiveness of our system through extensive experiments on the INEX 03 dataset and 30 content and structure (CAS) topics. The experimental results reveal that our system has significantly high precision at low recall regions and achieves the highest average precision (0.3309) as compared with 38 official INEX 03 submissions using the strict evaluation metric

CiteSeerX

Crossref

Semantic-based medical records retrieval via medical-context aware query expansion and ranking.

Author: Madzin Hizmawati
Mohd Sharef Nurfadhlina
Publication venue: Asian Research Publication Network
Publication date: 31/12/2013
Field of study

Efficient retrieval of medical records involves contextual understanding of both the query and the records contents. This will enhance the searching effectiveness beyond merely keyword matching and is assisted by analyzing its semantics notion such as by the utilization of the MeSH thesaurus .The query is annotated and expanded by information from the deep medical contextual understanding.This is because typically medical records contain medical terminologies which may not be included in the user query but is important for accurate search hit. Besides, the terminologies have synonyms which should be utilized for richer and expanded query.The main contribution of the paper is the semantic-based retrieval technique by utilizing context-aware query expansion and search ranking method. Medical domain is chosen as a proof of concept and a medical record retrieval application was developed.The source of medical records are obtained from the ImageCLEF 2010 dataset which also houses a series of evaluation campaign such as photo annotation, robot vision and Wikipedia retrieval. This paper addresses the following problems: (i)semantic-based query expansion technique which increase the content awareness ability,(ii) MeSH- manipulated indexer which entails medical terminologies and their synonym,(iii) adoption of extended Boolean matching to measure similarity between query and documents, and (iv)ranking method which prioritizes matched expanded query size.The results were measured using precision, recall and mean average precision (MAP) score.Comparing against other approaches, our method has several achievements including; (i) more efficient access of MeSH thesaurus through the manipulated indexer compared to its original form; (ii) enrichment of query expansion using synonym term can improve mean average precision (MAP) value as opposed to standard query expansion; (iii) our comprehensive ranking method achieved high recall. According to MAP score we are in the top five run system amongst submitted run systems in ImageCLEF2010 medical task

Universiti Putra Malaysia Institutional Repository

Content-based indexing of low resolution documents

Author: Md Nor Danial
Publication venue
Publication date: 01/09/2016
Field of study

In any multimedia presentation, the trend for attendees taking pictures of slides that interest them during the presentation using capturing devices is gaining popularity. To enhance the image usefulness, the images captured could be linked to image or video database. The database can be used for the purpose of file archiving, teaching and learning, research and knowledge management, which concern image search. However, the above-mentioned devices include cameras or mobiles phones have low resolution resulted from poor lighting and noise. Content-Based Image Retrieval (CBIR) is considered among the most interesting and promising fields as far as image search is concerned. Image search is related with finding images that are similar for the known query image found in a given image database. This thesis concerns with the methods used for the purpose of identifying documents that are captured using image capturing devices. In addition, the thesis also concerns with a technique that can be used to retrieve images from an indexed image database. Both concerns above apply digital image processing technique. To build an indexed structure for fast and high quality content-based retrieval of an image, some existing representative signatures and the key indexes used have been revised. The retrieval performance is very much relying on how the indexing is done. The retrieval approaches that are currently in existence including making use of shape, colour and texture features. Putting into consideration these features relative to individual databases, the majority of retrievals approaches have poor results on low resolution documents, consuming a lot of time and in the some cases, for the given query image, irrelevant images are obtained. The proposed identification and indexing method in the thesis uses a Visual Signature (VS). VS consists of the captures slides textual layout’s graphical information, shape’s moment and spatial distribution of colour. This approach, which is signature-based are considered for fast and efficient matching to fulfil the needs of real-time applications. The approach also has the capability to overcome the problem low resolution document such as noisy image, the environment’s varying lighting conditions and complex backgrounds. We present hierarchy indexing techniques, whose foundation are tree and clustering. K-means clustering are used for visual features like colour since their spatial distribution give a good image’s global information. Tree indexing for extracted layout and shape features are structured hierarchically and Euclidean distance is used to get similarity image for CBIR. The assessment of the proposed indexing scheme is conducted based on recall and precision, a standard CBIR retrieval performance evaluation. We develop CBIR system and conduct various retrieval experiments with the fundamental aim of comparing the accuracy during image retrieval. A new algorithm that can be used with integrated visual signatures, especially in late fusion query was introduced. The algorithm has the capability of reducing any shortcoming associated with normalisation in initial fusion technique. Slides from conferences, lectures and meetings presentation are used for comparing the proposed technique’s performances with that of the existing approaches with the help of real data. This finding of the thesis presents exciting possibilities as the CBIR systems is able to produce high quality result even for a query, which uses low resolution documents. In the future, the utilization of multimodal signatures, relevance feedback and artificial intelligence technique are recommended to be used in CBIR system to further enhance the performance

UTHM Institutional Repository

Off the Beaten Path: Let's Replace Term-Based Retrieval with k-NN Search

Author: Andoni A.
Beyer K.
Broder A. Z.
Brown P. F.
Fried D.
Le Q.
Mikolov T.
Mu Y.
Muja M.
Petrović S.
Riezler S.
Salton G.
Wang J.
Weber R.
Yang L.
Yao X.
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 30/10/2016
Field of study

Retrieval pipelines commonly rely on a term-based search to obtain candidate records, which are subsequently re-ranked. Some candidates are missed by this approach, e.g., due to a vocabulary mismatch. We address this issue by replacing the term-based search with a generic k-NN retrieval algorithm, where a similarity function can take into account subtle term associations. While an exact brute-force k-NN search using this similarity function is slow, we demonstrate that an approximate algorithm can be nearly two orders of magnitude faster at the expense of only a small loss in accuracy. A retrieval pipeline using an approximate k-NN search can be more effective and efficient than the term-based pipeline. This opens up new possibilities for designing effective retrieval pipelines. Our software (including data-generating code) and derivative data based on the Stack Overflow collection is available online

arXiv.org e-Print Archive

Crossref

Scipedia

Leveraging Deep Visual Descriptors for Hierarchical Efficient Localization

Author: Cadena Cesar
Debraine Frédéric
Dymczyk Marcin
Sarlin Paul-Edouard
Siegwart Roland
Publication venue
Publication date: 01/01/2018
Field of study

Many robotics applications require precise pose estimates despite operating in large and changing environments. This can be addressed by visual localization, using a pre-computed 3D model of the surroundings. The pose estimation then amounts to finding correspondences between 2D keypoints in a query image and 3D points in the model using local descriptors. However, computational power is often limited on robotic platforms, making this task challenging in large-scale environments. Binary feature descriptors significantly speed up this 2D-3D matching, and have become popular in the robotics community, but also strongly impair the robustness to perceptual aliasing and changes in viewpoint, illumination and scene structure. In this work, we propose to leverage recent advances in deep learning to perform an efficient hierarchical localization. We first localize at the map level using learned image-wide global descriptors, and subsequently estimate a precise pose from 2D-3D matches computed in the candidate places only. This restricts the local search and thus allows to efficiently exploit powerful non-binary descriptors usually dismissed on resource-constrained devices. Our approach results in state-of-the-art localization performance while running in real-time on a popular mobile platform, enabling new prospects for robotics research.Comment: CoRL 2018 Camera-ready (fix typos and update citations

arXiv.org e-Print Archive

Repository for Publications and Research Data

TopSig: Topology Preserving Document Signatures

Author: De Vries Christopher M.
Geva Shlomo
Publication venue
Publication date: 01/01/2011
Field of study

Performance comparisons between File Signatures and Inverted Files for text retrieval have previously shown several significant shortcomings of file signatures relative to inverted files. The inverted file approach underpins most state-of-the-art search engine algorithms, such as Language and Probabilistic models. It has been widely accepted that traditional file signatures are inferior alternatives to inverted files. This paper describes TopSig, a new approach to the construction of file signatures. Many advances in semantic hashing and dimensionality reduction have been made in recent times, but these were not so far linked to general purpose, signature file based, search engines. This paper introduces a different signature file approach that builds upon and extends these recent advances. We are able to demonstrate significant improvements in the performance of signature file based indexing and retrieval, performance that is comparable to that of state of the art inverted file based systems, including Language models and BM25. These findings suggest that file signatures offer a viable alternative to inverted files in suitable settings and from the theoretical perspective it positions the file signatures model in the class of Vector Space retrieval models.Comment: 12 pages, 8 figures, CIKM 201

arXiv.org e-Print Archive

CiteSeerX

Crossref

Queensland University of Technology ePrints Archive