
    Evaluation campaigns and TRECVid

    The TREC Video Retrieval Evaluation (TRECVid) is an international benchmarking activity to encourage research in video information retrieval by providing a large test collection, uniform scoring procedures, and a forum for organizations interested in comparing their results. TRECVid completed its fifth annual cycle at the end of 2005 and in 2006 TRECVid will involve almost 70 research organizations, universities and other consortia. Throughout its existence, TRECVid has benchmarked both interactive and automatic/manual searching for shots from within a video corpus, automatic detection of a variety of semantic and low-level video features, shot boundary detection and the detection of story boundaries in broadcast TV news. This paper gives an introduction to information retrieval (IR) evaluation from both a user and a system perspective, highlighting that system evaluation is by far the most prevalent type of evaluation carried out. We also include a summary of TRECVid as an example of a system evaluation benchmarking campaign, which allows us to discuss whether such campaigns are a good thing or a bad thing. There are arguments for and against these campaigns and we present some of them in the paper, concluding that on balance they have had a very positive impact on research progress.

    CHORUS Deliverable 2.1: State of the Art on Multimedia Search Engines

    Based on the information provided by European projects and national initiatives related to multimedia search, as well as domain experts who participated in the CHORUS Think-Tanks and workshops, this document reports on the state of the art in multimedia content search from a technical and a socio-economic perspective. The technical perspective includes an up-to-date view of content-based indexing and retrieval technologies, multimedia search in the context of mobile devices and peer-to-peer networks, and an overview of current evaluation and benchmark initiatives to measure the performance of multimedia search engines. From a socio-economic perspective we inventory the impact and legal consequences of these technical advances and point out future directions of research.

    N-Grams Assisted Long Web Search Query Optimization

    Commercial search engines do not return optimal search results when the query is long or covers multiple topics [1]. Long queries are used extensively. The creator of a long query would most likely use natural language to describe it, so the query contains extra information. This information dilutes the results of a web search, and hence decreases the performance as well as the quality of the results returned. Kumaran et al. [22] showed that shorter queries extracted from longer user-generated queries are more effective for ad-hoc retrieval. Hence, by reducing these queries through the removal of extra terms, the quality of the search results can be improved. There are numerous approaches used to address this shortfall. Our approach evaluates various versions of the query, trying to find the optimal one. This variation is achieved by reducing the query length using a combination of n-grams assisted query selection and a random keyword combination generator. We look at existing approaches and try to improve upon them. We propose a hybrid model that tries to address the shortfalls of an existing technique by incorporating established methods along with new ideas. We use the existing models and plug in information with the help of n-grams as well as randomization to improve the overall performance while keeping any overhead calculations in check.
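
The candidate-generation idea described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the n-gram sizes, the random sample size, and especially `toy_score` with its hand-picked weights are assumptions made for illustration. In a real system the score would come from a retrieval-quality estimate, which the abstract does not specify.

```python
import random
from itertools import combinations


def ngram_candidates(terms, n_values=(2, 3)):
    """Return contiguous word n-grams of the query as shorter candidate queries."""
    out = []
    for n in n_values:
        for i in range(len(terms) - n + 1):
            out.append(tuple(terms[i:i + n]))
    return out


def random_candidates(terms, size, count, seed=0):
    """Return up to `count` random keyword combinations (original word order kept)."""
    rng = random.Random(seed)
    combos = list(combinations(terms, size))
    rng.shuffle(combos)
    return combos[:count]


def best_reduced_query(query, score, n_values=(2, 3),
                       random_size=3, random_count=5, seed=0):
    """Score every candidate reduction and return the highest-scoring one."""
    terms = query.lower().split()
    candidates = ngram_candidates(terms, n_values)
    candidates += random_candidates(terms, random_size, random_count, seed)
    # Remove duplicates while keeping first-seen order.
    unique = list(dict.fromkeys(candidates))
    if not unique:
        # Query too short to reduce: fall back to the full query.
        return tuple(terms)
    return max(unique, key=score)


# Hypothetical term weights, standing in for a real quality estimate.
TOY_WEIGHTS = {"cheap": 2.0, "flights": 3.0, "london": 3.0}


def toy_score(candidate):
    """Average toy weight per term; unknown terms count as zero."""
    return sum(TOY_WEIGHTS.get(t, 0.0) for t in candidate) / len(candidate)
```

Calling `best_reduced_query("i want cheap flights to london", toy_score)` returns the candidate with the highest average weight among the n-grams and the sampled combinations. Which combination is sampled depends on the seed, so the exact result can vary with it.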

    An effective Chinese indexing method based on partitioned signature files.

    Wong Chi Yin.
    Thesis (M.Phil.)--Chinese University of Hong Kong, 1998.
    Includes bibliographical references (leaves 107-114).
    Abstract also in Chinese.

    Abstract
    Acknowledgements
    Chapter 1 --- Introduction
        1.1 --- Introduction to Chinese IR
        1.2 --- Contributions
        1.3 --- Organization of this Thesis
    Chapter 2 --- Background
        2.1 --- Indexing methods
            2.1.1 --- Full-text scanning
            2.1.2 --- Inverted files
            2.1.3 --- Signature files
            2.1.4 --- Clustering
        2.2 --- Information Retrieval Models
            2.2.1 --- Boolean model
            2.2.2 --- Vector space model
            2.2.3 --- Probabilistic model
            2.2.4 --- Logical model
    Chapter 3 --- Investigation of Segmentation on the Vector Space Retrieval Model
        3.1 --- Segmentation of Chinese Texts
            3.1.1 --- Character-based segmentation
            3.1.2 --- Word-based segmentation
            3.1.3 --- N-Gram segmentation
        3.2 --- Performance Evaluation of Three Segmentation Approaches
            3.2.1 --- Experimental Setup
            3.2.2 --- Experimental Results
            3.2.3 --- Discussion
    Chapter 4 --- Signature File Background
        4.1 --- Superimposed coding
        4.2 --- False drop probability
    Chapter 5 --- Partitioned Signature File Based On Chinese Word Length
        5.1 --- Fixed Weight Block (FWB) Signature File
        5.2 --- Overview of PSFC
        5.3 --- Design Considerations
    Chapter 6 --- New Hashing Techniques for Partitioned Signature Files
        6.1 --- Direct Division Method
        6.2 --- Random Number Assisted Division Method
        6.3 --- Frequency-based hashing method
        6.4 --- Chinese character-based hashing method
    Chapter 7 --- Experiments and Results
        7.1 --- Performance evaluation of partitioned signature file based on Chinese word length
            7.1.1 --- Retrieval Performance
            7.1.2 --- Signature Reduction Ratio
            7.1.3 --- Storage Requirement
            7.1.4 --- Discussion
        7.2 --- Performance evaluation of different dynamic signature generation methods
            7.2.1 --- Collision
            7.2.2 --- Retrieval Performance
            7.2.3 --- Discussion
    Chapter 8 --- Conclusions and Future Work
        8.1 --- Conclusions
        8.2 --- Future work
    Chapter A --- Notations of Signature Files
    Chapter B --- False Drop Probability
    Chapter C --- Experimental Results
    Bibliography

    Real Time Web Search Framework for Performing Efficient Retrieval of Data

    With the rapidly growing amount of information on the internet, real-time systems are one of the key strategies for coping with information overload and helping users find highly relevant information. Real-time events and domain-specific information are important knowledge base references on the Web that are frequently accessed by millions of users. A real-time system is vital to a product, and to be reliable a technique must address challenges such as short data life-cycles, heterogeneous user interests, strict time constraints, and context-dependent article relevance. Since real-time data have only a short time to live, real-time models have to be continuously adapted, ensuring that real-time data are always up to date. This manuscript focuses on designing a real-time web search approach that aggregates several web search algorithms at query time to tune search results for relevancy. We learn a context-aware delegation algorithm that chooses the best real-time algorithm for each query request. The evaluation showed that the proposed approach outperforms traditional models, as it allows us to adapt to the specific properties of the considered real-time resources. In the experiments, we found that it is highly relevant for the most recently searched queries, consistent in its performance, and resilient to the drawbacks faced by other algorithms.

    The Effect of the Multi-Layer Text Summarization Model on the Efficiency and Relevancy of the Vector Space-based Information Retrieval

    The massive upload of text on the internet creates a huge inverted index in information retrieval systems, which hurts their efficiency. The purpose of this research is to measure the effect of the Multi-Layer Similarity model of automatic text summarization on building an informative and condensed inverted index in IR systems. To achieve this purpose, we summarized a considerable number of documents using the Multi-Layer Similarity model, and we built the inverted index from the automatic summaries generated by this model. A series of experiments was conducted to test the performance in terms of efficiency and relevancy. The experiments include comparisons with three existing text summarization models: the Jaccard Coefficient Model, the Vector Space Model, and the Latent Semantic Analysis model. The experiments examined three groups of queries with manual and automatic relevancy assessment. The positive effect of the Multi-Layer Similarity model on the efficiency of the IR system was clear, without noticeable loss in the relevancy results. However, the evaluation showed that the traditional statistical models without semantic investigation failed to improve information retrieval efficiency. Compared with previous publications that addressed the use of summaries as a source of the index, the relevancy assessment of our work was higher, and the Multi-Layer Similarity retrieval constructed an inverted index that was 58% smaller than the main corpus inverted index.
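
The size comparison reported in this abstract can be illustrated with a small sketch that builds one inverted index from full documents and another from their summaries, then compares the number of postings. This is only an illustration under stated assumptions: the tokenizer is a plain whitespace split, index size is measured as the total posting count, and `first_sentence_summary` is a placeholder summarizer, not the Multi-Layer Similarity model.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for raw in text.lower().split():
            term = raw.strip(".,;:!?")
            if term:  # skip tokens that were punctuation only
                index[term].add(doc_id)
    return dict(index)


def posting_count(index):
    """Total number of (term, document) postings, used here as the index size."""
    return sum(len(postings) for postings in index.values())


def size_reduction(full_index, summary_index):
    """Fraction by which the summary index is smaller than the full index."""
    return 1 - posting_count(summary_index) / posting_count(full_index)


def first_sentence_summary(text):
    """Placeholder summarizer (an assumption): keep only the first sentence."""
    return text.split(".")[0]
```

Given a dictionary of documents, one would build `build_inverted_index(docs)` and `build_inverted_index({d: first_sentence_summary(t) for d, t in docs.items()})`, then pass both to `size_reduction`. Measuring retrieval relevancy on the reduced index, as the abstract does, needs relevance judgments and is outside this sketch.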

    Knowledge-based document retrieval with application to TEXPROS

    Document retrieval in an information system is most often accomplished through keyword search. The common technique behind keyword search is indexing. The major drawback of such a search technique is its lack of effectiveness and accuracy. It is very common in a typical keyword search over the Internet to identify hundreds or even thousands of records as the potentially desired records. However, often few of them are relevant to users' interests. This dissertation presents a knowledge-based document retrieval architecture with application to TEXPROS. The architecture is based on a dual document model that consists of a document type hierarchy and a folder organization. Using the knowledge collected during document filing, the search space can be narrowed down significantly. Combining the classical text-based retrieval methods with knowledge-based retrieval can tremendously improve both search efficiency and effectiveness. With the proposed predicate-based query language, users can more precisely and accurately specify the search criteria and their knowledge about the documents to be retrieved. To help users formulate a query, a guided search is presented as part of an intelligent user interface. Supported by an intelligent question generator, an inference engine, a question base, and a predicate-based query composer, the guided search collects the most important information known to the user to retrieve the documents that satisfy the user's particular interests. A knowledge-based query processing and search engine is presented as the core component of this architecture. Algorithms are developed for the search engine to effectively and efficiently retrieve the documents that match the query. A cache is introduced to speed up the process of query refinement. Theoretical proof and performance analysis are carried out to demonstrate the efficiency and effectiveness of this knowledge-based document retrieval approach.