809 research outputs found

    Semantic spaces revisited: investigating the performance of auto-annotation and semantic retrieval using semantic spaces

    Semantic spaces encode similarity relationships between objects as a function of position in a mathematical space. This paper discusses three formulations for building semantic spaces that allow the automatic annotation and semantic retrieval of images. The models discussed require that the image content be described as a series of visual terms rather than as a continuous feature vector. The paper also discusses how these term-based models compare to the latest state-of-the-art continuous-feature models for auto-annotation and retrieval.

    Seeing the Intangible: Surveying Automatic High-Level Visual Understanding from Still Images

    The field of Computer Vision (CV) was born with the single grand goal of complete image understanding: providing a complete semantic interpretation of an input image. What exactly this goal entails is not immediately straightforward, but theoretical hierarchies of visual understanding point towards a top level of full semantics, within which sits the most complex and subjective information humans can detect from visual data. In particular, non-concrete concepts including emotions, social values and ideologies seem to be protagonists of this "high-level" visual semantic understanding. While such "abstract concepts" are critical tools for image management and retrieval, their automatic recognition is still a challenge, precisely because they rest at the top of the "semantic pyramid": the well-known semantic gap problem is worsened by their lack of unique perceptual referents and their reliance on less specific features than concrete concepts. Given that there seems to be very little explicit work within CV on the task of abstract social concept (ASC) detection, and that many recent works seem to discuss similar non-concrete entities using different terminology, in this survey we provide a systematic review of CV work that explicitly or implicitly approaches the problem of abstract (specifically social) concept detection from still images. Specifically, this survey performs and provides: (1) a study and clustering of high-level visual understanding semantic elements from a multidisciplinary perspective (computer science, visual studies, and cognitive perspectives); (2) a study and clustering of high-level visual understanding computer vision tasks dealing with the identified semantic elements, so as to identify current CV work that implicitly deals with AC detection.

    CHORUS Deliverable 2.1: State of the Art on Multimedia Search Engines

    Based on information provided by European projects and national initiatives related to multimedia search, as well as by domain experts who participated in the CHORUS Think-tanks and workshops, this document reports on the state of the art in multimedia content search from a technical and a socio-economic perspective. The technical perspective includes an up-to-date view on content-based indexing and retrieval technologies, multimedia search in the context of mobile devices and peer-to-peer networks, and an overview of current evaluation and benchmark initiatives to measure the performance of multimedia search engines. From a socio-economic perspective, we take inventory of the impact and legal consequences of these technical advances and point out future directions of research.

    Robust methods for Chinese spoken document retrieval.

    Hui Pui Yu. Thesis (M.Phil.), Chinese University of Hong Kong, 2003. Includes bibliographical references (leaves 158-169). Abstracts in English and Chinese.

    Contents:
    Abstract
    Acknowledgements
    Chapter 1: Introduction
        1.1 Spoken Document Retrieval
        1.2 The Chinese Language and Chinese Spoken Documents
        1.3 Motivation
            1.3.1 Assisting the User in Query Formation
        1.4 Goals
        1.5 Thesis Organization
    Chapter 2: Multimedia Repository
        2.1 The Cantonese Corpus
            2.1.1 The RealMedia Collection
            2.1.2 The MPEG-1 Collection
        2.2 The Multimedia Markup Language
        2.3 Chapter Summary
    Chapter 3: Monolingual Retrieval Task
        3.1 Properties of Cantonese Video Archive
        3.2 Automatic Speech Transcription
            3.2.1 Transcription of Cantonese Spoken Documents
            3.2.2 Indexing Units
        3.3 Known-Item Retrieval Task
            3.3.1 Evaluation: Average Inverse Rank
        3.4 Retrieval Model
        3.5 Experimental Results
        3.6 Chapter Summary
    Chapter 4: The Use of Audio and Video Information for Monolingual Spoken Document Retrieval
        4.1 Video-based Segmentation
            4.1.1 Metric Computation
            4.1.2 Shot Boundary Detection
            4.1.3 Shot Transition Detection
        4.2 Audio-based Segmentation
            4.2.1 Gaussian Mixture Models
            4.2.2 Transition Detection
        4.3 Performance Evaluation
            4.3.1 Automatic Story Segmentation
            4.3.2 Video-based Segmentation Algorithm
            4.3.3 Audio-based Segmentation Algorithm
        4.4 Fusion of Video- and Audio-based Segmentation
        4.5 Retrieval Performance
        4.6 Chapter Summary
    Chapter 5: Document Expansion for Monolingual Spoken Document Retrieval
        5.1 Document Expansion using Selected Field Speech Segments
            5.1.1 Annotations from MmML
            5.1.2 Selection of Cantonese Field Speech
            5.1.3 Re-weighting Different Retrieval Units
            5.1.4 Retrieval Performance with Document Expansion using Selected Field Speech
        5.2 Document Expansion using N-best Recognition Hypotheses
            5.2.1 Re-weighting Different Retrieval Units
            5.2.2 Retrieval Performance with Document Expansion using N-best Recognition Hypotheses
        5.3 Document Expansion using Selected Field Speech and N-best Recognition Hypotheses
            5.3.1 Re-weighting Different Retrieval Units
            5.3.2 Retrieval Performance with Different Indexed Units
        5.4 Chapter Summary
    Chapter 6: Query Expansion for Cross-language Spoken Document Retrieval
        6.1 The TDT-2 Corpus
            6.1.1 English Textual Queries
            6.1.2 Mandarin Spoken Documents
        6.2 Query Processing
            6.2.1 Query Weighting
            6.2.2 Bigram Formation
        6.3 Cross-language Retrieval Task
            6.3.1 Indexing Units
            6.3.2 Retrieval Model
            6.3.3 Performance Measure
        6.4 Relevance Feedback
            6.4.1 Pseudo-Relevance Feedback
        6.5 Retrieval Performance
        6.6 Chapter Summary
    Chapter 7: Conclusions and Future Work
        7.1 Future Work
    Chapter A: XML Schema for Multimedia Markup Language
    Chapter B: Example of Multimedia Markup Language
    Chapter C: Significance Tests
        C.1 Selection of Cantonese Field Speech Segments
        C.2 Fusion of Video- and Audio-based Segmentation
        C.3 Document Expansion with Reporter Speech
        C.4 Document Expansion with N-best Recognition Hypotheses
        C.5 Document Expansion with Reporter Speech and N-best Recognition Hypotheses
        C.6 Query Expansion with Pseudo Relevance Feedback
    Chapter D: Topic Descriptions of TDT-2 Corpus
    Chapter E: Speech Recognition Output from Dragon in CLSDR Task
    Chapter F: Parameters Estimation
        F.1 Estimating the Number of Relevant Documents, Nr
        F.2 Estimating the Number of Terms Added from Relevant Documents, Nrt, to Original Query
        F.3 Estimating the Number of Non-relevant Documents, Nn, from the Bottom-scoring Retrieval List
        F.4 Estimating the Number of Terms, Selected from Non-relevant Documents (Nnt), to be Removed from Original Query
    Chapter G: Abbreviations
    Bibliography

    An investigation into weighted data fusion for content-based multimedia information retrieval

    Content Based Multimedia Information Retrieval (CBMIR) is characterised by the combination of noisy sources of information which, in unison, are able to achieve strong performance. In this thesis we focus on the combination of ranked results from the independent retrieval experts which comprise a CBMIR system through linearly weighted data fusion. The independent retrieval experts are low-level multimedia features, each of which contains an indexing function and ranking algorithm. This thesis comprises two halves. In the first half, we perform a rigorous empirical investigation into the factors which impact upon performance in linearly weighted data fusion. In the second half, we leverage these findings to create a new class of weight generation algorithms for data fusion which are capable of determining weights at query-time, such that the weights are topic dependent.

    Self-organizing distributed digital library supporting audio-video

    The StreamOnTheFly network combines peer-to-peer networking and open-archive principles for community radio channels and TV stations in Europe. StreamOnTheFly demonstrates new methods of archive management and personalization technologies for both audio and video. It also provides a collaboration platform for community purposes that suits the flexible activity patterns of these kinds of broadcaster communities.

    Semantic Knowledge Graphs for the News: A Review

    ICT platforms for news production, distribution, and consumption must exploit the ever-growing availability of digital data. These data originate from different sources and in different formats; they arrive at different velocities and in different volumes. Semantic knowledge graphs (KGs) are an established technique for integrating such heterogeneous information. They are therefore well-aligned with the needs of news producers and distributors, and they are likely to become increasingly important for the news industry. This article reviews the research on using semantic knowledge graphs for production, distribution, and consumption of news. The purpose is to present an overview of the field; to investigate what it means; and to suggest opportunities and needs for further research and development.