8,334 research outputs found

    Pairwise similarity of TopSig document signatures

    Get PDF
    This paper analyses the pairwise distances of signatures produced by the TopSig retrieval model on two document collections. The distribution of the distances are compared to purely random signatures. It explains why TopSig is only competitive with state of the art retrieval models at early precision. Only the local neighbourhood of the signatures is interpretable. We suggest this is a common property of vector space models

    Machine Learning in Automated Text Categorization

    Full text link
    The automated categorization (or classification) of texts into predefined categories has witnessed a booming interest in the last ten years, due to the increased availability of documents in digital form and the ensuing need to organize them. In the research community the dominant approach to this problem is based on machine learning techniques: a general inductive process automatically builds a classifier by learning, from a set of preclassified documents, the characteristics of the categories. The advantages of this approach over the knowledge engineering approach (consisting in the manual definition of a classifier by domain experts) are a very good effectiveness, considerable savings in terms of expert manpower, and straightforward portability to different domains. This survey discusses the main approaches to text categorization that fall within the machine learning paradigm. We will discuss in detail issues pertaining to three different problems, namely document representation, classifier construction, and classifier evaluation.Comment: Accepted for publication on ACM Computing Survey

    Measuring concept similarities in multimedia ontologies: analysis and evaluations

    Get PDF
    The recent development of large-scale multimedia concept ontologies has provided a new momentum for research in the semantic analysis of multimedia repositories. Different methods for generic concept detection have been extensively studied, but the question of how to exploit the structure of a multimedia ontology and existing inter-concept relations has not received similar attention. In this paper, we present a clustering-based method for modeling semantic concepts on low-level feature spaces and study the evaluation of the quality of such models with entropy-based methods. We cover a variety of methods for assessing the similarity of different concepts in a multimedia ontology. We study three ontologies and apply the proposed techniques in experiments involving the visual and semantic similarities, manual annotation of video, and concept detection. The results show that modeling inter-concept relations can provide a promising resource for many different application areas in semantic multimedia processing

    Interactive retrieval of video using pre-computed shot-shot similarities

    Get PDF
    A probabilistic framework for content-based interactive video retrieval is described. The developed indexing of video fragments originates from the probability of the user's positive judgment about key-frames of video shots. Initial estimates of the probabilities are obtained from low-level feature representation. Only statistically significant estimates are picked out, the rest are replaced by an appropriate constant allowing efficient access at search time without loss of search quality and leading to improvement in most experiments. With time, these probability estimates are updated from the relevance judgment of users performing searches, resulting in further substantial increases in mean average precision

    Semantic image retrieval using relevance feedback and transaction logs

    Get PDF
    Due to the recent improvements in digital photography and storage capacity, storing large amounts of images has been made possible, and efficient means to retrieve images matching a user’s query are needed. Content-based Image Retrieval (CBIR) systems automatically extract image contents based on image features, i.e. color, texture, and shape. Relevance feedback methods are applied to CBIR to integrate users’ perceptions and reduce the gap between high-level image semantics and low-level image features. The precision of a CBIR system in retrieving semantically rich (complex) images is improved in this dissertation work by making advancements in three areas of a CBIR system: input, process, and output. The input of the system includes a mechanism that provides the user with required tools to build and modify her query through feedbacks. Users behavioral in CBIR environments are studied, and a new feedback methodology is presented to efficiently capture users’ image perceptions. The process element includes image learning and retrieval algorithms. A Long-term image retrieval algorithm (LTL), which learns image semantics from prior search results available in the system’s transaction history, is developed using Factor Analysis. Another algorithm, a short-term learner (STL) that captures user’s image perceptions based on image features and user’s feedbacks in the on-going transaction, is developed based on Linear Discriminant Analysis. Then, a mechanism is introduced to integrate these two algorithms to one retrieval procedure. Finally, a retrieval strategy that includes learning and searching phases is defined for arranging images in the output of the system. The developed relevance feedback methodology proved to reduce the effect of human subjectivity in providing feedbacks for complex images. Retrieval algorithms were applied to images with different degrees of complexity. LTL is efficient in extracting the semantics of complex images that have a history in the system. STL is suitable for query and images that can be effectively represented by their image features. Therefore, the performance of the system in retrieving images with visual and conceptual complexities was improved when both algorithms were applied simultaneously. Finally, the strategy of retrieval phases demonstrated promising results when the query complexity increases
    • …
    corecore