    CHORUS Deliverable 2.1: State of the Art on Multimedia Search Engines

    Based on information provided by European projects and national initiatives related to multimedia search, as well as by domain experts who participated in the CHORUS Think-Tanks and workshops, this document reports on the state of the art in multimedia content search from a technical and a socio-economic perspective. The technical perspective includes an up-to-date view of content-based indexing and retrieval technologies, multimedia search in the context of mobile devices and peer-to-peer networks, and an overview of current evaluation and benchmark initiatives for measuring the performance of multimedia search engines. From a socio-economic perspective, we inventory the impact and legal consequences of these technical advances and point out future directions of research.

    Contributions to the Content-Based Image Retrieval Using Pictorial Queries

    Widespread access to digital cameras, personal computers and the Internet has led to the creation of large volumes of data in digital form. In this context, tools designed to organise information and make it easier to search are becoming increasingly relevant. Images are a particular kind of data that requires specific description and indexing techniques. The area of computer vision that studies these techniques is called Content-Based Image Retrieval (CBIR). CBIR systems do not use text-based descriptions; instead, they rely on features extracted from the images themselves. In contrast to the more than 6000 languages spoken in the world, descriptions based on visual features offer a universal means of expression. Intensive research on CBIR systems has been applied in very diverse fields of knowledge. CBIR applications have been developed for medicine, intellectual property protection, journalism, graphic design, information search on the Internet, the preservation of cultural heritage, and so on. One of the important aspects of a CBIR application is the design of the user's functions. The user is responsible for formulating the queries that drive the image search. We have focused on systems in which the query is formulated from a pictorial representation. We have proposed a taxonomy of query systems composed of four different paradigms: Query-by-Selection, Query-by-Iconic-Composition, Query-by-Sketch and Query-by-Illustration. Each paradigm offers the user a different level of expressive potential. From simply selecting an image to creating a colour illustration, it is the user who takes control of the system's input data.
Throughout the chapters of this thesis we have analysed the influence that each query paradigm exerts on the internal processes of a CBIR system. In doing so, we have also proposed a set of contributions, which we have illustrated from a practical point of view by means of a final application.

    Contributions to the content-based image retrieval using pictorial queries

    Trade mark similarity assessment support system

    Trade marks are valuable intangible intellectual property (IP) assets with potentially high reputational value that can be protected. Similarity between trade marks may potentially lead to infringement. That similarity is normally assessed based on the visual, conceptual and phonetic aspects of the trade marks in question. Hence, this thesis addresses this issue by proposing a trade mark similarity assessment support system that uses the three main aspects of trade mark similarity as a mechanism to avoid future infringement. A conceptual model of the proposed trade mark similarity assessment support system is first proposed and developed based on the similarity assessment criteria outlined in a trade mark manual. The proposed model is the first contribution of this study, and it consists of visual, conceptual, phonetic and inference engine modules. The second contribution of this work is an algorithm that compares trade marks based on their visual similarity. The algorithm performs a similarity assessment using content-based image retrieval (CBIR) technology and an integrated visual descriptor derived using the low-level image feature, i.e. the shape feature. The performance of the algorithm is then assessed using information retrieval based measures. The obtained result demonstrates better retrieval performance in comparison to the state-of-the-art algorithm. The conceptual aspect of trade mark similarity is then examined and analysed using a proposed algorithm that employs semantic technology in the conceptual module. This contribution enables the computation of the conceptual similarity between trade marks, with the utilisation of an external knowledge source in the form of a lexical ontology, together with natural language processing and set similarity theory. The proposed algorithm is evaluated using both information retrieval and human collective opinion measures.
The retrieval result produced by the proposed algorithm outperforms the traditional string similarity comparison algorithm in both measures. The phonetic module examines the phonetic similarity of trade marks using another proposed algorithm that utilises phoneme analysis. This algorithm employs phonological features, which are extracted based on human speech articulation. In addition, the algorithm also provides a mechanism to compare the phonetic aspect of trade marks with typographic characters. The proposed algorithm is the fourth contribution of this study. It is evaluated using an information retrieval based measure. The result shows better retrieval performance in comparison to the traditional string similarity algorithm. The final contribution of this study is a methodology to aggregate the overall similarity score between trade marks. It is motivated by the understanding that trade mark similarity should be assessed holistically; that is, the visual, conceptual and phonetic aspects should be considered together. The proposed method is developed in the inference engine module; it utilises fuzzy logic for the inference process. A set of fuzzy rules, which consists of several membership functions, is also derived in this study from the trade mark manual and from an analysis of a collection of disputed trade mark cases. The method is then evaluated using both information retrieval and human collective opinion. The proposed method improves the retrieval accuracy, and the experiment also shows that the aggregated similarity score correlates well with the score produced from human collective opinion. The evaluations performed in the course of this study employ the following datasets: the MPEG-7 shape dataset, the MPEG-7 trade marks dataset, a collection of 1400 trade marks from real trade mark dispute cases, and a collection of 378,943 company names.

    Enhancing RGB-D SLAM Using Deep Learning

    Discovering a Domain Knowledge Representation for Image Grouping: Multimodal Data Modeling, Fusion, and Interactive Learning

    In visually-oriented specialized medical domains such as dermatology and radiology, physicians explore interesting image cases from medical image repositories for comparative case studies to aid clinical diagnoses, educate medical trainees, and support medical research. However, general image classification and retrieval approaches fail in grouping medical images from the physicians' viewpoint. This is because fully-automated learning techniques cannot yet bridge the gap between image features and domain-specific content, owing to the absence of expert knowledge. Understanding how experts get information from medical images is therefore an important research topic. As a prior study, we conducted data elicitation experiments, where physicians were instructed to inspect each medical image towards a diagnosis while describing image content to a student seated nearby. Experts' eye movements and their verbal descriptions of the image content were recorded to capture various aspects of expert image understanding. This dissertation aims at an intuitive approach to extracting expert knowledge, which is to find patterns in expert data elicited from image-based diagnoses. These patterns are useful to understand both the characteristics of the medical images and the experts' cognitive reasoning processes. The transformation from the viewed raw image features to interpretation as domain-specific concepts requires experts' domain knowledge and cognitive reasoning. This dissertation also approximates this transformation using a matrix factorization-based framework, which helps project multiple expert-derived data modalities to high-level abstractions. To combine additional expert interventions with computational processing capabilities, an interactive machine learning paradigm is developed to treat experts as an integral part of the learning process.
Specifically, experts refine medical image groups presented by the learned model locally, to incrementally re-learn the model globally. This paradigm avoids the onerous expert annotations for model training, while aligning the learned model with experts' sense-making.