131 research outputs found
Recommended from our members
Fast embedding for image classification & retrieval and its application to the hostel industry
This thesis was submitted for the award of Doctor of Philosophy and was awarded by Brunel University LondonContent-based image classification and retrieval are the automatic processes of taking
an unseen image input and extracting its features representing the input image. Then,
for the classification task, this mathematically measured input is categorized according
to established criteria in the server and consequently shows the output as a result. On
the other hand, for the retrieval task, the extracted features of an unseen query image
are sent to the server to search for the most visually similar images to a given image
and retrieve these images as a result. Despite image features could be represented
by classical features, artificial intelligence-based features, Convolutional Neural
Networks (CNN) to be precise, have become powerful tools in the field. Nonetheless,
the high dimensional CNN features have been a challenge in particular for applications
on mobile or Internet of Things devices. Therefore, in this thesis, several fast
embeddings are explored and proposed to overcome the constraints of low memory,
bandwidth, and power. Furthermore, the first hostel image database is created with
three datasets, hostel image dataset containing 13,908 interior and exterior images of
hostels across the world, and Hostels-900 dataset and Hostels-2K dataset containing
972 images and 2,380 images, respectively, of 20 London hostel buildings. The results
demonstrate that the proposed fast embeddings such as the application of GHM-Rand
operator, GHM-Fix operator, and binary feature vectors are able to outperform or give
competitive results to those state-of-the-art methods with a lot less computational
resource. Additionally, the findings from a ten-year literature review of CBIR study in
the tourism industry could picturize the relevant research activities in the past decade
which are not only beneficial to the hostel industry or tourism sector but also to the
computer science and engineering research communities for the potential real-life
applications of the existing and developing technologies in the field
CHORUS Deliverable 2.1: State of the Art on Multimedia Search Engines
Based on the information provided by European projects and national initiatives related to multimedia search as well as domains experts that participated in the CHORUS Think-thanks and workshops, this document reports on the state of the art related to multimedia content search from, a technical, and socio-economic perspective.
The technical perspective includes an up to date view on content based indexing and retrieval technologies, multimedia search in the context of mobile devices and peer-to-peer networks, and an overview of current evaluation and benchmark inititiatives to measure the performance of multimedia search engines.
From a socio-economic perspective we inventorize the impact and legal consequences of these technical advances and point out future directions of research
Contributions to the Content-Based Image Retrieval Using Pictorial Queris
L'accés massiu a les càmeres digitals, els ordinadors personals i a Internet, ha propiciat la creació de grans volums de dades en format digital. En aquest context, cada vegada adquireixen major rellevància totes aquelles eines dissenyades per organitzar la informació i facilitar la seva cerca.Les imatges són un cas particular de dades que requereixen tècniques específiques de descripció i indexació. L'àrea de la visió per computador encarregada de l'estudi d'aquestes tècniques rep el nom de Recuperació d'Imatges per Contingut, en anglès Content-Based Image Retrieval (CBIR). Els sistemes de CBIR no utilitzen descripcions basades en text sinó que es basen en característiques extretes de les pròpies imatges. En contrast a les més de 6000 llengües parlades en el món, les descripcions basades en característiques visuals representen una via d'expressió universal.La intensa recerca en el camp dels sistemes de CBIR s'ha aplicat en àrees de coneixement molt diverses. Així doncs s'han desenvolupat aplicacions de CBIR relacionades amb la medicina, la protecció de la propietat intel·lectual, el periodisme, el disseny gràfic, la cerca d'informació en Internet, la preservació dels patrimoni cultural, etc. Un dels punts importants d'una aplicació de CBIR resideix en el disseny de les funcions de l'usuari. L'usuari és l'encarregat de formular les consultes a partir de les quals es fa la cerca de les imatges. Nosaltres hem centrat l'atenció en aquells sistemes en què la consulta es formula a partir d'una representació pictòrica. Hem plantejat una taxonomia dels sistemes de consulta en composada per quatre paradigmes diferents: Consulta-segons-Selecció, Consulta-segons-Composició-Icònica, Consulta-segons-Esboç i Consulta-segons-Il·lustració. Cada paradigma incorpora un nivell diferent en el potencial expressiu de l'usuari. Des de la simple selecció d'una imatge, fins a la creació d'una il·lustració en color, l'usuari és qui pren el control de les dades d'entrada del sistema. Al llarg dels capítols d'aquesta tesi hem analitzat la influència que cada paradigma de consulta exerceix en els processos interns d'un sistema de CBIR. D'aquesta manera també hem proposat un conjunt de contribucions que hem exemplificat des d'un punt de vista pràctic mitjançant una aplicació final
Contributions to the content-based image retrieval using pictorial queries
Descripció del recurs: el 02 de novembre de 2010L'accés massiu a les càmeres digitals, els ordinadors personals i a Internet, ha propiciat la creació de grans volums de dades en format digital. En aquest context, cada vegada adquireixen major rellevància totes aquelles eines dissenyades per organitzar la informació i facilitar la seva cerca. Les imatges són un cas particular de dades que requereixen tècniques específiques de descripció i indexació. L'àrea de la visió per computador encarregada de l'estudi d'aquestes tècniques rep el nom de Recuperació d'Imatges per Contingut, en anglès Content-Based Image Retrieval (CBIR). Els sistemes de CBIR no utilitzen descripcions basades en text sinó que es basen en característiques extretes de les pròpies imatges. En contrast a les més de 6000 llengües parlades en el món, les descripcions basades en característiques visuals representen una via d'expressió universal. La intensa recerca en el camp dels sistemes de CBIR s'ha aplicat en àrees de coneixement molt diverses. Així doncs s'han desenvolupat aplicacions de CBIR relacionades amb la medicina, la protecció de la propietat intel·lectual, el periodisme, el disseny gràfic, la cerca d'informació en Internet, la preservació dels patrimoni cultural, etc. Un dels punts importants d'una aplicació de CBIR resideix en el disseny de les funcions de l'usuari. L'usuari és l'encarregat de formular les consultes a partir de les quals es fa la cerca de les imatges. Nosaltres hem centrat l'atenció en aquells sistemes en què la consulta es formula a partir d'una representació pictòrica. Hem plantejat una taxonomia dels sistemes de consulta en composada per quatre paradigmes diferents: Consulta-segons-Selecció, Consulta-segons-Composició-Icònica, Consulta-segons-Esboç i Consulta-segons-Il·lustració. Cada paradigma incorpora un nivell diferent en el potencial expressiu de l'usuari. Des de la simple selecció d'una imatge, fins a la creació d'una il·lustració en color, l'usuari és qui pren el control de les dades d'entrada del sistema. Al llarg dels capítols d'aquesta tesi hem analitzat la influència que cada paradigma de consulta exerceix en els processos interns d'un sistema de CBIR. D'aquesta manera també hem proposat un conjunt de contribucions que hem exemplificat des d'un punt de vista pràctic mitjançant una aplicació final
Trade mark similarity assessment support system
Trade marks are valuable intangible intellectual property (IP) assets with potentially
high reputational value that can be protected. Similarity between trade marks may
potentially lead to infringement. That similarity is normally assessed based on the
visual, conceptual and phonetic aspects of the trade marks in question. Hence, this
thesis addresses this issue by proposing a trade mark similarity assessment support
system that uses the three main aspects of trade mark similarity as a mechanism to
avoid future infringement.
A conceptual model of the proposed trade mark similarity assessment support
system is first proposed and developed based on the similarity assessment criteria
outlined in a trade mark manual. The proposed model is the first contribution of this
study, and it consists of visual, conceptual, phonetic and inference engine modules.
The second contribution of this work is an algorithm that compares trade
marks based on their visual similarity. The algorithm performs a similarity
assessment using content-based image retrieval (CBIR) technology and an
integrated visual descriptor derived using the low-level image feature, i.e. the shape
feature. The performance of the algorithm is then assessed using information
retrieval based measures. The obtained result demonstrates better retrieval
performance in comparison to the state of the art algorithm.
The conceptual aspect of trade mark similarity is then examined and analysed
using a proposed algorithm that employs semantic technology in the conceptual
module. This contribution enables the computation of the conceptual similarity
between trade marks, with the utilisation of an external knowledge source in the
form of a lexical ontology, together with natural language processing and set
similarity theory. The proposed algorithm is evaluated using both information
VI
retrieval and human collective opinion measures. The retrieval result produced by
the proposed algorithm outperforms the traditional string similarity comparison
algorithm in both measures.
The phonetic module examines the phonetic similarity of trade marks using
another proposed algorithm that utilises phoneme analysis. This algorithm employs
phonological features, which are extracted based on human speech articulation. In
addition, the algorithm also provides a mechanism to compare the phonetic aspect
of trade marks with typographic characters. The proposed algorithm is the fourth
contribution of this study. It is evaluated using an information retrieval based
measure. The result shows better retrieval performance in comparison to the
traditional string similarity algorithm.
The final contribution of this study is a methodology to aggregate the overall
similarity score between trade marks. It is motivated by the understanding that trade
mark similarity should be assessed holistically; that is, the visual, conceptual and
phonetic aspects should be considered together. The proposed method is
developed in the inference engine module; it utilises fuzzy logic for the inference
process. A set of fuzzy rules, which consists of several membership functions, is
also derived in this study based on the trade mark manual and a collection of trade
mark disputed cases is analysed. The method is then evaluated using both
information retrieval and human collective opinion. The proposed method improves
the retrieval accuracy and the experiment also proves that the aggregated similarity
score correlates well with the score produced from human collective opinion.
The evaluations performed in the course of this study employ the following
datasets: the MPEG-7 shape dataset, the MPEG-7 trade marks dataset, a collection
of 1400 trade marks from real trade mark dispute cases, and a collection of 378,943
company names
Discovering a Domain Knowledge Representation for Image Grouping: Multimodal Data Modeling, Fusion, and Interactive Learning
In visually-oriented specialized medical domains such as dermatology and radiology, physicians explore interesting image cases from medical image repositories for comparative case studies to aid clinical diagnoses, educate medical trainees, and support medical research. However, general image classification and retrieval approaches fail in grouping medical images from the physicians\u27 viewpoint. This is because fully-automated learning techniques cannot yet bridge the gap between image features and domain-specific content for the absence of expert knowledge. Understanding how experts get information from medical images is therefore an important research topic.
As a prior study, we conducted data elicitation experiments, where physicians were instructed to inspect each medical image towards a diagnosis while describing image content to a student seated nearby. Experts\u27 eye movements and their verbal descriptions of the image content were recorded to capture various aspects of expert image understanding. This dissertation aims at an intuitive approach to extracting expert knowledge, which is to find patterns in expert data elicited from image-based diagnoses. These patterns are useful to understand both the characteristics of the medical images and the experts\u27 cognitive reasoning processes.
The transformation from the viewed raw image features to interpretation as domain-specific concepts requires experts\u27 domain knowledge and cognitive reasoning. This dissertation also approximates this transformation using a matrix factorization-based framework, which helps project multiple expert-derived data modalities to high-level abstractions.
To combine additional expert interventions with computational processing capabilities, an interactive machine learning paradigm is developed to treat experts as an integral part of the learning process. Specifically, experts refine medical image groups presented by the learned model locally, to incrementally re-learn the model globally. This paradigm avoids the onerous expert annotations for model training, while aligning the learned model with experts\u27 sense-making
- …