6 research outputs found
Recommended from our members
Fast embedding for image classification & retrieval and its application to the hostel industry
This thesis was submitted for the award of Doctor of Philosophy and was awarded by Brunel University LondonContent-based image classification and retrieval are the automatic processes of taking
an unseen image input and extracting its features representing the input image. Then,
for the classification task, this mathematically measured input is categorized according
to established criteria in the server and consequently shows the output as a result. On
the other hand, for the retrieval task, the extracted features of an unseen query image
are sent to the server to search for the most visually similar images to a given image
and retrieve these images as a result. Despite image features could be represented
by classical features, artificial intelligence-based features, Convolutional Neural
Networks (CNN) to be precise, have become powerful tools in the field. Nonetheless,
the high dimensional CNN features have been a challenge in particular for applications
on mobile or Internet of Things devices. Therefore, in this thesis, several fast
embeddings are explored and proposed to overcome the constraints of low memory,
bandwidth, and power. Furthermore, the first hostel image database is created with
three datasets, hostel image dataset containing 13,908 interior and exterior images of
hostels across the world, and Hostels-900 dataset and Hostels-2K dataset containing
972 images and 2,380 images, respectively, of 20 London hostel buildings. The results
demonstrate that the proposed fast embeddings such as the application of GHM-Rand
operator, GHM-Fix operator, and binary feature vectors are able to outperform or give
competitive results to those state-of-the-art methods with a lot less computational
resource. Additionally, the findings from a ten-year literature review of CBIR study in
the tourism industry could picturize the relevant research activities in the past decade
which are not only beneficial to the hostel industry or tourism sector but also to the
computer science and engineering research communities for the potential real-life
applications of the existing and developing technologies in the field
On Designing Tattoo Registration and Matching Approaches in the Visible and SWIR Bands
Face, iris and fingerprint based biometric systems are well explored areas of research. However, there are law enforcement and military applications where neither of the aforementioned modalities may be available to be exploited for human identification. In such applications, soft biometrics may be the only clue available that can be used for identification or verification purposes. Tattoo is an example of such a soft biometric trait. Unlike face-based biometric systems that used in both same-spectral and cross-spectral matching scenarios, tattoo-based human identification is still a not fully explored area of research. At this point in time there are no pre-processing, feature extraction and matching algorithms using tattoo images captured at multiple bands. This thesis is focused on exploring solutions on two main challenging problems. The first one is cross-spectral tattoo matching. The proposed algorithmic approach is using as an input raw Short-Wave Infrared (SWIR) band tattoo images and matches them successfully against their visible band counterparts. The SWIR tattoo images are captured at 1100 nm, 1200 nm, 1300 nm, 1400 nm and 1500 nm. After an empirical study where multiple photometric normalization techniques were used to pre-process the original multi-band tattoo images, only one was determined to significantly improve cross spectral tattoo matching performance. The second challenging problem was to develop a fully automatic visible-based tattoo image registration system based on SIFT descriptors and the RANSAC algorithm with a homography model. The proposed automated registration approach significantly improves the operational cost of a tattoo image identification system (using large scale tattoo image datasets), where the alignment of a pair of tattoo images by system operators needs to be performed manually. At the same time, tattoo matching accuracy is also improved (before vs. after automated alignment) by 45.87% for the NIST-Tatt-C database and 12.65% for the WVU-Tatt database
Contributions to the content-based image retrieval using pictorial queries
Descripció del recurs: el 02 de novembre de 2010L'accés massiu a les cà meres digitals, els ordinadors personals i a Internet, ha propiciat la creació de grans volums de dades en format digital. En aquest context, cada vegada adquireixen major rellevà ncia totes aquelles eines dissenyades per organitzar la informació i facilitar la seva cerca. Les imatges són un cas particular de dades que requereixen tècniques especÃfiques de descripció i indexació. L'à rea de la visió per computador encarregada de l'estudi d'aquestes tècniques rep el nom de Recuperació d'Imatges per Contingut, en anglès Content-Based Image Retrieval (CBIR). Els sistemes de CBIR no utilitzen descripcions basades en text sinó que es basen en caracterÃstiques extretes de les pròpies imatges. En contrast a les més de 6000 llengües parlades en el món, les descripcions basades en caracterÃstiques visuals representen una via d'expressió universal. La intensa recerca en el camp dels sistemes de CBIR s'ha aplicat en à rees de coneixement molt diverses. Aixà doncs s'han desenvolupat aplicacions de CBIR relacionades amb la medicina, la protecció de la propietat intel·lectual, el periodisme, el disseny grà fic, la cerca d'informació en Internet, la preservació dels patrimoni cultural, etc. Un dels punts importants d'una aplicació de CBIR resideix en el disseny de les funcions de l'usuari. L'usuari és l'encarregat de formular les consultes a partir de les quals es fa la cerca de les imatges. Nosaltres hem centrat l'atenció en aquells sistemes en què la consulta es formula a partir d'una representació pictòrica. Hem plantejat una taxonomia dels sistemes de consulta en composada per quatre paradigmes diferents: Consulta-segons-Selecció, Consulta-segons-Composició-Icònica, Consulta-segons-Esboç i Consulta-segons-Il·lustració. Cada paradigma incorpora un nivell diferent en el potencial expressiu de l'usuari. Des de la simple selecció d'una imatge, fins a la creació d'una il·lustració en color, l'usuari és qui pren el control de les dades d'entrada del sistema. Al llarg dels capÃtols d'aquesta tesi hem analitzat la influència que cada paradigma de consulta exerceix en els processos interns d'un sistema de CBIR. D'aquesta manera també hem proposat un conjunt de contribucions que hem exemplificat des d'un punt de vista prà ctic mitjançant una aplicació final
Contributions to the Content-Based Image Retrieval Using Pictorial Queris
L'accés massiu a les cà meres digitals, els ordinadors personals i a Internet, ha propiciat la creació de grans volums de dades en format digital. En aquest context, cada vegada adquireixen major rellevà ncia totes aquelles eines dissenyades per organitzar la informació i facilitar la seva cerca.Les imatges són un cas particular de dades que requereixen tècniques especÃfiques de descripció i indexació. L'à rea de la visió per computador encarregada de l'estudi d'aquestes tècniques rep el nom de Recuperació d'Imatges per Contingut, en anglès Content-Based Image Retrieval (CBIR). Els sistemes de CBIR no utilitzen descripcions basades en text sinó que es basen en caracterÃstiques extretes de les pròpies imatges. En contrast a les més de 6000 llengües parlades en el món, les descripcions basades en caracterÃstiques visuals representen una via d'expressió universal.La intensa recerca en el camp dels sistemes de CBIR s'ha aplicat en à rees de coneixement molt diverses. Aixà doncs s'han desenvolupat aplicacions de CBIR relacionades amb la medicina, la protecció de la propietat intel·lectual, el periodisme, el disseny grà fic, la cerca d'informació en Internet, la preservació dels patrimoni cultural, etc. Un dels punts importants d'una aplicació de CBIR resideix en el disseny de les funcions de l'usuari. L'usuari és l'encarregat de formular les consultes a partir de les quals es fa la cerca de les imatges. Nosaltres hem centrat l'atenció en aquells sistemes en què la consulta es formula a partir d'una representació pictòrica. Hem plantejat una taxonomia dels sistemes de consulta en composada per quatre paradigmes diferents: Consulta-segons-Selecció, Consulta-segons-Composició-Icònica, Consulta-segons-Esboç i Consulta-segons-Il·lustració. Cada paradigma incorpora un nivell diferent en el potencial expressiu de l'usuari. Des de la simple selecció d'una imatge, fins a la creació d'una il·lustració en color, l'usuari és qui pren el control de les dades d'entrada del sistema. Al llarg dels capÃtols d'aquesta tesi hem analitzat la influència que cada paradigma de consulta exerceix en els processos interns d'un sistema de CBIR. D'aquesta manera també hem proposat un conjunt de contribucions que hem exemplificat des d'un punt de vista prà ctic mitjançant una aplicació final
An affective computing and image retrieval approach to support diversified and emotion-aware reminiscence therapy sessions
A demência é uma das principais causas de dependência e incapacidade entre as pessoas idosas em todo o mundo. A terapia de reminiscência é uma terapia não farmacológica comummente utilizada nos cuidados com demência devido ao seu valor terapêutico para as pessoas com demência. Esta terapia é útil para criar uma comunicação envolvente entre pessoas com demência e o resto do mundo, utilizando as capacidades preservadas da memória a longo prazo, em vez de enfatizar as limitações existentes por forma a aliviar a experiência de fracasso e isolamento social. As soluções tecnológicas de assistência existentes melhoram a terapia de reminiscência ao proporcionar uma experiência mais envolvente para todos os participantes (pessoas com demência, familiares e clÃnicos), mas não estão livres de lacunas: a) os dados multimédia utilizados permanecem inalterados ao longo das sessões, e há uma falta de personalização para cada pessoa com demência; b) não têm em conta as emoções transmitidas pelos dados multimédia utilizados nem as reacções emocionais da pessoa com demência aos dados multimédia apresentados; c) a perspectiva dos cuidadores ainda não foi totalmente tida em consideração. Para superar estes desafios, seguimos uma abordagem de concepção centrada no utilizador através de inquéritos mundiais, entrevistas de seguimento, e grupos de discussão com cuidadores formais e informais para informar a concepção de soluções tecnológicas no âmbito dos cuidados de demência. Para cumprir com os requisitos identificados, propomos novos métodos que facilitam a inclusão de emoções no loop durante a terapia de reminiscência para personalizar e diversificar o conteúdo das sessões ao longo do tempo. As contribuições desta tese incluem: a) um conjunto de requisitos funcionais validados recolhidos com os cuidadores formais e informais, os resultados esperados com o cumprimento de cada requisito, e um modelo de arquitectura para o desenvolvimento de soluções tecnológicas de assistência para cuidados de demência; b) uma abordagem end-to-end para identificar automaticamente múltiplas informações emocionais transmitidas por imagens; c) uma abordagem para reduzir a quantidade de imagens que precisam ser anotadas pelas pessoas sem comprometer o desempenho dos modelos de reconhecimento; d) uma técnica de fusão tardia interpretável que combina dinamicamente múltiplos sistemas de recuperação de imagens com base em conteúdo para procurar eficazmente por imagens semelhantes para diversificar e personalizar o conjunto de imagens disponÃveis para serem utilizadas nas sessões.Dementia is one of the major causes of dependency and disability among elderly subjects worldwide. Reminiscence therapy is an inexpensive non-pharmacological therapy commonly used within dementia care due to its therapeutic value for people with dementia. This therapy is useful to create engaging communication between people with dementia and the rest of the world by using the preserved abilities of long-term memory rather than emphasizing the existing impairments to alleviate the experience of failure and social isolation. Current assistive technological solutions improve reminiscence therapy by providing a more lively and engaging experience to all participants (people with dementia, family members, and clinicians), but they are not free of drawbacks: a) the multimedia data used remains unchanged throughout sessions, and there is a lack of customization for each person with dementia; b) they do not take into account the emotions conveyed by the multimedia data used nor the person with dementia’s emotional reactions to the multimedia presented; c) the caregivers’ perspective have not been fully taken into account yet. To overcome these challenges, we followed a usercentered design approach through worldwide surveys, follow-up interviews, and focus groups with formal and informal caregivers to inform the design of technological solutions within dementia care. To fulfil the requirements identified, we propose novel methods that facilitate the inclusion of emotions in the loop during reminiscence therapy to personalize and diversify the content of the sessions over time. Contributions from this thesis include: a) a set of validated functional requirements gathered from formal and informal caregivers, the expected outcomes with the fulfillment of each requirement, and an architecture’s template for the development of assistive technology solutions for dementia care; b) an end-to-end approach to automatically identify multiple emotional information conveyed by images; c) an approach to reduce the amount of images that need to be annotated by humans without compromising the recognition models’ performance; d) an interpretable late-fusion technique that dynamically combines multiple content-based image retrieval systems to effectively search for similar images to diversify and personalize the pool of images available to be used in sessions
Structured representation learning from complex data
This thesis advances several theoretical and practical aspects of the recently introduced restricted Boltzmann machine - a powerful probabilistic and generative framework for modelling data and learning representations. The contributions of this study represent a systematic and common theme in learning structured representations from complex data