117 research outputs found

On Achieving Diversity in the Presence of Outliers in Participatory Camera Sensor Networks

Authors: Abdelzaher Tarek F.; Amin Md Tanvir Al; Govindan Ramesh; Iyengar Arun; Uddin Md Yusuf Sarwar
Publication date: 25/04/2012

This paper addresses the problem of collecting and delivering a representative subset of pictures in participatory camera networks, so as to maximize coverage when a significant portion of the pictures may be redundant or irrelevant. Consider, for example, a rescue mission where volunteers and survivors of a large-scale disaster scout a wide area to capture pictures of damage in distressed neighborhoods, using handheld cameras, and report them to a rescue station. In this participatory camera network, a significant number of pictures may be redundant (i.e., similar pictures may be reported by many) or irrelevant (i.e., may not document an event of interest). Given this pool of pictures, we aim to build a protocol to store and deliver a smaller subset of pictures, among all those taken, that minimizes redundancy and eliminates irrelevant objects and outliers. While previous work addressed removal of redundancy alone, doing so in the presence of outliers is tricky, because outliers, by their very nature, are different from other objects, causing redundancy-minimizing algorithms to favor their inclusion, which is at odds with the goal of finding a representative subset. To eliminate both outliers and redundancy at the same time, two seemingly opposite objectives must be met together. The contribution of this paper lies in a new prioritization technique (and its in-network implementation) that minimizes redundancy among delivered pictures, while also reducing outliers.
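The tension described above, where diversity-seeking selection tends to favor outliers, can be illustrated with a small sketch. This is not the paper's prioritization technique: it assumes each picture is reduced to a numeric feature vector, treats a picture as an outlier when it has too few neighbors within a fixed radius, and then applies a generic farthest-point heuristic. The function name, `radius` and `min_neighbors` are hypothetical.

```python
import math


def select_representative(points, k, radius=1.0, min_neighbors=1):
    """Return up to k mutually distant points, skipping isolated ones.

    Assumption: a point with fewer than `min_neighbors` other points
    within `radius` is treated as an outlier and is never selected.
    """
    n = len(points)
    # Step 1: outlier filter based on local neighbor counts.
    keep = [
        i for i in range(n)
        if sum(
            1 for j in range(n)
            if j != i and math.dist(points[i], points[j]) <= radius
        ) >= min_neighbors
    ]
    if not keep or k <= 0:
        return []

    # Step 2: farthest-point selection among the surviving candidates,
    # which reduces redundancy by spreading the picks apart.
    chosen = [keep[0]]
    remaining = keep[1:]
    while remaining and len(chosen) < k:
        best = max(
            remaining,
            key=lambda i: min(math.dist(points[i], points[c]) for c in chosen),
        )
        chosen.append(best)
        remaining.remove(best)
    return [points[i] for i in chosen]
```

Without the filter in step 1, the farthest-point step would pick an isolated point first, since it is by construction far from everything else; that is the behavior the abstract identifies as being at odds with representativeness.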

Illinois Digital Environment for Access to Learning and Scholarship Repository

Multi modal multi-semantic image retrieval

Author: Kesorn Kraisak
Publication date: 01/01/2010

The rapid growth in the volume of visual information, e.g. images and video, can overwhelm users’ ability to find and access the specific visual information of interest to them. In recent years, ontology knowledge-based (KB) image information retrieval techniques have been adopted in order to extract knowledge from these images, enhancing the retrieval performance. A KB framework is presented to promote semi-automatic annotation and semantic image retrieval using multimodal cues (visual features and text captions). In addition, a hierarchical structure for the KB allows metadata to be shared that supports multi-semantics (polysemy) for concepts. The framework builds up an effective knowledge base pertaining to a domain-specific image collection, e.g. sports, and is able to disambiguate and assign high-level semantics to ‘unannotated’ images. Local feature analysis of visual content, namely using Scale Invariant Feature Transform (SIFT) descriptors, has been deployed in the ‘Bag of Visual Words’ (BVW) model as an effective method to represent visual content information and to enhance its classification and retrieval. Local features are more useful than global features, e.g. colour, shape or texture, as they are invariant to image scale, orientation and camera angle. An innovative approach is proposed for the representation, annotation and retrieval of visual content using a hybrid technique based upon the use of an unstructured visual word model and upon a (structured) hierarchical ontology KB model. The structural model facilitates the disambiguation of unstructured visual words and a more effective classification of visual content, compared to a vector space model, through exploiting local conceptual structures and their relationships.
The key contributions of this framework in using local features for image representation include: first, a method to generate visual words using the semantic local adaptive clustering (SLAC) algorithm, which takes term weight and spatial locations of keypoints into account. Consequently, the semantic information is preserved. Second, a technique is used to detect the domain-specific ‘non-informative visual words’, which are ineffective at representing the content of visual data and degrade its categorisation ability. Third, a method to combine an ontology model with a visual word model to resolve synonym (visual heterogeneity) and polysemy problems is proposed. The experimental results show that this approach can discover semantically meaningful visual content descriptions and efficiently recognise specific events, e.g. sports events, depicted in images. Since discovering the semantics of an image is an extremely challenging problem, one promising approach to enhance visual content interpretation is to use any associated textual information that accompanies an image as a cue to predict the meaning of the image, by transforming this textual information into a structured annotation for the image, e.g. using XML, RDF, OWL or MPEG-7. Although text and image are distinct types of information representation and modality, there are some strong, invariant, implicit connections between images and any accompanying text information. Semantic analysis of image captions can be used by image retrieval systems to retrieve selected images more precisely. To do this, Natural Language Processing (NLP) is first used to extract concepts from image captions. Next, an ontology-based knowledge model is deployed in order to resolve natural language ambiguities. To deal with the accompanying text information, two methods to extract knowledge from textual information have been proposed.
First, metadata can be extracted automatically from text captions and restructured with respect to a semantic model. Second, the use of LSI in relation to a domain-specific ontology-based knowledge model enables the combined framework to tolerate ambiguities and variations (incompleteness) of metadata. The use of the ontology-based knowledge model allows the system to find indirectly relevant concepts in image captions and thus leverage these to represent the semantics of images at a higher level. Experimental results show that the proposed framework significantly enhances image retrieval and leads to narrowing of the semantic gap between lower level machine-derived and higher level human-understandable conceptualisation.
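For readers unfamiliar with the ‘Bag of Visual Words’ representation mentioned above, the following sketch shows the basic idea in its generic form: each local descriptor is assigned to its nearest visual word, and the image is summarised as a normalised histogram of word counts. It assumes a visual vocabulary is already given as a list of centroid vectors; it does not implement the thesis's SLAC clustering, the non-informative word detection, or the ontology layer, and the function name is hypothetical.

```python
import math
from collections import Counter


def bovw_histogram(descriptors, vocabulary):
    """Return a normalised visual-word histogram for one image.

    descriptors: local feature vectors (e.g. SIFT-like), as tuples.
    vocabulary:  centroid vectors, one per visual word (assumed given).
    """
    counts = Counter()
    for d in descriptors:
        # Hard assignment: index of the closest visual word.
        nearest = min(
            range(len(vocabulary)),
            key=lambda w: math.dist(d, vocabulary[w]),
        )
        counts[nearest] += 1
    total = len(descriptors)
    # Normalise so images with different numbers of keypoints are comparable.
    return [counts[w] / total if total else 0.0 for w in range(len(vocabulary))]
```

Two images can then be compared through their histograms, for instance with cosine similarity, which is the kind of vector space comparison the abstract contrasts with its structured ontology model.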

Queen Mary Research Online

VISUAL SEARCH APPLICATION FOR ANDROID

Author: Patel Harsh
Publication venue: SJSU ScholarWorks
Publication date: 01/10/2012

The search engine has played an important role in the information society. Information in the form of text or a visual image is easily available over the internet. The Visual Search Engine aims at helping users locate and rapidly get information about their items of interest. The Visual Search Engine takes input in the form of keywords or visual images and provides information about destinations, artworks, books, wine, branded items, product catalogs, etc. The main goal of this project was to create a Visual Search Algorithm that can find matching images based on an input image’s features or key-points. We have also focused on researching new techniques and applying them to optimize the proposed algorithm and its integration with an Android mobile client. The project’s end product is an Android Mobile Visual Search Application, which functions as follows: the user takes a picture of the desired item; the application recognizes the item using the Visual Search Algorithm; the application then gets the user’s current location and provides a list of nearby stores that carry the desired item, along with its price.

SJSU ScholarWorks

Visual search for musical performances and endoscopic videos

Author: Roldán Carlos Jennifer
Publication venue: Universitat Politècnica de Catalunya
Publication date: 07/05/2015

This project explores the potential of LIRE, an existing Content-Based Image Retrieval (CBIR) system, when used to retrieve medical videos. These videos are recordings of the live streams used by surgeons during endoscopic procedures, captured from inside the subject. The growth of such video content stored on servers requires search engines capable of assisting surgeons in its management and retrieval. In our tool, queries are formulated by visual examples, which allow surgeons to re-find shots taken during the procedure. This thesis presents an extension and adaptation of LIRE for video retrieval based on visual features and late fusion. The results are assessed from two perspectives: a quantitative and a qualitative one. While the quantitative assessment follows the standard practices and metrics for video retrieval, the qualitative assessment is based on an empirical social study using a semi-interactive web interface. In particular, a thinking-aloud test was applied to analyze whether user expectations and requirements were fulfilled. Due to the scarcity of surgeons available for the qualitative tests, a second domain was also addressed: videos captured at musical performances. This type of video has also experienced exponential growth with the advent of affordable multimedia smartphones, available to a large audience. Analogously to the endoscopic videos, searching a large data set of such videos is a challenging topic.
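The abstract names late fusion without giving details, so the sketch below shows only one common generic form of it: each visual feature produces its own relevance scores for the candidate shots, the scores are min-max normalised per feature, and a weighted sum gives the final ranking. The function name, the weights and the example feature names are assumptions for illustration and do not reflect the LIRE API or the thesis's actual fusion rule.

```python
def late_fusion(score_lists, weights=None):
    """Fuse per-feature score dictionaries into one ranked list of items.

    score_lists: list of {item_id: score} dicts, one per visual feature
                 (higher score means more relevant; an assumption).
    weights:     optional per-feature weights, defaulting to equal weights.
    """
    if weights is None:
        weights = [1.0] * len(score_lists)
    fused = {}
    for scores, w in zip(score_lists, weights):
        if not scores:
            continue  # a feature that returned nothing contributes nothing
        lo, hi = min(scores.values()), max(scores.values())
        span = hi - lo
        for item, s in scores.items():
            # Min-max normalisation puts every feature on a 0..1 scale.
            norm = (s - lo) / span if span else 0.0
            fused[item] = fused.get(item, 0.0) + w * norm
    return sorted(fused, key=lambda item: fused[item], reverse=True)


# Hypothetical scores from two features for three shots.
color_scores = {"shot_a": 0.9, "shot_b": 0.1, "shot_c": 0.5}
edge_scores = {"shot_a": 0.2, "shot_b": 0.6, "shot_c": 0.7}
ranking = late_fusion([color_scores, edge_scores])
```

The fusion is "late" because each feature is searched independently and only the resulting scores are combined, as opposed to concatenating feature vectors before the search.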

UPCommons. Portal del coneixement obert de la UPC

PnP Maxtools: Autonomous Parameter Control in MaxMSP Utilizing MIR Algorithms

Author: Franklin Austin Alexander
Publication venue: LSU Digital Commons
Publication date: 15/12/2022

This research presents a new approach to computer automation through the implementation of novel real-time music information retrieval algorithms developed for this project. It documents the development of the PnP.Maxtools package, a set of open source objects designed within the popular programming environment MaxMSP. The package is a set of pre/post processing filters, objective and subjective timbral descriptors, audio effects, and other objects that are designed to be used together to compose music or improvise without the use of external controllers or hardware. The PnP.Maxtools package objects are designed to be used quickly and easily in a 'plug and play' style, with as few initial arguments as possible. The PnP.Maxtools package is designed to take incoming audio from a microphone, analyze it, and use the analysis to control an audio effect on the incoming signal in real time. In this way, the audio content has a real musical and analogous relationship with the resulting musical transformations, while the control parameters become more multifaceted and better able to serve the needs of artists. The term Reflexive Automation is introduced to describe this unsupervised relationship between the content of the sound being analyzed and the analogous and automatic control over a specific musical parameter. A set of compositions is also presented that demonstrates ideal usage of the object categories for creating reflexive systems and achieving fully autonomous control over musical parameters.
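The analyze-then-control loop described above can be pictured with a small sketch. It is written in Python rather than MaxMSP and does not use any PnP.Maxtools object: it assumes audio arrives as frames of samples in the range -1 to 1, uses RMS level as a stand-in descriptor, and maps it to a hypothetical effect parameter with simple one-pole smoothing. All names and ranges are illustrative assumptions.

```python
import math


def rms(frame):
    """Root-mean-square level of one frame of samples."""
    return math.sqrt(sum(x * x for x in frame) / len(frame)) if frame else 0.0


def scale(value, in_lo, in_hi, out_lo, out_hi):
    """Clamp value to [in_lo, in_hi] and map it linearly to [out_lo, out_hi]."""
    value = min(max(value, in_lo), in_hi)
    return out_lo + (value - in_lo) * (out_hi - out_lo) / (in_hi - in_lo)


def control_stream(frames, out_lo=0.0, out_hi=1.0, smoothing=0.5):
    """Yield one smoothed control value per incoming audio frame.

    Assumption: the descriptor (RMS here) lies in 0..1 and drives a single
    effect parameter whose range is out_lo..out_hi.
    """
    state = None
    for frame in frames:
        target = scale(rms(frame), 0.0, 1.0, out_lo, out_hi)
        # One-pole smoothing avoids abrupt parameter jumps between frames.
        state = target if state is None else smoothing * state + (1 - smoothing) * target
        yield state
```

Swapping `rms` for a timbral descriptor would keep the same structure: the sound being analyzed sets the parameter of the effect applied to that same sound, with no external controller involved.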

Louisiana State University