
    Multimodal Classification of Urban Micro-Events

    In this paper we seek methods to effectively detect urban micro-events. Urban micro-events are events that occur in cities, have limited geographical coverage and typically affect only a small group of citizens. Because of their small scale, they are difficult to identify in most data sources. However, using citizen sensing to gather data makes detecting them feasible. The data gathered by citizen sensing is often multimodal and, as a consequence, the information required to detect urban micro-events is distributed over multiple modalities. This makes it essential to have a classifier capable of combining them. In this paper we explore several methods of creating such a classifier, including early, late and hybrid fusion, and representation learning using multimodal graphs. We evaluate performance on a real-world dataset obtained from a live citizen reporting system. We show that a multimodal approach yields higher performance than unimodal alternatives. Furthermore, we demonstrate that our hybrid combination of early and late fusion with multimodal embeddings performs best in classification of urban micro-events.
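A minimal sketch of the three fusion strategies named above, assuming each modality already has its own feature vector or its own classifier producing class probabilities. The class labels (`litter`, `graffiti`), the equal weights and the blending factor `alpha` are hypothetical; the paper's best-performing hybrid also uses multimodal graph embeddings, which this sketch leaves out.

```python
from typing import Dict, List

# Class label -> probability; the labels used below are hypothetical examples.
Scores = Dict[str, float]


def early_fusion(features: List[List[float]]) -> List[float]:
    """Early fusion: concatenate per-modality feature vectors into one classifier input."""
    fused: List[float] = []
    for vec in features:
        fused.extend(vec)
    return fused


def late_fusion(per_modality: List[Scores], weights: List[float]) -> Scores:
    """Late fusion: weighted average of class probabilities from per-modality classifiers."""
    total = sum(weights)
    labels = per_modality[0].keys()
    return {
        label: sum(w * scores[label] for w, scores in zip(weights, per_modality)) / total
        for label in labels
    }


def hybrid_fusion(early_scores: Scores, late_scores: Scores, alpha: float = 0.5) -> Scores:
    """Hybrid fusion: blend the early-fusion classifier's output with the late-fusion result."""
    return {
        label: alpha * early_scores[label] + (1.0 - alpha) * late_scores[label]
        for label in early_scores
    }


def predict(scores: Scores) -> str:
    """Return the label with the highest fused score."""
    return max(scores, key=scores.get)


# Hypothetical outputs of a text classifier and an image classifier for one report.
text_scores = {"litter": 0.6, "graffiti": 0.4}
image_scores = {"litter": 0.2, "graffiti": 0.8}
late = late_fusion([text_scores, image_scores], weights=[1.0, 1.0])
print(predict(late))  # graffiti
```

Early fusion lets one classifier learn cross-modal interactions, while late fusion keeps per-modality models independent and degrades gracefully when a modality is missing; a hybrid tries to get both benefits.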

    Large-scale image collection cleansing, summarization and exploration

    A perennially interesting question in large-scale image collection organization is how to perform image cleansing, summarization and exploration effectively and efficiently. The primary objective of such an image organization system is to enhance the user's exploration experience by removing redundancy from, and summarizing, a large-scale image collection. An ideal system should discover and exploit the visual correlations among images in order to reduce redundancy in the collection, organize and visualize its structure, and facilitate exploration and knowledge discovery. In this dissertation, a novel system is developed for exploiting and navigating large-scale image collections. The system consists of the following key components: (a) junk image filtering by incorporating bilingual search results; (b) near-duplicate image detection using a coarse-to-fine framework; (c) concept network generation and visualization; (d) image collection summarization via dictionary learning for sparse representation; and (e) a multimedia application to graffiti image retrieval and exploration. For junk image filtering, bilingual image search results returned for the same keyword-based query are integrated to automatically identify clusters of junk images and clusters of relevant images. Within the relevant image clusters, the results are further refined by removing duplicates in a coarse-to-fine structure. Duplicate pairs are detected with both a global feature (a partition-based color histogram) and local features (CPAM and a SIFT Bag-of-Words model). The detected duplicates are removed from the collection to facilitate further exploration and visual correlation analysis. After junk image filtering and duplicate removal, the visual concepts are further organized and visualized by the proposed concept network.
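The coarse stage of near-duplicate detection with a partition-based color histogram can be sketched as follows. This is a minimal illustration under stated assumptions: the 2x2 grid, 4 quantization levels per RGB channel, histogram intersection as the similarity measure and the 0.9 threshold are all hypothetical choices, and the fine stage (CPAM and SIFT Bag-of-Words matching) is not shown.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]  # (R, G, B), each channel in 0..255
Image = List[List[Pixel]]     # row-major grid of pixels


def partition_color_histogram(img: Image, grid: int = 2, bins: int = 4) -> List[float]:
    """Concatenate a normalized, quantized RGB histogram for each cell of a grid x grid partition."""
    h, w = len(img), len(img[0])
    feature: List[float] = []
    for gy in range(grid):
        for gx in range(grid):
            hist = [0.0] * (bins ** 3)
            count = 0
            for y in range(gy * h // grid, (gy + 1) * h // grid):
                for x in range(gx * w // grid, (gx + 1) * w // grid):
                    # Quantize each channel into `bins` levels, then form a joint bin index.
                    rq, gq, bq = (c * bins // 256 for c in img[y][x])
                    hist[(rq * bins + gq) * bins + bq] += 1
                    count += 1
            feature.extend(v / count if count else 0.0 for v in hist)
    return feature


def histogram_similarity(a: List[float], b: List[float], grid: int = 2) -> float:
    """Mean per-cell histogram intersection: 1.0 for identical histograms, 0.0 for disjoint ones."""
    return sum(min(x, y) for x, y in zip(a, b)) / (grid * grid)


def is_near_duplicate_candidate(img_a: Image, img_b: Image, threshold: float = 0.9) -> bool:
    """Coarse stage: pairs at or above the threshold would go on to a finer local-feature check."""
    ha = partition_color_histogram(img_a)
    hb = partition_color_histogram(img_b)
    return histogram_similarity(ha, hb) >= threshold


# Hypothetical 4x4 test images.
red = [[(255, 0, 0)] * 4 for _ in range(4)]
blue = [[(0, 0, 255)] * 4 for _ in range(4)]
edited = [row[:] for row in red]
edited[0][0] = (0, 0, 255)  # one pixel recolored
print(is_near_duplicate_candidate(red, edited))  # True (similarity 0.9375)
print(is_near_duplicate_candidate(red, blue))    # False
```

Partitioning the image before histogramming keeps some spatial layout, so two images with the same overall colors but different arrangements score lower than with a single global histogram.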
An automatic algorithm is developed to generate this visual concept network, which characterizes the visual correlation between pairs of image concepts. Multiple kernels are combined, and a kernel canonical correlation analysis algorithm is used to characterize the diverse visual similarity contexts between the image concepts. The FishEye visualization technique is implemented to facilitate navigation of image concepts through the image concept network. To further assist exploration of large-scale collections, we design an efficient summarization algorithm that extracts representative exemplars. For this summarization task, a sparse dictionary (a small set of the most representative images) is learned to represent all the images in the given set; this sparse dictionary is treated as the summary of the image set. A simulated annealing algorithm is adopted to learn the sparse dictionary (image summary) by minimizing an explicit objective function. Because the system must handle large-scale image collections, we evaluated both the accuracy of the proposed algorithms and their computational efficiency. For each of the above tasks, we conducted experiments on multiple publicly available image collections, such as ImageNet, NUS-WIDE and LabelMe. We observed very promising results compared to existing frameworks, and the computational performance is satisfactory for large-scale image collection applications. The original motivation for designing this large-scale image collection exploration and organization system is to better serve information retrieval and knowledge discovery. To this end, we applied the proposed system to a graffiti retrieval and exploration application and received positive feedback.
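The exemplar-selection step can be sketched with simulated annealing. This sketch swaps in a simpler objective than the dissertation's sparse-dictionary formulation: it picks `k` images so that the sum of squared distances from every image to its nearest selected exemplar is small (a k-medoids-style cost). The feature vectors, the geometric cooling schedule and all parameter values are hypothetical.

```python
import math
import random
from typing import List, Sequence

Vector = Sequence[float]


def sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def summary_cost(points: List[Vector], selected: List[int]) -> float:
    """Sum of squared distances from each point to its nearest selected exemplar."""
    return sum(min(sq_dist(p, points[j]) for j in selected) for p in points)


def anneal_summary(points: List[Vector], k: int, iters: int = 2000,
                   t0: float = 1.0, cooling: float = 0.995, seed: int = 0) -> List[int]:
    """Select k exemplar indices by simulated annealing over single-swap moves."""
    rng = random.Random(seed)
    n = len(points)
    current = rng.sample(range(n), k)
    cur_cost = summary_cost(points, current)
    best, best_cost = list(current), cur_cost
    t = t0
    for _ in range(iters):
        outside = [i for i in range(n) if i not in current]
        if not outside:
            break  # k == n: every image is already an exemplar
        # Propose swapping one selected exemplar for an unselected image.
        candidate = list(current)
        candidate[rng.randrange(k)] = rng.choice(outside)
        cand_cost = summary_cost(points, candidate)
        delta = cand_cost - cur_cost
        # Always accept improvements; accept worse moves with probability exp(-delta / t).
        if delta <= 0 or rng.random() < math.exp(-delta / t):
            current, cur_cost = candidate, cand_cost
            if cur_cost < best_cost:
                best, best_cost = list(current), cur_cost
        t = max(t * cooling, 1e-12)  # geometric cooling, floored to avoid division by zero
    return sorted(best)


# Two hypothetical, well-separated groups of 2-D image features.
features = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (10.0, 10.0), (10.1, 10.0), (10.0, 10.1)]
summary = anneal_summary(features, k=2)
print(summary)
```

Accepting occasional worse swaps while the temperature is high lets the search escape poor local choices of exemplars; tracking the best state seen guarantees the returned summary is never worse than any state visited.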

    An Autoencoder-Based Image Descriptor for Image Matching and Retrieval

    Local image features are used in many computer vision applications. Many point detectors and descriptors have been proposed in recent years; however, the creation of effective descriptors is still an active research topic. The Scale Invariant Feature Transform (SIFT), developed by David Lowe, is widely used in image matching and image retrieval. SIFT detects interest points in an image using scale-space analysis, which is invariant to changes in image scale. A SIFT descriptor contains gradient information about an image patch centered at an interest point. SIFT provides a high matching rate and is robust to image transformations; however, it is slow in image matching and retrieval. An autoencoder is a representation-learning method; in this project it is used to construct a low-dimensional representation of high-dimensional data while preserving the structure and geometry of the data. In many computer vision tasks, high-dimensional input data means a high computational cost. The main motivation of this project is to improve the speed and distinctiveness of SIFT descriptors. To achieve this, a new autoencoder-based descriptor is proposed. Our newly generated descriptors can reduce the size and complexity of SIFT descriptors, reducing the time required for image matching and image retrieval.
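The dimensionality-reduction idea can be sketched with a small linear autoencoder trained by full-batch gradient descent in NumPy. This is an illustration under stated assumptions, not the project's descriptor: the synthetic 128-dimensional vectors only stand in for SIFT descriptors (which are 128-dimensional), and the 32-dimensional code size, learning rate, epoch count and purely linear layers are hypothetical choices.

```python
import numpy as np


def train_linear_autoencoder(X: np.ndarray, code_dim: int = 32, epochs: int = 500,
                             lr: float = 2.0, seed: int = 0):
    """Train encoder/decoder weights by full-batch gradient descent on mean squared error."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W_e = rng.normal(0.0, 0.1, size=(d, code_dim))
    b_e = np.zeros(code_dim)
    W_d = rng.normal(0.0, 0.1, size=(code_dim, d))
    b_d = np.zeros(d)
    losses = []
    for _ in range(epochs):
        H = X @ W_e + b_e            # encode: (n, code_dim)
        X_hat = H @ W_d + b_d        # decode: (n, d)
        err = X_hat - X
        losses.append(float(np.mean(err ** 2)))
        G = 2.0 * err / err.size     # gradient of the mean squared error w.r.t. X_hat
        grad_W_d, grad_b_d = H.T @ G, G.sum(axis=0)
        G_h = G @ W_d.T              # back-propagate into the code layer
        grad_W_e, grad_b_e = X.T @ G_h, G_h.sum(axis=0)
        W_e -= lr * grad_W_e
        b_e -= lr * grad_b_e
        W_d -= lr * grad_W_d
        b_d -= lr * grad_b_d
    return W_e, b_e, losses


def encode(X: np.ndarray, W_e: np.ndarray, b_e: np.ndarray) -> np.ndarray:
    """Map d-dimensional descriptors to compact codes."""
    return X @ W_e + b_e


# Synthetic stand-ins for SIFT descriptors: non-negative, low-rank, unit-normalized rows.
data_rng = np.random.default_rng(1)
descriptors = data_rng.random((200, 8)) @ data_rng.random((8, 128))
descriptors /= np.linalg.norm(descriptors, axis=1, keepdims=True)

W_e, b_e, losses = train_linear_autoencoder(descriptors)
codes = encode(descriptors, W_e, b_e)
print(codes.shape, losses[0], losses[-1])
```

Matching on 32-dimensional codes instead of 128-dimensional descriptors cuts the work per Euclidean distance computation by roughly a factor of four, which is where the speed-up in matching and retrieval would come from.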

    Deep learning-based graffiti detection: A study using Images from the streets of Lisbon

    This research addresses a real problem posed by Lisbon City Council, which was interested in developing a system that automatically detects illegal graffiti in real time throughout the city of Lisbon using camera-equipped cars. Such a system would allow the illegal graffiti that is constantly being produced to be identified and cleaned up faster and more efficiently, with a georeferenced position. We also contribute a city graffiti database to share with the scientific community. Images containing illegal graffiti, graffiti considered street art, and no graffiti were provided and collected from different sources. A pipeline was then developed that first classifies each image with one of the following labels: illegal graffiti, street art, or no graffiti. Then, if the image contains illegal graffiti, a second model trained for this purpose detects the coordinates of the graffiti in the image. Pre-processing, data augmentation, and transfer learning techniques were used to train the models. The classification model achieved an overall accuracy of 81.4% and F1-scores of 86%, 81%, and 66% for the street art, illegal graffiti, and no-graffiti classes, respectively. The graffiti detection model achieved an Intersection over Union (IoU) of 70.3% on the test set.
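The two metrics reported above can be computed as in the following sketch. It assumes axis-aligned bounding boxes given as `(x_min, y_min, x_max, y_max)` and per-class counts of true positives, false positives and false negatives; the abstract does not state how IoU was aggregated over the test set, so that step is left out.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def iou(a: Box, b: Box) -> float:
    """Intersection over Union of two axis-aligned boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))  # overlap width (0 if disjoint)
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))  # overlap height (0 if disjoint)
    inter = ix * iy
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def f1_score(tp: int, fp: int, fn: int) -> float:
    """Per-class F1: harmonic mean of precision and recall, i.e. 2*TP / (2*TP + FP + FN)."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


print(iou((0, 0, 2, 2), (1, 1, 3, 3)))  # 1/7, about 0.143
print(f1_score(tp=8, fp=2, fn=2))        # 0.8
```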