153 research outputs found

    SVS-JOIN : efficient spatial visual similarity join for geo-multimedia

    Get PDF
    In the big data era, massive amount of multimedia data with geo-tags has been generated and collected by smart devices equipped with mobile communications module and position sensor module. This trend has put forward higher request on large-scale geo-multimedia retrieval. Spatial similarity join is one of the significant problems in the area of spatial database. Previous works focused on spatial textual document search problem, rather than geo-multimedia retrieval. In this paper, we investigate a novel geo-multimedia retrieval paradigm named spatial visual similarity join (SVS-JOIN for short), which aims to search similar geo-image pairs in both aspects of geo-location and visual content. Firstly, the definition of SVS-JOIN is proposed and then we present the geographical similarity and visual similarity measurement. Inspired by the approach for textual similarity join, we develop an algorithm named SVS-JOIN B by combining the PPJOIN algorithm and visual similarity. Besides, an extension of it named SVS-JOIN G is developed, which utilizes spatial grid strategy to improve the search efficiency. To further speed up the search, a novel approach called SVS-JOIN Q is carefully designed, in which a quadtree and a global inverted index are employed. Comprehensive experiments are conducted on two geo-image datasets and the results demonstrate that our solution can address the SVS-JOIN problem effectively and efficiently

    COMIC: Towards A Compact Image Captioning Model with Attention

    Full text link
    Recent works in image captioning have shown very promising raw performance. However, we realize that most of these encoder-decoder style networks with attention do not scale naturally to large vocabulary size, making them difficult to be deployed on embedded system with limited hardware resources. This is because the size of word and output embedding matrices grow proportionally with the size of vocabulary, adversely affecting the compactness of these networks. To address this limitation, this paper introduces a brand new idea in the domain of image captioning. That is, we tackle the problem of compactness of image captioning models which is hitherto unexplored. We showed that, our proposed model, named COMIC for COMpact Image Captioning, achieves comparable results in five common evaluation metrics with state-of-the-art approaches on both MS-COCO and InstaPIC-1.1M datasets despite having an embedding vocabulary size that is 39x - 99x smaller. The source code and models are available at: https://github.com/jiahuei/COMIC-Compact-Image-Captioning-with-AttentionComment: Added source code link and new results in Table

    Distinctive 3D local deep descriptors

    Full text link
    We present a simple but yet effective method for learning distinctive 3D local deep descriptors (DIPs) that can be used to register point clouds without requiring an initial alignment. Point cloud patches are extracted, canonicalised with respect to their estimated local reference frame and encoded into rotation-invariant compact descriptors by a PointNet-based deep neural network. DIPs can effectively generalise across different sensor modalities because they are learnt end-to-end from locally and randomly sampled points. Because DIPs encode only local geometric information, they are robust to clutter, occlusions and missing regions. We evaluate and compare DIPs against alternative hand-crafted and deep descriptors on several indoor and outdoor datasets consisting of point clouds reconstructed using different sensors. Results show that DIPs (i) achieve comparable results to the state-of-the-art on RGB-D indoor scenes (3DMatch dataset), (ii) outperform state-of-the-art by a large margin on laser-scanner outdoor scenes (ETH dataset), and (iii) generalise to indoor scenes reconstructed with the Visual-SLAM system of Android ARCore. Source code: https://github.com/fabiopoiesi/dip.Comment: IEEE International Conference on Pattern Recognition 202

    A Systematic Survey of Classification Algorithms for Cancer Detection

    Get PDF
    Cancer is a fatal disease induced by the occurrence of a count of inherited issues and also a count of pathological changes. Malignant cells are dangerous abnormal areas that could develop in any part of the human body, posing a life-threatening threat. To establish what treatment options are available, cancer, also referred as a tumor, should be detected early and precisely. The classification of images for cancer diagnosis is a complex mechanism that is influenced by a diverse of parameters. In recent years, artificial vision frameworks have focused attention on the classification of images as a key problem. Most people currently rely on hand-made features to demonstrate an image in a specific manner. Learning classifiers such as random forest and decision tree were used to determine a final judgment. When there are a vast number of images to consider, the difficulty occurs. Hence, in this paper, weanalyze, review, categorize, and discuss current breakthroughs in cancer detection utilizing machine learning techniques for image recognition and classification. We have reviewed the machine learning approaches like logistic regression (LR), Naïve Bayes (NB), K-nearest neighbors (KNN), decision tree (DT), and Support Vector Machines (SVM)
    corecore