Image Retrieval Using Image Captioning
The rapid growth in the availability of the Internet and smartphones has resulted in increased use of social media in recent years. This, in turn, has driven exponential growth in the number of digital images available online. Image retrieval systems therefore play a major role in fetching images relevant to a user's query. These systems should also be able to handle the massive growth of data and take advantage of emerging technologies such as deep learning and image captioning. This report aims to explain the purpose of image retrieval and to survey past research in the field. It also analyzes gaps in that research and describes the role image captioning can play in these systems. Additionally, it proposes a new image retrieval methodology based on image captioning and presents the results of this method, comparing them with those of past work.
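The core idea described above, retrieving images by comparing a text query against captions generated for each image, can be sketched in a few lines. This is a minimal illustration, not the report's actual system: it assumes captions have already been generated, and it uses a simple bag-of-words cosine similarity in place of a learned embedding. All names (`retrieve`, the file names, the captions) are hypothetical.

```python
import math
from collections import Counter

def cosine_similarity(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    common = set(a) & set(b)
    dot = sum(a[t] * b[t] for t in common)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def retrieve(query, captions, top_k=2):
    """Rank images by similarity between the query and each image's caption."""
    q = Counter(query.lower().split())
    scored = [(cosine_similarity(q, Counter(c.lower().split())), img)
              for img, c in captions.items()]
    scored.sort(reverse=True)
    return [img for score, img in scored[:top_k] if score > 0]

# Hypothetical caption index produced by an image-captioning model.
captions = {
    "img1.jpg": "a brown dog running on the beach",
    "img2.jpg": "a plate of pasta with tomato sauce",
    "img3.jpg": "two dogs playing in the park",
}
print(retrieve("dog on beach", captions))  # → ['img1.jpg']
```

A real system along these lines would replace the bag-of-words match with sentence embeddings, but the retrieval loop (caption once, then match text against text) is the same.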
SuperpixelGraph: Semi-automatic generation of building footprint through semantic-sensitive superpixel and neural graph networks
Most urban applications necessitate building footprints in the form of
concise vector graphics with sharp boundaries rather than pixel-wise raster
images. This need contrasts with the majority of existing methods, which
typically generate over-smoothed footprint polygons. Editing these
automatically produced polygons can be inefficient, if not more time-consuming
than manual digitization. This paper introduces a semi-automatic approach for
building footprint extraction through semantically-sensitive superpixels and
neural graph networks. Drawing inspiration from object-based classification
techniques, we first learn to generate superpixels that are not only
boundary-preserving but also semantically-sensitive. The superpixels respond
exclusively to building boundaries rather than other natural objects, while
simultaneously producing semantic segmentation of the buildings. These
intermediate superpixel representations can be naturally considered as nodes
within a graph. Consequently, graph neural networks are employed to model the
global interactions among all superpixels and enhance the representativeness of
node features for building segmentation. Classical approaches are utilized to
extract and regularize boundaries for the vectorized building footprints.
Utilizing minimal clicks and straightforward strokes, we efficiently accomplish
accurate segmentation outcomes, eliminating the necessity for editing polygon
vertices. Our proposed approach demonstrates superior precision and efficacy,
as validated by experimental assessments on various public benchmark datasets.
A significant improvement of 8% in AP50 was observed in vector graphics
evaluation, surpassing established techniques. Additionally, we have devised an
optimized and sophisticated pipeline for interactive editing, poised to further
augment the overall quality of the results.
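The abstract's key structural idea, treating superpixels as nodes in a graph and letting a graph neural network propagate information between neighbours, can be illustrated with a single unweighted message-passing round. This is only a sketch under stated assumptions: it uses mean aggregation with no learned weights, and the toy features and adjacency are invented, not from the paper.

```python
def message_pass(features, adjacency):
    """One round of mean-aggregation over superpixel neighbours
    (a GCN-style update without learned weights, for illustration)."""
    updated = {}
    for node, feat in features.items():
        neigh = adjacency.get(node, [])
        agg = list(feat)                 # start from the node's own feature
        for n in neigh:                  # add each neighbour's feature
            for i, v in enumerate(features[n]):
                agg[i] += v
        k = len(neigh) + 1               # include self in the mean
        updated[node] = [v / k for v in agg]
    return updated

# Toy graph: three superpixels; 0 and 1 are adjacent, 2 is isolated.
features = {0: [1.0, 0.0], 1: [0.0, 1.0], 2: [4.0, 4.0]}
adjacency = {0: [1], 1: [0], 2: []}
print(message_pass(features, adjacency))
# node 0 -> [0.5, 0.5], node 1 -> [0.5, 0.5], node 2 unchanged
```

Stacking several such rounds (with learned weight matrices and nonlinearities) is what lets node features capture the global interactions the abstract refers to.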
Topology Reasoning for Driving Scenes
Understanding the road genome is essential to realize autonomous driving.
This highly complex problem contains two aspects - the connection
relationship of lanes, and the assignment relationship between lanes and
traffic elements - for which a comprehensive topology reasoning method is
still lacking. On one hand, previous map learning techniques struggle to
derive lane connectivity with segmentation or laneline paradigms, while prior
lane topology-oriented approaches focus on centerline detection and neglect
interaction modeling. On the other hand, the traffic element to lane assignment
problem is limited in the image domain, leaving how to construct the
correspondence from two views an unexplored challenge. To address these issues,
we present TopoNet, the first end-to-end framework capable of abstracting
traffic knowledge beyond conventional perception tasks. To capture the driving
scene topology, we introduce three key designs: (1) an embedding module to
incorporate semantic knowledge from 2D elements into a unified feature space;
(2) a curated scene graph neural network to model relationships and enable
feature interaction inside the network; (3) instead of transmitting messages
arbitrarily, a scene knowledge graph is devised to differentiate prior
knowledge from various types of the road genome. We evaluate TopoNet on the
challenging scene understanding benchmark, OpenLane-V2, where our approach
outperforms all previous works by a large margin on all perceptual and
topological metrics. The code will be released soon.
DeepFacePencil: Creating Face Images from Freehand Sketches
In this paper, we explore the task of generating photo-realistic face images
from hand-drawn sketches. Existing image-to-image translation methods require a
large-scale dataset of paired sketches and images for supervision. They
typically utilize synthesized edge maps of face images as training data.
However, these synthesized edge maps align strictly with the edges of the
corresponding face images, which limits their generalization ability to real
hand-drawn sketches with vast stroke diversity. To address this problem, we
propose DeepFacePencil, an effective tool that is able to generate
photo-realistic face images from hand-drawn sketches, based on a novel dual
generator image translation network during training. A novel spatial attention
pooling (SAP) module is designed to adaptively handle spatially varying stroke
distortions, supporting diverse stroke styles and different levels of detail.
We conduct extensive experiments, and the results demonstrate the superiority
of our model over existing methods in both image quality and generalization to
hand-drawn sketches.

Comment: ACM MM 2020 (oral)
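The general mechanism behind attention pooling, weighting spatial locations by a learned score map before aggregating, can be shown in a few lines. This is a generic illustration of softmax-weighted spatial pooling, not the paper's SAP module; the function names and toy values are invented.

```python
import math

def softmax2d(scores):
    """Softmax over all locations of a 2D score map."""
    flat = [v for row in scores for v in row]
    m = max(flat)                                  # subtract max for stability
    exps = [[math.exp(v - m) for v in row] for row in scores]
    total = sum(v for row in exps for v in row)
    return [[v / total for v in row] for row in exps]

def attention_pool(feature_map, scores):
    """Pool a 2D feature map into one value using spatially
    varying attention weights (softmax over locations)."""
    weights = softmax2d(scores)
    return sum(f * w for frow, wrow in zip(feature_map, weights)
               for f, w in zip(frow, wrow))

# Uniform scores reduce to plain average pooling:
print(attention_pool([[1.0, 2.0], [3.0, 4.0]], [[0.0, 0.0], [0.0, 0.0]]))  # → 2.5
```

When the score map is predicted from the input, the pooling can emphasise reliable strokes and down-weight distorted ones, which is the intuition behind adaptive handling of spatially varying distortions.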
Feature Extraction in Music Information Retrieval Using Machine Learning Algorithms
Music classification is essential for faster retrieval of music records. Extracting the ideal set of features and selecting the best analysis technique are critical for obtaining good results from audio classification. Audio feature extraction can be viewed as a special case of transforming raw audio data into audio instances. Music segmentation and classification can provide a rich dataset for the analysis of multimedia content. Because of the high dimensionality of audio features, as well as the variable length of audio segments, music matching depends on heavy computation. By focusing on the rhythmic aspects of different songs, this article provides an introduction to some of the possibilities for computing music similarity. Almost every MIR toolkit includes a method for extracting the beats per minute (BPM), and consequently the tempo, of each track. The simplest method of computing very low-level rhythmic similarity is to sort and compare songs solely by their tempo; there are undoubtedly far better and more precise solutions. This work discusses some of the most promising approaches for computing rhythm similarity in a Big Data framework using machine learning algorithms.
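The baseline the abstract describes, comparing songs solely by tempo, is simple enough to sketch directly. This assumes BPM values have already been extracted by some MIR toolkit; the similarity function and song names below are illustrative choices, not from the article.

```python
def tempo_similarity(bpm_a, bpm_b):
    """Similarity in (0, 1]: 1.0 for identical tempos, decaying with distance."""
    return 1.0 / (1.0 + abs(bpm_a - bpm_b))

def rank_by_tempo(query_bpm, library):
    """Sort a {title: bpm} library by tempo similarity to the query tempo."""
    return sorted(library,
                  key=lambda t: tempo_similarity(query_bpm, library[t]),
                  reverse=True)

# Hypothetical library of pre-extracted BPM values.
library = {"song_a": 120, "song_b": 96, "song_c": 122}
print(rank_by_tempo(118, library))  # → ['song_a', 'song_c', 'song_b']
```

As the abstract notes, this is a deliberately crude baseline: two songs at the same tempo can be rhythmically unrelated, which is why richer rhythm features are worth the extra computation.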