Image Retrieval Using Image Captioning
The rapid growth in the availability of the Internet and smartphones has resulted in increased use of social media in recent years. This, in turn, has driven exponential growth in the number of digital images available online. Image retrieval systems therefore play a major role in fetching images relevant to a user's query. These systems should also be able to handle the massive growth of data and take advantage of emerging technologies such as deep learning and image captioning. This report aims to explain the purpose of image retrieval and to survey past research in the field. It also analyzes gaps in that research and describes the role image captioning can play in these systems. Additionally, it proposes a new image retrieval methodology based on image captioning and presents the results of this method, comparing them with those of past work.
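The core idea described above, retrieving images by comparing a text query against captions generated for each image, can be sketched in a few lines. This is a minimal illustration, not the report's actual system: it assumes captions have already been generated, and it uses a simple bag-of-words cosine similarity in place of a learned embedding. All names (`retrieve`, the file names, the captions) are hypothetical.

```python
import math
from collections import Counter

def cosine_similarity(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    common = set(a) & set(b)
    dot = sum(a[t] * b[t] for t in common)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def retrieve(query, captions, top_k=2):
    """Rank images by similarity between the query and each image's caption."""
    q = Counter(query.lower().split())
    scored = [(cosine_similarity(q, Counter(c.lower().split())), img)
              for img, c in captions.items()]
    scored.sort(reverse=True)
    return [img for score, img in scored[:top_k] if score > 0]

# Hypothetical caption index produced by an image-captioning model.
captions = {
    "img1.jpg": "a brown dog running on the beach",
    "img2.jpg": "a plate of pasta with tomato sauce",
    "img3.jpg": "two dogs playing in the park",
}
print(retrieve("dog on beach", captions))  # → ['img1.jpg']
```

A real system along these lines would replace the bag-of-words match with sentence embeddings, but the retrieval loop (caption once, then match text against text) is the same.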
SuperpixelGraph: Semi-automatic generation of building footprint through semantic-sensitive superpixel and neural graph networks
Most urban applications necessitate building footprints in the form of
concise vector graphics with sharp boundaries rather than pixel-wise raster
images. This need contrasts with the majority of existing methods, which
typically generate over-smoothed footprint polygons. Editing these
automatically produced polygons can be inefficient, if not more time-consuming
than manual digitization. This paper introduces a semi-automatic approach for
building footprint extraction through semantically-sensitive superpixels and
neural graph networks. Drawing inspiration from object-based classification
techniques, we first learn to generate superpixels that are not only
boundary-preserving but also semantically-sensitive. The superpixels respond
exclusively to building boundaries rather than other natural objects, while
simultaneously producing semantic segmentation of the buildings. These
intermediate superpixel representations can be naturally considered as nodes
within a graph. Consequently, graph neural networks are employed to model the
global interactions among all superpixels and enhance the representativeness of
node features for building segmentation. Classical approaches are utilized to
extract and regularize boundaries for the vectorized building footprints.
Utilizing minimal clicks and straightforward strokes, we efficiently accomplish
accurate segmentation outcomes, eliminating the necessity for editing polygon
vertices. Our proposed approach demonstrates superior precision and efficacy,
as validated by experimental assessments on various public benchmark datasets.
A significant improvement of 8% in AP50 was observed in vector graphics
evaluation, surpassing established techniques. Additionally, we have devised an
optimized and sophisticated pipeline for interactive editing, poised to further
augment the overall quality of the results.
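The abstract's key structural idea, treating superpixels as nodes in a graph and letting a graph neural network propagate information between neighbours, can be illustrated with a single unweighted message-passing round. This is only a sketch under stated assumptions: it uses mean aggregation with no learned weights, and the toy features and adjacency are invented, not from the paper.

```python
def message_pass(features, adjacency):
    """One round of mean-aggregation over superpixel neighbours
    (a GCN-style update without learned weights, for illustration)."""
    updated = {}
    for node, feat in features.items():
        neigh = adjacency.get(node, [])
        agg = list(feat)                 # start from the node's own feature
        for n in neigh:                  # add each neighbour's feature
            for i, v in enumerate(features[n]):
                agg[i] += v
        k = len(neigh) + 1               # include self in the mean
        updated[node] = [v / k for v in agg]
    return updated

# Toy graph: three superpixels; 0 and 1 are adjacent, 2 is isolated.
features = {0: [1.0, 0.0], 1: [0.0, 1.0], 2: [4.0, 4.0]}
adjacency = {0: [1], 1: [0], 2: []}
print(message_pass(features, adjacency))
# node 0 -> [0.5, 0.5], node 1 -> [0.5, 0.5], node 2 unchanged
```

Stacking several such rounds (with learned weight matrices and nonlinearities) is what lets node features capture the global interactions the abstract refers to.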
Topology Reasoning for Driving Scenes
Understanding the road genome is essential to realize autonomous driving.
This highly complex problem contains two aspects - the connection
relationship of lanes, and the assignment relationship between lanes and
traffic elements - for which a comprehensive topology reasoning method is
still lacking. On one hand, previous map learning techniques struggle to
derive lane connectivity with segmentation or laneline paradigms, while prior
lane topology-oriented approaches focus on centerline detection and neglect
interaction modeling. On the other hand, the traffic element to lane assignment
problem is limited in the image domain, leaving how to construct the
correspondence from two views an unexplored challenge. To address these issues,
we present TopoNet, the first end-to-end framework capable of abstracting
traffic knowledge beyond conventional perception tasks. To capture the driving
scene topology, we introduce three key designs: (1) an embedding module to
incorporate semantic knowledge from 2D elements into a unified feature space;
(2) a curated scene graph neural network to model relationships and enable
feature interaction inside the network; (3) instead of transmitting messages
arbitrarily, a scene knowledge graph is devised to differentiate prior
knowledge from various types of the road genome. We evaluate TopoNet on the
challenging scene understanding benchmark, OpenLane-V2, where our approach
outperforms all previous works by a large margin on all perceptual and
topological metrics. The code will be released soon.
DeepFacePencil: Creating Face Images from Freehand Sketches
In this paper, we explore the task of generating photo-realistic face images
from hand-drawn sketches. Existing image-to-image translation methods require a
large-scale dataset of paired sketches and images for supervision. They
typically utilize synthesized edge maps of face images as training data.
However, these synthesized edge maps align strictly with the edges of the
corresponding face images, which limits their generalization ability to real
hand-drawn sketches with vast stroke diversity. To address this problem, we
propose DeepFacePencil, an effective tool that is able to generate
photo-realistic face images from hand-drawn sketches, based on a novel dual
generator image translation network during training. A novel spatial attention
pooling (SAP) module is designed to adaptively handle spatially varying stroke
distortions, supporting diverse stroke styles and different levels of detail.
We conduct extensive experiments, and the results demonstrate the superiority
of our model over existing methods in both image quality and generalization to
hand-drawn sketches.

Comment: ACM MM 2020 (oral)
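The general mechanism behind attention pooling, weighting spatial locations by a learned score map before aggregating, can be shown in a few lines. This is a generic illustration of softmax-weighted spatial pooling, not the paper's SAP module; the function names and toy values are invented.

```python
import math

def softmax2d(scores):
    """Softmax over all locations of a 2D score map."""
    flat = [v for row in scores for v in row]
    m = max(flat)                                  # subtract max for stability
    exps = [[math.exp(v - m) for v in row] for row in scores]
    total = sum(v for row in exps for v in row)
    return [[v / total for v in row] for row in exps]

def attention_pool(feature_map, scores):
    """Pool a 2D feature map into one value using spatially
    varying attention weights (softmax over locations)."""
    weights = softmax2d(scores)
    return sum(f * w for frow, wrow in zip(feature_map, weights)
               for f, w in zip(frow, wrow))

# Uniform scores reduce to plain average pooling:
print(attention_pool([[1.0, 2.0], [3.0, 4.0]], [[0.0, 0.0], [0.0, 0.0]]))  # → 2.5
```

When the score map is predicted from the input, the pooling can emphasise reliable strokes and down-weight distorted ones, which is the intuition behind adaptive handling of spatially varying distortions.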
Feature Extraction in Music Information Retrieval Using Machine Learning Algorithms
Music classification is essential for faster retrieval of music records. Extracting the ideal set of features and selecting the best analysis technique are critical for obtaining good results from audio classification. Audio feature extraction can be viewed as a special case of transforming raw audio data into audio instances. Music segmentation and classification can provide a rich dataset for the analysis of multimedia content. Because of the high dimensionality of audio features, as well as the variable length of audio segments, music matching depends on heavy computation. By focusing on the rhythmic aspects of different songs, this article provides an introduction to some of the possibilities for computing music similarity. Almost every MIR toolkit includes a method for extracting the beats per minute (BPM), and consequently the tempo, of each track. The simplest method of computing very low-level rhythmic similarity is to sort and compare songs solely by their tempo; there are undoubtedly far better and more precise solutions. This work discusses some of the most promising approaches for computing rhythm similarity in a Big Data framework using machine learning algorithms.
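The baseline the abstract describes, comparing songs solely by tempo, is simple enough to sketch directly. This assumes BPM values have already been extracted by some MIR toolkit; the similarity function and song names below are illustrative choices, not from the article.

```python
def tempo_similarity(bpm_a, bpm_b):
    """Similarity in (0, 1]: 1.0 for identical tempos, decaying with distance."""
    return 1.0 / (1.0 + abs(bpm_a - bpm_b))

def rank_by_tempo(query_bpm, library):
    """Sort a {title: bpm} library by tempo similarity to the query tempo."""
    return sorted(library,
                  key=lambda t: tempo_similarity(query_bpm, library[t]),
                  reverse=True)

# Hypothetical library of pre-extracted BPM values.
library = {"song_a": 120, "song_b": 96, "song_c": 122}
print(rank_by_tempo(118, library))  # → ['song_a', 'song_c', 'song_b']
```

As the abstract notes, this is a deliberately crude baseline: two songs at the same tempo can be rhythmically unrelated, which is why richer rhythm features are worth the extra computation.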