Learning to Generate and Refine Object Proposals
Visual object recognition is a fundamental and challenging
problem in computer vision. To build a practical recognition
system, one is first confronted with high computational complexity
due to an enormous search space per image, which is caused by
large variations in object appearance and pose, mutual occlusion,
and other environmental factors. To reduce the search
complexity, modern object recognition systems usually first
generate a moderate set of image regions that are likely to
contain an object, regardless of its category. These possible
object regions are called object proposals, object hypotheses or
object candidates, and can be used for downstream
classification or global reasoning in many different vision tasks
such as object detection, segmentation and tracking.
This thesis addresses the problem of object proposal generation,
including bounding box and segment proposal generation, in
real-world scenarios. In particular, we investigate
representation learning for object proposal generation with 3D
cues and contextual information, aiming to produce higher-quality
object candidates with higher object recall, better
boundary coverage and fewer proposals. We focus on three main
issues: 1) how can we incorporate additional geometric and
high-level semantic context information into proposal
generation for stereo images? 2) how do we generate object
segment proposals for stereo images by learning both the region
representations and the grouping process? and 3) how can we learn a
context-driven representation to refine segment proposals
efficiently?
In this thesis, we propose a series of solutions to each
of these problems. We first propose a semantic context and
depth-aware object proposal generation method. We design a set of
new cues to encode the objectness, and then train an efficient
random forest classifier to re-rank the initial proposals and
linear regressors to fine-tune their locations. Next, we extend
the task to the segment proposal generation in the same setting
and develop a learning-based segment proposal generation method
for stereo images. Our method makes use of learned deep features
and designed geometric features to represent a region and learns
a similarity network to guide the superpixel grouping process. We
also learn a ranking network to predict the objectness score for
each segment proposal. To address the third problem, we take a
transformation-based approach to improve the quality of a given
segment candidate pool based on context information. We propose
an efficient deep network that learns affine transformations to
warp an initial object mask toward a nearby object region, based
on a novel feature pooling strategy. Finally, we extend our
affine warping approach to address the object-mask alignment
problem and particularly the problem of refining a set of segment
proposals. We design an end-to-end deep spatial transformer
network that learns free-form deformations (FFDs) to non-rigidly
warp the shape mask towards the ground truth, based on a
multi-level dual mask feature pooling strategy. We evaluate all
our approaches on several publicly available object recognition
datasets and show superior performance.
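The proposal-quality criteria the thesis targets (object recall, boundary coverage, proposal count) are conventionally measured against ground truth with intersection-over-union (IoU). The sketch below is an illustration of that standard metric on hypothetical boxes, not the thesis code:

```python
# Minimal sketch of box-proposal recall evaluation (illustrative only,
# not the thesis implementation). Boxes are (x1, y1, x2, y2).

def iou(a, b):
    # Intersection-over-union of two axis-aligned boxes.
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def recall_at(proposals, ground_truth, thresh=0.5):
    # Fraction of ground-truth objects covered by at least one proposal
    # with IoU >= thresh; higher recall with fewer proposals is better.
    hit = sum(1 for gt in ground_truth
              if any(iou(p, gt) >= thresh for p in proposals))
    return hit / len(ground_truth)

gts = [(10, 10, 50, 50), (60, 60, 90, 90)]
props = [(12, 8, 48, 52), (0, 0, 20, 20)]
print(recall_at(props, gts))  # 0.5: only the first object is covered
```

A proposal generator is then judged by how few candidates it needs to reach a given recall at a given IoU threshold.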
Using concolic execution to identify IA32 program errors
In computer science education, one of the most important tasks is to provide students with
feedback that can help them discover errors in their assignment code. Traditionally, this check is
achieved by executing a series of pre-defined test cases. But many bugs are not easily exposed
by such test cases, which are thus insufficient for fair grading. Furthermore, failed test cases give
students little feedback as to how to fix their code. In the last decade, tools have been developed
for code testing that aim at achieving high code coverage even in strict environments, such as
interacting with the operating system. These tools can be helpful if applied in computer science
education. Among these tools, KLEE is particularly designed to improve control-flow
path coverage by exploring different execution paths in the program using concolic execution. In this
thesis, we investigate the possibility of using concolic execution with KLEE to generate feedback for
student assignments written in IA32 (32-bit version of x86) assembly, like the MP1 in our operating
systems course (ECE391). By developing tools for lexical and control flow analysis to translate IA32
to C, we were able to take advantage of KLEE to explore the program’s execution path thoroughly
to generate test cases and feedback that can be helpful for students to detect problems in their
programs. Initial tests show that among 180 student submissions, our tool flagged 139
as containing errors, compared with the 105 flagged by the normal grader; moreover, every
submission in which the grader detected errors was also flagged as erroneous by our tool.
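KLEE proper negates path constraints with an SMT solver; as a toy illustration of the underlying goal (one witness input per control-flow path), here is a hypothetical Python sketch that enumerates paths by brute force over a small input domain:

```python
# Toy illustration of concolic-style path exploration (hypothetical example,
# not KLEE: KLEE solves negated path constraints symbolically rather than
# scanning inputs, but both aim to find one witness input per path).

def program(x):
    # stand-in for a small translated student routine
    if x > 10:
        if x % 3 == 0:
            return "bug"    # hypothetical error path
        return "large"
    return "small"

def trace(x):
    # record the branch decisions taken on a concrete run
    path = [x > 10]
    if x > 10:
        path.append(x % 3 == 0)
    return tuple(path), program(x)

def explore(domain=range(-20, 40)):
    # collect one witness input per distinct control-flow path
    seen = {}
    for x in domain:
        path, out = trace(x)
        seen.setdefault(path, (x, out))
    return seen

paths = explore()
print(sorted(paths))  # [(False,), (True, False), (True, True)]
```

Each discovered witness input can then be replayed as a concrete test case, which is what makes the generated feedback actionable for students.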
The detection of possible γ-ray quasi-periodic modulation with ~600 days from the blazar S2 0109+22
In this work, we analyzed the long-term gamma-ray data from the Fermi Large Area
Telescope (Fermi-LAT) for the blazar S2 0109+22, spanning 2008 to 2023. The
quasi-periodic oscillations (QPOs) of blazars aid in investigating the
physical properties of the central supermassive black holes, the nature of
variability, and the underlying radiation mechanism. We employed four different
methods -- the Weighted Wavelet Z-transform, the Lomb-Scargle periodogram, REDFIT and
phase-folded light curve analysis -- to search for QPO signals. Our analysis
identified a possible QPO behavior with a periodicity of ~600 days from
November 2013 to January 2023 at a significance level of 3.5 σ. This QPO
signal persisted for about 9 years, corresponding to ~5.6 cycles, in good
agreement with the previously reported periodicity of ~657 days in the radio band.
We explained this phenomenon based on the accretion model and the lighthouse
effect in a binary black hole system.
Comment: 12 pages, 8 figures, 3 tables, accepted for publication in PAS
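The Lomb-Scargle periodogram handles the irregular sampling typical of Fermi-LAT light curves. As a self-contained toy (synthetic data and a plain least-squares periodogram standing in for the authors' pipeline), the idea of recovering a ~600-day period can be sketched as:

```python
import numpy as np

# Hypothetical sketch of a least-squares periodogram, the idea underlying
# the Lomb-Scargle search in the paper (synthetic data, not the authors'
# Fermi-LAT pipeline or their significance estimation).
rng = np.random.default_rng(0)
t = np.sort(rng.uniform(0, 3000, 400))            # irregular sampling (days)
flux = np.sin(2 * np.pi * t / 600) + 0.3 * rng.normal(size=t.size)

periods = np.arange(100, 1200, 5.0)
power = []
for p in periods:
    # fit A*sin + B*cos + C at each trial period; the drop in residual
    # variance relative to the raw variance serves as the "power"
    X = np.column_stack([np.sin(2 * np.pi * t / p),
                         np.cos(2 * np.pi * t / p),
                         np.ones_like(t)])
    resid = flux - X @ np.linalg.lstsq(X, flux, rcond=None)[0]
    power.append(1 - resid.var() / flux.var())

best = periods[int(np.argmax(power))]
print(best)  # close to the injected 600-day period
```

Real QPO claims additionally require a significance estimate against red-noise background (as the paper's REDFIT step provides), since stochastic blazar variability can mimic periodic peaks.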
GDN: A Stacking Network Used for Skin Cancer Diagnosis
Skin cancer, the primary type of cancer that can be identified by visual
recognition, requires an automatic identification system that can accurately
classify different types of lesions. This paper presents GoogLe-Dense Network
(GDN), which is an image-classification model to identify two types of skin
cancer, Basal Cell Carcinoma, and Melanoma. GDN uses stacking of different
networks to enhance the model performance. Specifically, GDN consists of two
sequential levels in its structure. The first level performs basic
classification tasks accomplished by GoogLeNet and DenseNet, which are trained
in parallel to enhance efficiency. To avoid low accuracy and long training
time, the second level takes the output of the GoogLeNet and DenseNet as the
input for a logistic regression model. We compare our method with four baseline
networks, namely ResNet, VGGNet, DenseNet, and GoogLeNet, on the dataset, in
which GoogLeNet and DenseNet significantly outperform ResNet and VGGNet. In the
second level, different stacking methods such as a perceptron, logistic
regression, SVM, decision trees and K-nearest neighbors are studied, among which
logistic regression shows the best prediction results. The results show that
GDN, compared to a single network structure, achieves higher accuracy in
skin cancer detection.
Comment: Published at ICSPS 202
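The two-level structure can be illustrated with a small stacking sketch: level-1 base networks emit class probabilities, and a level-2 logistic regression learns to combine them. The data and base-model outputs below are simulated stand-ins, not the GDN training setup:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Illustrative sketch of the two-level stacking idea (hypothetical data, not
# the GDN code): base-network probabilities become meta-features for a
# level-2 logistic regression combiner.
rng = np.random.default_rng(1)
y = rng.integers(0, 2, 500)                      # 0 = BCC, 1 = Melanoma
# stand-ins for GoogLeNet / DenseNet melanoma probabilities
p_google = np.clip(y + rng.normal(0, 0.35, 500), 0, 1)
p_dense = np.clip(y + rng.normal(0, 0.30, 500), 0, 1)

X = np.column_stack([p_google, p_dense])         # level-2 meta-features
meta = LogisticRegression().fit(X[:400], y[:400])
acc = meta.score(X[400:], y[400:])
print(round(acc, 2))
```

Because the base networks are trained in parallel and only their low-dimensional outputs feed the combiner, the second level adds almost no training cost, which matches the efficiency argument in the abstract.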
Beyond Pixels: Exploring Human-Readable SVG Generation for Simple Images with Vision Language Models
In the field of computer graphics, the use of vector graphics, particularly
Scalable Vector Graphics (SVG), represents a notable development from
traditional pixel-based imagery. SVGs, with their XML-based format, are
distinct in their ability to directly and explicitly represent visual elements
such as shape, color, and path. This direct representation facilitates a more
accurate and logical depiction of graphical elements, enhancing reasoning and
interpretability. Recognizing the potential of SVGs, the machine learning
community has introduced multiple methods for image vectorization. However,
transforming images into SVG format while retaining the relational properties
and context of the original scene remains a key challenge. Most vectorization
methods often yield SVGs that are overly complex and not easily interpretable.
In response to this challenge, we introduce our method, Simple-SVG-Generation
(S\textsuperscript{2}VG\textsuperscript{2}). Our method focuses on producing
SVGs that are both accurate and simple, aligning with human readability and
understanding. On simple images, we evaluate our method on reasoning tasks
together with advanced language models, and the results show a clear improvement
over previous SVG generation methods. We also conducted human-evaluation surveys
on the readability of our generated SVGs, and the results likewise favor our
method.
Comment: 10 pages, 7 figures
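What "human-readable SVG" means in practice is markup where each element maps one-to-one to a visual primitive. A toy emitter of such markup (illustrative only; S²VG² itself produces SVGs with vision-language models) looks like:

```python
# Toy sketch of emitting simple, human-readable SVG markup (not the
# S2VG2 method, which generates such output with a language model).

def svg_scene(width, height, shapes):
    # each shape is a dict holding its SVG tag plus plain attributes,
    # so the markup stays a direct, inspectable list of primitives
    body = ""
    for s in shapes:
        attrs = " ".join(f'{k}="{v}"' for k, v in s.items() if k != "tag")
        body += f'  <{s["tag"]} {attrs}/>\n'
    return (f'<svg xmlns="http://www.w3.org/2000/svg" '
            f'width="{width}" height="{height}">\n{body}</svg>')

scene = svg_scene(100, 100, [
    {"tag": "rect", "x": 10, "y": 10, "width": 80, "height": 40,
     "fill": "skyblue"},
    {"tag": "circle", "cx": 50, "cy": 75, "r": 15, "fill": "tomato"},
])
print(scene)
```

Compare this with the output of pixel-level vectorizers, which often approximate the same picture with hundreds of path elements that no human (or language model) can easily reason about.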
VarifocalNet: An IoU-aware Dense Object Detector
Accurately ranking the vast number of candidate detections is crucial for
dense object detectors to achieve high performance. Prior work uses the
classification score or a combination of classification and predicted
localization scores to rank candidates. However, neither option results in a
reliable ranking, thus degrading detection performance. In this paper, we
propose to learn an IoU-aware Classification Score (IACS) as a joint
representation of object presence confidence and localization accuracy. We show
that dense object detectors can achieve a more accurate ranking of candidate
detections based on the IACS. We design a new loss function, named Varifocal
Loss, to train a dense object detector to predict the IACS, and propose a new
star-shaped bounding box feature representation for IACS prediction and
bounding box refinement. Combining these two new components and a bounding box
refinement branch, we build an IoU-aware dense object detector based on the
FCOS+ATSS architecture, which we call VarifocalNet or VFNet for short. Extensive
experiments on MS COCO show that our VFNet consistently surpasses the strong
baseline by ~2.0 AP with different backbones. Our best model, VFNet-X-1200
with Res2Net-101-DCN, achieves a single-model single-scale AP of 55.1 on COCO
test-dev, which is state-of-the-art among various object detectors. Code is
available at https://github.com/hyz-xmaster/VarifocalNet .
Comment: Accepted to CVPR 2021 as an oral
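The Varifocal Loss itself has a compact form: positives are weighted by their IoU-aware target q, while negatives are down-weighted focal-style by αp^γ. A numpy sketch following the paper's definition (α = 0.75, γ = 2.0 are the defaults reported there; this is not the released implementation):

```python
import numpy as np

# Sketch of the Varifocal Loss as defined in the paper: the binary
# cross-entropy against the continuous target q, weighted by q itself for
# positives (q = IoU with ground truth) and by alpha * p^gamma for
# negatives (q = 0), so easy negatives contribute little.
def varifocal_loss(p, q, alpha=0.75, gamma=2.0, eps=1e-12):
    # p: predicted IACS in (0, 1); q: IoU-aware target
    p = np.clip(p, eps, 1 - eps)
    bce = -(q * np.log(p) + (1 - q) * np.log(1 - p))
    weight = np.where(q > 0, q, alpha * p ** gamma)
    return weight * bce

losses = varifocal_loss(np.array([0.9, 0.1]), np.array([0.9, 0.0]))
print(losses)  # large for the confident positive, near-zero for the
               # low-scoring negative
```

The asymmetry is deliberate: unlike the focal loss, positives are *not* down-weighted, so the rare high-IoU examples dominate training of the ranking score.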
RESDSQL: Decoupling Schema Linking and Skeleton Parsing for Text-to-SQL
One of the recent best attempts at Text-to-SQL is the pre-trained language
model. Due to the structural property of SQL queries, the seq2seq model
is responsible for parsing both the schema items (i.e., tables and
columns) and the skeleton (i.e., SQL keywords). Such coupled targets increase
the difficulty of parsing correct SQL queries, especially when they involve
many schema items and logic operators. This paper proposes a ranking-enhanced
encoding and skeleton-aware decoding framework to decouple schema linking
from skeleton parsing. Specifically, for a seq2seq encoder-decoder model, the
encoder is injected with the most relevant schema items instead of the whole
unordered set, which alleviates the schema-linking effort during SQL
parsing, and the decoder first generates the skeleton and then the actual SQL
query, which implicitly constrains the SQL parsing. We evaluate our
proposed framework on Spider and its three robustness variants: Spider-DK,
Spider-Syn, and Spider-Realistic. The experimental results show that our
framework delivers promising performance and robustness. Our code is available
at https://github.com/RUCKBReasoning/RESDSQL.
Comment: Accepted to AAAI 2023 main conference (oral)
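The ranking-enhanced encoding idea can be sketched with a toy relevance scorer (RESDSQL uses a trained cross-encoder; the token-overlap scoring and example schema below are purely illustrative): rank schema items by relevance to the question, then serialize only the top-ranked ones into the encoder input.

```python
# Toy sketch of ranking-enhanced encoding (hypothetical scoring, not
# RESDSQL's trained cross-encoder ranker).

def rank_schema(question, schema):
    # score each (table, columns) pair by token overlap with the question
    q_tokens = set(question.lower().replace("?", "").split())
    def score(item):
        table, columns = item
        toks = set(table.lower().split("_")) | {
            t for c in columns for t in c.lower().split("_")}
        return len(q_tokens & toks)
    return sorted(schema, key=score, reverse=True)

def serialize(question, ranked, top_k=2):
    # inject only the most relevant schema items, most relevant first
    parts = [f"{t} : {' , '.join(cols)}" for t, cols in ranked[:top_k]]
    return question + " | " + " | ".join(parts)

schema = [("singer", ["singer_id", "name", "age"]),
          ("concert", ["concert_id", "venue", "year"]),
          ("stadium", ["stadium_id", "capacity"])]
q = "How many singers are older than the average age"
ranked = rank_schema(q, schema)
print(serialize(q, ranked))
```

Feeding the encoder a short, relevance-ordered schema rather than the full unordered one is what lets the decoder concentrate on skeleton-then-SQL generation.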
Graph Meets LLMs: Towards Large Graph Models
Large models have emerged as the most recent groundbreaking achievements in
artificial intelligence, and particularly machine learning. However, when it
comes to graphs, large models have not achieved the same level of success as in
other fields, such as natural language processing and computer vision. To
advance the application of large models to graphs, we present a perspective
paper to discuss the challenges and opportunities associated with developing
large graph models. First, we discuss the desired characteristics of large
graph models. Then, we present detailed discussions from three key
perspectives: representation basis, graph data, and graph models. In each
category, we provide a brief overview of recent advances and highlight the
remaining challenges together with our visions. Finally, we discuss valuable
applications of large graph models. We believe this perspective can encourage
further investigations into large graph models, ultimately pushing us one step
closer towards artificial general intelligence (AGI). To the best of our
knowledge, we are the first to comprehensively study large graph models.
Comment: Accepted by NeurIPS 2023 New Frontiers in Graph Learning Workshop.
Comments are welcome