Learning to Generate and Refine Object Proposals
Visual object recognition is a fundamental and challenging
problem in computer vision. To build a practical recognition
system, one is first confronted with high computational complexity
due to an enormous search space per image, which is caused by
large variations in object appearance and pose, mutual occlusion,
and other environmental factors. To reduce the search
complexity, modern object recognition systems usually first
generate a moderate set of image regions that are likely to
contain an object, regardless of its category. These possible
object regions are called object proposals, object hypotheses or
object candidates, and can be used for downstream
classification or global reasoning in many different vision tasks
such as object detection, segmentation and tracking.
This thesis addresses the problem of object proposal generation,
including bounding box and segment proposal generation, in
real-world scenarios. In particular, we investigate
representation learning for object proposal generation with 3D
cues and contextual information, aiming to produce higher-quality
object candidates with higher object recall, better
boundary coverage and fewer proposals. We focus on three main
issues: 1) how can we incorporate additional geometric and
high-level semantic context information into proposal
generation for stereo images? 2) how do we generate object
segment proposals for stereo images by learning both the region
representations and the grouping process? and 3) how can we learn a
context-driven representation to refine segment proposals
efficiently?
In this thesis, we propose a series of solutions to each
of these problems. We first propose a semantic context and
depth-aware object proposal generation method. We design a set of
new cues to encode the objectness, and then train an efficient
random forest classifier to re-rank the initial proposals and
linear regressors to fine-tune their locations. Next, we extend
the task to the segment proposal generation in the same setting
and develop a learning-based segment proposal generation method
for stereo images. Our method makes use of learned deep features
and designed geometric features to represent a region and learns
a similarity network to guide the superpixel grouping process. We
also learn a ranking network to predict the objectness score for
each segment proposal. To address the third problem, we take a
transformation-based approach to improve the quality of a given
segment candidate pool based on context information. We propose
an efficient deep network that learns affine transformations to
warp an initial object mask toward a nearby object region, based
on a novel feature pooling strategy. Finally, we extend our
affine warping approach to address the object-mask alignment
problem and particularly the problem of refining a set of segment
proposals. We design an end-to-end deep spatial transformer
network that learns free-form deformations (FFDs) to non-rigidly
warp the shape mask towards the ground truth, based on a
multi-level dual mask feature pooling strategy. We evaluate all
our approaches on several publicly available object recognition
datasets and show superior performance.
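The proposal-quality criteria the thesis targets (object recall, boundary coverage, proposal count) are conventionally measured against ground truth with intersection-over-union (IoU). The sketch below is an illustration of that standard metric on hypothetical boxes, not the thesis code:

```python
# Minimal sketch of box-proposal recall evaluation (illustrative only,
# not the thesis implementation). Boxes are (x1, y1, x2, y2).

def iou(a, b):
    # Intersection-over-union of two axis-aligned boxes.
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def recall_at(proposals, ground_truth, thresh=0.5):
    # Fraction of ground-truth objects covered by at least one proposal
    # with IoU >= thresh; higher recall with fewer proposals is better.
    hit = sum(1 for gt in ground_truth
              if any(iou(p, gt) >= thresh for p in proposals))
    return hit / len(ground_truth)

gts = [(10, 10, 50, 50), (60, 60, 90, 90)]
props = [(12, 8, 48, 52), (0, 0, 20, 20)]
print(recall_at(props, gts))  # 0.5: only the first object is covered
```

A proposal generator is then judged by how few candidates it needs to reach a given recall at a given IoU threshold.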
Using concolic execution to identify IA32 program errors
In computer science education, one of the most important tasks is to provide students with
feedback that can help them discover errors in their assignment code. Traditionally, this check is
achieved by executing a series of pre-defined test cases. But many bugs are not easily exposed
by such test cases, which are thus insufficient for fair grading. Furthermore, failed test cases give
students little feedback as to how to fix their code. In the last decade, tools have been developed
for code testing that aim at achieving high code coverage even in strict environments, such as
interacting with the operating system. These tools can be helpful if applied in computer science
education. Among these tools, KLEE is particularly designed to improve control-flow
path coverage by exploring different execution paths in the program using concolic execution. In this
thesis, we investigate the possibility of using concolic execution with KLEE to generate feedback for
student assignments written in IA32 (32-bit version of x86) assembly, like the MP1 in our operating
systems course (ECE391). By developing tools for lexical and control flow analysis to translate IA32
to C, we were able to take advantage of KLEE to explore the program’s execution path thoroughly
to generate test cases and feedback that can be helpful for students to detect problems in their
programs. Initial tests show that among 180 student submissions, our tool flagged 139
as containing errors, compared with the 105 flagged by the normal grader; moreover, every
submission in which the grader detected errors was also flagged as erroneous by our tool.
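KLEE proper negates path constraints with an SMT solver; as a toy illustration of the underlying goal (one witness input per control-flow path), here is a hypothetical Python sketch that enumerates paths by brute force over a small input domain:

```python
# Toy illustration of concolic-style path exploration (hypothetical example,
# not KLEE: KLEE solves negated path constraints symbolically rather than
# scanning inputs, but both aim to find one witness input per path).

def program(x):
    # stand-in for a small translated student routine
    if x > 10:
        if x % 3 == 0:
            return "bug"    # hypothetical error path
        return "large"
    return "small"

def trace(x):
    # record the branch decisions taken on a concrete run
    path = [x > 10]
    if x > 10:
        path.append(x % 3 == 0)
    return tuple(path), program(x)

def explore(domain=range(-20, 40)):
    # collect one witness input per distinct control-flow path
    seen = {}
    for x in domain:
        path, out = trace(x)
        seen.setdefault(path, (x, out))
    return seen

paths = explore()
print(sorted(paths))  # [(False,), (True, False), (True, True)]
```

Each discovered witness input can then be replayed as a concrete test case, which is what makes the generated feedback actionable for students.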
The detection of possible γ-ray quasi-periodic modulation with ~600 days from the blazar S2 0109+22
In this work, we analyzed the long-term gamma-ray data from the Fermi Large Area
Telescope (Fermi-LAT) for the blazar S2 0109+22, spanning 2008 to 2023. The
quasi-periodic oscillations (QPOs) of blazars aid in investigating the
physical properties of the central supermassive black holes, the nature of
variability, and the underlying radiation mechanism. We employed four different
methods -- the Weighted Wavelet Z-transform, the Lomb-Scargle periodogram, REDFIT and
phase-folded light curve analysis -- to search for QPO signals. Our analysis
identified a possible QPO behavior with a periodicity of ~600 days from
November 2013 to January 2023 at a significance level of 3.5 σ. This QPO
signal persisted for about 9 years, corresponding to ~5.6 cycles, in good
agreement with the previously reported periodicity of ~657 days in the radio band.
We explained this phenomenon based on the accretion model and the lighthouse
effect in a binary black hole system.
Comment: 12 pages, 8 figures, 3 tables, accepted for publication in PAS
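The Lomb-Scargle periodogram handles the irregular sampling typical of Fermi-LAT light curves. As a self-contained toy (synthetic data and a plain least-squares periodogram standing in for the authors' pipeline), the idea of recovering a ~600-day period can be sketched as:

```python
import numpy as np

# Hypothetical sketch of a least-squares periodogram, the idea underlying
# the Lomb-Scargle search in the paper (synthetic data, not the authors'
# Fermi-LAT pipeline or their significance estimation).
rng = np.random.default_rng(0)
t = np.sort(rng.uniform(0, 3000, 400))            # irregular sampling (days)
flux = np.sin(2 * np.pi * t / 600) + 0.3 * rng.normal(size=t.size)

periods = np.arange(100, 1200, 5.0)
power = []
for p in periods:
    # fit A*sin + B*cos + C at each trial period; the drop in residual
    # variance relative to the raw variance serves as the "power"
    X = np.column_stack([np.sin(2 * np.pi * t / p),
                         np.cos(2 * np.pi * t / p),
                         np.ones_like(t)])
    resid = flux - X @ np.linalg.lstsq(X, flux, rcond=None)[0]
    power.append(1 - resid.var() / flux.var())

best = periods[int(np.argmax(power))]
print(best)  # close to the injected 600-day period
```

Real QPO claims additionally require a significance estimate against red-noise background (as the paper's REDFIT step provides), since stochastic blazar variability can mimic periodic peaks.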
GDN: A Stacking Network Used for Skin Cancer Diagnosis
Skin cancer, the primary type of cancer that can be identified by visual
recognition, requires an automatic identification system that can accurately
classify different types of lesions. This paper presents GoogLe-Dense Network
(GDN), which is an image-classification model to identify two types of skin
cancer, Basal Cell Carcinoma, and Melanoma. GDN uses stacking of different
networks to enhance the model performance. Specifically, GDN consists of two
sequential levels in its structure. The first level performs basic
classification tasks accomplished by GoogLeNet and DenseNet, which are trained
in parallel to enhance efficiency. To avoid low accuracy and long training
time, the second level takes the output of the GoogLeNet and DenseNet as the
input for a logistic regression model. We compare our method with four baseline
networks, namely ResNet, VGGNet, DenseNet, and GoogLeNet, on the dataset, in
which GoogLeNet and DenseNet significantly outperform ResNet and VGGNet. In the
second level, different stacking methods such as a perceptron, logistic
regression, SVM, decision trees and K-nearest neighbors are studied, among which
logistic regression shows the best prediction results. The results show that
GDN, compared to a single network structure, achieves higher accuracy in
skin cancer detection.
Comment: Published at ICSPS 202
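The two-level structure can be illustrated with a small stacking sketch: level-1 base networks emit class probabilities, and a level-2 logistic regression learns to combine them. The data and base-model outputs below are simulated stand-ins, not the GDN training setup:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Illustrative sketch of the two-level stacking idea (hypothetical data, not
# the GDN code): base-network probabilities become meta-features for a
# level-2 logistic regression combiner.
rng = np.random.default_rng(1)
y = rng.integers(0, 2, 500)                      # 0 = BCC, 1 = Melanoma
# stand-ins for GoogLeNet / DenseNet melanoma probabilities
p_google = np.clip(y + rng.normal(0, 0.35, 500), 0, 1)
p_dense = np.clip(y + rng.normal(0, 0.30, 500), 0, 1)

X = np.column_stack([p_google, p_dense])         # level-2 meta-features
meta = LogisticRegression().fit(X[:400], y[:400])
acc = meta.score(X[400:], y[400:])
print(round(acc, 2))
```

Because the base networks are trained in parallel and only their low-dimensional outputs feed the combiner, the second level adds almost no training cost, which matches the efficiency argument in the abstract.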
Beyond Pixels: Exploring Human-Readable SVG Generation for Simple Images with Vision Language Models
In the field of computer graphics, the use of vector graphics, particularly
Scalable Vector Graphics (SVG), represents a notable development from
traditional pixel-based imagery. SVGs, with their XML-based format, are
distinct in their ability to directly and explicitly represent visual elements
such as shape, color, and path. This direct representation facilitates a more
accurate and logical depiction of graphical elements, enhancing reasoning and
interpretability. Recognizing the potential of SVGs, the machine learning
community has introduced multiple methods for image vectorization. However,
transforming images into SVG format while retaining the relational properties
and context of the original scene remains a key challenge. Most vectorization
methods often yield SVGs that are overly complex and not easily interpretable.
In response to this challenge, we introduce our method, Simple-SVG-Generation
(S\textsuperscript{2}VG\textsuperscript{2}). Our method focuses on producing
SVGs that are both accurate and simple, aligning with human readability and
understanding. On simple images, we evaluate our method on reasoning tasks
together with advanced language models, and the results show a clear improvement
over previous SVG generation methods. We also conducted human-evaluation surveys
on the readability of our generated SVGs, and the results likewise favor our
method.
Comment: 10 pages, 7 figures
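What "human-readable SVG" means in practice is markup where each element maps one-to-one to a visual primitive. A toy emitter of such markup (illustrative only; S²VG² itself produces SVGs with vision-language models) looks like:

```python
# Toy sketch of emitting simple, human-readable SVG markup (not the
# S2VG2 method, which generates such output with a language model).

def svg_scene(width, height, shapes):
    # each shape is a dict holding its SVG tag plus plain attributes,
    # so the markup stays a direct, inspectable list of primitives
    body = ""
    for s in shapes:
        attrs = " ".join(f'{k}="{v}"' for k, v in s.items() if k != "tag")
        body += f'  <{s["tag"]} {attrs}/>\n'
    return (f'<svg xmlns="http://www.w3.org/2000/svg" '
            f'width="{width}" height="{height}">\n{body}</svg>')

scene = svg_scene(100, 100, [
    {"tag": "rect", "x": 10, "y": 10, "width": 80, "height": 40,
     "fill": "skyblue"},
    {"tag": "circle", "cx": 50, "cy": 75, "r": 15, "fill": "tomato"},
])
print(scene)
```

Compare this with the output of pixel-level vectorizers, which often approximate the same picture with hundreds of path elements that no human (or language model) can easily reason about.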
VarifocalNet: An IoU-aware Dense Object Detector
Accurately ranking the vast number of candidate detections is crucial for
dense object detectors to achieve high performance. Prior work uses the
classification score or a combination of classification and predicted
localization scores to rank candidates. However, neither option results in a
reliable ranking, thus degrading detection performance. In this paper, we
propose to learn an IoU-aware Classification Score (IACS) as a joint
representation of object presence confidence and localization accuracy. We show
that dense object detectors can achieve a more accurate ranking of candidate
detections based on the IACS. We design a new loss function, named Varifocal
Loss, to train a dense object detector to predict the IACS, and propose a new
star-shaped bounding box feature representation for IACS prediction and
bounding box refinement. Combining these two new components and a bounding box
refinement branch, we build an IoU-aware dense object detector based on the
FCOS+ATSS architecture, which we call VarifocalNet or VFNet for short. Extensive
experiments on MS COCO show that our VFNet consistently surpasses the strong
baseline by ~2.0 AP with different backbones. Our best model, VFNet-X-1200
with Res2Net-101-DCN, achieves a single-model single-scale AP of 55.1 on COCO
test-dev, which is state-of-the-art among various object detectors. Code is
available at https://github.com/hyz-xmaster/VarifocalNet .
Comment: Accepted to CVPR 2021 as an oral
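The Varifocal Loss itself has a compact form: positives are weighted by their IoU-aware target q, while negatives are down-weighted focal-style by αp^γ. A numpy sketch following the paper's definition (α = 0.75, γ = 2.0 are the defaults reported there; this is not the released implementation):

```python
import numpy as np

# Sketch of the Varifocal Loss as defined in the paper: the binary
# cross-entropy against the continuous target q, weighted by q itself for
# positives (q = IoU with ground truth) and by alpha * p^gamma for
# negatives (q = 0), so easy negatives contribute little.
def varifocal_loss(p, q, alpha=0.75, gamma=2.0, eps=1e-12):
    # p: predicted IACS in (0, 1); q: IoU-aware target
    p = np.clip(p, eps, 1 - eps)
    bce = -(q * np.log(p) + (1 - q) * np.log(1 - p))
    weight = np.where(q > 0, q, alpha * p ** gamma)
    return weight * bce

losses = varifocal_loss(np.array([0.9, 0.1]), np.array([0.9, 0.0]))
print(losses)  # large for the confident positive, near-zero for the
               # low-scoring negative
```

The asymmetry is deliberate: unlike the focal loss, positives are *not* down-weighted, so the rare high-IoU examples dominate training of the ranking score.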
RESDSQL: Decoupling Schema Linking and Skeleton Parsing for Text-to-SQL
One of the recent best attempts at Text-to-SQL is the pre-trained language
model. Due to the structural property of SQL queries, the seq2seq model
is responsible for parsing both the schema items (i.e., tables and
columns) and the skeleton (i.e., SQL keywords). Such coupled targets increase
the difficulty of parsing correct SQL queries, especially when they involve
many schema items and logic operators. This paper proposes a ranking-enhanced
encoding and skeleton-aware decoding framework to decouple schema linking
from skeleton parsing. Specifically, for a seq2seq encoder-decoder model, the
encoder is injected with the most relevant schema items instead of the whole
unordered set, which alleviates the schema-linking effort during SQL
parsing, and the decoder first generates the skeleton and then the actual SQL
query, which implicitly constrains the SQL parsing. We evaluate our
proposed framework on Spider and its three robustness variants: Spider-DK,
Spider-Syn, and Spider-Realistic. The experimental results show that our
framework delivers promising performance and robustness. Our code is available
at https://github.com/RUCKBReasoning/RESDSQL.
Comment: Accepted to AAAI 2023 main conference (oral)
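The ranking-enhanced encoding idea can be sketched with a toy relevance scorer (RESDSQL uses a trained cross-encoder; the token-overlap scoring and example schema below are purely illustrative): rank schema items by relevance to the question, then serialize only the top-ranked ones into the encoder input.

```python
# Toy sketch of ranking-enhanced encoding (hypothetical scoring, not
# RESDSQL's trained cross-encoder ranker).

def rank_schema(question, schema):
    # score each (table, columns) pair by token overlap with the question
    q_tokens = set(question.lower().replace("?", "").split())
    def score(item):
        table, columns = item
        toks = set(table.lower().split("_")) | {
            t for c in columns for t in c.lower().split("_")}
        return len(q_tokens & toks)
    return sorted(schema, key=score, reverse=True)

def serialize(question, ranked, top_k=2):
    # inject only the most relevant schema items, most relevant first
    parts = [f"{t} : {' , '.join(cols)}" for t, cols in ranked[:top_k]]
    return question + " | " + " | ".join(parts)

schema = [("singer", ["singer_id", "name", "age"]),
          ("concert", ["concert_id", "venue", "year"]),
          ("stadium", ["stadium_id", "capacity"])]
q = "How many singers are older than the average age"
ranked = rank_schema(q, schema)
print(serialize(q, ranked))
```

Feeding the encoder a short, relevance-ordered schema rather than the full unordered one is what lets the decoder concentrate on skeleton-then-SQL generation.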
Graph Meets LLMs: Towards Large Graph Models
Large models have emerged as the most recent groundbreaking achievements in
artificial intelligence, and particularly machine learning. However, when it
comes to graphs, large models have not achieved the same level of success as in
other fields, such as natural language processing and computer vision. To
advance the application of large models to graphs, we present a perspective
paper to discuss the challenges and opportunities associated with developing
large graph models. First, we discuss the desired characteristics of large
graph models. Then, we present detailed discussions from three key
perspectives: representation basis, graph data, and graph models. In each
category, we provide a brief overview of recent advances and highlight the
remaining challenges together with our visions. Finally, we discuss valuable
applications of large graph models. We believe this perspective can encourage
further investigations into large graph models, ultimately pushing us one step
closer towards artificial general intelligence (AGI). To the best of our
knowledge, we are the first to comprehensively study large graph models.
Comment: Accepted by NeurIPS 2023 New Frontiers in Graph Learning Workshop.
Comments are welcome