
    Learning to Generate and Refine Object Proposals

    Visual object recognition is a fundamental and challenging problem in computer vision. To build a practical recognition system, one is first confronted with high computational complexity due to the enormous search space of an image, caused by large variations in object appearance, pose and mutual occlusion, as well as other environmental factors. To reduce the search complexity, modern object recognition systems usually first generate a moderate set of image regions that are likely to contain an object, regardless of its category. These candidate regions are called object proposals, object hypotheses or object candidates, and can be used for downstream classification or global reasoning in many different vision tasks, such as object detection, segmentation and tracking. This thesis addresses the problem of object proposal generation, including bounding box and segment proposal generation, in real-world scenarios. In particular, we investigate representation learning for object proposal generation with 3D cues and contextual information, aiming to propose higher-quality object candidates that achieve higher object recall and better boundary coverage with fewer proposals. We focus on three main issues: 1) how can we incorporate additional geometric and high-level semantic context information into proposal generation for stereo images? 2) how do we generate object segment proposals for stereo images with learned representations and a learned grouping process? and 3) how can we learn a context-driven representation to refine segment proposals efficiently? In this thesis, we propose a series of solutions to each of these problems. We first propose a semantic context and depth-aware object proposal generation method. We design a set of new cues to encode objectness, and then train an efficient random forest classifier to re-rank the initial proposals and linear regressors to fine-tune their locations.
    Next, we extend the task to segment proposal generation in the same setting and develop a learning-based segment proposal generation method for stereo images. Our method uses learned deep features and designed geometric features to represent a region, and learns a similarity network to guide the superpixel grouping process. We also learn a ranking network to predict the objectness score of each segment proposal. To address the third problem, we take a transformation-based approach that improves the quality of a given segment candidate pool based on context information. We propose an efficient deep network that learns affine transformations to warp an initial object mask towards a nearby object region, based on a novel feature pooling strategy. Finally, we extend our affine warping approach to the general object-mask alignment problem, and in particular to refining a set of segment proposals. We design an end-to-end deep spatial transformer network that learns free-form deformations (FFDs) to non-rigidly warp the shape mask towards the ground truth, based on a multi-level dual mask feature pooling strategy. We evaluate all our approaches on several publicly available object recognition datasets and show superior performance.
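    The thesis learns the affine parameters with a deep network; the warping step itself is simple enough to sketch. The snippet below only illustrates applying a (hand-chosen, not learned) affine transform to a binary proposal mask via inverse mapping; it is not the thesis's actual refinement network.

    ```python
    import numpy as np

    def warp_mask(mask, A, b):
        """Warp a binary mask by the affine map p -> A @ p + b.

        Uses inverse mapping with nearest-neighbor sampling, so every
        output pixel is filled exactly once.
        """
        H, W = mask.shape
        ys, xs = np.mgrid[0:H, 0:W]
        dst = np.stack([xs.ravel(), ys.ravel()]).astype(float)  # (x, y) of each output pixel
        src = np.linalg.inv(A) @ (dst - np.asarray(b, float)[:, None])
        sx = np.round(src[0]).astype(int)
        sy = np.round(src[1]).astype(int)
        valid = (sx >= 0) & (sx < W) & (sy >= 0) & (sy < H)
        out = np.zeros_like(mask)
        flat = out.ravel()  # view into out
        flat[valid] = mask[sy[valid], sx[valid]]
        return out

    # Shift an initial 3x3 square proposal 2 px right and 1 px down,
    # as a refinement step might after observing a misaligned object.
    mask = np.zeros((10, 10), dtype=np.uint8)
    mask[2:5, 2:5] = 1
    refined = warp_mask(mask, A=np.eye(2), b=[2, 1])
    ```

    A learned model would regress `A` and `b` from pooled features around the proposal instead of fixing them by hand.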

    Using concolic execution to identify IA32 program errors

    In computer science education, one of the most important tasks is to give students feedback that helps them discover errors in their assignment code. Traditionally, this check is performed by executing a series of pre-defined test cases. However, many bugs are not easily exposed by such test cases, which are thus insufficient for fair grading. Furthermore, failed test cases give students little feedback on how to fix their code. In the last decade, code-testing tools have been developed that aim at achieving high code coverage even in strict environments, such as when interacting with the operating system, and these tools can be helpful in computer science education. Among them, KLEE is specifically designed to improve control-flow path coverage by exploring different execution paths of a program using concolic execution. In this thesis, we investigate the possibility of using concolic execution with KLEE to generate feedback for student assignments written in IA32 (the 32-bit version of x86) assembly, such as MP1 in our operating systems course (ECE391). By developing tools for lexical and control-flow analysis that translate IA32 to C, we were able to use KLEE to explore a program's execution paths thoroughly and generate test cases and feedback that help students detect problems in their programs. An initial test on 180 student submissions showed that our tool flagged 139 as containing errors, compared with 105 flagged by the normal grader, and every submission in which the grader detected errors was also flagged by our tool.
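    KLEE itself operates on LLVM bitcode, but the core loop of concolic testing (run on a concrete input, record the branch predicates taken, negate one to derive an input for an unexplored path) can be illustrated with a toy pure-Python sketch. Everything here is hypothetical: `classify` stands in for a student program with a hidden bug, and the brute-force `solve` stands in for a real constraint solver.

    ```python
    def classify(x):
        # Toy stand-in for a student program with a hidden bug at x == 42.
        if x > 10:
            if x == 42:
                return "bug"
            return "big"
        return "small"

    class Sym:
        """Wraps a concrete int and records every branch predicate it meets."""
        def __init__(self, value, path):
            self.value, self.path = value, path
        def __gt__(self, c):
            taken = self.value > c
            self.path.append(("gt", c, taken))
            return taken
        def __eq__(self, c):
            taken = self.value == c
            self.path.append(("eq", c, taken))
            return taken

    def run(f, x):
        path = []
        return f(Sym(x, path)), path

    def holds(x, pred):
        op, c, want = pred
        return ((x > c) if op == "gt" else (x == c)) == want

    def solve(constraints):
        # Brute-force stand-in for an SMT solver, over a small domain.
        return next((x for x in range(-100, 101)
                     if all(holds(x, p) for p in constraints)), None)

    def concolic(f, seed=0, budget=20):
        seen, outputs, worklist = set(), set(), [seed]
        while worklist and budget > 0:
            x = worklist.pop()
            budget -= 1
            out, path = run(f, x)
            outputs.add(out)
            key = tuple(p[2] for p in path)
            if key in seen:
                continue
            seen.add(key)
            # Negate each predicate in turn to steer toward unexplored paths.
            for i, (op, c, taken) in enumerate(path):
                new_x = solve(path[:i] + [(op, c, not taken)])
                if new_x is not None:
                    worklist.append(new_x)
        return outputs

    paths_hit = concolic(classify)
    ```

    Starting from the single seed input 0, the driver discovers inputs exercising all three paths, including the buggy `x == 42` branch that fixed test cases would likely miss.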

    The detection of possible γ-ray quasi-periodic modulation with ∼600 days from the blazar S2 0109+22

    In this work, we analyzed the long-term gamma-ray data of the blazar S2 0109+22 taken by the Fermi Large Area Telescope (Fermi-LAT), ranging from 2008 to 2023. Quasi-periodic oscillations (QPOs) of blazars aid in investigating the physical properties of the central supermassive black holes, the nature of the variability, and the underlying radiation mechanism. We employed four different methods (Weighted Wavelet Z-transform, Lomb-Scargle periodogram, REDFIT, and phase-folded light curve analysis) to search for QPO signals. Our analysis identified possible QPO behavior with a periodicity of ∼600 days from November 2013 to January 2023 at a significance level of 3.5σ. This QPO signal persisted for ∼9 years, corresponding to 5.6 cycles, in good agreement with the previously observed radio periodicity of ∼657 days. We explain this phenomenon with the accretion model and the lighthouse effect in a binary black hole system. Comment: 12 pages, 8 figures, 3 tables, accepted for publication in PAS
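    Of the four methods, the Lomb-Scargle periodogram is the easiest to sketch: it fits sinusoids at trial frequencies to unevenly sampled data, which FFT-based periodograms cannot handle. The synthetic ∼600-day light curve below is illustrative only, not the actual Fermi-LAT data.

    ```python
    import numpy as np

    def lomb_scargle(t, y, periods):
        """Classical Lomb-Scargle power at each trial period (uneven sampling)."""
        y = y - y.mean()
        power = np.empty(len(periods))
        for i, P in enumerate(periods):
            w = 2.0 * np.pi / P
            tau = np.arctan2(np.sum(np.sin(2 * w * t)),
                             np.sum(np.cos(2 * w * t))) / (2 * w)
            c = np.cos(w * (t - tau))
            s = np.sin(w * (t - tau))
            power[i] = 0.5 * ((y @ c) ** 2 / (c @ c) + (y @ s) ** 2 / (s @ s))
        return power

    # Synthetic, unevenly sampled light curve: ~9 years with a 600-day period.
    rng = np.random.default_rng(0)
    t = np.sort(rng.uniform(0.0, 3300.0, 200))   # observation times in days
    y = np.sin(2 * np.pi * t / 600.0) + 0.3 * rng.normal(size=t.size)

    periods = np.linspace(100.0, 1200.0, 1101)   # trial periods in days
    best_period = periods[np.argmax(lomb_scargle(t, y, periods))]
    ```

    A significance level such as the paper's 3.5σ would then be estimated by comparing the peak power against periodograms of simulated red-noise light curves.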

    Beyond Pixels: Exploring Human-Readable SVG Generation for Simple Images with Vision Language Models

    In the field of computer graphics, the use of vector graphics, particularly Scalable Vector Graphics (SVG), represents a notable development from traditional pixel-based imagery. SVGs, with their XML-based format, are distinct in their ability to directly and explicitly represent visual elements such as shape, color, and path. This direct representation facilitates a more accurate and logical depiction of graphical elements, enhancing reasoning and interpretability. Recognizing the potential of SVGs, the machine learning community has introduced multiple methods for image vectorization. However, transforming images into SVG format while retaining the relational properties and context of the original scene remains a key challenge, and most vectorization methods yield SVGs that are overly complex and not easily interpretable. In response to this challenge, we introduce our method, Simple-SVG-Generation (S²VG²). Our method focuses on producing SVGs that are both accurate and simple, aligning with human readability and understanding. On simple images, we evaluate our method on reasoning tasks together with advanced language models; the results show a clear improvement over previous SVG generation methods. We also conducted human-evaluation surveys on the readability of our generated SVGs; the results likewise favor our method. Comment: 10 pages, 7 figures
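    The contrast with pixel output is easy to see in code: an SVG for a simple scene can name its primitives explicitly. The helper below is a hypothetical illustration of the kind of readable markup such a method targets, not the paper's actual vision-language-model pipeline.

    ```python
    import xml.etree.ElementTree as ET

    def simple_scene_svg():
        """Emit a human-readable SVG: each object is one named primitive."""
        svg = ET.Element("svg", xmlns="http://www.w3.org/2000/svg",
                         width="100", height="100")
        # A red sun and a blue house: shape, color, and position are explicit,
        # so a language model can reason about the scene from the markup alone.
        ET.SubElement(svg, "circle", cx="75", cy="25", r="10", fill="red")
        ET.SubElement(svg, "rect", x="20", y="50", width="40", height="30",
                      fill="blue")
        return ET.tostring(svg, encoding="unicode")

    markup = simple_scene_svg()
    ```

    A pixel-perfect vectorizer would instead emit hundreds of overlapping paths for the same scene, which is exactly the interpretability gap the paper measures.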

    VarifocalNet: An IoU-aware Dense Object Detector

    Accurately ranking the vast number of candidate detections is crucial for dense object detectors to achieve high performance. Prior work uses the classification score, or a combination of the classification and predicted localization scores, to rank candidates. However, neither option results in a reliable ranking, which degrades detection performance. In this paper, we propose to learn an IoU-aware Classification Score (IACS) as a joint representation of object presence confidence and localization accuracy. We show that dense object detectors can achieve a more accurate ranking of candidate detections based on the IACS. We design a new loss function, named Varifocal Loss, to train a dense object detector to predict the IACS, and propose a new star-shaped bounding box feature representation for IACS prediction and bounding box refinement. Combining these two new components with a bounding box refinement branch, we build an IoU-aware dense object detector based on the FCOS+ATSS architecture, which we call VarifocalNet, or VFNet for short. Extensive experiments on MS COCO show that our VFNet consistently surpasses the strong baseline by ∼2.0 AP with different backbones. Our best model, VFNet-X-1200 with Res2Net-101-DCN, achieves a single-model, single-scale AP of 55.1 on COCO test-dev, which is state-of-the-art among various object detectors. Code is available at https://github.com/hyz-xmaster/VarifocalNet . Comment: Accepted to CVPR 2021 as an oral
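    The Varifocal Loss can be written down compactly. The sketch below follows the paper's formulation as I understand it: foreground examples are weighted by their target score q (the IoU with the ground truth box), while background examples (q = 0) are down-weighted focal-style; the α and γ values are commonly used defaults, and the numpy rendering here is a simplification of the authors' released implementation.

    ```python
    import numpy as np

    def varifocal_loss(p, q, alpha=0.75, gamma=2.0):
        """Varifocal Loss for one prediction.

        p: predicted IoU-aware classification score in (0, 1)
        q: target score (IoU with ground truth for foreground, 0 for background)
        """
        p = np.clip(p, 1e-7, 1 - 1e-7)
        if q > 0:
            # Foreground: binary cross-entropy scaled by the target q, so
            # high-IoU positives contribute more to training.
            return -q * (q * np.log(p) + (1 - q) * np.log(1 - p))
        # Background: focal down-weighting of easy negatives.
        return -alpha * p ** gamma * np.log(1 - p)
    ```

    The asymmetry is the point: negatives are down-weighted as in Focal Loss, but high-quality positives are deliberately *not* down-weighted, since they are the detections the ranking must get right.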

    RESDSQL: Decoupling Schema Linking and Skeleton Parsing for Text-to-SQL

    Some of the best recent attempts at Text-to-SQL are based on pre-trained language models. Due to the structural properties of SQL queries, a seq2seq model must parse both the schema items (i.e., tables and columns) and the skeleton (i.e., SQL keywords). Such coupled targets increase the difficulty of parsing the correct SQL queries, especially when they involve many schema items and logic operators. This paper proposes a ranking-enhanced encoding and skeleton-aware decoding framework to decouple schema linking from skeleton parsing. Specifically, for a seq2seq encoder-decoder model, the encoder is injected with the most relevant schema items instead of the whole unordered set, which alleviates the schema linking effort during SQL parsing, and the decoder first generates the skeleton and then the actual SQL query, which implicitly constrains the SQL parsing. We evaluate our proposed framework on Spider and its three robustness variants: Spider-DK, Spider-Syn, and Spider-Realistic. The experimental results show that our framework delivers promising performance and robustness. Our code is available at https://github.com/RUCKBReasoning/RESDSQL. Comment: Accepted to AAAI 2023 main conference (oral)
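    The encoder-side idea can be sketched without a neural model: score each schema item against the question, then serialize only the top-ranked items into the encoder input. The toy word-overlap score below stands in for RESDSQL's trained cross-encoder, and the Spider-style schema and question are invented for illustration.

    ```python
    def rank_schema_items(question, schema):
        """Toy relevance ranking: overlap between question words and item names."""
        q_words = set(question.lower().replace("?", "").split())
        def score(table, columns):
            names = {table} | set(columns)
            return sum(1 for n in names if n.lower() in q_words)
        return sorted(schema.items(), key=lambda kv: -score(*kv))

    def serialize(question, ranked, top_k=2):
        """Encoder input: question followed by the most relevant schema items."""
        items = " | ".join(f"{t}: {', '.join(cols)}" for t, cols in ranked[:top_k])
        return f"{question} | {items}"

    schema = {
        "singer": ["name", "age", "country"],
        "concert": ["year", "stadium_id"],
        "stadium": ["capacity", "location"],
    }
    question = "What is the average age of singer from each country?"
    ranked = rank_schema_items(question, schema)
    encoder_input = serialize(question, ranked)

    # Decoder target in the skeleton-first style: emit the SQL skeleton,
    # then the actual query.
    target = ("select _ , avg ( _ ) from _ group by _ | "
              "select country, avg(age) from singer group by country")
    ```

    Because the decoder commits to the skeleton before filling in schema items, the two previously coupled sub-problems constrain each other instead of being solved jointly in one pass.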

    Nano-additive manufacturing of multilevel strengthened aluminum matrix composites

    Nanostructured materials are being actively developed, but how to rapidly scale them up into bulk engineering materials for broad industrial applications remains an open question. This study proposes an industrial approach to rapidly fabricate high-strength, large-size nanostructured metal matrix composites, and investigates and optimizes the deposition process and strengthening mechanisms. Advanced nanocrystalline aluminum matrix composites (nanoAMCs) were assembled for the first time by a novel nano-additive manufacturing method guided by numerical simulations (an in-flight particle model and a pore-free deposition model). The present nanoAMC, with a mean matrix grain size below 50 nm, exhibits a hardness eight times that of bulk aluminum, the highest among all Al–Al2O3 composites reported to date in the literature; this is the outcome of controlling multiscale strengthening mechanisms by tailoring solute atoms, dislocations, grain boundaries, precipitates, and externally introduced reinforcing particles. The present high-throughput strategy and method can be extended to design advanced coatings or bulk materials in a highly efficient way (synthesizing a nanostructured bulk with dimensions of 50 × 20 × 4 mm³ in 9 min) and a highly flexible one (regulating gradient microstructures in the bulk), which is conducive to industrial production and application.

    Graph Meets LLMs: Towards Large Graph Models

    Large models have emerged as the most recent groundbreaking achievements in artificial intelligence, particularly in machine learning. However, when it comes to graphs, large models have not achieved the same level of success as in other fields, such as natural language processing and computer vision. To advance the application of large models to graphs, we present a perspective paper discussing the challenges and opportunities associated with developing large graph models. First, we discuss the desired characteristics of large graph models. Then, we present detailed discussions from three key perspectives: representation basis, graph data, and graph models. In each category, we provide a brief overview of recent advances and highlight the remaining challenges together with our visions. Finally, we discuss valuable applications of large graph models. To the best of our knowledge, we are the first to comprehensively study large graph models, and we believe this perspective can encourage further investigations, ultimately pushing us one step closer towards artificial general intelligence (AGI). Comment: Accepted by the NeurIPS 2023 New Frontiers in Graph Learning Workshop. Comments are welcome