26,717 research outputs found

    Text to Image Synthesis via Mask Anchor Points and Aesthetic Assessment

    Get PDF
    Text-to-image synthesis is the process of generating an image from input text, with applications in art generation, computer-aided design, and photo editing. In this thesis, we propose a new framework that leverages mask anchor points to split image synthesis into two major steps. In the first step, a mask image is generated from the input text and a mask dataset; in the second step, the mask image is fed into a state-of-the-art mask-to-image generator. The mask image captures semantic information and the spatial relationships among entities via the anchor points. We develop a user-friendly interface that helps parse the input text into meaningful semantic objects. However, synthesizing an appealing image from text also requires image aesthetics criteria, so we further improve the framework by incorporating an aesthetic assessment based on photography composition rules. To this end, we randomize a set of mask maps from the input text via the anchor-point-based mask map generator, then compute and rank an image aesthetics score for every generated mask map according to two composition rules: the rule of thirds and the rule of formal balance. Next, we feed the subset of mask maps with the highest, lowest, and average aesthetic scores into the state-of-the-art mask-to-image generator. The resulting photorealistic images are re-ranked to obtain the synthesized image with the highest aesthetic score. This design addresses common problems in images generated by prior state-of-the-art methods, such as unnaturalness, ambiguity, and distortion: our framework maintains clear entity shapes, detailed entity edges, and a proper layout regardless of how complex the input text is or how many entities and spatial relations it contains. Experiments on the challenging COCO-Stuff dataset demonstrate the superiority of our approach over previous state-of-the-art methods.
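    To make the ranking stage concrete, below is a minimal sketch of how such an aesthetic score could be computed for a mask map's anchor points. The abstract does not give the thesis's formulas, so `rule_of_thirds_score`, `formal_balance_score`, the exponential distance term, and the equal weighting are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

# Hypothetical scorer: the exact scoring formulas are not published in the
# abstract, so the distance term and weights below are illustrative assumptions.

# The four rule-of-thirds intersections in normalized image coordinates.
THIRDS = np.array([(x, y) for x in (1/3, 2/3) for y in (1/3, 2/3)])

def rule_of_thirds_score(anchors):
    """anchors: (N, 2) normalized (x, y) entity anchor points in [0, 1]."""
    # Distance from each anchor to its nearest thirds intersection.
    d = np.linalg.norm(anchors[:, None, :] - THIRDS[None, :, :], axis=-1)
    # Anchors sitting exactly on an intersection contribute a score of 1.
    return float(np.exp(-d.min(axis=1)).mean())

def formal_balance_score(anchors, masses):
    """Higher when entity 'mass' (e.g. mask area) balances about the vertical midline."""
    left = masses[anchors[:, 0] < 0.5].sum()
    right = masses[anchors[:, 0] >= 0.5].sum()
    total = left + right
    return 1.0 if total == 0 else 1.0 - abs(left - right) / total

def aesthetic_score(anchors, masses, w=0.5):
    anchors, masses = np.asarray(anchors, float), np.asarray(masses, float)
    return w * rule_of_thirds_score(anchors) + (1 - w) * formal_balance_score(anchors, masses)

# Rank randomized mask maps, then keep the highest-, lowest-, and
# closest-to-average-scoring maps for the mask-to-image generator.
def select_mask_maps(mask_maps):
    scores = [aesthetic_score(m["anchors"], m["masses"]) for m in mask_maps]
    order = np.argsort(scores)
    mid = int(np.argmin(np.abs(np.array(scores) - np.mean(scores))))
    return mask_maps[order[-1]], mask_maps[order[0]], mask_maps[mid]
```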

    FoveaBox: Beyond Anchor-based Object Detector

    Full text link
    We present FoveaBox, an accurate, flexible, and completely anchor-free framework for object detection. While almost all state-of-the-art object detectors utilize predefined anchors to enumerate possible locations, scales, and aspect ratios in the search for objects, their performance and generalization ability are also limited by the anchor design. Instead, FoveaBox directly learns the likelihood that an object exists and the bounding box coordinates without anchor references. This is achieved by (a) predicting category-sensitive semantic maps for the likelihood of object existence, and (b) producing a category-agnostic bounding box for each position that potentially contains an object. The scales of target boxes are naturally associated with feature pyramid representations. In FoveaBox, an instance is assigned to adjacent feature levels to make the model more accurate. We demonstrate its effectiveness on standard benchmarks and report extensive experimental analysis. Without bells and whistles, FoveaBox achieves state-of-the-art single-model performance on the standard COCO and Pascal VOC object detection benchmarks. More importantly, FoveaBox avoids all computation and hyper-parameters related to anchor boxes, to which the final detection performance is often sensitive. We believe this simple and effective approach will serve as a solid baseline and help ease future research on object detection. The code has been made publicly available at https://github.com/taokong/FoveaBox .
    Comment: IEEE Transactions on Image Processing; code at https://github.com/taokong/FoveaBox
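    As a rough illustration of the anchor-free idea, the sketch below shows a per-position head in the spirit of FoveaBox: a category-sensitive map scores object existence per class, and a category-agnostic branch regresses four box offsets at every position. The layer widths, the `exp` transform, and the name `AnchorFreeHead` are simplified assumptions; the authors' actual architecture is in the linked repository.

```python
import torch
import torch.nn as nn

class AnchorFreeHead(nn.Module):
    """Sketch of an anchor-free detection head in the spirit of FoveaBox."""

    def __init__(self, in_channels: int, num_classes: int):
        super().__init__()
        # (a) Category-sensitive semantic map: one existence score per class.
        self.cls_head = nn.Conv2d(in_channels, num_classes, 3, padding=1)
        # (b) Category-agnostic box: four offsets per spatial position.
        self.box_head = nn.Conv2d(in_channels, 4, 3, padding=1)

    def forward(self, feature):                 # feature: (B, C, H, W), one pyramid level
        cls_map = self.cls_head(feature)        # (B, num_classes, H, W)
        box_map = self.box_head(feature).exp()  # exp() keeps predicted offsets positive
        return cls_map, box_map

# No anchor shapes are enumerated: each feature pyramid level covers one
# range of object scales, and an instance is assigned to the adjacent
# level(s) whose range matches its size.
head = AnchorFreeHead(in_channels=256, num_classes=80)
cls_map, box_map = head(torch.randn(2, 256, 32, 32))
print(cls_map.shape, box_map.shape)  # torch.Size([2, 80, 32, 32]) torch.Size([2, 4, 32, 32])
```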