A Multi-Implicit Neural Representation for Fonts
Fonts are ubiquitous across documents and come in a variety of styles. They
are either represented in a native vector format or rasterized to produce fixed
resolution images. In the former case, the non-standard representation prevents
fonts from benefiting from the latest network architectures for neural
representations, while in the latter case, encoding the rasterized
representation with networks results in a loss of data fidelity, as
font-specific discontinuities like edges
and corners are difficult to represent using neural networks. Based on the
observation that complex fonts can be represented by a superposition of a set
of simpler occupancy functions, we introduce multi-implicits to
represent fonts as a permutation-invariant set of learned implicit functions,
without losing features (e.g., edges and corners). However, while
multi-implicits locally preserve font features, obtaining supervision in the
form of ground-truth multi-channel signals is a problem in itself. Instead, we
show how to train such a representation with only local supervision, while
the proposed neural architecture directly finds globally consistent
multi-implicits for font families. We extensively evaluate the proposed
representation for various tasks including reconstruction, interpolation, and
synthesis to demonstrate clear advantages over existing alternatives.
Additionally, the representation naturally enables glyph completion, wherein a
single characteristic font is used to synthesize a whole font family in the
target style.
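The abstract does not say how the simpler occupancy functions are combined. The sketch below is therefore only an illustration, under stated assumptions: hand-written half-plane signed fields stand in for the learned implicit functions, and `min` (set intersection) serves as the permutation-invariant combination. It shows how a sharp corner, the kind of feature the abstract calls hard for a single network to represent, can be captured exactly by a small set of simple functions. The names `half_plane`, `intersect`, and `occupied` are hypothetical. The paper learns its functions with a neural network and may combine them differently.

```python
import itertools
import math
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, float]
Field = Callable[[Point], float]  # signed field: >= 0 inside, < 0 outside


def half_plane(nx: float, ny: float, offset: float) -> Field:
    """Hypothetical simple occupancy function: signed distance to the line
    n . p = offset, positive on the side the normal n points to."""
    norm = math.hypot(nx, ny)
    nx, ny, offset = nx / norm, ny / norm, offset / norm

    def field(p: Point) -> float:
        return nx * p[0] + ny * p[1] - offset

    return field


def intersect(fields: Sequence[Field], p: Point) -> float:
    """Combine simple fields with min (set intersection).
    The result does not depend on the order of `fields`."""
    return min(f(p) for f in fields)


def occupied(fields: Sequence[Field], p: Point) -> bool:
    """A point is inside the shape when the combined field is non-negative."""
    return intersect(fields, p) >= 0.0


# Unit square [0, 1]^2 built from four half-planes. Its zero level set is
# exactly the square, so the corners stay perfectly sharp. (Outside a corner
# the min value is only a bound, not the exact Euclidean distance.)
square: List[Field] = [
    half_plane(1, 0, 0),    # x >= 0
    half_plane(-1, 0, -1),  # x <= 1
    half_plane(0, 1, 0),    # y >= 0
    half_plane(0, -1, -1),  # y <= 1
]

if __name__ == "__main__":
    for p in [(0.5, 0.5), (1.0, 1.0), (1.001, 0.5)]:
        print(p, occupied(square, p))
    # Any reordering of the set gives the same combined value.
    q = (0.3, 0.9)
    print(all(intersect(list(perm), q) == intersect(square, q)
              for perm in itertools.permutations(square)))
```

Using `min` keeps the sketch easy to check by hand. A learned model would replace each `half_plane` with a network output and could use a different symmetric operator.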
Recognizing Vector Graphics without Rasterization
In this paper, we consider a different data format for images: vector graphics. In contrast to raster graphics, which are widely used in image recognition, vector graphics can be scaled up or down to any resolution without aliasing or information loss, owing to the analytic representation of the primitives in the document. Furthermore, vector graphics provide extra structural information on how low-level elements group together to form high-level shapes or structures. These merits of vector graphics have not been fully leveraged by existing methods. To explore this data format, we target two fundamental recognition tasks: object localization and classification. We propose an efficient CNN-free pipeline, called YOLaT (You Only Look at Text), that does not render the graphic into pixels (i.e., rasterization) but instead takes the textual document of the vector graphic as input. YOLaT builds multi-graphs to model the structural and spatial information in vector graphics, and a dual-stream graph neural network detects objects from the graph. Our experiments show that by operating directly on vector graphics, YOLaT outperforms raster-graphic-based object detection baselines in terms of both average precision and efficiency. Code is available at https://github.com/microsoft/YOLaT-VectorGraphicsRecognition
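This abstract does not describe YOLaT's exact graph construction. The sketch below is a simplified, hypothetical illustration of reading the textual SVG document directly instead of rasterizing it. It handles only `<line>` elements and turns their endpoints into nodes. It then builds two edge sets over the same nodes as a stand-in for a multi-graph: structural edges, which join the endpoints of the same primitive, and spatial edges, which join nodes within a chosen radius. The function names and the `radius` parameter are assumptions and do not come from the released code at the linked repository.

```python
import math
import xml.etree.ElementTree as ET
from typing import Dict, List, Tuple

Node = Tuple[float, float]
Edge = Tuple[int, int]


def svg_lines_to_graph(svg_text: str) -> Tuple[List[Node], List[Edge]]:
    """Hypothetical parser: read <line> primitives from SVG text (no
    rasterization) and return unique endpoints as nodes plus one
    structural edge per line segment."""
    root = ET.fromstring(svg_text)
    index: Dict[Node, int] = {}
    nodes: List[Node] = []
    edges: List[Edge] = []

    def node_id(pt: Node) -> int:
        # Shared endpoints map to the same node, which links connected segments.
        if pt not in index:
            index[pt] = len(nodes)
            nodes.append(pt)
        return index[pt]

    for elem in root.iter():
        # Strip the XML namespace, e.g. "{http://www.w3.org/2000/svg}line".
        if elem.tag.rsplit("}", 1)[-1] != "line":
            continue
        a = (float(elem.get("x1", 0)), float(elem.get("y1", 0)))
        b = (float(elem.get("x2", 0)), float(elem.get("y2", 0)))
        edges.append((node_id(a), node_id(b)))
    return nodes, edges


def spatial_edges(nodes: List[Node], radius: float) -> List[Edge]:
    """Second edge type (hypothetical): connect any two nodes whose
    Euclidean distance is at most `radius`."""
    out: List[Edge] = []
    for i in range(len(nodes)):
        for j in range(i + 1, len(nodes)):
            if math.dist(nodes[i], nodes[j]) <= radius:
                out.append((i, j))
    return out


SVG = """<svg xmlns="http://www.w3.org/2000/svg" width="10" height="10">
  <line x1="0" y1="0" x2="4" y2="0"/>
  <line x1="4" y1="0" x2="0" y2="3"/>
  <line x1="0" y1="3" x2="0" y2="0"/>
  <line x1="8" y1="8" x2="9" y2="8"/>
</svg>"""

if __name__ == "__main__":
    nodes, structural = svg_lines_to_graph(SVG)
    print(nodes)
    print(structural)
    print(spatial_edges(nodes, radius=3.5))
```

In this toy input, the triangle's three segments share endpoints and form a closed cycle of structural edges. The spatial edges add proximity links that the structure alone does not encode, which is roughly the role the abstract assigns to the second stream.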
AI-generated Content for Various Data Modalities: A Survey
AI-generated content (AIGC) methods aim to produce text, images, videos, 3D
assets, and other media using AI algorithms. Due to its wide range of
applications and the demonstrated potential of recent works, AIGC developments
have attracted considerable attention, and AIGC methods have been
developed for various data modalities, such as image, video, text, 3D shape (as
voxels, point clouds, meshes, and neural implicit fields), 3D scene, 3D human
avatar (body and head), 3D motion, and audio -- each presenting different
characteristics and challenges. Furthermore, there have also been many
significant developments in cross-modality AIGC methods, where generative
methods can receive conditioning input in one modality and produce outputs in
another. Examples include going from various modalities to image, video, 3D
shape, 3D scene, 3D avatar (body and head), 3D motion (skeleton and avatar),
and audio modalities. In this paper, we provide a comprehensive review of AIGC
methods across different data modalities, including both single-modality and
cross-modality methods, highlighting the various challenges, representative
works, and recent technical directions in each setting. We also survey the
representative datasets throughout the modalities, and present comparative
results for various modalities. Finally, we discuss the challenges and
potential future research directions.