183 research outputs found
A Unified Framework to Super-Resolve Face Images of Varied Low Resolutions
Existing face image super-resolution (FSR) algorithms usually train a
specific model for each low input resolution for optimal results. By
contrast, we explore in this work a unified framework that is trained once and
then used to super-resolve input face images of varied low resolutions. For
that purpose, we propose a novel neural network architecture that is composed
of three anchor auto-encoders, one feature weight regressor and a final image
decoder. The three anchor auto-encoders are meant for optimal FSR for three
pre-defined low input resolutions, or named anchor resolutions, respectively.
An input face image of an arbitrary low resolution is firstly up-scaled to the
target resolution by bi-cubic interpolation and then fed to the three
auto-encoders in parallel. The three encoded anchor features are then fused
with weights determined by the feature weight regressor. At last, the fused
feature is sent to the final image decoder to derive the super-resolution
result. As shown by experiments, the proposed algorithm achieves robust and
state-of-the-art performance over a wide range of low input resolutions by a
single framework. Code and models will be made available after the publication
of this work
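For illustration, a minimal PyTorch sketch of the pipeline described above; the layer sizes, module names and softmax-normalized fusion weights are assumptions made for the sketch, not the authors' released code:

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class AnchorAutoEncoder(nn.Module):
        """One anchor auto-encoder, trained for one anchor resolution."""
        def __init__(self, ch=64):
            super().__init__()
            self.encode = nn.Sequential(
                nn.Conv2d(3, ch, 3, padding=1), nn.ReLU(),
                nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU())
        def forward(self, x):
            return self.encode(x)

    class UnifiedFSR(nn.Module):
        def __init__(self, ch=64, target=128):
            super().__init__()
            self.target = target
            self.anchors = nn.ModuleList(AnchorAutoEncoder(ch) for _ in range(3))
            # Predicts one fusion weight per anchor from the up-scaled input.
            self.regress = nn.Sequential(
                nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(3, 3))
            self.decode = nn.Sequential(
                nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU(),
                nn.Conv2d(ch, 3, 3, padding=1))
        def forward(self, lr):
            # Bi-cubic up-scaling of an arbitrary low-resolution input.
            x = F.interpolate(lr, size=(self.target, self.target),
                              mode='bicubic', align_corners=False)
            feats = torch.stack([a(x) for a in self.anchors], 1)  # (B,3,C,H,W)
            w = torch.softmax(self.regress(x), dim=1)             # (B,3)
            fused = (feats * w[:, :, None, None, None]).sum(1)
            return self.decode(fused)

    out = UnifiedFSR()(torch.randn(1, 3, 16, 16))  # e.g. a 16x16 face crop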
DreamLLM: Synergistic Multimodal Comprehension and Creation
This paper presents DreamLLM, the first learning framework that achieves
versatile Multimodal Large Language Models (MLLMs) empowered with the
frequently overlooked synergy between multimodal comprehension and creation.
DreamLLM
operates on two fundamental principles. The first focuses on the generative
modeling of both language and image posteriors by direct sampling in the raw
multimodal space. This approach circumvents the limitations and information
loss inherent to external feature extractors like CLIP, yielding a more
thorough multimodal understanding. Second, DreamLLM fosters the generation
of raw, interleaved documents, modeling both text and image contents, along
with unstructured layouts. This allows DreamLLM to learn all conditional,
marginal, and joint multimodal distributions effectively. As a result, DreamLLM
is the first MLLM capable of generating free-form interleaved content.
Comprehensive experiments highlight DreamLLM's superior performance as a
zero-shot multimodal generalist, benefiting from the enhanced learning synergy.
Comment: see project page at https://dreamllm.github.io
Large Language Models and Knowledge Graphs: Opportunities and Challenges
Large Language Models (LLMs) have taken Knowledge Representation -- and the
world -- by storm. This inflection point marks a shift from explicit knowledge
representation to a renewed focus on the hybrid representation of both explicit
knowledge and parametric knowledge. In this position paper, we will discuss
some of the common debate points within the community on LLMs (parametric
knowledge) and Knowledge Graphs (explicit knowledge) and speculate on
opportunities and visions that the renewed focus brings, as well as related
research topics and challenges.
Comment: 30 pages
Large Scale Surface Reconstruction based on Point Visibility
Department of Cybernetics
Federated Domain Generalization: A Survey
Machine learning typically relies on the assumption that training and testing
distributions are identical and that data is centrally stored for training and
testing. However, in real-world scenarios, distributions may differ
significantly and data is often distributed across different devices,
organizations, or edge nodes. Consequently, it is imperative to develop models
that can effectively generalize to unseen distributions where data is
distributed across different domains. In response to this challenge, there has
been a surge of interest in federated domain generalization (FDG) in recent
years. FDG combines the strengths of federated learning (FL) and domain
generalization (DG) techniques to enable multiple source domains to
collaboratively learn a model capable of directly generalizing to unseen
domains while preserving data privacy. However, generalizing the federated
model under domain shifts is a technically challenging problem that has
received scant attention in the research area so far. This paper presents the
first survey of recent advances in this area. Initially, we discuss the
development process from traditional machine learning to domain adaptation and
domain generalization, leading to FDG, and provide the corresponding formal
definition. Then, we categorize recent methodologies into four classes:
federated domain alignment, data manipulation, learning strategies, and
aggregation optimization, and present suitable algorithms in detail for each
category. Next, we introduce commonly used datasets, applications, evaluations,
and benchmarks. Finally, we conclude this survey by providing some potential
research topics for the future.
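For illustration, a minimal sketch of the federated half of FDG (FedAvg-style weight aggregation across source-domain clients); the surveyed DG components such as domain alignment would plug into each client's local update, and all names here are illustrative assumptions:

    import copy
    import torch

    def federated_average(client_states, client_sizes):
        """Aggregate per-domain client weights, weighted by dataset size."""
        total = float(sum(client_sizes))
        avg = copy.deepcopy(client_states[0])
        for key in avg:
            avg[key] = sum(s[key].float() * (n / total)
                           for s, n in zip(client_states, client_sizes))
        return avg

    # Each source domain trains locally on private data; only weights are
    # shared, so the server never sees raw data and privacy is preserved.
    # global_state = federated_average([m.state_dict() for m in clients], sizes)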
A Comprehensive Survey on Vector Database: Storage and Retrieval Technique, Challenge
A vector database is used to store high-dimensional data that cannot be
characterized by traditional DBMS. Although there are not many articles
describing existing or introducing new vector database architectures, the
approximate nearest neighbor search problem behind vector databases has been
studied for a long time, and a considerable number of related algorithmic
articles can be
found in the literature. This article attempts to comprehensively review
relevant algorithms to provide a general understanding of this booming research
area. Our framework categorizes these studies by their approach to solving
the approximate nearest neighbor search (ANNS) problem: hash-based,
tree-based, graph-based, and quantization-based. Then we present an overview
of existing
challenges for vector databases. Lastly, we sketch how vector databases can be
combined with large language models and open up new possibilities.
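For illustration, a toy sketch of one of the four ANNS families named above, quantization-based search via product quantization; the subspace and centroid counts are illustrative assumptions:

    import numpy as np
    from scipy.cluster.vq import kmeans2

    def pq_train(vectors, n_subspaces=4, n_centroids=32):
        """Split vectors into subvectors; learn one codebook per subspace."""
        subs = np.split(vectors, n_subspaces, axis=1)
        return [kmeans2(s, n_centroids, minit='points')[0] for s in subs]

    def pq_encode(vectors, codebooks):
        """Replace each subvector with the index of its nearest centroid."""
        subs = np.split(vectors, len(codebooks), axis=1)
        codes = [np.argmin(((s[:, None] - cb[None]) ** 2).sum(-1), axis=1)
                 for s, cb in zip(subs, codebooks)]
        return np.stack(codes, axis=1)  # (n_vectors, n_subspaces) codes

    rng = np.random.default_rng(0)
    base = rng.normal(size=(1000, 64)).astype(np.float32)
    books = pq_train(base)
    codes = pq_encode(base, books)  # 64 floats compressed to 4 small integers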
Development of Some Spatial-domain Preprocessing and Post-processing Algorithms for Better 2-D Up-scaling
Image super-resolution has been an area of great interest in recent years and is extensively used in applications like video streaming, multimedia, internet technologies, consumer electronics, and the display and printing industries. Image super-resolution is the process of increasing the resolution of a given image without losing its integrity. Its most common application is to provide a better visual effect after resizing a digital image for display or printing. One method of improving image resolution is the use of 2-D interpolation.
An up-scaled image should retain all the image details with minimal blurring for better visual quality. In the literature, many efficient 2-D interpolation schemes are found that preserve the image details well in the up-scaled images, particularly in regions with edges and fine details. Nevertheless, these existing interpolation schemes also introduce a blurring effect in the up-scaled images due to high-frequency (HF) degradation during the up-sampling process. Hence, there is scope to further improve their performance through the incorporation of various spatial-domain pre-processing, post-processing and composite algorithms.
Therefore, it is felt that there is sufficient scope to develop various efficient but simple pre-processing, post-processing and composite schemes to effectively restore the HF contents in the up-scaled images for various online and off-line applications. The efficient and widely used Lanczos-3 interpolation is taken as the baseline for further performance improvement through the incorporation of the various proposed algorithms.
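For reference, Lanczos up-scaling of this kind is available off the shelf; a minimal sketch using Pillow, whose LANCZOS filter is a 3-lobe (Lanczos-3) kernel, with a hypothetical file name:

    from PIL import Image

    lr = Image.open('face_lr.png')  # hypothetical low-resolution input
    hr = lr.resize((lr.width * 2, lr.height * 2), Image.LANCZOS)
    hr.save('face_up2x.png')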
The various pre-processing algorithms developed in this thesis are summarized here. The term pre-processing refers to processing the low-resolution input image prior to image up-scaling. The proposed pre-processing algorithms are: the Laplacian of Laplacian based global pre-processing (LLGP) scheme; hybrid global pre-processing (HGP); iterative Laplacian of Laplacian based global pre-processing (ILLGP); unsharp masking based pre-processing (UMP); iterative unsharp masking (IUM); and the error based up-sampling (EU) scheme.
The proposed LLGP, HGP and ILLGP algorithms are three spatial-domain pre-processing algorithms based on 4th, 6th and 8th order derivatives to alleviate non-uniform blurring in up-scaled images. These algorithms obtain the high-frequency (HF) extracts from an image by employing higher-order derivatives and perform precise sharpening on a low-resolution image to alleviate the blurring in its 2-D up-sampled counterpart.
In the unsharp masking based pre-processing (UMP) scheme, a blurred version of the low-resolution image is used to extract the HF content from the original version through image subtraction. The weighted HF extract is superimposed on the original image to produce a sharpened image prior to image up-scaling, countering blurring effectively.
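A minimal sketch of this unsharp-masking idea, assuming a grayscale 2-D array, a Gaussian blur, an illustrative weight, and cubic up-scaling standing in for the thesis's Lanczos-3 baseline:

    import numpy as np
    from scipy.ndimage import gaussian_filter, zoom

    def unsharp_preprocess(lr, weight=1.5, sigma=1.0, scale=2):
        """Sharpen the LR image before up-scaling to counter blurring."""
        lr = lr.astype(np.float64)
        hf = lr - gaussian_filter(lr, sigma)       # HF extract by subtraction
        sharpened = np.clip(lr + weight * hf, 0, 255)
        return zoom(sharpened, scale, order=3)     # up-scale the sharpened LR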
IUM uses many iterations to generate an unsharp mask that contains very high frequency (VHF) components. The VHF extract is the result of signal decomposition into sub-bands using the concept of an analysis filter bank. Since the degradation of VHF components is greatest, restoring these components produces much better restoration performance.
EU is another pre-processing scheme in which the HF degradation due to image up-scaling is extracted and is called the prediction error. The prediction error contains the lost high-frequency components. When this error is superimposed on the low-resolution image prior to image up-sampling, blurring is considerably reduced in the up-scaled images.
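A minimal sketch of this prediction-error idea, using a down-/up-scaling round trip to estimate the HF loss; the cubic interpolant again stands in for Lanczos-3, and dimensions are assumed divisible by the scale factor:

    import numpy as np
    from scipy.ndimage import zoom

    def error_upsample(lr, scale=2):
        """EU-style compensation: inject predicted HF loss before up-scaling."""
        lr = lr.astype(np.float64)
        # The round trip approximates the HF degradation that the final
        # up-scaling will cause.
        round_trip = zoom(zoom(lr, 1.0 / scale, order=3), scale, order=3)
        error = lr - round_trip            # prediction error (lost HF content)
        return zoom(lr + error, scale, order=3)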
The various post-processing algorithms developed in this thesis are summarized in the following. The term post-processing refers to processing the high-resolution up-scaled image. The proposed post-processing algorithms are: the local adaptive Laplacian (LAL); the fuzzy weighted Laplacian (FWL); and the Legendre functional link artificial neural network (LFLANN).
LAL is a non-fuzzy, local scheme. The local regions of an up-scaled image with high variance are sharpened more than regions with moderate or low variance by employing a local adaptive Laplacian kernel. The weights of the LAL kernel are varied according to the normalized local variance so as to provide a greater degree of HF enhancement to high-variance regions than to low-variance ones, effectively countering the non-uniform blurring. Furthermore, the FWL post-processing scheme, with a higher degree of non-linearity, is proposed to further improve on the performance of LAL. FWL, being a fuzzy mapping scheme, is highly nonlinear and resolves the blurring problem more effectively than LAL, which employs a linear mapping.
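A minimal sketch of the LAL idea, with the Laplacian sharpening strength modulated by the normalized local variance; the window size and gain are illustrative assumptions:

    import numpy as np
    from scipy.ndimage import laplace, uniform_filter

    def local_adaptive_laplacian(up, win=7, gain=1.0):
        """Sharpen high-variance regions of an up-scaled image more strongly."""
        up = up.astype(np.float64)
        mean = uniform_filter(up, win)
        var = uniform_filter(up ** 2, win) - mean ** 2   # local variance map
        w = var / (var.max() + 1e-12)                    # normalize to [0, 1]
        return np.clip(up - gain * w * laplace(up), 0, 255)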
Another post-processing scheme, based on LFLANN, is proposed here to minimize the cost function and thereby reduce the blurring in a 2-D up-scaled image. Legendre polynomials are used for the functional expansion of the input pattern vector and provide a high degree of nonlinearity. Therefore, the requirement of multiple layers can be replaced by a single-layer LFLANN architecture so as to reduce the cost function effectively for better restoration performance. With its single-layer architecture, it has reduced computational complexity and hence is suitable for various real-time applications.
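A minimal sketch of the functional-link idea behind LFLANN: expand each input pattern with Legendre polynomials, then fit only a single linear layer; the expansion order and least-squares solver are illustrative assumptions:

    import numpy as np
    from numpy.polynomial import legendre

    def legendre_expand(x, order=4):
        """Expand features in [-1, 1] with Legendre polynomials P_1..P_order."""
        basis = [legendre.legval(x, [0] * n + [1]) for n in range(1, order + 1)]
        return np.concatenate(basis, axis=1)

    # The functional expansion supplies the nonlinearity, so a single linear
    # layer (here fit by least squares) replaces a multi-layer network.
    rng = np.random.default_rng(0)
    X = legendre_expand(rng.uniform(-1, 1, (100, 8)))
    y = rng.standard_normal(100)
    weights, *_ = np.linalg.lstsq(X, y, rcond=None)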
There is scope for further improvement of the stand-alone pre-processing and post-processing schemes by combining them into composite schemes. Here, two spatial-domain composite schemes, CS-I and CS-II, are proposed to tackle non-uniform blurring in an up-scaled image. CS-I is developed by combining the global iterative Laplacian (GIL) pre-processing scheme with the LAL post-processing scheme. Another highly nonlinear composite scheme, CS-II, is proposed, which combines the ILLGP scheme with a fuzzy weighted Laplacian post-processing scheme for better performance than the stand-alone schemes.
Finally, it is observed that the proposed ILLGP, IUM, FWL, LFLANN and CS-II algorithms perform best in their respective categories for effectively reducing blurring in the up-scaled images.
DORIS-MAE: Scientific Document Retrieval using Multi-level Aspect-based Queries
In scientific research, the ability to effectively retrieve relevant
documents based on complex, multifaceted queries is critical. Existing
evaluation datasets for this task are limited, primarily due to the high cost
and effort required to annotate resources that effectively represent complex
queries. To address this, we propose a novel task, Scientific DOcument
Retrieval using Multi-level Aspect-based quEries (DORIS-MAE), which is designed
to handle the complex nature of user queries in scientific research. We
developed a benchmark dataset within the field of computer science, consisting
of 100 human-authored complex query cases. For each complex query, we assembled
a collection of 100 relevant documents and produced annotated relevance scores
for ranking them. Recognizing the significant labor of expert annotation, we
also introduce Anno-GPT, a scalable framework for validating the performance of
Large Language Models (LLMs) on expert-level dataset annotation tasks. LLM
annotation of the DORIS-MAE dataset resulted in a 500x reduction in cost,
without compromising quality. Furthermore, due to the multi-tiered structure of
these complex queries, the DORIS-MAE dataset can be extended to over 4,000
sub-query test cases without requiring additional annotation. We evaluated 17
recent retrieval methods on DORIS-MAE, observing notable performance drops
compared to traditional datasets. This highlights the need for better
approaches to handle complex, multifaceted queries in scientific research. Our
dataset and codebase are available at
https://github.com/Real-Doris-Mae/Doris-Mae-Dataset.
Comment: To appear in NeurIPS 2023 Datasets and Benchmarks Track
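For illustration, a minimal NDCG@k sketch of the kind of graded-relevance ranking evaluation such annotated scores support; the paper's exact metric suite is not specified in this abstract, so the choice of metric is an assumption:

    import numpy as np

    def ndcg_at_k(relevance_in_ranked_order, k=10):
        """NDCG@k over graded relevance scores listed in system rank order."""
        rel = np.asarray(relevance_in_ranked_order, dtype=float)
        discounts = 1.0 / np.log2(np.arange(2, k + 2))
        dcg = (rel[:k] * discounts[:rel[:k].size]).sum()
        ideal = np.sort(rel)[::-1][:k]                 # best possible ordering
        idcg = (ideal * discounts[:ideal.size]).sum()
        return dcg / idcg if idcg > 0 else 0.0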