19 research outputs found
DEFormer: DCT-driven Enhancement Transformer for Low-light Image and Dark Vision
The goal of low-light image enhancement is to restore the color and details
of the image and is of great significance for high-level visual tasks in
autonomous driving. However, it is difficult to restore the lost details in the
dark area by relying only on the RGB domain. In this paper we introduce
frequency as a new clue into the network and propose a novel DCT-driven
enhancement transformer (DEFormer). First, we propose a learnable frequency
branch (LFB) for frequency enhancement, which contains DCT processing and
curvature-based frequency enhancement (CFE). CFE calculates the curvature of
each channel to represent the detail richness of different frequency bands,
and then divides the frequency features so that the network focuses on
frequency bands with richer textures. In addition, we propose a cross domain
fusion (CDF) for
reducing the differences between the RGB domain and the frequency domain. We
also adopt DEFormer as a preprocessing step in dark detection, where it
effectively improves the performance of the detector, bringing 2.1% and 3.4%
mAP improvements on the ExDark and DARK FACE datasets, respectively.
Comment: submit to ICRA202
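As a rough illustration of the frequency representation this abstract refers to, the following sketch computes an orthonormal 2D DCT-II of a small image block in plain Python. It is not the DEFormer implementation: the function names `dct_1d` and `dct_2d` and the toy 4x4 block are assumptions made for illustration, and the learnable frequency branch, CFE, and CDF are not reproduced.

```python
import math

def dct_1d(x):
    """Orthonormal DCT-II of a 1D sequence (hypothetical helper)."""
    n = len(x)
    out = []
    for k in range(n):
        s = sum(x[i] * math.cos(math.pi * (i + 0.5) * k / n) for i in range(n))
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        out.append(scale * s)
    return out

def dct_2d(block):
    """Separable 2D DCT-II: transform every row, then every column."""
    rows = [dct_1d(r) for r in block]
    cols = [dct_1d(list(c)) for c in zip(*rows)]
    return [list(r) for r in zip(*cols)]  # transpose back to row-major

# A flat (constant) 4x4 block has no texture, so all of its energy
# should land in the DC coefficient at position [0][0].
flat = [[10.0] * 4 for _ in range(4)]
coeffs = dct_2d(flat)

# A block with a vertical edge spreads energy into higher horizontal bands.
edge = [[0.0, 0.0, 10.0, 10.0] for _ in range(4)]
edge_coeffs = dct_2d(edge)
```

Because the transform is orthonormal, the sum of squared coefficients equals the sum of squared pixel values, which makes it easy to compare how much energy different frequency bands carry.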
DreamCraft3D: Hierarchical 3D Generation with Bootstrapped Diffusion Prior
We present DreamCraft3D, a hierarchical 3D content generation method that
produces high-fidelity and coherent 3D objects. We tackle the problem by
leveraging a 2D reference image to guide the stages of geometry sculpting and
texture boosting. A central focus of this work is to address the consistency
issue that existing works encounter. To sculpt geometries that render
coherently, we perform score distillation sampling via a view-dependent
diffusion model. This 3D prior, alongside several training strategies,
prioritizes the geometry consistency but compromises the texture fidelity. We
further propose Bootstrapped Score Distillation to specifically boost the
texture. We train a personalized diffusion model, Dreambooth, on the augmented
renderings of the scene, imbuing it with 3D knowledge of the scene being
optimized. The score distillation from this 3D-aware diffusion prior provides
view-consistent guidance for the scene. Notably, through an alternating
optimization of the diffusion prior and 3D scene representation, we achieve
mutually reinforcing improvements: the optimized 3D scene aids in training the
scene-specific diffusion model, which offers increasingly view-consistent
guidance for 3D optimization. The optimization is thus bootstrapped and leads
to substantial texture boosting. With tailored 3D priors throughout the
hierarchical generation, DreamCraft3D generates coherent 3D objects with
photorealistic renderings, advancing the state-of-the-art in 3D content
generation. Code available at https://github.com/deepseek-ai/DreamCraft3D.
Comment: Project Page: https://mrtornado24.github.io/DreamCraft3D
DeepSeek-VL: Towards Real-World Vision-Language Understanding
We present DeepSeek-VL, an open-source Vision-Language (VL) Model designed
for real-world vision and language understanding applications. Our approach is
structured around three key dimensions:
We strive to ensure our data is diverse, scalable, and extensively covers
real-world scenarios including web screenshots, PDFs, OCR, charts, and
knowledge-based content, aiming for a comprehensive representation of practical
contexts. Further, we create a use case taxonomy from real user scenarios and
construct an instruction tuning dataset accordingly. The fine-tuning with this
dataset substantially improves the model's user experience in practical
applications. Considering efficiency and the demands of most real-world
scenarios, DeepSeek-VL incorporates a hybrid vision encoder that efficiently
processes high-resolution images (1024 x 1024), while maintaining a relatively
low computational overhead. This design choice ensures the model's ability to
capture critical semantic and detailed information across various visual tasks.
We posit that a proficient Vision-Language Model should, foremost, possess
strong language abilities. To ensure the preservation of LLM capabilities
during pretraining, we investigate an effective VL pretraining strategy by
integrating LLM training from the beginning and carefully managing the
competitive dynamics observed between vision and language modalities.
The DeepSeek-VL family (both 1.3B and 7B models) showcases superior user
experiences as a vision-language chatbot in real-world applications, achieving
state-of-the-art or competitive performance across a wide range of
visual-language benchmarks at the same model size while maintaining robust
performance on language-centric benchmarks. We have made both 1.3B and 7B
models publicly accessible to foster innovations based on this foundation
model.
Comment: https://github.com/deepseek-ai/DeepSeek-V
DeepSeek LLM: Scaling Open-Source Language Models with Longtermism
The rapid development of open-source large language models (LLMs) has been
truly remarkable. However, the scaling law described in previous literature
presents varying conclusions, which casts a dark cloud over scaling LLMs. We
delve into the study of scaling laws and present our distinctive findings that
facilitate scaling of large scale models in two commonly used open-source
configurations, 7B and 67B. Guided by the scaling laws, we introduce DeepSeek
LLM, a project dedicated to advancing open-source language models with a
long-term perspective. To support the pre-training phase, we have developed a
dataset that currently consists of 2 trillion tokens and is continuously
expanding. We further conduct supervised fine-tuning (SFT) and Direct
Preference Optimization (DPO) on DeepSeek LLM Base models, resulting in the
creation of DeepSeek Chat models. Our evaluation results demonstrate that
DeepSeek LLM 67B surpasses LLaMA-2 70B on various benchmarks, particularly in
the domains of code, mathematics, and reasoning. Furthermore, open-ended
evaluations reveal that DeepSeek LLM 67B Chat exhibits superior performance
compared to GPT-3.5
Redundancy Analysis of Capacitance Data of a Coplanar Electrode Array for Fast and Stable Imaging Processing
A coplanar electrode array sensor is established for imaging defects in the adhesive layer of composite materials. The sensor is based on the capacitive edge effect, which makes the capacitance data considerably weak and susceptible to environmental noise. The inverse problem of coplanar array electrical capacitance tomography (C-ECT) is ill-conditioned, so a small error in the capacitance data can seriously affect the quality of the reconstructed images. In order to achieve a stable image reconstruction process, a redundancy analysis method for capacitance data is proposed. The proposed method is based on contribution rate and anti-interference capability. According to the redundancy analysis, the capacitance data are divided into valid and invalid data. When the image is reconstructed from valid data only, the sensitivity matrix needs to be changed accordingly. In order to evaluate the effectiveness of the sensitivity map, singular value decomposition (SVD) is used. Finally, the two-dimensional (2D) and three-dimensional (3D) images are reconstructed by the Tikhonov regularization method. Comparison with the images reconstructed from the raw capacitance data shows that the stability of the image reconstruction process can be improved while the quality of the reconstructed images is not degraded. As a result, much of the invalid data need not be collected, and the data acquisition time can also be reduced.
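The Tikhonov step mentioned in this abstract can be sketched as a regularized least-squares solve, g = (SᵀS + λI)⁻¹Sᵀc, where S is the sensitivity matrix and c the capacitance vector. The following is a minimal sketch under stated assumptions, not the authors' reconstruction code: the 3x2 matrix `S`, the noise values, the parameter `lam`, and the helper names are all hypothetical toy choices, and no redundancy analysis or SVD evaluation is performed.

```python
def transpose(a):
    return [list(col) for col in zip(*a)]

def matmul(a, b):
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x

def tikhonov_solve(s, c, lam):
    """Return g minimizing ||S g - c||^2 + lam * ||g||^2."""
    st = transpose(s)
    sts = matmul(st, s)
    for i in range(len(sts)):
        sts[i][i] += lam  # add lam * I to the normal matrix
    stc = [sum(x * y for x, y in zip(row, c)) for row in st]
    return solve(sts, stc)

# Toy sensitivity matrix with nearly collinear columns (ill-conditioned on purpose).
S = [[1.0, 1.0], [1.0, 1.0001], [1.0, 0.9999]]
g_true = [1.0, 2.0]
c_clean = [sum(sij * gj for sij, gj in zip(row, g_true)) for row in S]
c_noisy = [v + e for v, e in zip(c_clean, [1e-3, -1e-3, 1e-3])]

g_plain = tikhonov_solve(S, c_noisy, 0.0)   # plain least squares: noise is amplified
g_reg = tikhonov_solve(S, c_noisy, 1e-2)    # regularized: biased but stable

def error(g):
    return sum((a - b) ** 2 for a, b in zip(g, g_true)) ** 0.5
```

In this toy case the unregularized solution is thrown far from `g_true` by noise of order 1e-3, while the regularized one stays bounded. The regularized estimate is still biased, which is the usual trade-off when choosing `lam`.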
Cellular alterations and crosstalk in the osteochondral joint in osteoarthritis and promising therapeutic strategies
Osteoarthritis (OA) is a joint disorder involving cartilage degeneration and subchondral bone sclerosis. The bone-cartilage interface is implicated in OA pathogenesis due to its susceptibility to mechanical and biological factors. The crosstalk between cartilage and the underlying subchondral bone is elevated in OA due to multiple factors, such as increased vascularization, porosity, microcracks and fissures. Changes in the osteochondral joint are traceable to alterations in chondrocytes and bone cells (osteoblasts, osteocytes and osteoclasts). The phenotypes of these cells can change with the progression of OA. Aberrant intercellular communication among bone cells, and between bone cells and chondrocytes, is of great importance and might be a factor in OA development. An appreciation of cellular phenotypic changes in OA and the mechanisms by which these cells communicate would be expected to lead to the development of targeted drugs with fewer side effects.
Microscale Corrosion Inhibition Behavior of Four Corrosion Inhibitors (BTA, MBI, MBT, and MBO) on Archeological Silver Artifacts Based on Scanning Electrochemical Cell Microscopy
The problem of corrosion-induced discoloration and embrittlement
in silverware is a significant concern for the long-term preservation
of excavated archeological silver artifacts, even after thermal restoration.
The key to addressing this issue lies in the meticulous selection
and evaluation of corrosion inhibitors that possess targeted corrosion
inhibition capabilities. This study focuses on the evaluation of corrosion
inhibitors for archeological silver artifacts using scanning electrochemical
cell microscopy (SECCM) and X-ray photoelectron spectroscopy (XPS).
The researchers aimed to compare the inhibition effects of four corrosion
inhibitors [1,2,3-benzotriazole (BTA), 2-mercaptobenzimidazole (MBI),
2-mercaptobenzothiazole (MBT), and 2-mercaptobenzoxazole (MBO)] on
a simulated Ag–Cu alloy sample and understand their mechanisms.
The results showed that MBT exhibited better corrosion inhibition
for microstructural regions with higher silver content due to its
ability to form stable chelation structures with Ag(I). MBO exhibited
better corrosion inhibition for microstructural regions with higher
copper content due to its strong affinity with Cu(I). The targeted
corrosion inhibition ability for the β-phase was ranked as MBO
> BTA ≈ MBI > MBT, while for the α-phase the ranking
was MBT > MBO > MBI > BTA. The study demonstrated the feasibility
and capabilities of SECCM in the targeted screening of corrosion inhibitors
for different compositions and microstructural regions in archeological
metal artifacts. This study highlights the potential of SECCM in corrosion
inhibitor research for archeological metal artifacts and wider applications
in metal material corrosion protection
Metallurgically lithiated SiOx anode with high capacity and ambient air compatibility
A common issue plaguing battery anodes is the large consumption of lithium in the initial cycle as a result of the formation of a solid electrolyte interphase, followed by gradual loss in subsequent cycles. This creates a need for prelithiation to compensate for the loss. However, anode prelithiation faces the challenge of high chemical reactivity because of the low anode potential. Previous efforts have produced prelithiated Si nanoparticles with dry air stability, which cannot be stabilized under ambient air. Here, we developed a one-pot metallurgical process to synthesize LixSi/Li2O composites by using low-cost SiO or SiO2 as the starting material. The resulting composites consist of homogeneously dispersed LixSi nanodomains embedded in a highly crystalline Li2O matrix, giving the composite excellent stability even in ambient air with 40% relative humidity. The composites are readily mixed with various anode materials to achieve a high first cycle Coulombic efficiency (CE) of >100%, or serve as an excellent anode material by themselves with stable cyclability and consistently high CEs (99.81% at the seventh cycle and ~99.87% for subsequent cycles). Therefore, LixSi/Li2O composites achieved balanced reactivity and stability, promising a significant boost to lithium ion batteries.
Arbitrary coherent distributions in a programmable quantum walk
The coherent superposition of position states in a quantum walk (QW) can be precisely engineered towards desired distributions to meet the needs of quantum information applications. A coherent distribution can make full use of quantum parallelism in computation and simulation. In particular, the uniform superposition provides robust nonlocality, which has wide applications such as the generation of genuine multibit random numbers without postprocessing. We experimentally demonstrate that rich dynamics featuring arbitrary coherent distributions can be obtained by introducing different sets of time- and position-dependent operations. Such a QW is realized by a resource-constant and flexible optical circuit, in which the variable operation is executed based on a Sagnac interferometer in an intrinsically stable and precisely controlled way. Our results contribute to the practical realization of quantum-walk-based quantum computation, quantum simulations, and quantum information protocols.
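To make the idea of time- and position-dependent operations concrete, the following is a minimal numerical sketch of a discrete-time quantum walk on a line in which the coin may depend on the step `t` and the position `x`. It is a generic textbook-style simulation, not a model of the optical circuit or Sagnac interferometer in the experiment: the function names, the Hadamard default coin, and the symmetric initial state are assumptions chosen for illustration.

```python
import math

def hadamard(t, x):
    """Coin applied at step t and position x; here the same Hadamard everywhere."""
    h = 1.0 / math.sqrt(2.0)
    return ((h, h), (h, -h))

def quantum_walk(steps, coin=hadamard):
    """Simulate a discrete-time quantum walk and return position probabilities.

    The state maps position -> (left-moving amplitude, right-moving amplitude).
    Each step applies the (possibly t- and x-dependent) coin, then shifts.
    """
    # Symmetric initial coin state (|L> + i|R>) / sqrt(2) at the origin.
    state = {0: (1.0 / math.sqrt(2.0), 1j / math.sqrt(2.0))}
    for t in range(steps):
        new_state = {}
        for x, (a, b) in state.items():
            c = coin(t, x)
            left = c[0][0] * a + c[0][1] * b    # component that moves to x - 1
            right = c[1][0] * a + c[1][1] * b   # component that moves to x + 1
            l0, r0 = new_state.get(x - 1, (0.0, 0.0))
            new_state[x - 1] = (l0 + left, r0)
            l1, r1 = new_state.get(x + 1, (0.0, 0.0))
            new_state[x + 1] = (l1, r1 + right)
        state = new_state
    return {x: abs(a) ** 2 + abs(b) ** 2 for x, (a, b) in sorted(state.items())}

# Position distribution after 10 steps with a uniform Hadamard coin.
probs = quantum_walk(10)

# Identity coin (hypothetical alternative): the two components never mix.
ballistic = quantum_walk(3, coin=lambda t, x: ((1.0, 0.0), (0.0, 1.0)))
```

Passing a different `coin` function, for example one that returns a different unitary for each `(t, x)`, is the simulation analogue of engineering the distribution step by step. Any such coin must be a 2x2 unitary for the total probability to remain 1.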