19 research outputs found

    DEFormer: DCT-driven Enhancement Transformer for Low-light Image and Dark Vision

    The goal of low-light image enhancement is to restore the color and details of an image, which is of great significance for high-level visual tasks in autonomous driving. However, it is difficult to restore the lost details in dark areas by relying only on the RGB domain. In this paper we introduce frequency as a new clue into the network and propose a novel DCT-driven enhancement transformer (DEFormer). First, we propose a learnable frequency branch (LFB) for frequency enhancement, which contains DCT processing and curvature-based frequency enhancement (CFE). CFE calculates the curvature of each channel to represent the detail richness of different frequency bands, and then divides the frequency features so that the network focuses on frequency bands with richer textures. In addition, we propose a cross domain fusion (CDF) to reduce the differences between the RGB domain and the frequency domain. We also adopt DEFormer as a preprocessing step for dark detection; it effectively improves the performance of the detector, bringing 2.1% and 3.4% mAP improvements on the ExDark and DARK FACE datasets, respectively. Comment: submit to ICRA202
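
The abstract above relies on moving image content into the DCT domain. As a rough illustration only, the following is a minimal sketch of an orthonormal 2D DCT-II in plain Python. The function names `dct_1d` and `dct_2d` are hypothetical, and nothing here reproduces the paper's learnable frequency branch or its curvature-based enhancement; it only shows how a pixel block becomes a grid of frequency coefficients.

```python
import math


def dct_1d(x):
    """Orthonormal DCT-II of a sequence of floats (illustrative helper)."""
    n = len(x)
    out = []
    for k in range(n):
        # The k = 0 (DC) term uses a different scale so the transform is orthonormal.
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        total = sum(x[i] * math.cos(math.pi * (i + 0.5) * k / n) for i in range(n))
        out.append(scale * total)
    return out


def dct_2d(block):
    """Separable 2D DCT: transform every row, then every column."""
    rows = [dct_1d(r) for r in block]
    cols = [dct_1d(list(c)) for c in zip(*rows)]
    # Transpose back so the result is indexed [vertical_freq][horizontal_freq].
    return [list(r) for r in zip(*cols)]


def energy(values):
    """Sum of squares over a 2D grid."""
    return sum(v * v for row in values for v in row)


# A flat block carries all of its energy in the DC coefficient.
flat = [[5.0] * 4 for _ in range(4)]
flat_coeffs = dct_2d(flat)

# A checkerboard block spreads its energy into high-frequency coefficients.
checker = [[float((i + j) % 2) for j in range(4)] for i in range(4)]
checker_coeffs = dct_2d(checker)
```

Because the transform is orthonormal, the total energy of a block is the same in the pixel domain and the DCT domain; only its distribution across frequency bands changes, which is the property a frequency-aware network can exploit.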

    DreamCraft3D: Hierarchical 3D Generation with Bootstrapped Diffusion Prior

    We present DreamCraft3D, a hierarchical 3D content generation method that produces high-fidelity and coherent 3D objects. We tackle the problem by leveraging a 2D reference image to guide the stages of geometry sculpting and texture boosting. A central focus of this work is to address the consistency issue that existing works encounter. To sculpt geometries that render coherently, we perform score distillation sampling via a view-dependent diffusion model. This 3D prior, alongside several training strategies, prioritizes the geometry consistency but compromises the texture fidelity. We further propose Bootstrapped Score Distillation to specifically boost the texture. We train a personalized diffusion model, Dreambooth, on the augmented renderings of the scene, imbuing it with 3D knowledge of the scene being optimized. The score distillation from this 3D-aware diffusion prior provides view-consistent guidance for the scene. Notably, through an alternating optimization of the diffusion prior and 3D scene representation, we achieve mutually reinforcing improvements: the optimized 3D scene aids in training the scene-specific diffusion model, which offers increasingly view-consistent guidance for 3D optimization. The optimization is thus bootstrapped and leads to substantial texture boosting. With tailored 3D priors throughout the hierarchical generation, DreamCraft3D generates coherent 3D objects with photorealistic renderings, advancing the state-of-the-art in 3D content generation. Code available at https://github.com/deepseek-ai/DreamCraft3D. Comment: Project Page: https://mrtornado24.github.io/DreamCraft3D

    DeepSeek-VL: Towards Real-World Vision-Language Understanding

    We present DeepSeek-VL, an open-source Vision-Language (VL) Model designed for real-world vision and language understanding applications. Our approach is structured around three key dimensions: We strive to ensure our data is diverse, scalable, and extensively covers real-world scenarios including web screenshots, PDFs, OCR, charts, and knowledge-based content, aiming for a comprehensive representation of practical contexts. Further, we create a use case taxonomy from real user scenarios and construct an instruction tuning dataset accordingly. The fine-tuning with this dataset substantially improves the model's user experience in practical applications. Considering efficiency and the demands of most real-world scenarios, DeepSeek-VL incorporates a hybrid vision encoder that efficiently processes high-resolution images (1024 x 1024), while maintaining a relatively low computational overhead. This design choice ensures the model's ability to capture critical semantic and detailed information across various visual tasks. We posit that a proficient Vision-Language Model should, foremost, possess strong language abilities. To ensure the preservation of LLM capabilities during pretraining, we investigate an effective VL pretraining strategy by integrating LLM training from the beginning and carefully managing the competitive dynamics observed between vision and language modalities. The DeepSeek-VL family (both 1.3B and 7B models) showcases superior user experiences as a vision-language chatbot in real-world applications, achieving state-of-the-art or competitive performance across a wide range of visual-language benchmarks at the same model size while maintaining robust performance on language-centric benchmarks. We have made both 1.3B and 7B models publicly accessible to foster innovations based on this foundation model. Comment: https://github.com/deepseek-ai/DeepSeek-V

    DeepSeek LLM: Scaling Open-Source Language Models with Longtermism

    The rapid development of open-source large language models (LLMs) has been truly remarkable. However, the scaling law described in previous literature presents varying conclusions, which casts a dark cloud over scaling LLMs. We delve into the study of scaling laws and present our distinctive findings that facilitate scaling of large-scale models in two commonly used open-source configurations, 7B and 67B. Guided by the scaling laws, we introduce DeepSeek LLM, a project dedicated to advancing open-source language models with a long-term perspective. To support the pre-training phase, we have developed a dataset that currently consists of 2 trillion tokens and is continuously expanding. We further conduct supervised fine-tuning (SFT) and Direct Preference Optimization (DPO) on DeepSeek LLM Base models, resulting in the creation of DeepSeek Chat models. Our evaluation results demonstrate that DeepSeek LLM 67B surpasses LLaMA-2 70B on various benchmarks, particularly in the domains of code, mathematics, and reasoning. Furthermore, open-ended evaluations reveal that DeepSeek LLM 67B Chat exhibits superior performance compared to GPT-3.5.

    Redundancy Analysis of Capacitance Data of a Coplanar Electrode Array for Fast and Stable Imaging Processing

    A coplanar electrode array sensor is established for imaging adhesive-layer defects in composite materials. The sensor is based on the capacitive edge effect, which makes the capacitance data considerably weak and susceptible to environmental noise. The inverse problem of coplanar array electrical capacitance tomography (C-ECT) is ill-conditioned: a small error in the capacitance data can seriously affect the quality of reconstructed images. In order to achieve a stable image reconstruction process, a redundancy analysis method for capacitance data is proposed. The proposed method is based on contribution rate and anti-interference capability. According to the redundancy analysis, the capacitance data are divided into valid and invalid data. When the image is reconstructed from valid data, the sensitivity matrix needs to be changed accordingly. In order to evaluate the effectiveness of the sensitivity map, singular value decomposition (SVD) is used. Finally, the two-dimensional (2D) and three-dimensional (3D) images are reconstructed by the Tikhonov regularization method. Comparison with the images reconstructed from raw capacitance data shows that the stability of the image reconstruction process is improved while the quality of the reconstructed images is not degraded. As a result, much of the invalid data need not be collected, and the data acquisition time can also be reduced.
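
The reconstruction pipeline described above (keep only valid measurements, shrink the sensitivity matrix to match, then apply Tikhonov regularization) can be sketched in a few lines. This is a minimal plain-Python sketch under stated assumptions, not the paper's implementation: the names `select_rows` and `tikhonov_solve`, the toy matrix, and the regularization weight `lam` are all hypothetical, and the system is solved through the normal equations (S^T S + lam I) g = S^T c rather than through an explicit SVD.

```python
def select_rows(S, c, valid):
    """Keep only the measurements (rows of S, entries of c) listed in `valid`."""
    return [S[i] for i in valid], [c[i] for i in valid]


def tikhonov_solve(S, c, lam):
    """Solve min ||S g - c||^2 + lam * ||g||^2 via the normal equations."""
    m, n = len(S), len(S[0])
    # A = S^T S + lam * I and b = S^T c.
    A = [[sum(S[k][i] * S[k][j] for k in range(m)) + (lam if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    b = [sum(S[k][i] * c[k] for k in range(m)) for i in range(n)]
    # Gaussian elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(col + 1, n):
            f = A[r][col] / A[col][col]
            for j in range(col, n):
                A[r][j] -= f * A[col][j]
            b[r] -= f * b[col]
    # Back substitution.
    g = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(A[i][j] * g[j] for j in range(i + 1, n))
        g[i] = (b[i] - s) / A[i][i]
    return g


# Toy data: the third measurement is treated as invalid and dropped.
S_all = [[1.0, 0.0], [0.0, 1.0], [5.0, 5.0]]
c_all = [2.0, 4.0, 99.0]
S_valid, c_valid = select_rows(S_all, c_all, valid=[0, 1])

g_plain = tikhonov_solve(S_valid, c_valid, lam=0.0)  # no regularization
g_reg = tikhonov_solve(S_valid, c_valid, lam=1.0)    # damped solution
```

With an identity sensitivity matrix the regularized solution is simply c / (1 + lam), which makes the damping effect easy to check by hand. For a genuinely ill-conditioned matrix, the same term keeps small singular values from amplifying measurement noise.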

    Cellular alterations and crosstalk in the osteochondral joint in osteoarthritis and promising therapeutic strategies

    Osteoarthritis (OA) is a joint disorder involving cartilage degeneration and subchondral bone sclerosis. The bone-cartilage interface is implicated in OA pathogenesis due to its susceptibility to mechanical and biological factors. The crosstalk between cartilage and the underlying subchondral bone is elevated in OA due to multiple factors, such as increased vascularization, porosity, microcracks and fissures. Changes in the osteochondral joint are traceable to alterations in chondrocytes and bone cells (osteoblasts, osteocytes and osteoclasts). The phenotypes of these cells can change with the progression of OA. Aberrant bone cell-bone cell and bone cell-chondrocyte communications are of great importance and might be factors in OA development. An appreciation of cellular phenotypic changes in OA and the mechanisms by which these cells communicate would be expected to lead to the development of targeted drugs with fewer side effects.

    Microscale Corrosion Inhibition Behavior of Four Corrosion Inhibitors (BTA, MBI, MBT, and MBO) on Archeological Silver Artifacts Based on Scanning Electrochemical Cell Microscopy

    The problem of corrosion-induced discoloration and embrittlement in silverware is a significant concern for the long-term preservation of excavated archeological silver artifacts, even after thermal restoration. The key to addressing this issue lies in the meticulous selection and evaluation of corrosion inhibitors that possess targeted corrosion inhibition capabilities. This study focuses on the evaluation of corrosion inhibitors for archeological silver artifacts using scanning electrochemical cell microscopy (SECCM) and X-ray photoelectron spectroscopy (XPS). The researchers aimed to compare the inhibition effects of four corrosion inhibitors [1,2,3-benzotriazole (BTA), 2-mercaptobenzimidazole (MBI), 2-mercaptobenzothiazole (MBT), and 2-mercaptobenzoxazole (MBO)] on a simulated Ag–Cu alloy sample and understand their mechanisms. The results showed that MBT exhibited better corrosion inhibition for microstructural regions with higher silver content due to its ability to form stable chelation structures with Ag(I). MBO exhibited better corrosion inhibition for microstructural regions with higher copper content due to its strong affinity with Cu(I). The targeted corrosion inhibition ability for the β-phase was ranked as MBO > BTA ≈ MBI > MBT, while for the α-phase the ranking was MBT > MBO > MBI > BTA. The study demonstrated the feasibility and capabilities of SECCM in the targeted screening of corrosion inhibitors for different compositions and microstructural regions in archeological metal artifacts. This study highlights the potential of SECCM in corrosion inhibitor research for archeological metal artifacts and wider applications in metal material corrosion protection.

    Metallurgically lithiated SiOx anode with high capacity and ambient air compatibility

    A common issue plaguing battery anodes is the large consumption of lithium in the initial cycle as a result of the formation of a solid electrolyte interphase followed by gradual loss in subsequent cycles. It presents a need for prelithiation to compensate for the loss. However, anode prelithiation faces the challenge of high chemical reactivity because of the low anode potential. Previous efforts have produced prelithiated Si nanoparticles with dry air stability, which cannot be stabilized under ambient air. Here, we developed a one-pot metallurgical process to synthesize LixSi/Li2O composites by using low-cost SiO or SiO2 as the starting material. The resulting composites consist of homogeneously dispersed LixSi nanodomains embedded in a highly crystalline Li2O matrix, providing the composite excellent stability even in ambient air with 40% relative humidity. The composites are readily mixed with various anode materials to achieve high first cycle Coulombic efficiency (CE) of >100% or serve as an excellent anode material by itself with stable cyclability and consistently high CEs (99.81% at the seventh cycle and ~99.87% for subsequent cycles). Therefore, LixSi/Li2O composites achieved balanced reactivity and stability, promising a significant boost to lithium ion batteries.

    Arbitrary coherent distributions in a programmable quantum walk

    The coherent superposition of position states in a quantum walk (QW) can be precisely engineered towards desired distributions to meet the needs of quantum information applications. The coherent distribution can make full use of quantum parallelism in computation and simulation. In particular, the uniform superposition provides robust nonlocality, which has wide applications such as the generation of genuine multibit random numbers without postprocessing. We experimentally demonstrate that rich dynamics featuring arbitrary coherent distributions can be obtained by introducing different sets of time- and position-dependent operations. Such a QW is realized by a resource-constant and flexible optical circuit, in which the variable operation is executed based on a Sagnac interferometer in an intrinsically stable and precisely controlled way. Our results contribute to the practical realization of quantum-walk-based quantum computation, quantum simulations, and quantum information protocols.
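
To make the idea of time- and position-dependent operations concrete, the following is a minimal numerical sketch of a discrete-time quantum walk on a line in plain Python. It is a textbook-style simulation, not a model of the optical experiment: the function names `walk`, `hadamard`, and `identity_coin`, and the convention that coin state 0 steps left and coin state 1 steps right, are assumptions made for illustration.

```python
import math


def hadamard(t, x):
    """Same Hadamard coin at every step t and position x."""
    s = 1.0 / math.sqrt(2.0)
    return ((s, s), (s, -s))


def identity_coin(t, x):
    """Trivial coin: the walker keeps its coin state and moves ballistically."""
    return ((1.0, 0.0), (0.0, 1.0))


def walk(steps, coin=hadamard):
    """Return position probabilities over x = -steps..steps after `steps` steps.

    `coin(t, x)` returns a 2x2 unitary, so the operation may vary with both
    the time step and the position.
    """
    size = 2 * steps + 1
    amp = [[0j, 0j] for _ in range(size)]
    amp[steps][0] = 1 + 0j  # walker starts at the origin in coin state 0
    for t in range(steps):
        new = [[0j, 0j] for _ in range(size)]
        for i in range(size):
            a0, a1 = amp[i]
            if a0 == 0 and a1 == 0:
                continue
            (c00, c01), (c10, c11) = coin(t, i - steps)
            b0 = c00 * a0 + c01 * a1
            b1 = c10 * a0 + c11 * a1
            new[i - 1][0] += b0  # coin state 0 steps left
            new[i + 1][1] += b1  # coin state 1 steps right
        amp = new
    return [abs(a0) ** 2 + abs(a1) ** 2 for a0, a1 in amp]


p_one_step = walk(1)                  # Hadamard coin, single step
p_ten_steps = walk(10)                # probabilities still sum to 1
p_ballistic = walk(3, identity_coin)  # all weight ends at x = -3
```

Swapping `hadamard` for a function that returns a different unitary per `(t, x)` is the hook through which a target distribution would be engineered; choosing those unitaries is the substantive problem and is not attempted here.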