trans-4,5-Dihydroxy-1,3-bis(4-methoxyphenyl)imidazolidine-2-thione
In the title compound, C17H18N2O4S, where one of the N-4-methoxyphenyl fragments is disordered over two sets of sites, the five-membered ring exhibits a nearly half-chair conformation and the two hydroxyl groups lie on opposite sides of the five-membered ring. In the crystal, the molecules are linked into sheets parallel to (100) via O—H⋯O and O—H⋯S hydrogen bonds.
Dimethyl 2,6-dimethyl-1,4-dihydropyridine-3,5-dicarboxylate
In the crystal of the title compound, C11H15NO4, the molecules are linked into sheets by N—H⋯O and C—H⋯O hydrogen bonds. Within the molecule, the 1,4-dihydropyridine ring exhibits a distinctly planar conformation [r.m.s. deviation from the mean plane of 0.009 (3) Å], and the other non-H atoms are almost coplanar [r.m.s. deviation = 0.021 (3) Å] with the 1,4-dihydropyridine ring. The conformation of the latter is governed mainly by two intramolecular non-classical C—H⋯O interactions.
trans-4,5-Dihydroxy-1,3-diphenylimidazolidine-2-thione
In the title compound, C15H14N2O2S, the five-membered ring adopts an envelope conformation and the two hydroxy groups lie on opposite sides of the ring. The six-membered rings are oriented at a dihedral angle of 22.63 (3)°. In the crystal structure, intermolecular O—H⋯S and O—H⋯O hydrogen bonds link the molecules into a two-dimensional network.
Pave the Way to Grasp Anything: Transferring Foundation Models for Universal Pick-Place Robots
Improving the generalization capabilities of general-purpose robotic agents
has long been a significant challenge actively pursued by research communities.
Existing approaches often rely on collecting large-scale real-world robotic
data, such as the RT-1 dataset. However, these approaches typically suffer from
low efficiency, limiting their capability in open-domain scenarios with new
objects and diverse backgrounds. In this paper, we propose a novel paradigm
that effectively leverages language-grounded segmentation masks generated by
state-of-the-art foundation models to address a wide range of pick-and-place
robot manipulation tasks in everyday scenarios. By integrating the precise
semantics and geometry conveyed by the masks into our multi-view policy model,
our approach can perceive accurate object poses and enable sample-efficient
learning. Moreover, this design facilitates effective generalization to
grasping new objects whose shapes are similar to those observed during
training. Our approach
consists of two distinct steps. First, we introduce a series of foundation
models to accurately ground natural language demands across multiple tasks.
Second, we develop a Multi-modal Multi-view Policy Model that incorporates
inputs such as RGB images, semantic masks, and robot proprioception states to
jointly predict precise and executable robot actions. Extensive real-world
experiments conducted on a Franka Emika robot arm validate the effectiveness of
our proposed paradigm. Real-world demos are available on YouTube
(https://www.youtube.com/watch?v=1m9wNzfp_4E ) and Bilibili
(https://www.bilibili.com/video/BV178411Z7H2/ )
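As a concrete illustration of the second step, the following is a minimal sketch of how a multi-modal, multi-view policy might fuse RGB images, segmentation masks, and proprioception to predict an action. The architecture, dimensions, and module names are illustrative assumptions, not the paper's implementation.

```python
# Hypothetical sketch of a multi-modal, multi-view policy head; all layer
# sizes and design choices here are assumptions for illustration only.
import torch
import torch.nn as nn

class MultiViewPolicy(nn.Module):
    def __init__(self, n_views=2, feat_dim=256, proprio_dim=7, action_dim=7):
        super().__init__()
        # Shared CNN encoder over a 4-channel input: RGB + segmentation mask.
        self.encoder = nn.Sequential(
            nn.Conv2d(4, 32, kernel_size=5, stride=2), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=3, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, feat_dim),
        )
        # MLP fusing per-view features with robot proprioception states.
        self.head = nn.Sequential(
            nn.Linear(n_views * feat_dim + proprio_dim, 256), nn.ReLU(),
            nn.Linear(256, action_dim),  # e.g. end-effector pose + gripper
        )

    def forward(self, rgbs, masks, proprio):
        # rgbs: (B, V, 3, H, W); masks: (B, V, 1, H, W); proprio: (B, proprio_dim)
        x = torch.cat([rgbs, masks], dim=2)      # append mask as a 4th channel
        feats = [self.encoder(x[:, v]) for v in range(x.shape[1])]
        return self.head(torch.cat(feats + [proprio], dim=1))

policy = MultiViewPolicy()
action = policy(torch.rand(1, 2, 3, 96, 96),   # two RGB views
                torch.rand(1, 2, 1, 96, 96),   # matching semantic masks
                torch.rand(1, 7))              # proprioception state
```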
AlphaBlock: Embodied Finetuning for Vision-Language Reasoning in Robot Manipulation
We propose a novel framework for learning high-level cognitive capabilities
in robot manipulation tasks, such as making a smiley face using building
blocks. These tasks often involve complex multi-step reasoning, presenting
significant challenges due to the limited paired data connecting human
instructions (e.g., making a smiley face) and robot actions (e.g., end-effector
movement). Existing approaches mitigate this challenge by adopting an open-loop
paradigm that decomposes high-level instructions into simple sub-task plans and
executes them step-by-step using low-level control models. However, these
approaches lack instant observations during multi-step reasoning, leading
to sub-optimal results. To address this issue, we propose to automatically
collect a cognitive robot dataset using Large Language Models (LLMs). The
resulting dataset, AlphaBlock, consists of 35 comprehensive high-level tasks
with multi-step text plans and paired observation sequences. To enable
efficient data acquisition, we employ elaborate multi-round prompt designs
that effectively reduce the burden of extensive human involvement. We further
propose a closed-loop multi-modal embodied planning model that autoregressively
generates plans by taking image observations as input. To facilitate effective
learning, we leverage MiniGPT-4 with a frozen visual encoder and LLM, and
fine-tune an additional vision adapter and Q-Former to enable the fine-grained
spatial perception required for manipulation tasks. We conduct experiments to
verify our model's superiority over existing open- and closed-loop methods,
achieving significant success-rate increases of 21.4% and 14.5% over
ChatGPT-based and GPT-4-based robot baselines, respectively. Real-world demos
are shown at https://www.youtube.com/watch?v=ayAzID1_qQk
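To make the open-loop vs. closed-loop distinction concrete, here is a minimal sketch of a closed-loop planning loop that re-observes the scene before generating each sub-task. The planner and robot interfaces are illustrative stubs, not the AlphaBlock codebase.

```python
# Hypothetical closed-loop planning loop: each sub-task is generated only
# after a fresh image observation, unlike an open-loop plan fixed up front.
def closed_loop_plan(planner, robot, instruction, max_steps=10):
    """Autoregressively generate sub-task plans, re-observing after each step."""
    history = []
    for _ in range(max_steps):
        obs = robot.get_image()                              # instant observation
        step = planner.next_step(instruction, obs, history)  # conditioned on feedback
        if step == "DONE":                                   # planner signals completion
            break
        robot.execute(step)                                  # low-level controller acts
        history.append(step)
    return history

# Stand-in components so the sketch runs end to end.
class DummyPlanner:
    def next_step(self, instruction, obs, history):
        return "DONE" if history else "pick up the yellow block"

class DummyRobot:
    def get_image(self): return None
    def execute(self, step): print("executing:", step)

print(closed_loop_plan(DummyPlanner(), DummyRobot(), "make a smiley face"))
```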
SBS Content Detection for Modified Asphalt Using Deep Neural Network
This study proposes a prediction model for accurately detecting the styrene-butadiene-styrene (SBS) content in modified asphalt using a deep neural network (DNN). Traditional methods for evaluating SBS content are inaccurate and complicated because their manual computations are prone to error. Feature data for the SBS content are derived from spectra obtained in Fourier-transform infrared spectroscopy tests. After the DNN is designed, the preprocessed feature data are used as training and testing data and fed into the DNN as a feature matrix. Furthermore, comparative studies are conducted to verify the accuracy of the proposed model. Results show that the mean square error decreased by 68% for the DNN with noise and dimension reduction. The DNN-based prediction model achieves correlation coefficients between the target and mean predicted values of 0.9978 and 0.9992 for the training and testing samples, respectively, indicating its remarkable accuracy and applicability after training. In comparison with the standard curve method and the random forest method, the precision of the DNN is greater than 98% under the same test conditions, achieving the best prediction performance.
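The following is a minimal sketch of the kind of DNN regression described above, mapping a spectral feature matrix to SBS content. The layer sizes, feature dimension, and synthetic data are assumptions for illustration, not the study's architecture or dataset.

```python
# Illustrative DNN regressor for spectral features -> SBS content (%).
# All dimensions and hyperparameters are assumptions, not from the paper.
import torch
import torch.nn as nn

n_wavenumbers = 400                    # assumed FTIR feature dimension
X = torch.rand(128, n_wavenumbers)     # feature matrix: one spectrum per row
y = torch.rand(128, 1) * 6.0           # synthetic SBS content target

model = nn.Sequential(
    nn.Linear(n_wavenumbers, 64), nn.ReLU(),
    nn.Linear(64, 32), nn.ReLU(),
    nn.Linear(32, 1),                  # predicted SBS content
)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()                 # mean square error, as in the study

for epoch in range(200):
    opt.zero_grad()
    loss = loss_fn(model(X), y)        # fit predictions to targets
    loss.backward()
    opt.step()
```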
CoMAE: Single Model Hybrid Pre-training on Small-Scale RGB-D Datasets
Current RGB-D scene recognition approaches often train two standalone backbones for the RGB and depth modalities with the same Places or ImageNet pre-training. However, the pre-trained depth network is still biased by RGB-based models, which may result in a suboptimal solution. In this paper, we present a single-model self-supervised hybrid pre-training framework for RGB and depth modalities, termed CoMAE. CoMAE uses a curriculum learning strategy to unify two popular self-supervised representation learning algorithms: contrastive learning and masked image modeling. Specifically, we first build a patch-level alignment task to pre-train a single encoder shared by the two modalities via cross-modal contrastive learning. Then, the pre-trained contrastive encoder is passed to a multi-modal masked autoencoder to capture finer context features from a generative perspective. In addition, our single-model design, which requires no fusion module, is flexible and robust in generalizing to unimodal scenarios in both the training and testing phases. Extensive experiments on the SUN RGB-D and NYUDv2 datasets demonstrate the effectiveness of CoMAE for RGB and depth representation learning. Our results also reveal that CoMAE is a data-efficient representation learner: although we use only the small-scale, unlabeled training set for pre-training, our CoMAE pre-trained models remain competitive with state-of-the-art methods that use extra large-scale, supervised RGB dataset pre-training. Code will be released at https://github.com/MCG-NJU/CoMAE
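For the first curriculum stage, a minimal sketch of patch-level cross-modal contrastive alignment with a shared encoder is given below; the encoder stand-in, shapes, and temperature are assumptions for clarity, not CoMAE's actual code.

```python
# Illustrative patch-level cross-modal contrastive loss (InfoNCE): each RGB
# patch is pulled toward its corresponding depth patch and pushed from others.
import torch
import torch.nn.functional as F

def patch_contrastive_loss(rgb_feats, depth_feats, tau=0.07):
    # rgb_feats, depth_feats: (N, D) features for N corresponding patches
    rgb = F.normalize(rgb_feats, dim=1)
    dep = F.normalize(depth_feats, dim=1)
    logits = rgb @ dep.t() / tau            # (N, N) similarity matrix
    targets = torch.arange(rgb.size(0))     # positives lie on the diagonal
    # Symmetric loss: RGB->depth and depth->RGB retrieval directions.
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))

shared_encoder = torch.nn.Linear(768, 128)  # stand-in for the shared encoder
rgb_patches, depth_patches = torch.rand(196, 768), torch.rand(196, 768)
loss = patch_contrastive_loss(shared_encoder(rgb_patches),
                              shared_encoder(depth_patches))
```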
An Intelligent Vision System for Detecting Defects in Micro-Armatures for Smartphones
Automatic vision inspection technology shows high potential for quality inspection and has drawn great interest in micro-armature manufacturing. Because inspection performed with the human eye lacks real standardization and efficiency, it is necessary to develop an automatic defect-detection process. In this work, an elaborate vision system for the defect inspection of micro-armatures used in smartphones was developed. It consists of two parts, a front-end module and a deep convolutional neural network (DCNN) module, which are responsible for different areas. The front-end module runs first, and the DCNN module does not run if the output of the front-end module is negative. To verify the application of this system, an apparatus consisting of an objective table, a control panel, and a camera connected to a personal computer (PC) was used to simulate an industrial production station. The results indicate that the developed vision system is capable of detecting defects in micro-armatures.
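The two-stage gating described above can be sketched as follows: a cheap front-end check runs first, and the expensive DCNN stage is skipped whenever the front-end output is negative. Both stage implementations here are hypothetical stubs, not the paper's modules.

```python
# Hypothetical two-stage inspection pipeline with front-end gating.
def front_end_check(image) -> bool:
    """Cheap screening stage (e.g. a brightness/geometry threshold)."""
    return image.get("brightness", 0) > 0.2      # stand-in criterion

def dcnn_classify(image) -> str:
    """Stand-in for the deep-CNN defect classifier."""
    return "defect" if image.get("score", 0) > 0.5 else "ok"

def inspect(image) -> str:
    if not front_end_check(image):               # negative front-end output:
        return "reject"                          # the DCNN stage never runs
    return dcnn_classify(image)                  # otherwise, run the DCNN

print(inspect({"brightness": 0.9, "score": 0.7}))  # -> "defect"
```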