Open X-Embodiment: Robotic Learning Datasets and RT-X Models
Large, high-capacity models trained on diverse datasets have shown remarkable success in efficiently tackling downstream applications. In domains from NLP to computer vision, this has led to a consolidation of pretrained models, with general pretrained backbones serving as a starting point for many applications. Can such a consolidation happen in robotics? Conventionally, robotic learning methods train a separate model for every application, every robot, and even every environment. Can we instead train a "generalist" X-robot policy that can be adapted efficiently to new robots, tasks, and environments? In this paper, we provide datasets in standardized data formats and models to make it possible to explore this possibility in the context of robotic manipulation, alongside experimental results that provide an example of effective X-robot policies. We assemble a dataset from 22 different robots, collected through a collaboration between 21 institutions, demonstrating 527 skills (160,266 tasks). We show that a high-capacity model trained on this data, which we call RT-X, exhibits positive transfer and improves the capabilities of multiple robots by leveraging experience from other platforms. The project website is robotics-transformer-x.github.io.
Shuffling of Promoters for Multiple Genes To Optimize Xylose Fermentation in an Engineered Saccharomyces cerevisiae Strain
We describe here a useful metabolic engineering tool, multiple-gene-promoter shuffling (MGPS), to optimize expression levels for multiple genes. This method approaches an optimized gene overexpression level by fusing promoters of various strengths to genes of interest for a particular pathway. Selection of these promoters is based on the expression levels of the native genes under the same physiological conditions intended for the application. MGPS was implemented in a yeast xylose fermentation mixture by shuffling the promoters for GND2 and HXK2 with the genes for transaldolase (TAL1), transketolase (TKL1), and pyruvate kinase (PYK1) in the Saccharomyces cerevisiae strain FPL-YSX3. This host strain has integrated xylose-metabolizing genes, including xylose reductase, xylitol dehydrogenase, and xylulose kinase. The optimal expression levels for TAL1, TKL1, and PYK1 were identified by analysis of volumetric ethanol production by transformed cells. We found the optimal combination for ethanol production to be GND2-TAL1-HXK2-TKL1-HXK2-PYK1. The MGPS method could easily be adapted for other eukaryotic and prokaryotic organisms to optimize expression of genes for industrial fermentation.
Erosion Gully Networks Extraction Based on InSAR Refined Digital Elevation Model and Relative Elevation Algorithm—A Case Study in Huangfuchuan Basin, Northern Loess Plateau, China
The time-effective mapping of erosion gullies is crucial for monitoring and early detection of developing erosional progression. However, current methods face challenges in obtaining large-scale erosion gully networks rapidly due to limitations in data availability and computational complexity. This study developed a rapid method for extracting erosion gully networks by integrating interferometric synthetic aperture radar (InSAR) and the relative elevation algorithm (REA) within the Huangfuchuan Basin, a case basin in the northern Loess Plateau, China. Validation in the study area demonstrated that the proposed method achieved an F1 score of 81.94%, a 9.77% improvement over that of the reference ASTER GDEM. The method successfully detected the small reliefs of erosion gullies using the InSAR-refined DEM. Extraction accuracy varied with the characteristics of the gullies in different locations: the F1 score was positively correlated with gully depth (R2 = 0.62), while fragmented gully heads were more likely to be missed due to the resolution effect. The extraction results provided insights into the erosion gully networks of the case study area. A total of approximately 28,000 gullies were identified, exhibiting pinnate and trellis patterns. Most of the gullies had notable intersecting angles exceeding 60°. The basin's average gully depth was 64 m, with the deepest gully being 140 m deep. Surface fragmentation indicated moderate erosive activity, with the southeastern loess region showing more severe erosion than the Pisha sandstone-dominated central and northwestern regions. The method described in this study offers a rapid approach to mapping gullies, streamlining the workflow of erosion gully extraction and enabling efficient, targeted interventions for erosion control efforts. Its practical applicability and potential to leverage open-source data make it accessible for broader application in similar regions facing erosion challenges.
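As a hedged illustration of the relative-elevation idea behind this abstract (the paper's exact REA implementation is not given here and may differ): subtracting a local reference surface from the DEM makes incised cells strongly negative, and thresholding that difference by depth yields a binary gully mask. The function names, the choice of a moving-maximum reference surface, and the window/threshold values below are all illustrative assumptions.

```python
import numpy as np
from scipy import ndimage

def relative_elevation(dem, window=15):
    """Relative elevation: DEM minus a local reference surface.

    The reference surface approximates the inter-gully terrain by taking
    a local maximum (upper envelope) over a moving window, so incised
    gullies show up as strongly negative relative-elevation values.
    """
    reference = ndimage.maximum_filter(dem, size=window)
    return dem - reference  # <= 0 everywhere; more negative = deeper incision

def extract_gully_mask(dem, window=15, depth_threshold=-2.0):
    """Binary gully mask: cells incised deeper than the threshold (metres)."""
    rel = relative_elevation(dem, window)
    return rel < depth_threshold
```

On a synthetic flat DEM with a single 10 m deep trench, the mask flags exactly the trench cells; on real InSAR-refined DEMs the window size would need to match the expected gully width.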
Open-Vocabulary 3D Detection via Image-level Class and Debiased Cross-modal Contrastive Learning
Current point-cloud detection methods have difficulty detecting open-vocabulary objects in the real world due to their limited generalization capability. Moreover, it is extremely laborious and expensive to collect and fully annotate a point-cloud detection dataset with numerous object classes, which limits the class coverage of existing point-cloud datasets and hinders models from learning the general representations needed for open-vocabulary point-cloud detection. To the best of our knowledge, we are the first to study the problem of open-vocabulary 3D point-cloud detection. Instead of seeking a fully labeled point-cloud dataset, we resort to ImageNet1K to broaden the vocabulary of the point-cloud detector. We propose OV-3DETIC, an Open-Vocabulary 3D DETector using Image-level Class supervision. Specifically, we take advantage of two modalities, the image modality for recognition and the point-cloud modality for localization, to generate pseudo labels for unseen classes. We then propose a novel debiased cross-modal contrastive learning method to transfer knowledge from the image modality to the point-cloud modality during training. Without hurting inference latency, OV-3DETIC makes the point-cloud detector capable of open-vocabulary detection. Extensive experiments demonstrate that the proposed OV-3DETIC achieves at least 10.77% and 9.56% absolute mAP improvement over a wide range of baselines on the SUN-RGBD and ScanNet datasets, respectively. In addition, we conduct extensive experiments to shed light on why the proposed OV-3DETIC works.
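A minimal sketch of the debiased cross-modal contrastive idea, not the authors' exact formulation: an InfoNCE-style loss pulls matched image and point-cloud embeddings together, while negatives that share a pseudo-label with the anchor are masked out, since those pairs are likely false negatives that should not be pushed apart. The function name, the NumPy implementation, and the masking scheme are illustrative assumptions.

```python
import numpy as np

def debiased_cross_modal_infonce(img_emb, pc_emb, labels, tau=0.07):
    """InfoNCE-style loss aligning image and point-cloud embeddings.

    'Debiased' here means that off-diagonal pairs sharing a pseudo-label
    with the anchor are excluded from the negative set.
    """
    # L2-normalise both modalities so the dot product is cosine similarity
    img = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
    pc = pc_emb / np.linalg.norm(pc_emb, axis=1, keepdims=True)
    logits = img @ pc.T / tau                        # (N, N) similarity matrix
    same = labels[:, None] == labels[None, :]        # pairs with equal pseudo-labels
    mask = same & ~np.eye(len(labels), dtype=bool)   # off-diagonal same-class pairs
    logits = np.where(mask, -np.inf, logits)         # drop likely false negatives
    # cross-entropy with the diagonal (matched pair) as the positive
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_prob))
```

With all same-class negatives masked, same-labeled batches contribute zero loss, which is the intended debiasing behaviour; a full implementation would operate on region-level features inside the detector rather than whole-sample embeddings.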
Open-Vocabulary Point-Cloud Object Detection without 3D Annotation
The goal of open-vocabulary detection is to identify novel objects based on arbitrary textual descriptions. In this paper, we address open-vocabulary 3D point-cloud detection with a divide-and-conquer strategy that involves: 1) developing a point-cloud detector that can learn a general representation for localizing various objects, and 2) connecting textual and point-cloud representations so that the detector can classify novel object categories based on text prompting. Specifically, we resort to rich image pre-trained models, from which the point-cloud detector learns to localize objects under the supervision of 2D bounding boxes predicted by 2D pre-trained detectors. Moreover, we propose a novel de-biased triplet cross-modal contrastive learning scheme to connect the image, point-cloud, and text modalities, thereby enabling the point-cloud detector to benefit from vision-language pre-trained models, i.e., CLIP. This novel use of image and vision-language pre-trained models for point-cloud detectors allows for open-vocabulary 3D object detection without the need for 3D annotations. Experiments demonstrate that the proposed method improves by at least 3.03 points and 7.47 points over a wide range of baselines on the ScanNet and SUN RGB-D datasets, respectively. Furthermore, we provide a comprehensive analysis to explain why our approach works.