459 research outputs found

    On the Existing Problems and Countermeasures of Practical Teaching Mode on China’s Public Security Academies

    Get PDF
    Practical teaching is an important approach to cultivating applied talent in public security academies. Strengthening practical teaching and focusing on cultivating students' practical ability are basic requirements of public security education reform. This paper examines the existing problems in the practical teaching model of public security academies, analyzes their causes, and puts forward constructive ideas for a practical teaching model grounded in the actual situation of practice.

    ViCo: Engaging Video Comment Generation with Human Preference Rewards

    Full text link
    Engaging video comments play an important role in video social media, as they carry the feelings, thoughts, or humor of the audience. Preliminary work has made an initial exploration of video comment generation by adopting caption-style encoder-decoder models. However, comment generation presents unique challenges distinct from caption generation, which makes these methods less effective at generating engaging comments. In contrast to the objective and descriptive nature of captions, comments tend to be inherently subjective, making it hard to quantify and evaluate their engagement. Furthermore, the scarcity of truly engaging comments makes it difficult to collect enough high-quality training examples. In this paper, we propose ViCo with three novel designs to tackle the above challenges for generating engaging Video Comments. First, to quantify the engagement of comments, we use the number of "likes" each comment receives as a proxy for human preference after an appropriate debiasing procedure. Second, to automatically evaluate the engagement of comments, we train a reward model to align its judgments with this proxy; our user studies indicate that the reward model aligns well with human judgments. Lastly, to alleviate the scarcity of high-quality comments, an initial generator is trained on readily available but noisy data, and the reward model then provides feedback on the generated comments to optimize the generator. To facilitate research on video commenting, we collect a large video comment dataset (ViCo-20k) with rich metadata from a popular video website. Experiments on ViCo-20k show that the comments generated by our ViCo model achieve the best quantitative and qualitative results, particularly when engagement is considered.
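
    Below is a minimal, illustrative sketch (not the authors' code) of the core idea in this abstract: fit a small reward model to a debiased like-based engagement proxy, then use its scores to pick the most engaging candidate comment. The featurizer, model sizes, debiasing formula, and the best-of-n selection are hypothetical stand-ins for ViCo's actual text encoder and reward-driven generator optimization.

```python
# Hedged sketch of a like-based reward model and reward-guided comment selection.
# All functions, features, and thresholds here are illustrative assumptions.
import math
import torch
import torch.nn as nn


def debiased_like_score(likes: int, video_views: int) -> float:
    """Normalize raw likes by exposure so comments on popular videos don't dominate."""
    return math.log1p(likes) / math.log1p(video_views + 1)


def featurize(comment: str, dim: int = 64) -> torch.Tensor:
    """Toy hashed bag-of-words embedding; a real system would use a text encoder."""
    vec = torch.zeros(dim)
    for tok in comment.lower().split():
        vec[hash(tok) % dim] += 1.0
    return vec


class RewardModel(nn.Module):
    """Small regressor that predicts the engagement proxy from comment features."""

    def __init__(self, dim: int = 64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, 32), nn.ReLU(), nn.Linear(32, 1))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x).squeeze(-1)


def train_reward_model(pairs, epochs: int = 200) -> RewardModel:
    """pairs: list of (comment_text, likes, video_views) drawn from the comment corpus."""
    model = RewardModel()
    opt = torch.optim.Adam(model.parameters(), lr=1e-2)
    x = torch.stack([featurize(c) for c, _, _ in pairs])
    y = torch.tensor([debiased_like_score(l, v) for _, l, v in pairs])
    for _ in range(epochs):
        opt.zero_grad()
        nn.functional.mse_loss(model(x), y).backward()
        opt.step()
    return model


def pick_engaging(candidates, reward_model):
    """Rerank generator outputs by predicted engagement (best-of-n in place of RL fine-tuning)."""
    scores = reward_model(torch.stack([featurize(c) for c in candidates]))
    return candidates[int(scores.argmax())]
```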

    Discovery of new cellulases from the metagenome by a metagenomics-guided strategy.

    Get PDF
    Background: Energy shortage has become a global problem, and producing biofuels from renewable biomass is an inevitable part of sustainable development. Cellulose is the most abundant renewable resource in nature, and the lack of new cellulases with unique properties has become the bottleneck of its efficient utilization. Environmental metagenomes are regarded as huge reservoirs of cellulases, but new cellulases cannot easily be obtained by functional screening of metagenomic libraries. Results: In this work, a metagenomics-guided strategy for obtaining new cellulases from the metagenome was proposed. Metagenomic DNA extracted from an anaerobic beer-lees-converting consortium enriched under thermophilic conditions was sequenced and assembled, and 23 glycoside hydrolase (GH) sequences affiliated with GH family 5 were identified. Among them, three target sequences (designated cel7482, cel3623, and cel36) showing low identity with known GHs were chosen as putative cellulase genes and functionally expressed in Escherichia coli after PCR cloning. Product pattern analysis classified the three cellulases as endo-β-1,4-glucanases. The recombinant cellulases were most active at pH 5.5 and within a temperature range of 60–70 °C. Computer-assisted 3D structure modeling indicated that residues in the active sites of the recombinant cellulases were more similar to each other than non-active-site residues. Recombinant cel7482 tolerated 2 M NaCl, suggesting that it may be a halotolerant cellulase, and it also resisted three ionic liquids (ILs) widely used for cellulose pretreatment. Furthermore, active cel7482 was secreted into the culture medium via the twin-arginine translocation (Tat) pathway of Bacillus subtilis 168, which facilitates subsequent purification and reduces inclusion-body formation during overexpression. Conclusions: This study demonstrated a simple and efficient method for directly cloning new cellulase genes from environmental metagenomes. In the future, the metagenomics-guided strategy may be applied to high-throughput screening of new cellulases from environmental metagenomes.
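
    The in-silico candidate-selection step described above (keeping GH family 5 sequences with low identity to known GHs) can be illustrated with a short, hedged sketch. It assumes the predicted GH5 ORFs were searched against known GH5 proteins with BLASTP or DIAMOND in tabular format (outfmt 6); the identity and e-value cutoffs below are illustrative, not the values used in the paper.

```python
# Hedged sketch (not the authors' pipeline): select "novel" GH5 candidates as
# ORFs whose best hit against known GH5 proteins falls below an identity cutoff.
import csv


def best_hits(blast_tab_path: str) -> dict:
    """Return the highest-identity significant hit per query from BLAST outfmt 6."""
    best = {}
    with open(blast_tab_path) as fh:
        for row in csv.reader(fh, delimiter="\t"):
            qseqid, sseqid = row[0], row[1]
            pident, evalue = float(row[2]), float(row[10])
            if evalue > 1e-5:          # discard insignificant alignments
                continue
            if qseqid not in best or pident > best[qseqid][1]:
                best[qseqid] = (sseqid, pident)
    return best


def novel_candidates(blast_tab_path: str, max_identity: float = 50.0) -> list:
    """GH5-annotated ORFs whose closest known homolog is below the identity cutoff."""
    return [q for q, (_, pident) in best_hits(blast_tab_path).items()
            if pident < max_identity]


# Example (hypothetical file name):
# candidates = novel_candidates("gh5_vs_known_gh.blastp.tsv")
```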

    AlphaBlock: Embodied Finetuning for Vision-Language Reasoning in Robot Manipulation

    Full text link
    We propose a novel framework for learning high-level cognitive capabilities in robot manipulation tasks, such as making a smiley face out of building blocks. These tasks often involve complex multi-step reasoning and are challenging because of the limited paired data connecting human instructions (e.g., make a smiley face) and robot actions (e.g., end-effector movements). Existing approaches relieve this challenge by adopting an open-loop paradigm that decomposes high-level instructions into simple sub-task plans and executes them step by step with low-level control models. However, these approaches lack instant observations during multi-step reasoning, leading to sub-optimal results. To address this issue, we automatically collect a cognitive robot dataset using Large Language Models (LLMs). The resulting dataset, AlphaBlock, consists of 35 comprehensive high-level tasks with multi-step text plans and paired observation sequences. To enable efficient data acquisition, we employ elaborated multi-round prompt designs that greatly reduce the need for human involvement. We further propose a closed-loop multi-modal embodied planning model that autoregressively generates plans while taking image observations as input. To facilitate effective learning, we leverage MiniGPT-4 with a frozen visual encoder and LLM, and fine-tune an additional vision adapter and Q-former to enable the fine-grained spatial perception required for manipulation tasks. Experiments verify the superiority of our approach over existing open- and closed-loop methods, with success-rate improvements of 21.4% and 14.5% over ChatGPT- and GPT-4-based robot baselines, respectively. Real-world demos are shown at https://www.youtube.com/watch?v=ayAzID1_qQk
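
    As a rough illustration of the closed-loop planning idea described here (re-observing the scene before generating each sub-plan, rather than committing to a fixed open-loop plan), a hedged sketch follows. `planner_step`, `execute`, and `get_observation` are hypothetical stand-ins for the fine-tuned multimodal planner, the low-level controller, and the camera interface; none of these names come from the paper.

```python
# Hedged sketch of a closed-loop, observation-conditioned planning loop.
from typing import Any, Callable, List, Optional


def closed_loop_plan(
    instruction: str,
    get_observation: Callable[[], Any],                         # e.g. returns the current RGB image
    planner_step: Callable[[str, Any, List[str]], Optional[str]],
    execute: Callable[[str], None],
    max_steps: int = 20,
) -> List[str]:
    """Autoregressively generate and execute sub-plans, conditioning on a fresh observation each step."""
    history: List[str] = []
    for _ in range(max_steps):
        obs = get_observation()            # closed loop: look at the scene before every decision
        step = planner_step(instruction, obs, history)
        if step is None:                   # planner signals that the task is complete
            break
        execute(step)                      # hand the sub-plan to the low-level control model
        history.append(step)
    return history
```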

    Pave the Way to Grasp Anything: Transferring Foundation Models for Universal Pick-Place Robots

    Full text link
    Improving the generalization capabilities of general-purpose robotic agents has long been a significant challenge actively pursued by the research community. Existing approaches often rely on collecting large-scale real-world robot data, such as the RT-1 dataset. However, these approaches typically suffer from low efficiency, limiting their capability in open-domain scenarios with new objects and diverse backgrounds. In this paper, we propose a novel paradigm that effectively leverages language-grounded segmentation masks generated by state-of-the-art foundation models to address a wide range of pick-and-place robot manipulation tasks in everyday scenarios. By integrating the precise semantics and geometry conveyed by the masks into our multi-view policy model, our approach can perceive accurate object poses and enables sample-efficient learning. This design also facilitates effective generalization to grasping new objects whose shapes are similar to those observed during training. Our approach consists of two distinct steps. First, we introduce a series of foundation models to accurately ground natural-language demands across multiple tasks. Second, we develop a Multi-modal Multi-view Policy Model that incorporates inputs such as RGB images, semantic masks, and robot proprioception states to jointly predict precise and executable robot actions. Extensive real-world experiments conducted on a Franka Emika robot arm validate the effectiveness of the proposed paradigm. Real-world demos are shown on YouTube (https://www.youtube.com/watch?v=1m9wNzfp_4E) and Bilibili (https://www.bilibili.com/video/BV178411Z7H2/)
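
    A minimal sketch of the kind of multi-modal, multi-view policy described above follows, assuming per-view RGB images, a language-grounded binary object mask per view, and a proprioception vector as inputs. The layer sizes and the 7-DoF action head are illustrative choices, not the paper's architecture.

```python
# Hedged sketch: fuse per-view RGB + segmentation mask with proprioception to predict an action.
import torch
import torch.nn as nn


class MultiViewMaskPolicy(nn.Module):
    def __init__(self, num_views: int = 3, proprio_dim: int = 8, action_dim: int = 7):
        super().__init__()
        # RGB (3 channels) + binary object mask (1 channel) stacked per view.
        self.view_encoder = nn.Sequential(
            nn.Conv2d(4, 16, 5, stride=2), nn.ReLU(),
            nn.Conv2d(16, 32, 5, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.head = nn.Sequential(
            nn.Linear(32 * num_views + proprio_dim, 128), nn.ReLU(),
            nn.Linear(128, action_dim),
        )

    def forward(self, rgbs, masks, proprio):
        # rgbs: (B, V, 3, H, W), masks: (B, V, 1, H, W), proprio: (B, proprio_dim)
        feats = [self.view_encoder(torch.cat([rgbs[:, v], masks[:, v]], dim=1))
                 for v in range(rgbs.shape[1])]
        return self.head(torch.cat(feats + [proprio], dim=1))


# Example with random inputs (2 samples, 3 views, 96x96 images):
# MultiViewMaskPolicy()(torch.rand(2, 3, 3, 96, 96), torch.rand(2, 3, 1, 96, 96), torch.rand(2, 8))
```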

    CLIP-ViP: Adapting Pre-trained Image-Text Model to Video-Language Representation Alignment

    Full text link
    Pre-trained image-text models such as CLIP have demonstrated the strong power of vision-language representations learned from large-scale web-collected image-text data. In light of these well-learned visual features, some existing works transfer the image representation to the video domain and achieve good results. However, how to utilize an image-language pre-trained model (e.g., CLIP) for video-language pre-training (post-pretraining) is still underexplored. In this paper, we investigate two questions: 1) what factors hinder post-pretraining CLIP from further improving performance on video-language tasks, and 2) how can the impact of these factors be mitigated? Through a series of comparative experiments and analyses, we find that the data scale and the domain gap between language sources have great impact. Motivated by these findings, we propose an Omnisource Cross-modal Learning method equipped with a Video Proxy mechanism on the basis of CLIP, namely CLIP-ViP. Extensive results show that our approach improves the performance of CLIP on video-text retrieval by a large margin. Our model also achieves SOTA results on a variety of datasets, including MSR-VTT, DiDeMo, LSMDC, and ActivityNet. We will release our code and pre-trained CLIP-ViP models at https://github.com/microsoft/XPretrain/tree/main/CLIP-ViP
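
    Below is a hedged sketch of the Video Proxy idea as it is described here: a few learnable proxy tokens attend jointly with the patch tokens of all sampled frames so that an image-pretrained encoder can form a video-level representation. The plain TransformerEncoder and the token/embedding sizes are stand-ins; the actual CLIP-ViP implementation is in the linked repository.

```python
# Hedged sketch of learnable video proxy tokens aggregating frame patch tokens.
import torch
import torch.nn as nn


class VideoProxyEncoder(nn.Module):
    def __init__(self, embed_dim: int = 512, num_proxies: int = 4, depth: int = 2):
        super().__init__()
        # Learnable proxy tokens shared across all videos.
        self.proxies = nn.Parameter(torch.randn(1, num_proxies, embed_dim) * 0.02)
        layer = nn.TransformerEncoderLayer(d_model=embed_dim, nhead=8, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=depth)

    def forward(self, frame_patch_tokens: torch.Tensor) -> torch.Tensor:
        # frame_patch_tokens: (B, T, N, D) patch embeddings of T sampled frames.
        b, t, n, d = frame_patch_tokens.shape
        tokens = frame_patch_tokens.reshape(b, t * n, d)           # flatten frames into one sequence
        proxies = self.proxies.expand(b, -1, -1)                   # broadcast proxy tokens per sample
        out = self.encoder(torch.cat([proxies, tokens], dim=1))    # joint attention over proxies + patches
        return out[:, : proxies.shape[1]].mean(dim=1)              # pooled video-level embedding
```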