87 research outputs found

DETECTING GENETIC ENGINEERING WITH A KNOWLEDGE-RICH DNA SEQUENCE CLASSIFIER

Author: Ge Yuchen
Publication venue: 'The Busan Gyeongnam Mathematical Society'
Publication date: 06/01/2021

Detecting evidence of genetic engineering in the wild is a problem of growing importance for biosecurity, provenance, and intellectual property rights. This thesis describes a computational system designed to detect engineering from DNA sequencing of biological samples and presents its performance on fully blinded test data. The pipeline builds on existing computational resources for metagenomics, including methods that use the full set of reference genomes deposited in GenBank. Starting from raw reads generated by short-read sequencers, the dominant host species are identified by k-mer analysis. Next, all the sequencing reads are mapped to the imputed host strain; those reads that do not map are retained as suspicious. Suspicious reads are de novo assembled into suspicious contigs, followed by sequence alignment against the NCBI non-redundant nucleotide database to annotate the engineered sequence and to identify whether the engineering is in a plasmid or is integrated into the host genome. Our initial system, applied to blinded samples, provides excellent identification of foreign gene content, the changes most likely to be functional. We have less ability to detect functional structural variants, small indels, and SNPs, which are produced by genetic engineering but are more difficult to distinguish from natural variation. Future work will focus on improved methods for detecting synonymous recoding (used to introduce watermarks and for compatibility with synthesis and assembly methods), for using long-read sequence data, and for distinguishing engineered sequence from natural variation.

JScholarship (Johns Hopkins Univ.)

Recommended from our members

Dissociation Time, Quantum Yield, and Dynamic Reaction Pathways in the Thermolysis of trans-3,4-Dimethyl-1,2-dioxetane

Author: Leszczynski Jerzy
Prezhdo Oleg
Shu Yinan
Wang Yuchen
Zhou Jian-Ge
Publication date: 12/09/2024

The thermolysis of trans-3,4-dimethyl-1,2-dioxetane is studied by trajectory surface hopping. The significant difference between long and short dissociation times is rationalized by frustrated dissociations and the time spent in triplet states. If the C–C bond breaks through an excited state channel, then the trajectory passes over a ridge of the potential energy surface of that state. The calculated triplet quantum yields match the experimental results. The dissociation half-times and quantum yields follow the same ascending order across the product states, supporting the conjecture, proposed in the context of the methylation effect, that a longer dissociation time leads to a higher quantum yield. The populations of the molecular Coulomb Hamiltonian and diagonal states reach equilibrium, but the triplet populations with different Sz components fluctuate indefinitely. Certain initial velocities, leading the trajectories to given product states, can be identified as the most characteristic features for sorting trajectories according to their product states.

Knowledge UChicago

Load carrying capability of regional electricity-heat energy systems: Definitions, characteristics, and optimal value evaluation

Author: Cao Yuchen
Ge Shaoyun
Gu Chenghong
He Xingtang
Liu Hong
Xu Zhengyang
Publication venue: 'Elsevier BV'
Publication date: 15/03/2022

Evaluating the load carrying capability of regional electricity-heat energy systems is of great significance to their planning and construction. Existing methods evaluate energy supply capability without considering the differences in load characteristics among users. In addition, the impact of integrated demand response is not fully considered. To address these problems, this paper builds a load carrying capability interval model, which uses reliability as a security constraint and considers integrated demand response. An evaluation method for the optimal load carrying capability considering uncertainties of load growth is proposed. First, this paper defines energy supply capability, available capacity, and load carrying capability. Interval models are built to display these indices visually. Their characteristics are studied and the factors that affect the interval boundaries are analyzed. Second, a two-layer optimization model for the evaluation of optimal load carrying capability is constructed, considering the uncertainties of load growth. The upper-layer model aims at optimizing the sum of load carrying capability benefit, integrated demand response cost, and load curtailment penalty. The lower-layer model maximizes energy supply capability. Thereafter, the lower-layer model is linearized based on piecewise linearization and the least square method, which greatly enhances computational efficiency. In the case study, a real regional electricity-heat energy system is used to validate the proposed model and method.

OPUS

pTSE: A Multi-model Ensemble Method for Probabilistic Time Series Forecasting

Author: Chu Zhixuan
Huang Yuchen
Jin Ge
Li Sheng
Ruan Yijia
Zhou Yunyi
Publication date: 16/05/2023

Various probabilistic time series forecasting models have sprung up and shown remarkably good performance. However, the choice of model relies heavily on the characteristics of the input time series and the fixed distribution that the model is based on. Because probability distributions cannot be averaged over different models straightforwardly, current time series model ensemble methods cannot be directly applied to improve the robustness and accuracy of forecasting. To address this issue, we propose pTSE, a multi-model distribution ensemble method for probabilistic forecasting based on the Hidden Markov Model (HMM). pTSE only takes off-the-shelf outputs from member models without requiring further information about each model. In addition, we provide a complete theoretical analysis of pTSE to prove that the empirical distribution of time series subject to an HMM will converge to the stationary distribution almost surely. Experiments on benchmarks show the superiority of pTSE over all member models and competitive ensemble methods. Comment: The 32nd International Joint Conference on Artificial Intelligence (IJCAI 2023)

arXiv.org e-Print Archive

A Survey of Source Code Search: A 3-Dimensional Perspective

Author: Chen Yuchen
Chen Zhenyu
Fang Chunrong
Ge Xiuting
Ge Yifei
Hu Yuling
Liu Yang
Sun Weisong
Zhang Quanjun
Publication date: 13/11/2023

(Source) code search is of wide interest to software engineering researchers because it can improve the productivity and quality of software development. Given a functionality requirement, usually described in a natural language sentence, a code search system can retrieve code snippets that satisfy the requirement from a large-scale code corpus, e.g., GitHub. To realize effective and efficient code search, many techniques have been proposed. These techniques improve code search performance mainly by optimizing three core components: the query understanding component, the code understanding component, and the query-code matching component. In this paper, we provide a 3-dimensional perspective survey for code search. Specifically, we categorize existing code search studies into query-end optimization techniques, code-end optimization techniques, and match-end optimization techniques according to the specific components they optimize. Considering that each end can be optimized independently and contributes to the code search performance, we treat each end as a dimension. Therefore, this survey is 3-dimensional in nature, and it provides a comprehensive summary of each dimension in detail. To understand the research trends of the three dimensions in existing code search studies, we systematically review 68 relevant studies. Unlike existing code search surveys, which focus only on the query end or the code end or introduce various aspects shallowly (including codebase, evaluation metrics, modeling technique, etc.), our survey provides a more nuanced analysis and review of the evolution and development of the underlying techniques used in the three ends. Based on a systematic review and summary of existing work, we outline several open challenges and opportunities at the three ends that remain to be addressed in future work. Comment: submitted to ACM Transactions on Software Engineering and Methodology

arXiv.org e-Print Archive

OpenCodeInterpreter: Integrating Code Generation with Execution and Refinement

Author: Chen Wenhu
Fu Jie
Lin Bill Yuchen
Liu Xueling
Shen Tianhao
Yue Xiang
Zhang Ge
Zheng Tianyu
Publication date: 27/02/2024

The introduction of large language models has significantly advanced code generation. However, open-source models often lack the execution capabilities and iterative refinement of advanced systems like the GPT-4 Code Interpreter. To address this, we introduce OpenCodeInterpreter, a family of open-source code systems designed for generating, executing, and iteratively refining code. Supported by Code-Feedback, a dataset featuring 68K multi-turn interactions, OpenCodeInterpreter integrates execution and human feedback for dynamic code refinement. Our comprehensive evaluation of OpenCodeInterpreter across key benchmarks such as HumanEval, MBPP, and their enhanced versions from EvalPlus reveals its exceptional performance. Notably, OpenCodeInterpreter-33B achieves an accuracy of 83.2 (76.4) on the average (and plus versions) of HumanEval and MBPP, closely rivaling GPT-4's 84.2 (76.2), and rises further to 91.6 (84.6) with synthesized human feedback from GPT-4. OpenCodeInterpreter narrows the gap between open-source code generation models and proprietary systems like GPT-4 Code Interpreter.

arXiv.org e-Print Archive

Learning Generalizable Feature Fields for Mobile Manipulation

Author: Atanasov Nikolay
Fu Yang
Hu Yafei
Mu Jiteng
Qiu Ri-Zhao
Scherer Sebastian
Song Yuchen
Wang Xiaolong
Yang Ge
Yang Ruihan
Ye Jianglong
Publication date: 12/03/2024

An open problem in mobile manipulation is how to represent objects and scenes in a unified manner, so that robots can use the representation both for navigating in the environment and for manipulating objects. The latter requires capturing intricate geometry while understanding fine-grained semantics, whereas the former involves capturing the complexity inherent to an expansive physical scale. In this work, we present GeFF (Generalizable Feature Fields), a scene-level generalizable neural feature field that acts as a unified representation for both navigation and manipulation and runs in real time. To do so, we treat generative novel view synthesis as a pre-training task, and then align the resulting rich scene priors with natural language via CLIP feature distillation. We demonstrate the effectiveness of this approach by deploying GeFF on a quadrupedal robot equipped with a manipulator. We evaluate GeFF's ability to generalize to open-set objects, as well as its running time, when performing open-vocabulary mobile manipulation in dynamic scenes. Comment: Preprint. Project website is at: https://geff-b1.github.io

arXiv.org e-Print Archive

A versatile route to fabricate single atom catalysts with high chemoselectivity

Author: Chen Hongyu
Deng Yuchen
Ge Binghui
He Qian
He Xiaohui
Ji Hongbing
Ma Ding
Peng Mi
Xiao Dequan
Yao Siyu
Zhang Mengtao
Zhang Ying
Publication venue: Digital Commons @ New Haven
Publication date: 01/01/2019

Preparation of single atom catalysts (SACs) is of broad interest to materials scientists and chemists but remains a formidable challenge. Herein, we develop an efficient approach to synthesize SACs via a precursor-dilution strategy, in which metalloporphyrins (MTPP) with target metals are co-polymerized with diluents (tetraphenylporphyrin, TPP), followed by pyrolysis to N-doped porous carbon supported SACs (M1/N-C). Twenty-four different SACs, including noble metals and non-noble metals, are successfully prepared. In addition, the synthesis of a series of catalysts with different surface atom densities, bi-metallic sites, and metal aggregation states is achieved. This approach shows remarkable adjustability and generality, providing sufficient freedom to design catalysts at the atomic scale and explore the unique catalytic properties of SACs. As an example, we show that the prepared Pt1/N-C exhibits superior chemoselectivity and regioselectivity in hydrogenation. It converts only terminal alkynes to alkenes while keeping other reducible functional groups, such as alkenyl groups, nitro groups, and even internal alkynes, intact.

Digital Commons @ New Haven