87 research outputs found

    DETECTING GENETIC ENGINEERING WITH A KNOWLEDGE-RICH DNA SEQUENCE CLASSIFIER

    Detecting evidence of genetic engineering in the wild is a problem of growing importance for biosecurity, provenance, and intellectual property rights. This thesis describes a computational system designed to detect engineering from DNA sequencing of biological samples and presents its performance on fully blinded test data. The pipeline builds on existing computational resources for metagenomics, including methods that use the full set of reference genomes deposited in GenBank. Starting from raw reads generated by short-read sequencers, the dominant host species are identified by k-mer analysis. Next, all sequencing reads are mapped to the imputed host strain; reads that do not map are retained as suspicious. Suspicious reads are assembled de novo into suspicious contigs, which are then aligned against the NCBI non-redundant nucleotide database to annotate the engineered sequence and to determine whether the engineering is in a plasmid or integrated into the host genome. Our initial system, applied to blinded samples, provides excellent identification of foreign gene content, the changes most likely to be functional. We have less ability to detect functional structural variants, small indels, and SNPs that are produced by genetic engineering but are more difficult to distinguish from natural variation. Future work will focus on improved methods for detecting synonymous recoding (used to introduce watermarks and for compatibility with synthesis and assembly methods), for using long-read sequence data, and for distinguishing engineered sequence from natural variation.
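
    The "retain reads that do not map to the host" step can be illustrated with a toy sketch. This is not the thesis's pipeline: it swaps real read mapping for a simple k-mer overlap test, and the function names, the value of k, and the threshold `min_shared` are all assumptions chosen for illustration.

```python
def kmers(seq, k):
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def flag_suspicious(reads, host_reference, k=8, min_shared=0.5):
    """Return reads that share too few k-mers with the host reference.

    Toy stand-in for read mapping (an assumption, not the thesis's method):
    a read is kept as suspicious when fewer than `min_shared` of its
    k-mers occur anywhere in `host_reference`.
    """
    host_kmers = kmers(host_reference, k)
    suspicious = []
    for read in reads:
        read_kmers = kmers(read, k)
        if not read_kmers:
            continue  # read is shorter than k, so it cannot be scored
        shared = len(read_kmers & host_kmers) / len(read_kmers)
        if shared < min_shared:
            suspicious.append(read)
    return suspicious
```

    In a real workflow the flagged reads would then go to a de novo assembler and a database search, as the abstract describes; the sketch stops at the filtering step.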

    Load carrying capability of regional electricity-heat energy systems: Definitions, characteristics, and optimal value evaluation

    Evaluating the load carrying capability of regional electricity-heat energy systems is of great significance to their planning and construction. Existing methods evaluate energy supply capability without considering how load characteristics differ between users. In addition, the impact of integrated demand response is not fully considered. To address these problems, this paper builds a load carrying capability interval model, which uses reliability as a security constraint and considers integrated demand response. An evaluation method for the optimal load carrying capability that accounts for uncertainties in load growth is proposed. First, this paper defines energy supply capability, available capacity, and load carrying capability. Interval models are built to visualize these indices. Their characteristics are studied, and the factors that affect the interval boundaries are analyzed. Second, a two-layer optimization model for evaluating the optimal load carrying capability is constructed, considering the uncertainties of load growth. The upper-layer model optimizes the sum of load carrying capability benefit, integrated demand response cost, and load curtailment penalty. The lower-layer model maximizes energy supply capability. The lower-layer model is then linearized using piecewise linearization and the least squares method, which greatly improves computational efficiency. In the case study, a real regional electricity-heat energy system is used to validate the proposed model and method.

    pTSE: A Multi-model Ensemble Method for Probabilistic Time Series Forecasting

    Various probabilistic time series forecasting models have emerged and shown remarkably good performance. However, the choice of model depends heavily on the characteristics of the input time series and on the fixed distribution that the model is based on. Because probability distributions cannot be averaged straightforwardly across different models, current time series model ensemble methods cannot be directly applied to improve the robustness and accuracy of forecasting. To address this issue, we propose pTSE, a multi-model distribution ensemble method for probabilistic forecasting based on a Hidden Markov Model (HMM). pTSE takes only off-the-shelf outputs from member models without requiring further information about each model. In addition, we provide a complete theoretical analysis of pTSE to prove that the empirical distribution of time series subject to an HMM will converge to the stationary distribution almost surely. Experiments on benchmarks show the superiority of pTSE over all member models and competitive ensemble methods. Comment: The 32nd International Joint Conference on Artificial Intelligence (IJCAI 2023)

    A Survey of Source Code Search: A 3-Dimensional Perspective

    (Source) code search has attracted wide attention from software engineering researchers because it can improve the productivity and quality of software development. Given a functionality requirement, usually described in a natural language sentence, a code search system can retrieve code snippets that satisfy the requirement from a large-scale code corpus, e.g., GitHub. To realize effective and efficient code search, many techniques have been proposed. These techniques improve code search performance mainly by optimizing three core components: the query understanding component, the code understanding component, and the query-code matching component. In this paper, we provide a 3-dimensional perspective survey for code search. Specifically, we categorize existing code search studies into query-end optimization techniques, code-end optimization techniques, and match-end optimization techniques according to the specific components they optimize. Considering that each end can be optimized independently and contributes to the code search performance, we treat each end as a dimension. Therefore, this survey is 3-dimensional in nature, and it provides a comprehensive summary of each dimension in detail. To understand the research trends of the three dimensions in existing code search studies, we systematically review 68 relevant papers. Unlike existing code search surveys, which focus only on the query end or code end or cover various aspects shallowly (including codebase, evaluation metrics, modeling technique, etc.), our survey provides a more nuanced analysis and review of the evolution and development of the underlying techniques used in the three ends. Based on a systematic review and summary of existing work, we outline several open challenges and opportunities at the three ends that remain to be addressed in future work. Comment: submitted to ACM Transactions on Software Engineering and Methodology

    OpenCodeInterpreter: Integrating Code Generation with Execution and Refinement

    The introduction of large language models has significantly advanced code generation. However, open-source models often lack the execution capabilities and iterative refinement of advanced systems like the GPT-4 Code Interpreter. To address this, we introduce OpenCodeInterpreter, a family of open-source code systems designed for generating, executing, and iteratively refining code. Supported by Code-Feedback, a dataset featuring 68K multi-turn interactions, OpenCodeInterpreter integrates execution and human feedback for dynamic code refinement. Our comprehensive evaluation of OpenCodeInterpreter across key benchmarks such as HumanEval, MBPP, and their enhanced versions from EvalPlus reveals its exceptional performance. Notably, OpenCodeInterpreter-33B achieves an accuracy of 83.2 (76.4) on the average (and plus versions) of HumanEval and MBPP, closely rivaling GPT-4's 84.2 (76.2), and further rises to 91.6 (84.6) with synthesized human feedback from GPT-4. OpenCodeInterpreter narrows the gap between open-source code generation models and proprietary systems like GPT-4 Code Interpreter.

    Learning Generalizable Feature Fields for Mobile Manipulation

    An open problem in mobile manipulation is how to represent objects and scenes in a unified manner, so that robots can use the representation both for navigating in the environment and for manipulating objects. The latter requires capturing intricate geometry while understanding fine-grained semantics, whereas the former involves capturing the complexity inherent to an expansive physical scale. In this work, we present GeFF (Generalizable Feature Fields), a scene-level generalizable neural feature field that acts as a unified representation for both navigation and manipulation and runs in real time. To do so, we treat generative novel view synthesis as a pre-training task, and then align the resulting rich scene priors with natural language via CLIP feature distillation. We demonstrate the effectiveness of this approach by deploying GeFF on a quadrupedal robot equipped with a manipulator. We evaluate GeFF's ability to generalize to open-set objects, as well as its running time, when performing open-vocabulary mobile manipulation in dynamic scenes. Comment: Preprint. Project website is at: https://geff-b1.github.io

    A versatile route to fabricate single atom catalysts with high chemoselectivity

    Preparation of single atom catalysts (SACs) is of broad interest to materials scientists and chemists but remains a formidable challenge. Herein, we develop an efficient approach to synthesize SACs via a precursor-dilution strategy, in which metalloporphyrins (MTPP) bearing the target metals are co-polymerized with diluents (tetraphenylporphyrin, TPP), followed by pyrolysis to N-doped porous carbon supported SACs (M1/N-C). Twenty-four different SACs, including noble metals and non-noble metals, are successfully prepared. In addition, the synthesis of a series of catalysts with different surface atom densities, bi-metallic sites, and metal aggregation states is achieved. This approach shows remarkable adjustability and generality, providing sufficient freedom to design catalysts at the atomic scale and explore the unique catalytic properties of SACs. As an example, we show that the prepared Pt1/N-C exhibits superior chemoselectivity and regioselectivity in hydrogenation. It converts only terminal alkynes to alkenes while keeping other reducible functional groups, such as alkenyl groups, nitro groups, and even internal alkynes, intact.