30 research outputs found

    Towards Code Watermarking with Dual-Channel Transformations

    The expansion of the open source community and the rise of large language models have raised ethical and security concerns about the distribution of source code, such as misuse of copyrighted code, distribution without proper licenses, or use of the code for malicious purposes. Hence it is important to track the ownership of source code, for which watermarking is a major technique. Yet, drastically different from natural languages, source code watermarking requires far stricter and more complicated rules to ensure the readability as well as the functionality of the source code. Hence we introduce SrcMarker, a watermarking system to unobtrusively encode ID bitstrings into source code, without affecting the usage and semantics of the code. To this end, SrcMarker performs transformations on an AST-based intermediate representation that enables unified transformations across different programming languages. The core of the system utilizes learning-based embedding and extraction modules to select rule-based transformations for watermarking. In addition, a novel feature-approximation technique is designed to tackle the inherent non-differentiability of rule selection, thus seamlessly integrating the rule-based transformations and learning-based networks into an interconnected system to enable end-to-end training. Extensive experiments demonstrate the superiority of SrcMarker over existing methods in various watermarking requirements. Comment: 16 pages

    StaPep: an open-source tool for the structure prediction and feature extraction of hydrocarbon-stapled peptides

    Many tools exist for extracting structural and physicochemical descriptors from linear peptides to predict their properties, but similar tools for hydrocarbon-stapled peptides are lacking. Here, we present StaPep, a Python-based toolkit designed for generating 2D/3D structures and calculating 21 distinct features for hydrocarbon-stapled peptides. The current version supports hydrocarbon-stapled peptides containing 2 non-standard amino acids (norleucine and 2-aminoisobutyric acid) and 6 nonnatural anchoring residues (S3, S5, S8, R3, R5 and R8). We then established a hand-curated dataset of 201 hydrocarbon-stapled peptides and 384 linear peptides with sequence information and experimental membrane permeability, to showcase StaPep's application in artificial intelligence projects. A machine learning-based predictor using the features calculated above was developed, with an AUC of 0.85, for identifying cell-penetrating hydrocarbon-stapled peptides. StaPep's pipeline spans data retrieval, cleaning, structure generation, molecular feature calculation, and machine learning model construction for hydrocarbon-stapled peptides. The source code and dataset are freely available on GitHub: https://github.com/dahuilangda/stapep_package. Comment: 26 pages, 6 figures

    A New Method of RNA Secondary Structure Prediction Based on Convolutional Neural Network and Dynamic Programming

    In recent years, obtaining RNA secondary structure information has played an important role in RNA and gene function research. Although some RNA secondary structures can be obtained experimentally, in most cases efficient and accurate computational methods are still needed to predict RNA secondary structure. Current RNA secondary structure prediction methods are mainly based on the minimum free energy algorithm, which finds the optimal folding state of RNA in vivo using an iterative method to meet the minimum energy or other constraints. However, due to the complexity of the biotic environment, a true RNA structure always keeps the balance of biological potential energy status, rather than the optimal folding status that meets the minimum energy. For short RNA sequences, the equilibrium energy status of the folded RNA in the organism is close to the minimum free energy status; therefore, the minimum free energy algorithm predicts their secondary structure with higher accuracy. Nevertheless, in longer RNA sequences, continued folding causes the biopotential energy balance to deviate far from the minimum free energy status. This deviation stems from the complex structure and results in a serious decline in the prediction accuracy of the secondary structure. In this paper, we propose a novel RNA secondary structure prediction algorithm using a convolutional neural network model combined with a dynamic programming method to improve the accuracy with large-scale RNA sequence and structure data. We analyze current experimental RNA sequences and structure data to construct a deep convolutional network model, and then we extract implicit features of an effective classification from large-scale data to predict the pairing probability of each base in an RNA sequence. For the obtained probabilities of RNA sequence base pairing, an enhanced dynamic programming method is applied to obtain the optimal RNA secondary structure. Results indicate that our proposed method is superior to the common RNA secondary structure prediction algorithms in predicting three benchmark RNA families. Based on the characteristics of deep learning algorithms, it can be inferred that the method proposed in this paper has a 30% higher prediction success rate when compared with other algorithms, which will be needed as the amount of real RNA structure data increases in the future.
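The final step described in this abstract, turning base-pairing probabilities into a single structure with dynamic programming, can be illustrated with a minimal sketch. The abstract does not specify the "enhanced" method, so the code below is an assumption: a plain Nussinov-style recursion that maximizes the sum of pairing probabilities without pseudoknots. The names `fold`, `to_dot_bracket`, and `min_loop` are hypothetical, and the probability matrix stands in for the network output.

```python
def fold(probs, min_loop=3):
    """Return (total_score, pairs) maximizing the summed pairing probability.

    probs[i][j] (i < j) is an assumed pairing probability for bases i and j.
    min_loop is the minimum number of unpaired bases enclosed by a pair.
    This is a generic Nussinov-style recursion, not the paper's method.
    """
    n = len(probs)
    score = [[0.0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            best = max(score[i + 1][j], score[i][j - 1])         # i or j unpaired
            best = max(best, score[i + 1][j - 1] + probs[i][j])  # i pairs with j
            for k in range(i + 1, j):                            # split into two parts
                best = max(best, score[i][k] + score[k + 1][j])
            score[i][j] = best

    # Traceback: replay the same choices to recover the selected pairs.
    pairs, stack = [], [(0, n - 1)]
    while stack:
        i, j = stack.pop()
        if j - i <= min_loop:
            continue
        if score[i][j] == score[i + 1][j]:
            stack.append((i + 1, j))
        elif score[i][j] == score[i][j - 1]:
            stack.append((i, j - 1))
        elif score[i][j] == score[i + 1][j - 1] + probs[i][j]:
            pairs.append((i, j))
            stack.append((i + 1, j - 1))
        else:
            for k in range(i + 1, j):
                if score[i][j] == score[i][k] + score[k + 1][j]:
                    stack.append((i, k))
                    stack.append((k + 1, j))
                    break
    total = score[0][n - 1] if n else 0.0
    return total, sorted(pairs)


def to_dot_bracket(n, pairs):
    """Render the pairs as a dot-bracket string of length n."""
    out = ["."] * n
    for i, j in pairs:
        out[i], out[j] = "(", ")"
    return "".join(out)
```

Given a toy 8-base matrix with high probabilities at (0, 7) and (1, 6), `fold` selects those two nested pairs and `to_dot_bracket` renders them as a hairpin. A realistic pipeline would replace the toy matrix with predicted probabilities and would likely add further constraints.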

    Theoretical and technological system for Highly efficient development of deep coalbed methane in the Eastern edge of Erdos Basin

    Aiming at the development problems of deep coal reservoirs, such as deep burial, low permeability and complex stress field, this paper clarifies that the resource enrichment conditions, effective fracturing volume, effective horizontal section length, and good reservoir conditions are the key factors for high productivity on the basis of summarizing the exploration and development practice of the Daning-Jixian Block in the past five years. Under the guidance of the theory of “artificial gas reservoir” development, a technical system for the efficient development of deep coalbed methane was preliminarily established as follows. ① According to the reservoir resource conditions, structural preservation conditions and engineering fracturing conditions, a total of 11 indicators in three categories established the geological-engineering “sweet spot” evaluation standards of deep coalbed methane. ② Based on techniques such as microstructural characterization, multi-scale fracture prediction, and 3D geological model construction, the quantitative and visual characterization of all elements of “geology + engineering” of deep coal seams was achieved. ③ Based on the guiding idea of “geological small scale, three-dimensional seismic microscale, drill along the target, less adjustment and fast drilling”, a three-stage geological-engineering geo-steering technology with an excellent design of pre-drilling trajectory, precise target entry and post-target fine-tuning as the core was developed. ④ The optimization design of the five-in-one well network based on “in-situ stress field, natural fracture field, artificial fracture field, well type and orientation, well network and well spacing” realized the maximization of resource utilization and the maximization of gas field recovery. ⑤ According to the occurrence characteristics, seepage mechanism and production characteristics of deep coalbed methane, a reasonable production capacity evaluation and EUR prediction technology based on the rate-transient analysis method, the empirical production decline method, the numerical simulation method, and the empirical analogy method was formed. ⑥ Following the design principle of “four-in-one” precision fracturing section and “fracture staggering + differentiation between fracturing segments”, a large-scale volumetric fracturing technology aimed at constructing artificial gas reservoirs was proposed. ⑦ According to the characteristics of gas-water variation in gas wells, the optimal control technology of drainage and production in different production stages through the whole life cycle of wells was put forward. ⑧ Combined with the current progress of AI technology, and the characteristics of deep coalbed methane development law, gathering and transportation, the technology of gathering, transportation and digital intelligence integrating geological, engineering, and ground aspects was explored. Under the guidance of this achievement, 29 horizontal wells have been put into production with an initial production of 5×10⁴ to 16×10⁴ m³/d, an average of 10.2×10⁴ m³/d, and the daily gas production of the block has exceeded 3 million cubic meters, which has important guiding significance for accelerating the large-scale production of deep coalbed methane in the eastern margin of the Ordos Basin. The study also establishes a reference and standard for the efficient development of similar resources.

    Genome-wide identification of resistance genes and response mechanism analysis of key gene knockout strain to catechol in Saccharomyces cerevisiae

    Engineering Saccharomyces cerevisiae for biodegradation and transformation of industrial toxic substances such as catechol (CA) has received widespread attention, but the low tolerance of S. cerevisiae to CA has limited its development. The exploration and modification of genes or pathways related to CA tolerance in S. cerevisiae is an effective way to further improve the utilization efficiency of CA. This study identified 36 genes associated with CA tolerance in S. cerevisiae through genome-wide identification and bioinformatics analysis, and found that the ERG6 knockout strain (ERG6Δ) was the most sensitive to CA. Based on the omics analysis of ERG6Δ under CA stress, it was found that ERG6 knockout affects pathways such as intrinsic component of membrane and pentose phosphate pathway. In addition, the study revealed that, in ERG6Δ, 29 genes related to the cell wall-membrane system were up-regulated more than twofold, NADPH and NADP+ were increased by 2.48 and 4.41 times, respectively, and spermidine and spermine were increased by 2.85 and 2.14 times, respectively. Overall, the response of the cell wall-membrane system, the accumulation of spermidine and NADPH, and the increased levels of metabolites in the pentose phosphate pathway are important findings for improving CA resistance. This study provides a theoretical basis for improving the tolerance of strains to CA and reducing the damage caused by CA to the ecological environment and human health.

    The response mechanism analysis of HMX1 knockout strain to levulinic acid in Saccharomyces cerevisiae

    Levulinic acid, a hydrolysis product of lignocellulose, can be metabolized into important compounds in the field of medicine and pesticides by engineered strains of Saccharomyces cerevisiae. Levulinic acid, as an intermediate product widely found in the conversion process of lignocellulosic biomass, has multiple applications. However, its toxicity to Saccharomyces cerevisiae reduces its conversion efficiency, so screening for Saccharomyces cerevisiae genes involved in levulinic acid tolerance becomes the key. By creating a whole-genome knockout library and applying bioinformatics analysis, this study used the phenotypic characteristics of cells as the basis for screening and found that knockout of the HMX1 gene, which belongs to the oxidative stress pathway, makes the strain highly sensitive to levulinic acid. After knocking out HMX1 and treating with levulinic acid, the omics data of the strain revealed multiple affected pathways; in particular, the expression of 14 genes related to the cell wall and membrane system was significantly downregulated. The levels of acetyl-CoA and riboflavin decreased by 1.02-fold and 1.44-fold, respectively, while the content of pantothenic acid increased. These findings indicate that the cell wall-membrane system, as well as the metabolism of acetyl-CoA and riboflavin, are important in improving the resistance of Saccharomyces cerevisiae to levulinic acid. They provide theoretical support for enhancing the tolerance of microorganisms to levulinic acid, which is significant for optimizing the conversion process of lignocellulosic biomass to levulinic acid.

    A Data-Efficient Building Electricity Load Forecasting Method Based on Maximum Mean Discrepancy and Improved TrAdaBoost Algorithm

    Building electricity load forecasting plays an important role in building energy management, peak demand and power grid security. In the past two decades, a large number of data-driven models have been applied to building and larger-scale energy consumption predictions. Although these models have been successful in specific cases, their performance is greatly affected by the quantity and quality of the building data. Moreover, for older buildings with sparse data, or new buildings with no historical data, accurate predictions are difficult to achieve. To address this data silo problem, caused by insufficient data collection in building energy consumption prediction, this study proposes a building electricity load forecasting method based on a similarity judgement and an improved TrAdaBoost algorithm (iTrAdaBoost). The Maximum Mean Discrepancy (MMD) is used to search public datasets for building samples similar to the target building. Unlike general Boosting algorithms, the proposed iTrAdaBoost algorithm iteratively updates the weights of the similar building samples and combines them with the target building samples to improve prediction accuracy. A case study of an educational building is carried out in this paper. The results show that even when the target and source samples belong to different domains, i.e., the geographical location and meteorological condition of the buildings are different, the proposed MMD-iTrAdaBoost method has a better prediction accuracy in the transfer learning process than the BP or traditional AdaBoost models. In addition, compared with other advanced deep learning models, the proposed method has a simple structure and is easy to implement in engineering practice.
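The similarity search described in this abstract relies on the Maximum Mean Discrepancy. As a minimal sketch of how such a ranking could work, the code below computes the standard biased estimate of squared MMD between two sample sets and sorts candidate source buildings by it. The abstract does not state the kernel, the estimator, or the features used, so the RBF kernel, the `gamma` bandwidth, and the names `rbf_kernel`, `mmd2`, and `rank_sources` are all assumptions made for illustration.

```python
import numpy as np


def rbf_kernel(a, b, gamma):
    """Gaussian (RBF) kernel matrix between the rows of a and the rows of b."""
    sq = (a ** 2).sum(axis=1)[:, None] + (b ** 2).sum(axis=1)[None, :] - 2.0 * a @ b.T
    return np.exp(-gamma * np.maximum(sq, 0.0))  # clip tiny negative round-off


def mmd2(x, y, gamma=1.0):
    """Biased estimate of squared MMD between sample sets x and y.

    MMD^2 = mean k(x, x') + mean k(y, y') - 2 * mean k(x, y)
    """
    return (rbf_kernel(x, x, gamma).mean()
            + rbf_kernel(y, y, gamma).mean()
            - 2.0 * rbf_kernel(x, y, gamma).mean())


def rank_sources(target, sources, gamma=1.0):
    """Return source names ordered from most to least similar to the target.

    target: array of shape (n_samples, n_features) for the target building.
    sources: dict mapping a (hypothetical) building name to its sample array.
    """
    return sorted(sources, key=lambda name: mmd2(target, sources[name], gamma))
```

A smaller value means the two sample distributions are closer under the chosen kernel, so the first names returned by `rank_sources` would be the candidates to pass on to the boosting stage. The bandwidth matters in practice; a common heuristic is to set it from the median pairwise distance.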

    Psychosocial profiles of physical activity fluctuation in office employees: A latent profile analysis.

    OBJECTIVES: Fluctuation is a common but neglected phenomenon of physical activity (PA) behavior. This study aimed to explore the psychosocial profiles of PA fluctuation in office employees, and to examine the association of latent profiles with demographics and PA level. METHOD: 434 Chinese office employees who were identified as PA fluctuators (M = 32.4 years, SD = 6.9, 55.5% female) completed a cross-sectional online survey covering demographics, PA behavior, and six psychosocial indicators (self-efficacy, planning, action control, affective attitude, social support, and perceived barriers). Latent profile analysis was used to determine PA fluctuators' psychosocial profiles. Associated factors of profile membership were identified with multinomial logistic regression. RESULTS: The two-profile model (uncommitted vs. moderately committed) was selected as the best solution. The moderately committed group (n = 346, 79.7%) possessed a more active mindset by reporting significantly higher scores of self-efficacy (t = 9.42, p < .001), planning (t = 16.33, p < .001), action control (t = 14.55, p < .001), affective attitude (t = 13.33, p < .001), and social support (t = 11.50, p < .001) compared with the uncommitted group (n = 88, 20.3%). Results from a multinomial logistic regression showed that the moderately committed profile was associated with normal weight status (OR = 2.00, p < .05), having a medium managerial position (OR = 2.54, p < .01), and high level of moderate to vigorous PA behavior (OR = 4.85, p < .001). CONCLUSIONS: These findings demonstrate the variability of PA fluctuators' mindsets. Future tailored interventions are recommended to promote PA behavior for this population based on the categorization from the present study.

    Multipath target discrimination in human monitoring based on MIMO mmWave radar

    To address the interference caused by multipath ghosts, this paper presents a multipath target discrimination method based on separated Gaussian similarity matrices. The proposed method utilizes separated Gaussian similarity functions based on the Maximum Likelihood principle to quantify the similarity from distance and angle perspectives. Meanwhile, dimensionality reduction is applied to maximize the distinction between multipath ghosts and actual targets. This method can effectively discriminate the multipath relationships and attributes of detected targets and has higher recognition accuracy than other geometry‐based methods. The effectiveness and superiority of the proposed method were validated through Monte‐Carlo and practical experiments.