    Prediction of TF-binding site by inclusion of higher order position dependencies

    Most proposed methods for TF-binding site (TFBS) prediction use only low-order dependencies because efficient methods to extract higher-order dependencies are lacking. In this work, we first propose a novel method to extract higher-order dependencies by applying a CNN to histone modification features. We then propose a novel TFBS prediction method, referred to as CNN_TF, that incorporates both low-order and higher-order dependencies. CNN_TF is first evaluated on 13 TFs in mES cells. Results show that using higher-order dependencies significantly outperforms using low-order dependencies on 11 TFs, indicating that higher-order dependencies are indeed more effective for TFBS prediction. Further experiments show that using both low-order and higher-order dependencies improves performance significantly on 12 TFs, indicating that the two dependency types are complementary. To evaluate the influence of cell type on prediction performance, CNN_TF was applied to five TFs in five human cell types. Although low-order and higher-order dependencies contribute differently in different cell types, they are always complementary in prediction. CNN_TF outperforms several state-of-the-art methods by at least 5.3% in AUPR.

    Multi-task learning with mutual learning for joint sentiment classification and topic detection

    Recently, neural network approaches have achieved many successes in both sentiment classification and probabilistic topic modelling. On the one hand, latent topics derived from the global context of documents could help capture more accurate word semantics and hence potentially improve sentiment classification accuracy. On the other hand, the word-level attention vectors obtained while training sentiment classifiers could carry word-level polarity information and can be used to guide topic discovery in topic modelling. This paper proposes a multi-task learning framework that jointly learns a sentiment classifier and a topic model by encouraging, through mutual learning, the word-level latent topic distributions in the topic model to be similar to the word-level attention vectors in the classifier. Experimental results on the Yelp and IMDB datasets verify the superior performance of the proposed framework over strong baselines in both sentiment classification accuracy and topic modelling evaluation measures, including perplexity and topic coherence. The proposed framework also extracts more interpretable topics than conventional topic models and neural topic models.
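The mutual-learning idea can be sketched as a penalty on the divergence between the two word-level distributions. This minimal version assumes that, for each word, both the topic model's latent topic distribution and the classifier's attention vector are normalized over the same K topics, and it uses a symmetric KL divergence averaged over the words of a document. The function names are hypothetical and the paper's exact loss may differ.

```python
import math
from typing import List

Distribution = List[float]  # probabilities over the same K topics


def kl_divergence(p: Distribution, q: Distribution, eps: float = 1e-12) -> float:
    """KL(p || q) for two discrete distributions over the same K outcomes.

    eps guards against log(0) when q assigns zero mass to a topic.
    """
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q) if pi > 0)


def mutual_learning_loss(topic_dists: List[Distribution],
                         attn_dists: List[Distribution]) -> float:
    """Hypothetical mutual-learning term for one document.

    topic_dists[w]: latent topic distribution of word w from the topic model.
    attn_dists[w]:  attention weights of word w from the classifier, normalized over K.
    Returns the symmetric KL divergence averaged over words.
    """
    if len(topic_dists) != len(attn_dists) or not topic_dists:
        raise ValueError("need one non-empty pair of distributions per word")
    total = 0.0
    for p, q in zip(topic_dists, attn_dists):
        total += kl_divergence(p, q) + kl_divergence(q, p)
    return total / len(topic_dists)
```

The loss is zero when the two distributions agree for every word and grows as they diverge. In typical mutual-learning setups each model treats the other's distribution as a fixed target during its own update, so the symmetric sum here is a simplification.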

    MTTFsite : cross-cell-type TF binding site prediction by using multi-task learning

    Motivation: The prediction of transcription factor binding sites (TFBSs) is crucial for gene expression analysis. Supervised learning approaches for TFBS prediction require large amounts of labeled data. However, many TFs in certain cell types have either insufficient labeled data or none at all. Results: In this paper, a multi-task learning framework (called MTTFsite) is proposed to address the lack of labeled data by leveraging labeled data available across cell types. MTTFsite contains a shared CNN that learns common features for all cell types and a private CNN for each cell type that learns cell-type-specific features. The common features are intended to help predict TFBSs for all cell types, especially those that lack labeled data. MTTFsite is evaluated on 241 cell-type/TF pairs and compared with a baseline method that uses no multi-task learning and with a fully shared multi-task model that uses only a shared CNN and no private CNNs. For cell types with insufficient labeled data, MTTFsite performs better than both the baseline method and the fully shared model on more than 89% of pairs. For cell types without any labeled data, MTTFsite outperforms the baseline method and the fully shared model on more than 80% and 93% of pairs, respectively. A novel gene expression prediction method (called TFChrome) using both MTTFsite and histone modification features is also presented. Results show that TFBSs predicted by MTTFsite alone can achieve good performance. When MTTFsite is combined with histone modification features, a significant performance improvement of 5.7% is obtained.
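The shared/private design can be illustrated with a minimal pure-Python forward pass. This sketch assumes a one-hot DNA input, a single convolutional layer with global max pooling for each CNN, and a logistic output layer per cell type. Filter counts, filter width, the cell-type names in the usage example and all function and class names are hypothetical. Training is omitted, so the weights here are random.

```python
import math
import random
from typing import Dict, List

Matrix = List[List[float]]  # one filter: width x 4 (A, C, G, T)


def one_hot(dna: str) -> Matrix:
    """Encode a DNA string as L x 4 one-hot rows."""
    idx = {"A": 0, "C": 1, "G": 2, "T": 3}
    return [[1.0 if idx[b] == c else 0.0 for c in range(4)] for b in dna]


def conv_maxpool(seq: Matrix, filters: List[Matrix]) -> List[float]:
    """1-D convolution with global max pooling and ReLU: one feature per filter."""
    feats = []
    for f in filters:
        w = len(f)
        best = -math.inf
        for start in range(len(seq) - w + 1):
            s = sum(f[k][c] * seq[start + k][c] for k in range(w) for c in range(4))
            best = max(best, s)
        feats.append(max(best, 0.0))  # ReLU commutes with max pooling
    return feats


def random_filters(n: int, width: int, rng: random.Random) -> List[Matrix]:
    return [[[rng.gauss(0, 0.1) for _ in range(4)] for _ in range(width)] for _ in range(n)]


class SharedPrivateModel:
    """Hypothetical sketch: one shared filter bank used by every cell type,
    plus a private filter bank and output layer for each cell type."""

    def __init__(self, cell_types: List[str], n_filters: int = 8, width: int = 6, seed: int = 0):
        rng = random.Random(seed)
        self.shared: List[Matrix] = random_filters(n_filters, width, rng)
        self.private: Dict[str, List[Matrix]] = {
            ct: random_filters(n_filters, width, rng) for ct in cell_types}
        self.out: Dict[str, List[float]] = {
            ct: [rng.gauss(0, 0.1) for _ in range(2 * n_filters)] for ct in cell_types}

    def predict(self, dna: str, cell_type: str) -> float:
        """Binding probability: concatenated shared + private features -> logistic output."""
        x = one_hot(dna)
        h = conv_maxpool(x, self.shared) + conv_maxpool(x, self.private[cell_type])
        z = sum(wi * hi for wi, hi in zip(self.out[cell_type], h))
        return 1.0 / (1.0 + math.exp(-z))
```

For example, `SharedPrivateModel(["K562", "GM12878"]).predict("ACGTACGTACGTACGTACGT", "K562")` returns a probability. Gradients from every cell type's data would update `shared`, which is what lets cell types with little labeled data benefit from the others.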

    Identifying DNA-binding proteins by combining support vector machine and PSSM distance transformation

    Background: DNA-binding proteins play a pivotal role in various intra- and extra-cellular activities ranging from DNA replication to gene expression control. Identification of DNA-binding proteins is one of the major challenges in the field of genome annotation. Several computational methods have been proposed in the literature for DNA-binding protein identification. However, most of them cannot provide a valuable knowledge base for understanding DNA-protein interactions. Results: We first present a new protein sequence encoding method called PSSM Distance Transformation, and then construct a DNA-binding protein identification method (SVM-PSSM-DT) by combining PSSM Distance Transformation with a support vector machine (SVM). First, PSSM profiles are generated by using the PSI-BLAST program to search the non-redundant (NR) database. Next, the PSSM profiles are transformed into uniform numeric representations by a distance transformation scheme. Finally, the resulting uniform numeric representations are fed into an SVM classifier, which predicts whether a sequence binds DNA. In a benchmark test on 525 DNA-binding and 550 non-DNA-binding proteins using jackknife validation, the model achieved an ACC of 79.96%, an MCC of 0.622 and an AUC of 86.50%. This performance is considerably better than that of most existing state-of-the-art predictive methods. When tested on the recently constructed independent dataset PDB186, SVM-PSSM-DT also achieved the best performance, with an ACC of 80.00%, an MCC of 0.647 and an AUC of 87.40%, outperforming some existing state-of-the-art methods. Conclusions: The experimental results demonstrate that PSSM Distance Transformation is a viable protein sequence encoding method and that SVM-PSSM-DT is a useful tool for identifying DNA-binding proteins.
A user-friendly web server for SVM-PSSM-DT is freely accessible to the public at http://bioinformatics.hitsz.edu.cn/PSSM-DT/
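The distance transformation step can be illustrated with a minimal sketch. It assumes a formulation in which, for each lag d up to a chosen maximum and each ordered pair of PSSM columns (i, j), the products S[p][i] * S[p+d][j] are averaged over all valid positions p. The abstract does not give the exact formula, so this is an assumption and not the authors' code. The PSSM is assumed to be already generated (for example by PSI-BLAST) and optionally normalized. The function name is hypothetical.

```python
from typing import List


def pssm_distance_transform(pssm: List[List[float]], max_lag: int) -> List[float]:
    """Sketch of a PSSM distance-transformation feature vector.

    pssm: L x n matrix (rows = sequence positions, columns = amino acids; n = 20
          for a standard PSSM).
    Returns n * n * max_lag values regardless of the sequence length L.
    Pairs with i == j measure how the same amino-acid score correlates with
    itself d positions away; pairs with i != j measure cross-residue terms.
    """
    L = len(pssm)
    if L <= max_lag:
        raise ValueError("sequence must be longer than max_lag")
    n = len(pssm[0])
    feats = []
    for d in range(1, max_lag + 1):
        for i in range(n):
            for j in range(n):
                s = sum(pssm[p][i] * pssm[p + d][j] for p in range(L - d))
                feats.append(s / (L - d))  # average over the L - d valid positions
    return feats
```

Because the output length depends only on `max_lag` and the number of columns, proteins of different lengths map to vectors of the same size, which is what makes them usable as SVM input. For a 20-column PSSM and `max_lag=2`, every protein yields 800 features.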

    New concept and design of electronically controlled cylinder lubrication system for large two-stroke marine diesel engines

    Cylinder lubrication between liners and piston rings is one of the crucial factors affecting the efficient operation of diesel engines. Marine diesel engines usually burn low-grade heavy fuel oil with a high sulphur content, and the acidic substances formed by fuel combustion must be neutralized by alkaline cylinder oil. In addition to fuel oil, cylinder oil accounts for a large share of a marine engine's operating cost. This article first analyses the advantages and disadvantages of existing cylinder lubrication systems with regard to oil injection control. Second, the control parameters and variables, such as oil injection pressure, timing, oil feed rate and reliability, are analysed, and the corresponding control schemes are formulated. Third, the control strategies are developed in detail. Finally, verification tests are carried out on an actual engine, and the results show that the control strategies developed in this article provide a stable, cost-effective solution for cylinder lubrication with reduced cylinder wear. A thin and uniform oil film is retained on the liner surface, with savings in cylinder oil consumption, lower particulate matter emissions and improved running conditions for the cylinder liner and piston rings. The experimental results show that the oil consumption could be reduced by up to 5

    Design and experimental development of a new electronically controlled cylinder lubrication system for the large two-stroke crosshead diesel engines

    Accurate, stable and reliable cylinder lubrication is very important to ensure the trouble-free operation of marine diesel engines. A new electronically controlled cylinder lubrication system has been developed to remedy the defects of the conventional mechanical lubrication system. The design method, composition and implementation of the new system are described. Sensitivity tests are conducted on a test bench, and verification tests are carried out on operating vessels. The main performance data are as follows: oil injection pressure of about 3.0 MPa, oil injection timing precision of 0.1 ms, and oil injection duration of 15 °CA or less. Oil is injected onto the piston ring pack to ensure good lubrication and neutralization, and the oil injection frequency is regulated according to engine load, the sulphur content of the fuel, the total base number of the cylinder oil, the running-in condition of the cylinder liner and other factors. As a result, the cylinder oil consumption rate falls by approximately 25% compared with the conventional mechanical lubrication system. As a retrofit on vessels in service, the lubrication system has been fitted to more than 120 main engines and has a payback period of less than 2 years.

    EnDNA-Prot: identification of DNA-binding proteins by applying ensemble learning

    DNA-binding proteins are crucial for various cellular processes, such as recognition of specific nucleotides, regulation of transcription, and regulation of gene expression. Developing an effective model for identifying DNA-binding proteins is an urgent research problem. Many methods have been proposed to date, but most of them rely on a single classifier and cannot make full use of the large number of negative samples to improve prediction performance. This study proposes a predictor called enDNA-Prot for DNA-binding protein identification that employs ensemble learning. Experimental results showed that enDNA-Prot was comparable with DNA-Prot and outperformed DNAbinder and iDNA-Prot, with improvements of 3.97-9.52% in ACC and 0.08-0.19 in MCC. Furthermore, when the benchmark dataset was expanded with negative samples, enDNA-Prot outperformed the three existing methods by 2.83-16.63% in ACC and 0.02-0.16 in MCC. These results indicate that enDNA-Prot is an effective method for DNA-binding protein identification and that expanding the training dataset with negative samples can improve its performance. For the convenience of experimental scientists, we developed a user-friendly web server for enDNA-Prot, which is freely accessible to the public.
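The abstract does not detail how enDNA-Prot's ensemble exploits the large negative set. One common way to do so, shown here only as a hedged sketch, is to split the negatives into chunks of roughly the positive-set size, train one base classifier per chunk, and combine the classifiers by majority vote. The nearest-centroid base learner and all function names are hypothetical stand-ins, not enDNA-Prot's actual components.

```python
import random
from typing import Callable, List, Sequence

Vector = Sequence[float]
Classifier = Callable[[Vector], int]


def train_centroid(pos: List[Vector], neg: List[Vector]) -> Classifier:
    """Minimal nearest-centroid base classifier (stand-in for any real learner)."""
    def mean(vs: List[Vector]) -> List[float]:
        return [sum(col) / len(vs) for col in zip(*vs)]

    cp, cn = mean(pos), mean(neg)

    def dist2(a: Vector, b: Vector) -> float:
        return sum((x - y) ** 2 for x, y in zip(a, b))

    return lambda x: 1 if dist2(x, cp) <= dist2(x, cn) else 0


def balanced_ensemble(pos: List[Vector], neg: List[Vector], seed: int = 0) -> List[Classifier]:
    """Split the (larger) negative set into chunks about the size of the positive
    set and train one base classifier per chunk, so every negative is used once."""
    rng = random.Random(seed)
    neg = list(neg)
    rng.shuffle(neg)
    k = max(1, len(neg) // len(pos))
    chunks = [neg[i::k] for i in range(k)]
    return [train_centroid(pos, chunk) for chunk in chunks]


def predict(ensemble: List[Classifier], x: Vector) -> int:
    """Majority vote over the base classifiers; ties go to the negative class."""
    votes = sum(clf(x) for clf in ensemble)
    return 1 if 2 * votes > len(ensemble) else 0
```

Each base classifier sees a balanced training set, while the ensemble as a whole still uses every negative sample, which matches the motivation given above for expanding the training set with negatives.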

    Carbapenemase-Producing Escherichia coli among Humans and Backyard Animals

    Background: The rapidly increasing dissemination of carbapenem-resistant Enterobacteriaceae (CRE) in both humans and animals poses a global threat to public health. However, the transmission of CRE between humans and animals has not yet been well studied. Objectives: We investigated the prevalence, risk factors, and drivers of CRE transmission between humans and their backyard animals in rural China. Methods: We applied a comprehensive sampling strategy in 12 villages in Shandong, China. Using the household [residents and their backyard animals (farm and companion animals)] as a single surveillance unit, we assessed the prevalence of CRE at the household level and examined the factors associated with CRE carriage through a detailed questionnaire. Genetic relationships among human- and animal-derived CRE were assessed using whole-genome sequencing–based molecular methods. Results: A total of 88 New Delhi metallo-β-lactamase (NDM)-type carbapenem-resistant Escherichia coli (NDM-EC) isolates, including 17 from humans, 44 from pigs, 12 from chickens, 1 from cattle, and 2 from dogs, were isolated from 65 of the 746 households examined. The remaining 12 NDM-EC were from flies in the immediate backyard environment. NDM-EC colonization in households was significantly associated with a) the number of species of backyard animals raised or kept in the same household, and b) the use of human and/or animal feces as fertilizer. Discriminant analysis of principal components (DAPC) revealed that a large proportion of the core genomes of the NDM-EC belonged to strains from hosts other than their own, and several human isolates shared closely related core single-nucleotide polymorphisms and blaNDM genetic contexts with isolates from backyard animals. Conclusions: To our knowledge, we are the first to report evidence of direct transmission of NDM-EC between humans and animals.
Given the rise of NDM-EC in community and hospital infections, combating NDM-EC transmission in backyard farm systems is needed. https://doi.org/10.1289/EHP525

    Impacts of climate change, population growth, and power sector decarbonization on urban building energy use

    Changes in climate, technology, and socio-economic conditions will influence future building energy use in cities. However, current low-resolution regional and state-level analyses are insufficient to reliably support city-level decision-making. Here we estimate mid-century hourly building energy consumption in 277 U.S. urban areas using a bottom-up approach. Projected climate change results in heterogeneous changes in energy use intensity (EUI) among urban areas, particularly under higher-warming scenarios, with average increases of 10.1–37.7% in the frequency of peak building electricity EUI but increases of over 110% in some cities. For each 1 °C of warming, the mean city-scale space-conditioning EUI increases by ~14% on average for space cooling and decreases by ~10% for space heating. Heterogeneous changes in city-scale building source energy use are driven primarily by population and power sector changes, ranging on average from –9% to 40%, with consistent south–north gradients across scenarios. Across the scenarios considered here, the changes in city-scale building source energy use, averaged over all urban areas, are –2.5% to –2.0% due to climate change, 7.3% to 52.2% due to population growth, and –17.1% to –8.9% due to power sector decarbonization. Our findings underscore the need to consider intercity heterogeneity when developing sustainable and resilient urban energy systems.