146 research outputs found

    Interpretability-oriented data-driven modelling of bladder cancer via computational intelligence

    Exploring general-purpose protein features for distinguishing enzymes and non-enzymes within the twilight zone

    Background: Computational prediction of protein function is one of the more complex problems in bioinformatics because of the diversity of functions and mechanisms that proteins exert in nature. The problem is especially acute for proteins that share very low primary or tertiary structure similarity with existing annotated proteomes. New alignment-free (AF) tools are therefore needed to overcome the inherent limitations of classic alignment-based approaches. We have recently introduced AF protein-numerical-encoding programs (TI2BioP and ProtDCal), whose sequence-based features have been successfully applied to detect remote protein homologs, post-translational modifications and antibacterial peptides. Here we aim to demonstrate the applicability of four AF protein descriptor families, implemented in our programs, to the identification of enzyme-like proteins. At the same time, our novel family of 3D-structure-based descriptors is introduced for the first time. The Dobson & Doig (D&D) benchmark dataset is used to evaluate our AF protein descriptors because its proven structural diversity permits one to emulate an experiment within the twilight zone of alignment-based methods (pair-wise identity <30%). The performance of our sequence-based predictor was further assessed using a subset of formerly uncharacterized proteins that currently represents a benchmark annotation dataset. Results: Four protein descriptor families (sequence-composition-based (0D), linear-topology-based (1D), pseudo-fold-topology-based (2D) and 3D-structure features (3D)) were assessed using the D&D benchmark dataset. We show that only the families of ProtDCal's descriptors (0D, 1D and 3D) encode significant information for discriminating enzymes from non-enzymes. The resulting 3D-structure-based classifier ranked first among several other SVM-based methods assessed on this dataset. Furthermore, the model leveraging 1D descriptors showed a higher success rate than EzyPred on a benchmark annotation dataset from the Shewanella oneidensis proteome. Conclusions: The applicability of ProtDCal as a general-purpose AF protein modelling method is illustrated through the discrimination between two comprehensive protein functional classes. The performances observed on the highly diverse D&D dataset and on the set of formerly uncharacterized (hard-to-annotate) proteins of Shewanella oneidensis place our methodology in the top range of methods to model and predict protein function using alignment-free approaches.
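    As a rough illustration of two ideas mentioned in this abstract, the sketch below computes a plain amino-acid composition vector (a simple stand-in for a sequence-composition-based, 0D-style feature) and the percent identity of two pre-aligned sequences, which is the quantity behind the <30% twilight-zone threshold. This is an assumption-laden toy: ProtDCal's actual descriptors are not reproduced here, and the function names `composition_features` and `percent_identity` are hypothetical.

```python
from collections import Counter

# The 20 standard amino acids, in a fixed order so feature vectors are comparable.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition_features(sequence: str) -> list[float]:
    """Return the fraction of each standard amino acid in the sequence.

    Hypothetical stand-in for a composition-based descriptor; it is NOT
    ProtDCal's 0D family.
    """
    counts = Counter(ch for ch in sequence.upper() if ch in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return [counts[aa] / total for aa in AMINO_ACIDS]


def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over aligned columns in which neither sequence has a gap.

    Assumes the two sequences were already aligned (gaps written as '-').
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [
        (a, b)
        for a, b in zip(aligned_a.upper(), aligned_b.upper())
        if a != "-" and b != "-"
    ]
    if not pairs:
        return 0.0
    matches = sum(a == b for a, b in pairs)
    return 100.0 * matches / len(pairs)


def in_twilight_zone(aligned_a: str, aligned_b: str, threshold: float = 30.0) -> bool:
    """True when pairwise identity falls below the threshold (30% in the abstract)."""
    return percent_identity(aligned_a, aligned_b) < threshold
```

    A composition vector like this could be fed to any standard classifier; identity definitions also vary (for example in how gap columns are counted), so the denominator used here is only one common choice.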

    Exploring general-purpose protein features for distinguishing enzymes and non-enzymes within the twilight zone

    © 2017 The Author(s). Acknowledgements: The authors thank Dr. Reinaldo Molina-Ruiz for his assistance in obtaining the latest version of the TI2BioP program. GACh acknowledges the support of Dr. Federico Pallardo, Dean of the Medicine and Dentistry Faculty, University of Valencia (UV), with regard to access to the UV's facilities during part of this work. Funding: YBRB is financed by a Postdoc Fellowship in the Chemistry Institute of the UNAM (DGAPA-UNAM [PAPIIT-IN200115]). GACh was funded by a Postdoc fellowship (SFRH/BPD/92978/2013) granted by the Portuguese Fundação para a Ciência e a Tecnologia (FCT). AA was partially supported by the Strategic Funding UID/Multi/04423/2013 through national funds provided by FCT and the European Regional Development Fund (ERDF) in the framework of the program PT2020, by the European Structural and Investment Funds (ESIF) through the Competitiveness and Internationalization Operational Program – COMPETE 2020 and by National Funds through the FCT under the project PTDC/AAG-GLO/6887/2014 (POCI-01-0124-FEDER-016845), and by the Structured Programs of R&D&I INNOVMAR (NORTE-01-0145-FEDER-000035 – NOVELMAR) and CORAL NORTE (NORTE-01-0145-FEDER-000036), and funded by the Northern Regional Operational Program (NORTE2020) through the ERDF. The funding sources were not involved in the design of the study, the analysis and interpretation of data, or the writing of the manuscript.

    Modeling Approaches for Describing Microbial Population Heterogeneity

    A survey of the application of soft computing to investment and financial trading

    Unraveling the genetic secrets of ancient Baikal amphipods

    Lake Baikal is the oldest, the largest by volume, and the deepest freshwater lake on Earth. It is characterized by an outstanding diversity of endemic fauna, with more than 350 amphipod species and subspecies (Amphipoda, Crustacea, Arthropoda). They are the dominant benthic organisms in the lake, contributing substantially to the overall biomass. Eulimnogammarus verrucosus, E. cyaneus, and E. vittatus, in particular, serve as emerging models in ecotoxicological studies. It was therefore necessary to investigate whether these endemic littoral amphipod species form genetically separate populations across Baikal, in order to determine whether results obtained (for example, about stress responses) with samples from a single location (Bolshie Koty, where the biological station is located) can be extrapolated to the whole lake. The genetic diversity within those three endemic littoral amphipod species was determined based on fragments of Cytochrome C Oxidase I (COI) and 18S rDNA (only for E. verrucosus). Gammarus lacustris, a Holarctic species living in water bodies near Baikal, was examined for comparison. The intra-specific genetic diversities within E. verrucosus and E. vittatus (13% and 10%, respectively) were similar to the inter-species differences, indicating the occurrence of cryptic, morphologically highly similar species. This was confirmed with 18S rDNA for E. verrucosus. The haplotypes of E. cyaneus and G. lacustris specimens were more homogeneous, with intra-specific genetic distances of 3% and 2%, respectively, indicating no disruption, or only a recent disruption, of gene flow of E. cyaneus across Baikal, and recent colonization of water bodies around Baikal by G. lacustris. The data provide the first clear evidence for the formation of cryptic (sub)species within endemic littoral amphipod species of Lake Baikal and mark the inflows and outflow of large rivers as dispersal barriers.
Lake Baikal has provided a stable environment for millions of years, in stark contrast to the small, transient water bodies in its immediate vicinity. A highly diverse endemic amphipod fauna is found in the former habitat but not in the latter. To gain more insight into the immiscibility barrier between the Baikal and non-Baikal faunas, the differences in the stress response pathways were studied. To this end, exposure experiments with increasing temperature and a heavy metal (cadmium) as proteotoxic stressors were conducted in Russia. High-quality de novo transcriptome assemblies covering multiple conditions were obtained for three amphipod species: E. verrucosus and E. cyaneus (Baikal endemics), and G. lacustris (Holarctic) as a potential invader. Comparison of the transcriptomic stress responses showed that both Baikal species possess intact stress response systems and respond to elevated temperature with relatively similar changes in their expression profiles. G. lacustris reacts less strongly to the same stressors, possibly because its transcriptome is already perturbed by the acclimation conditions (matching the Lake Baikal littoral). Comprehensive genomic resources are of utmost importance for ecotoxicological and ecophysiological studies in an evolutionary context, especially considering the exceptional value of Baikal as a UNESCO World Heritage Site. In that context, the results presented here on the genome of Eulimnogammarus verrucosus are the first major step towards establishing genomic sequence resources for a Baikalian amphipod (other than mitochondrial genomes and gene expression data in the form of de novo transcriptome assemblies).
Based on data from a survey of its genome (a single lane of paired-end Illumina HiSeq 2000 reads, 3X) as well as a full dataset (two complete flow cells, 46X), the genome size was estimated at nearly 10 Gb from the k-mer spectra and the coverage of highly conserved miRNA, Hox genes, and other Sanger-sequenced genes. At least two-thirds of the genome is non-unique DNA, and no less than half of the genomic DNA is composed of just five families of repetitive elements, including low-complexity sequences. Some of the repeat families found in high abundance in E. verrucosus seem to be species-specific or Baikalian-specific. Attempts to use off-the-shelf assembly tools on the available low-coverage data, both before and after the removal of highly repetitive components, as well as on the full dataset, resulted in extremely fragmented assemblies. Nevertheless, the analysis of coverage in Hox genes and their homeoboxes showed no clear evidence for paralogs, indicating that a genome duplication did not contribute to the large genome size. Several mate-pair libraries with insert sizes larger than the 2 kb used here, together with long-read sequencing technology and semi-automated methods for genome assembly, seem to be necessary to obtain a reliable assembly for this species.
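For readers unfamiliar with estimating genome size from k-mer spectra, a common rough estimator divides the total number of k-mers in the reads by the depth at the main peak of the k-mer histogram. The sketch below is a minimal version under stated assumptions: the histogram maps k-mer depth to the number of distinct k-mers seen at that depth, low-depth k-mers are treated as sequencing errors, and the name `estimate_genome_size` and the `min_depth` cutoff are hypothetical. It is not necessarily the procedure used in the work summarized above, and the numbers in the usage example are toy values.

```python
def estimate_genome_size(histogram: dict[int, int], min_depth: int = 2) -> float:
    """Estimate genome size as (total k-mers) / (depth of the main peak).

    histogram maps a k-mer depth (how many times a k-mer occurs in the reads)
    to the number of distinct k-mers observed at that depth. Depths below
    min_depth are dropped as likely sequencing errors (an assumption).
    """
    usable = {depth: count for depth, count in histogram.items() if depth >= min_depth}
    if not usable:
        raise ValueError("no k-mers at or above min_depth")
    # The main peak is the depth with the most distinct k-mers; it approximates
    # the coverage of single-copy sequence.
    peak_depth = max(usable, key=usable.get)
    # Each distinct k-mer at depth d contributes d k-mer occurrences, so
    # repeated sequence (high depth) is counted in proportion to its copy number.
    total_kmers = sum(depth * count for depth, count in usable.items())
    return total_kmers / peak_depth


# Toy histogram: an error spike at depth 1, a main peak at depth 10,
# and a small repeat shoulder at depth 20.
toy_histogram = {1: 1000, 9: 50, 10: 100, 11: 50, 20: 10}
toy_estimate = estimate_genome_size(toy_histogram)
```

In a highly repetitive genome such as the one described above, much of the total comes from the high-depth tail of the histogram, which is one reason estimates of this kind are sensitive to where the peak and the error cutoff are placed.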

    Experimental investigation and modelling of the heating value and elemental composition of biomass through artificial intelligence

    Abstract: Advances in artificial intelligence and blockchain technologies offer new potential for reliable prediction across the biomass energy value chain. However, for a predictive approach to stand in for experimental methodology, its accuracy must be high enough to support high-fidelity, robust software that can serve as a tool in the decision-making process. The global standards for classification methods and energetic properties of biomass are still evolving, given the differing observations and results reported in the literature. In addition, a holistic understanding of the effect of particle size and geospatial factors on the physicochemical properties of biomass is needed to increase the uptake of bioenergy. Therefore, this research carried out an experimental investigation of selected bioresources and developed high-fidelity models, built on artificial intelligence, to accurately classify biomass feedstocks and to predict the main elemental composition (carbon, hydrogen, and oxygen) on a dry basis and the heating value (in MJ/kg) of biomass... Ph.D. (Mechanical Engineering Science

    INTEGRATED GENOMIC MARKERS FOR CHEMOTHERAPEUTICS

    Ph.D. DOCTOR OF PHILOSOPH

    Poly(ionic liquid) nanovesicles via polymerization induced self-assembly and their stabilization of Cu nanoparticles for tailored CO2 electroreduction

    Herein, we report a straightforward, scalable synthetic route towards poly(ionic liquid) (PIL) homopolymer nanovesicles (NVs) with a tunable particle size of 50 to 120 nm and a shell thickness of 15 to 60 nm via one-step free radical polymerization-induced self-assembly. As the monomer concentration used for polymerization increases, the nanoscopic morphology evolves from hollow NVs to dense spheres and finally to directional worms, with a multilamellar packing of PIL chains observed in all samples. The transformation mechanism of the NVs' internal morphology is studied in detail by coarse-grained simulations, which reveal a correlation between the PIL chain length and the shell thickness of the NVs. To explore their potential applications, PIL NVs with varied shell thickness are functionalized in situ with ultra-small (1 to 3 nm in size) copper nanoparticles (CuNPs) and employed as electrocatalysts for CO2 electroreduction. The composite electrocatalysts exhibit a 2.5-fold enhancement in selectivity towards C1 products (e.g., CH4) compared to the pristine CuNPs. This enhancement is attributed to the strong electronic interactions between the CuNPs and the surface functionalities of the PIL NVs. This study sheds new light on the use of nanostructured PILs as electrocatalyst supports for CO2 conversion to C1 products.

    CHARMM: The biomolecular simulation program

    CHARMM (Chemistry at HARvard Molecular Mechanics) is a highly versatile and widely used molecular simulation program. It has been developed over the last three decades with a primary focus on molecules of biological interest, including proteins, peptides, lipids, nucleic acids, carbohydrates, and small molecule ligands, as they occur in solution, crystals, and membrane environments. For the study of such systems, the program provides a large suite of computational tools that include numerous conformational and path sampling methods, free energy estimators, molecular minimization, dynamics, and analysis techniques, and model-building capabilities. The CHARMM program is applicable to problems involving a much broader class of many-particle systems. Calculations with CHARMM can be performed using a number of different energy functions and models, from mixed quantum mechanical-molecular mechanical force fields, to all-atom classical potential energy functions with explicit solvent and various boundary conditions, to implicit solvent and membrane models. The program has been ported to numerous platforms in both serial and parallel architectures. This article provides an overview of the program as it exists today with an emphasis on developments since the publication of the original CHARMM article in 1983. © 2009 Wiley Periodicals, Inc. J Comput Chem, 2009. Peer Reviewed. http://deepblue.lib.umich.edu/bitstream/2027.42/63074/1/21287_ftp.pd