5,807 research outputs found

    Predicting Proteome-Early Drug Induced Cardiac Toxicity Relationships (Pro-EDICToRs) with Node Overlapping Parameters (NOPs) of a new class of Blood Mass-Spectra graphs

    The 11th International Electronic Conference on Synthetic Organic Chemistry, session Computational Chemistry. Blood Serum Proteome-Mass Spectra (SP-MS) may allow detecting Proteome-Early Drug Induced Cardiac Toxicity Relationships (called here Pro-EDICToRs). However, because the SP contains thousands of proteins, identifying general Pro-EDICToRs patterns instead of a single protein marker may represent a more realistic alternative. In this sense, we first introduced a novel Cartesian 2D spectrum graph for SP-MS. Next, we introduced the graph node-overlapping parameters (nopk) to numerically characterize SP-MS, using them as inputs to seek a Quantitative Proteome-Toxicity Relationship (QPTR) classifier for Pro-EDICToRs with accuracy higher than 80%. Principal Component Analysis (PCA) on the nopk values present in the QPTR model explains 82.7% of the variance with one factor (F1). Next, these nopk values were used to construct, for the first time, a Pro-EDICToRs Complex Network having nodes (samples) linked by edges (similarity between two samples). We compared the topology of two sub-networks (cardiac toxicity and control samples), finding extreme relative differences for the re-linking (P) and Zagreb (M2) indices (9.5 and 54.2%, respectively) out of 11 parameters. We also compared the sub-networks with well-known ideal random networks, including the Barabasi-Albert, Kleinberg Small World, Erdos-Renyi, and Eppstein Power Law models. Finally, we proposed Partial Order (PO) schemes of the 115 samples based on LDA-probabilities, F1-scores and/or network node degrees. PCA-CN and LDA-PCA based POs with Tanimoto’s coefficients equal to or higher than 0.75 are promising for the study of Pro-EDICToRs. These results show that simple QPTR models based on MS graph numerical parameters are an interesting tool for proteome research. The authors thank projects funded by the Xunta de Galicia (PXIB20304PR and BTF20302PR) and the Ministerio de Sanidad y Consumo (PI061457). González-Díaz H. acknowledges a tenure-track research position funded by the Program Isidro Parga Pondal, Xunta de Galici
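
The abstract above compares partial orders (POs) by Tanimoto coefficient but does not say how a PO is encoded. As a minimal sketch, assuming each PO is represented as the set of ordered sample pairs it implies, the coefficient is the size of the intersection divided by the size of the union. The sample names and scores below are hypothetical and are not taken from the paper.

```python
def order_pairs(scores):
    """Return the set of pairs (a, b) with scores[a] < scores[b].

    Ties produce no pair, so tied samples are treated as incomparable.
    This encoding is an assumption made for illustration.
    """
    return {(a, b) for a in scores for b in scores if scores[a] < scores[b]}


def tanimoto(set_a, set_b):
    """Tanimoto (Jaccard) coefficient: |A & B| / |A | B|."""
    union = set_a | set_b
    return len(set_a & set_b) / len(union) if union else 1.0


# Hypothetical scores for three samples under two ranking criteria.
lda_scores = {"s1": 0.1, "s2": 0.4, "s3": 0.9}
pca_scores = {"s1": -1.2, "s2": 0.8, "s3": 0.3}

similarity = tanimoto(order_pairs(lda_scores), order_pairs(pca_scores))
print(similarity)  # 0.5: the two orders agree on 2 of 4 distinct pairs
```

Under this encoding, a value of 0.75 or higher would mean the two rankings share at least three quarters of their combined ordering relations.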

    Molecular Genetic Diversity Study of Forest Coffee Tree (Coffea arabica L.) Populations in Ethiopia: Implications for Conservation and Breeding

    Coffee provides one of the most widely drunk beverages in the world, and is a very important source of foreign exchange income for many countries. Coffea arabica, which contributes over 70 percent of the world's coffee production, is characterized by a low genetic diversity, attributed to its allopolyploid origin, reproductive biology and evolution. C. arabica originated in the southwest rain forests of Ethiopia, where it is grown under four different systems, namely forest coffee, smallholder coffee, semi-plantation coffee and plantation coffee. Genetic diversity of the forest coffee (C. arabica) gene pool in Ethiopia is being lost at an alarming rate because of habitat destruction (deforestation), competition from other cash crops and replacement by invariable disease-resistant coffee cultivars. This study focused on the molecular genetic diversity of forest coffee populations in Ethiopia using PCR-based DNA markers such as random amplified polymorphic DNA (RAPD), inverse sequence-tagged repeat (ISTR), inter-simple sequence repeats (ISSR) and simple sequence repeat (SSR) or microsatellites. The objectives of the study are to estimate the extent and distribution of molecular genetic diversity of forest coffee and to design conservation strategies for its sustainable use in future coffee breeding. In this study, a considerable number of samples of forest coffee collected from four coffee growing regions (provinces) of Ethiopia were analysed. The results indicate that moderate genetic diversity exists within and among a few forest coffee populations, which need due attention from a conservation and breeding point of view. The cluster analysis revealed that most of the samples from the same region (province) were grouped together, which could be attributed to the presence of substantial gene flow between adjacent populations in each region in the form of young coffee plants through transplantation by man. In addition, wild animals such as monkeys also play a significant role in coffee tree gene flow between adjacent populations. The overall variation of the forest coffee is found to reside in a few populations from each region. Therefore, considering a few populations from each region for either in situ or ex situ conservation may preserve most of the variation within the species. For instance, Welega-2, Ilubabor-2, Jima-2 and Bench Maji-2 populations should be given higher priority. In addition, some populations or genotypes have displayed unique amplification profiles, particularly for RAPD and ISTR markers. Whether these unique bands are linked to any of the important agronomic traits and can serve in marker-assisted selection in future coffee breeding requires further investigation

    A New Similarity/Diversity Measure for the Characterization of DNA Sequences

    In this paper, a new similarity/diversity measure is proposed as a new approach to the analysis of sequential data, where useful information can also be obtained from the ordering relationships between the sequence elements. This methodology has been applied to characterize DNA sequences, evaluating their similarity/diversity. The newly proposed distance (weighted standardized Hasse distance) is evaluated between pairs of Hasse matrices derived from the classical partial ordering rules. It can be naturally standardized, thus allowing the interpretation of these distances as absolute values (e.g. percentage) and deriving simple similarity and correlation indices. DNA sequences taken from the first exons of the beta-globins for eight different species have been analyzed. A sensitivity analysis has also been performed, showing the high capability of this measure to take into account small modifications of the DNA sequences. Finally, a comparison with results obtained from the literature is given

    Multi-omics of the gut microbial ecosystem in inflammatory bowel diseases.

    Inflammatory bowel diseases, which include Crohn's disease and ulcerative colitis, affect several million individuals worldwide. Crohn's disease and ulcerative colitis are complex diseases that are heterogeneous at the clinical, immunological, molecular, genetic, and microbial levels. Individual contributing factors have been the focus of extensive research. As part of the Integrative Human Microbiome Project (HMP2 or iHMP), we followed 132 subjects for one year each to generate integrated longitudinal molecular profiles of host and microbial activity during disease (up to 24 time points each; in total 2,965 stool, biopsy, and blood specimens). Here we present the results, which provide a comprehensive view of functional dysbiosis in the gut microbiome during inflammatory bowel disease activity. We demonstrate a characteristic increase in facultative anaerobes at the expense of obligate anaerobes, as well as molecular disruptions in microbial transcription (for example, among clostridia), metabolite pools (acylcarnitines, bile acids, and short-chain fatty acids), and levels of antibodies in host serum. Periods of disease activity were also marked by increases in temporal variability, with characteristic taxonomic, functional, and biochemical shifts. Finally, integrative analysis identified microbial, biochemical, and host factors central to this dysregulation. The study's infrastructure resources, results, and data, which are available through the Inflammatory Bowel Disease Multi'omics Database ( http://ibdmdb.org ), provide the most comprehensive description to date of host and microbial activities in inflammatory bowel diseases

    Machine learning-guided directed evolution for protein engineering

    Machine learning (ML)-guided directed evolution is a new paradigm for biological design that enables optimization of complex functions. ML methods use data to predict how sequence maps to function without requiring a detailed model of the underlying physics or biological pathways. To demonstrate ML-guided directed evolution, we introduce the steps required to build ML sequence-function models and use them to guide engineering, making recommendations at each stage. This review covers basic concepts relevant to using ML for protein engineering as well as the current literature and applications of this new engineering paradigm. ML methods accelerate directed evolution by learning from information contained in all measured variants and using that information to select sequences that are likely to be improved. We then provide two case studies that demonstrate the ML-guided directed evolution process. We also look to future opportunities where ML will enable discovery of new protein functions and uncover the relationship between protein sequence and function. Comment: Made significant revisions to focus on aspects most relevant to applying machine learning to speed up directed evolutio
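
The loop described in this abstract (learn from all measured variants, then pick sequences predicted to be improved) can be illustrated with a toy example. This is a minimal sketch, not the review's method: it swaps in a naive additive model where real studies would use a proper regression model, and the function names, the two-letter alphabet, and the fitness values are all hypothetical.

```python
from collections import defaultdict
from itertools import product


def fit_additive_model(measured):
    """Estimate an effect for each (position, residue) pair as the mean
    fitness of the measured variants that carry that residue there."""
    sums, counts = defaultdict(float), defaultdict(int)
    for seq, fitness in measured.items():
        for pos, residue in enumerate(seq):
            sums[(pos, residue)] += fitness
            counts[(pos, residue)] += 1
    return {key: sums[key] / counts[key] for key in sums}


def predict(model, seq, default=0.0):
    """Predicted fitness: sum of per-position effects (unseen pairs score default)."""
    return sum(model.get((pos, residue), default) for pos, residue in enumerate(seq))


def propose(model, measured, alphabet, length, n):
    """Rank all unmeasured sequences by predicted fitness and return the top n."""
    candidates = ("".join(p) for p in product(alphabet, repeat=length))
    unmeasured = [s for s in candidates if s not in measured]
    return sorted(unmeasured, key=lambda s: predict(model, s), reverse=True)[:n]


# Hypothetical first-round measurements over a two-residue alphabet.
measured = {"AAA": 1.0, "AAV": 1.5, "VAA": 0.5, "AVA": 2.0}
model = fit_additive_model(measured)
next_round = propose(model, measured, alphabet="AV", length=3, n=2)
print(next_round)  # ['AVV', 'VVV']
```

Exhaustive enumeration only works for tiny sequence spaces like this one; for realistic proteins the candidate set would have to be sampled or restricted to a designed library.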

    Unsupervised discovery of microbial population structure within metagenomes using nucleotide base composition

    An approach to infer the unknown microbial population structure within a metagenome is to cluster nucleotide sequences based on common patterns in base composition, otherwise referred to as binning. When functional roles are assigned to the identified populations, a deeper understanding of microbial communities can be attained, more so than gene-centric approaches that explore overall functionality. In this study, we propose an unsupervised, model-based binning method with two clustering tiers, which uses a novel transformation of the oligonucleotide frequency-derived error gradient and GC content to generate coarse groups at the first tier of clustering; and tetranucleotide frequency to refine these groups at the secondary clustering tier. The proposed method has a demonstrated improvement over PhyloPythia, S-GSOM, TACOA and TaxSOM on all three benchmarks that were used for evaluation in this study. The proposed method is then applied to a pyrosequenced metagenomic library of mud volcano sediment sampled in southwestern Taiwan, with the inferred population structure validated against complementary sequencing of 16S ribosomal RNA marker genes. Finally, the proposed method was further validated against four publicly available metagenomes, including a highly complex Antarctic whale-fall bone sample, which was previously assumed to be too complex for binning prior to functional analysis
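
Two of the base-composition features named in this abstract, GC content and tetranucleotide frequency, are simple to compute. The sketch below shows only those two features under common conventions (overlapping windows, ambiguous bases ignored), which are assumptions here; it does not reproduce the paper's error-gradient transformation or its two-tier clustering.

```python
from collections import Counter
from itertools import product


def gc_content(seq):
    """Fraction of G and C among the unambiguous A/C/G/T bases."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        return 0.0
    return (seq.count("G") + seq.count("C")) / acgt


def tetranucleotide_frequencies(seq):
    """Relative frequency of each of the 256 possible 4-mers.

    Uses overlapping windows; windows containing anything other than
    A/C/G/T are dropped from both the counts and the total.
    """
    seq = seq.upper()
    kmers = ["".join(p) for p in product("ACGT", repeat=4)]
    counts = Counter(seq[i:i + 4] for i in range(len(seq) - 3))
    total = sum(counts[k] for k in kmers)
    return {k: (counts[k] / total if total else 0.0) for k in kmers}


gc = gc_content("ACGTACGTGGCC")                    # 8 of 12 bases are G or C
freqs = tetranucleotide_frequencies("ACGTACGT")    # 5 windows, ACGT seen twice
print(round(gc, 3), freqs["ACGT"])
```

Each sequence fragment then becomes a 256-dimensional frequency vector (plus its GC content), which is the kind of input a clustering method can group into bins.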

    Immunogenomics of the Rhesus macaque, an animal model for HIV vaccine development

    Human Immunodeficiency Virus (HIV) is a lentivirus that causes Acquired Immunodeficiency Syndrome (AIDS) resulting in the progressive failure of the immune system. Due to its rapid replication rate and high mutation frequency, the virus is able to evade the immune system and thwart an efficacious response. Current HIV infection prophylaxes and therapeutics are not optimal and there is an urgent need to develop an efficacious HIV vaccine. Recently, high-throughput sequencing of the Immunoglobulin (Ig) repertoire from HIV-infected humans and immunized Rhesus macaques has led to important insights into vaccines against HIV-1. Further elucidation of the antibody response in these crucial animal studies will require substantially greater power to analyze the Ig repertoires than is currently possible. Reliable information on macaque Ig genes is insufficient due to the incompleteness of the whole genome sequence (WGS) and the inherent difficulty of obtaining complete Ig sequences due to its complex and repetitive nature. To address this issue, we have generated a high quality, annotated WGS with precisely annotated Ig loci from ten macaques. We used low error, synthetic long reads generated by Illumina TruSeq technology, Illumina 150bp, paired-end reads (110X coverage) and Irys genome mapping technology to assemble the genome de novo. We employed a bait-and-sequence strategy using human Ig probes to capture macaque Ig genes for the accurate assembly and annotation of Ig genes and alleles. Together, these data will generate a complete Rhesus macaque genome with detailed information on allelic diversity at the Ig loci. This study is essential for making the macaque a viable model for adaptive immunity. In addition, it will provide information on the similarities and differences between macaque and human Ig genes that will aid in the design and interpretation of vaccine studies

    Antigenic diversity is generated by distinct evolutionary mechanisms in African trypanosome species

    Antigenic variation enables pathogens to avoid the host immune response by continual switching of surface proteins. The protozoan blood parasite Trypanosoma brucei causes human African trypanosomiasis ("sleeping sickness") across sub-Saharan Africa and is a model system for antigenic variation, surviving by periodically replacing a monolayer of variant surface glycoproteins (VSG) that covers its cell surface. We compared the genome of Trypanosoma brucei with two closely related parasites Trypanosoma congolense and Trypanosoma vivax, to reveal how the variant antigen repertoire has evolved and how it might affect contemporary antigenic diversity. We reconstruct VSG diversification showing that Trypanosoma congolense uses variant antigens derived from multiple ancestral VSG lineages, whereas in Trypanosoma brucei VSG have recent origins, and ancestral gene lineages have been repeatedly co-opted to novel functions. These historical differences are reflected in fundamental differences between species in the scale and mechanism of recombination. Using phylogenetic incompatibility as a metric for genetic exchange, we show that the frequency of recombination is comparable between Trypanosoma congolense and Trypanosoma brucei but is much lower in Trypanosoma vivax. Furthermore, in showing that the C-terminal domain of Trypanosoma brucei VSG plays a crucial role in facilitating exchange, we reveal substantial species differences in the mechanism of VSG diversification. Our results demonstrate how past VSG evolution indirectly determines the ability of contemporary parasites to generate novel variant antigens through recombination and suggest that the current model for antigenic variation in Trypanosoma brucei is only one means by which these parasites maintain chronic infections

    Exploring the mycobacteriophage metaproteome: Phage genomics as an educational platform

    Bacteriophages are the most abundant forms of life in the biosphere and carry genomes characterized by high genetic diversity and mosaic architectures. The complete sequences of 30 mycobacteriophage genomes show them collectively to encode 101 tRNAs, three tmRNAs, and 3,357 proteins belonging to 1,536 "phamilies" of related sequences, and a statistical analysis predicts that these represent approximately 50% of the total number of phamilies in the mycobacteriophage population. These phamilies contain 2.19 proteins on average; more than half (774) of them contain just a single protein sequence. Only six phamilies have representatives in more than half of the 30 genomes, and only three - encoding tape-measure proteins, lysins, and minor tail proteins - are present in all 30 phages, although these phamilies are themselves highly modular, such that no single amino acid sequence element is present in all 30 mycobacteriophage genomes. Of the 1,536 phamilies, only 230 (15%) have amino acid sequence similarity to previously reported proteins, reflecting the enormous genetic diversity of the entire phage population. The abundance and diversity of phages, the simplicity of phage isolation, and the relatively small size of phage genomes support bacteriophage isolation and comparative genomic analysis as a highly suitable platform for discovery-based education. © 2006 Hatfull et al
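
The summary statistics quoted in this abstract follow directly from the counts it reports. The short check below recomputes them; the variable names are chosen here for illustration, and only figures stated in the abstract are used.

```python
# Counts as reported in the abstract.
proteins = 3357
phamilies = 1536
single_protein_phamilies = 774
phamilies_with_known_similarity = 230

mean_size = proteins / phamilies                              # proteins per phamily
singleton_share = single_protein_phamilies / phamilies
known_share = phamilies_with_known_similarity / phamilies

print(f"{mean_size:.2f}")        # 2.19
print(f"{singleton_share:.1%}")  # 50.4%, i.e. just over half
print(f"{known_share:.0%}")      # 15%
```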