41 research outputs found

    Biomarker discovery across annotated and unannotated microarray datasets using semi-supervised learning

    The growing body of DNA microarray data has the potential to advance our understanding of the molecular basis of disease. However, annotating microarray datasets with clinically useful information is not always possible, as this often requires access to detailed patient records. In this study we introduce GLAD, a new Semi-Supervised Learning (SSL) method for combining independent annotated datasets and unannotated datasets with the aim of identifying more robust sample classifiers.

    Genomic Regulatory Networks, Reduction Mappings and Control

    All high-level living organisms are made of small cell units containing DNA, RNA, genes, proteins, etc. Genes are important components of cells, and it is necessary to understand inter-gene relations in order to comprehend, predict and ultimately intervene in the cells’ dynamics. Genetic regulatory networks (GRNs) represent the gene interactions that dictate cell behavior. Translational genomics aims to mathematically model GRNs, and one of its main goals is to alter the networks’ behavior away from undesirable phenotypes such as cancer. The mathematical framework that has often been used for modeling GRNs is the probabilistic Boolean network (PBN), which is a collection of constituent Boolean networks with perturbation (BNps). This dissertation uses BNps to model gene regulatory networks with the intent of designing stationary control policies (CPs) that shift the networks’ dynamics toward more desirable states. Markov chains (MCs) are used to represent the PBNs, and stochastic control is employed to find stationary control policies that affect the steady-state distribution of the MC. However, as the number of genes increases, it becomes computationally burdensome, or even infeasible, to derive optimal or greedy intervention policies. This dissertation considers the problem of modeling and intervening in large GRNs. To overcome the computational challenges associated with large networks, two approaches are proposed: first, a reduction mapping that deletes genes from the network; and second, a greedy control policy that can be designed directly on large networks. Simulation results show that these methods achieve the goal of controlling large networks by shifting their steady-state distribution toward more desirable states. Furthermore, a new inference method is used to derive a large 17-gene Boolean network from microarray experiments on gastrointestinal cancer samples. The new algorithm has similarities to a previously developed, well-known inference method that uses seed genes to grow subnetworks out of a large network; however, it differs from that algorithm in major ways. Most importantly, the objective of the new algorithm is to infer a network from a seed gene with the intention of driving the Gene Activity Profile toward more desirable phenotypes. The newly introduced reduction-mapping approach is used to delete genes from the 17-gene GRN; once the network is small enough, an intervention policy is designed for the reduced network and induced back to the original network. In another experiment, the greedy control policy approach is used to design an intervention policy directly on the large 17-gene network to beneficially change its long-run behavior. Finally, a novel algorithm is developed for selecting only non-isomorphic BNs while generating synthetic networks, using a method that generates synthetic BNs with a prescribed set of attractors. The goal of the new method described in this dissertation is to discard isomorphic networks.

    Anthracnose: The sophisticated rot

    The mold fungus Colletotrichum graminicola causes anthracnose, one of the most economically damaging corn diseases worldwide. Anthracnose can occur either as a stalk rot (ASR) or as a leaf blight (ALB) (4; 27). The leaf blight phase is generally insignificant in North America as a cause of yield loss, although in the tropics and subtropics it is much more important. Resistance to ASR is usually not correlated with resistance to ALB, complicating efforts to breed resistant corn varieties (2; 4). Resistance to ASR and ALB is mostly quantitative, although sources of major-gene resistance have been described (10; 29). Hybrids containing some of these major-gene resistance sources are likely to become available for management of ASR in the near future.

    A Colletotrichum graminicola Mutant Deficient in the Establishment of Biotrophy Reveals Early Transcriptional Events in the Maize Anthracnose Disease Interaction

    Background: Colletotrichum graminicola is a hemibiotrophic fungal pathogen that causes maize anthracnose disease. It progresses through three recognizable phases of pathogenic development in planta: melanized appressoria on the host surface prior to penetration; biotrophy, characterized by intracellular colonization of living host cells; and necrotrophy, characterized by host cell death and symptom development. A “Mixed Effects” Generalized Linear Model (GLM) was developed and applied to an existing Illumina transcriptome dataset, substantially increasing the statistical power of the analysis of C. graminicola gene expression during infection and colonization. Additionally, the in planta transcriptome of the wild-type was compared with that of a mutant strain impaired in the establishment of biotrophy, allowing detailed dissection of events occurring specifically during penetration, and during early versus late biotrophy. Results: More than 2000 fungal genes were differentially transcribed during appressorial maturation, penetration, and colonization. Secreted proteins, secondary metabolism genes, and membrane receptors were over-represented among the differentially expressed genes, suggesting that the fungus engages in an intimate and dynamic conversation with the host, beginning prior to penetration. This communication process probably involves reception of plant signals triggering subsequent developmental progress in the fungus, as well as production of signals that induce responses in the host. Later phases of biotrophy were more similar to necrotrophy, with increased production of secreted proteases, inducers of plant cell death, hydrolases, and membrane bound transporters for the uptake and egress of potential toxins, signals, and nutrients. Conclusions: This approach revealed, in unprecedented detail, fungal genes specifically expressed during critical phases of host penetration and biotrophic establishment. Many encoded secreted proteins, secondary metabolism enzymes, and receptors that may play roles in host-pathogen communication necessary to promote susceptibility, and thus may provide targets for chemical or biological controls to manage this important disease. The differentially expressed genes could be used as ‘landmarks’ to more accurately identify developmental progress in compatible versus incompatible interactions involving genetic variants of both host and pathogen.

    Whole-genome sequencing and genetic variant analysis of a Quarter Horse mare

    BACKGROUND: The catalog of genetic variants in the horse genome originates from a few select animals, the majority originating from the Thoroughbred mare used for the equine genome sequencing project. The purpose of this study was to identify genetic variants, including single nucleotide polymorphisms (SNPs), insertion/deletion polymorphisms (INDELs), and copy number variants (CNVs) in the genome of an individual Quarter Horse mare sequenced by next-generation sequencing. RESULTS: Using massively parallel paired-end sequencing, we generated 59.6 Gb of DNA sequence from a Quarter Horse mare resulting in an average of 24.7X sequence coverage. Reads were mapped to approximately 97% of the reference Thoroughbred genome. Unmapped reads were de novo assembled resulting in 19.1 Mb of new genomic sequence in the horse. Using a stringent filtering method, we identified 3.1 million SNPs, 193 thousand INDELs, and 282 CNVs. Genetic variants were annotated to determine their impact on gene structure and function. Additionally, we genotyped this Quarter Horse for mutations of known diseases and for variants associated with particular traits. Functional clustering analysis of genetic variants revealed that most of the genetic variation in the horse's genome was enriched in sensory perception, signal transduction, and immunity and defense pathways. CONCLUSIONS: This is the first sequencing of a horse genome by next-generation sequencing and the first genomic sequence of an individual Quarter Horse mare. We have increased the catalog of genetic variants for use in equine genomics by the addition of novel SNPs, INDELs, and CNVs. The genetic variants described here will be a useful resource for future studies of genetic variation regulating performance traits and diseases in equids
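The sequence yield and coverage quoted above can be related with a quick arithmetic check. This is a minimal sketch that assumes the usual definition, coverage = total sequenced bases / genome length; the abstract does not say how the 24.7X figure was computed, and the variable names are illustrative.

```python
# Figures quoted in the abstract above.
total_bases_gb = 59.6   # Gb of DNA sequence generated
mean_coverage = 24.7    # average fold coverage reported

# Assumption: coverage = total sequenced bases / genome length,
# so the genome length implied by the two figures is their ratio.
implied_genome_gb = total_bases_gb / mean_coverage

print(f"Implied genome length: {implied_genome_gb:.2f} Gb")
```

Under that assumption the two reported figures imply a genome length of roughly 2.4 Gb.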

    Evaluation of the coverage and depth of transcriptome by RNA-Seq in chickens

    BACKGROUND: RNA-Seq is a recently developed high-throughput sequencing technology for profiling the entire transcriptome in any organism. It has several major advantages over current hybridization-based approaches such as microarrays. However, the cost per sample of RNA-Seq is still prohibitive for most laboratories. With continued improvement in sequence output, it would be cost-effective to multiplex multiple samples and sequence them in a single lane, provided transcriptome coverage remains sufficient. The objective of this analysis is to evaluate what sequencing depth might be sufficient to interrogate gene expression profiling in the chicken by RNA-Seq. RESULTS: Two cDNA libraries from chicken lungs were sequenced initially, and 4.9 million (M) and 1.6 M (60 bp) reads were generated, respectively. With significant improvements in sequencing technology, two technical replicate cDNA libraries were re-sequenced. Totals of 29.6 M and 28.7 M (75 bp) reads were obtained with the two samples. More than 90% of annotated genes were detected in the data sets with 28.7-29.6 M reads, while only 68% of genes were detected in the data set with 1.6 M reads. The correlation coefficients of gene expression between technical replicates within the same sample were 0.9458 and 0.8442. To evaluate the appropriate depth needed for mRNA profiling, a random sampling method was used to generate different numbers of reads from each sample. There was a significant increase in correlation coefficients from a sequencing depth of 1.6 M to 10 M for all genes except highly abundant genes. No significant improvement was observed from the depth of 10 M to 20 M (75 bp) reads. CONCLUSION: The analysis from the current study demonstrated that 30 M (75 bp) reads is sufficient to detect all annotated genes in chicken lungs. Ten million (75 bp) reads could detect about 80% of annotated chicken genes, and RNA-Seq at this depth can serve as a replacement for microarray technology. Furthermore, the depth of sequencing had a significant impact on measuring the expression of low-abundance genes. Finally, the combination of experimental and simulation approaches is a powerful way to address the relationship between the depth of sequencing and transcriptome coverage.
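The random-sampling idea described above can be illustrated on synthetic data. The sketch below is not the study's pipeline: the gene count, abundance distribution, read totals, and function names are all assumptions made for illustration. It thins two simulated replicates to several depths and reports the fraction of genes detected and the replicate correlation at each depth.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic "full-depth" read counts for two technical replicates.
# All numbers here are illustrative assumptions, not data from the study.
n_genes = 5000
abundance = rng.lognormal(mean=0.0, sigma=2.0, size=n_genes)
abundance /= abundance.sum()
full_depth = 2_000_000
rep1 = rng.multinomial(full_depth, abundance)
rep2 = rng.multinomial(full_depth, abundance)


def subsample(counts, fraction, rng):
    """Keep each read independently with probability `fraction`."""
    return rng.binomial(counts, fraction)


def summarize(a, b):
    """Return (fraction of genes detected in a, Pearson r of log counts)."""
    detected = float(np.mean(a > 0))
    r = float(np.corrcoef(np.log1p(a), np.log1p(b))[0, 1])
    return detected, r


results = {}
for fraction in (0.01, 0.1, 0.5, 1.0):
    a = subsample(rep1, fraction, rng)
    b = subsample(rep2, fraction, rng)
    results[fraction] = summarize(a, b)
    detected, r = results[fraction]
    print(f"fraction={fraction:<4} detected={detected:.3f} r={r:.4f}")
```

Binomial thinning is used here because it is a simple way to mimic drawing a random subset of reads; other sampling schemes would serve equally well for the illustration.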

    Intervention in gene regulatory networks via greedy control policies based on long-run behavior

    Background: A salient purpose for studying gene regulatory networks is to derive intervention strategies, the goals being to identify potential drug targets and design gene-based therapeutic intervention. Optimal stochastic control based on the transition probability matrix of the underlying Markov chain has been studied extensively for probabilistic Boolean networks. Optimization is based on minimization of a cost function and a key goal of control is to reduce the steady-state probability mass of undesirable network states. Owing to computational complexity, it is difficult to apply optimal control for large networks. Results: In this paper, we propose three new greedy stationary control policies by directly investigating the effects on the network long-run behavior. Similar to the recently proposed mean-first-passage-time (MFPT) control policy, these policies do not depend on minimization of a cost function and avoid the computational burden of dynamic programming. They can be used to design stationary control policies that avoid the need for a user-defined cost function because they are based directly on long-run network behavior; they can be used as an alternative to dynamic programming algorithms when the latter are computationally prohibitive; and they can be used to predict the best control gene with reduced computational complexity, even when one is employing dynamic programming to derive the final control policy. We compare the performance of these three greedy control policies and the MFPT policy using randomly generated probabilistic Boolean networks and give a preliminary example for intervening in a mammalian cell cycle network. Conclusion: The newly proposed control policies have better performance in general than the MFPT policy and, as indicated by the results on the mammalian cell cycle network, they can potentially serve as future gene therapeutic intervention strategies.
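The general idea of choosing control actions by their effect on long-run behavior can be sketched on a toy network. The code below is an illustrative assumption throughout and does not reproduce any of the three policies proposed in the paper: the 3-gene update rule `toy_f`, the perturbation probability, the choice of control gene, and the single-pass greedy rule are all made up for the example. It builds the Markov chain of a Boolean network with perturbation, computes its steady-state distribution, and flips the control gene in a state only if doing so lowers the steady-state mass of undesirable states.

```python
import itertools
import numpy as np


def bnp_transition_matrix(f, n, p):
    """Markov chain of a Boolean network with perturbation.

    Each gene flips independently with probability p; if no gene flips,
    the state moves to f(state).
    """
    states = list(itertools.product((0, 1), repeat=n))
    index = {s: i for i, s in enumerate(states)}
    P = np.zeros((len(states), len(states)))
    for x in states:
        for y in states:
            h = sum(a != b for a, b in zip(x, y))  # Hamming distance
            if h > 0:
                P[index[x], index[y]] = p**h * (1 - p) ** (n - h)
        P[index[x], index[f(x)]] += (1 - p) ** n
    return P, states


def stationary(P):
    """Solve pi P = pi with sum(pi) = 1 by least squares."""
    m = P.shape[0]
    A = np.vstack([P.T - np.eye(m), np.ones(m)])
    b = np.zeros(m + 1)
    b[-1] = 1.0
    pi, *_ = np.linalg.lstsq(A, b, rcond=None)
    return pi


def greedy_policy(P, states, gene, bad):
    """Single greedy pass: flip `gene` in a state if that lowers bad mass."""
    index = {s: i for i, s in enumerate(states)}
    flipped = [index[s[:gene] + (1 - s[gene],) + s[gene + 1:]] for s in states]
    current = P.copy()
    best = stationary(current)[bad].sum()
    policy = np.zeros(len(states), dtype=bool)
    for i in range(len(states)):
        trial = current.copy()
        trial[i] = P[flipped[i]]  # controlled row: evolve from flipped state
        mass = stationary(trial)[bad].sum()
        if mass < best - 1e-12:
            current, best, policy[i] = trial, mass, True
    return policy, best


def toy_f(x):
    """Hypothetical 3-gene update rule used only for this example."""
    a, b, c = x
    return (int(b and not c), int(a or c), int(a and b))


P, states = bnp_transition_matrix(toy_f, n=3, p=0.05)
bad = np.array([s[0] == 1 for s in states])  # assume gene 0 ON is undesirable
baseline = stationary(P)[bad].sum()
policy, controlled_mass = greedy_policy(P, states, gene=2, bad=bad)
print(f"undesirable mass: {baseline:.4f} -> {controlled_mass:.4f}")
print("flip control gene in states:", [s for s, u in zip(states, policy) if u])
```

Because every candidate flip requires a fresh steady-state computation on a matrix with 2^n rows, this brute-force version is only practical for small n; it is meant to show the objective, not an efficient algorithm.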

    The relationship between ultra processed food consumption and premature coronary artery disease: Iran premature coronary artery disease study (IPAD)

    Background: Ultra-processed food (UPF) consumption may affect the risk of premature coronary artery disease (PCAD) through its effects on cardiometabolic risk factors. This study aimed to evaluate the association between UPF consumption and PCAD. Methods: A case–control study was conducted on 2,354 Iranian adults (≥ 19 years). Dietary intake was assessed using a validated 110-item food frequency questionnaire (FFQ), and foods were classified based on the NOVA system, which groups all foods according to the nature, extent and purposes of the industrial processes they undergo. PCAD was defined as stenosis of at least 75% in at least one coronary artery, or of at least 50% in the left main coronary artery, in women younger than 70 and men younger than 60 years, as determined by angiography. The odds of PCAD across the tertiles of UPF consumption were assessed by binary logistic regression. Results: After adjustment for potential confounders, participants in the top tertile of UPF consumption had more than twice the odds of PCAD compared with those in the bottom tertile (OR: 2.52; 95% CI: 1.97–3.23). Moreover, those in the highest tertile had more than twice the odds of severe PCAD compared with those in the first tertile (OR: 2.64; 95% CI: 2.16–3.22). In addition, there was a significant upward trend in PCAD risk and PCAD severity as tertiles increased (P-trend < 0.001 for all models). Conclusion: Higher consumption of UPFs was related to increased risk of PCAD and a higher chance of having severe PCAD in Iranian adults. Although future cohort studies are needed to confirm the results of this study, these findings indicate the need to reduce UPF intake.
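For readers less familiar with logistic regression output, the reported odds ratio and interval can be mapped back to the log-odds scale. This is a minimal sketch assuming a standard Wald interval, exp(beta ± 1.96·SE); the abstract does not state how the interval was computed, and the variable names are illustrative.

```python
import math

# First odds ratio and 95% CI quoted in the abstract above.
odds_ratio, ci_low, ci_high = 2.52, 1.97, 3.23
z = 1.96  # normal quantile for a two-sided 95% interval (Wald assumption)

# On the log-odds scale the coefficient is log(OR), and the interval
# width gives an approximate standard error.
beta = math.log(odds_ratio)
se = (math.log(ci_high) - math.log(ci_low)) / (2 * z)

# Rebuild the interval from beta and SE as a consistency check.
rebuilt_low = math.exp(beta - z * se)
rebuilt_high = math.exp(beta + z * se)

print(f"beta = {beta:.3f}, SE = {se:.3f}")
print(f"rebuilt 95% CI: {rebuilt_low:.2f} to {rebuilt_high:.2f}")
```

The rebuilt interval agrees with the published one to rounding, which is what one would expect if the interval is symmetric on the log scale.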