47 research outputs found

    Simultaneous non-negative matrix factorization for multiple large scale gene expression datasets in toxicology

    Non-negative matrix factorization is a useful tool for reducing the dimension of large datasets. This work considers simultaneous non-negative matrix factorization of multiple sources of data. In particular, we perform the first study that involves more than two datasets. We discuss the algorithmic issues required to convert the approach into a practical computational tool and apply the technique to new gene expression data quantifying the molecular changes in four tissue types due to different dosages of an experimental panPPAR agonist in mouse. This study is of interest in toxicology because, whilst PPARs form potential therapeutic targets for diabetes, it is known that they can induce serious side-effects. Our results show that the practical simultaneous non-negative matrix factorization developed here can add value to the data analysis. In particular, we find that factorizing the data as a single object allows us to distinguish between the four tissue types, but does not correctly reproduce the known dosage level groups. Applying our new approach, which treats the four tissue types as providing distinct, but related, datasets, we find that the dosage level groups are respected. The new algorithm then provides separate gene list orderings that can be studied for each tissue type, and compared with the ordering arising from the single factorization. We find that many of our conclusions can be corroborated with known biological behaviour, and others offer new insights into the toxicological effects. Overall, the algorithm shows promise for early detection of toxicity in the drug discovery process.
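
    For orientation, the basic single-matrix factorization that the abstract starts from can be sketched as below. This is a minimal illustration using the standard multiplicative update rules for the Frobenius-norm objective, not the simultaneous algorithm developed in the paper; the function name `nmf`, its parameters and the toy matrix are assumptions made for the example.

```python
import numpy as np

def nmf(X, rank, n_iter=200, eps=1e-9, seed=0):
    """Plain NMF, X ~ W @ H, via multiplicative updates (illustrative only)."""
    rng = np.random.default_rng(seed)
    n, m = X.shape
    # Random non-negative starting factors.
    W = rng.random((n, rank)) + eps
    H = rng.random((rank, m)) + eps
    for _ in range(n_iter):
        # Element-wise updates keep W and H non-negative.
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    return W, H

# Toy non-negative "genes x samples" matrix with exact rank 3 (hypothetical data).
rng = np.random.default_rng(1)
X = rng.random((20, 3)) @ rng.random((3, 12))
W, H = nmf(X, rank=3)
rel_err = np.linalg.norm(X - W @ H) / np.linalg.norm(X)
print(f"relative reconstruction error: {rel_err:.4f}")
```

    In a gene expression setting, the rows of `W` could be used to order genes for each factor, which is roughly the role of the gene list orderings mentioned in the abstract.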

    Discretization Provides a Conceptually Simple Tool to Build Expression Networks

    Biomarker identification, using network methods, depends on finding regular co-expression patterns; the overall connectivity is of greater importance than any single relationship. A second requirement is a simple algorithm for ranking patients on how relevant a gene-set is. For both of these requirements, discretized data helps first to identify gene cliques and then to stratify patients.

    Sterol 14α-demethylase mutation leads to amphotericin B resistance in Leishmania mexicana

    Amphotericin B has emerged as the therapy of choice for use against the leishmaniases. Administration of the drug in its liposomal formulation as a single injection is being promoted in a campaign to bring the leishmaniases under control. Understanding the risks and mechanisms of resistance is therefore of great importance. Here we select amphotericin B-resistant Leishmania mexicana parasites with relative ease. Metabolomic analysis demonstrated that ergosterol, the sterol known to bind the drug, is prevalent in wild-type cells, but diminished in the resistant line, where alternative sterols become prevalent. This indicates that the resistance phenotype is related to loss of drug binding. Comparing sequences of the parasites’ genomes revealed a plethora of single nucleotide polymorphisms that distinguish wild-type and resistant cells, but only one of these was found to be homozygous and associated with a gene encoding an enzyme in the sterol biosynthetic pathway, sterol 14α-demethylase (CYP51). The mutation, N176I, is found outside of the enzyme’s active site, consistent with the fact that the resistant line continues to produce the enzyme’s product. Expression of wild-type sterol 14α-demethylase in the resistant cells caused reversion to drug sensitivity and a restoration of ergosterol synthesis, showing that the mutation is indeed responsible for resistance. The amphotericin B-resistant parasites become hypersensitive to pentamidine and also to agents that induce oxidative stress. This work reveals the power of combining polyomics approaches to discover the mechanism underlying drug resistance, as well as offering novel insights into the selection of resistance to amphotericin B itself.

    Discretized and correlation networks share many relationships.

    Tabular Venn-diagrams show the shared information between networks constructed using discretization and correlation methods; both methods were applied to the two subsets of the SAFHS. The networks from each subset, for each method, were compared and only the gene-pairs found in both subsets were used for the comparison. The comparison between discretized and correlation networks is described in Methods (http://www.plosone.org/article/info:doi/10.1371/journal.pone.0018634#s4). All duplicate gene-pairs, resulting from multiple probes, were eliminated, leaving only one gene-pair for each relationship; here the direction of the pm relations is ignored. The size of each resulting network is included, in brackets.

    The use of independent studies to increase specificity in network determination.

    SynTReN was used to build a synthetic dataset of 400 samples, which were randomly subdivided into two subsets of 200 each. The discretization-based co-expression networks were calculated for each subset, and the shared edges were used to give a third network. The 10% of the genes with the lowest variance were selected and the possible gene-pairs for those determined; none of these genes were defined by ac, du or re relationships. The low-variance gene-pairs detected are preferentially discarded by this procedure, suggesting that it is one reasonable technique for discarding false relationships.
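
    The shared-edge step can be pictured as a set intersection over gene-pairs. The sketch below assumes undirected edges stored as pairs of gene names; the helper `shared_edges` and the toy networks are hypothetical and are not taken from the paper.

```python
def shared_edges(edges_a, edges_b):
    """Return the gene-pairs present in both networks, ignoring pair order."""
    def normalise(edges):
        # Sort each pair so that (g1, g2) and (g2, g1) count as the same edge.
        return {tuple(sorted(edge)) for edge in edges}
    return normalise(edges_a) & normalise(edges_b)

# Toy networks from two hypothetical sample subsets.
net1 = [("g1", "g2"), ("g2", "g3"), ("g4", "g5")]
net2 = [("g2", "g1"), ("g3", "g5"), ("g4", "g5")]
common = shared_edges(net1, net2)
print(sorted(common))
```

    Edges that appear in only one subset, such as `("g2", "g3")` here, drop out of the third network.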

    Effectiveness of correlation network as a filter.

    The discretization analysis was performed at two levels of “bio-noise”, 0.1 and 0.5. Positive correlation was used as a filter to remove edges not present by correlation from pp and mm networks. Negative correlation at the three r levels was required for pm edges to be retained. With 0.1 noise, correlation removes almost no TRUE edges while removing most of the FALSE (bgr_) pairs.

    Effect of changing Z-score on Analysed Network Estimation.

    The SynTReN simulated data for 100 samples was analysed using different Z-scores to select up- and down-regulated genes. Although the specificity increased at higher Z-scores, the sensitivity was lower. Our strategy in looking for biomarkers is to accept relationships with lower significance at this stage but subsequently require that any useful pattern or clique is highly connected. In real situations, it is also important to require that the cliques are found in independent datasets. Our decision not to look at Z-scores lower than 0.4 is based on pragmatic biomarker requirements, where changes in expression have to be robust and indicate changes likely to be found by other methods.
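
    One way to picture Z-score discretization is sketched below. It assumes each gene (row) is standardized across samples and then called up (+1), down (-1) or unchanged (0) against a threshold; the caption does not spell out the exact procedure, so the function `discretize`, the per-gene standardization and the toy matrix are all assumptions made for illustration.

```python
import numpy as np

def discretize(expr, z=0.4):
    """Call each value up (+1), down (-1) or unchanged (0) by per-gene Z-score."""
    expr = np.asarray(expr, dtype=float)
    mean = expr.mean(axis=1, keepdims=True)
    std = expr.std(axis=1, keepdims=True)
    std[std == 0] = 1.0  # constant genes get Z-score 0 everywhere
    scores = (expr - mean) / std
    return np.where(scores > z, 1, np.where(scores < -z, -1, 0))

# Toy "genes x samples" matrix: one varying gene, one constant gene.
expr = np.array([[1.0, 2.0, 3.0, 4.0, 5.0],
                 [2.0, 2.0, 2.0, 2.0, 2.0]])
calls = discretize(expr, z=0.4)
strict_calls = discretize(expr, z=1.0)
print(calls)
print(strict_calls)
```

    Raising the threshold from 0.4 to 1.0 leaves fewer non-zero calls in the first row, which mirrors the trade-off described above: fewer, more robust calls at the cost of sensitivity.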

    Discretized networks carry consistent information.

    Networks were constructed from discretized (Z = 0.4) data for all the Cheung and Spielman subjects, with the total number of edges shown in brackets. The left-hand 2 columns show the number of shared edges for un-shuffled discretized gene-sample data, while the right-hand 3 columns give the result of the comparison between the un-shuffled and shuffled gene-sample networks. Randomization was carried out for each row of the gene-sample discretized table using the R function “sample”.
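
    The row-wise randomization can be sketched as follows. The caption names the R function “sample”; the Python/NumPy version below is only an analogue, and the helper `shuffle_rows` and the toy table are assumptions made for the example.

```python
import numpy as np

def shuffle_rows(table, seed=0):
    """Permute the values within each row independently (one row per gene)."""
    rng = np.random.default_rng(seed)
    # rng.permutation returns a shuffled copy, so the input table is untouched.
    return np.array([rng.permutation(row) for row in table])

# Toy discretized gene-sample table (hypothetical values).
table = np.array([[1, 1, 0, -1, 0],
                  [0, -1, -1, 1, 1]])
shuffled = shuffle_rows(table)
print(shuffled)
```

    Each gene keeps the same number of up, down and unchanged calls, but their assignment to samples is scrambled, so edges that survive in the shuffled network indicate what chance alone would produce.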

    Comparison of discretized networks from 2 subsets of SAFHS subjects.

    “Duplicate” information is discarded in these comparisons; reasons for duplication include multiple probesets for single genes and, in the pm networks, relationships going in both directions. Networks were constructed by the discretization (Z = 0.4) or correlation methods from two randomly selected sample subsets of the SAFHS dataset. The number of edges in each of the networks is given in brackets (×10³).

    Discretized networks carry consistent information.

    The networks were derived from discretized data (Z = 0.4) for both the SAFHS (S) and the Cheung and Spielman (C) datasets. For comparison purposes, the platform-specific identifiers were converted to gene-names and any resulting probe-set redundancy eliminated. Only the gene-names represented on both the Illumina and Affymetrix chips were used in this comparison. The numbers for comparisons between the different datasets are shown in bold.