11 research outputs found

    Nonpolar Solvation Free Energy from Proximal Distribution Functions

    Using precomputed near-neighbor or proximal distribution functions (pDFs) that approximate solvent density about atoms in a chemically bonded context, one can estimate the solvation structures around complex solutes and the corresponding solute–solvent energetics. In this contribution, we extend this technique to calculate the solvation free energies (Δ<i>G</i>) of a variety of solutes. In particular, we use pDFs computed for small peptide molecules to estimate Δ<i>G</i> for larger peptide systems. We separately compute the nonpolar (Δ<i>G</i><sub>vdW</sub>) and electrostatic (Δ<i>G</i><sub>elec</sub>) components of the underlying potential model. Here we show how the former can be estimated by thermodynamic integration using the pDF-reconstructed solute–solvent interaction energy. The electrostatic component can be approximated with linear response theory as half of the electrostatic solute–solvent interaction energy. We test the method by calculating the solvation free energies of butane, propanol, polyalanine, and polyglycine and by comparing with traditional free energy simulations. Results indicate that the pDF-reconstruction algorithm approximately reproduces Δ<i>G</i><sub>vdW</sub> calculated by benchmark free energy simulations to within ∼ kcal/mol accuracy. The use of transferable pDFs for each solute atom allows for a rapid estimation of Δ<i>G</i> for arbitrary molecular systems.
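The two estimators named in this abstract can be illustrated with a minimal sketch. It assumes the ensemble-averaged quantities are already available; it does not reproduce the pDF reconstruction itself. The function names, the coupling-parameter grid and all numbers are hypothetical and chosen only for illustration.

```python
def linear_response_elec(mean_elec_interaction_energy):
    """Linear-response estimate: half of the mean solute-solvent electrostatic energy."""
    return 0.5 * mean_elec_interaction_energy


def thermodynamic_integration(lambdas, mean_du_dlambda):
    """Trapezoidal estimate of the integral of <dU/dlambda> over the coupling parameter."""
    if len(lambdas) != len(mean_du_dlambda) or len(lambdas) < 2:
        raise ValueError("need matching grids with at least two points")
    total = 0.0
    for i in range(len(lambdas) - 1):
        width = lambdas[i + 1] - lambdas[i]
        total += 0.5 * width * (mean_du_dlambda[i] + mean_du_dlambda[i + 1])
    return total


# Hypothetical values in kcal/mol, not taken from the paper.
lambdas = [0.0, 0.25, 0.5, 0.75, 1.0]
du_dlambda = [4.0, 2.0, 0.5, -0.5, -1.0]
dG_vdw = thermodynamic_integration(lambdas, du_dlambda)
dG_elec = linear_response_elec(-12.0)
print(dG_vdw, dG_elec)
```

A finer grid in the coupling parameter, or a higher-order quadrature, would normally be preferred where the integrand changes quickly; the trapezoid rule is used here only because it is the simplest choice.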

    Asymmetric generalized coherent states fit the transcript length distributions of the human and yeast global sets.

    <p>(<i>a</i>) The overall transcript profile of the human global set, i.e., the sum of the profiles of the human transcripts (line-joined), is approximately proportional to the asymmetric generalized coherent state of <a href="http://www.plosone.org/article/info:doi/10.1371/journal.pone.0078913#pone.0078913.e036" target="_blank">Equation (4</a>) with , i.e., the asymmetric Gaussian (dashed and shaded), with the equilibrium at the migration distance of 84 mm, where the correlation is 0.99. Graphs of describe the contributions of the subsets of transcript profiles, which peaks are at the migration distances of 124 (red) through 34 (violet) mm, to the overall transcript profile of the human global set. (<i>b</i>) The profiles of the human genes <i>COX7A2</i> (green), <i>CDK4</i> (blue) and <i>PFKP</i> (red) are approximately proportional to the asymmetric Gaussians (dashed and shaded) centered at the migration distances of 106, 86 and 72 mm, where the correlations are 0.99, 0.88 and 0.73, respectively. The transcript of <i>COX7A2</i>, which is involved in mitochondrial metabolism, is overexpressed in both the normal brain and GBM tumor, at each of the overexpression cutoffs of . The transcript of <i>CDK4</i> is overexpressed in the GBM tumor only. The transcript of <i>PFKP</i>, which is involved in glucose metabolism, is overexpressed in the normal brain only. (<i>c</i>) The overall transcript profile of the yeast global set (line-joined) is approximately proportional to the asymmetric Gaussian (dashed and shaded), with the equilibrium at the migration distance of 78 mm. Graphs of describe the contributions of the subsets of transcript profiles, which peaks are at the migration distances of 96 (red) through 42 (violet) mm, to the overall transcript profile of the yeast global set. 
(<i>d</i>) The profiles of the yeast genes <i>COX9</i> (green), <i>CDC28</i> (blue) and <i>PFK2</i> (red) are approximately proportional to the asymmetric Gaussians (dashed and shaded) centered at the migration distances of 90, 74 and 52 mm, where the correlations are 0.96, 0.83 and 0.89, respectively. Note that <i>COX9</i> is involved in mitochondrial metabolism, whereas <i>PFK2</i> is involved in glucose metabolism.</p>
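The comparison described in this caption, a profile scored against a fitted curve by correlation, can be sketched as follows. The split-width Gaussian below is only one common way to write an asymmetric Gaussian and is an assumption; the form of the paper's Equation (4) is not shown in this listing and may differ. The widths, the grid of migration distances and the synthetic profile are hypothetical.

```python
import math


def asymmetric_gaussian(x, center, width_left, width_right, height=1.0):
    """Split Gaussian: one width below the center, another above it (an assumed form)."""
    width = width_left if x < center else width_right
    return height * math.exp(-0.5 * ((x - center) / width) ** 2)


def pearson_correlation(a, b):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    var_a = sum((u - mean_a) ** 2 for u in a)
    var_b = sum((v - mean_b) ** 2 for v in b)
    return cov / math.sqrt(var_a * var_b)


# Hypothetical migration distances (mm) and a synthetic profile, for illustration only.
distances = list(range(30, 131, 2))
model = [asymmetric_gaussian(d, 84.0, 20.0, 12.0) for d in distances]
profile = [3.0 * m for m in model]  # a profile exactly proportional to the model
r = pearson_correlation(profile, model)
```

Because the Pearson correlation is unchanged by positive rescaling, a profile that is proportional to the model scores close to 1 regardless of its overall abundance level.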

    SVD Identifies Transcript Length Distribution Functions from DNA Microarray Data and Reveals Evolutionary Forces Globally Affecting GBM Metabolism

    <div><p>To search for evolutionary forces that might act upon transcript length, we use the singular value decomposition (SVD) to identify the length distribution functions of sets and subsets of human and yeast transcripts from profiles of mRNA abundance levels across gel electrophoresis migration distances that were previously measured by DNA microarrays. We show that the SVD identifies the transcript length distribution functions as “asymmetric generalized coherent states” from the DNA microarray data and with no <i>a-priori</i> assumptions. Comparing subsets of human and yeast transcripts of the same gene ontology annotations, we find that in both disparate eukaryotes, transcripts involved in protein synthesis or mitochondrial metabolism are significantly shorter than typical, and in particular, significantly shorter than those involved in glucose metabolism. Comparing the subsets of human transcripts that are overexpressed in glioblastoma multiforme (GBM) or normal brain tissue samples from The Cancer Genome Atlas, we find that GBM maintains normal brain overexpression of significantly short transcripts, enriched in transcripts that are involved in protein synthesis or mitochondrial metabolism, but suppresses normal overexpression of significantly longer transcripts, enriched in transcripts that are involved in glucose metabolism and brain activity. These global relations among transcript length, cellular metabolism and tumor development suggest a previously unrecognized physical mode for tumor and normal cells to differentially regulate metabolism in a transcript length-dependent manner. The identified distribution functions support a previous hypothesis from mathematical modeling of evolutionary forces that act upon transcript length in the manner of the restoring force of the harmonic oscillator.</p></div>

    Eigenvectors and overall transcript profiles of the length distribution data of the subsets of human transcripts overexpressed in either the normal brain only, the GBM tumor only or both.

    <p>(<i>a</i>) Line-joined graphs of the first (red), second (orange), third (green), fourth (blue) and fifth (violet) most significant eigenvectors of the subset of human transcripts that are most abundant in the normal brain but not the GBM tumor (including, e.g., <i>PFKP</i>), at the overexpression cutoff of . The th eigenvector is approximately proportional to the <i>q</i>th asymmetric Hermite function, where the correlation is in the range of 0.6 to 0.93. The inflection points of the th eigenvector approximately sample the asymmetric parabola (dashed and shaded). The equilibrium of the asymmetric parabola, and therefore also of the corresponding transcript length distribution function, is shifted from that of the human global set to the lesser migration distance of 80 mm and greater transcript length of 1,875±100 nt. (<i>b</i>) Eigenvectors of the subset of transcripts that are most abundant in the GBM tumor but not the normal brain (including, e.g., <i>CDK4</i>), at the cutoff of . The equilibrium is shifted from those of the normal brain only subset and global set to the greater migration distance of 90 mm and lesser transcript length of 1,375±100 nt. The width of the corresponding length distribution function of the tumor only subset is lesser than that of the normal only subset, where the asymmetry of the generalized Hooke's constant of the GBM tumor only subset is twice that in the normal brain only subset, while the magnitude <i>k</i> is similar. (<i>c</i>) Eigenvectors of the subset of transcripts that are most abundant in both the normal and tumor (including, e.g., <i>COX7A2</i>), at the cutoff of . The equilibrium is shifted to the greater migration distance of 96 mm and lesser transcript length of 1,125±75 nt. The width is lesser than those of the normal only subset as well as the tumor only subset, where the asymmetry is four times that in the normal only subset, while the magnitude is similar. 
(<i>d</i>) The asymmetric parabolas that fit the inflection points of the eigenvectors of the length distribution data of the subsets of human transcripts overexpressed in either the normal only (red and shaded), the tumor only (blue and shaded) or both (green and shaded). The equilibria of these parabolas are at increasing migration distances, corresponding to decreasing transcript lengths, and with decreasing widths. (<i>e</i>) The overall transcript profile of the subset of human transcripts that are most abundant in the normal brain only, i.e., the sum of the profiles of these transcripts (line-joined), is approximately proportional to the asymmetric Gaussian (dashed and shaded), with the equilibrium at the migration distance of 80 mm, where the correlation is >0.99. (<i>f</i>) The overall profile of the subset of human transcripts that are most abundant in the tumor only (line-joined) is approximately proportional to the asymmetric Gaussian (dashed and shaded), with the equilibrium at 90 mm. (<i>g</i>) The overall profile of the subset of human transcripts that are most abundant in both the normal and tumor (line-joined) is approximately proportional to the asymmetric Gaussian (dashed and shaded), with the equilibrium at 96 mm. (<i>h</i>) The asymmetric Gaussians that fit the overall transcript profiles of the length distribution data of the subsets of human transcripts overexpressed in either the normal only (red and shaded), the tumor only (blue and shaded) or both (green and shaded). The equilibria of these Gaussians are at increasing migration distances, corresponding to decreasing transcript lengths.</p>

    Human and yeast subsets of average transcript lengths significantly greater than that of the corresponding respiratory electron transport chain (ETC) subset.

    <p>The <i>P</i>-value of <a href="http://www.plosone.org/article/info:doi/10.1371/journal.pone.0078913#pone.0078913.e181" target="_blank">Equation (12</a>) is calculated for the average transcript length in nucleotides of each human or yeast subset of genes relative to the average transcript lengths of 1,460 and 995 nt of the human and yeast respiratory ETC subsets of 55 and 22 transcripts, respectively. The subsets of human transcripts that are most abundant in either the GBM tumor only or the normal brain only are considered at each of the overexpression cutoffs of .</p>

    The SVD of the transcript length distribution data of the human and yeast global sets and protein synthesis subsets.

    <p>(<i>a</i>) Raster display of the eigenvectors of <a href="http://www.plosone.org/article/info:doi/10.1371/journal.pone.0078913#pone.0078913.e006" target="_blank">Equation (1</a>) of the human global set, i.e., patterns of mRNA abundance level variation across the 50 human DNA microarrays, with overabundance (red), no change in abundance (black) and underabundance (green) around the “ground state” of abundance, which is captured by the first, most significant eigenvector. The inflection points of the th eigenvector approximately sample the asymmetric parabola (blue), where is the generalized Hooke's constant of <a href="http://www.plosone.org/article/info:doi/10.1371/journal.pone.0078913#pone.0078913.e029" target="_blank">Equation (3</a>). (<i>b</i>) Bar chart of the corresponding eigenvalue fractions , with the normalized Shannon entropy . The eigenvalues and eigenvalue fractions approximately fit the geometric series (blue), with . (<i>c</i>) Line-joined graphs of the first (red), second (orange), third (green), fourth (blue) and fifth (violet) most significant eigenvectors of the human global set. The th eigenvector is approximately proportional to the <i>q</i>th asymmetric Hermite function of <a href="http://www.plosone.org/article/info:doi/10.1371/journal.pone.0078913#pone.0078913.e025" target="_blank">Equation (2</a>), where the correlation is in the range of 0.75 to 0.84. The equilibrium of the asymmetric parabola (dashed and shaded), and therefore also of the corresponding transcript length distribution function, is at the gel migration distance of 84 mm, corresponding to a transcript length of 1,700±100 nt. The asymmetry is . (<i>d</i>) Graphs of the first (red) through fifth (violet) eigenvectors of the human translation (GO:0006412) subset. The equilibrium is shifted from that of the human global set to the greater migration distance of 96 mm and lesser transcript length of 1,125±75 nt. 
The width is lesser than that of the human global set, where the magnitude <i>k</i> of the generalized Hooke's constant is twice that of the global set, while the asymmetry <i>s</i> is similar. (<i>e</i>) Eigenvectors of the human ribosome (GO:0005840) subset. The equilibrium is shifted from those of the global set and translation subset to the greater migration distance of 100 mm and lesser transcript length of 975±75 nt. The width is lesser than those of the global set or translation subset, where <i>k</i> is three times that of the global set, while <i>s</i> is similar. (<i>f</i>) Raster display of the eigenvectors of the yeast global set. (<i>g</i>) Bar chart of the corresponding eigenvalue fractions. The eigenvalues and eigenvalue fractions approximately fit the geometric series (blue), with for the yeast global set. (<i>h</i>) Line-joined graphs of the first (red) through fifth (violet) eigenvectors of the yeast global set. The th eigenvector is approximately proportional to the <i>q</i>th asymmetric Hermite function, where the correlation is in the range of 0.85 to 0.92. The equilibrium of the transcript length distribution function of the global yeast set is at the gel migration distance of 78 mm and the transcript length of 1,025±100 nt. The asymmetry is similar to that of the human global set. (<i>i</i>) Eigenvectors of the yeast translation subset. The equilibrium is shifted from that of the yeast global set to the greater migration distance of 84 mm and lesser transcript length of 775±75 nt. The width is lesser than that of the yeast global set, where the magnitude <i>k</i> of the generalized Hooke's constant is twice that of the global set, while the asymmetry <i>s</i> is similar. (<i>j</i>) Eigenvectors of the yeast ribosome subset. The equilibrium is similar to that of the yeast translation subset. The width is lesser than those of the global set or translation subset, where <i>k</i> is three times that of the global set, while <i>s</i> is similar.</p>
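The eigenvalue fractions and normalized Shannon entropy mentioned in this caption can be sketched as below. The formulas are an assumption, since the caption's own symbols did not survive in this listing: each fraction is taken as a squared singular value divided by the sum of squares, and the entropy is divided by the logarithm of the number of components so that it falls between 0 and 1. The function names and the singular values are hypothetical.

```python
import math


def eigenvalue_fractions(singular_values):
    """Fraction captured by each component (assumed: squared values, normalized to sum to 1)."""
    squares = [s * s for s in singular_values]
    total = sum(squares)
    return [sq / total for sq in squares]


def normalized_shannon_entropy(fractions):
    """Entropy of the fractions divided by log(number of components); ranges from 0 to 1."""
    n = len(fractions)
    if n < 2:
        return 0.0
    entropy = -sum(p * math.log(p) for p in fractions if p > 0.0)
    return entropy / math.log(n)


# Hypothetical singular values forming a geometric series with ratio 0.5.
sigmas = [8.0, 4.0, 2.0, 1.0]
fractions = eigenvalue_fractions(sigmas)
entropy = normalized_shannon_entropy(fractions)
print(fractions, entropy)
```

An entropy near 0 means that a single eigenvector captures almost all of the signal, while an entropy near 1 means that the signal is spread evenly across the eigenvectors.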

    Typical gene ontology (GO) annotations significantly enriching the human subsets of transcripts and genes overexpressed in both the GBM tumor and normal brain, the normal brain overall or the normal brain only.

    <p>The <i>P</i>-value of a given enrichment is calculated assuming hypergeometric probability distribution of the annotations among the transcripts or genes in the global set, and of the subset of annotations among the subset of transcripts or genes, . These enrichments of the subsets at the overexpression cutoff of are consistent with the enrichments of the corresponding subsets at the overexpression cutoffs of (Table S2 in <a href="http://www.plosone.org/article/info:doi/10.1371/journal.pone.0078913#pone.0078913.s001" target="_blank">Appendix S1</a>). None of the multiple GO annotations consistently enrich the human subsets of transcripts and genes that are overexpressed in the GBM tumor only. None of the multiple GO annotations consistently enrich the human subsets of transcripts and genes that are overexpressed in the GBM tumor overall beyond those that enrich the subsets that are overexpressed in both the GBM tumor and normal brain.</p>
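A hypergeometric enrichment test of the kind this caption describes can be sketched as follows. It assumes the usual upper-tail convention, that is, the probability of seeing at least the observed number of annotated items in a subset drawn at random without replacement; the paper's exact formula is not shown in this listing. The function and parameter names are hypothetical.

```python
from math import comb


def hypergeometric_enrichment_p(global_size, global_annotated, subset_size, subset_annotated):
    """P(at least subset_annotated annotated items in a random subset drawn without replacement)."""
    upper = min(global_annotated, subset_size)
    total = comb(global_size, subset_size)
    # math.comb returns 0 when k > n, so impossible configurations contribute nothing.
    tail = sum(
        comb(global_annotated, k) * comb(global_size - global_annotated, subset_size - k)
        for k in range(subset_annotated, upper + 1)
    )
    return tail / total


# Hypothetical counts: 10 genes overall, 5 annotated, a subset of 5 that are all annotated.
p_value = hypergeometric_enrichment_p(10, 5, 5, 5)
print(p_value)
```

Exact integer binomial coefficients avoid the rounding problems that factorial-based floating-point formulas run into for large gene sets.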

    The SVD identifies the length distribution functions of the human and yeast global sets and subsets of transcripts as asymmetric generalized coherent states from the DNA microarray data and with no <i>a-priori</i> assumptions.

    <p>In general, it is not necessarily possible to identify a distribution function from data that sample the function. This is because identifying a distribution function is mathematically equivalent to estimating the <i>infinite</i> number of moments that are associated with the function. The SVD of data that sample a distribution function, however, may approximately identify the distribution function from the data and with no <i>a-priori</i> assumptions. This is because identifying a distribution function is also equivalent to estimating its eigenfunctions and corresponding eigenvalues. (<i>a</i>) The SVD of <a href="http://www.plosone.org/article/info:doi/10.1371/journal.pone.0078913#pone.0078913.e006" target="_blank">Equation (1</a>) of the matrix <i>D</i> that tabulates the mRNA abundance levels of the human global set of transcripts, in increasing order of the transcript lengths as determined by Hurowitz <i>et al</i>, across <i>X</i> gel electrophoresis migration distances, uncovers <i>X</i> unique left singular vectors, <i>X</i> corresponding singular values and <i>X</i> corresponding right singular vectors. The orthonormal right singular vectors are also eigenvectors of the matrix , with the eigenvalues proportional to the singular values. The <i>finite</i> (and, possibly, few) most significant eigenvectors and corresponding eigenvalues – most significant in terms of the fractions of the information that they capture in the data – may approximate the data. (<i>b</i>) The <i>finite</i> and few most significant eigenvectors uncovered by the SVD of the human global transcript length distribution data fit a series of orthogonal asymmetric Hermite functions, where the th eigenvector is proportional to the <i>q</i>th asymmetric Hermite function of <a href="http://www.plosone.org/article/info:doi/10.1371/journal.pone.0078913#pone.0078913.e025" target="_blank">Equations (2</a>) and (3). 
(<i>c</i>) The corresponding eigenvalues and eigenvalue fractions fit a corresponding geometric series. (<i>d</i>) The series of asymmetric Hermite functions and the corresponding geometric series are known to be among the eigenfunctions and corresponding eigenvalues, respectively, of the asymmetric generalized coherent state of <a href="http://www.plosone.org/article/info:doi/10.1371/journal.pone.0078913#pone.0078913.e036" target="_blank">Equations (4</a>) and (5). Therefore, the asymmetric generalized coherent state, where each transcript's profile fits an asymmetric Gaussian, and where the distribution of the peaks of these profiles also fits an asymmetric Gaussian, is identified by the SVD as the distribution function that the data sample.</p>
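The link between right singular vectors and eigenvectors described in this caption can be checked numerically with a small sketch. It uses power iteration on the product of the transposed data matrix with the data matrix to find only the most significant eigenvector; this is a stand-in for a full SVD routine and is not the authors' procedure. In the standard linear-algebra convention, the resulting eigenvalue equals the square of the corresponding singular value. The 4x3 matrix and all names are hypothetical.

```python
import math


def gram_matrix(data):
    """Return D^T D for a row-major matrix D given as a list of rows."""
    cols = len(data[0])
    return [[sum(row[i] * row[j] for row in data) for j in range(cols)] for i in range(cols)]


def dominant_eigenpair(matrix, iterations=500):
    """Power iteration: dominant eigenvalue and unit eigenvector of a symmetric PSD matrix."""
    n = len(matrix)
    vector = [1.0 / math.sqrt(n)] * n
    for _ in range(iterations):
        product = [sum(matrix[i][j] * vector[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(p * p for p in product))
        vector = [p / norm for p in product]
    product = [sum(matrix[i][j] * vector[j] for j in range(n)) for i in range(n)]
    eigenvalue = sum(v * p for v, p in zip(vector, product))  # Rayleigh quotient
    return eigenvalue, vector


# Hypothetical data matrix (rows: transcripts, columns: migration distances).
D = [[2.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 2.0], [1.0, 1.0, 1.0]]
eigenvalue, right_vector = dominant_eigenpair(gram_matrix(D))

# The singular value is the length of D applied to the unit right singular vector.
projected = [sum(row[j] * right_vector[j] for j in range(3)) for row in D]
singular_value = math.sqrt(sum(p * p for p in projected))
print(eigenvalue, singular_value ** 2)
```

Power iteration recovers one eigenvector at a time and converges slowly when the leading eigenvalues are close, so a library SVD routine is the practical choice for real microarray data.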

    Average transcript and gene lengths of the human subsets overexpressed in the normal brain or the GBM tumor.

    <p>(<i>a</i>) Average transcript lengths of the human subsets that are overexpressed in the normal brain only (red), the normal brain overall (violet), the GBM tumor only (blue), the GBM tumor overall (orange) or both the normal brain and GBM tumor (green), at each of the overexpression cutoffs of , relative to the average transcript length of the global set of 4,109 transcripts (black). (<i>b</i>) Average maximum gene lengths of the human subsets that are overexpressed in the normal brain or the GBM tumor at each of the cutoffs, relative to the average maximum gene length of the global set of 11,631 genes. (<i>c</i>) Average minimum gene lengths of the human subsets relative to that of the global set.</p>

    Overall transcript profiles and Venn diagrams of the subsets of human transcripts overexpressed in the normal brain or the GBM tumor.

    <p>(<i>a</i>) The overall transcript profiles of the subsets of human transcripts that are most abundant in the normal brain only (red), the normal brain overall (violet), the GBM tumor only (blue), the GBM tumor overall (orange) or both the normal brain and GBM tumor (green). The equilibria of the profiles of the normal only subset, the human global set, the tumor only subset and the subset of transcripts that are overexpressed in both the normal and tumor are at the increasing migration distances of 80 (red), 84 (black), 90 (blue) and 96 (green) mm, spanning a difference of 16 mm of gel migration distance (shaded), and corresponding to decreasing transcript lengths. (<i>b</i>) The average transcript lengths of <a href="http://www.plosone.org/article/info:doi/10.1371/journal.pone.0078913#pone.0078913.e161" target="_blank">Equation (8</a>) of the subsets of <i>M</i> transcripts each that are most abundant in the normal only (red), the normal overall (violet), the tumor only (blue), the tumor overall (orange) or both the normal and tumor (green), relative to the average transcript length of <a href="http://www.plosone.org/article/info:doi/10.1371/journal.pone.0078913#pone.0078913.e156" target="_blank">Equation (6</a>) of the human global set of <i>N</i> transcripts, at the overexpression cutoff of . The relation between a gene's overexpression in either the normal overall, the tumor only, the tumor overall or both the normal and tumor and a transcript that is shorter than typical is statistically significant, with the <i>P</i>-value of <a href="http://www.plosone.org/article/info:doi/10.1371/journal.pone.0078913#pone.0078913.e169" target="_blank">Equation (11</a>) <0.05 for the observed differences in the average transcript lengths of these subsets and that of the human global set (<a href="http://www.plosone.org/article/info:doi/10.1371/journal.pone.0078913#pone-0078913-t001" target="_blank">Table 1</a>). 
(<i>c</i>) The overall transcript profiles of the subsets of human transcripts that are most abundant in the normal brain only (red), the normal brain overall (violet), the GBM tumor overall (orange) or both the normal brain and GBM tumor (green). (<i>d</i>) The average transcript length differences of the subsets of <i>L</i> transcripts each that are most abundant in the normal only (red), the tumor overall (orange) or both the normal and tumor (green), relative to the average transcript length of the normal overall subset of <i>M</i> transcripts, at the overexpression cutoff of . The relations between a gene's overexpression in the tumor overall or in both the normal and tumor and a transcript that is shorter than typical for a gene that is overexpressed in the normal overall are statistically significant, with the <i>P</i>-value of <a href="http://www.plosone.org/article/info:doi/10.1371/journal.pone.0078913#pone.0078913.e181" target="_blank">Equation (12</a>) <0.05 (<a href="http://www.plosone.org/article/info:doi/10.1371/journal.pone.0078913#pone-0078913-t002" target="_blank">Table 2</a>). Similarly, the relation between a gene's overexpression in the normal only and a transcript that is longer than typical for a gene that is overexpressed in the normal overall is statistically significant. (<i>e</i>) The overall transcript profiles of the subsets of human transcripts that are most abundant in the normal brain but not the GBM tumor (red) or in both the normal brain and GBM tumor (green). 
(<i>f</i>) The average transcript length differences of <a href="http://www.plosone.org/article/info:doi/10.1371/journal.pone.0078913#pone.0078913.e238" target="_blank">Equation (13</a>) of the subsets of <i>L</i> transcripts that are most abundant in the normal only (red) or in both the normal and tumor (green), relative to the average transcript length of the subsets of transcripts that are most abundant in both the normal and tumor (green) or in the normal only (red), respectively, at the overexpression cutoff of . The relation between a gene's overexpression in the normal brain but not the GBM tumor and a transcript that is longer than typical for a gene that is overexpressed in both the normal brain and GBM tumor is statistically significant, with the <i>P</i>-value of <a href="http://www.plosone.org/article/info:doi/10.1371/journal.pone.0078913#pone.0078913.e242" target="_blank">Equation (15</a>) <0.05.</p>