43 research outputs found

    Systematic investigation of global coordination among mRNA and protein in cellular society

    BACKGROUND: Cell functions depend on molecules organized in the cellular society. Two basic components are mRNA molecules and proteins. The interactions within and between those two components are crucial for carrying out sophisticated cell functions. The interplay can be analyzed by comparing expression levels of mRNA and proteins. This is critical for understanding the molecular interactions, (post-)transcriptional regulation and conservation of co-expression between mRNAs and proteins. Using high-throughput transcriptome and proteome data, this study systematically investigates the general picture of such expression correlations. We analyze four groups of correlations: (i) transcript levels of different genes, (ii) protein levels of different genes, (iii) mRNA levels with protein levels of different genes and (iv) mRNA levels with protein levels of the same genes. This helps to obtain global insights into the stability and variability of co-expression and correlation of mRNA and protein levels. RESULTS: Analysis of the simultaneous co-expression of mRNAs and proteins yields mainly weak correlations. Therefore we introduce the concept of time-delayed co-expression patterns. Based on a time-course dataset, we obtain a high fraction of time-delayed correlations. In group (i), 67% of different transcripts are significantly correlated. At the protein level (ii), 68% of different proteins are significantly correlated. Comparison of the different molecular levels results in a 74% fraction of correlated transcript and protein levels of different genes (iii) and 56% for the same genes (iv). Furthermore, a higher fraction of protein levels (simultaneously 20% and short time-delayed 29%) is correlated than at the transcript level (10% and 18%, respectively). Analysis of the dynamics of the correlation shows that correlation at the transcript level is largely passed to the protein level. In contrast, specific co-expression patterns are changed in multiple ways. CONCLUSIONS: Our analysis reveals that the regulation of transcription and translation contains a time-delayed component. The correlation at the protein level is more synchronous, or delayed by a shorter time, than that at the transcript level. This supports the hypothesis that a higher degree of direct physical interaction requires a higher synchronicity between the interacting partners. The conservation of correlation between the transcript level (i) and the protein level (ii) sheds light on the processes underlying transcription, translation and regulation. A future investigation of the conditions of conservation will give comprehensive insights into the complexity of the regulatory mechanisms.
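    The time-delayed co-expression idea in the abstract above can be illustrated with a minimal sketch. It assumes plain Pearson correlation between an mRNA time course and a protein time course shifted by a fixed number of time points; the helper names `pearson`, `lagged_correlation` and `best_lag` are hypothetical, and the study's actual statistic and significance test are not stated in the abstract.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        return 0.0  # a constant series carries no correlation signal
    return cov / (sd_x * sd_y)


def lagged_correlation(mrna, protein, lag):
    """Correlate mrna[t] with protein[t + lag] for a non-negative lag."""
    if lag < 0 or lag > len(mrna) - 2:
        raise ValueError("lag must leave at least two overlapping time points")
    if lag == 0:
        return pearson(mrna, protein)
    return pearson(mrna[:-lag], protein[lag:])


def best_lag(mrna, protein, max_lag):
    """Return the lag in 0..max_lag with the largest absolute correlation."""
    return max(
        range(max_lag + 1),
        key=lambda lag: abs(lagged_correlation(mrna, protein, lag)),
    )


# Illustrative (made-up) time course: the protein follows the mRNA two steps later.
mrna = [1, 3, 2, 5, 4, 6, 3, 1]
protein = [0, 0] + mrna[:-2]
print(best_lag(mrna, protein, max_lag=3))
```

    With this made-up series the simultaneous correlation is weak while the correlation at a delay of two time points is perfect, which is the kind of pattern the abstract describes as time-delayed co-expression.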

    The Uptake of Integrated Perinatal Prevention of Mother-to-Child HIV Transmission Programs in Low- and Middle-Income Countries: A Systematic Review

    BACKGROUND: The objective of this review was to assess the uptake of WHO-recommended integrated perinatal prevention of mother-to-child transmission (PMTCT) of HIV interventions in low- and middle-income countries. METHODS AND FINDINGS: We searched 21 databases for observational studies presenting uptake of integrated PMTCT programs in low- and middle-income countries. Forty-one studies on programs implemented between 1997 and 2006 met the inclusion criteria. The proportions of women attending antenatal care who were counseled and who were tested were high: 96% (range 30-100%) and 81% (range 26-100%), respectively. However, the overall median proportion of HIV-positive women provided with antiretroviral prophylaxis in antenatal care and attending the labor ward was 55% (range 22-99%) and 60% (range 19-100%), respectively. The proportion of women with unknown HIV status who were tested for HIV at the labor ward was 70%. Overall, 79% (range 44-100%) of infants were tested for HIV and 11% (range 3-18%) of them were HIV positive. We designed two PMTCT cascades using studies with outcomes for all perinatal PMTCT interventions, which showed that an estimated 22% of all HIV-positive women attending antenatal care and 11% of all HIV-positive women delivering at the labor ward were not notified about their HIV status and did not participate in a PMTCT program. Only 17% of HIV-positive antenatal care attendees and their infants are known to have taken antiretroviral prophylaxis. CONCLUSION: The existing evidence provides information only about the initial PMTCT programs, which were based on the old WHO PMTCT guidelines. The uptake of counseling and HIV testing among pregnant women attending antenatal care was high, but their retention in PMTCT programs was low. The majority of women in the included studies did not receive ARV prophylaxis in antenatal care, nor did they attend the labor ward. More studies evaluating the uptake in current PMTCT programs are urgently needed.

    Statistik für Transkriptionsfaktor Bindestellen

    Contents: 1. Introduction. Part I, Count Statistics: 2. DNA Motifs; 3. Word Count Statistics; 4. Generating Functions; 5. TF Count Statistics; 6. cis-regulatory modules (CRMs). Part II, Applications: 7. Count Statistics; 8. Co-Occurrences and Co-Operativity; 9. Similarity of DNA Motifs; 10. Clustering of PFMs; 11. Quality of Representation; 12. Conclusions.
    Transcription factors (TFs) play a key role in gene regulation. They interact with specific binding sites or motifs on the DNA sequence and regulate expression of genes downstream of these binding sites. In silico prediction of potential binding of a TF to a binding site is an important task in computational biology. From a statistical point of view, the DNA sequence is a long text consisting of four different letters ('A', 'C', 'G' and 'T'). The binding of a TF to the sequence corresponds to the occurrence of a word in the sequence, e.g. 'AACCTC'. Hence, word count statistics can be applied to problems such as the number of binding sites and the distances between binding sites. The major problem in word count statistics is the dependencies between sequence positions. These dependencies arise due to possible overlaps of words. So far, exact formulae to compute the count distribution of clustered occurrences only exist based on generating functions. We derive a new recursive formula and use it to obtain a normal approximation. In fact, a TF does not bind to one single word but allows mismatches and substitutions. This is captured in a statistical model called the Position Frequency Matrix (PFM). A PFM assigns a score to each position of the word and letter. If the summed score reaches a certain threshold, the TF is assumed to bind to that sequence region. In fact, one can transform this representation into a set of words which are bound by the TF. Unfortunately, enumerating the set of words has exponential cost. In addition, the set of words grows enormously for longer binding sites (around 500,000 for a binding site of length 15). 
Hence, word count statistics and their approximations become inefficient and very inaccurate. Therefore, the need for new statistics and efficient algorithms arises. Instead of enumerating all words, we use a statistical representation, the PFM, and model dependencies explicitly. In fact, probabilities for overlaps are dependencies of the summed scores between two positions. Hence, we reduce the problem to computing the two-dimensional convolution of the score distributions for each possible overlap and derive an exact formula for the variance of PFM counts. Furthermore, we found an accurate approximation for the distribution of the number of occurrences using a compound Poisson distribution. Our approximation outperforms all alternative approaches. In addition, we give Poisson statistics for the number of occurrences without overlaps such that other standard word count statistics (like distances between occurrences) can be applied. Third, we develop statistics to compute the significance of co-occurrences and co-operativity among sets of TFs. Fourth, we use the variance to define a natural measure of similarity between DNA motifs. We explicitly state formulae for PFMs. Compared to standard approaches, this measure shows higher correlation with empirical data. It also allows sets of TFs to be clustered and gives results comparable with more sophisticated clustering algorithms. Finally, we use this similarity measure to compute the representation quality of PFMs for a set of experimentally verified binding sites. Besides a threshold optimization method which significantly improves the quality of PFMs in Transfac and Jaspar, we can indeed select DNA motifs which violate PFM assumptions and, therefore, cannot be reasonably represented as PFMs.
    Transcription factors (TFs) play a decisive role in the regulation of genes. They interact with specific binding sites or motifs on the DNA sequence. An important task in bioinformatics is therefore to predict potential TF binding sites in silico. From a statistical point of view, the DNA sequence is a long text consisting of four different letters, 'A', 'C', 'G' and 'T', for the four different bases. A TF binding to a binding site is equivalent to the word that describes the binding site occurring in the text. Hence, one can draw on already known statistics and thereby answer questions about the probability of observing a certain number of words, or about the distance between two occurrences. However, the same problem arises again and again when deriving such statistics: the words can overlap. This creates dependencies between the underlying random variables. As a result, there has so far been, for example, no exact formula, other than one based on generating functions, for computing the probability of seeing a certain number of non-overlapping words. We derive this formula and thereby also obtain a normal approximation. Unfortunately, a TF does not bind only to a single word; there are usually positions within the word that allow variation. TFs are therefore mostly represented in the statistical model PFM. This model assigns a weight to each letter at each position. If the sum of all weights for a given sequence of the motif's length exceeds a threshold, that sequence is a binding site. One can therefore also enumerate all such words and thus obtain a set of words that describes a motif. However, this set can become very large; for a motif of length 15, for example, the number is typically around 500,000. Apart from the fact that enumerating the words takes exponential running time, the known statistics also reach their limits with such a large number of words. That is, they are very costly to compute and the approximate results are not very accurate. New statistics and efficient algorithms are therefore needed. We have developed such statistics. We exploit the fact that we can compute the probability of overlapping binding sites without enumerating the words. More precisely, we use the PFM model to compute a two-dimensional weight distribution, from which we can read off this probability. Starting from this result, we derive the exact variance of the number of occurrences. In addition, we can describe the distribution of occurrences by a compound Poisson distribution. Simulations show that this is the best known approximation. We can also compute corresponding statistics for non-overlapping occurrences based on a Poisson distribution. Extension to several different DNA motifs leads to the computation of the significance of co-occurrences and of the co-operativity of TFs. In addition, we introduce the covariance as a measure of the similarity of DNA motifs. This gives a natural and, above all, general similarity measure that does not presuppose a specific model. We derive explicit formulae for the PFM model, and comparison with simulations and other measures shows that our measure indeed best reflects the similarity as we define it. We use a related measure to group classes of TFs. Here too, comparison with optimized clustering algorithms shows that we obtain comparably good results. Finally, we use the similarity to determine how well a DNA motif can be represented by a particular model. To this end, we compute the covariance between the experimentally verified sequences and the model. This corresponds to the representation quality of DNA motif models. Again, we derive explicit formulae for PFMs. Based on these, we show that the quality can also be used to optimize model parameters (in our case, the threshold). We also show that the quality is significantly lower for motifs that do not meet the assumptions of the PFM model.
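    The PFM scoring rule described in this abstract (sum the per-position weights of a window and call it a binding site when the sum reaches a threshold) can be sketched as follows. This is only an illustration under stated assumptions: the three-position weight table and the threshold are made up, the function names are hypothetical, and the brute-force `bound_words` enumeration is exactly the exponential step the thesis says it avoids; none of the thesis's variance or compound Poisson formulae are reproduced here.

```python
from itertools import product

ALPHABET = "ACGT"


def pfm_score(word, weights):
    """Sum the weight of each letter at its position; len(word) == len(weights)."""
    return sum(weights[i][letter] for i, letter in enumerate(word))


def count_hits(sequence, weights, threshold):
    """Count windows (overlaps allowed) whose summed score reaches the threshold."""
    m = len(weights)
    return sum(
        1
        for start in range(len(sequence) - m + 1)
        if pfm_score(sequence[start:start + m], weights) >= threshold
    )


def bound_words(weights, threshold):
    """Enumerate every word that reaches the threshold: 4**m candidates."""
    m = len(weights)
    return [
        "".join(letters)
        for letters in product(ALPHABET, repeat=m)
        if pfm_score(letters, weights) >= threshold
    ]


# Hypothetical weights for a motif of length 3 (not taken from Transfac or Jaspar).
weights = [
    {"A": 2, "C": 0, "G": 0, "T": 0},
    {"A": 0, "C": 2, "G": 0, "T": 1},
    {"A": 0, "C": 0, "G": 2, "T": 0},
]

print(bound_words(weights, threshold=5))
print(count_hits("ACGTATGACG", weights, threshold=5))
```

    Sliding the window over the sequence costs time proportional to the sequence length times the motif length, whereas the enumeration grows as 4 to the power of the motif length, which is why working on the score distribution directly is attractive for longer motifs.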

    Statistics for Co-Occurrence of DNA Motifs

    Rural, Urban and Migrant Differences in Non-Communicable Disease Risk-Factors in Middle Income Countries: A Cross-Sectional Study of WHO-SAGE Data

    Understanding how urbanisation and rural-urban migration influence risk-factors for non-communicable disease (NCD) is crucial for developing effective preventative strategies globally. This study compares NCD risk-factor prevalence in urban, rural and migrant populations in China, Ghana, India, Mexico, Russia and South Africa. Study participants were 39,436 adults within the WHO Study on global AGEing and adult health (SAGE), surveyed 2007–2010. Risk ratios (RR) for each risk-factor were calculated using logistic regression in country-specific and all-country pooled analyses, adjusted for age, sex and survey design. Fully adjusted models included income quintile, marital status and education. Regular alcohol consumption was lower in migrant and urban groups than in rural groups (pooled RR and 95% CI: 0.47 (0.31–0.68); 0.58 (0.46–0.72), respectively). Occupational physical activity was lower (0.86 (0.72–0.98); 0.76 (0.65–0.85)) while active travel and recreational physical activity were higher (pooled RRs for urban groups: 1.05 (1.00–1.09), 2.36 (1.95–2.83), respectively; for migrant groups: 1.07 (1.0–1.12), 1.71 (1.11–2.53), respectively). Overweight, raised waist circumference and diagnosed diabetes were higher in urban groups (1.19 (1.04–1.35), 1.24 (1.07–1.42), 1.69 (1.15–2.47), respectively). Exceptions to these trends exist: obesity indicators were higher in rural Russia; active travel was lower in urban groups in Ghana and India; and in South Africa, urban groups had the highest alcohol consumption. Migrants and urban dwellers had similar NCD risk-factor profiles. These were not consistently worse than those seen in rural dwellers. The variable impact of urbanisation on NCD risk must be considered in the design and evaluation of strategies to reduce the growing burden of NCDs globally.
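    To make the "RR (95% CI)" figures in this abstract easier to read, here is a minimal sketch of an unadjusted risk ratio with a log-scale Wald interval computed from raw counts. This is a simpler, swapped-in calculation, not the study's method: the abstract reports risk ratios from logistic regression adjusted for age, sex and survey design. The function name `risk_ratio` and the counts are hypothetical.

```python
import math


def risk_ratio(exposed_cases, exposed_total, reference_cases, reference_total, z=1.96):
    """Unadjusted risk ratio with a Wald confidence interval on the log scale.

    Returns (rr, lower, upper); z=1.96 gives an approximate 95% interval.
    """
    risk_exposed = exposed_cases / exposed_total
    risk_reference = reference_cases / reference_total
    rr = risk_exposed / risk_reference
    # Standard error of log(RR) for two independent binomial proportions.
    se_log_rr = math.sqrt(
        1 / exposed_cases - 1 / exposed_total
        + 1 / reference_cases - 1 / reference_total
    )
    lower = math.exp(math.log(rr) - z * se_log_rr)
    upper = math.exp(math.log(rr) + z * se_log_rr)
    return rr, lower, upper


# Made-up counts: 30 of 100 urban versus 60 of 100 rural participants report a behaviour.
rr, lower, upper = risk_ratio(30, 100, 60, 100)
print(f"RR {rr:.2f} ({lower:.2f}-{upper:.2f})")
```

    An interval that lies entirely below 1, as in this made-up example, corresponds to the abstract's pattern of a risk factor being lower in the compared group than in the rural reference group.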