43 research outputs found
Systematic investigation of global coordination among mRNA and protein in cellular society
Background: Cell functions depend on molecules organized in the cellular society. Two basic components are mRNA molecules and proteins. The interactions within and between those two components are crucial for carrying out sophisticated cell functions. The interplay can be analyzed by comparing expression levels of mRNA and proteins. This is critical for understanding the molecular interactions, (post-)transcriptional regulation and conservation of co-expression between mRNAs and proteins. By using high-throughput transcriptome and proteome data, this study aims to systematically investigate the general picture of such expression correlations. We analyze four groups of correlations: (i) transcript levels of different genes, (ii) protein levels of different genes, (iii) mRNA levels with protein levels of different genes and (iv) mRNA levels with protein levels of the same genes. This helps to obtain global insights into the stability and variability of co-expression and correlation of mRNA and protein levels.
Results: Analysis of the simultaneous co-expression of mRNAs and proteins yields mainly weak correlations. Therefore we introduce the concept of time-delayed co-expression patterns. Based on a time-course dataset, we obtain a high fraction of time-delayed correlations. In group (i), 67% of different transcripts are significantly correlated. At the protein level (ii), 68% of different proteins are significantly correlated. Comparison of the different molecular levels results in a 74% fraction of correlated transcript and protein levels of different genes (iii) and 56% for the same genes (iv). Furthermore, a higher fraction of protein levels (simultaneously 20% and short time-delayed 29%) is correlated than at the transcript level (10% and 18% respectively). Analysis of the dynamics of the correlation shows that correlation at the transcript level is largely passed to the protein level. In contrast, specific co-expression patterns are changed in multiple ways.
Conclusions: Our analysis reveals that the regulation of transcription and translation contains a time-delayed component. The correlation at the protein level is more synchronous, or delayed by a shorter time, than that at the transcript level. This supports the hypothesis that a higher degree of direct physical interaction requires a higher synchronicity between the interacting partners. The conservation of correlation between the transcript level (i) and the protein level (ii) sheds light on the processes underlying transcription, translation and regulation. A future investigation of the conditions of conservation will give comprehensive insights into the complexity of the regulatory mechanisms.
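The time-delayed co-expression idea described in this abstract can be illustrated with a lagged Pearson correlation. The following is a minimal sketch only, assuming evenly spaced time points; the function names `pearson`, `lagged_pearson` and `best_lag` and the toy series are hypothetical and are not taken from the study, which does not state its exact procedure here.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equally long numeric sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("series must have equal length of at least 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return sxy / sqrt(sxx * syy)


def lagged_pearson(mrna, protein, lag):
    """Correlate mrna[t] with protein[t + lag] for a non-negative lag."""
    if lag < 0:
        raise ValueError("lag must be non-negative")
    if lag == 0:
        return pearson(mrna, protein)
    return pearson(mrna[:-lag], protein[lag:])


def best_lag(mrna, protein, max_lag):
    """Return the lag with the highest correlation and all per-lag scores."""
    scores = {lag: lagged_pearson(mrna, protein, lag) for lag in range(max_lag + 1)}
    return max(scores, key=lambda lag: scores[lag]), scores


# Toy data (hypothetical): the protein series follows the mRNA series two steps later.
mrna = [1.0, 3.0, 2.0, 5.0, 4.0, 6.0, 2.0, 1.0]
protein = [0.5, 0.7] + [2.0 * v + 1.0 for v in mrna[:-2]]

lag, scores = best_lag(mrna, protein, max_lag=3)
print(lag, {k: round(v, 3) for k, v in scores.items()})
```

In this toy case the simultaneous correlation (lag 0) is weak while the correlation at lag 2 is essentially perfect, which mirrors the qualitative observation that delayed comparisons can reveal relationships that simultaneous ones miss.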
The Uptake of Integrated Perinatal Prevention of Mother-to-Child HIV Transmission Programs in Low- and Middle-Income Countries: A Systematic Review
BACKGROUND: The objective of this review was to assess the uptake of WHO-recommended integrated perinatal prevention of mother-to-child transmission (PMTCT) of HIV interventions in low- and middle-income countries. METHODS AND FINDINGS: We searched 21 databases for observational studies presenting uptake of integrated PMTCT programs in low- and middle-income countries. Forty-one studies on programs implemented between 1997 and 2006 met inclusion criteria. The proportions of women attending antenatal care who were counseled and who were tested were high: 96% (range 30-100%) and 81% (range 26-100%), respectively. However, the overall median proportions of HIV positive women provided with antiretroviral prophylaxis in antenatal care and attending labor ward were 55% (range 22-99%) and 60% (range 19-100%), respectively. The proportion of women with unknown HIV status who were tested for HIV at labor ward was 70%. Overall, 79% (range 44-100%) of infants were tested for HIV and 11% (range 3-18%) of them were HIV positive. We designed two PMTCT cascades using studies with outcomes for all perinatal PMTCT interventions, which showed that an estimated 22% of all HIV positive women attending antenatal care and 11% of all HIV positive women delivering at labor ward were not notified about their HIV status and did not participate in a PMTCT program. Only 17% of HIV positive antenatal care attendees and their infants are known to have taken antiretroviral prophylaxis. CONCLUSION: The existing evidence provides information only about the initial PMTCT programs, which were based on the old WHO PMTCT guidelines. The uptake of counseling and HIV testing among pregnant women attending antenatal care was high, but their retention in PMTCT programs was low. The majority of women in the included studies did not receive ARV prophylaxis in antenatal care; nor did they attend labor ward. More studies evaluating the uptake in current PMTCT programs are urgently needed.
Statistics for Transcription Factor Binding Sites (Statistik für Transkriptionsfaktor Bindestellen)
Contents: 1. Introduction. Part I, Count Statistics: 2. DNA Motifs; 3. Word Count Statistics; 4. Generating Functions; 5. TF Count Statistics; 6. cis-regulatory modules (CRMs). Part II, Applications: 7. Count Statistics; 8. Co-Occurrences and Co-Operativity; 9. Similarity of DNA Motifs; 10. Clustering of PFMs; 11. Quality of Representation; 12. Conclusions.
Transcription factors (TFs) play a key role in gene regulation. They interact
with specific binding sites or motifs on the DNA sequence and regulate
expression of genes downstream of these binding sites. In silico prediction of
potential binding of a TF to a binding site is an important task in
computational biology. From a statistical point of view, the DNA sequence is a
long text consisting of four different letters ('A','C','G', and 'T'). The
binding of a TF to the sequence corresponds to the occurrence of a word in the
sequence, e.g. 'AACCTC'. Hence, word count statistics can be applied to
problems such as number of binding sites and distances between binding sites.
The major problem in word count statistics is the dependencies between sequence
positions. These dependencies arise due to possible overlaps of words. So far,
exact formulae to compute the count distribution of clustered occurrences only
exist based on generating functions. We newly derive a recursive formula and
use it to obtain a normal approximation. In fact, a TF does not bind to one
single word but allows mismatches and substitutions. This is captured in a
statistical model called Position Frequency Matrix (PFM). A PFM assigns a
score to each position of the word and letter. If the summed score reaches a
certain threshold, the TF is assumed to bind to that sequence region. In fact,
one can transform this representation to a set of words which are bound by the
TF. Unfortunately, enumerating the set of words takes exponential time. In
addition, the set of words grows enormously for longer binding sites (around
500,000 for a binding site of length 15). Hence, word count statistics and their
approximations become inefficient and very inaccurate. Therefore, the need for
new statistics and efficient algorithms arises. Instead of enumerating all
words, we use a statistical representation - the PFM - and model dependencies
explicitly. In fact, probabilities for overlaps are dependencies of the summed
scores between two positions. Hence, we reduce the problem to computing the
two dimensional convolution of the score distributions for each possible
overlap and derive an exact formula for the variance of PFM counts.
Furthermore, we found an accurate approximation for the distribution of the
number of occurrences using a compound Poisson distribution. Our approximation
outperforms all alternative approaches. In addition, we give Poisson
statistics for the number of occurrences without overlaps such that other
standard word count statistics (like distances between occurrences) can be
applied. Third, we develop statistics to compute the significance of co-
occurrences and co-operativity among sets of TFs. Fourth, we use the variance
to define a natural measure of similarity between DNA motifs. We explicitly
state formulae for PFMs. Compared to standard approaches, this measure shows higher
correlation with empirical data. It also allows sets of TFs to be clustered and
gives results comparable with more sophisticated clustering algorithms.
Finally, we use this similarity measure to compute the representation quality
of PFMs for a set of experimentally verified binding sites. Besides a
threshold optimization method which significantly improves the quality of PFMs
in Transfac and Jaspar, we can indeed select DNA motifs, which violate PFM
assumptions and, therefore, cannot be reasonably represented as PFMs.
Transcription factors (TFs) play a decisive role in the regulation of genes. They interact with specific binding sites or motifs on the DNA sequence. An important task in bioinformatics is therefore to predict potential TF binding sites in silico. From a statistical point of view, the DNA sequence is a long text consisting of four different letters, 'A', 'C', 'G' and 'T', for the four different bases. A TF binding to a binding site is equivalent to the word that describes the binding site occurring in the text. Known statistics can therefore be reused to answer questions about the probability of observing a given number of words or about the distance between two occurrences. However, the same problem arises again and again when deriving such statistics: the words can overlap, which creates dependencies between the underlying random variables. As a result, there has so far been no exact formula, other than those based on generating functions, for computing the probability of seeing a given number of non-overlapping words. We derive this formula and thereby also obtain a normal approximation. Unfortunately, a TF does not bind to just a single word; there are usually positions within the word that permit variation. TFs are therefore mostly represented by the statistical model PFM. This model assigns a weight to each letter at each position. If the sum of all weights for a given sequence of the motif's length exceeds a threshold, that sequence is a binding site. One can therefore also enumerate all such words and obtain a set of words that describes a motif. However, this set can become very large: for a motif of length 15, the number is typically around 500,000. Apart from the fact that enumerating the words takes exponential running time, the known statistics also reach their limits with such a large number of words. That is, they are very expensive to compute and the approximations are not very accurate. New statistics and efficient algorithms are therefore needed. We have developed such statistics. We exploit the fact that the probability of overlapping binding sites can be computed without enumerating the words. More precisely, we use the PFM model to compute a two-dimensional weight distribution, from which this probability can be read off. Starting from this result, we derive the exact variance of the number of occurrences. In addition, we can describe the distribution of occurrences by a compound Poisson distribution. Simulations show that this is the best known approximation. We can also compute corresponding statistics for non-overlapping occurrences based on a Poisson distribution. Extension to several different DNA motifs leads to the computation of the significance of co-occurrences and of the co-operation of TFs. In addition, we introduce the covariance as a measure of the similarity of DNA motifs. This gives a natural and, above all, general similarity measure that does not presuppose a particular model. We derive explicit formulae for the PFM model, and comparison with simulations and other measures shows that our measure does indeed best reflect the similarity as we define it. We use a related measure to cluster classes of TFs. Here too, a comparison with optimized clustering algorithms shows that we obtain comparably good results. Finally, we use the similarity to determine how well a DNA motif can be represented by a particular model. To this end, we compute the covariance between the experimentally verified sequences and the model. This corresponds to the representation quality of DNA motif models. Again, we derive explicit formulae for PFMs. On this basis we show that the quality can also be used to optimize model parameters (in our case the threshold). We further show that the quality is also significantly lower for motifs that do not satisfy the assumptions of the PFM model.
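The threshold rule described in this abstract (sum the per-position scores of a window and call it a binding site if the sum reaches a threshold) can be sketched as follows. This is an illustrative sketch under stated assumptions, not the thesis's algorithm: the three-column matrix `SCORES`, the threshold of 4.0, the example sequence and all function names are hypothetical. The brute-force `enumerate_words` is included only to show why enumeration scales as 4^m in the motif length m, which is the cost the thesis sets out to avoid.

```python
from itertools import product

# Hypothetical score matrix for a motif of length 3 (one dict per position).
# Real matrices, e.g. from Transfac or Jaspar, are longer and data-derived.
SCORES = [
    {"A": 2.0, "C": -1.0, "G": -1.0, "T": -1.0},
    {"A": 1.0, "C": 1.0, "G": -2.0, "T": -2.0},
    {"A": 2.0, "C": -1.0, "G": -1.0, "T": -1.0},
]


def window_score(matrix, word):
    """Sum the score of each letter at its position in the motif."""
    return sum(column[letter] for column, letter in zip(matrix, word))


def find_hits(matrix, sequence, threshold):
    """Start positions of all windows whose summed score reaches the threshold."""
    m = len(matrix)
    return [
        i
        for i in range(len(sequence) - m + 1)
        if window_score(matrix, sequence[i:i + m]) >= threshold
    ]


def count_non_overlapping(hits, motif_length):
    """Greedy left-to-right count of hits that do not overlap an accepted hit."""
    count = 0
    next_free = 0
    for start in sorted(hits):
        if start >= next_free:
            count += 1
            next_free = start + motif_length
    return count


def enumerate_words(matrix, threshold):
    """Brute-force list of all words reaching the threshold (4**m candidates)."""
    return [
        "".join(word)
        for word in product("ACGT", repeat=len(matrix))
        if window_score(matrix, word) >= threshold
    ]


sequence = "AAAAACAGT"
hits = find_hits(SCORES, sequence, threshold=4.0)
print(hits)                                      # overlapping hits are all reported
print(count_non_overlapping(hits, len(SCORES)))  # hits after discarding overlaps
print(enumerate_words(SCORES, threshold=4.0))    # word set equivalent to the matrix
```

The run of A's in the example produces several overlapping hits, which is the dependency between sequence positions that makes the count distribution hard to compute exactly.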
Impacts of a national strategy to reduce population salt intake in England: serial cross sectional study.
Background: The UK introduced an ambitious national strategy to reduce population levels of salt intake in 2003. The aim of this study was to evaluate the impact of this strategy on salt intake in England, including potential effects on health inequalities.
Methods: Secondary analysis of data from the Health Survey for England. Our main outcome measure was trends in estimated daily salt intake from 2003-2007, as measured by spot urine. Secondary outcome measures were knowledge of government guidance and voluntary use of salt in food preparation over this time period.
Results: There were significant reductions in salt intake between 2003 and 2007 (-0.175 grams per day per year, p<0.001). Intake decreased uniformly across all other groups but remained significantly higher in younger persons, men, ethnic minorities and lower social class groups and those without hypertension in 2007. Awareness of government guidance on salt use was lowest in those groups with the highest intake (semi-skilled manual v professional; 64.9% v 71.0%, AOR 0.76, 95% CI 0.58-0.99). Self-reported use of salt added at the table reduced significantly during the study period (56.5% to 40.2%, p<0.001). Respondents from ethnic minority groups remained significantly more likely to add salt during cooking (white 42.8%, black 74.1%, south Asian 88.3%) and those from lower social class groups (unskilled manual 46.6%, professional 35.2%) were more likely to add salt at the table.
Conclusions: The introduction of a national salt reduction strategy was associated with uniform but modest reductions in salt intake in England, although it is not clear precisely which aspects of the strategy contributed to this. Knowledge of government guidance was lower, and voluntary salt use and total salt intake were higher, among occupational and ethnic groups at greatest risk of cardiovascular disease.
Rural, Urban and Migrant Differences in Non-Communicable Disease Risk-Factors in Middle Income Countries: A Cross-Sectional Study of WHO-SAGE Data
Understanding how urbanisation and rural-urban migration influence risk-factors for non-communicable disease (NCD) is crucial for developing effective preventative strategies globally. This study compares NCD risk-factor prevalence in urban, rural and migrant populations in China, Ghana, India, Mexico, Russia and South Africa. Study participants were 39,436 adults within the WHO Study on global AGEing and adult health (SAGE), surveyed 2007–2010. Risk ratios (RR) for each risk-factor were calculated using logistic regression in country-specific and all country pooled analyses, adjusted for age, sex and survey design. Fully adjusted models included income quintile, marital status and education. Regular alcohol consumption was lower in migrant and urban groups than in rural groups (pooled RR and 95% CI: 0.47 (0.31–0.68); 0.58 (0.46–0.72), respectively). Occupational physical activity was lower (0.86 (0.72–0.98); 0.76 (0.65–0.85)) while active travel and recreational physical activity were higher (pooled RRs for urban groups: 1.05 (1.00–1.09), 2.36 (1.95–2.83), respectively; for migrant groups: 1.07 (1.0–1.12), 1.71 (1.11–2.53), respectively). Overweight, raised waist circumference and diagnosed diabetes were higher in urban groups (1.19 (1.04–1.35), 1.24 (1.07–1.42), 1.69 (1.15–2.47), respectively). Exceptions to these trends exist: obesity indicators were higher in rural Russia; active travel was lower in urban groups in Ghana and India; and in South Africa, urban groups had the highest alcohol consumption. Migrants and urban dwellers had similar NCD risk-factor profiles. These were not consistently worse than those seen in rural dwellers. The variable impact of urbanisation on NCD risk must be considered in the design and evaluation of strategies to reduce the growing burden of NCDs globally.