Spanish version of the Oral Health Impact Profile (OHIP-Sp)
BACKGROUND: The need for appraisal of oral health-related quality of life has been increasingly recognized over the last decades. The aims of this study were to develop a Spanish version (OHIP-Sp) of the Oral Health Impact Profile and to evaluate its convergent and discriminative validity and its internal consistency. METHODS: The original 49-item OHIP was translated into Spanish, revised for understanding and semantics by two independent dentists, and then translated back into English by an independent bilingual dentist. The data originated in a cross-sectional study conducted among high school students from the Province of Santiago, Chile. The study group was sampled using a multistage random cluster procedure yielding 9,203 students aged 12–21 years. All selected students were invited to participate, and all completed a questionnaire with information on socio-demographic factors, oral health-related behaviors, and self-reported oral health status (good, fair or poor). From this group, 9,163 students also agreed to complete a detailed questionnaire on socio-economic indicators and to receive a clinical examination comprising direct recordings of clinical attachment levels (CAL) in molars and incisors, tooth loss, and the presence of necrotizing ulcerative gingival lesions. RESULTS: The participation rate and the questionnaire completeness were high, with OHIP-Sp total scores being computed for 9,133 subjects. Self-perceived oral health status was associated with the total OHIP-Sp score and all its domains (Spearman rank correlation). The OHIP-Sp total score was also directly associated with the 4 dental outcomes investigated (Mann-Whitney test). The largest impacts were found for the outcomes 'tooth loss' (mean OHIP-Sp score = 13.5) and 'CAL ≥ 3 mm' (mean OHIP-Sp score = 13.0). CONCLUSION: The OHIP-Sp revealed suitable convergent and discriminative validity and appropriate internal consistency (Cronbach's α).
Further studies of the OHIP-Sp should include populations with a higher disease burden and use test-retest reliability exercises to evaluate the stability of the test.
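The abstract above reports internal consistency as Cronbach's α. As a reminder of what that statistic measures, here is a minimal sketch using the standard formula α = k/(k−1) · (1 − Σ item variances / variance of total scores). The function name `cronbach_alpha` and the toy response data are illustrative assumptions and are not taken from the OHIP-Sp study.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for a list of respondents, each a list of item scores."""
    n_items = len(responses[0])
    if n_items < 2 or len(responses) < 2:
        raise ValueError("need at least two items and two respondents")
    # Sample variance of each item across respondents.
    item_variances = [
        variance([row[i] for row in responses]) for i in range(n_items)
    ]
    # Sample variance of each respondent's total score.
    total_variance = variance([sum(row) for row in responses])
    return n_items / (n_items - 1) * (1 - sum(item_variances) / total_variance)


# Hypothetical toy data: three respondents answering three items identically.
toy_responses = [[1, 1, 1], [2, 2, 2], [3, 3, 3]]
alpha = cronbach_alpha(toy_responses)
```

When every item moves in lockstep, as in the toy data, α reaches its maximum of 1; items that vary independently of each other push it towards 0.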
Statistical learning techniques applied to epidemiology: a simulated case-control comparison study with logistic regression
BACKGROUND: When investigating covariate interactions and group associations with standard regression analyses, the relationship between the response variable and exposure may be difficult to characterize. When the relationship is nonlinear, linear modeling techniques do not capture the nonlinear information content. Statistical learning (SL) techniques with kernels are capable of addressing nonlinear problems without making parametric assumptions. However, these techniques do not produce findings relevant for epidemiologic interpretations. A simulated case-control study was used to contrast the information embedding characteristics and separation boundaries produced by a specific SL technique with those of logistic regression (LR) modeling, which represents a parametric approach. The SL technique comprised a kernel mapping in combination with a perceptron neural network. Because the LR model has an important epidemiologic interpretation, the SL method was modified to produce the analogous interpretation and generate odds ratios for comparison. RESULTS: The SL approach is capable of generating odds ratios for main effects and risk factor interactions that better capture nonlinear relationships between exposure variables and outcome in comparison with LR. CONCLUSIONS: The integration of SL methods in epidemiology may improve both the understanding and interpretation of complex exposure/disease relationships.
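The epidemiologic interpretation of logistic regression mentioned above rests on the fact that the exponentiated coefficient of an exposure is its odds ratio. The sketch below illustrates only this LR side, for a single binary exposure, where the maximum-likelihood coefficient equals the log of the odds ratio from the 2×2 table. It does not reproduce the kernel and perceptron method of the study, and the function names and counts are hypothetical.

```python
import math


def odds_ratio(exposed_cases, exposed_controls, unexposed_cases, unexposed_controls):
    """Odds ratio from a 2x2 case-control table."""
    return (exposed_cases * unexposed_controls) / (exposed_controls * unexposed_cases)


def logistic_fit_binary_exposure(exposed_cases, exposed_controls,
                                 unexposed_cases, unexposed_controls):
    """Closed-form logistic regression fit for one binary exposure.

    Returns (intercept, coefficient); exp(coefficient) is the odds ratio.
    """
    intercept = math.log(unexposed_cases / unexposed_controls)
    coefficient = math.log(
        odds_ratio(exposed_cases, exposed_controls, unexposed_cases, unexposed_controls)
    )
    return intercept, coefficient


def predicted_probability(intercept, coefficient, exposed):
    """Probability of being a case under the fitted logistic model."""
    linear = intercept + coefficient * (1 if exposed else 0)
    return 1.0 / (1.0 + math.exp(-linear))


# Hypothetical counts: 20 exposed cases, 80 exposed controls,
# 10 unexposed cases, 90 unexposed controls.
b0, b1 = logistic_fit_binary_exposure(20, 80, 10, 90)
```

With these counts the fitted model reproduces the observed case fraction in each exposure group, which is why the coefficient can be read directly as a log odds ratio.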
Exploiting likely-positive and unlabeled data to improve the identification of protein-protein interaction articles
BACKGROUND: Experimentally verified protein-protein interactions (PPI) cannot be easily retrieved by researchers unless they are stored in PPI databases. The curation of such databases can be made faster by ranking newly published articles by their relevance to PPI, a task which we approach here by designing a machine-learning-based PPI classifier. All classifiers require labeled data, and the more labeled data available, the more reliable they become. Although many PPI databases with large numbers of labeled articles are available, incorporating these databases into the base training data may actually reduce classification performance, since the supplementary databases may not annotate exactly the same PPI types as the base training data. Our first goal in this paper is to find a method of selecting likely positive data from such supplementary databases. Only extracting likely positive data, however, will bias the classification model unless sufficient negative data is also added. Unfortunately, negative data is very hard to obtain because there are no resources that compile such information. Therefore, our second aim is to select such negative data from unlabeled PubMed data. Thirdly, we explore how to exploit these likely positive and negative data. Lastly, we look at the somewhat unrelated question of which term-weighting scheme is most effective for identifying PPI-related articles. RESULTS: To evaluate the performance of our PPI text classifier, we conducted experiments based on the BioCreAtIvE-II IAS dataset. Our results show that adding likely-labeled data generally increases AUC by 3–6%, indicating better ranking ability. Our experiments also show that our newly proposed term-weighting scheme has the highest AUC among all common weighting schemes. Our final model achieves an F-measure and AUC 2.9% and 5.0% higher, respectively, than those of the top-ranking system in the IAS challenge. CONCLUSION: Our experiments demonstrate the effectiveness of integrating unlabeled and likely labeled data to augment a PPI text classification system. Our mixed model is suitable for ranking purposes, whereas our hierarchical model is better for filtering. In addition, our results indicate that supervised weighting schemes outperform unsupervised ones. Our newly proposed weighting scheme, TFBRF, which considers documents that do not contain the target word, avoids some of the biases found in traditional weighting schemes. Our experimental results show TFBRF to be the most effective among several other top weighting schemes.
Length of sick leave – Why not ask the sick-listed? Sick-listed individuals predict their length of sick leave more accurately than professionals
BACKGROUND: Knowledge of factors that accurately predict long-lasting sick leave is sparse, but information on medical condition is believed to be necessary to identify persons at risk. Given the current practice of identifying sick-listed individuals at risk of long-lasting sick leave, the objectives of this study were to examine the diagnostic accuracy of the length of sick leave predicted in the Norwegian National Insurance Offices, and to compare these predictions with the self-predictions of the sick-listed. METHODS: Based on medical certificates, two National Insurance medical consultants and two National Insurance officers predicted, at day 14, the length of sick leave in 993 consecutive cases of sick leave resulting from musculoskeletal or mental disorders in this 1-year follow-up study. Two months later they reassessed 322 cases based on extended medical certificates. Self-predictions were obtained from 152 sick-listed subjects when their sick leave passed 14 days. Diagnostic accuracy of the predictions was analysed by ROC area, sensitivity, specificity, and likelihood ratio; positive predictive value was included in the analyses of predictive validity. RESULTS: The sick-listed identified sick leave lasting 12 weeks or longer with an ROC area of 80.9% (95% CI 73.7–86.8%), while the corresponding predictions of medical consultants and officers had ROC areas of 55.6% (95% CI 45.6–65.6%) and 56.0% (95% CI 46.6–65.4%), respectively. The predictions of sick-listed males were significantly better than those of female subjects, and older subjects predicted somewhat better than younger subjects. Neither formal medical competence nor additional medical information noticeably improved the diagnostic accuracy based on medical certificates. CONCLUSION: This study demonstrates that the accuracy of a prognosis based on medical documentation in sickness absence forms is lower than that of one based on direct communication with the sick-listed themselves.
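The accuracy measures named in the abstract above (ROC area, sensitivity, specificity, likelihood ratio, positive predictive value) can be sketched as follows. The ROC area is computed here as the probability that a randomly chosen long-absence case receives a higher predicted score than a randomly chosen short-absence case, with ties counted as one half. All function names, scores and counts are hypothetical and not taken from the study.

```python
def roc_area(scores_positive, scores_negative):
    """ROC area as the fraction of positive/negative pairs ranked correctly."""
    wins = 0.0
    for p in scores_positive:
        for n in scores_negative:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a correct ranking
    return wins / (len(scores_positive) * len(scores_negative))


def diagnostic_metrics(tp, fp, fn, tn):
    """Sensitivity, specificity, positive likelihood ratio and PPV from counts."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    if specificity < 1.0:
        lr_positive = sensitivity / (1.0 - specificity)
    else:
        lr_positive = float("inf")  # no false positives at all
    ppv = tp / (tp + fp)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "lr_positive": lr_positive,
        "ppv": ppv,
    }


# Hypothetical predicted lengths (weeks) for subjects whose sick leave did
# and did not last 12 weeks or longer.
area = roc_area([14, 20, 9], [4, 9, 6])
metrics = diagnostic_metrics(tp=8, fp=2, fn=2, tn=8)
```

An ROC area of 0.5 corresponds to chance-level ranking, which is the scale on which the reported values for professionals and for the sick-listed can be compared.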
Knockdown of SF-1 and RNF31 Affects Components of Steroidogenesis, TGFβ, and Wnt/β-catenin Signaling in Adrenocortical Carcinoma Cells
The orphan nuclear receptor Steroidogenic Factor-1 (SF-1, NR5A1) is a critical regulator of development and homeostasis of the adrenal cortex and gonads. We recently showed that a complex containing E3 ubiquitin ligase RNF31 and the known SF-1 corepressor DAX-1 (NR0B1) interacts with SF-1 on target promoters and represses transcription of steroidogenic acute regulatory protein (StAR) and aromatase (CYP19) genes. To further evaluate the role of SF-1 in the adrenal cortex and the involvement of RNF31 in SF-1-dependent pathways, we performed genome-wide gene-expression analysis of adrenocortical NCI-H295R cells where SF-1 or RNF31 had been knocked down using RNA interference. We find RNF31 to be deeply connected to cholesterol metabolism and steroid hormone synthesis, strengthening its role as an SF-1 coregulator. We also find intriguing evidence of negative crosstalk between SF-1 and both transforming growth factor (TGF) β and Wnt/β-catenin signaling. This crosstalk could be of importance for adrenogonadal development, maintenance of adrenocortical progenitor cells and the development of adrenocortical carcinoma. Finally, the SF-1 gene profile can be used to distinguish malignant from benign adrenocortical tumors, a finding that implicates SF-1 in the development of malignant adrenocortical carcinoma
Candidate gene prioritization by network analysis of differential expression using machine learning approaches
BACKGROUND: Discovering novel disease genes is still challenging for diseases for which no prior knowledge (such as known disease genes or disease-related pathways) is available. Performing genetic studies frequently results in large lists of candidate genes of which only few can be followed up for further investigation. We have recently developed a computational method for constitutional genetic disorders that identifies the most promising candidate genes by replacing prior knowledge by experimental data of differential gene expression between affected and healthy individuals. To improve the performance of our prioritization strategy, we have extended our previous work by applying different machine learning approaches that identify promising candidate genes by determining whether a gene is surrounded by highly differentially expressed genes in a functional association or protein-protein interaction network. RESULTS: We have proposed three strategies scoring disease candidate genes relying on network-based machine learning approaches, such as kernel ridge regression, heat kernel, and Arnoldi kernel approximation. For comparison purposes, a local measure based on the expression of the direct neighbors is also computed. We have benchmarked these strategies on 40 publicly available knockout experiments in mice, and performance was assessed against results obtained using a standard procedure in genetics that ranks candidate genes based solely on their differential expression levels (Simple Expression Ranking). Our results showed that our four strategies could outperform this standard procedure and that the best results were obtained using the Heat Kernel Diffusion Ranking, leading to an average ranking position of 8 out of 100 genes, an AUC value of 92.3% and an error reduction of 52.8% relative to the standard procedure approach, which ranked the knockout gene on average at position 17 with an AUC value of 83.7%. CONCLUSION: In this study we could identify promising candidate genes using network-based machine learning approaches even if no knowledge is available about the disease or phenotype.
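To illustrate the idea of scoring a gene by the differential expression of its network neighbourhood, the sketch below diffuses expression values over a small graph with the heat kernel exp(−βL), where L is the graph Laplacian. This is a generic heat kernel diffusion under stated assumptions, not the authors' Heat Kernel Diffusion Ranking implementation: the unweighted Laplacian, the value of `beta`, the truncated Taylor series and the toy network are all illustrative choices.

```python
def graph_laplacian(n_nodes, edges):
    """Unnormalized Laplacian L = D - A of an undirected, unweighted graph."""
    lap = [[0.0] * n_nodes for _ in range(n_nodes)]
    for u, v in edges:
        lap[u][v] -= 1.0
        lap[v][u] -= 1.0
        lap[u][u] += 1.0
        lap[v][v] += 1.0
    return lap


def heat_kernel_scores(laplacian, expression, beta=0.5, n_terms=40):
    """Apply exp(-beta * L) to the expression vector via a Taylor series."""
    n = len(expression)
    term = list(expression)    # k-th series term, starting with k = 0
    scores = list(expression)  # running sum of the series
    for k in range(1, n_terms):
        # term_k = (-beta / k) * L @ term_{k-1}
        term = [
            -beta / k * sum(laplacian[i][j] * term[j] for j in range(n))
            for i in range(n)
        ]
        scores = [s + t for s, t in zip(scores, term)]
    return scores


# Hypothetical path network 0 - 1 - 2 - 3 in which only gene 0 is
# differentially expressed.
toy_laplacian = graph_laplacian(4, [(0, 1), (1, 2), (2, 3)])
toy_scores = heat_kernel_scores(toy_laplacian, [1.0, 0.0, 0.0, 0.0])
```

In the toy network, genes closer to the differentially expressed gene receive higher scores even though their own expression change is zero, which is the property a candidate ranking of this kind relies on. The Taylor series is adequate only for small graphs and moderate `beta`; a larger network would call for a proper matrix exponential routine.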
Obesity and risk of pancreatic cancer among postmenopausal women: the Women's Health Initiative (United States)
A total of 138 503 women in the Women's Health Initiative in the United States were followed (for an average of 7.7 years) through 12 September 2005 to examine obesity, especially central obesity in relation to pancreatic cancer (n=251). Women in the highest quintile of waist-to-hip ratio had 70% (95% confidence interval 10–160%) excess risk of pancreatic cancer compared with women in the lowest quintile
Bayesian Markov Random Field Analysis for Protein Function Prediction Based on Network Data
Inference of protein functions is one of the most important aims of modern
biology. To fully exploit the large volumes of genomic data typically produced
in modern-day genomic experiments, automated computational methods for protein
function prediction are urgently needed. Established methods use sequence or
structure similarity to infer functions but those types of data do not suffice
to determine the biological context in which proteins act. Current
high-throughput biological experiments produce large amounts of data on the
interactions between proteins. Such data can be used to infer interaction
networks and to predict the biological process that the protein is involved in.
Here, we develop a probabilistic approach for protein function prediction using
network data, such as protein-protein interaction measurements. We take a
Bayesian approach to an existing Markov Random Field method by performing
simultaneous estimation of the model parameters and prediction of protein
functions. We use an adaptive Markov Chain Monte Carlo algorithm that leads to
more accurate parameter estimates and consequently to improved prediction
performance compared to the standard Markov Random Fields method. We tested our
method using a high-quality S. cerevisiae validation network
with 1622 proteins against 90 Gene Ontology terms of different levels of
abstraction. Compared to three other protein function prediction methods, our
approach shows very good prediction performance. Our method can be directly
applied to protein-protein interaction or coexpression networks, but also can be
extended to use multiple data sources. We apply our method to physical protein
interaction data from S. cerevisiae and provide novel
predictions, using 340 Gene Ontology terms, for 1170 unannotated proteins and we
evaluate the predictions using the available literature.
- …