Novel classification for global gene signature model for predicting severity of systemic sclerosis
<div><p>Progression of systemic scleroderma (SSc), a chronic connective tissue disease that causes a fibrotic phenotype, is highly heterogeneous among patients and difficult to diagnose accurately. To meet this clinical need, we developed a novel three-layer classification model, which analyses gene expression profiles from SSc skin biopsies to diagnose SSc severity. Two SSc skin biopsy microarray datasets were obtained from Gene Expression Omnibus. The skin scores reported in the original papers were used to categorize the samples into low (<18) and high (≥18) severity subgroups. Data were pre-processed by normalization, background correction, centering and scaling. A two-layered cross-validation scheme was employed to objectively evaluate the performance of the classification models on unobserved data. Three classification models were used: support vector machine, random forest, and naive Bayes, each in combination with feature selection methods to improve classification accuracy. For both input datasets, the random forest classifier combined with the correlation-based feature selection (CFS) method, and naive Bayes combined with either CFS or the support vector machine based recursive feature elimination (SVM-RFE) method, yielded the best results. Additionally, we performed a principal component analysis to show that the low and high severity groups are readily separable by gene expression signatures. Ultimately, we found that our novel classification prediction model produced global gene signatures that correlated significantly with skin scores. This study represents the first report comparing the performance of various classification prediction models for gene signatures from SSc patients, using current clinical diagnostic factors. In summary, our three-classifier model system is a powerful tool for elucidating gene signatures from SSc skin biopsies and can also be used to develop a prognostic gene signature for SSc and other fibrotic disorders.</p></div>
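The two-layered cross-validation described in the abstract can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' pipeline: a nearest-centroid classifier and a mean-difference feature filter stand in for the SVM, random forest, naive Bayes, CFS and SVM-RFE methods named above, and all function names, fold counts and the feature-count grid are hypothetical. Only the skin score cutoff of 18 comes from the abstract.

```python
import random

SEVERITY_CUTOFF = 18  # from the abstract: <18 is low severity, >=18 is high


def label_severity(skin_score):
    """Return 1 for high severity (score >= 18), else 0."""
    return 1 if skin_score >= SEVERITY_CUTOFF else 0


def stratified_folds(y, rows, k, rng):
    """Deal the rows of each class round-robin into k folds."""
    folds = [[] for _ in range(k)]
    for label in (0, 1):
        members = [i for i in rows if y[i] == label]
        rng.shuffle(members)
        for pos, i in enumerate(members):
            folds[pos % k].append(i)
    return folds


def fit(X, y, rows, n_features):
    """Stand-in model: keep the n_features with the largest gap between
    class means, and remember the class means (nearest centroid)."""
    means = {}
    for label in (0, 1):
        members = [X[i] for i in rows if y[i] == label]
        means[label] = [sum(col) / len(members) for col in zip(*members)]
    gaps = [abs(a - b) for a, b in zip(means[0], means[1])]
    keep = sorted(range(len(gaps)), key=lambda j: gaps[j], reverse=True)
    return keep[:n_features], means


def predict(model, sample):
    """Assign the class whose centroid is closest on the kept features."""
    keep, means = model
    dist = {
        label: sum((sample[j] - means[label][j]) ** 2 for j in keep)
        for label in (0, 1)
    }
    return min(dist, key=dist.get)


def accuracy(X, y, train, test, n_features):
    """Fit on the train rows only, then score on the test rows."""
    model = fit(X, y, train, n_features)
    return sum(predict(model, X[i]) == y[i] for i in test) / len(test)


def nested_cv(X, y, outer_k=5, inner_k=3, grid=(1, 2, 5), seed=0):
    """Outer loop estimates accuracy on unseen samples; the inner loop
    chooses the number of features using the outer training rows only.
    Assumes each class has enough samples to appear in every training split."""
    rng = random.Random(seed)
    rows = list(range(len(y)))
    outer_scores = []
    for held_out in stratified_folds(y, rows, outer_k, rng):
        train = [i for i in rows if i not in held_out]
        inner = stratified_folds(y, train, inner_k, rng)

        def inner_score(n_features):
            scores = []
            for val in inner:
                fit_rows = [i for i in train if i not in val]
                scores.append(accuracy(X, y, fit_rows, val, n_features))
            return sum(scores) / len(scores)

        best = max(grid, key=inner_score)
        outer_scores.append(accuracy(X, y, train, held_out, best))
    return sum(outer_scores) / len(outer_scores)
```

Feature selection happens inside `fit`, so it never sees the held-out fold; selecting features on the full dataset before splitting would leak information into the performance estimate.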
Microarray probe IDs associated with the top 10 highest absolute value of loading values for principal component 1 based on principal component analysis of genes identified by Chi-squared feature selection.
<p>Microarray probe IDs associated with the top 10 highest absolute value of loading values for principal component 1 based on principal component analysis of genes identified by Chi-squared feature selection.</p>
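Ranking probes by the absolute value of their loading on principal component 1 can be illustrated with a short helper. This is a sketch only: the function name and the probe IDs in the usage note are hypothetical, and the loadings are assumed to be already available as a mapping from probe ID to loading value.

```python
def top_loadings(loadings, k=10):
    """Return the k (probe_id, loading) pairs with the largest absolute
    loading, ordered from largest to smallest magnitude."""
    ranked = sorted(loadings.items(), key=lambda item: abs(item[1]), reverse=True)
    return ranked[:k]
```

For example, `top_loadings({"probe_A": 0.12, "probe_B": -0.40, "probe_C": 0.05}, k=2)` ranks `probe_B` first because the sign of a loading is ignored when ranking, although it is kept in the returned value.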
Summary of patient information for microarray biopsy samples used in models.
<p>Summary of patient information for microarray biopsy samples used in models.</p>
Principal component analyses (PCA) of gene expression separation between low and high severity groups.
<p>Results based on the CFS feature selection method are shown in A (Dataset 1) and B (Dataset 2). Results based on the SVM-RFE feature selection method are shown in C (Dataset 1) and D (Dataset 2).</p>
Performance evaluation of various classifier and feature selection methods.
<p>Performance evaluation of various classifier and feature selection methods.</p>
Heat map showing Log2 normalized expression values for patient samples from Dataset 2 for probe IDs identified by CFS feature selection method.
<p>Heat map showing Log2 normalized expression values for patient samples from Dataset 2 for probe IDs identified by CFS feature selection method.</p>
Predicted signaling network between OSM and downstream genes related to SSc severity.
<p>Red shading of a gene indicates upregulation in the dataset relative to low severity patients, green shading indicates downregulation, and the color intensity reflects the strength of regulation. Relationships between genes that are predicted from the literature are shown as lines connecting the genes, with red denoting predicted upregulation and blue denoting predicted downregulation.</p>
Heat map showing Log2 normalized expression values for patient samples from Dataset 1 for probe IDs identified by CFS feature selection method.
<p>Heat map showing Log2 normalized expression values for patient samples from Dataset 1 for probe IDs identified by CFS feature selection method.</p>
Relative mRNA expression of TNFα analyzed by RT-qPCR in peripheral blood mononuclear cells (PBMC) stimulated with LPS in subjects with protective rs1329060-rs1329057-rs5030717 CTA or risk TCG <i>TLR4</i> haplotype.
<p>The statistical difference between groups was assessed using the Mann-Whitney U test. The expression level is shown in relative units calculated by the ΔΔCt method. Boxes indicate the interquartile range (25%–75%), with the horizontal bar within the box indicating the median. Whiskers show minimum and maximum values, o = outlier.</p>
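The relative units mentioned in this caption can be illustrated with the standard ΔΔCt calculation. The sketch below assumes the common form of the method with ideal (doubling) amplification efficiency; the caption does not state which reference gene or calibrator sample was used, so the parameter names are hypothetical.

```python
def relative_expression(ct_target, ct_reference, ct_target_cal, ct_reference_cal):
    """Relative expression by the ddCt method, assuming 100% PCR efficiency.

    ct_target / ct_reference: Ct values of the gene of interest and the
    reference gene in the sample.
    ct_target_cal / ct_reference_cal: the same two Ct values in the
    calibrator (baseline) sample.
    """
    delta_sample = ct_target - ct_reference          # dCt of the sample
    delta_cal = ct_target_cal - ct_reference_cal     # dCt of the calibrator
    return 2.0 ** -(delta_sample - delta_cal)        # 2^(-ddCt)
```

A sample whose target gene crosses the threshold two cycles earlier than in the calibrator, with unchanged reference gene Ct, comes out at four relative units: `relative_expression(24, 18, 26, 18)` returns `4.0`.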
Comparison of the <i>TLR4</i> variant rs13209060 association to otitis media.
<p>The size of each circle has been scaled to show the relative size of each dataset (N = number of cases in case-control datasets or N = twice the number of probands in family-based analysis). Horizontal lines have been scaled to show the 95% confidence intervals (95% CI) of the odds ratios of the T-allele; the position of each circle on the x-axis shows the odds ratio for that dataset. For simplicity and to highlight the result with the strongest signal in the Finnish population, we only plotted results of the association of rs13209060-T with otitis media; additional data can be found in <a href="http://www.plosone.org/article/info:doi/10.1371/journal.pone.0132551#pone.0132551.s009" target="_blank">S6 Table</a>.</p>
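An odds ratio with a 95% confidence interval, as plotted in this figure, can be computed from a 2×2 allele count table. The sketch below uses the log-odds (Woolf) approximation, which is an assumption: the caption does not say how the intervals were derived, and the family-based datasets would need a different estimator. The function name and the counts in the usage note are hypothetical.

```python
import math


def odds_ratio_ci(case_t, case_other, control_t, control_other, z=1.96):
    """Odds ratio of the T allele with a Wald-type confidence interval.

    Arguments are allele counts: T and non-T alleles in cases, then in
    controls. All four counts must be positive. z=1.96 gives a 95% CI.
    """
    odds_ratio = (case_t * control_other) / (case_other * control_t)
    # Standard error of ln(OR) under the Woolf approximation.
    se = math.sqrt(1 / case_t + 1 / case_other + 1 / control_t + 1 / control_other)
    lower = math.exp(math.log(odds_ratio) - z * se)
    upper = math.exp(math.log(odds_ratio) + z * se)
    return odds_ratio, lower, upper
```

With illustrative counts of 40 T and 60 non-T alleles in cases against 25 and 75 in controls, the odds ratio is 2.0 and the interval lies entirely above 1, which is what a horizontal line that does not cross the null value would represent in such a plot.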