78 research outputs found

    Stability of Ranked Gene Lists in Large Microarray Analysis Studies

    This paper presents an empirical study that aims to explain the relationship between the number of samples and the stability of different gene selection techniques for microarray datasets. Unlike similar studies, in which the number of genes in a ranked gene list is varied, this study takes an alternative approach in which stability is observed at different numbers of samples used for gene selection. Three different stability metrics, including one that is novel in bioinformatics, were used to estimate the stability of the ranked gene lists. The results demonstrate that the univariate selection methods produce significantly more stable ranked gene lists than the multivariate selection methods used in this study. More specifically, thousands of samples are needed for these multivariate selection methods to achieve the same level of stability that any given univariate selection method can achieve with only hundreds

    Gene set enrichment meta-learning analysis: next-generation sequencing versus microarrays

    Background: Reproducibility of results can have a significant impact on the acceptance of new technologies in gene expression analysis. With the recent introduction of so-called next-generation sequencing (NGS) technology alongside established microarrays, one is able to choose between two completely different platforms for gene expression measurements. This study introduces a novel methodology for gene-ranking stability analysis that is applied to the evaluation of gene-ranking reproducibility on NGS and microarray data. Results: The same data used in a well-known MicroArray Quality Control (MAQC) study were also used in this study to compare ranked lists of genes from MAQC samples A and B, obtained from the Affymetrix HG-U133 Plus 2.0 and Roche 454 Genome Sequencer FLX platforms. An initial evaluation, in which the percentage of overlapping genes was observed, demonstrates higher reproducibility on microarray data for 10 out of 11 gene-ranking methods. A gene set enrichment analysis shows similar enrichment of top gene sets when NGS is compared with microarrays on a pathway level. Our novel approach demonstrates high accuracy of decision trees when used for knowledge extraction from multiple bootstrapped gene set enrichment analysis runs. A comparison of the two approaches to sample preparation for high-throughput sequencing shows that alternating decision trees represent the optimal knowledge representation method in comparison with classical decision trees. Conclusions: Usual reproducibility measurements are mostly based on statistical techniques that offer very limited biological insight into the studied gene expression data sets. This paper introduces meta-learning-based gene set enrichment analysis, which can be used to complement gene-ranking stability estimation techniques such as the percentage of overlapping genes or classic gene set enrichment analysis. It is useful and practical when the reproducibility of gene-ranking results or of different gene selection techniques is observed. The proposed method reveals very accurate descriptive models that capture the co-enrichment of gene sets which are differently enriched in the compared data sets
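
The percentage of overlapping genes mentioned in the abstract above can be illustrated with a minimal sketch. The function name `top_k_overlap`, the gene identifiers, and the cutoff `k` are hypothetical and chosen only for illustration; the abstract does not state how the authors computed the measure.

```python
def top_k_overlap(ranking_a, ranking_b, k):
    """Percentage of genes shared by the top-k entries of two ranked lists.

    Hypothetical helper: both rankings are lists of gene identifiers,
    ordered from most to least important.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    top_a = set(ranking_a[:k])
    top_b = set(ranking_b[:k])
    return 100.0 * len(top_a & top_b) / k


# Made-up rankings standing in for two platforms (not real data).
microarray_ranking = ["G1", "G2", "G3", "G4", "G5"]
ngs_ranking = ["G2", "G1", "G5", "G6", "G3"]

# Two of the top three genes (G1 and G2) appear in both lists.
overlap = top_k_overlap(microarray_ranking, ngs_ranking, k=3)
print(f"Top-3 overlap: {overlap:.1f}%")
```

A higher percentage at a given cutoff indicates that the two rankings agree more closely on which genes belong at the top of the list.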

    R you ready? Using the R programme for statistical analysis and graphics

    © 2019 Wiley Periodicals, Inc. For conducting research, nurses typically use commercial statistical packages. R software is a free, powerful, and flexible alternative, but is less familiar and used less frequently in nursing research. In this paper, we use data from a previous study to demonstrate a few typical steps in exploratory data analysis using R. A step-by-step description of some basic analyses in R is provided here, including examples of specific functions to read and manipulate the data, calculate scores from individual questionnaire items, and prepare a correlation plot and summary table

    Stability Selection using a Genetic Algorithm and Logistic Linear Regression on Healthcare Records

    This paper presents a Genetic Algorithm (GA) application for measuring feature importance in machine learning (ML) on a large-scale database. Too many input features may cause over-fitting; therefore, feature selection is desirable. Some ML algorithms have feature selection embedded, e.g., lasso-penalized linear regression or random forests. Others do not include such functionality and are sensitive to over-fitting, e.g., unregularized linear regression. The latter algorithms require that proper features are chosen before learning. Therefore, we propose a novel stability selection (SS) approach using GA-based feature selection. The proposed SS approach iteratively applies GA to a subsample of records and features. Each GA individual represents a binary vector of selected features in the subsample. An unregularized logistic linear regression model is then trained and tested using the GA-selected features through cross-validation of the subsamples. GA fitness is evaluated by the area under the curve (AUC) and optimized during a GA run. AUC is assessed with an unregularized logistic regression model on multiple-subsampled healthcare records collected under the Healthcare Cost and Utilization Project (HCUP), utilizing the National (Nationwide) Inpatient Sample (NIS) database. Reported results show that averaging feature importance from the top-4 SS and the SS using GA (GASS) improves these AUC results

    Perceptions of caring between Slovene and Russian members of nursing teams

    Purpose: To measure the perceptions of caring between Slovene and Russian members of nursing teams and compare the results with earlier findings in other European Union (EU) countries. Methods: A cross-sectional study that included nurses and nursing assistants in Slovenia (n = 294) and Russia (n = 531). Data were collected using the 25-item Caring Dimensions Inventory. Results: The most endorsed item for Slovene and Russian members of nursing teams was an item related to medication administration. All items that were endorsed by Russian participants were also endorsed by Slovenian participants; however, they ascribed a different level of importance to individual aspects of caring. Discussion: Compared with other EU countries, such as the UK and Spain, Slovenian and Russian members of nursing teams endorsed more technical aspects of nursing duties as caring, suggesting cultural differences and previous influences of the biomedical model on nursing education and practice

    The KIDSCREEN-27 scale: translation and validation study of the Slovenian version

    Background: There are many methods available for measuring the social support and quality of life (QoL) of adolescents; of these, the KIDSCREEN tools are the most widely used. Thus, we aimed to translate and validate the KIDSCREEN-27 scale for use among adolescents aged between 10 and 19 years in Slovenia. Methods: A cross-sectional study was conducted among 2852 adolescents in primary and secondary schools in Slovenia from November 2019 to January 2020. A six-step validation method was used to test the psychometric properties of the KIDSCREEN-27 scale. We checked descriptive statistics and performed a Mokken scale analysis, parametric item response theory, factor analysis, classical test theory and total (sub)scale scores. Results: All five subscales of the KIDSCREEN-27 formed a unidimensional scale with good homogeneity and reliability. The confirmatory factor analysis showed poor fit in user model versus baseline model metrics (CFI = 0.847, TLI = 0.862) and good fit in root mean square error (RMSEA = 0.072, p(χ²) < 0.001). Scale reliability was calculated using Cronbach's α (0.93), beta (0.86), G6 (0.95) and omega (0.93). Conclusions: The questionnaire showed average psychometric properties and can be used among adolescents in Slovenia to assess their quality of life. Further research is needed to explore why the fit in user model metrics is poor
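
As a rough illustration of the Cronbach's α reliability coefficient reported in the abstract above, the sketch below computes α = k/(k − 1) · (1 − Σ item variances / variance of total scores) for a small table of questionnaire responses. The function name `cronbach_alpha` and the response values are hypothetical; they are not KIDSCREEN-27 data and do not reproduce the authors' analysis.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a list of respondents' item scores.

    `scores` is a list of rows, one per respondent, each holding
    the same number of item scores (hypothetical input layout).
    """
    k = len(scores[0])
    if k < 2:
        raise ValueError("at least two items are required")
    # Transpose the rows so that each tuple holds one item's scores.
    items = list(zip(*scores))
    item_variance_sum = sum(variance(item) for item in items)
    total_variance = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - item_variance_sum / total_variance)


# Made-up responses: four respondents, three items each.
responses = [
    [2, 3, 3],
    [4, 4, 5],
    [1, 2, 2],
    [5, 5, 4],
]
print(f"Cronbach's alpha: {cronbach_alpha(responses):.2f}")
```

Values closer to 1 indicate that the items vary together, which is usually read as higher internal consistency of the scale.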