145,865 research outputs found

    Pairwise gene GO-based measures for biclustering of high-dimensional expression data

    Background: Biclustering algorithms search gene expression data for groups of genes that share the same behavior under a subset of samples. The biological knowledge now available in public repositories can be used to drive these algorithms toward biclusters composed of functionally coherent groups of genes. In addition, a distance between genes can be defined from the information stored about them in the Gene Ontology (GO). Gene pairwise GO semantic similarity measures assign each pair of genes a value that quantifies their functional similarity. This paper studies a scatter search-based algorithm that optimizes a merit function integrating GO information through a term based on a GO measure. Results: The effect of two different gene pairwise GO measures on the performance of the algorithm is analyzed. First, three well-known yeast datasets with approximately one thousand genes are studied. Second, the algorithm is applied to a group of human datasets related to clinical cancer data. Most of these are high-dimensional datasets composed of a very large number of genes. When the search is driven by one of the proposed GO measures, the resulting biclusters reveal groups of genes linked by a common function. Furthermore, a qualitative biological study of a group of biclusters shows their relevance from a cancer perspective. Conclusions: Integrating biological information improves the performance of the biclustering process. Both GO measures improve the results obtained for the yeast datasets. However, when datasets are composed of a very large number of genes, only one of them substantially improves the algorithm's performance. This second case constitutes a clear option for exploring interesting datasets from a clinical point of view. Ministerio de Economía y Competitividad TIN2014-55894-C2-
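The abstract does not spell out the merit function, so the following is only a minimal sketch under stated assumptions: expression coherence is scored with the Cheng and Church mean squared residue (MSR), and GO information enters as the average pairwise semantic similarity taken from a precomputed, symmetric gene-by-gene matrix with values in [0, 1]. The function names, the weight `w_go`, and the subtraction used to combine the two terms are hypothetical, not the paper's formulation.

```python
import numpy as np

def mean_squared_residue(expr: np.ndarray) -> float:
    """Cheng-Church mean squared residue of a bicluster submatrix (genes x samples).
    Lower values mean the genes move more coherently across the selected samples."""
    row_means = expr.mean(axis=1, keepdims=True)
    col_means = expr.mean(axis=0, keepdims=True)
    overall = expr.mean()
    residue = expr - row_means - col_means + overall
    return float((residue ** 2).mean())

def mean_pairwise_go_similarity(go_sim: np.ndarray, genes: list[int]) -> float:
    """Average GO semantic similarity over all distinct gene pairs in the bicluster.
    go_sim is assumed (hypothetically) to be symmetric with values in [0, 1]."""
    n = len(genes)
    if n < 2:
        return 0.0
    sub = go_sim[np.ix_(genes, genes)]
    # Off-diagonal sum counts every unordered pair twice, matching n * (n - 1).
    off_diag = sub.sum() - np.trace(sub)
    return float(off_diag / (n * (n - 1)))

def merit(data: np.ndarray, go_sim: np.ndarray, genes: list[int],
          samples: list[int], w_go: float = 0.5) -> float:
    """Hypothetical merit to minimise: penalise expression residue, reward GO coherence."""
    sub = data[np.ix_(genes, samples)]
    return mean_squared_residue(sub) - w_go * mean_pairwise_go_similarity(go_sim, genes)
```

A purely additive bicluster such as `[[1, 2, 3], [2, 3, 4], [5, 6, 7]]` has an MSR of zero, so any difference in merit between such candidates comes from the GO term alone. In practice the MSR and the similarity live on different scales, which is why a weight (or normalisation) like `w_go` is needed before the two can be traded off in a search procedure.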

    A systematic review of data quality issues in knowledge discovery tasks

    The volume of data is growing rapidly because organizations continuously capture data to support better decision-making. The most fundamental challenge is to explore these large volumes of data and extract useful knowledge for future actions through knowledge discovery tasks; however, much of the data is of poor quality. We present a systematic review of data quality issues in knowledge discovery tasks and a case study applied to the agricultural disease known as coffee rust.

    Transforming Graph Representations for Statistical Relational Learning

    Relational data representations have become an increasingly important topic due to the recent proliferation of network datasets (e.g., social, biological, and information networks) and a corresponding increase in the application of statistical relational learning (SRL) algorithms to these domains. In this article, we examine a range of representation issues for graph-based relational data. Since the choice of relational data representation for the nodes, links, and features can dramatically affect the capabilities of SRL algorithms, we survey approaches and opportunities for relational representation transformation designed to improve the performance of these algorithms. This leads us to introduce an intuitive taxonomy for data representation transformations in relational domains that treats link transformation and node transformation as symmetric representation tasks. In particular, the transformation tasks for both nodes and links include (i) predicting their existence, (ii) predicting their label or type, (iii) estimating their weight or importance, and (iv) systematically constructing their relevant features. We motivate our taxonomy through detailed examples and use it to survey and compare competing approaches for each of these tasks. We also discuss general conditions for transforming links, nodes, and features. Finally, we highlight challenges that remain to be addressed.
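Task (i) for links, predicting their existence, is often illustrated with neighbourhood-overlap heuristics. The sketch below is one such heuristic chosen for illustration (the Jaccard coefficient), not a method taken from the article; it assumes an undirected graph stored as a symmetric adjacency dict, and the function names are hypothetical.

```python
from itertools import combinations

def jaccard_scores(adj: dict[str, set[str]]) -> dict[tuple[str, str], float]:
    """Score every currently unlinked node pair by the Jaccard overlap of their neighbour sets.
    Assumes adj is symmetric: v in adj[u] exactly when u in adj[v]."""
    scores: dict[tuple[str, str], float] = {}
    for u, v in combinations(sorted(adj), 2):
        if v in adj[u]:
            continue  # already linked; only candidate (missing) links are scored
        union = adj[u] | adj[v]
        if not union:
            continue  # two isolated nodes give no evidence either way
        scores[(u, v)] = len(adj[u] & adj[v]) / len(union)
    return scores

def predict_links(adj: dict[str, set[str]], k: int = 3) -> list[tuple[str, str]]:
    """Return the k highest-scoring candidate links, breaking ties by node names."""
    scores = jaccard_scores(adj)
    return sorted(scores, key=lambda pair: (-scores[pair], pair))[:k]
```

For a small graph with edges a-b, a-c, b-c, b-d, c-d and d-e, the pair (a, d) shares two of three neighbours and is ranked first. The same scoring loop could feed task (iii), estimating link weight, by keeping the score instead of thresholding it.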

    The validity of smartphone data and its relationship to clinical symptomatology and brain biology: an exploratory analysis

    BACKGROUND: There is currently very little research on the clinical validity of data from mental health smartphone applications, its relationship to brain biology, and its ability to inform clinical decisions. This paper explores these relationships in a sample of patients with schizophrenia by analyzing data collected with the mental health smartphone application Beiwe. OBJECTIVES: To validate mental health smartphone applications and support their potential to augment clinical practice. METHODS: The application administered a series of 21 questions drawn from several questionnaires, including the Patient Health Questionnaire-8 (PHQ-8), Generalized Anxiety Disorder-7 (GAD-7), Warning Signals Scale (WSS), Pittsburgh Sleep Quality Index, and the psychosis subscale of the Mini Mental State Examination. Data were collected over a period of 3 months, during which patients attended a total of 4 clinic visits. Seven study participants also had brain scan data available from the BSNIP, PARDIP and Biceps studies currently in progress at MMHC, and these data were used in the analysis. The structural MPRAGE T1 scans were processed with FreeSurfer 6, from which thickness and volume measures were extracted. All statistical analyses were carried out using the R statistical software. RESULTS: Clinic and application responses within the same week were not significantly different from each other. The application answers, however, appeared to be more sensitive than clinic responses to structural abnormalities in the brain. Symptoms defined as a lack of normal emotional responses (i.e., negative symptoms of schizophrenia) were negatively correlated with home time and positively correlated with distance travelled, a counterintuitive result. CONCLUSIONS: The results show that mobile monitoring has the potential to be a valid and reliable method of data collection and that it may be able to augment clinical decision-making.

    The Dark UNiverse Explorer (DUNE): Proposal to ESA's Cosmic Vision

    The Dark UNiverse Explorer (DUNE) is a wide-field space imager whose primary goal is the study of dark energy and dark matter with unprecedented precision. For this purpose, DUNE is optimised for the measurement of weak gravitational lensing but will also provide complementary measurements of baryonic acoustic oscillations, cluster counts and the Integrated Sachs-Wolfe effect. Immediate auxiliary goals concern the evolution of galaxies, to be studied with unequalled statistical power, the detailed structure of the Milky Way and nearby galaxies, and the demographics of Earth-mass planets. DUNE is a Medium-class mission that makes use of readily available components, heritage from other missions, and synergy with ground-based facilities to minimise cost and risk. The payload consists of a 1.2 m telescope with a combined visible/NIR field of view of 1 deg². DUNE will carry out an all-sky survey from 550 to 1600 nm in one visible and three NIR bands, which will form a unique legacy for astronomy. DUNE will yield major advances in a broad range of fields in astrophysics, including fundamental cosmology, galaxy evolution, and the search for extrasolar planets. DUNE was recently selected by ESA as one of the mission concepts to be studied in its Cosmic Vision programme. Comment: Accepted in Experimental Astronomy

    Heterogeneity of Research Results: A New Perspective From Which to Assess and Promote Progress in Psychological Science

    Heterogeneity emerges when multiple close or conceptual replications on the same subject produce results that vary more than expected from sampling error alone. Here we argue that unexplained heterogeneity reflects a lack of coherence between the concepts applied and the data observed, and therefore a lack of understanding of the subject matter. Typical levels of heterogeneity thus offer a useful but neglected perspective on the levels of understanding achieved in psychological science. Focusing on continuous outcome variables, we surveyed heterogeneity in 150 meta-analyses from cognitive, organizational, and social psychology and in 57 multiple close replications. Heterogeneity proved to be very high in meta-analyses, with powerful moderators conspicuously absent. Population effects in the average meta-analysis vary from small to very large for reasons that are typically not understood. In contrast, heterogeneity was moderate in close replications. A newly identified relationship between heterogeneity and effect size allowed us to make predictions about expected heterogeneity levels. We discuss important implications for the formulation and evaluation of theories in psychology. On the basis of insights from the history and philosophy of science, we argue that reducing heterogeneity is important for progress in psychology and its practical applications, and we suggest changes to our collective research practice toward this end.
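Variation "beyond sampling error" is commonly quantified with Cochran's Q, the DerSimonian-Laird between-study variance τ², and I², the share of total variability not attributable to sampling error. The abstract does not say which estimators the survey used, so the sketch below only illustrates these standard statistics for k effect sizes with known sampling variances; the function name is hypothetical.

```python
import math

def heterogeneity(effects: list[float], variances: list[float]) -> dict[str, float]:
    """Cochran's Q, DerSimonian-Laird tau^2 (and tau), and I^2 for k studies,
    using fixed-effect inverse-variance weights w_i = 1 / v_i."""
    k = len(effects)
    if k < 2 or k != len(variances):
        raise ValueError("need at least two studies with matching variances")
    w = [1.0 / v for v in variances]
    sw = sum(w)
    pooled = sum(wi * yi for wi, yi in zip(w, effects)) / sw
    # Q: weighted squared deviations from the pooled fixed-effect mean.
    q = sum(wi * (yi - pooled) ** 2 for wi, yi in zip(w, effects))
    df = k - 1
    c = sw - sum(wi ** 2 for wi in w) / sw
    # tau^2: excess of Q over its expectation under homogeneity, truncated at zero.
    tau2 = max(0.0, (q - df) / c)
    # I^2: proportion of observed variability exceeding what sampling error predicts.
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0
    return {"Q": q, "tau2": tau2, "tau": math.sqrt(tau2), "I2": i2}
```

With three studies reporting effects 0.1, 0.3 and 0.5, each with sampling variance 0.01, Q = 8 on 2 degrees of freedom, so τ² = 0.03 and I² = 0.75: three quarters of the spread would be attributed to genuine differences between population effects rather than to sampling error.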

    A survey of statistical network models

    Networks are ubiquitous in science and have become a focal point for discussion in everyday life. Formal statistical models for the analysis of network data have emerged as a major topic of interest in diverse areas of study, and most of these involve a form of graphical representation. Probability models on graphs date back to 1959. Along with empirical studies in social psychology and sociology from the 1960s, these early works generated an active network community and a substantial literature in the 1970s. This effort moved into the statistical literature in the late 1970s and 1980s, and the past decade has seen a burgeoning network literature in statistical physics and computer science. The growth of the World Wide Web and the emergence of online networking communities such as Facebook, MySpace, and LinkedIn, along with a host of more specialized professional network communities, have intensified interest in the study of networks and network data. Our goal in this review is to provide the reader with an entry point to this burgeoning literature. We begin with an overview of the historical development of statistical network modeling and then introduce a number of examples that have been studied in the network literature. Our subsequent discussion focuses on a number of prominent static and dynamic network models and their interconnections. We emphasize formal model descriptions and pay special attention to the interpretation of parameters and their estimation. We end with a description of some open problems and challenges for machine learning and statistics. Comment: 96 pages, 14 figures, 333 references
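The 1959 probability models on graphs are usually associated with the random graphs of Erdős and Rényi and of Gilbert. As a minimal illustration of parameter interpretation and estimation, in the G(n, p) variant each of the n(n-1)/2 possible undirected edges is present independently with probability p, and the maximum-likelihood estimate of p is simply the observed edge density. The sketch below simulates and fits that model; the function names are illustrative, and the review itself covers far richer static and dynamic models.

```python
import random
from itertools import combinations

def sample_gnp(n: int, p: float, seed: int = 0) -> set[tuple[int, int]]:
    """Draw an undirected G(n, p) graph: each of the n(n-1)/2 node pairs is an edge with probability p."""
    rng = random.Random(seed)
    return {(i, j) for i, j in combinations(range(n), 2) if rng.random() < p}

def fit_gnp(n: int, edges: set[tuple[int, int]]) -> float:
    """Maximum-likelihood estimate of p: number of observed edges divided by the number of possible pairs."""
    possible = n * (n - 1) // 2
    if possible == 0:
        raise ValueError("need at least two nodes")
    return len(edges) / possible
```

Because every edge is an independent Bernoulli trial with the same p, the likelihood depends on the graph only through its edge count, which is why the estimate reduces to a density. That single-parameter simplicity is also the model's main limitation, since it cannot reproduce the clustering or degree heterogeneity seen in real networks.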