145,865 research outputs found
Pairwise gene GO-based measures for biclustering of high-dimensional expression data
Background: Biclustering algorithms search for groups of genes that share the same
behavior under a subset of samples in gene expression data. Nowadays, the biological
knowledge available in public repositories can be used to drive these algorithms to
find biclusters composed of functionally coherent groups of genes. On the other hand,
a distance among genes can be defined according to their information stored in Gene
Ontology (GO). Gene pairwise GO semantic similarity measures report a value for each
pair of genes which establishes their functional similarity. A scatter search-based
algorithm that optimizes a merit function that integrates GO information is studied in
this paper. This merit function uses a term that addresses the information through a GO
measure.
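As a rough illustration of how a GO term might enter a merit function, the sketch below averages a pairwise GO similarity over the genes of a bicluster and adds it as a penalty to an expression-based score. The function names, the additive form, the `weight` parameter, and the "lower is better" convention are all assumptions made for illustration; the abstract does not specify the paper's actual merit function.

```python
from itertools import combinations


def mean_pairwise_go_similarity(genes, go_sim):
    """Average GO similarity over all unordered gene pairs in a bicluster.

    go_sim is a hypothetical lookup: frozenset({gene_a, gene_b}) -> similarity in [0, 1].
    Pairs missing from the lookup are treated as similarity 0.0 (an assumption).
    """
    pairs = list(combinations(sorted(genes), 2))
    if not pairs:
        return 0.0
    total = sum(go_sim.get(frozenset(pair), 0.0) for pair in pairs)
    return total / len(pairs)


def merit(expression_term, genes, go_sim, weight=0.5):
    """Hypothetical merit (lower is better): expression score plus a GO penalty.

    The penalty shrinks as the genes become more functionally similar.
    """
    go_term = 1.0 - mean_pairwise_go_similarity(genes, go_sim)
    return expression_term + weight * go_term


# Toy similarity values, invented purely for the example.
toy_sim = {
    frozenset(("A", "B")): 0.8,
    frozenset(("A", "C")): 0.2,
    frozenset(("B", "C")): 0.5,
}
toy_mean = mean_pairwise_go_similarity({"A", "B", "C"}, toy_sim)
toy_merit = merit(1.0, {"A", "B", "C"}, toy_sim, weight=0.5)
```

Swapping in a different pairwise GO measure only changes the contents of `go_sim`, which is one way the effect of two measures could be compared under an otherwise fixed search.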
Results: The effect of two different gene pairwise GO measures on the
performance of the algorithm is analyzed. Firstly, three well known yeast datasets with
approximately one thousand genes are studied. Secondly, a group of human
datasets related to clinical data of cancer is also explored by the algorithm. Most of
these data are high-dimensional datasets composed of a huge number of genes. The
resultant biclusters reveal groups of genes linked by the same functionality when the
search procedure is driven by one of the proposed GO measures. Furthermore, a
qualitative biological study of a group of biclusters shows their relevance from a cancer
disease perspective.
Conclusions: It can be concluded that the integration of biological information
improves the performance of the biclustering process. The two different GO measures
studied show an improvement in the results obtained for the yeast datasets. However, if
datasets are composed of a huge number of genes, only one of them really improves
the algorithm performance. This second case constitutes a clear option to explore
interesting datasets from a clinical point of view.
Ministerio de Economía y Competitividad TIN2014-55894-C2-
A systematic review of data quality issues in knowledge discovery tasks
The volume of data is growing substantially because organizations continuously capture data to support better decision-making. The most fundamental challenge is to explore these large volumes of data and extract useful knowledge for future actions through knowledge discovery tasks; however, much of the data is of poor quality. We present a systematic review of data quality issues in knowledge discovery tasks and a case study applied to the agricultural disease known as coffee rust.
Transforming Graph Representations for Statistical Relational Learning
Relational data representations have become an increasingly important topic
due to the recent proliferation of network datasets (e.g., social, biological,
information networks) and a corresponding increase in the application of
statistical relational learning (SRL) algorithms to these domains. In this
article, we examine a range of representation issues for graph-based relational
data. Since the choice of relational data representation for the nodes, links,
and features can dramatically affect the capabilities of SRL algorithms, we
survey approaches and opportunities for relational representation
transformation designed to improve the performance of these algorithms. This
leads us to introduce an intuitive taxonomy for data representation
transformations in relational domains that incorporates link transformation and
node transformation as symmetric representation tasks. In particular, the
transformation tasks for both nodes and links include (i) predicting their
existence, (ii) predicting their label or type, (iii) estimating their weight
or importance, and (iv) systematically constructing their relevant features. We
motivate our taxonomy through detailed examples and use it to survey and
compare competing approaches for each of these tasks. We also discuss general
conditions for transforming links, nodes, and features. Finally, we highlight
challenges that remain to be addressed.
The validity of smartphone data and its relationship to clinical symptomatology and brain biology: an exploratory analysis
BACKGROUND: Presently, there is very little research on the clinical validity of mental health smartphone application data, its relationship to brain biology, and its ability to inform clinical decisions. This paper seeks to explore these relationships within a sample of patients with schizophrenia through the analysis of data collected on the mental health smartphone application Beiwe.
OBJECTIVES: To validate mental health smartphone applications and support their potential to augment clinical practice.
METHODS: The application involved a series of 21 questions from several questionnaires including Patient Health Questionnaire-8 (PHQ-8), Generalized Anxiety Disorder-7 (GAD-7), Warning Signals Scale (WSS), Pittsburgh Sleep Quality Index, and the psychosis subscale of the Mini Mental State Examination. Data were collected over a period of 3 months, and patients attended a total of 4 clinic visits during this timeframe. Seven study participants also had brain scan data available from the BSNIP, PARDIP and Biceps studies currently in progress at MMHC, which were used for analysis. The structural MPRAGE T1 scans were processed using FreeSurfer 6, from which thickness and volume measures were extracted. All statistical analyses on the data were carried out using R statistics software.
RESULTS: Clinic and application responses within the same week were not significantly different from each other. The application answers, however, appeared to be more sensitive to structural abnormalities in the brain. Symptoms defined as a lack of normal emotional responses (i.e. negative symptoms of schizophrenia) were negatively correlated with home time and positively correlated with distance travelled, which was a counterintuitive result.
CONCLUSIONS: The results show that mobile monitoring has the potential to be a valid and reliable method of data collection and that it may be able to augment clinical decision making.
The Dark UNiverse Explorer (DUNE): Proposal to ESA's Cosmic Vision
The Dark UNiverse Explorer (DUNE) is a wide-field space imager whose primary
goal is the study of dark energy and dark matter with unprecedented precision.
For this purpose, DUNE is optimised for the measurement of weak gravitational
lensing but will also provide complementary measurements of baryonic acoustic
oscillations, cluster counts and the Integrated Sachs-Wolfe effect. Immediate
auxiliary goals concern the evolution of galaxies, to be studied with
unequalled statistical power, the detailed structure of the Milky Way and
nearby galaxies, and the demographics of Earth-mass planets. DUNE is a
Medium-class mission which makes use of readily available components, heritage
from other missions, and synergy with ground based facilities to minimise cost
and risks. The payload consists of a 1.2m telescope with a combined visible/NIR
field-of-view of 1 deg^2. DUNE will carry out an all-sky survey, ranging from
550 to 1600nm, in one visible and three NIR bands which will form a unique
legacy for astronomy. DUNE will yield major advances in a broad range of fields
in astrophysics including fundamental cosmology, galaxy evolution, and
extrasolar planet search. DUNE was recently selected by ESA as one of the
mission concepts to be studied in its Cosmic Vision programme.
Comment: Accepted in Experimental Astronomy
Heterogeneity of Research Results: A New Perspective From Which to Assess and Promote Progress in Psychological Science
Heterogeneity emerges when multiple close or conceptual replications on the same subject produce results that vary more than expected from sampling error. Here we argue that unexplained heterogeneity reflects a lack of coherence between the concepts applied and data observed and therefore a lack of understanding of the subject matter. Typical levels of heterogeneity thus offer a useful but neglected perspective on the levels of understanding achieved in psychological science. Focusing on continuous outcome variables, we surveyed heterogeneity in 150 meta-analyses from cognitive, organizational, and social psychology and 57 multiple close replications. Heterogeneity proved to be very high in meta-analyses, with powerful moderators being conspicuously absent. Population effects in the average meta-analysis vary from small to very large for reasons that are typically not understood. In contrast, heterogeneity was moderate in close replications. A newly identified relationship between heterogeneity and effect size allowed us to make predictions about expected heterogeneity levels. We discuss important implications for the formulation and evaluation of theories in psychology. On the basis of insights from the history and philosophy of science, we argue that the reduction of heterogeneity is important for progress in psychology and its practical applications, and we suggest changes to our collective research practice toward this end.
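For readers unfamiliar with how "variation beyond sampling error" is usually quantified, the following is a minimal sketch of three standard meta-analytic heterogeneity statistics: Cochran's Q, I², and the DerSimonian-Laird estimate of the between-study variance τ². The abstract does not say which estimators the authors used, so the choice of these statistics, the function name, and the toy numbers are assumptions for illustration only.

```python
def heterogeneity(effects, variances):
    """Return Cochran's Q, I^2, DerSimonian-Laird tau^2 and the pooled effect.

    effects   : per-study effect size estimates
    variances : per-study sampling variances (must be positive)
    """
    if len(effects) != len(variances) or len(effects) < 2:
        raise ValueError("need at least two studies with matching variances")
    if any(v <= 0 for v in variances):
        raise ValueError("sampling variances must be positive")

    # Inverse-variance (fixed-effect) weights and pooled estimate.
    weights = [1.0 / v for v in variances]
    w_sum = sum(weights)
    pooled = sum(w * y for w, y in zip(weights, effects)) / w_sum

    # Q: weighted squared deviations from the pooled estimate.
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, effects))
    df = len(effects) - 1

    # I^2: share of total variation attributed to heterogeneity, floored at 0.
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0

    # DerSimonian-Laird between-study variance, floored at 0.
    c = w_sum - sum(w * w for w in weights) / w_sum
    tau2 = max(0.0, (q - df) / c)

    return {"Q": q, "I2": i2, "tau2": tau2, "pooled": pooled}


# Invented toy data: three studies with equal precision but divergent effects.
toy = heterogeneity([0.1, 0.5, 0.9], [0.01, 0.01, 0.01])
```

When Q does not exceed its degrees of freedom, both I² and τ² are floored at zero, which corresponds to results varying no more than sampling error alone would predict.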
A survey of statistical network models
Networks are ubiquitous in science and have become a focal point for
discussion in everyday life. Formal statistical models for the analysis of
network data have emerged as a major topic of interest in diverse areas of
study, and most of these involve a form of graphical representation.
Probability models on graphs date back to 1959. Along with empirical studies in
social psychology and sociology from the 1960s, these early works generated an
active network community and a substantial literature in the 1970s. This effort
moved into the statistical literature in the late 1970s and 1980s, and the past
decade has seen a burgeoning network literature in statistical physics and
computer science. The growth of the World Wide Web and the emergence of online
networking communities such as Facebook, MySpace, and LinkedIn, and a host of
more specialized professional network communities have intensified interest in
the study of networks and network data. Our goal in this review is to provide
the reader with an entry point to this burgeoning literature. We begin with an
overview of the historical development of statistical network modeling and then
we introduce a number of examples that have been studied in the network
literature. Our subsequent discussion focuses on a number of prominent static
and dynamic network models and their interconnections. We emphasize formal
model descriptions, and pay special attention to the interpretation of
parameters and their estimation. We end with a description of some open
problems and challenges for machine learning and statistics.
Comment: 96 pages, 14 figures, 333 references