5,075 research outputs found
Linear filtering reveals false negatives in species interaction data
Species interaction datasets, often represented as sparse matrices, are usually collected through observation studies targeted at identifying species interactions. Due to the extensive required sampling effort, species interaction datasets usually contain many false negatives, often leading to bias in derived descriptors. We show that a simple linear filter can be used to detect false negatives by scoring interactions based on the structure of the interaction matrices. On 180 different datasets of various sizes, sparsities and ecological interaction types, we found that on average in about 75% of the cases, a false negative interaction got a higher score than a true negative interaction. Furthermore, we show that this filter is very robust, even when the interaction matrix contains a very large number of false negatives. Our results demonstrate that unobserved interactions can be detected in species interaction datasets, even without resorting to information about the species involved
VIGAN: Missing View Imputation with Generative Adversarial Networks
In an era when big data are becoming the norm, there is less concern with the
quantity but more with the quality and completeness of the data. In many
disciplines, data are collected from heterogeneous sources, resulting in
multi-view or multi-modal datasets. The missing data problem has been
challenging to address in multi-view data analysis. Especially, when certain
samples miss an entire view of data, it creates the missing view problem.
Classic multiple imputations or matrix completion methods are hardly effective
here when no information can be based on in the specific view to impute data
for such samples. The commonly-used simple method of removing samples with a
missing view can dramatically reduce sample size, thus diminishing the
statistical power of a subsequent analysis. In this paper, we propose a novel
approach for view imputation via generative adversarial networks (GANs), which
we name by VIGAN. This approach first treats each view as a separate domain and
identifies domain-to-domain mappings via a GAN using randomly-sampled data from
each view, and then employs a multi-modal denoising autoencoder (DAE) to
reconstruct the missing view from the GAN outputs based on paired data across
the views. Then, by optimizing the GAN and DAE jointly, our model enables the
knowledge integration for domain mappings and view correspondences to
effectively recover the missing view. Empirical results on benchmark datasets
validate the VIGAN approach by comparing against the state of the art. The
evaluation of VIGAN in a genetic study of substance use disorders further
proves the effectiveness and usability of this approach in life science.Comment: 10 pages, 8 figures, conferenc
A systematic review of data quality issues in knowledge discovery tasks
Hay un gran crecimiento en el volumen de datos porque las organizaciones capturan permanentemente la cantidad colectiva de datos para lograr un mejor proceso de toma de decisiones. El desafío mas fundamental es la exploración de los grandes volúmenes de datos y la extracción de conocimiento útil para futuras acciones por medio de tareas para el descubrimiento del conocimiento; sin embargo, muchos datos presentan mala calidad. Presentamos una revisión sistemática de los asuntos de calidad de datos en las áreas del descubrimiento de conocimiento y un estudio de caso aplicado a la enfermedad agrícola conocida como la roya del café.Large volume of data is growing because the organizations are continuously capturing the collective amount of data for better decision-making process. The most fundamental challenge is to explore the large volumes of data and extract useful knowledge for future actions through knowledge discovery tasks, nevertheless many data has poor quality. We presented a systematic review of the data quality issues in knowledge discovery tasks and a case study applied to agricultural disease named coffee rust
- …