Automated Gene Classification using Nonnegative Matrix Factorization on Biomedical Literature
Understanding functional gene relationships is a challenging problem for biological applications. High-throughput technologies such as DNA microarrays have inundated biologists with a wealth of information; however, processing that information remains problematic. To help with this problem, researchers have begun applying text mining techniques to the biological literature. This work extends previous work based on Latent Semantic Indexing (LSI) by examining Nonnegative Matrix Factorization (NMF). Whereas LSI uses the singular value decomposition (SVD) to approximate data in a dense, mixed-sign space, NMF produces a parts-based factorization that is directly interpretable. This space can, in theory, be used to augment existing ontologies and annotations by identifying themes within the literature. Performing NMF does come at a price, namely the large number of parameters. This work analyzes the effects of some of the NMF parameters on both convergence and labeling accuracy. Since there is a dearth of automated label evaluation techniques as well as “gold standard” hierarchies, a method to produce “correct” trees is proposed, as well as a technique to label trees and to evaluate those labels.
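To make the contrast with the SVD concrete, the following is a minimal sketch of a nonnegative factorization of a toy term-by-gene matrix. It is not the authors' implementation: the abstract does not say which NMF algorithm or parameters were used, so the Lee-Seung multiplicative update rule, the random initialization, the iteration count, and the term and gene names are all assumptions chosen for illustration.

```python
import numpy as np


def nmf(A, k, n_iter=200, seed=0, eps=1e-9):
    """Approximate a nonnegative matrix A (m x n) as W (m x k) @ H (k x n).

    Uses multiplicative updates for the Frobenius-norm objective.
    This update rule is an assumption; the abstract does not name one.
    """
    rng = np.random.default_rng(seed)
    m, n = A.shape
    # Random nonnegative initialization (one of many possible choices).
    W = rng.random((m, k)) + eps
    H = rng.random((k, n)) + eps
    errors = []
    for _ in range(n_iter):
        # Each update multiplies by a nonnegative ratio, so W and H stay >= 0.
        H *= (W.T @ A) / (W.T @ W @ H + eps)
        W *= (A @ H.T) / (W @ H @ H.T + eps)
        errors.append(np.linalg.norm(A - W @ H))
    return W, H, errors


# Hypothetical toy data: rows are terms, columns are gene "documents".
terms = ["kinase", "phosphorylation", "apoptosis", "caspase"]
genes = ["geneA", "geneB", "geneC", "geneD"]
A = np.array([
    [3.0, 2.0, 0.0, 0.0],
    [2.0, 3.0, 0.0, 0.0],
    [0.0, 0.0, 4.0, 1.0],
    [0.0, 0.0, 1.0, 4.0],
])

W, H, errors = nmf(A, k=2)

# Each column of W is a nonnegative "theme"; list its highest-weighted terms.
for j in range(W.shape[1]):
    top = np.argsort(W[:, j])[::-1][:2]
    print(f"factor {j}:", [terms[i] for i in top])
```

Because every entry of W and H is nonnegative, each factor can be read as an additive group of terms, which is the sense in which the factorization is directly interpretable. The rank k, the initialization, and the stopping rule are the kind of parameters whose effect on convergence the abstract refers to.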
Finding Functional Gene Relationships Using the Semantic Gene Organizer (SGO)
Understanding functional gene relationships is a major challenge in bioinformatics and computational biology. Currently, many approaches extract gene relationships from the biomedical literature via term co-occurrence models. Unfortunately, many genes that are experimentally identified as related have not previously been studied together. As a result, many automated models fail to help researchers understand the nature of the relationships. In this work, the particular scheme used to mine genomic data is Latent Semantic Indexing (LSI). LSI performs a singular value decomposition (SVD) to produce a low-rank approximation of the data set. Effectively, it allows queries to be interpreted in a more concept-based space, so that gene relationships that other models would ordinarily overlook can be discovered.
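The low-rank idea described above can be sketched as follows. This is an illustrative example only, not the SGO code: the toy term-by-gene matrix, the rank k = 2, the gene and term labels, and the query-folding convention (multiplying by the inverse singular values) are assumptions made here for demonstration.

```python
import numpy as np


def lsi_fit(A, k):
    """Rank-k truncated SVD of a term-by-document matrix A."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return U[:, :k], s[:k], Vt[:k, :]


def fold_in_query(q, U_k, s_k):
    """Map a term-space query into the k-dimensional concept space.

    Uses q_hat = S_k^{-1} U_k^T q, one common LSI convention (assumed here).
    """
    return (U_k.T @ q) / s_k


def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


# Hypothetical data: rows are terms t0..t3, columns are gene documents g0..g2.
# g0 and g1 share term t1; only g0 contains t0; g2 uses an unrelated term t3.
A = np.array([
    [1.0, 0.0, 0.0],  # t0
    [1.0, 1.0, 0.0],  # t1
    [0.0, 1.0, 0.0],  # t2
    [0.0, 0.0, 2.0],  # t3
])

U_k, s_k, Vt_k = lsi_fit(A, k=2)

# Query containing only term t0.
q = np.array([1.0, 0.0, 0.0, 0.0])
q_hat = fold_in_query(q, U_k, s_k)

# Column j of Vt_k holds the concept-space coordinates of gene document j.
sims = [cosine(q_hat, Vt_k[:, j]) for j in range(A.shape[1])]
literal_overlap = [float(A[:, j] @ q) for j in range(A.shape[1])]

print("concept-space similarity:", np.round(sims, 3))
print("literal term overlap:   ", literal_overlap)
```

In this toy case g1 never mentions the query term, so a pure term-matching model scores it at zero, yet it lands close to the query in the reduced space because it shares vocabulary with g0. That is the kind of indirect relationship the abstract says co-occurrence models can miss.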
From splashing to bouncing: the influence of viscosity on the impact of suspension droplets on a solid surface
We experimentally investigated the splashing of dense suspension droplets
impacting a solid surface, extending prior work to the regime where the
viscosity of the suspending liquid becomes a significant parameter. The overall
behavior can be described by a combination of two trends. The first one is that
the splashing becomes favored when the kinetic energy of individual particles
at the surface of a droplet overcomes the confinement produced by surface
tension. This is expressed by a particle-based Weber number. The second
is that splashing is suppressed by increasing the viscosity of the solvent.
This is expressed by the Stokes number, which influences the effective
coefficient of restitution of colliding particles. We developed a phase diagram
where the splashing onset is delineated as a function of both the
particle-based Weber number and the Stokes number.
A surprising result occurs at very small Stokes number, where not only
splashing but also plastic deformation of the droplet is suppressed. This leads
to a situation where droplets can bounce back after impact, an observation we
are able to reproduce using discrete particle numerical simulations that take
into account viscous interaction between particles and elastic energy.
Gene Tree Labeling Using Nonnegative Matrix Factorization on Biomedical Literature
Identifying functional groups of genes is a challenging problem for biological applications.
Text mining approaches can be used to build hierarchical clusters or trees from the information in the biological literature. In particular, nonnegative matrix factorization (NMF) is examined as one approach to labeling hierarchical trees. A generic labeling algorithm as well as an evaluation technique is proposed, and the effects of different NMF parameters on convergence and labeling accuracy are discussed. The primary goals of this study are to provide a qualitative assessment of NMF and its various parameters and initializations, to provide an automated way to classify biomedical data, and to provide a method for evaluating labeled data assuming a static input tree. As a byproduct, a method for generating gold standard trees is proposed.