Classification via sequential testing
The test sequencing problem, that of generating the sequence of tests required to reach a diagnostic conclusion with minimum average cost, is considered. The problem is formulated as the construction of an optimal binary AND/OR decision tree, which is known to be NP-complete. It can be solved optimally using dynamic programming or AND/OR graph search methods (AO*, CF, and HS). For large systems, however, the computational effort of these methods is substantial because the number of nodes in the AND/OR search graph grows rapidly. To avoid this computational explosion, one-step and multistep lookahead heuristic algorithms have been developed for the test sequencing problem. Our approach integrates concepts from the one-step lookahead heuristic algorithms with strategies used in Huffman coding. The effectiveness of the algorithms is demonstrated on several test cases. The traditional test sequencing problem is generalized here to include asymmetrical tests. Our approach to test sequencing can be adapted to solve a wide variety of binary identification problems arising in decision table programming, medical diagnosis, database query processing, quality assurance, and pattern recognition.
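To make the idea of a one-step lookahead heuristic concrete, here is a minimal Python sketch. It is not the authors' algorithm: it assumes perfectly reliable binary tests, scores each test by information gain per unit cost, and uses the Huffman average code length only as a lower bound for the unit-cost case. All names (`build_tree`, `expected_cost`, `huffman_average_length`, the fault states and tests) are hypothetical.

```python
import heapq
import math


def binary_entropy(p):
    """Entropy in bits of a binary outcome that occurs with probability p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


def build_tree(states, probs, tests, costs):
    """Greedy one-step lookahead: pick the test with the best gain per cost.

    states: iterable of candidate fault states
    probs:  dict state -> prior probability
    tests:  dict test name -> set of states that make the test fail
    costs:  dict test name -> positive cost
    Leaves are frozensets of states; inner nodes are dicts.
    """
    states = frozenset(states)
    if len(states) <= 1:
        return states
    total = sum(probs[s] for s in states)
    best = None
    for name in sorted(tests):
        fail = states & tests[name]
        passed = states - tests[name]
        if not fail or not passed:
            continue  # this test cannot split the current ambiguity set
        p_fail = sum(probs[s] for s in fail) / total
        score = binary_entropy(p_fail) / costs[name]
        if best is None or score > best[0]:
            best = (score, name, passed, fail)
    if best is None:
        return states  # no test separates these states any further
    _, name, passed, fail = best
    return {
        "test": name,
        "pass": build_tree(passed, probs, tests, costs),
        "fail": build_tree(fail, probs, tests, costs),
    }


def expected_cost(tree, probs, costs):
    """Return (probability mass under the node, expected cost contribution)."""
    if isinstance(tree, frozenset):
        return sum(probs[s] for s in tree), 0.0
    m_pass, c_pass = expected_cost(tree["pass"], probs, costs)
    m_fail, c_fail = expected_cost(tree["fail"], probs, costs)
    mass = m_pass + m_fail
    return mass, costs[tree["test"]] * mass + c_pass + c_fail


def huffman_average_length(weights):
    """Average Huffman code length: a lower bound when every test costs 1."""
    heap = list(weights)
    heapq.heapify(heap)
    total = 0.0
    while len(heap) > 1:
        merged = heapq.heappop(heap) + heapq.heappop(heap)
        total += merged
        heapq.heappush(heap, merged)
    return total


# Hypothetical example: four fault states, three unit-cost tests.
probs = {"a": 0.4, "b": 0.3, "c": 0.2, "d": 0.1}
tests = {"t1": {"a"}, "t2": {"a", "b"}, "t3": {"c"}}
costs = {"t1": 1.0, "t2": 1.0, "t3": 1.0}

tree = build_tree(probs, probs, tests, costs)
_, avg_cost = expected_cost(tree, probs, costs)
bound = huffman_average_length(probs.values())
```

In this toy instance the greedy tree applies `t1` first and its expected cost coincides with the Huffman bound; in general the greedy choice carries no such guarantee.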
Change-point model on nonhomogeneous Poisson processes with application in copy number profiling by next-generation DNA sequencing
We propose a flexible change-point model for inhomogeneous Poisson processes,
which arise naturally from next-generation DNA sequencing, and derive score and
generalized likelihood statistics for shifts in intensity functions. We
construct a modified Bayesian information criterion (mBIC) to guide model
selection, and point-wise approximate Bayesian confidence intervals for
assessing the confidence in the segmentation. The model is applied to DNA Copy
Number profiling with sequencing data and evaluated on simulated spike-in and
real data sets. Comment: Published at http://dx.doi.org/10.1214/11-AOAS517 in the Annals of
Applied Statistics (http://www.imstat.org/aoas/) by the Institute of
Mathematical Statistics (http://www.imstat.org).
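As a rough illustration of a generalized likelihood ratio statistic for a shift in intensity, the Python sketch below scans binned counts for a single change-point. It is a deliberately simplified stand-in, not the model of the paper: it assumes equal-width bins, a constant Poisson rate within each segment, and exactly one candidate change, and it does not implement the mBIC or the confidence intervals. The function names and the example counts are hypothetical.

```python
import math


def max_loglik(total, n_bins):
    """Maximized Poisson log-likelihood of n_bins bins with summed count total.

    The rate is set to its MLE total / n_bins; the log-factorial terms are
    dropped because they cancel in the likelihood ratio.
    """
    if total == 0:
        return 0.0
    rate = total / n_bins
    return total * math.log(rate) - total


def glr_single_changepoint(counts):
    """Return (k, statistic) for the best split counts[:k] | counts[k:].

    The statistic is twice the gain in maximized log-likelihood of a
    two-rate model over a one-rate model. k is None if no split helps.
    """
    n = len(counts)
    total = sum(counts)
    null = max_loglik(total, n)
    best_k, best_stat = None, 0.0
    left = 0
    for k in range(1, n):
        left += counts[k - 1]
        alt = max_loglik(left, k) + max_loglik(total - left, n - k)
        stat = 2.0 * (alt - null)
        if stat > best_stat:
            best_k, best_stat = k, stat
    return best_k, best_stat


# Hypothetical read counts per bin with a visible jump after the fifth bin.
counts = [2, 3, 2, 3, 2, 9, 10, 11, 9, 10]
k, stat = glr_single_changepoint(counts)
```

Whether a split of this size is significant would have to be judged against a null distribution or a model-selection criterion, which this sketch leaves out.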
Quantitative Comparison of Abundance Structures of Generalized Communities: From B-Cell Receptor Repertoires to Microbiomes
The community, the assemblage of organisms co-existing in a given
space and time, has the potential to become one of the unifying concepts of
biology, especially with the advent of high-throughput sequencing experiments
that reveal genetic diversity exhaustively. In this spirit we show that a tool
from community ecology, the Rank Abundance Distribution (RAD), can be turned by
the new MaxRank normalization method into a generic, expressive descriptor for
quantitative comparison of communities in many areas of biology. To illustrate
the versatility of the method, we analyze RADs from various generalized
communities, i.e. assemblages of genetically diverse cells or organisms,
including human B cells, gut microbiomes under antibiotic treatment and of
different ages and countries of origin, and other human and environmental
microbial communities. We show that normalized RADs enable novel quantitative
approaches that help to understand structures and dynamics of complex
generalized communities.
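For readers unfamiliar with the descriptor, a plain Rank Abundance Distribution can be computed as in the Python sketch below: count the individuals of each type, sort the counts in decreasing order, and report relative abundance against rank. This shows only the basic RAD; the MaxRank normalization introduced in the abstract is not reproduced here, and the function name and sample data are hypothetical.

```python
from collections import Counter


def rank_abundance(labels):
    """Return [(rank, relative abundance)] with rank 1 the most abundant type.

    labels: one entry per observed individual, e.g. a species name or a
    clonotype identifier.
    """
    counts = sorted(Counter(labels).values(), reverse=True)
    total = sum(counts)
    return [(rank, count / total) for rank, count in enumerate(counts, start=1)]


# Hypothetical sample: ten individuals belonging to four types.
sample = list("aaaabbbccd")
rad = rank_abundance(sample)
```

Comparing communities of different richness requires a normalization step on top of this, which is the role the abstract assigns to MaxRank.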