15,415 research outputs found
Detection of recombination in DNA multiple alignments with hidden markov models
CConventional phylogenetic tree estimation methods assume that all sites in a DNA multiple alignment have the same evolutionary history. This assumption is violated in data sets from certain bacteria and viruses due to recombination, a process that leads to the creation of mosaic sequences from different strains and, if undetected, causes systematic errors in phylogenetic tree estimation. In the current work, a hidden Markov model (HMM) is employed to detect recombination events in multiple alignments of DNA sequences. The emission probabilities in a given state are determined by the branching order (topology) and the branch lengths of the respective phylogenetic tree, while the transition probabilities depend on the global recombination probability. The present study improves on an earlier heuristic parameter optimization scheme and shows how the branch lengths and the recombination probability can be optimized in a maximum likelihood sense by applying the expectation maximization (EM) algorithm. The novel algorithm is tested on a synthetic benchmark problem and is found to clearly outperform the earlier heuristic approach. The paper concludes with an application of this scheme to a DNA sequence alignment of the argF gene from four Neisseria strains, where a likely recombination event is clearly detected
Radar-based Road User Classification and Novelty Detection with Recurrent Neural Network Ensembles
Radar-based road user classification is an important yet still challenging
task towards autonomous driving applications. The resolution of conventional
automotive radar sensors results in a sparse data representation which is tough
to recover by subsequent signal processing. In this article, classifier
ensembles originating from a one-vs-one binarization paradigm are enriched by
one-vs-all correction classifiers. They are utilized to efficiently classify
individual traffic participants and also identify hidden object classes which
have not been presented to the classifiers during training. For each classifier
of the ensemble an individual feature set is determined from a total set of 98
features. Thereby, the overall classification performance can be improved when
compared to previous methods and, additionally, novel classes can be identified
much more accurately. Furthermore, the proposed structure allows to give new
insights in the importance of features for the recognition of individual
classes which is crucial for the development of new algorithms and sensor
requirements.Comment: 8 pages, 9 figures, accepted paper for 2019 IEEE Intelligent Vehicles
Symposium (IV), Paris, France, June 201
Laplacian Mixture Modeling for Network Analysis and Unsupervised Learning on Graphs
Laplacian mixture models identify overlapping regions of influence in
unlabeled graph and network data in a scalable and computationally efficient
way, yielding useful low-dimensional representations. By combining Laplacian
eigenspace and finite mixture modeling methods, they provide probabilistic or
fuzzy dimensionality reductions or domain decompositions for a variety of input
data types, including mixture distributions, feature vectors, and graphs or
networks. Provable optimal recovery using the algorithm is analytically shown
for a nontrivial class of cluster graphs. Heuristic approximations for scalable
high-performance implementations are described and empirically tested.
Connections to PageRank and community detection in network analysis demonstrate
the wide applicability of this approach. The origins of fuzzy spectral methods,
beginning with generalized heat or diffusion equations in physics, are reviewed
and summarized. Comparisons to other dimensionality reduction and clustering
methods for challenging unsupervised machine learning problems are also
discussed.Comment: 13 figures, 35 reference
Info-Greedy sequential adaptive compressed sensing
We present an information-theoretic framework for sequential adaptive
compressed sensing, Info-Greedy Sensing, where measurements are chosen to
maximize the extracted information conditioned on the previous measurements. We
show that the widely used bisection approach is Info-Greedy for a family of
-sparse signals by connecting compressed sensing and blackbox complexity of
sequential query algorithms, and present Info-Greedy algorithms for Gaussian
and Gaussian Mixture Model (GMM) signals, as well as ways to design sparse
Info-Greedy measurements. Numerical examples demonstrate the good performance
of the proposed algorithms using simulated and real data: Info-Greedy Sensing
shows significant improvement over random projection for signals with sparse
and low-rank covariance matrices, and adaptivity brings robustness when there
is a mismatch between the assumed and the true distributions.Comment: Preliminary results presented at Allerton Conference 2014. To appear
in IEEE Journal Selected Topics on Signal Processin
An HMM-based Comparative Genomic Framework for Detecting Introgression in Eukaryotes
One outcome of interspecific hybridization and subsequent effects of
evolutionary forces is introgression, which is the integration of genetic
material from one species into the genome of an individual in another species.
The evolution of several groups of eukaryotic species has involved
hybridization, and cases of adaptation through introgression have been already
established. In this work, we report on a new comparative genomic framework for
detecting introgression in genomes, called PhyloNet-HMM, which combines
phylogenetic networks, that capture reticulate evolutionary relationships among
genomes, with hidden Markov models (HMMs), that capture dependencies within
genomes. A novel aspect of our work is that it also accounts for incomplete
lineage sorting and dependence across loci.
Application of our model to variation data from chromosome 7 in the mouse
(Mus musculus domesticus) genome detects a recently reported adaptive
introgression event involving the rodent poison resistance gene Vkorc1, in
addition to other newly detected introgression regions. Based on our analysis,
it is estimated that about 12% of all sites withinchromosome 7 are of
introgressive origin (these cover about 18 Mbp of chromosome 7, and over 300
genes). Further, our model detects no introgression in two negative control
data sets. Our work provides a powerful framework for systematic analysis of
introgression while simultaneously accounting for dependence across sites,
point mutations, recombination, and ancestral polymorphism
- …