MaxSSmap: A GPU program for mapping divergent short reads to genomes with the maximum scoring subsequence
Programs based on hash tables and Burrows-Wheeler are very fast for mapping
short reads to genomes but have low accuracy in the presence of mismatches and
gaps. Such reads can be aligned accurately with the Smith-Waterman algorithm
but it can take hours or days to map millions of reads even for bacterial
genomes. We introduce a GPU program called MaxSSmap with the aim of achieving
comparable accuracy to Smith-Waterman but with faster runtimes. Like most
programs, MaxSSmap first identifies a local region of the genome and then performs
exact alignment. Instead of using hash tables or Burrows-Wheeler in the first part,
MaxSSmap calculates the maximum scoring subsequence score between the read and
disjoint fragments of the genome in parallel on a GPU and selects the highest
scoring fragment for exact alignment. We evaluate MaxSSmap's accuracy and
runtime when mapping simulated Illumina E. coli and human chromosome one reads
of different lengths and 10% to 30% mismatches with gaps to the E. coli genome
and human chromosome one. We also demonstrate applications on real data by
mapping ancient horse DNA reads to modern genomes and unmapped paired reads
from NA12878 in 1000 genomes. We show that MaxSSmap attains comparable high
accuracy and low error to fast Smith-Waterman programs yet has much lower
runtimes. We show that MaxSSmap can map reads rejected by BWA and NextGenMap
with high accuracy and low error much faster than if Smith-Waterman were used.
On short read lengths of 36 and 51 both MaxSSmap and Smith-Waterman have lower
accuracy than at longer lengths. On real data MaxSSmap produces many
alignments with high score and mapping quality that are not given by NextGenMap
and BWA. The MaxSSmap source code is freely available from
http://www.cs.njit.edu/usman/MaxSSmap
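
The fragment-selection step described in this abstract can be illustrated with a minimal CPU sketch. It is not the MaxSSmap implementation: the function names, the +1/-1 match and mismatch scores, and the simplification of comparing the read only against the start of each disjoint fragment are all assumptions made for illustration, and the loop over fragments stands in for the per-fragment parallelism a GPU would provide.

```python
def max_scoring_subsequence(scores):
    """Largest sum over any contiguous run of scores (Kadane's algorithm)."""
    best = 0
    running = 0
    for s in scores:
        # Restart the run whenever its sum would drop below zero.
        running = max(0, running + s)
        best = max(best, running)
    return best


def fragment_score(read, fragment, match=1, mismatch=-1):
    """Score a read against one fragment position by position (no gaps).

    The match/mismatch values are hypothetical, chosen for illustration.
    """
    per_base = [match if r == f else mismatch for r, f in zip(read, fragment)]
    return max_scoring_subsequence(per_base)


def best_fragment(read, genome, fragment_len):
    """Split the genome into disjoint fragments; return (start, score) of the best."""
    starts = range(0, len(genome), fragment_len)
    scored = [(fragment_score(read, genome[s:s + fragment_len]), s) for s in starts]
    score, start = max(scored, key=lambda t: t[0])
    return start, score


if __name__ == "__main__":
    genome = "AAAAAAAA" + "ACGTACGT" + "TTTTTTTT"
    # The read carries one mismatch relative to the middle fragment.
    print(best_fragment("ACGTACCT", genome, 8))
```

In a full pipeline the selected fragment would then be passed to an exact aligner such as Smith-Waterman, as the abstract describes.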
A Lagrangian for Hamiltonian vector fields on singular Poisson manifolds
On a manifold equipped with a bivector field, we introduce for every
Hamiltonian a Lagrangian on paths valued in the cotangent space whose
stationary points project onto Hamiltonian vector fields. We show that the
remaining components of those stationary points tell whether the bivector field
is Poisson or at least defines an integrable distribution - a class of bivector
fields generalizing twisted Poisson structures that we study in detail. Comment: 27 pages
A re-examination of the Salicornias (Amaranthaceae) of Saudi Arabia and their polymorphs
During the period from 1964 to 1999 Saudi Arabian species of Salicornia were wrongly treated under the European species, S. europaea L. Recent explorations proved that there are two separate allopatric species of Salicornia in Saudi Arabia, one inhabiting the inland salt-marshes of the Najd (highlands) and the other inhabiting the Arabian Gulf Coast (lowlands). Morphological, ecological and exploratory studies confirm that they are two distinct species. The two species differ in features of bark, axillary spikes, basal vegetative segment(s) of spike, fertile segments, colour of senescent plants, and flowering, fruiting and germination phenology. As both species were described earlier from Iran, they are now new records for Saudi Arabia. The species are S. persica ssp. iranica (Akhani) Kadereit & Piirainen and S. sinus-persica Akhani. The status of S. sinus-persica, which was thought doubtful, has been confirmed. Both species have been described and illustrated. Each species comprises a number of polymorphs. Because the leaves and flowers are rudimentary, which confuses species circumscriptions, binomials have proliferated in the taxonomy of Salicornia. To mitigate such confusion, the full range of variability of the Saudi Arabian species has been documented.
Development and evaluation of machine learning algorithms for biomedical applications
Gene network inference and drug response prediction are two important problems in computational biomedicine. The former helps scientists better understand the functional elements and regulatory circuits of cells. The latter helps a physician gain full understanding of the effective treatment on patients. Both problems have been widely studied, though current solutions are far from perfect. More research is needed to improve the accuracy of existing approaches.
This dissertation develops machine learning and data mining algorithms, and applies these algorithms to solve the two important biomedical problems. Specifically, to tackle the gene network inference problem, the dissertation proposes (i) new techniques for selecting topological features suitable for link prediction in gene networks; (ii) a graph sparsification method for network sampling; (iii) combined supervised and unsupervised methods to infer gene networks; and (iv) sampling and boosting techniques for reverse engineering gene networks. For the drug sensitivity prediction problem, the dissertation presents (i) an instance selection technique and hybrid method for drug sensitivity prediction; (ii) a link prediction approach to drug sensitivity prediction; (iii) a noise-filtering method for drug sensitivity prediction; and (iv) transfer learning approaches for enhancing the performance of drug sensitivity prediction. Substantial experiments are conducted to evaluate the effectiveness and efficiency of the proposed algorithms. Experimental results demonstrate the feasibility of the algorithms and their superiority over the existing approaches.
American Options Based on Malliavin Calculus and Nonparametric Variance Reduction Methods
This paper is devoted to pricing American options using Monte Carlo and the
Malliavin calculus. Unlike the majority of articles related to this topic, in
this work we will not use localization functions to reduce the variance. Our
method is based on expressing the conditional expectation E[f(S_t) | S_s] using the
Malliavin calculus without localization. Then the variance of the estimator of
E[f(S_t) | S_s] is reduced using closed formulas, techniques based on
conditioning and a judicious choice of the number of simulated paths. Finally,
we perform the stopping times version of the dynamic programming algorithm to
decrease the bias. On the one hand, we will develop the Malliavin calculus
tools for exponential multi-dimensional diffusions that have deterministic and
non-constant coefficients. On the other hand, we will detail various
nonparametric techniques to reduce the variance. Moreover, we will test the
numerical efficiency of our method on a heterogeneous CPU/GPU multi-core
machine
Two series of polyhedral fundamental domains for Lorentz bi-quotients
The main aim of this paper is to give two infinite series of examples of
Lorentz space forms that can be obtained from Lorentz polyhedra by
identification of faces. These Lorentz space forms are bi-quotients of the form
, where
is a simply connected Lie group with the Lorentz metric given by the
Killing form, and are discrete subgroups of and
is cyclic. A construction of polyhedral fundamental domains for the
action of on via was given
in the earlier work of the second author. In this paper we give an explicit
description of the fundamental domains obtained by this construction for two
infinite series of groups. These results are connected to singularity theory as
the bi-quotients appear as links of certain
quasi-homogeneous -Gorenstein surface singularities, i.e. the
intersections of the singular variety with sufficiently small spheres around
the isolated singular point. Comment: 16 pages, 6 figures, 2 tables of figure
Developing an IS-impact decision tool: A literature based design science roadmap
This paper derives from research-in-progress that aims at both Design Research (DR) and Design Science (DS) outputs: the former a management decision tool based in IS-Impact (Gable et al. 2008) kernel theory, the latter methodological learnings deriving from synthesis of the literature and reflection on the DR ‘case study’ experience. The paper introduces a generic, detailed and pragmatic DS ‘Research Roadmap’ or methodology, deriving at this stage primarily from synthesis and harmonization of relevant concepts identified through systematic archival analysis of related literature. The scope of the Roadmap has also been influenced by the parallel study aim to undertake DR applying and further evolving the Roadmap. The Roadmap is presented in response to the dearth of detailed guidance available to novice researchers in Design Science Research (DSR), and though preliminary, is expected to evolve and gradually be substantiated through experience of its application. A key distinction of the Roadmap from other DSR methods is its breadth of coverage of published DSR concepts and activities, together with its detail and scope. It represents a useful synthesis and integration of otherwise highly disparate DSR-related concepts.
Determination of Seed Viability of Eight Wild Saudi Arabian Species by Germination and X-Ray Tests
Our purpose was to evaluate the usefulness of the germination vs. the X-ray test in determining the initial viability of seeds of eight wild species (Salvia spinosa, Salvia aegyptiaca, Ochradenus baccatus, Ochradenus arabicus, Suaeda aegyptiaca, Suaeda vermiculata, Prosopis farcta and Panicum turgidum) from Saudi Arabia. Several days were required to determine viability of all eight species via germination tests, while immediate results on filled/viable seeds were obtained with the X-ray test. Seeds of all the species, except Sa. aegyptiaca, showed high viability in both the germination (98–70% at 25/15 °C, 93–66% at 35/25 °C) and X-ray (100–75%) tests. Furthermore, there was general agreement between the germination (10% at 25/15 °C and 8% at 35/25 °C) and X-ray (5%) tests that seed viability of Sa. aegyptiaca was very low, and X-ray analysis revealed that this was due to poor embryo development. Seeds of P. farcta have physical dormancy, which was broken by scarification in concentrated sulfuric acid (10 min), and they exhibited high viability in both the germination (98% at 25/15 °C and 93% at 35/25 °C) and X-ray (98%) tests. Most of the nongerminated seeds of the eight species, except those of Sa. aegyptiaca, were alive as judged by the tetrazolium test (TZ). Thus, for the eight species examined, the X-ray test was a good and rapid predictor of seed viability.