Search CORE

4,504 research outputs found

A Coverage Criterion for Spaced Seeds and its Applications to Support Vector Machine String Kernels and k-Mer Distances

Author: Martin Donald E. K.
Noé Laurent
Publication venue: Mary Ann Liebert Inc.
Publication date: 01/01/2014

Spaced seeds have recently been shown not only to detect more alignments, but also to give a more accurate measure of phylogenetic distances (Boden et al., 2013; Horwege et al., 2014; Leimeister et al., 2014) and to provide a lower misclassification rate when used with Support Vector Machines (SVMs) (Onodera and Shibuya, 2013). We confirm these two results by independent experiments, and in this article we propose to use a coverage criterion (Benson and Mak, 2008; Martin, 2013; Martin and Noé, 2014) to measure seed efficiency in both cases, in order to design better seed patterns. We first show how this coverage criterion can be measured directly by a fully automaton-based approach. We then illustrate how this criterion performs compared with two other frequently used criteria, namely the single-hit and multiple-hit criteria, through correlation coefficients with the correct classification or the true distance. Finally, for alignment-free distances, we propose an extension that adopts the coverage criterion, show how it performs, and indicate how it can be computed efficiently. Comment: http://online.liebertpub.com/doi/abs/10.1089/cmb.2014.017
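To make the three criteria concrete, here is a minimal Python sketch over a single ungapped alignment written as a match string ('1' = match, '0' = mismatch). The seed notation ('1' = must match, '0' = don't care), the function names, and the coverage definition used (alignment positions covered by a must-match position of at least one hit) are assumptions for illustration, not the paper's automaton-based implementation.

```python
def hit_positions(seed: str, alignment: str) -> list[int]:
    """Return the start offsets at which `seed` hits `alignment`.

    `seed` uses '1' for a must-match position and '0' for a don't-care;
    `alignment` is a match string with '1' = match and '0' = mismatch.
    """
    w = len(seed)
    care = [i for i, c in enumerate(seed) if c == "1"]
    return [
        start
        for start in range(len(alignment) - w + 1)
        if all(alignment[start + i] == "1" for i in care)
    ]


def seed_criteria(seed: str, alignment: str) -> dict[str, int]:
    """Single-hit indicator, number of hits, and coverage for one alignment."""
    hits = hit_positions(seed, alignment)
    care = [i for i, c in enumerate(seed) if c == "1"]
    # Assumed coverage definition: positions covered by a '1' of at least one hit.
    covered = {start + i for start in hits for i in care}
    return {
        "single_hit": int(bool(hits)),  # 1 if at least one hit
        "hit_count": len(hits),         # compare against a threshold for multiple-hit
        "coverage": len(covered),
    }
```

For example, `seed_criteria("1101", "1111011")` finds hits at offsets 0 and 2 and returns `{'single_hit': 1, 'hit_count': 2, 'coverage': 5}`: both hits count once each, but their overlap at position 3 is counted only once by the coverage value.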

arXiv.org e-Print Archive

HAL - Lille 3

CiteSeerX

INRIA a CCSD electronic archive server

PubMed Central

Designing seeds for similarity search in genomic DNA

Author: Buhler Jeremy
Keich Uri
Sun Yanni
Publication venue: Elsevier Inc.
Publication date: 31/05/2005

Large-scale comparison of genomic DNA is of fundamental importance in annotating functional elements of genomes. To perform large comparisons efficiently, BLAST (Methods Enzymol. 266 (1996) 460; J. Mol. Biol. 215 (1990) 403; Nucleic Acids Res. 25(17) (1997) 3389) and other widely used tools use seeded alignment, which compares only sequences that can be shown to share a common pattern or “seed” of matching bases. The literature suggests that the choice of seed substantially affects the sensitivity of seeded alignment, but designing and evaluating seeds is computationally challenging. This work addresses the problem of designing a seed to optimize the performance of seeded alignment. We give a fast, simple algorithm based on finite automata for evaluating the sensitivity of a seed in a Markov model of ungapped alignments, along with extensions to mixtures and inhomogeneous Markov models. We give intuition and theoretical results on which seeds are good choices. Finally, we describe Mandala, a software tool for seed design, and show that it can be used to improve the sensitivity of alignment in practice.
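The quantity being evaluated, seed sensitivity, can be illustrated with a small dynamic program in the simplest special case of the Markov model: each alignment column is a match independently with probability p. This is a hedged sketch that enumerates the last w-1 columns as states (cost O(length * 2^(w-1))) rather than the finite-automaton algorithm described above; the function name and the seed notation ('1' must match, '0' don't care) are assumptions.

```python
def seed_sensitivity(seed: str, length: int, p: float) -> float:
    """Probability that a random ungapped alignment of `length` columns,
    each a match independently with probability `p`, contains a hit of
    `seed` ('1' = must match, '0' = don't care)."""
    w = len(seed)
    care = [i for i, c in enumerate(seed) if c == "1"]
    # Map: last (w - 1) columns seen so far -> probability of reaching
    # that suffix without having produced a hit yet.
    states: dict[tuple[int, ...], float] = {(): 1.0}
    hit_prob = 0.0
    for _ in range(length):
        nxt: dict[tuple[int, ...], float] = {}
        for suffix, prob in states.items():
            for bit, bit_prob in ((1, p), (0, 1.0 - p)):
                window = suffix + (bit,)
                mass = prob * bit_prob
                if len(window) == w and all(window[i] == 1 for i in care):
                    hit_prob += mass  # first hit: absorb this probability mass
                else:
                    # Keep only the last w - 1 columns as the new state.
                    key = window[-(w - 1):] if w > 1 else ()
                    nxt[key] = nxt.get(key, 0.0) + mass
        states = nxt
    return hit_prob
```

For instance, `seed_sensitivity("11", 3, 0.5)` returns 0.375, matching a direct count: of the eight match strings of length 3, exactly 011, 110 and 111 contain two adjacent matches. The automaton approach in the paper serves the same purpose while avoiding the exponential state enumeration used here.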

Elsevier - Publisher Connector

Improved hit criteria for DNA local alignment

Author: Kucherov Gregory
Noé Laurent
Publication venue: BioMed Central
Publication date: 01/01/2004

BACKGROUND: The hit criterion is a key component of heuristic local alignment algorithms. It specifies a class of patterns assumed to witness a potential similarity, and this choice is decisive for the selectivity and sensitivity of the whole method. RESULTS: In this paper, we propose two ways to improve the hit criterion. First, we define the group criterion, which combines the advantages of the single-seed and double-seed approaches used in existing algorithms. Second, we introduce transition-constrained seeds, which extend spaced seeds with the ability to distinguish transition from transversion mismatches. We provide analytical data as well as experimental results, obtained with the YASS software, supporting both improvements. CONCLUSIONS: The proposed algorithmic ideas yield a significant gain in the sensitivity of similarity search without an increase in execution time. The method has been implemented in YASS software available at
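Transition-constrained seeds can be sketched as follows. The seed alphabet used here ('#' for a required match, '@' for a match or a transition, '_' for any column) and the function names are assumptions for illustration; transitions are taken to be the purine pair A/G and the pyrimidine pair C/T, and the alignment is assumed to be ungapped.

```python
# Transitions: purine <-> purine (A/G) and pyrimidine <-> pyrimidine (C/T).
TRANSITIONS = {frozenset("AG"), frozenset("CT")}


def column_class(a: str, b: str) -> str:
    """Classify an alignment column as '#' (match), '@' (transition) or '_' (transversion)."""
    if a == b:
        return "#"
    if frozenset((a, b)) in TRANSITIONS:
        return "@"
    return "_"


def tc_seed_hits(seed: str, s1: str, s2: str) -> list[int]:
    """Start offsets where a transition-constrained seed hits the ungapped alignment s1/s2.

    Seed letters: '#' requires a match, '@' accepts a match or a transition,
    '_' accepts any column.
    """
    if len(s1) != len(s2):
        raise ValueError("ungapped alignment: sequences must have equal length")
    cols = [column_class(a, b) for a, b in zip(s1.upper(), s2.upper())]
    accepted = {"#": {"#"}, "@": {"#", "@"}, "_": {"#", "@", "_"}}
    w = len(seed)
    return [
        start
        for start in range(len(cols) - w + 1)
        if all(cols[start + i] in accepted[c] for i, c in enumerate(seed))
    ]
```

With this sketch, `tc_seed_hits("#@#", "ACG", "ATG")` returns `[0]` because C/T is a transition, whereas `tc_seed_hits("#@#", "ACG", "AGG")` returns `[]` because C/G is a transversion.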

CiteSeerX

Springer - Publisher Connector

Directory of Open Access Journals

INRIA a CCSD electronic archive server

PubMed Central

Invasion speeds for structured populations in fluctuating environments

Author: AK Sakai
C Tebaldi
CC Heyde
CC Horvitz
CR Carroll
CS Elton
CS Kolar
CV Haridas
D Schemske
DR Easterling
E Jongejans
H Caswell
H Jacquemyn
HF Weinberger
HG Andrewartha
J Fieberg
JF Silva
L Arnold
M Kot
Maureen E. Ryan
MC Runge
MG Neubert
MS Boyce
NC Ellstrand
PL Chesson
PM Vitousek
R Durrett
R Hengeveld
RA Fisher
RA Horn
RC Lewontin
RM May
RN Mack
S Tuljapurkar
SD Tuljapurkar
Sebastian J. Schreiber
T Camino-Beck de
WF Morris
YC Collingham
Publication venue: Springer Science and Business Media LLC
Publication date: 12/06/2010

We live in a time when climate models predict future increases in environmental variability and biological invasions are becoming increasingly frequent. A key to developing effective responses to biological invasions in increasingly variable environments will be estimates of their rates of spatial spread and of the uncertainty associated with these estimates. Using stochastic, stage-structured, integro-difference equation models, we show analytically that invasion speeds are asymptotically normally distributed with a variance that decreases in time. We apply our methods to a simple juvenile-adult model with stochastic variation in reproduction and to an illustrative example with published data for the perennial herb Calathea ovandensis. These examples, supported by additional analysis, reveal that increased variability in vital rates simultaneously slows down invasions yet generates greater uncertainty about rates of spatial spread. Moreover, while temporal autocorrelations in vital rates inflate variability in invasion speeds, the effect of these autocorrelations on the average invasion speed can be positive or negative, depending on life history traits and on how well vital rates “remember” the past.

arXiv.org e-Print Archive

Crossref

Springer - Publisher Connector

Designing Efficient Spaced Seeds for SOLiD Read Mapping

Author: Gîrdea Marta
Kucherov Gregory
Noé Laurent
Publication venue: Hindawi Publishing Corporation
Publication date: 01/01/2010

The advent of high-throughput sequencing technologies constituted a major advance in genomic studies, offering new prospects in a wide range of applications. We propose a rigorous and flexible algorithmic solution to mapping SOLiD color-space reads to a reference genome. The solution relies, on the one hand, on an advanced method of seed design that uses a faithful probabilistic model of read matches and, on the other hand, on a novel seeding principle especially adapted to read mapping. Our method can handle both lossy and lossless frameworks and is able to distinguish, at the level of seed design, between SNPs and reading errors. We illustrate our approach with several seed designs and demonstrate their efficiency.
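The reason seed design can separate SNPs from reading errors in color space is visible in the di-base encoding itself. The sketch below assumes the common SOLiD convention of 2-bit base codes (A=0, C=1, G=2, T=3), with each color being the XOR of two adjacent base codes; under that convention, an interior single-base substitution (such as a SNP) changes two adjacent colors, whereas a single color-call reading error changes only one. The function names are hypothetical.

```python
# Assumed 2-bit base codes; each color is the XOR of two adjacent codes.
BASE_CODE = {"A": 0, "C": 1, "G": 2, "T": 3}


def to_colors(seq: str) -> list[int]:
    """Di-base (color-space) encoding: one color per pair of adjacent bases."""
    codes = [BASE_CODE[b] for b in seq.upper()]
    return [x ^ y for x, y in zip(codes, codes[1:])]


def color_diffs(read: str, ref: str) -> list[int]:
    """Color positions at which the encodings of two equal-length sequences differ."""
    return [
        i
        for i, (a, b) in enumerate(zip(to_colors(read), to_colors(ref)))
        if a != b
    ]
```

For example, `to_colors("ACGT")` is `[1, 3, 1]`, and the substitution C to G at base 1 gives `color_diffs("AGGT", "ACGT") == [0, 1]`: two adjacent color changes, the signature a SNP-aware seed can be designed to tolerate.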

CiteSeerX

HAL - Lille 3

Crossref

Directory of Open Access Journals

INRIA a CCSD electronic archive server

PubMed Central

Seed Ecology and Regeneration Process to Inform Seed-Based Wetland Restoration

Author: Tarsa Emily E.
Publication venue: DigitalCommons@USU
Publication date: 01/12/2022

Wetlands provide immense value to wildlife and humans but have been degrading rapidly around the world. One major challenge is the loss of native plant species in wetlands, which limits the ability of wetlands to function as they should. Restoring wetlands requires a combination of removing the cause of degradation (such as invasive plant species) and, in many cases, actively returning native plants to the site, especially via seeding. Further, early life stages are the most vulnerable for plants and are often when sown species die and fail to establish. Thus, understanding how and why seeds die or survive across species and environmental conditions can provide guidance for seed-based wetland restoration. Here, we sought to address these important knowledge gaps through a series of greenhouse and lab experiments. First, we asked what native sowing rate was needed to maximize native plant performance across a gradient of invasive species seed density, environmental conditions, and timing of seed addition. Separately, we performed a lab and growth chamber experiment in which we measured important characteristics of seeds and seedlings (grown in different environmental conditions) to better understand, and ultimately predict, why some species do well and under what conditions. Finally, in a separate greenhouse experiment, we grew native and invasive wetland plants for eight weeks and tracked whether seeds germinated, survived, or died in order to quantify plant transitions through these early life stages. We also assessed end-of-season percent cover and the rate of clonal production to gauge how early stages of plant growth contribute to invasion resistance. We found that native plant establishment increased with higher native sowing densities, especially when native seeds were sown early in the season.
However, the biggest driver of plant community composition following seeding was the density of invasive Phragmites australis seeds in the soil. Low water levels yielded higher native plant performance and more effectively suppressed P. australis growth. We also identified characteristics of seeds and seedlings that explained their germination and early growth patterns: species with light seeds, thin seed coats, and shallow seed dormancy germinated faster and had higher growth rates, while species with heavy seeds had thick seed coats, deep seed dormancy, slower germination, and higher resource allocation to plant structures. Finally, we found that high water levels enhanced the probability of seed germination, and that high temperatures led to greater clonal development in seedlings. Overall, Phragmites australis was a superior performer in early life stages, but Distichlis spicata performed well due to high germination probabilities and Eleocharis palustris performed well due to extensive clonal production. As seed-based wetland restoration becomes increasingly necessary, the findings from this dissertation provide guidance on which native species should be used, where seeds should be sourced, and what environmental conditions should be targeted to maximize native plant establishment and restore wetland functions.

DigitalCommons@USU

Efficient Node Proximity and Node Significance Computations in Graphs

Author
Publication venue
Publication date: 01/01/2017

Node proximity measures are commonly used to quantify how near, or otherwise related, two or more nodes in a graph are. Node significance measures are mainly used to determine how important nodes in a graph are. Measures of node proximity and significance have been highly effective in many prediction tasks and applications. Despite their effectiveness, however, they have several shortcomings. One is scalability: they are costly to compute on large graphs. Another is low accuracy when the significance of a node and its degree in the graph are not related. A third is that they are less effective when the information about a graph is uncertain: for an uncertain graph, computing ranking scores that consider all possible worlds has exponential cost. In this thesis, I first introduce Locality-sensitive, Re-use promoting, approximate Personalized PageRank (LR-PPR), an approximate personalized PageRank that calculates node rankings from the locality information of the seeds without processing the entire graph, and reuses precomputed locality information across different locality combinations. To identify locality information, I present Impact Neighborhood Indexing (INI), which finds impact neighborhoods by propagating nodes' fingerprints over the network. For the accuracy challenge, I introduce the Degree Decoupled PageRank (D2PR) technique to improve the effectiveness of PageRank-based knowledge discovery, especially by considering the significance of neighbors and the degree of a given node.
To tackle the uncertainty challenge, I introduce Uncertain Personalized PageRank (UPPR) to approximately compute personalized PageRank values under uncertainty of edge existence, and Interval Personalized PageRank with Integration (IPPR-I) and Interval Personalized PageRank with Mean (IPPR-M) to compute ranking scores when the uncertainty on edge weights is given as interval values. Dissertation/Thesis: Doctoral Dissertation, Computer Science 201
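As a point of reference for the approximations described above, plain personalized PageRank can be computed by power iteration. This sketch is the standard baseline, not LR-PPR, INI, D2PR or UPPR; the restart probability `alpha=0.15`, the convergence threshold, and the choice to send the mass of dangling nodes back to the seeds are assumptions.

```python
def personalized_pagerank(out_edges, seeds, alpha=0.15, tol=1e-12, max_iter=1000):
    """Personalized PageRank by power iteration.

    out_edges: dict mapping every node to a list of its successors.
    seeds: set of nodes to which the walk restarts with probability `alpha`.
    Mass at dangling nodes (no successors) is returned to the seeds.
    """
    nodes = list(out_edges)
    restart = {v: (1.0 / len(seeds) if v in seeds else 0.0) for v in nodes}
    rank = dict(restart)
    for _ in range(max_iter):
        nxt = {v: alpha * restart[v] for v in nodes}
        for u in nodes:
            succ = out_edges[u]
            if succ:
                # Spread the non-restart mass evenly over u's successors.
                share = (1.0 - alpha) * rank[u] / len(succ)
                for v in succ:
                    nxt[v] += share
            else:
                # Dangling node: redistribute its mass according to the restart vector.
                for v in nodes:
                    nxt[v] += (1.0 - alpha) * rank[u] * restart[v]
        delta = sum(abs(nxt[v] - rank[v]) for v in nodes)
        rank = nxt
        if delta < tol:
            break
    return rank
```

On the directed 3-cycle `{0: [1], 1: [2], 2: [0]}` with seed set `{0}`, the score of node 0 converges to 0.15 / (1 - 0.85**3), about 0.389, and scores decrease with distance from the seed. Each iteration touches every edge, which is exactly the whole-graph cost that locality-based approximations aim to avoid.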

ASU Digital Repository

Predicting epidemic risk from past temporal contact data

Author: Colizza Vittoria
Giovannini Armando
Palma Diana
Poletto Chiara
Savini Lara
Valdano Eugenio
Publication venue: Public Library of Science (PLoS)
Publication date: 12/03/2015

Understanding how epidemics spread in a system is a crucial step to prevent and control outbreaks, with broad implications for the system's functioning, health, and associated costs. This can be achieved by identifying the elements at higher risk of infection and implementing targeted surveillance and control measures. One important ingredient to consider is the pattern of disease-transmission contacts among the elements; however, lack of data or delays in providing updated records may hinder its use, especially for time-varying patterns. Here we explore to what extent it is possible to use past temporal data of a system's pattern of contacts to predict the risk of infection of its elements during an emerging outbreak, in the absence of updated data. We focus on two real-world temporal systems: a livestock displacement trade network among animal holdings, and a network of sexual encounters in high-end prostitution. We define a node's loyalty as a local measure of its tendency to maintain contacts with the same elements over time, and uncover important non-trivial correlations with the node's epidemic risk. We show that a risk assessment analysis incorporating this knowledge and based on past structural and temporal pattern properties provides accurate predictions for both systems. Its generalizability is tested by introducing a theoretical model for generating synthetic temporal networks. High accuracy of our predictions is recovered across different settings, while the number of possible predictions is system-specific. The proposed method can provide crucial information for the setup of targeted intervention strategies. Comment: 24 pages, 5 figures + SI (18 pages, 15 figures)
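A node's loyalty can be sketched as the Jaccard similarity between its sets of contacts in two consecutive time windows. This formulation is an assumption made for illustration (the exact definition should be checked against the paper); contacts are treated as undirected, and the function names are hypothetical.

```python
def neighbors(edges, node) -> set:
    """Contacts of `node` in one time window, treating each (u, v) edge as undirected."""
    result = set()
    for u, v in edges:
        if u == node:
            result.add(v)
        elif v == node:
            result.add(u)
    return result


def loyalty(prev_neighbors: set, next_neighbors: set) -> float:
    """Jaccard similarity of a node's contact sets in two consecutive windows."""
    union = prev_neighbors | next_neighbors
    if not union:
        return 0.0  # assumption: no contacts in either window gives loyalty 0
    return len(prev_neighbors & next_neighbors) / len(union)
```

For a holding that traded with partners {1, 2, 3} in one window and {2, 3, 4} in the next, `loyalty({1, 2, 3}, {2, 3, 4})` is 2 / 4 = 0.5; a value of 1 means the node kept exactly the same contacts.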

arXiv.org e-Print Archive

Directory of Open Access Journals

HAL-Inserm

PubMed Central

FigShare