32 research outputs found

An Optimal Linear Time Algorithm for Quasi-Monotonic Segmentation

Author: Bingham E.
Brooks M.
Daniel Lemire
Edelsbrunner H.
Fitzgerald W.
Goldberger A. L.
Haiminen N.
Han J.
Lemire D.
Martin Brooks
Ramsay J. O.
Yuhong Yan
Publication venue: 'Informa UK Limited'
Publication date: 23/02/2007

Monotonicity is a simple yet significant qualitative characteristic. We consider the problem of segmenting a sequence into up to K segments. We want segments to be as monotonic as possible and to alternate signs. We propose a quality metric for this problem using the l_inf norm, and we present an optimal linear time algorithm based on a novel formalism. Moreover, given a precomputation in time O(n log n) consisting of a labeling of all extrema, we compute any optimal segmentation in constant time. We experimentally compare its performance with two piecewise linear segmentation heuristics (top-down and bottom-up). We show that our algorithm is faster and more accurate. Applications include pattern recognition and qualitative modeling.

Comment: This is the extended version of our ICDM'05 paper (arXiv:cs/0702142

arXiv.org e-Print Archive

R-libre

NRC Publications Archive

Crossref

Metagenomics: A viable tool for reconstructing herbivore diet

Author: Andrews S.
Borchtchevski V.
Gayot M.
Haiminen N.
Hicks A. L.
Krueger F.
Paula D. P.
Pegard A.
Picozzi N.
Soininen E. M.
Srivathsan A.
Wegge P.
Publication venue: Wiley
Publication date: 01/01/2021

Metagenomics can generate data on the diet of herbivores, without the need for primer selection and PCR enrichment steps as is necessary in metabarcoding. Metagenomic approaches to diet analysis have remained relatively unexplored, requiring validation of bioinformatic steps. Currently, no metagenomic herbivore diet studies have utilized both chloroplast and nuclear markers as reference sequences for plant identification, which would increase the number of reads that could be taxonomically informative. Here, we explore how in silico simulation of metagenomic data sets resembling sequences obtained from faecal samples can be used to validate taxonomic assignment. Using a known list of sequences to create simulated data sets, we derived reliable identification parameters for taxonomic assignments of sequences. We applied these parameters to characterize the diet of western capercaillies (Tetrao urogallus) located in Norway, and compared the results with metabarcoding trnL P6 loop data generated from the same samples. Both methods performed similarly in the number of plant taxa identified (metagenomics 42 taxa, metabarcoding 43 taxa), with no significant difference in species resolution (metagenomics 24%, metabarcoding 23%). We further observed that while metagenomics was strongly affected by the age of faecal samples, with fresh samples outperforming old samples, metabarcoding was not affected by sample age. On the other hand, metagenomics allowed us to simultaneously obtain the mitochondrial genome of the western capercaillies, thereby providing additional ecological information. Our study demonstrates the potential of utilizing metagenomics for diet reconstruction but also highlights key considerations as compared to metabarcoding for future utilization of this technique

Crossref

ZENODO

PubMed Central

Copenhagen University Research Information System

NEUROSURGERY ENTHUSIASTIC WOMEN SOCIETY

Munin - Open Research Archive

NORA - Norwegian Open Research Archives

Evaluation of Methods for De Novo Genome Assembly from High-Throughput Sequencing Reads Reveals Dependencies That Affect the Quality of the Results

Author: Andrey Rzhetsky
CS Keith
D Hernandez
David N. Kuhn
DM Church
DR Zerbino
F Sanger
G Narzisi
Isidore Rigoutsos
J Shendure
JA Reinhardt
JC Dohm
JR Miller
JT Simpson
K Mavromatis
Laxmi Parida
MJ Chaisson
ML Metzker
Niina Haiminen
R Blakesley
R Cronn
R Li
S Altschul
S DiGuistini
S Gnerre
S Ossowski
S Rounsley
SL Salzberg
W Zhang
WR Jeck
Publication venue: Public Library of Science
Publication date: 01/01/2011

Recent developments in high-throughput sequencing technology have made low-cost sequencing an attractive approach for many genome analysis tasks. Increasing read lengths, improving quality and the production of increasingly large numbers of usable sequences per instrument-run continue to make whole-genome assembly an appealing target application. In this paper we evaluate the feasibility of de novo genome assembly from short reads (≤100 nucleotides) through a detailed study involving genomic sequences of various lengths and origin, in conjunction with several of the currently popular assembly programs. Our extensive analysis demonstrates that, in addition to sequencing coverage, attributes such as the architecture of the target genome, the identity of the assembly program used, the average read length and the observed sequencing error rates are powerful variables that affect the best achievable assembly of the target sequence in terms of size and correctness.

CiteSeerX

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

Explainable AI reveals changes in skin microbiome composition linked to phenotypic differences

Author: Carrieri AP
Gardiner LJ
Grimshaw S
Hadjidoukas P
Haiminen N
Hawkins S
Hoptroff M
Kenny JG
MacGuire-Flanagan A
Maudsley-Barton S
Mayes AE
Murphy B
Parida L
Paterson S
Pyzer-Knapp EO
Rowe WPM
Shand C
Tazzioli J
Winn M
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2021

Alterations in the human microbiome have been observed in a variety of conditions such as asthma, gingivitis, dermatitis and cancer, and much remains to be learned about the links between the microbiome and human health. The fusion of artificial intelligence with rich microbiome datasets can offer an improved understanding of the microbiome’s role in human health. To gain actionable insights it is essential to consider both the predictive power and the transparency of the models by providing explanations for the predictions. We combine the collection of leg skin microbiome samples from two healthy cohorts of women with the application of an explainable artificial intelligence (EAI) approach that provides accurate predictions of phenotypes with explanations. The explanations are expressed in terms of variations in the relative abundance of key microbes that drive the predictions. We predict skin hydration, subject's age, pre/post-menopausal status and smoking status from the leg skin microbiome. The changes in microbial composition linked to skin hydration can accelerate the development of personalized treatments for healthy skin, while those associated with age may offer insights into the skin aging process. The leg microbiome signatures associated with smoking and menopausal status are consistent with previous findings from oral/respiratory tract microbiomes and vaginal/gut microbiomes respectively. This suggests that easily accessible microbiome samples could be used to investigate health-related phenotypes, offering potential for non-invasive diagnosis and condition monitoring. Our EAI approach sets the stage for new work focused on understanding the complex relationships between microbial communities and phenotypes. Our approach can be applied to predict any condition from microbiome samples and has the potential to accelerate the development of microbiome-based personalized therapeutics and non-invasive diagnostics

University of Liverpool Repository

E-space: Manchester Metropolitan University's Research Repository

ePubs: the open archive for STFC research publications

Deciphering Heterogeneity in Pig Genome Assembly Sscrofa9 by Isochore and Isochore-Like Region Analyses

Author: A Benecke
A Eyre-Walker
A Nekrutenko
AE Vinogradov
C Federico
C Gautier
C Melodelima
CT Zhang
D Smedley
Deli Zhang
E Elhaik
F Gao
FB Guo
G Bernardi
G Kudla
G Marais
G Sabeur
GY Sofronov
HY Ou
J Jurka
JA Nickoloff
Jingfei Huang
JJ Yunis
JL Chojnowski
JL Oliver
JP Thiery
Jörg Hoheisel
L Duret
Li Dai
LL Chen
M Costantini
M Semon
MJ Lercher
N Galtier
N Haiminen
P Carpena
P Fearnhead
P Gottipati
P Soriano
Pengfang Zhou
R Versteeg
R Zhang
S Gonzalez-Barrera
T Schmidt
TC Brown
W Li
Wenchao Lin
Wenqian Zhang
Wenwu Wu
Yang Zhang
Publication venue: Public Library of Science
Publication date: 01/01/2010

Background: The isochore, a large DNA sequence with relatively small GC variance, is one of the most important structures in eukaryotic genomes. Although the isochore has been widely studied in humans and other species, little is known about its distribution in pigs. Principal Findings: In this paper, we construct a map of long homogeneous genome regions (LHGRs), i.e., isochores and isochore-like regions, in pigs to provide an intuitive version of GC heterogeneity in each chromosome. The LHGR pattern study not only quantifies heterogeneities, but also reveals some primary characteristics of the chromatin organization, including the following: (1) the majority of LHGRs belong to GC-poor families and are long; (2) a high gene density tends to occur with the appearance of GC-rich LHGRs; and (3) the density of LINE repeats decreases with an increase in the GC content of LHGRs. Furthermore, a portion of LHGRs with particular GC ranges (50%–51% and 54%–55%) tend to have abnormally high gene densities, suggesting that biased gene conversion (BGC), as well as time- and energy-saving principles, could be of importance to the formation of genome organization. Conclusion: This study significantly improves our knowledge of chromatin organization in the pig genome. Correlations between the different biological features (e.g., gene density and repeat density) and GC content of LHGRs provide a uniqu

CiteSeerX

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

Is the whole greater than the sum of its parts? De novo assembly strategies for bacterial genomes based on paired-end sequencing

Author: A Bankevich
A Desai
A Gurevich
AP Masella
AS Mikheyev
Cheng-Hsun Chiu
Cheng-Yang Lee
Chi-Ching Lee
CS Chin
D Hernandez
D Sims
DR Kelley
DR Zerbino
G Benson
H Li
J Butler
J Shendure
J Zhang
JA Reinhardt
JR Miller
MJ Chaisson
MT Tammi
N Haiminen
N Whiteford
NJ Loman
PA Pevzner
Petrus Tang
Po-Jung Huang
R Li
R Luo
RC McCoy
Ruei-Chi Gan
S Koren
T Tatusova
Timothy H. Wu
Ting-Wen Chen
W Zhang
Wei-Chao Liao
Y Peng
Yi-Feng Chang
Yi-Ywan M. Chen
YY Chen
Publication venue: 'Springer Science and Business Media LLC'
Publication date

Crossref