150 research outputs found

Efficient exact motif discovery

Author: Ettwiller
Fratkin
Li
Lladser
Pavesi
Reinert
S. Rahmann
Sandve
Sinha
T. Marschall
Tompa
Publication venue: Oxford University Press
Publication date: 01/01/2009

Motivation: The motif discovery problem consists of finding over-represented patterns in a collection of biosequences. It is one of the classical sequence analysis problems, but it still has not been satisfactorily solved in an exact and efficient manner. This is partly due to the large number of ways to define the motif search space and the notion of over-representation. Even for well-defined formalizations, the problem is frequently solved in an ad hoc manner with heuristics that are not guaranteed to find the best motif.

CiteSeerX

Crossref

PubMed Central

Sequential Monte Carlo multiple testing

Author: Barski
BESAG
E. Ferkingstad
G. K. Sandve
Goecks
McPherson
North
Pounds
S. Nygard
Sandve
SCHWEDER
Seaman
Shendure
Wang
Zhang
Publication venue: Oxford University Press
Publication date: 01/01/2011

Motivation: In molecular biology, as in many other scientific fields, the scale of analyses is ever increasing. Often, complex Monte Carlo simulation is required, sometimes within a large-scale multiple testing setting. The resulting computational costs may be prohibitively high.

CiteSeerX

Crossref

PubMed Central

Improving generalization of machine learning-identified biomarkers with causal modeling: an investigation into immune receptor diagnostics

Author: Greiff Victor
Hajj Ghadi S. Al
Pavlović Milena
Pensar Johan
Sandve Geir Kjetil
Sollid Ludvig M.
Wood Mollie
Publication venue
Publication date: 20/04/2022

Machine learning is increasingly used to discover diagnostic and prognostic biomarkers from high-dimensional molecular data. However, a variety of factors related to experimental design may affect the ability to learn generalizable and clinically applicable diagnostics. Here, we argue that a causal perspective improves the identification of these challenges, and formalizes their relation to the robustness and generalization of machine learning-based diagnostics. To make for a concrete discussion, we focus on a specific, recently established high-dimensional biomarker - adaptive immune receptor repertoires (AIRRs). We discuss how the main biological and experimental factors of the AIRR domain may influence the learned biomarkers and provide easily adjustable simulations of such effects. In conclusion, we find that causal modeling improves machine learning-based biomarker robustness by identifying stable relations between variables and by guiding the adjustment of the relations and variables that vary between populations

arXiv.org e-Print Archive

Vitamin D receptor ChIP-seq in primary CD4+ cells: relationship to serum 25-hydroxyvitamin D levels and autoimmune disease

Author: A Sandelin
A Sanyal
Adam E Handel
AE Handel
Antonio J Berlanga-Taylor
AP Boyle
B Langmead
B Lehmann
BE Bernstein
C Carlberg
CE Grant
CS Ross-Innes
CY McLean
D Berglund
E Wingender
F Birzele
Finn Drabløs
G Pavesi
Gavin Giovannoni
Geir K Sandve
George C Ebers
Giulio Disanto
Giuseppe Gallone
GK Sandve
Heather Hanwell
IV Kulakovskiy
J Orgaz-Molina
J-C Souberbielle
JHA Martens
K Li
KL Munger
LA Hindorff
LL Issa
M Ashburner
M Caliskan
M Lutz
M Thomas-Chollier
MA Kriegel
MD Shirley
ML McCullough
NU Rashid
O Weth
PA Fujita
PA Marshall
R Salehi-Tabar
RM Tolón
S Gundersen
S Heikkinen
Sreeram V Ramagopalan
SV Ramagopalan
T Liu
TA Owen
TL Bailey
Y Zhang
Y-C Huang
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2013

PMCID: PMC3710212. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Crossref

Springer - Publisher Connector

PubMed Central

Oxford University Research Archive

Spiral - Imperial College Digital Repository

Queen Mary Research Online

NORA - Norwegian Open Research Archives

The Chromosome-Level Genome Assembly of European Grayling Reveals Aspects of a Unique Genome Evolution Process Within Salmonids

Author: Guiguen Y
Guyomard R
Lien S
Papakostas S
Primmer CR
Sandve SR
Sävilammi T
Varadharajan S
Vollestad LA
Publication venue: Genetics Society of America
Publication date: 27/10/2022

Salmonids represent an intriguing taxonomical group for investigating genome evolution in vertebrates due to their relatively recent last common whole genome duplication event, which occurred between 80 and 100 million years ago. Here, we report on the chromosome-level genome assembly of European grayling (Thymallus thymallus), which represents one of the earliest diverged salmonid subfamilies. To achieve this, we first generated relatively long genomic scaffolds by using a previously published draft genome assembly along with long-read sequencing data and a linkage map. We then merged those scaffolds by applying synteny evidence from the Atlantic salmon (Salmo salar) genome. Comparisons of the European grayling genome assembly to the genomes of Atlantic salmon and Northern pike (Esox lucius), the latter used as a nonduplicated outgroup, detailed aspects of the characteristic chromosome evolution process that has taken place in European grayling. While Atlantic salmon and other salmonid genomes are portrayed by the typical occurrence of numerous chromosomal fusions, European grayling chromosomes were confirmed to be fusion-free and were characterized by a relatively large proportion of paracentric and pericentric inversions. We further reported on transposable elements specific to either the European grayling or Atlantic salmon genome, on the male-specific sdY gene in the European grayling chromosome 11A, and on regions under residual tetrasomy in the homeologous European grayling chromosome pairs 9A-9B and 25A-25B. The same chromosome pairs have been observed under residual tetrasomy in Atlantic salmon and in other salmonids, suggesting that this feature has been conserved since the subfamily split

UTUPub

Bayesian Centroid Estimation for Motif Discovery

Author: A Dempster
A Neuwald
B Webb-Robertson
C Lawrence
C Murrea
D GuhaThakurta
E Xing
F Roth
G Pavesi
G Sandve
G Stormo
G Thijs
J Besag
J Gower
J Hu
J Liu
K MacIsaac
L Carvalho
L Newberg
Luis Carvalho
M Barbieri
M Régnier
M Tompa
MA Lones
Matteo G. A. Paris
S Geman
T Bailey
W Thompson
Y Ding
Publication venue: Public Library of Science (PLoS)
Publication date: 06/04/2012

Biological sequences may contain patterns that signal important biomolecular functions; a classical example is regulation of gene expression by transcription factors that bind to specific patterns in genomic promoter regions. In motif discovery we are given a set of sequences that share a common motif and aim to identify not only the motif composition, but also the binding sites in each sequence of the set. We present a Bayesian model that is an extended version of the model adopted by the Gibbs motif sampler, and propose a new centroid estimator that arises from a refined and meaningful loss function for binding site inference. We discuss the main advantages of centroid estimation for motif discovery, including computational convenience, and how its principled derivation offers further insights about the posterior distribution of binding site configurations. We also illustrate, using simulated and real datasets, that the centroid estimator can differ from the maximum a posteriori estimator. Comment: 24 pages, 9 figures.

arXiv.org e-Print Archive

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

FigShare

FITBAR: a web tool for the robust prediction of prokaryotic regulons

Author: AY Mitrophanov
CT Brown
E Soupene
FP Roth
G Condemine
G Pavesi
GD Stormo
GK Sandve
H Huang
HC Wang
J Oberto
JA Swets
Jacques Oberto
K Klepper
K Quandt
M Djordjevic
M Fourment
M Thomas-Chollier
M Tompa
MK Das
PS Novichkov
R Durbin
R Munch
R Staden
S El Qaidi
S Yellaboina
TD Schneider
TL Bailey
W Wei
X Liu
Y Barash
Y Zhao
Publication venue: BioMed Central
Publication date: 01/01/2010

Background: The binding of regulatory proteins to their specific DNA targets determines the accurate expression of the neighboring genes. The in silico prediction of new binding sites in completely sequenced genomes is a key aspect in the deeper understanding of gene regulatory networks. Several algorithms have been described to discriminate against false positives in the prediction of new binding targets; however, none of them has been implemented so far to assist the detection of binding sites at the genomic scale. Results: FITBAR (Fast Investigation Tool for Bacterial and Archaeal Regulons) is a web service designed to identify new protein binding sites on fully sequenced prokaryotic genomes. This tool consists of a workbench where the significance of the predictions can be compared using different statistical methods, a feature not found in existing resources. The Local Markov Model and the Compound Importance Sampling algorithms have been implemented to compute the P-value of newly discovered binding sites. In addition, FITBAR provides two optimized genomic scanning algorithms using either log-odds or entropy-weighted position-specific scoring matrices. Other significant features include the production of a detailed genomic context map for each detected binding site and the export of the search results in spreadsheet and portable document formats. The FITBAR discovery of a high-affinity Escherichia coli NagC binding site was validated experimentally in vitro as well as in vivo and published. Conclusions: FITBAR was developed to allow fast, accurate and statistically robust predictions of prokaryotic regulons. This feature constitutes the main advantage of this web tool over other matrix search programs and does not impair its performance. The web service is available at http://archaea.u-psud.fr/fitbar.

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

HAL Descartes

Hal-Diderot

Immunologic Profiling of the Atlantic Salmon Gill by Single Nuclei Transcriptomics

Author: Hazlerigg David G.
Ince Louise M.
Iversen Marianne
Jørgensen Even H.
Loudon Andrew S. I.
Martin Samuel A. M.
Mizoro Yasutaka
Nome Torfinn
Sandve Simen Rød
West Alex C.
Wood Shona H.
Publication venue: Frontiers Media SA
Publication date: 01/01/2021

Acknowledgments: The authors thank all of the animal staff at Kårvik havbruksstasjonen for their expert care of the research animals, and the University of Manchester Genomics Technology core facility (UK) for performing chromium 10x library preparation for snRNAseq. We also thank the reviewers for their constructive comments on the original manuscript. Funding: AW is supported by the Tromsø forskningsstiftelse (TFS) grant awarded to DH (TFS2016DH). The Sentinel North Transdisciplinary Research Program Université Laval and UiT awarded to DH supports this work. SW is supported by the Tromsø forskningsstiftelse (TFS) starter grant TFS2016SW. Experimental costs were covered by HFSP grant “Evolution of seasonal timers” RGP0030/2015 awarded to AL and DH. Storage resources were provided by the Norwegian National Infrastructure for Research Data (NIRD, project NS9055K). Peer reviewed.

Brage NMBU

Aberdeen University Research

Open Repository and Bibliography - Liège

Munin - Open Research Archive

The University of Manchester - Institutional Repository

NORA - Norwegian Open Research Archives

Genomic Regions Associated with Multiple Sclerosis Are Active in B Cells

Author: A Bar-Or
Antonio J. Berlanga-Taylor
B Barun
Christoph Kleinschnitz
D Franciotta
E Birney
F Sellebjerg
FG Joseph
Gavin Giovannoni
Geir Kjetil Sandve
Giulio Disanto
GK Sandve
HC von Budingen
J Brettschneider
J Ernst
Julia M. Morahan
L Piccio
LH Kasper
M Khademi
MS Freedman
OW Howell
PL De Jager
R Schneider
Ruth Dobson
S Sawcer
SL Hauser
Sreeram V. Ramagopalan
SV Ramagopalan
Publication venue: Public Library of Science
Publication date: 01/01/2012

More than 50 genomic regions have now been shown to influence the risk of multiple sclerosis (MS). However, the mechanisms of action, and the cell types in which these associated variants act at the molecular level remain largely unknown. This is especially true for associated regions containing no known genes. Given the evidence for a role for B cells in MS, we hypothesized that MS associated genomic regions co-localized with regions which are functionally active in B cells. We used publicly available data on 1) MS associated regions and single nucleotide polymorphisms (SNPs) and 2) chromatin profiling in B cells as well as three additional cell types thought to be unrelated to MS (hepatocytes, fibroblasts and keratinocytes). Genomic intervals and SNPs were tested for overlap using the Genomic Hyperbrowser. We found that MS associated regions are significantly enriched in strong enhancer, active promoter and strong transcribed regions (p = 0.00005) and that this overlap is significantly higher in B cells than control cells. In addition, MS associated SNPs also land in active promoter (p = 0.00005) and enhancer regions more than expected by chance (strong enhancer p = 0.0006; weak enhancer p = 0.00005). These results confirm the important role of the immune system and specifically B cells in MS and suggest that MS risk variants exert a gene regulatory role. Previous studies assessing MS risk variants in T cells may be missing important effects in B cells. Similar analyses in other immunological cell types relevant to MS and functional studies are necessary to fully elucidate how genes contribute to MS pathogenesis

CiteSeerX

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

UCL Discovery

Oxford University Research Archive

Spiral - Imperial College Digital Repository

Queen Mary Research Online

FigShare

FunClust: a web server for the identification of structural motifs in a set of non-homologous protein structures

Author: A Henschel
A Stark
A Via
AC Wallace
AD Hill
Allegra Via
Anna Tramontano
CT Porter
F Ferre
G Ausiello
Gabriele Ausiello
GK Sandve
GR Stockwell
HM Berman
KA Denessiouk
LH Greene
M Novotny
M Shatsky
Manuela Helmer-Citterich
N Hulo
P Puntervoll
P. Marcatili
Paolo Marcatili
PF Gherardini
Pier Federico Gherardini
S Jones
SL Moodie
Publication venue: BioMed Central
Publication date: 01/01/2008

The occurrence of very similar structural motifs brought about by different parts of non-homologous proteins is often indicative of a common function. Indeed, relatively small local structures can mediate binding to a common partner, be it a protein, a nucleic acid, a cofactor or a substrate. While it is relatively easy to identify short amino acid or nucleotide sequence motifs in a given set of proteins or genes, and many methods exist for this purpose, the identification of common local substructures is much more challenging, especially if they are formed by residues that are non-consecutive in the sequence.

Crossref

Springer - Publisher Connector

PubMed Central

ART