
    Current challenges in software solutions for mass spectrometry-based quantitative proteomics

    This work was supported in part by the PRIME-XS project, grant agreement number 262067, funded by the European Union Seventh Framework Programme; The Netherlands Proteomics Centre, embedded in The Netherlands Genomics Initiative; The Netherlands Bioinformatics Centre; and the Centre for Biomedical Genetics (to S.C., B.B. and A.J.R.H.); by NIH grants NCRR RR001614 and RR019934 (to the UCSF Mass Spectrometry Facility, director: A.L. Burlingame, P.B.); and by grants from the MRC, CR-UK, BBSRC and Barts and the London Charity (to P.C.).

    Clustering by compression

    We present a new method for clustering based on compression. The method does not use subject-specific features or background knowledge, and works as follows: first, we determine a universal similarity distance, the normalized compression distance or NCD, computed from the lengths of compressed data files (singly and in pairwise concatenation); second, we apply a hierarchical clustering method. The NCD is universal in that it is not restricted to a specific application area and works across application area boundaries. A theoretical precursor, the normalized information distance, co-developed by one of the authors, is provably optimal but uses the non-computable notion of Kolmogorov complexity. We propose precise notions of a similarity metric and a normal compressor, and show that the NCD based on a normal compressor is a similarity metric that approximates universality. To extract a hierarchy of clusters from the distance matrix, we determine a dendrogram (binary tree) by a new quartet method and a fast heuristic to implement it. The method is implemented and available as public software, and is robust under the choice of different compressors. To substantiate our claims of universality and robustness, we report evidence of successful application in areas as diverse as genomics, virology, languages, literature, music, handwritten digits, astronomy, and combinations of objects from completely different domains, using statistical, dictionary, and block-sorting compressors. In genomics we present new evidence for major questions in mammalian evolution, based on whole-mitochondrial genomic analysis: the Eutherian orders, and the Marsupionta hypothesis versus the Theria hypothesis.
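    The NCD itself is straightforward to compute with any off-the-shelf compressor. Below is a minimal sketch in Python, using zlib as a stand-in for the paper's "normal compressor"; the function names and the choice of compressor are illustrative assumptions, not part of the published method:

    ```python
    import os
    import zlib

    def c(data: bytes) -> int:
        """Compressed length in bytes, a computable proxy for Kolmogorov complexity."""
        return len(zlib.compress(data, 9))

    def ncd(x: bytes, y: bytes) -> float:
        """Normalized compression distance:
        NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y))."""
        cx, cy = c(x), c(y)
        return (c(x + y) - min(cx, cy)) / max(cx, cy)

    a = b"the quick brown fox jumps over the lazy dog" * 20
    b = b"the quick brown fox leaps over the lazy dog" * 20
    r = os.urandom(1024)  # incompressible reference data

    print(ncd(a, b))  # near-duplicates: markedly smaller distance
    print(ncd(a, r))  # unrelated random data: distance close to 1
    ```

    The resulting pairwise distance matrix can then be handed to any hierarchical clustering routine; the quartet-tree heuristic described in the abstract is the paper's own choice for that step.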

    EasyCluster: a fast and efficient gene-oriented clustering tool for large-scale transcriptome data

    Background: ESTs and full-length cDNAs represent an invaluable source of evidence for inferring reliable gene structures and discovering potential alternative splicing events. In newly sequenced genomes, these tasks may not be practicable owing to the lack of appropriate training sets. However, when expression data are available, they can be used to build EST clusters related to specific genomic transcribed loci. Common strategies recently employed to this end are based on sequence similarity between transcripts and can lead, under specific conditions, to inconsistent and erroneous clustering. To improve cluster building and facilitate all downstream annotation analyses, we developed a simple genome-based methodology to generate gene-oriented clusters of ESTs when a genomic sequence and a pool of related expressed sequences are provided. Our procedure is implemented in the software EasyCluster and takes into account the spliced nature of ESTs after an ad hoc genomic mapping.

    Methods: EasyCluster uses the well-known GMAP program to perform very quick EST-to-genome mapping along with the detection of reliable splice sites. Given a genomic sequence and a pool of ESTs/FL-cDNAs, EasyCluster starts by building genomic and EST local databases and running GMAP. It then parses the results, creating an initial collection of pseudo-clusters by grouping ESTs according to the overlap of their genomic coordinates on the same strand (sketched below). In the final step, EasyCluster refines the clustering by rerunning GMAP on each pseudo-cluster and grouping together ESTs that share at least one splice site.

    Results: The higher accuracy of EasyCluster with respect to other clustering tools has been verified by means of a manually curated benchmark of human EST clusters. Additional datasets, including the UniGene cluster Hs.122986 and ESTs related to the human HOXA gene family, have also been used to demonstrate the better clustering capability of EasyCluster over current genome-based web services such as ASmodeler and BIPASS. EasyCluster has also been used to provide a first compilation of gene-oriented clusters in the Ricinus communis oilseed plant, for which no UniGene clusters are yet available, as well as an evaluation of alternative splicing in this plant species.
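    The initial pseudo-clustering step described in the Methods reduces to grouping stranded genomic intervals that overlap. A minimal illustration in Python, assuming the GMAP alignments have already been reduced to (est_id, chromosome, strand, start, end) tuples; the names and input format are assumptions for illustration, not EasyCluster's actual interfaces:

    ```python
    from collections import defaultdict

    def pseudo_clusters(mappings):
        """Group ESTs whose genomic alignments overlap on the same strand.

        mappings: iterable of (est_id, chrom, strand, start, end) tuples.
        Returns one list of est_ids per pseudo-cluster.
        """
        by_locus = defaultdict(list)
        for est_id, chrom, strand, start, end in mappings:
            by_locus[(chrom, strand)].append((start, end, est_id))

        clusters = []
        for hits in by_locus.values():
            hits.sort()                        # sweep in start order
            current, right = [], -1
            for start, end, est_id in hits:
                if current and start > right:  # gap: close the open cluster
                    clusters.append(current)
                    current = []
                current.append(est_id)
                right = max(right, end)
            if current:
                clusters.append(current)
        return clusters

    ests = [
        ("est1", "chr1", "+", 100, 500),
        ("est2", "chr1", "+", 450, 900),    # overlaps est1 -> same cluster
        ("est3", "chr1", "+", 2000, 2400),  # disjoint -> new cluster
    ]
    print(pseudo_clusters(ests))  # [['est1', 'est2'], ['est3']]
    ```

    The refinement pass would then split each pseudo-cluster by requiring members to share at least one GMAP-reported splice site.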

    BlogForever D2.6: Data Extraction Methodology

    This report outlines an inquiry into the area of web data extraction, conducted within the context of blog preservation. The report reviews theoretical advances and practical developments for implementing data extraction. The inquiry is extended through an experiment that demonstrates the effectiveness and feasibility of implementing some of the suggested approaches. More specifically, the report discusses an approach based on unsupervised machine learning that employs the RSS feeds and HTML representations of blogs. It outlines the possibilities of extracting semantics available in blogs and demonstrates the benefits of exploiting available standards such as microformats and microdata. The report proceeds to propose a methodology for extracting and processing blog data to further inform the design and development of the BlogForever platform.
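    One way to exploit the paired RSS and HTML representations that the report discusses is to locate the feed-provided text inside the rendered page, treating the best-matching region as the post body. The following sketch uses only the Python standard library; the matching strategy and all names are assumptions for illustration, not the methodology adopted by BlogForever:

    ```python
    import difflib
    import xml.etree.ElementTree as ET
    from html.parser import HTMLParser

    class TextExtractor(HTMLParser):
        """Collect the visible text chunks of a page, skipping scripts/styles."""
        def __init__(self):
            super().__init__()
            self.chunks, self._skip = [], 0
        def handle_starttag(self, tag, attrs):
            if tag in ("script", "style"):
                self._skip += 1
        def handle_endtag(self, tag):
            if tag in ("script", "style") and self._skip:
                self._skip -= 1
        def handle_data(self, data):
            if not self._skip and data.strip():
                self.chunks.append(data.strip())

    def rss_item_texts(rss_xml):
        """Yield (title, description) pairs from an RSS 2.0 feed."""
        for item in ET.fromstring(rss_xml).iter("item"):
            yield item.findtext("title", ""), item.findtext("description", "")

    def locate_post_body(html, description):
        """Return the page text chunk most similar to the RSS description."""
        parser = TextExtractor()
        parser.feed(html)
        return max(parser.chunks, default="", key=lambda chunk:
                   difflib.SequenceMatcher(None, chunk, description).ratio())

    rss = ("<rss><channel><item><title>Post</title>"
           "<description>Hello blog world</description></item></channel></rss>")
    page = "<html><body><div>menu</div><div>Hello blog world!</div></body></html>"
    for _, desc in rss_item_texts(rss):
        print(locate_post_body(page, desc))  # -> Hello blog world!
    ```

    In a full wrapper-induction setting, the regions matched across many posts would then be generalized into a template (for example, a shared DOM path) that extracts the body of every post on the blog.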

    Predicting Linguistic Structure with Incomplete and Cross-Lingual Supervision

    Contemporary approaches to natural language processing are predominantly based on statistical machine learning from large amounts of text, which has been manually annotated with the linguistic structure of interest. However, such complete supervision is currently only available for the world's major languages, in a limited number of domains and for a limited range of tasks. As an alternative, this dissertation considers methods for linguistic structure prediction that can make use of incomplete and cross-lingual supervision, with the prospect of making linguistic processing tools more widely available at a lower cost. An overarching theme of this work is the use of structured discriminative latent variable models for learning with indirect and ambiguous supervision; as instantiated, these models admit rich model features while retaining efficient learning and inference properties. The first contribution to this end is a latent-variable model for fine-grained sentiment analysis with coarse-grained indirect supervision. The second is a model for cross-lingual word-cluster induction and the application thereof to cross-lingual model transfer. The third is a method for adapting multi-source discriminative cross-lingual transfer models to target languages, by means of typologically informed selective parameter sharing. The fourth is an ambiguity-aware self- and ensemble-training algorithm, which is applied to target language adaptation and relexicalization of delexicalized cross-lingual transfer parsers. The fifth is a set of sequence-labeling models that combine constraints at the level of tokens and types, and an instantiation of these models for part-of-speech tagging with incomplete cross-lingual and crowdsourced supervision. In addition to these contributions, comprehensive overviews are provided of structured prediction with no or incomplete supervision, as well as of learning in the multilingual and cross-lingual settings. Through careful empirical evaluation, it is established that the proposed methods can be used to create substantially more accurate tools for linguistic processing, compared both to unsupervised methods and to recently proposed cross-lingual methods. The empirical support for this claim is particularly strong in the latter case; our models for syntactic dependency parsing and part-of-speech tagging achieve the best results published to date for a large number of target languages, in the setting where no annotated training data is available in the target language.
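    As a toy illustration of the token- and type-level constraints mentioned for the fifth contribution, a tag dictionary can prune the candidate labels of each word type, while projected or crowdsourced annotations force the label of individual tokens. The sketch below applies both kinds of constraint inside ordinary Viterbi decoding; the scoring model and all names are illustrative stand-ins, not the dissertation's actual models:

    ```python
    TAGS = ["NOUN", "VERB", "DET", "ADJ"]  # toy tagset

    def constrained_viterbi(tokens, emit, trans, tag_dict, forced):
        """Viterbi decoding with a pruned label set per position.

        emit     -- dict (position, tag) -> log-score (missing entries = 0.0)
        trans    -- dict (prev_tag, tag) -> log-score (missing entries = 0.0)
        tag_dict -- word type -> allowed tags (type-level constraint)
        forced   -- position -> single required tag (token-level constraint)
        """
        def allowed(i):
            if i in forced:                       # token constraint wins
                return [forced[i]]
            return tag_dict.get(tokens[i], TAGS)  # else type constraint

        prev = {t: emit.get((0, t), 0.0) for t in allowed(0)}
        back = []
        for i in range(1, len(tokens)):
            cur, ptr = {}, {}
            for t in allowed(i):
                s, score = max(((s, prev[s] + trans.get((s, t), 0.0))
                                for s in prev), key=lambda x: x[1])
                cur[t], ptr[t] = score + emit.get((i, t), 0.0), s
            prev, back = cur, back + [ptr]
        path = [max(prev, key=prev.get)]
        for ptr in reversed(back):
            path.append(ptr[path[-1]])
        return path[::-1]

    tokens = ["the", "dog", "barks"]
    tag_dict = {"the": ["DET"], "dog": ["NOUN", "VERB"]}  # type constraints
    emit = {(1, "NOUN"): 1.0, (2, "VERB"): 1.0}           # toy scores
    print(constrained_viterbi(tokens, emit, {}, tag_dict, {}))
    # -> ['DET', 'NOUN', 'VERB']
    ```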

    DUDE-Seq: Fast, Flexible, and Robust Denoising for Targeted Amplicon Sequencing

    We consider the correction of errors in nucleotide sequences produced by next-generation targeted amplicon sequencing. Next-generation sequencing (NGS) platforms provide a great deal of sequencing data thanks to their high throughput, but the associated error rates tend to be high. Denoising in high-throughput sequencing has thus become a crucial process for boosting the reliability of downstream analyses. Our methodology, named DUDE-Seq, is derived from a general setting of reconstructing finite-valued source data corrupted by a discrete memoryless channel, and effectively corrects substitution and homopolymer indel errors, the two major types of sequencing errors in most high-throughput targeted amplicon sequencing platforms. Our experimental studies with real and simulated datasets suggest that DUDE-Seq not only outperforms existing alternatives in terms of error-correction capability and time efficiency, but also boosts the reliability of downstream analyses. Further, the flexibility of DUDE-Seq enables its robust application to different sequencing platforms and analysis pipelines through simple updates of the noise model. DUDE-Seq is available at http://data.snu.ac.kr/pub/dude-seq
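    The general setting the abstract refers to, discrete universal denoising (DUDE), admits a compact two-pass implementation: count the center symbols of every two-sided context, then invert the known channel to pick the loss-minimizing reconstruction at each position. Below is a minimal NumPy sketch of that generic rule, not of DUDE-Seq's sequencing-specific extensions such as homopolymer-error handling:

    ```python
    import numpy as np

    def dude(z, PI, LAMBDA, k):
        """Two-pass DUDE over a noisy sequence z with symbols in {0..A-1}.

        PI     -- A x A channel matrix, PI[x, y] = P(observe y | clean x)
        LAMBDA -- A x A loss matrix, LAMBDA[x, xhat] = cost of guessing xhat
        k      -- one-sided context length
        """
        z = np.asarray(z)
        A, n = PI.shape[0], len(z)
        PI_inv = np.linalg.inv(PI)

        # Pass 1: empirical center-symbol counts per two-sided context.
        counts = {}
        for i in range(k, n - k):
            c = (tuple(z[i - k:i]), tuple(z[i + 1:i + k + 1]))
            counts.setdefault(c, np.zeros(A))[z[i]] += 1

        # Pass 2: x_hat = argmin over xhat of m^T PI^-1 (lambda_xhat * pi_z).
        out = z.copy()
        for i in range(k, n - k):
            c = (tuple(z[i - k:i]), tuple(z[i + 1:i + k + 1]))
            scores = (counts[c] @ PI_inv) @ (LAMBDA * PI[:, [z[i]]])
            out[i] = int(np.argmin(scores))
        return out

    # Demo: bursty binary source through a symmetric channel, Hamming loss.
    delta = 0.1
    PI = np.array([[1 - delta, delta], [delta, 1 - delta]])
    LAMBDA = 1.0 - np.eye(2)
    rng = np.random.default_rng(0)
    clean = np.repeat(rng.integers(0, 2, 200), 50)   # n = 10,000
    noisy = clean ^ (rng.random(clean.size) < delta)
    denoised = dude(noisy, PI, LAMBDA, k=2)
    print("noisy:", (noisy != clean).mean(),
          "denoised:", (denoised != clean).mean())
    ```

    Adapting such a denoiser to a concrete platform then amounts to swapping in that platform's channel model, which is the flexibility the abstract highlights.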