170 research outputs found

A general coverage theory for shotgun DNA sequencing

Author: Wendl Michael C
Publication venue: Digital Commons@Becker
Publication date: 01/01/2006

Algebraic correction methods for computational assessment of clone overlaps in DNA fingerprint mapping

Author: Wendl Michael C
Publication venue: BioMed Central
Publication date: 01/01/2007

BACKGROUND: The Sulston score is a well-established, though approximate, metric for probabilistically evaluating postulated clone overlaps in DNA fingerprint mapping. It is known to systematically over-predict match probabilities by various orders of magnitude, depending upon project-specific parameters. Although the exact probability distribution is also available for the comparison problem, it is rather difficult to compute and cannot be used directly in most cases. A methodology providing both improved accuracy and computational economy is required. RESULTS: We propose a straightforward algebraic correction procedure, which takes the Sulston score as a provisional value and applies a power-law equation to obtain an improved result. Numerical comparisons indicate dramatically increased accuracy over the range of parameters typical of traditional agarose fingerprint mapping. Issues with extrapolating the method into parameter ranges characteristic of newer capillary electrophoresis-based projects are also discussed. CONCLUSION: Although only marginally more expensive to compute than the raw Sulston score, the correction provides a vastly improved probabilistic description of hypothesized clone overlaps. This will clearly be important in overlap assessment and perhaps for other tasks as well, for example in using the ranking of overlap probabilities to assist in clone ordering.

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Digital Commons@Becker

Characteristics of de novo structural changes in the human genome

Author: et al
Wendl Michael C.
Publication venue: Digital Commons@Becker
Publication date: 01/01/2015

Digital Commons@Becker

The theory of discovering rare variants via DNA sequencing

Author: Wendl Michael C
Wilson Richard K
Publication venue: BioMed Central
Publication date: 01/01/2009

BACKGROUND: Rare population variants are known to have important biomedical implications, but their systematic discovery has only recently been enabled by advances in DNA sequencing. The design process of a discovery project remains formidable, being limited to ad hoc mixtures of extensive computer simulation and pilot sequencing. Here, the task is examined from a general mathematical perspective. RESULTS: We pose and solve the population sequencing design problem and subsequently apply standard optimization techniques that maximize the discovery probability. Emphasis is placed on cases whose discovery thresholds place them within reach of current technologies. We find that parameter values characteristic of rare-variant projects lead to a general, yet remarkably simple set of optimization rules. Specifically, optimal processing occurs at constant values of the per-sample redundancy, refuting current notions that sample size should be selected outright. Optimal project-wide redundancy and sample size are then shown to be inversely proportional to the desired variant frequency. A second family of constants governs these relationships, permitting one to immediately establish the most efficient settings for a given set of discovery conditions. Our results largely concur with the empirical design of the Thousand Genomes Project, though they furnish some additional refinement. CONCLUSION: The optimization principles reported here dramatically simplify the design process and should be broadly useful as rare-variant projects become both more important and routine in the future.

CiteSeerX

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Digital Commons@Becker

Statistical aspects of discerning indel-type structural variation via DNA sequence alignment

Author: Wendl Michael C
Wilson Richard K
Publication venue: BioMed Central
Publication date: 01/01/2009

BACKGROUND: Structural variations in the form of DNA insertions and deletions are an important aspect of human genetics and especially relevant to medical disorders. Investigations have shown that such events can be detected via tell-tale discrepancies in the aligned lengths of paired-end DNA sequencing reads. Quantitative aspects underlying this method remain poorly understood, despite its importance and conceptual simplicity. We report the statistical theory characterizing the length-discrepancy scheme for Gaussian libraries, including coverage-related effects that preceding models are unable to account for. RESULTS: Deletion and insertion statistics both depend heavily on physical coverage, but otherwise differ dramatically, refuting a commonly held doctrine of symmetry. Specifically, coverage restrictions render insertions much more difficult to capture. Increased read length has the counterintuitive effect of worsening insertion detection characteristics of short inserts. Variance in library insert length is also a critical factor here and should be minimized to the greatest degree possible. Conversely, no significant improvement would be realized in lowering fosmid variances beyond current levels. Detection power is examined under a straightforward alternative hypothesis and found to be generally acceptable. We also consider the proposition of characterizing variation over the entire spectrum of variant sizes under constant risk of false-positive errors. At 1% risk, many designs will leave a significant gap in the 100 to 200 bp neighborhood, requiring unacceptably high redundancies to compensate. We show that a few modifications largely close this gap and we give a few examples of feasible spectrum-covering designs. CONCLUSION: The theory resolves several outstanding issues and furnishes a general methodology for designing future projects from the standpoint of a spectrum-wide constant risk.

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Extension of Lander-Waterman theory for sequencing filtered DNA libraries

Author: Barbazuk W Brad
Wendl Michael C
Publication venue: BioMed Central
Publication date: 01/01/2005

BACKGROUND: The degree to which conventional DNA sequencing techniques will be successful for highly repetitive genomes is unclear. Investigators are therefore considering various filtering methods to select against high-copy sequence in DNA clone libraries. The standard model for random sequencing, Lander-Waterman theory, does not account for two important issues in such libraries, discontinuities and position-based sampling biases (the so-called "edge effect"). We report an extension of the theory for analyzing such configurations. RESULTS: The edge effect cannot be neglected in most cases. Specifically, rates of coverage and gap reduction are appreciably lower than those for conventional libraries, as predicted by standard theory. Performance decreases as read length increases relative to island size. Although opposite of what happens in a conventional library, this apparent paradox is readily explained in terms of the edge effect. The model agrees well with prototype gene-tagging experiments for Zea mays and Sorghum bicolor. Moreover, the associated density function suggests well-defined probabilistic milestones for the number of reads necessary to capture a given fraction of the gene space. An exception for applying standard theory arises if sequence redundancy is less than about 1-fold. Here, evolution of the random quantities is independent of library gaps and edge effects. This observation effectively validates the practice of using standard theory to estimate the genic enrichment of a library based on light shotgun sequencing. CONCLUSION: Coverage performance using a filtered library is significantly lower than that for an equivalent-sized conventional library, suggesting that directed methods may be more critical for the former. The proposed model should be useful for analyzing future projects.

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Digital Commons@Becker

A General Coverage Theory for Shotgun DNA Sequencing

Author: Michael C. Wendl
Publication venue: 'Mary Ann Liebert Inc'
Publication date

Crossref

PolyScan: An automatic indel and SNP detection approach to the analysis of human resequencing data

Author: Chen Ken
Ding Li
Kasai Yumi
McLellan Michael D.
Wendl Michael C.
Publication venue: Digital Commons@Becker
Publication date: 01/01/2007

Digital Commons@Becker

Algebraic Torsion in Contact Manifolds

Author: A. Weinstein
C. Wendl
C.H. Taubes
Chris Wendl
D. Salamon
D.L. Dragnev
D.T. Gay
F. Bourgeois
H. Geiges
H. Hofer
J.B. Etnyre
Janko Latschev
K. Cieliebak
M. Gromov
M. Hutchings
M.-L. Yau
Michael Hutchings
P. Albers
R. Lutz
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 22/09/2011

We extract a nonnegative integer-valued invariant, which we call the "order of algebraic torsion", from the Symplectic Field Theory of a closed contact manifold, and show that its finiteness gives obstructions to the existence of symplectic fillings and exact symplectic cobordisms. A contact manifold has algebraic torsion of order zero if and only if it is algebraically overtwisted (i.e. has trivial contact homology), and any contact 3-manifold with positive Giroux torsion has algebraic torsion of order one (though the converse is not true). We also construct examples for each nonnegative k of contact 3-manifolds that have algebraic torsion of order k but not k - 1, and derive consequences for contact surgeries on such manifolds. The appendix by Michael Hutchings gives an alternative proof of our cobordism obstructions in dimension three using a refinement of the contact invariant in Embedded Contact Homology. Comment: 53 pages, 4 figures, with an appendix by Michael Hutchings; v.3 is a final update to agree with the published paper, and also corrects a minor error that appeared in the published version of the appendix.

arXiv.org e-Print Archive

Crossref

UCL Discovery

Proteogenomic characterization reveals therapeutic vulnerabilities in lung adenocarcinoma

Author: Cao Song
et al.
Gillette Michael A
Govindan Ramaswamy
Li Yize
Liang Wen-Wei
Wendl Michael C
Publication venue: 'Elsevier BV'
Publication date: 09/07/2020

To explore the biology of lung adenocarcinoma (LUAD) and identify new therapeutic opportunities, we performed comprehensive proteogenomic characterization of 110 tumors and 101 matched normal adjacent tissues (NATs) incorporating genomics, epigenomics, deep-scale proteomics, phosphoproteomics, and acetylproteomics. Multi-omics clustering revealed four subgroups defined by key driver mutations, country, and gender. Proteomic and phosphoproteomic data illuminated biology downstream of copy number aberrations, somatic mutations, and fusions and identified therapeutic vulnerabilities associated with driver events involving KRAS, EGFR, and ALK. Immune subtyping revealed a complex landscape, reinforced the association of STK11 with immune-cold behavior, and underscored a potential immunosuppressive role of neutrophil degranulation. Smoking-associated LUADs showed correlation with other environmental exposure signatures and a field effect in NATs. Matched NATs allowed identification of differentially expressed proteins with potential diagnostic and therapeutic utility. This proteogenomics dataset represents a unique public resource for researchers and clinicians seeking to better understand and treat lung adenocarcinomas.

Digital Commons@Becker