88 research outputs found

    Redundancy in Genotyping Arrays

    Despite their unprecedented density, current SNP genotyping arrays contain large amounts of redundancy, with up to 40 oligonucleotide features used to query each SNP. Using publicly available reference genotype data from the International HapMap Project, we show that 93.6% sensitivity at a <5% false-positive rate can be obtained with only four probes per SNP, compared with 98.3% with the full data set. Removing this redundancy will allow for more comprehensive whole-genome association studies with increased SNP density and larger sample sizes.
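
    For readers unfamiliar with the two metrics quoted in this abstract, the following is a minimal sketch of how sensitivity and false-positive rate are conventionally computed from paired calls and reference labels. The function name and the boolean encoding of calls are assumptions made for illustration; the abstract does not describe the authors' actual evaluation code.

```python
def sensitivity_and_fpr(calls, truth):
    """Return (sensitivity, false_positive_rate) for paired boolean sequences.

    calls[i] is True when the (hypothetical) caller reports a positive for item i;
    truth[i] is True when the reference data says item i is a true positive.
    """
    if len(calls) != len(truth):
        raise ValueError("calls and truth must have the same length")

    pairs = list(zip(calls, truth))
    tp = sum(1 for c, t in pairs if c and t)          # called, and truly positive
    fn = sum(1 for c, t in pairs if not c and t)      # missed true positive
    fp = sum(1 for c, t in pairs if c and not t)      # called, but truly negative
    tn = sum(1 for c, t in pairs if not c and not t)  # correctly not called

    # Guard against empty classes so the ratios are always defined.
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    fpr = fp / (fp + tn) if (fp + tn) else float("nan")
    return sensitivity, fpr


if __name__ == "__main__":
    # Toy input, not data from the study.
    calls = [True, True, False, False, True]
    truth = [True, False, True, False, True]
    print(sensitivity_and_fpr(calls, truth))
```

    Reducing the number of probes per SNP moves a caller along this trade-off: the abstract reports the sensitivity reached while the false-positive rate is held below 5%.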

    Applications of microarray technology in breast cancer research

    Microarrays provide a versatile platform for utilizing information from the Human Genome Project to benefit human health. This article reviews the ways in which microarray technology may be used in breast cancer research. Its diverse applications include monitoring chromosome gains and losses, tumour classification, drug discovery and development, DNA resequencing, mutation detection and investigating the mechanism of tumour development.

    A bioinformatic filter for improved base-call accuracy and polymorphism detection using the Affymetrix GeneChip® whole-genome resequencing platform

    DNA resequencing arrays enable rapid acquisition of high-quality sequence data. This technology represents a promising platform for rapid high-resolution genotyping of microorganisms. Traditional array-based resequencing methods have relied on the use of specific PCR-amplified fragments from the query samples as hybridization targets. While this specificity in the target DNA population reduces the potential for artifacts caused by cross-hybridization, the subsampling of the query genome limits the sequence coverage that can be obtained and therefore reduces the technique's resolution as a genotyping method. We have developed and validated an Affymetrix Inc. GeneChip® array-based, whole-genome resequencing platform for Francisella tularensis, the causative agent of tularemia. We developed a set of bioinformatic filters that target systematic base-calling errors caused by cross-hybridization between the whole-genome sample and the array probes, and by deletions in the sample DNA relative to the chip reference sequence. Our approach eliminated 91% of the false-positive single-nucleotide polymorphism calls identified in the SCHU S4 query sample, at the cost of 10.7% of the true positives, yielding a total base-calling accuracy of 99.992%.
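
    The trade-off reported in this abstract (91% of false-positive calls removed, 10.7% of true positives lost) can be made concrete with a short calculation. The sketch below is illustrative only: the function name is hypothetical, and the input counts are placeholders chosen for the example, since the abstract does not give the underlying numbers of calls.

```python
def apply_filter(false_positives, true_positives, fp_removed=0.91, tp_lost=0.107):
    """Estimate calls remaining after a filter with the reported removal rates.

    fp_removed and tp_lost default to the fractions quoted in the abstract;
    the input counts are supplied by the caller and are NOT from the study.
    Returns (fp_left, tp_left, precision_before, precision_after).
    """
    fp_left = false_positives * (1.0 - fp_removed)
    tp_left = true_positives * (1.0 - tp_lost)

    # Precision: fraction of reported polymorphism calls that are real.
    precision_before = true_positives / (true_positives + false_positives)
    precision_after = tp_left / (tp_left + fp_left)
    return fp_left, tp_left, precision_before, precision_after


if __name__ == "__main__":
    # Placeholder counts for illustration only.
    fp_left, tp_left, before, after = apply_filter(100, 1000)
    print(f"false positives left: {fp_left:.1f}")
    print(f"true positives left:  {tp_left:.1f}")
    print(f"precision: {before:.3f} -> {after:.3f}")
```

    How much precision improves depends on the starting ratio of true to false positives, which is why the counts are left as inputs rather than fixed.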

    BRCA1 mutations and other sequence variants in a population-based sample of Australian women with breast cancer

    The frequency, in women with breast cancer, of mutations and other variants in the susceptibility gene, BRCA1, was investigated using a population-based case–control-family study. Cases were women living in Melbourne or Sydney, Australia, with histologically confirmed, first primary, invasive breast cancer, diagnosed before the age of 40 years, recorded on the state Cancer Registries. Controls were women without breast cancer, frequency-matched for age, randomly selected from electoral rolls. Full manual sequencing of the coding region of BRCA1 was conducted in a randomly stratified sample of 91 cases; 47 with, and 44 without, a family history of breast cancer in a first- or second-degree relative. All detected variants were tested in a random sample of 67 controls. Three cases with a (protein-truncating) mutation were detected. Only one case had a family history; her mother had breast cancer, but did not carry the mutation. The proportion of Australian women with breast cancer before age 40 who carry a germline mutation in BRCA1 was estimated to be 3.8% (95% CI 0.3–12.6%). Seven rare variants were also detected, but for none was there evidence of a strong effect on breast cancer susceptibility. Therefore, on a population basis, rare variants are likely to contribute little to breast cancer incidence. © 1999 Cancer Research Campaign

    Application of Broad-Spectrum, Sequence-Based Pathogen Identification in an Urban Population

    A broad-spectrum detection platform that provides sequence-level resolution of target regions would have a significant impact on public health and case management, and would expand our understanding of the etiology of diseases. A previously developed respiratory pathogen microarray (RPM v.1) demonstrated the capability of this platform for this purpose. The RPM v.1 was used to analyze 424 well-characterized nasal wash specimens from patients presenting with febrile respiratory illness in the Washington, D.C. metropolitan region. For each specimen, the RPM v.1 results were compared against composite reference assay (viral and bacterial culture and, where appropriate, RT-PCR/PCR) results. Across this panel, the RPM assay showed ≥98% overall agreement for all the organisms detected compared with reference methods. Additionally, the RPM v.1 results provide sequence information, which allowed phylogenetic classification of circulating influenza A viruses in ∼250 clinical specimens, monitoring of genetic variation, and prediction of antigenic variability. Multiple pathogens (2–4) were detected in 58 specimens (13.7%), with notably increased abundances of respiratory colonizers (esp. S. pneumoniae) during viral infection. This first-ever comparison of a broad-spectrum viral and bacterial identification technology of this type against a large battery of conventional “gold standard” assays confirms the utility of the approach for both medical surveillance and investigations of complex etiologies of illness caused by respiratory co-infections.

    Efficient algorithms for reconstructing gene content by co-evolution

    Background: In a previous study we demonstrated that co-evolutionary information can be utilized for improving the accuracy of ancestral gene content reconstruction. To this end, we defined a new computational problem, the Ancestral Co-Evolutionary (ACE) problem, and developed algorithms for solving it. Results: In the current paper we generalize our previous study in various ways. First, we describe new efficient computational approaches for solving the ACE problem. The new approaches are based on reductions to classical methods such as linear programming relaxation, quadratic programming, and min-cut. Second, we report new computational hardness results related to the ACE problem, including practical cases where it can be solved in polynomial time. Third, we generalize the ACE problem and demonstrate how our approach can be used for inferring parts of the genomes of non-ancestral organisms. To this end, we describe a heuristic for finding the portion of the genome (the 'dominant set') that can be used to reconstruct the rest of the genome with the lowest error rate. This heuristic utilizes both evolutionary information and co-evolutionary information. We implemented these algorithms on a large input of the ACE problem (95 unicellular organisms, 4,873 protein families, and 10,576 co-evolutionary relations), demonstrating that some of these algorithms can outperform the algorithm used in our previous study. In addition, we show that, based on our approach, a 'dominant set' can be used to reconstruct a major fraction of a genome (up to 79%) with a relatively low error rate (e.g. 0.11). We find that the 'dominant set' tends to include metabolic and regulatory genes with high evolutionary rates, low protein abundance, and few protein-protein interactions. Conclusions: The ACE problem can be efficiently extended for inferring the genomes of organisms that exist today. In addition, it may be solved in polynomial time in many practical cases. Metabolic and regulatory genes were found to be the most important groups of genes necessary for reconstructing the gene content of an organism based on other related genomes.

    Disease-associated alleles in genome-wide association studies are enriched for derived low frequency alleles relative to HapMap and neutral expectations

    Background: Genome-wide association studies give insight into the genetic basis of common diseases. An open question is whether the allele frequency distributions and ancestral vs. derived states of disease-associated alleles differ from the rest of the genome. Characteristics of disease-associated alleles can be used to increase the yield of future studies. Methods: The set of all common disease-associated alleles found in genome-wide association studies prior to January 2010 was analyzed and compared with HapMap and theoretical null expectations. In addition, allele frequency distributions of different disease classes were assessed. Ages of HapMap and disease-associated alleles were also estimated. Results: The allele frequency distribution of HapMap alleles was qualitatively similar to neutral expectations. However, disease-associated alleles were more likely to be low-frequency derived alleles relative to null expectations. 43.7% of disease-associated alleles were ancestral alleles. The mean frequency of disease-associated alleles was lower than that of randomly chosen CEU HapMap alleles (0.394 vs. 0.610, after accounting for probability of detection). Similar patterns were observed for the subset of disease-associated alleles that have been verified in multiple studies. SNPs implicated in genome-wide association studies were enriched for young SNPs compared to randomly selected HapMap loci. Odds ratios of disease-associated alleles tended to be less than 1.5 and varied by frequency, confirming previous studies. Conclusions: Alleles associated with genetic disease differ from randomly selected HapMap alleles and neutral expectations. The evolutionary history of alleles (frequency and ancestral vs. derived state) influences whether they are implicated in genome-wide association studies.

    EPMA position paper in cancer: current overview and future perspectives
