    A generalized least-squares framework for rare-variant analysis in family data.

    Rare variants may, in part, explain some of the heritability missing in current genome-wide association studies. Many gene-based rare-variant analysis approaches proposed in recent years are aimed at population-based samples, although analysis strategies for family-based samples are clearly warranted, since the family-based design has the potential to enrich for rare causal variants. We have recently developed the generalized least squares, sequence kernel association test (GLS-SKAT) approach for rare-variant analysis in family samples, in which the kinship matrix computed from the high-dimensional genetic data is used to decorrelate the family structure. We then applied the SKAT-O approach for gene-/region-based inference in the decorrelated data. In this study, we applied the GLS-SKAT method to the systolic blood pressure data in the simulated family sample distributed by the Genetic Analysis Workshop 18. We compared the GLS-SKAT approach to the rare-variant analysis approach implemented in family-based association test-v1 and demonstrated that the GLS-SKAT approach provides superior power and good control of the type I error rate.
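
The decorrelation step described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the phenotype covariance has the common form h2 * 2 * Phi + (1 - h2) * I, where Phi is the kinship matrix and h2 is a heritability value taken as known, and it whitens the data with the Cholesky factor of that matrix. The function names, the trio kinship values, and the phenotype numbers are all hypothetical.

```python
import math


def cholesky(a):
    """Return lower-triangular L with L L^T = a (a must be symmetric positive definite)."""
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(a[i][i] - s)
            else:
                L[i][j] = (a[i][j] - s) / L[j][j]
    return L


def forward_solve(L, b):
    """Solve L x = b by forward substitution for lower-triangular L."""
    x = []
    for i, row in enumerate(L):
        x.append((b[i] - sum(row[k] * x[k] for k in range(i))) / row[i])
    return x


def phenotype_covariance(kinship, h2):
    """Assumed covariance model: h2 * 2 * kinship + (1 - h2) * identity."""
    n = len(kinship)
    return [
        [h2 * 2.0 * kinship[i][j] + ((1.0 - h2) if i == j else 0.0) for j in range(n)]
        for i in range(n)
    ]


def decorrelate(values, kinship, h2):
    """Whiten a phenotype (or covariate) vector: return L^{-1} values."""
    L = cholesky(phenotype_covariance(kinship, h2))
    return forward_solve(L, values)


# Hypothetical trio: two unrelated parents and one child.
kinship = [
    [0.50, 0.00, 0.25],
    [0.00, 0.50, 0.25],
    [0.25, 0.25, 0.50],
]
phenotype = [128.0, 135.0, 121.0]  # made-up systolic blood pressure values
print(decorrelate(phenotype, kinship, h2=0.5))
```

In a GLS setting the same transformation would be applied to the covariates and genotype columns as well, so that a population-based test can be run on the transformed data.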

    Gene-based genome-wide association studies and meta-analyses of conotruncal heart defects.

    Conotruncal heart defects (CTDs) are among the most common and severe groups of congenital heart defects. Despite evidence of an inherited genetic contribution to CTDs, little is known about the specific genes that contribute to their development. We performed gene-based genome-wide analyses using microarray-genotyped and imputed common and rare variant data from two large studies of CTDs in the United States. We performed two case-parent trio analyses (N = 640 and 317 trios), using an extension of the family-based multi-marker association test, and two case-control analyses (N = 482 and 406 patients and comparable numbers of controls), using a sequence kernel association test. We also undertook two meta-analyses to combine the results from the analyses that used the same approach (i.e. family-based or case-control). To our knowledge, these analyses are the first reported gene-based, genome-wide association studies of CTDs. Based on our findings, we propose eight CTD candidate genes (ARF5, EIF4E, KPNA1, MAP4K3, MBNL1, NCAPG, NDUFS1 and PSMG3). Four of these genes (ARF5, KPNA1, NDUFS1 and PSMG3) have not been previously associated with normal or abnormal heart development. In addition, our analyses provide further evidence that genes involved in chromatin modification and in ribonucleic acid splicing are associated with congenital heart defects.

    Exome-wide association study of pancreatic cancer risk

    We conducted a case-control exome-wide association study to discover germline variants in coding regions that affect risk for pancreatic cancer, combining data from 5 studies. We analyzed exome and genome sequencing data from 437 patients with pancreatic cancer (cases) and 1922 individuals not known to have cancer (controls). In the primary analysis, BRCA2 had the strongest enrichment for rare inactivating variants (17/437 cases vs 3/1922 controls; P = 3.27 × 10^-6; exome-wide statistical significance threshold P < 2.5 × 10^-6). Cases had more rare inactivating variants in DNA repair genes than controls, even after excluding 13 genes known to predispose to pancreatic cancer (adjusted odds ratio, 1.35; P = .045). At the suggestive threshold (P < .001), 6 genes were enriched for rare damaging variants (UHMK1, AP1G2, DNTA, CHST6, FGFR3, and EPHA1) and 7 genes had associations with pancreatic cancer risk, based on the sequence-kernel association test. We confirmed variants in BRCA2 as the most common high-penetrance genetic factor associated with pancreatic cancer, and we also identified candidate pancreatic cancer genes. Large collaborations and novel approaches are needed to overcome the genetic heterogeneity of pancreatic cancer predisposition.
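
As a rough check on the BRCA2 counts quoted in this abstract, the carrier frequencies and a crude (unadjusted) odds ratio can be computed directly from the 2×2 table. This is only illustrative arithmetic with hypothetical function names. It is not the test the authors used, and it will not reproduce their reported P value, which presumably comes from their own adjusted analysis.

```python
def carrier_frequency(carriers, total):
    """Fraction of individuals carrying at least one qualifying variant."""
    return carriers / total


def crude_odds_ratio(case_carriers, case_total, control_carriers, control_total):
    """Unadjusted odds ratio from a 2x2 table of carrier counts."""
    case_non = case_total - case_carriers
    control_non = control_total - control_carriers
    if case_non == 0 or control_carriers == 0:
        raise ValueError("odds ratio is undefined when a denominator cell is zero")
    return (case_carriers * control_non) / (case_non * control_carriers)


# Counts as stated in the abstract: 17/437 cases vs 3/1922 controls.
print(f"case carrier frequency:    {carrier_frequency(17, 437):.4f}")
print(f"control carrier frequency: {carrier_frequency(3, 1922):.4f}")
print(f"crude odds ratio:          {crude_odds_ratio(17, 437, 3, 1922):.2f}")
```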

    Random walks - a sequential approach

    In this paper, sequential monitoring schemes to detect nonparametric drifts are studied for the random walk case. The procedure is based on a kernel smoother. As a by-product, we obtain the asymptotics of the Nadaraya-Watson estimator and its associated sequential partial sum process under non-standard sampling. The asymptotic behavior differs substantially from the stationary situation if there is a unit root (random walk component). To obtain meaningful asymptotic results, we consider local nonparametric alternatives for the drift component. It turns out that the rate of convergence at which the drift vanishes determines whether the asymptotic properties of the monitoring procedure are determined by a deterministic or a random function. Further, we provide a theoretical result about the optimal kernel for a given alternative.
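
For readers unfamiliar with the Nadaraya-Watson estimator mentioned in this abstract, here is a minimal sketch of its classical form: a kernel-weighted average of the observed responses. The Gaussian kernel, the bandwidth, and the sample values are assumptions chosen for illustration. The paper's sequential monitoring scheme and its sampling design are not reproduced here.

```python
import math


def gaussian_kernel(u):
    """Standard normal density, used here as the smoothing kernel (an assumption)."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def nadaraya_watson(x0, xs, ys, bandwidth, kernel=gaussian_kernel):
    """Estimate m(x0) as sum_i K((x0 - x_i)/h) * y_i / sum_i K((x0 - x_i)/h)."""
    weights = [kernel((x0 - x) / bandwidth) for x in xs]
    total = sum(weights)
    if total == 0.0:
        raise ValueError("all kernel weights are zero; increase the bandwidth")
    return sum(w * y for w, y in zip(weights, ys)) / total


# Made-up observations on an equally spaced design.
xs = [0.0, 0.25, 0.5, 0.75, 1.0]
ys = [0.1, 0.4, 0.5, 0.9, 1.2]
print(nadaraya_watson(0.5, xs, ys, bandwidth=0.2))
```

A smaller bandwidth follows the data more closely at the cost of higher variance, while a larger one smooths more heavily.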

    Identifying high-impact sub-structures for convolution kernels in document-level sentiment classification

    Convolution kernels support the modeling of complex syntactic information in machine-learning tasks. However, such models are highly sensitive to the type and size of syntactic structure used. It is therefore an important challenge to automatically identify high-impact sub-structures relevant to a given task. In this paper, we present a systematic study investigating (combinations of) sequence and convolution kernels using different types of sub-structures in document-level sentiment classification. We show that minimal sub-structures extracted from constituency and dependency trees, guided by a polarity lexicon, yield a 1.45-point absolute improvement in accuracy over a bag-of-words classifier on a widely used sentiment corpus.