A generalized least-squares framework for rare-variant analysis in family data.
Rare variants may, in part, explain some of the heritability missing in current genome-wide association studies. Many gene-based rare-variant analysis approaches proposed in recent years are aimed at population-based samples, although analysis strategies for family-based samples are clearly warranted since the family-based design has the potential to enhance our ability to enrich for rare causal variants. We have recently developed the generalized least squares, sequence kernel association test, or GLS-SKAT, approach for rare-variant analysis in family samples, in which the kinship matrix computed from the high-dimensional genetic data is used to decorrelate the family structure. We then applied the SKAT-O approach for gene-/region-based inference in the decorrelated data. In this study, we applied this GLS-SKAT method to the systolic blood pressure data in the simulated family sample distributed by the Genetic Analysis Workshop 18. We compared the GLS-SKAT approach to the rare-variant analysis approach implemented in family-based association test-v1 and demonstrated that the GLS-SKAT approach provides superior power and good control of the type I error rate.
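The decorrelation step described in this abstract can be illustrated with a minimal sketch. It assumes the phenotype covariance has the common form 2·var_g·K + var_e·I, where K is the kinship matrix; that form, the variance components `var_g` and `var_e`, and all function names are assumptions made for illustration and are not taken from the paper. The sketch whitens the phenotype with a Cholesky factor, which is one standard way to carry out a generalized least-squares transformation.

```python
import math


def phenotype_covariance(kinship, var_g, var_e):
    """Assumed covariance model: 2 * var_g * K + var_e * I (hypothetical)."""
    n = len(kinship)
    return [
        [2.0 * var_g * kinship[i][j] + (var_e if i == j else 0.0) for j in range(n)]
        for i in range(n)
    ]


def cholesky(a):
    """Lower-triangular L with L * L^T = a, for a symmetric positive-definite a."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low


def forward_solve(low, b):
    """Solve low * x = b for lower-triangular low."""
    x = [0.0] * len(b)
    for i in range(len(b)):
        x[i] = (b[i] - sum(low[i][k] * x[k] for k in range(i))) / low[i][i]
    return x


def decorrelate(kinship, values, var_g, var_e):
    """Return L^{-1} * values, which has identity covariance under the assumed model."""
    low = cholesky(phenotype_covariance(kinship, var_g, var_e))
    return forward_solve(low, values)


# Toy trio: two unrelated parents and their child (self-kinship 0.5, parent-child 0.25).
trio_kinship = [
    [0.50, 0.00, 0.25],
    [0.00, 0.50, 0.25],
    [0.25, 0.25, 0.50],
]
whitened = decorrelate(trio_kinship, [1.0, 2.0, 3.0], var_g=0.6, var_e=0.4)
```

In a full analysis the same transformation would also be applied to the genotype and covariate columns before running a population-based test on the transformed data.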
Gene-based genome-wide association studies and meta-analyses of conotruncal heart defects.
Conotruncal heart defects (CTDs) are among the most common and severe groups of congenital heart defects. Despite evidence of an inherited genetic contribution to CTDs, little is known about the specific genes that contribute to the development of CTDs. We performed gene-based genome-wide analyses using microarray-genotyped and imputed common and rare variants data from two large studies of CTDs in the United States. We performed two case-parent trio analyses (N = 640 and 317 trios), using an extension of the family-based multi-marker association test, and two case-control analyses (N = 482 and 406 patients and comparable numbers of controls), using a sequence kernel association test. We also undertook two meta-analyses to combine the results from the analyses that used the same approach (i.e. family-based or case-control). To our knowledge, these analyses are the first reported gene-based, genome-wide association studies of CTDs. Based on our findings, we propose eight CTD candidate genes (ARF5, EIF4E, KPNA1, MAP4K3, MBNL1, NCAPG, NDUFS1 and PSMG3). Four of these genes (ARF5, KPNA1, NDUFS1 and PSMG3) have not been previously associated with normal or abnormal heart development. Our analyses also provide further evidence that genes involved in chromatin modification and in ribonucleic acid splicing are associated with congenital heart defects.
Exome-wide association study of pancreatic cancer risk
We conducted a case-control exome-wide association study to discover germline variants in coding regions that affect risk for pancreatic cancer, combining data from 5 studies. We analyzed exome and genome sequencing data from 437 patients with pancreatic cancer (cases) and 1922 individuals not known to have cancer (controls). In the primary analysis, BRCA2 had the strongest enrichment for rare inactivating variants (17/437 cases vs 3/1922 controls) (P = 3.27 × 10⁻⁶; exome-wide statistical significance threshold P < 2.5 × 10⁻⁶). Cases had more rare inactivating variants in DNA repair genes than controls, even after excluding 13 genes known to predispose to pancreatic cancer (adjusted odds ratio, 1.35, P = .045). At the suggestive threshold (P < .001), 6 genes were enriched for rare damaging variants (UHMK1, AP1G2, DNTA, CHST6, FGFR3, and EPHA1) and 7 genes had associations with pancreatic cancer risk, based on the sequence-kernel association test. We confirmed variants in BRCA2 as the most common high-penetrant genetic factor associated with pancreatic cancer and we also identified candidate pancreatic cancer genes. Large collaborations and novel approaches are needed to overcome the genetic heterogeneity of pancreatic cancer predisposition.
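The counts and threshold quoted in this abstract can be checked with simple arithmetic. The sketch below computes carrier frequencies and a crude, unadjusted odds ratio from the BRCA2 counts, and shows that the stated threshold matches a Bonferroni correction of 0.05 over about 20,000 genes. The gene count, the reading of each count as one carrier per person, and all names are assumptions for illustration; the P values reported in the abstract come from the study's own models and are not reproduced here.

```python
import math

# Counts as quoted in the abstract (carriers / group size).
case_carriers, case_total = 17, 437
control_carriers, control_total = 3, 1922


def crude_odds_ratio(a, n_a, b, n_b):
    """Unadjusted odds ratio: odds of carrying a variant in cases vs controls."""
    odds_cases = a / (n_a - a)
    odds_controls = b / (n_b - b)
    return odds_cases / odds_controls


case_frequency = case_carriers / case_total
control_frequency = control_carriers / control_total
odds_ratio = crude_odds_ratio(
    case_carriers, case_total, control_carriers, control_total
)

# Assumption: roughly 20,000 genes tested at a family-wise error rate of 0.05.
assumed_gene_count = 20_000
bonferroni_threshold = 0.05 / assumed_gene_count

print(f"carrier frequency in cases:    {case_frequency:.4f}")
print(f"carrier frequency in controls: {control_frequency:.4f}")
print(f"crude odds ratio:              {odds_ratio:.2f}")
print(f"Bonferroni threshold:          {bonferroni_threshold:.1e}")
```

Under these assumptions the reported BRCA2 P value of 3.27 × 10⁻⁶ falls just above the 2.5 × 10⁻⁶ threshold, consistent with the abstract listing both numbers side by side.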
Random walks - a sequential approach
In this paper sequential monitoring schemes to detect nonparametric drifts are studied for the random walk case. The procedure is based on a kernel smoother. As a by-product we obtain the asymptotics of the Nadaraya-Watson estimator and its associated sequential partial sum process under non-standard sampling. The asymptotic behavior differs substantially from the stationary situation, if there is a unit root (random walk component). To obtain meaningful asymptotic results we consider local nonparametric alternatives for the drift component. It turns out that the rate of convergence at which the drift vanishes determines whether the asymptotic properties of the monitoring procedure are determined by a deterministic or random function. Further, we provide a theoretical result about the optimal kernel for a given alternative.
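For readers unfamiliar with the Nadaraya-Watson estimator named in this abstract, the following is a minimal sketch of the classical estimator: a kernel-weighted average of the responses. It uses a Gaussian kernel and a fixed bandwidth, both chosen here for illustration. It is not the sequential monitoring procedure studied in the paper, and the function names are hypothetical.

```python
import math


def gaussian_kernel(u):
    """Standard normal density, used as the smoothing kernel."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def nadaraya_watson(x, xs, ys, bandwidth):
    """Kernel-weighted average of ys, with weights centred at x."""
    weights = [gaussian_kernel((x - xi) / bandwidth) for xi in xs]
    total = sum(weights)
    if total == 0.0:
        raise ValueError("all kernel weights are zero; try a larger bandwidth")
    return sum(w * y for w, y in zip(weights, ys)) / total


# Toy data: responses rise linearly with the design points.
design_points = [0.0, 1.0, 2.0]
responses = [1.0, 2.0, 3.0]
estimate_at_centre = nadaraya_watson(1.0, design_points, responses, bandwidth=0.5)
```

At the centre point the weights on the two outer observations are equal, so the estimate coincides with the middle response; a smaller bandwidth concentrates the weight on nearby observations, while a larger one smooths more heavily.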
Transcriptome-Wide Association Supplements Genome-Wide Association in Zea mays.
Modern improvement of complex traits in agricultural species relies on successful associations of heritable molecular variation with observable phenotypes. Historically, this pursuit has primarily been based on easily measurable genetic markers. The recent advent of new technologies allows assaying and quantifying biological intermediates (hereafter endophenotypes), which are now readily measurable at a large scale across diverse individuals. The usefulness of endophenotypes for delineating the regulatory landscape of the genome and genetic dissection of complex trait variation remains underexplored in plants. The work presented here illustrates the utility of a large-scale (299-genotype and seven-tissue) gene expression resource to dissect traits across multiple levels of biological organization. Using single-tissue- and multi-tissue-based transcriptome-wide association studies (TWAS), we revealed that about half of the functional variation acts through altered transcript abundance for maize kernel traits, including 30 grain carotenoid abundance traits, 20 grain tocochromanol abundance traits, and 22 field-measured agronomic traits. Comparing the efficacy of TWAS with genome-wide association studies (GWAS) and an ensemble approach that combines both GWAS and TWAS, we demonstrated that results of TWAS in combination with GWAS increase the power to detect known genes and aid in prioritizing likely causal genes. Using a variance partitioning approach in the largely independent maize Nested Association Mapping (NAM) population, we also showed that, for a majority of traits, the most strongly associated genes identified by combining GWAS and TWAS explain more heritable variance than random genes or the genes identified by GWAS or TWAS alone. This not only improves the ability to link genes to phenotypes, but also highlights the phenotypic consequences of regulatory variation in plants.
Identifying high-impact sub-structures for convolution kernels in document-level sentiment classification
Convolution kernels support the modeling of complex syntactic information in machine-learning tasks. However, such models are highly sensitive to the type and size of syntactic structure used. It is therefore an important challenge to automatically identify high-impact sub-structures relevant to a given task. In this paper we present a systematic study investigating (combinations of) sequence and convolution kernels using different types of sub-structures in document-level sentiment classification. We show that minimal sub-structures extracted from constituency and dependency trees guided by a polarity lexicon yield a 1.45-point absolute improvement in accuracy over a bag-of-words classifier on a widely used sentiment corpus.
