Collaborative Aspects of Open Data in Software Engineering
Engineers require high-quality data for the design and implementation of today’s software, especially in the context of machine learning (ML). This puts an emphasis on the need for the publication and sharing of data from and between organizations, public as well as private. Following the paradigm of open innovation, open data provide a mechanism to increase the availability of information, offering utility and improving innovation and user choice through the inevitable interoperability this enables.
Cystic fibrosis mutation analysis: Report from 22 U.K. regional genetics laboratories
We have collated the results of cystic fibrosis (CF) mutation analysis conducted in 22 laboratories in the United Kingdom. A total of 9,807 CF chromosomes have been analysed, demonstrating 56 different mutations so far observed and accounting for 86% of CF genes in the native Caucasian population of the United Kingdom. ΔF508 is the most common at 75.3% of CF mutations (range 56.5–83.7%), followed by G551D (3.08%; range 0.71–7.60%), G542X (1.68%; range 0.85–3.66%), 621 + 1 (G>T) (0.93%; range 0.41–3.16%), 1717 - 1 (G>A) (0.57%; range 0.17–1.14%), 1898 + 1 (G>A) (0.46%), R117H (0.46%), N1303K (0.46%), and R553X (0.46%). The data show a clear geographical variation in the distribution of some of the mutations, most notably a marked regional variation in the distribution of 621 + 1 (G>T) and 1898 + 1 (G>A), which are both apparently more frequent in Wales. R560T and R117H appear to be more frequent in Ireland and Scotland, and G551D more frequent in Scotland. In summary, these data illustrate that the mutations present within a particular population need to be defined in order to provide meaningful carrier screening and testing for rare mutations in affected individuals. Furthermore, it is apparent that the ethnic origin of a patient, even within a small country such as the United Kingdom, should be taken into account. © 1995 Wiley-Liss, Inc.
Genome-wide association study identifies novel breast cancer susceptibility loci
Breast cancer exhibits familial aggregation, consistent with variation in genetic susceptibility to the disease. Known susceptibility genes account for less than 25% of the familial risk of breast cancer, and the residual genetic variance is likely to be due to variants conferring more moderate risks. To identify further susceptibility alleles, we conducted a two-stage genome-wide association study in 4,398 breast cancer cases and 4,316 controls, followed by a third stage in which 30 single nucleotide polymorphisms (SNPs) were tested for confirmation in 21,860 cases and 22,578 controls from 22 studies. We used 227,876 SNPs that were estimated to correlate with 77% of known common SNPs in Europeans at r² > 0.5. SNPs in five novel independent loci exhibited strong and consistent evidence of association with breast cancer (P < 10⁻⁷). Four of these contain plausible causative genes (FGFR2, TNRC9, MAP3K1 and LSP1). At the second stage, 1,792 SNPs were significant at the P < 0.05 level compared with an estimated 1,343 that would be expected by chance, indicating that many additional common susceptibility alleles may be identifiable by this approach.
Polygenic risk scores for prediction of breast cancer and breast cancer subtypes
Stratification of women according to their risk of breast cancer based on polygenic risk scores (PRSs) could improve screening and prevention strategies. Our aim was to develop PRSs, optimized for prediction of estrogen receptor (ER)-specific disease, from the largest available genome-wide association dataset and to empirically validate the PRSs in prospective studies. The development dataset comprised 94,075 case subjects and 75,017 control subjects of European ancestry from 69 studies, divided into training and validation sets. Samples were genotyped using genome-wide arrays, and single-nucleotide polymorphisms (SNPs) were selected by stepwise regression or lasso penalized regression. The best performing PRSs were validated in an independent test set comprising 11,428 case subjects and 18,323 control subjects from 10 prospective studies and 190,040 women from UK Biobank (3,215 incident breast cancers). For the best PRSs (313 SNPs), the odds ratio for overall disease per 1 standard deviation in ten prospective studies was 1.61 (95% CI: 1.57–1.65) with area under receiver-operator curve (AUC) = 0.630 (95% CI: 0.628–0.651). The lifetime risk of overall breast cancer in the top centile of the PRSs was 32.6%. Compared with women in the middle quintile, those in the highest 1% of risk had 4.37- and 2.78-fold risks, and those in the lowest 1% of risk had 0.16- and 0.27-fold risks, of developing ER-positive and ER-negative disease, respectively. Goodness-of-fit tests indicated that this PRS was well calibrated and predicts disease risk accurately in the tails of the distribution. This PRS is a powerful and reliable predictor of breast cancer risk that may improve breast cancer prevention programs.
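As a rough illustration of the quantities in this abstract, a PRS is conventionally a weighted sum of risk-allele dosages, and an odds ratio per standard deviation can be turned into an odds ratio at a given distance from the population mean if one assumes a log-linear effect. The sketch below is not the study's code: the three SNP weights and the function names are hypothetical, and only the value 1.61 is taken from the abstract. It does not reproduce the comparisons against the middle quintile reported above.

```python
# Hypothetical per-allele log odds ratios for three SNPs (illustrative only,
# not weights from the 313-SNP score described in the abstract).
WEIGHTS = [0.05, -0.02, 0.10]


def polygenic_risk_score(dosages, weights):
    """Return the weighted sum of risk-allele dosages (0, 1 or 2 per SNP)."""
    if len(dosages) != len(weights):
        raise ValueError("dosages and weights must have the same length")
    return sum(w * d for w, d in zip(weights, dosages))


def odds_ratio_at(z, or_per_sd=1.61):
    """Odds ratio relative to the population mean for a standardized score z.

    Assumes a log-linear effect of the standardized PRS; 1.61 is the
    per-SD odds ratio for overall disease quoted in the abstract.
    """
    return or_per_sd ** z


# One hypothetical individual: two risk alleles at SNP 1, one at SNP 2, none at SNP 3.
score = polygenic_risk_score([2, 1, 0], WEIGHTS)
print(f"raw PRS: {score:.2f}")
print(f"odds ratio at +2 SD: {odds_ratio_at(2):.2f}")
```

Under the log-linear assumption, a score two standard deviations above the mean corresponds to an odds ratio of 1.61 squared relative to the mean. In practice the raw score would first be standardized against a reference population before applying the per-SD odds ratio.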