A large data resource of genomic copy number variation across neurodevelopmental disorders
Copy number variations (CNVs) are implicated across many neurodevelopmental disorders (NDDs) and contribute to their shared genetic etiology. Multiple studies have attempted to identify shared etiology among NDDs, but this is the first genome-wide CNV analysis across autism spectrum disorder (ASD), attention deficit hyperactivity disorder (ADHD), schizophrenia (SCZ), and obsessive-compulsive disorder (OCD) at once. Using microarray (Affymetrix CytoScan HD), we genotyped 2,691 subjects diagnosed with an NDD (204 SCZ, 1,838 ASD, 427 ADHD and 222 OCD) and 1,769 family members, mainly parents. We identified rare CNVs, defined as those found in <0.1% of 10,851 population control samples. We found clinically relevant CNVs (broadly defined) in 284 (10.5%) of total subjects, including 22 (10.8%) among subjects with SCZ, 209 (11.4%) with ASD, 40 (9.4%) with ADHD, and 13 (5.6%) with OCD. Among all NDD subjects, we identified 17 (0.63%) with aneuploidies and 115 (4.3%) with known genomic disorder variants. We searched further for genes impacted by different CNVs in multiple disorders. Examples of NDD-associated genes linked across more than one disorder (listed in order of occurrence frequency) are NRXN1, SEH1L, LDLRAD4, GNAL, GNG13, MKRN1, DCTN2, KNDC1, PCMTD2, KIF5A, SYNM, and long non-coding RNAs: AK127244 and PTCHD1-AS. We demonstrated that CNVs impacting the same genes could potentially contribute to the etiology of multiple NDDs. The CNVs identified will serve as a useful resource for both research and diagnostic laboratories for prioritization of variants
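The rarity criterion in this abstract (a CNV found in fewer than 0.1% of 10,851 population controls) can be illustrated with a minimal sketch. The function names are hypothetical, and how the study decided that a control CNV "matches" a case CNV (overlap rules, breakpoints) is not stated in the abstract, so the sketch assumes a carrier count per CNV is already available.

```python
# Hypothetical sketch: flag a CNV as rare from its carrier count in controls.
# Only the two constants below come from the abstract; everything else is assumed.
N_CONTROLS = 10_851      # population control samples
RARE_THRESHOLD = 0.001   # "rare" means found in <0.1% of controls


def is_rare(control_carriers: int, n_controls: int = N_CONTROLS,
            threshold: float = RARE_THRESHOLD) -> bool:
    """Return True if the control carrier frequency is strictly below threshold."""
    if not 0 <= control_carriers <= n_controls:
        raise ValueError("carrier count must lie between 0 and n_controls")
    return control_carriers / n_controls < threshold


def max_rare_carriers(n_controls: int = N_CONTROLS,
                      threshold: float = RARE_THRESHOLD) -> int:
    """Largest carrier count that still satisfies the rarity criterion."""
    count = 0
    while (count + 1) / n_controls < threshold:
        count += 1
    return count
```

Under these assumptions, a CNV seen in at most 10 of the 10,851 controls would count as rare, since 11 carriers already exceed 0.1%.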
Advanced analytical methodologies for measuring healthy ageing and its determinants, using factor analysis and machine learning techniques: the ATHLOS project
A most challenging task for scientists that are involved in the study of ageing is the development of a measure to quantify health status across populations and over time. In the present study, a Bayesian multilevel Item Response Theory approach is used to create a health score that can be compared across different waves in a longitudinal study, using anchor items and items that vary across waves. The same approach can be applied to compare health scores across different longitudinal studies, using items that vary across studies. Data from the English Longitudinal Study of Ageing (ELSA) are employed. Mixed-effects multilevel regression and Machine Learning methods were used to identify relationships between socio-demographics and the health score created. The metric of health was created for 17,886 subjects (54.6% of women) participating in at least one of the first six ELSA waves and correlated well with already known conditions that affect health. Future efforts will implement this approach in a harmonised data set comprising several longitudinal studies of ageing. This will enable valid comparisons between clinical and community dwelling populations and help to generate norms that could be useful in day-to-day clinical practice
Machine learning methodologies versus cardiovascular risk scores, in predicting disease risk
BACKGROUND: The use of Cardiovascular Disease (CVD) risk estimation scores in primary prevention has long been established. However, their performance remains a matter of concern. The aim of this study was to explore the potential of using machine learning (ML) methodologies for CVD prediction, especially compared to an established risk tool, the HellenicSCORE. METHODS: Data from the ATTICA prospective study (n = 2020 adults), enrolled during 2001-02 and followed up in 2011-12, were used. Three different machine-learning classifiers (k-NN, random forest, and decision tree) were trained and evaluated against 10-year CVD incidence, in comparison with the HellenicSCORE tool (a calibration of the ESC SCORE). Training datasets, ranging from 16 variables down to only 5 variables, were chosen, with or without bootstrapping, in an attempt to achieve the best overall performance for the machine learning classifiers. RESULTS: Depending on the classifier and the training dataset, the outcome varied in efficiency but was comparable between the two methodological approaches. In particular, the HellenicSCORE showed accuracy 85%, specificity 20%, sensitivity 97%, positive predictive value 87%, and negative predictive value 58%, whereas for the machine learning methodologies, accuracy ranged from 65 to 84%, specificity from 46 to 56%, sensitivity from 67 to 89%, positive predictive value from 89 to 91%, and negative predictive value from 24 to 45%; random forest gave the best results, while k-NN gave the poorest results. CONCLUSIONS: The alternative approach of machine learning classification produced results comparable to those of risk prediction scores and, thus, it can be used as a method of CVD prediction, taking into consideration the advantages that machine learning methodologies may offer
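The five metrics reported in this abstract all derive from the four cells of a binary confusion matrix. The sketch below shows the standard definitions; the class name and the counts in the usage note are hypothetical and are not data from the ATTICA study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionMatrix:
    """Counts for a binary classifier (illustrative container, not from the study)."""
    tp: int  # true positives
    fp: int  # false positives
    tn: int  # true negatives
    fn: int  # false negatives

    @staticmethod
    def _ratio(num: int, den: int) -> float:
        # Return NaN instead of raising when a denominator is empty.
        return num / den if den else float("nan")

    def accuracy(self) -> float:
        return self._ratio(self.tp + self.tn,
                           self.tp + self.fp + self.tn + self.fn)

    def sensitivity(self) -> float:
        # Share of actual positives that were predicted positive.
        return self._ratio(self.tp, self.tp + self.fn)

    def specificity(self) -> float:
        # Share of actual negatives that were predicted negative.
        return self._ratio(self.tn, self.tn + self.fp)

    def ppv(self) -> float:
        # Share of positive predictions that were correct.
        return self._ratio(self.tp, self.tp + self.fp)

    def npv(self) -> float:
        # Share of negative predictions that were correct.
        return self._ratio(self.tn, self.tn + self.fn)
```

With made-up counts such as `ConfusionMatrix(tp=90, fp=60, tn=40, fn=10)`, sensitivity is 0.90 while specificity is only 0.40. In general, a high accuracy can coexist with a low specificity when one class dominates, which is why the abstract reports all five figures rather than accuracy alone.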
Transporters in Drug Development: 2018 ITC Recommendations for Transporters of Emerging Clinical Importance
This white paper provides updated International Transporter Consortium (ITC) recommendations on transporters that are important in drug development following the 3rd ITC workshop. New additions include prospective evaluation of organic cation transporter 1 (OCT1) and retrospective evaluation of organic anion transporting polypeptide (OATP)2B1 because of their important roles in drug absorption, disposition, and effects. For the first time, the ITC underscores the importance of transporters involved in drug-induced vitamin deficiency (THTR2) and those involved in the disposition of biomarkers of organ function (OAT2 and bile acid transporters)
Rare copy number variation in posttraumatic stress disorder
Posttraumatic stress disorder (PTSD) is a heritable (h² = 24-71%) psychiatric illness. Copy number variation (CNV) is a form of rare genetic variation that has been implicated in the etiology of psychiatric disorders, but no large-scale investigation of CNV in PTSD has been performed. We present an association study of CNV burden and PTSD symptoms in a sample of 114,383 participants (13,036 cases and 101,347 controls) of European ancestry. CNVs were called using two calling algorithms and intersected to a consensus set. Quality control was performed to remove strong outlier samples. CNVs were examined for association with PTSD within each cohort using linear or logistic regression analysis adjusted for population structure and CNV quality metrics, then inverse variance weighted meta-analyzed across cohorts. We examined the genome-wide total span of CNVs, enrichment of CNVs within specified gene-sets, and CNVs overlapping individual genes and implicated neurodevelopmental regions. The total distance covered by deletions crossing over known neurodevelopmental CNV regions was significant (beta = 0.029, SE = 0.005, P = 6.3 × 10⁻⁸). The genome-wide neurodevelopmental CNV burden identified explains 0.034% of the variation in PTSD symptoms. The 15q11.2 BP1-BP2 microdeletion region was significantly associated with PTSD (beta = 0.0206, SE = 0.0056, P = 0.0002). No individual significant genes interrupted by CNV were identified. 22 gene pathways related to the function of the nervous system and brain were significant in pathway analysis (FDR q < 0.05), but these associations were not significant once NDD regions were removed. A larger sample size, better detection methods, and annotated resources of CNV are needed to explore this relationship further
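The inverse variance weighted meta-analysis mentioned in this abstract can be sketched as a standard fixed-effect combination of per-cohort estimates. This is a textbook formulation, not the authors' pipeline; the function name and the input values in the usage note are hypothetical.

```python
import math
from typing import Sequence, Tuple


def ivw_meta(betas: Sequence[float],
             ses: Sequence[float]) -> Tuple[float, float, float, float]:
    """Fixed-effect inverse variance weighted meta-analysis.

    Returns (pooled beta, pooled SE, z score, two-sided normal p-value).
    """
    if len(betas) != len(ses) or not betas:
        raise ValueError("betas and ses must be non-empty and of equal length")
    if any(se <= 0 for se in ses):
        raise ValueError("standard errors must be positive")

    # Each cohort is weighted by the inverse of its sampling variance.
    weights = [1.0 / se ** 2 for se in ses]
    total = sum(weights)

    beta = sum(w * b for w, b in zip(weights, betas)) / total
    se = math.sqrt(1.0 / total)
    z = beta / se
    # Two-sided p-value under a standard normal reference distribution.
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return beta, se, z, p
```

For example, `ivw_meta([0.02, 0.04], [0.01, 0.01])` pools two equally precise made-up cohorts into a beta of 0.03 with a smaller standard error than either cohort alone. Cohorts with smaller standard errors pull the pooled estimate toward their own value.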
Predicting the effect of variants on splicing using Convolutional Neural Networks
Mutations that cause an error in the splicing of a messenger RNA (mRNA) can lead to diseases in humans. Various computational models have been developed to recognize the sequence pattern of the splice sites. In recent studies, Convolutional Neural Network (CNN) architectures were shown to outperform other existing models in predicting the splice sites. However, insufficient effort has been put into extending the CNN model to predict the effect of genomic variants on the splicing of mRNAs. This study proposes a framework to elaborate on the utility of CNNs to assess the effect of splice variants on the identification of potential disease-causing variants that disrupt the RNA splicing process. Five models, including three CNN-based and two non-CNN machine learning based, were trained and compared using two existing splice site datasets, Genome Wide Human splice sites (GWH) and a dataset provided at the Deep Learning and Artificial Intelligence winter school 2018 (DLAI). The donor sites were also used to test on the HSplice tool to evaluate the predictive models. To improve the effectiveness of the predictive models, the two datasets were combined. The CNN model with four convolutional layers showed the best splice site prediction performance with an AUPRC of 93.4% and 88.8% for donor and acceptor sites, respectively. The effects of variants on splicing were estimated by applying the best model on variant data from the ClinVar database. Based on the estimation, the framework could effectively differentiate pathogenic variants from benign variants (p = 5.9 × 10⁻⁷). These promising results support that the proposed framework could be applied in future genetic studies to identify disease-causing loci involving the splicing mechanism. The datasets and Python scripts used in this study are available on the GitHub repository at https://github.com/smiile8888/rna-splice-sites-recognition
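One common way to estimate a variant's effect with a sequence model is to score the reference and the alternate sequence and take the difference. The sketch below illustrates that idea with one-hot encoding, the usual input format for sequence CNNs. It is an assumption about the general approach, not the method in the linked repository: all function names are hypothetical, and `toy_donor_scorer` is a placeholder rule standing in for a trained model.

```python
from typing import Callable, List

BASES = "ACGT"
Encoded = List[List[int]]


def one_hot(seq: str) -> Encoded:
    """Encode a DNA string as rows of length 4; unknown bases (e.g. N) become all zeros."""
    return [[1 if base == b else 0 for b in BASES] for base in seq.upper()]


def apply_snv(seq: str, pos: int, alt: str) -> str:
    """Return seq with the base at 0-based index pos replaced by alt."""
    if not 0 <= pos < len(seq):
        raise IndexError("variant position outside the sequence window")
    if len(alt) != 1 or alt.upper() not in BASES:
        raise ValueError("alt must be one of A, C, G, T")
    return seq[:pos] + alt.upper() + seq[pos + 1:]


def variant_delta(seq: str, pos: int, alt: str,
                  scorer: Callable[[Encoded], float]) -> float:
    """Score change (alternate minus reference) for a single-nucleotide variant."""
    return scorer(one_hot(apply_snv(seq, pos, alt))) - scorer(one_hot(seq))


def toy_donor_scorer(encoded: Encoded) -> float:
    """Placeholder, NOT a trained model: 1.0 if indices 3-4 encode the donor dinucleotide GT."""
    g, t = [0, 0, 1, 0], [0, 0, 0, 1]
    return 1.0 if len(encoded) >= 5 and encoded[3] == g and encoded[4] == t else 0.0
```

In practice the `scorer` argument would wrap a trained network's predicted splice-site probability. A strongly negative delta at a known site would then suggest that the variant weakens that site.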
19th Asia Pacific Symposium
This PALO volume constitutes the Proceedings of the 19th Asia Pacific Symposium on Intelligent and Evolutionary Systems (IES 2015), held in Bangkok, Thailand, November 22-25, 2015. The IES conference series is an annual event that was initiated in 1997 in Canberra, Australia. IES aims to bring together researchers from countries of the Asian Pacific Rim, in the fields of intelligent systems and evolutionary computation, to exchange ideas, present recent results and discuss possible collaborations. Researchers beyond Asian Pacific Rim countries are also welcome and encouraged to participate. The theme for IES 2015 is “Transforming Big Data into Knowledge and Technological Breakthroughs”. The host organization for IES 2015 is the School of Information Technology (SIT), King Mongkut’s University of Technology Thonburi (KMUTT), and it is technically sponsored by the International Neural Network Society (INNS). IES 2015 is co-located with three other conferences, namely The 6th International Conference on Computational Systems-Biology and Bioinformatics 2015 (CSBio 2015), The 7th International Conference on Advances in Information Technology 2015 (IAIT 2015) and The 10th International Conference on e-Business 2015 (iNCEB 2015), as a major part of a series of events to celebrate the SIT 20th anniversary and the KMUTT 55th anniversary