1,268 research outputs found

    Privacy Preserving Data Publishing

    Recent years have witnessed increasing interest among researchers in protecting individual privacy in the big data era, in areas such as social media, genomics, and the Internet of Things. Recent studies have revealed numerous privacy threats and privacy protection methodologies that vary across a broad range of applications. To date, however, there exist no powerful methodologies for addressing the challenges posed by high-dimensional data, highly correlated data, and powerful attackers. This dissertation investigates two critical problems: the prospects and challenges of elucidating the capabilities of attackers in mining individuals’ private information; and methodologies that can protect against such inference attacks while guaranteeing significant data utility. First, this dissertation proposes a series of works on inference attacks, with emphasis on protecting against powerful adversaries with auxiliary information. In the context of genomic data, data dimensionality and computational feasibility make data analysis highly challenging. This dissertation proves that the proposed attack can effectively infer the values of unknown SNPs and traits in linear complexity, a dramatic reduction in computation cost compared with traditional methods of exponential cost. Second, providing a differential privacy guarantee for high-dimensional and highly correlated data remains a challenging problem because of high sensitivity, output scalability, and signal-to-noise ratio. Given that human DNA contains tens of millions of SNPs, it is infeasible for traditional methods to introduce noise to sanitize genomic data. This dissertation proposes a series of works and demonstrates that the proposed method satisfies differential privacy; moreover, data utility is improved compared with the state of the art by substantially lowering data sensitivity.
Third, providing a privacy guarantee in social data publishing remains a challenging problem because of the tradeoff between data privacy and utility. This dissertation proposes a series of works and demonstrates that the proposed methods can effectively realize a privacy-utility tradeoff in data publishing. Finally, two future research topics are proposed. The first is Privacy Preserving Data Collection and Processing for the Internet of Things. The second is Privacy Preserving Big Data Aggregation. Both are motivated by newly proposed data mining, artificial intelligence, and cybersecurity methods.
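
The abstract's link between lower sensitivity and better utility can be illustrated with the standard Laplace mechanism. This is only a minimal sketch under that assumption: the abstract does not say which mechanism the dissertation uses, and the function names `laplace_scale` and `laplace_mechanism` are hypothetical.

```python
import random


def laplace_scale(sensitivity: float, epsilon: float) -> float:
    """Noise scale b of the Laplace mechanism: b = sensitivity / epsilon."""
    if sensitivity <= 0 or epsilon <= 0:
        raise ValueError("sensitivity and epsilon must be positive")
    return sensitivity / epsilon


def laplace_mechanism(true_value, sensitivity, epsilon, rng=None):
    """Return true_value plus Laplace(0, b) noise (illustrative helper)."""
    rng = rng or random.Random()
    b = laplace_scale(sensitivity, epsilon)
    # The difference of two i.i.d. exponentials with mean b is Laplace(0, b).
    noise = rng.expovariate(1.0 / b) - rng.expovariate(1.0 / b)
    return true_value + noise


if __name__ == "__main__":
    # Same privacy budget, two sensitivities: the lower one needs less noise.
    for sensitivity in (10.0, 1.0):
        print(sensitivity, laplace_scale(sensitivity, epsilon=0.5))
```

For a fixed privacy budget `epsilon`, the expected absolute noise equals the scale `b`, so a tenfold reduction in sensitivity gives a tenfold reduction in expected noise.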

    Polygenic Risk Score for Cardiovascular Diseases in Artificial Intelligence Paradigm: A Review

    Cardiovascular disease (CVD) related mortality and morbidity heavily strain society. The relationship between external risk factors and our genetics has not been well established. It is widely acknowledged that environmental influence and individual behaviours play a significant role in CVD vulnerability, leading to the development of polygenic risk scores (PRS). We employed the PRISMA search method to locate pertinent research and literature to extensively review artificial intelligence (AI)-based PRS models for CVD risk prediction. Furthermore, we analyzed and compared conventional vs. AI-based solutions for PRS. We summarized the recent advances in our understanding of the use of AI-based PRS for risk prediction of CVD. Our study proposes three hypotheses: i) Multiple genetic variations and risk factors can be incorporated into AI-based PRS to improve the accuracy of CVD risk prediction. ii) AI-based PRS for CVD circumvents the drawbacks of conventional PRS calculators by incorporating a larger variety of genetic and non-genetic components, allowing for more precise and individualised risk estimations. iii) Using AI approaches, it is possible to significantly reduce the dimensionality of huge genomic datasets, resulting in more accurate and effective disease risk prediction models. Our study highlighted that the AI-PRS model outperformed traditional PRS calculators in predicting CVD risk. Furthermore, using AI-based methods to calculate PRS may increase the precision of risk predictions for CVD and have significant ramifications for individualized prevention and treatment plans.
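
For orientation, a conventional PRS is commonly computed as a weighted sum of risk-allele dosages, with weights taken from GWAS effect sizes. The sketch below assumes that additive form; the function name and the example numbers are hypothetical and are not taken from the review.

```python
def polygenic_risk_score(dosages, weights):
    """Additive PRS: sum over variants of effect weight times allele dosage.

    dosages: risk-allele counts per variant, each between 0 and 2
    weights: per-variant effect sizes (e.g. log odds ratios), same order
    """
    if len(dosages) != len(weights):
        raise ValueError("dosages and weights must have the same length")
    if any(d < 0 or d > 2 for d in dosages):
        raise ValueError("each dosage must lie between 0 and 2")
    return sum(w * d for w, d in zip(weights, dosages))


if __name__ == "__main__":
    # Hypothetical individual with three variants.
    print(polygenic_risk_score([0, 1, 2], [0.5, 0.2, 0.1]))
```

The AI-based models discussed in the review replace this fixed linear sum with a learned function that can also take non-genetic inputs.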

    Recent Large-Scale Genotyping and Phenotyping of Plant Genetic Resources of Vegetatively Propagated Crops

    Several recent national and international projects have focused on large-scale genotyping of plant genetic resources in vegetatively propagated crops like fruit and berries, potatoes and woody ornamentals. The primary goal is usually to identify true-to-type plant material, detect possible synonyms, and investigate genetic diversity and relatedness among accessions. A secondary goal may be to create sustainable databases that can be utilized in research and breeding for several years ahead. Commonly applied DNA markers (like microsatellite DNA and SNPs) and next-generation sequencing each have their pros and cons for these purposes. Methods for large-scale phenotyping have lagged behind, which is unfortunate since many commercially important traits (yield, growth habit, storability, and disease resistance) are difficult to score. Nevertheless, the analysis of gene action and development of robust DNA markers depends on environmentally controlled screening of very large sets of plant material. Although more time-consuming, co-operative projects with broad-scale data collection are likely to produce more reliable results. In this review, we will describe some of the approaches taken in genotyping and/or phenotyping projects concerning a wide variety of vegetatively propagated crops

    The landscape of the methodology in drug repurposing using human genomic data: a systematic review

    The process of drug development is expensive and time-consuming. In contrast, drug repurposing can be introduced to clinical practice more quickly and at a reduced cost. Over the last decade, there has been a significant expansion of large biobanks that link genomic data to electronic health record (EHR) data, public availability of various databases containing biological and clinical information, and rapid development of novel methodologies and algorithms in integrating different sources of data. This review aims to provide a thorough summary of different strategies that utilize genomic data to seek drug-repositioning opportunities. We searched MEDLINE and EMBASE databases to identify eligible studies up until 1st May 2023, with a total of 102 studies finally included after two-step parallel screening. We summarized commonly used strategies for drug repurposing, including Mendelian randomization, multi-omic-based and network-based studies, and illustrated each strategy with examples, as well as the data sources implemented. By leveraging existing knowledge and infrastructure to expedite the drug discovery process and reduce costs, drug repurposing potentially identifies new therapeutic uses for approved drugs in a more efficient and targeted manner. However, technical challenges when integrating different types of data and biased or incomplete understanding of drug interactions are important hindrances that cannot be disregarded in the pursuit of identifying novel therapeutic applications. This review offers an overview of drug repurposing methodologies, providing valuable insights and guiding future directions for advancing drug repurposing studies

    A systems biology framework integrating GWAS and RNA-seq to shed light on the molecular basis of sperm quality in swine

    Background: Genetic pressure in animal breeding is sparking the interest of breeders in selecting elite boars with higher sperm quality to optimize ejaculate doses and fertility rates. However, the molecular basis of sperm quality is not yet fully understood. Our aim was to identify candidate genes, pathways and DNA variants associated with sperm quality in swine by analysing 25 sperm-related phenotypes and integrating genome-wide association studies (GWAS) and RNA-seq under a systems biology framework. Results: By GWAS, we identified 12 quantitative trait loci (QTL) associated with the percentage of head and neck abnormalities, abnormal acrosomes and motile spermatozoa. Candidate genes included CHD2, KATNAL2, SLC14A2 and ABCA1. By RNA-seq, we identified a wide repertoire of mRNAs (e.g. PRM1, OAZ3, DNAJB8, TPPP2 and TNP1) and miRNAs (e.g. ssc-miR-30d, ssc-miR-34c, ssc-miR-30c-5p, ssc-miR-191, members of the let-7 family and ssc-miR-425-5p) with functions related to sperm biology. We detected 6128 significant correlations (P-value ≤ 0.05) between sperm traits and mRNA abundances. By expression (e)GWAS, we identified three trans-expression QTL involving the genes IQCJ, ACTR2 and HARS. Using the GWAS and RNA-seq data, we built a gene interaction network. We considered that the genes and interactions present in both the GWAS and RNA-seq networks had a higher probability of actually being involved in sperm quality and used them to build a robust gene interaction network. In addition, in the final network we included genes with RNA abundances correlated with more than four semen traits and miRNAs interacting with the genes in the network. The final network was enriched for genes involved in gamete generation and development, meiotic cell cycle, DNA repair or embryo implantation.
Finally, we designed a panel of 73 SNPs based on the GWAS, eGWAS and final network data that explains between 5% (for sperm cell concentration) and 36% (for percentage of neck abnormalities) of the phenotypic variance of the sperm traits. Conclusions: By applying a systems biology approach, we identified genes that potentially affect sperm quality and constructed a SNP panel that explains a substantial part of the phenotypic variance for semen quality in our study and that should be tested in other swine populations to evaluate its relevance for the pig breeding sector.

    Searching for novel phenotype profiles of diabetic complications and identifying genetic components between groups using machine learning methods

    Patients with Type 1 diabetes (T1D) may develop a wide variety of additional, slowly progressing complications, which have been shown to be partly heritable and to correlate with each other. However, the genetic and biological mechanisms behind them are still mostly unknown. The goal of this work was to use machine learning and data mining approaches that could capture the progressive nature of multiple complications simultaneously, and to create novel phenotype classes that could help elucidate the pathogenesis and genetics of diabetic complications. To achieve this, a dual-layer self-organizing map (SOM) was trained using clinical and environmental patient data from the FinnDiane study, and the trained SOM node prototypes were clustered into classes using agglomerative hierarchical clustering. The genetic differences between the created classes were evaluated using heritability estimates, and the genetic markers associated with the class assignments showing significant heritability were analysed in a genome-wide association study (GWAS). The created class assignments were biologically plausible and were estimated to be up to 42% genetically determined. The GWAS analyses detected a genetic marker (rs202095311, located in the last intron of the gene NRIP1) associated with one of the created class assignments at genome-wide significance (p<5×10^-8). In addition, GWAS detected multiple other genetic regions with suggestive p-values that contained mostly genes and processes previously linked to diabetic complications or their risk factors. Overall, the new approach to studying the genetics of complex diseases was found to perform well in the case of T1D and its complications, and could also be used to study other complex traits and diseases.