Improved Performance of Gene Set Analysis on Genome-Wide Transcriptomics Data When Using Gene Activity State Estimates
Gene set analysis methods continue to be popular and powerful methods for evaluating genome-wide transcriptomics data. These approaches require an a priori grouping of genes into biologically meaningful sets, after which downstream analyses are conducted at the set level rather than the gene level. Gene set analysis methods have been shown to yield more powerful statistical conclusions than single-gene analyses, owing both to reduced multiple-testing penalties and to potentially larger observed effects from aggregating effects across the genes in a set. Traditionally, gene set analysis methods have been applied directly to normalized, log-transformed transcriptomics data. Recently, efforts have been made to transform transcriptomics data to scales that yield more biologically interpretable results. For example, recently proposed models transform log-transformed transcriptomics data into a confidence metric (ranging from 0 to 100%) that a gene is active (roughly speaking, that the gene product is part of an active cellular mechanism). In this manuscript, we demonstrate, on both real and simulated transcriptomics data, that tests for differential expression between sets of genes are typically more powerful when using gene activity state estimates than when using log-transformed gene expression data. Our analysis suggests further exploration of techniques that transform transcriptomics data into meaningful quantities for improved downstream inference.
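To make the set-level idea concrete, here is a minimal Python sketch, assuming each sample's set score is the mean of its member genes' values (log expression or activity probabilities) and that two conditions are compared with Welch's t statistic. The function names, gene identifiers, and toy numbers are hypothetical, and the manuscript's own set-level tests may differ.

```python
from math import sqrt
from statistics import mean, variance


def set_scores(values, gene_set):
    """Per-sample set score: the mean of the member genes' values.

    values maps gene -> list of per-sample values (e.g. log expression or
    activity probabilities in [0, 1]); all lists have the same length.
    Genes in gene_set that are missing from values are ignored.
    """
    members = [g for g in gene_set if g in values]
    if not members:
        raise ValueError("no genes of the set are present in the data")
    n_samples = len(values[members[0]])
    return [mean(values[g][j] for g in members) for j in range(n_samples)]


def welch_t(group_a, group_b):
    """Welch's t statistic for a difference in mean set score between groups."""
    se = sqrt(variance(group_a) / len(group_a) + variance(group_b) / len(group_b))
    return (mean(group_a) - mean(group_b)) / se


# Hypothetical activity probabilities for a 3-gene set across 6 samples:
# samples 0-2 belong to condition A, samples 3-5 to condition B.
activity = {
    "G1": [0.90, 0.80, 0.85, 0.20, 0.10, 0.15],
    "G2": [0.70, 0.75, 0.80, 0.30, 0.25, 0.20],
    "G3": [0.60, 0.65, 0.70, 0.50, 0.45, 0.40],
}
scores = set_scores(activity, ["G1", "G2", "G3"])
t_stat = welch_t(scores[:3], scores[3:])
```

The same two functions can be run on log-transformed expression values instead, which is the comparison the abstract describes. Turning the statistic into a p-value needs the t distribution with Welch-Satterthwaite degrees of freedom, which the standard library does not provide.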
Genome-Wide Interaction Study of Omega-3 PUFAs and Other Fatty Acids on Inflammatory Biomarkers of Cardiovascular Health in the Framingham Heart Study
Numerous genetic loci have been identified as associated with circulating fatty acid (FA) levels and/or inflammatory biomarkers of cardiovascular health (e.g., C-reactive protein). Recently, using red blood cell (RBC) FA data from the Framingham Offspring Study, we conducted a genome-wide association study of over 2.5 million single nucleotide polymorphisms (SNPs) and 22 RBC FAs (and associated ratios), including the four Omega-3 FAs (ALA, DHA, DPA, and EPA). Our analyses identified numerous causal loci. In this manuscript, we use a genome-wide interaction study approach to investigate the extent to which polyunsaturated fatty acid (PUFA) levels moderate the relationship between genetics and cardiovascular health biomarkers. In particular, we test for possible gene–FA interactions on 9 inflammatory biomarkers, using 2.5 million SNPs and 12 FAs, including all Omega-3 PUFAs. We identified eighteen novel loci, including loci that show strong evidence of modifying the impact of heritable genetics on biomarker levels and, consequently, on cardiovascular health. The identified genes clarify the biological function and role of Omega-3 PUFAs, as well as other common fatty acids, in cardiovascular health, and suggest numerous candidate loci for future replication and biological characterization.
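A single gene–FA interaction test can be sketched as an ordinary least squares fit of biomarker ~ 1 + SNP + FA + SNP×FA, asking whether the interaction coefficient differs from zero. The minimal Python version below assumes additive SNP coding (0/1/2 allele dosage), omits covariates, and uses hypothetical function names and simulated data; it is not the study's pipeline, which would repeat such a test for every SNP, FA, and biomarker combination.

```python
from math import sqrt


def invert(matrix):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    aug = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def interaction_test(snp, fa, biomarker):
    """OLS fit of biomarker ~ 1 + snp + fa + snp*fa.

    Returns (beta, t) for the SNP x FA interaction term.
    """
    X = [[1.0, g, f, g * f] for g, f in zip(snp, fa)]
    n, p = len(X), len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * y for row, y in zip(X, biomarker)) for i in range(p)]
    xtx_inv = invert(xtx)
    beta = [sum(xtx_inv[i][j] * xty[j] for j in range(p)) for i in range(p)]
    resid = [y - sum(b * x for b, x in zip(beta, row)) for row, y in zip(X, biomarker)]
    sigma2 = sum(e * e for e in resid) / (n - p)   # residual variance
    se = sqrt(sigma2 * xtx_inv[3][3])              # standard error of the interaction
    return beta[3], beta[3] / se


# Simulated data: SNP dosage 0/1/2, an FA level, and a biomarker whose SNP effect
# depends on FA (true interaction coefficient 0.8), plus small deterministic noise.
snp = [i % 3 for i in range(30)]
fa = [1.0 + 0.5 * (i % 5) for i in range(30)]
biomarker = [1.0 + 0.5 * g + 0.2 * f + 0.8 * g * f + 0.01 * ((7 * i) % 4 - 1.5)
             for i, (g, f) in enumerate(zip(snp, fa))]
beta_int, t_int = interaction_test(snp, fa, biomarker)
```

A large |t| for the interaction term is what "the FA level moderates the SNP's effect on the biomarker" means in this model.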
Improvements to Bayesian Gene Activity State Estimation from Genome-Wide Transcriptomics Data
An important question in many biological applications is how to estimate or classify gene activity states (active or inactive) from genome-wide transcriptomics data. Recently, we proposed a Bayesian method, called MultiMM, which showed superior results compared with existing methods. In short, MultiMM performed better than existing methods on both simulated and real gene expression data, confirming well-known biological results and yielding better agreement with fluxomics data. Despite these promising results, MultiMM has several limitations. First, MultiMM leverages co-regulatory models to improve activity state estimates, but it incorporates information about co-regulation in a manner that assumes the networks are known with certainty. Second, MultiMM assumes that genes that change states in the dataset can be distinguished with certainty from those that remain in one state. Third, the model can be sensitive to extreme measurements (outliers) of gene expression. In this manuscript, we propose a modified Bayesian approach that addresses these three limitations by improving outlier handling and by explicitly modeling network and other uncertainty, yielding improved gene activity state estimates compared with MultiMM.
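The notion of a per-gene activity probability can be illustrated with a much simpler model than MultiMM: a two-component Gaussian mixture fitted by expectation-maximization to one gene's log expression across samples, reporting each sample's posterior probability of belonging to the higher-mean ("active") component. This sketch assumes that bimodal picture; it contains none of MultiMM's co-regulatory network information, none of the uncertainty modeling or outlier handling proposed here, and its names are hypothetical.

```python
from math import exp, log, pi, sqrt
from statistics import mean, pstdev


def log_normal_pdf(x, mu, sd):
    """Log density of a normal distribution."""
    return -0.5 * ((x - mu) / sd) ** 2 - log(sd) - 0.5 * log(2 * pi)


def logistic(z):
    """Numerically stable 1 / (1 + exp(-z))."""
    if z >= 0:
        return 1.0 / (1.0 + exp(-z))
    ez = exp(z)
    return ez / (1.0 + ez)


def activity_probabilities(x, n_iter=200, min_sd=1e-3):
    """EM for a 2-component Gaussian mixture on one gene (needs >= 2 samples).

    Returns, per sample, the posterior probability of the higher-mean component.
    """
    xs = sorted(x)
    half = len(xs) // 2
    mu = [mean(xs[:half]), mean(xs[half:])]   # init: low half vs high half
    sd = [max(pstdev(x), min_sd)] * 2
    w = [0.5, 0.5]
    r = []
    for _ in range(n_iter):
        # E-step: responsibility of component 1, computed in log space.
        r = [logistic((log(w[1]) + log_normal_pdf(xi, mu[1], sd[1]))
                      - (log(w[0]) + log_normal_pdf(xi, mu[0], sd[0])))
             for xi in x]
        # M-step: re-estimate weight, mean and sd of each component.
        for k, resp in enumerate(([1.0 - ri for ri in r], r)):
            nk = max(sum(resp), 1e-12)
            w[k] = nk / len(x)
            mu[k] = sum(ri * xi for ri, xi in zip(resp, x)) / nk
            var = sum(ri * (xi - mu[k]) ** 2 for ri, xi in zip(resp, x)) / nk
            sd[k] = max(sqrt(var), min_sd)
    # Report probabilities for whichever component ended with the higher mean.
    return r if mu[1] >= mu[0] else [1.0 - ri for ri in r]


# Hypothetical log-expression values for one gene: 5 low and 5 high samples.
log_expr = [2.0, 2.1, 1.9, 2.05, 1.95, 8.0, 8.1, 7.9, 8.05, 7.95]
p_active = activity_probabilities(log_expr)
confidence_pct = [100 * p for p in p_active]   # 0-100% confidence scale
```

A Gaussian likelihood like this one is easily pulled around by extreme values, which is one reason outlier handling matters for activity state estimation.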
A Genome-Wide Association Study of Red-Blood Cell Fatty Acids and Ratios Incorporating Dietary Covariates: Framingham Heart Study Offspring Cohort
Recent analyses have suggested a strong heritable component to circulating fatty acid (FA) levels; however, only a limited number of genes have been identified that associate with FA levels. To expand upon a previous genome-wide association study of FA levels in participants of the Framingham Heart Study Offspring Cohort, we used data from 2,400 of these individuals for whom red blood cell FA profiles, dietary information, and genotypes are available, and conducted a genome-wide evaluation of potential genetic variants associated with 22 FAs and 15 FA ratios after adjusting for relevant dietary covariates. Our analysis found nine previously identified loci associated with FA levels (FADS, ELOVL2, PCOLCE2, LPCAT3, AGPAT4, NTAN1/PDXDC1, PKD2L1, HBS1L/MYB and RAB3GAP1/MCM6) and identified four novel loci. The latter include an association between variants in CALN1 (Chromosome 7) and eicosapentaenoic acid (EPA), an association between DHRS4L2 (Chromosome 14) and an FA ratio measuring delta-9-desaturase activity, and two loci associated with less well understood proteins. Thus, the inclusion of dietary covariates had a modest impact, helping to uncover four additional loci. While genome-wide association studies continue to uncover additional genes associated with circulating FA levels, much of the heritable risk is yet to be explained, suggesting a potential role for rare genetic variation, epistasis, and gene-environment interactions in FA levels as well. Further studies are needed to understand the complex genetic picture of FA metabolism and synthesis.
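Adjusting a SNP–FA association for a dietary covariate can be illustrated with the Frisch–Waugh–Lovell theorem: regress both the SNP dosage and the FA level on the covariate, then regress the FA residuals on the SNP residuals. The resulting slope equals the SNP coefficient of the full model FA ~ 1 + SNP + diet. This Python sketch assumes a single, hypothetical dietary covariate and simulated data; adjusting for several dietary covariates, as the study did, would require residualizing on all of them jointly.

```python
from math import sqrt
from statistics import mean


def residualize(v, z):
    """Residuals of v after a simple linear regression on z (with intercept)."""
    mz, mv = mean(z), mean(v)
    szz = sum((zi - mz) ** 2 for zi in z)
    slope = sum((zi - mz) * (vi - mv) for zi, vi in zip(z, v)) / szz
    return [vi - mv - slope * (zi - mz) for vi, zi in zip(v, z)]


def adjusted_snp_effect(snp, fa, diet):
    """SNP effect on an FA level adjusted for one covariate; returns (beta, t)."""
    rx = residualize(snp, diet)   # SNP dosage with the diet-explained part removed
    ry = residualize(fa, diet)    # FA level with the diet-explained part removed
    sxx = sum(x * x for x in rx)
    beta = sum(x * y for x, y in zip(rx, ry)) / sxx
    rss = sum((y - beta * x) ** 2 for x, y in zip(rx, ry))
    df = len(fa) - 3              # parameters: intercept, diet, SNP
    return beta, beta / sqrt(rss / df / sxx)


# Simulated cohort: SNP dosage, a hypothetical dietary intake score, and an FA
# level with true SNP effect 0.4 and diet effect 0.3, plus small deterministic noise.
snp = [i % 3 for i in range(40)]
diet = [float(i % 7) for i in range(40)]
fa = [3.0 + 0.4 * g + 0.3 * d + 0.01 * ((5 * i) % 4 - 1.5)
      for i, (g, d) in enumerate(zip(snp, diet))]
beta_snp, t_snp = adjusted_snp_effect(snp, fa, diet)
```

The residualization step is what "after adjusting for dietary covariates" amounts to: only the part of FA variation not explained by diet is attributed to the SNP.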
MicroWalk: A Framework for Finding Side Channels in Binaries
Microarchitectural side channels expose unprotected software to information leakage attacks, in which a software adversary can track the runtime behavior of a benign process and steal secrets such as cryptographic keys. As the incremental software patches to RSA implementations against variants of side-channel attacks across different versions of cryptographic libraries suggest, protecting security-critical algorithms against side channels is an intricate task. Software protections avoid leakage by operating in constant time, with a uniform resource usage pattern independent of the processed secret. In this respect, automated testing and verification of software binaries for leakage-free behavior is important, particularly when the source code is not available. In this work, we propose a novel technique based on Dynamic Binary Instrumentation and Mutual Information Analysis to efficiently locate and quantify memory-based and control-flow-based microarchitectural leakages. We develop a software framework named MicroWalk for side-channel analysis of binaries, which can be extended to support new classes of leakage. For the first time, using MicroWalk, we perform a rigorous leakage analysis of two widely used closed-source cryptographic libraries: Intel IPP and Microsoft CNG. We analyze different cryptographic implementations consisting of million instructions in about minutes of CPU time. By locating previously unknown leakages in hardened implementations, our results suggest that MicroWalk can efficiently find microarchitectural leakages in software binaries.
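The mutual-information step can be sketched on already-collected memory-access traces: for each instruction, treat the test-case input as one variable and the sequence of addresses that instruction accessed as the other, and estimate their mutual information from empirical frequencies. Zero bits means the observed accesses reveal nothing about which input ran; log2(N) bits for N distinct, equally likely inputs means they identify it completely. This Python sketch assumes the traces are already given (MicroWalk obtains them through dynamic binary instrumentation), covers only memory-access leakage, and uses hypothetical names and addresses.

```python
from collections import Counter
from math import log2


def mutual_information(pairs):
    """Empirical mutual information I(X; Y) in bits from a list of (x, y) samples."""
    n = len(pairs)
    joint = Counter(pairs)
    px = Counter(x for x, _ in pairs)
    py = Counter(y for _, y in pairs)
    return sum((c / n) * log2(c * n / (px[x] * py[y])) for (x, y), c in joint.items())


def leakage_per_instruction(traces):
    """traces: input_id -> list of (instruction_address, accessed_address) events.

    Returns instruction_address -> mutual information (bits) between the input
    and the ordered tuple of addresses that instruction accessed for that input.
    """
    per_instr = {}
    for input_id, events in traces.items():
        for instr, addr in events:
            per_instr.setdefault(instr, {}).setdefault(input_id, []).append(addr)
    return {
        instr: mutual_information([(i, tuple(a)) for i, a in seen.items()])
        for instr, seen in per_instr.items()
    }


# Hypothetical traces for 4 secret inputs: instruction 0x10 performs a
# secret-indexed table lookup; instruction 0x20 always reads the same address.
traces = {
    secret: [(0x10, 0x1000 + 64 * secret), (0x20, 0x2000)]
    for secret in range(4)
}
leakage = leakage_per_instruction(traces)
```

Here instruction 0x10 scores 2 bits, because every input touches a different 64-byte line, while 0x20 scores 0 bits, which is what a constant-time access pattern should produce.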
Memory Safety for Today’s Languages and Architectures
Memory safety vulnerabilities remain one of the most critical sources of exploitable security problems in today’s software. Despite the growing popularity of modern, memory-safe languages, much of today’s software is still written in C and C++, which are prone to these vulnerabilities, and rewriting all of this C and C++ code would be prohibitively expensive and time-consuming. At the same time, microarchitectural side-channel attacks threaten to violate memory safety in increasingly complex ways. But new languages such as WebAssembly (Wasm) and new hardware features such as ARM MTE give programmers new tools in the fight against memory safety vulnerabilities; with clever use of these tools, we can obtain strong security guarantees for today’s software.
In this dissertation, we present a variety of tools for improving memory safety for today’s C and C++ codebases on today’s side-channel-prone microarchitectures. In the domain of finding memory-safety vulnerabilities, we first demonstrate how new microarchitectural features sometimes introduce new side-channel attacks (Chapter 1); we then present program analysis tools that help keep programs secure from that class of side-channel attacks (Chapter 2) and from a newer and particularly relevant class of side-channel attacks, Spectre attacks (Chapter 3). In the remainder of the dissertation, we focus on automatically preventing memory-safety vulnerabilities. We systematically compare and critique proposed software-based defenses against Spectre (Chapter 4); we then present one such defense, a tool that automatically and efficiently secures cryptographic programs against Spectre (Chapter 5). Starting with Chapter 6, we return to non-side-channel memory safety vulnerabilities, proposing an extension to Wasm that provides memory safety even inside its software sandbox; finally, in Chapter 7 we present a compiler-based defense that works in conjunction with ARM MTE to automatically secure C and C++ programs from spatial memory safety vulnerabilities.
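The hardware check that ARM MTE performs can be illustrated with a toy Python model: memory is tagged in 16-byte granules with 4-bit tags, each pointer carries a 4-bit tag in bits 59:56, and an access faults when the two disagree, so running off the end of one allocation into a neighbor with a different tag is caught. Everything here, including the class, the bump allocator, and the rule of giving consecutive allocations different tags, is a hypothetical simplification; it is not the compiler-based defense of Chapter 7 and not an ARM API.

```python
import random

GRANULE = 16      # MTE tags memory in 16-byte granules
TAG_SHIFT = 56    # the pointer's logical tag lives in bits 59:56


class TagCheckFault(Exception):
    """Models an MTE tag-check fault (pointer tag != memory tag)."""


class TaggedHeap:
    """Toy model of MTE-style tag checking (hypothetical, not an ARM API)."""

    def __init__(self, seed=0):
        self.mem_tags = {}           # granule index -> 4-bit allocation tag
        self.next_addr = 0x1000      # simple bump allocator
        self.rng = random.Random(seed)
        self.last_tag = None

    def malloc(self, size):
        base = self.next_addr
        n_granules = (size + GRANULE - 1) // GRANULE
        tag = self.rng.randrange(16)
        while tag == self.last_tag:  # neighbors get different tags
            tag = self.rng.randrange(16)
        self.last_tag = tag
        for g in range(n_granules):
            self.mem_tags[base // GRANULE + g] = tag
        self.next_addr = base + n_granules * GRANULE
        return base | (tag << TAG_SHIFT)   # tagged pointer

    def check(self, ptr):
        """Return the untagged address, or raise if the tags disagree."""
        tag = (ptr >> TAG_SHIFT) & 0xF
        addr = ptr & ((1 << TAG_SHIFT) - 1)
        if self.mem_tags.get(addr // GRANULE) != tag:
            raise TagCheckFault(f"tag mismatch at {addr:#x}")
        return addr


heap = TaggedHeap(seed=1)
buf = heap.malloc(32)      # two granules
other = heap.malloc(32)    # adjacent allocation with a different tag
try:
    heap.check(buf + 32)   # one byte past buf lands in other's first granule
    overflow_caught = False
except TagCheckFault:
    overflow_caught = True
```

Because detection depends on neighboring allocations having different tags, how tags are assigned is a central design choice for any defense built on MTE.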
Constant-Time Foundations for the New Spectre Era
PLDI '20. The constant-time discipline is a software-based countermeasure used to protect high-assurance cryptographic implementations against timing side-channel attacks. Constant-time is effective (it protects against many known attacks), rigorous (it can be formalized using program semantics), and amenable to automated verification. Yet the advent of microarchitectural attacks makes constant-time as it exists today far less useful. This paper lays foundations for constant-time programming in the presence of speculative and out-of-order execution. We present an operational semantics and a formal definition of constant-time programs in this extended setting. Our semantics eschews formalizing microarchitectural features (which are instead assumed to be under adversary control) and yields a notion of constant-time that retains the elegance and tractability of the usual notion. We demonstrate the relevance of our semantics in two ways: first, by contrasting existing Spectre-like attacks with our definition of constant-time; second, by implementing a static analysis tool, Pitchfork, which detects violations of our extended constant-time property in real-world cryptographic libraries.
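The sequential constant-time discipline that this paper extends can be illustrated by contrasting an early-exit comparison, whose running time depends on where the first mismatch occurs, with a version that touches every byte and branches only on the final result, plus a branch-free select. This is a sketch of the coding pattern only: Python itself gives no timing guarantees (its standard library provides `hmac.compare_digest` for real comparisons), and nothing here models the speculative execution that the paper's semantics and Pitchfork address.

```python
def leaky_equal(a: bytes, b: bytes) -> bool:
    """Early-exit comparison: running time reveals the first mismatching position."""
    if len(a) != len(b):
        return False
    for x, y in zip(a, b):
        if x != y:          # secret-dependent branch and early exit
            return False
    return True


def ct_equal(a: bytes, b: bytes) -> bool:
    """Constant-time style: no secret-dependent branches or early exit."""
    if len(a) != len(b):    # lengths are treated as public
        return False
    diff = 0
    for x, y in zip(a, b):
        diff |= x ^ y       # accumulate differences without branching
    return diff == 0


def ct_select(bit: int, x: int, y: int) -> int:
    """Branch-free choice of x (bit == 1) or y (bit == 0) for 32-bit values."""
    mask = -bit & 0xFFFFFFFF          # all ones if bit == 1, else zero
    return (x & mask) | (y & ~mask & 0xFFFFFFFF)
```

Code written in this style can still leak under speculation, for example when a bounds check is bypassed speculatively, which is the gap the extended constant-time definition targets.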