Co-regulated Transcripts Associated to Cooperating eSNPs Define Bi-fan Motifs in Human Gene Networks
Associations between the level of single transcripts and single corresponding genetic variants, expression single nucleotide polymorphisms (eSNPs), have been extensively studied and reported. However, most expression traits are complex, involving the cooperative action of multiple SNPs at different loci affecting multiple genes. Finding these cooperating eSNPs by exhaustive search has proven to be statistically challenging. In this paper we use sequencing data available alongside transcriptional profiles in the same cohorts to identify two kinds of usual suspects: eSNPs that alter coding sequences and eSNPs within the span of transcription factors (TFs). We use a computational framework that considers triplets, each comprising a SNP and two associated genes. We examine pairs of triplets whose cooperating source eSNPs are both associated with the same pair of target genes. We characterize such quartets through their genomic, topological and functional properties. We establish that this regulatory structure of cooperating quartets is frequent in real data, but is rarely observed in permutations. eSNP sources are mostly located on different chromosomes and away from their targets. In the majority of quartets, SNPs affect the expression of the two gene targets independently of one another, suggesting a mutually independent rather than a directionally dependent effect. Furthermore, the directions in which the minor allele count of the SNP affects gene expression within quartets are consistent, so that the two source eSNPs either both have the same effect on the target genes or both affect one gene in the opposite direction to the other. Same-effect eSNPs are observed more often than expected by chance. Cooperating quartets reported here in a human system might correspond to bi-fans, a known network motif of four nodes previously described in model organisms.
Overall, our analysis offers insights regarding the fine motif structure of human regulatory networks.
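The quartet structure described in this abstract (two source eSNPs, each associated with the same two target genes) can be illustrated with a minimal sketch. The function name `find_quartets`, the input format (a list of SNP-gene association pairs), and the identifiers in the usage note are assumptions made for illustration; the sketch only enumerates the bi-fan topology and does not reproduce the paper's statistical framework or permutation testing.

```python
from collections import defaultdict
from itertools import combinations


def find_quartets(associations):
    """Enumerate bi-fan quartets from (snp, gene) association pairs.

    A quartet is (snp1, snp2, gene1, gene2) where both SNPs are
    associated with both genes. Illustrative only: no significance
    testing is performed here.
    """
    # Collect the set of target genes for every source SNP.
    targets = defaultdict(set)
    for snp, gene in associations:
        targets[snp].add(gene)

    # Each (SNP, gene pair) is a triplet; index triplets by their gene pair.
    sources_by_pair = defaultdict(set)
    for snp, genes in targets.items():
        for pair in combinations(sorted(genes), 2):
            sources_by_pair[pair].add(snp)

    # Two triplets sharing the same gene pair form a quartet.
    quartets = []
    for (gene1, gene2), snps in sorted(sources_by_pair.items()):
        for snp1, snp2 in combinations(sorted(snps), 2):
            quartets.append((snp1, snp2, gene1, gene2))
    return quartets
```

For example, with the hypothetical associations `[("rs1", "A"), ("rs1", "B"), ("rs2", "A"), ("rs2", "B"), ("rs3", "A")]`, only `rs1` and `rs2` share the gene pair `A`, `B`, so a single quartet is returned.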
Metaseq: Privacy Preserving Meta-analysis of Sequencing-based Association Studies
Human genetics recently transitioned from GWAS to studies based on NGS data. For GWAS, small effects dictated large sample sizes, typically made possible through meta-analysis by exchanging summary statistics across consortia. NGS studies test groups of potentially causal alleles along each gene for association. They are subject to similar power constraints and are therefore likely to resort to meta-analysis as well. A problem arises when considering the privacy of the genetic information during the data-exchange process. Many scoring schemes for NGS association rely on the frequency of each variant, thus requiring the exchange of the identity of the sequenced variant. Because such variants are often rare, this exchange can reveal the identity of their carriers and jeopardize privacy. We have thus developed MetaSeq, a protocol for meta-analysis of genome-wide sequencing data by multiple collaborating parties, scoring association for rare variants pooled per gene across all parties. We tackle the challenge of tallying frequency counts of rare, sequenced alleles for meta-analysis of sequencing data without disclosing the allele identity and counts, thereby protecting sample identity. This apparently paradoxical exchange of information is achieved through cryptographic means. The key idea is that parties encrypt the identity of genes and variants. When they transfer information about frequency counts in cases and controls, the exchanged data does not convey the identity of a mutation and therefore does not expose carrier identity. The exchange relies on a third party, trusted to follow the protocol although not trusted to learn about the raw data. We show the applicability of this method to publicly available exome sequencing data from multiple studies, simulating phenotypic information for powerful meta-analysis. The MetaSeq software is publicly available as open source.
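The idea of tallying counts over blinded identifiers can be sketched with a toy example. This is not the MetaSeq protocol: it assumes, purely for illustration, that the collaborating parties share a secret key unknown to the third party and blind gene and variant names with a keyed hash (HMAC-SHA256). The names `blind`, `party_report`, and `aggregate` are hypothetical, and a real deployment would need further protections that are out of scope here.

```python
import hashlib
import hmac


def blind(key: bytes, identifier: str) -> str:
    """Replace an identifier by a keyed hash (assumed blinding scheme)."""
    return hmac.new(key, identifier.encode("utf-8"), hashlib.sha256).hexdigest()


def party_report(key: bytes, counts: dict) -> dict:
    """Blind one party's counts before sending them out.

    counts maps (gene, variant) -> (carriers_in_cases, carriers_in_controls).
    """
    report = {}
    for (gene, variant), case_control in counts.items():
        token = (blind(key, gene), blind(key, gene + ":" + variant))
        report[token] = case_control
    return report


def aggregate(reports: list) -> dict:
    """Third-party step: sum counts per blinded token without the key."""
    totals = {}
    for report in reports:
        for token, (cases, controls) in report.items():
            prev_cases, prev_controls = totals.get(token, (0, 0))
            totals[token] = (prev_cases + cases, prev_controls + controls)
    return totals
```

Because every party derives the same token for the same variant, the aggregator can pool counts per gene and per variant while seeing only opaque hashes.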
Fast hyperboloid decision tree algorithms
Hyperbolic geometry is gaining traction in machine learning for its
effectiveness at capturing hierarchical structures in real-world data.
Hyperbolic spaces, where neighborhoods grow exponentially, offer substantial
advantages and consistently deliver state-of-the-art results across diverse
applications. However, hyperbolic classifiers often grapple with computational
challenges. Methods reliant on Riemannian optimization frequently exhibit
sluggishness, stemming from the increased computational demands of operations
on Riemannian manifolds. In response to these challenges, we present hyperDT, a
novel extension of decision tree algorithms into hyperbolic space. Crucially,
hyperDT eliminates the need for computationally intensive Riemannian
optimization, numerically unstable exponential and logarithmic maps, or
pairwise comparisons between points by leveraging inner products to adapt
Euclidean decision tree algorithms to hyperbolic space. Our approach is
conceptually straightforward and maintains constant-time decision complexity
while mitigating the scalability issues inherent in high-dimensional Euclidean
spaces. Building upon hyperDT we introduce hyperRF, a hyperbolic random forest
model. Extensive benchmarking across diverse datasets underscores the superior
performance of these models, providing a swift, precise, accurate, and
user-friendly toolkit for hyperbolic data analysis.
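To make the inner-product idea concrete, the following sketch shows one plausible form of a constant-time split on the hyperboloid. It assumes points are stored with the timelike coordinate at index 0 and that a split is a hyperplane through the origin whose normal has nonzero entries only in the timelike coordinate and one spatial coordinate, parameterized by an angle `theta`. The function names and this exact parameterization are assumptions for illustration, not a statement of the hyperDT implementation.

```python
import math


def lift_to_hyperboloid(spatial):
    """Map spatial coordinates u to the hyperboloid point (x0, u).

    x0 = sqrt(1 + |u|^2), so that -x0^2 + |u|^2 = -1 (curvature -1).
    """
    x0 = math.sqrt(1.0 + sum(c * c for c in spatial))
    return [x0] + list(spatial)


def split_side(point, dim, theta):
    """Decide which side of an origin-passing hyperplane a point lies on.

    The test is the sign of a Euclidean dot product with the normal
    (-cos(theta), 0, ..., sin(theta), ..., 0), which needs only two
    coordinates and no exponential or logarithmic maps.
    """
    value = math.sin(theta) * point[dim] - math.cos(theta) * point[0]
    return value > 0.0
```

Under these assumptions, `split_side(lift_to_hyperboloid([2.0, 0.0]), 1, math.pi / 3)` evaluates to `True`, while the hyperboloid origin `lift_to_hyperboloid([0.0, 0.0])` falls on the other side for the same `theta`. A tree would then search over `dim` and `theta` in place of the usual axis-aligned thresholds.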
Allelic Selection of Amplicons in Glioblastoma Revealed by Combining Somatic and Germline Analysis
Cancer is a disease driven by a combination of inherited risk alleles coupled with the acquisition of somatic mutations, including amplification and deletion of genomic DNA. Potential relationships between the inherited and somatic aspects of the disease have only rarely been examined on a genome-wide level. Applying a novel integrative analysis of SNP and copy number measurements, we queried the tumor and normal-tissue genomes of 178 glioblastoma patients from the Cancer Genome Atlas project for preferentially amplified alleles, under the hypothesis that oncogenic germline variants will be selectively amplified in the tumor environment. Selected alleles are revealed by allelic imbalance in amplification across samples. This general approach is based on genetic principles and provides a method for identifying important tumor-related alleles. We find that SNP alleles that are most significantly overrepresented in amplicons tend to occur in genes involved with regulation of kinase and transferase activity, and many of these genes are known contributors to gliomagenesis. The analysis also implicates variants in synapse genes. By incorporating gene expression data, we demonstrate synergy between preferential allelic amplification and expression in DOCK4 and EGFR. Our results support the notion that combining germline and tumor genetic data can identify regions relevant to cancer biology
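Allelic imbalance in amplification across samples can be illustrated with a simple exact test. The sketch below assumes that, for one SNP, we have counted in how many heterozygous tumors the amplicon favors allele A versus allele B, and asks whether that split departs from 50/50 using a two-sided exact binomial test. The function name and the choice of test are assumptions for illustration; the abstract does not specify the statistic the authors used.

```python
from math import comb


def allelic_imbalance_pvalue(n_a: int, n_b: int) -> float:
    """Two-sided exact binomial p-value under a 50/50 null.

    n_a, n_b: number of samples in which allele A or allele B is the
    preferentially amplified allele (assumed input format).
    """
    n = n_a + n_b
    if n == 0:
        return 1.0
    # Under p = 0.5 every outcome k has probability comb(n, k) / 2**n,
    # so outcomes "at least as extreme" are those with a binomial
    # coefficient no larger than the observed one.
    observed = comb(n, n_a)
    extreme = sum(comb(n, k) for k in range(n + 1) if comb(n, k) <= observed)
    return extreme / 2 ** n
```

A split of 9 versus 1 gives a p-value of 22/1024, whereas an even 5 versus 5 split gives 1.0. In a genome-wide setting such per-SNP values would still need multiple-testing correction.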