
619 research outputs found

Graph-guided joint prediction of class label and clinical scores for the Alzheimer’s disease

Author: Liu Yufeng
Shen Dinggang
Yu Guan
Publication date: 01/01/2016

Accurate diagnosis of Alzheimer’s disease and its prodromal stage, i.e., mild cognitive impairment, is very important for early treatment. Over the last decade, various machine learning methods have been proposed to predict disease status and clinical scores from brain images. It is worth noting that many features extracted from brain images are correlated significantly. In this case, feature selection combined with the additional correlation information among features can effectively improve classification/regression performance. Typically, the correlation information among features can be modeled by the connectivity of an undirected graph, where each node represents one feature and each edge indicates that the two involved features are correlated significantly. In this paper, we propose a new graph-guided multi-task learning method incorporating this undirected graph information to predict multiple response variables (i.e., class label and clinical scores) jointly. Specifically, based on the sparse undirected feature graph, we utilize a new latent group Lasso penalty to encourage the correlated features to be selected together. Furthermore, this new penalty also encourages the intrinsic correlated tasks to share a common feature subset. To validate our method, we have performed many numerical studies using simulated datasets and the Alzheimer’s Disease Neuroimaging Initiative dataset. Compared with the other methods, our proposed method has very promising performance

PubMed Central

Carolina Digital Repository

Estimating heritability of drug-induced liver injury from common variants and implications for future study designs

Author: Hripcsak George M.
Overby Casey Lynnette
Shen Yufeng
Publication venue: 'Columbia University Libraries/Information Services'
Publication date: 01/01/2014

Recent genome-wide association studies identified certain human leukocyte antigen (HLA) alleles as the major risk factors for drug-induced liver injury (DILI). While these alleles often confer a large relative risk, their predictive values are quite low because idiosyncratic DILI has low prevalence. Finding additional risk factors is important for precision medicine. However, the optimal design of further genetic studies is hindered by the uncertain overall heritability of DILI. This is a common problem for low-prevalence pharmacological traits, since it is difficult to obtain clinical outcome data in families. Here we estimated the heritability (h²) of DILI from case-control genome-wide single nucleotide polymorphism (SNP) data using a method based on random effect models. We estimated the proportion of h² captured by common SNPs for DILI to be between 0.3 and 0.5. For co-amoxiclav-induced DILI, chromosome 6 explained part of the heritability, indicating additional contributions from common variants yet to be found. We performed simulations to assess the robustness of the h² estimate with limited sample size under low prevalence, a condition typical of studies on idiosyncratic pharmacological traits. Our findings suggest that common variants outside of HLA contribute to DILI susceptibility; therefore, it is valuable to conduct further GWAS with expanded case collection.

Crossref

Columbia University Academic Commons

PubMed Central


Template-based prediction of protein structure with deep learning

Author: Shen Yufeng
Zhang Haicang
Publication venue: 'Columbia University Libraries/Information Services'
Publication date: 01/01/2020

Background: Accurate prediction of protein structure is fundamentally important to understand biological function of proteins. Template-based modeling, including protein threading and homology modeling, is a popular method for protein tertiary structure prediction. However, accurate template-query alignment and template selection are still very challenging, especially for the proteins with only distant homologs available.
Results: We propose a new template-based modelling method called ThreaderAI to improve protein tertiary structure prediction. ThreaderAI formulates the task of aligning query sequence with template as the classical pixel classification problem in computer vision and naturally applies deep residual neural network in prediction. ThreaderAI first employs deep learning to predict residue-residue aligning probability matrix by integrating sequence profile, predicted sequential structural features, and predicted residue-residue contacts, and then builds template-query alignment by applying a dynamic programming algorithm on the probability matrix. We evaluated our methods both in generating accurate template-query alignment and protein threading. Experimental results show that ThreaderAI outperforms currently popular template-based modelling methods HHpred, CNFpred, and the latest contact-assisted method CEthreader, especially on the proteins that do not have close homologs with known structures. In particular, in terms of alignment accuracy measured with TM-score, ThreaderAI outperforms HHpred, CNFpred, and CEthreader by 56, 13, and 11%, respectively, on template-query pairs at the similarity of fold level from SCOPe data. And on CASP13’s TBM-hard data, ThreaderAI outperforms HHpred, CNFpred, and CEthreader by 16, 9 and 8% in terms of TM-score, respectively.
Conclusions: These results demonstrate that with the help of deep learning, ThreaderAI can significantly improve the accuracy of template-based structure prediction, especially for distant-homology proteins.
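
The abstract's last alignment step, dynamic programming over a residue-residue aligning probability matrix, can be illustrated with a generic global-alignment recurrence. This is only a sketch under stated assumptions and not the ThreaderAI implementation: the function name `align_from_probabilities`, the constant `gap_penalty`, and the toy matrix are hypothetical, and the matrix is assumed to have already been produced by some predictor.

```python
def align_from_probabilities(P, gap_penalty=0.1):
    """Return aligned (query_index, template_index) pairs maximizing the summed
    probabilities minus a constant per-gap penalty (hypothetical scoring)."""
    n, m = len(P), len(P[0])
    # score[i][j]: best score aligning the first i query residues
    # with the first j template residues.
    score = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = score[i - 1][0] - gap_penalty
    for j in range(1, m + 1):
        score[0][j] = score[0][j - 1] - gap_penalty
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + P[i - 1][j - 1],  # align residue i with j
                score[i - 1][j] - gap_penalty,          # query residue unaligned
                score[i][j - 1] - gap_penalty,          # template residue unaligned
            )
    # Trace back from the bottom-right corner to recover the aligned pairs.
    pairs = []
    i, j = n, m
    while i > 0 and j > 0:
        if score[i][j] == score[i - 1][j - 1] + P[i - 1][j - 1]:
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif score[i][j] == score[i - 1][j] - gap_penalty:
            i -= 1
        else:
            j -= 1
    pairs.reverse()
    return pairs


# Toy 3 x 4 matrix (3 query residues, 4 template residues); values are made up.
P = [
    [0.9, 0.1, 0.0, 0.0],
    [0.1, 0.1, 0.8, 0.1],
    [0.0, 0.0, 0.1, 0.9],
]
print(align_from_probabilities(P))  # [(0, 0), (1, 2), (2, 3)]
```

Here the second template residue is left unaligned because skipping it costs less than forcing a low-probability match.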

Columbia University Academic Commons

Gold on graphene as a substrate for surface enhanced Raman scattering study

Author: Hao Yufeng
Hu Hailong
Ni Zhenhua
Shen Ze Xiang
Thong John TL
Wang Yingying
Wong Choun Pei
Yu Ting
Publication venue: 'AIP Publishing'
Publication date: 18/10/2010

In this paper, we report our study on gold (Au) films with different thicknesses deposited on single layer graphene (SLG) as surface enhanced Raman scattering (SERS) substrates for the characterization of rhodamine (R6G) molecules. We find that an Au film with a thickness of ~7 nm deposited on SLG is an ideal substrate for SERS, giving the strongest Raman signals for the molecules and the weakest photoluminescence (PL) background. While Au films effectively enhance both the Raman and PL signals of molecules, SLG effectively quenches the PL signals from the Au film and molecules. The former is due to the electromagnetic mechanism involved while the latter is due to the strong resonance energy transfer from Au to SLG. Hence, the combination of Au films and SLG can be widely used in the characterization of low concentration molecules with relatively weak Raman signals. Comment: 11 pages, 4 figures

arXiv.org e-Print Archive

Crossref

ScholarBank@NUS

Consistency of Lloyd's Algorithm Under Perturbations

Author: Bhamidi Shankar
Liu Yufeng
Patel Dhruv
Pipiras Vladas
Shen Hui
Publication date: 01/09/2023

In the context of unsupervised learning, Lloyd's algorithm is one of the most widely used clustering algorithms. It has inspired a plethora of work investigating the correctness of the algorithm under various settings with ground truth clusters. In particular, in 2016, Lu and Zhou have shown that the mis-clustering rate of Lloyd's algorithm on $n$ independent samples from a sub-Gaussian mixture is exponentially bounded after $O(\log(n))$ iterations, assuming proper initialization of the algorithm. However, in many applications, the true samples are unobserved and need to be learned from the data via pre-processing pipelines such as spectral methods on appropriate data matrices. We show that the mis-clustering rate of Lloyd's algorithm on perturbed samples from a sub-Gaussian mixture is also exponentially bounded after $O(\log(n))$ iterations under the assumptions of proper initialization and that the perturbation is small relative to the sub-Gaussian noise. In canonical settings with ground truth clusters, we derive bounds for algorithms such as $k$-means++ to find good initializations and thus leading to the correctness of clustering via the main result. We show the implications of the results for pipelines measuring the statistical significance of derived clusters from data such as SigClust. We use these general results to derive implications in providing theoretical guarantees on the misclustering rate for Lloyd's algorithm in a host of applications, including high-dimensional time series, multi-dimensional scaling, and community detection for sparse networks via spectral clustering. Comment: Preprint version
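
For readers unfamiliar with the two procedures the abstract names, the following is a minimal standard-library sketch of Lloyd's algorithm and of k-means++ seeding. It is not the authors' code and says nothing about their bounds; the function names (`sq_dist`, `kmeanspp_init`, `lloyd`) and the toy data are hypothetical. Initial centers are passed in explicitly, mirroring the abstract's "proper initialization" assumption.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeanspp_init(points, k, rng):
    """k-means++ seeding: each new center is drawn with probability
    proportional to its squared distance from the nearest chosen center."""
    centers = [rng.choice(points)]
    while len(centers) < k:
        d2 = [min(sq_dist(p, c) for c in centers) for p in points]
        if sum(d2) == 0:
            # All points coincide with existing centers; fall back to uniform.
            centers.append(rng.choice(points))
        else:
            centers.append(rng.choices(points, weights=d2, k=1)[0])
    return centers


def lloyd(points, centers, iters=20):
    """Alternate assignment and mean-update steps for a fixed number of iterations."""
    centers = list(centers)
    k = len(centers)
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point goes to its nearest center.
        labels = [min(range(k), key=lambda j: sq_dist(p, centers[j])) for p in points]
        # Update step: each center moves to the mean of its assigned points.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centers[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return labels, centers


# Toy data: two well-separated groups of four points each.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0), (11.0, 11.0)]

# One starting center per group, standing in for a "proper" initialization.
labels, centers = lloyd(points, [(0.0, 0.0), (10.0, 10.0)])
print(labels)   # [0, 0, 0, 0, 1, 1, 1, 1]
print(centers)  # [(0.5, 0.5), (10.5, 10.5)]

# Alternatively, seed with k-means++ (randomized, so the result depends on the seed).
seeded = kmeanspp_init(points, 2, random.Random(0))
labels_pp, centers_pp = lloyd(points, seeded)
```

Separating seeding from the iteration makes the dependence on initialization explicit: Lloyd's algorithm only finds a local optimum, so a poor pair of starting centers can leave it stuck with a wrong partition even on data this simple.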

arXiv.org e-Print Archive

A Digital Signal Recovery Technique Using DNNs for LEO Satellite Communication Systems

Author: Huang Yonghui
Pedersen Gert Frølund
Shen Ming
Wang Zhugang
Wei Wei
Zhang Yufeng
Publication date: 20/05/2020

This article proposes a new digital signal recovery (DSR) technique for next-generation power efficient low Earth orbit (LEO) satellite-to-ground communication systems, which feature additive white Gaussian noise (AWGN) channel and significant power variation. This technique utilizes the prior knowledge [i.e., nonlinearities of radio frequency power amplifiers (RF-PAs)] of space-borne transmitters to improve the quality of the signal received at ground stations by modeling and mitigating the imperfection using deep neural networks (DNNs). Benefiting from its robustness against noise and power variation, the proposed DNN based DSR technique (DNN-DSR) can correct high signal distortions caused by the nonlinearities and hence allows RF-PAs to work close to their saturation region, leading to a high power efficiency of the LEO satellites. This work has been validated by both simulations and experiments, in comparison with the power back-off technique as well as memory polynomial-based DSR solutions. Experimental results show that the DNN-DSR technique can increase the drain efficiency of the space-borne RF-PA from 32.6% to 45% while maintaining the same error vector magnitude as the power back-off technique. It has also been demonstrated that the proposed DNN-DSR technique can handle a signal power variation of 12 dB, which is challenging for conventional solutions.

VBN

A pan-cancer analysis of driver gene mutations, DNA methylation and gene expressions reveals that chromatin remodeling is a major mechanism inducing global changes in cancer epigenomes.

Author: Kim Kyung In
Rabadan Raul
Shen Yufeng
Tycko Benjamin
Wang Shuang
Youn Ahrim
Publication venue: The Mouseion at the JAXlibrary
Publication date: 01/01/2018

BACKGROUND: Recent large-scale cancer sequencing studies have discovered many novel cancer driver genes (CDGs) in human cancers. Some studies also suggest that CDG mutations contribute to cancer-associated epigenomic and transcriptomic alterations across many cancer types. Here we aim to improve our understanding of the connections between CDG mutations and altered cancer cell epigenomes and transcriptomes at the pan-cancer level, and of how these connections contribute to the known association between epigenome and transcriptome. METHOD: Using multi-omics data including somatic mutation, DNA methylation, and gene expression data of 20 cancer types from The Cancer Genome Atlas (TCGA) project, we conducted a pan-cancer analysis to identify CDGs that, when mutated, have strong associations with genome-wide methylation or expression changes across cancer types, which we refer to as methylation driver genes (MDGs) or expression driver genes (EDGs), respectively. RESULTS: We identified 32 MDGs, among which eight are known chromatin modification or remodeling genes. Many of the remaining 24 MDGs are connected to chromatin regulators through either regulating their transcription or physically interacting with them as potential co-factors. We identified 29 EDGs, 26 of which are also MDGs. Further investigation of the promoter methylation and expression alteration patterns of the target genes of these 26 overlapping driver genes shows that hyper-methylation of target genes' promoters is significantly associated with down-regulation of the same target genes, and hypo-methylation of target genes' promoters is significantly associated with up-regulation of the same target genes. CONCLUSION: This finding suggests a pivotal role for genetically driven changes in chromatin remodeling in shaping DNA methylation and gene expression patterns during tumor development.

The Jackson Laboratory: The Mouseion at the JAXlibrary

Columbia University Academic Commons

Directory of Open Access Journals

The Francis Crick Institute

Two Heads Are Better Than One: Improving Fake News Video Detection by Correlating with Neighbors

Author: Cao Juan
Chua Tat-Seng
Ji Wei
Qi Peng
Shen Yufeng
Zhao Yuyang
Publication date: 08/06/2023

The prevalence of short video platforms has spawned a lot of fake news videos, which have stronger propagation ability than textual fake news. Thus, automatically detecting fake news videos has been an important countermeasure in practice. Previous works commonly verify each news video individually with multimodal information. Nevertheless, news videos from different perspectives regarding the same event are commonly posted together; they contain complementary or contradictory information and thus can be used to evaluate each other mutually. To this end, we introduce a new and practical paradigm, i.e., cross-sample fake news video detection, and propose a novel framework, Neighbor-Enhanced fakE news video Detection (NEED), which integrates the neighborhood relationship of news videos belonging to the same event. NEED can be readily combined with existing single-sample detectors and further enhance their performance with the proposed graph aggregation (GA) and debunking rectification (DR) modules. Specifically, given the feature representations obtained from single-sample detectors, GA aggregates the neighborhood information with the dynamic graph to enrich the features of independent samples. After that, DR explicitly leverages the relationship between debunking videos and fake news videos to refute the candidate videos via textual and visual consistency. Extensive experiments on the public benchmark demonstrate that NEED greatly improves the performance of both single-modal (up to 8.34% in accuracy) and multimodal (up to 4.97% in accuracy) base detectors. Codes are available at https://github.com/ICTMCG/NEED. Comment: To appear in ACL 2023 Findings

arXiv.org e-Print Archive

Characterizing Immunoglobulin Repertoire from Whole Blood by a Personal Genome Sequencer

Author: Feng Yaping
Gao Fan
Lin Edwin
Mack William J.
Shen Yufeng
Wang Kai
Publication venue: 'Columbia University Libraries/Information Services'
Publication date: 01/01/2013

In the human immune system, V(D)J recombination produces an enormously large repertoire of immunoglobulins (Ig) so that they can tackle different antigens from bacteria, viruses and tumor cells. Several studies have demonstrated the utility of next-generation sequencers such as Roche 454 and Illumina Genome Analyzer to characterize the repertoire of immunoglobulins. However, these techniques typically require separation of the B cell population from whole blood and require a few weeks for running the sequencers, so it may not be practical to implement them in clinical settings. Recently, the Ion Torrent personal genome sequencer has emerged as a tabletop sequencer that can be operated in a time-efficient and cost-effective manner. In this study, we explored the technical feasibility of using multiplex PCR to amplify V(D)J recombination for IgH directly from whole blood, then sequencing the amplicons with the Ion Torrent sequencer. The whole process, including data generation and analysis, can be completed in one day. We tested the method in a pilot study on patients with benign, atypical and malignant meningiomas. Despite the noisy data, we were able to compare the samples by their usage frequencies of the V segment, as well as their somatic hypermutation rates. In summary, our study suggested that it is technically feasible to perform clinical monitoring of V(D)J recombination within a day by personal genome sequencers.

Crossref

Columbia University Academic Commons

Directory of Open Access Journals

PubMed Central