412 research outputs found
Multivariate Information Fusion With Fast Kernel Learning to Kernel Ridge Regression in Predicting LncRNA-Protein Interactions
Long non-coding RNAs (lncRNAs) constitute a large class of transcribed RNA molecules. They are more than 200 nucleotides long and do not encode proteins. They play an important role in regulating gene expression by interacting with the homologous RNA-binding proteins. Because wet-lab experimental methods are laborious and time-consuming, computational approaches for predicting lncRNA-protein interactions (LPIs) deserve more attention. An in-depth review of state-of-the-art in silico investigations leads to the conclusion that there is still room for improving accuracy and speed. This paper proposes a novel method for identifying LPIs that employs Kernel Ridge Regression based on Fast Kernel Learning (LPI-FKLKRR). The approach uses four distinct similarity measures for the lncRNA space and the protein space, respectively. Notably, we extract Gene Ontology (GO) information for proteins in order to improve the quality of information in the protein space. When the heterogeneous kernels are integrated, Fast Kernel Learning (FastKL) handles the weight optimization. Kernel Ridge Regression (KRR) then produces the final predicted associations. Experimental results show that LPI-FKLKRR outperforms other LPI prediction schemes. On a benchmark dataset, our proposed model LPI-FKLKRR obtains the best Area Under the Precision-Recall Curve (AUPR) of 0.6950, outperforming the integrated LPLNP (AUPR: 0.4584), RWR (AUPR: 0.2827), CF (AUPR: 0.2357), LPIHN (AUPR: 0.2299), and LPBNI (AUPR: 0.3302). Together with the results of a case study on a novel dataset, this suggests that LPI-FKLKRR will be a useful tool for LPI prediction.
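The kernel-combination and KRR steps named in the abstract above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the kernel weights are assumed to be given rather than optimized by FastKL, and the function names, the regularization parameter `lam`, and the choice to average the scores from the two spaces are all assumptions made for the example.

```python
import numpy as np


def combine_kernels(kernels, weights):
    """Weighted sum of same-shaped kernel matrices; weights are taken as given."""
    combined = np.zeros_like(np.asarray(kernels[0], dtype=float))
    for weight, kernel in zip(weights, kernels):
        combined += weight * np.asarray(kernel, dtype=float)
    return combined


def krr_scores(kernel, labels, lam=1.0):
    """In-sample kernel ridge regression scores: K (K + lam * I)^-1 Y."""
    n = kernel.shape[0]
    return kernel @ np.linalg.solve(kernel + lam * np.eye(n), labels)


def predict_interactions(k_lnc, k_prot, y, lam=1.0):
    """Average the KRR scores from the lncRNA space and the protein space.

    Averaging the two spaces is an assumption of this sketch.
    """
    from_lnc = krr_scores(k_lnc, y, lam)          # rows of y index lncRNAs
    from_prot = krr_scores(k_prot, y.T, lam).T    # columns of y index proteins
    return (from_lnc + from_prot) / 2.0


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    y = (rng.random((5, 4)) > 0.6).astype(float)  # toy interaction matrix
    feats_lnc = [rng.normal(size=(5, 3)) for _ in range(2)]
    feats_prot = [rng.normal(size=(4, 3)) for _ in range(2)]
    # Linear kernels on random features stand in for real similarity kernels.
    k_lnc = combine_kernels([f @ f.T for f in feats_lnc], [0.5, 0.5])
    k_prot = combine_kernels([f @ f.T for f in feats_prot], [0.5, 0.5])
    print(predict_interactions(k_lnc, k_prot, y, lam=1.0).round(3))
```

Higher scores in the returned matrix would be read as more likely interactions. In practice the weights passed to `combine_kernels` would come from a kernel learning step rather than being fixed by hand.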
SBSM-Pro: Support Bio-sequence Machine for Proteins
Proteins play a pivotal role in biological systems. The use of machine
learning algorithms for protein classification can assist and even guide
biological experiments, offering crucial insights for biotechnological
applications. We propose a support bio-sequence machine for proteins, a model
specifically designed for biological sequence classification. This model starts
with raw sequences and groups amino acids based on their physicochemical
properties. It incorporates sequence alignment to measure the similarities
between proteins and uses a novel MKL approach to integrate various types of
information, utilizing support vector machines for classification prediction.
The results indicate that our model demonstrates commendable performance across
10 datasets in terms of the identification of protein function and
posttranslational modification. This research not only showcases
state-of-the-art work in protein classification but also paves the way for new
directions in this domain, representing a beneficial endeavour in the
development of platforms tailored for biological sequence classification.
SBSM-Pro is available for access at http://lab.malab.cn/soft/SBSM-Pro/. Comment: 38 pages, 9 figures
Effect of childbearing-age women’s family status on the health status of three generations: evidence from China
It is widely recognized that inequalities in social status cause inequalities in health. Women in a family often directly influence three generations (the women themselves, their children, and their parents), yet the effect of women’s family status on their own health and on that of the generations before and after them is not clear. Using data from the China Family Panel Studies, this study applied an ordered response model to investigate the effect of childbearing-age women’s family status on the health status of three generations. The results showed that increases in childbearing-age women’s family status improved the health status of the women themselves and their children. Unlike previous studies, however, we found that higher family status did not improve parents’ health status but decreased it. The mechanism analysis indicated that women’s family status influenced the health status of three generations through economic conditions, resource allocation, and child discipline. The results held after robustness testing. Our findings contribute to knowledge in related fields and provide theoretical support for policies that empower women.
FKRR-MVSF: A Fuzzy Kernel Ridge Regression Model for Identifying DNA-Binding Proteins by Multi-View Sequence Features via Chou's Five-Step Rule
DNA-binding proteins play an important role in cell metabolism. In biological laboratories, the detection methods for DNA-binding proteins include yeast one-hybrid methods, bacterial singles, X-ray crystallography, and others, but these methods require a lot of labor, material, and time. In recent years, many computation-based approaches have been proposed to detect DNA-binding proteins. In this paper, a machine learning-based method called the Fuzzy Kernel Ridge Regression model based on Multi-View Sequence Features (FKRR-MVSF) is proposed to identify DNA-binding proteins. First, multi-view sequence features are extracted from protein sequences. Next, a Multiple Kernel Learning (MKL) algorithm is employed to combine the multiple features. Finally, a Fuzzy Kernel Ridge Regression (FKRR) model is built to detect DNA-binding proteins. Compared with other methods, our model achieves good results. Our method obtains accuracies of 83.26% and 81.72% on two benchmark datasets (PDB1075 and PDB186), respectively.
Multiple Kronecker RLS fusion-based link propagation for drug-side effect prediction
Drug-side effect prediction has become an essential area of research in the
field of pharmacology. As the use of medications continues to rise, so does the
importance of understanding and mitigating the potential risks associated with
them. At present, researchers have turned to data-driven methods to predict
drug-side effects. Drug-side effect prediction is a link prediction problem,
and the related data can be described from various perspectives. To process
these kinds of data, a multi-view method, called Multiple Kronecker RLS
fusion-based link propagation (MKronRLSF-LP), is proposed. MKronRLSF-LP extends
the Kron-RLS by finding the consensus partitions and multiple graph Laplacian
constraints in the multi-view setting. Both of these multi-view settings
contribute to a higher quality result. Extensive experiments have been
conducted on drug-side effect datasets, and our empirical results provide
evidence that our approach is effective and robust. Comment: Transactions on Machine Learning Research (TMLR 2024)
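For orientation, the base Kron-RLS model that the abstract above says MKronRLSF-LP extends can be sketched as follows. This covers only the standard single-view Kronecker regularized least squares predictor, computed through the eigendecompositions of the two kernels; the consensus partitions and graph Laplacian constraints of MKronRLSF-LP are not reproduced. The function name `kron_rls` and the parameter `lam` are assumptions of the example.

```python
import numpy as np


def kron_rls(k_drug, k_side, y, lam=1.0):
    """Kronecker RLS fitted scores without forming the Kronecker product.

    Equivalent to K (K + lam * I)^-1 vec(Y) with K = k_side (x) k_drug
    (column-major vec), computed from the eigendecompositions of the
    two symmetric kernel matrices.
    """
    eig_drug, u = np.linalg.eigh(k_drug)
    eig_side, v = np.linalg.eigh(k_side)
    products = np.outer(eig_drug, eig_side)   # eigenvalues of the Kronecker kernel
    shrink = products / (products + lam)      # ridge shrinkage per eigen-direction
    return u @ (shrink * (u.T @ y @ v)) @ v.T


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    drug_feats = rng.normal(size=(6, 4))
    side_feats = rng.normal(size=(5, 4))
    # Linear kernels on random features stand in for real drug and side-effect kernels.
    k_drug = drug_feats @ drug_feats.T
    k_side = side_feats @ side_feats.T
    y = (rng.random((6, 5)) > 0.7).astype(float)  # toy drug x side-effect labels
    print(kron_rls(k_drug, k_side, y, lam=0.5).round(3))
```

Working with the two small eigendecompositions avoids building the full Kronecker kernel, whose side length is the number of drugs times the number of side effects.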
Propagation-invariant strongly longitudinally polarized toroidal pulses
Recent advancements in optical, terahertz, and microwave systems have
unveiled non-transverse optical toroidal pulses characterized by skyrmionic
topologies, fractal-like singularities, space-time nonseparability, and
anapole-exciting ability. Despite this, the longitudinally polarized fields of
canonical toroidal pulses notably lag behind their transverse counterparts in
magnitude. Interestingly, although mushroom-cloud-like toroidal vortices with
strong longitudinal fields are common in nature, they remain unexplored in the
realm of electromagnetics. Here, we present strongly longitudinally polarized
toroidal pulses (SLPTPs) which boast a longitudinal component amplitude
exceeding that of the transverse component by over tenfold. This unique
polarization property endows SLPTPs with robust propagation characteristics,
showcasing nondiffracting behavior. The propagation-invariant strongly
longitudinally polarized field holds promise for pioneering light-matter
interactions, far-field superresolution microscopy, and high-capacity wireless
communication utilizing three polarizations.
MDA-SKF: Similarity Kernel Fusion for Accurately Discovering miRNA-Disease Association
Identifying accurate associations between miRNAs and diseases is beneficial for the diagnosis and treatment of human diseases. It is especially important to develop an efficient method to detect associations between miRNAs and diseases. Traditional experimental methods have high precision, but the process is complicated and time-consuming. Various computational methods have been developed to uncover potential associations, based on the assumption that similar miRNAs are always related to similar diseases. In this paper, we propose an accurate method, MDA-SKF, to uncover potential miRNA-disease associations. We first extract three miRNA similarity kernels (miRNA functional similarity, miRNA sequence similarity, Hamming profile similarity for miRNA) and three disease similarity kernels (disease semantic similarity, disease functional similarity, Hamming profile similarity for disease) in two subspaces, respectively. Then, because some initial information may be lost during integration and some noise may exist in the integrated similarity kernel, we propose a novel Similarity Kernel Fusion (SKF) method to integrate multiple similarity kernels. Finally, we apply the Laplacian Regularized Least Squares (LapRLS) method to the integrated kernel to find potential associations. MDA-SKF is evaluated by three evaluation methods, namely global leave-one-out cross validation (LOOCV), local LOOCV, and 5-fold cross validation (CV), and achieves AUCs of 0.9576, 0.8356, and 0.9557, respectively. Compared with seven existing methods, MDA-SKF has outstanding performance on global LOOCV and 5-fold CV. We also conduct case studies on 32 diseases to further analyze the performance of MDA-SKF. Furthermore, 3200 candidate associations are obtained, and a majority of them can be confirmed. This demonstrates that MDA-SKF is an accurate and efficient computational tool for guiding traditional experiments.
Discovering Cancer Subtypes via an Accurate Fusion Strategy on Multiple Profile Data
Discovering cancer subtypes is useful for guiding clinical treatment of multiple cancers. Advances in tissue profiling technologies have accumulated diverse types of data. Based on these types of expression data, various computational methods have been proposed to predict cancer subtypes. It is crucial to study how to better integrate these multiple profiles of data. In this paper, we collect multiple profiles of data for five cancers from The Cancer Genome Atlas (TCGA). Then, we construct three similarity kernels for all patients of the same cancer from gene expression, miRNA expression, and isoform expression data. We also propose a novel unsupervised multiple kernel fusion method, Similarity Kernel Fusion (SKF), to integrate the three similarity kernels into one combined kernel. Finally, we apply spectral clustering to the integrated kernel to predict cancer subtypes. In the experiments, the P-values from the Cox regression model and survival curve analysis can be used to evaluate the performance of the predicted subtypes on three datasets. Our kernel fusion method, SKF, has outstanding performance compared with single-kernel and other multiple kernel fusion strategies. This demonstrates that our method can identify more accurate subtypes for various kinds of cancers. Our cancer subtype prediction method can identify essential genes and biomarkers for disease diagnosis and prognosis, and we also discuss the possible side effects of therapies and treatment.
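The last two steps of the abstract above, fusing several patient similarity kernels and clustering on the result, can be sketched in a simplified form. Two substitutions are made and should be read as assumptions: a plain unweighted average stands in for SKF, and a two-way split on the sign of the second eigenvector of the normalized graph Laplacian (the Fiedler vector) stands in for full k-way spectral clustering. The function names are hypothetical.

```python
import numpy as np


def average_kernels(kernels):
    """Unweighted mean of same-shaped kernels; a simple stand-in for SKF."""
    return np.mean(np.stack(kernels), axis=0)


def spectral_bipartition(similarity):
    """Split samples into two groups with the normalized graph Laplacian.

    Assumes a symmetric, non-negative similarity matrix with positive row sums.
    Uses the sign of the second-smallest eigenvector (Fiedler vector).
    """
    degrees = similarity.sum(axis=1)
    inv_sqrt = 1.0 / np.sqrt(degrees)
    normalized = similarity * inv_sqrt[:, None] * inv_sqrt[None, :]
    laplacian = np.eye(similarity.shape[0]) - normalized
    _, vectors = np.linalg.eigh(laplacian)    # eigenvalues in ascending order
    fiedler = vectors[:, 1] * inv_sqrt        # map back from the normalized space
    return (fiedler > 0).astype(int)


if __name__ == "__main__":
    # Two toy kernels over six patients with the same two-block structure.
    block = np.kron(np.eye(2), np.ones((3, 3)))
    k_gene = block + 0.01 * (1 - block)
    k_mirna = block + 0.03 * (1 - block)
    print(spectral_bipartition(average_kernels([k_gene, k_mirna])))
```

For more than two subtypes, the usual extension is to take the first k eigenvectors and run k-means on their rows.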
- …
