16 research outputs found
Females and Males Show Differences in Early-Stage Transcriptomic Biomarkers of Lung Adenocarcinoma and Lung Squamous Cell Carcinoma
The incidence and mortality rates of lung cancers differ between females and males. Sex information should therefore play an important part in training and optimizing a diagnostic model. However, most of the existing studies do not fully utilize this information. This study carried out a comparative investigation between sex-specific models and sex-independent models. Three feature selection algorithms and five classifiers were utilized to evaluate the contribution of the sex information to the detection of early-stage lung cancers. For both lung adenocarcinoma (LUAD) and lung squamous cell carcinoma (LUSC), the sex-specific models outperformed the sex-independent models in detecting early-stage lung cancers. The Venn plots suggested that females and males shared only a few transcriptomic biomarkers of early-stage lung cancers. Our experimental data suggested that sex information should be included in optimizing disease diagnosis models.
Integration of lncRNAs, Protein-Coding Genes and Pathology Images for Detecting Metastatic Melanoma
Melanoma is a lethal skin disease that develops from moles. This study aimed to integrate multimodal data to predict metastatic melanoma, which is highly aggressive and difficult to treat. The proposed EnsembleSKCM method evaluated the prediction performances of long noncoding RNAs (lncRNAs), protein-coding messenger genes (mRNAs) and pathology images (images) for metastatic melanoma. Feature selection was used to screen for metastatic biomarkers in the lncRNA and mRNA datasets. The integrated EnsembleSKCM model was built based on the weighted results of the lncRNA-, mRNA- and image-based models. EnsembleSKCM achieved 0.9444 in the prediction accuracy of metastatic melanoma and outperformed the single-modal prediction models based on the lncRNA, mRNA and image data. The experimental data suggest the importance of integrating the complementary information from the three data modalities. WGCNA was used to analyze the relationship of molecular-level features and image features, and the results show connections between them. Another cohort was used to validate our prediction
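The weighted integration described above can be illustrated with a minimal sketch. It assumes each single-modality model outputs a metastasis probability per sample and that the ensemble is a weighted average of those probabilities; the function names, the weights, the 0.5 threshold, and the example values are hypothetical and are not taken from the EnsembleSKCM paper.

```python
def weighted_ensemble(probs_by_modality, weights):
    """Weighted average of per-modality probabilities (assumed combination rule)."""
    total = sum(weights[m] for m in probs_by_modality)
    n_samples = len(next(iter(probs_by_modality.values())))
    combined = []
    for i in range(n_samples):
        score = sum(weights[m] * probs_by_modality[m][i] for m in probs_by_modality)
        combined.append(score / total)
    return combined


def predict_labels(scores, threshold=0.5):
    """Label a sample as metastatic (1) when its combined score reaches the threshold."""
    return [1 if s >= threshold else 0 for s in scores]


# Hypothetical probabilities from three single-modality models for four samples.
probs = {
    "lncRNA": [0.9, 0.2, 0.6, 0.4],
    "mRNA": [0.8, 0.3, 0.4, 0.7],
    "image": [0.7, 0.1, 0.7, 0.2],
}
# Hypothetical modality weights; in practice they would be tuned on validation data.
weights = {"lncRNA": 0.4, "mRNA": 0.4, "image": 0.2}

scores = weighted_ensemble(probs, weights)
labels = predict_labels(scores)
print(scores, labels)
```

In this toy example the third sample is labeled metastatic by the ensemble even though the mRNA model alone would score it below the threshold, which is the kind of complementarity the abstract refers to.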
DanceTrend: An Integration Framework of Video-Based Body Action Recognition and Color Space Features for Dance Popularity Prediction
Background: With the rise of user-generated content (UGC) platforms, we are witnessing an unprecedented surge in data. Among various content types, dance videos have emerged as a potent medium for artistic and emotional expression in the Web 2.0 era. Such videos have increasingly become a significant means for users to captivate audiences and amplify their online influence. Given this, predicting the popularity of dance videos on UGC platforms has drawn significant attention. Methods: This study postulates that body movement features play a pivotal role in determining the future popularity of dance videos. To test this hypothesis, we design a prediction framework, DanceTrend, that integrates body movement features with color space information for dance popularity prediction. We utilize the jazz dance videos from the AIST++ street dance dataset and segment each dance routine video into individual movements. AlphaPose was chosen as the human posture detection algorithm to obtain human motion features from the videos. Then, the ST-GCN (Spatial Temporal Graph Convolutional Network) is used to train the movement classification models. These pre-trained ST-GCN models are applied to extract body movement features from our curated Bilibili dance video dataset. Alongside these body movement features, we integrate color space attributes and user metadata for the final dance popularity prediction task. Results: The experimental results support our initial hypothesis that the body movement features significantly influence the future popularity of dance videos. A comprehensive evaluation of various feature fusion strategies and diverse classifiers shows that a pre–post fusion hybrid strategy coupled with the XGBoost classifier yields the best results on our dataset.
A Machine Learning-Based Investigation of Gender-Specific Prognosis of Lung Cancers
Background and Objective: Primary lung cancer is a lethal and rapidly developing cancer type and is one of the leading causes of cancer deaths. Materials and Methods: Statistical methods such as Cox regression are usually used to detect the prognosis factors of a disease. This study investigated survival prediction using machine learning algorithms. The clinical data of 28,458 patients with primary lung cancers were collected from the Surveillance, Epidemiology, and End Results (SEER) database. Results: This study indicated that the survival rate of women with primary lung cancer was often higher than that of men (p Conclusions: These data suggested that male patients may have more complicated factors in lung cancer than females, and it is necessary to develop gender-specific diagnosis and prognosis models.
Computational Characterization of Undifferentially Expressed Genes with Altered Transcription Regulation in Lung Cancer.
Peer reviewed: True
Acknowledgements: We extend our sincere thanks to the two anonymous reviewers for their insightful and constructive critiques. Their expert evaluations have significantly contributed to the improvement of our manuscript, notably in refining the clarity, enhancing the visual presentation, and strengthening the argumentative rigor of our work.
Publication status: Published
Funder: Fundamental Research Funds for the Central Universities
A transcriptome profiles the expression levels of genes in cells, and a huge amount of transcriptomic data has accumulated in public repositories. Most of the existing biomarker-related studies investigated the differential expression of individual transcriptomic features under the assumption of inter-feature independence. Many transcriptomic features without differential expression were excluded from the biomarker lists. This study proposed a computational analysis protocol (mqTrans) to analyze transcriptomes from the view of high-dimensional inter-feature correlations. The mqTrans protocol trained a regression model to predict the expression of an mRNA feature from those of the transcription factors (TFs). The difference between the predicted and real expression of an mRNA feature in a query sample was defined as the mqTrans feature. The new mqTrans view facilitated the detection of thirteen transcriptomic features with differentially expressed mqTrans features, but without differential expression in the original transcriptomic values, in three independent datasets of lung cancer. These features were called dark biomarkers because they would have been ignored in a conventional differential analysis. The detailed discussion of one dark biomarker, GBP5, and additional validation experiments suggested that the overlapping long non-coding RNAs might have contributed to this interesting phenomenon. In summary, this study aimed to find undifferentially expressed genes with significantly changed mqTrans values in lung cancer. These genes are usually ignored in biomarker detection studies because they are not differentially expressed. However, their differentially expressed mqTrans values in three independent datasets suggested their strong associations with lung cancer.
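The definition of the mqTrans feature can be made concrete with a small sketch. It assumes an ordinary least-squares linear model fitted on a set of reference samples and takes the feature as predicted minus real expression; the abstract specifies neither the regression type, the training samples, nor the sign convention, so these choices and all names and numbers below are illustrative assumptions.

```python
import numpy as np


def fit_tf_model(tf_expr, mrna_expr):
    """Fit mRNA ~ intercept + TFs by least squares (assumed regression type)."""
    X = np.column_stack([np.ones(len(tf_expr)), tf_expr])
    coef, *_ = np.linalg.lstsq(X, mrna_expr, rcond=None)
    return coef


def mqtrans_feature(coef, tf_expr, mrna_expr):
    """Predicted minus real mRNA expression (sign convention is an assumption)."""
    X = np.column_stack([np.ones(len(tf_expr)), tf_expr])
    return X @ coef - mrna_expr


# Hypothetical reference samples: two TFs regulate the mRNA as 1 + 2*tf1 - tf2.
tf_ref = np.array([[1.0, 2.0], [2.0, 1.0], [3.0, 5.0], [4.0, 3.0], [5.0, 4.0]])
mrna_ref = 1.0 + 2.0 * tf_ref[:, 0] - tf_ref[:, 1]
coef = fit_tf_model(tf_ref, mrna_ref)

# Hypothetical query samples in which the regulation by tf1 is weakened.
tf_query = np.array([[2.0, 2.0], [4.0, 1.0]])
mrna_query = 1.0 + 0.5 * tf_query[:, 0] - tf_query[:, 1]

ref_feat = mqtrans_feature(coef, tf_ref, mrna_ref)
query_feat = mqtrans_feature(coef, tf_query, mrna_query)
print(ref_feat, query_feat)
```

The reference residuals are essentially zero, while the query residuals are large. A gene can therefore stand out in the mqTrans view even when its raw expression level would not, which is the situation the abstract describes for dark biomarkers.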
Reconstructing the cytokine view for the multi-view prediction of COVID-19 mortality
Background: Coronavirus disease 2019 (COVID-19) is a rapidly developing and sometimes lethal pulmonary disease. Accurately predicting COVID-19 mortality will facilitate optimal patient treatment and medical resource deployment, but the clinical practice still needs to address it. Both complete blood counts and cytokine levels were observed to be modified by COVID-19 infection. This study aimed to use inexpensive and easily accessible complete blood counts to build an accurate COVID-19 mortality prediction model. The cytokine fluctuations reflect the inflammatory storm induced by COVID-19, but their levels are not as commonly accessible as complete blood counts. Therefore, this study explored the possibility of predicting cytokine levels based on complete blood counts. Methods: We used complete blood counts to predict cytokine levels. The predictive model includes an autoencoder, principal component analysis, and linear regression models. We used classifiers such as support vector machine and feature selection models such as adaptive boost to predict the mortality of COVID-19 patients. Results: Complete blood counts and original cytokine levels reached the COVID-19 mortality classification area under the curve (AUC) values of 0.9678 and 0.9111, respectively, and the cytokine levels predicted by the feature set alone reached the classification AUC value of 0.9844. The predicted cytokine levels were more significantly associated with COVID-19 mortality than the original values. Conclusions: Integrating the predicted cytokine levels and complete blood counts improved a COVID-19 mortality prediction model using complete blood counts only. Both the cytokine level prediction models and the COVID-19 mortality prediction models are publicly available at http://www.healthinformaticslab.org/supp/resources.php
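The idea of reconstructing a cytokine view from complete blood counts can be sketched as follows. This simplified illustration keeps only the principal component analysis and linear regression steps and omits the autoencoder; the synthetic data, dimensions, and function names are hypothetical and do not reproduce the published models.

```python
import numpy as np


def fit_cytokine_predictor(cbc, cytokines, n_components=3):
    """PCA on blood counts, then linear regression from the components to cytokines."""
    mean = cbc.mean(axis=0)
    _, _, vt = np.linalg.svd(cbc - mean, full_matrices=False)
    components = vt[:n_components]
    scores = (cbc - mean) @ components.T
    X = np.column_stack([np.ones(len(scores)), scores])
    coef, *_ = np.linalg.lstsq(X, cytokines, rcond=None)
    return mean, components, coef


def predict_cytokines(model, cbc):
    """Project new blood counts onto the fitted components and apply the regression."""
    mean, components, coef = model
    scores = (cbc - mean) @ components.T
    X = np.column_stack([np.ones(len(scores)), scores])
    return X @ coef


# Synthetic stand-in data: 30 patients, 5 blood-count features, 3 cytokines.
rng = np.random.default_rng(0)
cbc_train = rng.normal(size=(30, 5))
mixing = np.array([[1.0, 0.5, 0.0], [0.0, 1.0, 2.0]])
cyto_train = cbc_train[:, :2] @ mixing + 0.1 * rng.normal(size=(30, 3))

model = fit_cytokine_predictor(cbc_train, cyto_train)

# For new patients with blood counts only, predict cytokines and stack both views.
cbc_new = rng.normal(size=(4, 5))
cyto_pred = predict_cytokines(model, cbc_new)
multi_view = np.hstack([cbc_new, cyto_pred])
print(multi_view.shape)
```

The stacked matrix plays the role of the multi-view input: a downstream mortality classifier would see both the measured blood counts and the predicted cytokine levels for each patient.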
Additional file 1 of Reconstructing the cytokine view for the multi-view prediction of COVID-19 mortality
Supplementary Material