29 research outputs found

    A Unique Training Strategy to Enhance Language Models Capabilities for Health Mention Detection from Social Media Content

    Full text link
    An ever-increasing amount of social media content requires advanced AI-based computer programs capable of extracting useful information. Specifically, the extraction of health-related content from social media is useful for the development of diverse types of applications including disease spread, mortality rate prediction, and finding the impact of diverse types of drugs on diverse types of diseases. Language models are competent in extracting the syntactic and semantics of text. However, they face a hard time extracting similar patterns from social media texts. The primary reason for this shortfall lies in the non-standardized writing style commonly employed by social media users. Following the need for an optimal language model competent in extracting useful patterns from social media text, the key goal of this paper is to train language models in such a way that they learn to derive generalized patterns. The key goal is achieved through the incorporation of random weighted perturbation and contrastive learning strategies. On top of a unique training strategy, a meta predictor is proposed that reaps the benefits of 5 different language models for discriminating posts of social media text into non-health and health-related classes. Comprehensive experimentation across 3 public benchmark datasets reveals that the proposed training strategy improves the performance of the language models up to 3.87%, in terms of F1-score, as compared to their performance with traditional training. Furthermore, the proposed meta predictor outperforms existing health mention classification predictors across all 3 benchmark datasets

    GMO/GMF on Social Media in China: Jagged Landscape of Information Seeking and Sharing Behavior through a Valence View

    Get PDF
    The study examines the critical factors affecting Chinese social media (SM) users’ intentions and behavior to seek and share information on genetically modified organisms/ genetically modified food (GMO/GMF). The proposed framework was conceptualized through benefit-risk analysis and subsequently mapped SM users’ perceived benefits and risks to seeks and share information using Kurt Lewin’s valence view. Quantitative data was collected using survey questionnaires administered from 583 SM users. The results of the path analysis demonstrated two key findings related to SM users’ perceived benefits and risks to seek and share information on GMO/GMF. Among risks, the psychological risk is the strongest predictor of perceived risk to use SM for GMO/GMF, which consequently determines the intentions and behaviors to share information about GMO/GMF on SM in People’s Republic of China. Among benefits, the results showed that perceived usefulness, creditability of GMO/GMF information, and information support are positively related to perceived benefits to use SM for GMO/GMF, which subsequently, predicts the intentions and behaviors to seek information about GMO/GMF on SM. This study suggests scholars and practitioners explore and utilize the efficient communication strategy to fulfill the potential of the SM to increase GMO/GMF acceptance in Chinese society

    Convalescent plasma in patients admitted to hospital with COVID-19 (RECOVERY): a randomised controlled, open-label, platform trial

    Get PDF
    SummaryBackground Azithromycin has been proposed as a treatment for COVID-19 on the basis of its immunomodulatoryactions. We aimed to evaluate the safety and efficacy of azithromycin in patients admitted to hospital with COVID-19.Methods In this randomised, controlled, open-label, adaptive platform trial (Randomised Evaluation of COVID-19Therapy [RECOVERY]), several possible treatments were compared with usual care in patients admitted to hospitalwith COVID-19 in the UK. The trial is underway at 176 hospitals in the UK. Eligible and consenting patients wererandomly allocated to either usual standard of care alone or usual standard of care plus azithromycin 500 mg once perday by mouth or intravenously for 10 days or until discharge (or allocation to one of the other RECOVERY treatmentgroups). Patients were assigned via web-based simple (unstratified) randomisation with allocation concealment andwere twice as likely to be randomly assigned to usual care than to any of the active treatment groups. Participants andlocal study staff were not masked to the allocated treatment, but all others involved in the trial were masked to theoutcome data during the trial. The primary outcome was 28-day all-cause mortality, assessed in the intention-to-treatpopulation. The trial is registered with ISRCTN, 50189673, and ClinicalTrials.gov, NCT04381936.Findings Between April 7 and Nov 27, 2020, of 16 442 patients enrolled in the RECOVERY trial, 9433 (57%) wereeligible and 7763 were included in the assessment of azithromycin. The mean age of these study participants was65·3 years (SD 15·7) and approximately a third were women (2944 [38%] of 7763). 2582 patients were randomlyallocated to receive azithromycin and 5181 patients were randomly allocated to usual care alone. Overall,561 (22%) patients allocated to azithromycin and 1162 (22%) patients allocated to usual care died within 28 days(rate ratio 0·97, 95% CI 0·87–1·07; p=0·50). No significant difference was seen in duration of hospital stay (median10 days [IQR 5 to >28] vs 11 days [5 to >28]) or the proportion of patients discharged from hospital alive within 28 days(rate ratio 1·04, 95% CI 0·98–1·10; p=0·19). Among those not on invasive mechanical ventilation at baseline, nosignificant difference was seen in the proportion meeting the composite endpoint of invasive mechanical ventilationor death (risk ratio 0·95, 95% CI 0·87–1·03; p=0·24).Interpretation In patients admitted to hospital with COVID-19, azithromycin did not improve survival or otherprespecified clinical outcomes. Azithromycin use in patients admitted to hospital with COVID-19 should be restrictedto patients in whom there is a clear antimicrobial indication

    MirLocPredictor: A ConvNet-Based Multi-Label MicroRNA Subcellular Localization Predictor by Incorporating k-Mer Positional Information

    No full text
    MicroRNAs (miRNA) are small noncoding RNA sequences consisting of about 22 nucleotides that are involved in the regulation of almost 60% of mammalian genes. Presently, there are very limited approaches for the visualization of miRNA locations present inside cells to support the elucidation of pathways and mechanisms behind miRNA function, transport, and biogenesis. MIRLocator, a state-of-the-art tool for the prediction of subcellular localization of miRNAs makes use of a sequence-to-sequence model along with pretrained k-mer embeddings. Existing pretrained k-mer embedding generation methodologies focus on the extraction of semantics of k-mers. However, in RNA sequences, positional information of nucleotides is more important because distinct positions of the four nucleotides define the function of an RNA molecule. Considering the importance of the nucleotide position, we propose a novel approach (kmerPR2vec) which is a fusion of positional information of k-mers with randomly initialized neural k-mer embeddings. In contrast to existing k-mer-based representation, the proposed kmerPR2vec representation is much more rich in terms of semantic information and has more discriminative power. Using novel kmerPR2vec representation, we further present an end-to-end system (MirLocPredictor) which couples the discriminative power of kmerPR2vec with Convolutional Neural Networks (CNNs) for miRNA subcellular location prediction. The effectiveness of the proposed kmerPR2vec approach is evaluated with deep learning-based topologies (i.e., Convolutional Neural Networks (CNN) and Recurrent Neural Network (RNN)) and by using 9 different evaluation measures. Analysis of the results reveals that MirLocPredictor outperform state-of-the-art methods with a significant margin of 18% and 19% in terms of precision and recall

    An Efficient Automated Machine Learning Framework for Genomics and Proteomics Sequence Analysis

    No full text
    Genomics and Proteomics sequence analyses are the scientific studies of understanding the language of Deoxyribonucleic Acid (DNA), Ribonucleic Acid (RNA) and protein biomolecules with an objective of controlling the production of proteins and understanding their core functionalities. It helps to detect chronic diseases in early stages, root causes of clinical changes, key genetic targets for pharmaceutical development and optimization of therapeutics for various age groups. Most Genomics and Proteomics sequence analysis work is performed using typical wet lab experimental approaches that make use of different genetic diagnostic technologies. However, these approaches are costly, time consuming, skill and labor intensive. Hence, these approaches slow down the process of developing an efficient and economical sequence analysis landscape essential to demystify a variety of cellular processes and functioning of biomolecules in living organisms. To empower manual wet lab experiment driven research, many machine learning based approaches have been developed in recent years. However, these approaches cannot be used in practical environment due to their limited performance. Considering the sensitive and inherently demanding nature of Genomics and Proteomics sequence analysis which can have very far-reaching as well as serious repercussions on account of misdiagnosis, the main objective of this research is to develop an efficient automated computational framework for Genomics and Proteomics sequence analysis using the predictive and prescriptive analytical powers of Artificial Intelligence (AI) to significantly improve healthcare operations. The proposed framework is comprised of 3 main components namely sequence encoding, feature engineering and discrete or continuous value predictor. The sequence encoding module is equipped with a variety of existing and newly developed sequence encoding algorithms that are capable of generating a rich statistical representation of DNA, RNA and protein raw sequences. The feature engineering module has diverse types of feature selection and dimensionality reduction approaches which can be used to generate the most effective feature space. Furthermore, the discrete and/or continuous value predictor module of the proposed framework contains a wide range of existing machine learning and newly developed deep learning regressors and classifiers. To evaluate the integrity and generalizability of the proposed framework, we have performed a large-scale experimentation over diverse types of Genomics and Proteomics sequence analysis tasks (i.e., DNA, RNA and proteins). In Genomics analysis, Epigenetic modification detection is one of the key component. It helps clinical researchers and practitioners to distinguish normal cellular activities from malfunctioned ones, which can lead to diverse genetic disorders such as metabolic disorders, cancers, etc. To support this analysis, the proposed framework is used to solve the problem of DNA and Histone modification prediction where it has achieved state-of-the-art performance on 27 publicly available benchmark datasets of 17 different species with best accuracy of 97%. RNA sequence analysis is another vital component of Genomics sequence analysis where the identification of different coding and non-coding RNAs as well as their subcellular localization patterns help to demystify the functions of diverse RNAs, root causes of clinical changes, develop precision medicine and optimize therapeutics. To support this analysis, the proposed framework is utilized for non-coding RNA classification and multi-compartment RNA subcellular localization prediction. Where it achieved state-of-the-art performance on 10 publicly available benchmark datasets of Homo sapiens and Mus Musculus species with best accuracy of 98%. Proteomics sequence analysis is essential to demystify the virus pathogenesis, host immunity responses, the way proteins affect or are affected by the cell processes, their structure and core functionalities. To support this analysis, the proposed framework is used for host protein-protein and virus-host protein-protein interaction prediction. It has achieved state-of-the-art performance on 2 publicly available protein protein interaction datasets of Homo Sapiens and Mus Musculus species with best accuracy of 96% and 7 viral host protein protein interaction datasets of multiple hosts and viruses with best accuracy of 94%. Considering the performance and practical significance of proposed framework, we believe proposed framework will help researchers in developing cutting-edge practical applications for diverse Genomic and Proteomic sequence analyses tasks (i.e., DNA, RNA and proteins)

    LGCA-VHPPI: A local-global residue context aware viral-host protein-protein interaction predictor.

    No full text
    Viral-host protein protein interaction (PPI) analysis is essential to decode the molecular mechanism of viral pathogen and host immunity processes which eventually help to control viral diseases and optimize therapeutics. The state-of-the-art viral-host PPI predictor leverages unsupervised embedding learning technique (doc2vec) to generate statistical representations of viral-host protein sequences and a Random Forest classifier for interaction prediction. However, doc2vec approach generates the statistical representations of viral-host protein sequences by merely modelling the local context of residues which only partially captures residue semantics. The paper in hand proposes a novel technique for generating better statistical representations of viral and host protein sequences based on the infusion of comprehensive local and global contextual information of the residues. While local residue context aware encoding captures semantic relatedness and short range dependencies of residues. Global residue context aware encoding captures comprehensive long-range residues dependencies, positional invariance of residues, and unique residue combination distribution important for interaction prediction. Using concatenated rich statistical representations of viral and host protein sequences, a robust machine learning framework "LGCA-VHPPI" is developed which makes use of a deep forest model to effectively model complex non-linearity of viral-host PPI sequences. An in-depth performance comparison of the proposed LGCA-VHPPI framework with existing diverse sequence encoding schemes based viral-host PPI predictors reveals that LGCA-VHPPI outperforms state-of-the-art predictor by 6%, 2%, and 2% in terms of matthews correlation coefficient over 3 different benchmark viral-host PPI prediction datasets

    Advances in Computational Methodologies for Classification and Sub-Cellular Locality Prediction of Non-Coding RNAs

    No full text
    Apart from protein-coding Ribonucleic acids (RNAs), there exists a variety of non-coding RNAs (ncRNAs) which regulate complex cellular and molecular processes. High-throughput sequencing technologies and bioinformatics approaches have largely promoted the exploration of ncRNAs which revealed their crucial roles in gene regulation, miRNA binding, protein interactions, and splicing. Furthermore, ncRNAs are involved in the development of complicated diseases like cancer. Categorization of ncRNAs is essential to understand the mechanisms of diseases and to develop effective treatments. Sub-cellular localization information of ncRNAs demystifies diverse functionalities of ncRNAs. To date, several computational methodologies have been proposed to precisely identify the class as well as sub-cellular localization patterns of RNAs). This paper discusses different types of ncRNAs, reviews computational approaches proposed in the last 10 years to distinguish coding-RNA from ncRNA, to identify sub-types of ncRNAs such as piwi-associated RNA, micro RNA, long ncRNA, and circular RNA, and to determine sub-cellular localization of distinct ncRNAs and RNAs. Furthermore, it summarizes diverse ncRNA classification and sub-cellular localization determination datasets along with benchmark performance to aid the development and evaluation of novel computational methodologies. It identifies research gaps, heterogeneity, and challenges in the development of computational approaches for RNA sequence analysis. We consider that our expert analysis will assist Artificial Intelligence researchers with knowing state-of-the-art performance, model selection for various tasks on one platform, dominantly used sequence descriptors, neural architectures, and interpreting inter-species and intra-species performance deviation
    corecore