7 research outputs found

    A Generic NLI approach for Classification of Sentiment Associated with Therapies

    This paper describes our system for SMM4H 2023 Shared Task 2, "Classification of sentiment associated with therapies (aspect-oriented)". We adopt an approach based on natural language inference (NLI), formulating the task as a sentence-pair classification problem, and train transformer models to predict the sentiment associated with a therapy in a given text. Our best model achieved a 75.22% F1-score, which was 11% (4%) higher than the mean (median) score of all teams' submissions. Comment: Accepted in Workshop on Social Media Mining for Health 2023 (#SMM4H
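    The sentence-pair formulation described in this abstract can be illustrated with a minimal sketch. The label set, the hypothesis template, and the names `NLIPair` and `make_pairs` are assumptions made for illustration; the abstract does not state the exact wording or labels the authors used.

```python
from dataclasses import dataclass
from typing import List

# Assumed label set; the abstract does not list the task's classes.
LABELS = ("positive", "neutral", "negative")


@dataclass(frozen=True)
class NLIPair:
    premise: str      # the original text mentioning the therapy
    hypothesis: str   # a templated statement about the sentiment
    entails: bool     # True if the hypothesis matches the gold label


def make_pairs(text: str, therapy: str, gold_label: str) -> List[NLIPair]:
    """Turn one labeled example into one premise/hypothesis pair per label."""
    if gold_label not in LABELS:
        raise ValueError(f"unknown label: {gold_label!r}")
    return [
        NLIPair(
            premise=text,
            # Hypothetical template, not taken from the paper.
            hypothesis=f"The sentiment towards {therapy} is {label}.",
            entails=(label == gold_label),
        )
        for label in LABELS
    ]


pairs = make_pairs("Yoga really helped my back pain.", "yoga", "positive")
for pair in pairs:
    print(pair.hypothesis, pair.entails)
```

    Under this sketch, a sentence-pair classifier would be trained on the `entails` flag, and at prediction time the label whose hypothesis scores highest would be chosen.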

    PRED(BALB/c): a system for the prediction of peptide binding to H2(d) molecules, a haplotype of the BALB/c mouse

    PRED(BALB/c) is a computational system that predicts peptides binding to the major histocompatibility complex-2 (H2(d)) of the BALB/c mouse, an important laboratory model organism. The predictions cover the complete set of H2(d) class I (H2-K(d), H2-L(d) and H2-D(d)) and class II (I-E(d) and I-A(d)) molecules. The prediction system uses quantitative matrices, which were rigorously validated against experimentally determined binders and non-binders and by in vivo studies using viral proteins. The prediction accuracy of PRED(BALB/c) is very high. To our knowledge, this is the first online server for the prediction of peptides binding to a complete set of major histocompatibility complex molecules in a model organism (H2(d) haplotype). PRED(BALB/c) is available at
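    As a rough illustration of how a quantitative matrix can score peptides, the sketch below sums one coefficient per residue position and slides a window along a protein. The toy 3-position matrix, its values, and the names `score_peptide` and `scan_protein` are hypothetical; they are not the PRED(BALB/c) matrices or interface, which the abstract does not specify.

```python
from typing import Dict, List, Tuple

# One dict per peptide position, mapping amino acid -> coefficient.
Matrix = List[Dict[str, float]]

# Hypothetical toy matrix for 3-residue peptides (illustrative values only).
TOY_MATRIX: Matrix = [
    {"A": 1.0, "K": -0.5},
    {"Y": 2.0},
    {"L": 1.5, "I": 1.0},
]


def score_peptide(peptide: str, matrix: Matrix, default: float = 0.0) -> float:
    """Sum the position-specific coefficient of each residue."""
    if len(peptide) != len(matrix):
        raise ValueError("peptide length must match matrix length")
    return sum(column.get(aa, default) for aa, column in zip(peptide, matrix))


def scan_protein(protein: str, matrix: Matrix) -> List[Tuple[int, str, float]]:
    """Score every window of the protein and return them best-first."""
    k = len(matrix)
    hits = [
        (start, protein[start:start + k], score_peptide(protein[start:start + k], matrix))
        for start in range(len(protein) - k + 1)
    ]
    return sorted(hits, key=lambda hit: hit[2], reverse=True)


for start, peptide, score in scan_protein("KAYLG", TOY_MATRIX):
    print(start, peptide, score)
```

    A real predictor would use one validated matrix per H2(d) molecule and a score threshold to separate predicted binders from non-binders; both are left out of this sketch.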

    BioDArt - Catalogue of biological data artifact examples

    Information in biological data repositories continues to grow exponentially due to the growing number of genomic and proteomic sequencing projects. As with any database, these repositories are subject to data quality issues related to correctness, uniformity, completeness, and redundancy, among others. Data cleaning is a prerequisite for preventing low-quality data from interfering with the accuracy of data mining and analysis. This in turn involves the detection and resolution of data artifacts (errors, discrepancies, redundancies, ambiguities, and incompleteness). Understanding the causes of data artifacts and systematically classifying them are critical to eliminating them from molecular sequence databases. This paper highlights eight data artifacts found among public molecular databases. Examples of major molecular sequence database records containing these artifacts are collected into the BioDArt catalogue (http://antigen.i2r.a-star.edu.sg/BioDArt).

    Literature-driven, Ontology-centric Knowledge Navigation for Lipidomics

    As the semantic web vision continues to proliferate, a gap still remains in the full-scale adoption of such technologies. The exact reasons for this continue to be the subject of ongoing debate; however, it is likely that the emergence of reproducible infrastructure and deployments will expedite adoption. We illustrate the recognizable added value to life science researchers gained through the convergence of existing and customized semantic web technologies (content acquisition pipelines supplying legacy unstructured texts, natural language processing, OWL-DL ontology development and instantiation, reasoning over A-boxes using a visual query tool). The resulting platform allows lipidomic researchers to rapidly navigate large volumes of full-text scientific documents according to recognizable lipid nomenclature, hierarchies and classifications. Specifically, we have enabled searches for sentences describing lipid-protein and lipid-disease interactions.

    CandiVF - Candida albicans virulence factor database

    Candida albicans is a pathogen that commonly infects patients who receive immunosuppressive drug therapy or long-term catheterization, or who suffer from acquired immune deficiency syndrome (AIDS). The major factor accountable for the pathogenicity of C. albicans is host immune status. Various virulence molecules, or factors, of C. albicans are also responsible for disease progression. Virulence proteins are published in public databases, but they normally lack detailed functional annotations. We have developed CandiVF, a specialized database of C. albicans virulence factors (http://antigen.i2r.a-star.edu.sg/Templar/DB/CandiVF/), to facilitate efficient extraction and analysis of data and thereby assist research on immune responses, pathogenesis, prevention, and control of candidiasis. CandiVF contains a large number of annotated virulence proteins, including secretory, cell wall-associated, membrane, cytoplasmic, and nuclear proteins. The database has built-in bioinformatics tools including keyword and BLAST search, visualization of 3D structures, HLA-DR epitope prediction, virulence descriptors, and a virulence factors ontology.