
    Predicting human protein function with multitask deep neural networks

    Machine learning methods for protein function prediction are urgently needed, especially now that a substantial fraction of known sequences remains unannotated despite the extensive use of functional assignments based on sequence similarity. One major bottleneck for supervised learning in protein function prediction is the structured, multi-label nature of the problem: biological roles are represented by lists of terms from hierarchically organised controlled vocabularies such as the Gene Ontology (GO). In this work, we build on recent developments in deep learning and investigate the usefulness of multitask deep neural networks (MTDNN). An MTDNN consists of upstream shared layers, on top of which are stacked in parallel as many independent modules (additional hidden layers with their own output units) as there are output GO terms (the tasks). MTDNN learns each task partly from shared representations and partly from task-specific characteristics. When no close homologues with experimentally validated functions can be identified, MTDNN gives more accurate predictions than baseline methods based on annotation frequencies in public databases or on homology transfer. More importantly, the results show that the binary classification accuracy of MTDNN is higher than that of alternative machine learning-based methods that do not exploit commonalities and differences among prediction tasks. Interestingly, the improvement over a single-task predictor is not linearly correlated with the number of tasks in MTDNN; instead, medium-sized models provide more improvement in our case. One advantage of MTDNN is that, given a set of features, it does not require the bootstrap feature selection procedure that traditional machine learning algorithms use. Overall, the results indicate that the proposed MTDNN algorithm improves the performance of protein function prediction. On the other hand, there is still considerable room for deep learning techniques to further enhance predictive ability.
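
The architecture described in this abstract (shared upstream layers feeding one independent module per GO term) can be sketched roughly as follows. This is a minimal pure-Python forward pass under stated assumptions, not the authors' implementation: the class name `MultitaskNet`, the layer sizes, the ReLU and sigmoid activations, and the random initialisation are all hypothetical choices made for illustration, and training is omitted entirely.

```python
import math
import random


def dense(inputs, weights, biases):
    """Affine layer: one output value per row of `weights`."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]


def relu(values):
    return [max(0.0, v) for v in values]


def sigmoid(value):
    return 1.0 / (1.0 + math.exp(-value))


def init_layer(rng, n_in, n_out):
    """Small random weights and zero biases (illustrative initialisation)."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)]
               for _ in range(n_out)]
    return weights, [0.0] * n_out


class MultitaskNet:
    """Hypothetical sketch: one shared layer, then one small head per task."""

    def __init__(self, n_features, n_shared, n_task_hidden, n_tasks, seed=0):
        rng = random.Random(seed)
        # Upstream layer whose parameters are shared by every task.
        self.shared = init_layer(rng, n_features, n_shared)
        # One independent module per task: a hidden layer plus one output unit.
        self.heads = [
            (init_layer(rng, n_shared, n_task_hidden),
             init_layer(rng, n_task_hidden, 1))
            for _ in range(n_tasks)
        ]

    def forward(self, features):
        """Return one score in (0, 1) per task (e.g. per GO term)."""
        shared_repr = relu(dense(features, *self.shared))
        scores = []
        for hidden, output in self.heads:
            task_repr = relu(dense(shared_repr, *hidden))
            scores.append(sigmoid(dense(task_repr, *output)[0]))
        return scores


net = MultitaskNet(n_features=8, n_shared=6, n_task_hidden=4, n_tasks=3)
scores = net.forward([0.5] * 8)
print(len(scores))  # one binary-classification score per task
```

Each head sees the same shared representation but has its own parameters, which is one way to read the abstract's statement that tasks are learned partly from shared representations and partly from task-specific characteristics.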

    Team GPLSI at AuTexTification Shared Task: Determining the Authorship of a Text

    AuTexTification is a shared task within the IberLEF workshop that aims to determine whether a text has been generated by an Artificial Intelligence (AI) or a human. This paper reports the participation and results of the GPLSI team from the University of Alicante (Spain) in subtask 1, "Human or Generated", of the AuTexTification challenge for English and Spanish. We propose and experiment with different approaches based on Transfer Learning, Ensemble Learning, fine-tuning existing language models such as RoBERTa or RemBERT, and linguistic features. Our best models for both languages were trained with Transfer Learning techniques, obtaining the 6th and 8th positions in the English and Spanish versions of this subtask, respectively. The results obtained in the Spanish version were close to those of the top-performing team. This research work is part of the R&D projects “CORTEX: Conscious Text Generation” (PID2021-123956OB-I00) and “TRIVIAL: Technological Resources for Intelligent VIral AnaLysis through NLP” (PID2021-122263OB-C22), both funded by MCIN/AEI/10.13039/501100011033/ and by “ERDF A way of making Europe”, and “CLEAR.TEXT: Enhancing the modernization public sector organizations by deploying Natural Language Processing to make their digital content CLEARER to those with cognitive disabilities” (TED2021-130707B-I00), funded by MCIN/AEI/10.13039/501100011033 and “European Union NextGenerationEU/PRTR”. It has also been partially funded by the Generalitat Valenciana through the project “NL4DISMIS: Natural Language Technologies for dealing with dis- and misinformation” (grant reference CIPROM/2021/21), and by the European Commission ICT COST Action “Multi-task, Multilingual, Multi-modal Language Generation” (CA18231).

    What can we learn from Semantic Tagging?

    We investigate the effects of multi-task learning using the recently introduced task of semantic tagging. We employ semantic tagging as an auxiliary task for three different NLP tasks: part-of-speech tagging, Universal Dependency parsing, and Natural Language Inference. We compare full neural network sharing, partial neural network sharing, and what we term the learning what to share setting where negative transfer between tasks is less likely. Our findings show considerable improvements for all tasks, particularly in the learning what to share setting, which shows consistent gains across all tasks.
    Comment: 9 pages with references and appendixes. EMNLP 2018 camera read

    Learning Multimodal Latent Attributes

    The rapid development of social media sharing has created a huge demand for automatic media classification and annotation techniques. Attribute learning has emerged as a promising paradigm for bridging the semantic gap and addressing data sparsity by transferring attribute knowledge in object recognition and relatively simple action classification. In this paper, we address the task of attribute learning for understanding multimedia data with sparse and incomplete labels. In particular, we focus on videos of social group activities, which are challenging and topical examples of this task because of their multi-modal content and their complex, unstructured nature relative to the density of annotations. To solve this problem, we (1) introduce the concept of a semi-latent attribute space, expressing user-defined and latent attributes in a unified framework, and (2) propose a novel scalable probabilistic topic model for learning multi-modal semi-latent attributes, which dramatically reduces the need for an exhaustive, accurate attribute ontology and for expensive annotation effort. We show that our framework is able to exploit latent attributes to outperform contemporary approaches on a variety of realistic multimedia sparse-data learning tasks, including multi-task learning, learning with label noise, N-shot transfer learning and, importantly, zero-shot learning.