78 research outputs found

    Het-node2vec: second order random walk sampling for heterogeneous multigraphs embedding

    We introduce a set of algorithms (Het-node2vec) that extend the original node2vec node-neighborhood sampling method to heterogeneous multigraphs, i.e. networks characterized by multiple types of nodes and edges. The resulting random walk samples capture both the structural characteristics of the graph and the semantics of the different types of nodes and edges. The proposed algorithms can focus their attention on specific node or edge types, allowing accurate representations also for underrepresented types of nodes/edges that are of interest for the prediction problem under investigation. These rich and well-focused representations can boost unsupervised and supervised learning on heterogeneous graphs. Comment: 20 pages, 5 figures

    The promises of large language models for protein design and modeling

    The recent breakthroughs of Large Language Models (LLMs) in the context of natural language processing have opened the way to significant advances in protein research. Indeed, the relationships between human natural language and the “language of proteins” invite the application and adaptation of LLMs to protein modelling and design. Considering the impressive results of GPT-4 and other recently developed LLMs in processing, generating and translating human languages, we anticipate analogous results with the language of proteins. Indeed, protein language models have already been trained to accurately predict protein properties and generate novel functionally characterized proteins, achieving state-of-the-art results. In this paper we discuss the promises and the open challenges raised by this novel and exciting research area, and we propose our perspective on how LLMs will affect protein modeling and design.

    An expectation-maximization framework for comprehensive prediction of isoform-specific functions.

    MOTIVATION: Advances in RNA sequencing technologies have achieved an unprecedented accuracy in the quantification of mRNA isoforms, but our knowledge of isoform-specific functions has lagged behind. There is a need to understand the functional consequences of differential splicing, which could be supported by the generation of accurate and comprehensive isoform-specific gene ontology annotations. RESULTS: We present isoform interpretation, a method that uses expectation-maximization to infer isoform-specific functions based on the relationship between sequence and functional isoform similarity. We predicted isoform-specific functional annotations for 85 617 isoforms of 17 900 protein-coding human genes spanning a range of 17 430 distinct gene ontology terms. Comparison with a gold-standard corpus of manually annotated human isoform functions showed that isoform interpretation significantly outperforms state-of-the-art competing methods. We provide experimental evidence that functionally related isoforms predicted by isoform interpretation show a higher degree of domain sharing and expression correlation than functionally related genes. We also show that isoform sequence similarity correlates better with inferred isoform function than with gene-level function. AVAILABILITY AND IMPLEMENTATION: Source code, documentation, and resource files are freely available under a GNU3 license at https://github.com/TheJacksonLaboratory/isopretEM and https://zenodo.org/record/7594321

    SCUBA noise alters community structure and cooperation at Pederson’s cleaner shrimp cleaning stations

    Recreational SCUBA diving is widespread and increasing on coral reefs worldwide. Standard open-circuit SCUBA equipment is inherently noisy and, by seeking out areas of high biodiversity, divers inadvertently expose reef communities to an intrusive source of anthropogenic noise. Currently, little is known about SCUBA noise as an acoustic stressor, and there is a general lack of empirical evidence on community-level impacts of anthropogenic noise on coral reefs. Here, we conducted a playback experiment on Caribbean reefs to investigate impacts of SCUBA noise on fish communities and interspecific cooperation at ecologically important cleaning stations of the Pederson’s cleaner shrimp Ancylomenes pedersoni. When exposed to SCUBA-noise playback, the total occurrence of fishes at the cleaning stations decreased by 7%, and the community and cleaning clientele compositions were significantly altered, with 27% and 25% of monitored species being affected, respectively. Compared with ambient-sound playback, SCUBA-noise playback resulted in clients having to wait 29% longer for cleaning initiation and receiving 43% less cleaning; however, cheating, signalling, posing and time spent cleaning were not affected by SCUBA-noise playback. Our study is the first to demonstrate experimentally that SCUBA noise can have at least some negative impacts on reef organisms, confirming it as an ecologically relevant pollutant. Moreover, by establishing acoustic disturbance as a likely mechanism for known impacts of diver presence on reef animals, we also identify a potential avenue for mitigation in these valuable ecosystems.

    GraPE: fast and scalable Graph Processing and Embedding

    Graph Representation Learning methods have enabled a wide range of learning problems to be addressed for data that can be represented in graph form. Nevertheless, several real-world problems in economy, biology, medicine and other fields raised relevant scaling problems with existing methods and their software implementations, due to the size of real-world graphs characterized by millions of nodes and billions of edges. We present GraPE, a software resource for graph processing and random-walk-based embedding that can scale with large and high-degree graphs and significantly speed up computation. GraPE comprises specialized data structures, algorithms, and a fast parallel implementation that displays several orders of magnitude improvement in empirical space and time complexity compared to state-of-the-art software resources, with a corresponding boost in the performance of machine learning methods for edge and node label prediction and for the unsupervised analysis of graphs. GraPE is designed to run on laptop and desktop computers, as well as on high-performance computing clusters.

    GRAPE for fast and scalable graph processing and random-walk-based embedding

    Graph representation learning methods opened new avenues for addressing complex, real-world problems represented by graphs. However, many graphs used in these applications comprise millions of nodes and billions of edges and are beyond the capabilities of current methods and software implementations. We present GRAPE (Graph Representation Learning, Prediction and Evaluation), a software resource for graph processing and embedding that is able to scale with big graphs by using specialized and smart data structures, algorithms, and a fast parallel implementation of random-walk-based methods. Compared with state-of-the-art software resources, GRAPE shows an improvement of orders of magnitude in empirical space and time complexity, as well as competitive edge- and node-label prediction performance. GRAPE comprises approximately 1.7 million well-documented lines of Python and Rust code and provides 69 node-embedding methods, 25 inference models, a collection of efficient graph-processing utilities, and over 80,000 graphs from the literature and other sources. Standardized interfaces allow a seamless integration of third-party libraries, while ready-to-use and modular pipelines permit an easy-to-use evaluation of graph-representation-learning methods, therefore also positioning GRAPE as a software resource that performs a fair comparison between methods and libraries for graph processing and embedding.

    Abdominal Computed Tomography Imaging Findings in Hospitalized COVID-19 Patients: A Year-Long Experience and Associations Revealed by Explainable Artificial Intelligence.

    The aim of this retrospective study is to assess any association between abdominal CT findings and the radiological stage of COVID-19 pneumonia, pulmonary embolism and patient outcomes. We included 158 adult hospitalized COVID-19 patients between 1 March 2020 and 1 March 2021 who underwent 206 abdominal CTs. Two radiologists reviewed all CT images. Pathological findings were classified as acute or not. A subset of patients with inflammatory pathology in ACE2 organs (bowel, biliary tract, pancreas, urinary system) was identified. The radiological stage of COVID pneumonia, pulmonary embolism, overall days of hospitalization, ICU admission and outcome were registered. Univariate statistical analysis coupled with explainable artificial intelligence (AI) techniques were used to discover associations between variables. The most frequent acute findings were bowel abnormalities

    Supervised learning with word embeddings derived from PubMed captures latent knowledge about protein kinases and cancer.

    Inhibiting protein kinases (PKs) that cause cancers has been an important topic in cancer therapy for years. So far, almost 8% of >530 PKs have been targeted by FDA-approved medications, and around 150 protein kinase inhibitors (PKIs) have been tested in clinical trials. We present an approach based on natural language processing and machine learning to investigate the relations between PKs and cancers, predicting PKs whose inhibition would be efficacious to treat a certain cancer. Our approach represents PKs and cancers as semantically meaningful 100-dimensional vectors based on word and concept neighborhoods in PubMed abstracts. We use information about phase I-IV trials in ClinicalTrials.gov to construct a training set for random forest classification. Our results with historical data show that associations between PKs and specific cancers can be predicted years in advance with good accuracy. Our tool can be used to predict the relevance of inhibiting PKs for specific cancers and to support the design of well-focused clinical trials to discover novel PKIs for cancer therapy.

    Italian natural history museums on the verge of collapse?

    The Italian natural history museums are facing a critical situation, due to the progressive loss of scientific relevance, decreasing economic investments, and scarcity of personnel. This is extremely alarming, especially for ensuring the long-term preservation of the precious collections they host. Moreover, a commitment to fieldwork to increase scientific collections and concurrent taxonomic research are rarely considered priorities, while most of the activities are addressed to public events with political payoffs, such as exhibits, didactic meetings, expositions, and talks. This is possibly due to the absence of a national museum that would have better steered research activities and overall concepts for collection management. We here propose that Italian natural history museums collaborate to instate a “metamuseum”, by establishing a reciprocal interaction network aimed at sharing budgetary and technical resources, which would assure better coordination of common long-term goals and scientific activities.