176 research outputs found

    GRAPE for fast and scalable graph processing and random-walk-based embedding

    Graph representation learning methods opened new avenues for addressing complex, real-world problems represented by graphs. However, many graphs used in these applications comprise millions of nodes and billions of edges and are beyond the capabilities of current methods and software implementations. We present GRAPE (Graph Representation Learning, Prediction and Evaluation), a software resource for graph processing and embedding that is able to scale with big graphs by using specialized and smart data structures, algorithms, and a fast parallel implementation of random-walk-based methods. Compared with state-of-the-art software resources, GRAPE shows an improvement of orders of magnitude in empirical space and time complexity, as well as competitive edge- and node-label prediction performance. GRAPE comprises approximately 1.7 million well-documented lines of Python and Rust code and provides 69 node-embedding methods, 25 inference models, a collection of efficient graph-processing utilities, and over 80,000 graphs from the literature and other sources. 
Standardized interfaces allow a seamless integration of third-party libraries, while ready-to-use and modular pipelines permit an easy-to-use evaluation of graph-representation-learning methods, therefore also positioning GRAPE as a software resource that performs a fair comparison between methods and libraries for graph processing and embedding.
National Center for Gene Therapy and Drugs based on RNA Technology, PNRR-NextGenerationEU program G43C22001320007 United States Department of Health & Human Services National Institutes of Health (NIH) - USA NIH National Cancer Institute (NCI) U01-CA239108-02 Transition Grant Line 1A Project NIMI PARTENARIATI H2020' 1R24OD011883-01 United States Department of Health & Human Services National Institutes of Health (NIH) - USA U01-CA239108-02 DE-AC02-05CH11231 United States Department of Energy (DOE) European Union (EU) Marie Curie Actions PSR2015-1720GVALE_01 PID2021-128970OA-I0

    Het-node2vec: second order random walk sampling for heterogeneous multigraphs embedding

    We introduce a set of algorithms (Het-node2vec) that extend the original node2vec node-neighborhood sampling method to heterogeneous multigraphs, i.e. networks characterized by multiple types of nodes and edges. The resulting random walk samples capture both the structural characteristics of the graph and the semantics of the different types of nodes and edges. The proposed algorithms can focus their attention on specific node or edge types, allowing accurate representations also for underrepresented types of nodes/edges that are of interest for the prediction problem under investigation. These rich and well-focused representations can boost unsupervised and supervised learning on heterogeneous graphs. Comment: 20 pages, 5 figures
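
    For orientation, the second-order sampling of the original node2vec method, which the abstract says Het-node2vec extends, can be sketched roughly as follows. This is a minimal illustration for an unweighted, homogeneous graph only; it does not model node or edge types, and it is not the Het-node2vec algorithm. The function name `second_order_walk` and the adjacency-dictionary input are assumptions made for this example; `p` and `q` are the return and in-out parameters of node2vec.

```python
import random

def second_order_walk(adj, start, length, p=1.0, q=1.0, rng=None):
    """Sample one node2vec-style walk (hypothetical helper, homogeneous graph).

    adj maps each node to the set of its neighbours (undirected, unweighted).
    The next step depends on both the current and the previous node.
    """
    rng = rng or random.Random()
    walk = [start]
    while len(walk) < length:
        cur = walk[-1]
        neighbours = sorted(adj[cur])
        if not neighbours:
            break  # dead end: stop the walk early
        if len(walk) == 1:
            # First step has no previous node, so sample uniformly.
            walk.append(rng.choice(neighbours))
            continue
        prev = walk[-2]
        weights = []
        for nxt in neighbours:
            if nxt == prev:
                weights.append(1.0 / p)  # return to the previous node
            elif nxt in adj[prev]:
                weights.append(1.0)      # stay at distance 1 from the previous node
            else:
                weights.append(1.0 / q)  # move further away from the previous node
        walk.append(rng.choices(neighbours, weights=weights, k=1)[0])
    return walk
```

    A low `p` makes the walk revisit the node it just left more often, while a low `q` pushes it outward; a type-aware variant would additionally reweight steps by node or edge type, which is not shown here.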

    Distribution and genetic lineages of the Craspedacusta sowerbii species complex (Cnidaria, Olindiidae) in Italy

    Olindiid freshwater jellyfishes of the genus Craspedacusta Lankester, 1880 are native to eastern Asia; however, some species within the genus have been introduced worldwide and are nowadays present in all continents except Antarctica. To date, there is no consensus regarding the taxonomy within the genus Craspedacusta due to the morphological plasticity of the medusa stages. The species Craspedacusta sowerbii Lankester, 1880 was first recorded in Italy in 1946, and until 2017, sightings of the jellyfish Craspedacusta were reported for 40 water bodies. Here, we shed new light on the presence of the freshwater jellyfishes belonging to the genus Craspedacusta across the Italian peninsula, Sardinia, and Sicily. First, we report 21 new observations of this non-native taxon, of which eighteen refer to medusae sightings, two to environmental DNA sequencing, and one to the finding of polyps. Then, we investigate the molecular diversity of collected Craspedacusta specimens, using a Bayesian analysis of sequences of the mitochondrial gene encoding for Cytochrome c Oxidase Subunit I (mtDNA COI). Our molecular analysis shows the presence of two distinctive genetic lineages: (i) a group that comprises sequences obtained from populations ranging from central to northern Italy; (ii) a group that comprises three populations from northern Italy—i.e., those from the Lake Levico, the Lake Santo of Monte Terlago, and the Lake Endine—and the single known Sicilian population. We also report for the first time a mtDNA COI sequence obtained from a Craspedacusta medusa collected in Spai

    parSMURF, a high-performance computing tool for the genome-wide detection of pathogenic variants.

    BACKGROUND: Several prediction problems in computational biology and genomic medicine are characterized by both big data as well as a high imbalance between examples to be learned, whereby positive examples can represent a tiny minority with respect to negative examples. For instance, deleterious or pathogenic variants are overwhelmed by the sea of neutral variants in the non-coding regions of the genome: thus, the prediction of deleterious variants is a challenging, highly imbalanced classification problem, and classical prediction tools fail to detect the rare pathogenic examples among the huge amount of neutral variants or undergo severe restrictions in managing big genomic data. RESULTS: To overcome these limitations we propose parSMURF, a method that adopts a hyper-ensemble approach and oversampling and undersampling techniques to deal with imbalanced data, and parallel computational techniques to both manage big genomic data and substantially speed up the computation. The synergy between Bayesian optimization techniques and the parallel nature of parSMURF enables efficient and user-friendly automatic tuning of the hyper-parameters of the algorithm, and allows specific learning problems in genomic medicine to be easily fit. Moreover, by using MPI parallel and machine learning ensemble techniques, parSMURF can manage big data by partitioning them across the nodes of a high-performance computing cluster. Results with synthetic data and with single-nucleotide variants associated with Mendelian diseases and with genome-wide association study hits in the non-coding regions of the human genome, involving millions of examples, show that parSMURF achieves state-of-the-art results and an 80-fold speed-up with respect to the sequential version.
CONCLUSIONS: parSMURF is a parallel machine learning tool that can be trained to learn different genomic problems, and its multiple levels of parallelization and high scalability allow us to efficiently fit problems characterized by big and imbalanced genomic data. The C++ OpenMP multi-core version tailored to a single workstation and the C++ MPI/OpenMP hybrid multi-core and multi-node parSMURF version tailored to a High Performance Computing cluster are both available at https://github.com/AnacletoLAB/parSMURF
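
    The combination of partitioning, oversampling, and undersampling mentioned in the abstract can be illustrated with a small sketch. This is not parSMURF's implementation: the helper `balanced_partitions`, its parameters, and the use of plain duplication for oversampling are all assumptions chosen to keep the example short. Each returned pair could serve as the training set of one ensemble member, with member predictions combined afterwards.

```python
import random

def balanced_partitions(positives, negatives, n_parts, over_factor=2, rng=None):
    """Build one roughly balanced training set per partition (hypothetical helper).

    Negatives (the majority class) are split into n_parts disjoint partitions.
    In each partition the positives are oversampled by simple duplication and
    the negatives are undersampled to match the oversampled positive count.
    """
    rng = rng or random.Random()
    shuffled = list(negatives)
    rng.shuffle(shuffled)
    # Disjoint partitions of the majority class, one per ensemble member.
    parts = [shuffled[i::n_parts] for i in range(n_parts)]
    training_sets = []
    for part in parts:
        # Oversampling: replicate every positive example over_factor times.
        pos = list(positives) * over_factor
        # Undersampling: keep at most as many negatives as oversampled positives.
        neg = rng.sample(part, min(len(part), len(pos)))
        training_sets.append((pos, neg))
    return training_sets
```

    Because the partitions are disjoint, the per-partition training sets are independent of each other and could be built and used in parallel, which is the property a multi-core or multi-node implementation would exploit.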

    GraPE: fast and scalable Graph Processing and Embedding

    Graph Representation Learning methods have enabled a wide range of learning problems to be addressed for data that can be represented in graph form. Nevertheless, several real world problems in economy, biology, medicine and other fields raised relevant scaling problems with existing methods and their software implementation, due to the size of real world graphs characterized by millions of nodes and billions of edges. We present GraPE, a software resource for graph processing and random walk based embedding, that can scale with large and high-degree graphs and significantly speed up computation. GraPE comprises specialized data structures, algorithms, and a fast parallel implementation that displays several orders of magnitude improvement in empirical space and time complexity compared to state of the art software resources, with a corresponding boost in the performance of machine learning methods for edge and node label prediction and for the unsupervised analysis of graphs. GraPE is designed to run on laptop and desktop computers, as well as on high performance computing cluster

    Metronomic Oral Vinorelbine: An Alternative Schedule in Elderly and Patients PS2 With Local/Advanced and Metastatic NSCLC Not Oncogene-addicted

    The MILES and ELVIS studies showed that vinorelbine is one of the best options for elderly patients with advanced non-small-cell-lung cancer (NSCLC). Oral vinorelbine at standard schedule (60-80 mg/m2/weekly) has good activity in terms of response rates and progression-free survival. In recent years, a metronomic schedule of oral vinorelbine (40-50 mg/m2 three times a week, continuously) has been studied in phase II trials, especially in unfit and elderly patients. In the MOVE trial metronomic oral vinorelbine had a clinical benefit [partial response (PR)+stable disease (SD) >12 weeks] in 58.1% of patients with mild toxicity. On this basis, in 2017 we started a phase II study with metronomic oral vinorelbine in elderly (over 70 years) or unfit [Eastern Cooperative Oncology Group performance score (ECOG-PS) of 2] patients with locally/advanced and metastatic NSCLC. Primary aims were clinical benefit (PR+SD ≥6 months) and toxicity; secondary aims were progression-free survival and overall survival

    Volatile lipophilic substances management in case of fatal sniffing.

    Death due to inhalation of aliphatic hydrocarbons such as butane and propane is a particularly serious problem worldwide, resulting in several fatal cases of sniffing these volatile substances in order to "get high". Despite the number of cases published, there is no single approach to case management of fatal sniffing. In this paper we illustrate the volatile lipophilic substances management in the case of a prisoner who died after sniffing a butane-propane gas mixture from prefilled camping stove gas canisters, discussing the comprehensive approach of the crime scene, the autopsy, histology and toxicology. A large set of accurate values of both butane and propane was obtained by gas chromatography-mass spectrometry analyzing the following post-mortem biological samples: peripheral blood, heart blood, vitreous humor, liver, lung, heart, brain/cerebral cortex, fat tissue, kidney, and allowed an in-depth discussion about the cause of death. A key role is played by following the proper sampling approach during autopsy