GRAPE for fast and scalable graph processing and random-walk-based embedding
Graph representation learning methods opened new avenues for addressing complex, real-world problems represented by graphs. However, many graphs used in these applications comprise millions of nodes and billions of edges and are beyond the capabilities of current methods and software implementations. We present GRAPE (Graph Representation Learning, Prediction and Evaluation), a software resource for graph processing and embedding that is able to scale with big graphs by using specialized and smart data structures, algorithms, and a fast parallel implementation of random-walk-based methods. Compared with state-of-the-art software resources, GRAPE shows an improvement of orders of magnitude in empirical space and time complexity, as well as competitive edge- and node-label prediction performance. GRAPE comprises approximately 1.7 million well-documented lines of Python and Rust code and provides 69 node-embedding methods, 25 inference models, a collection of efficient graph-processing utilities, and over 80,000 graphs from the literature and other sources. Standardized interfaces allow a seamless integration of third-party libraries, while ready-to-use and modular pipelines permit an easy-to-use evaluation of graph-representation-learning methods, therefore also positioning GRAPE as a software resource that performs a fair comparison between methods and libraries for graph processing and embedding.
National Center for Gene Therapy and Drugs based on RNA Technology, PNRR-NextGenerationEU program G43C22001320007
United States Department of Health & Human Services
National Institutes of Health (NIH) - USA
NIH National Cancer Institute (NCI) U01-CA239108-02
Transition Grant Line 1A Project NIMI PARTENARIATI H2020' 1R24OD011883-01
United States Department of Health & Human Services
National Institutes of Health (NIH) - USA U01-CA239108-02
DE-AC02-05CH11231
United States Department of Energy (DOE)
European Union (EU)
Marie Curie Actions
PSR2015-1720GVALE_01
PID2021-128970OA-I0
Het-node2vec: second order random walk sampling for heterogeneous multigraphs embedding
We introduce a set of algorithms (Het-node2vec) that extend the original
node2vec node-neighborhood sampling method to heterogeneous multigraphs, i.e.
networks characterized by multiple types of nodes and edges. The resulting
random walk samples capture both the structural characteristics of the graph
and the semantics of the different types of nodes and edges. The proposed
algorithms can focus their attention on specific node or edge types, allowing
accurate representations even for underrepresented types of nodes/edges that
are of interest for the prediction problem under investigation. These rich and
well-focused representations can boost unsupervised and supervised learning on
heterogeneous graphs.
Comment: 20 pages, 5 figures
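To make the sampling idea concrete, here is a minimal Python sketch of a node2vec-style second-order random walk on an undirected, unweighted simple graph. The return parameter `p` and in-out parameter `q` follow the original node2vec bias; the `node_type` map and the `switch` factor, which rescales steps that change node type, are a simplified, hypothetical stand-in for the heterogeneous extension and not the exact Het-node2vec formulation. All function names are illustrative.

```python
import random
from collections import defaultdict


def build_adjacency(edges):
    """Build undirected adjacency sets from (u, v) edge pairs."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj


def step_weight(prev, cur, nxt, adj, p, q, node_type=None, switch=1.0):
    """Unnormalized weight of moving cur -> nxt, given the walk arrived from prev."""
    if prev is None:
        w = 1.0                      # first step: no previous node, no p/q bias
    elif nxt == prev:
        w = 1.0 / p                  # return to the previous node
    elif nxt in adj[prev]:
        w = 1.0                      # nxt is also a neighbour of prev (distance 1)
    else:
        w = 1.0 / q                  # move outward (distance 2 from prev)
    # Hypothetical type-switching factor: rescale steps that change node type.
    if node_type is not None and node_type[nxt] != node_type[cur]:
        w *= switch
    return w


def biased_walk(adj, start, length, p=1.0, q=1.0, node_type=None, switch=1.0, seed=None):
    """Sample one second-order random walk with at most `length` nodes."""
    rng = random.Random(seed)
    walk = [start]
    while len(walk) < length:
        cur = walk[-1]
        prev = walk[-2] if len(walk) > 1 else None
        neighbours = sorted(adj.get(cur, ()))
        weights = [step_weight(prev, cur, n, adj, p, q, node_type, switch) for n in neighbours]
        if not neighbours or sum(weights) == 0:
            break                    # dead end: stop the walk early
        walk.append(rng.choices(neighbours, weights=weights, k=1)[0])
    return walk
```

In a full pipeline, many such walks per node would typically be fed to a skip-gram model (as in node2vec) to learn the embeddings; parallel edges and edge types of a multigraph are not modelled in this sketch.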
Distribution and genetic lineages of the Craspedacusta sowerbii species complex (Cnidaria, Olindiidae) in Italy
Olindiid freshwater jellyfishes of the genus Craspedacusta Lankester, 1880 are native to eastern Asia; however, some species within the genus have been introduced worldwide and are nowadays present on all continents except Antarctica. To date, there is no consensus regarding the taxonomy within the genus Craspedacusta due to the morphological plasticity of the medusa stages. The species Craspedacusta sowerbii Lankester, 1880 was first recorded in Italy in 1946, and by 2017 sightings of the jellyfish Craspedacusta had been reported for 40 water bodies. Here, we shed new light on the presence of freshwater jellyfishes of the genus Craspedacusta across the Italian peninsula, Sardinia, and Sicily. First, we report 21 new observations of this non-native taxon, of which 18 refer to medusa sightings, two to environmental DNA sequencing, and one to the finding of polyps. Then, we investigate the molecular diversity of the collected Craspedacusta specimens using a Bayesian analysis of sequences of the mitochondrial gene encoding Cytochrome c Oxidase Subunit I (mtDNA COI). Our molecular analysis shows the presence of two distinct genetic lineages: (i) a group that comprises sequences obtained from populations ranging from central to northern Italy; (ii) a group that comprises three populations from northern Italy (those from Lake Levico, Lake Santo of Monte Terlago, and Lake Endine) and the single known Sicilian population. We also report for the first time a mtDNA COI sequence obtained from a Craspedacusta medusa collected in Spain.
parSMURF, a high-performance computing tool for the genome-wide detection of pathogenic variants.
BACKGROUND: Several prediction problems in computational biology and genomic medicine are characterized by both big data and a high imbalance between the classes of examples to be learned, whereby positive examples can represent a tiny minority with respect to negative examples. For instance, deleterious or pathogenic variants are overwhelmed by the sea of neutral variants in the non-coding regions of the genome: the prediction of deleterious variants is thus a challenging, highly imbalanced classification problem, and classical prediction tools either fail to detect the rare pathogenic examples among the huge number of neutral variants or are severely limited in managing big genomic data.
RESULTS: To overcome these limitations, we propose parSMURF, a method that adopts a hyper-ensemble approach and oversampling and undersampling techniques to deal with imbalanced data, and parallel computational techniques to both manage big genomic data and substantially speed up the computation. The synergy between Bayesian optimization techniques and the parallel nature of parSMURF enables efficient and user-friendly automatic tuning of the algorithm's hyper-parameters and allows the method to be easily fit to specific learning problems in genomic medicine. Moreover, by using MPI parallel and machine learning ensemble techniques, parSMURF can manage big data by partitioning them across the nodes of a high-performance computing cluster. Results with synthetic data and with single-nucleotide variants associated with Mendelian diseases and with genome-wide association study hits in the non-coding regions of the human genome, involving millions of examples, show that parSMURF achieves state-of-the-art results and an 80-fold speed-up with respect to the sequential version.
CONCLUSIONS: parSMURF is a parallel machine learning tool that can be trained to learn different genomic problems, and its multiple levels of parallelization and high scalability allow us to efficiently fit problems characterized by big and imbalanced genomic data. The C++ OpenMP multi-core version tailored to a single workstation and the C++ MPI/OpenMP hybrid multi-core and multi-node parSMURF version tailored to a High Performance Computing cluster are both available at https://github.com/AnacletoLAB/parSMURF
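The imbalance-handling idea in the RESULTS paragraph can be illustrated with a small, single-process Python sketch: the majority (negative) class is split into disjoint partitions, and each partition is paired with the positives plus SMOTE-style synthetic positives, then undersampled. The function names, the parameters `n_parts`, `oversample` and `neg_ratio`, and the choice of k = 3 neighbours are hypothetical; this is not the parSMURF C++/MPI code, and training of the base learners is omitted.

```python
import math
import random


def smote_like(positives, n_new, k=3, rng=None):
    """SMOTE-style oversampling: interpolate between a random positive and one
    of its k nearest positive neighbours (Euclidean distance).
    Assumes at least two positive examples."""
    rng = rng or random.Random()
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(positives))
        x = positives[i]
        others = positives[:i] + positives[i + 1:]
        neighbours = sorted(others, key=lambda y: math.dist(x, y))[:k]
        y = rng.choice(neighbours)
        gap = rng.random()             # position along the segment x -> y
        synthetic.append(tuple(xi + gap * (yi - xi) for xi, yi in zip(x, y)))
    return synthetic


def balanced_partitions(positives, negatives, n_parts, oversample=1, neg_ratio=1.0, seed=None):
    """Split the negatives into n_parts disjoint chunks and pair each chunk with
    the positives plus fresh synthetic positives, undersampling the chunk so it
    holds about neg_ratio negatives per positive."""
    rng = random.Random(seed)
    positives = list(positives)
    negs = list(negatives)
    rng.shuffle(negs)
    training_sets = []
    for j in range(n_parts):
        chunk = negs[j::n_parts]       # disjoint slice of the majority class
        pos = positives + smote_like(positives, oversample * len(positives), rng=rng)
        n_neg = min(len(chunk), round(neg_ratio * len(pos)))
        training_sets.append((pos, rng.sample(chunk, n_neg)))
    return training_sets
```

Each `(pos, neg)` pair would train one base classifier, and the ensemble score for a variant would be an aggregate (for example, the mean) of the base classifiers' scores. Because the partitions are independent, they map naturally onto the multi-node distribution the abstract describes.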
GraPE: fast and scalable Graph Processing and Embedding
Graph Representation Learning methods have enabled a wide range of learning
problems to be addressed for data that can be represented in graph form.
Nevertheless, several real-world problems in economics, biology, medicine and
other fields raise significant scaling problems for existing methods and their
software implementations, due to the size of real-world graphs characterized by
millions of nodes and billions of edges. We present GraPE, a software resource
for graph processing and random-walk-based embedding that can scale with large
and high-degree graphs and significantly speed up computation. GraPE comprises
specialized data structures, algorithms, and a fast parallel implementation
that displays an improvement of several orders of magnitude in empirical space
and time complexity compared to state-of-the-art software resources, with a
corresponding boost in the performance of machine learning methods for edge and
node label prediction and for the unsupervised analysis of graphs. GraPE is
designed to run on laptop and desktop computers, as well as on high-performance
computing clusters.
Metronomic Oral Vinorelbine: An Alternative Schedule in Elderly and Patients PS2 With Local/Advanced and Metastatic NSCLC Not Oncogene-addicted
The MILES and ELVIS studies showed that vinorelbine is one of the best options for elderly patients with advanced non-small-cell lung cancer (NSCLC). Oral vinorelbine at the standard schedule (60-80 mg/m2 weekly) has good activity in terms of response rates and progression-free survival. In recent years, a metronomic schedule of oral vinorelbine (40-50 mg/m2 three times a week, continuously) has been studied in phase II trials, especially in unfit and elderly patients. In the MOVE trial, metronomic oral vinorelbine produced a clinical benefit [partial response (PR) + stable disease (SD) >12 weeks] in 58.1% of patients, with mild toxicity. On this basis, in 2017 we started a phase II study of metronomic oral vinorelbine in elderly (over 70 years) or unfit [Eastern Cooperative Oncology Group performance status (ECOG-PS) of 2] patients with locally advanced and metastatic NSCLC. The primary aims were clinical benefit (PR + SD ≥6 months) and toxicity; the secondary aims were progression-free survival and overall survival.
Volatile lipophilic substances management in case of fatal sniffing.
Death due to inhalation of aliphatic hydrocarbons such as butane and propane is a particularly serious problem worldwide, with several fatal cases resulting from sniffing these volatile substances in order to "get high". Despite the number of published cases, there is no standardized approach to the management of fatal sniffing cases. In this paper we illustrate the management of volatile lipophilic substances in the case of a prisoner who died after sniffing a butane-propane gas mixture from prefilled camping stove gas canisters, discussing the comprehensive approach to the crime scene, the autopsy, histology and toxicology. Gas chromatography-mass spectrometry analysis of the following post-mortem biological samples yielded a large set of accurate butane and propane values: peripheral blood, heart blood, vitreous humor, liver, lung, heart, brain/cerebral cortex, fat tissue, and kidney. These values allowed an in-depth discussion of the cause of death. Following the proper sampling approach during autopsy plays a key role.
- …