Network modeling of patients' biomolecular profiles for clinical phenotype/outcome prediction
Methods for phenotype and outcome prediction are largely based on inductive supervised models that use selected biomarkers to make predictions, without explicitly considering the functional relationships between individuals. We introduce a novel network-based approach named Patient-Net (P-Net) in which biomolecular profiles of patients are modeled in a graph-structured space that represents gene expression relationships between patients. A kernel-based semi-supervised transductive algorithm is then applied to the graph to explore its overall topology and to predict the phenotype/clinical outcome of patients. Experimental tests involving several publicly available datasets of patients afflicted with pancreatic, breast, colon and colorectal cancer show that our proposed method is competitive with state-of-the-art supervised and semi-supervised predictive systems. Importantly, P-Net also provides interpretable models that can be easily visualized to gain clues about the relationships between patients, and to formulate hypotheses about their stratification.
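The graph-then-score idea described in this abstract can be illustrated with a minimal sketch. It is not the P-Net implementation: the function names, the use of Pearson correlation clipped at zero as edge weight, the symmetric degree normalization, and the "average similarity to known positive patients" score are all assumptions chosen for illustration, standing in for the kernel and score functions of the actual method.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation between two expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx > 0 and sy > 0 else 0.0

def patient_graph(profiles):
    """Weighted adjacency matrix: patients are nodes, edges are
    positive correlations between profiles (assumption: negatives dropped)."""
    n = len(profiles)
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            w[i][j] = w[j][i] = max(0.0, pearson(profiles[i], profiles[j]))
    return w

def normalize(w):
    """Symmetric degree normalization: w[i][j] / sqrt(deg[i] * deg[j])."""
    n = len(w)
    deg = [sum(row) for row in w]
    return [[w[i][j] / sqrt(deg[i] * deg[j]) if deg[i] > 0 and deg[j] > 0 else 0.0
             for j in range(n)] for i in range(n)]

def score_patients(k, positives):
    """Score each patient by its mean normalized similarity to the
    patients already known to have the phenotype (excluding itself)."""
    scores = []
    for i in range(len(k)):
        others = [j for j in positives if j != i]
        scores.append(sum(k[i][j] for j in others) / len(others) if others else 0.0)
    return scores

# Hypothetical toy data: five patients, four genes each; patient 0 is a known positive.
profiles = [[1, 2, 3, 4], [2, 4, 6, 8.5], [4, 3, 2, 1], [8, 6, 4, 2.5], [1, 2, 3, 5]]
scores = score_patients(normalize(patient_graph(profiles)), positives={0})
```

Ranking patients by `scores` gives a transductive prediction: every patient, labeled or not, is part of the graph when the scores are computed.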
A GPU-based algorithm for fast node label learning in large and unbalanced biomolecular networks
Background: Several problems in network biology and medicine can be cast into a framework where entities are represented through partially labeled networks, and the aim is to infer the labels (usually binary) of the unlabeled part. Connections represent functional or genetic similarity between entities, while the labellings are often highly unbalanced, that is, one class is largely under-represented: for instance, in automated protein function prediction (AFP) only a few proteins are annotated for most Gene Ontology terms, and in the disease-gene prioritization problem only a few genes are actually known to be involved in the etiology of a given disease. Imbalance-aware approaches to accurately predict node labels in biological networks are therefore required. Furthermore, such methods must be scalable, since input data can be large, as, for instance, in the context of multi-species protein networks. Results: We propose a novel semi-supervised parallel enhancement of COSNet, an imbalance-aware algorithm built on a Hopfield neural model recently suggested to solve the AFP problem. By adopting an efficient representation of the graph and assuming a sparse network topology, we empirically show that it can be efficiently applied to networks with millions of nodes. The key strategy to speed up the computations is to partition nodes into independent sets so as to process each set in parallel by exploiting the power of GPU accelerators. This parallel technique ensures the convergence to asymptotically stable attractors, while preserving the asynchronous dynamics of the original model. Detailed experiments on real data and artificial big instances of the problem highlight the scalability and efficiency of the proposed method. Conclusions: By parallelizing COSNet we achieved on average a speed-up of 180x in solving the AFP problem for S. cerevisiae, Mus musculus and Homo sapiens, while lowering memory requirements.
In addition, to show the potential applicability of the method to huge biomolecular networks, we predicted node labels in artificially generated sparse networks involving hundreds of thousands to millions of nodes.
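The partitioning strategy described in this abstract can be sketched on the CPU as follows. This is an illustrative simplification, not COSNet or its GPU code: the greedy coloring, the plain +1/-1 neuron states, the zero threshold, and all function names are assumptions. The point it shows is that nodes with the same color share no edge, so updating them simultaneously gives the same result as updating them one at a time, which is what makes each set safe to process in parallel.

```python
from collections import defaultdict

def independent_sets(adj):
    """Greedy graph coloring; each color class is an independent set."""
    color = {}
    for node in sorted(adj, key=lambda n: -len(adj[n])):  # high degree first
        used = {color[nb] for nb in adj[node] if nb in color}
        c = 0
        while c in used:
            c += 1
        color[node] = c
    groups = defaultdict(list)
    for node, c in color.items():
        groups[c].append(node)
    return [groups[c] for c in sorted(groups)]

def sweep(adj, groups, state, clamped, threshold=0.0):
    """One pass over all independent sets; returns True if any state changed."""
    changed = False
    for group in groups:
        # Nodes in a group are pairwise non-adjacent, so these updates do not
        # depend on each other and could run as one parallel kernel.
        new_values = {}
        for i in group:
            if i in clamped:  # labeled nodes keep their known label
                continue
            field = sum(weight * state[j] for j, weight in adj[i].items())
            new_values[i] = 1 if field >= threshold else -1
        for i, value in new_values.items():
            if state[i] != value:
                state[i] = value
                changed = True
    return changed

def run(adj, state, clamped, max_sweeps=100):
    """Iterate sweeps on a copy of the state until a fixed point is reached."""
    state = dict(state)
    groups = independent_sets(adj)
    for _ in range(max_sweeps):
        if not sweep(adj, groups, state, clamped):
            break
    return state

# Hypothetical toy network: a path 0-1-2-3 with unit weights; nodes 0 and 3 are labeled positive.
adj = {0: {1: 1.0}, 1: {0: 1.0, 2: 1.0}, 2: {1: 1.0, 3: 1.0}, 3: {2: 1.0}}
final = run(adj, {0: 1, 1: -1, 2: -1, 3: 1}, clamped={0, 3})
```

On a GPU, the inner loop over `group` would map to one thread per node, with a synchronization point between groups.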
Multi-species protein function prediction: towards Web-based visual analytics
The visualization and analysis of big bio-molecular networks is a key feature for the investigation and prediction of protein functions in a multi-species context. In this paper we present the design of a system that integrates data management, machine learning and visualization facilities to enable effective visual analysis of big networks by means of web-based interfaces.
Managing intellectual property in a music fruition environment
With the advent of the digital age and the spread of portable digital audio players, interest in software and hardware tools that can help producers and distributors enhance and revive their catalogue of music has progressively increased. One of the main concerns of major labels is how to prevent file sharing. An innovative approach that couples reviving catalogues with support for rights management could provide an experience of multimedia content in which users select multiple media streams on the fly in a fully synchronized environment. Because this kind of user experience can't be reconstructed from the single original streams, illegal copying would be intrinsically discouraged. In this article, the authors propose an approach to encode contents and build advanced multimodal interfaces that protect intellectual property. As a case study, they use IEEE 1599, an international standard for music description.
Authorised access web portal for Italian data bank on sudden unexpected perinatal and infant death
Sudden infant death syndrome (SIDS) is common during the first year of life and affects 0.40 in every 1,000 births. Stillbirths are seven times more common than SIDS; 40–80% of cases remain unexplained and are categorised as sudden intrauterine unexpected death syndrome. In 2006 Italy passed legislation requiring that fetuses and infants, from 25 weeks of gestation to one postnatal year, who died suddenly and unexpectedly be sent, with parental permission, to the University of Milan, Italy, for an in-depth diagnostic post-mortem. The “Lino Rossi” Research Center is currently developing the technical specifications for a web portal (http://users.unimi.it/centrolinorossi) for its national data bank registry, which has been set up to centralise records retrieved from regions across Italy. This will record all post-mortem findings, together with clinical information about the pregnancy, fetal development, delivery, environmental conditions and the family situation when the death occurred. The privacy and confidentiality of the data are ensured, in accordance with European legislation.
Data integration issues and opportunities in biological XML data management
A growing number of research and industrial organizations produce huge amounts of biological data from experimentation with biological systems. In order to make these heterogeneous data sources easy to use, several efforts at data integration are currently being undertaken, based mainly on XML. Starting from a discussion of the main biological data types and system interactions that need to be represented, the authors deal with the main approaches proposed for their modelling through XML. Then, they show the current efforts in biological data integration and how an increasing amount of semantic information is required in terms of vocabulary control and ontologies. Finally, future research directions in biological data integration are discussed.
Multi-resolution visualization and analysis of biomolecular networks through hierarchical community detection and web-based graphical tools
The visual exploration and analysis of biomolecular networks is of paramount importance for identifying hidden and complex interaction patterns among proteins. Although many tools have been proposed for this task, they are mainly focused on the query and visualization of a single protein with its neighborhood. The global exploration of the entire network and the interpretation of its underlying structure remain difficult, mainly due to the excessively large size of biomolecular networks. In this paper we propose a novel multi-resolution representation and exploration approach that exploits hierarchical community detection algorithms for the identification of communities occurring in biomolecular networks. The proposed graphical rendering combines two types of nodes (proteins and communities) and three types of edges (protein-protein, community-community, protein-community), and displays communities at different resolutions, allowing the user to interactively zoom in and out across different levels of the hierarchy. Links among communities are shown in terms of relationships and functional correlations among the biomolecules they contain. This form of navigation can also be combined by the user with a vertex-centric visualization for identifying the communities holding a target biomolecule. Since communities gather limited-size groups of correlated proteins, the visualization and exploration of complex and large networks becomes feasible on off-the-shelf computers. The proposed graphical exploration strategies have been implemented and integrated in UNIPred-Web, a web application that we recently introduced for combining the UNIPred algorithm, able to address both integration and protein function prediction in an imbalance-aware fashion, with an easy-to-use vertex-centric exploration of the integrated network. The tool has been substantially revised from different standpoints, including the core prediction algorithm.
Several tests on networks of different size and connectivity have been conducted to demonstrate the potential of our methodology; moreover, enrichment analyses have been performed to assess the biological meaningfulness of detected communities. Finally, a CoV-human network has been embedded in the system, and a corresponding case study presented, including the visualization and the prediction of human host proteins that potentially interact with SARS-CoV-2 proteins.
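The two node types and three edge types described in this abstract can be made concrete with a small sketch. It does not reproduce UNIPred-Web: the community assignment is assumed to be given by some community detection step, and the function names, the summed-weight aggregation of community-community edges, and the toy identifiers are all hypothetical.

```python
from collections import defaultdict

def community_view(edges, membership):
    """Build the coarse view at one resolution level.

    edges: iterable of (protein_a, protein_b, weight)
    membership: dict mapping each protein to its community at this level
    Returns community-community edges (summed weights, an assumption)
    and protein-community membership edges.
    """
    cc = defaultdict(float)
    for a, b, w in edges:
        ca, cb = membership[a], membership[b]
        if ca != cb:  # only edges crossing communities link two community nodes
            cc[tuple(sorted((ca, cb)))] += w
    pc = sorted(membership.items())
    return dict(cc), pc

def zoom_in(edges, membership, community):
    """Protein-protein edges that lie entirely inside one community."""
    return [(a, b, w) for a, b, w in edges
            if membership[a] == community and membership[b] == community]

# Hypothetical toy network with four proteins in two communities.
edges = [("P1", "P2", 1.0), ("P2", "P3", 0.5), ("P3", "P4", 2.0), ("P1", "P4", 0.25)]
membership = {"P1": "C1", "P2": "C1", "P3": "C2", "P4": "C2"}
cc_edges, pc_edges = community_view(edges, membership)
inside_c2 = zoom_in(edges, membership, "C2")
```

With a hierarchy of partitions, calling `community_view` with the membership of a coarser or finer level would give the zoomed-out or zoomed-in rendering, while `zoom_in` exposes the limited-size subnetwork of a single community.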