113 research outputs found

    A Gibbs sampling strategy for mining of protein-protein interaction networks and protein structures

    Complex networks are general and can be used to model phenomena that belong to different fields of research, from biochemical applications to social networks. However, due to the intrinsic complexity of real networks, their analysis can be computationally demanding. Recently, several statistical and probabilistic analysis approaches have been designed that have proven much faster, more flexible and more effective than deterministic algorithms. Among statistical methods, Gibbs sampling is one of the simplest and most powerful algorithms for solving complex optimization problems and it has been applied in different contexts. It has shown its effectiveness in computational biology, but in sequence analysis rather than in network analysis. One approach to analyzing complex networks is to compare them in order to identify similar patterns of interconnections and predict the function or the role of some unknown nodes. This motivated the main goal of the thesis: designing and implementing novel graph mining techniques based on Gibbs sampling to compare two or more complex networks. The methodology is domain-independent and can work on any complex system of interacting entities with associated attributes. However, in this thesis we focus our attention on protein analysis, overcoming the strong current limitations in this area. Proteins can be analyzed from two different points of view: (i) an internal perspective, i.e. the 3D structure of the protein, and (ii) an external perspective, i.e. the interactions with other macromolecules. In both cases, a comparative analysis with other proteins of the same or distinct species can reveal important clues about the function of the protein and about evolutionary convergences or divergences between different organisms in the way a specific function or process is carried out. First, we present two methods based on Gibbs sampling for the comparative analysis of protein-protein interaction networks: GASOLINE and SPECTRA. GASOLINE is a stochastic and greedy algorithm to find similar groups of interacting proteins in two or more networks. It can align many networks, and does so more quickly than state-of-the-art methods. SPECTRA is a framework to retrieve and compare networks of proteins that interact with one another in specific healthy or tumor tissues. The aim in this case is to identify changes in protein concentration or protein "behaviour" across different tissues. SPECTRA is an adaptation of GASOLINE for weighted protein-protein interaction networks with gene expressions as node weights. It is the first algorithm proposed for multiple comparison of tissue-specific interaction networks. We also describe a Gibbs sampling based algorithm for 3D protein structure comparison, called PROPOSAL, which finds local structural similarities across two or more protein structures. Experimental results confirm our computational predictions and show that the proposed algorithms are much faster and in most cases more accurate than existing methods.

    Establish the expected number of induced motifs on unlabeled graphs through analytical models

    Complex networks are usually characterized by the presence of small and recurrent patterns of interactions between nodes, called network motifs. These small modules can help to elucidate the structure and the functioning of complex systems. Assessing the statistical significance of a pattern as a motif in a network G is a time-consuming task, which entails computing the expected number of occurrences of the pattern in an ensemble of random graphs preserving some features of G, such as the degree distribution. Recently, a few models have been devised to analytically compute expectations of the number of non-induced occurrences of a motif. Less attention has been paid to the harder analysis of induced motifs. Here, we illustrate an analytical model to derive the mean number of occurrences of an induced motif in an unlabeled network with respect to a random graph model. A comprehensive experimental analysis shows the effectiveness of our approach for the computation of the expected number of induced motifs of up to 10 nodes. Finally, the proposed method is helpful when running subgraph counting algorithms to get the number of occurrences of a topology becomes unfeasible.
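To make the idea of an analytically computed expectation concrete, here is a minimal sketch for the simplest possible null model, the Erdős–Rényi graph G(n, p). This is a deliberately swapped-in simplification: the abstract above refers to random graph models that preserve features such as the degree distribution, and the paper's actual model is not reproduced here. The function names `automorphism_count` and `expected_induced_count` are hypothetical. The sketch relies on the standard fact that a k-node pattern H with m edges can be placed on n unlabeled nodes in C(n, k) · k! / |Aut(H)| distinct ways, and each placement is an induced occurrence with probability p^m (1 − p)^(C(k,2) − m).

```python
from itertools import permutations
from math import comb, factorial


def automorphism_count(k, edges):
    """Count automorphisms of an undirected pattern on nodes 0..k-1 by brute force."""
    edge_set = {frozenset(e) for e in edges}
    count = 0
    for perm in permutations(range(k)):
        # A node bijection that sends every edge to an edge is an automorphism.
        if all(frozenset((perm[u], perm[v])) in edge_set for u, v in edge_set):
            count += 1
    return count


def expected_induced_count(n, p, k, edges):
    """Expected number of induced copies of the pattern in G(n, p) (assumed null model)."""
    m = len({frozenset(e) for e in edges})
    # Number of distinct ways to place the unlabeled pattern on k of the n nodes.
    placements = comb(n, k) * factorial(k) // automorphism_count(k, edges)
    # All m pattern edges must be present and all other pairs must be absent.
    return placements * p**m * (1 - p) ** (comb(k, 2) - m)


triangle = [(0, 1), (1, 2), (0, 2)]
wedge = [(0, 1), (1, 2)]  # path on three nodes
print(expected_induced_count(10, 0.5, 3, triangle))
print(expected_induced_count(10, 0.5, 3, wedge))
```

The brute-force automorphism count is only practical for small patterns, which matches the motif sizes discussed above (up to 10 nodes), though it becomes slow near that upper limit.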

    TemporalRI: subgraph isomorphism in temporal networks with multiple contacts

    Temporal networks are graphs where each edge is associated with a timestamp denoting when two nodes interact. Temporal Subgraph Isomorphism (TSI) aims at retrieving all the subgraphs of a temporal network (called the target) matching a smaller temporal network (called the query), such that matched target edges appear in the same chronological order as the corresponding query edges. Few algorithms have been proposed to solve the TSI problem (or variants of it) and most of them are applicable only to small or specific queries. In this paper we present TemporalRI, a new subgraph isomorphism algorithm for temporal networks with multiple contacts between nodes, which is inspired by the RI algorithm. TemporalRI introduces the notion of temporal flows and uses them to filter the search space of candidate nodes for the matching. Our algorithm can handle queries of any size and any topology. Experiments on real networks of different sizes show that TemporalRI is very efficient compared to the state-of-the-art, especially for large queries and targets.
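The TSI problem statement above can be illustrated with a brute-force sketch. This is not TemporalRI: it uses no temporal flows and no search-space filtering, and it is only usable on tiny inputs. It assumes directed contacts given as `(source, destination, timestamp)` triples, distinct query timestamps, and strict chronological order among matched contacts. The function name `temporal_matches` is hypothetical. It returns the injective node mappings for which at least one chronologically consistent choice of target contacts exists, rather than enumerating every edge-level occurrence.

```python
from itertools import permutations


def temporal_matches(query_edges, target_edges):
    """Brute-force temporal subgraph matching on small directed temporal graphs."""
    q_nodes = sorted({x for a, b, _ in query_edges for x in (a, b)})
    t_nodes = sorted({x for u, v, _ in target_edges for x in (u, v)})
    ordered_query = sorted(query_edges, key=lambda e: e[2])

    # Group the (possibly multiple) contact timestamps of each target node pair.
    contacts = {}
    for u, v, t in target_edges:
        contacts.setdefault((u, v), []).append(t)
    for times in contacts.values():
        times.sort()

    matches = []
    for image in permutations(t_nodes, len(q_nodes)):
        mapping = dict(zip(q_nodes, image))
        last = float("-inf")
        feasible = True
        # Walk the query edges in time order; greedily taking the earliest
        # usable contact is enough to decide whether an ordering exists.
        for a, b, _ in ordered_query:
            times = contacts.get((mapping[a], mapping[b]), [])
            nxt = next((t for t in times if t > last), None)
            if nxt is None:
                feasible = False
                break
            last = nxt
        if feasible:
            matches.append(mapping)
    return matches


target = [("x", "y", 1), ("y", "z", 5), ("y", "z", 0), ("z", "x", 3)]
query = [("a", "b", 1), ("b", "c", 2)]
for m in temporal_matches(query, target):
    print(m)
```

Trying every injective mapping costs time exponential in the query size, which is exactly the kind of blow-up that candidate filtering is meant to avoid.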

    Salinity Reduction of Real Produced Waters via Assisted Reverse Electrodialysis

    Produced waters (PWs) are waste streams generated during crude oil extraction processes. The management of these wastewaters is complicated by the large volumes extracted during oil recovery operations. These volumes depend on the life of the oil well; typically, 3 barrels of PWs are produced on average for each barrel of oil extracted. After oil separation, PWs are usually re-injected into the well, but this approach is not always possible without a suitable preliminary treatment. Bioremediation techniques might be a good option, but they fail due to the high salinity of PWs, which inhibits bacterial growth and metabolism. Thus, reducing their salinity upstream of a bioremediation unit is of crucial importance. To this aim, Assisted Reverse Electrodialysis (ARED), together with the use of a dilute stream typically available on site, is proposed here as a novel solution. In ARED, an additional voltage is applied in the same direction as the salinity gradient through the membranes in order to enhance the passage of ions from the PW to the dilute solution, thus significantly reducing the required membrane area. An experimental campaign was carried out in order to assess the process feasibility. A fixed volume of real PWs was fed to a laboratory-scale ARED unit. Each experimental test lasted for three days to reduce the salinity down to about 20 g l⁻¹, a value compatible with the biomass metabolism for a downstream bioremediation step. Two different types of commercial membranes were tested and the relevant energy consumptions were calculated. The long runs performed did not show a significant loss of efficiency due to fouling, thus suggesting that ARED might be a suitable technology for the pre-dilution of produced water.

    Economic Analysis of an Innovative Scheme for the Treatment of Produced Waters

    During crude oil extraction processes, each barrel of oil extracted yields on average the equivalent of 3 barrels of wastewater. These wastes are known as Produced Waters (PWs), and their dramatic impact on the environment has led researchers to look for an economic and efficient method for their treatment. Dealing with PWs is not easy: the long exposure to oil increases their hydrocarbon fraction, while the contact with the underground wells increases their concentration of salts and minerals. The direct discharge of PWs into the sea is not allowed by law, and PWs are usually re-injected into the well. The present work deals with a novel treatment chain (including assisted reverse electrodialysis (ARED) as a dilution step) able to reduce both the salinity and the organic content of PWs. The scheme includes an ultrafiltration unit as pre-treatment, upstream of an ARED unit for the PW dilution. Once the salinity level has been reduced to a value suitable for a bioremediation step, PWs are sent to a bio-reactor, where the organic compounds are digested. Finally, a reverse osmosis unit is used to recover water from the treated PWs and to recycle it as the dilute stream in the ARED unit. A techno-economic model was purposely developed in the present work to assess the economic feasibility of the proposed scheme. Preliminary results suggest that the treatment costs are lower than 5 € m⁻³ of PW and fully competitive with current PW treatment technologies.

    Electrodialysis with Bipolar Membranes for the Sustainable Production of Chemicals from Seawater Brines at Pilot Plant Scale

    Environmental concerns regarding the disposal of seawater reverse osmosis brines require the development of new valorization strategies. Electrodialysis with bipolar membranes (EDBM) enables the production of acid and base from a salty waste stream. In this study, an EDBM pilot plant with a membrane area of 19.2 m² was tested. This total membrane area is much larger (i.e., more than 16 times larger) than those reported in the literature so far for the production of HCl and NaOH aqueous solutions starting from NaCl brines. The pilot unit was tested in both continuous and discontinuous operation modes, at different current densities (200-500 A m⁻²). In particular, three different process configurations were evaluated, namely closed-loop, feed and bleed, and fed-batch. At the lowest applied current density (200 A m⁻²), the closed-loop configuration had a lower specific energy consumption (SEC) (1.4 kWh kg⁻¹) and a higher current efficiency (CE) (80%). When the current density was increased (300-500 A m⁻²), the feed and bleed mode was more appropriate due to its low values of SEC (1.9-2.6 kWh kg⁻¹) as well as high values of specific production (SP) (0.82-1.3 ton year⁻¹ m⁻²) and current efficiency (63-67%). These results show the effect of the process configuration on the performance of the EDBM, thereby guiding the selection of the most suitable configuration when varying the operating conditions and representing a first important step toward the implementation of this technology at industrial scale.
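The abstract above reports current efficiency (CE) and specific energy consumption (SEC) without stating how they are computed. The sketch below uses common textbook-style definitions, which are an assumption here and may differ in detail from those used in the study: CE is taken as the charge equivalent of the product formed divided by the charge passed through all repeating units, and SEC as the electrical energy supplied per kilogram of product, at constant voltage and current. The function names, the `n_triplets` parameter (number of repeating membrane units) and the numbers in the usage lines are hypothetical and are not data from the study.

```python
FARADAY = 96485.33212  # Faraday constant, C per mole of elementary charge


def current_efficiency(moles_produced, n_triplets, current_a, duration_s, charge_number=1):
    """Fraction of the charge passed that ends up as product (assumed definition)."""
    useful_charge = charge_number * FARADAY * moles_produced
    supplied_charge = n_triplets * current_a * duration_s
    return useful_charge / supplied_charge


def specific_energy_consumption(voltage_v, current_a, duration_s, mass_produced_kg):
    """Electrical energy per kg of product in kWh/kg, at constant voltage and current."""
    energy_kwh = voltage_v * current_a * duration_s / 3.6e6  # joules to kWh
    return energy_kwh / mass_produced_kg


# Illustrative inputs only, chosen to make the arithmetic easy to follow.
print(current_efficiency(moles_produced=1.0, n_triplets=2, current_a=FARADAY, duration_s=1.0))
print(specific_energy_consumption(voltage_v=10.0, current_a=100.0, duration_s=3600.0, mass_produced_kg=0.5))
```

If voltage or current vary during a run, the product `voltage_v * current_a * duration_s` would need to be replaced by an integral of the measured power over time.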

    APPAGATO: an APproximate PArallel and stochastic GrAph querying TOol for biological networks

    Motivation: Biological network querying is a problem that requires considerable computational effort to be solved. Given a target and a query network, it aims to find occurrences of the query in the target by considering topological and node similarities (i.e. mismatches between nodes, edges, or node labels). Querying tools that deal with similarities are crucial in biological network analysis since they provide meaningful results even in the case of noisy data. In addition, since the size of available networks increases steadily, existing algorithms and tools are becoming unsuitable. This raises new challenges for the design of more efficient and accurate solutions. Results: This paper presents APPAGATO, a stochastic and parallel algorithm to find approximate occurrences of a query network in biological networks. APPAGATO handles node, edge, and node label mismatches. Thanks to its randomized and parallel nature, it applies to large networks and, compared to existing tools, it provides higher performance as well as statistically significantly more accurate results. Tests have been performed on protein-protein interaction networks annotated with synthetic and real gene ontology terms. Case studies have been done by querying protein complexes among different species and tissue