523 research outputs found

    Random subgraphs make identification affordable

    Full text link
    An identifying code of a graph is a dominating set which uniquely determines all the vertices by their neighborhood within the code. Whereas graphs with large minimum degree have small domination number, this is not the case for the identifying code number (the size of a smallest identifying code), which indeed is not even a monotone parameter with respect to graph inclusion. We show that every graph GG with nn vertices, maximum degree Δ=ω(1)\Delta=\omega(1) and minimum degree δclogΔ\delta\geq c\log{\Delta}, for some constant c>0c>0, contains a large spanning subgraph which admits an identifying code with size O(nlogΔδ)O\left(\frac{n\log{\Delta}}{\delta}\right). In particular, if δ=Θ(n)\delta=\Theta(n), then GG has a dense spanning subgraph with identifying code O(logn)O\left(\log n\right), namely, of asymptotically optimal size. The subgraph we build is created using a probabilistic approach, and we use an interplay of various random methods to analyze it. Moreover we show that the result is essentially best possible, both in terms of the number of deleted edges and the size of the identifying code

    Local Ranking Problem on the BrowseGraph

    Full text link
    The "Local Ranking Problem" (LRP) is related to the computation of a centrality-like rank on a local graph, where the scores of the nodes could significantly differ from the ones computed on the global graph. Previous work has studied LRP on the hyperlink graph but never on the BrowseGraph, namely a graph where nodes are webpages and edges are browsing transitions. Recently, this graph has received more and more attention in many different tasks such as ranking, prediction and recommendation. However, a web-server has only the browsing traffic performed on its pages (local BrowseGraph) and, as a consequence, the local computation can lead to estimation errors, which hinders the increasing number of applications in the state of the art. Also, although the divergence between the local and global ranks has been measured, the possibility of estimating such divergence using only local knowledge has been mainly overlooked. These aspects are of great interest for online service providers who want to: (i) gauge their ability to correctly assess the importance of their resources only based on their local knowledge, and (ii) take into account real user browsing fluxes that better capture the actual user interest than the static hyperlink network. We study the LRP problem on a BrowseGraph from a large news provider, considering as subgraphs the aggregations of browsing traces of users coming from different domains. We show that the distance between rankings can be accurately predicted based only on structural information of the local graph, being able to achieve an average rank correlation as high as 0.8

    Integrated mining of feature spaces for bioinformatics domain discovery

    Get PDF
    One of the major challenges in the field of bioinformatics is the elucidation of protein folding for the functional annotation of proteins. The factors that govern protein folding include the chemical, physical, and environmental conditions of the protein\u27s surroundings, which can be measured and exploited for computational discovery purposes. These conditions enable the protein to transform from a sequence of amino acids to a globular three-dimensional structure. Information concerning the folded state of a protein has significant potential to explain biochemical pathways and their involvement in disorders and diseases. This information impacts the ways in which genetic diseases are characterized and cured and in which designer drugs are created. With the exponential growth of protein databases and the limitations of experimental protein structure determination, sophisticated computational methods have been developed and applied to search for, detect, and compare protein homology. Most computational tools developed for protein structure prediction are primarily based on sequence similarity searches. These approaches have improved the prediction accuracy of high sequence similarity proteins but have failed to perform well with proteins of low sequence similarity. Data mining offers unique algorithmic computational approaches that have been used widely in the development of automatic protein structure classification and prediction. In this dissertation, we present a novel approach for the integration of physico-chemical properties and effective feature extraction techniques for the classification of proteins. Our approaches overcome one of the major obstacles of data mining in protein databases, the encapsulation of different hydrophobicity residue properties into a much reduced feature space that possess high degrees of specificity and sensitivity in protein structure classification. We have developed three unique computational algorithms for coherent feature extraction on selected scale properties of the protein sequence. When plagued by the problem of the unequal cardinality of proteins, our proposed integration scheme effectively handles the varied sizes of proteins and scales well with increasing dimensionality of these sequences. We also detail a two-fold methodology for protein functional annotation. First, we exhibit our success in creating an algorithm that provides a means to integrate multiple physico-chemical properties in the form of a multi-layered abstract feature space, with each layer corresponding to a physico-chemical property. Second, we discuss a wavelet-based segmentation approach that efficiently detects regions of property conservation across all layers of the created feature space. Finally, we present a unique graph-theory based algorithmic framework for the identification of conserved hydrophobic residue interaction patterns using identified scales of hydrophobicity. We report that these discriminatory features are specific to a family of proteins, which consist of conserved hydrophobic residues that are then used for structural classification. We also present our rigorously tested validation schemes, which report significant degrees of accuracy to show that homologous proteins exhibit the conservation of physico-chemical properties along the protein backbone. We conclude our discussion by summarizing our results and contributions and by listing our goals for future research

    生物情報ネットワークのグラフ理論に基づく解析法

    Get PDF
    京都大学新制・課程博士博士(情報学)甲第24730号情博第818号新制||情||138(附属図書館)京都大学大学院情報学研究科知能情報学専攻(主査)教授 阿久津 達也, 教授 山本 章博, 教授 岡部 寿男学位規則第4条第1項該当Doctor of InformaticsKyoto UniversityDFA

    Span-core Decomposition for Temporal Networks: Algorithms and Applications

    Full text link
    When analyzing temporal networks, a fundamental task is the identification of dense structures (i.e., groups of vertices that exhibit a large number of links), together with their temporal span (i.e., the period of time for which the high density holds). In this paper we tackle this task by introducing a notion of temporal core decomposition where each core is associated with two quantities, its coreness, which quantifies how densely it is connected, and its span, which is a temporal interval: we call such cores \emph{span-cores}. For a temporal network defined on a discrete temporal domain TT, the total number of time intervals included in TT is quadratic in T|T|, so that the total number of span-cores is potentially quadratic in T|T| as well. Our first main contribution is an algorithm that, by exploiting containment properties among span-cores, computes all the span-cores efficiently. Then, we focus on the problem of finding only the \emph{maximal span-cores}, i.e., span-cores that are not dominated by any other span-core by both their coreness property and their span. We devise a very efficient algorithm that exploits theoretical findings on the maximality condition to directly extract the maximal ones without computing all span-cores. Finally, as a third contribution, we introduce the problem of \emph{temporal community search}, where a set of query vertices is given as input, and the goal is to find a set of densely-connected subgraphs containing the query vertices and covering the whole underlying temporal domain TT. We derive a connection between this problem and the problem of finding (maximal) span-cores. Based on this connection, we show how temporal community search can be solved in polynomial-time via dynamic programming, and how the maximal span-cores can be profitably exploited to significantly speed-up the basic algorithm.Comment: ACM Transactions on Knowledge Discovery from Data (TKDD), 2020. arXiv admin note: substantial text overlap with arXiv:1808.0937

    Systems approaches to drug repositioning

    Get PDF
    PhD ThesisDrug discovery has overall become less fruitful and more costly, despite vastly increased biomedical knowledge and evolving approaches to Research and Development (R&D). One complementary approach to drug discovery is that of drug repositioning which focusses on identifying novel uses for existing drugs. By focussing on existing drugs that have already reached the market, drug repositioning has the potential to both reduce the timeframe and cost of getting a disease treatment to those that need it. Many marketed examples of repositioned drugs have been found via serendipitous or rational observations, highlighting the need for more systematic methodologies. Systems approaches have the potential to enable the development of novel methods to understand the action of therapeutic compounds, but require an integrative approach to biological data. Integrated networks can facilitate systems-level analyses by combining multiple sources of evidence to provide a rich description of drugs, their targets and their interactions. Classically, such networks can be mined manually where a skilled person can identify portions of the graph that are indicative of relationships between drugs and highlight possible repositioning opportunities. However, this approach is not scalable. Automated procedures are required to mine integrated networks systematically for these subgraphs and bring them to the attention of the user. The aim of this project was the development of novel computational methods to identify new therapeutic uses for existing drugs (with particular focus on active small molecules) using data integration. A framework for integrating disparate data relevant to drug repositioning, Drug Repositioning Network Integration Framework (DReNInF) was developed as part of this work. This framework includes a high-level ontology, Drug Repositioning Network Integration Ontology (DReNInO), to aid integration and subsequent mining; a suite of parsers; and a generic semantic graph integration platform. This framework enables the production of integrated networks maintaining strict semantics that are important in, but not exclusive to, drug repositioning. The DReNInF is then used to create Drug Repositioning Network Integration (DReNIn), a semantically-rich Resource Description Framework (RDF) dataset. A Web-based front end was developed, which includes a SPARQL Protocol and RDF Query Language (SPARQL) endpoint for querying this dataset. To automate the mining of drug repositioning datasets, a formal framework for the definition of semantic subgraphs was established and a method for Drug Repositioning Semantic Mining (DReSMin) was developed. DReSMin is an algorithm for mining semantically-rich networks for occurrences of a given semantic subgraph. This algorithm allows instances of complex semantic subgraphs that contain data about putative drug repositioning opportunities to be identified in a computationally tractable fashion, scaling close to linearly with network data. The ability of DReSMin to identify novel Drug-Target (D-T) associations was investigated. 9,643,061 putative D-T interactions were identified and ranked, with a strong correlation between highly scored associations and those supported by literature observed. The 20 top ranked associations were analysed in more detail with 14 found to be novel and six found to be supported by the literature. It was also shown that this approach better prioritises known D-T interactions, than other state-of-the-art methodologies. The ability of DReSMin to identify novel Drug-Disease (Dr-D) indications was also investigated. As target-based approaches are utilised heavily in the field of drug discovery, it is necessary to have a systematic method to rank Gene-Disease (G-D) associations. Although methods already exist to collect, integrate and score these associations, these scores are often not a reliable re flection of expert knowledge. Therefore, an integrated data-driven approach to drug repositioning was developed using a Bayesian statistics approach and applied to rank 309,885 G-D associations using existing knowledge. Ranked associations were then integrated with other biological data to produce a semantically-rich drug discovery network. Using this network it was shown that diseases of the central nervous system (CNS) provide an area of interest. The network was then systematically mined for semantic subgraphs that capture novel Dr-D relations. 275,934 Dr-D associations were identified and ranked, with those more likely to be side-effects filtered. Work presented here includes novel tools and algorithms to enable research within the field of drug repositioning. DReNIn, for example, includes data that previous comparable datasets relevant to drug repositioning have neglected, such as clinical trial data and drug indications. Furthermore, the dataset may be easily extended using DReNInF to include future data as and when it becomes available, such as G-D association directionality (i.e. is the mutation a loss-of-function or gain-of-function). Unlike other algorithms and approaches developed for drug repositioning, DReSMin can be used to infer any types of associations captured in the target semantic network. Moreover, the approaches presented here should be more generically applicable to other fields that require algorithms for the integration and mining of semantically rich networks.European and Physical Sciences Research Council (EPSRC) and GS
    corecore