160 research outputs found

    Clique-based data mining for related genes in a biomedical database

    Abstract

    Background: Progress in the life sciences cannot be made without integrating biomedical knowledge on numerous genes in order to help formulate hypotheses on the genetic mechanisms behind various biological phenomena, including diseases. There is thus a strong need for a way to automatically and comprehensively search biomedical databases for related genes, such as genes in the same families and genes encoding components of the same pathways. Here we address the extraction of related genes by searching for densely-connected subgraphs, which are modeled as cliques, in a biomedical relational graph.

    Results: We constructed a graph whose nodes were gene or disease pages, and edges were the hyperlink connections between those pages in the Online Mendelian Inheritance in Man (OMIM) database. We obtained over 20,000 sets of related genes (called 'gene modules') by enumerating cliques computationally. The modules included genes in the same family, genes for proteins that form a complex, and genes for components of the same signaling pathway. The results of experiments using 'metabolic syndrome'-related gene modules show that the gene modules can be used to get a coherent holistic picture helpful for interpreting relations among genes.

    Conclusion: We presented a data mining approach extracting related genes by enumerating cliques. The extracted gene sets provide a holistic picture useful for comprehending complex disease mechanisms.
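
    The abstract above does not say which clique enumeration algorithm the authors used. As a minimal sketch of the general idea, assuming the hyperlink graph is treated as undirected, the Bron-Kerbosch algorithm with pivoting is one common way to list maximal cliques. The page labels (`g1`, `d1`, ...) and the helper names are hypothetical and are not taken from the paper or from OMIM.

```python
from typing import Dict, FrozenSet, Iterable, Iterator, Set, Tuple


def build_adjacency(edges: Iterable[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Build an undirected adjacency map from (page, page) link pairs."""
    adj: Dict[str, Set[str]] = {}
    for u, v in edges:
        if u == v:
            continue  # ignore self-links
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def maximal_cliques(adj: Dict[str, Set[str]]) -> Iterator[FrozenSet[str]]:
    """Yield every maximal clique (Bron-Kerbosch with pivoting)."""

    def expand(r: Set[str], p: Set[str], x: Set[str]) -> Iterator[FrozenSet[str]]:
        if not p and not x:
            yield frozenset(r)  # r cannot be extended, so it is maximal
            return
        # Pick the pivot with the most neighbours in p to prune branches.
        pivot = max(p | x, key=lambda v: len(adj[v] & p))
        for v in list(p - adj[pivot]):
            yield from expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    yield from expand(set(), set(adj), set())


# Toy graph: three mutually linked "gene" pages and one "disease" page.
toy_edges = [("g1", "g2"), ("g1", "g3"), ("g2", "g3"), ("g3", "d1")]
for clique in maximal_cliques(build_adjacency(toy_edges)):
    print(sorted(clique))
```

    In this toy graph each maximal clique plays the role of one candidate 'gene module'; any filtering by clique size or node type would be a further assumption.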

    Enumeration of condition-dependent dense modules in protein interaction networks

    Motivation: Modern systems biology aims at understanding how the different molecular components of a biological cell interact. Often, cellular functions are performed by complexes consisting of many different proteins. The composition of these complexes may change according to the cellular environment, and one protein may be involved in several different processes. The automatic discovery of functional complexes from protein interaction data is challenging. While previous approaches use approximations to extract dense modules, our approach exactly solves the problem of dense module enumeration. Furthermore, constraints from additional information sources such as gene expression and phenotype data can be integrated, so we can systematically mine for dense modules with interesting profiles

    The bipartite clique: A topological paradigm for Web user search customization and Web site restructuring

    The objective of this dissertation research is to aid the Web user to achieve his search objective at a host Web site by organizing a strongly connected neighborhood of Web pages that are thematically and spatially related to the user's search interest. Therefore, methods were developed to (1) find all Web pages at a given Web site that are thematically similar to a user's initial choice of a Web page (selected from the set of Web pages returned in response to a query by any popular search engine), and (2) organize these pages hierarchically in terms of their relevance to the user's initial Web page request. This selection and organization of pages is dynamically adjusted in order to make these methods responsive to the user's choice of pages defining his search agenda. The methods developed in this work skillfully incorporate the production of the bipartite clique graph structure to simulate both spatial and thematic relatedness of Web pages. By ranking the user's initial page choice as the most relevant page, the authority page, link analysis is used to identify a set of pages with out-links to this authority page and assemble these into a hub of relevant pages. The authority set (initially containing only the user's initial page choice) is then expanded to include other pages with in-links from the set of hub pages. The authority-hub relationship signified by Web page links is used to define the two partite sets of the biclique graph. The partite set of authority pages contains the user's initial page choice and other thematically and spatially similar pages. The partite set of hub pages contains pages whose out-links to the authority pages serve as validation of their thematic relevance to the user's search objective.
    Two maximal biclique neighborhoods of Web pages specific to the user's interest, containing eight and five pages respectively, were successfully extracted from Web server access logs containing 47,635 entries and 1,140 distinct request pages. The iterative use of these methods in association with three Web page metrics introduced in this research facilitated extending a neighborhood dynamically to include nine additional relevant pages
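
    The hub and authority expansion described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the link map, the page names, and the function `seed_biclique` are hypothetical, and the rule that an authority must be linked from every hub is one reading of the abstract chosen so that the result is a biclique. The dissertation's three page metrics are not modeled.

```python
from typing import Dict, Set, Tuple


def seed_biclique(links: Dict[str, Set[str]], seed: str) -> Tuple[Set[str], Set[str]]:
    """Return (hubs, authorities) grown from the user's initial page.

    links maps each page to the set of pages it links to (out-links).
    """
    # Hubs: every page with an out-link to the seed authority page.
    hubs = {page for page, outs in links.items() if seed in outs and page != seed}
    if not hubs:
        return set(), {seed}
    # Authorities: pages linked from *all* hubs (assumption), so that each
    # hub-authority pair is connected and the two sets form a biclique.
    authorities = set.intersection(*(set(links[h]) for h in hubs))
    authorities -= hubs  # keep the two partite sets disjoint
    return hubs, authorities


# Toy site: h1 and h2 both link to the seed page "a" and to "b".
toy_links = {
    "h1": {"a", "b", "c"},
    "h2": {"a", "b"},
    "x": {"c"},
    "a": set(),
}
hubs, authorities = seed_biclique(toy_links, "a")
print(sorted(hubs), sorted(authorities))
```

    Page "c" is excluded here because only one of the two hubs links to it; a looser rule (linked from any hub) would give a larger neighborhood that is no longer a complete biclique.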

    Search Rank Fraud Prevention in Online Systems

    The survival of products in online services such as Google Play, Yelp, Facebook and Amazon is contingent on their search rank. This, along with the social impact of such services, has also turned them into a lucrative medium for fraudulently influencing public opinion. Motivated by the need to aggressively promote products, communities that specialize in social network fraud (e.g., fake opinions and reviews, likes, followers, app installs) have emerged, creating a black market for fraudulent search optimization. Fraudulent product developers exploit these communities to hire teams of workers willing and able to commit fraud collectively, emulating realistic, spontaneous activities from unrelated people. We call this behavior “search rank fraud”. In this dissertation, we argue that fraud needs to be proactively discouraged and prevented, instead of only reactively detected and filtered. We introduce two novel approaches to discourage search rank fraud in online systems. First, we detect fraud in real time, when it is posted, and impose resource-consuming penalties on the devices that post activities. We introduce and leverage several novel concepts that include (i) stateless, verifiable computational puzzles that impose minimal performance overhead, but enable the efficient verification of their authenticity, (ii) a real-time, graph-based solution to assign fraud scores to user activities, and (iii) mechanisms to dynamically adjust puzzle difficulty levels based on fraud scores and the computational capabilities of devices. In a second approach, we introduce the problem of fraud de-anonymization: revealing the crowdsourcing site accounts of the people who post large amounts of fraud, and thus their bank accounts, and providing compelling evidence of fraud to the users of the products that they promote. We investigate the ability of our solutions to ensure that fraud does not pay off
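
    The abstract does not specify how its puzzles are built. The sketch below is a generic hash-based proof-of-work illustration of the idea of a stateless, verifiable puzzle whose difficulty grows with a fraud score. It is not the dissertation's construction. `SERVER_KEY`, `MIN_BITS`, `MAX_BITS`, the linear score-to-difficulty mapping, and all function names are assumptions, and device capability is not modeled.

```python
import hashlib
import hmac
import os
from typing import Tuple

SERVER_KEY = os.urandom(32)   # hypothetical server-side secret
MIN_BITS, MAX_BITS = 4, 16    # hypothetical difficulty range (leading zero bits)


def difficulty_for(fraud_score: float) -> int:
    """Map a fraud score in [0, 1] to a number of required zero bits."""
    score = min(max(fraud_score, 0.0), 1.0)
    return MIN_BITS + round(score * (MAX_BITS - MIN_BITS))


def _tag(challenge: bytes, bits: int) -> bytes:
    # The MAC binds challenge and difficulty, so the server stores nothing.
    return hmac.new(SERVER_KEY, challenge + bytes([bits]), hashlib.sha256).digest()


def issue_puzzle(fraud_score: float) -> Tuple[bytes, int, bytes]:
    challenge = os.urandom(16)
    bits = difficulty_for(fraud_score)
    return challenge, bits, _tag(challenge, bits)


def _meets_difficulty(digest: bytes, bits: int) -> bool:
    return int.from_bytes(digest, "big") >> (len(digest) * 8 - bits) == 0


def solve(challenge: bytes, bits: int) -> int:
    """Client side: brute-force a nonce (cost doubles with each extra bit)."""
    nonce = 0
    while True:
        digest = hashlib.sha256(challenge + nonce.to_bytes(8, "big")).digest()
        if _meets_difficulty(digest, bits):
            return nonce
        nonce += 1


def verify(challenge: bytes, bits: int, tag: bytes, nonce: int) -> bool:
    """Server side: one MAC check and one hash, with no stored state."""
    if not hmac.compare_digest(tag, _tag(challenge, bits)):
        return False
    digest = hashlib.sha256(challenge + nonce.to_bytes(8, "big")).digest()
    return _meets_difficulty(digest, bits)


challenge, bits, tag = issue_puzzle(fraud_score=0.5)
nonce = solve(challenge, bits)
print(bits, verify(challenge, bits, tag, nonce))
```

    Verification costs one MAC and one hash regardless of difficulty, while solving cost grows exponentially in `bits`; that asymmetry is what makes such a penalty cheap to check and expensive to pay.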

    Dagstuhl Reports : Volume 1, Issue 2, February 2011

    Online Privacy: Towards Informational Self-Determination on the Internet (Dagstuhl Perspectives Workshop 11061): Simone Fischer-Hübner, Chris Hoofnagle, Kai Rannenberg, Michael Waidner, Ioannis Krontiris and Michael Marhöfer
    Self-Repairing Programs (Dagstuhl Seminar 11062): Mauro Pezzé, Martin C. Rinard, Westley Weimer and Andreas Zeller
    Theory and Applications of Graph Searching Problems (Dagstuhl Seminar 11071): Fedor V. Fomin, Pierre Fraigniaud, Stephan Kreutzer and Dimitrios M. Thilikos
    Combinatorial and Algorithmic Aspects of Sequence Processing (Dagstuhl Seminar 11081): Maxime Crochemore, Lila Kari, Mehryar Mohri and Dirk Nowotka
    Packing and Scheduling Algorithms for Information and Communication Services (Dagstuhl Seminar 11091): Klaus Jansen, Claire Mathieu, Hadas Shachnai and Neal E. Youn

    Logic learning and optimized drawing: two hard combinatorial problems

    Information extraction from large datasets is now a recurring operation in countless fields of application. This thesis aims to follow the data flow along its journey, describing some hard combinatorial problems that arise from two key consecutive processes: information extraction and representation. The approaches considered here focus mainly on metaheuristic algorithms, to address the need for fast and effective optimization methods. The problems studied include data extraction instances, such as Supervised Learning in Logic Domains and the Max Cut-Clique Problem, as well as two different Graph Drawing Problems. Moreover, stemming from these main topics, additional themes will be discussed, namely two different approaches to handle Information Variability in Combinatorial Optimization Problems (COPs), and Topology Optimization of lightweight concrete structures

    Mobile Search Engine using Clustering and Query Expansion

    Internet content is growing exponentially, and searching for useful content is a tedious task that we all deal with today. Mobile phones' lack of screen space and limited interaction methods make traditional search engine interfaces very inefficient. As the use of mobile internet continues to grow, there is a need for an effective search tool. I have created a mobile search engine that uses clustering and query expansion to find relevant web pages efficiently. Clustering organizes web pages into groups that reflect different components of a query topic. Users can ignore clusters that they find irrelevant, so they are not forced to sift through a long list of off-topic web pages. Query expansion uses query results, dictionaries, and cluster labels to formulate additional terms to manipulate the original query. The new manipulated query gives a more in-depth result that eliminates noise. I believe that these two techniques are effective and can be combined to make the ultimate mobile search engine