The maximum clique enumeration problem: algorithms, applications, and implementations
Background
The maximum clique enumeration (MCE) problem asks that we identify all maximum cliques in a finite, simple graph. MCE is closely related to two other well-known and widely studied problems: the maximum clique optimization problem, which asks us to determine the size of a largest clique, and the maximal clique enumeration problem, which asks that we compile a listing of all maximal cliques. Naturally, these three problems are NP-hard, given that they subsume the classic version of the NP-complete clique decision problem. MCE can be solved in principle with standard enumeration methods due to Bron, Kerbosch, Kose and others. Unfortunately, these techniques are ill-suited to graphs encountered in our applications. We must solve MCE on instances deeply seeded in data mining and computational biology, where high-throughput data capture often creates graphs of extreme size and density. MCE can also be solved in principle using more modern algorithms based in part on vertex cover and the theory of fixed-parameter tractability (FPT). While FPT is an improvement, these algorithms too can fail to scale sufficiently well as the sizes and densities of our datasets grow.
Results
An extensive testbed of benchmark graphs is created using publicly available transcriptomic datasets from the Gene Expression Omnibus (GEO). Empirical testing reveals crucial but latent features of such high-throughput biological data. In turn, it is shown that these features distinguish real data from random data intended to reproduce salient topological features. In particular, with real data there tends to be an unusually high degree of maximum clique overlap. Armed with this knowledge, novel decomposition strategies are tuned to the data and coupled with the best FPT MCE implementations.
Conclusions
Several algorithmic improvements to MCE are made, which progressively decrease the run time on graphs in the testbed. Frequently the final run time improvement is several orders of magnitude. As a result, instances which were once prohibitively time-consuming to solve are brought into the domain of realistic feasibility.
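To make the distinction between maximal and maximum cliques concrete, the following is a minimal Python sketch of the classical enumeration route mentioned in the Background: Bron-Kerbosch with pivoting lists all maximal cliques, and a filter keeps only those of largest size. It is not the FPT-based or decomposition-based method of the paper, and the function names and the adjacency-dictionary representation are assumptions made for illustration.

```python
from typing import Dict, Iterator, List, Set

Graph = Dict[int, Set[int]]  # assumed representation: vertex -> set of neighbours


def maximal_cliques(adj: Graph) -> Iterator[Set[int]]:
    """Yield every maximal clique using Bron-Kerbosch with pivoting."""

    def expand(r: Set[int], p: Set[int], x: Set[int]) -> Iterator[Set[int]]:
        if not p and not x:
            yield set(r)  # r cannot be extended, so it is maximal
            return
        # Pivot on the vertex with the most neighbours among the candidates.
        pivot = max(p | x, key=lambda v: len(adj[v] & p))
        for v in list(p - adj[pivot]):
            yield from expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}  # v has been fully explored
            x = x | {v}  # exclude v from later branches

    yield from expand(set(), set(adj), set())


def maximum_cliques(adj: Graph) -> List[Set[int]]:
    """Keep only the maximal cliques of largest cardinality."""
    best: List[Set[int]] = []
    best_size = 0
    for clique in maximal_cliques(adj):
        if len(clique) > best_size:
            best, best_size = [clique], len(clique)
        elif len(clique) == best_size:
            best.append(clique)
    return best


# Two triangles sharing vertex 2, plus a pendant edge 4-5.
example: Graph = {
    0: {1, 2},
    1: {0, 2},
    2: {0, 1, 3, 4},
    3: {2, 4},
    4: {2, 3, 5},
    5: {4},
}
```

On this small example, the edge 4-5 is a maximal clique but not a maximum one, while the two triangles are the maximum cliques and overlap in vertex 2, a toy version of the clique overlap the Results describe.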
The collaborative cross strains and their founders vary widely in cocaine-induced behavioral sensitization
Cocaine use and overdose deaths attributed to cocaine have increased significantly in the United States in the last 10 years. Despite the prevalence of cocaine use disorder (CUD) and the personal and societal problems it presents, there are currently no approved pharmaceutical treatments. The absence of treatment options is due, in part, to our lack of knowledge about the etiology of CUDs. There is ample evidence that genetics plays a role in increasing CUD risk but, thus far, very few risk genes have been identified in human studies. Genetic studies in mice have been extremely useful for identifying genetic loci and genes, but have been limited to very few genetic backgrounds, leaving substantial phenotypic and genetic diversity unexplored. Herein we report the measurement of cocaine-induced behavioral sensitization using a 19-day protocol that captures baseline locomotor activity, initial locomotor response to an acute exposure to cocaine, and locomotor sensitization across five exposures to the drug. These behaviors were measured in 51 genetically diverse Collaborative Cross (CC) strains along with their inbred founder strains. The CC was generated by crossing eight genetically diverse inbred strains such that each inbred CC strain has genetic contributions from each of the founder strains. Inbred CC mice are infinitely reproducible and provide a stable, yet diverse genetic platform on which to study the genetic architecture and genetic correlations among phenotypes. We have identified significant differences in cocaine locomotor sensitivity and behavioral sensitization across the panel of CC strains and their founders. We have established relationships among cocaine sensitization behaviors and identified extreme responding strains that can be used in future studies aimed at understanding the genetic, biological, and pharmacological mechanisms that drive addiction-related behaviors. Finally, we have determined that these behaviors exhibit relatively robust heritability, making them amenable to future genetic mapping studies to identify addiction risk genes and genetic pathways that can be studied as potential targets for the development of novel therapeutics.
Genetic differences and longevity-related phenotypes influence lifespan and lifespan variation in a sex-specific manner in mice
Epidemiological studies of human longevity have found two interesting features: a robust female lifespan advantage and a consistent reduction of lifespan variation. To help understand the genetic aspects of these phenomena, the current study examined sex differences and variation of longevity using previously published mouse data sets, including data on lifespan, age of puberty, and circulating insulin-like growth factor 1 (IGF1) levels in 31 inbred strains, data from colonies of nuclear-receptor-interacting protein 1 (Nrip1) knockout mice, and a congenic strain, B6.C3H-Igf1. Looking at the overall data for all inbred strains, the results show no significant difference in lifespan and lifespan variation between sexes; however, considerable differences were found among and within strains. Across strains, lifespan variations of female and male mice are significantly correlated. Strikingly, between sexes, IGF1 levels correlate with the lifespan variation and maximum lifespan in different directions. Female mice with low IGF1 levels have higher variation and extended maximum lifespan. The opposite is detected in males. Compared to domesticated inbred strains, wild-derived inbred strains have elevated lifespan variation due to increased early deaths in both sexes and extended maximum lifespan in female mice. Intriguingly, the sex differences in survival curves of inbred strains are negatively associated with age of female puberty, which is significantly accelerated in domesticated inbred strains compared to wild-derived strains. In conclusion, this study suggests that genetic factors are involved in the regulation of sexual disparities in lifespan and lifespan variation, and dissecting the mouse genome may provide novel insight into the underlying genetic mechanisms.
Algorithms and experiments for clique relaxations: finding maximum s-plexes
Abstract. We propose new practical algorithms to find degree-relaxed variants of cliques called s-plexes. An s-plex denotes a vertex subset in a graph inducing a subgraph where every vertex has edges to all but at most s vertices in the s-plex. Cliques are 1-plexes. In analogy to the special case of finding maximum-cardinality cliques, finding maximum-cardinality s-plexes is NP-hard. Complementing previous work, we develop combinatorial, exact algorithms, which are strongly based on methods from parameterized algorithmics. The experiments with our freely available implementation indicate the competitiveness of our approach, for many real-world graphs outperforming the previously used methods.
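The s-plex definition above can be checked directly. The sketch below, in Python, tests whether a vertex subset is an s-plex and, for tiny graphs only, finds a maximum s-plex by exhaustive search. The exhaustive search is exponential and is not the parameterized algorithm of the abstract; the function names and the adjacency-dictionary input are assumptions made for illustration. Each vertex is counted among its own non-neighbours, which is what makes cliques 1-plexes.

```python
from itertools import combinations
from typing import Dict, Iterable, Set

Graph = Dict[int, Set[int]]  # assumed representation: vertex -> set of neighbours


def is_s_plex(adj: Graph, subset: Iterable[int], s: int) -> bool:
    """True if every vertex of `subset` is adjacent to all but at most s
    vertices of `subset` (the vertex itself counts as one of the s)."""
    members = set(subset)
    return all(len(adj[v] & members) >= len(members) - s for v in members)


def maximum_s_plex_bruteforce(adj: Graph, s: int) -> Set[int]:
    """Return one maximum-cardinality s-plex by trying subsets from largest
    to smallest. Exponential time; intended for toy graphs only."""
    vertices = sorted(adj)
    for size in range(len(vertices), 0, -1):
        for candidate in combinations(vertices, size):
            if is_s_plex(adj, candidate, s):
                return set(candidate)
    return set()


# A 4-cycle 0-1-2-3-0: its largest clique is an edge, but the whole cycle
# is a 2-plex because each vertex misses only itself and the opposite vertex.
cycle: Graph = {0: {1, 3}, 1: {0, 2}, 2: {1, 3}, 3: {0, 2}}
```

For the 4-cycle, relaxing s from 1 to 2 grows the largest admissible set from an edge to the whole cycle.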
A more relaxed model for graph-based data clustering: s-plex editing
Abstract. We introduce the s-Plex Editing problem generalizing the well-studied Cluster Editing problem, both being NP-hard and both being motivated by graph-based data clustering. Instead of transforming a given graph by a minimum number of edge modifications into a disjoint union of cliques (Cluster Editing), the task in the case of s-Plex Editing is now to transform a graph into a disjoint union of so-called s-plexes. Herein, an s-plex denotes a vertex set inducing a (sub)graph where every vertex has edges to all but at most s vertices in the s-plex. Cliques are 1-plexes. The advantage of s-plexes for s ≥ 2 is that they allow us to model a more relaxed cluster notion (s-plexes instead of cliques), which better reflects inaccuracies of the input data. We develop a provably efficient and effective preprocessing based on data reduction (yielding a so-called problem kernel), a forbidden subgraph characterization of s-plex cluster graphs, and a depth-bounded search tree which is used to find optimal edge modification sets. Altogether, this yields efficient algorithms in case of moderate numbers of edge modifications.
Editing Graphs into Disjoint Unions of Dense Clusters
In the Π-Cluster Editing problem, one is given an undirected graph G, a density measure Π, and an integer k ≥ 0, and needs to decide whether it is possible to transform G by editing (deleting and inserting) at most k edges into a dense cluster graph. Herein, a dense cluster graph is a graph in which every connected component K = (V_K, E_K) satisfies Π. The well-studied Cluster Editing problem is a special case of this problem with Π := "being a clique". In this work, we consider three other density measures that generalize cliques: 1) having at most s missing edges (s-defective cliques), 2) having average degree at least |V_K| − s (average-s-plexes), and 3) having average degree at least μ · (|V_K| − 1) (μ-cliques), where s and μ are a fixed integer and a fixed rational number, respectively. We first show that the Π-Cluster Editing problem is NP-complete for all three density measures. Then, we study the fixed-parameter tractability of the three clustering problems, showing that the first two problems are fixed-parameter tractable with respect to the parameter (s,k
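The three density measures can be written as simple predicates on a connected component with n = |V_K| vertices and m = |E_K| edges. The Python sketch below only checks whether a given graph already is a dense cluster graph under a chosen measure; it does not search for the at most k edge edits, and it is not the algorithm of the abstract. All function names and the adjacency-dictionary input are assumptions made for illustration. Inequalities on the average degree 2m/n are multiplied through by n so that the arithmetic stays exact.

```python
from fractions import Fraction
from typing import Callable, Dict, Iterator, Set

Graph = Dict[int, Set[int]]  # assumed representation: vertex -> set of neighbours


def connected_components(adj: Graph) -> Iterator[Set[int]]:
    """Yield the vertex set of each connected component (iterative DFS)."""
    seen: Set[int] = set()
    for start in adj:
        if start in seen:
            continue
        comp, stack = set(), [start]
        while stack:
            v = stack.pop()
            if v not in comp:
                comp.add(v)
                stack.extend(adj[v] - comp)
        seen |= comp
        yield comp


def edge_count(adj: Graph, comp: Set[int]) -> int:
    return sum(len(adj[v] & comp) for v in comp) // 2


def is_s_defective_clique(adj: Graph, comp: Set[int], s: int) -> bool:
    n, m = len(comp), edge_count(adj, comp)
    return n * (n - 1) // 2 - m <= s  # at most s missing edges


def is_average_s_plex(adj: Graph, comp: Set[int], s: int) -> bool:
    n, m = len(comp), edge_count(adj, comp)
    return 2 * m >= n * (n - s)  # average degree 2m/n >= n - s


def is_mu_clique(adj: Graph, comp: Set[int], mu: Fraction) -> bool:
    n, m = len(comp), edge_count(adj, comp)
    return 2 * m >= mu * n * (n - 1)  # average degree 2m/n >= mu * (n - 1)


def is_dense_cluster_graph(adj: Graph,
                           measure: Callable[[Graph, Set[int]], bool]) -> bool:
    """True if every connected component satisfies the density measure."""
    return all(measure(adj, comp) for comp in connected_components(adj))


# Two components: a path 0-1-2 (one missing edge) and a single edge 3-4.
sample: Graph = {0: {1}, 1: {0, 2}, 2: {1}, 3: {4}, 4: {3}}
```

For instance, `is_dense_cluster_graph(sample, lambda g, c: is_s_defective_clique(g, c, 1))` tests the first measure with s = 1; the other two measures are plugged in the same way.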
3D graphical visualization of the genetic architectures underlying complex traits in multiple environments
An approach for generating interactive 3D graphical visualization of the genetic architectures of complex traits in multiple environments is described. 3D graphical visualization is used to improve on traditional plots in quantitative trait locus (QTL) mapping analysis. Interactive 3D graphical visualization for abstract expression of QTL, epistasis and their environmental interactions for experimental populations was developed within the framework of the user-friendly software QTLNetwork (http://ibi.zju.edu.cn/software/qtlnetwork). A novel definition of a graphical meta system and the computation of virtual coordinates are used to achieve explicit but meaningful visualization. Interactive 3D graphical visualization for QTL analysis provides geneticists and breeders with a powerful and easy-to-use tool to analyze and publish their research results.