544 research outputs found

    A comparative analysis of parallel disk-based Methods for enumerating implicit graphs

    Full text link

    Indexing query graphs to speedup graph query processing

    Get PDF
    Subgraph/supergraph queries although central to graph analytics, are costly as they entail the NP-Complete problem of subgraph isomorphism. We present a fresh solution, the novel principle of which is to acquire and utilize knowledge from the results of previously executed queries. Our approach, iGQ, encompasses two component subindexes to identify if a new query is a subgraph/supergraph of previously executed queries and stores related key information. iGQ comes with novel query processing and index space management algorithms, including graph replacement policies. The end result is a system that leads to significant reduction in the number of required subgraph isomorphism tests and speedups in query processing time. iGQ can be incorporated into any sub/supergraph query processing method and help improve performance. In fact, it is the only contribution that can speedup significantly both subgraph and supergraph query processing. We establish the principles of iGQ and formally prove its correctness. We have implemented iGQ and have incorporated it within three popular recent state of the art index-based graph query processing solutions. We evaluated its performance using real-world and synthetic graph datasets with different characteristics, and a number of query workloads, showcasing its benefits

    De Novo Protein Structure Modeling from Cryoem Data Through a Dynamic Programming Algorithm in the Secondary Structure Topology Graph

    Get PDF
    Proteins are the molecules carry out the vital functions and make more than the half of dry weight in every cell. Protein in nature folds into a unique and energetically favorable 3-Dimensional (3-D) structure which is critical and unique to its biological function. In contrast to other methods for protein structure determination, Electron Cryorricroscopy (CryoEM) is able to produce volumetric maps of proteins that are poorly soluble, large and hard to crystallize. Furthermore, it studies the proteins in their native environment. Unfortunately, the volumetric maps generated by current advances in CryoEM technique produces protein maps at medium resolution about (~5 to 10Å) in which it is hard to determine the atomic-structure of the protein. However, the resolution of the volumetric maps is improving steadily, and recent works could obtain atomic models at higher resolutions (~3Å). De novo protein modeling is the process of building the structure of the protein using its CryoEM volumetric map. Thereupon, the volumetric maps at medium resolution generated by CryoEM technique proposed a new challenge. At the medium resolution, the location and orientation of secondary structure elements (SSE) can be visually and computationally identified. However, the order and direction (called protein topology) of the SSEs detected from the CryoEM volumetric map are not visible. In order to determine the protein structure, the topology of the SSEs has to be figured out and then the backbone can be built. Consequently, the topology problem has become a bottle neck for protein modeling using CryoEM In this dissertation, we focus to establish an effective computational framework to derive the atomic structure of a protein from the medium resolution CryoEM volumetric maps. This framework includes a topology graph component to rank effectively the topologies of the SSEs and a model building component. In order to generate the small subset of candidate topologies, the problem is translated into a layered graph representation. We developed a dynamic programming algorithm (TopoDP) for the new representation to overcome the problem of large search space. Our approach shows the improved accuracy, speed and memory use when compared with existing methods. However, the generating of such set was infeasible using a brute force method. Therefore, the topology graph component effectively reduces the topological space using the geometrical features of the secondary structures through a constrained K-shortest paths method in our layered graph. The model building component involves the bending of a helix and the loop construction using skeleton of the volumetric map. The forward-backward CCD is applied to bend the helices and model the loops

    Finding The Lazy Programmer's Bugs

    Get PDF
    Traditionally developers and testers created huge numbers of explicit tests, enumerating interesting cases, perhaps biased by what they believe to be the current boundary conditions of the function being tested. Or at least, they were supposed to. A major step forward was the development of property testing. Property testing requires the user to write a few functional properties that are used to generate tests, and requires an external library or tool to create test data for the tests. As such many thousands of tests can be created for a single property. For the purely functional programming language Haskell there are several such libraries; for example QuickCheck [CH00], SmallCheck and Lazy SmallCheck [RNL08]. Unfortunately, property testing still requires the user to write explicit tests. Fortunately, we note there are already many implicit tests present in programs. Developers may throw assertion errors, or the compiler may silently insert runtime exceptions for incomplete pattern matches. We attempt to automate the testing process using these implicit tests. Our contributions are in four main areas: (1) We have developed algorithms to automatically infer appropriate constructors and functions needed to generate test data without requiring additional programmer work or annotations. (2) To combine the constructors and functions into test expressions we take advantage of Haskell's lazy evaluation semantics by applying the techniques of needed narrowing and lazy instantiation to guide generation. (3) We keep the type of test data at its most general, in order to prevent committing too early to monomorphic types that cause needless wasted tests. (4) We have developed novel ways of creating Haskell case expressions to inspect elements inside returned data structures, in order to discover exceptions that may be hidden by laziness, and to make our test data generation algorithm more expressive. In order to validate our claims, we have implemented these techniques in Irulan, a fully automatic tool for generating systematic black-box unit tests for Haskell library code. We have designed Irulan to generate high coverage test suites and detect common programming errors in the process

    Graphlet based network analysis

    Get PDF
    The majority of the existing works on network analysis, study properties that are related to the global topology of a network. Examples of such properties include diameter, power-law exponent, and spectra of graph Laplacians. Such works enhance our understanding of real-life networks, or enable us to generate synthetic graphs with real-life graph properties. However, many of the existing problems on networks require the study of local topological structures of a network. Graphlets which are induced small subgraphs capture the local topological structure of a network effectively. They are becoming increasingly popular for characterizing large networks in recent years. Graphlet based network analysis can vary based on the types of topological structures considered and the kinds of analysis tasks. For example, one of the most popular and early graphlet analyses is based on triples (triangles or paths of length two). Graphlet analysis based on cycles and cliques are also explored in several recent works. Another more comprehensive class of graphlet analysis methods works with graphlets of specific sizes—graphlets with three, four or five nodes ({3, 4, 5}-Graphlets) are particularly popular. For all the above analysis tasks, excessive computational cost is a major challenge, which becomes severe for analyzing large networks with millions of vertices. To overcome this challenge, effective methodologies are in urgent need. Furthermore, the existence of efficient methods for graphlet analysis will encourage more works broadening the scope of graphlet analysis. For graphlet counting, we propose edge iteration based methods (ExactTC and ExactGC) for efficiently computing triple and graphlet counts. The proposed methods compute local graphlet statistics in the neighborhood of each edge in the network and then aggregate the local statistics to give the global characterization (transitivity, graphlet frequency distribution (GFD), etc) of the network. Scalability of the proposed methods is further improved by iterating over a sampled set of edges and estimating the triangle count (ApproxTC) and graphlet count (Graft) by approximate rescaling of the aggregated statistics. The independence of local feature vector construction corresponding to each edge makes the methods embarrassingly parallelizable. We show this by giving a parallel edge iteration method ParApproxTC for triangle counting. For graphlet sampling, we propose Markov Chain Monte Carlo (MCMC) sampling based methods for triple and graphlet analysis. Proposed triple analysis methods, Vertex-MCMC and Triple-MCMC, estimate triangle count and network transitivity. Vertex-MCMC samples triples in two steps. First, the method selects a node (using the MCMC method) with probability proportional to the number of triples of which the node is a center. Then Vertex-MCMC samples uniformly from the triples centered by the selected node. The method Triple-MCMC samples triples by performing a MCMC walk in a triple sample space. Triple sample space consists of all the possible triples in a network. MCMC method performs triple sampling by walking form one triple to one of its neighboring triples in the triple space. We design the triple space in such a way that two triples are neighbors only if they share exactly two nodes. The proposed triple sampling algorithms Vertex-MCMC and Triple-MCMC are able to sample triples from any arbitrary distribution, as long as the weight of each triple is locally computable. The proposed methods are able to sample triples without the knowledge of the complete network structure. Information regarding only the local neighborhood structure of currently observed node or triple are enough to walk to the next node or triple. This gives the proposed methods a significant advantage: the capability to sample triples from networks that have restricted access, on which a direct sampling based method is simply not applicable. The proposed methods are also suitable for dynamic and large networks. Similar to the concept of Triple-MCMC, we propose Guise for sampling graphlets of sizes three, four and five ({3, 4, 5}-Graphlets). Guise samples graphlets, by performing a MCMC walk on a graphlet sample space, containing all the graphlets of sizes three, four and five in the network. Despite the proven utility of graphlets in static network analysis, works harnessing the ability of graphlets for dynamic network analysis are yet to come. Dynamic networks contain additional time information for their edges. With time, the topological structure of a dynamic network changes—edges can appear, disappear and reappear over time. In this direction, predicting the link state of a network at a future time, given a collection of link states at earlier times, is an important task with many real-life applications. In the existing literature, this task is known as link prediction in dynamic networks. Performing this task is more difficult than its counterpart in static networks because an effective feature representation of node-pair instances for the case of a dynamic network is hard to obtain. We design a novel graphlet transition based feature embedding for node-pair instances of a dynamic network. Our proposed method GraTFEL, uses automatic feature learning methodologies on such graphlet transition based features to give a low-dimensional feature embedding of unlabeled node-pair instances. The feature learning task is modeled as an optimal coding task where the objective is to minimize the reconstruction error. GraTFEL solves this optimization task by using a gradient descent method. We validate the effectiveness of the learned optimal feature embedding by utilizing it for link prediction in real-life dynamic networks. Specifically, we show that GraTFEL, which uses the extracted feature embedding of graphlet transition events, outperforms existing methods that use well-known link prediction features

    How many can you see at a glance? The role of attention in visual enumeration

    Get PDF
    There is considerable controversy as to how the brain extracts numerosity information from a visual scene and as to how much attention is needed for this process. Traditionally, it has been assumed that visual enumeration is subserved by two functionally distinct mechanisms: the fast and accurate apprehension of 1 to about 4 items, a process termed "subitizing", and the slow and error-prone enumeration of larger numerosities referred to as "counting". Further to a functional dichotomy between these two mechanisms, an attentional dichotomy has been proposed. Subitizing has been thought of as a pre-attentive and parallel process, whereas counting is supposed to require serial attention. In this work, the hypothesis of a parallel and pre-attentive subitizing mechanism was tested. To this aim, the amount of attention that could be allocated to an enumeration task was experimentally manipulated. In Experiment 1, attentional set was manipulated such that attention could either be drawn to the relevant of two subsets to enumerate or had to be distributed to both subsets. Furthermore, the relationship of enumeration to perceptual grouping and item discrimination was explored. In Experiment 2, a dual-task approach was employed in which the amount of attentional resources available to enumeration was systematically modulated by imposing an additional task and by varying its attentional load. Experiment 3 investigated the neural correlates of visual enumeration under attentional load using functional magnetic resonance imaging (fMRI). Results indicated that (1) enumeration, particularly subitizing, was clearly compromised under conditions of distributed or reduced attention. (2) Both the enumeration of small and large numerosities was a�ffected by such attentional manipulations. (3) Subitizing selectively activated brain areas associated with stimulus-driven attention. (4) Enumeration is contingent on other potentially attention-demanding visual processes such as perceptual grouping. The evidence presented here seriously challenges the traditionally held claim of a parallel and preattentive subitizing mechanism and suggests instead that small numerosity judgement requires visual attention. This weakens the argument of an attentional as well as a functional dichotomy and strengthens the idea that enumeration may be subserved by a single, continuous mechanism

    Algorithmic and technical improvements: Optimal solutions to the (Generalized) Multi-Weber Problem

    Get PDF
    Rosing has recently demonstrated a new method for obtaining optimal solutions to the (Generalized) Multi-Weber Problem and proved the optimality of the results. The method develops all convex hulls and then covers the destinations with disjoint convex hulls. This paper seeks to improve implementation of the algorithm to make such solutions economically attractive. Four areas are considered: sharper decision rules to eliminate unnecessary searching, bit pattern matching as a method of recording a history and eliminating duplication, vector intrinsic functions to speed up comparisons, and profiling a program to maximize operating efficiency. Computational experience is also presented
    corecore