546 research outputs found

    Space-efficient Feature Maps for String Alignment Kernels

    Get PDF
    String kernels are attractive data analysis tools for analyzing string data. Among them, alignment kernels are known for their high prediction accuracies in string classifications when tested in combination with SVM in various applications. However, alignment kernels have a crucial drawback in that they scale poorly due to their quadratic computation complexity in the number of input strings, which limits large-scale applications in practice. We address this need by presenting the first approximation for string alignment kernels, which we call space-efficient feature maps for edit distance with moves (SFMEDM), by leveraging a metric embedding named edit sensitive parsing (ESP) and feature maps (FMs) of random Fourier features (RFFs) for large-scale string analyses. The original FMs for RFFs consume a huge amount of memory proportional to the dimension d of input vectors and the dimension D of output vectors, which prohibits its large-scale applications. We present novel space-efficient feature maps (SFMs) of RFFs for a space reduction from O(dD) of the original FMs to O(d) of SFMs with a theoretical guarantee with respect to concentration bounds. We experimentally test SFMEDM on its ability to learn SVM for large-scale string classifications with various massive string data, and we demonstrate the superior performance of SFMEDM with respect to prediction accuracy, scalability and computation efficiency.Comment: Full version for ICDM'19 pape

    Estimation of Human Body Shape and Posture Under Clothing

    Full text link
    Estimating the body shape and posture of a dressed human subject in motion represented as a sequence of (possibly incomplete) 3D meshes is important for virtual change rooms and security. To solve this problem, statistical shape spaces encoding human body shape and posture variations are commonly used to constrain the search space for the shape estimate. In this work, we propose a novel method that uses a posture-invariant shape space to model body shape variation combined with a skeleton-based deformation to model posture variation. Our method can estimate the body shape and posture of both static scans and motion sequences of dressed human body scans. In case of motion sequences, our method takes advantage of motion cues to solve for a single body shape estimate along with a sequence of posture estimates. We apply our approach to both static scans and motion sequences and demonstrate that using our method, higher fitting accuracy is achieved than when using a variant of the popular SCAPE model as statistical model.Comment: 23 pages, 11 figure

    Assembly of long error-prone reads using de Bruijn graphs

    Get PDF
    The recent breakthroughs in assembling long error-prone reads were based on the overlap-layout-consensus (OLC) approach and did not utilize the strengths of the alternative de Bruijn graph approach to genome assembly. Moreover, these studies often assume that applications of the de Bruijn graph approach are limited to short and accurate reads and that the OLC approach is the only practical paradigm for assembling long error-prone reads. We show how to generalize de Bruijn graphs for assembling long error-prone reads and describe the ABruijn assembler, which combines the de Bruijn graph and the OLC approaches and results in accurate genome reconstructions

    Scanning sequences after Gibbs sampling to find multiple occurrences of functional elements

    Get PDF
    BACKGROUND: Many DNA regulatory elements occur as multiple instances within a target promoter. Gibbs sampling programs for finding DNA regulatory elements de novo can be prohibitively slow in locating all instances of such an element in a sequence set. RESULTS: We describe an improvement to the A-GLAM computer program, which predicts regulatory elements within DNA sequences with Gibbs sampling. The improvement adds an optional "scanning step" after Gibbs sampling. Gibbs sampling produces a position specific scoring matrix (PSSM). The new scanning step resembles an iterative PSI-BLAST search based on the PSSM. First, it assigns an "individual score" to each subsequence of appropriate length within the input sequences using the initial PSSM. Second, it computes an E-value from each individual score, to assess the agreement between the corresponding subsequence and the PSSM. Third, it permits subsequences with E-values falling below a threshold to contribute to the underlying PSSM, which is then updated using the Bayesian calculus. A-GLAM iterates its scanning step to convergence, at which point no new subsequences contribute to the PSSM. After convergence, A-GLAM reports predicted regulatory elements within each sequence in order of increasing E-values, so users have a statistical evaluation of the predicted elements in a convenient presentation. Thus, although the Gibbs sampling step in A-GLAM finds at most one regulatory element per input sequence, the scanning step can now rapidly locate further instances of the element in each sequence. CONCLUSION: Datasets from experiments determining the binding sites of transcription factors were used to evaluate the improvement to A-GLAM. Typically, the datasets included several sequences containing multiple instances of a regulatory motif. The improvements to A-GLAM permitted it to predict the multiple instances

    Effective 3D Geometric Matching for Data Restoration and Its Forensic Application

    Get PDF
    3D geometric matching is the technique to detect the similar patterns among multiple objects. It is an important and fundamental problem and can facilitate many tasks in computer graphics and vision, including shape comparison and retrieval, data fusion, scene understanding and object recognition, and data restoration. For example, 3D scans of an object from different angles are matched and stitched together to form the complete geometry. In medical image analysis, the motion of deforming organs is modeled and predicted by matching a series of CT images. This problem is challenging and remains unsolved, especially when the similar patterns are 1) small and lack geometric saliency; 2) incomplete due to the occlusion of the scanning and damage of the data. We study the reliable matching algorithm that can tackle the above difficulties and its application in data restoration. Data restoration is the problem to restore the fragmented or damaged model to its original complete state. It is a new area and has direct applications in many scientific fields such as Forensics and Archeology. In this dissertation, we study novel effective geometric matching algorithms, including curve matching, surface matching, pairwise matching, multi-piece matching and template matching. We demonstrate its applications in an integrated digital pipeline of skull reassembly, skull completion, and facial reconstruction, which is developed to facilitate the state-of-the-art forensic skull/facial reconstruction processing pipeline in law enforcement

    Learning kinematic structure correspondences using multi-order similarities

    Get PDF
    We present a novel framework for finding the kinematic structure correspondences between two articulated objects in videos via hypergraph matching. In contrast to appearance and graph alignment based matching methods, which have been applied among two similar static images, the proposed method finds correspondences between two dynamic kinematic structures of heterogeneous objects in videos. Thus our method allows matching the structure of objects which have similar topologies or motions, or a combination of the two. Our main contributions are summarised as follows: (i)casting the kinematic structure correspondence problem into a hypergraph matching problem by incorporating multi-order similarities with normalising weights, (ii)introducing a structural topology similarity measure by aggregating topology constrained subgraph isomorphisms, (iii)measuring kinematic correlations between pairwise nodes, and (iv)proposing a combinatorial local motion similarity measure using geodesic distance on the Riemannian manifold. We demonstrate the robustness and accuracy of our method through a number of experiments on synthetic and real data, showing that various other recent and state of the art methods are outperformed. Our method is not limited to a specific application nor sensor, and can be used as building block in applications such as action recognition, human motion retargeting to robots, and articulated object manipulation

    Unsupervised Structural Embedding Methods for Efficient Collective Network Mining

    Full text link
    How can we align accounts of the same user across social networks? Can we identify the professional role of an email user from their patterns of communication? Can we predict the medical effects of chemical compounds from their atomic network structure? Many problems in graph data mining, including all of the above, are defined on multiple networks. The central element to all of these problems is cross-network comparison, whether at the level of individual nodes or entities in the network or at the level of entire networks themselves. To perform this comparison meaningfully, we must describe the entities in each network expressively in terms of patterns that generalize across the networks. Moreover, because the networks in question are often very large, our techniques must be computationally efficient. In this thesis, we propose scalable unsupervised methods that embed nodes in vector space by mapping nodes with similar structural roles in their respective networks, even if they come from different networks, to similar parts of the embedding space. We perform network alignment by matching nodes across two or more networks based on the similarity of their embeddings, and refine this process by reinforcing the consistency of each node’s alignment with those of its neighbors. By characterizing the distribution of node embeddings in a graph, we develop graph-level feature vectors that are highly effective for graph classification. With principled sparsification and randomized approximation techniques, we make all our methods computationally efficient and able to scale to graphs with millions of nodes or edges. We demonstrate the effectiveness of structural node embeddings on industry-scale applications, and propose an extensive set of embedding evaluation techniques that lay the groundwork for further methodological development and application.PHDComputer Science & EngineeringUniversity of Michigan, Horace H. Rackham School of Graduate Studieshttp://deepblue.lib.umich.edu/bitstream/2027.42/162895/1/mheimann_1.pd
    • …
    corecore