
    Does the Geometry of Word Embeddings Help Document Classification? A Case Study on Persistent Homology Based Representations

    We investigate the pertinence of methods from algebraic topology for text data analysis. These methods enable the development of mathematically principled, isometric-invariant mappings from a set of vectors to a document embedding, which is stable with respect to the geometry of the document in the selected metric space. In this work, we evaluate the utility of these topology-based document representations in traditional NLP tasks, specifically document clustering and sentiment classification. We find that the embeddings do not benefit text analysis. In fact, performance is worse than simple techniques like tf-idf, indicating that the geometry of the document does not provide enough variability for classification on the basis of topic or sentiment in the chosen datasets. Comment: 5 pages, 3 figures. Rep4NLP workshop at ACL 201

    Modeling spatial uncertainties in geospatial data fusion and mining

    Geospatial data analysis relies on Spatial Data Fusion and Mining (SDFM), which depends heavily on the topology and geometry of spatial objects. Capturing, representing, and measuring geometric characteristics such as orientation, shape, proximity, and similarity are of the highest interest in SDFM. Representing the uncertain and dynamically changing topological structure of spatial objects, including social and communication networks, roads, and waterways, under the influence of noise, obstacles, temporary loss of communication, and other factors is another challenge. The spatial distribution of a dynamic network is a complex and dynamic mixture of its topology and geometry. Historically, the separation of topology and geometry in mathematics was motivated by the need to separate the invariant part of the spatial distribution (topology) from the less invariant part (geometry). Geometric characteristics such as orientation, shape, and proximity are not invariant. This separation between geometry and topology was made under the assumption that the topological structure is certain and does not change over time. New challenges in dealing with dynamic and uncertain topological structure require a reexamination of this fundamental assumption. In previous work we proposed a dynamic logic methodology for capturing, representing, and recording uncertain and dynamic topology and geometry jointly for spatial data fusion and mining. This work presents a further elaboration and formalization of this methodology, as well as its application to modeling vector-to-vector and raster-to-vector conflation/registration problems and automated feature extraction from imagery.

    Degenerating families of dendrograms

    Dendrograms used in data analysis are ultrametric spaces, hence objects of nonarchimedean geometry. It is known that there exist p-adic representations of dendrograms. Completed by a point at infinity, they can be viewed as subtrees of the Bruhat-Tits tree associated to the p-adic projective line. The implications are that certain moduli spaces known in algebraic geometry are p-adic parameter spaces of (families of) dendrograms, and stochastic classification can also be handled within this framework. At the end, we calculate the topology of the hidden part of a dendrogram. Comment: 13 pages, 8 figures

    Computing Multidimensional Persistence

    The theory of multidimensional persistence captures the topology of a multifiltration -- a multiparameter family of increasing spaces. Multifiltrations arise naturally in the topological analysis of scientific data. In this paper, we give a polynomial-time algorithm for computing multidimensional persistence. We recast this computation as a problem within computational algebraic geometry and utilize algorithms from this area to solve it. While the resulting problem is Expspace-complete and the standard algorithms take doubly-exponential time, we exploit the structure inherent within multifiltrations to yield practical algorithms. We implement all algorithms in the paper and provide statistical experiments to demonstrate their feasibility. Comment: This paper has been withdrawn by the authors. Journal of Computational Geometry, 1(1) 2010, pages 72-100. http://jocg.org/index.php/jocg/article/view/1

    Quantification of marine sediment properties from planar and volumetric pore geometries

    Pore geometry and topology are important determinants of sediment physical properties, such as porosity and permeability. They also influence processes that occur in the sediment, such as acoustic propagation, attenuation, and dispersion; single- and multi-phase fluid flow; and hydrodynamic dispersion. This study uses images to evaluate the pore geometry and topology of ooid (subspherical particles) and siliciclastic (angular quartz) sand collected from the marine environment south of Bimini, Bahamas, and Ft. Walton Beach, FL, respectively. Image analysis techniques and predictive tools enable insight into the relationships among sediment pore geometry, topology, and physical properties for these differently shaped sands. High-frequency acoustics utilize short-wavelength signals to evaluate sediments. Correspondingly short length scales are then needed for sedimentary property predictions, which is possible with planar and volumetric image analysis of sand. These predictions were compared with data obtained by direct large-scale measurements (e.g., water weight loss, constant head permeability). Mean porosity differed by as much as 0.04, and mean permeability showed good agreement, differing by a factor of 2. Given that the image analysis predictions were made from much smaller samples (~equivalent to the length scale of the high acoustic frequencies used) than the bulk samples, a sediment characterization at acoustically relevant length scales is possible. It was also demonstrated that for these homogeneous sands (i.e., ooids and quartz) two-dimensional pore geometry and topology are quite similar to three-dimensional pore geometry and topology (i.e., pore connectivity). Additionally, it was determined that pore network models typically overestimate the topology and therefore, in order to match image and bulk predictions of sediment properties, these models must underestimate the conductance of individual pore throats (i.e., the conductive elements in sand).
    Typically, pore throats are depicted as straight cylinders. Image data suggest that pore throats are better represented by biconical shapes, in which conductance is as much as 3 times higher than in straight cylinders. These findings indicate that increased realism in pore throat shape (higher conductivity) and in topology (fewer pore throats) may significantly influence network model evaluations of fluid flow or acoustic propagation in marine sand.

    How to Extract the Geometry and Topology from Very Large 3D Segmentations

    Segmentation is often an essential intermediate step in image analysis. A volume segmentation characterizes the underlying volume image in terms of geometric information--segments, faces between segments, curves in which several faces meet--as well as a topology on these objects. Existing algorithms encode this information in designated data structures, but require that these data structures fit entirely in Random Access Memory (RAM). Today, 3D images with several billion voxels are acquired, e.g. in structural neurobiology. Since these large volumes can no longer be processed with existing methods, we present a new algorithm which performs geometry and topology extraction with a runtime linear in the number of voxels and log-linear in the number of faces and curves. The parallelizable algorithm proceeds in a block-wise fashion and constructs a consistent representation of the entire volume image on the hard drive, making the structure of very large volume segmentations accessible to image analysis. The parallelized C++ source code, free command line tools and MATLAB mex files are available from http://hci.iwr.uni-heidelberg.de/software.php