Does the Geometry of Word Embeddings Help Document Classification? A Case Study on Persistent Homology Based Representations
We investigate the pertinence of methods from algebraic topology for text
data analysis. These methods enable the development of
mathematically-principled isometric-invariant mappings from a set of vectors to
a document embedding, which is stable with respect to the geometry of the
document in the selected metric space. In this work, we evaluate the utility of
these topology-based document representations in traditional NLP tasks,
specifically document clustering and sentiment classification. We find that the
embeddings do not benefit text analysis. In fact, performance is worse than
simple techniques like , indicating that the geometry of the
document does not provide enough variability for classification on the basis of
topic or sentiment in the chosen datasets.
Comment: 5 pages, 3 figures. Rep4NLP workshop at ACL 201
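The isometry invariance mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: it computes only 0-dimensional persistence (the scales at which connected components merge) using a union-find over sorted pairwise distances, and the function name `zero_dim_persistence` and the toy 2-D "word vectors" are assumptions made for illustration.

```python
import math
from itertools import combinations


def zero_dim_persistence(points):
    """Return the death scales of 0-dimensional classes of a point cloud.

    Every point is born at scale 0. A component dies when an edge first
    merges it into another one, so the deaths are exactly the edge lengths
    of a minimum spanning tree (Kruskal's algorithm).
    """
    parent = list(range(len(points)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    # All pairwise Euclidean distances, shortest first.
    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in combinations(range(len(points)), 2)
    )
    deaths = []
    for length, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:  # this edge merges two components
            parent[ri] = rj
            deaths.append(length)
    return deaths


# Hypothetical 2-D "word vectors" for a toy document (illustrative only).
doc = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0), (5.0, 5.0)]
# Rotating by 90 degrees is an isometry, so the death scales should not change.
rotated = [(-y, x) for x, y in doc]
```

Because the output depends only on pairwise distances, `zero_dim_persistence(doc)` and `zero_dim_persistence(rotated)` agree up to floating-point rounding, which is the invariance property the abstract refers to.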
Modeling spatial uncertainties in geospatial data fusion and mining
Geospatial data analysis relies on Spatial Data Fusion and Mining (SDFM), which depends heavily on the topology and geometry of spatial objects. Capturing, representing, and measuring geometric characteristics such as orientation, shape, proximity, and similarity are of the highest interest in SDFM. Another challenge is representing the uncertain and dynamically changing topological structure of spatial objects, including social and communication networks, roads, and waterways, under the influence of noise, obstacles, temporary loss of communication, and other factors. The spatial distribution of a dynamic network is a complex and dynamic mixture of its topology and geometry. Historically, the separation of topology and geometry in mathematics was motivated by the need to separate the invariant part of the spatial distribution (topology) from the less invariant part (geometry). Geometric characteristics such as orientation, shape, and proximity are not invariant. This separation was made under the assumption that the topological structure is certain and does not change over time. New challenges posed by dynamic and uncertain topological structure require a reexamination of this fundamental assumption. In previous work we proposed a dynamic logic methodology for jointly capturing, representing, and recording uncertain and dynamic topology and geometry for spatial data fusion and mining. This work presents a further elaboration and formalization of this methodology, as well as its application to modeling vector-to-vector and raster-to-vector conflation/registration problems and automated feature extraction from imagery.
Degenerating families of dendrograms
Dendrograms used in data analysis are ultrametric spaces, hence objects of
nonarchimedean geometry. It is known that there exist p-adic representations
of dendrograms. Completed by a point at infinity, they can be viewed as
subtrees of the Bruhat-Tits tree associated to the p-adic projective line.
The implications are that certain moduli spaces known in algebraic geometry are
p-adic parameter spaces of (families of) dendrograms, and stochastic
classification can also be handled within this framework. At the end, we
calculate the topology of the hidden part of a dendrogram.
Comment: 13 pages, 8 figures
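The claim that dendrograms are ultrametric spaces can be checked numerically. The sketch below is an illustration, not code from the paper: it derives single-linkage cophenetic distances from an ordinary distance matrix and tests the strong triangle inequality d(x, z) <= max(d(x, y), d(y, z)). The function names and the toy points on a line are assumptions.

```python
from itertools import product


def cophenetic(dist):
    """Single-linkage cophenetic distances of a symmetric distance matrix.

    The cophenetic distance of two points is the minimax path cost between
    them, computed here by a Floyd-Warshall-style update that uses (min, max)
    in place of (min, +).
    """
    n = len(dist)
    u = [row[:] for row in dist]
    # product() varies its first component slowest, so k is the outer loop.
    for k, i, j in product(range(n), repeat=3):
        u[i][j] = min(u[i][j], max(u[i][k], u[k][j]))
    return u


def is_ultrametric(u):
    """Check the strong triangle inequality for every triple of points."""
    n = len(u)
    return all(
        u[i][j] <= max(u[i][k], u[k][j])
        for i, j, k in product(range(n), repeat=3)
    )


# Toy data (illustrative): four points on a line with the usual distance.
xs = [0, 1, 3, 7]
dist = [[abs(a - b) for b in xs] for a in xs]
```

Here `is_ultrametric(dist)` is false, since the points 0, 1, 3 violate the strong inequality, while `is_ultrametric(cophenetic(dist))` holds: the merge heights read off a single-linkage dendrogram always form an ultrametric.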
Computing Multidimensional Persistence
The theory of multidimensional persistence captures the topology of a
multifiltration -- a multiparameter family of increasing spaces.
Multifiltrations arise naturally in the topological analysis of scientific
data. In this paper, we give a polynomial time algorithm for computing
multidimensional persistence. We recast this computation as a problem within
computational algebraic geometry and utilize algorithms from this area to solve
it. While the resulting problem is Expspace-complete and the standard
algorithms take doubly-exponential time, we exploit the structure inherent
within multifiltrations to yield practical algorithms. We implement all
algorithms in the paper and provide statistical experiments to demonstrate
their feasibility.
Comment: This paper has been withdrawn by the authors. Journal of
Computational Geometry, 1(1) 2010, pages 72-100.
http://jocg.org/index.php/jocg/article/view/1
Quantification of marine sediment properties from planar and volumetric pore geometries
Pore geometry and topology are important determinants of sediment physical properties, such as porosity and permeability. They also influence processes that occur in the sediment, such as acoustic propagation, attenuation, and dispersion, single- and multi-phase fluid flow, and hydrodynamic dispersion. This study uses images to evaluate the pore geometry and topology of ooid (subspherical particles) and siliciclastic (angular quartz) sand collected from the marine environment south of Bimini, Bahamas, and Ft. Walton Beach, FL, respectively. Image analysis techniques and predictive tools enable insight into the relationships among sediment pore geometry, topology, and physical properties for these differently shaped sands. High-frequency acoustics use short-wavelength signals to evaluate sediments. Correspondingly short length scales are therefore needed for sedimentary property predictions, which is possible with planar and volumetric image analysis of sand. These predictions were compared with data obtained by direct large-scale measurements (e.g., water weight loss, constant head permeability). Mean porosity differed by as much as 0.04, and mean permeability showed good agreement, differing by a factor of 2. Given that the image analysis predictions were made from much smaller samples (approximately equivalent to the length scale of the high acoustic frequencies used) than the bulk samples, sediment characterization at acoustically relevant length scales is possible. It was also demonstrated that for these homogeneous sands (i.e., ooids and quartz), two-dimensional pore geometry and topology are quite similar to three-dimensional pore geometry and topology (i.e., pore connectivity). Additionally, it was determined that pore network models typically overestimate the topology and therefore, in order to match image and bulk predictions of sediment properties, these models must underestimate the conductance of individual pore throats (i.e., the conductive elements in sand).
Typically, pore throats are depicted as straight cylinders. Image data suggest that pore throats are better represented by biconical shapes, in which conductance is as much as 3 times higher than in straight cylinders. These findings indicate that increased realism in pore throat shape (higher conductivity) and in topology (fewer pore throats) may significantly influence network model evaluations of fluid flow or acoustic propagation in marine sand.
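As a rough illustration of what planar image analysis of a sand sample involves, the sketch below computes porosity and a crude connectivity measure from a binary image. It is not the study's method: the 0/1 pixel encoding, the choice of 4-connectivity, the function names, and the toy image are all assumptions.

```python
def porosity(image):
    """Fraction of pixels marked as pore (1) in a binary image (0 = grain)."""
    total = sum(len(row) for row in image)
    return sum(map(sum, image)) / total


def count_pore_components(image):
    """Number of 4-connected pore regions, a crude stand-in for pore topology."""
    rows, cols = len(image), len(image[0])
    seen = set()
    components = 0
    for r in range(rows):
        for c in range(cols):
            if image[r][c] == 1 and (r, c) not in seen:
                components += 1
                # Flood-fill the region containing this pore pixel.
                stack = [(r, c)]
                seen.add((r, c))
                while stack:
                    y, x = stack.pop()
                    for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                        inside = 0 <= ny < rows and 0 <= nx < cols
                        if inside and image[ny][nx] == 1 and (ny, nx) not in seen:
                            seen.add((ny, nx))
                            stack.append((ny, nx))
    return components


# Toy 4x4 thin-section image (illustrative only): 1 = pore, 0 = grain.
section = [
    [1, 1, 0, 0],
    [0, 1, 0, 1],
    [0, 0, 0, 1],
    [1, 0, 0, 0],
]
```

For this toy image, 6 of the 16 pixels are pore space and they fall into three separate regions. A real analysis would work on segmented micrographs or volumetric scans and would measure throat sizes as well as connectivity.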
How to Extract the Geometry and Topology from Very Large 3D Segmentations
Segmentation is often an essential intermediate step in image analysis. A
volume segmentation characterizes the underlying volume image in terms of
geometric information--segments, faces between segments, curves in which
several faces meet--as well as a topology on these objects. Existing algorithms
encode this information in designated data structures, but require that these
data structures fit entirely in Random Access Memory (RAM). Today, 3D images
with several billion voxels are acquired, e.g. in structural neurobiology.
Since these large volumes can no longer be processed with existing methods, we
present a new algorithm which performs geometry and topology extraction with a
runtime linear in the number of voxels and log-linear in the number of faces
and curves. The parallelizable algorithm proceeds in a block-wise fashion and
constructs a consistent representation of the entire volume image on the hard
drive, making the structure of very large volume segmentations accessible to
image analysis. The parallelized C++ source code, free command line tools and
MATLAB mex files are available from
http://hci.iwr.uni-heidelberg.de/software.php
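The block-wise idea can be sketched on a toy scale. The code below is not the authors' C++ implementation and keeps everything in memory: it only counts the voxel faces shared by each pair of segments, processing the volume slab by slab along z so that each face is counted exactly once. The function names, the `volume[z][y][x]` layout, and the toy labels are assumptions.

```python
from collections import Counter


def face_areas(volume, z_start=0, z_stop=None):
    """Count voxel faces shared by each pair of distinct segment labels.

    `volume[z][y][x]` holds integer labels. Only voxels with
    z_start <= z < z_stop are visited, and each voxel inspects its +x, +y
    and +z neighbours, so a face lying between two slabs is counted once
    (by the lower slab, which reads one slice of the next slab).
    """
    nz, ny, nx = len(volume), len(volume[0]), len(volume[0][0])
    z_stop = nz if z_stop is None else z_stop
    areas = Counter()
    for z in range(z_start, z_stop):
        for y in range(ny):
            for x in range(nx):
                a = volume[z][y][x]
                for dz, dy, dx in ((0, 0, 1), (0, 1, 0), (1, 0, 0)):
                    zz, yy, xx = z + dz, y + dy, x + dx
                    if zz < nz and yy < ny and xx < nx:
                        b = volume[zz][yy][xx]
                        if a != b:
                            areas[(min(a, b), max(a, b))] += 1
    return areas


def face_areas_blockwise(volume, block_depth):
    """Accumulate per-slab counts; the slabs are independent of each other."""
    total = Counter()
    for z0 in range(0, len(volume), block_depth):
        total += face_areas(volume, z0, min(z0 + block_depth, len(volume)))
    return total


# Toy 2x2x2 segmentation with two segments (illustrative only).
volume = [
    [[1, 1], [1, 2]],
    [[1, 2], [2, 2]],
]
```

Since each slab's counts depend only on that slab plus one neighbouring slice, the slabs could be handled by separate workers and merged afterwards. In this toy version `face_areas_blockwise(volume, 1)` gives the same result as `face_areas(volume)`.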