Earth-Observation Data Access: A Knowledge Discovery Concept for Payload Ground Segments
In recent years the ability to store large quantities of Earth Observation (EO) satellite images has greatly surpassed the ability to access them and meaningfully extract information from them. State-of-the-art operational systems for Remote Sensing data access (in particular for images) allow queries by geographical location, time of acquisition, or type of sensor. Nevertheless, this information is often less relevant than the content of the scene (e.g. specific scattering properties, structures, objects). Moreover, the continuous growth in the size of the archives and in the variety and complexity of EO sensors requires new methodologies and tools, based on shared knowledge, for information mining and management in support of emerging applications (e.g. change detection, global monitoring, disaster and risk management, image time series). In addition, current Payload Ground Segments (PGS) are mainly designed for Long Term Data Preservation (LTDP); in this article we propose an alternative solution that enhances access to the data content. Our solution presents a knowledge discovery concept intended to implement a communication channel between the PGS (EO data sources) and the end-user, who receives the content of the data sources coded in an understandable format, associated with semantics, and ready for exploitation. The first implemented concepts were presented in the Knowledge-driven content-based Image Information Mining (KIM) and Geospatial Information Retrieval and Indexing (GeoIRIS) systems as examples of data mining systems. Our new concept is developed as a modular system composed of the following components: 1) data model generation, implementing methods for extracting relevant descriptors (low-level features) from the sources (EO images), analyzing their metadata to complement the information, and combining them with vector data from Geographical Information Systems; 2) a database management system, whose structure supports knowledge management, feature computation, and visualization, because the modules for analysis, indexing, training, and retrieval are resolved within the database; 3) data mining and knowledge discovery tools that allow the end-user to perform advanced queries and assign semantic annotations to the image content; the low-level features are complemented with semantic annotations that give meaning to the image information, and the semantic description is based on semi-supervised learning methods for spatio-temporal and contextual pattern discovery; 4) scene understanding, relying on annotation tools that help the user create scenarios using EO images, for example change detection analysis; 5) visual data mining, providing Human-Machine Interfaces for navigating and browsing the archive using 2D or 3D representations; the visualization techniques form an interactive loop that optimizes the visual interaction between the end-user and huge volumes of heterogeneous data.
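The component list above can be loosely sketched in code. The class and method names below are illustrative inventions, not part of the proposed system; the sketch only shows how low-level features, metadata, and user-assigned semantic annotations might live in one catalog that supports content-based retrieval.

```python
from dataclasses import dataclass, field

@dataclass
class ImageRecord:
    """One EO image: low-level features plus user-assigned semantic labels."""
    image_id: str
    features: dict             # e.g. texture or scattering descriptors
    metadata: dict = field(default_factory=dict)
    labels: set = field(default_factory=set)

class Catalog:
    """Toy stand-in for the database component: stores records and
    answers queries over semantic annotations."""
    def __init__(self):
        self.records = {}

    def ingest(self, record):
        self.records[record.image_id] = record

    def annotate(self, image_id, label):
        """Semantic annotation assigned by the end-user."""
        self.records[image_id].labels.add(label)

    def query(self, label):
        """Content-based retrieval by semantic annotation."""
        return [r.image_id for r in self.records.values() if label in r.labels]

catalog = Catalog()
catalog.ingest(ImageRecord("scene-001", {"entropy": 4.2}, {"sensor": "SAR"}))
catalog.annotate("scene-001", "urban")
print(catalog.query("urban"))  # ['scene-001']
```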
Compressing and Performing Algorithms on Massively Large Networks
Networks are represented as a set of nodes (vertices) and the arcs (links) connecting them. Such networks can model various real-world structures such as social networks (e.g., Facebook), information networks (e.g., citation networks), technological networks (e.g., the Internet), and biological networks (e.g., gene-phenotype network). Analysis of such structures is a heavily studied area with many applications. However, in this era of big data, we find ourselves with networks so massive that the space requirements inhibit network analysis.
Since many of these networks have nodes and arcs on the order of billions to trillions, even basic data structures such as adjacency lists could cost petabytes to zettabytes of storage. Storing these networks in secondary memory would require I/O access (i.e., disk access) during analysis, thus drastically slowing analysis time. To perform analysis efficiently on such extensive data, we either need enough main memory for the data structures and algorithms, or we need to develop compressions that require much less space while still being able to answer queries efficiently.
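The space argument can be made concrete with back-of-envelope arithmetic (a sketch; the 8-byte identifier width is an assumption). A flat adjacency list already reaches terabytes at the trillion-arc scale, and storing a full snapshot per time step, as time-evolving analysis requires, multiplies this further.

```python
def adjacency_list_bytes(num_arcs, id_bytes=8):
    """Each arc stored once as an id_bytes-wide neighbour identifier."""
    return num_arcs * id_bytes

for arcs in (10**9, 10**12):
    print(f"{arcs:.0e} arcs -> {adjacency_list_bytes(arcs) / 2**40:.2f} TiB")
```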
In this dissertation, we develop several compression techniques that succinctly represent these real-world networks while still being able to efficiently query the network (e.g., check if an arc exists between two nodes). Furthermore, since many of these networks continue to grow over time, our compression techniques also support the ability to add and remove nodes and edges directly on the compressed structure. We also provide a way to compress the data quickly without any intermediate structure, thus giving minimal memory overhead. We provide detailed analysis and prove that our compression is indeed succinct (i.e., achieves the information-theoretic lower bound). Also, we empirically show that our compression rates outperform or are equal to existing compression algorithms on many benchmark datasets.
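As an illustration of a compressed-yet-queryable representation of the kind described here (not the dissertation's actual succinct structure), the classic gap-plus-varint encoding of sorted neighbour lists supports arc-existence queries directly on the compressed bytes:

```python
def encode_neighbors(neighbors):
    """Gap-encode a sorted neighbour list as variable-length bytes."""
    out = bytearray()
    prev = 0
    for n in sorted(neighbors):
        gap = n - prev
        prev = n
        while True:              # 7-bit varint, high bit = continuation
            byte = gap & 0x7F
            gap >>= 7
            if gap:
                out.append(byte | 0x80)
            else:
                out.append(byte)
                break
    return bytes(out)

def has_arc(encoded, target):
    """Decode gaps until we reach or pass the target node id."""
    node, shift, gap = 0, 0, 0
    for byte in encoded:
        gap |= (byte & 0x7F) << shift
        if byte & 0x80:
            shift += 7
            continue
        node += gap
        if node == target:
            return True
        if node > target:        # sorted order lets us stop early
            return False
        gap, shift = 0, 0
    return False

enc = encode_neighbors([3, 8, 300])
print(has_arc(enc, 8), has_arc(enc, 9))  # True False
```

Three neighbours fit in four bytes here, versus 24 bytes for raw 8-byte identifiers; real succinct structures push much further while keeping queries fast.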
We also extend our technique to time-evolving networks, storing the entire state of the network at each time frame. Studying time-evolving networks allows us to find patterns over time that are not visible in regular, static network analysis. A succinct representation is arguably more important for time-evolving networks than for static graphs, because the extra dimension inflates the space requirements of basic data structures even more. Again, we achieve succinctness while also providing fast encoding, minimal memory overhead during encoding, fast queries, and fast, direct modification. We also compare against several benchmarks and empirically show that we achieve compression rates better than or equal to the best-performing benchmark for each dataset.
Finally, we also develop both static and time-evolving algorithms that run directly on our compressed structures. Using our static graph compression combined with our differential technique, we find that we can speed up matrix-vector multiplication by reusing previously computed products. We compare our results against a similar technique using the Webgraph Framework and find that not only are our base query speeds faster, but we also gain a more significant speed-up from reusing products. We then use our time-evolving compression to solve the earliest-arrival-paths problem and time-evolving transitive closure. Not only were we the first to run such algorithms directly on compressed data, but our technique was also particularly efficient at doing so.
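The product-reuse idea can be shown in miniature (an illustrative sketch, not the actual compressed format): if many rows of the matrix share identical blocks of columns, as happens when similar adjacency lists are stored once, each shared block's partial dot product need only be computed once per multiplication.

```python
def spmv_with_reuse(rows, x):
    """Multiply a block-compressed 0/1 matrix by vector x.
    Each row is a list of (block_id, columns) pairs; identical blocks
    across rows contribute identical partial sums, computed once."""
    cache = {}                  # block id -> partial dot product with x
    result = []
    for row in rows:
        total = 0.0
        for block_id, cols in row:
            if block_id not in cache:
                cache[block_id] = sum(x[c] for c in cols)
            total += cache[block_id]
        result.append(total)
    return result

# Two rows share block "b0", whose partial sum is computed only once.
rows = [
    [("b0", [0, 2]), ("b1", [3])],
    [("b0", [0, 2])],
]
print(spmv_with_reuse(rows, [1.0, 0.0, 2.0, 5.0]))  # [8.0, 3.0]
```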
Managing polyglot systems metadata with hypergraphs
A single type of data store can hardly fulfill all end-user requirements in the NoSQL world. Polyglot systems therefore combine different types of NoSQL datastores. However, the heterogeneity of the data storage models makes managing the metadata in such systems a complex task, and only a handful of studies have addressed this. In this paper, we propose a hypergraph-based approach for representing the catalog of metadata in a polyglot system. Taking an existing common programming interface to NoSQL systems, we extend and formalize it as hypergraphs for managing metadata. We then define design constraints and query transformation rules for three representative data store types. Furthermore, we propose a simple query rewriting algorithm that uses the catalog itself for these data store types, and provide a prototype implementation. Finally, we show the feasibility of our approach on a use case of an existing polyglot system.
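A minimal sketch of the underlying idea, with invented names rather than the paper's formalization: metadata entities from different datastores become hypergraph nodes, and a hyperedge can tie together any number of them, for example the structures that represent the same logical entity across stores.

```python
class Hypergraph:
    """Minimal hypergraph: hyperedges connect any number of nodes.
    Nodes model metadata entities (collections, attributes, keys);
    hyperedges model cross-store relationships."""
    def __init__(self):
        self.nodes = {}        # node id -> attributes
        self.edges = {}        # edge id -> set of member node ids

    def add_node(self, node_id, **attrs):
        self.nodes[node_id] = attrs

    def add_edge(self, edge_id, members):
        self.edges[edge_id] = set(members)

    def incident_edges(self, node_id):
        """All hyperedges containing the node: the catalog lookup a
        query rewriter would use to locate related structures."""
        return {e for e, m in self.edges.items() if node_id in m}

g = Hypergraph()
g.add_node("users", store="document")      # document-store collection
g.add_node("follows", store="graph")       # graph-store relationship
g.add_node("user_id", store="key-value")   # key in a KV store
g.add_edge("same-entity", ["users", "follows", "user_id"])
print(g.incident_edges("users"))  # {'same-entity'}
```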
Some shortcomings of long-term working memory
Within the framework of their long-term working memory theory, Ericsson and Kintsch (1995) propose that experts rapidly store information in long-term memory through two mechanisms: elaboration of long-term memory patterns and schemas, and use of retrieval structures. They use chess players' memory as one of their most compelling sources of empirical evidence. In this paper, I show that evidence from chess memory, far from supporting their theory, limits its generality. Evidence from other domains reviewed by Ericsson and Kintsch, such as medical expertise, is not as strong as claimed, and sometimes contradicts the theory outright. I argue that Ericsson and Kintsch's concept of retrieval structure conflates three different types of memory structures that possess quite different properties. One of these types of structures (generic, general-purpose retrieval structures) has a narrower use than proposed by Ericsson and Kintsch: it applies only in domains where there is a conscious, deliberate intent by individuals to improve their memory. Other mechanisms, including specific retrieval structures, exist that permit rapid encoding into long-term memory under other circumstances.
Selling fashion: realizing the research potential of the House of Fraser archive, University of Glasgow Archive Services
The House of Fraser archive is a rich resource for the study of the development of fashion retailing in Britain since the mid-nineteenth century. It is, however, underexploited by textile, fashion and retail historians. During the summer of 2009, the University of Glasgow Archive Services will complete an Arts and Humanities Research Council-funded project which seeks to improve the accessibility of the Archive. Adopting a progressive approach to archival description, the project is developing an innovative online catalogue, providing fuller access to information about the Archive and the resources contained within it.
Indexing Metric Spaces for Exact Similarity Search
With the continued digitalization of societal processes, we are seeing an explosion in available data, commonly referred to as big data. In a research setting, three aspects of the data are often viewed as the main sources of challenges when attempting to enable value creation from big data: volume, velocity, and variety. Many studies address volume or velocity, while far fewer address variety. The metric space setting is well suited to addressing variety because it can accommodate any type of data as long as the associated distance notion satisfies the triangle inequality. To accelerate search in metric spaces, a collection of indexing techniques for metric data has been proposed. However, existing surveys each offer only narrow coverage, and no comprehensive empirical study of those techniques exists. We offer a survey of all the existing metric indexes that can support exact similarity search by i) summarizing all the existing partitioning, pruning, and validation techniques used for metric indexes, ii) providing time and storage complexity analyses of index construction, and iii) reporting on a comprehensive empirical comparison of their similarity query processing performance. Empirical comparisons are used to evaluate index performance during search because differences in the complexity analyses are hard to observe in similarity query processing, and query performance depends on pruning and validation abilities that are tied to the data distribution. This article aims to reveal the strengths and weaknesses of different indexing techniques in order to offer guidance on selecting an appropriate indexing technique for a given setting, and to direct future research on metric indexes.
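To make the pruning-versus-validation distinction concrete, the sketch below (illustrative only, assuming a single precomputed pivot) applies the standard triangle-inequality bound: if |d(q, p) - d(p, o)| > r for a pivot p, then d(q, o) > r, so object o is discarded without computing the actual distance; surviving candidates are validated with the real distance.

```python
def range_search(query, objects, pivot, pivot_dists, dist, radius):
    """Exact range search using one pivot for triangle-inequality pruning.
    pivot_dists[i] = dist(pivot, objects[i]), precomputed at index time."""
    d_qp = dist(query, pivot)
    hits, evaluated = [], 0
    for obj, d_po in zip(objects, pivot_dists):
        if abs(d_qp - d_po) > radius:
            continue                      # pruned: cannot be within radius
        evaluated += 1
        if dist(query, obj) <= radius:    # validation with the real distance
            hits.append(obj)
    return hits, evaluated

dist = lambda a, b: abs(a - b)            # a 1-D metric, for illustration
objects = [1, 4, 7, 10, 13]
pivot = 0
pivot_dists = [dist(pivot, o) for o in objects]
hits, evaluated = range_search(5, objects, pivot, pivot_dists, dist, 2)
print(hits, evaluated)  # [4, 7] 2
```

Only two of the five candidate distances are actually computed; how much pruning a pivot achieves depends on the data distribution, which is why the survey argues for empirical comparison.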
- …