400,604 research outputs found

    Earth‐Observation Data Access: A Knowledge Discovery Concept for Payload Ground Segments

    In recent years, the ability to store large quantities of Earth Observation (EO) satellite images has greatly surpassed the ability to access them and meaningfully extract information from them. The state of the art in operational systems for Remote Sensing data access (in particular for images) allows queries by geographical location, time of acquisition, or type of sensor. This information, however, is often less relevant than the content of the scene (e.g. specific scattering properties, structures, or objects). Moreover, the continuous growth in the size of the archives and in the variety and complexity of EO sensors requires new methodologies and tools, based on shared knowledge, for information mining and management in support of emerging applications (e.g. change detection, global monitoring, disaster and risk management, and image time series). In addition, current Payload Ground Segments (PGS) are designed mainly for Long Term Data Preservation (LTDP); in this article we propose an alternative solution for enhancing access to the data content. Our solution presents a knowledge discovery concept intended to implement a communication channel between the PGS (the EO data sources) and the end-user, who receives the content of the data sources coded in an understandable format, associated with semantics, and ready for exploitation. The first implementations of this concept were presented in the Knowledge-driven content-based Image Information Mining (KIM) and Geospatial Information Retrieval and Indexing (GeoIRIS) systems, both examples of data mining systems. Our new concept is developed as a modular system composed of the following components:
    1) Data model generation, implementing methods that extract relevant descriptors (low-level features) from the sources (EO images), analyze their metadata to complement this information, and combine it with vector data from Geographical Information Systems (a sketch of this feature-extraction and retrieval flow follows the list).
    2) A database management system whose structure supports knowledge management, feature computation, and visualization, because the modules for analysis, indexing, training, and retrieval are resolved within the database.
    3) Data mining and knowledge discovery tools that allow the end-user to perform advanced queries and to assign semantic annotations to the image content. The low-level features are complemented with semantic annotations that give meaning to the image information; the semantic description is based on semi-supervised learning methods for spatio-temporal and contextual pattern discovery.
    4) Scene understanding, with annotation tools that help the user build scenarios from EO images, for example change detection analyses.
    5) Visual data mining, providing Human-Machine Interfaces for navigating and browsing the archive in 2D or 3D representations. The visualization techniques form an interactive loop that optimizes the visual interaction between the end-user and huge volumes of heterogeneous data.
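    The flow behind components 1) and 3) can be illustrated with a small sketch: low-level descriptors are computed per image tile, stored alongside optional semantic labels, and retrieved by similarity. This is a minimal illustration of the idea only; the names (extract_features, TileCatalog) and the choice of descriptors are hypothetical, not the KIM or GeoIRIS APIs.

        import numpy as np

        def extract_features(tile: np.ndarray) -> np.ndarray:
            """Low-level descriptors for one tile: mean, standard deviation,
            and a coarse gradient-orientation histogram (illustrative choice)."""
            gy, gx = np.gradient(tile.astype(float))
            angles = np.arctan2(gy, gx)
            hist, _ = np.histogram(angles, bins=8, range=(-np.pi, np.pi))
            return np.concatenate(([tile.mean(), tile.std()], hist / max(hist.sum(), 1)))

        class TileCatalog:
            """Toy stand-in for the database component: low-level features
            plus user-assigned semantic labels, queried by nearest neighbour."""
            def __init__(self):
                self.features, self.labels = [], []

            def ingest(self, tile, label=None):
                self.features.append(extract_features(tile))
                self.labels.append(label)  # semantic annotation, may be added later

            def query(self, tile, k=3):
                """Return indices and labels of the k most similar tiles."""
                q = extract_features(tile)
                dists = [np.linalg.norm(q - f) for f in self.features]
                return [(int(i), self.labels[i]) for i in np.argsort(dists)[:k]]

        catalog = TileCatalog()
        rng = np.random.default_rng(0)
        catalog.ingest(rng.random((64, 64)), label="open water")
        catalog.ingest(rng.random((64, 64)) * 0.2, label="urban")
        print(catalog.query(rng.random((64, 64)), k=1))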

    Compressing and Performing Algorithms on Massively Large Networks

    Networks are represented as a set of nodes (vertices) and the arcs (links) connecting them. Such networks can model various real-world structures, such as social networks (e.g., Facebook), information networks (e.g., citation networks), technological networks (e.g., the Internet), and biological networks (e.g., gene-phenotype networks). Analysis of such structures is a heavily studied area with many applications. However, in this era of big data, we find ourselves with networks so massive that their space requirements inhibit analysis. Since many of these networks have nodes and arcs on the order of billions to trillions, even basic data structures such as adjacency lists could cost petabytes to zettabytes of storage. Storing these networks in secondary memory would require I/O access (i.e., disk access) during analysis, drastically slowing analysis time. To perform analysis efficiently on such extensive data, we either need enough main memory for the data structures and algorithms, or we need compressions that require much less space while still answering queries efficiently.
    In this dissertation, we develop several compression techniques that succinctly represent these real-world networks while still supporting efficient queries (e.g., checking whether an arc exists between two nodes). Furthermore, since many of these networks continue to grow over time, our compression techniques also support adding and removing nodes and edges directly on the compressed structure. We also provide a way to compress the data quickly without any intermediate structure, giving minimal memory overhead. We provide detailed analysis and prove that our compression is indeed succinct (i.e., it achieves the information-theoretic lower bound), and we show empirically that our compression rates outperform or match existing compression algorithms on many benchmark datasets.
    We also extend our technique to time-evolving networks; that is, we store the entire state of the network at each time frame. Studying time-evolving networks reveals patterns over time that are not visible in regular, static network analysis. A succinct representation is arguably more important for time-evolving networks than for static graphs, because the extra dimension inflates the space requirements of basic data structures even further. Again, we achieve succinctness while also providing fast encoding, minimal memory overhead during encoding, fast queries, and fast, direct modification, and we show empirically that our compression rates are better than or equal to the best-performing benchmark on each dataset.
    Finally, we develop both static and time-evolving algorithms that run directly on our compressed structures. Using our static graph compression combined with our differential technique, we can speed up matrix-vector multiplication by reusing previously computed products. We compare our results against a similar technique using the WebGraph framework and find that not only are our base query speeds faster, but we also gain a more significant speed-up from reusing products. We then use our time-evolving compression to solve the earliest-arrival-paths problem and time-evolving transitive closure. We found not only that we were the first to run such algorithms directly on compressed data, but also that our technique was particularly efficient at doing so.
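    As a concrete, if far simpler, illustration of answering queries on a compressed graph without decompressing it, the sketch below stores each sorted adjacency list as variable-byte-encoded gaps and answers arc-existence queries by decoding lazily with early termination. This is a generic toy example under those assumptions; it is not the dissertation's succinct structure and does not reach the information-theoretic lower bound.

        def vbyte_encode(nums):
            """Pack non-negative ints, 7 bits per byte; the high bit marks
            a continuation byte."""
            out = bytearray()
            for n in nums:
                while n >= 128:
                    out.append((n & 0x7F) | 0x80)
                    n >>= 7
                out.append(n)
            return bytes(out)

        def vbyte_decode(buf):
            n, shift = 0, 0
            for b in buf:
                n |= (b & 0x7F) << shift
                if b & 0x80:
                    shift += 7
                else:
                    yield n
                    n, shift = 0, 0

        class CompressedGraph:
            """Adjacency lists stored as varint-packed gaps between sorted
            neighbour ids; queries decode lazily and stop early."""
            def __init__(self, adjacency):
                self.rows = {}
                for u, nbrs in adjacency.items():
                    gaps, prev = [], 0
                    for v in sorted(nbrs):
                        gaps.append(v - prev)
                        prev = v
                    self.rows[u] = vbyte_encode(gaps)

            def has_arc(self, u, v):
                acc = 0
                for gap in vbyte_decode(self.rows.get(u, b"")):
                    acc += gap
                    if acc >= v:  # lists are sorted, so we can stop here
                        return acc == v
                return False

        g = CompressedGraph({0: [2, 5, 9], 2: [0]})
        print(g.has_arc(0, 5), g.has_arc(0, 6))  # True False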

    Managing polyglot systems metadata with hypergraphs

    A single type of data store can hardly fulfill every end-user requirement in the NoSQL world. Therefore, polyglot systems use different types of NoSQL data stores in combination. However, the heterogeneity of the data storage models makes managing the metadata in such systems a complex task, and only a handful of studies have addressed it. In this paper, we propose a hypergraph-based approach for representing the catalog of metadata in a polyglot system. Taking an existing common programming interface to NoSQL systems, we extend and formalize it as hypergraphs for managing metadata. We then define design constraints and query transformation rules for three representative data store types. Furthermore, we propose a simple query rewriting algorithm that uses the catalog itself for these data store types, and we provide a prototype implementation. Finally, we show the feasibility of our approach on a use case of an existing polyglot system.
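    The appeal of a hypergraph catalog can be shown compactly: a hyperedge relates any number of catalog entities at once (for example an attribute, the structure containing it, and the data store holding it), which a binary graph cannot express in a single edge. The sketch below is a minimal illustration of this idea only; the Hypergraph class and the entity naming scheme are our assumptions, not the paper's formalization.

        from collections import defaultdict

        class Hypergraph:
            """Hyperedges are sets of catalog entities; the incidence map
            walks from any entity to everything it co-occurs with."""
            def __init__(self):
                self.edges = {}                    # edge id -> set of entities
                self.incidence = defaultdict(set)  # entity -> ids of its edges

            def add_edge(self, edge_id, entities):
                self.edges[edge_id] = set(entities)
                for e in entities:
                    self.incidence[e].add(edge_id)

            def neighbours(self, entity):
                """Entities sharing at least one hyperedge with `entity`."""
                out = set()
                for eid in self.incidence[entity]:
                    out |= self.edges[eid]
                return out - {entity}

        catalog = Hypergraph()
        # One hyperedge per stored attribute, tying together the attribute,
        # its containing structure, and the backing data store.
        catalog.add_edge("e1", {"attr:user.name", "struct:user", "store:mongodb"})
        catalog.add_edge("e2", {"attr:user.name", "struct:user_node", "store:neo4j"})

        # A query rewriter would first ask: which stores hold user.name?
        stores = {n for n in catalog.neighbours("attr:user.name")
                  if n.startswith("store:")}
        print(stores)  # {'store:mongodb', 'store:neo4j'} (set order may vary)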

    Selling fashion: realizing the research potential of the House of Fraser archive, University of Glasgow Archive Services

    The House of Fraser archive is a rich resource for the study of the development of fashion retailing in Britain since the mid-nineteenth century. It is, however, underexploited by textile, fashion, and retail historians. During the summer of 2009, University of Glasgow Archive Services will complete an Arts and Humanities Research Council-funded project which seeks to improve the accessibility of the Archive. Adopting a progressive approach to archival description, the project is developing an innovative online catalogue, providing fuller access to information about the Archive and the resources contained within it.

    Indexing Metric Spaces for Exact Similarity Search

    With the continued digitalization of societal processes, we are seeing an explosion in available data, commonly referred to as big data. In research settings, three aspects of the data are often viewed as the main sources of challenge when attempting to enable value creation from big data: volume, velocity, and variety. Many studies address volume or velocity, while far fewer concern variety. The metric space model is well suited to addressing variety because it can accommodate any type of data as long as the associated distance notion satisfies the triangle inequality. To accelerate search in metric spaces, a collection of indexing techniques for metric data has been proposed. However, existing surveys each offer only narrow coverage, and no comprehensive empirical study of these techniques exists. We offer a survey of all existing metric indexes that support exact similarity search by i) summarizing the partitioning, pruning, and validation techniques used in metric indexes, ii) providing time and storage complexity analyses of index construction, and iii) reporting on a comprehensive empirical comparison of their similarity query processing performance. Empirical comparison is used to evaluate search performance because complexity analysis reveals little about the differences in similarity query processing: query performance depends on pruning and validation abilities, which in turn depend on the data distribution. This article aims to reveal the strengths and weaknesses of the different indexing techniques, in order to offer guidance on selecting an appropriate technique for a given setting and to direct future research on metric indexes.
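    The pruning-and-validation pattern named in i) can be made concrete with a pivot table, one of the classic metric indexing ideas: precompute each object's distances to a few pivots, then use the triangle inequality, |d(q, p) - d(o, p)| <= d(q, o), to discard candidates without computing their distance to the query, and validate only the survivors with exact distances. The sketch below is a generic illustration of that idea; the Euclidean distance and the PivotTable name are our choices, not taken from the survey.

        import math

        def euclid(a, b):
            return math.dist(a, b)

        class PivotTable:
            """Exact range search: prune with pivot distances, then
            validate survivors with the real metric."""
            def __init__(self, objects, pivots, dist=euclid):
                self.objects, self.pivots, self.dist = objects, pivots, dist
                # Precomputed distance table: one row per object.
                self.table = [[dist(o, p) for p in pivots] for o in objects]

            def range_query(self, q, r):
                dq = [self.dist(q, p) for p in self.pivots]
                hits = []
                for o, row in zip(self.objects, self.table):
                    # Pruning: the best lower bound on d(q, o) over all pivots.
                    if max(abs(a - b) for a, b in zip(dq, row)) > r:
                        continue
                    # Validation: exact distance for the survivors only.
                    if self.dist(q, o) <= r:
                        hits.append(o)
                return hits

        pts = [(0, 0), (1, 1), (5, 5), (9, 0)]
        index = PivotTable(pts, pivots=[(0, 0), (10, 10)])
        print(index.range_query((1, 0), r=1.5))  # [(0, 0), (1, 1)]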