512 research outputs found

    Monitoring Network Data Streams

    Doctor of Philosophy (Ph.D.)

    Algorithms for data placement, reconfiguration and monitoring in storage networks

    In this thesis we address three problems related to the self-management of storage networks: data placement, data reconfiguration, and data monitoring. Examples of such storage networks include centrally managed systems like Storage Area Networks and Network Attached Storage devices, as well as highly distributed systems like a P2P network or a sensor network. One of the crucial functions of a storage system is deciding the placement of data within the system. This data placement depends on the demand pattern for the data and is subject to the constraints of the storage system. For instance, if a particular data item is very popular, the storage system might want to host it on a disk with high bandwidth or make multiple copies of the item. We present new results for some of these data placement problems. As the demand pattern changes over time, the storage system will have to modify its placement accordingly. Such a modification in placement will typically involve moving data items from one set of disks to another or changing the number of copies of a data item in the system. For such a modification to be effective, it should be computed and applied quickly, since the system runs inefficiently during this reconfiguration. We propose new schemes to reconfigure the data placement to deal with changing demand. To re-compute the data placement periodically and to reconfigure it, we need to continuously track the demand distribution in the storage system and also be able to answer aggregate queries about the demand distribution. The data monitoring portion of the thesis deals with such problems, which arise in the context of distributed data management applications. A monitoring system for such a scenario would need to process large amounts of data from a widely distributed set of data sources. The thesis presents new schemes that improve the communication efficiency of existing methods that address these problems.

    Transforming Graph Representations for Statistical Relational Learning

    Relational data representations have become an increasingly important topic due to the recent proliferation of network datasets (e.g., social, biological, information networks) and a corresponding increase in the application of statistical relational learning (SRL) algorithms to these domains. In this article, we examine a range of representation issues for graph-based relational data. Since the choice of relational data representation for the nodes, links, and features can dramatically affect the capabilities of SRL algorithms, we survey approaches and opportunities for relational representation transformation designed to improve the performance of these algorithms. This leads us to introduce an intuitive taxonomy for data representation transformations in relational domains that incorporates link transformation and node transformation as symmetric representation tasks. In particular, the transformation tasks for both nodes and links include (i) predicting their existence, (ii) predicting their label or type, (iii) estimating their weight or importance, and (iv) systematically constructing their relevant features. We motivate our taxonomy through detailed examples and use it to survey and compare competing approaches for each of these tasks. We also discuss general conditions for transforming links, nodes, and features. Finally, we highlight challenges that remain to be addressed

    Digital Image Access & Retrieval

    The 33rd Annual Clinic on Library Applications of Data Processing, held at the University of Illinois at Urbana-Champaign in March of 1996, addressed the theme of "Digital Image Access & Retrieval." The papers from this conference cover a wide range of topics concerning digital imaging technology for visual resource collections. Papers covered three general areas: (1) systems, planning, and implementation; (2) automatic and semi-automatic indexing; and (3) preservation, with the bulk of the conference focusing on indexing and retrieval.

    Advanced distributed data integration infrastructure and research data management portal

    The amount of data available due to the rapid spread of advanced information technology is exploding. At the same time, continued research on data integration systems aims to provide users with uniform data access and efficient data sharing. The ability to share data is particularly important for interdisciplinary research, where a comprehensive picture of the subject requires large amounts of data from disparate data sources from a variety of disciplines. While there are numerous data sets available from various groups worldwide, the existing data sources are principally oriented toward regional comparative efforts rather than global applications. They vary widely in both content and format. Such data sources cannot be easily integrated and maintained by small groups of developers. I propose an advanced infrastructure for large-scale data integration based on crowdsourcing. In particular, I propose a novel architecture and algorithms to efficiently store dynamically incoming heterogeneous datasets, enabling both data integration and data autonomy. My proposed infrastructure combines machine learning algorithms and human expertise to perform efficient schema alignment and maintain relationships between the datasets. It provides efficient data exploration functionality without requiring users to write complex queries, and it performs approximate information fusion when an exact match does not exist. Finally, I introduce the Col*Fusion system, which implements the proposed advanced data integration infrastructure.

    Localized Events in Social Media Streams: Detection, Tracking, and Recommendation

    With the recent proliferation of social media channels and the immense amount of user-generated content, interest in social media mining is growing. Messages continuously posted via these channels report on a broad range of topics, from daily life to global and local events. As a consequence, new opportunities have opened up for mining event information that is crucial in many application domains, especially for increasing situational awareness in critical scenarios. Interestingly, many of these messages are enriched with location information, due to the widespread use of mobile devices and the recent advancements of today’s location acquisition techniques. This enables location-aware event mining, i.e., the detection and tracking of localized events. In this thesis, we propose novel frameworks and models that digest social media content for localized event detection, tracking, and recommendation. We first develop KeyPicker, a framework to extract and score event-related keywords in an online fashion, accounting for high levels of noise, temporal heterogeneity and outliers in the data. Then, LocEvent is proposed to incrementally detect and track events using a 4-stage procedure. That is, LocEvent receives the keywords extracted by KeyPicker, identifies local keywords, spatially clusters them, and finally scores the generated clusters. For each detected event, a set of descriptive keywords, a location, and a time interval are estimated at a fine-grained resolution. In addition to the sparsity of geo-tagged messages, people sometimes post about events far away from an event’s location. Such spatial problems are handled by novel spatial regularization techniques, namely, graph- and gazetteer-based regularization. To ensure scalability, we utilize a hierarchical spatial index in addition to a multi-stage filtering procedure that gradually suppresses noisy words and considers only event-related ones for complex spatial computations.
As for recommendation applications, we propose an event recommender system built upon model-based collaborative filtering. Our model is able to suggest events to users, taking into account a number of contextual features including the social links between users, the topical similarities of events, and the spatio-temporal proximity between users and events. To realize this model, we employ and adapt matrix factorization, which allows for uncovering latent user-event patterns. Our proposed features contribute to directing the learning process towards recommendations that better suit the taste of users, in particular when new users have very sparse (or even no) event attendance history. To evaluate the effectiveness and efficiency of our proposed approaches, extensive comparative experiments are conducted using datasets collected from social media channels. Our analysis of the experimental results reveals the superiority and advantages of our frameworks over existing methods in terms of the relevancy and precision of the obtained results

    Accurate Data Approximation in Constrained Environments

    Several data reduction techniques have been proposed recently as methods for providing fast and fairly accurate answers to complex queries over large quantities of data. Their use has been widespread, due to the multiple benefits that they may offer in several constrained environments and applications. Compressed data representations require less space to store and less bandwidth to communicate, and can provide, due to their size, very fast response times to queries. Sensor networks represent a typical constrained environment, due to the limited processing, storage and battery capabilities of the sensor nodes. Large-scale sensor networks require tight data handling and data dissemination techniques. Transmitting a full-resolution data feed from each sensor back to the base station is often prohibitive due to (i) limited bandwidth that may not be sufficient to sustain a continuous feed from all sensors and (ii) increased power consumption due to the wireless multi-hop communication. In order to minimize the volume of the transmitted data, we can apply two well-known data reduction techniques: aggregation and approximation. In this dissertation we propose novel data reduction techniques for the transmission of measurements collected in sensor network environments. We first study the problem of summarizing multi-valued data feeds generated at a single sensor node, a step necessary for the transmission of large amounts of historical information collected at the node. The transmission of these measurements may either be periodic (i.e., when a certain amount of measurements has been collected), or in response to a query from the base station. We then also consider the approximate evaluation of aggregate continuous queries. A continuous query is a query that runs continuously until explicitly terminated by the user. These queries can be used to obtain a live estimate of some (aggregated) quantity, such as the total number of moving objects detected by the sensors.
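To make the distinction between aggregation and approximation concrete, here is a minimal Python sketch. It is not the dissertation's method: the function names, the (sum, count) aggregate, and the simple "send only when the value drifts by more than epsilon" rule are assumptions chosen for illustration.

```python
from typing import List, Tuple


def aggregate(readings: List[float]) -> Tuple[float, int]:
    """Aggregation: collapse many readings into one (sum, count) pair."""
    return sum(readings), len(readings)


def approximate(readings: List[float], epsilon: float) -> List[Tuple[int, float]]:
    """Approximation (illustrative rule, an assumption): keep a reading only
    when it differs by more than epsilon from the last reading that was kept."""
    kept: List[Tuple[int, float]] = []
    for i, value in enumerate(readings):
        if not kept or abs(value - kept[-1][1]) > epsilon:
            kept.append((i, value))
    return kept


def reconstruct(kept: List[Tuple[int, float]], n: int) -> List[float]:
    """Rebuild an n-point feed by holding each kept value until the next one."""
    out: List[float] = []
    j = 0
    for i in range(n):
        # Advance to the next kept sample once its index has been reached.
        if j + 1 < len(kept) and kept[j + 1][0] <= i:
            j += 1
        out.append(kept[j][1])
    return out


# Hypothetical temperature feed from a single sensor node.
feed = [20.0, 20.1, 20.2, 21.0, 21.1, 19.0]
summary = approximate(feed, epsilon=0.5)
rebuilt = reconstruct(summary, len(feed))
```

Under this rule every reconstructed value is within epsilon of the original reading, so the node trades a bounded error for fewer transmissions, whereas the aggregate discards the individual readings altogether.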

    Fundamentals

    Volume 1 establishes the foundations of this new field. It goes through all the steps from data collection, their summary and clustering, to different aspects of resource-aware learning, i.e., hardware, memory, energy, and communication awareness. Machine learning methods are inspected with respect to resource requirements and how to enhance scalability on diverse computing architectures ranging from embedded systems to large computing clusters

    Implicit Entity Networks: A Versatile Document Model

    The time in which we live is often referred to as the Information Age. However, it can also aptly be characterized as an age of constant information overload. Nowhere is this more present than on the Web, which serves as an endless source of news articles, blog posts, and social media messages. Of course, this overload is even greater in professions that handle the creation or extraction of information and knowledge, such as journalists, lawyers, researchers, clerks, or medical professionals. The volume of available documents and the interconnectedness of their contents are both a blessing and a curse for the contemporary information consumer. On the one hand, they provide near limitless information, but on the other hand, their consumption and comprehension require an amount of time that many of us cannot spare. As a result, automated extraction, aggregation, and summarization techniques have risen in popularity, even though they are a long way from being comprehensive. When we, as humans, are faced with an overload of information, we tend to look for patterns that bring order into the chaos. In news, we might identify familiar political figures or celebrities, whereas we might look for expressive symptoms in medicine, or precedential cases in law. In other words, we look for known entities as reference points, and then explore the content along the lines of their relations to other entities. Unfortunately, this approach is not reflected in current document models, which do not provide a similar focus on entities. As a direct result, the retrieval of entity-centric knowledge and relations from a flood of textual information becomes more difficult than it has to be, and the inclusion of external knowledge sources is impeded. In this thesis, we introduce implicit entity networks as a comprehensive document model that addresses this shortcoming and provides a holistic representation of document collections and document streams.
Based on the premise of modelling the cooccurrence relations between terms and entities as first-class citizens, we investigate how the resulting network structure facilitates efficient and effective entity-centric search, and demonstrate the extraction of complex entity relations, as well as their summarization. We show that the implicit network model is fully compatible with dynamic streams of documents. Furthermore, we introduce document aggregation methods that are sensitive to the context of entity mentions, and can be used to distinguish between different entity relations. Beyond the relations of individual entities, we introduce network topics as a novel and scalable method for the extraction of topics from collections and streams of documents. Finally, we combine the insights gained from these applications in a versatile hypergraph document model that bridges the gap between unstructured text and structured knowledge sources
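As a rough illustration of a cooccurrence network over entities and terms, the following Python sketch builds a weighted graph incrementally from a stream of documents and ranks the neighbours of a query entity. It is a sketch under stated assumptions, not the thesis's model: the sentence-level cooccurrence window, the raw-count edge weights, and the names `add_document` and `related` are all hypothetical.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, Iterable, List, Set, Tuple

# Adjacency map: node -> neighbour -> cooccurrence count (assumed weighting).
Graph = Dict[str, Dict[str, int]]


def new_graph() -> Graph:
    """Create an empty graph whose missing edges default to weight 0."""
    return defaultdict(lambda: defaultdict(int))


def add_document(graph: Graph, sentences: Iterable[Set[str]]) -> None:
    """Update the graph in place with one document. Each sentence is given as
    the set of entities and terms mentioned in it (annotation is assumed)."""
    for mentions in sentences:
        for a, b in combinations(sorted(mentions), 2):
            graph[a][b] += 1
            graph[b][a] += 1


def related(graph: Graph, entity: str, k: int = 3) -> List[Tuple[str, int]]:
    """Return the k neighbours of an entity with the highest edge weight,
    breaking ties alphabetically."""
    neighbours = graph.get(entity, {})
    return sorted(neighbours.items(), key=lambda kv: (-kv[1], kv[0]))[:k]


# Two hypothetical documents arriving one after the other.
graph = new_graph()
add_document(graph, [{"Merkel", "Berlin"}, {"Merkel", "Macron", "Paris"}])
add_document(graph, [{"Merkel", "Macron"}])
top = related(graph, "Merkel", k=2)
```

Because each document only increments edge counts, the structure can be updated as new documents arrive, which is one simple way to read the abstract's claim of compatibility with document streams.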