50 research outputs found

    Identifying Graphs from Noisy Observational Data

    Get PDF
    There is a growing amount of data describing networks -- examples include social networks, communication networks, and biological networks. As the amount of available data increases, so does our interest in analyzing the properties and characteristics of these networks. However, in most cases the data is noisy, incomplete, and the result of passively acquired observational data; naively analyzing these networks without taking these errors into account can result in inaccurate and misleading conclusions. In my dissertation, I study the tasks of entity resolution, link prediction, and collective classification to address these deficiencies. I describe these tasks in detail and discuss my own work on each of these tasks. For entity resolution, I develop a method for resolving the identities of name mentions in email communications. For link prediction, I develop a method for inferring subordinate-manager relationships between individuals in an email communication network. For collective classification, I propose an adaptive active surveying method to address node labeling in a query-driven setting on network data. In many real-world settings, however, these deficiencies are not found in isolation and all need to be addressed to infer the desired complete and accurate network. Furthermore, because of the dependencies typically found in these tasks, the tasks are inherently inter-related and must be performed jointly. I define the general problem of graph identification which simultaneously performs these tasks; removing the noise and missing values in the observed input network and inferring the complete and accurate output network. I present a novel approach to graph identification using a collection of Coupled Collective Classifiers, C3, which, in addition to capturing the variety of features typically used for each task, can capture the intra- and inter-dependencies required to correctly infer nodes, edges, and labels in the output network. I discuss variants of C3 using different learning and inference paradigms and show the superior performance of C3, in terms of both prediction quality and runtime performance, over various previous approaches. I then conclude by presenting the Graph Alignment, Identification, and Analysis (GAIA) open-source software library which not only provides an implementation of C3 but also algorithms for various tasks in network data such as entity resolution, link prediction, collective classification, clustering, active learning, data generation, and analysis

    Context Aware Computing for The Internet of Things: A Survey

    Get PDF
    As we are moving towards the Internet of Things (IoT), the number of sensors deployed around the world is growing at a rapid pace. Market research has shown a significant growth of sensor deployments over the past decade and has predicted a significant increment of the growth rate in the future. These sensors continuously generate enormous amounts of data. However, in order to add value to raw sensor data we need to understand it. Collection, modelling, reasoning, and distribution of context in relation to sensor data plays critical role in this challenge. Context-aware computing has proven to be successful in understanding sensor data. In this paper, we survey context awareness from an IoT perspective. We present the necessary background by introducing the IoT paradigm and context-aware fundamentals at the beginning. Then we provide an in-depth analysis of context life cycle. We evaluate a subset of projects (50) which represent the majority of research and commercial solutions proposed in the field of context-aware computing conducted over the last decade (2001-2011) based on our own taxonomy. Finally, based on our evaluation, we highlight the lessons to be learnt from the past and some possible directions for future research. The survey addresses a broad range of techniques, methods, models, functionalities, systems, applications, and middleware solutions related to context awareness and IoT. Our goal is not only to analyse, compare and consolidate past research work but also to appreciate their findings and discuss their applicability towards the IoT.Comment: IEEE Communications Surveys & Tutorials Journal, 201

    Unsupervised extraction and normalization of product attributes from web pages.

    Get PDF
    Xiong, Jiani."July 2010."Thesis (M.Phil.)--Chinese University of Hong Kong, 2010.Includes bibliographical references (p. 59-63).Abstracts in English and Chinese.Chapter 1 --- Introduction --- p.1Chapter 1.1 --- Background --- p.1Chapter 1.2 --- Motivation --- p.4Chapter 1.3 --- Our Approach --- p.8Chapter 1.4 --- Potential Applications --- p.12Chapter 1.5 --- Research Contributions --- p.13Chapter 1.6 --- Thesis Organization --- p.15Chapter 2 --- Literature Survey --- p.16Chapter 2.1 --- Supervised Extraction Approaches --- p.16Chapter 2.2 --- Unsupervised Extraction Approaches --- p.19Chapter 2.3 --- Attribute Normalization --- p.21Chapter 2.4 --- Integrated Approaches --- p.22Chapter 3 --- Problem Definition and Preliminaries --- p.24Chapter 3.1 --- Problem Definition --- p.24Chapter 3.2 --- Preliminaries --- p.27Chapter 3.2.1 --- Web Pre-processing --- p.27Chapter 3.2.2 --- Overview of Our Framework --- p.31Chapter 3.2.3 --- Background of Graphical Models --- p.32Chapter 4 --- Our Proposed Framework --- p.36Chapter 4.1 --- Our Proposed Graphical Model --- p.36Chapter 4.2 --- Inference --- p.41Chapter 4.3 --- Product Attribute Information Determination --- p.47Chapter 5 --- Experiments and Results --- p.49Chapter 6 --- Conclusion --- p.57Bibliography --- p.59Chapter A --- Dirichlet Process --- p.64Chapter B --- Hidden Markov Models --- p.6

    Schema-agnostic entity retrieval in highly heterogeneous semi-structured environments

    Get PDF
    [no abstract

    Intelligent Information Access to Linked Data - Weaving the Cultural Heritage Web

    Get PDF
    The subject of the dissertation is an information alignment experiment of two cultural heritage information systems (ALAP): The Perseus Digital Library and Arachne. In modern societies, information integration is gaining importance for many tasks such as business decision making or even catastrophe management. It is beyond doubt that the information available in digital form can offer users new ways of interaction. Also, in the humanities and cultural heritage communities, more and more information is being published online. But in many situations the way that information has been made publicly available is disruptive to the research process due to its heterogeneity and distribution. Therefore integrated information will be a key factor to pursue successful research, and the need for information alignment is widely recognized. ALAP is an attempt to integrate information from Perseus and Arachne, not only on a schema level, but to also perform entity resolution. To that end, technical peculiarities and philosophical implications of the concepts of identity and co-reference are discussed. Multiple approaches to information integration and entity resolution are discussed and evaluated. The methodology that is used to implement ALAP is mainly rooted in the fields of information retrieval and knowledge discovery. First, an exploratory analysis was performed on both information systems to get a first impression of the data. After that, (semi-)structured information from both systems was extracted and normalized. Then, a clustering algorithm was used to reduce the number of needed entity comparisons. Finally, a thorough matching was performed on the different clusters. ALAP helped with identifying challenges and highlighted the opportunities that arise during the attempt to align cultural heritage information systems

    Fundamentals

    Get PDF
    Volume 1 establishes the foundations of this new field. It goes through all the steps from data collection, their summary and clustering, to different aspects of resource-aware learning, i.e., hardware, memory, energy, and communication awareness. Machine learning methods are inspected with respect to resource requirements and how to enhance scalability on diverse computing architectures ranging from embedded systems to large computing clusters

    Big Data in Management Research. Exploring New Avenues

    Get PDF

    Big Data in Management Research. Exploring New Avenues

    Get PDF
    corecore