
    Incremental Entity Resolution from Linked Documents

    In many government applications, information about entities such as persons is available in disparate data sources such as passports, driving licences, bank accounts, and income tax records. Similar scenarios are commonplace in large enterprises with multiple customer, supplier, or partner databases. Each data source maintains different aspects of an entity, and resolving entities based on these attributes is a well-studied problem. However, in many cases documents in one source reference those in others; e.g., a person may provide his driving-licence number while applying for a passport, or vice versa. These links define relationships between documents of the same entity (as opposed to inter-entity relationships, which are also often used for resolution). In this paper we describe an algorithm that clusters documents highly likely to belong to the same entity by exploiting inter-document references in addition to attribute similarity. Our technique combines iterative graph traversal, locality-sensitive hashing, iterative match-merge, and graph clustering to discover unique entities in a document corpus. A distinctive feature of our technique is that new sets of documents can be added incrementally while re-resolving only a small subset of a previously resolved entity-document collection. We present performance and quality results on two data sets: a real-world database of companies and a large, synthetically generated 'population' database. We also demonstrate the benefit of using inter-document references for clustering, in the form of enhanced recall of documents for resolution.
    Comment: 15 pages, 8 figures, patented work
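The core idea of the abstract above (references link documents directly, attribute similarity links them indirectly, and entities emerge as clusters) can be sketched in a few dozen lines. This is a minimal illustration under stated assumptions, not the authors' algorithm: it uses union-find connected components in place of their iterative match-merge and graph clustering, MinHash with LSH banding to propose candidate pairs, and an exact Jaccard check to confirm them. The document layout (`attrs`, `refs`), the 16 hash functions, 8 bands and the 0.5 threshold are all hypothetical choices.

```python
import hashlib
from collections import defaultdict
from itertools import combinations


class UnionFind:
    """Disjoint-set forest with path halving; groups document ids into entities."""

    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra


def tokens(doc):
    # Lower-cased whitespace tokens drawn from every attribute value (hypothetical "attrs" field).
    return {t.lower() for v in doc["attrs"].values() for t in str(v).split()}


def minhash(toks, num_hashes=16):
    # One salted MD5 per "hash function"; keep the minimum over the tokens (0 for an empty set).
    return [
        min((int(hashlib.md5(f"{i}:{t}".encode()).hexdigest(), 16) for t in toks), default=0)
        for i in range(num_hashes)
    ]


def lsh_candidates(sigs, bands=8):
    # Banding: documents that share any whole band of their signature become candidate pairs.
    if not sigs:
        return set()
    rows = len(next(iter(sigs.values()))) // bands
    buckets = defaultdict(set)
    for doc_id, sig in sigs.items():
        for b in range(bands):
            buckets[(b, tuple(sig[b * rows:(b + 1) * rows]))].add(doc_id)
    pairs = set()
    for ids in buckets.values():
        pairs.update(combinations(sorted(ids), 2))
    return pairs


def jaccard(a, b):
    return len(a & b) / len(a | b) if a | b else 0.0


def resolve(docs, threshold=0.5):
    uf = UnionFind()
    # 1. Inter-document references link documents of the same entity directly.
    for doc_id, doc in docs.items():
        uf.find(doc_id)
        for ref in doc.get("refs", []):
            if ref in docs:  # ignore references to documents that are not loaded
                uf.union(doc_id, ref)
    # 2. Attribute similarity: LSH proposes pairs, exact Jaccard confirms them.
    toks = {d: tokens(doc) for d, doc in docs.items()}
    sigs = {d: minhash(t) for d, t in toks.items()}
    for a, b in lsh_candidates(sigs):
        if jaccard(toks[a], toks[b]) >= threshold:
            uf.union(a, b)
    # 3. Connected components are the resolved entities.
    clusters = defaultdict(set)
    for d in docs:
        clusters[uf.find(d)].add(d)
    return sorted(sorted(c) for c in clusters.values())
```

For example, a passport that cites a driving-licence number lands in the same cluster as that licence even when their attributes barely overlap, and a bank record with the passport's name and date of birth joins them through attribute similarity. This sketch recomputes everything on each call; the incremental behaviour described in the abstract would instead keep the union-find state and LSH buckets and compare only newly added documents against them.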

    Development of a Managerial Approach for a New IT Organisation Design Framework (ITODF) Based on Digitisation Trends.

    Abstract: Business organisations are currently at a tipping point. Disruptive technologies such as Artificial Intelligence and Blockchain are transforming the way many industries work, and the lines between business and technology are blurring. Researchers acknowledge that now is the time for the IT organisation to re-strategize itself. In this dissertation, the author provides a structured derivation of an IT organisation design framework and illustrates how the IT organisation needs to be designed to succeed in the digital age. The research results are derived through a qualitative exploratory study and a quantitative confirmatory study. The findings show that the detailed design of six dimensions is critical for successful IT organisation design: Strategy, Structure, Information, Governance, Processes and Sourcing. Additionally, the dissertation outlines important implications for practitioners in the form of five guiding principles, which explain how best to implement the design framework in practice.
    Summary: Business organisations are currently at a turning point at which disruptive technologies such as Artificial Intelligence, Blockchain and others can transform how many industries operate, as the dividing line between business and technology becomes increasingly blurred. Researchers agree that it is now necessary to redefine the Information Technology (IT) organisations within the enterprise. In this work, the author offers a structured design framework for companies in the IT sector. It sets out how IT organisational design should be defined in the digital age, drawing on a qualitative exploratory study and a quantitative confirmatory study. The results show that, to ensure the success of the IT organisation, the detailed design of six dimensions is essential: Strategy, Structure, Information, Governance, Processes and Sourcing. In addition, this work offers practitioners important implications relating to each of the five guiding principles, in order to implement the organisational design properly in practice.
    Administración y Dirección de Empresa (Business Administration and Management)

    Methodological challenges and analytic opportunities for modeling and interpreting Big Healthcare Data

    Abstract: Managing, processing and understanding big healthcare data is challenging, costly and demanding. Without a robust fundamental theory for representation, analysis and inference, a roadmap for uniformly handling and analyzing such complex data remains elusive. In this article, we outline various big data challenges, opportunities, modeling methods and software techniques for blending complex healthcare data, advanced analytic tools, and distributed scientific computing. Using imaging, genetic and healthcare data, we provide examples of processing heterogeneous datasets using distributed cloud services, automated and semi-automated classification techniques, and open-science protocols. Despite substantial advances, new, innovative technologies need to be developed that enhance, scale and optimize the management and processing of large, complex and heterogeneous data. Stakeholder investments in data acquisition, research and development, computational infrastructure and education will be critical to realize the huge potential of big data, to reap the expected information benefits and to build lasting knowledge assets. Multi-faceted proprietary, open-source, and community developments will be essential to enable broad, reliable, sustainable and efficient data-driven discovery and analytics. Big data will affect every sector of the economy, and its hallmark will be ‘team science’.
    http://deepblue.lib.umich.edu/bitstream/2027.42/134522/1/13742_2016_Article_117.pd

    Strategic business management: proposal of an informational monitoring model in the era of big data

    This paper proposes an Informational Monitoring Model to support strategic business management by identifying threats and opportunities, as well as strengths and weaknesses, and by providing the information needed to prepare and monitor Business Strategic Planning. Methodologically, it is a literature-based theoretical discussion that revisits concepts related to Big Data and Intelligent Software Agents, identifying characteristics that, if automated, can contribute to searching for and analyzing the data and information relevant to preparing and monitoring Business Strategic Planning. Finally, it shows that at the intersection of these concepts lies a Competitive and Organizational Intelligence system that can be extended and evolved to use the features of Big Data and Intelligent Software Agents; the proposed model is built as a result of this discussion.

    Utilising Semantic Web Technologies for Improved Road Network Information Exchange

    Road asset data harmonisation is a challenge for Australian road and transport authorities, given their heterogeneous data standards, data formats and tools. Classic data harmonisation techniques require large databases with many tables, a unified metadata definition and standardised tools for sharing data with others. To find a better way to harmonise heterogeneous road network data, this dissertation investigates the use of Semantic Web technologies for fast and efficient road asset data harmonisation.
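The general Semantic Web idea behind the abstract above (map each authority's records onto one shared vocabulary as subject-predicate-object triples, then query them uniformly) can be illustrated without any database schema. This is a minimal, library-free sketch, not the dissertation's method: the vocabulary IRIs, the two authorities, their field names and the metre-to-kilometre conversion are all hypothetical. A real deployment would more likely use an agreed RDF/OWL ontology, an RDF library and SPARQL.

```python
from collections import namedtuple

# Hypothetical IRIs: a shared road-asset vocabulary and a namespace for asset subjects.
ROAD = "http://example.org/road#"
ASSET = "http://example.org/asset/"

Triple = namedtuple("Triple", ["s", "p", "o"])

# Hypothetical per-authority mappings from local field names to shared predicates.
MAPPINGS = {
    "authority_a": {"RoadID": ROAD + "identifier", "SurfaceType": ROAD + "surface", "Len_km": ROAD + "lengthKm"},
    "authority_b": {"asset_no": ROAD + "identifier", "pavement": ROAD + "surface", "length_m": ROAD + "lengthKm"},
}

# Unit conversions applied while mapping (authority_b records lengths in metres).
CONVERTERS = {("authority_b", "length_m"): lambda metres: metres / 1000}


def to_triples(source, record):
    """Map one source record onto the shared vocabulary; unmapped fields are dropped."""
    mapping = MAPPINGS[source]
    # Mint the subject IRI from whichever local field maps to road:identifier.
    id_field = next(f for f, p in mapping.items() if p == ROAD + "identifier")
    subject = f"{ASSET}{source}/{record[id_field]}"
    triples = []
    for field, value in record.items():
        if field in mapping:
            convert = CONVERTERS.get((source, field), lambda v: v)
            triples.append(Triple(subject, mapping[field], convert(value)))
    return triples


def match(store, s=None, p=None, o=None):
    """Single triple-pattern match; None acts as a wildcard, much like a variable in SPARQL."""
    return [t for t in store
            if (s is None or t.s == s) and (p is None or t.p == p) and (o is None or t.o == o)]
```

Once both authorities' records are in one store, a single pattern such as `match(store, p=ROAD + "surface", o="sealed")` finds sealed assets from both sources despite their different field names, and units are reconciled once, at mapping time, rather than in every query.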