
    Information Extraction on Para-Relational Data.

    Para-relational data (such as spreadsheets and diagrams) refers to a type of nearly relational data that shares the important qualities of relational data but does not present itself in a relational format. Para-relational data often conveys highly valuable information and is widely used in many different areas. If we can convert para-relational data into the relational format, many existing tools can be leveraged for a variety of interesting applications, such as data analysis with relational query systems and data integration applications. This dissertation aims to convert para-relational data into a high-quality relational form with little user assistance. We have developed four standalone systems, each addressing a specific type of para-relational data. Senbazuru is a prototype spreadsheet database management system that extracts relational information from a large number of spreadsheets. Anthias is an extension of the Senbazuru system to convert a broader range of spreadsheets into a relational format. Lyretail is an extraction system to detect long-tail dictionary entities on webpages. Finally, DiagramFlyer is a web-based search system that obtains a large number of diagrams automatically extracted from web-crawled PDFs. Together, these four systems demonstrate that converting para-relational data into the relational format is possible today, and also suggest directions for future systems.

    PhD, Computer Science and Engineering, University of Michigan, Horace H. Rackham School of Graduate Studies. http://deepblue.lib.umich.edu/bitstream/2027.42/120853/1/chenzhe_1.pd
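    To make "converting para-relational data into the relational format" concrete, here is a minimal sketch of the simplest such conversion: unpivoting a cross-tab spreadsheet grid into relational (row header, column header, value) tuples. It assumes the first row and first column hold headers and ignores hierarchical headers, merged cells and footnotes that real spreadsheets contain. It is not how Senbazuru or Anthias work, and `crosstab_to_tuples` is a hypothetical name.

```python
def crosstab_to_tuples(grid):
    """Unpivot a cross-tab grid whose first row and first column are headers.

    Returns (row_header, column_header, value) tuples, skipping empty cells.
    """
    col_headers = grid[0][1:]  # top-left cell is a corner label, not a header
    tuples = []
    for row in grid[1:]:
        row_header, cells = row[0], row[1:]
        for col_header, value in zip(col_headers, cells):
            if value not in (None, ""):  # empty cells carry no fact
                tuples.append((row_header, col_header, value))
    return tuples
```

    For example, `crosstab_to_tuples([["", "2019", "2020"], ["Apples", 10, 12], ["Pears", 7, ""]])` yields `[("Apples", "2019", 10), ("Apples", "2020", 12), ("Pears", "2019", 7)]`, which can be loaded directly into a three-column relation and queried with SQL.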

    Evaluation of the Clustering Performance of Affinity Propagation Algorithm Considering the Influence of Preference Parameter and Damping Factor

    The identification of significant underlying data patterns, such as image composition and spatial arrangements, is fundamental in remote sensing tasks. An effective approach to information extraction is therefore crucial to achieving this goal. The affinity propagation (AP) algorithm is a novel, powerful technique capable of handling unusual data containing both categorical and numerical attributes. However, AP has limitations related to the choice of the initial preference parameter, the occurrence of oscillations, and the processing of large data sets. This paper evaluates the clustering performance of the AP algorithm, taking into account the influence of the preference parameter and the damping factor. The study considered the standard AP algorithm, adaptive AP and partition AP. According to the experiments, the choice of preference and damping strongly influences both the quality and the final number of clusters.
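    To show where the two parameters studied here enter the algorithm, the following is a minimal pure-Python sketch of the standard AP message-passing updates (responsibilities and availabilities). It assumes negative squared Euclidean distance as the similarity and a single shared preference on the diagonal; it is not the adaptive or partition AP variants evaluated in the paper, and `affinity_propagation` is a hypothetical name.

```python
def affinity_propagation(points, preference, damping=0.5, max_iter=200):
    """Return (exemplar indices, cluster label per point) for standard AP."""
    n = len(points)
    if n < 2:
        raise ValueError("need at least two points")
    if not 0.5 <= damping < 1.0:
        raise ValueError("damping is commonly restricted to [0.5, 1)")
    # Similarity: negative squared Euclidean distance. The diagonal holds the
    # preference: how suitable each point is a priori as an exemplar.
    s = [[-sum((x - y) ** 2 for x, y in zip(p, q)) for q in points] for p in points]
    for k in range(n):
        s[k][k] = preference
    r = [[0.0] * n for _ in range(n)]  # responsibilities r(i, k)
    a = [[0.0] * n for _ in range(n)]  # availabilities a(i, k)
    for _ in range(max_iter):
        # r(i,k) = s(i,k) - max_{k' != k} [a(i,k') + s(i,k')]
        for i in range(n):
            vals = [a[i][k] + s[i][k] for k in range(n)]
            best = max(range(n), key=vals.__getitem__)
            second = max(vals[k] for k in range(n) if k != best)
            for k in range(n):
                competitor = second if k == best else vals[best]
                # Damping mixes the old message with the new one to curb oscillations.
                r[i][k] = damping * r[i][k] + (1 - damping) * (s[i][k] - competitor)
        # a(k,k) = sum_{i' != k} max(0, r(i',k))
        # a(i,k) = min(0, r(k,k) + sum_{i' not in {i,k}} max(0, r(i',k)))
        for k in range(n):
            support = sum(max(0.0, r[j][k]) for j in range(n) if j != k)
            for i in range(n):
                if i == k:
                    new = support
                else:
                    new = min(0.0, r[k][k] + support - max(0.0, r[i][k]))
                a[i][k] = damping * a[i][k] + (1 - damping) * new
    exemplars = [k for k in range(n) if r[k][k] + a[k][k] > 0]
    if not exemplars:
        return [], [-1] * n  # no exemplar emerged, e.g. messages did not settle
    # Each exemplar labels itself; other points join their most similar exemplar.
    labels = [i if i in exemplars else max(exemplars, key=lambda k: s[i][k])
              for i in range(n)]
    return exemplars, labels
```

    Raising `preference` makes every point a stronger exemplar candidate and tends to increase the number of clusters, while `damping` only changes how quickly messages move toward their new values, which is why both can change the final clustering.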

    Improving the Navigability of Tagging Systems with Hierarchically Constructed Resource Lists and Tag Trails

    Recent research has shown that the navigability of tagging systems leaves much to be desired. In particular, tagging systems were observed not to be navigable when their resource lists are limited to a certain size k. This paper therefore introduces a novel resource list generation approach that addresses this issue. The proposed approach is based on a hierarchical network model. Through a number of experiments on a tagging dataset from Austria-Forum, a large online encyclopedia system, the paper shows that the new algorithm creates tag network structures that can be navigated efficiently. Unlike previous work, the method is completely generic, i.e. the resource list generation approach could be used to improve the navigability of any tagging system. This work is relevant for researchers interested in the navigability of emergent hypertext structures and for engineers seeking to improve the navigability of tagging systems.
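    The problem the paper addresses, that truncating resource lists to k entries can cut users off from parts of the tag network, can be illustrated with a breadth-first search over tag pages and resource pages. This is a hypothetical sketch of the problem only, not the paper's hierarchical list-generation algorithm. It assumes each tag page shows its resources in a fixed order (only the first k when truncated) and each resource page links to all of its tags; `reachable_resources` is an illustrative name.

```python
from collections import defaultdict, deque

def reachable_resources(tag_to_resources, start_tag, k=None):
    """Resources a user can reach by clicking, starting from one tag page.

    Tag pages list their resources (only the first k if k is set);
    resource pages link back to every tag the resource carries.
    """
    resource_to_tags = defaultdict(list)
    for tag, resources in tag_to_resources.items():
        for res in resources:
            resource_to_tags[res].append(tag)

    seen_tags, seen_resources = {start_tag}, set()
    queue = deque([start_tag])
    while queue:
        tag = queue.popleft()
        listed = tag_to_resources.get(tag, [])
        if k is not None:
            listed = listed[:k]  # truncated list, e.g. only the first page is shown
        for res in listed:
            if res in seen_resources:
                continue
            seen_resources.add(res)
            for next_tag in resource_to_tags[res]:
                if next_tag not in seen_tags:
                    seen_tags.add(next_tag)
                    queue.append(next_tag)
    return seen_resources
```

    With `{"alps": ["r1", "r2", "r3"], "vienna": ["r3", "r4"]}`, starting at `"alps"` reaches all four resources when lists are unlimited, but only `r1` and `r2` when `k=2`, because the single bridge resource `r3` falls off the truncated list.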

    Information-seeking on the Web with Trusted Social Networks - from Theory to Systems

    This research investigates how synergies between the Web and social networks can enhance the process of obtaining relevant and trustworthy information. A review of literature on personalised search, social search, recommender systems, social networks and trust propagation reveals limitations of existing technology in areas such as relevance, collaboration, task-adaptivity and trust. In response to these limitations I present a Web-based approach to information-seeking using social networks. This approach takes a source-centric perspective on the information-seeking process, aiming to identify trustworthy sources of relevant information from within the user's social network. An empirical study of source-selection decisions in information- and recommendation-seeking identified five factors that influence the choice of source and its perceived trustworthiness. The priority given to each of these factors was found to vary according to the criticality and subjectivity of the task. A series of algorithms has been developed that operationalise three of these factors (expertise, experience, affinity) and generate, from various data sources, a number of trust metrics for use in social network-based information-seeking. The most significant of these data sources is Revyu.com, a reviewing and rating Web site implemented as part of this research, which takes input from regular users and makes it available on the Semantic Web for easy re-use by the implemented algorithms. Output of the algorithms is used in Hoonoh.com, a Semantic Web-based system developed to support users in identifying relevant and trustworthy information sources within their social networks. Evaluation of this system's ability to predict source selections showed more promising results for the experience factor than for expertise or affinity. This may be attributed to the greater demands these two factors place on input data. Limitations of the work and opportunities for future research are discussed.
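    As an illustration of how factors such as experience and affinity might be turned into metrics from review data and combined into a ranking of candidate sources, here is a purely hypothetical sketch. The metric definitions (review counts per topic, Jaccard overlap of reviewed items) and the weighted-sum ranking are assumptions made for illustration, not the algorithms implemented for Revyu.com or Hoonoh.com. The weights stand in for the task-dependent priorities the study reports.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Review:
    reviewer: str
    item: str
    tags: frozenset

def experience(reviews, person, topic):
    """Hypothetical experience metric: reviews `person` wrote on items tagged `topic`."""
    return sum(1 for r in reviews if r.reviewer == person and topic in r.tags)

def affinity(reviews, a, b):
    """Hypothetical affinity metric: Jaccard overlap of the items two people reviewed."""
    items_a = {r.item for r in reviews if r.reviewer == a}
    items_b = {r.item for r in reviews if r.reviewer == b}
    union = items_a | items_b
    return len(items_a & items_b) / len(union) if union else 0.0

def rank_sources(reviews, user, contacts, topic, w_experience=1.0, w_affinity=1.0):
    """Order contacts by a weighted sum of the two hypothetical metrics."""
    def score(contact):
        return (w_experience * experience(reviews, contact, topic)
                + w_affinity * affinity(reviews, user, contact))
    return sorted(contacts, key=score, reverse=True)
```

    Shifting weight from experience to affinity changes which contact is suggested first, mirroring the finding that the priority of each factor depends on the task.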

    The Family of MapReduce and Large Scale Data Processing Systems

    In the last two decades, the continuous increase of computational power has produced an overwhelming flow of data, which has called for a paradigm shift in computing architecture and large-scale data processing mechanisms. MapReduce is a simple and powerful programming model that enables easy development of scalable parallel applications to process vast amounts of data on large clusters of commodity machines. It isolates the application from the details of running a distributed program, such as data distribution, scheduling and fault tolerance. However, the original implementation of the MapReduce framework had some limitations, which many follow-up research efforts have since tackled. This article provides a comprehensive survey of a family of approaches and mechanisms for large-scale data processing that have been implemented based on the original idea of the MapReduce framework and are currently gaining a lot of momentum in both research and industrial communities. We also cover a set of systems that provide declarative programming interfaces on top of the MapReduce framework. In addition, we review several large-scale data processing systems that share some of the ideas of the MapReduce framework for different purposes and application scenarios. Finally, we discuss some future research directions for implementing the next generation of MapReduce-like solutions.

    Comment: arXiv admin note: text overlap with arXiv:1105.4252 by other author
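    The separation described above, where the application supplies only a map function and a reduce function while the framework handles grouping, can be sketched in a single process. This is a minimal illustration of the programming model using the classic word-count example, not a distributed implementation: there is no partitioning, scheduling or fault tolerance, and `run_mapreduce` is a hypothetical name.

```python
from collections import defaultdict

def run_mapreduce(records, mapper, reducer):
    """Single-process imitation of the map -> shuffle/sort -> reduce pipeline."""
    groups = defaultdict(list)
    for record in records:              # map phase
        for key, value in mapper(record):
            groups[key].append(value)   # shuffle: group values by key
    # Reduce phase, visiting keys in sorted order as the sort step would.
    return {key: reducer(key, values) for key, values in sorted(groups.items())}

# Application code: only these two functions are user-written.
def word_count_mapper(line):
    for word in line.lower().split():
        yield word, 1

def word_count_reducer(word, counts):
    return sum(counts)
```

    For example, `run_mapreduce(["the cat", "The dog the end"], word_count_mapper, word_count_reducer)` returns `{"cat": 1, "dog": 1, "end": 1, "the": 3}`. In a real framework, the loop inside `run_mapreduce` is what gets distributed across machines, while the two user functions stay unchanged.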

    Neural Networks for Building Semantic Models and Knowledge Graphs

    The abstract is in the attachment. 677. INGEGNERIA INFORMATI. Futia, Giusepp