
945 research outputs found

Information Extraction on Para-Relational Data.

Author: Chen Zhe

Para-relational data (such as spreadsheets and diagrams) refers to a type of nearly relational data that shares the important qualities of relational data but does not present itself in a relational format. Para-relational data often conveys highly valuable information and is widely used in many different areas. If we can convert para-relational data into the relational format, many existing tools can be leveraged for a variety of interesting applications, such as data analysis with relational query systems and data integration applications. This dissertation aims to convert para-relational data into a high-quality relational form with little user assistance. We have developed four standalone systems, each addressing a specific type of para-relational data. Senbazuru is a prototype spreadsheet database management system that extracts relational information from a large number of spreadsheets. Anthias is an extension of the Senbazuru system to convert a broader range of spreadsheets into a relational format. Lyretail is an extraction system to detect long-tail dictionary entities on webpages. Finally, DiagramFlyer is a web-based search system that obtains a large number of diagrams automatically extracted from web-crawled PDFs. Together, these four systems demonstrate that converting para-relational data into the relational format is possible today, and also suggest directions for future systems.

PhD, Computer Science and Engineering, University of Michigan, Horace H. Rackham School of Graduate Studies
http://deepblue.lib.umich.edu/bitstream/2027.42/120853/1/chenzhe_1.pd

Deep Blue Documents at the University of Michigan

Evaluation of the Clustering Performance of Affinity Propagation Algorithm Considering the Influence of Preference Parameter and Damping Factor

Author: Machado Álvaro Muriel Lima
Moiane André Fenias
Publication venue: Bulletin of Geodetic Sciences
Publication date: 01/12/2018

The identification of significant underlying data patterns, such as image composition and spatial arrangements, is fundamental in remote sensing tasks. Therefore, the development of an effective approach for information extraction is crucial to achieve this goal. The affinity propagation (AP) algorithm is a novel, powerful technique capable of handling unusual data containing both categorical and numerical attributes. However, AP has some limitations related to the choice of the initial preference parameter, the occurrence of oscillations and the processing of large data sets. This paper evaluates the clustering performance of the AP algorithm, taking into account the influence of the preference parameter and the damping factor. The study was conducted considering the AP algorithm, the adaptive AP and the partition AP. According to the experiments, the choice of preference and damping greatly influences the quality and the final number of clusters.
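To make the roles of the preference parameter and damping factor concrete, here is a minimal sketch of the standard affinity propagation message-passing updates in plain Python. It is not the paper's code and does not cover the adaptive or partition variants; the function name `affinity_propagation`, the 1-D toy data and the fixed iteration count are assumptions made for illustration only.

```python
def affinity_propagation(points, preference, damping=0.5, iterations=200):
    """Return exemplar indices for a list of 1-D points (illustrative sketch)."""
    n = len(points)
    # Similarity is the negative squared distance; the diagonal holds the
    # preference, i.e. how strongly each point wants to be an exemplar.
    S = [[-(points[i] - points[k]) ** 2 for k in range(n)] for i in range(n)]
    for k in range(n):
        S[k][k] = preference
    R = [[0.0] * n for _ in range(n)]  # responsibilities
    A = [[0.0] * n for _ in range(n)]  # availabilities
    for _ in range(iterations):
        # Responsibility: how well suited k is as exemplar for i,
        # compared with the best competing candidate.
        for i in range(n):
            for k in range(n):
                best_other = max(A[i][j] + S[i][j] for j in range(n) if j != k)
                new = S[i][k] - best_other
                # Damping blends old and new messages to limit oscillations.
                R[i][k] = damping * R[i][k] + (1 - damping) * new
        # Availability: accumulated evidence that k is a good exemplar.
        for k in range(n):
            pos = [max(0.0, R[j][k]) for j in range(n)]
            total = sum(pos) - pos[k]  # positive support from points j != k
            for i in range(n):
                if i == k:
                    new = total
                else:
                    new = min(0.0, R[k][k] + total - pos[i])
                A[i][k] = damping * A[i][k] + (1 - damping) * new
    # A point is an exemplar when its self-responsibility plus
    # self-availability is positive.
    return [k for k in range(n) if R[k][k] + A[k][k] > 0]


if __name__ == "__main__":
    data = [0.0, 0.1, 0.2, 10.0, 10.1, 10.2]
    print(affinity_propagation(data, preference=0.0))
    print(affinity_propagation(data, preference=-100.0))
```

With a preference of 0, the largest value a similarity can take here, every point elects itself as an exemplar. Lowering the preference generally yields fewer exemplars, and a damping value closer to 1 slows the updates, which is the usual remedy when the messages oscillate.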

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Biblioteca Digital de Periódicos da UFPR (Universidade Federal do Paraná)

FigShare

Improving the Navigability of Tagging Systems with Hierarchically Constructed Resource Lists and Tag Trails

Author: Christoph Trattner
Miller
Plangprasopchok
Trattner
Zaphiris
Publication venue: University of Zagreb - University Computing Centre
Publication date: 01/01/2011

Recent research has shown that the navigability of tagging systems leaves much to be desired. In general, it was observed that tagging systems are not navigable if the resource lists of the tagging system are limited to a certain factor k. Hence, this paper introduces a novel resource list generation approach that addresses this issue. The proposed approach is based on a hierarchical network model. Through a number of experiments on a tagging dataset from a large online encyclopedia system called Austria-Forum, the paper shows that the new algorithm is able to create tag network structures that are navigable in an efficient manner. Contrary to previous work, the method featured in this paper is completely generic, i.e., the introduced resource list generation approach could be used to improve the navigability of any tagging system. This work is relevant for researchers interested in the navigability of emergent hypertext structures and for engineers seeking to improve the navigability of tagging systems.

Crossref

HRČAK - Portal of Croatian Scientific and Professional Journals

Hrčak - Portal of scientific journals of Croatia

Information-seeking on the Web with Trusted Social Networks - from Theory to Systems

Author: Heath Tom
Publication date: 01/01/2008

This research investigates how synergies between the Web and social networks can enhance the process of obtaining relevant and trustworthy information. A review of literature on personalised search, social search, recommender systems, social networks and trust propagation reveals limitations of existing technology in areas such as relevance, collaboration, task-adaptivity and trust. In response to these limitations I present a Web-based approach to information-seeking using social networks. This approach takes a source-centric perspective on the information-seeking process, aiming to identify trustworthy sources of relevant information from within the user's social network. An empirical study of source-selection decisions in information- and recommendation-seeking identified five factors that influence the choice of source, and its perceived trustworthiness. The priority given to each of these factors was found to vary according to the criticality and subjectivity of the task. A series of algorithms have been developed that operationalise three of these factors (expertise, experience, affinity) and generate from various data sources a number of trust metrics for use in social network-based information seeking. The most significant of these data sources is Revyu.com, a reviewing and rating Web site implemented as part of this research, that takes input from regular users and makes it available on the Semantic Web for easy re-use by the implemented algorithms. Output of the algorithms is used in Hoonoh.com, a Semantic Web-based system that has been developed to support users in identifying relevant and trustworthy information sources within their social networks. Evaluation of this system's ability to predict source selections showed more promising results for the experience factor than for expertise or affinity. This may be attributed to the greater demands these two factors place in terms of input data. 
Limitations of the work and opportunities for future research are discussed.

CiteSeerX

Open Research Online (The Open University)

OpenGrey Repository

The Family of MapReduce and Large Scale Data Processing Systems

Author: Anna Liu
Ayman G. Fayoumi
King Abdulaziz
Sherif Sakr
South Wales
Publication date: 12/02/2013

In the last two decades, the continuous increase of computational power has produced an overwhelming flow of data, which has called for a paradigm shift in computing architecture and large-scale data processing mechanisms. MapReduce is a simple and powerful programming model that enables easy development of scalable parallel applications to process vast amounts of data on large clusters of commodity machines. It isolates the application from the details of running a distributed program, such as data distribution, scheduling and fault tolerance. However, the original implementation of the MapReduce framework had some limitations that have been tackled by many research efforts in follow-up works since its introduction. This article provides a comprehensive survey of a family of approaches and mechanisms for large-scale data processing that have been implemented based on the original idea of the MapReduce framework and are currently gaining a lot of momentum in both research and industrial communities. We also cover a set of systems that have been implemented to provide declarative programming interfaces on top of the MapReduce framework. In addition, we review several large-scale data processing systems that resemble some of the ideas of the MapReduce framework for different purposes and application scenarios. Finally, we discuss some future research directions for implementing the next generation of MapReduce-like solutions.

Comment: arXiv admin note: text overlap with arXiv:1105.4252 by other author
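The MapReduce programming model described in this abstract can be illustrated with a small single-process sketch of the classic word-count example. This only mimics the map, shuffle and reduce steps in memory; the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are assumptions for illustration and do not correspond to the API of any particular framework, and a real system would distribute these steps across machines and handle scheduling and fault tolerance.

```python
from collections import defaultdict


def map_phase(document):
    """Map step: emit a (word, 1) pair for every word in one document."""
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle step: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce step: combine the values collected for one key."""
    return key, sum(values)


def word_count(documents):
    """Run map, shuffle and reduce sequentially over a list of documents."""
    pairs = (pair for doc in documents for pair in map_phase(doc))
    grouped = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(word_count(["a b a", "b c"]))
```

The application code is confined to `map_phase` and `reduce_phase`. Grouping by key sits in a separate step, which is the part a framework would take over together with data distribution.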

arXiv.org e-Print Archive

CiteSeerX

Neural Networks for Building Semantic Models and Knowledge Graphs

Publication venue: Politecnico di Torino
Publication date: 30/10/2020

The abstract is in the attachment. 677. INGEGNERIA INFORMATI. Futia, Giusepp

PORTO@iris (Publications Open Repository TOrino - Politecnico di Torino)

EGI user forum 2011 : book of abstracts

Publication date: 01/01/2011

Hochschulschriftenserver - Universität Frankfurt am Main