3 research outputs found

    GENERATING KNOWLEDGE STRUCTURES FROM OPEN DATASETS' TAGS - AN APPROACH BASED ON FORMAL CONCEPT ANALYSIS

    Under the influence of data transparency initiatives, a variety of institutions have published a significant number of datasets. In most cases, data publishers use open data portals (ODPs) to make their datasets publicly available. To improve the datasets' discoverability, ODPs group open datasets into categories using criteria such as publishers, institutions, formats, and descriptions. For this purpose, portals rely on the metadata accompanying the datasets. However, part of the metadata may be missing, incomplete, or redundant. Each of these situations makes it difficult for users to find appropriate datasets and obtain the desired information. As the number of available datasets grows, this problem becomes more noticeable. This paper focuses on a first step towards reducing this problem by implementing knowledge structures to be used when part of a dataset's metadata is missing. In particular, we focus on developing knowledge structures capable of suggesting the best-matching category for an uncategorized dataset. Our approach relies on the dataset descriptions that users provide through dataset tags. We use formal concept analysis to reveal the shared conceptualization that emerges from tag usage, building one concept lattice per category of open datasets. Since tags are free-text metadata entered by users, we present a method for optimizing their usage by means of semantic similarity measures based on natural language processing. Finally, we demonstrate the advantage of our proposal by comparing the concept lattices generated by formal concept analysis before and after the optimization process. The main experimental results show that our approach can reduce the number of nodes in a lattice by more than 40%.
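As a rough illustration of the idea in this abstract, the sketch below enumerates the formal concepts of a tiny dataset-to-tags context and shows how merging near-duplicate tags can shrink the lattice. It is not the authors' implementation: the datasets, the tags, and the hand-written `merge_map` are hypothetical, and the hard-coded map stands in for the semantic similarity measures the paper describes. The enumeration is a naive closure over all dataset subsets, so it is only suitable for toy inputs.

```python
from itertools import combinations


def formal_concepts(context):
    """Return all formal concepts (extent, intent) of a dataset -> tags context.

    Naive enumeration: close every subset of objects. Exponential in the
    number of datasets, so only meant for small illustrative contexts.
    """
    objects = list(context)
    all_tags = set().union(*context.values()) if context else set()

    def intent(objs):
        # Tags shared by every dataset in objs (all tags for the empty set).
        sets = [context[o] for o in objs]
        return set.intersection(*sets) if sets else set(all_tags)

    def extent(tags):
        # Datasets that carry every tag in tags.
        return {o for o in objects if tags <= context[o]}

    concepts = set()
    for r in range(len(objects) + 1):
        for objs in combinations(objects, r):
            b = intent(objs)
            a = extent(b)
            concepts.add((frozenset(a), frozenset(b)))
    return concepts


def merge_tags(context, merge_map):
    """Replace each tag by its canonical form according to merge_map."""
    return {d: {merge_map.get(t, t) for t in tags} for d, tags in context.items()}


# Hypothetical context: three datasets in one category and their tags.
raw = {
    "d1": {"health", "hospital"},
    "d2": {"health", "hospitals"},
    "d3": {"health", "budget"},
}

# Assumption: "hospitals" was judged equivalent to "hospital" by some
# similarity measure; here the mapping is simply written by hand.
merge_map = {"hospitals": "hospital"}

before = formal_concepts(raw)
after = formal_concepts(merge_tags(raw, merge_map))
print(len(before), len(after))
```

In this toy context the two singular/plural tags split one natural group of datasets into two separate concepts; once the tags are merged, those concepts collapse into one, which is the kind of lattice reduction the abstract reports at a larger scale.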

    TOOL FOR INTERACTIVE VISUAL ANALYSIS OF LARGE HIERARCHICAL DATA STRUCTURES

    In the Big Data era, data visualization and exploration systems, as means for data perception and manipulation, are facing major challenges. One of the challenges for modern visualization systems is to ensure adequate visual presentation and interaction. In this paper, we therefore present a tool for interactive visualization of data with a hierarchical structure. It is a general-purpose tool that uses a graph-based approach. However, its main focus is the visual analysis of concept lattices generated as the output of a Formal Concept Analysis algorithm. As the data grow, a concept lattice can become complex and hard to visualize and analyze. To address this issue, the tool includes functionalities important for exploring large concept lattices. Its usage is demonstrated by visualizing concept lattices generated from the data available on Canada's open data portal, where it can be used to explore how tags are used within datasets.

    Improving the usability of open data by defining a categorization method based on open data portal metadata

    Due to numerous data transparency and open government initiatives, a large volume of data has been published on open data portals. To make the data more accessible and visible, these portals have introduced filtering by category, tags, format, organization, etc. This information is stored as metadata and is provided when the data is published. However, the metadata is not always complete. Missing data categories have a great impact on the visibility, accessibility, and usability of the data. As the amount of data on the portals increases, it becomes harder to find and identify the desired information when the category is missing. In this doctoral dissertation, metadata on open data portals was analyzed, together with the usage of categories and tags and the connections between them. The problem of missing data categories was then addressed by proposing a methodology for data categorization based on combinations of tags. Within the methodology, the hierarchical organization of tags in a category was defined based on their usage in categorized data. Next, a tool for visual analysis of the hierarchical organization of tags was presented, and a proposal was given for data categorization based on combinations of tags. The presented categorization relies on the way tags are used in categorized data, i.e. on their hierarchical organization. The approach calculates the similarity between two tags and between two combinations of tags, and defines the parameters for matching a combination of tags with the categories on the portal. An algorithm was then defined that proposes categories for a dataset with a given combination of tags. The proposed categorization was evaluated using data from the Canadian open data portal. Lastly, a model was proposed for supplementing the datasets' metadata on open data portals.
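The categorization step described above can be sketched roughly as follows. The abstract does not specify the similarity measure or the parameters, so this example swaps in plain Jaccard similarity between tag sets and a single hypothetical `threshold` parameter; the category names and tag combinations are invented for illustration, and the dissertation's measure, which builds on the hierarchical organization of tags, is not reproduced here.

```python
def jaccard(a, b):
    """Jaccard similarity between two tag combinations (0.0 if both are empty)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def suggest_categories(tags, categorized, threshold=0.3):
    """Propose categories for a dataset described only by its tags.

    categorized maps a category name to the tag combinations of datasets
    already assigned to it. A category is proposed when its best-matching
    tag combination reaches the (hypothetical) threshold. Results are
    ordered by descending score, then by name for stable output.
    """
    scores = {}
    for category, tag_sets in categorized.items():
        best = max((jaccard(tags, t) for t in tag_sets), default=0.0)
        if best >= threshold:
            scores[category] = best
    return sorted(scores, key=lambda c: (-scores[c], c))


# Hypothetical categorized datasets on a portal.
categorized = {
    "Health": [{"health", "hospital"}, {"health", "vaccine"}],
    "Finance": [{"budget", "tax"}],
}

proposal = suggest_categories({"health", "vaccine", "clinic"}, categorized)
print(proposal)
```

Taking the best match per category, rather than an average, is one possible design choice: it lets a single closely related dataset justify a proposal even when the category is otherwise heterogeneous.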