
    Prompting the data transformation activities for cluster analysis on collections of documents

    In this work we argue for a new self-learning engine able to suggest to the analyst good transformation methods and weighting schemas for a given data collection. This new generation of systems, named SELF-DATA (SELF-learning DAta TrAnsformation), relies on an engine capable of exploring different data weighting schemas (e.g., normalized term frequencies, logarithmic entropy) and data transformation methods (e.g., PCA, LSI) before applying a given data mining algorithm (e.g., cluster analysis), evaluating and comparing solutions through different quality indices (e.g., weighted Silhouette), and presenting the top three solutions to the analyst. SELF-DATA will also include a knowledge database storing results of experiments on previously processed datasets, and a classification algorithm trained on the knowledge base content to forecast the best methods for future analyses. SELF-DATA’s current implementation runs on Apache Spark, a state-of-the-art distributed computing framework. The preliminary validation performed on four collections of documents shows that the TF-IDF and logarithmic entropy weighting methods are effective at measuring item relevance in sparse datasets, and that the LSI method outperforms PCA in the presence of a larger feature domain.

    Useful ToPIC: Self-tuning strategies to enhance Latent Dirichlet Allocation

    ToPIC (Tuning of Parameters for Inference of Concepts) is a distributed self-tuning engine that clusters collections of textual data into correlated groups of documents through a topic modeling methodology (i.e., LDA). ToPIC includes automatic strategies to relieve the end-user of the burden of selecting proper values for the overall analytics process. ToPIC's current implementation runs on Apache Spark, a state-of-the-art distributed computing framework. As a case study, ToPIC has been validated on three real collections of textual documents characterized by different distributions. The experimental results show the effectiveness and efficiency of the proposed solution in analyzing collections of documents without tuning algorithm parameters and in discovering cohesive and well-separated groups of documents with a similar topic.

    CompanyKG: A Large-Scale Heterogeneous Graph for Company Similarity Quantification

    In the investment industry, it is often essential to carry out fine-grained company similarity quantification for a range of purposes, including market mapping, competitor analysis, and mergers and acquisitions. We propose and publish a knowledge graph, named CompanyKG, to represent and learn diverse company features and relations. Specifically, 1.17 million companies are represented as nodes enriched with company description embeddings, and 15 different inter-company relations result in 51.06 million weighted edges. To enable a comprehensive assessment of methods for company similarity quantification, we have devised and compiled three evaluation tasks with annotated test sets: similarity prediction, competitor retrieval and similarity ranking. We present extensive benchmarking results for 11 reproducible predictive methods categorized into three groups: node-only, edge-only, and node+edge. To the best of our knowledge, CompanyKG is the first large-scale heterogeneous graph dataset originating from a real-world investment platform, tailored for quantifying inter-company similarity. Comment: Paper (13 pages, 5 figures and 2 tables) + Appendix (18 pages, 4 figures and 5 tables).

    Text miner's little helper: scalable self-tuning methodologies for knowledge exploration

    The abstract is in the attachment.

    Agents and artefacts in the emerging electric vehicle space

    After COP 21, the targets for reducing CO2 emissions have boosted the commitment of governments and companies to developing alternative technologies for the mobility of people and goods. Electric vehicles are at the heart of this transformation, which is profoundly affecting the characteristics of agents and artefacts. The aim of the paper is to identify the relevant domains of this transformation, and to identify what characterises the space of the agents and artefacts of the electric vehicle and their interactions, as oriented by the public policies promoted by the various countries. The paper presents the results of a multidimensional textual analysis of the news published in English by electrive.com, a daily newsletter covering a wide range of relevant information on developments in electric transport in Europe and beyond. These results are a preliminary step for the analysis of the social, economic, organisational and technological changes related to sustainable mobility.

    NMC Horizon Report: 2017 Library Edition

    What is on the five-year horizon for academic and research libraries? Which trends and technology developments will drive transformation? What are the critical challenges and how can we strategize solutions? These questions regarding technology adoption and educational change steered the discussions of 77 experts to produce the NMC Horizon Report: 2017 Library Edition, in partnership with the University of Applied Sciences (HTW) Chur, Technische Informationsbibliothek (TIB), ETH Library, and the Association of College & Research Libraries (ACRL). Six key trends, six significant challenges, and six developments in technology profiled in this report are poised to impact library strategies, operations, and services with regard to learning, creative inquiry, research, and information management. The three sections of this report constitute a reference and technology planning guide for librarians, library leaders, library staff, policymakers, and technologists.

    The NMC Horizon Report : 2015 Library Edition


    Practicing Integrity
