3 research outputs found

    Protecting Privacy in the Archives: Preliminary Explorations of Topic Modeling for Born-Digital Collections

    Natural language processing (NLP) is an area of increasing interest for digital archivists, although most research to date has focused on digitized rather than born-digital collections. This study in progress explores whether NLP techniques can be used effectively to surface documents that require restrictions because they contain personal information. This phase of the research focuses on using topic modeling to find records relating to human resources. Early results show some promise, but suggest that topic modeling on its own will not be sufficient; other techniques to be explored include sentiment analysis and named entity extraction.

    Author-Topic Modeling of DESIDOC Journal of Library and Information Technology (2008-2017), India

    This study presents a method for analyzing textual data and applies it to the field of Library and Information Science. The paper considers a special case of Latent Dirichlet Allocation and Author-Topic models in which each article has one unique author and each author has one unique topic. Topic Modeling Toolkit is used to perform the author-topic modeling. The study further considers topics and their changes over time by taking into account both the word co-occurrence pattern and time. A total of 393 full-text articles were downloaded from DESIDOC Journal of Library and Information Technology and analyzed accordingly. Sixteen core topics were identified over the ten-year period. These core topics can be considered the core areas of research in the journal from 2008 to 2017. The paper further identifies the top five authors associated with the representative articles for each studied year. These authors can be treated as the subject experts for the modeled topics. The results of the study can serve as a platform for determining the research trend, the core areas of research, and the subject experts related to those core areas in the field of Library and Information Science in India.