
    A Visual Framework for Graph and Text Analytics in Email Investigation

    The aim of this work is to build a framework that uses data analysis techniques to explore and mine important information stored in an email archive. Email data can be analyzed from different perspectives; we focused our approach on two aspects: social behavior and the textual content of the email bodies. We present a review of past techniques and features adopted for this type of analysis and evaluate them in real tools. This background motivates our choices and proposed approach, and helps us build a final visual framework that can analyze and display social graph networks, along with other data visualization elements that help users understand and dynamically process the uploaded email data. We present the architecture and logical structure of the framework and show the flexible nature of the system for future integrations and improvements. The functional aspects of our approach are tested on the Enron dataset, by applying it to real key actors involved in the Enron scandal.
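    As a minimal sketch of the social-graph side of such a framework (not the framework's actual code), the Python below builds a weighted sender-to-recipient graph from already-parsed messages and ranks actors by weighted degree. The `(sender, recipients)` record shape, the function names, and the example addresses are hypothetical; parsing the From/To/Cc headers of a real archive such as the Enron dataset is assumed to have happened beforehand.

```python
from collections import Counter, defaultdict


def build_social_graph(emails):
    """Return a Counter mapping (sender, recipient) edges to message counts.

    `emails` is a hypothetical iterable of (sender, [recipients]) pairs.
    """
    edges = Counter()
    for sender, recipients in emails:
        for recipient in recipients:
            # Skip self-addressed messages; normalize case so addresses match.
            if recipient.lower() != sender.lower():
                edges[(sender.lower(), recipient.lower())] += 1
    return edges


def degree_ranking(edges):
    """Rank actors by weighted degree (messages sent plus received)."""
    degree = defaultdict(int)
    for (sender, recipient), weight in edges.items():
        degree[sender] += weight
        degree[recipient] += weight
    # Highest degree first; ties broken alphabetically for stable output.
    return sorted(degree.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical, already-parsed messages (placeholder addresses).
emails = [
    ("alice@example.com", ["bob@example.com", "carol@example.com"]),
    ("bob@example.com", ["alice@example.com"]),
    ("alice@example.com", ["bob@example.com"]),
]
edges = build_social_graph(emails)
print(degree_ranking(edges))
```

    Weighted degree is only one possible centrality measure; a fuller framework would likely compute others and pass the graph to a visualization layer.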

    How to structure citations data and bibliographic metadata in the OpenCitations accepted format

    The OpenCitations organization is working on ingesting citation data and bibliographic metadata provided directly by the community (e.g., scholars and publishers). The aim is to improve the general coverage of open citations, which is still far from complete, and to use the provided metadata to enrich the characterization of the citing and cited entities. This paper illustrates how citation data and bibliographic metadata should be structured to comply with the OpenCitations accepted format. Comment: 5 pages, submitted to JCDL 202
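    The accepted format itself is defined in the paper. Assuming it consists of two CSV tables, one for bibliographic metadata and one for citations, a sketch of producing them with Python's standard `csv` module could look like the following. The column names are our assumption about that format and should be checked against the paper; the identifiers and records are placeholders.

```python
import csv
import io

# ASSUMPTION: these column names reflect our understanding of the
# OpenCitations CSV format; verify them against the paper before use.
METADATA_FIELDS = ["id", "title", "author", "pub_date", "venue", "volume",
                   "issue", "page", "type", "publisher", "editor"]
CITATION_FIELDS = ["citing_id", "citing_publication_date",
                   "cited_id", "cited_publication_date"]


def to_csv(rows, fields):
    """Serialize dict rows to CSV text.

    Missing keys become empty cells; unexpected keys raise ValueError,
    so typos in column names fail loudly instead of being dropped.
    """
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fields, restval="",
                            extrasaction="raise")
    writer.writeheader()
    for row in rows:
        writer.writerow(row)
    return buffer.getvalue()


# Hypothetical records; the DOIs are placeholders, not real publications.
metadata = [
    {"id": "doi:10.1234/citing", "title": "A citing article",
     "pub_date": "2021", "type": "journal article"},
    {"id": "doi:10.1234/cited", "title": "A cited article",
     "pub_date": "2019", "type": "journal article"},
]
citations = [
    {"citing_id": "doi:10.1234/citing", "citing_publication_date": "2021",
     "cited_id": "doi:10.1234/cited", "cited_publication_date": "2019"},
]

print(to_csv(metadata, METADATA_FIELDS))
print(to_csv(citations, CITATION_FIELDS))
```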

    A quantitative and qualitative open citation analysis of retracted articles in the humanities

    In this article, we present and discuss the results of a quantitative and qualitative analysis of open citations to retracted publications in the humanities domain. We selected retracted papers in the humanities domain and recorded their main characteristics (e.g., retraction reason). We then gathered the citing entities and annotated their basic metadata (e.g., title, venue, subject) and the characteristics of their in-text citations (e.g., intent, sentiment). Using these data, we performed a quantitative and qualitative study of retractions in the humanities, presenting descriptive statistics and a topic modeling analysis of the citing entities' abstracts and the in-text citation contexts. Among our main findings, we observed no drop in the overall number of citations after the year of retraction, and few citing entities either mentioned the retraction or expressed a negative sentiment toward the cited publication. In addition, on several occasions, citing entities belonging to the health sciences domain showed greater concern about, or awareness of, citing a retracted publication than those in the humanities and social science domains. Philosophy, arts, and history are the humanities areas that showed the highest concern toward the retraction.
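    To make the observation about citations after the retraction year concrete, the sketch below splits the publication years of citing entities into those before, in, and after the retraction year. It is an illustrative helper with hypothetical names and made-up years, not the study's analysis pipeline.

```python
from collections import Counter


def split_by_retraction(citing_years, retraction_year):
    """Count citing entities published before, in, and after the retraction year."""
    counts = Counter()
    for year in citing_years:
        if year < retraction_year:
            counts["before"] += 1
        elif year == retraction_year:
            counts["same_year"] += 1
        else:
            counts["after"] += 1
    return counts


# Hypothetical publication years of entities citing a paper retracted in 2015.
years = [2011, 2013, 2015, 2016, 2016, 2018, 2020]
print(split_by_retraction(years, 2015))
```

    Keeping the retraction year as its own bucket avoids guessing whether a same-year citation was written before or after the notice appeared.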

    Science of retracted science: a citation analysis of the arts and humanities domain

    In the scholarly publishing domain, a retraction is issued when a publication is deemed erroneous, after it has been published, by the venue in which it appeared. The aim of this work is to uncover new insights that help us understand the retraction phenomenon in the arts and humanities domain. Our investigation is based on a methodology defined using quantitative and qualitative measures derived from previous studies in the transdisciplinary research field of “science of science” (SciSci). The designed methodology takes into account a general case of retraction and applies a citation analysis in five phases. Citations to retracted publications (before and after their retraction) are gathered and characterized with a set of attributes, including general metadata and information extracted from the citing entities’ full text. The annotated characteristics are then used for a statistical and a textual analysis (i.e., a topic modeling analysis). The contribution of this thesis is grounded in addressing the following research questions: (RQ1) How did scholarly research cite retracted humanities publications before and after their retraction? (RQ2) Did all the humanities areas behave similarly concerning the retraction phenomenon? (RQ3) What are the main differences and similarities in the retraction dynamics between the humanities domain and the STEM disciplines? RQ1 and RQ2 are addressed by tuning the methodology and applying it to the retracted publications in the humanities domain. RQ3 is addressed on two levels, i.e., by considering and comparing: (L1) the outcomes of past studies on retraction in STEM, and (L2) the results of an analysis of a retraction case in STEM using the defined methodology.

    Retractions in Arts and Humanities: an Analysis of the Retraction Notices

    The aim of this work is to understand the retraction phenomenon in the arts and humanities domain through an analysis of retraction notices, i.e., formal documents stating and describing the retraction of a particular publication. The retractions and the corresponding notices are identified using the data provided by Retraction Watch. Our methodology combines a metadata analysis and a content analysis (mainly performed using a topic modeling process) of the retraction notices. Considering 343 cases of retraction, we found that many retraction notices are neither identifiable nor findable. In addition, notices were not always separated from the original papers, which introduces ambiguity in understanding how they were perceived (i.e., cited) by the community. We also noticed that there is no systematic way to write a retraction notice: some notices presented a complete discussion of the reasons for retraction, while others were more direct and succinct. We also found many notices with similar text addressing different retractions. We think a further study on a larger collection, using the same methodology, should be carried out to confirm and further investigate our findings.
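    One simple way to surface notices with similar text is pairwise Jaccard similarity over word sets. This is offered as an illustrative stand-in, not the method used in the study (which relied on metadata and topic modeling analysis); the function names, the tokenizer, the 0.7 threshold, and the sample notices are hypothetical.

```python
import re
from itertools import combinations


def tokens(text):
    """Lowercase word set for a notice (a deliberately simple tokenizer)."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a, b):
    """Jaccard similarity |A & B| / |A | B|; 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def similar_notices(notices, threshold=0.7):
    """Return (id_a, id_b, score) for notice pairs at or above `threshold`."""
    token_sets = {nid: tokens(text) for nid, text in notices.items()}
    pairs = []
    # Sorting by id makes the output order deterministic.
    for (id_a, set_a), (id_b, set_b) in combinations(sorted(token_sets.items()), 2):
        score = jaccard(set_a, set_b)
        if score >= threshold:
            pairs.append((id_a, id_b, round(score, 2)))
    return pairs


# Hypothetical notice texts, invented for illustration.
notices = {
    "n1": "This article has been retracted at the request of the editor due to plagiarism.",
    "n2": "This article has been retracted at the request of the editor due to duplicate publication.",
    "n3": "The authors withdraw the paper because of an error in the data.",
}
print(similar_notices(notices))
```

    Word sets ignore order and frequency, so boilerplate notices score high even when only the stated reason differs; shingles or TF-IDF vectors would be finer-grained alternatives.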

    A Prototype for a Controlled and Valid RDF Data Production Using SHACL

    The paper introduces a tool prototype that combines SHACL's capabilities with ad-hoc validation functions to create a controlled and user-friendly form interface for producing valid RDF data. The proposed tool is developed within the context of the OpenCitations Data Model (OCDM) use case. The paper discusses the current status of the tool, outlines the future steps required to achieve full functionality, and explores the tool's potential applications and benefits.
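    The prototype's code is not reproduced here. As a hedged illustration of the idea, the sketch below checks a form record against a toy constraint table loosely modelled on SHACL property-shape constraints (`sh:minCount`, `sh:maxCount`, `sh:pattern`), plus a DOI format check of the kind an ad-hoc validation function might perform. It is plain Python, not a SHACL engine; the field names and rules are hypothetical, and the DOI pattern is a widely used regular expression for modern DOIs that does not match every registered DOI.

```python
import re

# Toy constraint table loosely modelled on SHACL property shapes.
# Field names and limits are hypothetical; max=None means unbounded.
SHAPE = {
    "title": {"min": 1, "max": 1},
    "doi": {"min": 0, "max": 1,
            "pattern": re.compile(r"^10\.\d{4,9}/[-._;()/:A-Z0-9]+$", re.IGNORECASE)},
    "author": {"min": 1, "max": None},
}


def validate(record):
    """Return human-readable violations for a form record.

    `record` maps each field to a list of values, as a multi-valued form would.
    """
    errors = []
    for field, rule in SHAPE.items():
        values = record.get(field, [])
        if len(values) < rule["min"]:
            errors.append(f"{field}: at least {rule['min']} value(s) required")
        if rule["max"] is not None and len(values) > rule["max"]:
            errors.append(f"{field}: at most {rule['max']} value(s) allowed")
        pattern = rule.get("pattern")
        for value in values:
            if pattern and not pattern.match(value):
                errors.append(f"{field}: '{value}' does not match the expected format")
    return errors


print(validate({"title": ["A paper"], "author": ["Doe, Jane"], "doi": ["10.1234/abc.5"]}))
print(validate({"title": [], "doi": ["not-a-doi"]}))
```

    Returning a list of messages rather than raising on the first problem lets a form show every violation at once, which suits the user-friendly interface the paper aims for.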

    OpenCitations Meta

    OpenCitations Meta is a new database that contains bibliographic metadata of scholarly publications involved in the citations indexed by the OpenCitations infrastructure. It adheres to Open Science principles and provides data under a CC0 license for maximum reuse. The data can be accessed through a SPARQL endpoint, REST APIs, and dumps. OpenCitations Meta serves three important purposes. First, it enables the disambiguation of citations between publications described using different identifiers from various sources; for example, it can link publications identified by DOIs in Crossref and by PMIDs in PubMed. Second, it assigns new globally persistent identifiers (PIDs), known as OpenCitations Meta Identifiers (OMIDs), to bibliographic resources that lack external persistent identifiers such as DOIs. Lastly, by hosting the bibliographic metadata internally, OpenCitations Meta speeds up metadata retrieval for citing and cited documents. The database is populated through automated data curation, including deduplication, error correction, and metadata enrichment. The data is stored in RDF format following the OpenCitations Data Model, and changes and provenance information are tracked. The paper presents OpenCitations Meta and its production. OpenCitations Meta currently incorporates data from Crossref, DataCite, and the NIH Open Citation Collection. Among semantic publishing datasets, it is currently the largest in data volume. Comment: 26 pages, 7 figures
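    The identifier disambiguation described above can be illustrated with a union-find structure: each external identifier (DOI, PMID, ...) starts in its own cluster, known links between identifiers merge clusters, and each cluster then receives one internal identifier. This is a simplified sketch under our own assumptions, not OpenCitations Meta's curation process; the class, method names, and the `br/N` identifiers it emits are hypothetical and do not follow the real OMID scheme.

```python
from itertools import count


class IdentifierIndex:
    """Union-find over external identifiers that denote the same resource."""

    def __init__(self):
        self.parent = {}

    def find(self, pid):
        """Return the cluster root for `pid`, registering it if unseen."""
        self.parent.setdefault(pid, pid)
        while self.parent[pid] != pid:
            # Path halving: point each visited node at its grandparent.
            self.parent[pid] = self.parent[self.parent[pid]]
            pid = self.parent[pid]
        return pid

    def link(self, pid_a, pid_b):
        """Record that two identifiers (e.g. a DOI and a PMID) name the same work."""
        root_a, root_b = self.find(pid_a), self.find(pid_b)
        if root_a != root_b:
            self.parent[root_b] = root_a

    def assign_internal_ids(self, prefix="br/"):
        """Give every cluster one hypothetical internal identifier."""
        counter = count(1)
        cluster_id = {}
        result = {}
        for pid in sorted(self.parent):
            root = self.find(pid)
            if root not in cluster_id:
                cluster_id[root] = f"{prefix}{next(counter)}"
            result[pid] = cluster_id[root]
        return result


# Hypothetical identifiers: one work known by three PIDs, one by a single DOI.
index = IdentifierIndex()
index.link("doi:10.1234/x", "pmid:1111")
index.link("pmid:1111", "pmcid:PMC2222")
index.find("doi:10.9999/y")
ids = index.assign_internal_ids()
print(ids)
```

    A real pipeline would also have to decide which metadata to keep when merged records disagree, which this sketch does not attempt.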