13 research outputs found

    Uberization Module: A life saver for the manually entered dirty citation data in faculty reporting tool (VIVO 2018)

    Cornell University is a decentralized institution where every college and school uses its own means and procedures to record its faculty's publications. Some rely on institutional repositories such as Digital Commons from bepress, while others use faculty reporting tools such as Activity Insight from Digital Measures or Symplectic Elements from Digital Science. In this presentation, I will discuss a case study of the College of Agriculture and Life Sciences (CALS), which currently uses Activity Insight (AI) for its faculty reporting needs.

    Every year during faculty reporting season, faculty report their research contributions of the past year. In CALS, different strategies are used to collect publication data from faculty: i) faculty provide their up-to-date CVs, and administrative staff from the college read the CVs and manually enter publication data into the reporting system; ii) faculty copy and paste the publication lists from their CVs as a single text blob into a free-text template provided by the CALS administration; or iii) faculty themselves log in to the reporting system and enter their publications in a publication template form. In all three options, publications are entered manually into the faculty reporting system. Such manually entered data is prone to errors, and many examples have been found where the manually entered citation data does not reflect the actual publication. Noticed errors include incorrect journal names, incorrect ISSN/EISSN numbers, mistakes in DOIs, and incorrect lists or ordering of authors. Such dirty citation data cannot be used for data analysis or future strategic discussions. In the Scholars@Cornell project, we use an uberization module to clean such dirty data.

    First, we load dirty publication data from Activity Insight (AI) into Symplectic Elements as an institutional feed. In cases where the loaded publication has already been harvested by Symplectic Elements from upstream sources (such as WoS, PubMed, Scopus), the AI publication becomes another record in the existing publication object. In scenarios where the AI publication is the first record in Elements, one may re-run the search for the faculty member so that citation data for the same publication is harvested from upstream sources as well. Once this step is completed, the next step is to extract publication objects from Elements, merge the data from different sources (i.e., one record from each source), and create a single record – an "uber record" – for each article. For the creation of an uber record, we ranked the citation data sources based on the experience and intuition of two senior Cornell librarians and started with the metadata from the source they considered best. The uberization module merges the citation data from the different publication records (including the AI record) into a single record that is clean and comprises the best of the best citation data. After passing data validation, uber records are transformed into an RDF graph and loaded into Scholars@Cornell.
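The ranked-merge step described above can be sketched as follows. The source ranking, field names, and record shapes here are illustrative assumptions for a minimal sketch, not the project's actual implementation:

```python
# Hypothetical sketch of the "uberization" merge. The source ranking,
# field names, and record shapes below are assumptions for illustration.
SOURCE_RANK = ["wos", "scopus", "pubmed", "crossref", "ai"]  # assumed order

def uberize(records):
    """Merge per-source citation records into a single 'uber record'.

    `records` maps a source name to its metadata dict. Each field is
    taken from the highest-ranked source that provides a non-empty value.
    """
    uber = {}
    for source in SOURCE_RANK:
        for field, value in records.get(source, {}).items():
            # Keep the value already chosen from a better-ranked source;
            # skip empty values so lower-ranked sources can fill gaps.
            if field not in uber and value:
                uber[field] = value
    return uber

records = {
    "ai":  {"title": "A Studyy of X", "doi": "10.1000/xyz", "issn": ""},
    "wos": {"title": "A Study of X", "journal": "J. Example"},
}
print(uberize(records))
```

In this sketch the manually entered AI record contributes only the fields the better-ranked source lacks (here, the DOI), so its typo-laden title and empty ISSN never reach the uber record.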

    OpenVIVOForce2016Poster.pdf

    OpenVIVO

    Scholars@Cornell - demonstration of D3 visualizations

    Scholars@Cornell: A new project at Cornell University. Powered by VIVO.

    VIVO: A Community-driven Research Information Management System

    VIVO is community-driven open source software that creates a connected, integrated record of scholarly works ready for reporting, visualization, and analysis. VIVO uses the Linked Data model, and at the core of the application is the VIVO-ISF ontology – a data model used by many institutions around the world, which makes the data interoperable.

    In this presentation, we will discuss some of the history of the VIVO project and ask: is VIVO only a research data recording system? Can VIVO evolve into a research information management system that not only records the data but also provides aggregate views of scholarship and scholarly work? Can VIVO answer questions such as "who are the experts in what subject area" without any manual input from faculty members? Can VIVO answer questions about internal and global collaborations for a specific academic unit? Can VIVO be used to collect evidence of the impact of a faculty member's research for their next grant application? We will discuss the opportunities and the challenges in light of our work on Scholars@Cornell at Cornell University Library.

    Scholars@Cornell: Visualizing the Scholarship Data

    In Scholars@Cornell, we provide aggregate views of scholarship data where dynamic visualizations become the entry points into a rich graph of knowledge that can be explored interactively to answer questions such as: who are the experts in what areas? Which departments collaborate with each other? What are the patterns of interdisciplinary research? And more [1]. We will discuss the new theme and the D3 visualizations that allowed us to move from List Views to Viz Views and leverage the power of state-of-the-art dynamic web languages.

    We integrate visualizations at different levels. Research interests of faculty members are presented at the department level using a Person-to-Subject-Area Network Map visualization. The presented research interests are the subject-area classifications of the venues where faculty members have published their articles; we map these subject areas using the Science-Metrix and Web of Science journal classifications. The person-to-subject-area map is helpful for identifying i) the research interests of a faculty member and ii) potential collaborators. The map demonstrates the overlap of research interests among different faculty members, which can help identify future coauthors and potential collaborators. To demonstrate the domain expertise of a faculty member, we use the keywords from their authored articles and present them in the form of a Keyword Cloud. These keywords are either asserted by the authors (i.e., keywords mentioned in the keyword section of an article), tagged by the publishers (e.g., MeSH terms tagged by PubMed), or inferred in our post-processing module. The size of each keyword in the cloud is directly proportional to the number of articles in which the keyword is mentioned, and the tooltip on each keyword displays the list of relevant articles. Interdepartmental and cross-unit co-authorships are presented at the college level using Co-Authorship Wheels, and Global Collaborations are presented on the homepage, where academic organizations are mapped to their GRID ids wherever possible.

    We will discuss our process for the selection, design, and development of an initial set of visualizations, our approach to the underlying technical architecture, what data is necessary for the generation of these visualizations, and how it is modelled. By engaging an initial set of pilot partners, we are evaluating the use of these data-driven visualizations by multiple stakeholders, including faculty, students, librarians, administrators, and the public.
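The Keyword Cloud data preparation described above amounts to counting, per keyword, the articles that mention it (the size) and collecting their titles (the tooltip). A minimal sketch, with hypothetical article records and field names:

```python
# Illustrative sketch of keyword-cloud data prep: the article records
# and field names are assumptions, not the project's actual data model.
from collections import Counter

articles = [
    {"title": "Paper A", "keywords": ["ontology", "linked data"]},
    {"title": "Paper B", "keywords": ["linked data", "visualization"]},
    {"title": "Paper C", "keywords": ["linked data"]},
]

def keyword_cloud(articles):
    """Return keyword -> (article count, relevant article titles):
    the count drives the keyword's display size, the titles its tooltip."""
    counts = Counter()
    tooltips = {}
    for article in articles:
        for kw in article["keywords"]:
            counts[kw] += 1
            tooltips.setdefault(kw, []).append(article["title"])
    return {kw: (counts[kw], tooltips[kw]) for kw in counts}

print(keyword_cloud(articles)["linked data"])  # (3, ['Paper A', 'Paper B', 'Paper C'])
```

The resulting mapping is exactly the shape a front-end cloud renderer needs: a relative size per keyword plus the article list shown on hover.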

    Uberization of Symplectic Elements Citation Data Entries and use of Curation Bins

    At Cornell University Library, the primary entity of interest is scholarship, of which people and organizations are, by definition, both the creators and consumers. From this perspective, attention is focused on aggregate views of scholarship data. In Scholars@Cornell, we use Symplectic Elements [1] for the continuous and automated collection of scholarship metadata from multiple internal and external data sources. For the journal article category, Elements captures the title of the article, the list of authors, the name of the journal, volume number, issue, ISSN number, DOI, publication status, pagination, external identifiers, etc. – referred to as citation items. These citation items may or may not be available in every data source: the Crossref version may differ in some details from the PubMed version, and so forth, and some fields may be missing from one version of the metadata but present in another. This leads to different metadata versions of the same scholarly publication – referred to as version entries. In Elements, users can specify their preferred data source for their scholarly publications, and the VIVO Harvester API [2] can be used to push the preferred citation data entries from Elements to Scholars@Cornell. In Scholars@Cornell, rather than using the VIVO Harvester API, we built an uberization module that merges the version entries from multiple data sources and creates an "uber record". For the creation of an uber record for a publication, we ranked the sources based on the experience and intuition of two senior Cornell librarians and started with the metadata from the source they considered best. The uberization module allowed us to generate and present the best of the best scholarship metadata (in terms of correctness and completeness) to the users.

    In addition to external sources (such as WoS, PubMed, etc.), we use an Activity Insight (AI) feed as an internal local source. Any person can manually enter scholarship metadata in AI. We use such manually entered metadata (which is error-prone) as a seed in Elements to harvest additional metadata from external sources. Once the additional metadata is harvested, the uberization process merges these version entries and presents the best of the best scholarship metadata, which is later fed into Scholars@Cornell. Any scholarship metadata that fails the validation step of the Elements-to-Scholars transition is pushed into a curation bin, where manual curation is required to resolve the metadata issues. We believe such curation bins can also be used to enhance the scholarship metadata, for example by adding ORCID ids for authors, GRID ids for organizations, article abstracts, keywords, etc. We will briefly discuss the (VIVO-ISF ontology driven) data modelling and data architecture issues, as lessons learnt, that were encountered during the first phase of the Scholars@Cornell launch. https://scholars.cornell.ed
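The validation-and-curation-bin routing can be sketched as follows. The required fields and the shape of a "curation bin" are assumptions for illustration; the actual Elements-to-Scholars validation rules are not described in detail here:

```python
# Hedged sketch of the Elements-to-Scholars validation step.
# REQUIRED_FIELDS is a hypothetical validation rule, not the real one.
REQUIRED_FIELDS = ("title", "doi", "journal")

def route(uber_records):
    """Split uber records into those ready for RDF transformation and
    those pushed to a curation bin for manual fixing or enhancement."""
    ready, curation_bin = [], []
    for rec in uber_records:
        missing = [f for f in REQUIRED_FIELDS if not rec.get(f)]
        if missing:
            # Record why the item landed in the bin, to aid curators.
            curation_bin.append({"record": rec, "missing": missing})
        else:
            ready.append(rec)
    return ready, curation_bin

ready, curation_bin = route([
    {"title": "Paper A", "doi": "10.1000/a", "journal": "J. Example"},
    {"title": "Paper B", "doi": "", "journal": "J. Example"},
])
print(len(ready), curation_bin[0]["missing"])  # 1 ['doi']
```

Keeping the list of failed checks alongside each binned record mirrors the enhancement use case mentioned above: curators see at a glance what to fix or add (DOIs, ORCID ids, abstracts, and so on) before the record re-enters the pipeline.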

    VIVO for visualizations and analysis

    A presentation given virtually at "Research Output and Impact: New Tools and Concepts", DTU, Denmark

    Scholars@Cornell: A Journey from Data in Peace to Data in Use

    In 2016, the Scholars@Cornell project was initiated with the aim of advancing the visibility and accessibility of Cornell scholarship and preserving it for future generations. However, in the data life cycle, preserving data and providing access to it is not the final stage. Data stored in a database is merely a record; it becomes useful only when human experience and insight are applied to it, data analysis is performed, and the data is transformed into knowledge. Faculty and publication data is capable of revealing much more about the patterns and dynamics of scholarship and the institution. Such data can support universities in their systems for managing faculty information, scholars' websites, faculty reporting, and strategic decisions in general. We explore the scholarship data through the lens of a scholar, an academic unit, and an institution. Unlike systems that provide web pages of researcher profiles using lists and directory-style metaphors, our work explores the power of graph analytics and infographics for navigating a rich semantic graph of scholarly data.

    We believe that scholarship data accessible in RDF format through VIVO web pages is not easy to reuse, specifically by software developers who have limited knowledge of semantic technologies and the VIVO data model. In Scholars@Cornell, the scholarship data is open for reuse in different ways: the data can be accessed via the Data Distribution API in RDF or JSON format; the infographics built using D3 JavaScript libraries can be embedded on different institutional websites; and new web applications can be developed that use the scholarship knowledge graph, showcasing research areas and expertise. In this presentation, I will present an overview of the project and lessons learnt, and will emphasize data reuse and data analysis. I will discuss our journey: how we moved from counting list items to a connected graph, from list views to data analysis, and from data in peace to data in use.
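The JSON access path is the one aimed at developers without semantic-web expertise. A minimal sketch of consuming such a payload; the payload shape shown is an assumption for illustration, not the documented Data Distribution API format:

```python
# Hypothetical sketch of consuming a JSON profile payload. The payload
# shape is an assumption, not the actual Data Distribution API format.
import json

sample_response = json.dumps({  # stands in for an HTTP response body
    "person": "Jane Doe",
    "articles": [
        {"title": "Paper A", "year": 2017},
        {"title": "Paper B", "year": 2016},
    ],
})

def article_titles(payload):
    """Extract article titles from the (assumed) JSON payload, with no
    RDF or SPARQL knowledge required of the developer."""
    data = json.loads(payload)
    return [a["title"] for a in data.get("articles", [])]

print(article_titles(sample_response))  # ['Paper A', 'Paper B']
```

This is the reuse argument in miniature: a plain JSON list is immediately usable in a website widget or analysis script, whereas the equivalent RDF would require a triple store and knowledge of the VIVO data model.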

    Scholars@Cornell: Visualizing the scholarly record

    As stewards of the scholarly record, Cornell University Library is developing a data and visualization service known as Scholars@Cornell with the goal of improving the visibility of Cornell research and enabling discovery of explicit and latent patterns of scholarly collaboration. We provide aggregate views of data where dynamic visualizations become the entry points into a rich graph of knowledge that can be explored interactively to answer questions such as: Who are the experts in what areas? Which departments collaborate with each other? What are the patterns of interdisciplinary research? And more. Key components of the system are Symplectic Elements, which provides automated citation feeds from external sources such as Web of Science, the Scholars "Feed Machine", which performs automated data curation tasks, and the VIVO semantic linked data store. The new "VIZ-VIVO" component bridges the chasm between the semantically rich back-end data and a front-end user experience that takes advantage of new developments in the world of dynamic web visualizations. We will demonstrate a set of D3 visualizations that leverage relationships between people (e.g., faculty), their affiliations (e.g., academic departments), and published research outputs (e.g., journal articles by subject area). We will discuss our results with two of the initial pilot partners at Cornell University, the School of Engineering and the Johnson School of Management.

    Scholars@Cornell: Visualizing the Scholarship Data

    Extended Abstract, published as a poster at the Visualizations in Practice (VIP) IEEE Workshop (Phoenix, Arizona, Oct. 2017).