7 research outputs found

    The Ensemble MESH-Term Query Expansion Models Using Multiple LDA Topic Models and ANN Classifiers in Health Information Retrieval

    Information retrieval in the health field has several challenges. Health information terminology is difficult for consumers (laypeople) to understand. Formulating a query with professional terms is not easy for consumers because health-related terms are more familiar to health professionals. Automatically adding health terms related to a query would help consumers find relevant information. The proposed query expansion (QE) models show how to expand a query using MeSH (Medical Subject Headings) terms. The documents were represented by the MeSH terms included in the full-text articles (i.e. Bag-of-MeSH). The MeSH terms were then used to generate LDA (Latent Dirichlet Allocation) topic models. A query and the top k retrieved documents were used to find MeSH terms as topic words related to the query. LDA topic words were filtered by 1) threshold values of topic probability (TP) and word probability (WP) or 2) an ANN (Artificial Neural Network) classifier. Threshold values were effective in an LDA model with a specific number of topics to increase IR performance in terms of infAP (inferred Average Precision) and infNDCG (inferred Normalized Discounted Cumulative Gain), which are common IR metrics for large data collections with incomplete judgments. The top k words were chosen by a word score based on (TP × WP) and retrieved document ranking in an LDA model with specific thresholds. The QE model with specific thresholds for TP and WP showed improved mean infAP and infNDCG scores in an LDA model compared with the baseline result. However, the threshold values optimized for a particular LDA model did not perform well in other LDA models with different numbers of topics. An ANN classifier was employed to overcome this dependence on LDA thresholds by automatically categorizing MeSH terms (positive/negative/neutral) for QE. ANN classifiers were trained on word features related to the LDA model and collection. 
Two types of QE models (WSW & PWS) using an LDA model and an ANN classifier were proposed: 1) Word Score Weighting (WSW), where the probability of being a positive/negative/neutral word was used to weight the original word score, and 2) Positive Word Selection (PWS), where positive words were identified by the ANN classifier. Forty WSW models showed better average mean infAP and infNDCG scores than the PWS models when the top 7 words were selected for QE. Both approaches based on a binary ANN classifier yielded statistically significant increases in infAP and infNDCG compared with the scores of the baseline run. A 3-class classifier performed worse than the binary classifier. The proposed ensemble QE models integrated multiple ANN classifiers with multiple LDA models. Ensemble QE models combined multiple WSW/PWS models and one or multiple classifiers. Multiple classifiers were more effective in selecting relevant words for QE than one classifier. In ensemble QE (WSW/PWS) models, the top k words added to the original queries were effective in increasing infAP and infNDCG scores. The ensemble QE model (WSW) using three classifiers showed statistically significant improvements for infAP and infNDCG in the mean scores for 30 queries when the top 3 words were added. The ensemble QE model (PWS) using four classifiers showed statistically significant improvements for 30 queries in the mean infAP and infNDCG scores.
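
The threshold-based selection described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `Candidate` type, the function names, and the way a duplicate term is resolved (keeping its highest score) are assumptions made for illustration. The abstract also folds retrieved-document ranking into the word score, which is omitted here, and it does not give the exact WSW weighting formula, so `wsw_score` simply multiplies by the classifier's positive-class probability as one plausible reading.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """A MeSH term proposed by an LDA topic (hypothetical structure)."""
    term: str
    tp: float  # topic probability (TP)
    wp: float  # word probability within that topic (WP)


def select_expansion_terms(candidates, tp_min, wp_min, k):
    """Keep candidates passing both thresholds, rank by TP * WP, return top k terms."""
    best = {}
    for c in candidates:
        if c.tp < tp_min or c.wp < wp_min:
            continue  # filtered out by the TP/WP thresholds
        score = c.tp * c.wp
        # Assumption: a term seen in several topics keeps its highest score.
        if score > best.get(c.term, 0.0):
            best[c.term] = score
    ranked = sorted(best.items(), key=lambda item: item[1], reverse=True)
    return [term for term, _ in ranked[:k]]


def wsw_score(word_score, p_positive):
    """Assumed form of Word Score Weighting: scale the score by P(positive)."""
    return word_score * p_positive
```

Because the thresholds `tp_min` and `wp_min` are tied to one LDA model, a call such as `select_expansion_terms(candidates, 0.2, 0.2, 3)` would need re-tuning for a model with a different number of topics, which is the weakness the ANN classifier is meant to address.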

    Open Peer Review in Scientific Publishing: A Web Mining Study of PeerJ Authors and Reviewers

    Purpose: To understand how authors and reviewers are accepting and embracing Open Peer Review (OPR), one of the newest innovations in the open science movement. Design: This research collected and analyzed data from the Open Access journal PeerJ over its first three years (2013-2016). Web data were scraped, cleaned, and structured using several Web tools and programs. The structured data were imported into a relational database. Data analyses were conducted using analytical tools as well as programs developed by the researchers. Findings: PeerJ, which supports optional OPR, has a broad international representation of authors and referees. Approximately 73.89% of articles provide full review histories. Of the articles with published review histories, 17.61% had identities of all reviewers and 52.57% had at least one signed reviewer. In total, 43.23% of all reviews were signed. The observed proportions of signed reviews have been relatively stable over the period since the journal’s inception. Limitations: This research is constrained by the availability of the peer review history data. Some peer reviews were not available when the authors opted out of publishing their review histories. The anonymity of reviewers made it impossible to give an accurate count of reviewers who contributed to the review process. Implications: These findings shed light on the current characteristics of OPR. Given the policy that authors are encouraged to make their articles’ review history public and referees are encouraged to sign their review reports, the three years of PeerJ review data demonstrate that there is still some reluctance by authors to make their reviews public and by reviewers to identify themselves. Originality/Value: This is the first study to closely examine PeerJ as an example of an OPR model journal. As open science moves further towards open research, OPR is a final and critical component. 
Research in this area must identify the best policies and paths towards a transparent and open peer review process for scientific communication.
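
The findings above mix article-level and review-level proportions. A short Python sketch, assuming a hypothetical input where each article with a published review history is a list of booleans (one per review, `True` meaning signed), shows how the three kinds of figures differ. The function name and data layout are illustrative only and do not reflect the study's database schema.

```python
def signing_rates(articles):
    """Return (share of articles with all reviews signed,
               share of articles with at least one signed review,
               share of all reviews that are signed)."""
    with_reviews = [reviews for reviews in articles if reviews]
    n_articles = len(with_reviews)
    n_reviews = sum(len(reviews) for reviews in with_reviews)

    # Article-level proportions
    all_signed = sum(all(reviews) for reviews in with_reviews) / n_articles
    any_signed = sum(any(reviews) for reviews in with_reviews) / n_articles

    # Review-level proportion
    signed_reviews = sum(sum(reviews) for reviews in with_reviews) / n_reviews
    return all_signed, any_signed, signed_reviews
```

The first two values use articles as the denominator, while the third uses individual reviews, which is why the three percentages reported in the abstract are not directly comparable.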

    Designing quantitative data representations to support people’s understanding of the risk of Covid-19

    Since the COVID-19 outbreak, various forms of data representations (e.g., graphs, tables, and charts) have served to illustrate the diverse risks from the virus. These risks include daily cases, hospitalizations, and deaths and involve numerous and sometimes complex quantified attributes (e.g., numbers, time series, and indices) in their representation. Complicating the matter, even when people analyze identical data, they often interpret it differently. Everyone interacts with data differently in daily living (Ryan & Evers, 2020) due to varied competence in interpreting quantitative data as well as different applications of quantitative ideas in real-world contexts (Hallett, 2003; Wiest et al., 2007). This project aims to support people's informed, evidence-based decisions about the severity of COVID-19 and their behavioral choices by assisting their productive assessment and interpretation of quantitative structures in data representations. To that end, the research team has developed interactive applets (available at www.covidtaser.com) with three data representations: Risk Comparison, Projection, and Log scaled Graphs of COVID-19. These representations are based on empirical research designed to investigate and promote people's understanding of: a) the chances of facing risks from the virus in comparison to those from daily activities (e.g., driving), b) the impacts of preventive measures (e.g., social distancing), and c) the interpretation of linear and log scaled graphs. The project representations are designed to better facilitate people's quantitative reasoning, based on the cognitive models of mathematical thinking found in the project and models from prior research. Using task-based clinical interviews, this project is studying how the interactive applets help people make sense of what COVID-19 data convey and the implications for their health behavior. 
The project results contribute to the literature in STEM education by providing insights into the importance and utility of quantitative models. Furthermore, the research-based products add value by promoting a data-literate society.

    Situational Virtual Reference: Get Help When You Need It

    This study aims to increase the use of virtual reference service by increasing the awareness of the availability of the service to users who really need it. A new situationally-based virtual reference interface, called the sVR interface, has been designed to reflect different levels of user search success. Findings from an eight-month field study done in a university library improved our understanding of how to effectively enhance the availability of virtual reference service to users who need it. A discussion about balancing the availability and the intrusiveness of virtual reference service is also provided.

    Open Peer Review in Scientific Publishing: A Web Mining Study of PeerJ Authors and Reviewers

    Purpose: To understand how authors and reviewers are accepting and embracing Open Peer Review (OPR), one of the newest innovations in the Open Science movement. Design/methodology/approach: This research collected and analyzed data from the Open Access journal PeerJ over its first three years (2013-2016). Web data were scraped, cleaned, and structured using several Web tools and programs. The structured data were imported into a relational database. Data analyses were conducted using analytical tools as well as programs developed by the researchers. Findings: PeerJ, which supports optional OPR, has a broad international representation of authors and referees. Approximately 73.89% of articles provide full review histories. Of the articles with published review histories, 17.61% had identities of all reviewers and 52.57% had at least one signed reviewer. In total, 43.23% of all reviews were signed. The observed proportions of signed reviews have been relatively stable over the period since the Journal's inception. Research limitations: This research is constrained by the availability of the peer review history data. Some peer reviews were not available when the authors opted out of publishing their review histories. The anonymity of reviewers made it impossible to give an accurate count of reviewers who contributed to the review process. Practical implications: These findings shed light on the current characteristics of OPR. Given the policy that authors are encouraged to make their articles' review history public and referees are encouraged to sign their review reports, the three years of PeerJ review data demonstrate that there is still some reluctance by authors to make their reviews public and by reviewers to identify themselves. Originality/value: This is the first study to closely examine PeerJ as an example of an OPR model journal. As Open Science moves further towards open research, OPR is a final and critical component. Research in this area must identify the best policies and paths towards a transparent and open peer review process for scientific communication.