70,756 research outputs found

Closing the loop: assisting archival appraisal and information retrieval in one sweep

Authors: Kim Y., Ross S.
Publication date: 01/01/2013

In this article, we examine the similarities between the concept of appraisal, a process that takes place within the archives, and the concept of relevance judgement, a process fundamental to the evaluation of information retrieval systems. More specifically, we revisit selection criteria proposed as a result of archival research and work within the digital curation communities, and compare them to relevance criteria as discussed within information retrieval's literature-based discovery. We illustrate how closely these criteria relate to each other and discuss how understanding the relationships between these disciplines could form a basis for proposing automated selection for archival processes and initiating multi-objective learning with respect to information retrieval.

Crossref

Enlighten

Using Machine Learning and Natural Language Processing to Review and Classify the Medical Literature on Cancer Susceptibility Genes

Authors: Acevedo Francisco, Armengol Victor Diego, Bao Yujia, Barzilay Regina, Braun Danielle, Deng Zhengyi, Hughes Kevin S, Kim Heeyoon, Ouardaoui Nofal, Parmigiani Giovanni, Wang Cathy, Wang Yan
Publication date: 24/04/2019

PURPOSE: The medical literature relevant to germline genetics is growing exponentially. Clinicians need tools for monitoring and prioritizing the literature to understand the clinical implications of pathogenic genetic variants. We developed and evaluated two machine learning models to classify abstracts as relevant to the penetrance (risk of cancer for germline mutation carriers) or prevalence of germline genetic mutations. METHODS: We conducted literature searches in PubMed and retrieved paper titles and abstracts to create an annotated dataset for training and evaluating the two machine learning classification models. Our first model is a support vector machine (SVM), which learns a linear decision rule based on the bag-of-ngrams representation of each title and abstract. Our second model is a convolutional neural network (CNN), which learns a complex nonlinear decision rule based on the raw title and abstract. We evaluated the performance of the two models on the classification of papers as relevant to penetrance or prevalence. RESULTS: For penetrance classification, we annotated 3740 paper titles and abstracts and used 60% for training the model, 20% for tuning the model, and 20% for evaluating the model. The SVM model achieves 89.53% accuracy (percentage of papers that were correctly classified), while the CNN model achieves 88.95% accuracy. For prevalence classification, we annotated 3753 paper titles and abstracts. The SVM model achieves 89.14% accuracy, while the CNN model achieves 89.13% accuracy. CONCLUSION: Our models achieve high accuracy in classifying abstracts as relevant to penetrance or prevalence. By facilitating literature review, this tool could help clinicians and researchers keep abreast of the burgeoning knowledge of gene-cancer associations and keep the knowledge bases for clinical decision support tools up to date.
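
The data preparation and evaluation steps named in this abstract (bag-of-ngrams features, a 60/20/20 split, and accuracy) can be illustrated with a minimal Python sketch. This is not the authors' code: the function names, whitespace tokenization, the choice of unigrams plus bigrams, and the fixed shuffle seed are all assumptions made for illustration, and the SVM and CNN models themselves are not reproduced.

```python
import random
from collections import Counter


def bag_of_ngrams(text, max_n=2):
    """Count word n-grams of length 1..max_n in a lowercased text.

    Whitespace tokenization and max_n=2 are illustrative assumptions.
    """
    tokens = text.lower().split()
    counts = Counter()
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            counts[" ".join(tokens[i:i + n])] += 1
    return counts


def split_60_20_20(items, seed=0):
    """Shuffle a copy of items and split it into train/tune/test parts."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    # Integer arithmetic avoids floating-point rounding of the sizes.
    n_train = len(shuffled) * 60 // 100
    n_tune = len(shuffled) * 20 // 100
    train = shuffled[:n_train]
    tune = shuffled[n_train:n_train + n_tune]
    test = shuffled[n_train + n_tune:]
    return train, tune, test


def accuracy(gold, predicted):
    """Fraction of items whose predicted label equals the gold label."""
    correct = sum(g == p for g, p in zip(gold, predicted))
    return correct / len(gold)
```

Under these assumptions, splitting the 3740 annotated penetrance abstracts yields 2244 training, 748 tuning, and 748 evaluation items. A linear classifier would then be fitted on the n-gram counts of the training part only.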

arXiv.org e-Print Archive

DSpace@MIT

Methods for Classifying Nonprofit Organizations According to their Field of Activity: A Report on Semi-automated Methods Based on Text

Authors: Karner Dominik, Litofcenko Julia, Maier Florentine
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2020

There are various methods for classifying nonprofit organizations (NPOs) according to their field of activity. We report our experiences using two semi-automated methods based on textual data: rule-based classification and machine learning with curated keywords. We use those methods to classify Austrian nonprofit organizations based on the International Classification of Nonprofit Organizations. Those methods can provide a solution to the widespread research problem that quantitative data on the activities of NPOs are needed but not readily available from administrative data, long high-quality texts describing NPOs' activities are mostly unavailable, and human labor resources are limited. We find that in such a setting, rule-based classification performs about as well as manual human coding in terms of precision and sensitivity, while being much more labor-saving. Hence, we share our insights on how to efficiently implement such a rule-based approach. To address scholars with a background in data analytics as well as those without, we provide non-technical explanations and open-source sample code that is free to use and adapt.
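
As a rough illustration of what rule-based classification with curated keywords might look like, the following Python sketch assigns a field of activity by keyword match and computes precision and sensitivity for one field. It is not the authors' published sample code: the rule table, field labels, keywords, and function names are hypothetical and are not taken from the report or from the International Classification of Nonprofit Organizations.

```python
# Hypothetical keyword rules; the field names and keywords are
# illustrative only and do not come from the report or the ICNPO.
KEYWORD_RULES = {
    "sports": ["football", "tennis", "sports club"],
    "health": ["hospital", "nursing", "ambulance"],
}


def classify(name, rules=KEYWORD_RULES, default="unclassified"):
    """Return the first field whose keyword occurs in the name."""
    lowered = name.lower()
    for field, keywords in rules.items():
        if any(keyword in lowered for keyword in keywords):
            return field
    return default


def precision_and_sensitivity(gold, predicted, field):
    """Precision and sensitivity (recall) for a single field."""
    pairs = list(zip(gold, predicted))
    tp = sum(g == field and p == field for g, p in pairs)
    fp = sum(g != field and p == field for g, p in pairs)
    fn = sum(g == field and p != field for g, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    return precision, sensitivity
```

In this sketch, rules are checked in dictionary order and the first match wins, so an organization name matching several fields is assigned only one. Comparing the rule output against a manually coded sample, as the abstract describes, is what the second function is meant to support.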

Elektronische Publikationen der Wirtschaftsuniversität Wien

Using a neural network-based feature extraction method to facilitate citation screening for systematic reviews

Authors: Kontonatsios Georgios, Korkontzelos Yannis, Matthew Peter, Spencer Sally
Publication venue: Elsevier BV
Publication date: 01/07/2020

Edge Hill University Research Information Repository