Search CORE

2 research outputs found

Latent Dirichlet Allocation (LDA) and Topic modeling: models, applications, a survey

Author: Feng Xia
Jelodar Hamed
Jiang Xiahui
Li Yanchao
Wang Yongli
Yuan Chi
Zhao Liang
Publication venue
Publication date: 05/12/2018
Field of study

Topic modeling is one of the most powerful techniques in text mining for data mining, latent data discovery, and finding relationships among data, text documents. Researchers have published many articles in the field of topic modeling and applied in various fields such as software engineering, political science, medical and linguistic science, etc. There are various methods for topic modeling, which Latent Dirichlet allocation (LDA) is one of the most popular methods in this field. Researchers have proposed various models based on the LDA in topic modeling. According to previous work, this paper can be very useful and valuable for introducing LDA approaches in topic modeling. In this paper, we investigated scholarly articles highly (between 2003 to 2016) related to Topic Modeling based on LDA to discover the research development, current trends and intellectual structure of topic modeling. Also, we summarize challenges and introduce famous tools and datasets in topic modeling based on LDA.Comment: arXiv admin note: text overlap with arXiv:1505.07302 by other author

arXiv.org e-Print Archive

Towards FAIR protocols and workflows: The OpenPREDICT case study

Author: Ayyar Sandeep
Celebi Remzi
Dumontier Michel
Hassan Ahmed A.
Kuhn Tobias
Moreira Joao Rebelo
Ridder Lars
Publication venue
Publication date: 20/11/2019
Field of study

It is essential for the advancement of science that scientists and researchers share, reuse and reproduce workflows and protocols used by others. The FAIR principles are a set of guidelines that aim to maximize the value and usefulness of research data, and emphasize a number of important points regarding the means by which digital objects are found and reused by others. The question of how to apply these principles not just to the static input and output data but also to the dynamic workflows and protocols that consume and produce them is still under debate and poses a number of challenges. In this paper we describe our inclusive and overarching approach to apply the FAIR principles to workflows and protocols and demonstrate its benefits. We apply and evaluate our approach on a case study that consists of making the PREDICT workflow, a highly cited drug repurposing workflow, open and FAIR. This includes FAIRification of the involved datasets, as well as applying semantic technologies to represent and store data about the detailed versions of the general protocol, of the concrete workflow instructions, and of their execution traces. A semantic model was proposed to better address these specific requirements and were evaluated by answering competency questions. This semantic model consists of classes and relations from a number of existing ontologies, including Workflow4ever, PROV, EDAM, and BPMN. This allowed us then to formulate and answer new kinds of competency questions. Our evaluation shows the high degree to which our FAIRified OpenPREDICT workflow now adheres to the FAIR principles and the practicality and usefulness of being able to answer our new competency questions.Comment: Preprint. Submitted to PeerJ on 13th November 2019. 3 appendixes as PDF file

arXiv.org e-Print Archive