Latent Dirichlet Allocation (LDA) and Topic modeling: models, applications, a survey
Topic modeling is one of the most powerful techniques in text mining for data
mining, latent data discovery, and finding relationships among data and text
documents. Researchers have published many articles in the field of topic
modeling and have applied it in various fields such as software engineering,
political science, medical and linguistic science, etc. There are various
methods for topic modeling, of which Latent Dirichlet Allocation (LDA) is one
of the most popular. Researchers have proposed various topic models based on
LDA. In light of previous work, this paper can be useful for introducing LDA
approaches in topic modeling. In this paper, we investigated scholarly articles
(published between 2003 and 2016) closely related to topic modeling based on
LDA in order to discover the research development, current trends, and
intellectual structure of topic modeling. We also summarize challenges and
introduce well-known tools and datasets for topic modeling based on LDA.
Comment: arXiv admin note: text overlap with arXiv:1505.07302 by other author
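For readers new to LDA, the sketch below shows one common way to fit the model: collapsed Gibbs sampling, where each word's topic is resampled from its full conditional given all other assignments. This is a generic illustration, not a method taken from the survey. The function name `lda_gibbs`, the toy corpus, and the hyperparameter values (`alpha`, `beta`, `iterations`) are all hypothetical choices made only for the example, and it uses only the Python standard library.

```python
import random


def lda_gibbs(docs, num_topics, alpha=0.1, beta=0.01, iterations=200, seed=0):
    """Hypothetical helper: fit LDA to tokenized docs by collapsed Gibbs sampling.

    Returns (vocab, theta, phi), where theta[d][k] estimates p(topic k | doc d)
    and phi[k][v] estimates p(word vocab[v] | topic k).
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    V = len(vocab)

    # Count tables: n_{d,k}, n_{k,w}, and n_k.
    doc_topic = [[0] * num_topics for _ in docs]
    topic_word = [[0] * V for _ in range(num_topics)]
    topic_total = [0] * num_topics

    # Random initial topic assignment for every token.
    assignments = []
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            k = rng.randrange(num_topics)
            z_doc.append(k)
            doc_topic[d][k] += 1
            topic_word[k][word_id[w]] += 1
            topic_total[k] += 1
        assignments.append(z_doc)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v = word_id[w]
                k = assignments[d][i]
                # Remove the token's current assignment from the counts.
                doc_topic[d][k] -= 1
                topic_word[k][v] -= 1
                topic_total[k] -= 1
                # Full conditional: p(z=t | rest) is proportional to
                # (n_dt + alpha) * (n_tw + beta) / (n_t + V * beta).
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][v] + beta)
                    / (topic_total[t] + V * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                # Add the token back under its newly sampled topic.
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][v] += 1
                topic_total[k] += 1

    # Smoothed point estimates of the document-topic and topic-word distributions.
    theta = [
        [(doc_topic[d][k] + alpha) / (len(doc) + num_topics * alpha) for k in range(num_topics)]
        for d, doc in enumerate(docs)
    ]
    phi = [
        [(topic_word[k][v] + beta) / (topic_total[k] + V * beta) for v in range(V)]
        for k in range(num_topics)
    ]
    return vocab, theta, phi


# Toy corpus (made up for illustration): two loose themes.
docs = [
    ["gene", "protein", "cell", "gene"],
    ["vote", "election", "party", "vote"],
    ["protein", "cell", "gene"],
    ["election", "party", "vote"],
]
vocab, theta, phi = lda_gibbs(docs, num_topics=2)
for k, dist in enumerate(phi):
    top_words = [w for _, w in sorted(zip(dist, vocab), reverse=True)[:3]]
    print(f"topic {k}: {top_words}")
```

Collapsing out the Dirichlet-distributed parameters means the sampler only tracks integer counts, which keeps the sketch short. Production libraries typically use optimized samplers or variational inference instead.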
Towards FAIR protocols and workflows: The OpenPREDICT case study
It is essential for the advancement of science that scientists and
researchers share, reuse and reproduce workflows and protocols used by others.
The FAIR principles are a set of guidelines that aim to maximize the value and
usefulness of research data, and emphasize a number of important points
regarding the means by which digital objects are found and reused by others.
The question of how to apply these principles not just to the static input and
output data but also to the dynamic workflows and protocols that consume and
produce them is still under debate and poses a number of challenges. In this
paper we describe our inclusive and overarching approach to apply the FAIR
principles to workflows and protocols and demonstrate its benefits. We apply
and evaluate our approach on a case study that consists of making the PREDICT
workflow, a highly cited drug repurposing workflow, open and FAIR. This
includes FAIRification of the involved datasets, as well as applying semantic
technologies to represent and store data about the detailed versions of the
general protocol, of the concrete workflow instructions, and of their execution
traces. A semantic model was proposed to better address these specific
requirements and was evaluated by answering competency questions. This
semantic model consists of classes and relations from a number of existing
ontologies, including Workflow4ever, PROV, EDAM, and BPMN. This then allowed us
to formulate and answer new kinds of competency questions. Our evaluation shows
the high degree to which our FAIRified OpenPREDICT workflow now adheres to the
FAIR principles and the practicality and usefulness of being able to answer our
new competency questions.
Comment: Preprint. Submitted to PeerJ on 13th November 2019. 3 appendixes as
PDF file