4,849 research outputs found

    ENHANCING LITERATURE REVIEW METHODS - TOWARDS MORE EFFICIENT LITERATURE RESEARCH WITH LATENT SEMANTIC INDEXING

    Nowadays, easier access to increasing amounts of information and scientific resources means that more and more effort is required to conduct comprehensive literature reviews. Literature search, as a fundamental, complex, and time-consuming step in every literature research process, is part of many established scientific methods. However, it is still predominantly supported by search techniques based on conventional term-matching methods. We address the lack of semantic approaches in this context by proposing an enhancement of established literature review methods. For this purpose, we followed design science research (DSR) principles in order to develop artifacts and implement a prototype of our Tool for Semantic Indexing and Similarity Queries (TSISQ) based on the core concepts of latent semantic indexing (LSI). Its applicability is demonstrated and evaluated in a case study. Results indicate that the presented approach can help save valuable time in finding basic literature in a desired research field or increase the comprehensiveness of a review by efficiently identifying sources that otherwise would not have been taken into account. The target audience for our findings includes researchers who need to efficiently gain an overview of a specific research field, deepen their knowledge, or refine the theoretical foundations of their research.

    Temporal word embeddings for dynamic user profiling in Twitter

    The research described in this paper focused on exploring the domain of user profiling, a nascent and contentious technology which has been steadily attracting increased interest from the research community as its potential for providing personalised digital services is realised. An extensive review of related literature revealed that limited research has been conducted into how temporal aspects of users can be captured using user profiling techniques. This, coupled with the notable lack of research into the use of word embedding techniques to capture temporal variances in language, revealed an opportunity to extend the Random Indexing word embedding technique such that the interests of users could be modelled based on their use of language. To achieve this, this work extended an existing implementation of Temporal Random Indexing to model Twitter users across multiple granularities of time based on their use of language. The product of this is a novel technique for temporal user profiling, where a set of vectors is used to describe the evolution of a Twitter user’s interests over time through their use of language. The vectors produced were evaluated against a temporal implementation of another state-of-the-art word embedding technique, the Word2Vec Dynamic Independent Skip-gram model, where it was found that Temporal Random Indexing outperformed Word2Vec in the generation of temporal user profiles.
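
The general idea of profiling a user per time slice with Random Indexing can be illustrated with a minimal sketch. This is not the implementation from the paper above: the vector size `DIM`, the sparsity `NONZERO`, the hash-based seeding, and the helper names (`index_vector`, `profile`, `temporal_profiles`) are all assumptions made for illustration. Each word gets a fixed sparse random vector that is shared across all time slices, so the per-slice profile vectors remain comparable.

```python
import hashlib
import math
import random

DIM = 64      # length of each random vector (assumed value)
NONZERO = 4   # number of +1/-1 entries per index vector (assumed value)


def index_vector(word):
    """Return a sparse ternary random vector that is always the same for a word."""
    digest = hashlib.sha256(word.encode("utf-8")).digest()
    rng = random.Random(int.from_bytes(digest[:8], "big"))
    vec = [0] * DIM
    for pos in rng.sample(range(DIM), NONZERO):
        vec[pos] = rng.choice((-1, 1))
    return vec


def profile(tokens):
    """Sum the index vectors of all tokens into one profile vector."""
    vec = [0] * DIM
    for tok in tokens:
        for i, value in enumerate(index_vector(tok)):
            vec[i] += value
    return vec


def temporal_profiles(tokens_by_period):
    """Build one profile vector per time slice (hypothetical input layout)."""
    return {period: profile(tokens) for period, tokens in tokens_by_period.items()}


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Toy usage: compare a user's profile across two months.
profiles = temporal_profiles({
    "2020-01": ["football", "goal", "league"],
    "2020-02": ["election", "vote", "football"],
})
drift = 1.0 - cosine(profiles["2020-01"], profiles["2020-02"])
```

Because the index vectors are fixed per word rather than per time slice, a change in `drift` reflects a change in the words used, not a change in the random projection.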

    TOPIC MODELLING METHODOLOGY: ITS USE IN INFORMATION SYSTEMS AND OTHER MANAGERIAL DISCIPLINES

    Over the last decade, quantitative text mining approaches to content analysis have gained increasing traction within information systems research, and related fields, such as business administration. Recently, topic models, which are supposed to provide their user with an overview of themes being discussed in documents, have gained popularity. However, while convenient tools for the creation of this model class exist, the evaluation of topic models poses significant challenges to their users. In this research, we investigate how questions of model validity and trustworthiness of presented analyses are addressed across disciplines. We accomplish this by providing a structured review of methodological approaches across the Financial Times 50 journal ranking. We identify 59 methodological research papers, 24 implementations of topic models, as well as 33 research papers using topic models in Information Systems (IS) research, and 29 papers using such models in other managerial disciplines. Results indicate a need for model implementations usable by a wider audience, as well as the need for more implementations of model validation techniques, and the need for a discussion about the theoretical foundations of topic-modelling-based research.

    A Comprehensive Review of the Three Main Topic Modeling Algorithms and Challenges in Albanian Employability Skills

    Today’s jobseekers face many obstacles while trying to find a career that aligns with their interests, employability soft skills, and professional experience. In Albania, jobseekers frequently initiate their job search by actively exploring job vacancies listed on various online job portals. The analysis of job vacancies posted online provides an added advantage to the labour market actors compared to traditional survey-based analyses. This is because it enables a faster analytical process and promotes decision-making based on accurate data, and it should be carefully considered by every country when formulating its labour market policies. Since the data posted online are unlabelled, it has been proven that the potential of unsupervised learning techniques, more precisely the Topic Modelling algorithms, is outstanding when applied to analysing job vacancies, mainly with regard to assessing employability soft skills. Algorithms in topic modelling are essential for uncovering hidden patterns in texts, facilitating the extraction of important data, generating document summaries, and enhancing content comprehension. This paper analyses and compares the three primary methodologies and algorithms used in topic modelling, which can be applied to analyse employability soft skills: Latent Semantic Analysis (LSA), Latent Dirichlet Allocation (LDA), and BERTopic. At the end of the paper, conclusions are drawn regarding superior performance and optimal algorithm applicability, challenges, and limitations through a review of studies conducted in the Albanian job market.

    Semantics-based clustering approach for similar research area detection

    The manual process of searching out individuals in an already existing research field is cumbersome and time-consuming. Prominent and rookie researchers alike are predisposed to seek existing research publications in a research field of interest before coming up with a thesis. From extant literature, automated similar research area detection systems have been developed to solve this problem. However, most of them use keyword-matching techniques, which do not sufficiently capture the implicit semantics of keywords, thereby leaving out some research articles. In this study, we propose the use of ontology-based pre-processing, Latent Semantic Indexing, and K-Means clustering to develop a prototype similar research area detection system that can be used to determine similar research domain publications. Our proposed system solves the challenge of high dimensionality and data sparsity faced by the traditional document clustering technique. Our system is evaluated with randomly selected publications from faculties in Nigerian universities, and results show that the integration of ontologies in pre-processing provides more accurate clustering results.

    Information Retrieval Performance Enhancement Using The Average Standard Estimator And The Multi-criteria Decision Weighted Set

    Information retrieval is much more challenging than traditional small document collection retrieval. The main difference is the importance of correlations between related concepts in complex data structures. These structures have been studied by several information retrieval systems. This research began by performing a comprehensive review and comparison of several techniques of matrix dimensionality estimation and their respective effects on enhancing retrieval performance using singular value decomposition and latent semantic analysis. Two novel techniques have been introduced in this research to enhance intrinsic dimensionality estimation: the Multi-criteria Decision Weighted model to estimate matrix intrinsic dimensionality for large document collections, and the Average Standard Estimator (ASE) for estimating data intrinsic dimensionality based on the singular value decomposition (SVD). ASE estimates the level of significance for singular values resulting from the singular value decomposition. ASE assumes that those variables with deep relations have sufficient correlation and that only those relationships with high singular values are significant and should be maintained. Experimental results over all possible dimensions indicated that ASE improved matrix intrinsic dimensionality estimation by including the effect of both the singular values' magnitude of decrease and random noise distracters. Analysis based on selected performance measures indicates that for each document collection there is a region of lower dimensionalities associated with improved retrieval performance. However, there was clear disagreement between the various performance measures on the model associated with best performance. The introduction of the multi-weighted model and Analytical Hierarchy Processing (AHP) analysis helped in ranking dimensionality estimation techniques and facilitated satisfying overall model goals by leveraging contradicting constraints and satisfying information retrieval priorities. ASE provided the best estimate for MEDLINE intrinsic dimensionality among all dimensionality estimation techniques, and further, ASE improved precision and relative relevance by 10.2% and 7.4% respectively. AHP analysis indicates that ASE and the weighted model ranked the best among other methods with 30.3% and 20.3% in satisfying overall model goals in MEDLINE and 22.6% and 25.1% for CRANFIELD. The weighted model improved MEDLINE relative relevance by 4.4%, while the scree plot, weighted model, and ASE provided better estimation of data intrinsic dimensionality for the CRANFIELD collection than Kaiser-Guttman and percentage of variance. The ASE dimensionality estimation technique provided a better estimation of CISI intrinsic dimensionality than all other tested methods, since all methods except ASE tend to underestimate the intrinsic dimensionality of the CISI document collection. ASE improved CISI average relative relevance and average search length by 28.4% and 22.0% respectively. This research provided evidence supporting a system using a weighted multi-criteria performance evaluation technique resulting in better overall performance than a single-criterion ranking model. Thus, the weighted multi-criteria model with dimensionality reduction provides a more efficient implementation for information retrieval than using a full rank model.
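
Two of the baseline estimators named in the abstract above, Kaiser-Guttman and percentage of variance, can be sketched directly from a list of singular values. This is a rough illustration under stated assumptions, not the paper's code and not ASE itself, which the abstract does not specify in enough detail to reproduce. The function names, the use of squared singular values as eigenvalues, the mean-eigenvalue cut-off for Kaiser-Guttman, and the default 0.9 threshold are all assumptions.

```python
def kaiser_guttman(singular_values):
    """Count components whose eigenvalue exceeds the mean eigenvalue.

    Eigenvalues are taken as squared singular values (assumption).
    """
    if not singular_values:
        raise ValueError("need at least one singular value")
    eigenvalues = [s * s for s in singular_values]
    mean = sum(eigenvalues) / len(eigenvalues)
    return sum(1 for e in eigenvalues if e > mean)


def percentage_of_variance(singular_values, threshold=0.9):
    """Return the smallest k whose top-k eigenvalues reach the variance threshold."""
    if not singular_values:
        raise ValueError("need at least one singular value")
    eigenvalues = sorted((s * s for s in singular_values), reverse=True)
    total = sum(eigenvalues)
    running = 0.0
    for k, e in enumerate(eigenvalues, start=1):
        running += e
        if running / total >= threshold:
            return k
    return len(eigenvalues)


# Toy singular values (made up for illustration, not from any collection).
sigma = [10.0, 6.0, 2.0, 1.0, 1.0]
k_kg = kaiser_guttman(sigma)                 # eigenvalues 100, 36 exceed the mean 28.4
k_pv = percentage_of_variance(sigma, 0.9)    # 136 / 142 is the first share >= 0.9
```

On this toy input both rules keep two dimensions; on real term-document matrices the estimators can disagree, which is the situation a ranking across several criteria is meant to resolve.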