Does deep learning help topic extraction? A kernel k-means clustering method with word embedding
© 2018 All rights reserved. Topic extraction presents challenges for the bibliometric community, and its performance still depends heavily on human intervention and the application area. This paper proposes a novel kernel k-means clustering method, incorporating a word embedding model, to effectively extract topics from bibliometric data. Experimental comparisons with four clustering baselines (k-means, fuzzy c-means, principal component analysis, and topic models) on two bibliometric datasets demonstrate its effectiveness both across a relatively broad range of disciplines and within a given domain. An empirical study of topic extraction from articles published in three top-tier bibliometric journals between 2000 and 2017, supported by expert knowledge-based evaluations, provides supplementary evidence of the method's ability. This empirical analysis also reveals insights into the overlapping and diverging research interests of the three journals that would benefit journal publishers, editorial boards, and research communities.
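The clustering core of the method above, kernel k-means over word-embedding vectors, can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the RBF kernel, the deterministic round-robin initialization, and all parameter values are assumptions for the example.

```python
import numpy as np

def rbf_kernel(X, gamma=1.0):
    """Pairwise RBF kernel matrix for row-vector embeddings X."""
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2 * X @ X.T
    return np.exp(-gamma * d2)

def kernel_kmeans(K, n_clusters, n_iter=50):
    """Kernel k-means on a precomputed kernel matrix K."""
    n = K.shape[0]
    labels = np.arange(n) % n_clusters  # deterministic round-robin init
    for _ in range(n_iter):
        dist = np.zeros((n, n_clusters))
        for c in range(n_clusters):
            mask = labels == c
            m = mask.sum()
            if m == 0:
                dist[:, c] = np.inf  # empty cluster: never chosen
                continue
            # squared feature-space distance to the cluster centroid
            dist[:, c] = (np.diag(K)
                          - 2 * K[:, mask].sum(axis=1) / m
                          + K[np.ix_(mask, mask)].sum() / m ** 2)
        new = dist.argmin(axis=1)
        if np.array_equal(new, labels):
            break
        labels = new
    return labels
```

On two well-separated groups of document vectors the method recovers the grouping, which is the behaviour the paper relies on at a much larger scale.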
Characterizing the potential of being emerging generic technologies: A Bi-Layer Network Analytics-based Prediction Method
© 2019 17th International Conference on Scientometrics and Informetrics, ISSI 2019 - Proceedings. All rights reserved. Despite the deep involvement of bibliometrics in profiling technological landscapes and identifying emerging topics, how to predict potential technological change remains unclear. This paper proposes a bi-layer network analytics-based prediction method to characterize the potential of technologies to become emerging generic technologies. First, drawing on the innovation literature, three technological characteristics are defined and quantified by topological indicators from network analytics. A link prediction approach is then applied to reconstruct the network with weighted missing links; this reconstruction also changes the related technological characteristics. Finally, comparing the two ranking lists of terms helps identify potential emerging generic technologies. A case study on predicting emerging generic technologies in information science demonstrates the feasibility and reliability of the proposed method.
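As a rough illustration of the link-prediction step, a common-neighbour score (one of the simplest topological indicators) can rank absent links, and the top-ranked links can be added back to reconstruct the network. This sketch is an assumption for illustration only; the paper's actual indicators and weighting scheme may differ.

```python
from itertools import combinations

def common_neighbor_scores(adj):
    """Score each absent edge by the number of shared neighbours.

    adj maps each node to the set of its neighbours.
    """
    scores = {}
    for u, v in combinations(sorted(adj), 2):
        if v in adj[u]:
            continue  # existing edge, nothing to predict
        s = len(adj[u] & adj[v])
        if s:
            scores[(u, v)] = s
    return scores

def add_predicted_links(adj, top_k=1):
    """Reconstruct the network by adding the top-k predicted links."""
    new = {u: set(nbrs) for u, nbrs in adj.items()}
    ranked = sorted(common_neighbor_scores(adj).items(),
                    key=lambda kv: -kv[1])[:top_k]
    for (u, v), _ in ranked:
        new[u].add(v)
        new[v].add(u)
    return new
```

Comparing node degrees (or any other topological indicator) before and after `add_predicted_links` mirrors the paper's comparison of the two term rankings.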
TRUTH AND VALIDITY OF KNOWLEDGE FROM THE THEORY OF DECOYS BASED ON ALLEGEDLY FALSE PREMISES
The aim of the study was to describe the truth and validity of knowledge from the theory of decoys based on allegedly false premises. The study was carried out between January and February 2022 using the ScienceDirect database (Editorial Elsevier) and the search equation, without filters: the truth of knowledge. The corresponding articles were selected through non-probabilistic convenience sampling: 1st) period 2019-2021, and 2nd) review articles. Then, systematic random probabilistic sampling with a step of 5 in the arithmetic progression was carried out, selecting 35 articles (10% of the total) for the conceptual analysis of truth and validity. The study considered the recognition of hypothesis testing at a near-maximal level (0.01), assessed in the process of searching for truth and the existence of validity, and the test of two decoys of false premises (P1*, P2* and P1**, P2**), since their logical-reasoning analysis determines that they are true. It is concluded that the truth and validity of knowledge can follow from accepting or rejecting the premises, but one criterion of judgment for the decision to test is to indicate any determination where rejection is considered from the interpretive logic itself, between what is selected to prove and the premises thought to be "allegedly false."
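The systematic sampling described above (every 5th item of the arithmetic progression after a random start) can be sketched as follows; the function name and seeding are illustrative assumptions, not the authors' procedure.

```python
import random

def systematic_sample(items, step=5, seed=None):
    """Pick every `step`-th item starting from a random offset."""
    rng = random.Random(seed)
    start = rng.randrange(step)  # random entry point into the progression
    return items[start::step]
```

Applied to a list of N candidate articles, this yields roughly N/step items, matching the ~10% selection rate reported for step 5 on a pool ten times the sample size... only approximately, of course, since the exact count depends on the offset.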
Semi-automated extraction of research topics and trends from NCI funding in radiological sciences from 2000-2020
Investigators, funders, and the public desire knowledge on topics and trends in publicly funded research, but current efforts in manual categorization are limited in scale and understanding. We developed a semi-automated approach to extract and name research topics, and applied it to $1.9B of NCI funding over 21 years in the radiological sciences to determine micro- and macro-scale research topics and funding trends. Our method relies on sequential clustering of existing biomedical word embeddings, naming by subject matter experts, and visualization to discover trends at a macroscopic scale above individual topics. We present results using 15 and 60 cluster topics, where we found that 2D projection of grant embeddings reveals two dominant axes: physics-biology and therapeutic-diagnostic. For our dataset, we found that funding for therapeutics- and physics-based research has outpaced diagnostics- and biology-based research, respectively. We hope these results may (1) give funders insight into the appropriateness of their funding allocation, (2) assist investigators in contextualizing their work and exploring neighboring research domains, and (3) allow the public to review where their tax dollars are being allocated.
Comment: Presented at the American Society for Radiation Oncology annual meeting in 2021 (doi: 10.1016/j.ijrobp.2021.07.263) and the Practical Big Data Workshop 202
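The 2D projection of grant embeddings mentioned above can be illustrated with plain PCA via SVD. This is a generic sketch, not the authors' pipeline, and the embedding matrix used here is synthetic.

```python
import numpy as np

def project_2d(E):
    """Project embedding rows onto their first two principal axes."""
    centered = E - E.mean(axis=0)
    # Right singular vectors of the centered matrix are the principal axes,
    # ordered by explained variance.
    _, _, Vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ Vt[:2].T
```

The two output columns are the "dominant axes" of the embedding cloud; in the paper these turned out to be interpretable as physics-biology and therapeutic-diagnostic.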
The CSO Classifier: Ontology-Driven Detection of Research Topics in Scholarly Articles
Classifying research papers according to their research topics is an important task for improving their retrievability, assisting the creation of smart analytics, and supporting a variety of approaches for analysing and making sense of the research environment. In this paper, we present the CSO Classifier, a new unsupervised approach for automatically classifying research papers according to the Computer Science Ontology (CSO), a comprehensive ontology of research areas in the field of Computer Science. The CSO Classifier takes as input the metadata associated with a research paper (title, abstract, keywords) and returns a selection of research concepts drawn from the ontology. The approach was evaluated on a gold standard of manually annotated articles, yielding a significant improvement over alternative methods.
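A minimal sketch of ontology-driven concept detection, assuming the ontology is available as a flat set of lowercase concept labels; the real CSO Classifier combines syntactic matching with semantic (embedding-based) matching, so this only illustrates the simpler syntactic half.

```python
def extract_concepts(text, ontology_concepts, max_ngram=3):
    """Return ontology concepts appearing as n-grams in the text.

    ontology_concepts: set of lowercase concept labels.
    """
    tokens = text.lower().split()
    found = set()
    for n in range(1, max_ngram + 1):
        for i in range(len(tokens) - n + 1):
            gram = " ".join(tokens[i:i + n])
            if gram in ontology_concepts:
                found.add(gram)
    return found
```

In practice the text would be the concatenated title, abstract, and keywords of the paper, and matched concepts would then be expanded along the ontology's hierarchy.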
Hierarchical topic tree: A hybrid model comprising network analysis and density peak search
Topic hierarchies can help researchers to develop a quick and concise understanding of the main themes and concepts in a field of interest. This is especially useful for newcomers to a field or those with a passing need for basic knowledge of a research landscape. Yet, despite a plethora of studies into hierarchical topic identification, the literature still lacks a model that is comprehensive enough or adaptive enough to extract the topics from a corpus, deal with the concepts shared by multiple topics, arrange the topics in a hierarchy, and give each topic an appropriate name. Hence, this paper presents a one-stop framework for generating fully-conceptualized hierarchical topic trees. First, we generate a co-occurrence network based on key terms extracted from a corpus of documents. Then a density peak search algorithm is developed and applied to identify the core topic terms, which are subsequently used as topic labels. An overlapping community allocation algorithm follows to detect topics and possible overlaps between them. Lastly, the density peak search and overlapping community allocation algorithms run recursively to structure the topics into a hierarchical tree. The feasibility, reliability, and extensibility of the proposed framework are demonstrated through a case study on the field of computer science.
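The density peak search step can be illustrated with the two classic quantities it relies on: each point's local density and its separation distance from any denser point (core topic terms are those with both values high). The cutoff distance and the toy distance matrix below are assumptions for the example, not the paper's configuration.

```python
import numpy as np

def density_peaks(D, d_c):
    """Local density and separation from a pairwise distance matrix D.

    rho[i]: number of points within cutoff d_c of point i.
    delta[i]: distance to the nearest point with higher density.
    """
    rho = (D < d_c).sum(axis=1) - 1          # exclude the point itself
    delta = np.empty(len(rho), dtype=float)
    order = np.argsort(-rho, kind="stable")  # stable: deterministic ties
    for rank, i in enumerate(order):
        if rank == 0:
            delta[i] = D[i].max()            # densest point: use max distance
        else:
            delta[i] = D[i, order[:rank]].min()
    return rho, delta
```

Points with large `rho` and large `delta` are isolated density peaks, i.e. candidate core topic terms around which communities are then allocated.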
Truth and validity of knowledge. Premises for management consulting.
This research shows relevant information to get into the current situation of MSMEs in Latin America regarding the use of digital marketing strategies. The interest in exploring this topic is due to the relevance of the use of digital media, since currently many micro and small businesses are still afraid to enter these areas because they have limited knowledge of their properties and management. This research seeks to promote the continued adoption of these digital tools, by applying appropriate e-marketing solutions to ensure the sustained success of the business over time and thus be prepared to face the challenges of the environment. For the selection of the bibliographic sources, different filtering was used, choosing studies relevant to our topic published from 2016 to 2021. At the end of this process, a total of 12 indexed journal articles, 2 undergraduate theses, and 1 book were obtained. According to the results found, in most of the countries analyzed, large companies are the ones that have obtained the greatest benefit from digital marketing; micro and small companies, on the other hand, have not yet been able to apply this type of digital tool efficiently, due to their unfamiliarity with them and the scarcity of their budgets.
The study aims to synthesize arguments about truth and scientific knowledge, drawing on theories and notes from different authors, in order to describe the process of validating knowledge as a research outcome. Through three search engines (ScienceDirect, Scopus, and SciELO), a detailed exploration was carried out with the search equation in Spanish and English: "truth", "cognitive validity", "scientific knowledge", and "management consulting". Twenty-nine articles from 28 scientific journals were reviewed for the conceptual analysis of the keywords, considering the contrast between their relationships in order to construct premises for the instrumentation and implementation of management consulting. Starting from empirical knowledge, an analysis is carried out for the construction of scientific knowledge, defining the superior qualities among them. The relationship between knowledge and business consulting management is confirmed, constituting premises in knowledge management, because the referential analyses studied constitute contributions that allow the management consultant to contribute to planning and strategic conception in the integral management of organizational processes. The study concludes with supposed premises to accept or reject, based on the need for interaction among different disciplines to validate knowledge as an original source for management.
Data for: Does deep learning help topic extraction? A kernel k-means clustering method with word embedding
The 4770 dataset includes 4770 articles from the Web of Science database, covering 10 disciplines such as artificial intelligence, business, history, and chemistry. The 577 dataset includes 577 proposals granted by the National Science Foundation of the United States; all 577 proposals are within the area of computer science but belong to different subareas of computer science. The 6767 dataset includes 6767 articles published in the Journal of the Association for Information Science and Technology, Journal of Informetrics, and Scientometrics from 2000 to 2016. No labels are given for this dataset. THIS DATASET IS ARCHIVED AT DANS/EASY, BUT NOT ACCESSIBLE HERE. TO VIEW A LIST OF FILES AND ACCESS THE FILES IN THIS DATASET, CLICK ON THE DOI-LINK ABOVE.
Dynamic network analytics for recommending scientific collaborators
Collaboration is one of the most important contributors to scientific advancement and a crucial aspect of an academic's career. However, the explosion in academic publications has, for some time, been making it more challenging to find suitable research partners. Recommendation approaches to help academics find potential collaborators are not new. However, the existing methods operate on static data, which can render many suggestions less useful or out of date. The approach presented in this paper simulates a dynamic network from static data to gain further insights into the changing research interests, activities, and co-authorships of scholars in a field, all insights that can improve the quality of the recommendations produced. Following a detailed explanation of the entire framework, from data collection through to recommendation modelling, we provide a case study on the field of information science to demonstrate the reliability of the proposed method. The results provide empirical insights to support decision-making by related stakeholders, e.g., scientific funding agencies, research institutions, and individual researchers in the field.
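A toy sketch of simulating a dynamic network from static data: static publication records are sliced into yearly co-authorship snapshots, and candidate collaborators are then ranked within a chosen snapshot by shared co-authors. The record format and the scoring rule are illustrative assumptions, not the paper's model.

```python
from collections import defaultdict
from itertools import combinations

def yearly_snapshots(records):
    """records: iterable of (year, [authors]); returns year -> adjacency."""
    snaps = defaultdict(lambda: defaultdict(set))
    for year, authors in records:
        for a, b in combinations(sorted(set(authors)), 2):
            snaps[year][a].add(b)
            snaps[year][b].add(a)
    return snaps

def recommend(snaps, year, author, top_k=3):
    """Rank non-coauthors by shared collaborators in a year's snapshot."""
    adj = snaps[year]
    scores = {}
    for other in adj:
        if other == author or other in adj[author]:
            continue  # skip self and existing co-authors
        s = len(adj[author] & adj[other])
        if s:
            scores[other] = s
    return sorted(scores, key=lambda k: (-scores[k], k))[:top_k]
```

Comparing recommendations across successive yearly snapshots is what makes the network "dynamic": a partner who looks promising in the latest slice may not have appeared at all in earlier ones.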