9 research outputs found

    Author Matching Classification with Anomaly Detection Approach for Bibliomethric Repository Data

    Author name disambiguation (AND) is a complex problem in identifying authors in a digital library (DL). How well AND data can be classified depends heavily on the grouping and data processing applied before the data reaches the classifier. In general, author matching is pre-processed with pairwise and similarity techniques. On a sufficiently large data set, the pairwise technique used in this study forms combinations of the attributes in the AND dataset and defines a binary class for each author-matching combination: pairs of different authors are given a value of 0 and pairs of the same author a value of 1. The technique produces highly imbalanced data, with class 0 making up 98.9% of the data and class 1 only 1.1%. Class 1 can therefore be treated as an anomaly within the whole data set. For that reason, this study uses anomaly detection, with the Isolation Forest algorithm as its classifier. The reported accuracy reaches 99.5%.
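
The pairwise labelling step described in this abstract can be sketched roughly as follows. This is only an illustration under stated assumptions: the toy records, the field layout, and the helper name `label_pairs` are hypothetical and do not come from the paper, and the Isolation Forest classifier itself is not shown.

```python
from itertools import combinations

# Hypothetical toy records: (record_id, author_name, true_author_id).
# The paper's real dataset and attributes are not reproduced here.
records = [
    ("r1", "J. Smith", "a1"),
    ("r2", "John Smith", "a1"),
    ("r3", "J. Smith", "a2"),
    ("r4", "A. Kumar", "a3"),
    ("r5", "M. Rossi", "a4"),
]

def label_pairs(recs):
    """Return (id_a, id_b, label) for every unordered pair of records.

    label is 1 when both records belong to the same author, else 0.
    """
    return [
        (a[0], b[0], 1 if a[2] == b[2] else 0)
        for a, b in combinations(recs, 2)
    ]

pairs = label_pairs(records)
positives = sum(label for _, _, label in pairs)
# Prints: 10 pairs, 1 labelled 1, 9 labelled 0
print(f"{len(pairs)} pairs, {positives} labelled 1, {len(pairs) - positives} labelled 0")
```

Even on five records, only one of the ten pairs is a match. Because the number of pairs grows quadratically while matches stay rare, class 1 ends up as a small minority, which is consistent with the imbalance the abstract reports.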

    Author identification in bibliographic data using deep neural networks

    Author name disambiguation (AND) is a challenging task for scholars who mine bibliographic information for scientific knowledge. A constructive approach to resolving name ambiguity is to use computer algorithms to identify author names. Computer and data scientists have developed several algorithm-based disambiguation methods. Among them, supervised machine learning has been reported to produce decent to very accurate disambiguation results. This paper combines principal component analysis (PCA) for feature reduction with deep neural networks (DNNs) as a supervised algorithm for classifying AND problems. The raw data is grouped into four classes: synonyms, homonyms, homonyms-synonyms, and non-homonyms-synonyms. We tuned several hyperparameters, such as learning rate, batch size, and the number of neurons and hidden units, and analyzed their impact on the accuracy of the results. To the best of our knowledge, there are no previous studies with such a scheme. The proposed DNNs are compared with other ML techniques, namely Naïve Bayes, random forest (RF), and support vector machine (SVM), to produce a good classifier. Across all data, our proposed DNN classifier outperforms the other ML techniques, with accuracy, precision, recall, and F1-score of 99.98%, 97.98%, 97.86%, and 99.99%, respectively. In the future, this approach can be easily extended to any dataset and any bibliographic records provider.

    Neural network technique with deep structure for improving author homonym and synonym classification in digital libraries

    Author name disambiguation (AND), also known as name identification, has long been seen as a challenging issue in bibliographic data. The same author may appear under different names (synonyms), or distinct authors may share similar names (homonyms). Previous research has proposed approaches to the AND problem. To the best of our knowledge, however, no study has specifically discussed synonyms and homonyms, even though such cases are at the core of the AND topic. This paper presents the classification of non-homonym-synonym, homonym-synonym, synonym, and homonym cases using the DBLP computer science bibliography dataset. The classification of the DBLP raw data is carried out with deep neural networks (DNNs). For classification, the DBLP raw data is divided into five features: name, author, title, venue, and year. Twelve scenarios with different structures are designed to validate and select the best DNN model. Furthermore, this paper compares DNNs with other classifiers, namely support vector machine (SVM) and decision tree. The results show that DNNs outperform the SVM and decision tree methods in all performance metrics. The best model, a DNN with three hidden layers, achieves accuracy, sensitivity, specificity, precision, and F1-score of 98.85%, 95.95%, 99.26%, 94.80%, and 95.36%, respectively. In the future, DNNs are expected to perform better with automated feature representation in AND processing.

    Deep Neural Network Structure to Improve Individual Performance based Author Classification

    This paper proposes an improved method for the author name disambiguation problem, covering both homonyms and synonyms. The prepared data consists of the distances between each pair of authors' attributes, computed with the Levenshtein distance. Using deep neural networks, we found large gains in performance. The results show an accuracy of 99.6% with a small number of hidden layers.
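
As a rough illustration of the distance features mentioned in this abstract, the sketch below computes the Levenshtein distance between matching attributes of two records. The field names (`name`, `title`, `venue`), the toy records, and the helper `pair_features` are assumptions made for the example; the abstract does not list the attributes used, and the neural network itself is not shown.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance counting single-character insertions, deletions, substitutions."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution (or match)
            ))
        previous = current
    return previous[-1]

def pair_features(rec_a: dict, rec_b: dict, fields=("name", "title", "venue")) -> list:
    """One distance per attribute; the field list is a hypothetical choice."""
    return [levenshtein(rec_a[f], rec_b[f]) for f in fields]

# Hypothetical records, for illustration only.
rec_a = {"name": "J. Smith", "title": "Deep nets", "venue": "ICML"}
rec_b = {"name": "John Smith", "title": "Deep nets", "venue": "ICLR"}
features = pair_features(rec_a, rec_b)
print(features)
```

Each record pair becomes a short numeric vector, which is the kind of input a classifier such as a neural network could then consume.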

    A knowledge graph embeddings based approach for author name disambiguation using literals

    Scholarly data is growing continuously and contains information about articles from a plethora of venues, including conferences and journals. Many initiatives have been taken to make scholarly data available in the form of Knowledge Graphs (KGs). These efforts to standardize the data and make it accessible have also led to many challenges, such as exploring scholarly articles and resolving ambiguous authors. This study specifically targets the problem of Author Name Disambiguation (AND) on scholarly KGs and presents a novel framework, Literally Author Name Disambiguation (LAND), which utilizes Knowledge Graph Embeddings (KGEs) using multimodal literal information generated from these KGs. The framework is based on three components: (1) multimodal KGEs, (2) a blocking procedure, and (3) hierarchical agglomerative clustering. Extensive experiments have been conducted on two newly created KGs: (i) a KG containing information from Scientometrics Journal from 1978 onwards (OC-782K), and (ii) a KG extracted from a well-known benchmark for AND provided by AMiner (AMiner-534K). The results show that our proposed architecture outperforms our baselines by 8–14% in terms of F1 score and shows competitive performance on a challenging benchmark such as AMiner. The code and the datasets are publicly available through GitHub (https://github.com/sntcristian/and-kge) and Zenodo (https://doi.org/10.5281/zenodo.6309855), respectively.
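
The second and third components listed in this abstract (blocking, then hierarchical agglomerative clustering) can be sketched in plain Python as below. This is not the LAND implementation: the blocking rule (last name plus first initial), average linkage with cosine distance, the `threshold` value, and the hand-written two-dimensional vectors standing in for knowledge graph embeddings are all assumptions made for illustration.

```python
from collections import defaultdict
from math import sqrt

def block_key(name: str) -> str:
    """Assumed blocking rule: last name plus first initial, lowercased."""
    parts = name.lower().replace(".", "").split()
    return f"{parts[-1]}_{parts[0][0]}"

def cosine_distance(u, v):
    """1 - cosine similarity; assumes both vectors are non-zero."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = sqrt(sum(x * x for x in u)) * sqrt(sum(y * y for y in v))
    return 1.0 - dot / norm

def agglomerate(items, vectors, threshold):
    """Average-linkage agglomerative clustering that stops merging
    once the closest pair of clusters is farther apart than `threshold`."""
    clusters = [[i] for i in items]

    def linkage(c1, c2):
        dists = [cosine_distance(vectors[a], vectors[b]) for a in c1 for b in c2]
        return sum(dists) / len(dists)

    while len(clusters) > 1:
        dist, i, j = min(
            ((linkage(clusters[i], clusters[j]), i, j)
             for i in range(len(clusters))
             for j in range(i + 1, len(clusters))),
            key=lambda t: t[0],
        )
        if dist > threshold:
            break
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters

def disambiguate(mentions, vectors, threshold=0.2):
    """Group mentions into blocks, then cluster within each block."""
    blocks = defaultdict(list)
    for mention_id, name in mentions:
        blocks[block_key(name)].append(mention_id)
    return {key: agglomerate(ids, vectors, threshold) for key, ids in blocks.items()}

# Hypothetical author mentions and stand-in embedding vectors.
mentions = [("m1", "J. Smith"), ("m2", "John Smith"), ("m3", "Jane Smith"), ("m4", "A. Kumar")]
vectors = {"m1": [1.0, 0.0], "m2": [0.9, 0.1], "m3": [0.0, 1.0], "m4": [0.5, 0.5]}
result = disambiguate(mentions, vectors)
print(result)
```

Blocking keeps the quadratic clustering step confined to mentions that could plausibly refer to the same person, so only candidates sharing a block key are ever compared.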

    Applications of artificial neural networks in three agro-environmental systems: microalgae production, nutritional characterization of soils and meteorological variables management

    Agriculture is an essential human activity; it is highly dependent on weather conditions and is a focus of research and innovation aimed at facing diverse challenges. Climate change, global warming, and the degradation of agricultural ecosystems are only some of the problems humans face in continuing essential food production. Seeking innovation in the agricultural sector, three main research topics were considered for this thesis: microalgae production, soil color and fertility, and meteorological data acquisition. These topics play increasingly important roles in agriculture, especially given the uncertainty about the future of food production. Microalgae are an interesting alternative for crop fertilization and soil sustainability, while soil fertility parameters need further study in order to develop cheaper and faster analysis methods that support management. Agriculture, as a highly climate-dependent activity, needs meteorological data to anticipate events and to plan and manage crops efficiently. These topics were selected with the aim of improving the current state of the art and proposing new alternatives based mainly on the application of artificial neural networks (ANNs) as a novel way to solve problems and generate knowledge directly applicable to cropping systems. The main objective of this thesis was to generate ANN models capable of addressing agriculture-related problems, as an alternative to the traditional, more expensive methods used in the management, analysis, and acquisition of data in agricultural systems.
    Departamento de Ingeniería Agrícola y Forestal
    Doctorado en Ciencia e Ingeniería Agroalimentaria y de Biosistema