333 research outputs found
Effect of forename string on author name disambiguation
In author name disambiguation, author forenames are used to decide which name instances are disambiguated together and how likely they are to refer to the same author. Despite this crucial role, the effect of forenames on the performance of heuristic (string matching) and algorithmic disambiguation is not well understood. This study assesses the contributions of forenames in author name disambiguation using multiple labeled data sets under varying ratios and lengths of full forenames, reflecting real-world scenarios in which an author is represented by forename variants (synonym) and some authors share the same forenames (homonym). The results show that increasing the ratio of full forenames substantially improves both heuristic and machine-learning-based disambiguation. Performance gains by algorithmic disambiguation are pronounced when many forenames are initialized or homonyms are prevalent. As the ratio of full forenames increases, however, these gains become marginal compared with those of string matching. Using a small portion of forename strings does not much reduce the performance of either heuristic or algorithmic disambiguation methods compared with using full-length strings. These findings provide practical suggestions, such as restoring initialized forenames into a full-string format via record linkage for improved disambiguation performance. Peer Reviewed https://deepblue.lib.umich.edu/bitstream/2027.42/155924/1/asi24298.pdf https://deepblue.lib.umich.edu/bitstream/2027.42/155924/2/asi24298_am.pd
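As a rough illustration of the heuristic (string matching) side described in this abstract, the sketch below checks whether two forename strings could belong to the same author. It is not the study's method: the function name `forenames_compatible`, the `prefix_len` parameter, and the example names are all hypothetical, and the rule that an initial matches any forename starting with that letter is an assumption made for illustration.

```python
from typing import Optional


def forenames_compatible(a: str, b: str, prefix_len: Optional[int] = None) -> bool:
    """Hypothetical string-matching heuristic for two forenames.

    prefix_len, if given, keeps only the first prefix_len characters of
    each forename, mimicking the use of a small portion of the string.
    """
    # Normalize case and drop the trailing period of an initial such as "M."
    a = a.strip().lower().rstrip(".")
    b = b.strip().lower().rstrip(".")
    if not a or not b:
        # Assumption: a missing forename cannot rule out a match
        return True
    if prefix_len is not None:
        a, b = a[:prefix_len], b[:prefix_len]
    if len(a) == 1 or len(b) == 1:
        # Assumption: an initial is compatible with any forename
        # that starts with the same letter
        return a[0] == b[0]
    return a == b
```

Under these assumptions, `forenames_compatible("Maria", "M.")` treats the initialized form as a possible synonym, while `forenames_compatible("Maria", "Marianne", prefix_len=4)` shows how truncating to a short prefix merges forenames that the full strings would keep apart.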
Scale-free collaboration networks: An author name disambiguation perspective
Peer Reviewed https://deepblue.lib.umich.edu/bitstream/2027.42/149559/1/asi24158.pdf https://deepblue.lib.umich.edu/bitstream/2027.42/149559/2/asi24158_am.pd
A Syllable-based Technique for Word Embeddings of Korean Words
Word embedding has become a fundamental component of many NLP tasks such as
named entity recognition and machine translation. However, popular models that
learn such embeddings are unaware of the morphology of words, so they are not
directly applicable to highly agglutinative languages such as Korean. We
propose a syllable-based learning model for Korean using a convolutional neural
network, in which word representation is composed of trained syllable vectors.
Our model successfully produces morphologically meaningful representation of
Korean words compared to the original Skip-gram embeddings. The results also
show that it is quite robust to the Out-of-Vocabulary problem.
Comment: 5 pages, 3 figures, 1 table. Accepted for EMNLP 2017 Workshop - The
1st Workshop on Subword and Character level models in NLP (SCLeM)
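The idea that a word representation can be composed of syllable vectors, and why that helps with out-of-vocabulary words, can be sketched as follows. This is a simplification and not the paper's model: mean pooling stands in for the convolutional neural network, and the function name `compose_word_vector` and the toy two-dimensional vectors are hypothetical values chosen for illustration.

```python
from typing import Dict, List

# Toy syllable vectors (made-up values, not trained embeddings).
# Each Hangul character is one syllable block.
toy_syllable_vectors: Dict[str, List[float]] = {
    "학": [1.0, 0.0],
    "교": [0.0, 1.0],
    "생": [1.0, 1.0],
}


def compose_word_vector(word: str, syllable_vectors: Dict[str, List[float]]) -> List[float]:
    """Average the vectors of the known syllables in a word.

    Mean pooling is an assumption used here in place of a CNN.
    """
    vectors = [syllable_vectors[s] for s in word if s in syllable_vectors]
    if not vectors:
        raise KeyError(f"no known syllables in {word!r}")
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


# A word never seen as a whole still gets a vector from its syllables.
school = compose_word_vector("학교", toy_syllable_vectors)
student = compose_word_vector("학생", toy_syllable_vectors)
```

Because the lookup happens at the syllable level, a word absent from the training vocabulary still receives a representation as long as its syllables are known, which is the intuition behind the robustness claim.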
Online Review Mining: Health and Environmental Concerns on Beauty Products
Scholars in advertising, communication, marketing, and public relations have used various text-mining techniques to assess sentiments about brands, social issues, products, and policies. Using large-scale text analysis as its method, this study investigated the trend of consumers' interests or concerns over beauty products, focusing on health and environmental issues. A dataset of 249,152 reviews by 177,345 Amazon users on around 75,000 beauty products during the 2004-2013 period was analyzed for this study. The most frequently used words were "natural", "healthy", and "chemical". However, contrary to our expectation, beauty product consumers did not demonstrate much interest in health or environmental concerns. The main topics of the reviews did not center on health or environmental interests; they were mostly dedicated to descriptions of product, price, delivery, and satisfaction. These findings can help scholars obtain a better understanding of consumer perception and behavior on beauty products related to health and environment issues.
Hybrid Deep Learning Architecture to Forecast Maximum Load Duration Using Time-of-Use Pricing Plans
Load forecasting with machine learning or deep learning models has received considerable research attention as a way to reduce peak load and contribute to the stability of the power grid. In particular, an adequate model is needed to forecast the maximum load duration based on time-of-use, the electricity usage fare policy, in order to achieve goals such as peak load reduction in a power grid. However, existing single machine learning or deep learning forecasting models cannot easily avoid overfitting. Moreover, a majority of the ensemble or hybrid models do not achieve optimal results for forecasting the maximum load duration based on time-of-use. To overcome these limitations, we propose a hybrid deep learning architecture to forecast maximum load duration based on time-of-use. Experimental results indicate that this architecture could achieve the highest average of recall and accuracy (83.43%) compared to benchmark models. To verify the effectiveness of the architecture, another experimental result shows that an energy storage system (ESS) scheme operated in accordance with the forecast results of the proposed model (LSTM-MATO) in the architecture could provide peak load cost savings of 17,535,700 KRW each year compared with the original peak load costs without the method. Therefore, the proposed architecture could be utilized for practical applications such as peak load reduction in the grid.