
    Positive and Negative Sentiment Words in a Blog Corpus Written in Hebrew

    Abstract: In this research, given a corpus of Hebrew blog posts and two seed sentiment lists (positive and negative), we analyze the positive and negative sentences in the corpus, along with special groups of words associated with the positive and negative seed words. We discovered many new negative words (around half of the top 50 words) but only one new positive word. Among the top words associated with the positive seed words, we found various first-person and third-person pronouns. Intensifiers were found for both the positive and the negative seed words. Most of the corpus's sentences are neutral; of the remaining sentences, more than 80% are positive. The sentiment scores of the top words associated with the positive seed words are significantly higher than those of the top words associated with the negative seed words.

    Our conclusions are as follows. Positive sentences refer more to the authors themselves (first-person pronouns and related words) and are also more general, e.g., more related to other people (third-person pronouns), whereas negative sentences are far more concentrated on negative things and therefore contain many new negative words. Israeli bloggers tend to use intensifiers to emphasize or even exaggerate their sentiment (both positive and negative). These bloggers not only write many more positive sentences than negative ones, but also write much longer positive sentences than negative ones.
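
The abstract does not spell out how sentences are labeled or how words are associated with the seed lists. The sketch below is a minimal illustration under stated assumptions, not the authors' method: it assumes sentences are already tokenized, labels a sentence by comparing its counts of positive and negative seed words (a tie or no seed words means neutral), and ranks non-seed words by how often they co-occur with a seed word in the same sentence. The function names, the tie rule, and the English placeholder tokens are hypothetical.

```python
from collections import Counter

# Minimal sketch under assumed design choices; not the authors' method.


def label_sentence(tokens, positive_seeds, negative_seeds):
    """Label a tokenized sentence by comparing positive and negative seed-word counts."""
    pos = sum(1 for t in tokens if t in positive_seeds)
    neg = sum(1 for t in tokens if t in negative_seeds)
    if pos > neg:
        return "positive"
    if neg > pos:
        return "negative"
    return "neutral"  # no seed words, or a tie (assumed rule)


def associated_words(sentences, seeds, top_n=50):
    """Rank non-seed words by co-occurrence with any seed word in the same sentence."""
    counts = Counter()
    for tokens in sentences:
        if any(t in seeds for t in tokens):
            counts.update(t for t in tokens if t not in seeds)
    return counts.most_common(top_n)


if __name__ == "__main__":
    # Hypothetical English placeholders; the study's corpus is Hebrew.
    sentences = [["i", "love", "this"], ["bad", "very", "bad", "day"], ["the", "bus", "left"]]
    positive, negative = {"love", "great"}, {"bad"}
    print([label_sentence(s, positive, negative) for s in sentences])
    # ['positive', 'negative', 'neutral']
    print(associated_words(sentences, negative))
```

Raw co-occurrence counts favor frequent words; a normalized association measure such as pointwise mutual information is a common alternative.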

    Study of keyword extraction techniques for electric double-layer capacitor domain using text similarity indexes: An experimental analysis

    Keywords play a significant role in making topic-related documents easy to select. Topics or keywords assigned by humans or experts provide accurate information, but this practice is expensive in terms of resources and time. Hence, automated keyword extraction techniques are preferable. Nevertheless, before adopting an automated process, it is necessary to check how similar the expert-provided and algorithm-generated keywords are. This paper presents an experimental analysis of the similarity scores between expert-provided keywords and keywords generated by different supervised and unsupervised automated keyword extraction algorithms in the electric double-layer capacitor (EDLC) domain. The paper also analyzes which text yields better keywords: the positive sentences only, or all sentences of the document. Among the unsupervised algorithms, YAKE, TopicRank, MultipartiteRank, and KPMiner are employed for keyword extraction; among the supervised algorithms, KEA and WINGNUS are employed. To assess the similarity of the extracted keywords to the expert-provided keywords, the Jaccard, Cosine, and Cosine-with-word-vector similarity indexes are used. The experiment shows that MultipartiteRank, measured with the Cosine-with-word-vector similarity index, produces the best result, with 92% similarity to the expert-provided keywords. This study can help NLP researchers working in the EDLC domain, or on recommender systems, select more suitable keyword extraction and similarity index calculation techniques.
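
The three similarity indexes named in the abstract can be sketched as follows. This is a minimal illustration, not the paper's implementation: it assumes Jaccard is computed over lowercased keyword phrases, plain cosine over bag-of-words token counts, and the word-vector variant as the cosine between the mean embeddings of each keyword list. The `embeddings` mapping is a hypothetical stand-in for a pretrained word-vector model such as word2vec or GloVe, and the sample keywords are invented.

```python
import math
from collections import Counter

# Minimal sketch; the paper's exact preprocessing and vector model are assumptions here.


def jaccard(expert, extracted):
    """Jaccard index |A & B| / |A | B| over lowercased keyword phrases."""
    a = {kw.lower() for kw in expert}
    b = {kw.lower() for kw in extracted}
    if not a and not b:
        return 1.0  # two empty lists are treated as identical
    return len(a & b) / len(a | b)


def _cosine(u, v):
    """Cosine of two equal-length numeric vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def _tokens(keywords):
    """Split keyword phrases into lowercased tokens."""
    return [tok for kw in keywords for tok in kw.lower().split()]


def cosine_bow(expert, extracted):
    """Cosine similarity of bag-of-words token counts of the two keyword lists."""
    ca, cb = Counter(_tokens(expert)), Counter(_tokens(extracted))
    vocab = sorted(set(ca) | set(cb))
    return _cosine([ca[t] for t in vocab], [cb[t] for t in vocab])


def cosine_word_vectors(expert, extracted, embeddings):
    """Cosine of the mean word vectors; tokens absent from `embeddings` are skipped."""

    def mean_vector(keywords):
        vecs = [embeddings[t] for t in _tokens(keywords) if t in embeddings]
        if not vecs:
            return None
        return [sum(col) / len(vecs) for col in zip(*vecs)]

    u, v = mean_vector(expert), mean_vector(extracted)
    if u is None or v is None:
        return 0.0  # no usable vectors on one side
    return _cosine(u, v)


if __name__ == "__main__":
    expert = ["activated carbon", "specific capacitance", "electrolyte"]
    extracted = ["Activated Carbon", "electrolyte", "porosity"]
    print(jaccard(expert, extracted))  # 0.5
    print(round(cosine_bow(expert, extracted), 3))  # 0.671
```

Jaccard rewards only exact phrase matches, while the token-level and embedding-based cosines also give partial credit for overlapping or semantically related words, which is one plausible reason a word-vector index would report higher similarity.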

    An automated materials and processes identification tool for material informatics using deep learning approach

    This article reports a tool for Materials Informatics, termed MatRec, based on a deep learning approach. The tool captures data, makes appropriate domain suggestions, extracts entities such as materials and processes, and helps establish entity-value relationships. It uses keyword extraction, a document similarity index to suggest relevant documents, and a Bi-LSTM deep learning model for entity extraction. As an example, materials and processes for electrical charge storage under the electric double-layer capacitor (EDLC) mechanism are demonstrated. A knowledge graph approach finds and visualizes latent knowledge sets in the processed information. MatRec achieved F1 scores of ~96% for entity extraction, ~83% for material-value relationship extraction, and ~87% for process-value relationship extraction. The proposed MatRec could be extended to material selection problems in various applications and could be an excellent tool for academia and industry.
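
The F1 scores reported for MatRec are the harmonic mean of precision and recall. A minimal sketch, assuming strict exact-match evaluation over (entity text, label) pairs; the abstract does not state whether MatRec was scored this way or at the token level, and the entity strings and labels below are hypothetical.

```python
# Minimal sketch assuming strict exact-match scoring; MatRec's actual protocol is not stated.


def precision_recall_f1(gold, predicted):
    """Strict exact-match precision, recall, and F1 over (entity text, label) pairs."""
    gold, predicted = set(gold), set(predicted)
    tp = len(gold & predicted)  # predictions that match a gold entity exactly
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


if __name__ == "__main__":
    gold = {("activated carbon", "MATERIAL"), ("graphene", "MATERIAL"),
            ("KOH activation", "PROCESS"), ("annealing", "PROCESS")}
    predicted = {("activated carbon", "MATERIAL"), ("graphene", "MATERIAL"),
                 ("annealing", "MATERIAL")}  # a wrong label counts as a miss
    p, r, f = precision_recall_f1(gold, predicted)
    print(f"P={p:.3f} R={r:.3f} F1={f:.3f}")  # P=0.667 R=0.500 F1=0.571
```

The same function could score the material-value and process-value relationships if each relationship is represented as a tuple, for example (entity, value); whether MatRec does so is an assumption.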