15 research outputs found
Leveraging Temporal Word Embeddings for the Detection of Scientific Trends
Tracking the dynamics of science and early detection of emerging research trends could potentially revolutionise the way research is done. For this reason, computational history of science and trend analysis have become important areas in academia and industry, with significant implications for research funding and public policy. The literature presents several emerging approaches to detecting new research trends, most of which rely mainly on citation counting. While citations have been widely used as indicators of emerging research topics, they pose several limitations. Most importantly, citations can take months or even years to accumulate before they reveal trends. Furthermore, they fail to dig into the content of papers.
To overcome this problem, this thesis leverages a natural language processing method – namely temporal word embeddings – that learns semantic and syntactic relations among words over time. The principal objective of this method is to study the change in pairwise similarities between pairs of scientific keywords over time, which helps to track the dynamics of science and detect emerging scientific trends. To this end, this thesis proposes a methodological approach to tune the hyper-parameters of word2vec – the word embedding technique used in this thesis – on scientific text. Then, it provides a suite of novel approaches that aim to perform the computational history of science by detecting emerging scientific trends and tracking the dynamics of science. The detection of emerging scientific trends is performed through two approaches, Hist2vec and Leap2Trend. These two approaches are, respectively, devoted to the detection of converging keywords and of contextualising keywords. The tracking of the dynamics of science, on the other hand, is performed by Vec2Dynamics, which traces the evolution of the semantic neighborhood of keywords over time.
All of the proposed approaches have been applied to the area of machine learning and validated against different gold standards. The obtained results reveal the effectiveness of the proposed approaches at detecting trends and tracking the dynamics of science. More specifically, Hist2vec strongly correlates with citation counts, with a 100% positive Spearman correlation. Additionally, Leap2Trend performs with more than 80% accuracy and 90% precision in detecting emerging trends. Also, Vec2Dynamics shows great potential to trace the history of the machine learning literature in close agreement with the machine learning timeline. Such significant findings evidence the utility of the proposed approaches for performing the computational history of science.
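The core operation underlying these approaches – measuring how the cosine similarity between a pair of keyword vectors changes across time-sliced embedding spaces – can be sketched as follows. This is a minimal sketch with hand-made toy vectors; the thesis trains one word2vec model per time slice, and the keywords here are illustrative:

```python
import numpy as np

def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def similarity_series(embeddings_by_year, kw_a, kw_b):
    """Pairwise similarity of two keywords across time-sliced embedding spaces.

    embeddings_by_year: {year: {keyword: vector}}, one model per time slice.
    Returns a {year: similarity} series; a rising series suggests the two
    keywords are converging, i.e. a candidate emerging trend.
    """
    return {year: cosine(vocab[kw_a], vocab[kw_b])
            for year, vocab in sorted(embeddings_by_year.items())
            if kw_a in vocab and kw_b in vocab}

# Toy example with hand-made 3-d vectors (real embeddings are ~100-300 d).
emb = {
    2014: {"deep": np.array([1.0, 0.0, 0.0]), "learning": np.array([0.0, 1.0, 0.0])},
    2016: {"deep": np.array([1.0, 0.5, 0.0]), "learning": np.array([0.5, 1.0, 0.0])},
}
series = similarity_series(emb, "deep", "learning")  # similarity rises over time
```

Here the two keywords move from orthogonal vectors (similarity 0) towards each other, the kind of convergence the thesis reads as an emerging-trend signal.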
Leveraging Word Embeddings and Transformers to Extract Semantics from Building Regulations Text
In recent years, interest in knowledge extraction in the architecture, engineering and construction (AEC) domain has grown dramatically. Along with the advances in the AEC domain, a massive amount of data is collected from sensors, project management software, drones and 3D scanning. However, construction regulatory knowledge has remained primarily in the form of unstructured text. Natural Language Processing (NLP) has recently been introduced to the construction industry to extract underlying knowledge from unstructured data. For instance, NLP can be used to extract key information from construction contracts and specifications, identify potential risks, and automate compliance checking. It is considered impractical for construction engineers and stakeholders to author formal, accurate, and structured building regulatory rules. However, previous efforts on extracting knowledge from unstructured text in the AEC domain have mainly focused on basic concepts and hierarchies for ontology engineering using traditional NLP techniques, rather than digging deeply into the nature of the NLP techniques used and their ability to capture semantics from building regulations text. In this context, this paper develops a semantic-based testing approach that studies the performance of modern NLP techniques, namely word embeddings and transformers, in extracting semantic regularities from building regulatory text. Specifically, this paper studies the ability of word2vec, BERT, and Sentence-BERT (SBERT) to extract semantic regularities from the British building regulations at both the word and sentence levels. The UK building regulations code has been used as the dataset. The ground truth of semantic regularities has been manually curated from the well-established Brick Ontology to test how well the proposed NLP techniques capture semantic regularities from building regulatory text.
Both quantitative and qualitative analyses have been performed, and the obtained results show that modern NLP techniques can reliably capture semantic regularities from building regulations text at both the word and sentence levels, with an accuracy that reaches 80% at the word level and 100% at the sentence level.
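One way a word-level evaluation like this can be framed – a sketch, not necessarily the paper's exact protocol – is nearest-neighbour retrieval: for each ground-truth (term, related term) pair, check whether the related term is the term's nearest neighbour in embedding space. The toy vectors and term names below are invented stand-ins for trained embeddings of regulatory vocabulary:

```python
import numpy as np

def nearest(term, vocab):
    """Return the vocabulary term whose vector is closest (by cosine) to `term`'s."""
    q = vocab[term] / np.linalg.norm(vocab[term])
    best, best_sim = None, -1.0
    for other, vec in vocab.items():
        if other == term:
            continue
        sim = float(np.dot(q, vec / np.linalg.norm(vec)))
        if sim > best_sim:
            best, best_sim = other, sim
    return best

def word_level_accuracy(gold_pairs, vocab):
    """Fraction of gold (term, related term) pairs recovered as nearest neighbours."""
    hits = sum(1 for a, b in gold_pairs if nearest(a, vocab) == b)
    return hits / len(gold_pairs)

# Toy vectors standing in for word2vec embeddings of regulatory terms.
vocab = {
    "boiler":    np.array([1.0, 0.1, 0.0]),
    "heating":   np.array([0.9, 0.2, 0.0]),
    "staircase": np.array([0.0, 0.1, 1.0]),
}
acc = word_level_accuracy([("boiler", "heating"), ("staircase", "boiler")], vocab)
```

In this toy setting one of the two gold pairs is recovered, giving 50% accuracy; the paper reports 80% at the word level against its Brick-derived ground truth.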
Leap2Trend: A Temporal Word Embedding Approach for Instant Detection of Emerging Scientific Trends
Early detection of emerging research trends could potentially revolutionise the way research is done. For this reason, trend analysis has become an area of paramount importance in academia and industry. This is due to the significant implications for research funding and public policy. The literature presents several emerging approaches to detecting new research trends. Most of these approaches rely mainly on citation counting. While citations have been widely used as indicators of emerging research topics, they suffer from some limitations. For instance, citations can take months or even years to accumulate before they reveal trends. Furthermore, they fail to dig into the content of papers. To overcome this problem, we introduce Leap2Trend, a novel approach to instant detection of research trends. Leap2Trend relies on temporal word embeddings (word2vec) to track the dynamics of similarities between pairs of keywords, their rankings and respective uprankings (ascents) over time. We applied Leap2Trend to two scientific corpora on different research areas, namely computer science and bioinformatics, and we evaluated it against two gold standards: Google Trends hits and Google Scholar citations. The obtained results reveal the effectiveness of our approach in detecting trends, with more than 80% accuracy and 90% precision in some cases. Such significant findings evidence the utility of our Leap2Trend approach for tracking and detecting emerging research trends instantly.
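The ranking-and-upranking step described above can be sketched as follows. The similarity values and keyword pairs are toy inputs, and Leap2Trend's exact windows and thresholds may differ:

```python
def rank_pairs(similarities):
    """Rank keyword pairs by descending similarity; rank 1 = most similar."""
    ordered = sorted(similarities, key=similarities.get, reverse=True)
    return {pair: rank for rank, pair in enumerate(ordered, start=1)}

def uprankings(sim_t0, sim_t1):
    """Rank change between two periods; positive = the pair climbed (an ascent)."""
    r0, r1 = rank_pairs(sim_t0), rank_pairs(sim_t1)
    return {pair: r0[pair] - r1[pair] for pair in r0 if pair in r1}

# Toy similarities for three keyword pairs in two consecutive periods.
t0 = {("deep", "learning"): 0.2, ("svm", "kernel"): 0.8, ("topic", "model"): 0.5}
t1 = {("deep", "learning"): 0.9, ("svm", "kernel"): 0.7, ("topic", "model"): 0.4}
ascents = uprankings(t0, t1)  # ("deep", "learning") climbs from rank 3 to rank 1
```

Pairs with large positive ascents are the candidates flagged as emerging trends.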
Vec2Dynamics: A Temporal Word Embedding Approach to Exploring the Dynamics of Scientific Keywords—Machine Learning as a Case Study
The study of the dynamics, or progress, of science has been widely explored with descriptive and statistical analyses. This study has also attracted several computational approaches, labelled together as the computational history of science, especially with the rise of data science and the development of increasingly powerful computers. Among these approaches, some works have studied dynamism in the scientific literature by employing text analysis techniques that rely on topic models to study the dynamics of research topics. Unlike topic models, which do not delve deeply into the content of scientific publications, this paper uses, for the first time, temporal word embeddings to automatically track the dynamics of scientific keywords over time. To this end, we propose Vec2Dynamics, a neural-based computational history approach that reports the stability of the k-nearest neighbors of scientific keywords over time; the stability indicates whether the keywords acquire new neighborhoods as the scientific literature evolves. To evaluate how Vec2Dynamics models such relationships in the domain of Machine Learning (ML), we constructed scientific corpora from the papers published at the Neural Information Processing Systems (NIPS, now NeurIPS) conference between 1987 and 2016. The descriptive analysis performed in this paper verifies the efficacy of our proposed approach. In fact, we found a generally strong consistency between the obtained results and the Machine Learning timeline.
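One natural way to quantify such neighborhood stability – a sketch assuming per-period embeddings, not necessarily the paper's exact measure – is the Jaccard overlap of a keyword's k-nearest-neighbour sets in consecutive periods:

```python
import numpy as np

def knn(term, vocab, k):
    """The k nearest neighbours of `term` by cosine similarity."""
    q = vocab[term] / np.linalg.norm(vocab[term])
    sims = {w: float(np.dot(q, v / np.linalg.norm(v)))
            for w, v in vocab.items() if w != term}
    return set(sorted(sims, key=sims.get, reverse=True)[:k])

def stability(term, vocab_t0, vocab_t1, k=2):
    """Jaccard overlap of the term's k-NN sets in two consecutive periods.

    1.0 means an unchanged neighborhood; low values suggest the keyword is
    moving into a new semantic context as the literature evolves.
    """
    a, b = knn(term, vocab_t0, k), knn(term, vocab_t1, k)
    return len(a & b) / len(a | b)

# Toy 2-d vectors standing in for embeddings trained on two NIPS periods.
t0 = {"network": np.array([1.0, 0.0]), "neural": np.array([0.9, 0.1]),
      "bayesian": np.array([0.1, 1.0]), "kernel": np.array([0.0, 1.0])}
t1 = {"network": np.array([1.0, 0.5]), "neural": np.array([0.9, 0.6]),
      "bayesian": np.array([0.1, 1.0]), "deep": np.array([1.0, 0.4])}
s = stability("network", t0, t1, k=2)  # "deep" displaces "bayesian" in the k-NN set
```

A drop in this score across periods is the kind of signal the approach reads as a keyword taking on a new neighborhood.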
DeepHist: Towards a Deep Learning-based Computational History of Trends in the NIPS
Research in the analysis of big scholarly data has increased in recent years, and it aims to understand research dynamics and forecast research trends. The ultimate objective of this research is to design and implement novel and scalable methods for knowledge extraction and computational history.
While citations are widely used to identify emerging research topics, they can take months or even years to stabilise enough to reveal research trends. Consequently, it is necessary to develop faster yet accurate methods for trend analysis and computational history that dig into the content and semantics of an article. Therefore, this paper conducts a fine-grained content analysis of scientific corpora from the domain of Machine Learning. This analysis uses DeepHist, a deep learning-based computational history approach; the approach relies on a dynamic word embedding that aims to represent words with low-dimensional vectors computed by deep neural networks. The scientific corpora come from 5991 publications of the Neural Information Processing Systems (NIPS) conference between 1987 and 2015, divided into six 5-year timespans. The analysis of these corpora generates visualisations produced by applying t-distributed stochastic neighbor embedding (t-SNE) for dimensionality reduction. The qualitative and quantitative study reported here reveals the evolution of the prominent Machine Learning keywords; this evolution supports the popularity of current research topics in the field. This support is evident given how well the popularity of the detected keywords correlates with the citation counts received by their corresponding papers: Spearman's positive correlation is 100%. With such a strong result, this work evidences the utility of deep learning techniques for determining the computational history of science.
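The t-SNE projection step can be sketched with scikit-learn (assumed available here); the random vectors below stand in for the keyword embeddings trained on the NIPS corpora:

```python
import numpy as np
from sklearn.manifold import TSNE  # assumes scikit-learn is installed

# Toy stand-in for keyword vectors from one time slice (10 keywords, 50-d);
# the paper's vectors come from embeddings trained on NIPS full texts.
rng = np.random.default_rng(0)
vectors = rng.normal(size=(10, 50))

# Project to 2-D for visualisation; perplexity must stay below the sample count.
coords = TSNE(n_components=2, perplexity=5, random_state=0).fit_transform(vectors)
```

Plotting one such 2-D projection per timespan, with points labelled by keyword, yields the evolution visualisations the paper describes.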
Financial Sentiment Analysis on Twitter During Covid-19 Pandemic in the UK
The surge in Covid-19 cases seen in 2020 caused the UK government to enact regulations to stop the virus's spread. Along with other aspects, such as altered customer confidence and activity, the financial effects of these actions must be taken into account. The latter can be studied from the user-generated content posted on social networks such as Twitter. In this paper, we provide a supervised technique to analyse tweets exhibiting bullish and bearish sentiments, by predicting a sentiment class: positive, negative, or neutral. Both machine learning and deep learning techniques are implemented to predict our financial sentiment class. Our research highlights how word embeddings, most importantly word2vec, may be effectively used to conduct sentiment analysis in the financial sector and provide favourable solutions. In addition, a comprehensive comparison has been conducted between our technique and a lexicon-based approach. The outcomes of the study indicate that the word2vec model combined with deep learning techniques outperforms the others, with an accuracy of 87%.
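A minimal sketch of a word2vec-based feature pipeline of this kind, using averaged word vectors and a logistic-regression stand-in for the paper's classifiers. The word vectors, tweets, and labels below are invented for illustration:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression  # assumes scikit-learn

# Toy 2-d word vectors standing in for a trained word2vec model.
w2v = {
    "surge": np.array([1.0, 0.0]), "rally": np.array([0.9, 0.1]),
    "crash": np.array([0.0, 1.0]), "losses": np.array([0.1, 0.9]),
}

def tweet_vector(tweet):
    """Average the word vectors of in-vocabulary tokens (zero vector if none)."""
    vecs = [w2v[t] for t in tweet.lower().split() if t in w2v]
    return np.mean(vecs, axis=0) if vecs else np.zeros(2)

tweets = ["markets surge rally", "crash deepens losses", "surge", "losses"]
labels = [1, 0, 1, 0]  # 1 = bullish, 0 = bearish
X = np.vstack([tweet_vector(t) for t in tweets])
clf = LogisticRegression().fit(X, labels)
pred = clf.predict([tweet_vector("rally continues")])
```

The same averaged-embedding features can equally feed a deep model, which is the combination the paper reports as the strongest.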
FineNews: fine-grained semantic sentiment analysis on financial microblogs and news
In this paper, a fine-grained supervised approach is proposed to identify bullish and bearish sentiments associated with companies and stocks, by predicting a real-valued score between −1 and +1. The approach is learned using several feature sets, consisting of lexical features, semantic features, and a combination of lexical and semantic features. Our study reveals that semantic features, most notably BabelNet synsets and semantic frames, can be successfully applied to sentiment analysis within the financial domain to achieve better results. Moreover, a comparative study has been conducted between our supervised approach and unsupervised approaches. The obtained experimental results show how our approach outperforms the others.
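A hedged sketch of predicting a real-valued score from combined feature sets, using ridge regression as a stand-in for the paper's learner; the feature columns and values below are hypothetical:

```python
import numpy as np
from sklearn.linear_model import Ridge  # assumes scikit-learn

# Toy feature rows: [lexical polarity count, synset-level polarity,
# semantic-frame indicator] -- hypothetical stand-ins for the paper's
# lexical and semantic feature sets.
X = np.array([
    [2.0, 0.8, 1.0],    # strongly bullish message
    [1.0, 0.3, 0.0],
    [-1.0, -0.4, 0.0],
    [-2.0, -0.9, 1.0],  # strongly bearish message
])
y = np.array([0.9, 0.4, -0.3, -0.8])  # gold scores in [-1, +1]

model = Ridge(alpha=1.0).fit(X, y)
score = float(model.predict([[2.0, 0.7, 1.0]])[0])  # positive for bullish input
```

In practice the predicted scores can be clipped to [−1, +1] to match the target range.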
MORE SENSE: MOvie REviews SENtiment analysis boosted with Semantics
Sentiment analysis is becoming one of the most active areas in Natural Language Processing nowadays. Its importance coincides with the growth of social media and the open space they create for expressing opinions and emotions via reviews, forum discussions, microblogs, Twitter and social networks. Most of the existing approaches to sentiment analysis rely mainly on the presence of affect words that explicitly reflect sentiment. However, these approaches are semantically weak; that is, they do not take into account the semantics of words when detecting their sentiment in text. Only recently have a few approaches (e.g. sentic computing) started investigating in this direction. Following this trend, this paper investigates the role of semantics in sentiment analysis of movie reviews. To this end, frame semantics and lexical resources such as BabelNet are employed to extract semantic features from movie reviews that lead to more accurate sentiment analysis models. Experiments are conducted with different types of semantic information by assessing their impact on a movie reviews dataset. A 10-fold cross-validation shows that the F1 measure increases slightly when using semantics for sentiment analysis in social media. The results show that the proposed approach, which considers the semantics of words for sentiment analysis, is a promising direction.