407 research outputs found

    Tracing the evolution of service robotics: Insights from a topic modeling approach

    Funding: CRUE-CSIC transformative agreement; Helmholtz Association (HIRG-0069); Russian Science Foundation (RSF grant number 19-18-00262). Taking robotics patents filed between 1977 and 2017 and building on the topic modeling technique, we extract their latent topics, analyze how the importance of these topics changes over time, and examine how they relate to each other by looking at how often they are recombined in the same patents. This allows us to differentiate between more and less important technological trends in robotics based on their stage of diffusion and their position in the space of knowledge, represented by a topic graph in which some topics appear isolated while others are highly interconnected. Furthermore, utilizing external reference texts that characterize service robots from a technical perspective, we propose and apply a novel approach to matching the constructed topics to service robotics. The matching procedure is based on the frequency and exclusivity of the words overlapping between the patents and the reference texts. We identify around 20 topics belonging to service robotics. Our results corroborate earlier findings but also provide novel insights into the content and stage of development of application areas in service robotics. With this study we contribute to a better understanding of the highly dynamic field of robotics, as well as to new practices of utilizing the topic modeling approach, matching the resulting topics to external classifications, and applying graph-theoretic metrics to them.
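
    The matching procedure scores overlap words by frequency and exclusivity. A minimal sketch of one plausible reading of such a score follows; the function name, the exact weighting, and the toy data are assumptions, not the authors' code.

```python
# Illustrative sketch, not the authors' code: score a topic against a domain
# by frequency and exclusivity of the words it shares with reference texts.
from collections import Counter

def match_score(topic_words, reference_tokens, corpus_tokens):
    """Higher score = the topic's vocabulary is frequent in, and exclusive
    to, the service-robotics reference texts."""
    ref = Counter(reference_tokens)   # word frequencies in the reference texts
    bg = Counter(corpus_tokens)       # word frequencies in the patent corpus
    score = 0.0
    for w in topic_words:
        freq = ref[w]                                         # frequency in reference texts
        exclusivity = freq / (freq + bg[w]) if freq else 0.0  # share specific to the domain
        score += freq * exclusivity
    return score

# Toy usage: a "gripper" topic scores higher than a "finance" topic.
refs = "service robot gripper arm manipulator robot".split()
corpus = "patent claim device robot market price claim".split()
print(match_score(["robot", "gripper"], refs, corpus))
print(match_score(["market", "price"], refs, corpus))
```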

    Metric for selecting the number of topics in the LDA model

    The latest technological trends are driving a vast and growing amount of textual data. Topic modeling is a useful tool for extracting information from large corpora of text. A topic model is fitted to a corpus of documents, discovers the topics that permeate the corpus, and assigns documents to those topics. The Latent Dirichlet Allocation (LDA) model is the most popular of the probabilistic topic models. The LDA model is conditioned by three parameters: two Dirichlet hyperparameters (α and β) and the number of topics (K). Determining the parameter K is extremely important and not extensively explored in the literature, mainly due to the intensive computation and long processing time involved. Most topic modeling methods implicitly assume that the number of topics is known in advance, treating it as an exogenous parameter. This leaves the technique prone to subjectivity. The quality of the insights offered by LDA is quite sensitive to the value of K, and an excess of subjectivity in its choice may undermine the confidence managers place in the technique's results, and hence its usage by firms. This dissertation's main objective is to develop a metric to identify the ideal value of the parameter K of the LDA model, one that allows an adequate representation of the corpus within a tolerable processing time. We apply the proposed metric alongside existing metrics to two datasets. Experiments show that the proposed method selects a number of topics similar to that of other metrics, but with better performance in terms of processing time. Although each metric has its own method for determining the number of topics, some results are similar for the same database, as evidenced in the study. Our metric is superior when considering processing time. Experiments show this method is effective.
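
    For context, the standard, computation-heavy baseline that such metrics compete with is a sweep over candidate values of K scored by topic coherence. A hedged sketch using gensim's LdaModel and CoherenceModel follows; the toy corpus is a placeholder, and the dissertation's own metric is not reproduced here.

```python
# Sketch of a coherence-based search over K (gensim API); the toy corpus
# below stands in for a real tokenized dataset.
from gensim.corpora import Dictionary
from gensim.models import CoherenceModel, LdaModel

texts = [
    ["topic", "model", "corpus", "document"],
    ["dirichlet", "topic", "parameter", "model"],
    ["processing", "time", "metric", "corpus"],
] * 10                                   # placeholder: real documents go here

dictionary = Dictionary(texts)
corpus = [dictionary.doc2bow(t) for t in texts]

coherence = {}
for k in range(2, 7):                    # candidate numbers of topics
    lda = LdaModel(corpus=corpus, id2word=dictionary, num_topics=k, random_state=0)
    cm = CoherenceModel(model=lda, texts=texts, dictionary=dictionary, coherence="c_v")
    coherence[k] = cm.get_coherence()    # higher = more interpretable topics

best_k = max(coherence, key=coherence.get)
print(f"selected K = {best_k}")
```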

    Six papers on computational methods for the analysis of structured and unstructured data in the economic domain

    This work investigates the application of computational methods to structured and unstructured data. The domains of application are two closely connected fields with the common goal of promoting the stability of the financial system: systemic risk and bank supervision. The work explores different families of models and applies them to different tasks: graphical Gaussian network models to address bank interconnectivity, topic models to monitor bank news, and deep learning for text classification. New applications and variants of these models are investigated, with particular attention to the combined use of textual and structured data. The penultimate chapter introduces a sentiment polarity classification tool for Italian, based on deep learning, to simplify future research relying on sentiment analysis. The different models have proven useful for leveraging numerical (structured) and textual (unstructured) data. Graphical Gaussian models and topic models have been adopted for inspection and descriptive tasks, while deep learning has been applied more to predictive (classification) problems. Overall, the integration of textual (unstructured) and numerical (structured) information has proven useful for analyses related to systemic risk and bank supervision. Integrating textual data with numerical data has, in fact, led either to higher predictive performance or to an enhanced capability to explain phenomena and correlate them with other events.
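
    To give a flavor of the graphical Gaussian step: a hedged sketch, assuming a matrix of bank return series, in which scikit-learn's GraphicalLassoCV estimates a sparse precision matrix whose nonzero off-diagonal entries mark conditional dependencies (edges) between banks. This is an illustration, not the thesis's implementation.

```python
# Hedged sketch of a graphical Gaussian network for bank interconnectivity:
# nonzero off-diagonal entries of the estimated precision matrix indicate
# conditional dependence (an edge) between two banks' return series.
import numpy as np
from sklearn.covariance import GraphicalLassoCV

rng = np.random.default_rng(0)
returns = rng.normal(size=(500, 10))     # placeholder: 500 days x 10 banks

model = GraphicalLassoCV().fit(returns)
precision = model.precision_             # sparse inverse covariance matrix

adjacency = np.abs(precision) > 1e-6     # edge if conditional dependence remains
np.fill_diagonal(adjacency, False)
print(f"estimated edges: {int(adjacency.sum()) // 2}")
```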

    Research on the Evolution of Journal Topic Mining Based on the BERT-LDA Model

    Scientific papers are an important form in which researchers summarize and display their research results. Information mining and analysis of scientific papers can help form a comprehensive understanding of a subject. To address the neglect of contextual semantic information in current topic mining and the uncertainty of screening rules in association evolution research, this paper proposes a topic mining evolution model based on the BERT-LDA model. First, the model combines the contextual semantic information learned by the BERT model with the word vectors of the LDA model to mine deep semantic topics. Then, topic filtering rules are constructed to eliminate invalid associations between topics. Finally, the relationships between topics are analyzed through topic evolution, revealing complex relationships such as fusion, diffusion, emergence, and disappearance. The experimental results show that, compared with the traditional LDA model, the topic mining evolution model based on BERT-LDA can accurately mine topics with deep semantics and effectively analyze the development trend of the topics of scientific and technological papers.
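
    One plausible reading of the fusion step, not the paper's code: concatenate contextual document embeddings from a BERT-style encoder with LDA document-topic vectors, then cluster the fused representation. The model name, dimensions, and clustering choice below are assumptions.

```python
# Hypothetical fusion of BERT context vectors with LDA topic vectors.
import numpy as np
from sentence_transformers import SentenceTransformer
from sklearn.cluster import KMeans
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

docs = [
    "deep learning for text classification in finance",
    "latent dirichlet allocation topic models for journals",
    "bert contextual embeddings improve semantic tasks",
    "topic evolution fusion diffusion emergence analysis",
] * 5                                         # placeholder: real abstracts go here

bow = CountVectorizer(stop_words="english").fit_transform(docs)
lda_vecs = LatentDirichletAllocation(n_components=4, random_state=0).fit_transform(bow)

bert_vecs = SentenceTransformer("all-MiniLM-L6-v2").encode(docs)

# Contextual + topical features; in practice one would rescale the two
# blocks before concatenation, since their magnitudes differ.
fused = np.hstack([bert_vecs, lda_vecs])
labels = KMeans(n_clusters=4, random_state=0, n_init=10).fit_predict(fused)
print(labels[:8])
```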

    Detecting Political Framing Shifts and the Adversarial Phrases within Rival Factions and Ranking Temporal Snapshot Contents in Social Media

    Social computing is an area of computer science concerned with the dynamics of communities and cultures created through computer-mediated social interaction. Various social media platforms, such as social network services and microblogging, enable users to come together and create social movements expressing their opinions on diverse sets of issues, events, complaints, grievances, and goals. Methods for monitoring and summarizing these types of sociopolitical trends, their leaders and followers, messages, and dynamics are needed. In this dissertation, a framework comprising community- and content-based computational methods is presented to provide insights into multilingual and noisy political social media content. First, a model is developed to predict the emergence of viral hashtag breakouts using network features. Next, another model is developed to detect and compare individual and organizational accounts using a set of domain- and language-independent features. The third model exposes contentious issues driving reactionary dynamics between opposing camps. The fourth model develops community detection and visualization methods to reveal underlying dynamics and the key messages that drive them. The final model presents a use-case methodology for detecting and monitoring foreign influence, wherein a state actor and the news media under its control attempt to shift public opinion by framing information to support multiple adversarial narratives that facilitate their goals. In each case, a discussion of the novel aspects and contributions of the models is presented, along with quantitative and qualitative evaluations. An analysis of multiple conflict situations is conducted, covering areas in the UK, Bangladesh, Libya, and Ukraine where adversarial framing led to polarization, declines in social cohesion, social unrest, and even civil war (e.g., Libya and Ukraine).
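
    For the community-detection component, a minimal sketch of the generic technique (modularity maximization with networkx, on a stand-in graph); the dissertation's own methods and data are not reproduced here.

```python
# Hedged sketch: finding opposing camps in a retweet/mention graph via
# modularity maximization (networkx; toy graph as a stand-in).
import networkx as nx
from networkx.algorithms.community import greedy_modularity_communities

G = nx.karate_club_graph()                # stand-in for a retweet/mention network
camps = greedy_modularity_communities(G)  # modularity-maximizing partition
for i, camp in enumerate(camps):
    print(f"community {i}: {sorted(camp)}")
```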

    Automated Sentiment Analysis for Personnel Survey Data in the US Air Force Context

    When surveys are distributed across the Air Force (AF), whether employee engagement surveys, climate surveys, or similar, significant resources are put towards their development, distribution, and analysis. However, when open-ended questions are included on these surveys, respondents' comments are generally underutilized, more often treated as a source of pull-quotes than as a data source in and of themselves. This is due to a lack of transparency and confidence in the accuracy of machine-aided methods such as sentiment analysis and topic modeling. This confidence drops further when the text carries special context, as Air Force text does. No model or methodology has been universally identified as ideal for this use case, nor has any model been universally adopted. The inconsistencies in approaches across the analytical teams tasked with assessing the results of these surveys leave data on the field.
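
    As an illustration of the kind of off-the-shelf, machine-aided analysis the thesis argues falls short on Air Force-specific context: a lexicon-based sentiment pass over free-text comments with NLTK's VADER. The example comments are invented.

```python
# Generic lexicon-based sentiment scoring of survey comments (NLTK VADER).
import nltk
from nltk.sentiment import SentimentIntensityAnalyzer

nltk.download("vader_lexicon", quiet=True)  # lexicon required by VADER
sia = SentimentIntensityAnalyzer()

comments = [                                # invented example comments
    "Leadership genuinely listens to our feedback.",
    "Morale in the unit has never been lower.",
]
for c in comments:
    score = sia.polarity_scores(c)["compound"]  # -1 (negative) .. +1 (positive)
    print(f"{score:+.2f}  {c}")
```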

    Essays on Machine Learning in Risk Management, Option Pricing, and Insurance Economics

    Dealing with uncertainty is at the heart of financial risk management and asset pricing. This cumulative dissertation consists of four independent research papers that study various aspects of uncertainty, from estimation and model risk, over the volatility risk premium, to the measurement of unobservable variables. In the first paper, a non-parametric estimator of conditional quantiles is proposed that builds on methods from the machine learning literature. The so-called leveraging estimator is discussed in detail and analyzed in an extensive simulation study. Subsequently, the estimator is used to quantify the estimation risk of Value-at-Risk and Expected Shortfall models. The results suggest that there are significant differences in the estimation risk of various GARCH-type models, and that estimation risk is in general higher for the Expected Shortfall than for the Value-at-Risk. In the second paper, the leveraging estimator is applied to realized and implied volatility estimates of US stock options to empirically test whether the volatility risk premium is priced in the cross-section of option returns. A trading strategy that is long (short) in a portfolio with low (high) implied volatility conditional on the realized volatility yields average monthly returns that are economically and statistically significant. The third paper investigates the model risk of multivariate Value-at-Risk and Expected Shortfall models in a comprehensive empirical study of copula GARCH models. The paper finds that model risk is economically significant, especially high during periods of financial turmoil, and mainly due to the choice of the copula. In the fourth paper, the relation between digitalization and the market value of US insurers is analyzed. To this end, a text-based measure of digitalization building on Latent Dirichlet Allocation is proposed. It is shown that a rise in digitalization efforts is associated with an increase in market valuations.

    Contents:
    1 Introduction (1.1 Motivation; 1.2 Conditional quantile estimation via leveraging optimal quantization; 1.3 Cross-section of option returns and the volatility risk premium; 1.4 Marginals versus copulas: which account for more model risk in multivariate risk forecasting?; 1.5 Estimating the relation between digitalization and the market value of insurers)
    2 Conditional Quantile Estimation via Leveraging Optimal Quantization (2.1 Introduction; 2.2 Optimal quantization; 2.3 Conditional quantiles through leveraging optimal quantization; 2.4 The hyperparameters N, λ, and γ; 2.5 Simulation study; 2.6 Empirical application; 2.7 Conclusion)
    3 Cross-Section of Option Returns and the Volatility Risk Premium (3.1 Introduction; 3.2 Capturing the volatility risk premium; 3.3 Empirical study; 3.4 Robustness checks; 3.5 Conclusion)
    4 Marginals Versus Copulas: Which Account for More Model Risk in Multivariate Risk Forecasting? (4.1 Introduction; 4.2 Market risk models and model risk; 4.3 Data; 4.4 Analysis of model risk; 4.5 Model risk for models in the model confidence set; 4.6 Model risk and backtesting; 4.7 Conclusion)
    5 Estimating the Relation Between Digitalization and the Market Value of Insurers (5.1 Introduction; 5.2 Measuring digitalization using LDA; 5.3 Financial data & empirical strategy; 5.4 Estimation results; 5.5 Conclusion)
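
    As background for the two risk measures whose estimation risk the first paper quantifies, a minimal historical-simulation sketch of Value-at-Risk and Expected Shortfall follows; it is an illustration, not the dissertation's leveraging estimator.

```python
# Historical-simulation Value-at-Risk and Expected Shortfall on toy losses.
import numpy as np

def var_es(losses, alpha=0.975):
    """Return (VaR, ES) at confidence level alpha from a loss sample."""
    losses = np.asarray(losses)
    var = np.quantile(losses, alpha)     # loss quantile at level alpha
    es = losses[losses >= var].mean()    # average loss in the tail beyond VaR
    return var, es

rng = np.random.default_rng(0)
simulated_losses = rng.standard_t(df=4, size=10_000)  # heavy-tailed toy losses
var, es = var_es(simulated_losses)
print(f"VaR: {var:.3f}, ES: {es:.3f}")   # ES >= VaR by construction
```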

    The ECB Announcement Returns
