10 research outputs found

    Latent Dirichlet Allocation (LDA) for improving the topic modeling of the official bulletin of the spanish state (BOE)

    Since the Internet was born, most people have had free access to many sources of information. Every day many web pages are created and new content is uploaded and shared. Never in history have humans been more informed, yet also more uninformed, because of the huge amount of information that can be accessed. When we look for something in a search engine, the results are too many to read and filter one by one. Recommender Systems (RS) were created to help us discriminate and filter this information according to our preferences. This contribution analyses the RS of the official agency of publications in Spain (BOE), which is known as "Mi BOE". The way this RS works was analysed, along with all the metadata of the published documents, in order to determine the coverage of the system. The results show that more than 89% of the documents cannot be recommended because they are not well described at the documentary level: some of their key metadata are empty. This contribution therefore proposes a method to label documents automatically based on Latent Dirichlet Allocation (LDA). With this approach the system could, in theory, recommend more than twice as many documents as it does now: 23% after applying the approach versus 11% before.

    Six papers on computational methods for the analysis of structured and unstructured data in the economic domain

    This work investigates the application of computational methods to structured and unstructured data. The domains of application are two closely connected fields with the common goal of promoting the stability of the financial system: systemic risk and bank supervision. The work explores different families of models and applies them to different tasks: graphical Gaussian network models to address bank interconnectivity, topic models to monitor bank news, and deep learning for text classification. New applications and variants of these models are investigated, paying particular attention to the combined use of textual and structured data. The penultimate chapter introduces a deep-learning-based sentiment polarity classification tool for Italian, intended to simplify future research relying on sentiment analysis. The different models have proven useful for leveraging numerical (structured) and textual (unstructured) data. Graphical Gaussian models and topic models have been adopted for inspection and descriptive tasks, while deep learning has been applied mainly to predictive (classification) problems. Overall, the integration of textual and numerical information has proven useful for analysis related to systemic risk and bank supervision: it has led either to higher predictive performance or to an enhanced capability of explaining phenomena and correlating them with other events.

    Exploiting Domain Knowledge for Cross-domain Text Classification in Heterogeneous Data Sources

    With the growing amount of data generated in large heterogeneous repositories (such as the World Wide Web, corporate repositories, citation databases), there is an increased need for end users to locate relevant information efficiently. Text Classification (TC) techniques provide automated means for classifying fragments of text (phrases, paragraphs or documents) into predefined semantic types, allowing an efficient way of organising and analysing such large document collections. Current approaches to TC rely on supervised learning, which performs well on the domains on which the TC system is built but tends to adapt poorly to different domains. This thesis presents a body of work exploring adaptive TC techniques across heterogeneous corpora in large repositories, with the goal of finding novel ways of bridging the gap across domains. The proposed approaches rely on the exploitation of domain knowledge for the derivation of stable cross-domain features. This thesis also investigates novel ways of estimating the performance of a TC classifier by means of domain similarity measures. For this purpose, two novel knowledge-based similarity measures are proposed that capture the usefulness of the selected cross-domain features for cross-domain TC. The evaluation of these approaches and measures is presented on real-world datasets against various strong baseline methods and content-based measures used in transfer learning. This thesis explores how domain knowledge can be used to enhance the representation of documents to address the lexical gap across domains. Given that the effectiveness of a text classifier largely depends on the availability of annotated data, this thesis explores techniques that can leverage data from social knowledge sources (such as DBpedia and Freebase). Techniques are further presented that explore the feasibility of exploiting different semantic graph structures from knowledge sources in order to create novel cross-domain features and domain similarity metrics. The methodologies presented provide a novel representation of documents and exploit four wide-coverage knowledge sources: DBpedia, Freebase, SNOMED-CT and MeSH. The contribution of this thesis demonstrates the feasibility of exploiting domain knowledge for adaptive TC and domain similarity, providing an enhanced representation of documents with semantic information about entities that can indeed reduce the lexical differences between domains.

    A Statistical Approach to the Alignment of fMRI Data

    Multi-subject functional Magnetic Resonance Imaging (fMRI) studies are critical. Because anatomical and functional structure varies across subjects, image alignment is necessary. We define a probabilistic model to describe functional alignment. By imposing a prior distribution, namely the matrix von Mises-Fisher distribution, on the orthogonal transformation parameter, anatomical information is embedded in the estimation of the parameters, i.e., by penalizing the combination of spatially distant voxels. Real applications show an improvement in classification and in the interpretability of the results compared to various functional alignment methods.
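
A short derivation can make the role of the prior concrete. This is only a sketch under assumptions that the abstract does not state: Gaussian errors with variance $\sigma^2$, a shared reference matrix $M$, a concentration scalar $k$, and a location matrix $F$ whose entries are larger for spatially close voxel pairs. All of this notation is hypothetical and may differ from the paper's.

```latex
% Assumed notation (not from the abstract):
%   X_i : data of subject i (time points x voxels)
%   R_i : orthogonal transformation (voxels x voxels)
%   M   : shared reference matrix,  F : prior location matrix,  k >= 0
\begin{align}
  X_i R_i &= M + E_i, \qquad R_i^\top R_i = I, \qquad
  \operatorname{vec}(E_i) \sim \mathcal{N}(0, \sigma^2 I) \\
  f(R_i) &\propto \exp\bigl\{ k \operatorname{tr}(F^\top R_i) \bigr\}
  \quad \text{(matrix von Mises-Fisher prior)} \\
  \log p(R_i \mid X_i, M)
  &= \frac{1}{\sigma^2} \operatorname{tr}\bigl(R_i^\top X_i^\top M\bigr)
     + k \operatorname{tr}\bigl(F^\top R_i\bigr) + \text{const} \\
  &= \operatorname{tr}\Bigl\{ R_i^\top \Bigl( \tfrac{1}{\sigma^2} X_i^\top M + k F \Bigr) \Bigr\}
     + \text{const} \\
  \hat{R}_i &= U V^\top, \qquad \text{where } \tfrac{1}{\sigma^2} X_i^\top M + k F = U D V^\top
  \text{ is a singular value decomposition.}
\end{align}
```

Under these assumptions, $k = 0$ recovers the ordinary orthogonal Procrustes solution, while a larger $k$ pulls the estimate towards transformations that mix only voxels favoured by $F$, which is one way to read "penalizing the combination of spatially distant voxels".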

    A comparison of the CAR and DAGAR spatial random effects models with an application to diabetics rate estimation in Belgium

    When hierarchically modelling an epidemiological phenomenon on a finite collection of sites in space, one must always take a latent spatial effect into account in order to capture the correlation structure that links the phenomenon to the territory. In this work, we compare two autoregressive spatial models that can be used for this purpose: the classical CAR model and the more recent DAGAR model. Unlike the former, the latter has a desirable property: its ρ parameter can be naturally interpreted as the average neighbor pair correlation, and this parameter can be estimated directly when the effect is modelled using a DAGAR rather than a CAR structure. As an application, we model the diabetics rate in Belgium in 2014 and show the adequacy of these models in predicting the response variable when no covariates are available.
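
To illustrate what a CAR spatial effect looks like in practice, the sketch below builds the precision matrix of a proper CAR model, Q = τ(D − ρW), for a toy neighbourhood graph. Here W is the adjacency matrix and D the diagonal matrix of neighbour counts. The function names, the four-site line graph and the parameter values are illustrative assumptions, not taken from the paper, and the DAGAR construction is not shown.

```python
def car_precision(adjacency, rho, tau=1.0):
    """Precision matrix Q = tau * (D - rho * W) of a proper CAR model.

    adjacency: symmetric 0/1 matrix W as a list of lists (assumed input format).
    rho: spatial dependence parameter; tau: overall precision scale.
    """
    n = len(adjacency)
    precision = [[0.0] * n for _ in range(n)]
    for i in range(n):
        degree = sum(adjacency[i])  # number of neighbours of site i
        for j in range(n):
            if i == j:
                precision[i][j] = tau * degree
            elif adjacency[i][j]:
                precision[i][j] = -tau * rho
    return precision


def car_conditional_mean(adjacency, rho, values, i):
    """Conditional mean of site i given the rest: rho * average of its neighbours."""
    neighbours = [values[j] for j, linked in enumerate(adjacency[i]) if linked]
    return rho * sum(neighbours) / len(neighbours)


# Toy example (hypothetical): four sites on a line, 0 - 1 - 2 - 3.
W = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
Q = car_precision(W, rho=0.9)
m1 = car_conditional_mean(W, rho=0.5, values=[1.0, 2.0, 3.0, 4.0], i=1)
```

In this parameterisation ρ scales the neighbour average in each conditional mean rather than equalling a correlation between neighbouring sites, which is why its interpretation is less direct than the one the abstract attributes to DAGAR.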

    Congress UPV Proceedings of the 21ST International Conference on Science and Technology Indicators

    This is the book of proceedings of the 21st Science and Technology Indicators Conference, which took place in València (Spain) from 14 to 16 September 2016. The conference theme for this year, ‘Peripheries, frontiers and beyond’, aimed to study the development and use of Science, Technology and Innovation indicators in spaces that have not been the focus of current indicator development, for example, in the Global South, or the Social Sciences and Humanities. The exploration to the margins and beyond proposed by the theme has brought to the STI Conference an interesting array of new contributors from a variety of fields and geographies. This year’s conference had a record 382 registered participants from 40 different countries, including 23 European, 9 American, 4 Asia-Pacific, and 4 from Africa and the Near East. About 26% of participants came from outside Europe. There were also many participants (17%) from organisations outside academia, including governments (8%), businesses (5%), foundations (2%) and international organisations (2%). This is particularly important in a field that is practice-oriented. The chapters of the proceedings attest to the breadth of issues discussed: infrastructure, benchmarking and use of innovation indicators, societal impact and mission-oriented research, mobility and careers, social sciences and the humanities, participation and culture, gender, and altmetrics, among others. We hope that the diversity of this Conference has fostered productive dialogues and synergistic ideas and made a contribution, small as it may be, to the development and use of indicators that, being more inclusive, will foster a more inclusive and fair world.

    Advances in Computational Social Science and Social Simulation

    This conference is the joint celebration of the "10th Artificial Economics Conference AE", the "10th Conference of the European Social Simulation Association ESSA" and the "1st Simulating the Past to Understand Human History SPUHH". The conference was organised by the Laboratory for Socio-Historical Dynamics Simulation (LSDS-UAB) of the Universitat Autònoma de Barcelona. Readers will find results of recent research on computational social science and social simulation in economics, management, sociology, and history, written by leading experts in the field. SOCIAL SIMULATION (formerly ESSA) conferences are annual events that serve as an international platform for the exchange of ideas and discussion of cutting-edge research in the field of social simulation, from both theoretical and applied perspectives, and the 2014 edition benefits from the cross-fertilization of three different research communities in one single event. The volume consists of 122 articles, corresponding to most of the contributions to the conferences, in three different formats: short abstracts (presentation of work-in-progress research), posters (presentation of models and results), and full papers (presentation of social simulation research including results and discussion). The compilation is completed with indexing lists to help readers find articles by title, author and thematic content. We are convinced that this book will serve interested readers as a useful compendium which presents in a nutshell the most recent advances at the frontiers of computational social sciences and social simulation research.

    As Energias Renováveis na Transição Energética : Livro de Comunicações do XVII Congresso Ibérico e XIII Congresso Ibero-americano de Energia Solar

    CIES2020: XVII Congresso Ibérico e XIII Congresso Ibero-Americano de Energia Solar, Lisbon, Portugal: LNEG, 3-5 November 2020. ABSTRACT: CIES2020 convened under the motto "Renewable Energies in the Energy Transition", reflecting a context of necessary and urgent change in all sectors of our societies and in our behaviour in the use of energy, whether individually, in families or in companies, and above all in the paradigm shift of energy systems, which has an impact at every level, in cities, buildings and transport, and in which renewable energies take on a leading, priority role in the fight against climate change and in energy decarbonisation, in defence of the planet and of the sustainability of future generations. CIES2020 was organised around 3 main topics: 1) Renewable Energies in the Energy Transition; 2) Renewable Energies in the Sustainable Development of Communities; and 3) Renewable Energies, Society and the Economy. We thus sought to cover all technological areas of renewable energies, their applications and uses, as well as the new challenges emerging in terms of technological innovation and their impacts on society.