51 research outputs found

    Technology Forecasting Using Data Mining and Semantics: First Annual Report

    The planning and management of research and development is a challenging process, made harder by the large amount of information available. The goal of this project is to mine science and technology databases for patterns and trends that facilitate the formation of research strategies. The information sources we exploit are diverse and include academic journals, patents, blogs and news stories. The intended outputs of the project include growth forecasts for various technological sectors (with an emphasis on sustainable energy), an improved understanding of the underlying research landscape, and the identification of influential researchers or research groups. This paper focuses on the development of techniques to both organize and visualize the data in a way that reflects the semantic relationships between keywords. We studied the use of the joint term frequencies of pairs of keywords as a means of characterizing this semantic relationship; this is based on the intuition that terms which frequently appear together are more likely to be closely related. The results reported herein show that: (1) with appropriate tools and methods, exploitable patterns and information can be extracted from publicly available databases; (2) an adaptation of the Normalized Google Distance (NGD) formalism can provide measures of keyword distance that facilitate keyword clustering and hierarchical visualization; (3) a further adaptation of the NGD formalism can provide an asymmetric measure of keyword distance that allows the automatic creation of a keyword taxonomy; and (4) an adaptation of the Latent Semantic Approach (LSA) can be used to identify concepts underlying collections of keywords.
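The keyword-distance idea in the abstract above can be illustrated with the standard (symmetric) Normalized Google Distance computed from co-occurrence counts. This is a minimal sketch, not the report's adapted measure: the function names `ngd` and `keyword_distances` and the toy document sets are assumptions made for illustration, and the asymmetric variant mentioned in the abstract is not reproduced here.

```python
import math
from itertools import combinations


def ngd(fx, fy, fxy, n):
    """Standard Normalized Google Distance from occurrence counts.

    fx, fy : number of documents containing each keyword
    fxy    : number of documents containing both keywords
    n      : total number of documents
    """
    if fx <= 0 or fy <= 0 or fxy <= 0:
        # Keywords that never co-occur are treated as infinitely distant.
        return math.inf
    lx, ly, lxy = math.log(fx), math.log(fy), math.log(fxy)
    denom = math.log(n) - min(lx, ly)
    if denom <= 0:
        # Both keywords occur in every document, so they always co-occur.
        return 0.0
    return (max(lx, ly) - lxy) / denom


def keyword_distances(docs):
    """Pairwise NGD for a list of keyword sets (one set per document)."""
    n = len(docs)
    vocab = sorted(set().union(*docs))
    count = {k: sum(k in d for d in docs) for k in vocab}
    distances = {}
    for a, b in combinations(vocab, 2):
        joint = sum(a in d and b in d for d in docs)
        distances[(a, b)] = ngd(count[a], count[b], joint, n)
    return distances


# Hypothetical toy corpus: each set holds the keywords of one document.
docs = [{"solar", "pv"}, {"solar", "pv"}, {"wind"}, {"wind", "turbine"}]
distances = keyword_distances(docs)
```

Keywords that always appear together get distance 0, and the resulting distance table could be passed to any hierarchical clustering routine to produce the kind of keyword grouping the abstract describes.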

    Social review-based recommender systems from theory to practice

    Award for the best final-year project (PFC) in the area of Information Systems of Telecommunication Engineering or Electronic Engineering at ETSETB-UPC (academic year 2013-2014). Awarded by Cátedra Red.es. Social Recommender Systems were born with the goal of mitigating the current information overload caused by the birth of Social Networks, among other causes. They have enabled Internet actors (e.g. users, web browsers, sensors, actuators, etc.) to make more informed decisions based on the information that is shown to them, up to the point that some actors even blindly trust the recommendations generated by these systems. Within this scenario, this thesis proposes a novel Hybrid Social Recommender System based purely on the text reviews typed by users. The proposed engine treats the review content and sentiment separately and finally combines both into a single recommendation. Very little scientific research has been published on mining text reviews with the aim of performing item recommendation. Moreover, among all Hybrid Recommendation Systems in the literature, none incorporates the above-mentioned review features into a collaborative and content-based recommender. To assess the effectiveness of the platform, we present a methodology that goes from extracting the data directly from a Social Network, through cleaning and pre-processing the text data and building the predictive model with different state-of-the-art machine learning techniques, up to evaluating the system in terms of several key metrics. The data extraction process receives particular attention because of the challenges imposed by most social platforms in obtaining all the geo-positioned data generated in a bounded region. To overcome the platform limitations, we introduce the use of the Quadtree algorithm with the goal of crawling all the geo-positioned reviews. The algorithm is enhanced with a module that copes with the time dynamics and captures the time-stamped data as well. Moreover, we study the effectiveness of the Quadtree partition method to crawl any type of spatial data, which tends to be softly distributed in the area. This thesis draws several conclusions from the available data about the use of several state-of-the-art text mining techniques and the effectiveness of the proposed recommender setup. Nonetheless, future work needs to design and propose novel evaluation methodologies that uncouple the system evaluation from the data. Award-winnin
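The Quadtree crawling idea mentioned in the abstract above can be sketched as follows, assuming the platform limitation is a cap on the number of results returned per bounding-box query. This is an illustration under that assumption, not the thesis's implementation: `quadtree_crawl`, `make_fake_api`, the `cap` and `min_size` parameters, and the in-memory point list standing in for a social platform are all hypothetical, and the time-dynamics module is not covered.

```python
def quadtree_crawl(bbox, fetch, cap, min_size=1e-6):
    """Collect every item inside bbox using a result-capped query function.

    bbox     : (x0, y0, x1, y1) bounding box
    fetch    : callable(bbox, limit) returning at most `limit` items (hypothetical API)
    cap      : maximum number of items one query can return
    min_size : stop splitting below this box size (guards against endless recursion
               when more than `cap` items share one location; such items may be missed)
    """
    x0, y0, x1, y1 = bbox
    items = fetch(bbox, cap)
    # Fewer hits than the cap means the query was not truncated.
    if len(items) < cap or (x1 - x0) <= min_size or (y1 - y0) <= min_size:
        return set(items)
    # The query may have been truncated: split into four quadrants and recurse.
    mx, my = (x0 + x1) / 2, (y0 + y1) / 2
    quadrants = (
        (x0, y0, mx, my),
        (mx, y0, x1, my),
        (x0, my, mx, y1),
        (mx, my, x1, y1),
    )
    found = set()
    for quad in quadrants:
        # Items on shared borders are returned twice; the set removes duplicates.
        found |= quadtree_crawl(quad, fetch, cap, min_size)
    return found


def make_fake_api(points):
    """Stand-in for a platform endpoint that truncates results to `limit`."""
    def fetch(bbox, limit):
        x0, y0, x1, y1 = bbox
        hits = [p for p in points if x0 <= p[0] <= x1 and y0 <= p[1] <= y1]
        return hits[:limit]
    return fetch


# Hypothetical data: 100 geo-positioned items on a grid, at most 5 per query.
points = [(i / 10, j / 10) for i in range(10) for j in range(10)]
crawled = quadtree_crawl((0.0, 0.0, 1.0, 1.0), make_fake_api(points), cap=5)
```

The recursion only subdivides regions where a query hits the cap, so dense areas are queried at a finer resolution than sparse ones.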

    Tracking the Temporal-Evolution of Supernova Bubbles in Numerical Simulations

    The study of low-dimensional, noisy manifolds embedded in a higher dimensional space has been extremely useful in many applications, from the chemical analysis of multi-phase flows to simulations of galactic mergers. Building a probabilistic model of the manifolds has helped in describing their essential properties and how they vary in space. However, when the manifold is evolving through time, a joint spatio-temporal modelling is needed, in order to fully comprehend its nature. We propose a first-order Markovian process that propagates the spatial probabilistic model of a manifold at fixed time, to its adjacent temporal stages. The proposed methodology is demonstrated using a particle simulation of an interacting dwarf galaxy to describe the evolution of a cavity generated by a Supernov

    Preference mining techniques for customer behavior analysis

    The thesis has studied a number of critical problems in data mining for customer behavior analysis and has proposed novel techniques for better modeling of the customers’ decision making process, more efficient analysis of their travel behavior, and more effective identification of their emerging preference

    Advances in Analysis and Exploration in Medical Imaging

    With an ever-increasing life expectancy, we see a concomitant increase in diseases capable of disrupting normal cognitive processes. Their diagnoses are difficult, and usually occur after daily living activities have already been compromised. This dissertation proposes machine learning methods for the study of the neurological implications of brain lesions. It addresses the analysis and exploration of medical imaging data, with particular emphasis on (f)MRI. Two main research directions are proposed. In the first, a brain tissue segmentation approach is detailed. In the second, a document mining framework, applied to reports of neuroscientific studies, is described. Both directions are based on retrieving consistent information from multi-modal data. One contribution of this dissertation is the application of a semi-supervised method, discriminative clustering, to identify different brain tissues and their partial volume information. The proposed method relies on variations of tissue distributions in multi-spectral MRI, and reduces the need for a priori information. This methodology was successfully applied to the study of multiple sclerosis and age-related white matter diseases. It was also shown that early-stage changes of normal-appearing brain tissue can already predict decline in certain cognitive processes. Another contribution of this dissertation is in neuroscience meta-research. One limitation in neuroimage processing relates to data availability. Through document mining of neuroscientific reports, using images as a source of information, one can harvest research results dealing with brain lesions. The context of such results can be extracted from textual information, allowing for an intelligent categorisation of images. This dissertation proposes new principles, and a combination of several techniques, for the study of published fMRI reports. These principles are based on a number of distance measures used to compare various brain activity sites. Application to studies of the default mode network validated the proposed approach. The aforementioned methodologies rely on clustering approaches. When dealing with such strategies, most results depend on the choice of initialisation and parameter settings. By defining distance measures that search for clusters of consistent elements, one can estimate a degree of reliability for each data grouping. In this dissertation, it is shown that such principles can be applied to multiple runs of various clustering algorithms, allowing for a more robust estimation of data agglomeration.

    Structural and dynamical interdependencies in complex networks at meso- and macroscale: nestedness, modularity, and in-block nestedness

    Many real systems, like the brain, are considered to be complex, i.e. they are made of several interacting components and display a collective behaviour that cannot be inferred from how the individual parts behave. They are usually described as networks, with the components represented as nodes and the interactions between them as links. Research into networks mainly focuses on exploring how a network's dynamic behaviour is constrained by the nature and topology of the interactions between its elements. Analyses of this sort are performed on three scales: the microscale, based on single nodes; the macroscale, which explores the whole network; and the mesoscale, which studies groups of nodes. Nonetheless, most studies so far have focused on only one scale, despite increasing evidence suggesting that networks exhibit structure on several scales. In our thesis, we apply structural analysis to a variety of synthetic and empirical networks on multiple scales. We focus on the examination of nested, modular, and in-block nested patterns, and the effects that they impose on each other. Finally, we introduce a theoretical model to help us better understand some of the mechanisms that enable such patterns to emerge. Tecnologías de la información y de rede

    Machine Learning Methods with Noisy, Incomplete or Small Datasets

    In many machine learning applications, available datasets are sometimes incomplete, noisy or affected by artifacts. In supervised scenarios, it could happen that label information has low quality, which might include unbalanced training sets, noisy labels and other problems. Moreover, in practice, it is very common that available data samples are not enough to derive useful supervised or unsupervised classifiers. All these issues are commonly referred to as the low-quality data problem. This book collects novel contributions on machine learning methods for low-quality datasets, to contribute to the dissemination of new ideas to solve this challenging problem, and to provide clear examples of application in real scenarios