182 research outputs found

    Learning the Consensus of Multiple Correspondences between Data Structures

    In this work, we present a framework to learn the consensus given multiple correspondences. It is assumed that the parties involved have generated these correspondences separately, and our system acts as a mechanism that gauges several characteristics and considers different parameters to learn the best mappings and thus form a correspondence with the highest possible accuracy at a reasonable computational cost. The consensus framework is presented gradually, starting from the most basic approaches, which used exclusively well-known concepts or only two correspondences, up to the final model, which is able to consider multiple correspondences and can automatically learn some weighting parameters. Each step of the framework is evaluated using databases of varied nature to demonstrate that it can address different matching scenarios. In addition, two supplementary advances related to correspondences are presented in this work. First, a new distance metric for correspondences has been developed, which led to a new strategy for the weighted mean correspondence search. Second, a framework specifically designed for correspondence generation in the image registration field has been established, in which one of the images is a full image and the other is a small sample of it. The conclusion presents insights into how our consensus framework can be enhanced and how these two parallel developments can converge with it.

    Network Analysis of World Trade using the BACI-CEPII dataset

    In this paper we explore the BACI-CEPII database using Network Analysis. Starting from the visualization of the World Trade Network, we define and describe the topology of the network, in both its binary and its weighted version, calculating and discussing some commonly used network statistics. We finally discuss some specific topics that can be studied using Network Analysis and International Trade data, at both the aggregated and the sectoral level. The analysis is done using multiple software packages (Stata, R, and Pajek). The scripts to replicate part of the analysis are included in the appendix and can be used as a hands-on tutorial. Moreover, the World Trade Network local and global centrality measures, for the unweighted and the weighted version of the network, calculated using the bilateral aggregate trade data for each country (178 in total) and each year (from 1995 to 2010), can be downloaded from the CEPII webpage.
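The distinction between the binary and the weighted version of a trade network can be illustrated with a minimal Python sketch. It is not taken from the paper's Stata, R, or Pajek scripts: the flows below are hypothetical toy values rather than BACI-CEPII data, and the helper names `out_degree_and_strength` and `density` are assumptions made for this example. Out-degree counts export partners (binary view), while out-strength sums export values (weighted view).

```python
from collections import defaultdict

# Hypothetical toy trade flows (exporter, importer, value); not BACI-CEPII data.
flows = [
    ("A", "B", 120.0),
    ("A", "C", 30.0),
    ("B", "C", 50.0),
    ("C", "A", 10.0),
]


def out_degree_and_strength(edges):
    """Return (out-degree, out-strength) dictionaries for a directed weighted edge list."""
    degree = defaultdict(int)      # binary view: number of export partners
    strength = defaultdict(float)  # weighted view: total export value
    for exporter, _importer, value in edges:
        degree[exporter] += 1
        strength[exporter] += value
    return dict(degree), dict(strength)


def density(edges):
    """Share of possible directed links that are present (assumes no duplicate edges)."""
    nodes = {name for exporter, importer, _ in edges for name in (exporter, importer)}
    n = len(nodes)
    return len(edges) / (n * (n - 1))


degree, strength = out_degree_and_strength(flows)
print(degree)
print(strength)
print(density(flows))
```

In this toy network, country A has the most partners and the largest total exports, but the two rankings need not coincide in general, which is one reason to report both versions.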

    Herramientas informáticas y de inteligencia artificial para el meta-análisis en la frontera entre la bioinformática y las ciencias jurídicas

    QSPR (Quantitative Structure-Property Relationships) computer models can predict properties of complex systems, reducing experimental costs in terms of time, human resources, material resources, and/or the use of laboratory animals in bio-molecular, technical, and/or social sciences. Artificial Neural Networks (ANNs) are among the most powerful tools for finding QSPR models. For this purpose, ANNs may use as input variables numerical parameters of the system structure called Topological Indices (TIs). TIs are calculated in Graph Theory from a representation of any system as a network of interconnected nodes, including molecules and social and technological networks. The first aim of this thesis is to review and/or develop new TIs, TI calculation software, and QSPR models that use ANNs to predict bio-molecular, biological, commercial, social, and legal networks in which nodes represent bio-molecules, organisms, populations, products, tax laws, or criminal causes. Moreover, the interaction of ICTs with the biomolecular and legal sciences needs a framework of legal certainty that allows the proper development of ICTs and their applications in Biomolecular Sciences. Therefore, the second objective of this thesis is to review the legal framework and legal protection of QSPR techniques. The present research aims to demonstrate the usefulness of these models for predicting characteristics and properties of these complex systems.
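To make the idea of a topological index concrete, the following Python sketch computes the Wiener index, a classic TI defined as the sum of shortest-path distances over all unordered node pairs. The abstract does not say which indices the thesis uses, so the choice of the Wiener index, the function name `wiener_index`, and the two small example graphs are assumptions made purely for illustration.

```python
from collections import deque


def wiener_index(adj):
    """Sum of shortest-path distances over all unordered node pairs.

    `adj` maps each node to a list of neighbours in an undirected,
    connected, unweighted graph (an assumption of this sketch).
    """
    total = 0
    for source in adj:
        # Breadth-first search gives hop distances from `source`.
        dist = {source: 0}
        queue = deque([source])
        while queue:
            v = queue.popleft()
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
        total += sum(dist.values())
    # Each unordered pair was counted twice, once from each endpoint.
    return total // 2


# Carbon skeletons of two four-carbon molecules, used as toy inputs.
path4 = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}        # chain (n-butane)
star4 = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0]}        # branched (isobutane)

print(wiener_index(path4))  # 10
print(wiener_index(star4))  # 9
```

The two graphs have the same number of nodes and edges but different index values, which is what allows a single number of this kind to serve as a structure-sensitive input variable for a model.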

    Multiple Texts as a Limiting Factor in Online Learning: Quantifying (Dis-)similarities of Knowledge Networks across Languages

    We test the hypothesis that the extent to which one obtains information on a given topic through Wikipedia depends on the language in which it is consulted. Controlling for the size factor, we investigate this hypothesis for 25 subject areas. Since Wikipedia is a central part of the web-based information landscape, this indicates a language-related, linguistic bias. The article therefore deals with the question of whether Wikipedia exhibits this kind of linguistic relativity or not. From the perspective of educational science, the article develops a computational model of the information landscape from which multiple texts are drawn as typical input of web-based reading. For this purpose, it develops a hybrid model of intra- and intertextual similarity of different parts of the information landscape and tests this model on the example of 35 languages and corresponding Wikipedias. In this way the article builds a bridge between reading research, educational science, Wikipedia research and computational linguistics.
    Comment: 40 pages, 13 figures, 5 tables

    Detecting Communities and Analysing Interactions with Learning Objects in Online Learning Repositories

    The widespread use of online learning object repositories has raised the need for studies that assess the quality of their contents and their users' performance and engagement. The present research addresses two problems that are central to that need: exploring user interaction with these repositories and detecting emergent communities of users. The dissertation approaches these problems by investigating and mining the Khan Academy repository, a free, open-access, popular online learning repository with a wide content scope. It includes large numbers of different learning objects, such as instructional videos, articles, and exercises, in addition to a large number of users. Data and user interactions were collected using the repository's public application programming interfaces combined with Web scraping techniques. Different research activities were carried out to generate useful insights from the gathered data. We conducted descriptive analysis to investigate the learning repository and its core features, such as growth rate, popularity, and geographical distribution. A number of statistical and quantitative analyses were applied to examine the relation between the users' interactions and different metrics related to the use of learning objects, as a step towards assessing the users' behaviour. We also used different Social Network Analysis (SNA) techniques on a network graph built from a large number of user interactions. The resulting network consisted of more than 3 million interactions distributed across more than 300,000 users. These interactions are questions and answers posted on Khan Academy's instructional videos (more than 10,000 videos). To analyse this graph and explore the social network structure, we studied two different community detection algorithms to identify the communities of learning interactions that emerged in Khan Academy, and then compared their effectiveness. After that, we applied different SNA measures, including modularity, density, clustering coefficients, and different centrality measures, to assess the users' behaviour patterns and their presence. Using descriptive analysis, we discovered many characteristics and features of the repository. We found that the number of learning objects in Khan Academy's repository grows linearly over time, that more than 50% of the users do not complete the videos they watch, and that the average duration of video lessons is 5 to 10 minutes, which aligns with the duration recommended in the literature. By applying community detection techniques and social network analysis, we managed to identify learning communities in Khan Academy's network. The size distribution of those communities was found to follow a power-law distribution, as is the case for many real-world networks. Those learning communities are related to more than one domain, which means that users are active and interact across domains. Different centrality measures were applied to focus on the most influential players in those communities. Despite the popularity of online learning repositories and their wide use, the structure of the emergent learning communities and their social networks remains largely unexplored. Our findings could be considered initial insights that may help researchers and educators to better understand online learning repositories, the learning process inside those repositories, and learner behaviour.
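Three of the SNA measures named in this abstract (density, clustering coefficient, and degree centrality) can be sketched in a few lines of Python. This is a minimal illustration on a hypothetical four-user interaction graph, not the dissertation's code or data; the user labels and all function names are assumptions, and the graph is treated as undirected and unweighted.

```python
from itertools import combinations

# Hypothetical undirected interaction graph: an edge links two users
# who exchanged a question and an answer. Not Khan Academy data.
edges = [("u1", "u2"), ("u1", "u3"), ("u2", "u3"), ("u3", "u4")]


def build_adjacency(edge_list):
    """Map each node to the set of its neighbours."""
    adj = {}
    for a, b in edge_list:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def graph_density(adj):
    """Fraction of possible undirected edges that are present."""
    n = len(adj)
    m = sum(len(nbrs) for nbrs in adj.values()) / 2
    return 2 * m / (n * (n - 1))


def local_clustering(adj, node):
    """Fraction of a node's neighbour pairs that are themselves connected."""
    nbrs = adj[node]
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
    return 2 * links / (k * (k - 1))


def degree_centrality(adj):
    """Degree of each node divided by the maximum possible degree, n - 1."""
    n = len(adj)
    return {v: len(nbrs) / (n - 1) for v, nbrs in adj.items()}


adj = build_adjacency(edges)
print(graph_density(adj))
print({v: local_clustering(adj, v) for v in sorted(adj)})
print(degree_centrality(adj))
```

Here `u3` has the highest degree centrality because it interacts with every other user, while its clustering coefficient is lower than that of `u1` and `u2` because its neighbour `u4` is not connected to the others.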

    Unsupervised Structural Embedding Methods for Efficient Collective Network Mining

    How can we align accounts of the same user across social networks? Can we identify the professional role of an email user from their patterns of communication? Can we predict the medical effects of chemical compounds from their atomic network structure? Many problems in graph data mining, including all of the above, are defined on multiple networks. The central element of all of these problems is cross-network comparison, whether at the level of individual nodes or entities in the network or at the level of entire networks themselves. To perform this comparison meaningfully, we must describe the entities in each network expressively in terms of patterns that generalize across the networks. Moreover, because the networks in question are often very large, our techniques must be computationally efficient. In this thesis, we propose scalable unsupervised methods that embed nodes in vector space by mapping nodes with similar structural roles in their respective networks, even if they come from different networks, to similar parts of the embedding space. We perform network alignment by matching nodes across two or more networks based on the similarity of their embeddings, and refine this process by reinforcing the consistency of each node’s alignment with those of its neighbors. By characterizing the distribution of node embeddings in a graph, we develop graph-level feature vectors that are highly effective for graph classification. With principled sparsification and randomized approximation techniques, we make all our methods computationally efficient and able to scale to graphs with millions of nodes or edges. We demonstrate the effectiveness of structural node embeddings on industry-scale applications, and propose an extensive set of embedding evaluation techniques that lay the groundwork for further methodological development and application.
    PhD, Computer Science & Engineering, University of Michigan, Horace H. Rackham School of Graduate Studies
    http://deepblue.lib.umich.edu/bitstream/2027.42/162895/1/mheimann_1.pd

    ネットワーク科学及び政策決定過程の観点から見たデジタル経済における国際課税制度

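    Degree program name to be appended: Kyoto University Graduate School Shishu-Kan (思修館). Kyoto University, new system, course-based doctorate. Doctor of Philosophy (博士(総合学術)), 甲 No. 23344, 総総博 No. 17, 新制||総総||3 (University Library). Graduate School of Advanced Integrated Studies in Human Survivability, Kyoto University, Department of Advanced Integrated Studies in Human Survivability. Examiners: Professor Yuichi Ikeda (池田 裕一, chief examiner), Program-Specific Professor Hidetoshi Takeda (武田 英俊), Professor Toru Morotomi (諸富 徹). Conferred under Article 4, Paragraph 1 of the Degree Regulations. Doctor of Philosophy, Kyoto University. DFA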