8 research outputs found

    Query processing in large-scale networks.

    Due to the massive size of graphs arising in many domains today, even simple graph queries become challenging tasks. In this thesis, three queries with a wide range of applications are investigated on large graphs. The first is the shortest distance query, a fundamental query that computes the shortest distance between two nodes. The second, the weight constraint reachability (WCR) query, checks whether there is a feasible path between two nodes on which every edge weight satisfies a specified constraint. The third, the top-k nearest keyword (k-NK) query, reports, for a query node, the k nearest nodes bearing some user-specified keywords. When confronted with a large-scale graph of tens of millions of nodes or more, we need to develop efficient indexing and query optimization techniques for these queries.

    For the shortest distance query, we devise two landmark embedding schemes: an error bounded landmark scheme and a local landmark scheme. The former selects and organizes landmarks so as to guarantee an error bound on the estimated distance; the latter significantly improves distance estimation accuracy without increasing the offline embedding complexity or the online query complexity. For the WCR query, we propose a memory-based approach that guarantees constant query time. In addition, to improve scalability, we use a coding technique to devise an I/O-efficient approach for answering WCR queries on massive graphs. For the k-NK query, we start with the special case in which the graph is a tree, develop an algorithm that answers k-NK queries on a tree in constant time, and on that basis present our algorithm for approximate k-NK queries on a graph. A global storage technique is devised to further reduce the index size and the query time. We conducted extensive experiments on real and synthetic data for each of the three queries to show the effectiveness and efficiency of our methods.

    Detailed summary in vernacular field only. Qiao, Miao. Thesis (Ph.D.), Chinese University of Hong Kong, 2013. Includes bibliographical references (leaves 141-151). Abstract also in Chinese.

    Contents:
    Abstract
    Abstract in Chinese
    Acknowledgements
    Contents
    Chapter 1. Introduction
        1.1. Motivation
            1.1.1. Shortest Distance Query
            1.1.2. Weight Constraint Reachability Query
            1.1.3. Top-k Nearest Keyword Query
        1.2. Contributions
        1.3. Roadmap
    Chapter 2. Related Work
        2.1. Shortest Distance Query
        2.2. Reachability Query
        2.3. Keyword Related Query
    Chapter 3. Querying Shortest Distance
        3.1. Landmark Embedding
        3.2. Error Bounded Landmark Scheme
            3.2.1. Problem Statement
            3.2.2. Proposed Algorithm
            3.2.3. Graph Partitioning-based Heuristic
            3.2.4. Experiments
        3.3. Query-Dependent Local Landmark Scheme
            3.3.1. Problem Statement
            3.3.2. Shortest Path Tree Based Local Landmark
            3.3.3. Optimization Techniques
            3.3.4. Local Landmark Scheme on Relational Database
            3.3.5. Experiment
        3.4. Summary
    Chapter 4. Querying Weight Constraint Reachability
        4.1. Problem Definition
            4.1.1. Edge Weight Constraint
            4.1.2. Node Weight Constraint
            4.1.3. Two Basic Solutions
        4.2. An Efficient Memory Algorithm
            4.2.1. Properties of WCR
            4.2.2. Novel Edge Based Indexing
            4.2.3. Extension to Other Constraint Formats
        4.3. An I/O-Efficient Index
            4.3.1. Vertex Coding
            4.3.2. MST Re-balancing
            4.3.3. Disk-Based Index Construction
            4.3.4. Query Processing
        4.4. Experiments
        4.5. Summary
    Chapter 5. Querying Top K-Nearest Keyword
        5.1. Problem Definition
        5.2. Existing Solutions
            5.2.1. Approximate k-NK on a Graph
            5.2.2. Exact 1-NK on a Tree
        5.3. Solution Overview
        5.4. K-NK on a Tree for a Small K
            5.4.1. Query Processing
            5.4.2. Construction of Entry Edge Partition
            5.4.3. Construction of Candidate List
        5.5. K-NK on a Tree for a Large K
            5.5.1. A Basic Pivot Approach
            5.5.2. Pivot Approach with Tree Balancing
            5.5.3. Index Construction
        5.6. Approximate K-NK on a Graph
        5.7. Experiments
        5.8. Summary
    Chapter 6. Conclusions and Future Work
    Bibliography
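    The landmark embedding idea named in this abstract can be illustrated with a minimal sketch. It is not the thesis's error bounded or local landmark scheme; it shows only the generic baseline, assuming an unweighted, undirected graph stored as an adjacency dictionary. Distances from a few landmarks are precomputed offline, and a query is answered with the triangle-inequality upper bound d(s, t) <= d(s, l) + d(l, t). All names (`bfs_distances`, `build_embedding`, `estimate_distance`) and the toy graph are hypothetical.

```python
from collections import deque


def bfs_distances(adj, source):
    """Exact hop distances from `source` to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def build_embedding(adj, landmarks):
    """Offline step: one BFS per landmark (landmark choice is an assumption)."""
    return {l: bfs_distances(adj, l) for l in landmarks}


def estimate_distance(embedding, s, t):
    """Online step: smallest d(s, l) + d(l, t) over all landmarks l.

    Returns an upper bound on the true distance, or None if no landmark
    reaches both endpoints.
    """
    best = None
    for dist in embedding.values():
        if s in dist and t in dist:
            candidate = dist[s] + dist[t]
            if best is None or candidate < best:
                best = candidate
    return best


# Toy path graph 0 - 1 - 2 - 3 - 4 with a single landmark at node 2.
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
embedding = build_embedding(adj, [2])
print(estimate_distance(embedding, 0, 4))  # landmark lies on the shortest path
print(estimate_distance(embedding, 0, 1))  # landmark is off the path: overestimate
```

    In this toy graph the estimate is exact when the landmark lies on a shortest path between the two query nodes and too large otherwise, which is the kind of estimation error the two schemes in the abstract are described as addressing.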

    Development of novel software tools and methods for investigating the significance of overlapping transcription factor genomic interactions

    Identifying overlapping DNA binding patterns of different transcription factors is a major objective of genomic studies, but existing methods for archiving large numbers of datasets in a personalised database lack sophistication and utility. To address this need, various database systems were benchmarked and a tool, BiSA (Binding Sites Analyser), was developed for archiving genomic regions and easily identifying overlap with, or proximity to, other regions of interest. BiSA can also calculate the statistical significance of overlapping regions and identify genes located near binding regions of interest, or genomic features near a gene or locus of interest. BiSA was populated with >1000 datasets from previously published genomic studies describing transcription factor binding sites and histone modifications. Using BiSA, the relationships between binding sites for a range of transcription factors were analysed and a number of statistically significant relationships were identified. These included an extensive comparison of estrogen receptor alpha (ERα) and progesterone receptor (PR) in breast cancer cells, which revealed a statistically significant functional relationship at a subset of sites. In summary, the BiSA knowledge base contains publicly available datasets describing transcription factor binding sites and epigenetic modifications, and provides biologists with an easy graphical interface for advanced analysis of genomic interactions.

    A comparison of the CAR and DAGAR spatial random effects models with an application to diabetics rate estimation in Belgium

    When hierarchically modelling an epidemiological phenomenon on a finite collection of sites in space, one must always take a latent spatial effect into account in order to capture the correlation structure that links the phenomenon to the territory. In this work, we compare two autoregressive spatial models that can be used for this purpose: the classical CAR model and the more recent DAGAR model. Unlike the former, the latter has a desirable property: its ρ parameter can be naturally interpreted as the average neighbor pair correlation and, in addition, this parameter can be directly estimated when the effect is modelled using a DAGAR rather than a CAR structure. As an application, we model the diabetics rate in Belgium in 2014 and show the adequacy of these models in predicting the response variable when no covariates are available.

    A Statistical Approach to the Alignment of fMRI Data

    Multi-subject functional magnetic resonance imaging (fMRI) studies are critical. Anatomical and functional structure varies across subjects, so image alignment is necessary. We define a probabilistic model to describe functional alignment. By imposing a prior distribution, namely the matrix von Mises-Fisher distribution, on the orthogonal transformation parameter, we embed anatomical information in the estimation of the parameters, i.e., we penalize the combination of spatially distant voxels. Real applications show an improvement in the classification and interpretability of the results compared to various functional alignment methods.

    Jornadas Nacionales de Investigación en Ciberseguridad: actas de las VIII Jornadas Nacionales de Investigación en ciberseguridad: Vigo, 21 a 23 de junio de 2023

    Jornadas Nacionales de Investigación en Ciberseguridad (8ª. 2023. Vigo); atlanTTic; AMTEGA: Axencia para a modernización tecnolóxica de Galicia; INCIBE: Instituto Nacional de Ciberseguridad

    XLIII Jornadas de Automática: libro de actas: 7, 8 y 9 de septiembre de 2022, Logroño (La Rioja)

    [Abstract] The Jornadas de Automática (JA) are the most important event of the Comité Español de Automática (CEA), a scientific and technical body with more than fifty years of history dedicated to the dissemination and adoption of automatic control in society. This year marks the forty-third edition of the JA, which serve as the meeting point for the automatic control community in Spain. This edition will give visibility to the new challenges and results in the field and to their use in a large number of applications, including renewable energy, bioengineering, and assistive robotics. Beyond the scientific component, which is reflected in this book of proceedings, the JA are a meeting point for different generations of lecturers, researchers, and professionals, and include a social component that is of vital importance. The 2022 edition of the JA is held in Logroño, capital of La Rioja, a region known worldwide for the quality of its Denominación de Origen wines and one that has taken on the challenge of gaining competitiveness through the green and digital transformation. The region is also known as the cradle of the Spanish language and for promoting the Valle de la Lengua with the help of new technologies, among them intelligent automation. The organizers of this edition, who belong to the Área de Ingeniería de Sistemas y Automática of the Departamento de Ingeniería Eléctrica at the Universidad de La Rioja (UR), are a fundamental pillar of support for the region in studying, implementing, and disseminating these challenges. This edition, the first held entirely in person since the covid-19 pandemic, has more than 200 attendees and is split between the Edificio Politécnico of the Escuela Técnica Superior de Ingeniería Industrial and the Monasterio de Yuso in San Millán de la Cogolla, two exceptional settings for the JA.
    As part of the scientific programme, two plenary sessions will focus, respectively, on control solutions for addressing the new energy challenges and on data quality for unbiased and trustworthy artificial intelligence (AI). Two round tables will also discuss applications of AI and the adoption of digital technology in professional practice. In addition, two master classes on state-of-the-art technology will be given by industry professionals. The JA will also host two competitions: CEABOT, with humanoid robots, and the Concurso de Ingeniería de Control, focused on UAVs. Alongside these activities are the meetings of the CEA thematic groups, the poster exhibitions of the papers submitted to the JA, and the company exhibition stands. Finally, during the event the "Premio Nacional de Automática" (2022 edition) and the "Premio CEA al Talento Femenino en Automática", sponsored by the Government of La Rioja (in its first edition), will be presented, along with various awards from the activities of the CEA thematic groups. The proceedings of the XLIII Jornadas de Automática comprise a total of 143 papers, organized around the nine Thematic Groups and the two Strategic Lines of CEA. The selected papers underwent peer review.