
    A new Nested Graph Model for Data Integration

    Although graph data has gained increasing interest in several fields, no data model suitable for both querying and integrating differently structured graph and (semi)structured data has been conceived so far. A possible reason is that current graph query languages lack operators for combining multiple graphs (graph joins), and that current graph data structures support neither data integration nor nested multidimensional representations (graph nesting). To make such data integration possible, this thesis proposes a novel model, the General Semistructured data Model, which can represent both graphs and arbitrarily nested contents (e.g., one node can be contained in more than one parent node). This enables a nested graph model in which both vertices and edges may contain (overlapping) graphs. We provide two graph join algorithms (Graph Conjunctive Equijoin Algorithm and Graph Conjunctive Less-equal Algorithm) and one graph nesting algorithm (Two HOp Separated Patterns). Evaluating them on top of our secondary-memory representation showed that the query plans of existing query languages, running on their respective data models (relational, graph and document-oriented), are inefficient by comparison. In all three algorithms, the improvement came from using an adjacency-list graph representation, which reduces the cost of joining vertices with their outgoing (or ingoing) edges, and from associating hash values with both vertices and edges. As a secondary outcome, the thesis provides a general data integration scenario in which graph data and other semistructured and structured data can be represented and integrated into the General Semistructured data Model. A new query language, the General Semistructured Query Language, demonstrates the feasibility of this approach over that data model; it can express both graph joins and graph nestings, as well as traversal and data manipulation operators.
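The combination of hashing and adjacency lists described above can be illustrated with a small in-memory sketch of a graph equi-join. It assumes a simple property graph (a property dictionary per vertex plus an outgoing adjacency list) and reads "conjunctive" as meaning that an edge links two joined vertices only when both input graphs contain the corresponding edge. The names `Graph` and `conjunctive_equijoin` are hypothetical; this is not the thesis's Graph Conjunctive Equijoin Algorithm, nor its secondary-memory representation.

```python
from collections import defaultdict


class Graph:
    """Hypothetical property graph: vertex properties plus an outgoing adjacency list."""

    def __init__(self, vertices, edges):
        self.vertices = vertices          # dict: vertex id -> dict of properties
        self.adj = defaultdict(list)      # dict: vertex id -> list of target vertex ids
        for src, dst in edges:
            self.adj[src].append(dst)


def conjunctive_equijoin(g1, g2, key):
    """Sketch of a hash-based graph equi-join on the vertex property `key`."""
    # Build phase: hash g2's vertices on the value of the join property.
    buckets = defaultdict(list)
    for wid, props in g2.vertices.items():
        if key in props:
            buckets[props[key]].append(wid)

    # Probe phase: pair each g1 vertex with every g2 vertex sharing the key value.
    # Merged properties: g1's values win on conflicting property names.
    joined = {}
    for vid, props in g1.vertices.items():
        if key not in props:
            continue
        for wid in buckets.get(props[key], []):
            joined[(vid, wid)] = {**g2.vertices[wid], **props}

    # Edge phase (assumed conjunctive semantics): walk both adjacency lists at once,
    # so each joined vertex is combined only with its own outgoing edges.
    edges = []
    for u1, u2 in joined:
        for v1 in g1.adj.get(u1, []):
            for v2 in g2.adj.get(u2, []):
                if (v1, v2) in joined:
                    edges.append(((u1, u2), (v1, v2)))
    return joined, edges


g1 = Graph({1: {"name": "a"}, 2: {"name": "b"}}, [(1, 2)])
g2 = Graph({10: {"name": "a"}, 20: {"name": "b"}, 30: {"name": "b"}}, [(10, 20)])
vertices, edges = conjunctive_equijoin(g1, g2, "name")
print(sorted(vertices))  # [(1, 10), (2, 20), (2, 30)]
print(edges)             # [((1, 10), (2, 20))]
```

Reading edges straight from each vertex's adjacency list avoids a separate join between the vertex set and a global edge table, which is the kind of cost reduction the abstract attributes to the adjacency-list representation.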

    Predicting drug effectiveness in Cancer Cell Lines using Machine Learning and Graph Mining

    Cancer is a heterogeneous disease, with a considerable degree of diversity between tumours. In the context of an oncological disease, biomarkers make it possible to identify a patient's response to a given drug. Such targeted treatments have, on average, produced better results than broader-use ones. However, the relationship between a drug's response and a biomarker's value is in many cases still unknown. Some models predicting this relationship have already been built using machine learning methods; their input is a characterization of both the drug and the tissue, together with the result of using the drug on a given tissue. Building on these previous results, the goal of this thesis is to predict the effectiveness of a drug on a tumour by improving on previous models and on the characterization of both the drug and the tissue through the introduction of graph mining and other machine learning methods.