89 research outputs found

    Pairwise Compatibility Graphs (Invited Talk)

    Pairwise Compatibility Graphs (PCGs) are graphs introduced in relation to the biological problem of reconstructing phylogenetic trees. Without claiming to be exhaustive, in this note we take a quick look at what is known in the literature about these graphs. The evolutionary history of a set of organisms is usually represented by a tree-like structure called a phylogenetic tree, where the leaves are the known species and the internal nodes are the possible ancestors that might have led, through evolution, to this set of species. Edges are evolutionary relationships between species, while the edge weights represent evolutionary distances among species (evolutionary times). The phylogenetic tree reconstruction problem consists in finding a fully labeled phylogenetic tree that 'best' explains the evolution of the given species, where 'best' means that it optimizes a specific target function. The tree reconstruction problem has been proved to be NP-hard under many optimality criteria, so the performance of heuristics for this problem is usually evaluated experimentally by comparing the output trees with the partial trees that biologists unanimously recognize as certain. But real data consist of a huge number of species, and it is infeasible to compare trees with such a number of leaves, so it is common to exploit sampling techniques. The idea is to find efficient ways to sample subsets of species from a large set in order to test the heuristics on the smaller sub-trees induced by the sample. The constraints on the sample attempt to ensure that the behavior of the heuristics will not be biased by the fact that they are applied to the sample instead of the whole tree. Since very close or very distant taxa can create problems for phylogenetic reconstruction heuristics [9], the following definition of Pairwise Compatibility Graphs [12] appears natura

    Relating threshold tolerance graphs to other graph classes

    A graph G=(V, E) is a threshold tolerance graph if it is possible to associate weights and tolerances with each node of G so that two nodes are adjacent exactly when the sum of their weights exceeds either one of their tolerances. Threshold tolerance graphs are a special case of the well-known class of tolerance graphs and generalize the class of threshold graphs, which are also extensively studied in the literature. In this note we relate threshold tolerance graphs to other important graph classes. In particular, we show that threshold tolerance graphs are a proper subclass of co-strongly chordal graphs and strictly include the class of co-interval graphs. To this end, we exploit the relation with another graph class, min leaf power graphs (mLPGs).
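
The adjacency rule in this abstract can be illustrated with a minimal Python sketch. The function name `threshold_tolerance_edges` and the sample weights and tolerances are hypothetical, chosen only for illustration. The sketch reads "exceeds either one of their tolerances" as a strict comparison against the smaller of the two tolerances; whether the boundary case counts as adjacent is an assumption about the convention, not something the abstract settles.

```python
from itertools import combinations


def threshold_tolerance_edges(weights, tolerances):
    """Return the edges induced by per-node weights and tolerances.

    Assumption: u and v are adjacent when weights[u] + weights[v] is
    strictly greater than at least one of their tolerances, i.e. greater
    than min(tolerances[u], tolerances[v]).
    """
    edges = set()
    for u, v in combinations(sorted(weights), 2):
        if weights[u] + weights[v] > min(tolerances[u], tolerances[v]):
            edges.add((u, v))
    return edges


# Hypothetical example data.
weights = {"a": 1, "b": 2, "c": 3}
tolerances = {"a": 10, "b": 4, "c": 10}
edges = threshold_tolerance_edges(weights, tolerances)
print(edges)
```

With these values only the pair (b, c) is adjacent, because 2 + 3 exceeds the tolerance 4 of node b, while the other two weight sums stay at or below every tolerance involved.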

    Pairwise Compatibility Graphs: A Survey

    A graph $G=(V,E)$ is a pairwise compatibility graph (PCG) if there exists an edge-weighted tree $T$ and two nonnegative real numbers $d_{min}$ and $d_{max}$ such that each leaf $u$ of $T$ is a node of $V$ and there is an edge $(u,v) \in E$ if and only if $d_{min} \leq d_T(u,v) \leq d_{max}$, where $d_T(u,v)$ is the sum of the weights of the edges on the unique path from $u$ to $v$ in $T$. In this article, we survey the state of the art concerning this class of graphs and some of its subclasses.
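
The PCG definition above can be made concrete with a short Python sketch that derives the graph from a given witness tree. The names `leaf_distances` and `pcg_edges`, the adjacency-list encoding of the tree, and the sample star tree are all assumptions made for illustration. Note that this only builds the PCG of a known tree and thresholds; it does not decide whether an arbitrary graph is a PCG.

```python
def leaf_distances(tree, leaves):
    """Map each leaf pair (u, v) with u < v to its path length in the tree.

    tree: dict mapping every node to a list of (neighbor, weight) pairs,
    with each undirected edge listed in both directions.
    """
    dist = {}
    for src in leaves:
        # Depth-first traversal; in a tree the path to each node is unique.
        seen = {src: 0.0}
        stack = [src]
        while stack:
            node = stack.pop()
            for nbr, w in tree[node]:
                if nbr not in seen:
                    seen[nbr] = seen[node] + w
                    stack.append(nbr)
        for dst in leaves:
            if src < dst:
                dist[(src, dst)] = seen[dst]
    return dist


def pcg_edges(tree, leaves, d_min, d_max):
    """Leaf pairs whose tree distance lies in the closed range [d_min, d_max]."""
    return {pair for pair, d in leaf_distances(tree, leaves).items()
            if d_min <= d <= d_max}


# Hypothetical witness tree: a star with center "x" and three leaves.
tree = {
    "x": [("a", 1), ("b", 2), ("c", 4)],
    "a": [("x", 1)],
    "b": [("x", 2)],
    "c": [("x", 4)],
}
edges = pcg_edges(tree, ["a", "b", "c"], d_min=4, d_max=5)
print(edges)
```

In this star, the leaf distances are 3 (a to b), 5 (a to c) and 6 (b to c), so with $d_{min}=4$ and $d_{max}=5$ the only edge is (a, c). The pair (a, b) is excluded for being too close and (b, c) for being too far apart.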

    Simultaneous Graph Representation Problems

    Many graphs arising in practice can be represented in a concise and intuitive way that conveys their structure. For example, a planar graph can be represented in the plane with points for vertices and non-crossing curves for edges. An interval graph can be represented on the real line with intervals for vertices and intersection of intervals representing edges. The concept of "simultaneity" applies to several types of graphs: the idea is to find representations for two graphs that share some common vertices and edges, and ensure that the common vertices and edges are represented the same way. Simultaneous representation problems arise in any situation where two related graphs should be represented consistently. A main instance is for temporal relationships, where an old graph and a new graph share some common parts. Pairs of related graphs arise in many other situations, for example: two social networks that share some members; two schedules that share some events; overlap graphs of DNA fragments of two similar organisms; circuit graphs of two adjacent layers on a computer chip; etc. In this thesis, we study the simultaneous representation problem for several graph classes. For planar graphs the problem is defined as follows. Let G1 and G2 be two graphs sharing some vertices and edges. The simultaneous planar embedding problem asks whether there exist planar embeddings (or drawings) for G1 and G2 such that every vertex shared by the two graphs is mapped to the same point and every shared edge is mapped to the same curve in both embeddings. Over the last few years there has been a lot of work on simultaneous planar embeddings, which have been called "simultaneous embeddings with fixed edges". A major open question is whether simultaneous planarity for two graphs can be tested in polynomial time. We give a linear-time algorithm for testing the simultaneous planarity of any two graphs that share a 2-connected subgraph.
Our algorithm also extends to the case of k planar graphs, where each vertex [edge] is either common to all graphs or belongs to exactly one of them. Next we introduce a new notion of simultaneity for intersection graph classes (interval graphs, chordal graphs etc.) and for comparability graphs. For interval graphs, the problem is defined as follows. Let G1 and G2 be two interval graphs sharing some vertices I and the edges induced by I. G1 and G2 are said to be "simultaneous interval graphs" if there exist interval representations of G1 and G2 such that any vertex of I is assigned to the same interval in both representations. The "simultaneous representation problem" for interval graphs asks whether G1 and G2 are simultaneous interval graphs. The problem is defined in a similar way for other intersection graph classes. For comparability graphs and any intersection graph class, we show that the simultaneous representation problem for the graph class is equivalent to a graph augmentation problem: given graphs G1 and G2, sharing vertices I and the corresponding induced edges, do there exist edges E' between G1-I and G2-I such that the graph G1 ∪ G2 ∪ E' belongs to the graph class? This equivalence implies that the simultaneous representation problem is closely related to other well-studied classes in the literature, namely, sandwich graphs and probe graphs. We give efficient algorithms for solving the simultaneous representation problem for interval graphs, chordal graphs, comparability graphs and permutation graphs. Further, our algorithms for comparability and permutation graphs solve a more general version of the problem when there are multiple graphs, any two of which share the same common graph. This version of the problem also generalizes probe graphs.
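
The definition of simultaneous interval graphs given in this abstract can be illustrated by a small Python checker for a proposed representation. This is only a sketch that verifies a given certificate; it is not the recognition algorithm from the thesis. The function names, the use of closed intervals, and the sample graphs are assumptions made for illustration. Storing one interval per vertex name in a single mapping is what forces each shared vertex to receive the same interval in both graphs.

```python
from itertools import combinations


def intersects(a, b):
    """True if closed intervals a = (lo, hi) and b = (lo, hi) overlap."""
    return a[0] <= b[1] and b[0] <= a[1]


def represents(intervals, vertices, edges):
    """True if two vertices' intervals intersect exactly when they are adjacent."""
    edge_set = {frozenset(e) for e in edges}
    for u, v in combinations(vertices, 2):
        adjacent = frozenset((u, v)) in edge_set
        if intersects(intervals[u], intervals[v]) != adjacent:
            return False
    return True


def is_simultaneous_representation(intervals, g1, g2):
    """g1 and g2 are (vertices, edges) pairs that may share vertex names."""
    return represents(intervals, *g1) and represents(intervals, *g2)


# Hypothetical example: two paths sharing the vertices b and c and the edge bc.
g1 = (["a", "b", "c"], [("a", "b"), ("b", "c")])
g2 = (["b", "c", "d"], [("b", "c"), ("c", "d")])
intervals = {"a": (0, 1), "b": (1, 3), "c": (2, 5), "d": (4, 6)}
ok = is_simultaneous_representation(intervals, g1, g2)
print(ok)
```

Pairs such as (a, d), whose endpoints lie in different graphs outside the shared part, are never checked. Their intervals are free to intersect or not, which corresponds to the optional edges E' in the augmentation formulation.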

    Retail Shelf Analytics Through Image Processing and Deep Learning

    The present thesis promotes an innovative approach based on modern deep learning and image processing techniques for retail shelf analytics within an actual business context. To achieve this goal, the research focused on recent developments in computer vision while maintaining a business-oriented approach. The project involved the full-stack software development of a product to analyze structured and unstructured data and provide business intelligence services for retail systems

    Automating the multidimensional design of data warehouses

    Previous experience in the data warehouse field has shown that the multidimensional conceptual schema of a data warehouse must be derived from a hybrid approach, i.e., by considering both the end-user requirements and the data sources as first-class citizens. As in any other system, requirements guarantee that the system devised meets the end user's needs. In addition, since the data warehouse design task is a reengineering process, it must consider the underlying data sources of the organization: (i) to guarantee that the data warehouse can be populated from data available within the organization, and (ii) to allow the end user to discover additional, previously unknown analysis capabilities. Currently, several methods for supporting the data warehouse modeling task have been proposed. However, they suffer from some significant drawbacks. In short, requirement-driven approaches assume that requirements are exhaustive (and therefore do not consider that the data sources may contain alternative interesting evidence for analysis), whereas data-driven approaches (i.e., those leading the design task from a thorough analysis of the data sources) rely on discovering as much multidimensional knowledge as possible from the data sources. As a consequence, data-driven approaches generate too many results, which mislead the user. Furthermore, automating the design task is essential in this scenario, as it removes both the dependency on an expert's ability to properly apply the chosen method and the need to analyze the data sources manually, which is a tedious and time-consuming task (and can be infeasible when working with large databases). In this sense, current automatable methods follow a data-driven approach, whereas current requirement-driven approaches overlook process automation, since they tend to work with requirements at a high level of abstraction. Indeed, this scenario is repeated in the data-driven and requirement-driven stages of current hybrid approaches, which suffer from the same drawbacks as pure data-driven or requirement-driven approaches. In this thesis we introduce two different approaches for automating the multidimensional design of the data warehouse: MDBE (Multidimensional Design Based on Examples) and AMDO (Automating the Multidimensional Design from Ontologies). Both approaches were devised to overcome the limitations from which current approaches suffer. Importantly, our approaches start from opposite initial assumptions, but both consider the end-user requirements and the data sources as first-class citizens. 1. MDBE follows a classical approach, in which the end-user requirements are well known beforehand. This approach benefits from the knowledge captured in the data sources, but guides the design task according to the requirements and, consequently, is able to handle semantically poorer data sources. In other words, given high-quality end-user requirements, we can guide the process from the knowledge they contain and overcome the fact of having data sources of poor quality (from a semantic point of view). 2. AMDO, as a counterpart, assumes a scenario in which the available data sources are semantically richer. Thus, the proposed approach is guided by a thorough analysis of the data sources, which is then adapted to shape the output according to the end-user requirements. In this context, given high-quality data sources, we can overcome the lack of expressive end-user requirements. Importantly, our methods establish a combined and comprehensive framework that can be used to decide, according to the inputs provided in each scenario, which approach is best to follow. For example, we cannot follow the same approach in a scenario where the end-user requirements are clear and well known and in a scenario in which the end-user requirements are not evident or cannot be easily elicited (e.g., this may happen when the users are not aware of the analysis capabilities of their own sources). Interestingly, the need to have requirements beforehand is reduced by having semantically rich data sources. In the absence of such sources, requirements gain relevance for extracting the multidimensional knowledge from the sources. Thus, we claim to provide two approaches whose combination turns out to be exhaustive with regard to the scenarios discussed in the literature.

    Metarel, an ontology facilitating advanced querying of biomedical knowledge

    Knowledge management has become indispensable in the Life Sciences for integrating and querying the enormous amounts of detailed knowledge about genes, organisms, diseases, drugs, cells, etc. Such detailed knowledge is continuously generated in bioinformatics via both hardware (e.g. raw data dumps from micro-arrays) and software (e.g. computational analysis of data). Well-known frameworks for managing knowledge are relational databases and spreadsheets. The doctoral dissertation describes knowledge management in two more recently investigated frameworks: ontologies and the Semantic Web. Knowledge statements like 'lions live in Africa' and 'genes are located in a cell nucleus' are managed with the use of URIs, logics and the ontological distinction between instances and classes. Both theory and practice are described. Metarel, the core subject of the dissertation, is an ontology describing relations that can bridge the mismatch between network-based relations that appeal to internet browsing and logic-based relations that are formally expressed in Description Logic. Another important subject of the dissertation is BioGateway, a knowledge base that has integrated biomedical knowledge in the form of hundreds of millions of network-based relations in the RDF format. Metarel was used to upgrade the logical meaning of these relations towards Description Logic. This made it possible to build a computer reasoner that could run over the knowledge base and derive new knowledge statements.