
    On Complexity of 1-Center in Various Metrics

    We consider the classic 1-center problem: given a set P of n points in a metric space, find the point in P that minimizes the maximum distance to the other points of P. We study the complexity of this problem in d-dimensional ℓ_p-metrics and in edit and Ulam metrics over strings of length d. Our results for the 1-center problem may be classified based on d as follows.
    • Small d: We provide the first linear-time algorithm for the 1-center problem in fixed-dimensional ℓ_1 metrics. On the other hand, assuming the hitting set conjecture (HSC), we show that when d = ω(log n), no subquadratic algorithm can solve the 1-center problem in any of the ℓ_p-metrics, or in edit or Ulam metrics.
    • Large d: When d = Ω(n), we extend our conditional lower bound to rule out subquartic algorithms for the 1-center problem in the edit metric (assuming Quantified SETH). On the other hand, we give a (1+ε)-approximation for 1-center in the Ulam metric with running time Õ_ε(nd + n²√d).
    We also strengthen some of the above lower bounds by allowing approximations or by reducing the dimension d, but only against a weaker class of algorithms which list all requisite solutions. Moreover, we extend one of our hardness results to rule out subquartic algorithms for the well-studied 1-median problem in the edit metric, where, given a set of n strings each of length n, the goal is to find a string in the set that minimizes the sum of the edit distances to the rest of the strings in the set.
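
    For concreteness, a brute-force baseline for 1-center (a generic sketch, not an algorithm from the paper) looks as follows; it performs the O(n²) distance evaluations that, per the HSC-based lower bound above, cannot be avoided in general once d = ω(log n). The ℓ_1 metric shown is just one instantiation.

```python
from typing import Callable, Sequence

Point = Sequence[float]

def one_center(points: Sequence[Point],
               dist: Callable[[Point, Point], float]) -> int:
    """Brute-force 1-center: index of the point in P that minimizes the
    maximum distance to the other points; O(n^2) distance evaluations."""
    best_idx, best_radius = 0, float("inf")
    for i, p in enumerate(points):
        radius = max((dist(p, q) for j, q in enumerate(points) if j != i),
                     default=0.0)
        if radius < best_radius:
            best_idx, best_radius = i, radius
    return best_idx

# The l_1 metric in R^d, one of the metrics discussed above.
l1 = lambda p, q: sum(abs(a - b) for a, b in zip(p, q))
print(one_center([(0, 0), (1, 2), (4, 1), (2, 2)], l1))  # -> 1
```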

    New Approaches to Protein Structure Prediction

    Protein structure prediction is concerned with the prediction of a protein's three-dimensional structure from its amino acid sequence. Such predictions are commonly performed by searching the space of possible structures and evaluating each candidate structure with a scoring function. If it is assumed that the target protein structure resembles the structure of a known protein, the search space can be significantly reduced; such an approach is referred to as comparative structure prediction. When no such assumption is made, the approach is known as ab initio structure prediction. There are several difficulties in devising efficient searches or in computing the scoring function. Many of these problems have ready solutions from known mathematical methods. However, the problems that remain unsolved have kept structure prediction methods from achieving more accurate predictions. The objective of this study is to present a complete framework for ab initio protein structure prediction. To achieve this, a new search strategy is proposed, and better techniques are devised for computing the known scoring functions. Some of the remaining problems in protein structure prediction are revisited. Several of them are shown to be intractable. In many of these cases, approximation methods are suggested as alternative solutions. The primary issues addressed in this thesis concern local structure prediction, structure assembly or sampling, side-chain packing, model comparison, and structural alignment. For brevity, we do not elaborate on these problems here; a concise introduction is given in the first section of this thesis. Results from these studies prompted the development of several programs, forming a utility suite for ab initio protein structure prediction. Due to the general usefulness of these programs, some of them are released under open source licenses to benefit the community.

    Systems level investigation of the genetic basis of bovine muscle growth and development

    Skeletal muscle growth is an economically and biologically important trait for livestock raised for meat production. As such, there is great interest in understanding the underlying genomic architecture influencing muscle growth and development. In spite of this, relatively little is known about the genes or biological processes regulating bovine muscle growth. In this thesis, several approaches were undertaken to elucidate some of the mechanisms which may be controlling bovine muscle growth and development. The first objective of this thesis was the development of a novel software tool (SNPdat) for the rapid and comprehensive annotation of SNP data for any organism with a draft sequence and annotation. SNPdat was subsequently utilised in chapters 3 and 6 to facilitate the identification of candidate genes and regions involved in bovine muscle growth. In chapter 4, a number of metrics were explored for their usefulness in assessing convergence of a Markov chain in a Bayesian approach used for genetic prediction. The need to adequately assess convergence using multiple metrics is addressed and recommendations are put forward. These recommendations were then implemented in chapter 3. In addition, three separate investigations of bovine muscle growth and development were performed. In chapter 3, a genome-wide association study was performed to identify regions of the bovine genome associated with four economically important carcass traits. This was followed by an examination of the transcriptional responses in muscle tissue of animals undergoing dietary restriction and compensatory growth (chapter 5). Finally, using high-throughput DNA sequencing, a candidate list of 200 genes was interrogated to identify genes which may be evolving at different rates, and under evolutionary selection pressure, in beef compared to dairy animals (chapter 6). A number of genes and biological pathways were found to be involved in traits related to bovine muscle growth, several of which were identified in more than one study.

    Label Ranking with Probabilistic Models

    This thesis focuses on a particular form of prediction, so-called label ranking. In a nutshell, label ranking can be viewed as an extension of the conventional classification problem. Given a query (e.g., by a customer) and a predefined set of candidate labels (e.g., AUDI, BMW, VW), classification asks for a single label (e.g., BMW) as the prediction, whereas label ranking asks for a complete ranking of all labels (e.g., BMW > VW > AUDI). Since predictions of this kind are useful in many real-world problems, label ranking methods can be applied in several domains, including information retrieval, customer preference learning, and e-commerce. This thesis presents a selection of label ranking methods that combine machine learning with statistical ranking models. We focus on two statistical ranking models, the Mallows model and the Plackett-Luce model, and on two machine learning techniques, instance-based learning and generalized linear models.
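
    For orientation, the Plackett-Luce model mentioned above assigns each label a positive skill parameter and factors the probability of a complete ranking into successive-choice probabilities. A minimal sketch (the skill values below are purely illustrative, not from the thesis):

```python
def plackett_luce_prob(ranking, skills):
    """Probability of a complete ranking under the Plackett-Luce model:
    P(pi) = prod_i skills[pi_i] / sum_{j >= i} skills[pi_j]."""
    prob = 1.0
    for i in range(len(ranking)):
        remaining = sum(skills[label] for label in ranking[i:])
        prob *= skills[ranking[i]] / remaining
    return prob

# Illustrative skills for the three candidate labels from the example above.
skills = {"BMW": 3.0, "VW": 2.0, "AUDI": 1.0}
print(plackett_luce_prob(["BMW", "VW", "AUDI"], skills))  # -> 1/3
```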

    Decision making under uncertainty

    Almost all important decision problems are inevitably subject to some level of uncertainty, whether about data measurements, the parameters, or predictions describing future evolution. The significance of handling uncertainty is further amplified by the large volume of uncertain data automatically generated by modern data gathering or integration systems. Various types of problems of decision making under uncertainty have been subject to extensive research in computer science, economics and social science. In this dissertation, I study three major problems in this context, ranking, utility maximization, and matching, all involving uncertain datasets. First, we consider the problem of ranking and top-k query processing over probabilistic datasets. By illustrating the diverse and conflicting behaviors of the prior proposals, we contend that a single, specific ranking function may not suffice for probabilistic datasets. Instead we propose the notion of parameterized ranking functions, which generalize or can approximate many of the previously proposed ranking functions. We present novel exact or approximate algorithms for efficiently ranking large datasets according to these ranking functions, even if the datasets exhibit complex correlations or the probability distributions are continuous. The second problem concerns the stochastic versions of a broad class of combinatorial optimization problems. We observe that the expected value is inadequate in capturing different types of risk-averse or risk-prone behaviors, and instead we consider a more general objective: to maximize the expected utility of the solution for some given utility function. We present a polynomial time approximation algorithm with additive error ε for any ε > 0, under certain conditions. Our result generalizes and improves several prior results on stochastic shortest path, stochastic spanning tree, and stochastic knapsack. The third problem is stochastic matching, which finds interesting applications in online dating, kidney exchange and online ad assignment. In this problem, the existence of each edge is uncertain and can only be determined by probing the edge. The goal is to design a probing strategy that maximizes the expected weight of the matching. We give linear programming based constant-factor approximation algorithms for weighted stochastic matching, which answer an open question raised in prior work.
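
    To make the expected-utility objective concrete, the following sketch estimates E[u(V)] for the random value V of a fixed solution by naive Monte Carlo sampling; it only illustrates how a concave utility encodes risk aversion, and is not the dissertation's approximation algorithm:

```python
import random

def expected_utility(sample_value, utility, trials=10_000):
    """Monte Carlo estimate of E[u(V)], where sample_value() draws one
    realization of a solution's random value V and u is the utility."""
    return sum(utility(sample_value()) for _ in range(trials)) / trials

# Illustrative random value: three items, each present with probability 1/2.
sample = lambda: sum(random.choice([0.0, w]) for w in (3.0, 5.0, 2.0))
print(expected_utility(sample, lambda v: v))         # plain expected value
print(expected_utility(sample, lambda v: v ** 0.5))  # risk-averse (concave) utility
```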

    Transfer k-means: a new supervised clustering approach

    Supervised and unsupervised learning are two fundamental learning schemes whose difference lies in the presence and absence, respectively, of a supervisor (i.e., an entity that provides examples). Transfer learning, on the other hand, aims at improving the learning of a task by using auxiliary knowledge. The goal of this thesis was to investigate how the two fundamental paradigms, supervised and unsupervised learning, can collaborate in the setting of transfer learning. As a result, we developed transfer k-means, a transfer learning variant of the popular k-means heuristic. The proposed method enhances the unsupervised nature of k-means, using supervision from a different but related context as a seeding technique, in order to improve the heuristic's performance towards more meaningful results. We provide approximation guarantees based on the nature of the input, and we experimentally validate the benefits of the proposed method using natural-language documents as a real-world example.
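
    A minimal sketch of the seeding idea described above, assuming the source-domain class centroids are used as the initial centers (the function name and the use of scikit-learn are illustrative assumptions, not the thesis implementation):

```python
import numpy as np
from sklearn.cluster import KMeans

def transfer_kmeans(source_X, source_y, target_X):
    """Seed k-means on the unlabeled target data with the class centroids
    of a related, labeled source domain (one seed per source class).
    source_X, target_X: 2-D numpy arrays; source_y: 1-D array of labels."""
    classes = np.unique(source_y)
    seeds = np.array([source_X[source_y == c].mean(axis=0) for c in classes])
    km = KMeans(n_clusters=len(classes), init=seeds, n_init=1).fit(target_X)
    return km.labels_, km.cluster_centers_
```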

    Edge computing infrastructure for 5G networks: a placement optimization solution

    This thesis focuses on how to optimize the placement of the Edge Computing infrastructure for upcoming 5G networks. To this aim, the core contributions of this research are twofold: 1) a novel heuristic called Hybrid Simulated Annealing to tackle the NP-hard nature of the problem, and 2) a framework called EdgeON providing a practical tool for real-life deployment optimization. In more detail, Edge Computing has grown into a key solution to 5G latency, reliability and scalability requirements. By bringing computing, storage and networking resources to the edge of the network, delay-sensitive applications, location-aware systems and upcoming real-time services leverage the benefits of a reduced physical and logical path between the end-user and the data or service host. Nevertheless, the edge node placement problem raises critical concerns regarding deployment and operational expenditures (mainly due to the number of nodes to be deployed), current backhaul network capabilities and non-technical placement limitations. Common approaches to the placement of edge nodes are based on Mobile Edge Computing (MEC), where the processing capabilities are deployed at the Radio Access Network nodes, and on variations of the Facility Location Problem, where a simplistic cost function is used to determine where to optimally place the infrastructure. However, these methods typically lack the flexibility to be used for edge node placement under the strict technical requirements identified for 5G networks. They fail to place resources at the network edge for 5G ultra-dense networking environments in a network-aware manner. This doctoral thesis rigorously defines the Edge Node Placement Problem (ENPP) for 5G use cases and proposes a novel framework called EdgeON that aims at reducing the overall expenses when deploying and operating an Edge Computing network, taking into account the usage and characteristics of the in-place backhaul network and the strict requirements of a 5G-EC ecosystem. The developed framework implements several placement and optimization strategies, thoroughly assessing their suitability to solve the network-aware ENPP. The core of the framework is an in-house developed heuristic called Hybrid Simulated Annealing (HSA), which seeks to address the high complexity of the ENPP while avoiding the non-convergent behavior of other traditional heuristics when applied to similar problems. The findings of this work validate our approach to solving the network-aware ENPP, the effectiveness of the proposed heuristic and the overall applicability of EdgeON. Thorough performance evaluations were conducted on the core placement solutions implemented, revealing the superiority of HSA when compared to widely used heuristics and common edge placement approaches (e.g., a MEC-based strategy). Furthermore, the practicality of EdgeON was tested through two main case studies placing services and virtual network functions over the previously optimally placed edge nodes. Overall, our proposal is an easy-to-use, effective and fully extensible tool that can be used by operators seeking to optimize the placement of computing, storage and networking infrastructure in the users' vicinity.
Therefore, our main contributions not only set strong foundations towards a cost-effective deployment and operation of an Edge Computing network, but also directly impact the feasibility of upcoming 5G services and use cases, as well as the extensive existing research regarding the placement of services and even network service chains at the edge.
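
    As a rough illustration of the annealing core (a generic textbook sketch; the hybrid elements of HSA and EdgeON's actual cost model are beyond this snippet), candidate sites are toggled one at a time and cost-increasing moves are accepted with a temperature-controlled probability:

```python
import math
import random

def anneal_placement(sites, cost, steps=50_000, t0=1.0, alpha=0.9995):
    """Generic simulated annealing over subsets of candidate edge-node sites;
    `cost` maps a frozenset of chosen sites to a deployment/operation cost."""
    current = frozenset(random.sample(sites, max(1, len(sites) // 2)))
    cur_cost = cost(current)
    best, best_cost, t = current, cur_cost, t0
    for _ in range(steps):
        neighbor = current ^ {random.choice(sites)}   # toggle one site
        delta = cost(neighbor) - cur_cost
        if delta < 0 or random.random() < math.exp(-delta / t):
            current, cur_cost = neighbor, cur_cost + delta
            if cur_cost < best_cost:
                best, best_cost = current, cur_cost
        t *= alpha                                    # geometric cooling
    return best
```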

    Abstracts for the twentyfirst European workshop on Computational geometry, Technische Universiteit Eindhoven, The Netherlands, March 9-11, 2005

    This volume contains abstracts of the papers presented at the 21st European Workshop on Computational Geometry, held at TU Eindhoven (the Netherlands) on March 9–11, 2005. There were 53 papers presented at the workshop, covering a wide range of topics. This record number shows that the field of computational geometry is very much alive in Europe. We wish to thank all the authors who submitted papers and presented their work at the workshop. We believe that this has led to a collection of very interesting abstracts that are both enjoyable and informative for the reader. Finally, we are grateful to TU Eindhoven for their support in organizing the workshop and to the Netherlands Organisation for Scientific Research (NWO) for sponsoring the workshop.

    On clustering and related problems on curves under the Fréchet distance

    Sensor measurements can be represented as points in ℝ^d. Ordered by the time-stamps of these measurements, they yield a time series that can be interpreted as a polygonal curve in the d-dimensional ambient space. The number of vertices is called the complexity of the curve. In this thesis we study several fundamental computational tasks on curves: clustering, simplification, and embedding, under the Fréchet distance, which is a popular distance measure for curves, in its continuous and discrete versions. We focus on curves in the one-dimensional ambient space ℝ. We study the problem of clustering curves in ℝ under the Fréchet distance, in particular the following variations of the well-known k-center and k-median problems. Given is a set P of n curves in ℝ, each of complexity at most m. Our goal is to find k curves in ℝ, not necessarily from P, called cluster centers, each of complexity at most ℓ. In the (k, ℓ)-center problem, the maximum distance of an element of P to its nearest cluster center is minimized. In the (k, ℓ)-median problem, the sum of these distances is minimized. We show that both problems are NP-hard under both versions of the Fréchet distance if k is part of the input. Under the continuous Fréchet distance, we give (1 + ε)-approximation algorithms for both the (k, ℓ)-center and the (k, ℓ)-median problem, with running time near-linear in the input size for constant ε, k and ℓ. Our techniques yield constant-factor approximation algorithms for these problems under the discrete Fréchet distance. To obtain the (1 + ε)-approximation algorithms for the clustering problems under the continuous Fréchet distance, we develop a new simplification technique for one-dimensional curves, called the δ-signature. The signatures always exist, and we can compute them efficiently. We also study the problem of embedding the Fréchet distance into ℝ. We show that, in the worst case and under reasonable assumptions, the discrete Fréchet distance between two polygonal curves of complexity m in ℝ^d, where 2 ≤ d ≤ 7, degrades by a factor linear in m with constant probability when the curves are projected onto a randomly chosen line. We show upper and lower bounds on the distortion. Sensor measurements can also define a discrete distribution over possible locations of a point in ℝ^d. Then the input consists of n probabilistic points. We study the probabilistic 1-center problem in Euclidean space ℝ^d, also known as the probabilistic smallest enclosing ball (pSEB) problem. To improve the best existing algorithm for the pSEB problem by reducing its exponential dependence on the dimension to linear, we study the deterministic set median problem, which generalizes both the 1-center and the 1-median problems. We present a (1 + ε)-approximation algorithm for the set median problem, using a novel combination of sampling techniques and stochastic subgradient descent. Our (1 + ε)-approximation algorithm for the pSEB problem takes time linear in d and n, making it applicable to shape fitting problems in Hilbert spaces of unbounded dimension using kernel functions. We present an exemplary application by extending the support vector data description (SVDD) shape fitting method to the probabilistic case.
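
    For reference, the discrete Fréchet distance used above admits a classic O(mn) dynamic program over the vertex sequences of the two curves (a textbook sketch, not one of the thesis's algorithms):

```python
from functools import lru_cache

def discrete_frechet(P, Q, dist=lambda a, b: abs(a - b)):
    """Discrete Fréchet distance between polygonal curves P and Q, given as
    vertex sequences, via the standard O(|P||Q|) dynamic program."""
    @lru_cache(maxsize=None)
    def c(i, j):
        d = dist(P[i], Q[j])
        if i == 0 and j == 0:
            return d
        if i == 0:
            return max(c(0, j - 1), d)
        if j == 0:
            return max(c(i - 1, 0), d)
        return max(min(c(i - 1, j), c(i - 1, j - 1), c(i, j - 1)), d)

    return c(len(P) - 1, len(Q) - 1)

# One-dimensional curves, matching the ambient space studied above.
print(discrete_frechet([0.0, 2.0, 1.0, 3.0], [0.0, 1.5, 3.0]))
```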