451 research outputs found

    The Bane of Low-Dimensionality Clustering

    In this paper, we give a conditional lower bound of n^{\Omega(k)} on running time for the classic k-median and k-means clustering objectives (where n is the size of the input), even in low-dimensional Euclidean space of dimension four, assuming the Exponential Time Hypothesis (ETH). We also consider k-median (and k-means) with penalties where each point need not be assigned to a center, in which case it must pay a penalty, and extend our lower bound to at least three-dimensional Euclidean space. This stands in stark contrast to many other geometric problems such as the traveling salesman problem, or computing an independent set of unit spheres. While these problems benefit from the so-called (limited) blessing of dimensionality, as they can be solved in time n^{O(k^{1-1/d})} or 2^{n^{1-1/d}} in d dimensions, our work shows that widely-used clustering objectives have a lower bound of n^{\Omega(k)}, even in dimension four. We complete the picture by considering the two-dimensional case: we show that there is no algorithm that solves the penalized version in time less than n^{o(\sqrt{k})}, and provide a matching upper bound of n^{O(\sqrt{k})}. The main tool we use to establish these lower bounds is the placement of points on the moment curve, which takes its inspiration from constructions of point sets yielding Delaunay complexes of high complexity.
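
For readers unfamiliar with the moment curve mentioned in this abstract, the following minimal sketch shows only the standard definition: the parameter t is mapped to the point (t, t^2, ..., t^d). The function name `moment_curve_point` and the choice of integer parameters are assumptions made for illustration; the abstract does not say which parameters the paper's construction uses, and this is not the paper's reduction.

```python
def moment_curve_point(t, d=4):
    """Return the point (t, t^2, ..., t^d) on the d-dimensional moment curve.

    Hypothetical helper for illustration; d=4 mirrors the dimension
    discussed in the abstract.
    """
    return tuple(t ** i for i in range(1, d + 1))


# Place n points on the curve using the parameters 1..n (an assumed choice).
n = 5
points = [moment_curve_point(t) for t in range(1, n + 1)]
for point in points:
    print(point)
```

Distinct parameters always give distinct points, since the first coordinate is the parameter itself.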

    The French Atlantic littoral and the Massif Armoricain, part 1

    The author has identified the following significant results. For interpretation of Isle of Jersey imagery, two types of taxa were defined according to their variability in time. On the whole, taxa with a similar spectral signature were opposed to those with a strongly varying spectral signature. The taxon types were low diachronic variation and strong diachronic variation. Imagery interpretation was restricted to the landward part of the Fromentine area, including the sand beaches, which were often difficult to separate spectrally from the barren coastal dunes in the southern part of Noirmoutier Island as well as along the Breton marsh. From 1972 to 1976, sandbanks decreased in area. Two high river discharge images showed, over a two-year period, an identical outline for the Bilho bank to seaward, whereas upstream the bank receded in the same time to a line joining Paimboeuf to Montoir. The Brillantes bank has receded at both ends, partly due to dredging operations in the access channel to Donges harbor.

    Parameterized k-Clustering: Tractability Island

    In k-Clustering we are given a multiset of n vectors X subset Z^d and a nonnegative number D, and we need to decide whether X can be partitioned into k clusters C_1, ..., C_k such that the cost sum_{i=1}^k min_{c_i in R^d} sum_{x in C_i} |x-c_i|_p^p <= D, where |*|_p is the Minkowski (L_p) norm of order p. For p=1, k-Clustering is the well-known k-Median. For p=2, the case of the Euclidean distance, k-Clustering is k-Means. We study k-Clustering from the perspective of parameterized complexity. The problem is known to be NP-hard for k=2 and it is also NP-hard for d=2. It is a long-standing open question whether the problem is fixed-parameter tractable (FPT) for the combined parameter d+k. In this paper, we focus on the parameterization by D. We complement the known negative results by showing that for p=0 and p=infty, k-Clustering is W[1]-hard when parameterized by D. Interestingly, the complexity landscape of the problem appears to be more intricate than expected. We discover a tractability island of k-Clustering: for every p in (0,1], k-Clustering is solvable in time 2^{O(D log D)} (nd)^{O(1)}.
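
The cost function in this abstract can be made concrete with a small sketch. It evaluates the cost of one given partition for p = 1 and p = 2 only, using two standard facts: |x - c|_p^p is a sum over coordinates, so the inner minimum separates per coordinate, and the coordinate-wise median minimizes the sum of absolute deviations while the coordinate-wise mean minimizes the sum of squared deviations. The function names are assumptions made for illustration. This checks a single candidate partition against D; it does not search over partitions and is not the algorithm from the paper.

```python
from statistics import mean, median


def cluster_cost(cluster, p):
    """Cost of one cluster under its optimal center, for p = 1 or p = 2.

    `cluster` is a non-empty list of equal-length tuples (the vectors).
    """
    if p not in (1, 2):
        raise ValueError("this sketch only handles p = 1 and p = 2")
    # Optimal per-coordinate center: median for p = 1, mean for p = 2.
    center_of = median if p == 1 else mean
    cost = 0.0
    for coords in zip(*cluster):  # one tuple of values per coordinate
        c = center_of(coords)
        cost += sum(abs(x - c) ** p for x in coords)
    return cost


def clustering_cost(clusters, p):
    """Total cost: the sum of the optimal per-cluster costs."""
    return sum(cluster_cost(cluster, p) for cluster in clusters)


def partition_within_budget(clusters, p, D):
    """True if this particular partition has cost at most D."""
    return clustering_cost(clusters, p) <= D


# Assumed toy instance: three collinear points plus one isolated point.
clusters = [[(0, 0), (2, 0), (10, 0)], [(5, 5)]]
print(clustering_cost(clusters, 1))
print(clustering_cost(clusters, 2))
```

A partition that passes `partition_within_budget` certifies a yes-instance; a partition that fails says nothing about other partitions.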

    Measuring the gap between HMM-based ASR and TTS

    The EMIME European project is conducting research in the development of technologies for mobile, personalised speech-to-speech translation systems. The hidden Markov model is being used as the underlying technology in both automatic speech recognition (ASR) and text-to-speech synthesis (TTS) components; thus, the investigation of unified statistical modelling approaches has become an implicit goal of our research. As one of the first steps towards this goal, we have been investigating commonalities and differences between HMM-based ASR and TTS. In this paper we present results and analysis of a series of experiments that have been conducted on English ASR and TTS systems, measuring their performance with respect to phone set and lexicon, acoustic feature type and dimensionality, and HMM topology. Our results show that, although the fundamental statistical model may be essentially the same, optimal ASR and TTS performance often demands diametrically opposed system designs. This represents a major challenge to be addressed in the investigation of such unified modelling approaches.

    Action representation in the mouse parieto-frontal network

    The posterior parietal cortex (PPC) and frontal motor areas comprise a cortical network supporting goal-directed behaviour, with functions including sensorimotor transformations and decision making. In primates, this network links performed and observed actions via mirror neurons, which fire both when individuals perform an action and when they observe the same action performed by a conspecific. Mirror neurons are believed to be important for social learning, but it is not known whether mirror-like neurons occur in similar networks in other social species, such as rodents, or if they can be measured in such models using paradigms where observers passively view a demonstrator. Therefore, we imaged Ca2+ responses in PPC and secondary motor cortex (M2) while mice performed and observed pellet-reaching and wheel-running tasks, and found that cell populations in both areas robustly encoded several naturalistic behaviours. However, neural responses to the same set of observed actions were absent, although we verified that observer mice were attentive to performers and that PPC neurons responded reliably to visual cues. Statistical modelling also indicated that executed actions outperformed observed actions in predicting neural responses. These results raise the possibility that sensorimotor action recognition in rodents could take place outside of the parieto-frontal circuit, and underscore that detecting socially-driven neural coding depends critically on the species and behavioural paradigm used

    Graph embedding and geometric deep learning relevance to network biology and structural chemistry

    Graphs have been used as a model of complex relationships among data in biological science since the advent of systems biology in the early 2000s. In particular, graph data analysis and graph data mining play an important role in biological interaction networks, where recent techniques of artificial intelligence, usually employed in other types of networks (e.g., social, citation, and trademark networks), aim to implement various data mining tasks including classification, clustering, recommendation, anomaly detection, and link prediction. The commitment and efforts of artificial intelligence research in network biology are motivated by the fact that machine learning techniques are often prohibitively computationally demanding, poorly parallelizable, and ultimately inapplicable, since a biological network of realistic size is a large system characterised by a high density of interactions, often with non-linear dynamics and a non-Euclidean latent geometry. Currently, graph embedding is emerging as the new learning paradigm that shifts the tasks of building complex models for classification, clustering, and link prediction to learning an informative representation of the graph data in a vector space, so that many graph mining and learning tasks can be more easily performed by employing efficient non-iterative traditional models (e.g., a linear support vector machine for the classification task). The great potential of graph embedding is the main reason for the flourishing of studies in this area and, in particular, of artificial intelligence learning techniques. In this mini review, we give a comprehensive summary of the main graph embedding algorithms in light of the recent burgeoning interest in geometric deep learning.

    Network Representation Learning: A Survey

    With the widespread use of information technologies, information networks are becoming increasingly popular to capture complex relationships across various disciplines, such as social networks, citation networks, telecommunication networks, and biological networks. Analyzing these networks sheds light on different aspects of social life such as the structure of societies, information diffusion, and communication patterns. In reality, however, the large scale of information networks often makes network analytic tasks computationally expensive or intractable. Network representation learning has been recently proposed as a new learning paradigm to embed network vertices into a low-dimensional vector space, by preserving network topology structure, vertex content, and other side information. This allows the original network to be handled easily in the new vector space for further analysis. In this survey, we perform a comprehensive review of the current literature on network representation learning in the data mining and machine learning field. We propose new taxonomies to categorize and summarize the state-of-the-art network representation learning techniques according to the underlying learning mechanisms, the network information intended to be preserved, as well as the algorithmic designs and methodologies. We summarize evaluation protocols used for validating network representation learning including published benchmark datasets, evaluation methods, and open source algorithms. We also perform empirical studies to compare the performance of representative algorithms on common datasets, and analyze their computational complexity. Finally, we suggest promising research directions to facilitate future study. Comment: Accepted by IEEE Transactions on Big Data; 25 pages, 10 tables, 6 figures and 127 references.

    Advances in Unsupervised Learning and Application Areas: Subspace Clustering with Background Knowledge, Semantic Password Guessing, and Learned Index Structures

    Over the past few years, advances in data science, machine learning and, in particular, unsupervised learning have enabled significant progress in many scientific fields and even in everyday life. Unsupervised learning methods are usually successful whenever they can be tailored to specific applications using appropriate requirements based on domain expertise. This dissertation shows how purely theoretical research can lead to circumstances that favor overly optimistic results, and the advantages of application-oriented research based on specific background knowledge. These observations apply to traditional unsupervised learning problems such as clustering, anomaly detection and dimensionality reduction. Therefore, this thesis presents extensions of these classical problems, such as subspace clustering and principal component analysis, as well as several specific applications with relevant interfaces to machine learning. Examples include password guessing using semantic word embeddings and learning spatial index structures using statistical models. In essence, this thesis shows that application-oriented research has many advantages for current and future research.