
    Reliably Capture Local Clusters in Noisy Domains From Parallel Universes

    Get PDF
    When searching for small local patterns, it is difficult to distinguish incidental agglomerations of noisy points from true local patterns. We propose a new approach that addresses this problem by exploiting the temporal information contained in most business data sets. The algorithm detects local patterns in noisy data sets more reliably than when the temporal information is ignored. This is achieved by exploiting the fact that noise does not reproduce its incidental structure over time, whereas even small true patterns do. In particular, we developed a method to track clusters over time based on an optimal match of data partitions between time periods.
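    The abstract does not spell out the matching step; a minimal sketch of one plausible realization, assuming clusters are represented as sets of point IDs and using Jaccard overlap with Hungarian assignment (scipy's linear_sum_assignment is a real function; the data layout and threshold are assumptions):

    import numpy as np
    from scipy.optimize import linear_sum_assignment

    def jaccard(a, b):
        """Jaccard similarity of two clusters given as sets of point IDs."""
        return len(a & b) / len(a | b) if (a | b) else 0.0

    def match_clusters(prev, curr, min_sim=0.2):
        """Link clusters of period t-1 to clusters of period t by an optimal
        one-to-one assignment that maximizes total Jaccard overlap."""
        sim = np.array([[jaccard(p, c) for c in curr] for p in prev])
        rows, cols = linear_sum_assignment(-sim)  # negate to maximize
        # Matches below the threshold are discarded: clusters that find no
        # persistent counterpart are treated as incidental noise.
        return [(i, j) for i, j in zip(rows, cols) if sim[i, j] >= min_sim]

    Under this reading, clusters that persist across several periods are promising candidates for true local patterns, while one-off agglomerations drop out.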

    Immunology as a metaphor for computational information processing : fact or fiction?

    Get PDF
    The biological immune system exhibits powerful information processing capabilities, and is therefore of great interest to the computer scientist. A rapidly expanding research area has attempted to model many of the features inherent in the natural immune system in order to solve complex computational problems. This thesis examines the metaphor in detail, in an effort to understand and capitalise on those features of the metaphor which distinguish it from other existing methodologies. Two problem domains are considered: scheduling and data clustering. It is argued that these domains exhibit characteristics similar to the environment in which the biological immune system operates, and that they are therefore suitable candidates for application of the metaphor. For each problem domain, two distinct models are developed, incorporating a variety of immunological principles. The models are tested on a number of artificial benchmark datasets. The success of the models on the problems considered confirms the utility of the metaphor.
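    The thesis's concrete models are not reproduced in the abstract; as a flavour of the metaphor, here is a minimal clonal-selection step (CLONALG-style, a standard artificial-immune-system building block, not necessarily the variant used in the thesis; real-vector antibodies are an assumption):

    import random

    def clonal_selection_step(population, fitness, n_select=5, clone_factor=3, mut_scale=0.1):
        """One CLONALG-style iteration over real-vector antibodies:
        select the highest-affinity antibodies, clone them in proportion
        to their rank, and hypermutate clones inversely to affinity."""
        ranked = sorted(population, key=fitness, reverse=True)[:n_select]
        clones = []
        for rank, antibody in enumerate(ranked):
            n_clones = clone_factor * (n_select - rank)  # better rank -> more clones
            rate = mut_scale * (rank + 1) / n_select     # better rank -> less mutation
            for _ in range(n_clones):
                clones.append([x + random.gauss(0.0, rate) for x in antibody])
        # Survivor selection keeps the population size constant.
        return sorted(clones + ranked, key=fitness, reverse=True)[:len(population)]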

    Learning with Graphs using Kernels from Propagated Information

    Get PDF
    Traditional machine learning approaches are designed to learn from independent vector-valued data points. The assumption that instances are independent, however, is not always true. On the contrary, there are numerous domains where data points are cross-linked, for example social networks, where persons are linked by friendship relations. These relations among data points make traditional machine learning difficult and often insufficient. Furthermore, data points themselves can have complex structure, for example molecules or proteins constructed from various bindings of different atoms. Networked and structured data are naturally represented by graphs, and for learning we aim to exploit their structure to improve upon non-graph-based methods. However, graphs encountered in real-world applications often come with rich additional information. This naturally implies many challenges for representation and learning: node information is likely to be incomplete, leading to partially labeled graphs; information can be aggregated from multiple sources and can therefore be uncertain; and additional information on nodes and edges can be derived from complex sensor measurements, and is thus naturally continuous. Although learning with graphs is an active research area, learning with structured data, which essentially models structural similarities of graphs, mostly assumes fully labeled graphs of reasonable size with discrete and certain node and edge information, while learning with networked data, which naturally deals with missing information and huge graphs, mostly assumes homophily and ignores structural similarity. To close these gaps, we present a novel paradigm for learning with graphs that exploits the intermediate results of iterative information propagation schemes on graphs. Originally developed for within-network relational and semi-supervised learning, these propagation schemes have two desirable properties: they capture structural information, and they can naturally adapt to the aforementioned issues of real-world graph data. Additionally, information propagation can be efficiently realized by random walks, leading to fast, flexible, and scalable feature and kernel computations. Further, by considering intermediate random walk distributions, we can model structural similarity for learning with structured and networked data. We develop several approaches based on this paradigm. In particular, we introduce propagation kernels for learning on the graph level, and coinciding walk kernels and Markov logic sets for learning on the node level. Finally, we present two application domains where kernels from propagated information successfully tackle real-world problems.
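    To make the paradigm concrete, here is a strongly simplified propagation-kernel sketch: label distributions are diffused by a random-walk transition matrix and quantized into bins (a crude stand-in for the locality-sensitive hashing used by full propagation kernels); the binning width and iteration count are assumptions, not the thesis's settings:

    from collections import Counter
    import numpy as np

    def propagation_kernel(A1, L1, A2, L2, t_max=3, bin_width=0.1):
        """Simplified propagation kernel between two graphs.
        A1, A2: adjacency matrices; L1, L2: rows are node label
        distributions (one-hot where labels are known, uniform where
        missing). Each step diffuses the distributions, buckets them,
        and counts nodes of both graphs that land in the same bucket."""
        def transition(A):
            deg = A.sum(axis=1, keepdims=True)
            return A / np.maximum(deg, 1e-12)

        T1, T2 = transition(np.asarray(A1, float)), transition(np.asarray(A2, float))
        P1, P2 = np.asarray(L1, float), np.asarray(L2, float)
        k = 0.0
        for _ in range(t_max):
            c1 = Counter(tuple(r) for r in np.floor(P1 / bin_width).astype(int).tolist())
            c2 = Counter(tuple(r) for r in np.floor(P2 / bin_width).astype(int).tolist())
            k += sum(c1[b] * c2[b] for b in c1.keys() & c2.keys())  # common-bucket count
            P1, P2 = T1 @ P1, T2 @ P2  # one information propagation step
        return k

    Because the kernel is built from the intermediate propagation states rather than only the fixed point, it retains structural information while tolerating partially labeled nodes.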

    Perturbative quantum simulation

    Full text link
    Approximations based on perturbation theory are the basis for most of the quantitative predictions of quantum mechanics, whether in quantum field theory, many-body physics, chemistry or other domains. Quantum computing provides an alternative to the perturbation paradigm, but the tens of noisy qubits currently available in state-of-the-art quantum processors are of limited practical utility. In this article, we introduce perturbative quantum simulation, which combines the complementary strengths of the two approaches, enabling the solution of large practical quantum problems using noisy intermediate-scale quantum hardware. The use of a quantum processor eliminates the need to identify a solvable unperturbed Hamiltonian, while the introduction of perturbative coupling permits the quantum processor to simulate systems larger than the available number of physical qubits. After introducing the general perturbative simulation framework, we present an explicit example algorithm that mimics the Dyson series expansion. We then numerically benchmark the method for interacting bosons, fermions, and quantum spins in different topologies, and study different physical phenomena on systems of up to 48 qubits, such as information propagation, charge-spin separation and magnetism. In addition, we use 5 physical qubits on the IBMQ cloud to experimentally simulate the 8-qubit Ising model using our algorithm. The result verifies the noise robustness of our method and illustrates its potential for benchmarking large quantum processors with smaller ones.
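    For reference, the Dyson series that the example algorithm mimics is the standard interaction-picture expansion (textbook form, with hbar = 1; not quoted from the paper itself):

    U_I(t) = \mathcal{T}\exp\!\Big(-i\!\int_0^t V_I(t')\,\mathrm{d}t'\Big)
           = \sum_{n=0}^{\infty}\frac{(-i)^n}{n!}\int_0^t\!\mathrm{d}t_1\cdots\int_0^t\!\mathrm{d}t_n\;\mathcal{T}\big[V_I(t_1)\cdots V_I(t_n)\big],

    where V_I is the perturbative coupling in the interaction picture and \mathcal{T} is the time-ordering operator. Truncating the sum at finite n gives the perturbative approximation that the quantum algorithm is designed to reproduce term by term.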

    Fuzzy-Granular Based Data Mining for Effective Decision Support in Biomedical Applications

    Get PDF
    Due to the complexity of biomedical problems, adaptive and intelligent knowledge discovery and data mining systems are highly needed to help humans understand the inherent mechanisms of diseases. For biomedical classification problems, it is typically impossible to build a perfect classifier with 100% prediction accuracy. Hence a more realistic target is to build an effective Decision Support System (DSS). In this dissertation, a novel adaptive Fuzzy Association Rules (FARs) mining algorithm, named FARM-DS, is proposed to build such a DSS for binary classification problems in the biomedical domain. Empirical studies show that FARM-DS is competitive with state-of-the-art classifiers in terms of prediction accuracy. More importantly, FARs can provide strong decision support for disease diagnosis due to their easy interpretability. This dissertation also proposes a fuzzy-granular method to select informative and discriminative genes from huge microarray gene expression data sets. With fuzzy granulation, information loss in the process of gene selection is decreased; as a result, more informative genes for cancer classification are selected and more accurate classifiers can be modeled. Empirical studies show that the proposed method is more accurate than traditional algorithms for cancer classification, and hence we expect the selected genes to be more helpful for further biological studies.
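    The FARM-DS algorithm itself is not specified in the abstract; as background, a minimal sketch of the fuzzy support and confidence measures that fuzzy association rule mining generally builds on, assuming the min t-norm for conjunction (function names and the t-norm choice are assumptions):

    import numpy as np

    def fuzzy_support(memberships):
        """Fuzzy support of an itemset: mean over records of the minimum
        membership degree across the itemset's fuzzy items (min t-norm).
        memberships: (n_records, n_items) array with values in [0, 1]."""
        return float(np.min(memberships, axis=1).mean())

    def fuzzy_confidence(antecedent, consequent):
        """Confidence of a fuzzy rule A -> C: support(A and C) / support(A),
        with the conjunction again realized by the min t-norm."""
        sup_a = np.min(antecedent, axis=1)
        sup_ac = np.minimum(sup_a, np.min(consequent, axis=1))
        return float(sup_ac.mean() / sup_a.mean()) if sup_a.mean() > 0 else 0.0

    Rules whose fuzzy support and confidence exceed chosen thresholds are the kind of interpretable artifacts that make such a system usable as decision support.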

    PERICLES Deliverable 4.3:Content Semantics and Use Context Analysis Techniques

    Get PDF
    The current deliverable summarises the work conducted within task T4.3 of WP4, focusing on the extraction and subsequent analysis of semantic information from digital content, which is imperative for its preservability. More specifically, the deliverable defines content semantic information from a visual and textual perspective, explains how this information can be exploited in long-term digital preservation, and proposes novel approaches for extracting this information in a scalable manner. Additionally, the deliverable discusses novel techniques for retrieving and analysing the context of use of digital objects. Although this topic has not been extensively studied in the existing literature, we believe use context is vital in augmenting the semantic information and maintaining the usability and preservability of digital objects, as well as their ability to be accurately interpreted as initially intended.

    Probabilistic techniques in semantic mapping for mobile robotics

    Get PDF
    Semantic maps are representations of the world that allow a robot to understand not only the spatial aspects of its workspace, but also the meaning of its elements (objects, rooms, etc.) and how humans interact with them (e.g. functionalities, events and relations). To achieve this, a semantic map adds to purely spatial representations, such as geometric or topological maps, meta-information about the types of elements and relations that can be found in the working environment. This meta-information, called semantic or common-sense knowledge, is typically encoded in Knowledge Bases. An example of this kind of information could be: "fridges are large objects, with a rectangular shape, usually located in kitchens, which can contain perishable food and medication". Encoding and handling this semantic knowledge allows the robot to reason about the information gathered from a given workspace, as well as to infer new information in order to efficiently execute high-level tasks such as "hey robot! please take the medication to grandma". This thesis proposes the use of probabilistic techniques to build and maintain semantic maps, which offers three main advantages over traditional approaches: i) it copes with uncertainty (coming from the robot's imprecise sensors and from the models employed), ii) it yields coherent representations of the environment by exploiting, from a holistic point of view, the contextual relations among the observed elements (e.g. fridges are usually found in kitchens), and iii) it produces certainty values that reflect how accurate the robot's understanding of its environment is. Specifically, the contributions presented can be grouped into two main topics. The first set of contributions addresses the problem of object and/or room recognition, since semantic mapping systems require reliable recognition algorithms to build valid representations. To that end, the thesis explores the use of Probabilistic Graphical Models (PGMs) to exploit the contextual relations among objects and/or rooms while handling the uncertainty inherent in the recognition problem, and the use of Knowledge Bases to improve their performance in different ways, e.g., detecting incoherent results, providing a priori information, reducing the complexity of probabilistic inference algorithms, generating synthetic training samples, enabling learning from past experiences, etc. The second group of contributions accommodates the probabilistic results of the developed recognition algorithms in a new semantic representation, called Multiversal Semantic Map (MvSmap). This map manages multiple interpretations of the robot's workspace, called universes, which are annotated with the probability of being the correct one according to the robot's current knowledge. Thus, this approach provides a grounded belief about the accuracy of the robot's understanding of its environment, which allows it to operate in a more efficient and coherent way.
    The proposed probabilistic algorithms have been thoroughly tested and compared with other current, innovative approaches using state-of-the-art datasets. Additionally, this thesis also contributes two datasets, UMA-Offices and Robot@Home, which contain sensory information captured in different office and home environments, as well as two software tools, the Undirected Probabilistic Graphical Models in C++ library (UPGMpp) and the Object Labeling Toolkit (OLT), for working with Probabilistic Graphical Models and for processing datasets, respectively.
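    The MvSmap machinery is not detailed in the abstract; a toy sketch of its core idea of keeping several annotated interpretations ("universes") and re-weighting them as evidence arrives (the data layout and function names are hypothetical, not from the thesis):

    def update_universes(universes, likelihood):
        """universes: {universe_id: (interpretation, probability)}.
        likelihood(interpretation) -> p(new observation | interpretation).
        Returns the universes re-weighted by Bayes' rule and normalized,
        so the map always carries a belief over its interpretations."""
        weighted = {uid: (interp, prob * likelihood(interp))
                    for uid, (interp, prob) in universes.items()}
        total = sum(prob for _, prob in weighted.values())
        if total == 0.0:  # observation incompatible with every universe
            return universes
        return {uid: (interp, prob / total) for uid, (interp, prob) in weighted.items()}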

    Application and Optimization of Contact-Guided Replica Exchange Molecular Dynamics

    Get PDF
    Proteins are complex macromolecules that fulfil a great variety of important tasks in living organisms. Proteins can, for example, regulate genes, stabilize structure, transmit cell signals, transport substances, and much more. Typically, comprehensive knowledge of a protein's structure and dynamics is required to fully understand its physiological function and interaction mechanisms. The insights gained are essential for the life sciences and can be applied in many areas, e.g. drug design or disease treatment. Despite the tremendous progress of experimental techniques, determining a protein structure still remains a challenging task. Moreover, experiments can only provide partial information, and measured data can be ambiguous and hard to interpret. For this reason, computer simulations are frequently performed to provide further insight and to close the gap between theory and experiment. Today, many in-silico methods are capable of producing accurate protein structure models, whether by a de novo approach or by refining an initial model with the help of experimental data. In this dissertation, I explore the capabilities of replica exchange molecular dynamics (REX MD) as a physics-based approach for generating physically meaningful protein structures. I focus on obtaining structures that are as native-like as possible, and I examine the strengths and weaknesses of the applied method. I extend the standard application by integrating a contact-based bias potential to improve the performance and the final outcome of REX. Incorporating native contact pairs, which can be derived from both theoretical and experimental sources, drives the simulation towards the desired conformations and accordingly reduces the required computational effort. During my work I carried out several studies aimed at maximizing the enrichment of native-like structures, thereby optimizing the end-to-end process of guided REX MD. Each study investigates and improves important aspects of the method: 1) I study the effects of different selections of bias contacts, in particular their range dependence and the negative influence of erroneous contacts. This allows me to determine which kind of bias leads to a significant enrichment of native-like conformations compared to regular REX. 2) I perform a parameter optimization of the applied bias potential. Comparing results from REX simulations using different sigmoid-shaped potentials reveals meaningful parameter ranges, from which I derive an ideal bias potential for the general use case. 3) I present a de novo folding method that can generate many unique starting structures for REX as quickly as possible. I examine the performance of this method in detail and compare two different approaches for selecting the starting structures. The outcome of REX is strongly improved if the starting structures already cover a wide range of conformational space while having a small distance to the targeted state.
    4) I investigate four complex algorithm chains capable of extracting representative structures from the large biomolecular ensembles generated by REX. I study their robustness and reliability, compare them with each other, and numerically assess their performance. 5) Based on my experience with guided REX MD, I developed a Python package to automate and simplify REX projects. It allows a user to design, run, analyze, and visualize a REX project in an interactive and user-friendly environment.
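    The dissertation compares several sigmoid-shaped bias potentials; one illustrative functional form (the specific expression and its parameters are an assumption for exposition, not the thesis's final choice) is a switched well on each biased contact distance r_ij:

    V_{\text{bias}}(r_{ij}) = -\frac{\varepsilon}{1 + e^{\,\alpha\,(r_{ij} - r_0)}},

    where r_0 is the switching distance, \alpha the steepness, and \varepsilon the well depth. The potential is approximately -\varepsilon when the contact is formed (r_ij << r_0) and smoothly vanishes when it is broken, so an erroneous bias contact can distort the energy landscape by at most a bounded amount \varepsilon, which is one reason such shapes are attractive for guiding REX.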

    Informed Segmentation Approaches for Studying Time-Varying Functional Connectivity in Resting State fMRI

    Full text link
    The brain is a complex dynamical system that is never truly “at rest”. Even in the absence of explicit task demands, the brain still manifests a stream of conscious thought, varying levels of vigilance and arousal, as well as a number of postulated ongoing “under the hood” functions such as memory consolidation. Over the past decade, the field of time-varying functional connectivity (TVFC) has emerged as a means of detecting dynamic reconfigurations of the network structure in the resting brain, as well as uncovering the relevance of these changing connectivity patterns with respect to cognition, behavior, and psychopathology. Since the nature and timescales of the underlying resting dynamics are unknown, methodologies that can detect changing temporal patterns in connectivity without imposing arbitrary timescales are required. Moreover, as the study of TVFC is still in its infancy, rigorous evaluation of new and existing methodologies is critical to better understand their behavior when applied in resting data, which lacks ground truth temporal landmarks against which accuracy can be assessed. In this dissertation, I contribute to the methodological component of the TVFC discourse. I propose two distinct, yet related, approaches for identifying TVFC using an informed segmentation framework. This data-driven framework bridges instantaneous and windowed approaches for studying TVFC, in an attempt to mitigate the limitations of each while simultaneously leveraging the advantages of both. I also present a comprehensive, head-to-head comparative analysis of several of the most promising TVFC methodologies proposed to date, which does not exist in the current body of literature.
    PhD, Bioinformatics, University of Michigan, Horace H. Rackham School of Graduate Studies. http://deepblue.lib.umich.edu/bitstream/2027.42/170046/1/marlenad_1.pd
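    The dissertation's two specific approaches are not reproduced in the abstract; a minimal sketch of the general informed-segmentation idea, using sharp changes in frame-wise co-fluctuation to place boundaries and then computing per-segment correlations (the change measure, threshold, and minimum segment length are assumptions):

    import numpy as np

    def informed_segmentation_fc(X, z_thresh=2.0, min_len=3):
        """X: (time, regions) BOLD series, assumed z-scored per region.
        Instantaneous step: per-frame co-fluctuation (upper triangle of
        each frame's outer product with itself). Boundaries are placed
        where this pattern changes sharply; full correlation matrices
        are then computed within each data-driven segment, so no fixed
        window length is imposed on the dynamics."""
        T, R = X.shape
        iu = np.triu_indices(R, k=1)
        cofluct = np.array([np.outer(x, x)[iu] for x in X])        # (T, n_pairs)
        change = np.linalg.norm(np.diff(cofluct, axis=0), axis=1)  # (T-1,)
        cuts = np.where(change > change.mean() + z_thresh * change.std())[0] + 1
        bounds = [0, *cuts.tolist(), T]
        segments = [(a, b) for a, b in zip(bounds[:-1], bounds[1:]) if b - a >= min_len]
        return [(a, b, np.corrcoef(X[a:b].T)) for a, b in segments]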