
    Feature Scaling via Second-Order Cone Programming

    Feature scaling has attracted considerable attention over the past several decades because of its important role in feature selection. In this paper, a novel algorithm for learning scaling factors of features is proposed. It first assigns a nonnegative scaling factor to each feature of the data and then adopts a generalized performance measure to learn the optimal scaling factors. Notably, the proposed model can be transformed into a convex optimization problem, a second-order cone program (SOCP), so the scaling factors learned by our method are globally optimal in some sense. Experiments on simulated data, UCI data sets, and a gene data set demonstrate that the proposed method is more effective than previous methods.
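    The abstract's generalized performance measure is not reproduced here, but the following minimal Python sketch (entirely my illustration, not the paper's formulation) shows the general shape of such a model: one nonnegative scaling factor per feature, a convex surrogate loss, and a second-order cone constraint, so that the solver returns globally optimal scaling factors.

# Hedged sketch of feature scaling as an SOCP; the hinge-style loss,
# toy data, and unit-norm constraint are assumptions for illustration.
import cvxpy as cp
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))                      # toy data: 100 samples, 5 features
y = np.sign(X[:, 0] + 0.1 * rng.normal(size=100))  # labels driven by feature 0 only

w = cp.Variable(5, nonneg=True)     # one nonnegative scaling factor per feature
margins = cp.multiply(y, X @ w)     # margins of scaled features under a fixed unit classifier
loss = cp.sum(cp.pos(1 - margins))  # piecewise-linear hinge surrogate
prob = cp.Problem(cp.Minimize(loss), [cp.norm(w, 2) <= 1])  # cone constraint makes it an SOCP
prob.solve()
print("learned scaling factors:", np.round(w.value, 3))  # weight concentrates on feature 0

    Because the problem is convex, the reported optimum is global, which is the property the abstract emphasises.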

    Correlation Clustering

    Knowledge Discovery in Databases (KDD) is the non-trivial process of identifying valid, novel, potentially useful, and ultimately understandable patterns in data. The core step of the KDD process is the application of a data mining algorithm to produce a particular enumeration of patterns and relationships in large databases. Clustering is one of the major data mining techniques; it aims at grouping data objects into meaningful classes (clusters) such that the similarity of objects within a cluster is maximized and the similarity of objects from different clusters is minimized. This can serve, for example, to group customers with similar interests or genes with related functionalities.

    High-dimensional feature spaces currently pose a particular challenge for clustering techniques. Thanks to modern data collection facilities, real data sets usually contain many features. These features are often noisy or correlated with one another. Since these effects vary in strength across different parts of the data set, irrelevant features cannot be discarded in advance; the selection of relevant features must instead be integrated into the data mining technique. For about ten years, specialized clustering approaches have been developed to cope with the problems of high-dimensional data better than classic clustering approaches do. Often, however, problems of very different nature are not distinguished from one another. A main objective of this thesis is therefore a systematic classification of the diverse approaches developed in recent years according to their task definition, their basic strategy, and their algorithmic approach. We discern as main categories the search for clusters (i) with respect to closeness of objects in axis-parallel subspaces, (ii) with respect to common behavior (patterns) of objects in axis-parallel subspaces, and (iii) with respect to closeness of objects in arbitrarily oriented subspaces (so-called correlation clusters).

    For the third category, the remaining parts of the thesis describe novel approaches. A first approach adapts density-based clustering to the problem of correlation clustering. The starting point is the first density-based approach in this field, the algorithm 4C. Subsequently, enhancements and variations of this approach are discussed that allow for more robust, more efficient, or more effective behavior, or that even find hierarchies of correlation clusters and the corresponding subspaces. The density-based approach to correlation clustering, however, is fundamentally unable to solve some issues because it requires an analysis of local neighborhoods, which is problematic in high-dimensional data. Therefore, a novel method is proposed that tackles the correlation clustering problem with a global approach. Finally, a method is proposed to derive models for correlation clusters, allowing for an interpretation of the clusters and facilitating more thorough analysis in the corresponding domain science; possible applications of these models are proposed and discussed.
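    As a rough illustration of the local-neighbourhood analysis that density-based correlation clustering builds on (in the spirit of 4C, not the thesis's actual algorithms), the Python sketch below estimates a per-point "correlation dimension" from the eigenvalues of a local PCA; the neighbourhood size k and variance threshold delta are assumed parameters.

# For each point, PCA over its k nearest neighbours; the number of
# eigenvalues carrying a non-negligible variance share estimates the
# local correlation dimension. Illustrative only.
import numpy as np
from sklearn.neighbors import NearestNeighbors

def local_correlation_dimension(X, k=20, delta=0.1):
    _, idx = NearestNeighbors(n_neighbors=k).fit(X).kneighbors(X)
    dims = np.empty(len(X), dtype=int)
    for i, neigh in enumerate(idx):
        local = X[neigh] - X[neigh].mean(axis=0)
        eigvals = np.linalg.eigvalsh(local.T @ local / k)[::-1]  # descending
        dims[i] = int(np.sum(eigvals / eigvals.sum() > delta))
    return dims

# toy data: a 1-dimensional correlation cluster embedded in 3 dimensions
rng = np.random.default_rng(1)
t = rng.uniform(size=200)
X = np.c_[t, 2 * t, -t] + 0.01 * rng.normal(size=(200, 3))
print(np.bincount(local_correlation_dimension(X)))  # most points report dimension 1

    Points sharing a low local dimension and a similar subspace orientation would then be grouped into a correlation cluster; it is exactly this reliance on local neighbourhoods that, as the abstract notes, becomes problematic in high-dimensional data.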

    A Novel Variable Precision Reduction Approach to Comprehensive Knowledge Systems


    Design of Physical System Experiments Using Bayes Linear Emulation and History Matching Methodology with Application to Arabidopsis Thaliana

    There are many physical processes in our world which scientists aim to understand, and computer models representing these processes are fundamental to achieving such understanding. Bayes linear emulation is a powerful tool for comprehensively exploring the behaviour of computationally intensive models. History matching is a method for finding the set of inputs to a computer model for which the corresponding outputs give acceptable matches to observed data, given our state of uncertainty regarding the model itself, the measurements, and, if used, the emulators representing the model. This thesis provides three major developments of the current methodology in this area. First, we develop sequential history matching methodology by splitting the available data into groups and gaining insight into the information obtained from each group; such insight is then conveyed through a wide array of novel visualisations. Second, we develop emulation techniques for the case when there are hypersurfaces of input space across which we have essentially perfect knowledge of the model's behaviour. Finally, we develop the use of history matching methodology as a criterion for the design of physical system experiments: we outline the general framework for design in a history matching setting before discussing many extensions, including a comprehensive robustness analysis of our design choice. We illustrate our novel methodology on a model of hormonal crosstalk in the roots of an Arabidopsis plant.
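    A central quantity in history matching is the implausibility measure. The sketch below shows its common univariate form, which standardises the distance between an observation and the emulator's prediction by all acknowledged uncertainties; the exact variant used in the thesis may differ, and the cutoff of 3 is the conventional choice rather than a value taken from this work.

# I(x) = |z - E[f(x)]| / sqrt(Var_emulator + Var_discrepancy + Var_observation);
# inputs x with I(x) above ~3 are typically ruled out as implausible.
import numpy as np

def implausibility(z, em_mean, em_var, v_disc, v_obs):
    return np.abs(z - em_mean) / np.sqrt(em_var + v_disc + v_obs)

z = 1.2                                # observed system value (toy numbers)
em_mean = np.array([0.5, 1.1, 2.0])    # emulator expectations at candidate inputs
em_var = np.array([0.04, 0.02, 0.09])  # emulator variances at those inputs
print(implausibility(z, em_mean, em_var, v_disc=0.01, v_obs=0.01) < 3.0)  # non-implausible inputs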

    Theoretical Investigations of Covalent Mechanochemistry

    This thesis is concerned with computational-chemistry investigations of mechanoresponsive molecules that feature predetermined breaking points (PBPs). The mechanophoric systems were approached at different levels of theory. Reactive molecular dynamics (rMD), density functional theory (DFT), second-order MĂžller-Plesset perturbation theory (MP2), and multireference methods were employed to obtain a complete picture of the mechanochemical reactions.

    Heterogeneous Populations in B Cell Memory Responses and Glioblastoma Growth

    Heterogeneity is a hallmark of biological systems at every conceivable scale. In this work, I develop computational methods for describing various interacting types of biological heterogeneity and apply them to explore two scenarios of biomedical interest: the evocation of protective B cell responses by vaccination and the growth dynamics of an aggressive brain tumour.

    In the vast majority of currently licensed vaccines, antibody titres are strong correlates of vaccine-induced immunity. However, diseases like influenza, tuberculosis, and malaria continue to escape efficient vaccination, and the mechanisms behind many established vaccines remain incompletely understood. In the first part of this work, I therefore develop a data-driven computational model of the B cell memory response to vaccination based on an ensemble of simulated germinal centres. This model can address immunisation problems of different difficulty levels by allowing both pathogen- and host-specific parameters to vary. Using this framework, I show that two distinct bottlenecks for successful vaccination exist: the availability of high-quality precursors for clonal selection and the efficiency of affinity maturation, which depends on binding complexity. Together with experimental collaborators, we have used these results to interpret single-cell immunoglobulin sequencing data from a vaccination trial targeting the malaria parasite Plasmodium falciparum (Pf). As predicted for a complex antigen, after repeated immunisation with Pf sporozoites, the clonal selection of potent germline and memory B cell precursors against a major surface protein outpaces affinity maturation, because the majority of immunoglobulin gene mutations are affinity-neutral. These findings have implications for the design of potentially personalised vaccination strategies to induce potent B cell responses against structurally complex antigens.

    A quantitative understanding of functional cell heterogeneity in tumour growth promises insights into the fundamentals of cancer biology. In the second part of this work, I correspondingly develop mathematical models of glioblastoma growth. Employing a Bayesian approach to parameter estimation and incorporating a large body of experimental data from mouse models, I show that brain tumour stem cells drive exponential tumour growth, while the more differentiated tumour progenitor cells, although fast cycling, are unable to sustain expansion by themselves. Comparing a three-dimensional simulation of tumour growth to experimental growth curves, I deduce that glioblastoma stem cells are highly migratory. Based on single-cell clonal tracing data and a combination of deterministic and stochastic modelling approaches, I identify their migration rate and explain experimentally observed clone size distributions. Finally, I employ the resulting fully quantified model of tumour growth to predict the response to two therapeutic interventions. These predictions were verified experimentally by our collaborators, suggesting that quantitative knowledge of the hierarchical subpopulation structure of a tumour may provide valuable guidance for treatment.
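    The finding that stem cells drive exponential growth while fast-cycling progenitors cannot sustain expansion on their own can be illustrated with a minimal two-compartment ODE sketch; the linear structure and the rate constants below are assumptions for illustration, not the thesis's fitted glioblastoma model.

# S: self-renewing stem cells; P: progenitors produced by S and lost at
# rate delta. With S = 0, P only decays, so P alone cannot expand.
import numpy as np
from scipy.integrate import solve_ivp

lam, alpha, delta = 0.1, 0.5, 0.3  # assumed per-day rates

def rhs(t, y):
    S, P = y
    return [lam * S, alpha * S - delta * P]

sol = solve_ivp(rhs, (0, 60), [1.0, 0.0])
S, P = sol.y[:, -1]
print(f"day 60: S = {S:.1f}, P = {P:.1f}")  # both grow ~exp(lam*t), driven by S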