6 research outputs found
Fortschritte im unüberwachten Lernen und Anwendungsbereiche: Subspace Clustering mit Hintergrundwissen, semantisches Passworterraten und erlernte Indexstrukturen
Over the past few years, advances in data science, machine learning and, in particular, unsupervised learning have enabled significant progress in many scientific fields and even in everyday life. Unsupervised learning methods are usually successful whenever they can be tailored to specific applications using appropriate requirements based on domain expertise. This dissertation shows how purely theoretical research can lead to circumstances that favor overly optimistic results, and the advantages of application-oriented research based on specific background knowledge. These observations apply to traditional unsupervised learning problems such as clustering, anomaly detection and dimensionality reduction. Therefore, this thesis presents extensions of these classical problems, such as subspace clustering and principal component analysis, as well as several specific applications with relevant interfaces to machine learning. Examples include password guessing using semantic word embeddings and learning spatial index structures using statistical models. In essence, this thesis shows that application-oriented research has many advantages for current and future research.In den letzten Jahren haben Fortschritte in der Data Science, im maschinellen Lernen und insbesondere im unüberwachten Lernen zu erheblichen Fortentwicklungen in vielen Bereichen der Wissenschaft und des täglichen Lebens geführt. Methoden des unüberwachten Lernens sind in der Regel dann erfolgreich, wenn sie durch geeignete, auf Expertenwissen basierende Anforderungen an spezifische Anwendungen angepasst werden können. Diese Dissertation zeigt, wie rein theoretische Forschung zu Umständen führen kann, die allzu optimistische Ergebnisse begünstigen, und welche Vorteile anwendungsorientierte Forschung hat, die auf spezifischem Hintergrundwissen basiert. Diese Beobachtungen gelten für traditionelle unüberwachte Lernprobleme wie Clustering, Anomalieerkennung und Dimensionalitätsreduktion. Daher werden in diesem Beitrag Erweiterungen dieser klassischen Probleme, wie Subspace Clustering und Hauptkomponentenanalyse, sowie einige spezifische Anwendungen mit relevanten Schnittstellen zum maschinellen Lernen vorgestellt. Beispiele sind das Erraten von Passwörtern mit Hilfe semantischer Worteinbettungen und das Lernen von räumlichen Indexstrukturen mit Hilfe statistischer Modelle. Im Wesentlichen zeigt diese Arbeit, dass anwendungsorientierte Forschung viele Vorteile für die aktuelle und zukünftige Forschung hat
Simple, Scalable and Effective Clustering via One-Dimensional Projections
Clustering is a fundamental problem in unsupervised machine learning with
many applications in data analysis. Popular clustering algorithms such as
Lloyd's algorithm and -means++ can take time when clustering
points in a -dimensional space (represented by an matrix
) into clusters. In applications with moderate to large , the
multiplicative factor can become very expensive. We introduce a simple
randomized clustering algorithm that provably runs in expected time
for arbitrary . Here is the
total number of non-zero entries in the input dataset , which is upper
bounded by and can be significantly smaller for sparse datasets. We prove
that our algorithm achieves approximation ratio on
any input dataset for the -means objective. We also believe that our
theoretical analysis is of independent interest, as we show that the
approximation ratio of a -means algorithm is approximately preserved under a
class of projections and that -means++ seeding can be implemented in
expected time in one dimension. Finally, we show experimentally
that our clustering algorithm gives a new tradeoff between running time and
cluster quality compared to previous state-of-the-art methods for these tasks.Comment: 41 pages, 6 figures, to appear in NeurIPS 202
Cancer drug sensitivity prediction from routine histology images
Drug sensitivity prediction models can aid in personalising cancer therapy, biomarker discovery, and drug design. Such models require survival data from randomised controlled trials which can be time consuming and expensive. In this proof-of-concept study, we demonstrate for the first time that deep learning can link histological patterns in whole slide images (WSIs) of Haematoxylin & Eosin (H&E) stained breast cancer sections with drug sensitivities inferred from cell lines. We employ patient-wise drug sensitivities imputed from gene expression-based mapping of drug effects on cancer cell lines to train a deep learning model that predicts patients’ sensitivity to multiple drugs from WSIs. We show that it is possible to use routine WSIs to predict the drug sensitivity profile of a cancer patient for a number of approved and experimental drugs. We also show that the proposed approach can identify cellular and histological patterns associated with drug sensitivity profiles of cancer patients
XXIII Edición del Workshop de Investigadores en Ciencias de la Computación : Libro de actas
Compilación de las ponencias presentadas en el XXIII Workshop de Investigadores en Ciencias de la Computación (WICC), llevado a cabo en Chilecito (La Rioja) en abril de 2021.Red de Universidades con Carreras en Informátic