2,782 research outputs found
Comprehensible and Robust Knowledge Discovery from Small Datasets
Knowledge discovery in databases (KDD) aims to extract useful knowledge from data. Data can represent a series of measurements from a real process or a series of input-output values of a simulation model. Two often conflicting requirements on the acquired knowledge are that it (1) summarizes the data as accurately as possible and (2) comes in an easily comprehensible form. Decision trees and subgroup discovery methods deliver knowledge summaries in the form of hyperrectangles; these are considered easy to comprehend.
To demonstrate the importance of a comprehensible data summary, we study Decentral Smart Grid Control, a new system that implements demand response in power grids without substantial changes to the infrastructure. The conventional analysis of this system carried out so far was limited to identical participants and therefore did not reflect reality sufficiently well. We run many simulations with different input values and apply decision trees to the resulting data. With the resulting comprehensible data summaries, we were able to gain new insights into the behavior of Decentral Smart Grid Control.
Decision trees allow describing the system behavior for all input combinations. Sometimes, however, one is not interested in partitioning the entire input space but in finding regions that lead to a certain output (so-called subgroups). Existing subgroup discovery algorithms usually require large amounts of data to produce stable and accurate output. The data collection process, however, is often costly. Our main contribution is the improvement of subgroup discovery from datasets with few observations.
The discovery of subgroups in simulated data is called scenario discovery. A commonly used algorithm for scenario discovery is PRIM (Patient Rule Induction Method). We propose REDS (Rule Extraction for Discovering Scenarios), a new procedure for scenario discovery. For REDS, we first train an intermediate statistical model and use it to create a large amount of new data for PRIM. We also describe the underlying statistical intuition. Experiments show that REDS works much better than PRIM on its own: it reduces the number of required simulation runs by 75% on average.
With simulated data, one has perfect knowledge of the input distribution, which is a prerequisite of REDS. To make REDS applicable to real measurement data, we combined it with sampling from an estimated multivariate distribution of the data. We experimentally evaluated the resulting method in combination with different data generation methods. We did this for PRIM and BestInterval, another representative subgroup discovery method. In most cases, our methodology increased the quality of the discovered subgroups.
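The REDS idea described in the abstract above (train an intermediate model on a few simulation runs, use it to label a large amount of newly sampled inputs, then run PRIM on that larger dataset) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and not the authors' implementation: the k-nearest-neighbor metamodel, the heavily simplified peeling routine standing in for PRIM, the toy `simulate` function, and all parameter values.

```python
import random


def knn_predict(train_x, train_y, point, k=5):
    """Hypothetical stand-in metamodel: majority vote of the k nearest training points."""
    nearest = sorted(
        range(len(train_x)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(train_x[i], point)),
    )[:k]
    return 1 if 2 * sum(train_y[i] for i in nearest) > k else 0


def prim_peel(xs, ys, alpha=0.1, min_support=0.05):
    """Simplified PRIM-style peeling (no pasting, no trajectory selection).

    Repeatedly removes the alpha-fraction slice on one side of one dimension
    that yields the highest mean label in the remaining box. Stops when no
    slice improves the mean or the box would fall below min_support.
    """
    dims = len(xs[0])
    box = [[min(x[d] for x in xs), max(x[d] for x in xs)] for d in range(dims)]
    inside = list(range(len(xs)))
    min_points = max(1, int(min_support * len(xs)))
    while True:
        current = sum(ys[i] for i in inside) / len(inside)
        best = None
        for d in range(dims):
            vals = sorted(xs[i][d] for i in inside)
            cut = max(1, int(alpha * len(vals)))
            if len(vals) <= cut:
                continue
            # Candidate new lower bound (side 0) and new upper bound (side 1).
            for side, bound in ((0, vals[cut]), (1, vals[-cut - 1])):
                if side == 0:
                    kept = [i for i in inside if xs[i][d] >= bound]
                else:
                    kept = [i for i in inside if xs[i][d] <= bound]
                if len(kept) < min_points or len(kept) == len(inside):
                    continue
                mean = sum(ys[i] for i in kept) / len(kept)
                if best is None or mean > best[0]:
                    best = (mean, d, side, bound, kept)
        if best is None or best[0] <= current:
            return box, current
        _, d, side, bound, inside = best
        box[d][side] = bound


def reds(train_x, train_y, sample_input, n_new=2000, seed=0):
    """REDS-style pipeline: label freshly sampled inputs with the metamodel,
    then search for a box on the enlarged, metamodel-labeled dataset."""
    rng = random.Random(seed)
    new_x = [sample_input(rng) for _ in range(n_new)]
    new_y = [knn_predict(train_x, train_y, p) for p in new_x]
    return prim_peel(new_x, new_y)


def simulate(x):
    """Toy 'simulation' (assumption): output of interest occurs in one corner."""
    return 1 if x[0] > 0.6 and x[1] < 0.4 else 0


def uniform_input(rng):
    """Known input distribution (assumption): uniform on the unit square."""
    return [rng.random(), rng.random()]


rng = random.Random(1)
runs_x = [uniform_input(rng) for _ in range(200)]  # the few "expensive" runs
runs_y = [simulate(x) for x in runs_x]
box, precision = reds(runs_x, runs_y, uniform_input)
print("box:", box, "share of positive labels inside:", precision)
```

The returned box is a hyperrectangle given as one `[lower, upper]` pair per input dimension. Note that the reported share is measured against the metamodel's labels, not against fresh simulation runs, so a real evaluation would still need held-out simulation data.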
Machine learning in the social and health sciences
The uptake of machine learning (ML) approaches in the social and health
sciences has been rather slow, and research using ML for social and health
research questions remains fragmented. This may be due to the separate
development of research in the computational/data versus social and health
sciences as well as a lack of accessible overviews and adequate training in ML
techniques for non-data-science researchers. This paper provides a meta-mapping
of research questions in the social and health sciences to appropriate ML
approaches, by incorporating the necessary requirements of statistical analysis
in these disciplines. We map the established classification into description,
prediction, and causal inference to common research goals, such as estimating
prevalence of adverse health or social outcomes, predicting the risk of an
event, and identifying risk factors or causes of adverse outcomes. This
meta-mapping aims at overcoming disciplinary barriers and starting a fluid
dialogue between researchers from the social and health sciences and
methodologically trained researchers. Such mapping may also help to fully
exploit the benefits of ML while considering domain-specific aspects relevant
to the social and health sciences, and hopefully contribute to the acceleration
of the uptake of ML applications to advance both basic and applied social and
health sciences research.
Integration of microRNA changes in vivo identifies novel molecular features of muscle insulin resistance in type 2 diabetes
Skeletal muscle insulin resistance (IR) is considered a critical component of type II diabetes, yet to date IR has evaded characterization at the global gene expression level in humans. MicroRNAs (miRNAs) are considered fine-scale rheostats of protein-coding gene product abundance. The relative importance and mode of action of miRNAs in human complex diseases remains to be fully elucidated. We produce a global map of coding and non-coding RNAs in human muscle IR with the aim of identifying novel disease biomarkers. We profiled >47,000 mRNA sequences and >500 human miRNAs using gene-chips and 118 subjects (n = 71 patients versus n = 47 controls). A tissue-specific gene-ranking system was developed to stratify thousands of miRNA target-genes, removing false positives, yielding a weighted inhibitor score, which integrated the net impact of both up- and down-regulated miRNAs. Both informatic and protein detection validation was used to verify the predictions of in vivo changes. The muscle mRNA transcriptome is invariant with respect to insulin or glucose homeostasis. In contrast, a third of miRNAs detected in muscle were altered in disease (n = 62), many changing prior to the onset of clinical diabetes. The novel ranking metric identified six canonical pathways with proven links to metabolic disease while the control data demonstrated no enrichment. The Benjamini-Hochberg adjusted Gene Ontology profile of the highest ranked targets was metabolic (P < 7.4 × 10^-8), post-translational modification (P < 9.7 × 10^-5) and developmental (P < 1.3 × 10^-6) processes. Protein profiling of six development-related genes validated the predictions. Brain-derived neurotrophic factor protein was detectable only in muscle satellite cells and was increased in diabetes patients compared with controls, consistent with the observation that global miRNA changes were opposite from those found during myogenic differentiation.
We provide evidence that IR in humans may be related to coordinated changes in multiple microRNAs, which act to target relevant signaling pathways. It would appear that miRNAs can produce marked changes in target protein abundance in vivo by working in a combinatorial manner. Thus, miRNA detection represents a new molecular biomarker strategy for insulin resistance, where micrograms of patient material are needed to monitor efficacy during drug or life-style interventions.
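The abstract above reports Benjamini-Hochberg adjusted P values for the Gene Ontology profile. As a reminder of what that adjustment does, here is a minimal sketch of the standard Benjamini-Hochberg procedure in plain Python. The function name and the example p-values are illustrative assumptions and are not taken from the study.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order.

    Each p-value is scaled by m / rank (rank in ascending order), and a
    running minimum taken from the largest rank downward keeps the adjusted
    values monotone.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Made-up p-values, for illustration only.
example = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
print(example)
```

A term is then called significant at false discovery rate q when its adjusted value is at most q.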
Deep Causal Learning for Robotic Intelligence
This invited review discusses causal learning in the context of robotic
intelligence. The paper first introduces the psychological findings on causal
learning in human cognition, then the traditional statistical
solutions for causal discovery and causal inference. It reviews recent
deep causal learning algorithms with a focus on their architectures and the
benefits of using deep nets, and discusses the gap between deep causal learning
and the needs of robotic intelligence.
Achieving Causal Fairness in Recommendation
Recommender systems provide personalized services for users seeking information and play an increasingly important role in online applications. While most research papers focus on inventing machine learning algorithms to fit user behavior data and maximizing predictive performance in recommendation, it is also very important to develop fairness-aware machine learning algorithms such that the decisions made by them are not only accurate but also meet desired fairness requirements. In personalized recommendation, although there are many works focusing on fairness and discrimination, how to achieve user-side fairness in bandit recommendation from a causal perspective still remains a challenging task. Besides, the deployed systems utilize user-item interaction data to train models and then generate new data by online recommendation. This feedback loop in recommendation often results in various biases in observational data. The goal of this dissertation is to address challenging issues in achieving causal fairness in recommender systems: achieving user-side fairness and counterfactual fairness in bandit-based recommendation, mitigating confounding and sample selection bias simultaneously in recommendation and robustly improving bandit learning process with biased offline data. In this dissertation, we developed the following algorithms and frameworks for research problems related to causal fairness in recommendation. 
• We developed a contextual bandit algorithm to achieve group level user-side fairness and two UCB-based causal bandit algorithms to achieve counterfactual individual fairness for personalized recommendation;
• We derived sufficient and necessary graphical conditions for identifying and estimating three causal quantities under the presence of confounding and sample selection biases and proposed a framework for leveraging the causal bound derived from the confounded and selection biased offline data to robustly improve online bandit learning process;
• We developed a framework for discrimination analysis with the benefit of multiple causes of the outcome variable to deal with hidden confounding;
• We proposed a new causal-based fairness notion and developed algorithms for determining whether an individual or a group of individuals is discriminated in terms of equality of effort.
A Survey of Methods, Challenges and Perspectives in Causality
Deep Learning models have shown success in a large variety of tasks by
extracting correlation patterns from high-dimensional data but still struggle
when generalizing out of their initial distribution. As causal engines aim to
learn mechanisms independent from a data distribution, combining Deep Learning
with Causality can have a great impact on the two fields. In this paper, we
further motivate this assumption. We perform an extensive overview of the
theories and methods for Causality from different perspectives, with an
emphasis on Deep Learning and the challenges met by the two domains. We show
early attempts to bring the fields together and the possible perspectives for
the future. We finish by providing a large variety of applications for
techniques from Causality.
Comment: 40 pages, 37 pages for the main paper and 3 pages for the supplement, 8 figures, submitted to ACM Computing Survey