2,782 research outputs found

    Comprehensible and Robust Knowledge Discovery from Small Datasets

    Knowledge discovery in databases (KDD) aims to extract useful knowledge from data. Data may represent a set of measurements from a real-world process or a set of input-output values of a simulation model. Two often conflicting requirements for the acquired knowledge are that it (1) summarizes the data as accurately as possible and (2) comes in an easily comprehensible form. Decision trees and subgroup discovery methods deliver knowledge summaries in the form of hyperrectangles; these are considered easy to understand. To demonstrate the importance of a comprehensible data summary, we study Decentral Smart Grid Control, a new system that implements demand response in power grids without substantial changes to the infrastructure. The conventional analysis of this system conducted so far was limited to identical participants and therefore did not reflect reality sufficiently well. We run many simulations with different input values and apply decision trees to the resulting data. With the resulting comprehensible data summaries, we were able to gain new insights into the behavior of Decentral Smart Grid Control. Decision trees make it possible to describe the system behavior for all input combinations. Sometimes, however, one is not interested in partitioning the entire input space, but in finding regions that lead to a certain output (so-called subgroups). Existing subgroup discovery algorithms usually require large amounts of data to produce stable and accurate output. The data collection process, however, is often costly.
Our main contribution is improving subgroup discovery from datasets with few observations. Discovering subgroups in simulated data is called scenario discovery. A commonly used algorithm for scenario discovery is PRIM (Patient Rule Induction Method). We propose REDS (Rule Extraction for Discovering Scenarios), a new procedure for scenario discovery. For REDS, we first train an intermediate statistical model and use it to generate a large amount of new data for PRIM. We also describe the underlying statistical intuition. Experiments show that REDS works much better than PRIM on its own: it reduces the number of required simulation runs by 75% on average. With simulated data, one has perfect knowledge of the input distribution, which is a prerequisite of REDS. To make REDS applicable to real measurement data, we combined it with sampling from an estimated multivariate distribution of the data. We evaluated the resulting method experimentally in combination with different data generation methods. We did this for PRIM and BestInterval, another representative subgroup discovery method. In most cases, our methodology increased the quality of the discovered subgroups.
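The REDS idea summarized in this abstract (fit an intermediate model on few simulation runs, label many freshly sampled inputs with it, then run PRIM on the enlarged dataset) can be illustrated with a minimal Python sketch. Everything concrete here is an assumption for illustration and not the authors' implementation: the toy simulation `inside_true_box`, the k-nearest-neighbour metamodel `knn_label`, the uniform input distribution on the unit square, and the heavily simplified peeling routine `prim_peel`, which stops once no peel improves the mean label.

```python
import random

def inside_true_box(point):
    """Hypothetical 'simulation': output 1 inside a hidden box, else 0."""
    return int(all(0.3 <= v <= 0.7 for v in point))

def knn_label(train_x, train_y, point, k=5):
    """Stand-in metamodel (assumption): majority vote of the k nearest points."""
    nearest = sorted(
        range(len(train_x)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(train_x[i], point)),
    )[:k]
    return int(sum(train_y[i] for i in nearest) * 2 > k)

def prim_peel(xs, ys, alpha=0.1, min_support=0.05):
    """Simplified PRIM peeling on inputs in [0, 1]^d: repeatedly remove the
    alpha-fraction slab whose removal gives the highest mean label."""
    dims = len(xs[0])
    box = [[0.0, 1.0] for _ in range(dims)]
    idx = list(range(len(xs)))
    while len(idx) > max(1, min_support * len(xs)):
        cut = max(1, int(alpha * len(idx)))
        best = None
        for d in range(dims):
            values = sorted(xs[i][d] for i in idx)
            for side, bound in ((0, values[cut]), (1, values[-cut - 1])):
                if side == 0:
                    keep = [i for i in idx if xs[i][d] >= bound]
                else:
                    keep = [i for i in idx if xs[i][d] <= bound]
                if not keep or len(keep) == len(idx):
                    continue
                mean = sum(ys[i] for i in keep) / len(keep)
                if best is None or mean > best[0]:
                    best = (mean, d, side, bound, keep)
        current_mean = sum(ys[i] for i in idx) / len(idx)
        if best is None or best[0] <= current_mean:
            break  # no peel improves the box any further
        _, d, side, bound, idx = best
        box[d][side] = bound
    return box

def reds(train_x, train_y, n_new=2000, seed=1):
    """REDS-style pipeline: sample new inputs from the (assumed known, uniform)
    input distribution, label them with the metamodel, then run the peeling."""
    rng = random.Random(seed)
    dims = len(train_x[0])
    new_x = [[rng.random() for _ in range(dims)] for _ in range(n_new)]
    new_y = [knn_label(train_x, train_y, p) for p in new_x]
    return prim_peel(new_x, new_y)

# A small "expensive" dataset: 300 simulated runs in two dimensions.
rng = random.Random(0)
train_x = [[rng.random(), rng.random()] for _ in range(300)]
train_y = [inside_true_box(p) for p in train_x]
box = reds(train_x, train_y)
print(box)  # per-dimension [lower, upper] bounds of the discovered subgroup
```

The sketch returns a single box; the actual PRIM procedure produces a whole peeling trajectory from which an analyst picks a trade-off between coverage and precision.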

    Machine learning in the social and health sciences

    The uptake of machine learning (ML) approaches in the social and health sciences has been rather slow, and research using ML for social and health research questions remains fragmented. This may be due to the separate development of research in the computational/data sciences versus the social and health sciences, as well as a lack of accessible overviews and adequate training in ML techniques for researchers outside data science. This paper provides a meta-mapping of research questions in the social and health sciences to appropriate ML approaches by incorporating the requirements of statistical analysis in these disciplines. We map the established classification into description, prediction, and causal inference to common research goals, such as estimating the prevalence of adverse health or social outcomes, predicting the risk of an event, and identifying risk factors or causes of adverse outcomes. This meta-mapping aims at overcoming disciplinary barriers and starting a fluid dialogue between researchers from the social and health sciences and methodologically trained researchers. Such a mapping may also help to fully exploit the benefits of ML while considering domain-specific aspects relevant to the social and health sciences, and may help accelerate the uptake of ML applications to advance both basic and applied social and health sciences research.

    Integration of microRNA changes in vivo identifies novel molecular features of muscle insulin resistance in type 2 diabetes

    Skeletal muscle insulin resistance (IR) is considered a critical component of type II diabetes, yet to date IR has evaded characterization at the global gene expression level in humans. MicroRNAs (miRNAs) are considered fine-scale rheostats of protein-coding gene product abundance. The relative importance and mode of action of miRNAs in human complex diseases remain to be fully elucidated. We produce a global map of coding and non-coding RNAs in human muscle IR with the aim of identifying novel disease biomarkers. We profiled >47,000 mRNA sequences and >500 human miRNAs using gene-chips and 118 subjects (n = 71 patients versus n = 47 controls). A tissue-specific gene-ranking system was developed to stratify thousands of miRNA target-genes, removing false positives and yielding a weighted inhibitor score that integrated the net impact of both up- and down-regulated miRNAs. Both informatic and protein detection validation were used to verify the predictions of in vivo changes. The muscle mRNA transcriptome is invariant with respect to insulin or glucose homeostasis. In contrast, a third of miRNAs detected in muscle were altered in disease (n = 62), many changing prior to the onset of clinical diabetes. The novel ranking metric identified six canonical pathways with proven links to metabolic disease, while the control data demonstrated no enrichment. The Benjamini-Hochberg adjusted Gene Ontology profile of the highest ranked targets was metabolic (P < 7.4 × 10^-8), post-translational modification (P < 9.7 × 10^-5) and developmental (P < 1.3 × 10^-6) processes. Protein profiling of six development-related genes validated the predictions. Brain-derived neurotrophic factor protein was detectable only in muscle satellite cells and was increased in diabetes patients compared with controls, consistent with the observation that global miRNA changes were opposite from those found during myogenic differentiation.
We provide evidence that IR in humans may be related to coordinated changes in multiple microRNAs, which act to target relevant signaling pathways. It would appear that miRNAs can produce marked changes in target protein abundance in vivo by working in a combinatorial manner. Thus, miRNA detection represents a new molecular biomarker strategy for insulin resistance, where micrograms of patient material are needed to monitor efficacy during drug or life-style interventions.
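For readers unfamiliar with the Benjamini-Hochberg adjustment mentioned in this abstract, the following Python sketch shows the standard step-up procedure for turning raw p-values into adjusted ones. The function name `benjamini_hochberg` and the example p-values are illustrative assumptions; they are not taken from the study.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order.

    The i-th smallest p-value is scaled by m / i, and a running minimum
    taken from the largest rank downwards keeps the result monotone.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0  # adjusted values are capped at 1
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Made-up p-values for four hypothetical gene sets.
raw = [0.01, 0.04, 0.03, 0.005]
print([round(p, 4) for p in benjamini_hochberg(raw)])
```

A hypothesis is then reported as significant at false discovery rate q when its adjusted p-value is at most q.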

    Deep Causal Learning for Robotic Intelligence

    This invited review discusses causal learning in the context of robotic intelligence. The paper first introduces the psychological findings on causal learning in human cognition, then the traditional statistical solutions for causal discovery and causal inference. It reviews recent deep causal learning algorithms, focusing on their architectures and the benefits of using deep nets, and discusses the gap between deep causal learning and the needs of robotic intelligence.

    Achieving Causal Fairness in Recommendation

    Recommender systems provide personalized services for users seeking information and play an increasingly important role in online applications. While most research papers focus on inventing machine learning algorithms to fit user behavior data and maximizing predictive performance in recommendation, it is also very important to develop fairness-aware machine learning algorithms such that the decisions made by them are not only accurate but also meet desired fairness requirements. In personalized recommendation, although there are many works focusing on fairness and discrimination, how to achieve user-side fairness in bandit recommendation from a causal perspective still remains a challenging task. Besides, the deployed systems utilize user-item interaction data to train models and then generate new data by online recommendation. This feedback loop in recommendation often results in various biases in observational data. The goal of this dissertation is to address challenging issues in achieving causal fairness in recommender systems: achieving user-side fairness and counterfactual fairness in bandit-based recommendation, mitigating confounding and sample selection bias simultaneously in recommendation and robustly improving bandit learning process with biased offline data. In this dissertation, we developed the following algorithms and frameworks for research problems related to causal fairness in recommendation. 
    • We developed a contextual bandit algorithm to achieve group level user-side fairness and two UCB-based causal bandit algorithms to achieve counterfactual individual fairness for personalized recommendation;
    • We derived sufficient and necessary graphical conditions for identifying and estimating three causal quantities under the presence of confounding and sample selection biases and proposed a framework for leveraging the causal bound derived from the confounded and selection biased offline data to robustly improve online bandit learning process;
    • We developed a framework for discrimination analysis with the benefit of multiple causes of the outcome variable to deal with hidden confounding;
    • We proposed a new causal-based fairness notion and developed algorithms for determining whether an individual or a group of individuals is discriminated in terms of equality of effort.

    A Survey of Methods, Challenges and Perspectives in Causality

    Deep Learning models have shown success in a large variety of tasks by extracting correlation patterns from high-dimensional data but still struggle when generalizing out of their initial distribution. As causal engines aim to learn mechanisms independent from a data distribution, combining Deep Learning with Causality can have a great impact on the two fields. In this paper, we further motivate this assumption. We perform an extensive overview of the theories and methods for Causality from different perspectives, with an emphasis on Deep Learning and the challenges met by the two domains. We show early attempts to bring the fields together and the possible perspectives for the future. We finish by providing a large variety of applications for techniques from Causality.
    Comment: 40 pages, 37 pages for the main paper and 3 pages for the supplement, 8 figures, submitted to ACM Computing Survey