154 research outputs found

    Photometric redshift estimation based on data mining with PhotoRApToR

    Photometric redshifts (photo-z) are crucial to the scientific exploitation of modern panchromatic digital surveys. In this paper we present PhotoRApToR (Photometric Research Application To Redshift): a Java/C++ based desktop application capable of solving non-linear regression and multi-variate classification problems, specialized in particular for photo-z estimation. It embeds a machine learning algorithm, namely a multilayer neural network trained by the Quasi Newton learning rule, and special tools dedicated to pre- and postprocessing data. PhotoRApToR has been successfully tested on several scientific cases. The application is available for free download from the DAME Program web site. Comment: To appear on Experimental Astronomy, Springer, 20 pages, 15 figure

    To trust or not to trust an explanation: using LEAF to evaluate local linear XAI methods

    The main objective of eXplainable Artificial Intelligence (XAI) is to provide effective explanations for black-box classifiers. The existing literature lists many desirable properties for explanations to be useful, but there is no consensus on how to quantitatively evaluate explanations in practice. Moreover, explanations are typically used only to inspect black-box models, and the proactive use of explanations as a decision support is generally overlooked. Among the many approaches to XAI, a widely adopted paradigm is Local Linear Explanations, with LIME and SHAP emerging as state-of-the-art methods. We show that these methods are plagued by many defects including unstable explanations, divergence of actual implementations from the promised theoretical properties, and explanations for the wrong label. This highlights the need to have standard and unbiased evaluation procedures for Local Linear Explanations in the XAI field. In this paper we address the problem of identifying a clear and unambiguous set of metrics for the evaluation of Local Linear Explanations. This set includes both existing and novel metrics defined specifically for this class of explanations. All metrics have been included in an open Python framework, named LEAF. The purpose of LEAF is to provide a reference for end users to evaluate explanations in a standardised and unbiased way, and to guide researchers towards developing improved explainable techniques. Comment: 16 pages, 8 figure

    In-silico Models for Capturing the Static and Dynamic Characteristics of Robustness within Complex Networks

    Understanding the role of structural patterns within complex networks is essential to establish the governing principles of such networks. Social networks, biological networks, technological networks etc. can be considered as complex networks where information processing and transport play a central role. Complexity in these networks can be due to abstraction, scale, functionality and structure. Depending on the abstraction each of these can be categorized further. Gene regulatory networks are one such category of biological networks. Gene regulatory networks (GRNs) are assumed to be robust under internal and external perturbations. Network motifs such as the feed-forward loop motif and the bifan motif are believed to play a central role functionally in retaining GRN behavior under lossy conditions. While the role of static characteristics like average shortest path, density, degree centrality among other topological features is well documented by the research community, the structural role of motifs and their dynamic characteristics are not well understood. Wireless sensor networks in the last decade were intensively studied using network simulators. Can we use in-silico experiments to understand biological network topologies better? Does the structure of these motifs have any role to play in ensuring robust information transport in such networks? How do their static and dynamic roles differ? To understand these questions, we use in-silico network models to capture the dynamic characteristics of complex network topologies. Developing these models involves network mapping, sink selection strategies and identifying metrics to capture robust system behavior. Further, I studied the dynamic aspect of network characteristics using variation in network information flow under perturbations defined by lossy conditions and channel capacity. We use machine learning techniques to identify significant features that contribute to robust network performance.
Our work demonstrates that although the structural role of feed-forward loop motif in signal transduction within GRNs is minimal, these motifs stand out under heavy perturbations

    A Multiplexed Single-Cell CRISPR Screening Platform Enables Systematic Dissection of the Unfolded Protein Response

    Functional genomics efforts face tradeoffs between number of perturbations examined and complexity of phenotypes measured. We bridge this gap with Perturb-seq, which combines droplet-based single-cell RNA-seq with a strategy for barcoding CRISPR-mediated perturbations, allowing many perturbations to be profiled in pooled format. We applied Perturb-seq to dissect the mammalian unfolded protein response (UPR) using single and combinatorial CRISPR perturbations. Two genome-scale CRISPR interference (CRISPRi) screens identified genes whose repression perturbs ER homeostasis. Subjecting ~100 hits to Perturb-seq enabled high-precision functional clustering of genes. Single-cell analyses decoupled the three UPR branches, revealed bifurcated UPR branch activation among cells subject to the same perturbation, and uncovered differential activation of the branches across hits, including an isolated feedback loop between the translocon and IRE1α. These studies provide insight into how the three sensors of ER homeostasis monitor distinct types of stress and highlight the ability of Perturb-seq to dissect complex cellular responses. National Human Genome Research Institute (U.S.) (Grant P50HG006193

    Decoding Complexity in Metabolic Networks using Integrated Mechanistic and Machine Learning Approaches

    How can we get living cells to do what we want? What do they actually 'want'? What 'rules' do they observe? How can we better understand and manipulate them? Answers to fundamental research questions like these are critical to overcoming bottlenecks in metabolic engineering and optimizing heterologous pathways for synthetic biology applications. Unfortunately, biological systems are too complex to be completely described by physicochemical modeling alone. In this research, I developed and applied integrated mechanistic and data-driven frameworks to help uncover the mysteries of cellular regulation and control. These tools provide a computational framework for seeking answers to pertinent biological questions. Four major tasks were accomplished. First, I developed innovative tools for key areas in the genome-to-phenome mapping pipeline. An efficient gap filling algorithm (called BoostGAPFILL) that integrates mechanistic and machine learning techniques was developed for the refinement of genome-scale metabolic network reconstructions. Genome-scale metabolic network reconstructions are finding ever increasing applications in metabolic engineering for industrial, medical and environmental purposes. Second, I designed a thermodynamics-based framework (called REMEP) for mutant phenotype prediction (integrating metabolomics, fluxomics and thermodynamics data). These tools will go a long way in improving the fidelity of model predictions of microbial cell factories. Third, I designed a data-driven framework for characterizing and predicting the effectiveness of metabolic engineering strategies. This involved building a knowledgebase of historical microbial cell factory performance from published literature. Advanced machine learning concepts, such as ensemble learning and data augmentation, were employed in combination with standard mechanistic models to develop a predictive platform for important industrial biotechnology metrics such as yield, titer, and productivity.
Fourth, my modeling tools and skills have been used for case studies on fungal lipid metabolism analyses, E. coli resource allocation balances, reconstruction of the genome-scale metabolic network for a non-model species, R. opacus, as well as the rapid prediction of bacterial heterotrophic fluxomics. In the long run, this integrated modeling approach will significantly shorten the "design-build-test-learn" cycle of metabolic engineering, as well as provide a platform for biological discovery

    Augmented sparse principal component analysis for high dimensional data

    We study the problem of estimating the leading eigenvectors of a high-dimensional population covariance matrix based on independent Gaussian observations. We establish lower bounds on the rates of convergence of the estimators of the leading eigenvectors under $l^q$-sparsity constraints when an $l^2$ loss function is used. We also propose an estimator of the leading eigenvectors based on a coordinate selection scheme combined with PCA and show that the proposed estimator achieves the optimal rate of convergence under a sparsity regime. Moreover, we establish that under certain scenarios, the usual PCA achieves the minimax convergence rate. Comment: This manuscript was written in 2007, and a version has been available on the first author's website, but it is posted to arXiv now in its 2007 form. Revisions incorporating later work will be posted separatel
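
The idea of combining coordinate selection with PCA can be illustrated with a minimal sketch. It is not the estimator proposed in the paper: it assumes a simplified one-stage rule that keeps the `keep` coordinates with the largest sample variances and then runs ordinary PCA (by power iteration) on the selected sub-block of the sample covariance matrix. All function names, the parameter `keep`, and the synthetic single-spike data are hypothetical choices made for illustration.

```python
import random


def sample_covariance(rows):
    """Sample covariance matrix (list of lists) of the observations in rows."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    centered = [[r[j] - means[j] for j in range(p)] for r in rows]
    return [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(p)]
            for i in range(p)]


def leading_eigenvector(matrix, iterations=500):
    """Power iteration for the leading eigenvector of a symmetric PSD matrix.

    Starts from the uniform unit vector, so it assumes the leading
    eigenvector is not orthogonal to that starting point.
    """
    k = len(matrix)
    v = [1.0 / k ** 0.5] * k
    for _ in range(iterations):
        w = [sum(matrix[i][j] * v[j] for j in range(k)) for i in range(k)]
        norm = sum(x * x for x in w) ** 0.5
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    return v


def coordinate_selected_pca(rows, keep):
    """Assumed simplified scheme: select coordinates by variance, then PCA."""
    cov = sample_covariance(rows)
    p = len(cov)
    # Step 1: keep the coordinates with the largest sample variances.
    by_variance = sorted(range(p), key=lambda j: cov[j][j], reverse=True)
    selected = sorted(by_variance[:keep])
    # Step 2: ordinary PCA restricted to the selected coordinates.
    sub = [[cov[i][j] for j in selected] for i in selected]
    v_sub = leading_eigenvector(sub)
    # Step 3: embed the result in p dimensions, zero elsewhere.
    v = [0.0] * p
    for position, j in enumerate(selected):
        v[j] = v_sub[position]
    return v, selected


# Synthetic single-spike Gaussian data with a sparse leading eigenvector u.
rng = random.Random(0)
p, n, spike = 20, 400, 10.0
u = [3 ** -0.5] * 3 + [0.0] * (p - 3)
rows = []
for _ in range(n):
    z = rng.gauss(0.0, 1.0)
    rows.append([spike ** 0.5 * z * u[j] + rng.gauss(0.0, 1.0)
                 for j in range(p)])

estimate, selected = coordinate_selected_pca(rows, keep=3)
alignment = abs(sum(a * b for a, b in zip(estimate, u)))
print(selected, round(alignment, 3))
```

Here `alignment` is the absolute inner product between the estimate and the true eigenvector, so values near 1 indicate a good estimate up to sign. In practice `keep` is unknown and would have to be chosen by a data-driven threshold.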

    Complexity in Developmental Systems: Toward an Integrated Understanding of Organ Formation

    During animal development, embryonic cells assemble into intricately structured organs by working together in organized groups capable of implementing tightly coordinated collective behaviors, including patterning, morphogenesis and migration. Although many of the molecular components and basic mechanisms underlying such collective phenomena are known, the complexity emerging from their interplay still represents a major challenge for developmental biology. Here, we first clarify the nature of this challenge and outline three key strategies for addressing it: precision perturbation, synthetic developmental biology, and data-driven inference. We then present the results of our effort to develop a set of tools rooted in two of these strategies and to apply them to uncover new mechanisms and principles underlying the coordination of collective cell behaviors during organogenesis, using the zebrafish posterior lateral line primordium as a model system. To enable precision perturbation of migration and morphogenesis, we sought to adapt optogenetic tools to control chemokine and actin signaling. This endeavor proved far from trivial and we were ultimately unable to derive functional optogenetic constructs. However, our work toward this goal led to a useful new way of perturbing cortical contractility, which in turn revealed a potential role for cell surface tension in lateral line organogenesis. Independently, we hypothesized that the lateral line primordium might employ plithotaxis to coordinate organ formation with collective migration. We tested this hypothesis using a novel optical tool that allows targeted arrest of cell migration, finding that contrary to previous assumptions plithotaxis does not substantially contribute to primordium guidance. Finally, we developed a computational framework for automated single-cell segmentation, latent feature extraction and quantitative analysis of cellular architecture. 
We identified the key factors defining shape heterogeneity across primordium cells and went on to use this shape space as a reference for mapping the results of multiple experiments into a quantitative atlas of primordium cell architecture. We also propose a number of data-driven approaches to help bridge the gap from big data to mechanistic models. Overall, this study presents several conceptual and methodological advances toward an integrated understanding of complex multi-cellular systems

    SPLIT DECISIONS: PRACTICAL MACHINE LEARNING FOR EMPIRICAL LEGAL SCHOLARSHIP

    Multivariable regression may be the most prevalent and useful task in social science. Empirical legal studies rely heavily on the ordinary least squares method. Conventional regression methods have attained credibility in court, but by no means do they dictate legal outcomes. Using the iconic Boston housing study as a source of price data, this Article introduces machine-learning regression methods. Although decision trees and forest ensembles lack the overt interpretability of linear regression, these methods reduce the opacity of black-box techniques by scoring the relative importance of dataset features. This Article will also address the theoretical tradeoff between bias and variance, as well as the importance of training, cross-validation, and reserving a holdout dataset for testing
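
The workflow of training, cross-validation, and a reserved holdout set can be sketched as follows. This is a minimal illustration under stated assumptions, not the Article's analysis: it uses synthetic data rather than the Boston housing data, a one-variable ordinary least squares fit rather than decision trees or forests, and hypothetical helper names (`fit_ols`, `mse`, `k_fold_indices`).

```python
import random


def fit_ols(xs, ys):
    """Closed-form simple linear regression y = a + b * x; returns (a, b)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def mse(model, xs, ys):
    """Mean squared prediction error of the fitted line on (xs, ys)."""
    a, b = model
    return sum((a + b * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def k_fold_indices(n, k):
    """Yield (train_idx, valid_idx) pairs for k contiguous folds of n items."""
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        valid = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n))
        yield train, valid
        start += size


# Synthetic data (an assumption for illustration): y = 2 + 3x + unit noise.
rng = random.Random(0)
xs = [rng.uniform(0.0, 10.0) for _ in range(100)]
ys = [2.0 + 3.0 * x + rng.gauss(0.0, 1.0) for x in xs]

# Reserve a holdout set first; it is not touched during cross-validation.
order = list(range(len(xs)))
rng.shuffle(order)
holdout_idx, dev_idx = order[:20], order[20:]
dev_x = [xs[i] for i in dev_idx]
dev_y = [ys[i] for i in dev_idx]

# 5-fold cross-validation on the remaining development data.
cv_scores = []
for train, valid in k_fold_indices(len(dev_x), 5):
    model = fit_ols([dev_x[i] for i in train], [dev_y[i] for i in train])
    cv_scores.append(mse(model, [dev_x[i] for i in valid],
                         [dev_y[i] for i in valid]))
cv_mse = sum(cv_scores) / len(cv_scores)

# Refit on all development data and score once on the holdout set.
final_model = fit_ols(dev_x, dev_y)
holdout_mse = mse(final_model, [xs[i] for i in holdout_idx],
                  [ys[i] for i in holdout_idx])
print(round(cv_mse, 3), round(holdout_mse, 3))
```

The cross-validated error guides model choice, while the holdout error is computed only once at the end, so it remains an honest estimate of performance on unseen data.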