18 research outputs found

ALGORITHMS FOR CORRECTING NEXT GENERATION SEQUENCING ERRORS

Author: Fazayeli Farideh
Publication venue: Scholarship@Western
Publication date: 01/01/2011

The advent of next generation sequencing (NGS) technologies generated a revolution in biological research. However, in order to use the data they produce, new computational tools are needed. Due to the significantly shorter length of the reads and the higher per-base error rate, more complicated approaches are employed, and critical problems such as genome assembly are still not satisfactorily solved. We therefore focus our attention on improving the quality of the NGS data. More precisely, we address the error correction issue. The current methods for correcting errors are not very accurate. In addition, they do not adapt to the data. We propose a novel tool, HiTEC, to correct errors in NGS data. HiTEC is based on the suffix array data structure accompanied by a statistical analysis. HiTEC’s accuracy is significantly higher than that of all previous methods. It is also the only tool able to adjust to the given data set. Finally, HiTEC is time and space efficient.

Scholarship@Western

Estimation with Norm Regularization

Author: Banerjee Arindam
Chen Sheng
Fazayeli Farideh
Sivakumar Vidyashankar
Publication date: 30/11/2015

Analysis of non-asymptotic estimation error and structured statistical recovery based on norm regularized regression, such as Lasso, needs to consider four aspects: the norm, the loss function, the design matrix, and the noise model. This paper presents generalizations of such estimation error analysis on all four aspects compared to the existing literature. We characterize the restricted error set where the estimation error vector lies, establish relations between error sets for the constrained and regularized problems, and present an estimation error bound applicable to any norm. Precise characterizations of the bound are presented for isotropic as well as anisotropic subGaussian design matrices, subGaussian noise models, and convex loss functions, including least squares and generalized linear models. Generic chaining and associated results play an important role in the analysis. A key result from the analysis is that the sample complexity of all such estimators depends on the Gaussian width of a spherical cap corresponding to the restricted error set. Further, once the number of samples $n$ crosses the required sample complexity, the estimation error decreases as $\frac{c}{\sqrt{n}}$, where $c$ depends on the Gaussian width of the unit norm ball.
Comment: Fixed technical issues. Generalized some result
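
As an illustration of the rate quoted in this abstract, the following is a minimal sketch for one special case only: Lasso with squared loss, an isotropic Gaussian design, and Gaussian noise. The solver (ISTA), the choice of regularization strength `lam`, and all problem sizes are assumptions made for this example and are not taken from the paper.

```python
import numpy as np

def soft_threshold(v, t):
    """Proximal operator of t * ||.||_1 (elementwise shrinkage toward zero)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def lasso_ista(X, y, lam, n_iters=500):
    """Minimize (1/2n) * ||y - X theta||^2 + lam * ||theta||_1 with ISTA."""
    n, p = X.shape
    # Step size 1/L, where L = sigma_max(X)^2 / n bounds the gradient's Lipschitz constant.
    step = n / np.linalg.norm(X, 2) ** 2
    theta = np.zeros(p)
    for _ in range(n_iters):
        grad = X.T @ (X @ theta - y) / n
        theta = soft_threshold(theta - step * grad, step * lam)
    return theta

def estimation_error(n, p=50, s=5, sigma=0.1, seed=0):
    """L2 error of the Lasso estimate on a synthetic s-sparse problem (hypothetical setup)."""
    rng = np.random.default_rng(seed)
    theta_star = np.zeros(p)
    theta_star[:s] = 1.0
    X = rng.normal(size=(n, p))
    y = X @ theta_star + sigma * rng.normal(size=n)
    # Textbook-style scaling of lam with sqrt(log p / n); an assumption, not the paper's choice.
    lam = 2.0 * sigma * np.sqrt(2.0 * np.log(p) / n)
    return np.linalg.norm(lasso_ista(X, y, lam) - theta_star)

# Estimation error for increasing sample sizes.
errors = {n: estimation_error(n) for n in (50, 200, 800)}
for n, err in errors.items():
    print(f"n={n:4d}  error={err:.4f}  error*sqrt(n)={err * np.sqrt(n):.3f}")
```

If the $\frac{c}{\sqrt{n}}$ behaviour holds in this setting, the last printed column should stay roughly constant once $n$ is large enough.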

arXiv.org e-Print Archive

CiteSeerX

Probabilistic Structured Models for Plant Trait Analysis

Author: Fazayeli Farideh
Publication date: 01/03/2017

University of Minnesota Ph.D. dissertation. March 2017. Major: Communication Sciences and Disorders. Advisor: Arindam Banerjee. 1 computer file (PDF); xii, 171 pages. Many fields in modern science and engineering, such as ecology, computational biology, astronomy, signal processing, climate science, brain imaging, natural language processing, and many more, involve collecting data sets in which the dimensionality of the data p exceeds the sample size n. Since it is usually impossible to obtain consistent procedures unless p < n, a line of recent work has studied models with various types of low-dimensional structure, including sparse vectors, sparse structured graphical models, low-rank matrices, and combinations thereof. In such settings, a general approach to estimation is to solve a regularized optimization problem, which combines a loss function measuring how well the model fits the data with some regularization function that encourages the assumed structure. Of particular interest is structure learning of graphical models in the high-dimensional setting. The majority of statistical analyses of graphical model estimation assume that all the data are fully observed and that the data points are sampled from the same distribution, and they provide the sample complexity and convergence rate by considering only one graphical structure for all the observations. In this thesis, we extend the above results to estimate the structure of graphical models where the data are partially observed or sampled from multiple distributions. First, we consider the problem of estimating the change in the dependency structure of two p-dimensional models, based on samples drawn from two graphical models. The change is assumed to be structured, e.g., sparse, block sparse, node-perturbed sparse, etc., such that it can be characterized by a suitable (atomic) norm.
We present and analyze a norm-regularized estimator for directly estimating the change in structure, without having to estimate the structures of the individual graphical models. Next, we consider the problem of estimating the sparse structure of Gaussian copula distributions (corresponding to non-paranormal distributions) using samples with missing values. We prove that our proposed estimators consistently estimate the non-paranormal correlation matrix, where the convergence rate depends on the probability of missing values. In the second part of the thesis, we consider the matrix completion problem. Low-rank matrix completion methods have been successful in a variety of settings such as recommendation systems. However, most of the existing matrix completion methods only provide a point estimate of missing entries, and do not characterize uncertainties of the predictions. First, we illustrate that the posterior distribution in latent factor models, such as probabilistic matrix factorization, when marginalized over one latent factor, has the Matrix Generalized Inverse Gaussian (MGIG) distribution. We show that the MGIG is unimodal, and the mode can be obtained by solving an Algebraic Riccati Equation. The characterization leads to a novel Collapsed Monte Carlo inference algorithm for such latent factor models. Next, we propose a Bayesian hierarchical probabilistic matrix factorization (BHPMF) model to 1) incorporate hierarchical side information, and 2) provide uncertainty quantified predictions. The former yields significant performance improvements in the problem of plant trait prediction, a key problem in ecology, by leveraging the taxonomic hierarchy in the plant kingdom. The latter is helpful in identifying predictions of low confidence, which can in turn be used to guide field work for data collection efforts. Finally, we consider applications of probabilistic structured models to plant trait analysis. We apply the BHPMF model to fill the gaps in the TRY database.
The BHPMF model is the state-of-the-art model for plant trait prediction and is gaining visibility and usage in plant trait analysis. We have submitted an R package for BHPMF to CRAN. Next, we apply the Gaussian graphical model structure estimators to obtain the trait-trait interactions. We study the structure of trait-trait interactions in different climate zones and among different plant growth forms, and uncover the dependence of traits on climate and on vegetation.

University of Minnesota Digital Conservancy

Mapping local and global variability in plant trait distributions

Author: Abhirup Datta
Andrés González-Melo
Arindam Banerjee
Ben Bond-Lamberty
Benjamin Blonder
Bernard Amiaud
Brandon Schamp
Bruno E. L. Cerabolini
Chaeho Byun
Christian Wirth
Daniel C. Laughlin
Dylan Craven
Enio Sosinski
Estelle Forey
Ethan E. Butler
Farideh Fazayeli
Fernando Valladares
Franciska T. de Vries
Gerhard Boenisch
Giandiego Campetella
Habacuc Flores-Moreno
Hiroko Kurokawa
Jens Kattge
Johannes H. C. Cornelissen
Josep Peñuelas
Joseph M. Craine
Kerry A. Brown
Kirk R. Wythers
Koen Kramer
Lawren Sack
Marko J. Spasojevic
Mathew Williams
Ming Chen
Nadejda A. Soudzilovskaia
Nathan J. B. Kraft
Nicolas Gross
Owen K. Atkin
Patrick Meir
Peter B. Reich
Peter E. Thornton
Peter M. van Bodegom
Quentin Read
Sandra Díaz
Steven Jansen
Thomas Hickler
Tomas F. Domingues
Vanessa Minden
Wenxuan Han
Wesley N. Hattingh
Yusuke Onoda
Ülo Niinemets
Publication venue: 'Proceedings of the National Academy of Sciences'
Publication date: 01/01/2017

VU Research Portal

Archivio istituzionale della ricerca - Università dell'Insubria

HAL Descartes

Wageningen University & Research Publications

Edinburgh Research Explorer

Archivio istituzionale della ricerca - Università di Camerino

Leiden University Scholary Publications

The University of Manchester - Institutional Repository

HAL - Normandie Université

Crossref

CONICET Digital

eScholarship - University of California

edocUR

Kingston University Research Repository

Western Sydney ResearchDirect

Robustness of trait connections across environmental gradients and growth forms

Author: Anand Madhur
Atkin Owen
Bahn Michael
Banerjee Arindam
Butler Ethan E
Chen Ming
Datta Abhirup
Fazayeli Farideh
Flores-Moreno Habacuc
Kattge J
Wythers Kirk R
Publication venue: 'Wiley'
Publication date: 28/11/2021

Aim: Plant trait databases often contain traits that are correlated, but for which direct (undirected statistical dependency) and indirect (mediated by other traits) connections may be confounded. The confounding of correlation and connection hinders our understanding of plant strategies, and how these vary among growth forms and climate zones. We identified the direct and indirect connections across plant traits relevant to competition, resource acquisition and reproductive strategies using a global database and explored whether connections within and between traits from different tissue types vary across climates and growth forms. Location: Global. Major taxa studied: Plants. Time period: Present. Methods: We used probabilistic graphical models and a database of 10 plant traits (leaf area, specific leaf area, mass‐ and area‐based leaf nitrogen and phosphorus content, leaf life span, plant height, stem specific density and seed mass) with 16,281 records to describe direct and indirect connections across woody and non‐woody plants across tropical, temperate, arid, cold and polar regions. Results: Trait networks based on direct connections are sparser than those based on correlations. Land plants had high connectivity across traits within and between tissue types; leaf life span and stem specific density shared direct connections with all other traits. For both growth forms, two groups of traits form modules of more highly connected traits; one related to resource acquisition, the other to plant architecture and reproduction. Woody species had higher trait network modularity in polar compared to temperate and tropical climates, while non‐woody species did not show significant differences in modularity across climate regions. Main conclusions: Plant traits are highly connected both within and across tissue types, yet traits segregate into persistent modules of traits.
Variation in the modularity of trait networks suggests that trait connectivity is shaped by prevailing environmental conditions and demonstrates that plants of different growth forms use alternative strategies to cope with local conditions.
National Science Foundation, Grant/Award Number: IIS‐1563950; Advanced Research Projects Agency ‐ Energy, Grant/Award Number: DE‐SL0012677; H2020 European Research Council, Grant/Award Number: ERC‐SyG‐2013‐610028 IMBALANCE‐P; University of Minnesota, Grant/Award Number: CE140100008, 226299, 19‐14‐00038 and 22; Australian Research Council, Grant/Award Number: CE140100008; FP7; European Research Council; Russian Science Foundation, Grant/Award Number: # 19‐14‐0003
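
The distinction this abstract draws between correlation and direct connection can be illustrated with a minimal sketch. It assumes jointly Gaussian data and uses plain (unregularized) partial correlations computed from the inverse covariance matrix; the three variables are synthetic and hypothetical, and this is not the estimator used in the study.

```python
import numpy as np

def partial_correlations(samples):
    """Partial correlation of each pair of columns given all other columns."""
    precision = np.linalg.inv(np.cov(samples, rowvar=False))
    scale = np.sqrt(np.diag(precision))
    partial = -precision / np.outer(scale, scale)
    np.fill_diagonal(partial, 1.0)
    return partial

# Synthetic chain a -> b -> c: a and c are linked only through b.
rng = np.random.default_rng(0)
n_samples = 5000
a = rng.normal(size=n_samples)
b = a + rng.normal(size=n_samples)
c = b + rng.normal(size=n_samples)
samples = np.column_stack([a, b, c])

corr = np.corrcoef(samples, rowvar=False)
partial = partial_correlations(samples)

print("correlation(a, c):        ", round(corr[0, 2], 3))
print("partial correlation(a, c):", round(partial[0, 2], 3))
```

In this chain, a and c are clearly correlated, but their partial correlation given b is close to zero, which is the sense in which a network of direct connections can be sparser than a correlation network.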

The Australian National University

Mapping local and global variability in plant trait distributions

Author: Amiaud Bernard
Atkin Owen
Banerjee Arindam
Blonder Benjamin
Butler Ethan E
Chen Ming
Datta Abhirup
Fazayeli Farideh
Flores-Moreno Habacuc
Kattge J
Meir Patrick
Wythers Kirk R
Publication venue: 'Proceedings of the National Academy of Sciences'
Publication date: 23/11/2020

Our ability to understand and predict the response of ecosystems to a changing environment depends on quantifying vegetation functional diversity. However, representing this diversity at the global scale is challenging. Typically, in Earth system models, characterization of plant diversity has been limited to grouping related species into plant functional types (PFTs), with all trait variation in a PFT collapsed into a single mean value that is applied globally. Using the largest global plant trait database and state-of-the-art Bayesian modeling, we created fine-grained global maps of plant trait distributions that can be applied to Earth system models. Focusing on a set of plant traits closely coupled to photosynthesis and foliar respiration - specific leaf area (SLA) and dry mass-based concentrations of leaf nitrogen (Nm) and phosphorus (Pm) - we characterize how traits vary within and among over 50,000 ∼50×50-km cells across the entire vegetated land surface. We do this in several ways - without defining the PFT of each grid cell and using 4 or 14 PFTs; each model's predictions are evaluated against out-of-sample data. This endeavor advances prior trait mapping by generating global maps that preserve variability across scales by using modern Bayesian spatial statistical modeling in combination with a database over three times larger than that in previous analyses. Our maps reveal that the most diverse grid cells possess trait variability close to the range of global PFT means.
This research was supported as part of the Energy Exascale Earth System Model (E3SM) project, funded by the US Department of Energy, Office of Science, Office of Biological and Environmental Research (Grant DE-SC0012677 to P.B.R. and A.B.). O.K.A. acknowledges the support of the Australian Research Council (CE140100008). This research was also funded by programs from the NSF Long-Term Ecological Research (Grant DEB-1234162) and Long-Term Research in Environmental Biology (Grant DEB-1242531).
A.B., F.F., and P.B.R. acknowledge funding from NSF Grant IIS-1563950. P.B.R. also acknowledges support from two University of Minnesota Institute on the Environment discovery grants. This study has been supported by the TRY initiative on plant traits (www.try-db.org). The TRY database is hosted at the Max Planck Institute for Biogeochemistry (Jena, Germany) and supported by DIVERSITAS/Future Earth, the German Centre for Integrative Biodiversity Research (iDiv) Halle-Jena-Leipzig, and the EU H2020 project BACI (Grant 640176). B.B. acknowledges a Natural Environment Research Council (NERC) independent research fellowship NE/M019160/1. J.P. acknowledges the financial support from the European Research Council Synergy Grant ERC-SyG-2013-610028 IMBALANCE-P, the Spanish Government Grant CGL2013-48074-P, and the Catalan Government Grant SGR 2014-274. B.B.-L. was supported by the Earth System Modeling program of the US Department of Energy, Office of Science, Office of Biological and Environmental Research. K.K. acknowledges the contribution of the Wageningen University and Research Investment theme Resilience for the project Resilient Forest (KB-29-009-003). P.M. acknowledges support from ARC Grant FT110100457 and NERC Grant NE/F002149/1. W.H. acknowledges support from the National Natural Science Foundation of China (Grant 41473068) and the “Light of West China” Program of the Chinese Academy of Sciences.

The Australian National University

BHPMF – a hierarchical Bayesian approach to gap-filling and trait prediction for macroecology and functional biogeography

Author: Banerjee Arindam
Bönisch Gerhard
Diaz Sandra
Dickie John
Fazayeli Farideh
Gillison Andy
Joswig Julia
Karpatne Anuj
Kattge Jens
Lavorel Sandra
Leadley Paul
Reich Peter B.
Reichstein Markus
Schrodt Franziska
Shan Hanhuai
Wirth Christian B.
Wright Ian J.
Wright S. Joseph
Publication venue: 'Wiley'
Publication date: 03/11/2015

Aim: Functional traits of organisms are key to understanding and predicting biodiversity and ecological change, which motivates continuous collection of traits and their integration into global databases. Such trait matrices are inherently sparse, severely limiting their usefulness for further analyses. On the other hand, traits are characterized by the phylogenetic trait signal, trait–trait correlations and environmental constraints, all of which provide information that could be used to statistically fill gaps. We propose the application of probabilistic models which, for the first time, utilize all three characteristics to fill gaps in trait databases and predict trait values at larger spatial scales. Innovation: For this purpose we introduce BHPMF, a hierarchical Bayesian extension of probabilistic matrix factorization (PMF). PMF is a machine learning technique which exploits the correlation structure of sparse matrices to impute missing entries. BHPMF additionally utilizes the taxonomic hierarchy for trait prediction and provides uncertainty estimates for each imputation. In combination with multiple regression against environmental information, BHPMF allows for extrapolation from point measurements to larger spatial scales. We demonstrate the applicability of BHPMF in ecological contexts, using different plant functional trait datasets, also comparing results to taking the species mean and PMF. Main conclusions: Sensitivity analyses validate the robustness and accuracy of BHPMF: our method captures the correlation structure of the trait matrix as well as the phylogenetic trait signal – also for extremely sparse trait matrices – and provides a robust measure of confidence in prediction accuracy for each missing entry. The combination of BHPMF with environmental constraints provides a promising concept to extrapolate traits beyond sampled regions, accounting for intraspecific trait variability.
We conclude that BHPMF and its derivatives have a high potential to support future trait-based research in macroecology and functional biogeography.
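
To make the gap-filling idea concrete, here is a minimal sketch of plain matrix factorization for imputation, not of BHPMF itself: it has no taxonomic hierarchy and produces no uncertainty estimates. It fits a MAP-style point estimate with ridge-regularized alternating least squares, which is an assumption of this example; the function name `pmf_impute`, the rank, the penalty `lam`, and the synthetic matrix are all hypothetical.

```python
import numpy as np

def pmf_impute(X, rank=2, lam=0.1, n_iters=50, seed=0):
    """Fill NaN entries of X with a low-rank estimate U @ V.T.

    Each row of U and V is updated by a ridge regression on the observed
    entries only (alternating least squares).
    """
    rng = np.random.default_rng(seed)
    observed = ~np.isnan(X)
    n_rows, n_cols = X.shape
    U = rng.normal(scale=0.1, size=(n_rows, rank))
    V = rng.normal(scale=0.1, size=(n_cols, rank))
    ridge = lam * np.eye(rank)
    for _ in range(n_iters):
        for i in range(n_rows):          # update row factors
            cols = observed[i]
            if cols.any():
                Vi = V[cols]
                U[i] = np.linalg.solve(Vi.T @ Vi + ridge, Vi.T @ X[i, cols])
        for j in range(n_cols):          # update column factors
            rows = observed[:, j]
            if rows.any():
                Uj = U[rows]
                V[j] = np.linalg.solve(Uj.T @ Uj + ridge, Uj.T @ X[rows, j])
    estimate = U @ V.T
    # Keep observed values, use the low-rank estimate only where data are missing.
    return np.where(observed, X, estimate), estimate

# Synthetic "species x trait" matrix of rank 2 with about 30% of entries removed.
rng = np.random.default_rng(1)
truth = rng.normal(size=(40, 2)) @ rng.normal(size=(2, 6))
X = truth.copy()
X[rng.random(truth.shape) < 0.3] = np.nan

filled, estimate = pmf_impute(X)
obs = ~np.isnan(X)
train_rmse = np.sqrt(np.mean((estimate[obs] - X[obs]) ** 2))
print("RMSE on observed entries:", round(train_rmse, 4))
```

The sketch shows only how correlation structure across columns lets a low-rank model propose values for missing cells; the hierarchical priors and per-entry confidence measures described in the abstract would need a full Bayesian treatment.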

Nottingham eTheses