112 research outputs found

    More effort — more results: recent advances in integrative ‘omics’ data analysis

    The development of ‘omics’ technologies has progressed to address complex biological questions that underlie various plant functions thereby producing copious amounts of data. The need to assimilate large amounts of data into biologically meaningful interpretations has necessitated the development of statistical methods to integrate multidimensional information. Throughout this review, we provide examples of recent outcomes of ‘omics’ data integration together with an overview of available statistical methods and tools.

    Clusterwise analysis for multiblock component methods

    Multiblock component methods are applied to data sets in which several blocks of variables are measured on the same set of observations, with the goal of analyzing the relationships between these blocks of variables. In this article, we focus on multiblock component methods that integrate the information found in several blocks of explanatory variables in order to describe and explain one set of dependent variables. Multiblock PLS and multiblock redundancy analysis are chosen as particular cases of multiblock component methods in which one set of variables is explained by a set of predictor variables organized into blocks. Because these multiblock techniques assume that the observations come from a homogeneous population, they provide suboptimal results when the observations actually come from different populations. A strategy to mitigate this problem, presented in this article, is to use a technique such as clusterwise regression to identify homogeneous clusters of observations. This approach creates two new methods that provide clusters with their own sets of regression coefficients. This combination of clustering and regression improves the overall quality of the prediction and facilitates interpretation. In addition, the minimization of a well-defined criterion by means of a sequential algorithm ensures that the algorithm converges monotonically. Finally, the proposed method is distribution-free and can be used when the explanatory variables outnumber the observations within clusters. The proposed clusterwise multiblock methods are illustrated with a simulation study and a (simulated) example from marketing.
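The alternating idea behind clusterwise regression can be sketched in a few lines. The sketch below is a simplified illustration, not the multiblock PLS or redundancy-analysis method of the article: it assumes a single predictor, ordinary least squares within each cluster, and a squared-error criterion, and the function names `fit_line` and `clusterwise_regression` are hypothetical. It alternates between refitting one line per cluster and reassigning each observation to the line that fits it best, recording the criterion at every pass.

```python
def fit_line(points):
    """Ordinary least squares for y = a + b*x on a list of (x, y) pairs."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx if sxx > 0 else 0.0
    return mean_y - slope * mean_x, slope


def clusterwise_regression(points, labels, max_iter=50):
    """Alternate refitting and reassignment until the labels stop changing.

    Returns the final labels, the per-cluster (intercept, slope) pairs and
    the total squared error recorded after each reassignment step.
    """
    n_clusters = max(labels) + 1
    models, history = [], []
    for _ in range(max_iter):
        # Step 1: refit one regression line per cluster.
        models = []
        for k in range(n_clusters):
            members = [p for p, lab in zip(points, labels) if lab == k]
            models.append(fit_line(members) if members else (0.0, 0.0))
        # Step 2: reassign each observation to its best-fitting line.
        new_labels, total = [], 0.0
        for x, y in points:
            errors = [(y - (a + b * x)) ** 2 for a, b in models]
            best = errors.index(min(errors))
            new_labels.append(best)
            total += errors[best]
        history.append(total)
        if new_labels == labels:
            break
        labels = new_labels
    return labels, models, history


# Toy data: two groups of observations that follow different lines.
points = [(x, 2 * x + 1) for x in range(5)] + [(x, 10 - x) for x in range(5)]
initial_labels = [0, 0, 0, 1, 1, 1, 1, 1, 0, 0]
labels, models, history = clusterwise_regression(points, initial_labels)
```

Each step can only lower or keep the squared-error criterion, so `history` is non-increasing; this is the kind of monotone behaviour the abstract attributes to its sequential algorithm, here shown only for this simplified setting.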

    The estimation of Human Capital in structural models with flexible specification

    The present paper focuses on statistical models for estimating Human Capital (HC) at a disaggregated level (workers, households, graduates). The more recent literature on HC as a latent variable states that HC can reasonably be considered a broad, multidimensional, non-observable construct that depends on several interrelated causes and is indirectly measured by many observed indicators. From this perspective, latent variable models have assumed a prominent role in the social science literature for the study of the interrelationships among phenomena. However, traditional estimation methods suffer from several limitations: stringent distributional assumptions, improper solutions and factor score indeterminacy for Covariance Structure Analysis, and the lack of a global optimization procedure for the Partial Least Squares approach. To avoid these limitations, new approaches to structural equation modelling based on Component Analysis, which estimates latent variables as exact linear combinations of observed variables by minimizing a single criterion, have been proposed in the literature. However, these methods are limited to modelling particular types of relationships among sets of variables. In this paper, we propose a class of models that makes it possible to specify and fit a variety of relationships among latent variables and endogenous indicators. Specifically, we extend this new class of models to allow for covariate effects on the endogenous indicators. Finally, we illustrate an application that measures, in a realistic structural model, the causal impact of formal Human Capital (HC) accumulated during higher education on the initial earnings of University of Milan (Italy) graduates.

    Partitioning predictors in multivariate regression models

    A Multivariate Regression Model Based on the Optimal Partition of Predictors (MRBOP), useful in applications with strongly correlated predictors, is presented. Such classes of predictors are synthesized by latent factors, which are obtained through an appropriate linear combination of the original variables and are forced to be weakly correlated. Specifically, the proposed model assumes that the latent factors are determined by subsets of predictors, each subset characterizing only one latent factor. MRBOP is formalized in a least squares framework, optimizing a penalized quadratic objective function through an alternating least-squares (ALS) algorithm. The performance of the methodology is evaluated on simulated and real data sets. © 2013 Springer Science+Business Media New York

    Why use component-based methods in sensory science?

    This paper discusses the advantages of using so-called component-based methods in sensory science. For instance, principal component analysis (PCA) and partial least squares (PLS) regression are used widely in the field; we will here discuss these and other methods for handling one block of data, as well as several blocks of data. Component-based methods all share a common feature: they define linear combinations of the variables to achieve data compression, interpretation, and prediction. The common properties of the component-based methods are listed and their advantages illustrated by examples. The paper equips practitioners with a list of solid and concrete arguments for using this methodology.
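The shared feature mentioned in this abstract, a component defined as a linear combination of the variables, can be illustrated with the first principal component. The sketch below is only an illustration under stated assumptions: it uses plain power iteration on the sample covariance matrix (one of several ways to obtain the leading component), the function names `center` and `first_component` are hypothetical, and it assumes the starting vector is not orthogonal to the leading direction.

```python
import math


def center(rows):
    """Subtract the column means from every row."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    return [[r[j] - means[j] for j in range(p)] for r in rows]


def first_component(rows, n_iter=200):
    """Return (weights, scores) of the first principal component.

    weights: unit-length loading vector found by power iteration.
    scores:  the component itself, i.e. the linear combination of the
             centered variables with those weights, one value per row.
    """
    x = center(rows)
    n, p = len(x), len(x[0])
    cov = [[sum(r[i] * r[j] for r in x) / (n - 1) for j in range(p)]
           for i in range(p)]
    w = [1.0 / math.sqrt(p)] * p
    for _ in range(n_iter):
        v = [sum(cov[i][j] * w[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(c * c for c in v))
        if norm == 0.0:  # no variance in the data: nothing to extract
            break
        w = [c / norm for c in v]
    scores = [sum(r[j] * w[j] for j in range(p)) for r in x]
    return w, scores


# Two variables that vary together along the direction (1, 2).
data = [[0.0, 0.0], [1.0, 2.0], [2.0, 4.0], [3.0, 6.0]]
weights, scores = first_component(data)
```

Here two correlated variables are compressed into a single score per observation, which is the data-compression role the abstract describes; PLS components follow the same linear-combination idea but choose the weights with a response in mind.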

    Husbandry factors and health conditions influencing the productivity of French rabbit farms

    In 2009, productivity data from 95 kindling-to-finishing rabbit farms in France were analysed to identify rearing factors and health conditions that influenced their productivity. Farm productivity, expressed on a yearly basis, was described with 4 productivity indices: doe fertility and prolificacy, viability of young rabbits in the nest and mortality during the fattening period. The productivity data were obtained with the technical support of the farm and expressed in a standardised way. The average numerical productivity observed in the sample of farms was 50.9 rabbits produced per doe and per year (CI95% [49.6-52.2]). The husbandry management and health conditions were described based on a questionnaire filled out during an interview with the farmer and a farm visit. Explanatory data were organised into meaningful blocks relative to biosecurity measures, maternity management, the sanitary context and the farm structure. The relationship between the 4 thematic blocks and the productivity indices was studied in a single model using Partial Least Squares (PLS) regression. Fertility (81.0%, CI95% [80.0-82.0]), viability of young at nest (85.1%, CI95% [85.0-85.3]) and mortality rate during fattening (7.2%, CI95% [6.4-7.9]) were significantly associated with common factors relative to maternity management and the health context, whereas prolificacy (9.7 live kits per parturition, CI95% [9.5-9.9]) was mostly influenced by a specific set of variables pertaining to those 2 blocks. Farm structure and biosecurity measures had a limited impact on fertility and on kit viability before weaning. The health conditions of the doe herd and the fattening rabbits were found to be significantly associated with several productivity indexes, but their impacts on productivity were as high as the impact of the other blocks. 
Genetic strain of the females, doe replacement strategy and nursing and weaning practices appeared to significantly influence reproductive performance, viability of kits before weaning and mortality rate during the fattening period. Maternity management therefore seemed to be the key point in rabbit unit management that governed the numerical productivity of the farm.
    The authors wish to acknowledge the CLIPP-Lapin de France, the SNGTV (French Veterinary Society - rabbit branch) and the FFC (French Federation of Cuniculture) for their collaboration. We also wish to thank the farmers and the rabbit production organisations who participated in the study. The authors are grateful to Ms. Anaïs Croisier for her participation as an investigator and to Mr. Guillaume Coutelet from the French Institute for Avian Production for his technical expertise. Funding was provided by the French Agency for Veterinary Medicinal Products (ANSES-ANMV).
    Huneau-Salaün, A.; Bougeard, S.; Balaine, L.; Eono, F.; Le Bouquin, S.; Chauvin, C. (2015). Husbandry factors and health conditions influencing the productivity of French rabbit farms. World Rabbit Science. 23(1):27-37. https://doi.org/10.4995/wrs.2015.3076

    Analyse supervisée multibloc en grande dimension

    The objective of statistical learning is to learn from observed data in order to predict the response for a new sample. In the context of vaccination, the number of features is higher than the number of individuals. This is a degenerate case of statistical analysis that requires specific tools. Regularization algorithms can deal with this difficulty. Different types of regularization methods can be used, depending on the structure of the data set and on the question asked. In this work, the main objective was to use the available information through SVD decompositions of soft-thresholded empirical covariance matrix estimates. This solution is particularly efficient in terms of variable selection and computation time. Heterogeneous data sets (coming from different sources, also called multiblock data) were at the core of our methodology. Since generating some data sets is expensive, it is common to acquire some types of data on a subsample of the population only. This leads to multiblock missing data patterns. The second objective of our methodology is to deal with those missing values using the response values. Because the response values are not present in the test data sets, we designed a methodology that handles missing values in both the training and the test data sets. Thanks to soft-thresholding, our methodology can regularize and select features. The estimator requires only two parameters to be fixed: the number of components and the maximum number of features to be selected. The corresponding tuning is performed by cross-validation. According to simulations, the proposed method shows very good results compared with benchmark methods, especially in terms of prediction and computation time. 
This method has also been applied to several real data sets associated with vaccine, thromboembolic and food research.
    Statistical learning consists of learning from data measured on a sample of individuals in order to predict the quantity of interest for a new individual. In the case of vaccination, and in other cases some of which are presented in this manuscript, the number of measured variables exceeds the number of observed individuals; this is a degenerate case of statistical analysis that requires specific methods. The properties of regularization algorithms make it possible to handle such cases. Several types exist, depending on the structure of the data considered and on the problem under study. In this work, the main objective was to use the information obtained from eigendecompositions of covariance matrices transformed by a soft-thresholding operator. This solution is particularly inexpensive in computation time and allows the variables of interest to be selected. We focused on so-called heterogeneous data, that is, data sets coming from distinct sources or technologies, also called multiblock data. Because the cost of some technologies can be prohibitive, it is often decided to acquire certain data not on the whole sample but only on a study subsample. In that case, the data set loses a non-negligible part of the information. The data structure associated with these acquisition gaps induces a distribution of missing data that is itself multiblock; this is referred to as block-wise missing data. The second objective of our method is to handle these block-wise missing data by relying on the information to be predicted, with the aim of building a predictive model that can handle missing data in the training data as well as in the test data. The method borrows from soft-thresholding to select the variables of interest and requires only two parameters to be tuned: the number of components and the number of variables to select among the covariates. This tuning is classically performed by cross-validation. The developed method was evaluated in simulations comparing it with the main existing methods. It shows excellent results in prediction and in computation time. It has also been applied to several data sets.
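The soft-thresholding step described in this abstract can be illustrated on a deliberately small case. The sketch below is not the thesis's estimator: it assumes a single block of predictors, a single response, and a fixed threshold `lam` (a hypothetical stand-in for the thesis's "maximum number of features" parameter). With one response, the covariance between predictors and response is a single column, so its leading left singular vector is simply that column rescaled to unit length; the function names are hypothetical.

```python
import math


def soft_threshold(value, lam):
    """Shrink value toward zero by lam; values within [-lam, lam] become 0."""
    if value > lam:
        return value - lam
    if value < -lam:
        return value + lam
    return 0.0


def sparse_weights(x_rows, y, lam):
    """Unit-length weights from the soft-thresholded covariance of X with y.

    Predictors whose empirical covariance with y is at most lam in absolute
    value receive a weight of exactly zero, i.e. they are not selected.
    """
    n, p = len(x_rows), len(x_rows[0])
    x_means = [sum(row[j] for row in x_rows) / n for j in range(p)]
    y_mean = sum(y) / n
    cov = [
        sum((row[j] - x_means[j]) * (yi - y_mean) for row, yi in zip(x_rows, y))
        / (n - 1)
        for j in range(p)
    ]
    shrunk = [soft_threshold(c, lam) for c in cov]
    norm = math.sqrt(sum(s * s for s in shrunk))
    if norm == 0.0:  # threshold removed every predictor
        return shrunk
    return [s / norm for s in shrunk]


# Three predictors: one strongly related to y, one weakly, one constant.
x_rows = [[1.0, 0.0, 5.0], [2.0, 1.0, 5.0], [3.0, 0.0, 5.0], [4.0, 1.0, 5.0]]
y = [1.0, 2.0, 3.0, 4.0]
weights = sparse_weights(x_rows, y, lam=0.5)
```

In this toy example only the first predictor keeps a non-zero weight, which shows how thresholding performs regularization and variable selection at the same time. A multi-response or multiblock version would need an actual SVD of the thresholded covariance matrix rather than a simple rescaling.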

    Trends in the application of chemometrics to foodomics studies


    Temporal variation in benthic macroinvertebrate community from impaired streams

    Unpublished doctoral thesis defended at the Universidad Autónoma de Madrid, Facultad de Ciencias, Departamento de Ecología. Date of defense: 25-03-2019.
    During the last century the global population has grown immensely, leading to major changes in land use planning to sustain itself. More specifically, the world population has shifted from an agriculture-based economy to an industrial society, which has pushed people to move from rural to urban areas. The development of urban areas has changed the physical structure of the environment (i.e. water bodies and the surrounding area), causing water quality changes through diffuse and point pollution and alterations in hydrological features such as flow magnitude and frequency. As a consequence of these physical and chemical alterations, instream community structure and composition have been altered and, hence, the ecological integrity of rivers has been jeopardized. Despite efforts to restore the natural state and functioning of river systems, there is still a lack of knowledge on three questions that I sought to answer in this dissertation: (i) is the variation of the macroinvertebrate community inherent to the impairment of the river, or are there natural fluctuations that guide long-term variation?; (ii) how do rivers respond to restoration activities when biological communities may already be adapted to such impaired conditions?; and (iii) which restoration measures are the most successful at improving the biological condition of the river? To answer these questions I studied impaired river systems in Canada and Italy. The interannual variability of the macroinvertebrate community in eight Canadian rivers, representing a gradient of anthropogenic water quality pressures and variable hydrological regimes, was studied over a period of 20 years, focusing on the relationship between water quality, hydrologic variables and sampling features. 
In Italy the restoration of an urban river was followed over a period of 3 years, studying the relationship between environmental variables and the macroinvertebrate community and focusing on the hydromorphological improvements. Results of the Partial Least Squares (PLS) regressions on data from the long-term study demonstrated that the benthic community assemblage was not driven by any of the measured environmental variables (i.e. water quality, hydrologic variables, sampling features), whereas in the short term the benthic community responded to water quality and hydrometric features but did not show significant responses to restoration measures. The temporal stability of the studied benthic communities under variations in environmental and anthropogenic conditions may reflect the limited pool of tolerant taxa within these systems.
    Part of the project was co-financed by Fondazione CARIPLO and Parco Nord Milano and supported by other regional institutions. Funding for this research was provided by the “Contratto di fiume Seveso” and “Progetto di riqualificazione del Seveso - sicurezza idraulica, qualità acque e naturalità sponde, valorizzazione paesaggistica, culturale e fruitiva” - Fondi FSC: d.g.r. n. X/1727/2014

    Research and Technology Highlights 1995

    The mission of the NASA Langley Research Center is to increase the knowledge and capability of the United States in a full range of aeronautics disciplines and in selected space disciplines. This mission is accomplished by performing innovative research relevant to national needs and Agency goals, transferring technology to users in a timely manner, and providing development support to other United States Government agencies, industry, other NASA Centers, the educational community, and the local community. This report contains highlights of the major accomplishments and applications that have been made by Langley researchers and by our university and industry colleagues during the past year. The highlights illustrate both the broad range of research and technology (R&T) activities carried out by NASA Langley Research Center and the contributions of this work toward maintaining United States leadership in aeronautics and space research. An electronic version of the report is available at URL http://techreports.larc.nasa.gov/RandT95. This color version allows viewing, retrieving, and printing of the highlights, searching and browsing through the sections, and access to an on-line directory of Langley researchers