
    Linear algebra and multivariate analysis in statistics: development and interconnections in the twentieth century

    The most obvious points of contact between linear and matrix algebra and statistics are in the area of multivariate analysis. We review the way that, as both developed during the last century, the two influenced each other, by examining a number of key areas. We begin with matrix and linear algebra, its emergence in the nineteenth century, and its eventual penetration into the undergraduate curriculum in the twentieth century. We continue with a similar account for multivariate analysis in statistics. We pick out the year 1936 for three key developments, and the early post-war period for three more. We then turn to some special results in linear algebra that we need. We briefly discuss four of the main contributors, and close with thirteen ‘case studies’, showing in a range of specific cases how these general algebraic methods have been put to good use and changed the face of statistics.

    VPN: Learning Video-Pose Embedding for Activities of Daily Living

    In this paper, we focus on the spatio-temporal aspect of recognizing Activities of Daily Living (ADL). ADL have two specific properties: (i) subtle spatio-temporal patterns and (ii) similar visual patterns varying with time. ADL may therefore look very similar, and distinguishing them often requires attending to their fine-grained details. Because recent spatio-temporal 3D ConvNets are too rigid to capture the subtle visual patterns across an action, we propose a novel Video-Pose Network: VPN. The two key components of VPN are a spatial embedding and an attention network. The spatial embedding projects the 3D poses and RGB cues into a common semantic space. This enables the action recognition framework to learn better spatio-temporal features exploiting both modalities. In order to discriminate similar actions, the attention network provides two functionalities: (i) an end-to-end learnable pose backbone exploiting the topology of the human body, and (ii) a coupler that provides joint spatio-temporal attention weights across a video. Experiments show that VPN outperforms the state-of-the-art results for action classification on a large-scale human activity dataset (NTU-RGB+D 120), its subset NTU-RGB+D 60, a challenging real-world human activity dataset (Toyota Smarthome), and a small-scale human-object interaction dataset (Northwestern-UCLA). Accepted at ECCV 2020.
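    As a rough illustration of the common-semantic-space idea, the sketch below projects pose and RGB features into a shared embedding and pools them with temporal attention weights. All dimensions, layer choices, and names are illustrative assumptions, not the authors' architecture.

        import torch
        import torch.nn as nn

        class JointEmbedding(nn.Module):
            """Toy sketch of embedding pose and RGB cues in one semantic space.

            Dimensions and layers are assumptions for illustration, not VPN itself.
            """
            def __init__(self, rgb_dim=512, pose_dim=75, embed_dim=256):
                super().__init__()
                self.rgb_proj = nn.Linear(rgb_dim, embed_dim)    # project RGB cues
                self.pose_proj = nn.Linear(pose_dim, embed_dim)  # project 3D poses
                self.attn = nn.Linear(embed_dim, 1)              # one attention score per time step

            def forward(self, rgb_feats, pose_feats):
                # rgb_feats: (batch, time, rgb_dim); pose_feats: (batch, time, pose_dim)
                z = torch.tanh(self.rgb_proj(rgb_feats) + self.pose_proj(pose_feats))
                w = torch.softmax(self.attn(z).squeeze(-1), dim=1)  # temporal attention
                return (w.unsqueeze(-1) * z).sum(dim=1)             # attention-pooled clip feature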

    Recipes for sparse LDA of horizontal data

    Many important modern applications require analyzing data with more variables than observations, called horizontal data for short. In such situations the classical Fisher's linear discriminant analysis (LDA) has no solution because the within-group scatter matrix is singular. Moreover, the number of variables is usually huge, and solutions of the classical type (discriminant functions) are difficult to interpret because they involve all available variables. The aim nowadays is to develop fast and reliable algorithms for sparse LDA of horizontal data. The resulting discriminant functions depend on very few of the original variables, which facilitates their interpretation. The main theoretical and numerical challenge is how to cope with the singularity of the within-group scatter matrix. This work classifies the existing approaches according to the way they tackle this singularity issue, and suggests new ones.
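    One common recipe of this kind, sketched below under stated assumptions, ridge-regularizes the singular within-group scatter and then soft-thresholds the resulting discriminant direction for sparsity; the specific recipes surveyed in the paper may differ.

        import numpy as np

        def sparse_lda_direction(X, y, ridge=1e-2, threshold=0.1):
            """Two-class sparse discriminant direction for p >> n (horizontal) data.

            A hedged sketch of one common recipe: regularize, solve, soft-threshold.
            """
            X0, X1 = X[y == 0], X[y == 1]
            # pooled within-group scatter (singular when p exceeds n)
            Sw = (np.cov(X0, rowvar=False) * (len(X0) - 1)
                  + np.cov(X1, rowvar=False) * (len(X1) - 1))
            Sw_reg = Sw + ridge * np.eye(X.shape[1])       # ridge cures the singularity
            w = np.linalg.solve(Sw_reg, X1.mean(axis=0) - X0.mean(axis=0))
            # soft-threshold small coefficients to zero for interpretability
            w = np.sign(w) * np.maximum(np.abs(w) - threshold * np.abs(w).max(), 0.0)
            return w / (np.linalg.norm(w) + 1e-12)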

    Modelling the neonatal system: A joint analysis of length of stay and patient pathways

    © 2019 John Wiley & Sons, Ltd. This is the peer-reviewed version of the article, published in final form on 27/11/2019 at https://doi.org/10.1002/hpm.2928. In the United Kingdom, one in seven babies requires specialist neonatal care after birth, and demand is rising noticeably. Coupled with budget constraints and a lack of investment, this means that neonatal units are struggling, which inevitably has an impact on babies' length of stay (LoS) and on the performance of the service. Models have previously been developed to capture individual babies' pathways and so investigate the longitudinal cycle of care. However, no models have been developed to examine LoS and babies' pathways jointly. LoS at each stage of care is a critical driver of both the clinical outcomes and the economic performance of the neonatal system. Using a generalized linear mixed modelling approach, extended to accommodate multiple outcomes, the association between a neonate's pathway to discharge and LoS is examined. Using data on 1002 neonates, we found a high positive association between a baby's pathway and total LoS, suggesting that discharge policies need to be looked at more carefully. A novel statistical approach is developed that examines the association of key outcomes and how it evolves over time. Its applicability can be extended to other types of long-term care or disease, such as heart failure and stroke.
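    As a starting point for this kind of analysis, the sketch below fits a single-outcome mixed model for length of stay with a random intercept per neonatal unit, using statsmodels; all column names and the data file are hypothetical, and the paper's joint multiple-outcome extension is not reproduced here.

        import pandas as pd
        import statsmodels.formula.api as smf

        # Hypothetical data: one row per neonate, with log-transformed LoS,
        # two illustrative covariates, a pathway label, and the care unit.
        df = pd.read_csv("neonates.csv")
        model = smf.mixedlm(
            "log_los ~ gestational_age + birth_weight + C(pathway)",
            data=df,
            groups=df["unit_id"],   # random intercept for each neonatal unit
        )
        result = model.fit()
        print(result.summary())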

    NMJ-morph reveals principal components of synaptic morphology influencing structure–function relationships at the neuromuscular junction

    The ability to form synapses is one of the fundamental properties required by the mammalian nervous system to generate network connectivity. Structural and functional diversity among synaptic populations is a key hallmark of network diversity, and yet we know comparatively little about the morphological principles that govern variability in the size, shape and strength of synapses. Using the mouse neuromuscular junction (NMJ) as an experimentally accessible model synapse, we report on the development of a robust, standardized methodology to facilitate comparative morphometric analysis of synapses (‘NMJ-morph’). We used NMJ-morph to generate baseline morphological reference data for 21 separate pre- and post-synaptic variables from 2160 individual NMJs belonging to nine anatomically distinct populations of synapses, revealing systematic differences in NMJ morphology between defined synaptic populations. Principal components analysis revealed that overall NMJ size and the degree of synaptic fragmentation, alongside pre-synaptic axon diameter, were the most critical parameters in defining synaptic morphology. ‘Average’ synaptic morphology was remarkably conserved between comparable synapses from the left and right sides of the body. Systematic differences in synaptic morphology predicted corresponding differences in synaptic function that were supported by physiological recordings, confirming the robust relationship between synaptic size and strength.
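    The analysis pattern, PCA over standardized morphometric variables to identify the parameters that dominate overall morphology, can be sketched as follows; the array shape matches the abstract (2160 NMJs by 21 variables), but the data here are random placeholders, not the NMJ-morph measurements.

        import numpy as np
        from sklearn.decomposition import PCA
        from sklearn.preprocessing import StandardScaler

        rng = np.random.default_rng(0)
        X = rng.normal(size=(2160, 21))            # placeholder: 2160 NMJs x 21 variables
        X_std = StandardScaler().fit_transform(X)  # variables on different scales need standardizing
        pca = PCA(n_components=3).fit(X_std)
        print(pca.explained_variance_ratio_)       # variance share per component
        # indices of the three variables loading most heavily on PC1
        print(np.abs(pca.components_[0]).argsort()[::-1][:3])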

    ROC632: An overview

    The present paper aims to analyze and explore the ROC632 package, specifying its main characteristics and functions. More specifically, the goal of this study is to evaluate the effectiveness of the package and its strengths and weaknesses. The package was created to overcome the lack of methods for handling incomplete time-to-event data, adapting the 0.632+ bootstrap estimator to the evaluation of time-dependent ROC curves. By applying the package to a specific dataset (DLBCL patients), it becomes possible to assess tangible data and to determine whether the package can analyze complete and incomplete data efficiently and without bias.
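    For reference, the 0.632+ weighting that the package adapts to censored data follows Efron and Tibshirani (1997); a minimal sketch of the uncensored form is given below, with the censoring-aware adaptation left to the package itself.

        def err_632_plus(err_train, err_boot, gamma):
            """Efron-Tibshirani 0.632+ prediction-error estimate.

            err_train: apparent (resubstitution) error
            err_boot:  leave-one-out bootstrap error
            gamma:     no-information error rate
            """
            err_boot = min(err_boot, gamma)          # cap as in the original paper
            denom = gamma - err_train
            R = (err_boot - err_train) / denom if denom > 0 else 0.0  # relative overfitting
            w = 0.632 / (1.0 - 0.368 * R)            # weight grows with overfitting
            return (1.0 - w) * err_train + w * err_boot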

    Objective surface evaluation of fiber reinforced polymer composites

    The mechanical properties of advanced composites are essential for their structural performance, but the surface finish on exterior composite panels is of critical importance for customer satisfaction. This paper describes the application of wavelet texture analysis (WTA) to the task of automatically classifying the surface finish properties of two fiber reinforced polymer (FRP) composite construction types (clear resin and gel-coat) into three quality grades. Samples were imaged and wavelet multi-scale decomposition was used to create a visual texture representation of the sample, capturing image features at different scales and orientations. Principal components analysis was used to reduce the dimensionality of the texture feature vector, permitting successful classification of the samples using only the first principal component. This work extends and further validates the feasibility of this approach as the basis for automated non-contact classification of composite surface finish using image analysis.
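    A minimal sketch of the WTA pipeline appears below: sub-band energies from a 2-D wavelet decomposition form the texture vector, and PCA keeps only the first component. The wavelet family, depth, and feature definition are assumptions, not the paper's exact settings.

        import numpy as np
        import pywt
        from sklearn.decomposition import PCA

        def texture_features(image, wavelet="db4", levels=3):
            """Texture vector: mean energy of each wavelet detail sub-band."""
            coeffs = pywt.wavedec2(image, wavelet, level=levels)
            feats = []
            for detail in coeffs[1:]:  # (horizontal, vertical, diagonal) per level
                feats.extend(float(np.mean(np.square(d))) for d in detail)
            return np.array(feats)

        # images = [...]  # list of 2-D grayscale arrays of the panel samples
        # F = np.vstack([texture_features(im) for im in images])
        # pc1 = PCA(n_components=1).fit_transform(F)  # classify on PC1 alone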

    Fast, automated measurement of nematode swimming (thrashing) without morphometry

    Background: The "thrashing assay", in which nematodes are placed in liquid and the frequency of lateral swimming ("thrashing") movements estimated, is a well-established method for measuring motility in the genetic model organism Caenorhabditis elegans as well as in parasitic nematodes. It is used as an index of the effects of drugs, chemicals or mutations on motility and has proved useful in identifying mutants affecting behaviour. However, the method is laborious, subject to experimenter error, and therefore does not permit high-throughput applications. Existing automation methods usually involve analysis of worm shape, but this is computationally demanding and error-prone. Here we present a novel, robust and rapid method of automatically counting the thrashing frequency of worms that avoids morphometry but nonetheless gives a direct measure of thrashing frequency. Our method uses principal components analysis to remove the background, followed by computation of a covariance matrix of the remaining image frames from which the interval between statistically-similar frames is estimated. Results: We tested the performance of our covariance method in measuring thrashing rates of worms using mutations that affect motility and found that it accurately substituted for laborious, manual measurements over a wide range of thrashing rates. The algorithm used also enabled us to determine a dose-dependent inhibition of thrashing frequency by the anthelmintic drug, levamisole, illustrating the suitability of the system for assaying the effects of drugs and chemicals on motility. Furthermore, the algorithm successfully measured the actions of levamisole on a parasitic nematode, Haemonchus contortus, which undergoes complex contorted shapes whilst swimming, without alterations in the code or of any parameters, indicating that it is applicable to different nematode species, including parasitic nematodes. Our method is capable of analyzing a 30 s movie in less than 30 s and can therefore be deployed in rapid screens. Conclusion: We demonstrate that a covariance-based method yields a fast, reliable, automated measurement of C. elegans motility which can replace the far more time-consuming, manual method. The absence of a morphometry step means that the method can be applied to any nematode that swims in liquid and, together with its speed, this simplicity lends itself to deployment in large-scale chemical and genetic screens.
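    The core of the covariance method can be sketched in a few lines: flatten the frames, remove the static component, build a frame-by-frame similarity matrix, and read the swimming period off the similarity-versus-lag curve. This simplified version uses mean subtraction in place of the paper's PCA background-removal step, and all parameters are illustrative.

        import numpy as np

        def thrash_frequency(frames, fps, min_lag=3):
            """Estimate thrashing frequency (Hz) from a grayscale video.

            frames: array of shape (T, H, W); fps: frames per second.
            Simplified sketch: mean subtraction stands in for PCA background removal.
            """
            T = frames.shape[0]
            X = frames.reshape(T, -1).astype(float)
            X -= X.mean(axis=0)                # remove the static background
            C = np.corrcoef(X)                 # T x T frame-similarity matrix
            # mean similarity at each temporal lag
            lagsim = np.array([np.diag(C, k).mean() for k in range(1, T // 2)])
            period = int(np.argmax(lagsim[min_lag - 1:]) + min_lag)  # first strong recurrence
            return fps / period                # thrashing cycles per second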

    A comparative analysis of predictive models of morbidity in intensive care unit after cardiac surgery – Part II: an illustrative example

    Background: Popular predictive models for estimating morbidity probability after heart surgery are compared critically in a unitary framework. The study is divided into two parts. In the first part, modelling techniques and the intrinsic strengths and weaknesses of different approaches were discussed from a theoretical point of view. In this second part, the performances of the same models are evaluated in an illustrative example. Methods: Eight models were developed: Bayes linear and quadratic models, k-nearest neighbour model, logistic regression model, Higgins and direct scoring systems, and two feed-forward artificial neural networks with one and two layers. Cardiovascular, respiratory, neurological, renal, infectious and hemorrhagic complications were defined as morbidity. Training and testing sets of 545 cases each were used. The optimal set of predictors was chosen from a collection of 78 preoperative, intraoperative and postoperative variables by a stepwise procedure. Discrimination and calibration were evaluated by the area under the receiver operating characteristic curve and the Hosmer-Lemeshow goodness-of-fit test, respectively. Results: Scoring systems and the logistic regression model required the largest set of predictors, while the Bayesian and k-nearest neighbour models were much more parsimonious. On testing data, all models showed acceptable discrimination capacity; however, the Bayes quadratic model, using only three predictors, provided the best performance. All models showed satisfactory generalization ability: again the Bayes quadratic model exhibited the best generalization, while artificial neural networks and scoring systems gave the worst results. Finally, calibration was poor for the scoring systems, the k-nearest neighbour model and the artificial neural networks, while the Bayes (after recalibration) and logistic regression models gave adequate results. Conclusion: Although all the predictive models showed acceptable discrimination performance in the example considered, the Bayes and logistic regression models seemed better than the others because they also had good generalization and calibration. The Bayes quadratic model seemed a convincing alternative to the much more usual Bayes linear and logistic regression models, showing its capacity to identify a minimum core of predictors generally recognized as essential for pragmatically evaluating the risk of developing morbidity after heart surgery.
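    The two evaluation criteria named above can be sketched directly; the Hosmer-Lemeshow implementation below uses the standard decile-of-risk grouping, and the commented usage assumes hypothetical train/test arrays.

        import numpy as np
        from scipy.stats import chi2
        from sklearn.metrics import roc_auc_score

        def hosmer_lemeshow(y_true, p_hat, bins=10):
            """Hosmer-Lemeshow goodness-of-fit statistic over risk deciles."""
            order = np.argsort(p_hat)
            stat = 0.0
            for g in np.array_split(order, bins):
                n, obs, exp = len(g), y_true[g].sum(), p_hat[g].sum()
                stat += (obs - exp) ** 2 / (exp * (1 - exp / n) + 1e-12)
            return stat, chi2.sf(stat, bins - 2)   # statistic and p-value (bins - 2 df)

        # Hypothetical usage with a fitted probabilistic classifier:
        # p = model.predict_proba(X_test)[:, 1]
        # print("AUC:", roc_auc_score(y_test, p))              # discrimination
        # print("HL p-value:", hosmer_lemeshow(y_test, p)[1])  # calibration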

    The projection score - an evaluation criterion for variable subset selection in PCA visualization

    Background: In many scientific domains, it is becoming increasingly common to collect high-dimensional data sets, often with an exploratory aim, to generate new and relevant hypotheses. The exploratory perspective often makes statistically guided visualization methods, such as Principal Component Analysis (PCA), the methods of choice. However, the clarity of the obtained visualizations, and thereby the potential to use them to formulate relevant hypotheses, may be confounded by the presence of many non-informative variables. For microarray data, more easily interpretable visualizations are often obtained by filtering the variable set, for example by removing the variables with the smallest variances or by including only the variables most highly related to a specific response. The resulting visualization may depend heavily on the inclusion criterion, that is, effectively on the number of retained variables. To our knowledge, there exists no objective method for determining the optimal inclusion criterion in the context of visualization. Results: We present the projection score, a straightforward, intuitively appealing measure of the informativeness of a variable subset with respect to PCA visualization. This measure can be universally applied to find suitable inclusion criteria for any type of variable filtering. We apply the presented measure to find optimal variable subsets for different filtering methods in both microarray data sets and synthetic data sets. We note also that the projection score can be applied in general contexts, to compare the informativeness of any variable subsets with respect to visualization by PCA. Conclusions: We conclude that the projection score provides an easily interpretable and universally applicable measure of the informativeness of a variable subset with respect to visualization by PCA, which can be used to systematically find the most interpretable PCA visualization in practical exploratory analysis.
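    One plausible formalization of such a score, written purely for illustration, compares the variance captured by the top-k principal components of the subset with the same quantity after independently permuting each column; the paper's exact definition should be taken from the original article.

        import numpy as np

        def projection_score(X, k=2, n_perm=20, seed=0):
            """Hedged sketch of a projection-score-style criterion.

            Higher values mean the subset carries more low-dimensional
            structure than expected under columnwise permutation.
            """
            rng = np.random.default_rng(seed)

            def topk_var_share(M):
                M = M - M.mean(axis=0)
                s = np.linalg.svd(M, compute_uv=False)
                return (s[:k] ** 2).sum() / (s ** 2).sum()

            observed = topk_var_share(X)
            null = np.mean([
                topk_var_share(np.column_stack([rng.permutation(c) for c in X.T]))
                for _ in range(n_perm)
            ])
            return observed - null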