1,239 research outputs found

    A Tree-based Federated Learning Approach for Personalized Treatment Effect Estimation from Heterogeneous Data Sources

    Federated learning is an appealing framework for analyzing sensitive data from distributed health data networks because it protects data privacy. Under this framework, data partners at local sites collaboratively build an analytical model under the orchestration of a coordinating site, while keeping the data decentralized. However, existing federated learning methods mainly assume that data across sites are homogeneous samples of the global population, and hence fail to properly account for the extra variability across sites in estimation and inference. Drawing on a multi-hospital electronic health records network, we develop an efficient and interpretable tree-based ensemble of personalized treatment effect estimators to combine results across hospital sites, while actively modeling the heterogeneity in data sources through site partitioning. The efficiency of our method is demonstrated by a study of the causal effects of oxygen saturation on hospital mortality and backed up by comprehensive numerical results.

    Comparative Biology of Three Species of Costa Rican Haeterini

    Documenting life history characteristics of populations, especially of herbivorous insects such as butterflies, is fundamental to the ecological study of tropical rainforests. However, we know relatively little about tropical forest butterflies. Here, I combine information gathered using the mark-release-recapture (MRR) approach with manipulative and observational experiments in a natural environment to explore aspects of the population biology of three closely related species of Costa Rican fruit-feeding understory butterflies (Cithaerias pireta, Dulcedo polita, and Pierella helvina), specifically: vertical stratification, attraction to and persistence in fruit-baited traps, relative abundance and distribution, movement patterns, probabilities of recapture and daily survival, and factors that affect those probabilities. Among the three focal species there were differences in capturability, recapturability, spatial distribution, and degree of vertical stratification. Males appear to fly within smaller home ranges than females, and P. helvina can traverse the entire forest reserve in a single day. These findings have implications for the genetic diversity of these populations and for the risk of local extinction in the face of changing ecological conditions.

    3D Remote Sensing Applications in Forest Ecology: Composition, Structure and Function

    Dear Colleagues, The composition, structure and function of forest ecosystems are the key features characterizing their ecological properties, and can thus be crucially shaped and changed by various biotic and abiotic factors on multiple spatial scales. The magnitude and extent of these changes in recent decades call for enhanced mitigation and adaptation measures. Remote sensing data and methods are the main complementary sources of up-to-date, synoptic and objective information on forest ecology. Due to the inherent 3D nature of forest ecosystems, the analysis of 3D sources of remote sensing data is considered to be most appropriate for recreating the forest's compositional, structural and functional dynamics. In this Special Issue of Forests, we published a set of state-of-the-art scientific works including experimental studies, methodological developments and model validations, all dealing with the general topic of 3D remote sensing-assisted applications in forest ecology. We showed applications in forest ecology from a broad collection of method and sensor combinations, including fusion schemes. All in all, the studies and their focuses are as broad as a forest's ecology or the field of remote sensing and, thus, reflect the very diverse usages and directions toward which future research and practice will be directed.

    Control and surveillance of partially observed stochastic epidemics in a Bayesian framework

    This thesis comprises a number of inter-related parts. For most of the thesis we are concerned with developing a new statistical technique that enables identification of the optimal control by comparing competing control strategies for stochastic epidemic models in real time. In the second part, we develop a novel approach for modelling the spread of Peste des Petits Ruminants (PPR) virus within a given country and the risk of introduction to other countries. The control of highly infectious diseases of agricultural crops, animals and humans is considered one of the key challenges in epidemiological and ecological modelling. Previous methods for the analysis of epidemics, in which different controls are compared, do not make full use of the trajectory of the epidemic. Most methods use the information provided by the model parameters, which may capture only partial information on the epidemic trajectory; for example, the same control strategy may lead to different outcomes when the experiment is repeated. Moreover, using only partial information may require more simulated realisations when comparing two different controls. We introduce a statistical technique that makes full use of the available information in estimating the effect of competing control strategies on real-time epidemic outbreaks. The key to this approach lies in identifying a suitable mechanism to couple epidemics, which could be unaffected by controls. To that end, we use the Sellke construction as a latent process to link epidemics with different control strategies. The method is first applied to non-spatial processes, including SIR and SIS models, assuming that no observation data are available, before moving on to more complex models that explicitly represent the spatial nature of the epidemic spread.
In the latter case, the analysis is conditioned on some observed data, and inference on the model parameters is performed in a Bayesian framework using Markov chain Monte Carlo (MCMC) techniques coupled with data augmentation methods. The methodology is applied to various simulated data sets and to citrus canker data from Florida. Results suggest that the approach leads to highly positively correlated outcomes of different controls, thus reducing the variability between the effects of different control strategies and providing a more efficient estimator of their expected differences. As a result, fewer realisations are required to compare competing strategies in terms of their expected outcomes. The main purpose of the final part of this thesis is to develop a novel approach to modelling the speed of Peste des Petits Ruminants (PPR) within a given country and to understand the risk of subsequent spread to other countries. We are interested in constructing models that can be fitted using information on the occurrence of outbreaks, as information on the susceptible population is not available, and in using these models to estimate the speed of spatial spread of the virus. However, there was little prior modelling on which the models developed here could be built. We start by establishing a spatio-temporal stochastic formulation for the spread of PPR. This model is then used to estimate spatial transmission and speed of spread. To account for the uncertainty arising from the lack of information on the susceptible population, we apply ideas from Bayesian modelling and data augmentation by treating the transmission network as a missing quantity. Lastly, we establish a network model to address questions regarding the risk of spread in the large-scale network of countries and introduce the notion of 'first-passage time' using techniques from graph theory and operational research such as the Bellman-Ford algorithm.
The methodology is first applied to PPR data from Tunisia and to simulated data. We also use simulated models to investigate the dynamics of spread through a network of countries.
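The abstract names the Bellman-Ford algorithm but does not say how the thesis defines first-passage time on the country network. As a rough illustration only, the sketch below assumes that each directed edge carries a hypothetical expected transit time and that the first-passage time to a country is the length of the shortest path from the source. The function name, the edge weights and the four-node network are all invented for the example and are not taken from the thesis.

```python
import math


def first_passage_times(n_nodes, edges, source):
    """Bellman-Ford shortest 'time' from source to every node.

    edges is a list of (u, v, w) tuples: a directed link from node u to
    node v with hypothetical transit time w (an assumption of this sketch).
    """
    dist = [math.inf] * n_nodes
    dist[source] = 0.0

    # Relax every edge up to n_nodes - 1 times; stop early once stable.
    for _ in range(n_nodes - 1):
        updated = False
        for u, v, w in edges:
            if dist[u] + w < dist[v]:
                dist[v] = dist[u] + w
                updated = True
        if not updated:
            break

    # One extra pass: any further improvement means a negative cycle.
    for u, v, w in edges:
        if dist[u] + w < dist[v]:
            raise ValueError("graph contains a negative-weight cycle")
    return dist


# Hypothetical network of four countries, labelled 0 to 3.
example_edges = [(0, 1, 5.0), (0, 2, 2.0), (2, 1, 1.0), (1, 3, 4.0), (2, 3, 9.0)]
print(first_passage_times(4, example_edges, source=0))
```

In this toy network the route 0 → 2 → 1 → 3 is quicker than the direct links, which is the kind of indirect introduction route a network model can expose.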

    Efficient similarity search in high-dimensional data spaces

    Similarity search in high-dimensional data spaces is a popular paradigm for many modern database applications, such as content-based image retrieval, time series analysis in financial and marketing databases, and data mining. Objects are represented as high-dimensional points or vectors based on their important features. Object similarity is then measured by the distance between feature vectors, and similarity search is implemented via range queries or k-Nearest Neighbor (k-NN) queries. Implementing k-NN queries via a sequential scan of large tables of feature vectors is computationally expensive. Building multi-dimensional indexes on the feature vectors for k-NN search also tends to be unsatisfactory when the dimensionality is high, because index performance degrades under the curse of dimensionality. Dimensionality reduction using the Singular Value Decomposition method is the approach adopted in this study to deal with high-dimensional data. Because the data distribution of many real-world datasets tends to be heterogeneous, dimensionality reduction on the entire dataset may cause a significant loss of information. A more efficient representation is sought by clustering the data into homogeneous subsets of points and applying dimensionality reduction to each cluster separately, i.e., utilizing local rather than global dimensionality reduction. The thesis deals with improving the efficiency of query processing associated with local dimensionality reduction methods, such as the Clustering and Singular Value Decomposition (CSVD) and the Local Dimensionality Reduction (LDR) methods. Variations in the implementation of CSVD are considered, and the two methods are compared from the viewpoint of compression ratio, CPU time, and retrieval efficiency. An exact k-NN algorithm is presented for local dimensionality reduction methods by extending an existing multi-step k-NN search algorithm, which is designed for global dimensionality reduction.
Experimental results show that the new method requires less CPU time than the approximate method originally proposed for CSVD at a comparable level of accuracy. Optimal subspace dimensionality reduction has the intent of minimizing total query cost. The problem is complicated in that each cluster can retain a different number of dimensions. A hybrid method is presented, combining the best features of the CSVD and LDR methods, to find optimal subspace dimensionalities for clusters generated by local dimensionality reduction methods. The experiments show that the proposed method works well for both real-world and synthetic datasets.
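The general filter-and-refine idea behind multi-step k-NN search can be sketched as follows. This is a minimal illustration for a single, globally reduced space, not the thesis's algorithm for local dimensionality reduction. It also assumes that keeping only the first `m` coordinates stands in for an SVD projection. Both give a reduced-space distance that never exceeds the full distance, which is the property the exactness argument relies on. The function names and the toy data are hypothetical.

```python
import heapq
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def multistep_knn(points, query, k, m):
    """Exact k-NN using a lower-bounding distance on the first m coordinates."""
    # Filter step: rank all points by their cheap reduced-space distance.
    reduced = sorted(
        (euclidean(p[:m], query[:m]), i) for i, p in enumerate(points)
    )

    best = []  # max-heap of the k best so far, stored as (-full_distance, index)
    for lower_bound, i in reduced:
        # Every remaining candidate has a lower bound at least this large,
        # so once it exceeds the current k-th distance we can stop.
        if len(best) == k and lower_bound > -best[0][0]:
            break
        # Refine step: compute the full-dimensional distance.
        d = euclidean(points[i], query)
        if len(best) < k:
            heapq.heappush(best, (-d, i))
        elif d < -best[0][0]:
            heapq.heapreplace(best, (-d, i))

    return sorted((-neg_d, i) for neg_d, i in best)


# Hypothetical 3-dimensional data, reduced to the first 2 coordinates.
data = [(0, 0, 0), (1, 0, 5), (2, 2, 0), (0, 1, 1), (5, 5, 5)]
print(multistep_knn(data, query=(0, 0, 0), k=2, m=2))
```

Because the reduced distance is a lower bound, the early exit never discards a true neighbor, so the result matches a full sequential scan while computing fewer full-dimensional distances.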