Fleet management in free-floating bike sharing systems using predictive modelling and explorative tools
For redistributing and operating bikes in a free-floating system, two measures are of highest priority. First, the expected number of rentals on a given day is an important quantity for service providers to manage and service their fleet. The expected number of bookings is estimated with a simple model and with a more complex model based on meteorological information, since the number of rentals depends strongly on the current and forecasted weather. Secondly, knowledge of upcoming service level violations at a fine spatial resolution is important for the redistribution of bikes.
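A minimal sketch of what such a weather-based count model could look like, using a Poisson regression in R; the covariates and data below are hypothetical and not taken from the paper:

    # Hedged sketch (not the authors' model): daily rental counts regressed on weather
    set.seed(1)
    daily <- data.frame(
      temp    = runif(200, -5, 30),    # mean daily temperature in degrees Celsius
      rain_mm = rexp(200, rate = 0.5), # daily precipitation in mm
      weekend = rbinom(200, 1, 2/7)    # weekend indicator
    )
    daily$rentals <- rpois(200, lambda = exp(3 + 0.03 * daily$temp -
                                             0.08 * daily$rain_mm - 0.2 * daily$weekend))
    fit <- glm(rentals ~ temp + rain_mm + weekend, family = poisson(), data = daily)
    # expected number of rentals for a forecasted warm, dry weekday
    predict(fit, newdata = data.frame(temp = 22, rain_mm = 0, weekend = 0),
            type = "response")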
With this information, the service provider can set reward zones where service level violations will occur in the near future. To forecast a service level violation at a fine geographical resolution, the current distribution of bikes as well as the temporal and spatial information of past rentals has to be taken into account. A Markov chain model is formulated to integrate this information.
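The abstract does not give the model's details, so the following is only a schematic R illustration of the general idea: a zone-level Markov chain with transition probabilities estimated from hypothetical past rentals, used to propagate the current bike distribution forward and flag zones expected to fall below a service-level threshold.

    zones <- c("A", "B", "C")
    trans_counts <- matrix(c(50, 30, 20,   # hypothetical origin -> destination rental counts
                             10, 70, 20,
                             25, 25, 50),
                           nrow = 3, byrow = TRUE, dimnames = list(zones, zones))
    P <- trans_counts / rowSums(trans_counts)       # row-stochastic transition matrix
    bikes_now <- c(A = 12, B = 3, C = 8)            # current bike distribution
    dist <- matrix(bikes_now, nrow = 1, dimnames = list(NULL, zones))
    for (step in 1:4) dist <- dist %*% P            # expected distribution after 4 rental steps
    threshold <- 5                                  # assumed service-level threshold
    dist < threshold                                # zones with a predicted service level violation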
We develop a management tool that presents, in an explorative way, important information about past, present and predicted future rental counts in time and space, and that integrates all estimation procedures. The management tool runs in the browser and continuously updates the information and predictions, since the bike distribution over the observed area is in continuous flux and new data are generated continuously.
An Object-Oriented Framework for Statistical Simulation: The R Package simFrame
Simulation studies are widely used by statisticians to gain insight into the quality of developed methods. Usually some guidelines regarding, e.g., simulation designs, contamination, missing data models or evaluation criteria are necessary in order to draw meaningful conclusions. The R package simFrame is an object-oriented framework for statistical simulation, which allows researchers to make use of a wide range of simulation designs with minimal programming effort. Its object-oriented implementation provides clear interfaces for extensions by the user. Since statistical simulation is an embarrassingly parallel process, the framework supports parallel computing to increase computational performance. Furthermore, an appropriate plot method is selected automatically depending on the structure of the simulation results. In this paper, the implementation of simFrame is discussed in great detail and the functionality of the framework is demonstrated in examples for different simulation designs.
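A condensed usage sketch in the spirit of the examples in the package documentation; the exact constructor arguments and the example data set (eusilcP) should be checked against the installed simFrame version:

    library("simFrame")
    data("eusilcP")                     # synthetic EU-SILC population shipped with simFrame
    # estimator to evaluate on each simulated sample
    sim <- function(x) c(mean = mean(x$eqIncome), trimmed = mean(x$eqIncome, trim = 0.02))
    set.seed(12345)
    sc <- SampleControl(size = 500, k = 50)        # draw 50 samples of size 500
    results <- runSimulation(eusilcP, sc, fun = sim)
    aggregate(results)                             # average the estimates over all samples
    plot(results, true = mean(eusilcP$eqIncome))   # plot method chosen by result structure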
An Open Source Approach for Modern Teaching Methods: The Interactive TGUI System
In order to facilitate teaching complex topics in an interactive way, the authors developed a computer-assisted teaching system, a graphical user interface named TGUI (Teaching Graphical User Interface). TGUI was introduced at the beginning of 2009 in the Austrian Journal of Statistics (Dinges and Templ 2009) as an effective instrument to train and teach staff on mathematical and statistical topics. While the fundamental principles were retained, the current TGUI system has undergone a complete redesign. The ultimate goal behind the reimplementation was to share the advantages of TGUI and provide teachers and people who need to hold training courses with a powerful tool that can enrich their lectures with interactive features. The idea was to go a step beyond the current modular blended-learning systems (see, e.g., Da Rin 2003) or the related teaching techniques of classroom voting (see, e.g., Cline 2006). In this paper the authors have attempted to exemplify the basic idea and concept of TGUI by means of statistics seminars held at Statistics Austria. The powerful open source software R (R Development Core Team 2010a) is the backend for TGUI, which can therefore be used to process even complex statistical contents. However, with specifically created contents the interactive TGUI system can be used to support a wide range of courses and topics. The open source R packages TGUICore and TGUITeaching are freely available from the Comprehensive R Archive Network at http://CRAN.R-project.org/.
Feedback-based integration of the whole process of data anonymization in a graphical interface
The interactive, web-based point-and-click application presented in this article allows users to anonymize data without any knowledge of a programming language. Anonymized data are in demand, for example in data mining, but creating safe, anonymized data is by no means a trivial task. Both methodological issues and know-how from subject-matter specialists should be taken into account when anonymizing data. Even though specialized software such as sdcMicro exists, it is often difficult for nonexperts in a particular software and without programming skills to actually anonymize datasets without an appropriate app. The presented app is not restricted to applying disclosure limitation techniques but rather facilitates the entire anonymization process. The interface allows users to upload data to the system, modify them and create an object defining the disclosure scenario. Once such a statistical disclosure control (SDC) problem has been defined, users can apply anonymization techniques to this object and get instant feedback on the impact on risk and data utility after SDC methods have been applied. Additional features, such as an undo button, the possibility to export the anonymized dataset or the required code for reproducibility, as well as its interactive features, make it convenient for both experts and nonexperts in R – the free software environment for statistical computing and graphics – to protect a dataset using this app.
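The app builds on sdcMicro; the workflow it facilitates corresponds roughly to the following script-level sketch, where the key variables come from the package's example data and the parameter choices are illustrative only:

    library(sdcMicro)
    data("testdata")                                 # example survey data shipped with sdcMicro
    # define the disclosure scenario as an SDC object
    sdc <- createSdcObj(testdata,
                        keyVars   = c("urbrur", "roof", "sex", "age"),
                        numVars   = c("expend", "income", "savings"),
                        weightVar = "sampling_weight")
    print(sdc, "risk")                               # re-identification risk before anonymization
    sdc <- kAnon(sdc, k = 3)                         # local suppression to reach 3-anonymity
    sdc <- addNoise(sdc, noise = 150)                # perturb continuous key variables
    print(sdc, "risk")                               # instant feedback on the updated risk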
Imputation with the R Package VIM
The package VIM was developed to explore and analyze the structure of missing values in data using visualization methods, to impute these missing values with the built-in imputation methods and to verify the imputation process using visualization tools, as well as to produce high-quality graphics for publications. This article focuses on the different imputation techniques available in the package. Four different imputation methods are currently implemented in VIM, namely hot-deck imputation, k-nearest neighbor imputation, regression imputation and iterative robust model-based imputation. All of these methods are implemented in a flexible manner with many options for customization. Furthermore, practical examples are provided to highlight the use of the implemented methods in real-world applications. In addition, the graphical user interface of VIM has been re-implemented from scratch, resulting in the package VIMGUI, to enable users without extensive R skills to access these imputation and visualization methods.
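A brief usage sketch of the imputation methods mentioned above, based on the sleep data shipped with VIM; the options shown are defaults or illustrative choices:

    library(VIM)
    data(sleep, package = "VIM")              # mammal sleep data with missing values
    aggr(sleep)                               # visualize the structure of missing values
    marginplot(sleep[, c("Dream", "Sleep")])  # inspect missingness in two variables
    sleep_knn  <- kNN(sleep, k = 5)           # k-nearest-neighbor imputation
    sleep_hd   <- hotdeck(sleep)              # hot-deck imputation
    sleep_irmi <- irmi(sleep)                 # iterative robust model-based imputation
    # imputed cells are flagged in appended logical "_imp" columns, e.g. sleep_knn$Dream_imp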
Theory lives in practice: an interview with Ernst Stadlober
The interview with Ernst Stadlober was conducted by Herwig Friedl and Matthias Templ on 18 December 2015. It paints a picture of Ernst Stadlober's professional career, from his beginnings, where, starting with a fixed seed on a deterministic path, he found his way via random number generation to his very broad orientation within statistics. Many successful applied research projects with partners from public administration, industry and business testify to his success story, as does his exceptionally intensive supervision of students at TU Graz. One can rightly claim that Ernst Stadlober commands a broad spectrum of statistical methods and has nevertheless managed to delve deeply into many specialized areas.
Ernst Stadlober's professional home was and is the Institute of Statistics at TU Graz, which he has also headed since 1998. In between, he spent research stays at Stanford University (USA) and TH Darmstadt and held a deputy professorship at the University of Kiel. To date he has supervised 12 doctoral theses and more than 90 diploma and master's theses. His teaching repertoire includes (applied) statistics, time series analysis, stochastic modelling and simulation, design of experiments, and more. In addition, he can now look back on more than 100 talks as well as about 80 publications in the fields of biostatistics, computational statistics and applied statistics.
A systematic overview on methods to protect sensitive data provided for various analyses
In view of the various methodological developments regarding the protection of sensitive data, especially with respect to privacy-preserving computation and federated learning, a conceptual categorization and comparison between various methods stemming from different fields is often desired. More concretely, it is important to provide guidance for practitioners, who often lack an overview of suitable approaches for certain scenarios, whether it is differential privacy for interactive queries, k-anonymity methods and synthetic data generation for data publishing, or secure federated analysis for multiparty computation without sharing the data itself. Here, we provide an overview based on central criteria describing a context for privacy-preserving data handling, which allows informed decisions in view of the many alternatives. Besides guiding practice, this categorization of concepts and methods is intended as a step towards a comprehensive ontology for anonymization. We emphasize throughout the paper that there is no panacea and that context matters.
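As a hedged illustration of one of the surveyed concepts (not a method proposed in the paper), the following R snippet shows the Laplace mechanism for answering an interactive count query under differential privacy:

    # Laplace mechanism for a count query; the query, data and epsilon are hypothetical
    laplace_noise <- function(n, scale) {
      u <- runif(n, -0.5, 0.5)
      -scale * sign(u) * log(1 - 2 * abs(u))   # inverse-CDF sampling from Laplace(0, scale)
    }
    dp_count <- function(x, condition, epsilon = 0.1) {
      true_count <- sum(condition(x))
      sensitivity <- 1                          # adding/removing one record changes a count by at most 1
      true_count + laplace_noise(1, scale = sensitivity / epsilon)
    }
    set.seed(42)
    income <- rlnorm(1000, meanlog = 10)
    dp_count(income, function(v) v > 30000, epsilon = 0.5)   # noisy answer to an interactive query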
Habitat-Dependency of Transect Walk and Pan Trap Methods for Bee Sampling in Farmlands
Bees are the most important group of flower visitors providing an essential ecosystem service, namely pollination. Due to the worldwide decline of bees, there should be standardized sampling methods in place to ensure consistent and comparable results between studies. We compared the two commonly used sampling methods of yellow pan traps and transect walk to determine (i) which habitat variables affect the species composition, abundance and species richness of sampled bee communities, (ii) which method potentially contains sampling bias towards some individuals or groups of bees and (iii) the efficiency of sampling in various habitats. We conducted fieldwork in different agricultural habitats distributed along landscape heterogeneity and topography gradients.
Our results showed that the height of vegetation, the average number of flowers and the amount of woody vegetation had the greatest influence on the sampling efficiency. Our survey also demonstrated that sampling by transect walk captured fewer bees in general, especially in stubble, maize, and cereal fields. We found that Apis mellifera and Bombus spp. were well represented in samples collected by the transect walk method, while the abundance of other genera, especially Dasypoda, Hylaeus and Panurgus, was higher in pan traps. Based on the results, we suggest (i) using the transect walk method to compare samples of flower-visiting wild bee communities from various habitats with different vegetation and flower characteristics, (ii) applying either the transect walk or pan traps to compare similar habitats and (iii) adopting a comprehensive method which would incorporate both sampling techniques to gain a more complete insight into wild bee species composition.
Statistical analysis of chemical element compositions in Food Science : problems and possibilities
In recent years, many analyses have been carried out to investigate the chemical components of food data. However, studies rarely consider the compositional pitfalls of such analyses. This is problematic, as it may lead to arbitrary results when non-compositional statistical analysis is applied to compositional datasets. In this study, compositional data analysis (CoDa), which is widely used in other research fields, is compared with classical statistical analysis to demonstrate how the results vary depending on the approach and to show the best possible statistical analysis. For example, honey and saffron are highly susceptible to adulteration and imitation, so the determination of their chemical elements requires the best possible statistical analysis. Our study demonstrated how principal component analysis (PCA) and classification results are influenced by the pre-processing steps conducted on the raw data, and by the replacement strategies for missing values and non-detects. Furthermore, it demonstrated the differences in results when compositional and non-compositional methods were applied. Our results suggested that the log-ratio analysis provided better separation between the pure and adulterated data, allowed for easier interpretability of the results and yielded a higher classification accuracy. Similarly, it showed that classification with artificial neural networks (ANNs) works poorly if the CoDa pre-processing steps are left out. From these results, we advise the application of CoDa methods for analyses of the chemical elements of food and for the characterization and authentication of food products.
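A minimal R sketch of the compositional pre-processing discussed above, using simulated element concentrations rather than the study's data: close the rows, apply the centred log-ratio (clr) transformation, and compare PCA on the transformed versus raw values.

    set.seed(7)
    raw <- matrix(rexp(100 * 5, rate = 1), ncol = 5,
                  dimnames = list(NULL, c("K", "Ca", "Mg", "Na", "Zn")))  # hypothetical concentrations
    comp <- raw / rowSums(raw)                 # closure: only relative information is kept
    clr  <- log(comp) - rowMeans(log(comp))    # centred log-ratio transformation
    pca  <- prcomp(clr, center = TRUE, scale. = FALSE)
    summary(pca)                               # compare with prcomp(raw) to see how pre-processing changes the result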