36 research outputs found

    Comparing Outlier Detection Methods using Boxplot Generalized Extreme Studentized Deviate and Sequential Fences

    Outlier identification is essential in data analysis because outliers can lead to wrong inferential statistics. This study aimed to compare the performance of the Boxplot, Generalized Extreme Studentized Deviate (Generalized ESD), and Sequential Fences methods in identifying outliers. A published dataset was used in the study. Based on preliminary outlier identification, the data did not contain outliers. The performance of each outlier detection method was evaluated by contaminating the original data with a few outliers: the two smallest and the two largest observations were replaced with outliers. The analysis was conducted using SAS version 9.2 for both the original and the contaminated data. We found that the Sequential Fences method outperformed Boxplot and Generalized ESD in identifying outliers.
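The boxplot rule compared in this study flags values beyond Q1 - 1.5·IQR and Q3 + 1.5·IQR (Tukey's fences). A minimal Python sketch, assuming the conventional multiplier k = 1.5 and Python's `statistics.quantiles` (inclusive method) for the quartiles, reproduces the contamination design on a small hypothetical sample. SAS 9.2 may use a different quartile definition, and Generalized ESD and Sequential Fences are not implemented here; `contaminate` is a hypothetical helper.

```python
import statistics


def tukey_fences(data, k=1.5):
    """Return the (lower, upper) boxplot fences Q1 - k*IQR and Q3 + k*IQR."""
    # n=4 gives the three quartile cut points; "inclusive" uses linear interpolation
    q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def boxplot_outliers(data, k=1.5):
    """Return the values lying outside the boxplot fences."""
    lower, upper = tukey_fences(data, k)
    return [x for x in data if x < lower or x > upper]


def contaminate(data, low_values, high_values):
    """Replace the smallest and largest observations with the given outliers."""
    ordered = sorted(data)
    middle = ordered[len(low_values):len(ordered) - len(high_values)]
    return list(low_values) + middle + list(high_values)


# Hypothetical outlier-free sample, then two low and two high contaminants
clean = [12.1, 12.4, 12.9, 13.0, 13.2, 13.5, 13.8, 14.0, 14.3, 14.9]
dirty = contaminate(clean, low_values=[2.0, 3.0], high_values=[25.0, 30.0])
print(boxplot_outliers(clean), sorted(boxplot_outliers(dirty)))
```

Because the quartiles are computed from the bulk of the data, replacing only four extreme values barely moves the fences, which is why all four contaminants are flagged.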

    Technical note: A procedure to clean, decompose, and aggregate time series

    Errors, gaps, and outliers complicate and sometimes invalidate the analysis of time series. While most fields have developed their own strategy to clean the raw data, no generic procedure has been promoted to standardize the pre-processing. This lack of harmonization makes the inter-comparison of studies difficult, and leads to screening methods that can be arbitrary or case-specific. This study provides a generic pre-processing procedure implemented in R (ctbi for cyclic/trend decomposition using bin interpolation) dedicated to univariate time series. Ctbi is based on data binning and decomposes the time series into a long-term trend and a cyclic component (quantified by a new metric, the Stacked Cycles Index) to finally aggregate the data. Outliers are flagged with an enhanced box plot rule called Logbox that corrects biases due to the sample size and that is adapted to non-Gaussian residuals. Three different Earth science datasets (contaminated with gaps and outliers) are successfully cleaned and aggregated with ctbi. This illustrates the robustness of this procedure that can be valuable to any discipline.
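The abstract names the ingredients of ctbi (data binning, a long-term trend, a cyclic component) without giving formulas. As a rough illustration only, and not ctbi's actual algorithm, the Python sketch below uses hypothetical helpers. It takes bin means interpolated between bin centres as the trend and averages detrended values by phase ("stacking" the cycles) as the cyclic component. It assumes a known period, bins one period long, and a series at least one period long. The Stacked Cycles Index and the Logbox outlier rule are not reproduced.

```python
def bin_trend(series, bin_size):
    """Trend: mean of each bin, linearly interpolated between bin centres."""
    n = len(series)
    centres, means = [], []
    for start in range(0, n, bin_size):
        chunk = series[start:start + bin_size]
        centres.append(start + (len(chunk) - 1) / 2)
        means.append(sum(chunk) / len(chunk))
    trend = []
    for t in range(n):
        if t <= centres[0]:
            trend.append(means[0])       # hold the first bin mean before its centre
        elif t >= centres[-1]:
            trend.append(means[-1])      # hold the last bin mean after its centre
        else:
            i = max(j for j in range(len(centres)) if centres[j] <= t)
            frac = (t - centres[i]) / (centres[i + 1] - centres[i])
            trend.append(means[i] + frac * (means[i + 1] - means[i]))
    return trend


def stacked_cycle(detrended, period):
    """Cyclic component: mean detrended value at each phase, centred on zero."""
    sums, counts = [0.0] * period, [0] * period
    for t, value in enumerate(detrended):
        sums[t % period] += value
        counts[t % period] += 1
    profile = [s / c for s, c in zip(sums, counts)]
    offset = sum(profile) / period       # remove any net offset from the cycle
    profile = [p - offset for p in profile]
    return [profile[t % period] for t in range(len(detrended))]


def decompose(series, period):
    """Split a series into trend, cycle and residual (trend + cycle + residual = series)."""
    trend = bin_trend(series, period)
    detrended = [x - tr for x, tr in zip(series, trend)]
    cycle = stacked_cycle(detrended, period)
    residual = [d - c for d, c in zip(detrended, cycle)]
    return trend, cycle, residual


# Hypothetical series: linear trend plus a spike every fourth sample
series = [0.5 * t + (2.0 if t % 4 == 0 else 0.0) for t in range(16)]
trend, cycle, residual = decompose(series, period=4)
print([round(r, 3) for r in residual])
```

The residual is what an outlier rule such as Logbox would then screen; that step is deliberately left out because its formula is not given here.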

    Factors influencing hotels’ online prices

    Digital corporations are creating new paths of business driven by consumers empowered by social media. Understanding the role that each feature drawn from online platforms plays in price fluctuation is vital for supporting decision making. In this study, 5603 simulations of online reservations from 23 Portuguese cities were gathered from four renowned online sources (Booking.com, TripAdvisor, Google, and Facebook), including characterizing features from social media, web visibility, and hotel amenities. After data preparation, including removal of features irrelevant for modeling and outlier cleaning, a tuned dataset of 3137 simulations and 30 features (including the price charged per day) was used first to evaluate the modeling performance of an ensemble of multilayer perceptrons, and then to extract valuable knowledge through data-based sensitivity analysis. Findings show that all features from the encompassed factors (social media, online reservation, hotel characteristics, web visibility, and city) play a significant role in price.
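To make the two modeling steps concrete, the Python sketch below averages an ensemble's predictions and ranks input features with a simplified one-at-a-time sensitivity analysis. Each feature is swept through values drawn from its observed distribution while the other inputs keep their observed values, and the spread of the mean response is recorded. This is an assumption-laden illustration: the study's data-based sensitivity analysis may choose levels and samples differently, and the two linear "models" and data rows are hypothetical stand-ins for trained multilayer perceptrons and hotel records.

```python
import statistics


def ensemble_predict(models, row):
    """Average the predictions of several fitted models (e.g. several MLPs)."""
    return sum(model(row) for model in models) / len(models)


def one_at_a_time_sensitivity(predict, rows, levels=5):
    """Relative importance of each input feature of a fitted model.

    For feature j, every row keeps its observed values except x_j, which is set
    in turn to `levels` values taken from the observed distribution of x_j.
    The sensitivity of j is the range of the mean predicted response across
    those levels; sensitivities are normalised to sum to 1.
    """
    if levels < 2:
        raise ValueError("levels must be at least 2")
    ranges = []
    for j in range(len(rows[0])):
        column = sorted(row[j] for row in rows)
        # evenly spaced order statistics (min ... max) serve as the levels of x_j
        positions = [round(k * (len(column) - 1) / (levels - 1)) for k in range(levels)]
        responses = []
        for pos in positions:
            value = column[pos]
            # rows are lists, so slicing builds a modified copy of each row
            preds = [predict(row[:j] + [value] + row[j + 1:]) for row in rows]
            responses.append(statistics.mean(preds))
        ranges.append(max(responses) - min(responses))
    total = sum(ranges)
    return [r / total if total else 0.0 for r in ranges]


# Hypothetical stand-ins for trained networks and for data rows
models = [lambda x: 2.0 * x[0] + 0.5 * x[1], lambda x: 2.2 * x[0] + 0.3 * x[1]]
rows = [[1.0, 4.0, 7.0], [2.0, 3.0, 1.0], [3.0, 5.0, 2.0], [4.0, 1.0, 9.0]]
importance = one_at_a_time_sensitivity(lambda r: ensemble_predict(models, r), rows)
print([round(v, 3) for v in importance])
```

The third feature is ignored by both models, so its importance is zero. The method only needs a prediction function, which is why it also works for black-box ensembles.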

    Artificial neural networks, quantile regression, and linear regression for predicting site index in the presence of outliers

    The objective of this work was to compare methods of obtaining the site index for eucalyptus (Eucalyptus spp.) stands and to evaluate their impact on the stability of this index in databases with and without outliers. Three methods were tested: linear regression, quantile regression, and an artificial neural network. Twenty-two permanent plots from a continuous forest inventory were used, measured when the trees were 23 to 83 months old. Outliers were identified using a boxplot. The artificial neural network showed better results than the linear and quantile regressions for both dominant height and site index estimates. The stability of the site index classification obtained with the artificial neural network was also better than that obtained with the other methods, regardless of the presence or absence of outliers in the database. This shows that the artificial neural network is a solid modelling technique in the presence of outliers. When the cause of the outliers in the database is not known, they can be kept if techniques such as artificial neural networks or quantile regression are used.
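The abstract recommends quantile regression when outliers cannot be explained. The Python sketch below compares median (tau = 0.5) quantile regression with ordinary least squares on hypothetical age and dominant-height data containing one outlying height. It fits y = a + b*x by minimizing the pinball loss. With two parameters, some optimal solution passes through two observations, so checking every pair is exact but only practical for tiny samples; real analyses would use a linear-programming routine such as statsmodels' `QuantReg`. The neural network and the site index equation are not reproduced.

```python
from itertools import combinations


def pinball_loss(residuals, tau):
    """Check (pinball) loss minimised by quantile regression at quantile tau."""
    # r >= 0 costs tau * r; r < 0 costs (1 - tau) * |r|
    return sum(r * (tau - (r < 0)) for r in residuals)


def quantile_line(x, y, tau=0.5):
    """Exact quantile-regression line y = a + b*x by checking every pair of points."""
    best = None
    for i, j in combinations(range(len(x)), 2):
        if x[i] == x[j]:
            continue
        b = (y[j] - y[i]) / (x[j] - x[i])
        a = y[i] - b * x[i]
        loss = pinball_loss([yk - (a + b * xk) for xk, yk in zip(x, y)], tau)
        if best is None or loss < best[0]:
            best = (loss, a, b)
    return best[1], best[2]


def ols_line(x, y):
    """Ordinary least-squares line y = a + b*x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sxy / sxx
    return my - b * mx, b


# Hypothetical ages (months) and dominant heights (m); the height at 60 months is an outlier
age = [24, 36, 48, 60, 72, 84]
height = [7.0, 10.0, 13.0, 30.0, 19.0, 22.0]
print("median regression:", quantile_line(age, height))
print("OLS:", ols_line(age, height))
```

The median line still passes through the five consistent points, while the least-squares slope is pulled toward the outlier.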

    Swimming pools and intra-city climates: Influences on residential water consumption in Cape Town

    Water demand management can be effective as a resource management approach if demand estimation is accurate and consumption determinants are defined. While determinants such as household income, regional climate, water price, property size and household occupancy have been comprehensively studied and modelled, other determinants such as swimming pools and intra-city climates have not. This study examines residential water consumption in the City of Cape Town in 2008/2009, under property size regimes, to determine separately whether the presence of pools or the occurrence of different intra-city precipitation patterns influences water consumption. A sample of 14 233 properties is analysed, with 20.86% having swimming pools within their boundaries. Overall, properties with swimming pools used 37.36% (8.85 kℓ per month) more water than those without, with pools having a larger influence on household consumption on smaller properties. These results were statistically significant. Different precipitation patterns occurred over the study period, and while there were indications that consumption may be lower if there is more rainfall, limited evidence was found to support the hypothesis.
    Keywords: water consumption, water demand management, swimming pools, precipitation, Cape Town
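The abstract reports the pool effect both as a percentage and as an absolute volume. Assuming the 37.36% is relative to the mean monthly consumption of properties without pools (the abstract does not state the baseline), the two figures together imply the approximate group means and pool count below. These are derived values, not results reported by the study.

```latex
\begin{aligned}
\bar{c}_{\text{no pool}} &\approx \frac{8.85\ \text{k}\ell}{0.3736} \approx 23.69\ \text{k}\ell\ \text{month}^{-1},\\
\bar{c}_{\text{pool}} &\approx 23.69 + 8.85 \approx 32.54\ \text{k}\ell\ \text{month}^{-1},\\
n_{\text{pool}} &\approx 0.2086 \times 14\,233 \approx 2\,969\ \text{properties}.
\end{aligned}
```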

    Analyzing Change-of-Direction and the Laterally Resisted Split Squat: Incorporating a Lateral Vector into the Single Leg Squat

    Using strength training to improve change of direction (COD) has produced mixed results. To date, the modified single leg squat (MSLS) and the bilateral squat (BS) have been used successfully to improve COD, with equal improvement. COD is primarily performed at a 45-75° frontal plane angle; however, the MSLS and BS are performed at a 90° frontal plane angle. Based on the force vector theory, it is proposed that a more mechanically similar strength training exercise, the Laterally Resisted Split Squat (LRSS), be used. The purpose of this study is to compare COD with the LRSS, MSLS, and BS via kinetic measurements. Ten healthy, recreationally active female participants volunteered for this study. Participants were pre-screened using a COD test to verify proper mechanics. Each participant's body weight was measured, and the 1RM for the LRSS, MSLS, and BS was calculated using the Brzycki formula. The peak ground reaction force (GRF) of each participant's dominant leg in the frontal plane was collected for COD and for the three exercises at 70% 1RM, and was used to calculate peak magnitude and vector angle. Peak GRF magnitude was significantly larger in COD (2.13 ± 0.52 body weight, BW) than in the LRSS (0.85 ± 0.07 BW; p < 0.001), MSLS (0.99 ± 0.10 BW; p = 0.001), and BS (0.52 ± 0.07 BW; p < 0.001). The COD vector angle (66.70° ± 4.98°) did not differ significantly from that of the LRSS (74.94° ± 4.11°; p = 0.057), but differed significantly from those of the MSLS (89.04° ± 0.48°; p < 0.001) and BS (82.69° ± 4.30°; p < 0.001). In an application of the force vector theory, the LRSS more closely matches COD than the MSLS or BS.
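The protocol combines two calculations: a 1RM estimate from the Brzycki formula, 1RM = load × 36 / (37 − reps), and the peak magnitude and angle of the frontal-plane GRF vector. The Python sketch below shows both under stated assumptions. The angle is taken from the horizontal, so a purely vertical force is 90° (consistent with the squats' near-90° angles). Mediolateral sign is ignored. Peak is defined as the sample with the largest resultant force. All loads and force samples are hypothetical, and the study's exact processing may differ.

```python
import math


def brzycki_1rm(load, reps):
    """Estimated one-repetition maximum: load * 36 / (37 - reps) (Brzycki)."""
    if not 1 <= reps < 37:
        raise ValueError("reps must be between 1 and 36")
    return load * 36.0 / (37.0 - reps)


def frontal_plane_vector(f_ml, f_v, body_weight_n):
    """Resultant GRF magnitude (body weights) and angle (deg from horizontal).

    f_ml: mediolateral force (N); f_v: vertical force (N). The sign of f_ml is
    ignored, so the angle does not depend on which side the force leans to.
    """
    magnitude = math.hypot(f_ml, f_v) / body_weight_n
    angle = math.degrees(math.atan2(f_v, abs(f_ml)))
    return magnitude, angle


def peak_vector(samples, body_weight_n):
    """(magnitude, angle) at the sample with the largest resultant force."""
    vectors = [frontal_plane_vector(f_ml, f_v, body_weight_n) for f_ml, f_v in samples]
    return max(vectors, key=lambda mv: mv[0])


one_rm = brzycki_1rm(load=50.0, reps=5)      # hypothetical 5-rep set with 50 kg
training_load = 0.70 * one_rm                # 70% 1RM, as in the protocol
samples = [(0.0, 500.0), (300.0, 400.0), (600.0, 800.0)]  # hypothetical forces (N)
print(one_rm, training_load, peak_vector(samples, body_weight_n=600.0))
```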

    Scientific metrics in bibliometric studies: outlier detection for univariate data

    This study presents formulas for detecting outliers in univariate data that take into account both positive and negative asymmetry of the data. The new formulation, derived from Exploratory Data Analysis (EDA), is evaluated by simulation and compared with the classical EDA rule found in most statistics textbooks and statistical software. The classical rule, however, applies only to normal (Gaussian) distributions, i.e., symmetric or slightly asymmetric data. Real data published in two scientific papers on scientific metrics are used for the simulation. For moderate or strong positive (negative) asymmetry, the new formulation detects fewer (more) upper outliers than the classical rule. The existence of outliers in bibliometric data should be taken into account; it is recommended to quantify their influence on statistical calculations such as the mean and standard deviation.
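The closing recommendation, quantifying how outliers affect the mean and standard deviation, can be carried out with a few lines of Python. In the sketch below, the outlier rule is passed in as a function because the paper's skewness-aware formula is not given here. The cut-off of 20 citations and the citation counts are hypothetical.

```python
import statistics


def outlier_influence(data, is_outlier):
    """Mean and standard deviation with and without values flagged as outliers."""
    kept = [x for x in data if not is_outlier(x)]
    return {
        "n_removed": len(data) - len(kept),
        "mean_all": statistics.mean(data),
        "mean_kept": statistics.mean(kept),
        "sd_all": statistics.stdev(data),
        "sd_kept": statistics.stdev(kept),
    }


# Hypothetical citation counts; "> 20" stands in for whatever rule flags outliers
citations = [0, 1, 1, 2, 3, 3, 4, 5, 6, 48]
print(outlier_influence(citations, lambda c: c > 20))
```

In this skewed sample, removing a single highly cited item more than halves the mean, which is the kind of influence the authors suggest reporting.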

    State anxiety alters the neural oscillatory correlates of predictions and prediction errors during reward-based learning

    Anxiety influences how the brain estimates and responds to uncertainty. The consequences of these processes on behaviour have been described in theoretical and empirical studies, yet the associated neural correlates remain unclear. Rhythm-based accounts of Bayesian predictive coding propose that predictions in generative models of perception are represented in alpha (8–12 Hz) and beta oscillations (13–30 Hz). Updates to predictions are driven by prediction errors weighted by precision (inverse variance), and are encoded in gamma oscillations (>30 Hz) and associated with suppression of beta activity. We tested whether state anxiety alters the neural oscillatory activity associated with predictions and precision-weighted prediction errors (pwPE) during learning. Healthy human participants performed a probabilistic reward-based learning task in a volatile environment. In our previous work, we described learning behaviour in this task using a hierarchical Bayesian model, revealing more precise (biased) beliefs about the tendency of the reward contingency in state anxiety, consistent with reduced learning in this group. The model provided trajectories of predictions and pwPEs for the current study, allowing us to assess their parametric effects on the time-frequency representations of EEG data. Using convolution modelling for oscillatory responses, we found that, relative to a control group, state anxiety increased beta activity in frontal and sensorimotor regions during processing of pwPE, and in fronto-parietal regions during encoding of predictions. No effects of state anxiety on gamma modulation were found. Our findings expand prior evidence on the oscillatory representations of predictions and pwPEs into the reward-based learning domain. 
The results suggest that state anxiety modulates beta-band oscillatory correlates of pwPE and predictions in generative models, providing insights into the neural processes associated with biased belief updating and poorer learning.
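The abstract describes prediction errors weighted by precision (inverse variance). The Python sketch below assumes a single Gaussian belief updated by a Gaussian observation; it is not the hierarchical Bayesian model used in the study. It shows how the weight on the prediction error shrinks as the prior belief becomes more precise, which is the sense in which more precise beliefs go with reduced learning.

```python
def gaussian_update(mu_prior, pi_prior, observation, pi_obs):
    """One Bayesian update of a Gaussian belief with a Gaussian observation.

    Returns the posterior mean, the posterior precision and the
    precision-weighted prediction error (pwPE) that moves the mean.
    """
    prediction_error = observation - mu_prior
    pi_post = pi_prior + pi_obs          # precisions add under Gaussian conjugacy
    weight = pi_obs / pi_post            # precision ratio acts as a learning rate
    pwpe = weight * prediction_error
    return mu_prior + pwpe, pi_post, pwpe


# Same surprising outcome, two hypothetical levels of prior precision
print(gaussian_update(0.0, 1.0, 1.0, 1.0))   # less precise prior: larger update
print(gaussian_update(0.0, 4.0, 1.0, 1.0))   # more precise prior: smaller update
```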

    Effect of the net radiation substitutes on maize and soybean evapotranspiration estimation using machine learning methods

    Accurate evapotranspiration (ET) estimation is essential for water management in crops, but it is not an easy task. Empirical ET methodologies require precise net radiation (Rn) measurements to obtain accurate results. However, Rn is not routinely measured at meteorological stations. This study therefore explored the use of machine learning algorithms to estimate daily ET with two Rn substitutes: the extraterrestrial solar radiation (Ra) and a modelled Rn (RnM). Support Vector Machine (SVM), Kernel Ridge (KR), Decision Tree (DT), Adaptive Boosting (AB), and Multilayer Perceptron (MLP) were applied to model FLUXNET Rn and ET observations. Adaptive Boosting gave the best results against field Rn measurements (RnO), yielding a root mean square error of about 16% of the mean observed Rn. The resulting modelled Rn (AB RnM) was then used to model daily crop ET with the above-mentioned machine learning methods, using RnO, AB RnM, and Ra together with meteorological variables and the NDVI index. The evaluated methods were suitable for estimating ET, yielding errors similar to those obtained with RnO when contrasted with ET observations. These results demonstrate that AB and KR are applicable with routine meteorological and satellite data to estimate ET.
    Affiliation: Venturini, Virginia. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Santa Fe; Argentina. Universidad Nacional del Litoral. Facultad de Ingeniería y Ciencias Hídricas; Argentina
    Affiliation: Walker, Elisabet. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Santa Fe; Argentina. Universidad Nacional del Litoral. Facultad de Ingeniería y Ciencias Hídricas; Argentina
    Affiliation: Fonnegra Mora, Diana Carolina. Universidad Nacional del Litoral. Facultad de Ingeniería y Ciencias Hídricas; Argentina
    Affiliation: Fagioli, Gianfranco. Kilimo S.A.; Argentina
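One of the Rn substitutes, extraterrestrial radiation Ra, can be computed from latitude and day of year alone. The abstract does not say how Ra was obtained, so the Python sketch below assumes the widely used FAO-56 formulation (solar constant 0.0820 MJ m-2 min-1). It also adds a helper for the error measure quoted above: RMSE as a fraction of the mean observation, so "about 16%" corresponds to a value near 0.16. For 20°S on 3 September (day 246) the result should be close to FAO-56's worked value of 32.2 MJ m-2 day-1.

```python
import math

GSC = 0.0820  # solar constant, MJ m-2 min-1 (FAO-56 value)


def extraterrestrial_radiation(lat_deg, day_of_year):
    """Daily extraterrestrial radiation Ra (MJ m-2 day-1), FAO-56 formulation."""
    phi = math.radians(lat_deg)
    b = 2 * math.pi * day_of_year / 365
    dr = 1 + 0.033 * math.cos(b)             # inverse relative Earth-Sun distance
    delta = 0.409 * math.sin(b - 1.39)       # solar declination (rad)
    # sunset hour angle; clamp handles polar day/night where |tan*tan| > 1
    x = max(-1.0, min(1.0, -math.tan(phi) * math.tan(delta)))
    ws = math.acos(x)
    return (24 * 60 / math.pi) * GSC * dr * (
        ws * math.sin(phi) * math.sin(delta)
        + math.cos(phi) * math.cos(delta) * math.sin(ws)
    )


def relative_rmse(observed, predicted):
    """RMSE expressed as a fraction of the mean observed value."""
    n = len(observed)
    rmse = math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)
    return rmse / (sum(observed) / n)


print(round(extraterrestrial_radiation(-20.0, 246), 1))
```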

    Characterisation of Malaysia agarwood oil (Aquilaria sp.) and comparison with different origins based on sensory studies

    Agarwood has wide applications in medicine, aromatherapy, perfume, cosmetics, and incense. One of the primary issues in the agarwood trading industry is the difficulty of identifying grade and quality accurately, as there is no standard reference. Therefore, this study aims to analyze and compare the chemical profiles of agarwood oil from different origins (Malaysia, India, Cambodia, and Thailand) using gas chromatography. Selected marker compounds, chosen based on the distribution of chemical constituents among the samples, were identified and validated using preparative gas chromatography (Prep-GC). As a preliminary assessment, the odor profiles of different grades of agarwood oil were evaluated with a fabricated electronic nose (E-nose). Extraction by fabricated hydrodistillation was designed using the Taguchi method. The results show that the highest yield, 1.05 g of agarwood oil, was achieved for sample EX8, extracted with 14 days of soaking time, 16 h of extraction time, and a soaking ratio of 1:8 (sample:water). Overall, sesquiterpenoids were identified as the major compounds in the high-grade sample, in contrast to low-grade agarwood oil. The major sesquiterpenoid compounds identified in the samples were norketoagarofuran, selina-4,11-dien-14-oic acid, epi-α-cadinol, kusunol, agarospirol, 10-epi-γ-eudesmol, α-agarofuran, guaia-1(10),11-dien-15-ol, α-eudesmol, bulnesol, guaiol, 9,11-eremophiladien-8-one, rotundone, and selina-3,11-dien-9-one. Agarospirol and n-hexadecanoic acid were selected as the marker compounds and further isolated using Prep-GC. Finally, the odor profiles of the agarwood oil samples were successfully developed by the E-nose based on sensory studies.
This study provides a reference for agarwood oil from different origins, specifically Malaysia, India, Cambodia, and Thailand, based on the distribution of chemical constituents, toward standardizing grade and quality. In addition, the study of odor profile responses from the fabricated E-nose provides fundamental results for developing a systematic instrument to assess the grade and quality of agarwood oil in the agarwood industry.
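Taguchi designs usually rank runs by a signal-to-noise (S/N) ratio rather than by the raw response. The abstract does not state which S/N ratio was used. Because oil yield is to be maximized, the Python sketch below assumes the "larger-the-better" form, S/N = -10 log10((1/n) Σ 1/y²). The run labels and replicate yields are hypothetical, and the orthogonal array itself is not shown.

```python
import math


def sn_larger_is_better(replicates):
    """Taguchi signal-to-noise ratio (dB) for a response to be maximised."""
    n = len(replicates)
    return -10 * math.log10(sum(1 / y ** 2 for y in replicates) / n)


# Hypothetical replicate yields (g) for three runs of a designed experiment
runs = {"run_a": [0.61, 0.58], "run_b": [0.95, 1.02], "run_c": [0.40, 0.90]}
ranking = sorted(runs, key=lambda r: sn_larger_is_better(runs[r]), reverse=True)
print(ranking)
```

Here run_c has a higher mean yield than run_a but ranks below it, because the S/N ratio penalizes inconsistent replicates.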