19 research outputs found

    In-situ measurements and GIS-based analysis of the microclimate at the Universiti Teknologi Malaysia, Kuala Lumpur

    Developing tropical countries are expected to see large population growth in the near future; thus, environmental degradation due to excessive economic development and urban climate change is becoming a major threat to modern society. To improve urban design and sustainable architecture in accordance with this specific tropical climate, a quantitative grasp of the microclimate in a developed city is highly desirable. Moreover, only a few studies have been carried out on these issues in low-latitude tropical urban regions. Therefore, this study aims to provide better insight into the use of in-situ microclimatic measurements and Geographical Information Systems (GIS), particularly in analysing the effect of greenery coverage and morphological aspects, i.e. the height-to-width ratio of built-up features, for understanding the microclimate pattern at a university campus. The study area is the Universiti Teknologi Malaysia, Kuala Lumpur (UTM KL), a local-scale city campus environment located near the Kuala Lumpur city centre. The urban microclimate was observed for one year. The climatic data were mapped and spatially analysed in relation to different land cover types in the GIS environment. Moreover, the effects of green areas and building morphology are critically evaluated with regard to the changes in the local climatic variables on the campus. This study reveals that greenery coverage and morphological characteristics provide a good indicator of the microclimate pattern in a developed city campus. In conclusion, with the support of in-situ measurements and GIS analysis, the campus temperature properties were quantitatively evaluated, and this directly contributed to a better understanding of climate change in the city of Kuala Lumpur

    Graphical Summaries of Circular Data with Outliers Using Python Programming Language

    Graphs are used in statistics to summarise and visualise data in pictorial form. A graphical summary lets us visualise the data in a simpler and more meaningful way so that it is easier to interpret. This study discusses graphical summaries of circular data with outliers. Most real-life applications use linear data. Circular data, by contrast, has a direction, and it differs from linear data in many aspects, such as descriptive statistics and statistical modeling. Unfortunately, statistical software that specialises in analysing circular data is very limited. In this study, graphical summaries of circular data are plotted using the in-demand programming language Python. Python code for generating graphical summaries of circular data, such as the circular dot plot and the rose diagram, is proposed. Historical circular data is used to illustrate the graphical summaries in the presence of outliers. This study will be helpful for those who are starting to explore circular data and choose Python as an analysis tool
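
As a rough illustration of why circular data needs its own descriptive statistics, the sketch below compares the arithmetic mean with the mean direction for two angles on either side of 0 degrees, and computes the sector counts that a rose diagram would draw as bar heights. It is not the code proposed in the paper; the function names `circular_mean_deg` and `rose_counts` and the default of 12 sectors are assumptions made here for illustration, and only the Python standard library is used.

```python
import math

def circular_mean_deg(angles_deg):
    """Mean direction in degrees, wrapped to the 0-360 range."""
    s = sum(math.sin(math.radians(a)) for a in angles_deg)
    c = sum(math.cos(math.radians(a)) for a in angles_deg)
    return math.degrees(math.atan2(s, c)) % 360.0

def rose_counts(angles_deg, n_bins=12):
    """Counts per equal-width sector: the bar heights of a rose diagram."""
    width = 360.0 / n_bins
    counts = [0] * n_bins
    for a in angles_deg:
        # "% n_bins" guards against a wrapped angle landing exactly on 360.0
        counts[int((a % 360.0) / width) % n_bins] += 1
    return counts

angles = [350.0, 10.0]
linear_mean = sum(angles) / len(angles)   # 180.0, pointing the wrong way
circ_mean = circular_mean_deg(angles)     # close to 0 (equivalently 360)
```

The two observations sit 20 degrees apart around north, yet the arithmetic mean points due south. The mean direction, computed from the summed sines and cosines, stays between them.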

    Review on circular-linear regression models

    Classical linear statistical methods are no longer appropriate when handling circular data, since the data is influenced by direction or angle. The possibility of circular data appearing as the dependent variable has led to the remodeling of the classical linear regression model into the circular-linear regression model over the past few decades. It is important to acknowledge these circular data characteristics, as they can affect both descriptive and inferential statistical analysis. With the growing body of literature on this issue, this paper reviews circular-linear regression models by highlighting and exploring their benefits and limitations

    A synthetic data generation procedure for univariate circular data with various outliers scenarios using Python programming language

    Synthetic data is artificial data created based on the statistical properties of the original data. The aim of this study is to generate synthetic (simulated) univariate circular data that follow the von Mises (VM) distribution with various outlier scenarios using the Python programming language. A procedure for formulating synthetic data generation is proposed. The synthetic data is generated from various combinations of seven sample sizes, n, and five concentration parameters, K. Moreover, the data generation procedure is formulated for different outlier conditions. Three outlier scenarios are proposed to introduce outliers into the synthetic dataset by placing them at a specific distance from the inliers. The number of outliers planted in the dataset is fixed at three. The synthetic data is randomly generated using the Python libraries and packages 'numpy', 'random' and 'von Mises'. In conclusion, synthetic univariate circular data from the von Mises distribution is generated, and the outliers are successfully introduced into the dataset under three outlier scenarios using Python. This study will be valuable for those who are interested in studying univariate circular data with outliers and choose Python as an analysis tool
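
The general idea of planting a fixed number of outliers in von Mises data can be sketched as follows. This is a minimal illustration, not the paper's procedure: it uses the standard library's `random.vonmisesvariate` instead of numpy, and the function name `generate_vm_with_outliers`, the `shift` parameter and the rule of drawing outliers around a shifted mean direction are all assumptions, since the abstract does not spell out the three scenarios.

```python
import math
import random

def generate_vm_with_outliers(n, kappa, mu=0.0, shift=math.pi,
                              n_outliers=3, seed=None):
    """Return (angles, is_outlier); angles are in radians."""
    rng = random.Random(seed)
    two_pi = 2.0 * math.pi
    # Inliers: n - n_outliers draws from VM(mu, kappa).
    inliers = [rng.vonmisesvariate(mu % two_pi, kappa)
               for _ in range(n - n_outliers)]
    # Outliers: same concentration, mean direction moved by `shift`
    # (hypothetical placement rule, not taken from the paper).
    out_mu = (mu + shift) % two_pi
    outliers = [rng.vonmisesvariate(out_mu, kappa)
                for _ in range(n_outliers)]
    angles = inliers + outliers
    is_outlier = [False] * len(inliers) + [True] * n_outliers
    return angles, is_outlier

angles, is_outlier = generate_vm_with_outliers(n=30, kappa=10.0, seed=1)
```

Looping this call over a grid of `n` and `kappa` values would give one dataset per combination, in the spirit of the sample-size and concentration grid the abstract describes.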

    The effect of different similarity distance measures in detecting outliers using single-linkage clustering algorithm for univariate circular biological data

    Clustering algorithms can be used to create an outlier detection procedure for univariate circular data. The circular distance between each pair of angular observations is used to calculate the similarity measure so that observations are grouped appropriately. In this paper, we present a clustering-based procedure for detecting outliers in univariate circular biological data using various similarity distance measures. Three circular similarity distance measures (Satari distance, Di distance and Chang-chien distance) were used to detect outliers with a single-linkage clustering algorithm. Satari distance and Di distance are two similarity measures that have similar formulas for univariate circular data. This study aims to develop the proposed clustering-based procedure and demonstrate its effectiveness in detecting outliers with various similarity distance measures. The circular similarity distance of SL-Satari/Di and other similarity measures, including SL-Chang, were compared at various dendrogram cutting points. It is found that a clustering-based procedure using a single-linkage algorithm with various similarity distances is a practical and promising approach to detecting outliers in univariate circular data, particularly biological data. According to the results, the SL-Satari/Di distance outperformed the SL-Chang distance for certain data conditions
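
A small sketch of the clustering idea is given below, under stated assumptions. The distance is the shorter arc between two angles; whether it coincides exactly with any of the named measures in the paper is not confirmed here. Cutting a single-linkage dendrogram at a given height yields the same groups as linking every pair of points whose distance does not exceed that height, which is what the union-find loop does. The names `flag_outliers` and `max_cluster_size`, and the rule that very small clusters are reported as outliers, are hypothetical choices for illustration.

```python
import math

def circular_distance(a, b):
    """Shorter arc between two angles in radians, in [0, pi]."""
    d = abs(a - b) % (2.0 * math.pi)
    return min(d, 2.0 * math.pi - d)

def single_linkage_labels(angles, cut_height):
    """Cluster labels from cutting a single-linkage tree at cut_height."""
    parent = list(range(len(angles)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    # Linking every pair no farther apart than the cut gives the same
    # groups as cutting the single-linkage dendrogram at that height.
    for i in range(len(angles)):
        for j in range(i + 1, len(angles)):
            if circular_distance(angles[i], angles[j]) <= cut_height:
                parent[find(i)] = find(j)
    return [find(i) for i in range(len(angles))]

def flag_outliers(angles, cut_height, max_cluster_size=1):
    """Indices of points in clusters of at most max_cluster_size points."""
    labels = single_linkage_labels(angles, cut_height)
    sizes = {}
    for lab in labels:
        sizes[lab] = sizes.get(lab, 0) + 1
    return [i for i, lab in enumerate(labels)
            if sizes[lab] <= max_cluster_size]

angles = [0.10, 0.20, 0.15, 6.20, 3.10]   # 6.20 rad sits next to 0.10 rad
outliers = flag_outliers(angles, cut_height=0.5)
```

The observation at 6.20 rad joins the main cluster because the distance wraps around the circle, while the one at 3.10 rad is left on its own and is flagged.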

    A review on outliers-detection methods for multivariate data

    Data in practice are often high-dimensional and multivariate in nature. Detection of outliers has long been one of the problems in multivariate analysis. Detecting outliers in multivariate data is difficult, and graphical inspection alone is not sufficient. This paper gives a brief, nontechnical review of outlier detection methods for multivariate data: the projection pursuit method, methods based on robust distance, and cluster analysis. The strengths and weaknesses of each method are briefly discussed

    Comparison of Robust Estimators’ Performance for Detecting Outliers in Multivariate Data

    In multivariate data, outliers are difficult to detect, especially as the dimension of the data increases. Mahalanobis distance (MD) has been one of the classical methods for detecting outliers in multivariate data. However, the classical mean and covariance matrix in MD suffer from masking and swamping effects if the data contain outliers. Because of this problem, many studies use a robust estimator instead of the classical estimator of the mean and covariance matrix. In this study, the performance of five robust estimators, namely Fast Minimum Covariance Determinant (FMCD), Minimum Vector Variance (MVV), Covariance Matrix Equality (CME), Index Set Equality (ISE), and Test on Covariance (TOC), is investigated and compared. FMCD has been widely used and is known as one of the best robust estimators. However, there are certain conditions under which FMCD still falls short. MVV, CME, ISE and TOC are innovations based on FMCD: these four robust estimators improve the last step of the FMCD algorithm. Hence, the objective of this study is to observe the performance of these five estimators in detecting outliers in multivariate data, particularly TOC, as it is the latest robust estimator. Simulation studies are conducted for two outlier scenarios with various conditions. Three performance measures, pout, pmask and pswamp, are used to measure the performance of the robust estimators. It is found that TOC gives better performance in pswamp for most conditions and better results for pout and pmask for certain conditions
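
For readers unfamiliar with the classical approach that the robust estimators replace, here is a minimal sketch of the Mahalanobis distance for bivariate data, computed from the ordinary sample mean and covariance matrix. It does not implement FMCD, MVV, CME, ISE or TOC. The function name `mahalanobis_2d` is hypothetical, and the chi-square 0.975 cutoff is a common convention assumed here, not something stated in the abstract.

```python
import math

def mahalanobis_2d(points):
    """Classical Mahalanobis distance of each (x, y) from the sample mean."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    # Sample covariance matrix [[sxx, sxy], [sxy, syy]] with divisor n - 1.
    sxx = sum((x - mx) ** 2 for x, _ in points) / (n - 1)
    syy = sum((y - my) ** 2 for _, y in points) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in points) / (n - 1)
    det = sxx * syy - sxy ** 2
    if det <= 0.0:
        raise ValueError("covariance matrix is singular")
    dists = []
    for x, y in points:
        dx, dy = x - mx, y - my
        # Quadratic form d' S^{-1} d using the closed-form 2x2 inverse.
        d2 = (syy * dx * dx - 2.0 * sxy * dx * dy + sxx * dy * dy) / det
        dists.append(math.sqrt(max(d2, 0.0)))
    return dists

# The chi-square 0.975 quantile with 2 degrees of freedom has the closed
# form -2 * ln(1 - 0.975); its square root is the distance cutoff.
CUTOFF_2D = math.sqrt(-2.0 * math.log(1.0 - 0.975))

points = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0), (2.0, 2.0)]
distances = mahalanobis_2d(points)
flagged = [i for i, d in enumerate(distances) if d > CUTOFF_2D]
```

Because the mean and covariance are estimated from all points, a group of outliers can pull both estimates toward itself and shrink its own distances, which is the weakness that motivates substituting robust estimates in the same formula.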

    Comparison of robust estimators for detecting outliers in multivariate datasets

    Detecting outliers in multivariate data is difficult and cannot be done by visual inspection. Mahalanobis distance (MD) has been a classical method for detecting outliers in multivariate data. However, the classical mean and covariance matrix in MD suffer from masking and swamping effects. Masking occurs when outliers are not identified, and swamping occurs when inliers are identified as outliers. Hence, robust estimators have been proposed to overcome these problems. In this study, the performance of a new robust estimator named Test on Covariance (TOC) is tested and compared with other robust estimators, namely Fast Minimum Covariance Determinant (FMCD), Minimum Vector Variance (MVV), Covariance Matrix Equality (CME) and Index Set Equality (ISE). The performance of these five robust estimators is tested on five real multivariate datasets. The Brain and weight, Hawkins-Bradu-Kass, Stackloss, Bushfire and Milk datasets were used, as these five real datasets are well known in outlier detection studies. Results show that TOC is able to detect outliers, does not have a masking effect and has the same performance as the other robust estimators on all datasets
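
The definitions of masking and swamping given above translate directly into two rates when the true outliers are known. The sketch below is one plausible way to compute them; the function name `masking_swamping_rates` is hypothetical, and it is an assumption that these rates correspond to the performance measures used in the study.

```python
def masking_swamping_rates(is_true_outlier, is_flagged):
    """Return (masking_rate, swamping_rate) for one labelled dataset."""
    pairs = list(zip(is_true_outlier, is_flagged))
    n_out = sum(1 for t, _ in pairs if t)
    n_in = len(pairs) - n_out
    # Masked: true outliers the detector failed to flag.
    missed = sum(1 for t, f in pairs if t and not f)
    # Swamped: inliers the detector wrongly flagged.
    false_alarms = sum(1 for t, f in pairs if f and not t)
    masking = missed / n_out if n_out else 0.0
    swamping = false_alarms / n_in if n_in else 0.0
    return masking, swamping

truth   = [False, False, False, False, True, True]
flagged = [False, True,  False, False, True, False]
masking, swamping = masking_swamping_rates(truth, flagged)
```

In this toy case one of the two true outliers is missed and one of the four inliers is wrongly flagged.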

    Synthetic multivariate data generation procedure with various outlier scenarios using R programming language

    A synthetic data generation procedure generates data from either a statistical or a mathematical model. Such procedures are used in simulation studies to compare the performance of statistical methods or to propose a new statistical method for a specific distribution. This study formulates a synthetic multivariate data generation procedure with various outlier scenarios using R. An outlier generating model is used to generate multivariate data that contain outliers. Data generation procedures for the various outlier scenarios in R are explained. Three outlier scenarios are produced, and graphical representations of these scenarios using 3D scatterplots and Chernoff faces are shown. The graphical representations show that in Outlier Scenario 1, as the distance between outliers and inliers increases through shifting the mean, the outliers and inliers become completely separated. The same pattern can be seen in Outlier Scenario 2 when the distance between outliers and inliers increases through shifting the covariance. For Outlier Scenario 3, when both values increase, the separation of outliers and inliers is more apparent. The data generation procedure in this study will continue to be used in other applications, such as identifying outliers with clustering methods
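
The mean-shift and covariance-shift scenarios can be sketched as below. This is an illustration in Python, not the R procedure formulated in the study, and several choices are assumptions: inliers are drawn from a standard normal with identity covariance, outliers have every coordinate shifted by `mean_shift` and the covariance multiplied by `cov_scale`, and the function name `generate_with_outliers` is hypothetical. Setting both parameters away from their defaults would combine the two shifts.

```python
import random

def generate_with_outliers(n, p, n_outliers, mean_shift=0.0,
                           cov_scale=1.0, seed=None):
    """Return (rows, is_outlier); each row is a list of p values."""
    rng = random.Random(seed)
    # Inliers: independent standard normal coordinates, i.e. N(0, I).
    inliers = [[rng.gauss(0.0, 1.0) for _ in range(p)]
               for _ in range(n - n_outliers)]
    # Outliers: N(mean_shift * 1, cov_scale * I). cov_scale multiplies
    # the variance, so the standard deviation is its square root.
    sd = cov_scale ** 0.5
    outliers = [[rng.gauss(mean_shift, sd) for _ in range(p)]
                for _ in range(n_outliers)]
    rows = inliers + outliers
    is_outlier = [False] * len(inliers) + [True] * n_outliers
    return rows, is_outlier

# Mean shifted only, then covariance inflated only.
shifted, labels = generate_with_outliers(100, 3, 20, mean_shift=10.0, seed=0)
inflated, _ = generate_with_outliers(100, 3, 20, cov_scale=25.0, seed=0)
```

With a large `mean_shift` the outlier rows sit in a separate cloud, which matches the separation the abstract reports as the distance grows.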

    Graphical user interface for statistical characteristics of skull morphology in syndromic craniosynostosis

    Circular data, such as skull angles, can be found in the biomedical field. Biomedical data are often complex in structure and prone to abnormalities. This case study concerns a congenital disorder called craniosynostosis syndrome, which results in skull growth abnormalities. Twelve skull angles of craniosynostosis patients aged 0-12 years in Malaysia are analysed using circular statistics methods. The raw CT scan data is provided by UM Specialist Centre. The statistical characteristics of skull morphology in syndromic craniosynostosis are displayed and compared with normal skull data of Malaysian children aged 0-12 years. A Graphical User Interface (GUI) is developed using Python to give users a specific statistical analysis of the skull morphology characteristics of craniosynostosis syndrome patients in Malaysia