
    Quantitative analysis of sold and unsold Forest Service timber offerings in Region 1


    Nonparametric Two-Group Classification: Concepts and a SAS-Based Software Package

    In this paper, we introduce BestClass, a set of SAS macros, available in the mainframe and workstation environment, designed for solving two-group classification problems using a class of recently developed nonparametric classification methods. The criteria used to estimate the classification function are based on either minimizing a function of the absolute deviations from the surface which separates the groups, or directly minimizing a function of the number of misclassified entities in the training sample. The solution techniques used by BestClass to estimate the classification rule utilize the mathematical programming routines of the SAS/OR software. Recently, a number of research studies have reported that under certain data conditions this class of classification methods can provide more accurate classification results than existing methods, such as Fisher's linear discriminant function and logistic regression. However, these robust classification methods have not yet been implemented in the major statistical packages, and hence are beyond the reach of those statistical analysts who are unfamiliar with mathematical programming techniques. We use a limited simulation experiment and an example to compare and contrast properties of the methods included in BestClass with existing parametric and nonparametric methods. We believe that BestClass contributes significantly to the field of nonparametric classification analysis, in that it provides the statistical community with convenient access to this recently developed class of methods. BestClass is available from the authors.

    Robust discriminant analysis.

    Robust discriminant analysis. Discriminant analysis, and the associated classification rules, is widely used in practice. Consider, for example, the marketing department of a bank that is trying to sell investment funds to new customers. Since the bank naturally does not want to bother new customers unnecessarily, it wants to make sure that the advertising reaches only those who may be interested. Discriminant analysis can help here, but it must be taken into account that the bank's database can be very large and can contain many atypical observations, also called outliers. In discriminant analysis, one tries to construct a rule that allows multivariate observations to be assigned to different groups. This rule is built from a training sample, which is a set of observations whose group membership is already known. As an example, the training sample could be a set of clients, some of whom already hold investment funds and some of whom do not. For all these clients a few relevant characteristics are measured, such as savings, household income, information about outstanding loans, number of children, and so on. Using this information, a discriminant rule can be constructed and applied to new clients for whom only the characteristics are known and who were not in the training sample. These clients can then be assigned to one of the groups. Only the new clients assigned to the group of people interested in investment funds will receive the advertising. Many other applications of discriminant analysis can of course be found in economics, biology, medicine, and so on.
The classical discriminant rules, however, can be strongly influenced by the presence of a few outliers in the training sample, which can make the results unreliable. There is therefore a need for robust alternatives that behave more stably in the presence of outliers in the data. Results for robust discriminant analysis have already been given in the literature, but these were mostly limited to linear discriminant analysis and to the case of only two groups. This thesis also studies robust nonlinear discriminant rules, such as quadratic and logistic rules. It also provides an extension to discriminant analysis for multiple groups. It can, for example, be very interesting to distinguish groups of investors according to the characteristics of the people in those groups. In this thesis, new discriminant procedures were developed that behave robustly in the presence of outliers and give the smallest possible probability of misclassification. Statistical properties were derived for the various methods and presented in the different articles that have already been published or submitted for publication. They all fall within the domain of robust statistics. In addition to robust discriminant analysis, attention was also paid to robust time series analysis.

    Quantitative acoustic differentiation of cryptic species illustrated with King and Clapper rails

    Reliable species identification is vital for survey and monitoring programs. Recently, the development of digital technology for recording and analyzing vocalizations has assisted in acoustic surveying for cryptic, rare, or elusive species. However, the quantitative tools that exist for species differentiation are still being refined. Using vocalizations recorded in the course of ecological studies of a King Rail (Rallus elegans) and a Clapper Rail (Rallus crepitans) population, we assessed the accuracy and effectiveness of three parametric (logistic regression, discriminant function analysis, quadratic discriminant function analysis) and six nonparametric (support vector machine, CART, Random Forest, k-nearest neighbor, weighted k-nearest neighbor, and neural networks) statistical classification methods for differentiating these species by their kek mating call. We identified 480 kek notes of each species and quantitatively characterized them with five standardized acoustic parameters. Overall, nonparametric classification methods outperformed parametric classification methods for species differentiation (nonparametric tools were between 57% and 81% accurate, parametric tools were between 57% and 60% accurate). Of the nine classification methods, Random Forest was the most accurate and precise, resulting in 81.1% correct classification of kek notes to species. This suggests that the mating calls of these sister species are likely difficult for human observers to tell apart. However, it also implies that appropriate statistical tools may allow reasonable species-level classification accuracy of recorded calls and provide an alternative to species classification where other capture- or genotype-based survey techniques are not possible.

    Performance Evaluation of Logistic Regression, Linear Discriminant Analysis, and Classification and Regression Trees Under Controlled Conditions

    Logistic Regression (LR), Linear Discriminant Analysis (LDA), and Classification and Regression Trees (CART) are common classification techniques for prediction of group membership. Since these methods are applied for similar purposes with different procedures, it is important to evaluate the performance of these methods under different controlled conditions. With this information in hand, researchers can apply the optimal method for certain conditions. Following previous research which reported the effects of conditions such as sample size, homogeneity of variance-covariance matrices, effect size, and predictor distributions, this research focused on effects of correlation between predictor variables, number of the predictor variables, number of the groups in the outcome variable, and group size ratios for the performance of LDA, LR, and CART. Data were simulated with Monte Carlo procedures in R statistical software and a factorial ANOVA with follow-ups was employed to evaluate the effect of conditions on the performance of each technique as measured by proportions of correctly predicted observations for all groups and for the smallest group. In most of the conditions for the two outcome measures, higher performances of CART than LDA and LR were observed. But, in some conditions where there were a higher number of predictor variables and number of groups with low predictor variable correlation, superiority of LR to CART was observed. Meaningful effects of methods of correlation, number of predictor variables, group numbers and group size ratio were observed on prediction accuracy of group membership. Effects of correlation, group size ratio, group number, and number of predictor variables on prediction accuracies were higher for LDA and LR than CART. For the three methods, lower correlation and greater number of predictor variables yielded higher prediction accuracies.
Having balanced data rather than imbalanced data and greater group numbers led to lower group membership prediction accuracies for all groups, but having more groups led to better predictions for the small group. In general, based on these results, researchers are encouraged to apply CART in most conditions except for the cases when there are many predictor variables (around 10 or more) and non-binary groups with low correlations between predictor variables, when LR might provide more accurate results.
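
As an illustration of the two outcome measures named in the abstract above (the proportion of correctly predicted observations over all groups and for the smallest group), a minimal Python sketch follows. The function name `accuracy_measures` and the toy labels are assumptions made for this example; they are not taken from the study, which ran its simulations in R.

```python
from collections import Counter


def accuracy_measures(actual, predicted):
    """Return (overall accuracy, accuracy within the smallest actual group).

    Hypothetical helper for illustration only. If several groups tie for
    smallest, the one encountered first in `actual` is used.
    """
    if not actual or len(actual) != len(predicted):
        raise ValueError("actual and predicted must be non-empty and equal in length")

    # Proportion of all observations assigned to their true group.
    overall = sum(a == p for a, p in zip(actual, predicted)) / len(actual)

    # Identify the smallest group by its count in the true labels.
    sizes = Counter(actual)
    smallest = min(sizes, key=sizes.get)

    # Proportion of that group's members that were predicted correctly.
    hits = sum(a == p for a, p in zip(actual, predicted) if a == smallest)
    return overall, hits / sizes[smallest]
```

With six observations in group "a" and two in group "b", a classifier that gets five of the "a" cases and one of the "b" cases right scores 0.75 overall but only 0.5 on the smallest group, which is why the abstract reports both measures.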

    Assessing and Analyzing Bat Activity with Acoustic Monitoring: Challenges and Interpretations

    Acoustic monitoring is a powerful technique for learning about the ecology of bats, but understanding sources of variation in the data collected is important for unbiased interpretation. The objectives of this dissertation were to investigate sources of variation in acoustic monitoring and make recommendations for acoustic survey design and analysis. I addressed this goal in three ways: i) variation resulting from differences in bat detectors, ii) methods for objective identification of peak activity, and iii) the use of stationary transects to address within-site spatial variation. First, I compared variation of detection of echolocation calls among commonly available bat detectors and found significant differences in distance and angle of detection. Consequently, this source of variation should be taken into account when comparing datasets obtained with different systems. Furthermore, choice of detector should be taken into account when designing new studies. Second, I investigated two statistical methods for identifying peaks in activity, percentile thresholds and space-time scan statistic (SaTScan). Acoustic monitoring provides a relative measure of activity levels and is rarely evaluated based on objective criteria, so describing bat activity as “high” or “low” is useful only in context of the studies in question. Percentile thresholds allow for peaks to be identified relative to a larger distribution of activity levels. SaTScan identifies peaks in space and time that are significantly higher than the background expectation of the dataset. Both methods are valuable tools for replicable and objective identification of peak activity that can be applied at various temporal and spatial scales. Third, I examine how within-site spatial variation can impact estimates of bat activity. 
I used a stationary transect of bat detectors to i) assess variation in patterns of activity at each detector, ii) test whether spatial or temporal factors were more important for explaining variation in activity, and iii) explore what sampling effort in space and time is required to estimate species-specific activity levels. The picture of activity differs significantly within a site depending on detector placement, so it is important to use multiple detectors simultaneously to collect accurate estimates of activity.
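
The percentile-threshold idea described in the abstract above can be sketched as follows, assuming a list of nightly call counts from a single detector. The function name `peak_nights`, the 90th-percentile cutoff, and the interpolation method are hypothetical choices for this example; the dissertation's actual thresholds and its SaTScan analysis are not reproduced here.

```python
from statistics import quantiles


def peak_nights(counts, percentile=90):
    """Return (threshold, indices of nights whose count exceeds it).

    Hypothetical helper: a night counts as a peak when its activity is
    strictly above the chosen percentile of the whole series.
    """
    if not 1 <= percentile <= 99:
        raise ValueError("percentile must be between 1 and 99")

    # quantiles(..., n=100) returns the 99 cut points between percentiles,
    # so the p-th percentile sits at index p - 1.
    cuts = quantiles(counts, n=100, method="inclusive")
    threshold = cuts[percentile - 1]

    peaks = [i for i, c in enumerate(counts) if c > threshold]
    return threshold, peaks
```

Because the threshold is derived from the distribution of the dataset itself, "high" activity is defined relative to the other nights in the series rather than by an absolute call count.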

    South African cranial variation : a combined metric-macromorphoscopic method for ancestry estimation

    Thesis (PhD (Anatomy))--University of Pretoria, 2023. Ancestry is a fundamental parameter of the biological profile. To date, South African forensic anthropologists are only able to successfully apply a metric approach to estimate ancestry from skeletal remains. While a non-metric, or macromorphoscopic (MMS) approach exists, limited research has been conducted to explore its use in a South African population. The method has not been sufficiently tested and validated, which is required for anthropological methodology to be compliant with standards of best practice. This study aimed to explore the MMS traits and their covariation with cranial measurements to develop improved methodology for the estimation of ancestry from skeletal remains in South Africa. A suite of 17 MMS traits and 25 standard linear measurements were collected from 660 crania of black, white and coloured South Africans. Inter- and intra-observer agreement was closely scrutinized as visual methods have been shown to be prone to error. The intra-observer agreement ranged from moderate to perfect, with three traits (inferior nasal margin, nasal bone shape, and nasal overgrowth) yielding slightly lower repeatability. Inter-observer agreement was assessed among five individuals with varying levels of general experience and familiarity with the traits. Overall, the observers demonstrated poor to substantial agreement. A group discussion on the scoring procedure, followed by subsequent rescoring of the crania showed a slight increase in overall agreement, with kappa values ranging between moderate and substantial. While general experience does not appear to translate to proficiency with the method, familiarity with the traits and scoring procedure contributes to consistent scores. Thus, method-specific training is essential prior to employing the MMS traits in practice.
Technical error of measurement was used to assess the repeatability of the measurements, where the intra-observer error was noted to be lower than the inter-observer error. The greatest disparity was observed with the inter-orbital breadth and mastoid height for both the inter- and intra-observer assessments. The MMS trait frequency distributions revealed substantial group variation and overlap. Ultimately, not a single trait can be considered characteristic of any one population group. Kruskal-Wallis and Dunn’s tests demonstrated significant population differences for 13 of the 17 traits. Black and coloured South Africans, and coloured and white South Africans shared similarities for many of the traits, but black and white South Africans did not present with significant overlap for any trait. ANOVA and Tukey’s honestly significant difference (HSD) test revealed that all measurements were significantly different for ancestry, except the foramen magnum length. Substantial variation and overlap were observed for the measurements among all three groups. Random Forest Modelling (RFM) was used to develop classification models to assess the reliability and accuracy of the variables in identifying ancestry. Models were created for the traits and measurements separately to gauge the discriminatory power of each dataset. A combined model including all data was also created to test if mixed data can better capture cranial variation than individual methods. The MMS model outperformed the metric model, with classification accuracies of 79% and 72%, respectively. Ultimately, the best results were obtained with the mixed model, which yielded an accuracy of 81%. The results indicate that the combination of size and shape data (as quantified with the mixed model) can effectively distinguish between black, white and coloured South Africans despite significant group overlap. 
Thus, this study has shown the MMS traits to be a valid and tested method, and the population-specific data from this study can be used to add MMS analyses to forensic casework and skeletal analyses in South Africa. Anatomy; PhD (Anatomy); Unrestricted; Faculty of Health Sciences; SDG-16: Peace, justice and strong institutions.