
    Bayesian hierarchical modeling for the forensic evaluation of handwritten documents

    The analysis of handwritten evidence has been used widely in courts in the United States since the 1930s (Osborn, 1946). Traditional evaluations are conducted by trained forensic examiners. More recently, there has been a movement toward objective and probability-based evaluation of evidence, and a variety of governing bodies have made explicit calls for research to support the scientific underpinnings of the field (National Research Council, 2009; President's Council of Advisors on Science and Technology (US), 2016; National Institute of Standards and Technology). This body of work makes contributions to help satisfy those needs for the evaluation of handwritten documents. We develop a framework to evaluate a questioned writing sample against a finite set of genuine writing samples from known sources. Our approach is fully automated, reducing the opportunity for cognitive biases to enter the analysis pipeline through regular examiner intervention. Our methods are able to handle all writing styles together, and result in estimated probabilities of writership based on parametric modeling. We contribute open-source datasets, code, and algorithms. A document is prepared for the evaluation process by first being scanned and stored as an image file. The image is processed and the text within is decomposed into a sequence of disjoint graphical structures. The graphs serve as the smallest unit of writing we will consider, and features extracted from them are used as data for modeling. Chapter 2 describes the image processing steps and introduces a distance measure for the graphs. The distance measure is used in a K-means clustering algorithm (Forgy, 1965; Lloyd, 1982; Gan and Ng, 2017), which results in a clustering template with 40 exemplar structures. The primary feature we extract from each graph is a cluster assignment. We obtain it by comparing each graph to the template and assigning it to the exemplar it most resembles in structure. The cluster assignment feature is used for a writer identification exercise using a Bayesian hierarchical model on a small set of 27 writers. In Chapter 3 we incorporate new data sources and a larger number of writers in the clustering algorithm to produce an updated template. A mixture component is added to the hierarchical model and we explore the relationship between a writer's estimated mixing parameter and their writing style. In Chapter 4 we expand the hierarchical model to include other graph-based features, in addition to cluster assignments. We incorporate an angular feature with support on the polar coordinate system into the hierarchical modeling framework using a circular probability density function. The new model is applied and tested in three applications.
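
The cluster-assignment step described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the graph distance measure, so Euclidean distance on made-up two-dimensional feature vectors stands in for it, and the names `assign_clusters` and `cluster_counts` are hypothetical. The template here has three exemplars rather than the 40 reported in the abstract.

```python
import math


def euclidean(a, b):
    """Stand-in distance; the thesis defines its own measure on graphs."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def assign_clusters(graphs, template, distance=euclidean):
    """Assign each graph to the index of its nearest template exemplar."""
    assignments = []
    for g in graphs:
        nearest = min(range(len(template)),
                      key=lambda k: distance(g, template[k]))
        assignments.append(nearest)
    return assignments


def cluster_counts(assignments, n_clusters):
    """Count how many graphs fall in each cluster for one document."""
    counts = [0] * n_clusters
    for k in assignments:
        counts[k] += 1
    return counts


# Hypothetical toy data: three exemplars and four graphs from one document.
template = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0)]
graphs = [(0.1, -0.2), (0.9, 1.2), (4.0, 6.0), (1.1, 0.8)]

labels = assign_clusters(graphs, template)
counts = cluster_counts(labels, len(template))
print(labels)  # [0, 1, 2, 1]
print(counts)  # [1, 2, 1]
```

Per-document cluster counts of this kind are one plausible form for the data that a hierarchical model of writership could take as input; the abstract itself does not state the exact likelihood used.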

    Trajectory Clustering and an Application to Airspace Monitoring

    This paper presents a framework aimed at monitoring the behavior of aircraft in a given airspace. Nominal trajectories are determined and learned using data driven methods. Standard procedures are used by air traffic controllers (ATC) to guide aircraft, ensure the safety of the airspace, and to maximize the runway occupancy. Even though standard procedures are used by ATC, the control of the aircraft remains with the pilots, leading to a large variability in the flight patterns observed. Two methods to identify typical operations and their variability from recorded radar tracks are presented. This knowledge base is then used to monitor the conformance of current operations against operations previously identified as standard. A tool called AirTrajectoryMiner is presented, aiming at monitoring the instantaneous health of the airspace, in real time. The airspace is "healthy" when all aircraft are flying according to the nominal procedures. A measure of complexity is introduced, measuring the conformance of current flight to nominal flight patterns. When an aircraft does not conform, the complexity increases as more attention from ATC is required to ensure a safe separation between aircraft. Comment: 15 pages, 20 figures

    A taxonomy framework for unsupervised outlier detection techniques for multi-type data sets

    The term "outlier" can generally be defined as an observation that is significantly different from the other values in a data set. Outliers may be instances of error or may indicate events. Outlier detection aims to identify such observations in order to improve the analysis of data and to discover interesting and useful knowledge about unusual events in numerous application domains. In this paper, we report on contemporary unsupervised outlier detection techniques for multiple types of data sets and provide a comprehensive taxonomy framework and two decision trees for selecting the most suitable technique for a given data set. Furthermore, we highlight the advantages, disadvantages and performance issues of each class of outlier detection techniques under this taxonomy framework.

    Statistical Analysis of Dynamic Actions

    Real-world action recognition applications require systems that are fast, can handle a large variety of actions without a priori knowledge of the type of actions, need a minimal number of parameters, and require as short a learning stage as possible. In this paper, we suggest such an approach. We regard dynamic activities as long-term temporal objects, which are characterized by spatio-temporal features at multiple temporal scales. Based on this, we design a simple statistical distance measure between video sequences which captures the similarities in their behavioral content. This measure is nonparametric and can thus handle a wide range of complex dynamic actions. Having a behavior-based distance measure between sequences, we use it for a variety of tasks, including video indexing, temporal segmentation, and action-based video clustering. These tasks are performed without prior knowledge of the types of actions, their models, or their temporal extents.

    A hybrid generative/discriminative framework to train a semantic parser from an un-annotated corpus

    We propose a hybrid generative/discriminative framework for semantic parsing which combines the hidden vector state (HVS) model and the hidden Markov support vector machines (HM-SVMs). The HVS model is an extension of the basic discrete Markov model in which context is encoded as a stack-oriented state vector. The HM-SVMs combine the advantages of the hidden Markov models and the support vector machines. By employing a modified K-means clustering method, a small set of the most representative sentences can be automatically selected from an un-annotated corpus. These sentences, together with their abstract annotations, are used to train an HVS model, which can subsequently be applied to the whole corpus to generate semantic parsing results. The most confident semantic parsing results are selected to generate a fully annotated corpus, which is used to train the HM-SVMs. The proposed framework has been tested on the DARPA Communicator Data. Experimental results show an improvement over the baseline HVS parser when using the hybrid framework. When compared with the HM-SVMs trained from the fully annotated corpus, the hybrid framework gave a comparable performance with only a small set of lightly annotated sentences.

    Network anomaly detection: a survey and comparative analysis of stochastic and deterministic methods

    7 pages. 1 more figure than final CDC 2013 version. We present five methods for the problem of network anomaly detection. These methods cover most of the common techniques in the anomaly detection field, including Statistical Hypothesis Tests (SHT), Support Vector Machines (SVM) and clustering analysis. We evaluate all methods in a simulated network that consists of nominal data, three flow-level anomalies and one packet-level attack. Through analyzing the results, we point out the advantages and disadvantages of each method and conclude that combining the results of the individual methods can yield improved anomaly detection results.