
    Information Retrieval Performance Enhancement Using The Average Standard Estimator And The Multi-criteria Decision Weighted Set

    Information retrieval is much more challenging than traditional small document collection retrieval. The main difference is the importance of correlations between related concepts in complex data structures. These structures have been studied by several information retrieval systems. This research began by performing a comprehensive review and comparison of several techniques of matrix dimensionality estimation and their respective effects on enhancing retrieval performance using singular value decomposition and latent semantic analysis. Two novel techniques have been introduced in this research to enhance intrinsic dimensionality estimation: the Multi-criteria Decision Weighted model, which estimates matrix intrinsic dimensionality for large document collections, and the Average Standard Estimator (ASE), which estimates data intrinsic dimensionality based on the singular value decomposition (SVD). ASE estimates the level of significance of the singular values resulting from the SVD. ASE assumes that those variables with deep relations have sufficient correlation and that only those relationships with high singular values are significant and should be maintained. Experimental results over all possible dimensions indicated that ASE improved matrix intrinsic dimensionality estimation by including the effect of both the magnitude of decrease in singular values and random noise distractors. Analysis based on selected performance measures indicates that for each document collection there is a region of lower dimensionalities associated with improved retrieval performance. However, there was clear disagreement between the various performance measures on the model associated with best performance.
The introduction of the multi-weighted model and Analytical Hierarchy Processing (AHP) analysis helped in ranking dimensionality estimation techniques and facilitated satisfying overall model goals by leveraging contradicting constraints and satisfying information retrieval priorities. ASE provided the best estimate of MEDLINE intrinsic dimensionality among all dimensionality estimation techniques, and further, ASE improved precision and relative relevance by 10.2% and 7.4% respectively. AHP analysis indicates that ASE and the weighted model ranked the best among other methods with 30.3% and 20.3% in satisfying overall model goals in MEDLINE and 22.6% and 25.1% for CRANFIELD. The weighted model improved MEDLINE relative relevance by 4.4%, while the scree plot, weighted model, and ASE provided better estimation of data intrinsic dimensionality for the CRANFIELD collection than Kaiser-Guttman and Percentage of variance. The ASE dimensionality estimation technique provided a better estimation of CISI intrinsic dimensionality than all other tested methods, since all methods except ASE tend to underestimate the intrinsic dimensionality of the CISI document collection. ASE improved CISI average relative relevance and average search length by 28.4% and 22.0% respectively. This research provided evidence that a system using a weighted multi-criteria performance evaluation technique achieves better overall performance than a single-criterion ranking model. Thus, the weighted multi-criteria model with dimensionality reduction provides a more efficient implementation for information retrieval than using a full rank model.
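
As a rough illustration of two of the baseline estimators named above (Kaiser-Guttman and Percentage of variance), the sketch below picks a dimensionality from a list of singular values. It is not the ASE or the weighted model, whose details the abstract does not give. The function names, the 0.9 variance threshold, and the use of the mean eigenvalue as the Kaiser-Guttman cutoff (the common generalization of the "eigenvalue greater than 1" rule) are assumptions made for this example.

```python
def kaiser_guttman(singular_values):
    """Count components whose eigenvalue (squared singular value)
    exceeds the mean eigenvalue. Assumed variant of the rule."""
    eigenvalues = [s * s for s in singular_values]
    mean_eigenvalue = sum(eigenvalues) / len(eigenvalues)
    return sum(1 for e in eigenvalues if e > mean_eigenvalue)


def percentage_of_variance(singular_values, threshold=0.9):
    """Smallest k whose leading eigenvalues explain at least
    `threshold` of the total variance (threshold is an assumption)."""
    eigenvalues = sorted((s * s for s in singular_values), reverse=True)
    total = sum(eigenvalues)
    running = 0.0
    for k, e in enumerate(eigenvalues, start=1):
        running += e
        if running / total >= threshold:
            return k
    return len(eigenvalues)


# Hypothetical singular values, for illustration only.
example = [10.0, 5.0, 2.0, 1.0, 0.5]
k_kg = kaiser_guttman(example)
k_pv = percentage_of_variance(example, threshold=0.9)
```

On the same spectrum the two rules can disagree, which mirrors the abstract's point that different criteria favor different dimensionalities.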

    Data-Driven Process Discovery: A Discrete Time Algebra for Relational Signal Analysis

    This research presents an autonomous and computationally tractable method for scientific process analysis, combining an iterative algorithmic search and a recognition technique to discover multivariate linear and non-linear relations within experimental data series. These resultant data-driven relations provide researchers with a potentially real-time insight into experimental process phenomena and behavior. This method enables the efficient search of a potentially infinite space of relations within large data series to identify relations that accurately represent process phenomena. Proposed is a time series transformation that encodes and compresses real-valued data into a well-defined discrete space of 13 primitive elements where comparative evaluation between variables is both plausible and heuristically efficient. Additionally, this research develops and demonstrates binary discrete-space operations which accurately parallel their numeric-space equivalents. These operations extend the method's utility into trivariate relational analysis, and experimental evidence is offered supporting the existence of traceable multivariate signatures of incremental order within the discrete space that can be exploited for higher dimensional analysis by means of an iterative best-n first search.

    Assumptions underlying behavioral linkage revisited: A multidimensional approach to ascertaining individual differentiation and consistency in serial rape

    While behavioral evidence has been used in investigations to help link and solve serial offenses for centuries, the empirical and theoretical grounds for whether and how to use this evidence effectively have begun to emerge only in recent years. In order for behavioral crime linking to be validated, two base assumptions must be met: individual differentiation (i.e., that offenses committed by one offender will be distinctly different from those committed by another offender) and consistency (i.e., that a degree of similarity will be apparent across crimes committed by the same offender). The present study empirically tested (a) the potential for effectively differentiating between rape offense crime scenes using quantitative and qualitative distinctions within the behavioral dimensions of control, violence, and sexual activity, and (b) the extent to which redefining behavioral consistency more broadly to include dynamic trajectories of behavioral change may be more effective than limiting this definition to behavioral stability. Results of the individual differentiation analysis confirmed that sexual offenses can be successfully differentiated based on the specific degree and subtype of these behavioral dimensions present in each crime scene. In the subsequent analysis of consistency and behavioral trajectories within and across these dimensions, it was determined that while none of the offenders exhibited complete consistency across behavioral dimensions, a subsample of offenders remained fully consistent in at least one. Furthermore, of those who were not consistent, the vast majority followed an identifiable trajectory of change. Findings are discussed in the context of psychological theories of behavioral consistency as well as practical aspects of advancing the utility of behavioral linkage.

    Vacuum ultraviolet laser induced breakdown spectroscopy (VUV-LIBS) for pharmaceutical analysis

    Laser induced breakdown spectroscopy (LIBS) allows quick analysis to determine the elemental composition of the target material. Samples need little/no preparation, removing the risk of contamination or loss of analyte. It is minimally ablative, so only a negligible amount of the sample is destroyed, while allowing quantitative and qualitative results. Vacuum ultraviolet (VUV)-LIBS, due to the abundance of transitions at shorter wavelengths, offers improvements over LIBS in the visible region, such as achieving lower limits of detection for trace elements and extending LIBS to elements/samples not suitable for visible LIBS. These qualities also make VUV-LIBS attractive for pharmaceutical analysis. Due to success in the pharmaceutical sector, molecules representing the active pharmaceutical ingredients (APIs) have become increasingly complex. These organic compounds reveal spectra densely populated with carbon and oxygen lines in the visible and infrared regions, making it increasingly difficult to identify an inorganic analyte. The VUV region poses a solution as there is much better spacing between spectral lines. VUV-LIBS experiments were carried out on pharmaceutical samples. This work is a proof of principle that VUV-LIBS in conjunction with machine learning can tell pharmaceuticals apart via classification. This work will attempt to test this principle in two ways. Firstly, by classifying pharmaceuticals that are very different from one another, i.e., having different APIs. This first test will gauge the efficacy of separating analytes that are essentially carbohydrates with distinctly different APIs into different classes using their VUV emission spectra. Secondly, by classifying two different brands of the same pharmaceutical, i.e., paracetamol. The second test will investigate the ability of machine learning to abstract and identify the differences in the spectra of two pharmaceuticals with the same API and separate them.
This second test presents the application of VUV-LIBS combined with machine learning as a solution for at-line analysis of similar analytes e.g., quality control. The machine learning techniques explored in this thesis were convolutional neural networks (CNNs), support vector machines, self-organizing maps and competitive learning. The motivation for the application of principal component analysis (PCA) and machine learning is for the classification of analytes, allowing us to distinguish pharmaceuticals from one another based on their spectra. PCA and the machine learning techniques are compared against one another in this thesis. Several innovations were made; this work is the first in LIBS to implement the use of a short-time Fourier transform (STFT) method to generate input images for a CNN for VUV-LIBS spectra. This is also believed to be the first work in LIBS to carry out the development and application of an ellipsoidal classifier based on PCA. The results of this work show that by lowering the pulse energy it is possible to gather more useful spectra over the surface of a sample. Although this yields spectra with poorer signal-to-noise, the samples can still be classified using the machine learning analytics. The results in this thesis indicate that, of all the machine learning techniques evaluated, CNNs have the best classification accuracy combined with the fastest run time. Prudent data augmentation can significantly reduce experimental workloads, without reducing classification rates
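
To make the STFT step concrete, here is a minimal sketch of how a one-dimensional spectrum could be turned into a two-dimensional magnitude array of the kind a CNN takes as an image. It is not the thesis's implementation: the function name, the frame length of 16, the hop of 8, and the Hann window are all assumptions chosen for illustration, and the DFT is written out directly with the standard library for clarity, not speed.

```python
import cmath
import math


def stft_magnitude(signal, frame_len=16, hop=8):
    """Return a list of frames, each a list of DFT magnitudes for
    bins 0..frame_len//2. All parameters are illustrative assumptions."""
    # Periodic Hann window to reduce leakage between neighbouring bins.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_len)
              for n in range(frame_len)]
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        segment = [signal[start + n] * window[n] for n in range(frame_len)]
        row = []
        for k in range(frame_len // 2 + 1):
            # Direct DFT of the windowed segment at bin k.
            acc = sum(segment[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                      for n in range(frame_len))
            row.append(abs(acc))
        frames.append(row)
    return frames


# Synthetic test input: a pure sinusoid that falls exactly on bin 4.
tone = [math.sin(2 * math.pi * 4 * n / 16) for n in range(64)]
image = stft_magnitude(tone)
```

Each row of `image` is one window position along the spectrum and each column is one frequency bin; stacking the rows gives the image-like input. For real spectra the frame length and hop would need tuning against the line widths of interest.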

    System Dynamics of Cognitive Vulnerabilities and Family Support Among Latina Children and Adolescents

    The paper describes a data-driven approach to developing a feedback theory of cognitive vulnerabilities and family support, focused on understanding the dynamics experienced among Latina children, adolescents, and families. Family support is understood to be a response to avoidant and maladaptive behaviors that may be characteristic of cognitive vulnerabilities commonly associated with depression and suicidal ideation. A formal feedback theory is developed, appraised, and analyzed using a combination of secondary analysis of qualitative interviews (N = 30) and quantitative analysis using system dynamics modeling and simulation. Implications for prevention practice, treatment, and future research are discussed.

    Pattern Recognition for Complex Heterogeneous Time-Series Data: An Analysis of Microbial Community Dynamics

    Microbial life is the most widespread and the most abundant life form on earth. Microbes exist in complex and diverse communities in environments from the deep ocean trenches to Himalayan snowfields. Microbial life is essential for other forms of life as well. Scientific studies of microbial activity include diverse communities such as plant root microbiome, insect gut microbiome and human skin microbiome. In the human body alone, the number of microbial life forms exceeds the number of human body cells. Hence it is essential to understand microbial community dynamics. With the advent of 16S rRNA sequencing, we have access to a plethora of data on the microbiome, warranting a shift from in-vitro analysis to in-silico analysis. This thesis focuses on challenges in analysing microbial community dynamics through complex, heterogeneous and temporal data. Firstly, we look at the mathematical modelling of microbial community dynamics and inference of microbial interaction networks by analysing longitudinal sequencing data. We look at the problem with the aims of minimising the assumptions involved and improving the accuracy of the inferred interaction networks. Secondly, we explore the temporally dynamic nature of microbial interaction networks. We look at the fallacies of static microbial interaction networks and approaches suitable for modelling temporally dynamic microbial interaction networks. Thirdly, we study multiple temporal microbial datasets from similar environments to understand macro and micro patterns apparent in these communities. We explore the individuality and conformity of microbial communities through visualisation techniques. Finally, we explore the possibility and challenges in representing heterogeneous microbial temporal activity in unique signatures.
In summary, in this work, we have explored various aspects of complex, heterogeneous and time-series data through microbial temporal abundance datasets and have enhanced the knowledge about these complex and diverse communities through a pattern recognition approach

    The Unbalanced Classification Problem: Detecting Breaches in Security

    This research proposes several methods designed to improve solutions for security classification problems. The security classification problem involves unbalanced, high-dimensional, binary classification problems that are prevalent today. The imbalance within this data involves a significant majority of the negative class and a minority positive class. Any system that needs protection from malicious activity, intruders, theft, or other types of breaches in security must address this problem. These breaches in security are considered instances of the positive class. Given numerical data that represent observations or instances which require classification, state of the art machine learning algorithms can be applied. However, the unbalanced and high-dimensional structure of the data must be considered prior to applying these learning methods. High-dimensional data poses a “curse of dimensionality” which can be overcome through the analysis of subspaces. Exploration of intelligent subspace modeling and the fusion of subspace models is proposed. Detailed analysis of the one-class support vector machine, as well as its weaknesses and proposals to overcome these shortcomings, is included. A fundamental method for evaluation of the binary classification model is the receiver operating characteristic (ROC) curve and the area under the curve (AUC). This work details the underlying statistics involved with ROC curves, contributing a comprehensive review of ROC curve construction and analysis techniques, including a novel graphic for illustrating the connection between ROC curves and classifier decision values. The major innovations of this work include synergistic classifier fusion through the analysis of ROC curves and rankings, insight into the statistical behavior of the Gaussian kernel, and novel methods for applying machine learning techniques to defend against computer intrusion.
The primary empirical vehicle for this research is computer intrusion detection data, and both host-based intrusion detection systems (HIDS) and network-based intrusion detection systems (NIDS) are addressed. Empirical studies also include military tactical scenarios
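
The link between classifier decision values and the ROC curve can be sketched in a few lines: sort the instances by decision value, lower the threshold one instance at a time, and record the false and true positive rates. This is a generic textbook construction, not the work's own method or its novel graphic. The function names and the toy scores are hypothetical, and the sketch assumes no tied decision values and at least one instance of each class.

```python
def roc_points(scores, labels):
    """Return ROC points (false positive rate, true positive rate),
    stepping the threshold down through the sorted decision values.
    Assumes labels are 0/1 and scores contain no ties."""
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    tp = fp = 0
    points = [(0.0, 0.0)]
    for _, label in ranked:
        if label == 1:
            tp += 1
        else:
            fp += 1
        points.append((fp / n_neg, tp / n_pos))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# Hypothetical decision values; label 1 marks the positive (breach) class.
scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4]
labels = [1, 1, 0, 1, 0, 0]
curve = roc_points(scores, labels)
area = auc(curve)
```

Because the AUC depends only on how positives rank against negatives, it is insensitive to class imbalance in a way that raw accuracy is not, which is one reason it suits the unbalanced setting described above.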

    The transfer and persistence of environmental trace indicators, and methods for digital data acquisition from photographs and micrographs: applications for forensic science research

    Environmental forms of trace evidence (such as mineral grains, pollen grains, algae, and sediment) can offer valuable insights within forensic casework. An issue facing forensic science as a whole, and these environmental indicators specifically, is a relative dearth of empirical research which would underpin the interpretation of such indicators when attempting forensic reconstruction. This thesis aims to address this lacuna, undertaking experiments to: (1) Explore variables which affect the rates of transfer and persistence, with specific focus upon quartz grains (a terrestrial indicator) and diatom valves (an aquatic indicator) upon footwear materials (a substrate that has been under-represented in past studies); (2) Conduct research into the effects of particle size and morphology upon transfer and persistence; (3) Develop and adapt methodologies to undertake this research. Accordingly, the outputs of this thesis are: (1) The creation of new datasets which could inform the interpretation of these trace indicators within forensic investigations and crime reconstruction scenarios and (2) The development of novel methodologies which could be employed in future research to attempt to accelerate data collection and analysis, without compromising on accuracy. This research is interdisciplinary, combining theory from forensic science, analytical techniques from the environmental sciences, and some elements of image processing and analysis. This research was funded by the Engineering and Physical Sciences Research Council of the United Kingdom through the Security Science Doctoral Training Research Centre (UCL SECReT) based at University College London (EP/G037264/1)