555 research outputs found

    Rapid acquisition of long spatial sequences in long-term memory

    Learning complex movement sequences requires an active, attentional selection of the content that is learned. This selection mechanism cannot be investigated in classical stimulus-guided sequence learning paradigms, because doing so requires movement sequence production that is not triggered by external stimuli. In deferred imitation learning, the whole stimulus sequence is presented and reproduction starts only after the presentation has ended. In order to investigate how the selective control of the learning process proceeds in natural learning situations, and to examine all influencing parameters, we developed a new paradigm in which long sequences are learned by deferred imitation. In this task, a long sequence of stimuli was presented on a graphics tablet and reproduced by manual pointing after the stimulus presentation had finished. Since the sequence's length exceeded the capacity of working memory, it had to be reproduced and learned over several trials; an attentional selection was therefore required during learning. In our first study, a method for evaluating reproduction performance in the new paradigm was developed. The assignment of reproductions to target positions posed a major methodological difficulty. This problem was solved by introducing an assignment algorithm that takes the order of reproduction into account. The algorithm is explained, compared to an algorithm that performs a nearest-neighbor assignment, and finally validated against a human operator's assignments. The results showed that the assignment algorithm is an appropriate method for analyzing long sequences of pointing movements and is suitable for evaluating reproduction performance and learning progress in deferred imitation learning of long sequences. In the second study, we investigated further how long sequences of pointing movements are acquired.
Long-term retention tests showed that the sequences were retained in long-term memory for at least two weeks. A transfer test showed that the sequences were stored in an effector-independent representation. The distributions of pointing positions were analyzed in detail in order to characterize the control signal of the pointing movements. The analysis showed that position errors at successive target positions did not depend on movement direction and, further, that directional error did not propagate to reproductions of successive target positions. These results suggest that end points rather than movement trajectories are memorized in this learning task. Our third study evaluated the organization and reorganization of the sequence representation in memory. The change in sequence reproduction without intermediate presentations showed that the remembered target positions drifted away from the initial representation, with the drift saturating after about five trials. The analysis of the drift direction of single target-position representations showed no systematic drift direction for single subjects. Further, it indicated that representations drifted not to similar, but to different patterns across subjects. In order to investigate whether sequences are encoded in chunks or as single target positions, we performed an experiment in which two target positions in a well-learned sequence were exchanged. We analyzed the effect of the exchange on neighboring target positions. The target exchange affected neither the position nor the variance of neighboring memorized target positions. These results support the view that single target positions rather than chunks of target positions are memorized. Thus our study suggests that sequence acquisition is guided by an active selection process that is able to quickly acquire abstract movement plans. Our findings further support the view that these movement plans are represented as strings of independent, absolute target positions.
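    An order-preserving assignment of the kind described above can be sketched with a dynamic-programming alignment that minimizes total pointing error while respecting serial order (an illustrative reconstruction, not the thesis's actual algorithm; the `skip_cost` penalty for unmatched items is an assumption):

```python
import numpy as np

def order_aware_assignment(targets, reproductions, skip_cost=1.0):
    """Assign reproduced pointing positions to target positions while
    preserving serial order, via dynamic-programming alignment that
    minimizes total Euclidean distance plus a penalty for unmatched items.
    Returns a list of (target_index, reproduction_index) pairs."""
    n, m = len(targets), len(reproductions)
    # dp[i, j] = minimal cost of aligning targets[:i] with reproductions[:j]
    dp = np.full((n + 1, m + 1), np.inf)
    dp[0, :] = np.arange(m + 1) * skip_cost
    dp[:, 0] = np.arange(n + 1) * skip_cost
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = np.linalg.norm(np.asarray(targets[i - 1], float)
                               - np.asarray(reproductions[j - 1], float))
            dp[i, j] = min(dp[i - 1, j - 1] + d,        # match
                           dp[i - 1, j] + skip_cost,    # target left unmatched
                           dp[i, j - 1] + skip_cost)    # extra reproduction
    # Backtrack to recover the matched pairs.
    pairs, i, j = [], n, m
    while i > 0 and j > 0:
        d = np.linalg.norm(np.asarray(targets[i - 1], float)
                           - np.asarray(reproductions[j - 1], float))
        if np.isclose(dp[i, j], dp[i - 1, j - 1] + d):
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif np.isclose(dp[i, j], dp[i - 1, j] + skip_cost):
            i -= 1
        else:
            j -= 1
    return pairs[::-1]
```

    Unlike a plain nearest-neighbor assignment, this alignment cannot map two reproductions to the same target out of order, which matters when pointing errors are large relative to target spacing.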

    Copynumber: Efficient algorithms for single- and multi-track copy number segmentation.

    BACKGROUND: Cancer progression is associated with genomic instability and an accumulation of gains and losses of DNA. The growing variety of tools for measuring genomic copy numbers, including various types of array-CGH, SNP arrays and high-throughput sequencing, calls for a coherent framework offering unified and consistent handling of single- and multi-track segmentation problems. In addition, there is a demand for highly computationally efficient segmentation algorithms, due to the emergence of very high density scans of copy number. RESULTS: A comprehensive Bioconductor package for copy number analysis is presented. The package offers a unified framework for single sample, multi-sample and multi-track segmentation and is based on statistically sound penalized least squares principles. Conditional on the number of breakpoints, the estimates are optimal in the least squares sense. A novel and computationally highly efficient algorithm is proposed that utilizes vector-based operations in R. Three case studies are presented. CONCLUSIONS: The R package copynumber is a software suite for segmentation of single- and multi-track copy number data using algorithms based on coherent least squares principles.
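    The "optimal conditional on the number of breakpoints" property corresponds to classic optimal partitioning by dynamic programming. A minimal sketch of the principle (in Python rather than the package's R, and without the package's vectorized speedups) fits a piecewise-constant signal with exactly k segments by minimizing the residual sum of squares:

```python
import numpy as np

def segment_k(y, k):
    """Optimal piecewise-constant fit with exactly k segments, minimizing
    the residual sum of squares via dynamic programming. Returns segment
    start indices and the fitted segment means."""
    y = np.asarray(y, dtype=float)
    n = len(y)
    csum = np.concatenate([[0.0], np.cumsum(y)])
    csum2 = np.concatenate([[0.0], np.cumsum(y ** 2)])

    def cost(i, j):  # RSS of fitting one mean to the half-open segment y[i:j]
        s, s2, m = csum[j] - csum[i], csum2[j] - csum2[i], j - i
        return s2 - s * s / m

    dp = np.full((k + 1, n + 1), np.inf)
    back = np.zeros((k + 1, n + 1), dtype=int)
    dp[0, 0] = 0.0
    for seg in range(1, k + 1):
        for j in range(seg, n + 1):
            for i in range(seg - 1, j):
                c = dp[seg - 1, i] + cost(i, j)
                if c < dp[seg, j]:
                    dp[seg, j], back[seg, j] = c, i
    # Recover the segment boundaries by backtracking.
    bounds, j = [], n
    for seg in range(k, 0, -1):
        i = back[seg, j]
        bounds.append(i)
        j = i
    starts = bounds[::-1]
    means = [(csum[e] - csum[s]) / (e - s)
             for s, e in zip(starts, starts[1:] + [n])]
    return starts, means
```

    This naive version is O(k n^2); the copynumber package achieves its efficiency through recursions expressed as vector operations, which this sketch does not attempt to reproduce.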

    Predicting Flavonoid UGT Regioselectivity with Graphical Residue Models and Machine Learning.

    Machine learning is applied to a challenging and biologically significant protein classification problem: the prediction of flavonoid UGT acceptor regioselectivity from primary protein sequence. Novel indices characterizing graphical models of protein residues are introduced. The indices are compared with existing amino acid indices and found to cluster residues appropriately. A variety of models employing the indices are then investigated by examining their performance when analyzed using nearest-neighbor, support vector machine, and Bayesian neural network classifiers. Improvements over nearest-neighbor classifications relying on standard alignment similarity scores are reported.
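    The baseline classifier named above can be sketched as a 1-nearest-neighbor prediction over residue-index feature vectors (an illustration only: the feature encoding and the regioselectivity labels "3-OH"/"7-OH" below are hypothetical, not taken from the paper):

```python
import numpy as np

def nn_classify(train_X, train_y, query):
    """1-nearest-neighbor classification: predict the label of the
    training feature vector closest to `query` under Euclidean distance."""
    train_X = np.asarray(train_X, dtype=float)
    dists = np.linalg.norm(train_X - np.asarray(query, dtype=float), axis=1)
    return train_y[int(np.argmin(dists))]
```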

    End-to-end anomaly detection in stream data

    Nowadays, huge volumes of data are generated with increasing velocity through various systems, applications, and activities. This increases the demand for stream and time series analysis to react to changing conditions in real time, for enhanced efficiency and quality of service delivery as well as improved safety and security in the private and public sectors. Despite its very rich history, time series anomaly detection is still one of the vital topics in machine learning research and is receiving increasing attention. Identifying hidden patterns and selecting a model that fits the observed data well and also carries over to unobserved data is not a trivial task. Due to the increasing diversity of data sources and associated stochastic processes, this pivotal data analysis topic is loaded with challenges like complex latent patterns, concept drift, and overfitting that may mislead the model and cause a high false alarm rate. Handling these challenges leads advanced anomaly detection methods to develop sophisticated decision logic, which turns them into mysterious and inexplicable black boxes. Contrary to this trend, end users expect transparency and verifiability in order to trust a model and the outcomes it produces. Also, pointing users to the most anomalous or malicious regions of a time series, and to the causal features, could save them time, energy, and money. For these reasons, this thesis addresses the crucial challenges in an end-to-end pipeline of stream-based anomaly detection through three essential phases: behavior prediction, inference, and interpretation. The first step is focused on devising a time series model that achieves high average accuracy as well as small error deviation. On this basis, we propose higher-quality anomaly detection and scoring techniques that use the related contexts to reclassify observations and post-prune unjustified events. Last but not least, we make the predictive process transparent and verifiable by providing meaningful reasoning behind its generated results, based on concepts a human can understand. The provided insight can pinpoint the anomalous regions of a time series and explain why the current status of a system has been flagged as anomalous. Stream-based anomaly detection research is a principal area of innovation to support our economy, security, and even the safety and health of societies worldwide. We believe our proposed analysis techniques can contribute to building a situational awareness platform and open new perspectives in a variety of domains such as cybersecurity and health.
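    The detect-then-post-prune idea can be sketched in miniature: score each point against a trailing window, then drop alarms that their context does not support (an illustrative toy, not the thesis's method; the z-score detector and the `min_run` pruning rule are assumptions):

```python
import numpy as np

def contextual_anomaly_flags(x, window=20, z_thresh=3.0, min_run=2):
    """Minimal stream-style detector: flag points whose z-score against a
    trailing window exceeds z_thresh, then post-prune isolated alarm runs
    shorter than min_run (context-based reclassification in miniature)."""
    x = np.asarray(x, dtype=float)
    flags = np.zeros(len(x), dtype=bool)
    for t in range(window, len(x)):
        hist = x[t - window:t]
        mu, sd = hist.mean(), hist.std() + 1e-9  # avoid division by zero
        flags[t] = abs(x[t] - mu) / sd > z_thresh
    # Post-pruning: unsupported (too short) alarm runs are unjustified events.
    pruned = flags.copy()
    t = 0
    while t < len(flags):
        if flags[t]:
            run = t
            while run < len(flags) and flags[run]:
                run += 1
            if run - t < min_run:
                pruned[t:run] = False
            t = run
        else:
            t += 1
    return pruned
```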

    Human Fatigue Predictions in Complex Aviation Crew Operational Impact Conditions

    In the last decade, several regulatory frameworks across the world, in all modes of transportation, have brought fatigue and its risk management in operations to the forefront. Of all transportation modes, air travel has been the safest. Still, as part of continuous improvement efforts, regulators are urging operators to adopt sound fatigue science and its foundational principles to reinforce safety risk assessment and management. Fatigue risk management is a data-driven system that finds a realistic balance between safety and productivity in an organization. This work discusses the effects of mathematical modeling of fatigue and its quantification in the context of fatigue risk management for complex global logistics operations. A new concept called Duty DNA is designed within the system that helps to predict and forecast sleep, duty deformations, and fatigue. The need for a robust structure of elements to house the components that measure and manage fatigue risk in operations is also discussed. By operating on the principles of fatigue management, new science-based predictive, proactive, and reactive approaches were designed for an industry-leading fatigue risk management program. Accurately predicting sleep is critical to predicting fatigue and alertness. Mathematical models are being developed to track the biological processes quantitatively and to predict the temporal profile of fatigue given a person's sleep history and planned work schedule, including night and day exposure. While these models continue to be improved, a new deep-learning-based approach is attempted to predict fatigue for a duty in isolation, without knowing much of the work schedule history. The model also predicts duty disruptions and the predicted fatigue at the end state of the duty.
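    The biomathematical models referred to above typically combine a homeostatic sleep-pressure process with a circadian rhythm. A generic two-process-style sketch (illustrative only, not the author's Duty DNA model; all parameter values are assumptions):

```python
import math

def alertness(hours_awake, time_of_day, tau_w=18.2, amp=1.0, phase=16.8):
    """Toy two-process alertness estimate: homeostatic sleep pressure
    builds exponentially over hours awake (lowering alertness), while a
    ~24 h circadian rhythm raises or lowers it by time of day."""
    homeostatic = -(1.0 - math.exp(-hours_awake / tau_w))  # tends toward -1
    circadian = amp * math.cos(2 * math.pi * (time_of_day - phase) / 24.0)
    return homeostatic + circadian
```

    With such a model, two duties at the same clock time differ in predicted alertness only through accumulated time awake, which is why accurate sleep prediction is the critical input.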

    Eddy current defect response analysis using sum of Gaussian methods

    This dissertation is a study of methods to automatically detect and approximate eddy current differential coil defect signatures as a summed collection of Gaussian functions (SoG). Datasets of varying material, defect size, inspection frequency, and coil diameter were investigated. Dimensionally reduced representations of the defect responses were obtained using common existing reduction methods and novel SoG-based enhancements to them. The efficacy of the SoG-enhanced representations was studied using common interpretable machine learning (ML) classifier designs, with the SoG representations showing significant improvement on common analysis metrics.
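    The core SoG idea can be sketched simply: with fixed Gaussian centers and widths, the model is linear in the amplitudes, so a signal can be approximated by ordinary least squares (an illustration of the representation only; the dissertation's actual fitting procedure is not reproduced here):

```python
import numpy as np

def gaussian(x, mu, sigma):
    """Unit-amplitude Gaussian bump centered at mu with width sigma."""
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2)

def fit_sog_amplitudes(x, y, centers, widths):
    """Approximate signal y(x) as a sum of Gaussians with the given
    centers and widths. The model is linear in the amplitudes, which are
    therefore solved by ordinary least squares."""
    basis = np.column_stack([gaussian(x, m, s)
                             for m, s in zip(centers, widths)])
    amps, *_ = np.linalg.lstsq(basis, y, rcond=None)
    return amps, basis @ amps  # amplitudes and the reconstructed signal
```

    A full SoG fit would also optimize the centers and widths (a nonlinear problem), but even this linear step shows how a defect signature reduces to a handful of interpretable parameters.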

    Pseudorandom sequence generation using binary cellular automata

    A printed copy of this thesis is held in the İstanbul Şehir University Library. Random numbers are an integral part of many applications, from computer simulations, gaming, and security protocols to the practices of applied mathematics and physics. As randomness plays ever more critical roles, cheap and fast generation methods are becoming a point of interest for both scientific and technological use. Cellular Automata (CA) are a class of functions that attract attention mostly due to their potential for modeling complex phenomena in nature, along with their discreteness and simplicity. Several studies in the literature express their potential for generating randomness and present their advantages over commonly used random number generators. Most research in the CA field focuses on one-dimensional 3-input CA rules. In this study, we perform an exhaustive search over the set of 5-input CA to find the rules with high randomness quality. As the measure of quality, the outcomes of the NIST Statistical Test Suite are used. Since the set of 5-input CA rules is very large (more than 4.2 billion rules), poor-quality rules are eliminated before testing. In the literature, entropy is generally used as the elimination criterion, but we preferred mutual information. The main motive behind that choice is to find a metric for elimination that is computed directly on the truth table of the CA rule instead of on the generated sequence. As the test results collected on 3- and 4-input CA indicate, all rules with very good statistical performance have zero mutual information. By exploiting this observation, we limit the set to be tested to the rules with zero mutual information. The reasons for and consequences of this choice are discussed. In total, more than 248 million rules are tested. Among them, 120 rules show outstanding performance with all attempted neighborhood schemes.
Along with these tests, one of the rules is subjected to more detailed testing, and those test results are included. Keywords: Cellular Automata, Pseudorandom Number Generators, Randomness Tests
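    The CA-based generation scheme studied here can be sketched for the classic 3-input case: iterate a Wolfram-numbered rule over a binary lattice and sample the center cell as the output bitstream (an illustration with rule 30 and cyclic boundaries; the thesis's 5-input search uses the same construction with a radius-2 neighborhood and 32-bit rule numbers):

```python
def ca_step(state, rule, k=3):
    """One synchronous update of a 1-D binary CA with cyclic boundary.
    `rule` is the Wolfram-style rule number: the neighborhood bits, read
    left to right, index into the rule's truth table."""
    n = len(state)
    r = k // 2
    out = []
    for i in range(n):
        idx = 0
        for j in range(-r, r + 1):
            idx = (idx << 1) | state[(i + j) % n]
        out.append((rule >> idx) & 1)
    return out

def ca_prng_bits(seed, rule, steps, k=3):
    """Generate a pseudorandom bitstream by iterating the CA and sampling
    the center cell each step (the classic rule-30 construction)."""
    state, bits = list(seed), []
    mid = len(seed) // 2
    for _ in range(steps):
        state = ca_step(state, rule, k)
        bits.append(state[mid])
    return bits
```

    For a 5-input rule, k=5 makes the truth table 32 entries long, giving the 2^32 (more than 4.2 billion) candidate rules searched in the thesis.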

    Using Sequence Mining Techniques for Understanding Incorrect Behavioral Patterns on Interactive Tasks

    Interactive tasks designed to elicit real-life problem-solving behavior are rapidly becoming more widely used in educational assessment. Incorrect responses to such tasks can occur for a variety of reasons, such as low proficiency levels, weak metacognitive strategies, or motivational issues. We demonstrate how behavioral patterns associated with incorrect responses can, in part, be understood, supporting insights into the different sources of failure on a task. To this end, we make use of sequence mining techniques that leverage the information contained in the time-stamped action sequences commonly logged in assessments with interactive tasks, for (a) investigating what distinguishes incorrect behavioral patterns from correct ones and (b) identifying subgroups of examinees with similar incorrect behavioral patterns. Analyzing a task from the Programme for the International Assessment of Adult Competencies 2012 assessment, we find incorrect behavioral patterns to be more heterogeneous than correct ones. We identify multiple subgroups of incorrect behavioral patterns, which point toward different levels of effort and the lack of different subskills needed for solving the task. Albeit focusing on a single task, we uncover meaningful patterns of major differences in how examinees approach a given task that generalize across multiple tasks. Implications for the construction and analysis of interactive tasks, as well as for the design of interventions for complex problem-solving skills, are derived.
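    Goal (a) above can be sketched with a simple contrast of action n-grams between the two groups (an illustrative technique in the same family, not the paper's exact method; the action names below are hypothetical):

```python
from collections import Counter

def discriminative_ngrams(correct_seqs, incorrect_seqs, n=2):
    """Contrast n-grams of logged action sequences between correct and
    incorrect responses. Returns n-grams sorted by how much more often
    they occur, per sequence, in the incorrect group."""
    def ngram_rate(seqs):
        counts = Counter()
        for s in seqs:
            counts.update(tuple(s[i:i + n]) for i in range(len(s) - n + 1))
        return {g: c / len(seqs) for g, c in counts.items()}

    c_rate = ngram_rate(correct_seqs)
    i_rate = ngram_rate(incorrect_seqs)
    grams = set(c_rate) | set(i_rate)
    return sorted(grams,
                  key=lambda g: i_rate.get(g, 0) - c_rate.get(g, 0),
                  reverse=True)
```

    Clustering examinees by the n-gram profiles of their individual sequences would then address goal (b), identifying subgroups with similar incorrect patterns.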

    Methods for Transcriptome Assembly in the Allopolyploid Brassica napus

    Canada is the world's largest producer of canola, and production is ever increasing, with an annual growth rate of 9.38% according to FAOSTAT. In 2017, canola acreage surpassed wheat in Saskatchewan, the highest producer of both crops in Canada. Country-wide, the total farming area of canola increased by 9.9% to 22.4 million acres, while wheat area saw a slight decline to 23.3 million acres. While Canada is the highest producer of the crop, yields are lower than in other countries. To maximize the benefit of this market, canola cultivation could be made more efficient with further characterization of the organism's genes and their involvement in plant robustness. Such studies using transcriptome analysis have been successful in organisms with relatively small and simple genomes. However, such analyses in B. napus are complicated by the allopolyploid genome structure resulting from ancestral whole-genome duplications in the species' evolutionary history. Homeologous gene pairs originating from the orthology between the two B. napus progenitor species complicate the process of transcriptome assembly. Modern assemblers (Trinity, Oases, and SOAPdenovo-Trans) were used to generate several de novo transcriptome assemblies for B. napus. A variety of metrics were used to determine the impact that the complex genome structure has on transcriptome studies. In particular, the most important questions for transcriptome assembly in B. napus were how varying the k-mer parameter affects assembly quality, and to what extent similar genes resulting from homeology within B. napus complicate the process of assembly. The metrics used for evaluating the assemblies include basic assembly statistics such as the number of contigs and contig lengths (via N25, N50 and N75 statistics); more involved investigation via comparison to annotated coding DNA sequences; scores from evaluation software for de novo transcriptome assemblies; and finally, quantification of homeolog differentiation by alignment to previously identified pairs of homeologous genes. These metrics provided a picture of the trade-offs between assembly software and the k parameter determining the length of subsequences used to build de Bruijn graphs for de novo transcriptome assembly. It was shown that shorter k-mer lengths produce fewer, more complete contigs due to the shorter required overlap between read sequences, while longer k-mer lengths increase the sensitivity of an assembler to sequence variation between similar gene sequences. The Trinity assembler outperformed Oases and SOAPdenovo-Trans when considering the total breadth of evaluation metrics, generating longer transcripts with fewer chimeras between homeologous gene pairs.
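    The k-mer trade-off described above can be made concrete with a toy de Bruijn construction: larger k leaves two near-identical homeologs sharing fewer graph nodes, so their paths stay separated, while smaller k merges them (an illustration of the principle, not any assembler's implementation; the sequences in the example are invented):

```python
from collections import defaultdict

def de_bruijn_edges(reads, k):
    """Collect de Bruijn graph edges: each k-mer in a read contributes an
    edge from its (k-1)-mer prefix to its (k-1)-mer suffix."""
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return graph

def shared_nodes(seq_a, seq_b, k):
    """Count the (k-1)-mer graph nodes two sequences have in common;
    fewer shared nodes means the sequences stay better separated."""
    nodes = lambda s: {s[i:i + k - 1] for i in range(len(s) - k + 2)}
    return len(nodes(seq_a) & nodes(seq_b))
```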