    Investigating the size and value effect in determining performance of Australian listed companies: A neural network approach

    This paper explores the size and value effect in influencing the performance of individual companies using backpropagation neural networks. According to existing theory, companies with small market capitalization and high book-to-market ratios tend to perform better in the future. Data from over 300 Australian Stock Exchange listed companies between 2000 and 2004 are examined, and a neural network is trained to predict company performance from market capitalization, book-to-market ratio, beta and standard deviation. Evidence for the value effect was found over longer time periods, but evidence for the size effect was weaker. Poor company performance was also observed to correlate with high risk. © 2006, Australian Computer Society, Inc
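A minimal sketch of the kind of backpropagation network the abstract describes, trained here on synthetic data. The four inputs mirror the paper's predictors (market capitalization, book-to-market ratio, beta, standard deviation), but the data-generating rule, network size and learning rate are all invented for illustration:

```python
import math
import random

random.seed(0)

def make_sample():
    # Synthetic rule loosely echoing the value effect:
    # a high book-to-market ratio implies better future performance.
    cap, btm, beta, sd = (random.random() for _ in range(4))
    return [cap, btm, beta, sd], 1.0 if btm > 0.5 else 0.0

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

H, LR = 3, 0.5                                  # hidden units, learning rate
w1 = [[random.uniform(-1, 1) for _ in range(4)] for _ in range(H)]
w2 = [random.uniform(-1, 1) for _ in range(H)]

def forward(x):
    hidden = [sigmoid(sum(w * xi for w, xi in zip(row, x))) for row in w1]
    out = sigmoid(sum(w * h for w, h in zip(w2, hidden)))
    return hidden, out

def train_step(x, target):
    hidden, out = forward(x)
    d_out = (out - target) * out * (1 - out)    # output-layer delta
    for j in range(H):
        d_hid = d_out * w2[j] * hidden[j] * (1 - hidden[j])
        w2[j] -= LR * d_out * hidden[j]         # update output weight
        for i in range(4):
            w1[j][i] -= LR * d_hid * x[i]       # backpropagate to hidden layer
    return (out - target) ** 2

data = [make_sample() for _ in range(200)]
initial_loss = sum((forward(x)[1] - t) ** 2 for x, t in data) / len(data)
for _ in range(100):
    for x, t in data:
        train_step(x, t)
final_loss = sum((forward(x)[1] - t) ** 2 for x, t in data) / len(data)
```

With real financial data the targets would of course be noisy measured returns rather than a clean synthetic rule.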

    Static and dynamic selection thresholds governing the accumulation of information in genetic algorithms using ranked populations

    Mutation applied indiscriminately across a population has, on average, a detrimental effect on the accumulation of solution alleles within the population and is usually beneficial only when targeted at individuals with few solution alleles. Many common selection techniques can delete individuals with more solution alleles than are easily recovered by mutation. This paper identifies static and dynamic selection thresholds governing the accumulation of information in a genetic algorithm (GA). When individuals are ranked by fitness, there exists a dynamic threshold defined by the solution density of surviving individuals and a lower static threshold defined by the solution density of the information source used for mutation. Replacing individuals ranked below the static threshold with randomly generated individuals avoids the need for mutation while maintaining diversity in the population, with a consequent improvement in population fitness. By replacing individuals ranked between the thresholds with randomly selected individuals from above the dynamic threshold, population fitness improves dramatically. We model the dynamic behavior of GAs using these thresholds and demonstrate their effectiveness through simulation and on benchmark problems. © 2010 by the Massachusetts Institute of Technology
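A minimal sketch of the replacement idea, not the paper's exact model: on a OneMax-style problem whose "solution alleles" are 1-bits, individuals ranked below a static threshold are replaced with freshly generated random individuals instead of being mutated. The population size, genome length and threshold position are illustrative:

```python
import random

random.seed(1)
N, L = 40, 32                       # population size, genome length
STATIC = int(0.75 * N)              # illustrative static threshold rank

def fitness(ind):
    return sum(ind)                 # number of solution alleles (1-bits)

def random_ind():
    return [random.randint(0, 1) for _ in range(L)]

def crossover(a, b):
    cut = random.randrange(1, L)
    return a[:cut] + b[cut:]

pop = [random_ind() for _ in range(N)]
best_initial = max(fitness(i) for i in pop)

for _ in range(80):
    pop.sort(key=fitness, reverse=True)
    elite = pop[: N // 2]           # survivors at the top of the ranking
    children = [crossover(random.choice(elite), random.choice(elite))
                for _ in range(STATIC - len(elite))]
    # Below the static threshold: random immigrants maintain diversity
    # without applying mutation indiscriminately across the population.
    fresh = [random_ind() for _ in range(N - STATIC)]
    pop = elite + children + fresh

best_final = max(fitness(i) for i in pop)
```

Because the elite are carried over unchanged, the best fitness is monotone non-decreasing, while the random immigrants supply the raw alleles that mutation would otherwise have to provide.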

    HDAX: Historical symbolic modelling of delay time series in a communications network

    Certain performance parameters, such as packet delay, delay variation (jitter) and loss, are decision factors for online quality of service (QoS) traffic routing. Although considerable effort has been devoted to assuring QoS on the Internet, the dominant TCP/IP best-effort communications policy does not provide sufficient guarantees without abrupt changes to the protocols. Estimating and forecasting end-to-end delay and its variations are essential tasks in network routing management for detecting anomalies. A large amount of research has sought to provide foreknowledge of network anomalies by characterizing and forecasting delay with numerical forecasting methods. However, these methods are time consuming and inefficient for real-time application when dealing with large online datasets. Application is more difficult still when data are missing or unavailable during online forecasting. Moreover, the time cost of statistical methods is prohibitive for the trivial gains in forecasting accuracy they deliver. Consequently, many researchers suggest a transition from computing with numbers to the manipulation of perceptions in the form of fuzzy linguistic variables. The current work addresses the issue of defining a delay approximation model for packet switching in communications networks. In particular, we focus on decision-making for smart routing management based on the knowledge provided by data mining (informed) agents. We propose a historical symbolic delay approximation model (HDAX) for delay forecasting. Preliminary experiments with the model show good accuracy in forecasting the delay time series as well as a reduction in the time cost of the forecasting method. HDAX compares favourably with the competing Autoregressive Moving Average (ARMA) algorithm in terms of execution time and accuracy. © 2009, Australian Computer Society, Inc
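A minimal sketch of the general idea of symbolic delay forecasting, not the published HDAX algorithm: numeric delays are mapped to coarse linguistic symbols, and the next symbol is forecast by recalling what most often followed the same symbolic pattern in the historical series. The bin edges, pattern length and toy delay series are invented:

```python
def symbolise(delay_ms, edges=(20.0, 50.0)):
    """Map a numeric delay to a coarse linguistic symbol."""
    if delay_ms < edges[0]:
        return "LOW"
    if delay_ms < edges[1]:
        return "MED"
    return "HIGH"

def forecast_next(series_ms, pattern_len=2):
    """Predict the next symbol by counting what followed the most recent
    symbolic pattern each time it occurred earlier in the series."""
    symbols = [symbolise(d) for d in series_ms]
    recent = tuple(symbols[-pattern_len:])
    counts = {}
    for i in range(len(symbols) - pattern_len):
        if tuple(symbols[i:i + pattern_len]) == recent:
            nxt = symbols[i + pattern_len]
            counts[nxt] = counts.get(nxt, 0) + 1
    if not counts:
        return symbols[-1]          # fall back to persistence
    return max(counts, key=counts.get)

# Toy series alternating bursts of low and high delay (ms):
delays = [12, 14, 55, 60, 13, 15, 58, 61, 12, 16]
prediction = forecast_next(delays)  # -> "HIGH"
```

Matching symbolic patterns avoids the numerical fitting step entirely, which is where the time-cost advantage over methods like ARMA comes from.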

    Robust simulation of lamprey tracking

    Biologically realistic computer simulation of vertebrates is a challenging problem with exciting applications in computer graphics and robotics. Once the mechanics of locomotion are available, it is interesting to mediate this locomotion with higher-level behavior such as target tracking. One recent approach simulates a relatively simple vertebrate, the lamprey, using recurrent neural networks to model the central pattern generator of the spine and a physical model for the body. Target-tracking behavior has also been implemented for such a model. However, previous approaches suffer from deficiencies where particular orientations of the body relative to the target cause the central pattern generator to shut down. This paper describes an approach to making target tracking more robust. © Springer-Verlag Berlin Heidelberg 2006

    Using a kernel-based approach to visualize integrated Chronic Fatigue Syndrome datasets

    We describe the use of a kernel-based approach using the Laplacian matrix to visualize an integrated Chronic Fatigue Syndrome dataset comprising symptom and fatigue questionnaire data, patient classification data, complete blood evaluation data and patient gene expression profiles. We present visualizations of the individual and integrated datasets with the linear and Gaussian kernel functions. An efficient approach, inspired by computational linguistics, for constructing a linear kernel matrix for the gene expression data is described. Visualizations of the questionnaire data show a cluster of non-fatigued individuals distinct from those suffering from Chronic Fatigue Syndrome, which is consistent with the fact that diagnosis is generally made from this kind of data. Clusters unrelated to patient classes were found in the gene expression data. Structure from the gene expression dataset dominated visualizations of integrated datasets that included gene expression data. © 2006, Australian Computer Society, Inc
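An illustrative sketch of the general technique (a Gaussian kernel turned into a graph Laplacian whose low eigenvectors give a 2-D embedding), not the authors' exact pipeline. The toy two-cluster data and kernel width are invented:

```python
import numpy as np

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.3, (10, 5)),    # one toy "patient" cluster
               rng.normal(3.0, 0.3, (10, 5))])   # a second, well-separated cluster

def gaussian_kernel(X, sigma=1.0):
    """Pairwise Gaussian (RBF) kernel matrix."""
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    return np.exp(-sq / (2 * sigma ** 2))

K = gaussian_kernel(X)
D = np.diag(K.sum(axis=1))
Lap = D - K                          # unnormalised graph Laplacian
vals, vecs = np.linalg.eigh(Lap)     # eigenpairs in ascending eigenvalue order
embedding = vecs[:, 1:3]             # skip the constant eigenvector; keep 2 dims
```

The eigenvector belonging to the smallest non-zero eigenvalue (the Fiedler vector) separates weakly connected groups, which is why clusters in the kernel matrix show up as clusters in the 2-D plot.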

    Enhancing in silico protein-based vaccine discovery for eukaryotic pathogens using predicted peptide-MHC binding and peptide conservation scores

    © 2014 Goodswen et al. Given the thousands of proteins constituting a eukaryotic pathogen, the principal objective for a high-throughput in silico vaccine discovery pipeline is to select those proteins worthy of laboratory validation. Accurate prediction of T-cell epitopes on protein antigens is one crucial piece of evidence that would aid in this selection. Prediction of the peptides recognised by T-cell receptors has to date proved to be of insufficient accuracy. The in silico approach is consequently reliant on an indirect method, which involves the prediction of peptides binding to major histocompatibility complex (MHC) molecules. There is nevertheless no guarantee that predicted peptide-MHC complexes will be presented by antigen-presenting cells and/or recognised by cognate T-cell receptors. The aim of this study was to determine whether predicted peptide-MHC binding scores could provide contributing evidence to establish a protein's potential as a vaccine. Using the T-cell MHC class I binding prediction tools provided by the Immune Epitope Database and Analysis Resource, peptide binding affinity to 76 common MHC I alleles was predicted for 160 Toxoplasma gondii proteins: 75, taken from published studies, represented proteins known or expected to induce T-cell immune responses, and 85 were considered less likely vaccine candidates. The results show there is no universal set of rules that can be applied directly to binding scores to distinguish a vaccine from a non-vaccine candidate. We present, however, two proposed strategies exploiting binding scores that provide supporting evidence that a protein is likely to induce a T-cell immune response: one using random forest (a machine learning algorithm) with 72% sensitivity and 82.4% specificity, and the other using amino acid conservation scores with 74.6% sensitivity and 70.5% specificity when applied to the 160 benchmark proteins.
    More importantly, the binding score strategies are valuable evidence contributors to the overall in silico vaccine discovery pool of evidence.
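A schematic sketch only, not the paper's classifier: predicted peptide-MHC binding scores are summarised into a single per-protein evidence value. The 500 nM IC50 cutoff is a common convention for calling MHC class I binders; the toy score lists are invented, and the paper feeds features of this kind into a random forest rather than thresholding them directly:

```python
STRONG_BINDER_NM = 500.0   # conventional IC50 cutoff for an MHC-I binder

def binding_evidence(predicted_ic50_nm):
    """Fraction of (peptide, allele) predictions called strong binders."""
    strong = sum(1 for s in predicted_ic50_nm if s < STRONG_BINDER_NM)
    return strong / len(predicted_ic50_nm)

# Toy proteins: predicted IC50 values (nM) across peptides and alleles.
likely_candidate = [30.0, 120.0, 450.0, 900.0, 60.0]
unlikely_candidate = [5000.0, 12000.0, 800.0, 20000.0, 1500.0]

evidence_likely = binding_evidence(likely_candidate)      # 0.8
evidence_unlikely = binding_evidence(unlikely_candidate)  # 0.0
```

In the pipeline, an evidence value like this is one column among many in the feature matrix the classifier learns from, rather than a decision rule on its own.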

    Feature selection of imbalanced gene expression microarray data

    Gene expression data is a complex data set characterised by an abundance of features but few observations. However, only a small number of these features are relevant to an outcome of interest, so feature selection becomes a real prerequisite. This paper proposes a methodology for feature selection on imbalanced leukaemia gene expression data based on the random forest algorithm. It presents the importance of feature selection in terms of reducing the number of features, enhancing the quality of machine learning and providing biologists with a better understanding for diagnosis and prediction. Algorithms are presented to show the methodology and strategy for feature selection, taking care to avoid overfitting. Moreover, experiments are conducted on imbalanced leukaemia gene expression data, and appropriate measures are used to evaluate the quality of feature selection and the performance of classification. © 2011 IEEE
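The paper ranks features with random forest importance; as a stand-in, the sketch below uses a much simpler filter score (a t-statistic-like class-separation measure) to show the general shape of feature selection on a toy expression matrix. All data are invented:

```python
import statistics

def separation_score(values, labels):
    """How strongly one gene's expression separates the two classes."""
    pos = [v for v, y in zip(values, labels) if y == 1]
    neg = [v for v, y in zip(values, labels) if y == 0]
    pooled = statistics.pstdev(pos + neg) or 1e-9
    return abs(statistics.mean(pos) - statistics.mean(neg)) / pooled

def select_features(matrix, labels, k):
    """matrix[i][j] = expression of gene j in sample i; returns top-k gene indices."""
    n_genes = len(matrix[0])
    scores = [separation_score([row[j] for row in matrix], labels)
              for j in range(n_genes)]
    return sorted(range(n_genes), key=lambda j: scores[j], reverse=True)[:k]

# Toy data: gene 1 tracks the label; genes 0 and 2 are noise.
labels = [1, 1, 1, 0, 0]          # imbalanced toward class 1
matrix = [[0.5, 9.0, 1.1],
          [0.4, 8.5, 0.9],
          [0.6, 9.2, 1.0],
          [0.5, 1.0, 1.1],
          [0.4, 1.2, 0.9]]
top = select_features(matrix, labels, k=1)   # -> [1]
```

With imbalance as severe as real leukaemia data, evaluation of the selected features should use class-aware measures (e.g. per-class recall) rather than plain accuracy, which is the "appropriate measures" point the abstract makes.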

    Cellular quantitative analysis of neuroblastoma tumor and splitting overlapping cells

    © 2014 Tafavogh et al.; licensee BioMed Central Ltd. Background: Neuroblastoma Tumor (NT) is one of the most aggressive types of infant cancer. Essential to accurate diagnosis and prognosis is cellular quantitative analysis of the tumor. Counting enormous numbers of cells under an optical microscope is error-prone, so there is an urgent demand from pathologists for robust and automated cell counting systems. However, the main challenge in developing these systems is their inability to distinguish between overlapping cells and single cells, and to split the overlapping cells. We address this challenge in two stages: 1) distinguishing overlapping cells from single cells using the morphological differences between them, such as area, uniformity of diameters and cell concavity; and 2) splitting overlapping cells into single cells. We propose a novel approach that uses the dominant concave regions of cells as markers to identify the overlap region. We then find the initial splitting points at the critical points of the concave regions by decomposing the concave regions into their components, such as arcs, chords and edges; the distance between the components is analyzed using the developed seed growing technique. Lastly, a shortest-path determination approach is developed to determine the optimum splitting route between two candidate initial splitting points. Results: We compare the cell counting results of our system with those of a pathologist as the ground truth. We also compare the system with three state-of-the-art methods, and the results of statistical tests show a significant improvement in the performance of our system over those methods. The F-measure obtained by our system is 88.70%. To evaluate the generalizability of our algorithm, we apply it to images of follicular lymphoma, which has histological regions similar to NT.
    Of the algorithms tested, ours obtains the highest F-measure, 92.79%. Conclusion: We develop a novel overlapping cell splitting algorithm to enhance the cellular quantitative analysis of infant neuroblastoma. The performance of the proposed algorithm promises a reliable automated cell counting system for pathology laboratories. Moreover, the high performance obtained by our algorithm on images of follicular lymphoma demonstrates its generalizability to cancers with similar histological regions and structures.
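An illustrative sketch of one ingredient of such a pipeline: detecting concave points on a cell-boundary polygon via the cross product of successive edge vectors. For a counter-clockwise polygon, a negative cross product marks a concave vertex, i.e. a candidate splitting point where two cells overlap. The toy "peanut" contour below is invented:

```python
def cross(o, a, b):
    """2-D cross product of vectors (a - o) and (b - o)."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def concave_points(contour):
    """Indices of concave vertices in a counter-clockwise polygon."""
    n = len(contour)
    return [i for i in range(n)
            if cross(contour[i - 1], contour[i], contour[(i + 1) % n]) < 0]

# A peanut-shaped toy contour: two lobes pinched at the waist, listed
# counter-clockwise. Vertices 2 and 6 are the pinch (the overlap region).
contour = [(0, 0), (2, -1), (4, 0), (6, -1), (8, 0),
           (6, 1), (4, 0.5), (2, 1)]
pinch = concave_points(contour)   # -> [2, 6]
```

A splitting route between such a pair of concave points is what the shortest-path stage of the pipeline would then optimise.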

    Fast simulation of animal locomotion: Lamprey swimming

    © 2006 by International Federation for Information Processing. All rights reserved. Biologically realistic computer simulation of vertebrate locomotion is an interesting and challenging problem with applications in computer graphics and robotics. One current approach simulates a relatively simple vertebrate, the lamprey, using recurrent neural networks for the spine and a physical model for the body. The model is realized as a system of differential equations. The drawback with this approach is the slow speed of simulation. This paper describes two approaches to speeding up simulation of lamprey locomotion without sacrificing too much biological realism: (i) use of superior numerical integration algorithms and (ii) simplifications to the neural architecture of the lamprey
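Point (i) can be illustrated on a toy system: a higher-order integrator (classical RK4) tracks an oscillatory ODE far more accurately per step than forward Euler, so larger steps (and hence faster simulation) become possible. The test system here is a simple harmonic oscillator, not the lamprey model:

```python
import math

def euler_step(f, y, t, h):
    """One forward Euler step."""
    return [yi + h * ki for yi, ki in zip(y, f(t, y))]

def rk4_step(f, y, t, h):
    """One classical fourth-order Runge-Kutta step."""
    k1 = f(t, y)
    k2 = f(t + h / 2, [yi + h / 2 * k for yi, k in zip(y, k1)])
    k3 = f(t + h / 2, [yi + h / 2 * k for yi, k in zip(y, k2)])
    k4 = f(t + h, [yi + h * k for yi, k in zip(y, k3)])
    return [yi + h / 6 * (a + 2 * b + 2 * c + d)
            for yi, a, b, c, d in zip(y, k1, k2, k3, k4)]

def oscillator(t, y):            # y = [position, velocity]; y'' = -y
    return [y[1], -y[0]]

def integrate(step, h=0.1, t_end=10.0):
    y, t = [1.0, 0.0], 0.0
    for _ in range(round(t_end / h)):
        y = step(oscillator, y, t, h)
        t += h
    return y

exact = math.cos(10.0)           # analytic solution at t = 10
err_euler = abs(integrate(euler_step)[0] - exact)
err_rk4 = abs(integrate(rk4_step)[0] - exact)
```

Forward Euler also steadily inflates the oscillation amplitude on this system, which for a neural oscillator model means qualitatively wrong behaviour, not just inaccuracy.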

    Evaluating High-Throughput Ab Initio Gene Finders to Discover Proteins Encoded in Eukaryotic Pathogen Genomes Missed by Laboratory Techniques

    Next-generation sequencing technology is advancing genome sequencing at an unprecedented rate. By unravelling the code within a pathogen's genome, every possible protein (prior to post-translational modifications) can theoretically be discovered, irrespective of life cycle stage or environmental stimuli. Now more than ever there is a great need for high-throughput ab initio gene finding. Ab initio gene finders use statistical models to predict genes and their exon-intron structures from the genome sequence alone. This paper evaluates whether existing ab initio gene finders can effectively predict genes to deduce proteins that have so far escaped capture by laboratory techniques. One aim here is to identify possible patterns of prediction inaccuracy for gene finders as a whole, irrespective of the target pathogen. All currently available ab initio gene finders were considered in the evaluation, but only four fulfil the high-throughput capability: AUGUSTUS, GeneMark_hmm, GlimmerHMM, and SNAP. These gene finders require training data specific to a target pathogen, and consequently the evaluation results are inextricably linked to the availability and quality of that data. The pathogen Toxoplasma gondii is used to illustrate the evaluation methods. The results support the current opinion that exons predicted by ab initio gene finders are inaccurate in the absence of experimental evidence. However, the results reveal some patterns of inaccuracy common to all gene finders, and these may provide a focus area for future gene finder developers. © 2012 Goodswen et al.
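A sketch of a standard exon-level evaluation, not the paper's full protocol: a predicted exon counts as correct only when both of its boundaries exactly match an annotated exon, and sensitivity and specificity follow the usual gene-prediction definitions. The coordinates are toy values:

```python
def exon_metrics(predicted, annotated):
    """Exon-level sensitivity (fraction of annotated exons found) and
    specificity (fraction of predicted exons that are correct).
    Exons are (start, end) tuples; both boundaries must match exactly."""
    correct = set(predicted) & set(annotated)
    sensitivity = len(correct) / len(annotated)
    specificity = len(correct) / len(predicted)
    return sensitivity, specificity

# Toy annotation vs prediction: the second predicted exon has a wrong
# start boundary, and the fourth has no annotated counterpart.
annotated = [(100, 250), (400, 520), (700, 810)]
predicted = [(100, 250), (395, 520), (700, 810), (900, 950)]
sn, sp = exon_metrics(predicted, annotated)   # -> (2/3, 0.5)
```

The strictness of exact boundary matching is what makes exon-level scores so unforgiving; a single mis-placed splice site fails the whole exon, which is one reason predicted exons score poorly without experimental evidence.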