2,104 research outputs found

    DISCOVERING INTERESTING PATTERNS FOR INVESTMENT DECISION MAKING WITH GLOWER C - A GENETIC LEARNER OVERLAID WITH ENTROPY REDUCTION

    Prediction in financial domains is notoriously difficult for a number of reasons. First, theories tend to be weak or non-existent, which makes problem formulation open-ended by forcing us to consider a large number of independent variables and thereby increasing the dimensionality of the search space. Second, the weak relationships among variables tend to be nonlinear, and may hold only in limited areas of the search space. Third, in financial practice, where analysts conduct extensive manual analysis of historically well-performing indicators, a key is to find the hidden interactions among variables that perform well in combination. Unfortunately, these are exactly the patterns that the greedy search biases incorporated by many standard rule algorithms will miss. In this paper, we describe and evaluate several variations of a new genetic learning algorithm (GLOWER) on a variety of data sets. The design of GLOWER has been motivated by financial prediction problems, but incorporates successful ideas from tree induction and rule learning. We examine the performance of several GLOWER variants on two UCI data sets as well as on a standard financial prediction problem (S&P500 stock returns), using the results to identify and use one of the better variants for further comparisons. We introduce a new (to KDD) financial prediction problem (predicting positive and negative earnings surprises), and experiment with GLOWER, contrasting it with tree- and rule-induction approaches. Our results are encouraging, showing that GLOWER has the ability to uncover effective patterns for difficult problems that have weak structure and significant nonlinearities. Information Systems Working Papers Series

    Ovarian Transcriptomic Analyses in the Urban Human Health Pest, the Western Black Widow Spider

    Due to their abundance and ability to invade diverse environments, many arthropods have become pests of economic and health concern, especially in urban areas. Transcriptomic analyses of arthropod ovaries have provided insight into life history variation and fecundity, yet there are few studies in spiders despite their diversity within arthropods. Here, we generated a de novo ovarian transcriptome from 10 individuals of the western black widow spider (Latrodectus hesperus), a human health pest of high abundance in urban areas, to conduct comparative ovarian transcriptomic analyses. Biological processes enriched for metabolism, specifically purine and thiamine metabolic pathways linked to oocyte development, were significantly abundant in L. hesperus. Functional and pathway annotations revealed overlap among diverse arachnid ovarian transcriptomes for highly conserved genes and those linked to fecundity, such as oocyte maturation in vitellogenin and vitelline membrane outer layer proteins, hormones, and hormone receptors required for ovary development, and regulation of fertility-related genes. Comparative studies across arachnids are greatly needed to understand the evolutionary similarities of the spider ovary, and here, the identification of ovarian proteins in L. hesperus provides potential for understanding how increased fecundity is linked to the success of this urban pest.

    Temporospatial Context-Aware Vehicular Crash Risk Prediction

    With the demand for more vehicles increasing, road safety is becoming a growing concern. Traffic collisions take many lives and cost billions of dollars in losses. This explains the growing interest of governments, academic institutions and companies in road safety. The vastness and availability of road accident data has provided new opportunities for gaining a better understanding of accident risk factors and for developing more effective accident prediction and prevention regimes. Much of the empirical research on road safety and accident analysis utilizes statistical models which capture limited aspects of crashes. On the other hand, data mining has recently gained interest as a reliable approach for investigating road-accident data and for providing predictive insights. While some risk factors contribute more frequently to the occurrence of a road accident, the importance of driver behavior, temporospatial factors, and real-time traffic dynamics has been underestimated. This study proposes a framework for predicting crash risk based on historical accident data. The proposed framework incorporates machine learning and data analytics techniques to identify driving patterns and other risk factors associated with potential vehicle crashes. These techniques include clustering, association rule mining, information fusion, and Bayesian networks. Swarm intelligence based association rule mining is employed to uncover the underlying relationships and dependencies in collision databases. Data segmentation methods are employed to eliminate the effect of dependent variables. Extracted rules can be used along with real-time mobility to predict crashes and their severity in real-time. The national collision database of Canada (NCDB) is used in this research to generate association rules with crash risk oriented subsequents, and to compare the performance of the swarm intelligence based approach with that of other association rule miners.
    Many industry-demanding datasets, including road-accident datasets, are deficient in descriptive factors. This is a significant barrier to uncovering meaningful risk factor relationships. To resolve this issue, this study proposes a knowledgebase approximation framework to enhance the crash risk analysis by integrating pieces of evidence discovered from disparate datasets capturing different aspects of mobility. Dempster-Shafer theory is utilized as a key element of this knowledgebase approximation. This method can integrate association rules with acceptable accuracy under certain circumstances that are discussed in this thesis. The proposed framework is tested on the lymphography dataset and the road-accident database of Great Britain. The derived insights are then used as the basis for constructing a Bayesian network that can estimate crash likelihood and risk levels so as to warn drivers and prevent accidents in real-time. This Bayesian network approach offers a way to implement a naturalistic driving analysis process for predicting traffic collision risk based on the findings from the data-driven model. A traffic incident detection and localization method is also proposed as a component of the risk analysis model. Detecting and localizing traffic incidents enables timely response to accidents and facilitates effective and efficient traffic flow management. The results obtained from the experimental work conducted on this component are indicative of the capability of our Dempster-Shafer data-fusion-based incident detection method in overcoming the challenges arising from erroneous and noisy sensor readings.
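
    The abstract above names Dempster-Shafer theory as the means of fusing evidence from disparate datasets. As a rough illustration only, the sketch below applies the textbook form of Dempster's rule of combination to two mass functions. The two-outcome frame ("crash", "safe"), the source names, and all numeric masses are hypothetical and are not taken from the thesis; the thesis's own knowledgebase approximation may differ substantially.

```python
from itertools import product


def combine(m1, m2):
    """Textbook Dempster's rule: fuse two mass functions.

    Each mass function maps a frozenset of outcomes to its mass,
    and the masses of each function are assumed to sum to 1.
    """
    combined = {}
    conflict = 0.0
    for (a, wa), (b, wb) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:
            # Agreeing evidence supports the intersection of the two sets.
            combined[inter] = combined.get(inter, 0.0) + wa * wb
        else:
            # Disjoint sets contribute to the conflict mass.
            conflict += wa * wb
    norm = 1.0 - conflict
    if norm <= 0.0:
        raise ValueError("sources are in total conflict")
    # Renormalise so the fused masses sum to 1.
    return {k: v / norm for k, v in combined.items()}


# Hypothetical frame of discernment and evidence (illustrative values only).
CRASH = frozenset({"crash"})
SAFE = frozenset({"safe"})
EITHER = CRASH | SAFE  # mass left uncommitted

rule_evidence = {CRASH: 0.6, EITHER: 0.4}
sensor_evidence = {CRASH: 0.5, SAFE: 0.2, EITHER: 0.3}

fused = combine(rule_evidence, sensor_evidence)
for subset, mass in fused.items():
    print(sorted(subset), round(mass, 3))
```

    With these made-up inputs, the conflicting mass is 0.12, so the fused belief in "crash" comes out at roughly 0.77 after renormalisation. Assigning mass to the whole frame (EITHER) is how a source expresses ignorance rather than evidence against an outcome.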

    Resolving Structure in Human Brain Organization: Identifying Mesoscale Organization in Weighted Network Representations

    Human brain anatomy and function display a combination of modular and hierarchical organization, suggesting the importance of both cohesive structures and variable resolutions in the facilitation of healthy cognitive processes. However, tools to simultaneously probe these features of brain architecture require further development. We propose and apply a set of methods to extract cohesive structures in network representations of brain connectivity using multi-resolution techniques. We employ a combination of soft thresholding, windowed thresholding, and resolution in community detection that enables us to identify and isolate structures associated with different weights. One such mesoscale structure is bipartivity, which quantifies the extent to which the brain is divided into two partitions with high connectivity between partitions and low connectivity within partitions. A second, complementary mesoscale structure is modularity, which quantifies the extent to which the brain is divided into multiple communities with strong connectivity within each community and weak connectivity between communities. Our methods lead to multi-resolution curves of these network diagnostics over a range of spatial, geometric, and structural scales. For statistical comparison, we contrast our results with those obtained for several benchmark null models. Our work demonstrates that multi-resolution diagnostic curves capture complex organizational profiles in weighted graphs. We apply these methods to the identification of resolution-specific characteristics of healthy weighted graph architecture and altered connectivity profiles in psychiatric disease.
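
    To make the modularity diagnostic described above concrete, here is a minimal sketch of the widely used Newman-style modularity score for a weighted undirected graph, with a resolution parameter gamma. The formula, the function name, the parameter name, and the toy graph are assumptions chosen for illustration; the abstract does not state which exact definition or null model the authors use.

```python
def modularity(adj, labels, gamma=1.0):
    """Modularity Q of a partition of a weighted undirected graph.

    adj    : symmetric matrix (list of lists) of non-negative edge weights
    labels : labels[i] is the community of node i
    gamma  : resolution parameter (gamma = 1 gives the standard score)
    """
    n = len(adj)
    strength = [sum(row) for row in adj]  # weighted degree of each node
    two_m = sum(strength)                 # twice the total edge weight
    if two_m == 0:
        raise ValueError("graph has no edges")
    q = 0.0
    for i in range(n):
        for j in range(n):
            if labels[i] == labels[j]:
                # Observed weight minus the weight expected under a
                # degree-preserving random null model.
                q += adj[i][j] - gamma * strength[i] * strength[j] / two_m
    return q / two_m


# Toy graph (hypothetical): two triangles joined by a single edge.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
adj = [[0.0] * 6 for _ in range(6)]
for u, v in edges:
    adj[u][v] = adj[v][u] = 1.0

labels = [0, 0, 0, 1, 1, 1]
q_standard = modularity(adj, labels)
q_fine = modularity(adj, labels, gamma=2.0)
print(q_standard, q_fine)
```

    For this toy graph the standard score works out to 5/14. Raising gamma penalises the expected-weight term more heavily, which lowers the score of this two-community split; sweeping gamma over a range is one simple way to obtain a curve of a diagnostic across resolutions.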

    Genome-Wide Association Analysis of Oxidative Stress Resistance in Drosophila melanogaster

    Background: Aerobic organisms are susceptible to damage by reactive oxygen species. Oxidative stress resistance is a quantitative trait with population variation attributable to the interplay between genetic and environmental factors. Drosophila melanogaster provides an ideal system to study the genetics of variation for resistance to oxidative stress. Methods and Findings: We used 167 wild-derived inbred lines of the Drosophila Genetic Reference Panel for a genome-wide association study of acute oxidative stress resistance to two oxidizing agents, paraquat and menadione sodium bisulfite. We found significant genetic variation for both stressors. Single nucleotide polymorphisms (SNPs) associated with variation in oxidative stress resistance were often sex-specific and agent-dependent, with a small subset common for both sexes or treatments. Associated SNPs had moderately large effects, with an inverse relationship between effect size and allele frequency. Linear models with up to 12 SNPs explained 67–79% and 56–66% of the phenotypic variance for resistance to paraquat and menadione sodium bisulfite, respectively. Many genes implicated were novel with no known role in oxidative stress resistance. Bioinformatics analyses revealed a cellular network comprising DNA metabolism and neuronal development, consistent with targets of oxidative stress-inducing agents. We confirmed associations of seven candidate genes associated with natural variation in oxidative stress resistance through mutational analysis. Conclusions: We identified novel candidate genes associated with variation in resistance to oxidative stress that hav

    Genomic, Evolutionary and Functional Analyses of Diapause in Drosophila Melanogaster

    Understanding the genetic basis of adaptation has been and remains a major goal of ecological and evolutionary genetics. The variation in diapause propensity in the model organism Drosophila melanogaster represents different life-history strategies underlying adaptation to regular and widespread environmental heterogeneity, and thus provides an ideal model to study the genetic control of ecologically important complex phenotypes. This work employs global genomic and transcriptomic approaches to identify genetic polymorphisms co-segregating with diapause propensity, as well as genes that are differentially regulated at the transcriptional level as a function of the diapause phenotype. I show that genetic polymorphisms co-segregating with diapause propensity are found throughout all major chromosomes, demonstrating that diapause is a multi-genic trait. I show that diapause in D. melanogaster is an actively regulated phenotype at the transcriptional level, suggesting that diapause is not a simple physiological or reproductive quiescence. I also demonstrate that genetic polymorphisms co-segregating with diapause propensity, as well as genes differentially expressed as a function of diapause, are enriched for clinally varying and seasonally oscillating SNPs, supporting the hypothesis that natural variation in diapause propensity underlies adaptation to spatially and temporally varying selective pressures. In addition to global genomic and transcriptomic screens, I also performed functional analysis of one candidate polymorphism on the gene Crystallin, which represents an intersection of multiple global screens related to seasonal adaptation. I show that this polymorphism affects patterns of gene expression and a subset of fitness-related phenotypes including diapause, in an environment-specific manner. Taken together, this work provides a holistic view of the genetic basis of a complex trait underlying climatic adaptation in wild populations of D. melanogaster, linking genetic polymorphism, gene regulation, organismal phenotype, population dynamics and environmental parameters.