
    Unleashing the power of meta-threading for evolution/structure-based function inference of proteins

    Protein threading is widely used in the prediction of protein structure and the subsequent functional annotation. Most threading approaches employ similar criteria for template identification in both protein structure and function modeling. Using structure similarity alone, however, might result in a high false positive rate in protein function inference, which suggests that selecting functional templates should be subject to a different set of constraints. In this study, we extend the functionality of eThread, a recently developed approach to meta-threading, focusing on the optimal selection of functional templates. We optimized the selection of template proteins to cover a broad spectrum of protein molecular function: ligand, metal, inorganic cluster, protein, and nucleic acid binding. In large-scale benchmarks, we demonstrate that the recognition rates in identifying templates that bind molecular partners in similar locations are very high, typically 70-80%, while maintaining a relatively low false positive rate. eThread also provides useful insights into the chemical properties of binding molecules and the structural features of binding. For instance, the sensitivity in recognizing similar protein-binding interfaces is 58% at only an 18% false positive rate. Furthermore, in comparative analysis, we demonstrate that meta-threading supported by machine learning outperforms single-threading approaches in functional template selection. We show that meta-threading effectively detects many facets of protein molecular function, even in the low-sequence-identity regime. The enhanced version of eThread is freely available as a webserver and stand-alone software at http://www.brylinski.org/ethread. © 2013 Brylinski
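    The abstract's central idea, combining the outputs of several threading programs with a machine-learning model to pick functional templates, lends itself to a small illustration. Below is a hedged sketch of that meta-threading consensus; the program scores, features, and data are synthetic placeholders and do not reproduce eThread's actual feature set or classifier.

```python
# Illustrative sketch of a meta-threading consensus: a classifier combines
# per-template scores from several (hypothetical) threading programs to
# rank candidate functional templates. Synthetic data throughout.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)

# One row per candidate template: confidence scores from three hypothetical
# single-threading programs plus a sequence-identity feature.
X_train = rng.random((500, 4))
# Label 1 = template binds a molecular partner at the equivalent site.
y_train = (X_train[:, :3].mean(axis=1)
           + 0.1 * rng.standard_normal(500) > 0.5).astype(int)

meta = RandomForestClassifier(n_estimators=200, random_state=0)
meta.fit(X_train, y_train)

# Rank new candidates by predicted probability of being a functional
# (binding) template rather than merely a structural match.
X_new = rng.random((10, 4))
ranking = np.argsort(meta.predict_proba(X_new)[:, 1])[::-1]
print("template ranking (best first):", ranking)
```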

    Neural Correlates of Effective Learning in Experienced Medical Decision-Makers

    Accurate associative learning is often hindered by confirmation bias and success-chasing, which together can conspire to produce or solidify false beliefs in the decision-maker. We performed functional magnetic resonance imaging in 35 experienced physicians while they learned to choose between two treatments in a series of virtual patient encounters. We estimated a learning model for each subject based on their observed behavior, and this model divided the subjects clearly into high performers and low performers. The high performers showed small but equal learning rates for both successes (positive outcomes) and failures (no response to the drug). In contrast, low performers showed very large and asymmetric learning rates, learning significantly more from successes than failures, a tendency that led to sub-optimal treatment choices. Consistent with these behavioral findings, high performers showed larger, more sustained BOLD responses to failed vs. successful outcomes in the dorsolateral prefrontal cortex and inferior parietal lobule, while low performers displayed the opposite response profile. Furthermore, participants' learning asymmetry correlated with anticipatory activation in the nucleus accumbens at trial onset, well before outcome presentation. Subjects with anticipatory activation in the nucleus accumbens showed more success-chasing during learning. These results suggest that high performers' brains achieve better outcomes by attending to informative failures during training, rather than chasing the reward value of successes. The differential brain activations between high and low performers could potentially be developed into biomarkers to identify efficient learners on novel decision tasks, in medical or other contexts.
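    The asymmetric learning rates described above can be made concrete with a toy value-update rule. The sketch below uses a standard delta-rule (Rescorla-Wagner style) update with separate rates for successes and failures; the parameter values are illustrative, not the authors' fitted estimates.

```python
# Delta-rule value update with separate learning rates for successes and
# failures; a "success-chasing" low performer corresponds to a much larger
# rate for successes than for failures. Parameters are illustrative.
def update_value(value, outcome, alpha_success, alpha_failure):
    """outcome is 1 (success) or 0 (failure)."""
    alpha = alpha_success if outcome == 1 else alpha_failure
    return value + alpha * (outcome - value)

v_high, v_low = 0.5, 0.5
for outcome in [1, 0, 0, 1, 0]:
    # High performer: small, symmetric learning rates.
    v_high = update_value(v_high, outcome, alpha_success=0.10, alpha_failure=0.10)
    # Low performer: large, asymmetric rates (learns mostly from successes).
    v_low = update_value(v_low, outcome, alpha_success=0.80, alpha_failure=0.05)
print(f"high performer value: {v_high:.3f}, low performer value: {v_low:.3f}")
```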

    Malware Target Recognition via Static Heuristics

    Organizations increasingly rely on the confidentiality, integrity and availability of their information and communications technologies to conduct effective business operations while maintaining their competitive edge. Exploitation of these networks through the introduction of undetected malware, which takes advantage of limited network visibility and the high cost of analyzing massive numbers of programs, ultimately degrades that competitive edge. This article introduces the novel Malware Target Recognition (MaTR) system, which combines the decision tree machine learning algorithm with static heuristic features for malware detection. By focusing on contextually important static heuristic features, this research demonstrates superior detection results. Experimental results on large sample datasets demonstrate near-ideal malware detection performance (99.9+% accuracy) with low false positive (8.73e-4) and false negative (8.03e-4) rates at the same point on the performance curve. Test results against a set of publicly unknown malware, including potential advanced competitor tools, show MaTR's superior detection rate (99%) versus the union of detections from three commercial antivirus products (60%). The resulting model is a fine-granularity sensor with potential to dramatically augment cyberspace situation awareness.
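    As a rough illustration of the MaTR approach, the sketch below trains a decision tree on static heuristic features. The feature names, data, and labels are synthetic placeholders; the article's actual feature set and training corpus are not reproduced here.

```python
# Hypothetical sketch: decision tree over static heuristic features of
# executables (e.g. size, section count, entropy). Synthetic data only.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(1)
X = rng.random((2000, 6))  # columns stand in for static heuristics
# Synthetic labels loosely tied to one feature (say, entropy).
y = (X[:, 2] + 0.3 * rng.standard_normal(2000) > 0.6).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=1)
clf = DecisionTreeClassifier(max_depth=10, random_state=1).fit(X_tr, y_tr)
print("held-out accuracy:", clf.score(X_te, y_te))
```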

    Cognitive and Psychiatric Predictors to Psychosis in Velocardiofacial Syndrome: A 3-Year Follow-Up Study

    Objective: To predict prodromal psychosis in adolescents with velocardiofacial syndrome (VCFS). Method: 70 youth with VCFS, 27 siblings of youth with VCFS, and 25 community controls were followed from childhood (mean age = 11.8 years) into mid-adolescence (mean age = 15.0 years). Psychological tests measuring intelligence, academic achievement, learning/memory, attention, and executive functioning, as well as parent and clinician ratings of child psychiatric functioning, were completed at both time points. Results: Major depressive disorder, oppositional defiant disorder, and generalized anxiety disorder diagnoses increased in the VCFS sample. With very low false positive rates, the best predictors of adolescent prodromal psychotic symptoms were parent ratings of childhood odd/eccentric symptoms and child performance on a measure of executive functioning, the Wisconsin Card Sorting Test. Conclusions: As in the non-VCFS prodromal psychosis literature, a combination of cognitive and psychiatric variables appears to predict psychosis in adolescence. A child with VCFS who screens positive is noteworthy and demands clinical attention.
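    To illustrate how two childhood measures might be combined into a single predictor of prodromal symptoms, the sketch below fits a logistic model on synthetic data. The scores, coefficients, and sample are invented for illustration and do not reflect the study's data or analysis.

```python
# Illustrative only: combining a parent-rated odd/eccentric symptom score
# and a WCST executive-function score in a logistic model. Synthetic data.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(2)
odd_eccentric = rng.standard_normal(122)  # parent-rated symptom score (z)
wcst_errors = rng.standard_normal(122)    # WCST perseverative errors (z)
risk = 1.0 / (1.0 + np.exp(-(1.2 * odd_eccentric + 0.8 * wcst_errors)))
prodromal = (rng.random(122) < risk).astype(int)  # synthetic outcome

X = np.column_stack([odd_eccentric, wcst_errors])
model = LogisticRegression().fit(X, prodromal)
print("coefficients:", model.coef_.round(2))
```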

    Real-bogus classification for the Zwicky Transient Facility using deep learning

    Efficient automated detection of flux-transient, re-occurring flux-variable, and moving objects is increasingly important for large-scale astronomical surveys. We present BRAAI, a convolutional-neural-network deep-learning real/bogus classifier designed to separate genuine astrophysical events and objects from false positive, or bogus, detections in the data of the Zwicky Transient Facility (ZTF), a new robotic time-domain survey currently in operation at the Palomar Observatory in California, USA. BRAAI demonstrates state-of-the-art performance as quantified by its low false negative and false positive rates. We describe the open-source software tools used internally at Caltech to archive and access ZTF's alerts and light curves (KOWALSKI) and to label the data (ZWICKYVERSE). We also report initial results from deploying the classifier on Edge Tensor Processing Units: accuracy is comparable, but inference is much more (cost-)efficient, which has significant implications for current and future surveys.
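    A real/bogus classifier of this kind can be sketched as a small binary CNN over image cutouts. The sketch below assumes 63x63-pixel science/reference/difference cutout triplets as input; the layer sizes are illustrative and are not the published BRAAI architecture.

```python
# Minimal binary CNN in the spirit of a real/bogus classifier; assumes
# 63x63x3 cutout triplets (science/reference/difference). Illustrative only.
import tensorflow as tf
from tensorflow.keras import layers, models

model = models.Sequential([
    layers.Input(shape=(63, 63, 3)),
    layers.Conv2D(16, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Conv2D(32, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Flatten(),
    layers.Dense(64, activation="relu"),
    layers.Dropout(0.4),
    layers.Dense(1, activation="sigmoid"),  # P(real); 1 - P(real) is "bogus"
])
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=["accuracy"])
model.summary()
```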

    Financial Fraud Detection and Data Mining of Imbalanced Databases using State Space Machine Learning

    Risky decisions made by humans exhibit characteristics common to each decision. The systems involved experience repeated abuse by risky humans, whose actions combine into a systemic behavioural set. Financial fraud is an example of such risky behaviour. Fraud detection models have drawn attention since the financial crisis of 2008 because of the frequency and size of frauds and the technological advances that enable financial market manipulation. Statistical methods dominate industrial fraud detection systems at banks, insurance companies, and financial marketplaces. Most efforts thus far, in both the academic literature and industrial settings, have focused on anomaly detection problems and simple rules. There are unsolved issues in modeling the behaviour of risky agents in real-world financial markets using machine learning. This research studies the challenges posed by fraud detection, including the problem of imbalanced class distributions, and investigates the use of Reinforcement Learning (RL) to model risky human behaviour. Models have been developed to transform the relevant financial data into a state-space system. Reinforcement Learning agents uncover the decision-making processes of risky humans and derive an optimal path of behaviour at the end of the learning process. States are weighted by risk and then classified as positive (risky) or negative (not risky). The positive samples are composed of features that represent the hidden information underlying the risky behaviour. Reinforcement Learning is implemented in both unsupervised and supervised models. The unsupervised learning agent searches for risky behaviour without any previous knowledge of the data; it is not "trained" on data with true class labels. Instead, the RL learner relates samples through experience. The supervised learner is trained on a proportion (e.g. 90%) of the data with class labels. It derives a policy of optimal actions to be taken at each state during the training stage. One policy is selected from several learning agents, and the model is then exposed to the remaining proportion (e.g. 10%) of the data for classification. RL is hybridized with a Hidden Markov Model (HMM) in the supervised learning model to impose a probabilistic framework around the risky agent's behaviour. We first study an insider trading example to demonstrate how learning algorithms can mimic risky agents. The classification power of the model is further demonstrated by applying it to a real-world based database of debit card transaction fraud. We then apply the models to two problems found in Statistics Canada databases: heart disease detection and female labour force participation. All models are evaluated using measures appropriate for imbalanced class problems: sensitivity and the false positive rate. Sensitivity is the number of correctly classified positive samples (e.g. fraud) as a proportion of all positive samples in the data. The false positive rate is the number of negative samples classified as positive, as a proportion of all negative samples in the data. The intent is to maximize sensitivity and minimize the false positive rate. All models show high sensitivity while exhibiting low false positive rates; these two metrics are ideal for industrial implementation because they reflect high levels of identification at a low cost. Fraud detection is the focus, and detection rates of 75-85% indicate that RL is a strong method for data mining of imbalanced databases. By addressing the problem of hidden information, this research can facilitate the detection of risky human behaviour and help prevent it.
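    The two evaluation metrics defined in the abstract translate directly into code. The sketch below computes sensitivity and the false positive rate from a confusion matrix; the example counts are invented.

```python
# Sensitivity = TP / (TP + FN); false positive rate = FP / (FP + TN),
# exactly as defined in the abstract. Example labels are invented.
import numpy as np

def sensitivity_and_fpr(y_true, y_pred):
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = np.sum((y_true == 1) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == 0))
    fp = np.sum((y_true == 0) & (y_pred == 1))
    tn = np.sum((y_true == 0) & (y_pred == 0))
    return tp / (tp + fn), fp / (fp + tn)

# E.g. 8 of 10 frauds caught, 5 of 100 legitimate transactions flagged:
y_true = [1] * 10 + [0] * 100
y_pred = [1] * 8 + [0] * 2 + [1] * 5 + [0] * 95
sens, fpr = sensitivity_and_fpr(y_true, y_pred)
print(f"sensitivity = {sens:.2f}, false positive rate = {fpr:.2f}")
```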

    The Measure and Mismeasure of Fairness: A Critical Review of Fair Machine Learning

    The nascent field of fair machine learning aims to ensure that decisions guided by algorithms are equitable. Over the last several years, three formal definitions of fairness have gained prominence: (1) anti-classification, meaning that protected attributes (such as race, gender, and their proxies) are not explicitly used to make decisions; (2) classification parity, meaning that common measures of predictive performance (e.g., false positive and false negative rates) are equal across groups defined by the protected attributes; and (3) calibration, meaning that, conditional on risk estimates, outcomes are independent of protected attributes. Here we show that all three of these fairness definitions suffer from significant statistical limitations. Requiring anti-classification or classification parity can, perversely, harm the very groups they were designed to protect; and calibration, though generally desirable, provides little guarantee that decisions are equitable. In contrast to these formal fairness criteria, we argue that it is often preferable to treat similarly risky people similarly, based on the most statistically accurate estimates of risk that one can produce. Such a strategy, while not universally applicable, often aligns well with policy objectives; notably, it will typically violate both anti-classification and classification parity. In practice, constructing suitable risk estimates requires significant effort: one must carefully define and measure the targets of prediction to avoid entrenching biases in the data. But, importantly, one cannot generally address these difficulties by requiring that algorithms satisfy popular mathematical formalizations of fairness. By highlighting these challenges in the foundation of fair machine learning, we hope to help researchers and practitioners productively advance the area.
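    The classification-parity definition in particular is easy to make concrete: it asks whether error rates match across groups defined by a protected attribute. The sketch below checks false positive rates per group on synthetic data; the abstract's argument is that forcing these rates to be equal can itself harm the protected groups.

```python
# Checking "classification parity": compare false positive rates across
# groups defined by a protected attribute. Synthetic data only.
import numpy as np

rng = np.random.default_rng(3)
group = rng.integers(0, 2, 1000)   # protected attribute (0 or 1)
y_true = rng.integers(0, 2, 1000)  # true outcome
y_pred = rng.integers(0, 2, 1000)  # model decision

for g in (0, 1):
    negatives = (group == g) & (y_true == 0)
    fpr = np.mean(y_pred[negatives] == 1)
    print(f"group {g}: false positive rate = {fpr:.2f}")
# Classification parity requires these rates to be (approximately) equal.
```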