478 research outputs found

    Human pol II promoter prediction: time series descriptors and machine learning

    Get PDF
    Although several in silico promoter prediction methods have been developed to date, they are still limited in predictive performance. The limitations are due to the challenge of selecting appropriate features of promoters that distinguish them from non-promoters and the generalization or predictive ability of the machine-learning algorithms. In this paper we attempt to define a novel approach by using unique descriptors and machine-learning methods for the recognition of eukaryotic polymerase II promoters. In this study, non-linear time series descriptors along with non-linear machine-learning algorithms, such as support vector machine (SVM), are used to discriminate between promoter and non-promoter regions. The basic idea here is to use descriptors that do not depend on the primary DNA sequence and provide a clear distinction between promoter and non-promoter regions. The classification model built on a set of 1000 promoter and 1500 non-promoter sequences, showed a 10-fold cross-validation accuracy of 87% and an independent test set had an accuracy >85% in both promoter and non-promoter identification. This approach correctly identified all 20 experimentally verified promoters of human chromosome 22. The high sensitivity and selectivity indicates that n-mer frequencies along with non-linear time series descriptors, such as Lyapunov component stability and Tsallis entropy, and supervised machine-learning methods, such as SVMs, can be useful in the identification of pol II promoters

    A Novel Method for Intelligent Single Fault Detection of Bearings Using SAE and Improved D–S Evidence Theory

    Get PDF
    In order to realize single fault detection (SFD) from the multi-fault coupling bearing data and further research on the multi-fault situation of bearings, this paper proposes a method based on features self-extraction of a Sparse Auto-Encoder (SAE) and results fusion of improved Dempster–Shafer evidence theory (D–S). Multi-fault signal compression features of bearings were extracted by SAE on multiple vibration sensors’ data. Data sets were constructed by the extracted compression features to train the Support Vector Machine (SVM) according to the rule of single fault detection (R-SFD) this paper proposed. Fault detection results were obtained by the improved D–S evidence theory, which was implemented via correcting the 0 factor in the Basic Probability Assignment (BPA) and modifying the evidence weight by Pearson Correlation Coefficient (PCC). Extensive evaluations of the proposed method on the experiment platform datasets showed that the proposed method could realize single fault detection from multi-fault bearings. Fault detection accuracy increases as the output feature dimension of SAE increases; when the feature dimension reached 200, the average detection accuracy of the three sensors for bearing inner, outer, and ball faults achieved 87.36%, 87.86% and 84.46%, respectively. The three types’ fault detection accuracy—reached to 99.12%, 99.33% and 98.46% by the improved Dempster–Shafer evidence theory (IDS) to fuse the sensors’ results—is respectively 0.38%, 2.06% and 0.76% higher than the traditional D–S evidence theory. That indicated the effectiveness of improving the D–S evidence theory by evidence weight calculation of PCC

    An Active Learning Algorithm for Ranking from Pairwise Preferences with an Almost Optimal Query Complexity

    Full text link
    We study the problem of learning to rank from pairwise preferences, and solve a long-standing open problem that has led to development of many heuristics but no provable results for our particular problem. Given a set VV of nn elements, we wish to linearly order them given pairwise preference labels. A pairwise preference label is obtained as a response, typically from a human, to the question "which if preferred, u or v?fortwoelements for two elements u,v\in V.Weassumepossiblenon−transitivityparadoxeswhichmayarisenaturallyduetohumanmistakesorirrationality.Thegoalistolinearlyordertheelementsfromthemostpreferredtotheleastpreferred,whiledisagreeingwithasfewpairwisepreferencelabelsaspossible.Ourperformanceismeasuredbytwoparameters:Thelossandthequerycomplexity(numberofpairwisepreferencelabelsweobtain).Thisisatypicallearningproblem,withtheexceptionthatthespacefromwhichthepairwisepreferencesisdrawnisfinite,consistingof. We assume possible non-transitivity paradoxes which may arise naturally due to human mistakes or irrationality. The goal is to linearly order the elements from the most preferred to the least preferred, while disagreeing with as few pairwise preference labels as possible. Our performance is measured by two parameters: The loss and the query complexity (number of pairwise preference labels we obtain). This is a typical learning problem, with the exception that the space from which the pairwise preferences is drawn is finite, consisting of {n\choose 2}$ possibilities only. We present an active learning algorithm for this problem, with query bounds significantly beating general (non active) bounds for the same error guarantee, while almost achieving the information theoretical lower bound. Our main construct is a decomposition of the input s.t. (i) each block incurs high loss at optimum, and (ii) the optimal solution respecting the decomposition is not much worse than the true opt. The decomposition is done by adapting a recent result by Kenyon and Schudy for a related combinatorial optimization problem to the query efficient setting. We thus settle an open problem posed by learning-to-rank theoreticians and practitioners: What is a provably correct way to sample preference labels? To further show the power and practicality of our solution, we show how to use it in concert with an SVM relaxation.Comment: Fixed a tiny error in theorem 3.1 statemen

    An Ensemble Deep Convolutional Neural Network Model with Improved D-S Evidence Fusion for Bearing Fault Diagnosis

    Get PDF
    Intelligent machine health monitoring and fault diagnosis are becoming increasingly important for modern manufacturing industries. Current fault diagnosis approaches mostly depend on expert-designed features for building prediction models. In this paper, we proposed IDSCNN, a novel bearing fault diagnosis algorithm based on ensemble deep convolutional neural networks and an improved Dempster–Shafer theory based evidence fusion. The convolutional neural networks take the root mean square (RMS) maps from the FFT (Fast Fourier Transformation) features of the vibration signals from two sensors as inputs. The improved D-S evidence theory is implemented via distance matrix from evidences and modified Gini Index. Extensive evaluations of the IDSCNN on the Case Western Reserve Dataset showed that our IDSCNN algorithm can achieve better fault diagnosis performance than existing machine learning methods by fusing complementary or conflicting evidences from different models and sensors and adapting to different load conditions

    The Survey, Taxonomy, and Future Directions of Trustworthy AI: A Meta Decision of Strategic Decisions

    Full text link
    When making strategic decisions, we are often confronted with overwhelming information to process. The situation can be further complicated when some pieces of evidence are contradicted each other or paradoxical. The challenge then becomes how to determine which information is useful and which ones should be eliminated. This process is known as meta-decision. Likewise, when it comes to using Artificial Intelligence (AI) systems for strategic decision-making, placing trust in the AI itself becomes a meta-decision, given that many AI systems are viewed as opaque "black boxes" that process large amounts of data. Trusting an opaque system involves deciding on the level of Trustworthy AI (TAI). We propose a new approach to address this issue by introducing a novel taxonomy or framework of TAI, which encompasses three crucial domains: articulate, authentic, and basic for different levels of trust. To underpin these domains, we create ten dimensions to measure trust: explainability/transparency, fairness/diversity, generalizability, privacy, data governance, safety/robustness, accountability, reproducibility, reliability, and sustainability. We aim to use this taxonomy to conduct a comprehensive survey and explore different TAI approaches from a strategic decision-making perspective

    An Ensemble Deep Convolutional Neural Network Model with Improved D-S Evidence Fusion for Bearing Fault Diagnosis

    Get PDF
    Intelligent machine health monitoring and fault diagnosis are becoming increasingly important for modern manufacturing industries. Current fault diagnosis approaches mostly depend on expert-designed features for building prediction models. In this paper, we proposed IDSCNN, a novel bearing fault diagnosis algorithm based on ensemble deep convolutional neural networks and an improved Dempster–Shafer theory based evidence fusion. The convolutional neural networks take the root mean square (RMS) maps from the FFT (Fast Fourier Transformation) features of the vibration signals from two sensors as inputs. The improved D-S evidence theory is implemented via distance matrix from evidences and modified Gini Index. Extensive evaluations of the IDSCNN on the Case Western Reserve Dataset showed that our IDSCNN algorithm can achieve better fault diagnosis performance than existing machine learning methods by fusing complementary or conflicting evidences from different models and sensors and adapting to different load conditions

    Understanding confounding effects in linguistic coordination: an information-theoretic approach

    Full text link
    We suggest an information-theoretic approach for measuring stylistic coordination in dialogues. The proposed measure has a simple predictive interpretation and can account for various confounding factors through proper conditioning. We revisit some of the previous studies that reported strong signatures of stylistic accommodation, and find that a significant part of the observed coordination can be attributed to a simple confounding effect - length coordination. Specifically, longer utterances tend to be followed by longer responses, which gives rise to spurious correlations in the other stylistic features. We propose a test to distinguish correlations in length due to contextual factors (topic of conversation, user verbosity, etc.) and turn-by-turn coordination. We also suggest a test to identify whether stylistic coordination persists even after accounting for length coordination and contextual factors

    Sentiment Paradoxes in Social Networks: Why Your Friends Are More Positive Than You?

    Full text link
    Most people consider their friends to be more positive than themselves, exhibiting a Sentiment Paradox. Psychology research attributes this paradox to human cognition bias. With the goal to understand this phenomenon, we study sentiment paradoxes in social networks. Our work shows that social connections (friends, followees, or followers) of users are indeed (not just illusively) more positive than the users themselves. This is mostly due to positive users having more friends. We identify five sentiment paradoxes at different network levels ranging from triads to large-scale communities. Empirical and theoretical evidence are provided to validate the existence of such sentiment paradoxes. By investigating the relationships between the sentiment paradox and other well-developed network paradoxes, i.e., friendship paradox and activity paradox, we find that user sentiments are positively correlated to their number of friends but rarely to their social activity. Finally, we demonstrate how sentiment paradoxes can be used to predict user sentiments.Comment: The 14th International AAAI Conference on Web and Social Media (ICWSM 2020
    • …
    corecore