Search CORE

27,814 research outputs found

A machine learning based framework to identify and classify long terminal repeat retrotransposons

Author: Blockeel Hendrik
Carareto Claudia MA
Cerri Ricardo
Costa Eduardo
Fischer Carlos N
Ramon Jan
Schietgat Leander
Vens Celine
Publication venue: 'Public Library of Science (PLoS)'
Publication date: 01/01/2018

Transposable elements (TEs) are repetitive nucleotide sequences that make up a large portion of eukaryotic genomes. They can move and duplicate within a genome, increasing genome size and contributing to genetic diversity within and across species. Accurate identification and classification of the TEs present in a genome is an important step towards understanding their effects on genes and their role in genome evolution. We introduce TE-LEARNER, a framework based on machine learning that automatically identifies TEs in a given genome and assigns a classification to them. We present an implementation of our framework for LTR retrotransposons, a particular type of TE characterized by long terminal repeats (LTRs) at their boundaries. We evaluate the predictive performance of our framework on the well-annotated genomes of Drosophila melanogaster and Arabidopsis thaliana, and we compare our results for three LTR retrotransposon superfamilies with those of three widely used methods for TE identification or classification: REPEATMASKER, CENSOR and LTRDIGEST. In contrast to these methods, TE-LEARNER is the first to incorporate machine learning techniques, and it outperforms them in terms of predictive performance while learning models and making predictions efficiently. Moreover, we show that our method was able to identify TEs that none of the above methods could find, and we investigated TE-LEARNER's predictions that did not correspond to an official annotation. It turns out that many of these predictions are in fact strongly homologous to a known TE.

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Ghent University Academic Bibliography

Directory of Open Access Journals

INRIA a CCSD electronic archive server

HAL Descartes

Hal-Diderot

Algorithm Selection Framework for Cyber Attack Detection

Author: Ajmera Aman
Arel-Bundock Vincent
Brazdil Pavel
Cui Can
Jacob Sunil
Janosi Andras
Maxwell Paul
Paliwal Swati
Revathi S
Rice John
Simpson Timothy W
Smith Michael R.
Sobirey Michael
Tavallaee Mahbod
Utgoff Paul E
Wolberg William H.
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/03/2020

The number of cyber threats against both wired and wireless computer systems and other components of the Internet of Things continues to increase annually. In this work, an algorithm selection framework is applied to the NSL-KDD data set, and a novel machine learning taxonomy is presented. The framework uses a combination of user input and meta-features to select the best algorithm for detecting cyber attacks on a network. Performance is compared between a rule-of-thumb strategy and a meta-learning strategy. The framework removes the guesswork of the common trial-and-error approach to algorithm selection. The framework recommends five algorithms from the taxonomy. Both strategies recommend a high-performing algorithm, though not the best-performing one. The work demonstrates the close connection between algorithm selection and the taxonomy on which it is premised. Comment: 6 pages, 7 figures, 1 table, accepted to WiseML '2
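The two strategies compared above can be illustrated with a minimal sketch. This is not the authors' framework: the meta-features (log10 of instance count, log10 of feature count, minority-class fraction), the algorithm names, the threshold in `rule_of_thumb`, and the knowledge-base entries are all hypothetical toy values chosen for illustration. The meta-learning side is a generic 1-nearest-neighbour meta-learner, one common way to map meta-features to an algorithm recommendation.

```python
import math
from typing import List, Tuple

# Hypothetical meta-features: (log10 #instances, log10 #features, minority-class fraction).
MetaFeatures = Tuple[float, float, float]


def extract_meta_features(n_instances: int, n_features: int,
                          minority_fraction: float) -> MetaFeatures:
    """Summarise a data set by a few simple, roughly same-scale meta-features."""
    return (math.log10(n_instances), math.log10(n_features), minority_fraction)


def rule_of_thumb(n_instances: int) -> str:
    """Hypothetical rule-of-thumb strategy: a fixed heuristic on data set size."""
    return "random_forest" if n_instances >= 10_000 else "naive_bayes"


# Toy knowledge base: meta-features of past data sets and the algorithm that
# (hypothetically) performed best on each. These are NOT measured results.
KNOWLEDGE_BASE: List[Tuple[MetaFeatures, str]] = [
    ((3.0, 1.0, 0.5), "naive_bayes"),
    ((5.0, 1.6, 0.1), "random_forest"),
    ((4.0, 2.5, 0.3), "svm"),
]


def select_algorithm(query: MetaFeatures,
                     knowledge_base: List[Tuple[MetaFeatures, str]]) -> str:
    """1-NN meta-learner: recommend the best algorithm of the most similar past data set."""
    _, best_algorithm = min(knowledge_base, key=lambda entry: math.dist(entry[0], query))
    return best_algorithm


if __name__ == "__main__":
    query = extract_meta_features(100_000, 41, 0.2)
    print(rule_of_thumb(100_000))                   # random_forest
    print(select_algorithm(query, KNOWLEDGE_BASE))  # random_forest
```

Log-scaling the size-based meta-features keeps them on a range comparable to the class fraction, so no single feature dominates the Euclidean distance; a real meta-learner would normally standardise meta-features more carefully and use many more past data sets.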

arXiv.org e-Print Archive

AFTI Scholar (Air Force Institute of Technology)

Crossref

USMA Digital Commons (United States Military Academy, West Point)

Unsupervised Discovery of Phonological Categories through Supervised Learning of Morphological Rules

Author: Berck Peter
Daelemans Walter
Gillis Steven
Publication date: 01/01/1996

We describe a case study in the application of symbolic machine learning techniques to the discovery of linguistic rules and categories. A supervised rule induction algorithm is used to learn to predict the correct diminutive suffix given the phonological representation of Dutch nouns. The system produces rules that are comparable to rules proposed by linguists. Furthermore, in the process of learning this morphological task, the phonemes used are grouped into phonologically relevant categories. We discuss the relevance of our method for linguistics and language technology.

arXiv.org e-Print Archive

CiteSeerX

Institutional Repository Universiteit Antwerpen

Tilburg University Repository

A review of multi-instance learning assumptions

Author: Foulds James Richard
Frank Eibe
Publication venue: 'Cambridge University Press (CUP)'
Publication date: 01/01/2010

Multi-instance (MI) learning is a variant of inductive machine learning in which each learning example contains a bag of instances instead of a single feature vector. The term commonly refers to the supervised setting, where each bag is associated with a label. This type of representation is a natural fit for a number of real-world learning scenarios, including drug activity prediction and image classification, and hence many MI learning algorithms have been proposed. Any MI learning method must relate instances to bag-level class labels, but many types of relationships between instances and class labels are possible. Although all early work in MI learning assumes a specific MI concept class known to be appropriate for the drug activity prediction domain, this ‘standard MI assumption’ is not guaranteed to hold in other domains. Much of the recent work in MI learning has concentrated on a relaxed view of the MI problem, where the standard MI assumption is dropped and alternative assumptions are considered instead. However, it is often not clearly stated which particular assumption is used and how it relates to other assumptions that have been proposed. In this paper, we aim to clarify the use of alternative MI assumptions by reviewing the work done in this area.
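Under the standard MI assumption, a bag is labelled positive if and only if at least one of its instances is positive, and negative if all of its instances are negative. The sketch below contrasts this with one simple relaxed alternative, a count-based rule requiring at least `k` positive instances. The instance-level classifier `toy_instance_classifier`, the helper `threshold_mi_label`, and the example bags are hypothetical illustrations, not taken from the reviewed paper.

```python
from typing import Callable, Sequence

Instance = Sequence[float]
Bag = Sequence[Instance]
InstanceClassifier = Callable[[Instance], bool]


def standard_mi_label(bag: Bag, is_positive: InstanceClassifier) -> bool:
    """Standard MI assumption: a bag is positive iff at least one instance is positive."""
    return any(is_positive(x) for x in bag)


def threshold_mi_label(bag: Bag, is_positive: InstanceClassifier, k: int) -> bool:
    """Hypothetical count-based relaxation: a bag is positive iff >= k instances are positive."""
    return sum(1 for x in bag if is_positive(x)) >= k


def toy_instance_classifier(x: Instance) -> bool:
    """Hypothetical instance-level concept: the first feature exceeds 0.5."""
    return x[0] > 0.5


if __name__ == "__main__":
    one_positive = [[0.1, 2.0], [0.9, 0.3]]   # exactly one positive instance
    all_negative = [[0.2, 1.0], [0.4, 0.0]]   # no positive instances

    print(standard_mi_label(one_positive, toy_instance_classifier))      # True
    print(standard_mi_label(all_negative, toy_instance_classifier))      # False
    print(threshold_mi_label(one_positive, toy_instance_classifier, 2))  # False
```

The same bag receives different labels under the two assumptions, which is why it matters which assumption a given MI method relies on.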

Research Commons@Waikato

Boosting Classifiers for Drifting Concepts

Author: Klinkenberg Ralf
Scholz Martin

This paper proposes a boosting-like method to train a classifier ensemble from data streams. It naturally adapts to concept drift and allows the drift to be quantified in terms of its base learners. The algorithm is empirically shown to outperform learning algorithms that ignore concept drift. It performs no worse than advanced adaptive time window and example selection strategies, which store all the data and are thus not suited for mining massive streams.

Research Papers in Economics