CIXL2: A Crossover Operator for Evolutionary Algorithms Based on Population Features
In this paper we propose a crossover operator for evolutionary algorithms
with real values that is based on the statistical theory of population
distributions. The operator is based on the theoretical distribution of the
values of the genes of the best individuals in the population. The proposed
operator takes into account the localization and dispersion features of the
best individuals of the population with the objective that these features would
be inherited by the offspring. Our aim is the optimization of the balance
between exploration and exploitation in the search process. In order to test
the efficiency and robustness of this crossover, we have used a set of
functions to be optimized with regard to different criteria, such as
multimodality, separability, regularity and epistasis. With this set of
functions we can draw conclusions that depend on the problem at hand. We
analyze the results using ANOVA and multiple comparison statistical tests. As
an example of how our crossover can be used to solve artificial intelligence
problems, we have applied the proposed model to the problem of obtaining the
weight of each network in an ensemble of neural networks. The results obtained
exceed the performance of standard methods.
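The "localization and dispersion features of the best individuals" mentioned in the abstract can be illustrated with a small sketch. This is not the CIXL2 operator itself: the abstract does not say how offspring are built from these statistics, so only the feature extraction is shown. The function name `best_gene_statistics`, the use of the mean and sample standard deviation as the two features, and the assumption that lower fitness is better are all hypothetical choices made for illustration.

```python
from statistics import mean, stdev


def best_gene_statistics(population, fitness, n_best):
    """Return per-gene (means, standard deviations) of the n_best individuals.

    Assumptions (not from the abstract): individuals are equal-length lists
    of floats, lower fitness is better, and n_best >= 2 so that the sample
    standard deviation is defined.
    """
    # Rank by fitness and keep the best n_best individuals.
    best = sorted(population, key=fitness)[:n_best]
    # Transpose so that each tuple holds the values of one gene.
    genes = list(zip(*best))
    localization = [mean(values) for values in genes]
    dispersion = [stdev(values) for values in genes]
    return localization, dispersion


def sphere(individual):
    """Toy objective to minimize: sum of squared gene values."""
    return sum(x * x for x in individual)


population = [[0.0, 1.0], [2.0, 3.0], [10.0, 10.0]]
means, deviations = best_gene_statistics(population, sphere, n_best=2)
```

A crossover in this spirit would presumably sample or combine offspring genes around `means` with a spread governed by `deviations`, but that step is left out because the abstract does not specify it.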
Coevolution of Generative Adversarial Networks
Generative adversarial networks (GANs) have become a hot topic, presenting
impressive results in the field of computer vision. However, there are still
open problems with the GAN model, such as the training stability and the
hand-design of architectures. Neuroevolution is a technique that can be used to
provide the automatic design of network architectures even in large search
spaces as in deep neural networks. Therefore, this project proposes COEGAN, a
model that combines neuroevolution and coevolution in the coordination of the
GAN training algorithm. The proposal uses the adversarial characteristic
between the generator and discriminator components to design an algorithm using
coevolution techniques. Our proposal was evaluated on the MNIST dataset. The
results suggest the improvement of the training stability and the automatic
discovery of efficient network architectures for GANs. Our model also partially
solves the mode collapse problem. Comment: Published in EvoApplications 201
OligoIS: Scalable Instance Selection for Class-Imbalanced Data Sets
In current research, an enormous amount of information is constantly being produced, which poses a challenge for data mining algorithms. Many of the problems in extremely active research areas, such as bioinformatics, security and intrusion detection, or text mining, share the following two features: large data sets and class-imbalanced distribution of samples. Although many methods have been proposed for dealing with class-imbalanced data sets, most of these methods are not scalable to the very large data sets common to those research fields. In this paper, we propose a new approach to dealing with the class-imbalance problem that is scalable to data sets with many millions of instances and hundreds of features. This proposal is based on the divide-and-conquer principle combined with application of the selection process to balanced subsets of the whole data set. This divide-and-conquer principle allows the execution of the algorithm in linear time. Furthermore, the proposed method is easy to implement using a parallel environment and can work without loading the whole data set into memory. Using 40 class-imbalanced medium-sized data sets, we will demonstrate our method's ability to improve the results of state-of-the-art instance selection methods for class-imbalanced data sets. Using three very large data sets, we will show the scalability of our proposal to millions of instances and hundreds of features
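The divide-and-conquer idea described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the OligoIS algorithm: the abstract does not say how subsets are formed or how per-subset selections are combined. Here the majority class is split into disjoint chunks the size of the minority class, each chunk is paired with all minority instances, and the selections are merged by set union. The names `balanced_index_subsets`, `divide_and_select` and the `selector` callback are hypothetical.

```python
import random


def balanced_index_subsets(labels, minority_label, seed=0):
    """Split instance indices into class-balanced subsets.

    Assumption: each subset pairs one disjoint chunk of majority indices
    with every minority index, so the two classes are roughly balanced.
    """
    rng = random.Random(seed)
    minority = [i for i, y in enumerate(labels) if y == minority_label]
    majority = [i for i, y in enumerate(labels) if y != minority_label]
    rng.shuffle(majority)
    size = max(1, len(minority))
    return [majority[start:start + size] + minority
            for start in range(0, len(majority), size)]


def divide_and_select(labels, minority_label, selector, seed=0):
    """Run `selector` on each balanced subset and merge the kept indices.

    `selector` is a placeholder for any instance selection method: it takes
    a list of indices and returns the indices it keeps. Merging by union is
    an assumption made for this sketch.
    """
    kept = set()
    for subset in balanced_index_subsets(labels, minority_label, seed):
        kept.update(selector(subset))
    return sorted(kept)


labels = [0] * 8 + [1] * 2
subsets = balanced_index_subsets(labels, minority_label=1)
kept_all = divide_and_select(labels, 1, selector=lambda idx: idx)
```

Because every subset is processed independently, the loop in `divide_and_select` could be distributed across workers, and each subset has a bounded size, which is consistent with the linear-time and parallel-friendly behaviour the abstract claims.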
Improving translation initiation site and stop codon recognition by using more than two classes
Motivation: The recognition of translation initiation sites and stop codons is a fundamental part of any gene recognition program. Currently, the most successful methods use powerful classifiers, such as support vector machines with various string kernels. These methods all use two classes, one of positive instances and another one of negative instances that are constructed using sequences from the whole genome. However, the features of the negative sequences differ depending on the position of the negative samples in the gene. There are differences depending on whether they are from exons, introns, intergenic regions or any other functional part of the genome. Thus, the positive class is fairly homogeneous, as all its sequences come from the same part of the gene, but the negative class is composed of heterogeneous instances, and the classifier suffers from this heterogeneity. In this article, we propose the training of different classifiers with different negative, more homogeneous, classes and the combination of these classifiers for improved accuracy. Results: The proposed method achieves better accuracy than the best state-of-the-art method, both in terms of the geometric mean of the specificity and sensitivity and the area under the receiver operating characteristic and precision recall curves. The method is tested on the whole human genome. The results for recognizing both translation initiation sites and stop codons indicated improvements in the rates of both false-negative results (FN) and false-positive results (FP). On average, for translation initiation site recognition, the FN ratio was reduced by 30.2% and the FP ratio decreased by 10.9%. For stop codon prediction, FP were reduced by 41.4% and FN by 31.7%. Availability and implementation: The source code is licensed under the General Public License and is thus freely available. The datasets and source code can be obtained from http://cib.uco.es/site-recognition.
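One of the evaluation measures named above, the geometric mean of specificity and sensitivity, follows directly from confusion-matrix counts. The sketch below uses the standard definitions; the function name `g_mean` and the example counts are illustrative and are not taken from the paper.

```python
import math


def g_mean(tp, fn, tn, fp):
    """Geometric mean of sensitivity and specificity.

    sensitivity = TP / (TP + FN), specificity = TN / (TN + FP).
    Raises ValueError if either class has no instances.
    """
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes need at least one instance")
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return math.sqrt(sensitivity * specificity)


# Illustrative counts only: sensitivity 0.8, specificity 0.9.
score = g_mean(tp=80, fn=20, tn=90, fp=10)
```

Unlike plain accuracy, this measure drops to zero when a classifier ignores one class entirely, which makes it a common choice for imbalanced problems such as site recognition, where negatives vastly outnumber positives.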
Structure Discovery in Mixed Order Hyper Networks
Background: Mixed Order Hyper Networks (MOHNs) are a type of neural network in which the interactions between inputs are modelled explicitly by weights that can connect any number of neurons. Such networks have a human readability that networks with hidden units lack. They can be used for regression, classification or as content addressable memories and have been shown to be useful as fitness function models in constraint satisfaction tasks. They are fast to train and, when their structure is fixed, do not suffer from local minima in the cost function during training. However, their main drawback is that the correct structure (which neurons to connect with weights) must be discovered from data and an exhaustive search is not possible for networks of over around 30 inputs. Results: This paper presents an algorithm designed to discover a set of weights that satisfy the joint constraints of low training error and a parsimonious model. The combined structure discovery and weight learning process was found to be faster, more accurate and have less variance than training an MLP. Conclusions: There are a number of advantages to using higher order weights rather than hidden units in a neural network but discovering the correct structure for those weights can be challenging. With the method proposed in this paper, the use of high order networks becomes tractable.
Instance reduction for one-class classification
Instance reduction techniques are data preprocessing methods originally developed to enhance the nearest neighbor rule for standard classification. They reduce the training data by selecting or generating representative examples of a given problem. These algorithms have been designed and widely analyzed in multi-class problems providing very competitive results. However, this issue was rarely addressed in the context of one-class classification. In this specific domain a reduction of the training set may not only decrease the classification time and the classifier's complexity, but also allow us to handle internal noisy data and simplify the data description boundary. We propose two methods for achieving this goal. The first one is a flexible framework that adjusts any instance reduction method to the one-class scenario by introducing meaningful artificial outliers. The second one is a novel modification of an evolutionary instance reduction technique that is based on differential evolution and uses a consistency measure for model evaluation in filter or wrapper modes. It is a powerful native one-class solution that does not require access to counterexamples. Both of the proposed algorithms can be applied to any type of one-class classifier. On the basis of extensive computational experiments, we show that the proposed methods are highly efficient techniques to reduce the complexity and improve the classification performance in one-class scenarios.
Prototype generation on structural data using dissimilarity space representation
Data reduction techniques play a key role in instance-based classification to lower the amount of data to be processed. Among the different existing approaches, prototype selection (PS) and prototype generation (PG) are the most representative ones. These two families differ in the way the reduced set is obtained from the initial one: while the former aims at selecting the most representative elements from the set, the latter creates new data out of it. Although PG is considered to delimit decision boundaries more efficiently, the operations required are not so well defined in scenarios involving structural data such as strings, trees, or graphs. This work studies the possibility of using dissimilarity space (DS) methods as an intermediate process for mapping the initial structural representation to a statistical one, thereby allowing the use of PG methods. A comparative experiment over string data is carried out in which our proposal is compared against PS methods on the original space. Results show that the proposed strategy is able to achieve significantly similar results to PS in the initial space, thus standing as a clear alternative to the classic approach, with some additional advantages derived from the DS representation. This work was partially supported by the Spanish Ministerio de Educación, Cultura y Deporte through a FPU fellowship (AP2012-0939), Vicerrectorado de Investigación, Desarrollo e Innovación de la Universidad de Alicante through FPU program (UAFPU2014-5883), and the Spanish Ministerio de Economía y Competitividad through Project TIMuL (No. TIN2013-48152-C2-1-R supported by EU FEDER funds).
Gene discovery in EST sequences from the wheat leaf rust fungus Puccinia triticina sexual spores, asexual spores and haustoria, compared to other rust and corn smut fungi
© 2011 Xu et al.; licensee BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. DOI: 10.1186/1471-2164-12-161. Background. Rust fungi are biotrophic basidiomycete plant pathogens that cause major diseases on plants and trees world-wide, affecting agriculture and forestry. Their biotrophic nature precludes many established molecular genetic manipulations and lines of research. The generation of genomic resources for these microbes is leading to novel insights into biology such as interactions with the hosts and guiding directions for breakthrough research in plant pathology. Results. To support gene discovery and gene model verification in the genome of the wheat leaf rust fungus, Puccinia triticina (Pt), we have generated Expressed Sequence Tags (ESTs) by sampling several life cycle stages. We focused on several spore stages and isolated haustorial structures from infected wheat, generating 17,684 ESTs. We produced sequences from both the sexual (pycniospores, aeciospores and teliospores) and asexual (germinated urediniospores) stages of the life cycle. From pycniospores and aeciospores, produced by infecting the alternate host, meadow rue (Thalictrum speciosissimum), 4,869 and 1,292 reads were generated, respectively. We generated 3,703 ESTs from teliospores produced on the senescent primary wheat host. Finally, we generated 6,817 reads from haustoria isolated from infected wheat as well as 1,003 sequences from germinated urediniospores. Along with 25,558 previously generated ESTs, we compiled a database of 13,328 non-redundant sequences (4,506 singlets and 8,822 contigs). Fungal genes were predicted using the EST version of the self-training GeneMarkS algorithm.
To refine the EST database, we compared EST sequences by BLASTN to a set of 454 pyrosequencing-generated contigs and Sanger BAC-end sequences derived both from the Pt genome, and to ESTs and genome reads from wheat. A collection of 6,308 fungal genes was identified and compared to sequences of the cereal rusts, Puccinia graminis f. sp. tritici (Pgt) and stripe rust, P. striiformis f. sp. tritici (Pst), and poplar leaf rust Melampsora species, and the corn smut fungus, Ustilago maydis (Um). While extensive homologies were found, many genes appeared novel and species-specific; over 40% of genes did not match any known sequence in existing databases. Focusing on spore stages, direct comparison to Um identified potential functional homologs, possibly allowing heterologous functional analysis in that model fungus. Many potentially secreted protein genes were identified by similarity searches against genes and proteins of Pgt and Melampsora spp., revealing apparent orthologs. Conclusions. The current set of Pt unigenes contributes to gene discovery in this major cereal pathogen and will be invaluable for gene model verification in the genome sequence
The management of acute venous thromboembolism in clinical practice. Results from the European PREFER in VTE Registry
Venous thromboembolism (VTE) is a significant cause of morbidity and mortality in Europe. Data from real-world registries are necessary, as clinical trials do not represent the full spectrum of VTE patients seen in clinical practice. We aimed to document the epidemiology, management and outcomes of VTE using data from a large, observational database. PREFER in VTE was an international, non-interventional disease registry conducted between January 2013 and July 2015 in primary and secondary care across seven European countries. Consecutive patients with acute VTE were documented and followed up over 12 months. PREFER in VTE included 3,455 patients with a mean age of 60.8 ± 17.0 years. Overall, 53.0 % were male. The majority of patients were assessed in the hospital setting as inpatients or outpatients (78.5 %). The diagnosis was deep-vein thrombosis (DVT) in 59.5 % and pulmonary embolism (PE) in 40.5 %. The most common comorbidities were the various types of cardiovascular disease (excluding hypertension; 45.5 %), hypertension (42.3 %) and dyslipidaemia (21.1 %). Following the index VTE, a large proportion of patients received initial therapy with heparin (73.2 %), almost half received a vitamin K antagonist (48.7 %) and nearly a quarter received a DOAC (24.5 %). Almost a quarter of all presentations were for recurrent VTE, with >80 % of previous episodes having occurred more than 12 months prior to baseline. In conclusion, PREFER in VTE has provided contemporary insights into VTE patients and their real-world management, including their baseline characteristics, risk factors, disease history, symptoms and signs, initial therapy and outcomes.
Differential clinical characteristics and prognosis of intraventricular conduction defects in patients with chronic heart failure
Intraventricular conduction defects (IVCDs) can impair prognosis of heart failure (HF), but their specific impact is not well established. This study aimed to analyse the clinical profile and outcomes of HF patients with left bundle branch block (LBBB), right bundle branch block (RBBB), left anterior fascicular block (LAFB), and no IVCDs. Clinical variables and outcomes after a median follow-up of 21 months were analysed in 1762 patients with chronic HF and LBBB (n = 532), RBBB (n = 134), LAFB (n = 154), and no IVCDs (n = 942). LBBB was associated with more marked LV dilation, depressed LVEF, and mitral valve regurgitation. Patients with RBBB presented overt signs of congestive HF and depressed right ventricular motion. The LAFB group presented intermediate clinical characteristics, and patients with no IVCDs were more often women with less enlarged left ventricles and less depressed LVEF. Death occurred in 332 patients (interannual mortality = 10.8%): cardiovascular in 257, extravascular in 61, and of unknown origin in 14 patients. Cardiac death occurred in 230 (pump failure in 171 and sudden death in 59). An adjusted Cox model showed higher risk of cardiac death and pump failure death in the LBBB and RBBB groups than in the LAFB and the no IVCD groups. LBBB and RBBB are associated with different clinical profiles and both are independent predictors of increased risk of cardiac death in patients with HF. A more favourable prognosis was observed in patients with LAFB and in those free of IVCDs. Further research in HF patients with RBBB is warranted.