261 research outputs found

    A Study of SVM Kernel Functions for Sensitivity Classification Ensembles with POS Sequences

    Freedom of Information (FOI) laws legislate that government documents should be opened to the public. However, many government documents contain sensitive information, such as confidential information, that is exempt from release. Therefore, government documents must be sensitivity reviewed prior to release, to identify and close any sensitive information. With the adoption of born-digital documents, such as email, there is a need for automatic sensitivity classification to assist digital sensitivity review. SVM classifiers and Part-of-Speech sequences have separately been shown to be promising for sensitivity classification. However, sequence classification methodologies, and specifically SVM kernel functions, have not been fully investigated for sensitivity classification. Therefore, in this work, we present an evaluation of five SVM kernel functions for sensitivity classification using POS sequences. Moreover, we show that an ensemble classifier that combines POS sequence classification with text classification can significantly improve sensitivity classification effectiveness (+6.09% F2) compared with a text classification baseline, according to McNemar's test of significance
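
    The two evaluation tools named in the abstract above, the F2 measure and McNemar's test, can be sketched in a few lines. This is a minimal illustration, not the authors' evaluation code: the function names `f_beta` and `mcnemar_exact` are hypothetical, the labels are invented toy data, and the exact (binomial) form of McNemar's test is an assumption, since the abstract does not say which variant was used.

```python
from math import comb


def f_beta(y_true, y_pred, beta=2.0):
    """F-beta for binary labels (1 = sensitive); beta=2 weights recall above precision."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = (1 + beta**2) * tp + beta**2 * fn + fp
    return (1 + beta**2) * tp / denom if denom else 0.0


def mcnemar_exact(y_true, pred_a, pred_b):
    """Two-sided exact McNemar p-value computed from the discordant pairs."""
    # Documents that only classifier A gets right, and only classifier B gets right.
    only_a = sum(1 for t, a, b in zip(y_true, pred_a, pred_b) if a == t and b != t)
    only_b = sum(1 for t, a, b in zip(y_true, pred_a, pred_b) if a != t and b == t)
    n = only_a + only_b
    if n == 0:
        return 1.0  # the classifiers never disagree on correctness
    k = min(only_a, only_b)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2**n
    return min(1.0, 2 * tail)


if __name__ == "__main__":
    # Invented toy labels: 1 = sensitive, 0 = not sensitive.
    y_true = [1, 1, 1, 0, 0, 0, 1, 0]
    baseline = [1, 0, 0, 0, 1, 0, 0, 0]
    ensemble = [1, 1, 1, 0, 1, 0, 0, 0]
    print("F2 baseline:", f_beta(y_true, baseline))
    print("F2 ensemble:", f_beta(y_true, ensemble))
    print("McNemar p-value:", mcnemar_exact(y_true, baseline, ensemble))
```

    F2 is a natural choice when missing a sensitive document costs more than flagging a harmless one, because it weights recall four times as heavily as precision in the denominator.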

    CIXL2: A Crossover Operator for Evolutionary Algorithms Based on Population Features

    In this paper we propose a crossover operator for evolutionary algorithms with real values that is based on the statistical theory of population distributions. The operator is based on the theoretical distribution of the values of the genes of the best individuals in the population. The proposed operator takes into account the localization and dispersion features of the best individuals of the population, with the objective that these features be inherited by the offspring. Our aim is to optimize the balance between exploration and exploitation in the search process. In order to test the efficiency and robustness of this crossover, we have used a set of functions to be optimized with regard to different criteria, such as multimodality, separability, regularity and epistasis. With this set of functions we can draw conclusions that depend on the problem at hand. We analyze the results using ANOVA and multiple comparison statistical tests. As an example of how our crossover can be used to solve artificial intelligence problems, we have applied the proposed model to the problem of obtaining the weight of each network in an ensemble of neural networks. The results obtained are above the performance of standard methods.
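
    The idea of passing the localization and dispersion of the best individuals on to the offspring can be illustrated with a deliberately simplified stand-in. The sketch below is not the CIXL2 operator, whose rules are not given in the abstract. It only computes the per-gene mean and standard deviation of the best individuals and draws offspring genes from a normal distribution with those parameters. The function names, the choice of a normal distribution and the convention that lower fitness is better are all assumptions.

```python
import random
import statistics


def elite_gene_stats(population, fitness, n_best):
    """Per-gene mean (localization) and standard deviation (dispersion)
    of the n_best individuals; lower fitness is assumed to be better."""
    elite = sorted(population, key=fitness)[:n_best]
    genes = list(zip(*elite))  # transpose: one tuple of values per gene
    means = [statistics.fmean(g) for g in genes]
    stdevs = [statistics.pstdev(g) for g in genes]
    return means, stdevs


def sample_offspring(means, stdevs, rng):
    """Draw each offspring gene around the elite mean with the elite spread."""
    return [rng.gauss(m, s) for m, s in zip(means, stdevs)]


if __name__ == "__main__":
    rng = random.Random(0)

    def sphere(ind):
        # Toy objective to minimize.
        return sum(x * x for x in ind)

    population = [[rng.uniform(-5, 5) for _ in range(3)] for _ in range(20)]
    means, stdevs = elite_gene_stats(population, sphere, n_best=5)
    child = sample_offspring(means, stdevs, rng)
    print("elite means:", means)
    print("elite stdevs:", stdevs)
    print("offspring:", child)
```

    A wide elite spread yields exploratory offspring and a narrow one yields exploitative offspring, which is one simple way to picture the exploration and exploitation balance the abstract refers to.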

    Characterization of a new partitivirus strain in Verticillium dahliae provides further evidence of the spread of the highly virulent defoliating pathotype through new introductions

    The soilborne pathogen Verticillium dahliae, causal agent of Verticillium wilt, has a worldwide distribution and many hosts of agronomic value. The worldwide spread of a highly virulent defoliating (D) pathotype has greatly increased the threat posed by V. dahliae in olive trees. For effective disease management, it is important to know if the D pathotype is spreading long distances from contaminated material, or if D pathotype isolates may have originated locally from native V. dahliae populations several times. We identified a double-stranded RNA mycovirus in an olive D pathotype isolate from Turkey. Sequencing and phylogenetic analysis clustered the virus with members of the family Partitiviridae. The virus was most similar to a partitivirus previously identified in a V. dahliae isolate from cotton in China (VdPV1), with sequence identities of 94% and 91% at the nucleotide level for RNA1 and RNA2, respectively. The virus therefore corresponded to a strain of the established species, and we designated it VdPV1-ol (VdPV1 from olive). The identification of the same viral species in these two fungal isolates from geographically distant origins provides evidence of their relationships, supporting the hypothesis of long-distance movement of V. dahliae isolates. This research was supported by the Spanish Ministry of Science and Innovation (Grant AGL2009-13445), the Junta de Andalucía (Grant FEDER P07-TIC-02682), and Grant AGL2013-48980-R. Peer reviewed

    Nonlinear Boosting Projections for Ensemble Construction

    In this paper we propose a novel approach for ensemble construction based on the use of nonlinear projections to achieve both accuracy and diversity of individual classifiers. The proposed approach combines the philosophy of boosting, putting more effort on difficult instances, with the basis of the random subspace method. Our main contribution is that instead of using a random subspace, we construct a projection that takes into account the instances that have posed the most difficulties to previous classifiers. In this way, consecutive nonlinear projections are created by a neural network trained using only incorrectly classified instances. The feature subspace induced by the hidden layer of this network is used as the input space to a new classifier. The method is compared with bagging and boosting techniques, showing an improved performance on a large set of 44 problems from the UCI Machine Learning Repository. An additional study showed that the proposed approach is less sensitive to noise in the data than the boosting method.

    Improving multiclass pattern recognition by the combination of two strategies

    We present a new method of multiclass classification based on the combination of the one-vs-all method and a modification of the one-vs-one method. The proposed combination reinforces the strengths of both methods. A study of the behavior of the two methods identifies some of the sources of their failure. The performance of a classifier can be improved if the two methods are combined into one in such a way that the main sources of their failure are partially avoided.

    ML-k’sNN: Label Dependent k Values for Multi-Label k-Nearest Neighbor Rule

    Multi-label classification as a data mining task has recently attracted increasing interest from researchers. Many current data mining applications address problems with instances that belong to more than one category. These problems require the development of new, efficient methods. Multi-label k-nearest neighbors rule, ML-kNN, is among the best-performing methods for multi-label problems. Current methods use a unique k value for all labels, as in the single-label method. However, the distributions of the labels are frequently very different. In such scenarios, a unique k value for the labels might be suboptimal. In this paper, we propose a novel approach in which each label is predicted with a different value of k. Obtaining the best k for each label is stated as an optimization problem. Three different algorithms are proposed for this task, depending on which multi-label metric is the target of our optimization process. In a large set of 40 real-world multi-label problems, our approach improves the results of two different tested ML-kNN implementations
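
    The central idea above, a separate k for each label, can be sketched with a much simpler stand-in than the paper's method. The code below uses a plain majority-vote k-nearest-neighbor rule per label (not ML-kNN's Bayesian estimate) and chooses each label's k by exhaustive search over leave-one-out accuracy (not any of the three optimization algorithms the abstract mentions). The function names, the candidate k values and the toy data are assumptions made for illustration.

```python
def knn_predict_label(train_X, train_y, x, k, skip=None):
    """Majority vote over the k nearest training points for one binary label.
    `skip` excludes one training index, which enables leave-one-out evaluation."""
    dists = sorted(
        (sum((a - b) ** 2 for a, b in zip(row, x)), i)
        for i, row in enumerate(train_X)
        if i != skip
    )
    neighbors = dists[:k]
    votes = sum(train_y[i] for _, i in neighbors)
    return 1 if 2 * votes > len(neighbors) else 0


def best_k_per_label(X, Y, candidate_ks):
    """For each label column of Y, return the candidate k with the highest
    leave-one-out accuracy (the first candidate wins ties)."""
    chosen = []
    for j in range(len(Y[0])):
        y = [row[j] for row in Y]

        def loo_accuracy(k):
            hits = sum(
                knn_predict_label(X, y, X[i], k, skip=i) == y[i]
                for i in range(len(X))
            )
            return hits / len(X)

        chosen.append(max(candidate_ks, key=loo_accuracy))
    return chosen


if __name__ == "__main__":
    # Toy data: two well-separated clusters on a line and two labels.
    # Label 0 follows the clusters exactly; label 1 has one noisy point.
    X = [[0], [1], [2], [3], [4], [20], [21], [22], [23], [24]]
    Y = [[0, 0], [0, 0], [0, 1], [0, 0], [0, 0],
         [1, 1], [1, 1], [1, 1], [1, 1], [1, 1]]
    print(best_k_per_label(X, Y, candidate_ks=[1, 3]))
```

    On the toy data the clean label is served equally well by every candidate k, while the noisy label benefits from a larger neighborhood, which is the kind of per-label difference the abstract argues a single shared k cannot capture.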

    Caracterización del régimen temporal de masas de agua no permanentes mediante la implementación del modelo hidrológico TETIS y la herramienta TREHS. Aplicación en la Demarcación Hidrográfica del Júcar

    In the Júcar river basin district, 20% of the river water bodies are watercourses in which the flow is zero for long periods. Most of these water bodies have a non-permanent hydrological regime; nevertheless, it is especially important to define both the natural hydrological regime and its degree of alteration. The main objective of this study is to characterize the hydrological regime of non-permanent water bodies without gauging stations in the Júcar river basin district by implementing the TETIS hydrological model. The study also covers the use and interpretation of the TREHS application, currently under development at the University of Barcelona within the framework of the Life TRivers project on temporary rivers. This application supports both the hydrological characterization of temporary water bodies and an evaluation of the degree of alteration of the natural regime. Four non-permanent water bodies in the district were selected for this study: the Barranco del Carraixet, the Río Cervol, the Rambla de la Viuda and the Rambla de Alcalá. One of the main drawbacks when modeling non-permanent rivers is the lack of gauging stations, so TETIS was first calibrated in the selected water bodies that have gauging stations (Rambla de la Viuda, Carraixet and Cervol), so that those calibrations could be extrapolated to temporary rivers that lack them. In most cases the gauging stations on the district's non-permanent rivers are no longer in operation, and their hydrological series go back to the 1910s or 1930s. To calibrate the model in these water bodies (Río Cervol), historical rainfall data had to be requested from AEMET. For the calibration of TETIS in water bodies with more recent hydrological series (Barranco del Carraixet and Rambla de la Viuda), interpolated meteorological data from version 4 of the Spain02 project were used; this dataset, prepared by AEMET and the Meteorology Group of the University of Cantabria (UNICAN), provides daily precipitation and daily maximum and minimum temperatures on a regular high-resolution grid (0.11° x 0.11°). The study builds on the master's theses (TFM) of Hebert Tejada Espinoza and Ana Sánchez García on TETIS modeling of the Barranco del Carraixet and the Rambla de la Viuda, respectively. However, these initial models do not adequately represent either the zero flows or the spatial variability of the hydrological regime in these rivers. To address this, the TETIS model was modified to incorporate transmission losses in the riverbed, as well as karst areas and springs. These modifications greatly improved the results obtained. The transmission losses in the Rambla de la Viuda model were introduced within the TFM of Carlos Israel Montalvo. TETIS was then implemented in the Río Cervol and the Rambla de Alcalá. The Río Cervol has a gauging station with data from 1912 to 1929, so the model used the historical rainfall data provided by AEMET, geological data from IGME, land-use data from Corine Land Cover and soil characteristics from the European Soil Database, among others. The Rambla de Alcalá has no gauging stations, so the calibration obtained for the other modeled rivers had to be extrapolated. The TREHS application, still in development, accepts hydrological information from different sources, such as models, gauging stations, surveys or direct observations. It also distinguishes information that reflects the unaltered natural regime from information that refers to the current situation. Using these sources, it classifies the river under study into different hydrotypes. Once the hydrological series of the natural regime had been obtained and validated with the TETIS model, they were entered into TREHS together with the surveys and direct observations in order to define the hydrotype. Pedrajas García, J. (2017). Caracterización del régimen temporal de masas de agua no permanentes mediante la implementación del modelo hidrológico TETIS y la herramienta TREHS. Aplicación en la Demarcación Hidrográfica del Júcar. http://hdl.handle.net/10251/89514 TFG

    Estudio y seguimiento de una red de parcelas de experimentación en una masa de Pinus halepensis coetánea situadas en M.U.P nº99 Mas de l Ascle en el término municipal de Alcalá de Xivert (Castellón)

    In 2009 a network of permanent plots was established in M.U.P No. 99 “Mas de l’Ascle”, located in the municipality of Alcalá de Xivert, in the province of Castellón. The network comprises 26 plots: 14 in a Pinus halepensis stand that originated from natural regeneration after a forest fire in 1993, and the remaining 12 in a Pinus halepensis stand that originated from a reforestation carried out in 1984 and did not burn in 1993. In 2009, different silvicultural treatments were applied to the plots in order to analyze the effect of each one: different types of precommercial thinning in the stands from natural post-fire regeneration, and different types of thinning in the stands from reforestation. In both the thinning and the precommercial thinning plots, 3 plots were left untreated to act as controls. The objective of this work is to evaluate how these treatments have affected the forest stand 11 years after the plots were established and thinned, and to compare their effectiveness with respect to the desired management objective. To this end, an inventory was carried out in the plots, measuring diameters and heights in the plots treated by thinning, and diameters, heights and scrub (percentage of surface covered, height and species) in the plots treated by precommercial thinning. The inventory results were analyzed statistically to evaluate and compare the effects of each treatment, also drawing on the inventory made in the year in which the plots were established. In the thinning plots, no significant differences in growth were found between treatments because the initial density was low; more time and continued treatments are needed to draw conclusions for these plots. In the precommercial thinning plots, significant differences in basal-area and volume growth were found between treatments, with higher growth at lower treatment intensity. Pedrajas García, J. (2020). Estudio y seguimiento de una red de parcelas de experimentación en una masa de Pinus halepensis coetánea situadas en M.U.P nº99 Mas de l Ascle en el término municipal de Alcalá de Xivert (Castellón). Universitat Politècnica de València. http://hdl.handle.net/10251/157861 TFG

    Graph-Based Feature Selection Approach for Molecular Activity Prediction

    In the construction of QSAR models for the prediction of molecular activity, feature selection is a common task aimed at improving the results and understanding of the problem. The selection of features allows elimination of irrelevant and redundant features, reduces the effect of dimensionality problems, and improves the generalization and interpretability of the models. In many feature selection applications, such as those based on ensembles of feature selectors, it is necessary to combine different selection processes. In this work, we evaluate the application of a new feature selection approach to the prediction of molecular activity, based on the construction of an undirected graph to combine base feature selectors. The experimental results demonstrate the efficiency of the graph-based method in terms of the classification performance, reduction, and redundancy compared to the standard voting method. The graph-based method can be extended to different feature selection algorithms and applied to other cheminformatics problems

    Coevolution of Generative Adversarial Networks

    Generative adversarial networks (GANs) have become a hot topic, presenting impressive results in the field of computer vision. However, there are still open problems with the GAN model, such as training stability and the hand-design of architectures. Neuroevolution is a technique that can be used to automatically design network architectures, even in large search spaces such as those of deep neural networks. Therefore, this project proposes COEGAN, a model that combines neuroevolution and coevolution in the coordination of the GAN training algorithm. The proposal uses the adversarial relationship between the generator and discriminator components to design an algorithm based on coevolution techniques. Our proposal was evaluated on the MNIST dataset. The results suggest improved training stability and the automatic discovery of efficient network architectures for GANs. Our model also partially solves the mode collapse problem. Comment: Published in EvoApplications 201