173 research outputs found

    SDRS: a new lossless dimensionality reduction for text corpora

    In recent years, most content-based spam filters have been implemented using Machine Learning (ML) approaches that rely on token-based representations of textual content. Multiple performance enhancements have since been introduced, but their impact has been virtually negligible. Recent studies have introduced synset-based content representations as a reliable way to improve classification, together with different ways of exploiting semantic information to address problems such as dimensionality reduction. These preliminary solutions have some limitations and impose simplifications that must be gradually revised in order to obtain significant improvements in spam content filtering. This study addresses the problem of feature reduction by introducing a new semantic-based proposal (SDRS) that is lossless, i.e. it avoids losing knowledge. Synset features can be semantically grouped by taking advantage of the taxonomic relations (mainly hypernyms) provided by the BabelNet ontological dictionary (e.g. “Viagra” and “Cialis” can be summarized into the single feature “anti-impotence drug”, “drug” or “chemical substance”, depending on whether the generalization spans 1, 2 or 3 levels). To decide how many levels should be used to generalize each synset of a dataset, our proposal takes advantage of Multi-Objective Evolutionary Algorithms (MOEA) and, in particular, of the Non-dominated Sorting Genetic Algorithm (NSGA-II). We have compared the performance achieved by a Naïve Bayes classifier using both token-based and synset-based dataset representations, with and without dimensionality reduction. As a result, our lossless semantic reduction strategy was able to find optimal semantic-based feature grouping strategies for the input texts, leading to better performance of Naïve Bayes classifiers.
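The hypernym-based grouping described in this abstract can be illustrated with a minimal sketch. It assumes a toy, hand-written hypernym table (`HYPERNYM`) in place of BabelNet, and the function names `generalize` and `reduce_features` are hypothetical; it is not the authors' implementation. Each synset is generalized by its own number of levels and the counts of merged synsets are summed, so no occurrence is discarded.

```python
from collections import Counter

# Toy hypernym table (hypothetical; a real system would query BabelNet).
HYPERNYM = {
    "viagra": "anti-impotence drug",
    "cialis": "anti-impotence drug",
    "anti-impotence drug": "drug",
    "drug": "chemical substance",
}

def generalize(synset, levels):
    """Climb up to `levels` hypernym steps, stopping at the top of the table."""
    for _ in range(levels):
        parent = HYPERNYM.get(synset)
        if parent is None:
            break
        synset = parent
    return synset

def reduce_features(counts, levels_per_synset):
    """Merge synset counts after generalizing each synset by its own level."""
    reduced = Counter()
    for synset, n in counts.items():
        reduced[generalize(synset, levels_per_synset.get(synset, 0))] += n
    return reduced

# One document as synset counts; "viagra" and "cialis" are generalized by 1 level.
doc = Counter({"viagra": 2, "cialis": 1, "meeting": 3})
reduced = reduce_features(doc, {"viagra": 1, "cialis": 1})
print(dict(reduced))
```

In this sketch the per-synset levels are given by hand; in the approach the abstract describes, choosing those levels is the task delegated to the evolutionary search.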

    El análisis de curvas ROC en estudios epidemiológicos de psicopatología infantil: aplicación al cuestionario CBCL

    ROC analysis was applied to study the diagnostic accuracy of the Child Behavior Checklist (CBCL) and to obtain the optimal cut-off in a sample of 196 pediatric and psychiatric patients aged 6 to 17 years. The group of origin, the diagnosis from the Diagnostic Interview for Children and Adolescents-Revised (DICA-R) and the clinician's diagnosis were used as external validators. The results indicate that the discriminant power of the CBCL for the presence or absence of psychopathology depends largely on the external validator used. The best results were obtained when the group of origin and the DICA-R diagnoses were considered. As a screening test, a cut-off on the CBCL total score between 50 and 54 gave the best sensitivity.
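As a rough illustration of how a cut-off can be derived from an ROC analysis, the sketch below computes sensitivity and specificity at each candidate cut-off and keeps the one that maximizes Youden's J. Youden's J is an assumed selection criterion chosen for the example, since the abstract does not state which one was used. The scores and labels are made up and are not data from the study.

```python
def sens_spec(scores, labels, cutoff):
    """Sensitivity and specificity when score >= cutoff counts as positive."""
    tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= cutoff)
    fn = sum(1 for s, y in zip(scores, labels) if y == 1 and s < cutoff)
    tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < cutoff)
    fp = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= cutoff)
    return tp / (tp + fn), tn / (tn + fp)

def best_cutoff(scores, labels):
    """Cut-off maximizing Youden's J = sensitivity + specificity - 1."""
    candidates = sorted(set(scores))
    return max(candidates, key=lambda c: sum(sens_spec(scores, labels, c)) - 1)

# Made-up questionnaire scores and reference diagnoses (1 = case).
scores = [42, 48, 51, 53, 55, 58, 61, 66]
labels = [0, 0, 0, 1, 0, 1, 1, 1]
cutoff = best_cutoff(scores, labels)
print(cutoff, sens_spec(scores, labels, cutoff))
```

For a screening test, one might instead pick the highest cut-off that still reaches a required sensitivity, which trades specificity for fewer missed cases.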

    Multi-objective evolutionary optimization for dimensionality reduction of texts represented by synsets

    Despite new developments in machine learning classification techniques, improving the accuracy of spam filtering is a difficult task due to linguistic phenomena that limit its effectiveness. In particular, we highlight polysemy, synonymy, the usage of hypernyms/hyponyms, and the presence of irrelevant/confusing words. These problems should be solved at the pre-processing stage to avoid using inconsistent information when building classification models. Previous studies have suggested that synset-based representation strategies could be used successfully to solve synonymy and polysemy problems. In addition, hyponymy/hypernymy relations can be exploited to implement dimensionality reduction strategies. These strategies could unify textual terms to model the intentions of the document without losing any information (e.g., bringing together the synsets “viagra”, “cialis”, “levitra” and others representing similar drugs under “virility drug”, which is a hypernym of all of them). These feature reduction schemes are known as lossless strategies, as the information is not removed but only generalised. However, in some types of text classification problems (such as spam filtering) it may not be worthwhile to keep all the information, and it may be preferable to let dimensionality reduction algorithms discard information that may be irrelevant or confusing. In this work, we formulate feature reduction as a multi-objective optimisation problem to be solved using a Multi-Objective Evolutionary Algorithm (MOEA). With minor modifications, our algorithm can implement lossless (using only semantic-based synset grouping), low-loss (discarding irrelevant information and using semantic-based synset grouping) or lossy (discarding only irrelevant information) strategies.
    The contribution of this study is two-fold: (i) to introduce different dimensionality reduction methods (lossless, low-loss and lossy) as an optimisation problem that can be solved using MOEA and (ii) to provide an experimental comparison of lossless and low-loss schemes for text representation. The results obtained support the usefulness of the low-loss method to improve the efficiency of classifiers.
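Treating feature reduction as a multi-objective problem means that candidate solutions are compared by Pareto dominance rather than by a single score. The sketch below shows that comparison on its own, assuming two objectives to be minimized (number of features and classification error). The abstract does not list the objectives actually used, and the candidate values here are hypothetical. A full MOEA would add selection, crossover and mutation around this step.

```python
def dominates(a, b):
    """True if a is no worse than b on every objective and better on at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def pareto_front(points):
    """Return the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]

# Hypothetical candidates: (number of features, classification error).
candidates = [(1000, 0.08), (400, 0.09), (400, 0.12), (150, 0.15), (900, 0.10)]
front = pareto_front(candidates)
print(front)
```

The surviving points represent different trade-offs between compactness and accuracy; choosing one of them is left to the practitioner.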

    Traducir a Shakespeare. La palabra del actor

    Shakespeare's richness as a classic playwright has given rise, over the centuries, to countless translations that have approached his dramatic work from very different perspectives and have emphasized different values. As translators of Shakespeare, our aim is to highlight one value of the text that is of great interest for translation: the text is conceived as a “theatrical score” (Montalt, 1996) that contains all the oral and gestural cues or supports the actors need to bring the characters to life on stage. The target-language actor requires a target text that provides those cues or supports as effectively as the English text does for the English-speaking actor.

    Genetic connectivity and hybridization with its sister species challenge the current management paradigm of white anglerfish (Lophius piscatorius)

    Understanding the inter- and intraspecific dynamics of fish populations is essential to promote effective management and conservation actions and to predict adaptation to changing conditions. This is possible through the analysis of thousands of genetic markers, which has proven useful to resolve connectivity among populations. Here, we have tackled this issue in the white anglerfish (Lophius piscatorius), which inhabits the Northeast Atlantic and Mediterranean Sea and coexists with its morphologically almost identical sister species, the black anglerfish (L. budegassa). Our genetic analyses, based on 16,000 SNP markers and 700 samples, reveal that i) the white anglerfish from the Mediterranean Sea and the Atlantic Ocean are genetically isolated, but no differentiation can be observed within the latter, and that ii) black and white anglerfish naturally hybridize, so that hybrids, most likely sterile, make up about 20% of the population in some areas. These findings challenge the current paradigm of white anglerfish management, which considers three independent management units within the Northeast Atlantic and assumes that all mature fish have reproductive potential. Additionally, the northward shift in the distribution of both species, likely due to rising temperatures, calls for further monitoring of the abundance and distribution of hybrids to anticipate the effects of climate change on the interactions between both species and their potential resilience.

    Inventory of Callous-Unemotional Traits in a Community Sample of Preschoolers

    The purpose of this study was to test the factor structure of the Inventory of Callous-Unemotional Traits (ICU; Frick, 2004) and to study the relation between the derived dimensions and external variables in a community sample of preschool children. A total of 622 children 3 and 4 years of age were assessed with a semistructured diagnostic interview, the ICU, and other questionnaires on psychopathology, temperament, and executive functioning, completed by parents and teachers. Confirmatory factor analysis derived from teachers' ICU responses yielded three dimensions: Callousness, Uncaring, and Unemotional. Callousness and Uncaring subscale scores correlated with the specific scales related to aggressive behavior, temperament, executive functioning, and conduct problems. Cross-sectionally, the ICU scale scores discriminated oppositional defiant disorder (ODD) and conduct disorder (CD) diagnoses, aggressive and nonaggressive symptoms of CD, use of services, and ODD/CD-related family burden. Longitudinally, the Callousness subscale score at age 3 predicted ODD or CD diagnosis at age 4. The Unemotional subscale was not associated with aggressive measures, but it was linked to anxiety disorders cross-sectionally and longitudinally. Callous-Unemotional traits contributed significantly to predicting disruptive behavior disorders when controlling for sex, temperament, and executive functioning (predictive accuracy between 3 and 5%). The ICU is a promising questionnaire for identifying early Callous and Uncaring traits in the preschool years, which may help to identify a subset of preschool children who might have severe behavioral problems.

    Dimensions of oppositional defiant disorder in 3-year-old preschoolers

    Background: To test the factor structure of oppositional defiant disorder (ODD) symptoms and to study the relationships between the proposed dimensions and external variables in a community sample of preschool children. Method: A sample of 1,341 3-year-old preschoolers was randomly selected and screened for a double-phase design. In total, 622 families were assessed with a diagnostic semi-structured interview and questionnaires on psychopathology, temperament and executive functioning completed by parents and teachers. Results: Using categorical and dimensional symptoms of ODD, it was possible to confirm, cross-informant and cross-method, distinct dimensions for defining the structure of ODD: one made up of irritable and headstrong and the other of negative affect, oppositional behaviour and antagonistic behaviour. Specific associations with DSM-IV disorders were found: irritable was associated with anxiety disorders, whereas headstrong was associated with disruptive disorders, including aggressive and non-aggressive CD symptoms. Also, negative affect was associated with anxiety disorders and non-aggressive CD symptoms, oppositional behaviour with disruptive disorders and aggressive CD symptoms, and antagonistic behaviour with disruptive disorders and, in boys, with mood disorders. The dimensions correlated with specific scales of psychopathology, temperament and executive functioning. Conclusions: Oppositional defiant disorder is a heterogeneous disorder from preschool age. Different dimensions, with moderate to acceptable reliability and convergent and discriminant validity with other psychological constructs, can be identified early in life.

    Behavior Rating Inventory of Executive Functioning-Preschool (BRIEF-P) Applied to Teachers: Psychometric Properties and Usefulness for Disruptive Disorders in 3-Year-Old Preschoolers

    Objective: We provide validation data on the Behavior Rating Inventory of Executive Functioning-Preschool version (BRIEF-P) in preschool children. Method: Teachers of a community sample of 620 3-year-olds, who were followed up at age 4, responded to the BRIEF-P, and parents and children answered different psychological measures. Results: Confirmatory factor analysis achieved adequate fit of the original structure (five-first-order-factor plus three-second-order-factor model) after excluding four items. The derived dimensions obtained satisfactory internal consistency, moderate convergent validity with psychopathology and temperament, and good ability to discriminate children with ADHD. BRIEF-P scales were not associated with a performance-based measure of attention. The teacher's BRIEF-P adds significant clinical information for the diagnosis of ADHD (ΔR² from 5.3 to 15.3) when used with other instruments for the assessment of psychopathology, functional impairment, or performance-based attention. Conclusion: The BRIEF-P may be useful in the identification of preschool children, specifically those with ADHD, who might have a dysfunction in executive functioning.