
    SDRS: a new lossless dimensionality reduction for text corpora

    In recent years, most content-based spam filters have been implemented using Machine Learning (ML) approaches applied to token-based representations of textual contents. Although multiple performance enhancements have been introduced, their impact has been negligible. Recent studies have introduced synset-based content representations as a reliable way to improve classification, as well as various ways of exploiting semantic information to address problems such as dimensionality reduction. These preliminary solutions have some limitations and impose simplifications that must be gradually redefined in order to obtain significant improvements in spam content filtering. This study addresses the problem of feature reduction by introducing a new semantic-based proposal (SDRS) that avoids losing knowledge (lossless). Synset features can be grouped semantically by taking advantage of the taxonomic relations (mainly hypernyms) provided by the BabelNet ontological dictionary (e.g. “Viagra” and “Cialis” can be summarized into the single feature “anti-impotence drug”, “drug” or “chemical substance”, depending on whether they are generalized by 1, 2 or 3 levels). To decide how many levels should be used to generalize each synset of a dataset, our proposal uses Multi-Objective Evolutionary Algorithms (MOEA), in particular the Non-dominated Sorting Genetic Algorithm (NSGA-II). We compared the performance achieved by a Naïve Bayes classifier using both token-based and synset-based dataset representations, with and without dimensionality reduction. As a result, our lossless semantic reduction strategy was able to find optimal semantic-based feature grouping strategies for the input texts, leading to better performance of Naïve Bayes classifiers.
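    The hypernym-based grouping described above can be sketched as follows. This is a minimal illustration, not the SDRS implementation: the HYPERNYM dictionary is a hypothetical stand-in for BabelNet relations, and the `levels` mapping plays the role of one candidate solution that an optimizer such as NSGA-II would search over. The point it shows is why the reduction is lossless: counts of merged synsets are summed into the generalized feature rather than discarded.

```python
from collections import Counter

# Hypothetical hypernym chain; a real system would obtain these links from BabelNet.
HYPERNYM = {
    "viagra": "anti-impotence drug",
    "cialis": "anti-impotence drug",
    "anti-impotence drug": "drug",
    "drug": "chemical substance",
}


def generalize(synset: str, levels: int) -> str:
    """Follow up to `levels` hypernym links, stopping early at the top of the chain."""
    for _ in range(levels):
        parent = HYPERNYM.get(synset)
        if parent is None:
            break
        synset = parent
    return synset


def reduce_features(doc_counts: Counter, levels: dict) -> Counter:
    """Map each synset to its generalized feature and sum the counts (nothing is dropped)."""
    reduced = Counter()
    for synset, count in doc_counts.items():
        reduced[generalize(synset, levels.get(synset, 0))] += count
    return reduced


doc = Counter({"viagra": 2, "cialis": 1, "offer": 3})
# Candidate solution: generalize both drug synsets by one level, keep "offer" unchanged.
print(reduce_features(doc, {"viagra": 1, "cialis": 1}))
```

    With 1, 2 or 3 levels, "viagra" maps to "anti-impotence drug", "drug" or "chemical substance", mirroring the example in the abstract; the total count per document is unchanged by construction.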

    Multi-objective evolutionary optimization for dimensionality reduction of texts represented by synsets

    Despite new developments in machine learning classification techniques, improving the accuracy of spam filtering remains difficult because of linguistic phenomena that limit its effectiveness. In particular, we highlight polysemy, synonymy, the usage of hypernyms/hyponyms, and the presence of irrelevant or confusing words. These problems should be solved at the pre-processing stage to avoid using inconsistent information when building classification models. Previous studies have suggested that synset-based representation strategies could be used successfully to solve synonymy and polysemy problems. Complementarily, hyponymy/hypernymy relations can be exploited to implement dimensionality reduction strategies. These strategies could unify textual terms to model the intentions of the document without losing any information (e.g., bringing together the synsets “viagra”, “cialis”, “levitra” and others representing similar drugs under “virility drug”, which is a hypernym of all of them). These feature reduction schemes are known as lossless strategies, as the information is not removed but only generalised. However, in some types of text classification problems (such as spam filtering) it may be preferable not to keep all the information, and instead to let dimensionality reduction algorithms discard information that may be irrelevant or confusing. In this work, we formulate feature reduction as a multi-objective optimisation problem to be solved using a Multi-Objective Evolutionary Algorithm (MOEA). With minor modifications, our algorithm can implement lossless (using only semantic-based synset grouping), low-loss (discarding irrelevant information and using semantic-based synset grouping) or lossy (discarding only irrelevant information) strategies.
The contribution of this study is two-fold: (i) to formulate different dimensionality reduction methods (lossless, low-loss and lossy) as optimization problems that can be solved using MOEA, and (ii) to provide an experimental comparison of lossless and low-loss schemes for text representation. The results obtained support the usefulness of the low-loss method for improving the efficiency of classifiers.
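    One way to see how a single algorithm can cover the lossless, low-loss and lossy variants is to restrict the values each gene of a chromosome may take. The sketch below assumes a hypothetical encoding (one integer gene per synset: DISCARD drops the synset, 0 keeps it, k generalizes it by k hypernym levels) and a toy hypernym table; it is not the authors' encoding. The `dominates` helper is the standard Pareto-dominance test used by non-dominated sorting; the objective tuples passed to it here are placeholders, since the abstract does not state the exact objectives.

```python
from collections import Counter

DISCARD = -1  # hypothetical gene value: drop the synset entirely

# Hypothetical hypernym links; a real system would query a semantic dictionary.
HYPERNYM = {
    "viagra": "virility drug",
    "cialis": "virility drug",
    "levitra": "virility drug",
    "virility drug": "drug",
}


def generalize(synset: str, levels: int) -> str:
    """Follow up to `levels` hypernym links, stopping at the top of the chain."""
    for _ in range(levels):
        parent = HYPERNYM.get(synset)
        if parent is None:
            break
        synset = parent
    return synset


def allowed_genes(strategy: str, max_levels: int) -> list[int]:
    """Gene values that each strategy may assign to a single synset."""
    grouping = list(range(max_levels + 1))  # 0 = keep as-is, k = generalize k levels
    if strategy == "lossless":
        return grouping  # grouping only, never discard
    if strategy == "low-loss":
        return [DISCARD] + grouping  # grouping plus discarding
    if strategy == "lossy":
        return [DISCARD, 0]  # discarding only, no grouping
    raise ValueError(f"unknown strategy: {strategy}")


def decode(doc_counts: Counter, genes: dict[str, int]) -> Counter:
    """Apply a chromosome (synset -> gene) to one document's synset counts."""
    reduced = Counter()
    for synset, count in doc_counts.items():
        gene = genes.get(synset, 0)
        if gene == DISCARD:
            continue  # information is lost only for discarded synsets
        reduced[generalize(synset, gene)] += count
    return reduced


def dominates(a: tuple, b: tuple) -> bool:
    """Pareto dominance when every objective is minimized."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


doc = Counter({"viagra": 2, "levitra": 1, "offer": 4, "weather": 1})
low_loss = {"viagra": 1, "levitra": 1, "weather": DISCARD}
print(decode(doc, low_loss))
```

    Keeping the decoder fixed and changing only `allowed_genes` is what lets the same evolutionary search be reused across the three strategies in this sketch.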

    Computerized Generation and Finite Element Stress Analysis of Endodontic Rotary Files

    Introduction: The finite element method has been used extensively to analyze the mechanical behavior of endodontic rotary files under bending and torsional conditions. This methodology requires advanced computer-aided design skills to reproduce the geometry of the endodontic file, as well as mathematical knowledge to perform the finite element analysis. In this study, an automated procedure is proposed for the computerized generation and finite element analysis of endodontic rotary files under bending and torsional conditions. Methods: An endodontic rotary file with a total length of 25 mm, a diameter of 0.25 mm at the tip and 1.20 mm at 16 mm from the tip, a 2 mm pitch and a square cross section was generated using the proposed procedure. It was then analyzed under bending and torsional conditions by clamping the last 3 mm of the file and applying a transverse load of 0.1 N and a torsional moment of 0.3 N·cm. Results: The finite element analyses showed a maximum von Mises stress of 398 MPa in the bending analysis and 843 MPa in the torsional analysis, both located next to the encastre (clamped) point. Conclusions: The automated procedure allows an accurate description of the geometry of the endodontic file to be obtained from its design parameters, together with a finite element model of the file based on the previously generated geometry.
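    The results above are reported as von Mises stresses. As a reminder of what that quantity is, the sketch below evaluates the standard von Mises equivalent stress from the six components of a stress tensor, as a finite element post-processor does at each integration point. It is not the authors' procedure, and the inputs are illustrative values rather than data from this study.

```python
import math


def von_mises(sx: float, sy: float, sz: float,
              txy: float = 0.0, tyz: float = 0.0, tzx: float = 0.0) -> float:
    """Von Mises equivalent stress from normal (sx, sy, sz) and shear (txy, tyz, tzx) components."""
    normal = (sx - sy) ** 2 + (sy - sz) ** 2 + (sz - sx) ** 2
    shear = txy ** 2 + tyz ** 2 + tzx ** 2
    return math.sqrt(0.5 * normal + 3.0 * shear)


# Uniaxial stress (illustrative): the equivalent stress equals the axial stress.
print(von_mises(100.0, 0.0, 0.0))
# Pure shear (illustrative): the equivalent stress is sqrt(3) times the shear stress.
print(von_mises(0.0, 0.0, 0.0, txy=100.0))
```

    Because shear components are weighted by a factor of 3 inside the square root, a state dominated by shear yields a larger equivalent stress than the same magnitude of purely normal stress.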

    Development of a screening tool enabling identification of infants and toddlers at risk for family abuse and neglect: A feasibility study from three South European countries

    Background: Child abuse is a health and social problem, and few screening instruments are available for the detection of risk in primary health care. The aim was to develop a screening instrument to be used by professionals in the public health care sector, thus enabling the detection of infants and toddlers at risk of emotional and physical abuse and neglect, and to provide evidence for the feasibility of the instrument in Cyprus, Greece and Spain. Method: A total of 50 health professionals from paediatric public health-care centres in the three countries were involved in a three-step process that guided the development of the screening tool and its application. Results: A nine-item screening tool, consisting of items assessing relational emotional abuse, physical abuse and other risk factors, was developed. The screening tool was applied to a total of 219 families with 0- to 3-year-old children attending public health centres in the three countries. Clinicians reported that they agreed on the inclusion of the questions (86.4-100%) and that they found the questions useful for the clinical evaluation of the family (63.2-100%). Conclusion: The screening tool shows considerable face validity and was reported feasible by an international set of clinicians.

    DESCRIPTION OF C. TRACHOMATIS GENOTYPES AT THE HOSPITAL DE BASURTO-BILBAO

    The genotypic characteristics of C. trachomatis isolates from a sexually transmitted infection (STI) clinic in Bilbao are described, in order to assess the possible introduction of the cwCT strain, a variant isolated in Sweden, into the target population of the Hospital de Basurto-Bilbao.

    Phylogenomic Analysis of Odyssella thessalonicensis Fortifies the Common Origin of Rickettsiales, Pelagibacter ubique and Reclinomonas americana Mitochondrion

    Background: The evolution of the Alphaproteobacteria and the origin of the mitochondria are topics of considerable debate. Most studies have placed the mitochondrial ancestor within the order Rickettsiales. Ten years ago, the bacterium Odyssella thessalonicensis was isolated from Acanthamoeba spp., and 16S rDNA phylogeny placed it within the Rickettsiales. Recently, the whole genome of O. thessalonicensis has been sequenced, and a 16S rDNA phylogeny and a more robust and accurate phylogenomic analysis based on 65 highly conserved proteins have been performed. Methodology/Principal Findings: The results suggested that O. thessalonicensis emerged between the Rickettsiales and other Alphaproteobacteria. The mitochondrial proteins of Reclinomonas americana were used to locate the phylogenetic position of the mitochondrial ancestor within the Alphaproteobacteria tree. Using the K tree score method, nine mitochondrion-encoded proteins whose phylogenies were congruent with the Alphaproteobacteria phylogenomic tree were selected and concatenated for Bayesian and Maximum Likelihood phylogenies. The Reclinomonas americana mitochondrion is a sister taxon to the free-living bacterium Candidatus Pelagibacter ubique, and together they form a clade that is deeply rooted in the Rickettsiales clade. Conclusions/Significance: The Reclinomonas americana mitochondrion phylogenomic study confirmed that mitochondri

    Instability of Plastid DNA in the Nuclear Genome

    Functional gene transfer from the plastid (chloroplast) and mitochondrial genomes to the nucleus has been an important driving force in eukaryotic evolution. Non-functional DNA transfer is far more frequent, and the frequency of such transfers from the plastid to the nucleus has been determined experimentally in tobacco using transplastomic lines containing, in their plastid genome, a kanamycin resistance gene (neo) ready-made for nuclear expression. Contrary to expectations, non-Mendelian segregation of the kanamycin resistance phenotype is seen in the progeny of some lines in which neo has been transferred to the nuclear genome. Here, we provide a detailed analysis of the instability of kanamycin resistance in nine of these lines, and we show that it is due to deletion of neo. Four lines showed instability with variation between progeny derived from different areas of the same plant, suggesting a loss of neo during somatic cell division. One line showed a consistent reduction in the proportion of kanamycin-resistant progeny, suggesting a loss of neo during meiosis, and the remaining four lines were relatively stable. The high frequency of plastid DNA integration into the nuclear genome necessitates a counterbalancing removal process to avoid genome enlargement. This is the first demonstration of such loss involving a high proportion of recent nuclear integrants. We propose that insertion, deletion, and rearrangement of plastid sequences in the nuclear genome are important evolutionary processes in the generation of novel nuclear genes. This work is also relevant to transgenic plant research and crop production, because processes similar to those described here may be involved in the loss of plant transgenes.

    Distribution and Phylogeny of EFL and EF-1α in Euglenozoa Suggest Ancestral Co-Occurrence Followed by Differential Loss

    BACKGROUND: The eukaryotic elongation factor EF-1alpha (also known as EF1A) catalyzes aminoacyl-tRNA binding by the ribosome during translation. Homologs of this essential protein occur in all domains of life, and it was previously thought to be ubiquitous in eukaryotes. Recently, however, a number of eukaryotes were found to lack EF-1alpha and instead encode a related protein called EFL (for EF-Like). EFL-encoding organisms are scattered widely across the tree of eukaryotes, and all have close relatives that encode EF-1alpha. This intriguingly complex distribution has been attributed to multiple lateral transfers, because EFL's near mutual exclusivity with EF-1alpha makes an extended period of co-occurrence seem unlikely. However, differential loss may play a role in EFL evolution, and this possibility has been less widely discussed. METHODOLOGY/PRINCIPAL FINDINGS: We have undertaken an EST- and PCR-based survey to determine the distribution of these two proteins in a previously under-sampled group, the Euglenozoa. EF-1alpha was found to be widespread and monophyletic, suggesting it is ancestral in this group. EFL was found in some species belonging to each of the three euglenozoan lineages: diplonemids, kinetoplastids, and euglenids. CONCLUSIONS/SIGNIFICANCE: Interestingly, the kinetoplastid EFL sequences are specifically related even though the lineages in which they are found are not sisters to one another, suggesting that EFL and EF-1alpha co-occurred in an early ancestor of kinetoplastids. This represents the strongest phylogenetic evidence to date that differential loss has contributed to the complex distribution of EFL and EF-1alpha.

    Syndromes of self-reported psychopathology for ages 18-59 in 29 societies

    This study tested the multi-society generalizability of an eight-syndrome assessment model derived from factor analyses of American adults' self-ratings of 120 behavioral, emotional, and social problems. The Adult Self-Report (ASR; Achenbach and Rescorla 2003) was completed by 17,152 adults aged 18-59 in 29 societies. Confirmatory factor analyses tested the fit of self-ratings in each sample to the eight-syndrome model. The primary model fit index (Root Mean Square Error of Approximation) showed good model fit for all samples, while secondary indices showed acceptable to good fit. Only 5 (0.06%) of the 8,598 estimated parameters were outside the admissible parameter space. Confidence intervals indicated that sampling fluctuations could account for the deviant parameters. Results thus supported the tested model in societies differing widely in social, political, and economic systems, languages, ethnicities, religions, and geographical regions. Although other items, societies, and analytic methods might yield different results, the findings indicate that adults in very diverse societies were willing and able to rate themselves on the same standardized set of 120 problem items. Moreover, their self-ratings fit an eight-syndrome model previously derived from self-ratings by American adults. The support for the statistically derived syndrome model is consistent with previous findings for parent, teacher, and self-ratings of 1½- to 18-year-olds in many societies. The ASR and its parallel collateral-report instrument, the Adult Behavior Checklist (ABCL), may offer mental health professionals practical tools for the multi-informant assessment of clinical constructs of adult psychopathology that appear to be meaningful across diverse societies.
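    The primary fit index named above, RMSEA, has a simple closed form. The sketch below uses the common point-estimate formula sqrt(max(chi2 - df, 0) / (df * (n - 1))); this is the textbook definition, not necessarily the exact estimator used in this study (some software divides by n rather than n - 1, and robust estimators adjust chi2 first). The chi-square inputs are hypothetical. The second helper simply re-derives the 0.06% share of inadmissible parameters from the counts given in the abstract.

```python
import math


def rmsea(chi2: float, df: int, n: int) -> float:
    """Point estimate of the Root Mean Square Error of Approximation.

    Uses sqrt(max(chi2 - df, 0) / (df * (n - 1))); a chi2 at or below df gives 0.
    """
    if df <= 0 or n <= 1:
        raise ValueError("need df > 0 and n > 1")
    return math.sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))


def percent(part: int, whole: int, digits: int = 2) -> float:
    """Share of `part` in `whole`, as a rounded percentage."""
    return round(100.0 * part / whole, digits)


print(rmsea(chi2=20.0, df=10, n=101))  # hypothetical chi-square, df and sample size
print(percent(5, 8598))  # 0.06, the share of inadmissible parameters reported above
```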