2,179 research outputs found

    SDRS: a new lossless dimensionality reduction for text corpora

    In recent years, most content-based spam filters have been implemented using Machine Learning (ML) approaches by means of token-based representations of textual contents. Multiple performance enhancements have since been introduced, but their impact has been virtually negligible. Recent studies have introduced synset-based content representations as a reliable way to improve classification, as well as different ways to exploit semantic information to address problems such as dimensionality reduction. These preliminary solutions present some limitations and enforce simplifications that must be gradually redefined in order to obtain significant improvements in spam content filtering. This study addresses the problem of feature reduction by introducing a new semantic-based proposal (SDRS) that avoids losing knowledge (lossless). Synset features can be semantically grouped by taking advantage of taxonomic relations (mainly hypernyms) provided by the BabelNet ontological dictionary (e.g. “Viagra” and “Cialis” can be summarized into the single features “anti-impotence drug”, “drug” or “chemical substance”, depending on whether the generalization spans 1, 2 or 3 levels). In order to decide how many levels should be used to generalize each synset of a dataset, our proposal takes advantage of Multi-Objective Evolutionary Algorithms (MOEA) and, in particular, of the Non-dominated Sorting Genetic Algorithm (NSGA-II). We have compared the performance achieved by a Naïve Bayes classifier, using both token-based and synset-based dataset representations, with and without dimensionality reduction. As a result, our lossless semantic reduction strategy was able to find optimal semantic-based feature grouping strategies for the input texts, leading to a better performance of Naïve Bayes classifiers.
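The hypernym-based grouping described in this abstract can be illustrated with a minimal sketch. It assumes a tiny hand-written taxonomy (the `HYPERNYM` dictionary is hypothetical and is not queried from BabelNet) and takes the number of generalization levels per synset as given, whereas the abstract states that these levels are chosen by NSGA-II. The helper names `generalize` and `reduce_features` are illustrative only.

```python
from collections import Counter

# Hypothetical toy taxonomy (child -> hypernym); not real BabelNet data.
HYPERNYM = {
    "viagra": "anti-impotence drug",
    "cialis": "anti-impotence drug",
    "anti-impotence drug": "drug",
    "drug": "chemical substance",
}

def generalize(synset, levels):
    """Climb up to `levels` hypernym steps, stopping at the top of the taxonomy."""
    for _ in range(levels):
        if synset not in HYPERNYM:
            break
        synset = HYPERNYM[synset]
    return synset

def reduce_features(counts, levels_per_synset):
    """Merge synset counts under their generalized feature.

    Counts are summed rather than dropped, which is the sense in which
    this grouping is lossless.
    """
    reduced = Counter()
    for synset, n in counts.items():
        reduced[generalize(synset, levels_per_synset.get(synset, 0))] += n
    return reduced

# One document as synset counts; generalize the two drug synsets by one level.
doc = {"viagra": 2, "cialis": 1, "meeting": 4}
levels = {"viagra": 1, "cialis": 1}
reduced = reduce_features(doc, levels)
```

Here three features collapse into two while the total count is preserved; raising the level for both drug synsets to 2 or 3 would merge them under “drug” or “chemical substance” instead.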

    Multi-objective evolutionary optimization for dimensionality reduction of texts represented by synsets

    Despite new developments in machine learning classification techniques, improving the accuracy of spam filtering is a difficult task due to linguistic phenomena that limit its effectiveness. In particular, we highlight polysemy, synonymy, the usage of hypernyms/hyponyms, and the presence of irrelevant/confusing words. These problems should be solved at the pre-processing stage to avoid using inconsistent information in the building of classification models. Previous studies have suggested that synset-based representation strategies could be successfully used to solve synonymy and polysemy problems. Complementarily, it is possible to take advantage of hyponymy/hypernymy relations to implement dimensionality reduction strategies. These strategies could unify textual terms to model the intentions of the document without losing any information (e.g., bringing together the synsets “viagra”, “cialis”, “levitra” and others representing similar drugs by using “virility drug”, which is a hypernym of all of them). These feature reduction schemes are known as lossless strategies, as the information is not removed but only generalised. However, in some types of text classification problems (such as spam filtering) it may not be worthwhile to keep all the information, and it may be preferable to let dimensionality reduction algorithms discard information that may be irrelevant or confusing. In this work, we formulate feature reduction as a multi-objective optimisation problem to be solved using a Multi-Objective Evolutionary Algorithm (MOEA). With minor modifications, our algorithm can implement lossless (using only semantic-based synset grouping), low-loss (discarding irrelevant information and using semantic-based synset grouping) or lossy (discarding only irrelevant information) strategies. The contribution of this study is two-fold: (i) to introduce different dimensionality reduction methods (lossless, low-loss and lossy) as an optimization problem that can be solved using MOEA and (ii) to provide an experimental comparison of lossless and low-loss schemes for text representation. The results obtained support the usefulness of the low-loss method to improve the efficiency of classifiers.
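Multi-objective evolutionary algorithms such as NSGA-II rank candidate solutions by Pareto dominance rather than by a single score. The following is a minimal sketch of that idea only, not the authors' algorithm. The two objectives (number of features kept and classification error, both minimized) and the candidate values are assumptions made up for illustration, since the abstract does not list the objectives or any results.

```python
def dominates(a, b):
    """True if `a` is no worse than `b` on every objective and strictly
    better on at least one (all objectives are minimized)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def pareto_front(points):
    """Keep only the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points if q != p)]

# Hypothetical candidates: (number of features, error rate). Illustrative values only.
candidates = [(120, 0.08), (80, 0.10), (80, 0.12), (200, 0.07), (150, 0.09)]
front = pareto_front(candidates)
```

In this toy set, `(80, 0.12)` is dominated by `(80, 0.10)` and `(150, 0.09)` by `(120, 0.08)`, so three trade-off solutions remain. A full NSGA-II adds successive front ranking, crowding distance and genetic operators on top of this dominance test.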

    Atmospheric pollen in San Sebastián: 1983, 1984, 1985. I. Total pollen and grasses [Polen atmosférico en San Sebastián: 1983, 1984, 1985. I. Polen total y Gramíneas]

    Atmospheric pollen in San Sebastián was studied with a Burkard volumetric sampler from 21 March 1983 to 30 September 1985. Graphs of mean weekly concentrations of total pollen and of grass (Gramineae) pollen were prepared as a contribution to the Pollen Map of Spain.

    Adaptation plan for the Arroyo Tortugas sub-basin under a climate change scenario [Plan de adaptación de la subcuenca del arroyo Tortugas ante un escenario de cambio climático]

    In recent decades, in the central region of the Argentine Pampas plain, land-use management, together with natural characteristics that favour waterlogging (soil type, shallow water tables, limited slope) and variations in rainfall (attributed to climate change), has resulted in floods with social, economic and environmental consequences. One of the most affected basins has been that of the Arroyo Tortugas (Córdoba), where successive flood events have affected both agricultural and urban activities. Despite the importance and frequency of these events, which are expected to increase in a context of growing climate variability, no comprehensive plan addressing the adaptation of activities within the basin has been proposed. Consequently, this work proposes a potential integrated management plan for the adaptation of the agricultural and urban system of the Arroyo Tortugas Sub-basin (SAT) in the Province of Córdoba, Argentina. The general characteristics of the basin and its climatological and hydrological behaviour were analysed, together with projections of extreme events (temperature and precipitation) provided by the Third National Communication on Climate Change for the region. These served as guiding axes for basin management and for the choice of climate change adaptation measures for the agricultural and urban sectors, in the face of potential flood events and high temperatures in the SAT. Work published in Acta Bioquímica Clínica Latinoamericana; no. 52, suppl. 2, part II, December 2018. Universidad Nacional de La Plata.

    Characteristics of Liver Transplantation in Argentina: A Multicenter Study

    Introduction: There is a lack of information regarding outcomes after liver transplant in Latin America. Objectives: This study sought to describe outcomes after liver transplant in adult patients from Argentina. Methods: We performed an ambispective cohort study of adult patients transplanted between June 2010 and October 2012 in 6 centers from Argentina. Only patients who survived the first 48 hours posttransplantation were included. Pretransplantation and posttransplantation data were collected. Results: A total of 200 patients were included in the study. Median age at time of transplant was 50 (interquartile range [IQR] 26 to 54) years. In total, 173 (86%) patients had cirrhosis, and the most frequent etiology in these patients was hepatitis C (32%). A total of 35 (17%) patients were transplanted with hepatocellular carcinoma. In patients with cirrhosis, the median Model for End-Stage Liver Disease (MELD) score at time of liver transplant was 25 (IQR 19 to 30). Median time on the waiting list was 101 (IQR 27 to 295) days for elective patients and 3 (IQR 2 to 4) days for urgent patients. Almost 40% of the patients were readmitted during the first 6 months after liver transplant. Acute rejection occurred in 27% of the patients. Biliary and vascular complications were reported in 39 (19%) and 19 (9%) patients, respectively. Renal failure, diabetes, and dyslipidemia were present in 40 (26%), 87 (57%), and 77 (50%) patients at 2 years, respectively. Conclusions: We believe the information contained in this article might be of value for reviewing current practices and developing local policies.
    Affiliation: Haddad, L. Instituto Universitario del Hospital Italiano de Buenos Aires; Argentina. Affiliation: Marciano, S. Instituto Universitario del Hospital Italiano de Buenos Aires; Argentina. Affiliation: Cleres, M. Fundación Favaloro; Argentina. Affiliation: Zerega, A. Sanatorio Allende; Argentina. Affiliation: Piñero, F. Hospital Universitario Austral; Argentina. Affiliation: Orozco, F. Hospital Aleman; Argentina. Affiliation: Braslavsky, G. Hospital General de Agudos Cosme Argerich; Argentina. Affiliation: Mendizabal, M. Hospital Universitario Austral; Argentina. Affiliation: Gondolesi, Gabriel Eduardo. Fundación Favaloro; Argentina. Consejo Nacional de Investigaciones Científicas y Técnicas. Oficina de Coordinación Administrativa Houssay. Instituto de Medicina Traslacional, Trasplante y Bioingeniería. Fundación Favaloro. Instituto de Medicina Traslacional, Trasplante y Bioingeniería; Argentina. Affiliation: Gil, O. Sanatorio Allende; Argentina. Affiliation: Silva, M. Hospital Universitario Austral; Argentina. Affiliation: Mastai, Ricardo. Hospital Aleman; Argentina. Affiliation: Imvertaza, O. Hospital General de Agudos Cosme Argerich; Argentina. Consejo Nacional de Investigaciones Científicas y Técnicas. Oficina de Coordinación Administrativa Houssay. Instituto de Medicina Traslacional, Trasplante y Bioingeniería. Fundación Favaloro. Instituto de Medicina Traslacional, Trasplante y Bioingeniería; Argentina. Affiliation: Descalzi, V. Fundación Favaloro; Argentina. Affiliation: Gadano, A. Instituto Universitario del Hospital Italiano de Buenos Aires; Argentina.

    Mitochondrial echoes of first settlement and genetic continuity in El Salvador

    Background: From Paleo-Indian times to recent historical episodes, the Mesoamerican isthmus has played an important role in the distribution and patterns of variability across the two American continents. However, the amount of genetic information currently available on Central American continental populations is very scarce. In order to shed light on the role of Mesoamerica in the peopling of the New World, the present study focuses on the analysis of the mtDNA variation in a population sample from El Salvador. Methodology/Principal Findings: We have carried out DNA sequencing of the entire control region of the mitochondrial DNA (mtDNA) genome in 90 individuals from El Salvador. We have also compiled more than 3,985 control region profiles from the public domain and the literature in order to carry out inter-population comparisons. The results reveal a predominant Native American component in this region: by far, the most prevalent mtDNA haplogroup in this country (at ~90%) is A2, in contrast with other North, Meso- and South American populations. Haplogroup A2 shows a star-like phylogeny and is very diverse, with a substantial proportion of mtDNAs (45%; sequence range 16090–16365) still unobserved in other American populations. Two different Bayesian approaches used to estimate admixture proportions in El Salvador show that the majority of the mtDNAs observed come from North America. A preliminary founder analysis indicates that the settlement of El Salvador occurred about 13,400±5,200 Y.B.P. The founder age of A2 in El Salvador is close to the overall age of A2 in America, which suggests that the colonization of this region occurred within a few thousand years of the initial expansion into the Americas. Conclusions/Significance: As a whole, the results are compatible with the hypothesis that today's A2 variability in El Salvador represents to a large extent the indigenous component of the region. Also concordant with this hypothesis is the observation of a very limited contribution from European and African women (~5%). This implies that the Atlantic slave trade had a very small demographic impact in El Salvador, in contrast to its transformation of the gene pool in neighbouring populations from the Caribbean facade.

    Reconstructing the Indian Origin and Dispersal of the European Roma: A Maternal Genetic Perspective

    Previous genetic, anthropological and linguistic studies have shown that Roma (Gypsies) constitute a founder population dispersed throughout Europe whose origins might be traced to the Indian subcontinent. Linguistic and anthropological evidence points to Indo-Aryan ethnic groups from North-western India as the ancestral parental population of Roma. Recently, a strong genetic hint supporting this theory came from a study of a private mutation causing primary congenital glaucoma. In the present study, complete mitochondrial control sequences of Iberian Roma and previously published maternal lineages of other European Roma were analyzed in order to establish the genetic affinities among Roma groups, determine the degree of admixture with neighbouring populations, infer the migration routes followed since the first arrival in Europe, and survey the origin of Roma within the Indian subcontinent. Our results show that the maternal lineage composition in the Roma groups follows a pattern of different migration routes, with several founder effects and low effective population sizes along their dispersal. Our data allowed the confirmation of a North/West migration route shared by Polish, Lithuanian and Iberian Roma. Additionally, eleven Roma founder lineages were identified and degrees of admixture with host populations were estimated. Finally, the comparison with an extensive database of Indian sequences allowed us to identify the Punjab state, in North-western India, as the putative ancestral homeland of the European Roma, in agreement with previous linguistic and anthropological studies.

    Measurement of χc1 and χc2 production with √s = 7 TeV pp collisions at ATLAS

    The prompt and non-prompt production cross-sections for the χc1 and χc2 charmonium states are measured in pp collisions at √s = 7 TeV with the ATLAS detector at the LHC using 4.5 fb⁻¹ of integrated luminosity. The χc states are reconstructed through the radiative decay χc → J/ψ γ (with J/ψ → μ⁺μ⁻), where photons are reconstructed from γ → e⁺e⁻ conversions. The production rate of the χc2 state relative to the χc1 state is measured for prompt and non-prompt χc as a function of J/ψ transverse momentum. The prompt χc cross-sections are combined with existing measurements of prompt J/ψ production to derive the fraction of prompt J/ψ produced in feed-down from χc decays. The fractions of χc1 and χc2 produced in b-hadron decays are also measured.

    Measurement of the production of a W boson in association with a charm quark in pp collisions at √s = 7 TeV with the ATLAS detector

    The production of a W boson in association with a single charm quark is studied using 4.6 fb⁻¹ of pp collision data at √s = 7 TeV collected with the ATLAS detector at the Large Hadron Collider. In events in which a W boson decays to an electron or muon, the charm quark is tagged either by its semileptonic decay to a muon or by the presence of a charmed meson. The integrated and differential cross sections as a function of the pseudorapidity of the lepton from the W-boson decay are measured. Results are compared to the predictions of next-to-leading-order QCD calculations obtained from various parton distribution function parameterisations. The ratio of the strange-to-down sea-quark distributions is determined to be 0.96 +0.26/−0.30 at Q² = 1.9 GeV², which supports the hypothesis of an SU(3)-symmetric composition of the light-quark sea. Additionally, the cross-section ratio σ(W⁺ + c̄)/σ(W⁻ + c) is compared to the predictions obtained using parton distribution function parameterisations with different assumptions about the s–s̄ quark asymmetry.

    Measurements of fiducial and differential cross sections for Higgs boson production in the diphoton decay channel at √s = 8 TeV with ATLAS

    Measurements of fiducial and differential cross sections are presented for Higgs boson production in proton-proton collisions at a centre-of-mass energy of √s = 8 TeV. The analysis is performed in the H → γγ decay channel using 20.3 fb⁻¹ of data recorded by the ATLAS experiment at the CERN Large Hadron Collider. The signal is extracted using a fit to the diphoton invariant mass spectrum assuming that the width of the resonance is much smaller than the experimental resolution. The signal yields are corrected for the effects of detector inefficiency and resolution. The pp → H → γγ fiducial cross section is measured to be 43.2 ± 9.4 (stat.) +3.2/−2.9 (syst.) ± 1.2 (lumi) fb for a Higgs boson of mass 125.4 GeV decaying to two isolated photons that have transverse momentum greater than 35% and 25% of the diphoton invariant mass and each with absolute pseudorapidity less than 2.37. Four additional fiducial cross sections and two cross-section limits are presented in phase space regions that test the theoretical modelling of different Higgs boson production mechanisms, or are sensitive to physics beyond the Standard Model. Differential cross sections are also presented, as a function of variables related to the diphoton kinematics and the jet activity produced in the Higgs boson events. The observed spectra are statistically limited but broadly in line with the theoretical expectations.
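The kinematic part of the fiducial definition quoted in this abstract can be written as a simple predicate. This is a sketch under stated assumptions: the function name `in_fiducial` and its parameter names are hypothetical, the photon isolation requirement mentioned in the abstract is not modelled, and the example photon values are invented for illustration.

```python
def in_fiducial(pt_lead, pt_sub, eta_lead, eta_sub, m_gg):
    """Return True if both photons pass the kinematic cuts quoted in the abstract.

    pt_* and m_gg are in the same energy unit (e.g. GeV); eta_* are pseudorapidities.
    Isolation is NOT checked here (assumed to be applied elsewhere).
    """
    return (
        pt_lead / m_gg > 0.35       # leading photon: pT above 35% of diphoton mass
        and pt_sub / m_gg > 0.25    # subleading photon: pT above 25% of diphoton mass
        and abs(eta_lead) < 2.37    # both photons within |eta| < 2.37
        and abs(eta_sub) < 2.37
    )

# Illustrative (made-up) photon pairs at a diphoton mass of 125.4 GeV.
passes = in_fiducial(pt_lead=50.0, pt_sub=35.0, eta_lead=0.4, eta_sub=-1.9, m_gg=125.4)
fails = in_fiducial(pt_lead=50.0, pt_sub=30.0, eta_lead=0.4, eta_sub=-1.9, m_gg=125.4)
```

The second pair is rejected only because the subleading photon carries about 24% of the diphoton mass, just below the 25% threshold.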