
    ESSENCE: a portable methodology for building information extraction systems

    One of the most important issues when constructing an Information Extraction system is how to obtain the knowledge needed for identifying relevant information in a document. A manual approach is not only expensive but also hurts the portability of the system across domains. Automating the knowledge acquisition process may partially solve this problem, even if a human expert still takes part in it for specific tasks. This work presents a methodology (Essence) to automatically learn information extraction patterns from an unrestricted text corpus representative of the domain. The methodology includes several steps, among which we stress the pattern generalization process. Generalization reduces the size of the pattern base and therefore the amount of information an expert must validate. As we will see, lexical knowledge, together with the lexico-semantic relations from WordNet, is our basic knowledge source, especially for the generalization process.
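
    The generalization step described above can be illustrated with WordNet's hypernym hierarchy: two patterns whose slot fillers share a sufficiently specific common hypernym can be merged into one, shrinking the pattern base. Below is a minimal sketch using NLTK's WordNet interface; the function name and merging criterion are illustrative assumptions, not the Essence implementation.

        # Minimal sketch of WordNet-based generalization of pattern slot fillers.
        # Names are illustrative, not from the Essence implementation.
        from nltk.corpus import wordnet as wn

        def generalize(word1, word2, pos=wn.NOUN):
            """Return the most specific common hypernym of two words, if any."""
            best = None
            for s1 in wn.synsets(word1, pos):
                for s2 in wn.synsets(word2, pos):
                    for common in s1.lowest_common_hypernyms(s2):
                        if best is None or common.min_depth() > best.min_depth():
                            best = common  # keep the most specific shared concept
            return best.name() if best else None

        # Two patterns whose fillers share a hypernym can be merged into one
        # generalized pattern, reducing what the expert must validate.
        print(generalize("company", "firm"))   # e.g. a shared 'company'-like synset
        print(generalize("bomb", "grenade"))   # e.g. a shared 'weapon'-like synset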

    Acquiring information extraction patterns from unannotated corpora

    Information Extraction (IE) can be defined as the task of automatically extracting prespecified kinds of information from a text document. The extracted information is encoded in the required format and can then be used, for example, for text summarization or as an accurate index to retrieve new documents. The main issue when building IE systems is how to obtain the knowledge needed to identify relevant information in a document. Today, IE systems are commonly based on extraction rules or IE patterns that represent the kind of information to be extracted. Most approaches to IE pattern acquisition require expert human intervention in many steps of the acquisition process. This dissertation presents a novel method for acquiring IE patterns, Essence, that significantly reduces the need for human intervention. The method is based on ELA, a learning algorithm specifically designed for acquiring IE patterns from unannotated corpora. The distinctive features of Essence and ELA are that 1) they permit the automatic acquisition of IE patterns from unrestricted and untagged text representative of the domain, thanks to 2) their ability to identify regularities around semantically relevant concept words for the IE task by 3) using non-domain-specific lexical knowledge tools such as WordNet, and 4) they restrict human intervention to defining the task and to validating and typifying the set of IE patterns obtained. Since Essence does not require a corpus annotated with the type of information to be extracted and makes use of a general-purpose ontology and widely applied syntactic tools, it reduces the expert effort required to build an IE system and therefore also the effort of porting the method to other domains. To validate Essence, we conducted a set of experiments to test the performance of the method. We used Essence to generate IE patterns for a MUC-like task. Nevertheless, since the evaluation procedure of the MUC competitions does not provide a sound evaluation of IE systems, especially of learning systems, we conducted an exhaustive set of experiments to further test the abilities of Essence. The results of these experiments indicate that the proposed method is able to learn effective IE patterns.
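
    To make the idea of "regularities around semantically relevant concept words" concrete, here is a toy sketch that collects the local contexts in which given concept words occur in raw, untagged text; the windowing scheme and names are illustrative assumptions, not the published ELA algorithm.

        # Illustrative sketch of mining candidate extraction-pattern contexts
        # around concept words in untagged text (not the published ELA algorithm).
        from collections import Counter
        import re

        def candidate_patterns(sentences, concept_words, window=2):
            """Count the local contexts in which concept words occur."""
            counts = Counter()
            for sentence in sentences:
                tokens = re.findall(r"\w+", sentence.lower())
                for i, tok in enumerate(tokens):
                    if tok in concept_words:
                        left = tuple(tokens[max(0, i - window):i])
                        right = tuple(tokens[i + 1:i + 1 + window])
                        counts[(left, tok, right)] += 1
            return counts

        sents = ["The company appointed John Smith as chairman.",
                 "The firm appointed Mary Jones as director."]
        for pat, n in candidate_patterns(sents, {"appointed"}).items():
            print(n, pat)
        # Frequent, similar contexts become candidate IE patterns that a human
        # expert later validates and typifies.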

    Introduction: towards a cross-disciplinary history of the global in the humanities and the social sciences

    The interdisciplinary analysis of historical and contemporary global issues, with increasingly productive flows of theories, concepts, methods, and practices, is a principal goal in global studies. However, within the humanities and the social sciences, the idea of the 'global' is often constrained by disciplinary boundaries, with scant dialogue and transference between them. The present special issue addresses this fundamental gap by historicizing the notion of the 'global' in an interdisciplinary dialogue, with approaches from history, sociology, anthropology, literary studies, art history, and media and communication studies. Our objective is to gain greater insight into the global approach from several disciplines and to let their borrowings and contributions emerge.

    Feature selection for support vector machines by alignment with ideal kernel

    Feature selection has several potentially beneficial uses in machine learning. Among them are improving the performance of the learning method by removing noisy features, reducing the feature set in data collection, and better understanding the data. In this report we present how to use empirical alignment, a well-known measure of the fitness of a kernel to the data labels, to perform feature selection for support vector machines. We show that this measure improves the results obtained with other widely used measures for feature selection (such as information gain or correlation) in linearly separable problems. We also show how alignment can be successfully used to select relevant features in non-linearly separable problems when using support vector machines.
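
    The empirical alignment of a kernel matrix K with the ideal kernel yy' is the normalized Frobenius inner product <K, yy'> / (||K|| ||yy'||). The sketch below ranks features by the alignment of each single-feature linear kernel; it is a minimal illustration assuming labels in {-1, +1}, not the exact procedure of the report.

        # Minimal sketch of feature ranking by empirical kernel-target alignment,
        # assuming labels y in {-1, +1}; variable names are illustrative.
        import numpy as np

        def alignment(K, y):
            """Frobenius alignment between kernel K and the ideal kernel y y^T."""
            Y = np.outer(y, y)
            return np.sum(K * Y) / (np.linalg.norm(K) * np.linalg.norm(Y))

        def rank_features(X, y):
            """Score each feature by the alignment of its single-feature kernel."""
            scores = []
            for j in range(X.shape[1]):
                xj = X[:, [j]]
                K = xj @ xj.T                 # linear kernel on feature j alone
                scores.append(alignment(K, y))
            return np.argsort(scores)[::-1]   # best-aligned features first

        rng = np.random.default_rng(0)
        y = np.sign(rng.standard_normal(100))
        X = rng.standard_normal((100, 5))
        X[:, 2] += y                          # make feature 2 informative
        print(rank_features(X, y))            # feature 2 should rank first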

    Fast methodology for the reliable determination of nonylphenol in water samples by minimal labeling isotope dilution mass spectrometry

    In this work we have developed and validated an accurate and fast methodology for the determination of 4-nonylphenol (technical mixture) in complex-matrix water samples by UHPLC–ESI-MS/MS. The procedure is based on isotope dilution mass spectrometry (IDMS) in combination with isotope pattern deconvolution (IPD), which provides the concentration of the analyte directly from the spiked sample without requiring any methodological calibration graph. To avoid any possible isotopic effect during the analytical procedure, the in-house synthesized 13C1-4-(3,6-dimethyl-3-heptyl)phenol was used as the labeled compound. This surrogate was able to compensate for the matrix effect even in wastewater samples. An SPE pre-concentration step, together with exhaustive efforts to avoid contamination, was included to reach the signal-to-noise ratio necessary to detect the endogenous concentrations present in environmental samples. Calculations were performed acquiring only three transitions, achieving limits of detection lower than 100 ng/g for all water matrices assayed. Recoveries within 83–108% and coefficients of variation ranging from 1.5% to 9% were obtained. In contrast, a considerable overestimation was obtained with the most common classical calibration procedure using 4-n-nonylphenol as internal standard, demonstrating the suitability of the minimal labeling approach.
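
    The core of IPD is a small linear system: the measured intensities of the monitored transitions are modelled as a mixture of the natural and labelled isotope patterns, and the molar fractions are solved for directly, which is why no calibration graph is needed. A hedged numerical sketch, with invented pattern values, follows.

        # Sketch of isotope pattern deconvolution (IPD): measured transition
        # intensities are modelled as a linear mix of the natural and
        # 13C-labelled patterns. All pattern values below are invented for
        # illustration; real abundances come from the characterized compounds.
        import numpy as np

        A_nat = np.array([0.95, 0.05, 0.00])      # natural pattern (hypothetical)
        A_lab = np.array([0.01, 0.98, 0.01])      # labelled pattern (hypothetical)
        measured = np.array([0.386, 0.608, 0.006])  # normalized intensities

        # Solve measured = x_nat * A_nat + x_lab * A_lab for the molar fractions.
        M = np.column_stack([A_nat, A_lab])
        (x_nat, x_lab), *_ = np.linalg.lstsq(M, measured, rcond=None)

        n_spike = 10.0                            # nmol of labelled spike added
        n_analyte = n_spike * x_nat / x_lab       # analyte amount, no calibration
        print(f"analyte = {n_analyte:.2f} nmol")  # 6.67 nmol for these numbers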

    Comparison of approaches to deal with matrix effects in LC-MS/MS based determinations of mycotoxins in food and feed

    This study deals with one of the major concerns in mycotoxin determinations: the matrix effect associated with LC-MS/MS systems with electrospray ionization sources. To this end, in a first approach, the matrix effect has been evaluated in two ways: by monitoring the signal of a compound (added to the mobile phase) during the entire chromatographic run, and by classical post-extraction addition. The study was focused on nine selected mycotoxins: aflatoxin B1, fumonisins B1, B2 and B3, ochratoxin A, deoxynivalenol, T-2 and HT-2 toxins, and zearalenone, in various sample extracts giving moderate to strong matrix effects (maize, compound feed, straw, spices). Although the permanent monitoring of a compound provided a qualitative way of evaluating the matrix effects at each retention time, we concluded that it was not adequate as a quantitative approach to correct for the matrix effect. Matrix effects measured by post-extraction addition showed that the strongest ion suppression occurred for the spices (up to -89%). Five different calibration approaches to compensate for matrix effects were compared: multi-level external calibration using isotopically labelled internal standards, multi-level and single-level standard addition, and two kinds of single-point internal calibration: one-point isotopic internal calibration and isotope pattern deconvolution. In general, recoveries and precision meeting the European Union requirements could be achieved with all approaches, with the exception of single-level standard addition at levels too close to the concentration in the sample. When an isotopically labelled internal standard is not available, single-level standard addition is the most efficient option. The Dutch Ministry of Economic Affairs is acknowledged for financially supporting this work. The authors acknowledge the financial support from Generalitat Valenciana (Research group of excellence Prometeo 2009/054 and Collaborative Research on Environment and Food Safety ISIC/2012/016). N. Fabregat-Cabello also acknowledges the Generalitat Valenciana for her Ph.D. research grant under the Program VALi+D.
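
    The post-extraction addition measurement mentioned above reduces to comparing the response of a standard spiked into a blank extract with the same standard in pure solvent. A minimal sketch, with invented peak areas, shows the percentage form in which suppression figures such as the -89% above are usually reported.

        # Sketch of quantifying the matrix effect by post-extraction addition:
        # compare the response of a standard spiked into a blank extract with
        # the same standard in pure solvent. Peak areas below are invented.
        def matrix_effect(area_in_extract, area_in_solvent):
            """Signal suppression/enhancement in percent; negative = suppression."""
            return (area_in_extract / area_in_solvent - 1.0) * 100.0

        # e.g. a spice extract strongly suppressing a mycotoxin signal
        print(f"{matrix_effect(1.1e4, 1.0e5):+.0f}%")   # prints -89%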

    Rapid screening of arsenic species in urine from exposed human by inductively coupled plasma mass spectrometry with germanium as internal standard

    In the present work, internal standardization based on the species-unspecific isotope dilution analysis technique is proposed to overcome the matrix effects and signal drift that arise in the speciation of As in urine by HPLC-ICP-MS. To this end, 72Ge has been selected as a pseudo-isotope of As. The resulting mass flow chromatogram of the element allows the calculation of the corrected overall species concentrations without requiring any methodological calibration, providing high-throughput sample processing. The validation was carried out by analyzing a blank human urine fortified at three concentration levels and an unspiked human urine sample containing different species of arsenic. In all cases, recoveries ranging from 90 to 115% and RSDs below 10% were attained with this approach. Furthermore, the proposed method provided results in excellent agreement with those obtained using standard additions and internal standard calibration, allowing a fast way to assess human exposure to arsenic species.
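
    A hedged sketch of how a mass flow chromatogram can be built from such data: the As intensity is converted point by point into a mass flow using the continuously measured Ge signal, its known mass flow, and a relative As/Ge sensitivity factor, then integrated per species peak. The numbers and the sensitivity-factor treatment below are illustrative assumptions, not the published procedure.

        # Hedged sketch of internal standardization with a continuously infused
        # pseudo-isotope: the As signal is converted point by point into a mass
        # flow via the Ge signal and a relative As/Ge sensitivity factor, then
        # integrated per chromatographic peak. All numbers are illustrative.
        import numpy as np

        def mass_flow_chromatogram(i_as, i_ge, mf_ge, rf_as_ge):
            """As mass flow (ng/min) from 75As and 72Ge intensity traces."""
            return mf_ge * rf_as_ge * (i_as / i_ge)

        t = np.linspace(0, 10, 1000)                    # retention time, min
        i_as = 1e4 * np.exp(-((t - 4.0) / 0.1) ** 2)    # one As species peak
        i_ge = np.full_like(t, 5e4)                     # steady Ge infusion signal
        mf = mass_flow_chromatogram(i_as, i_ge, mf_ge=50.0, rf_as_ge=1.2)
        print(f"peak amount = {np.trapz(mf, t):.1f} ng")  # integrate the peak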

    The polysemy of the words that children learn over time

    Here we study polysemy as a potential learning bias in vocabulary learning in children. We employ a massive set of transcriptions of conversations between children and adults in English to analyze the evolution of mean polysemy in the words produced by children whose ages range between 10 and 60 months. Our results show that mean polysemy in children increases over time in two phases: a fast growth until the 31st month, followed by a slower tendency towards adult speech. In contrast, no dependence on time is found in adults. This may suggest that children have a preference for non-polysemous words in the early stages of vocabulary acquisition. Our hypothesis is twofold: (a) polysemy is a standalone bias, or (b) polysemy is a side effect of other biases. Interestingly, the bias for low polysemy described above weakens when controlling for syntactic category (noun, verb, adjective or adverb). The pattern of the evolution of polysemy suggests that both hypotheses may apply to some extent, and that (b) would originate from a combination of the well-known preference for nouns and the lower polysemy of nouns with respect to other syntactic categories.
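
    A common way to operationalize the polysemy of a word is to count its WordNet senses; the sketch below computes a mean polysemy per sample of child productions under that assumption, which may differ from the paper's exact measure.

        # Minimal sketch of a mean-polysemy measure, approximating the polysemy
        # of a word by its number of WordNet synsets (an assumption here, not
        # necessarily the paper's exact operationalization).
        from nltk.corpus import wordnet as wn

        def mean_polysemy(tokens):
            """Average number of WordNet senses over the word types produced."""
            degrees = [len(wn.synsets(w)) for w in set(tokens)]
            degrees = [d for d in degrees if d > 0]   # keep words WordNet knows
            return sum(degrees) / len(degrees) if degrees else 0.0

        child_30m = ["dog", "ball", "milk", "run"]    # toy samples per age bin
        child_60m = ["run", "play", "make", "take"]
        print(mean_polysemy(child_30m), mean_polysemy(child_60m))
        # Plotting this value against the child's age in months traces the
        # two-phase growth pattern described above.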

    Improving the WFD purposes by the incorporation of ecotoxicity tests

    Work presented at the 4th SCARCE International Conference (Towards a better understanding of the links between stressors, hazard assessment and ecosystem services under water scarcity), held in Cádiz on 25-26 November 2013. The approval of the European Water Framework Directive (WFD) represented a big step forward in the protection of aquatic ecosystems. According to this Directive, the assessment of ecological status is based on three quality elements: biological, physicochemical and hydromorphological; ecotoxicological status is still not included. Some studies have observed that biological status is not always consistent with physicochemical status, possibly due to the adaptation mechanisms of aquatic organisms under chronic chemical exposure. In these situations, ecotoxicity tests could be useful to obtain a better characterisation of these specific ecosystems. The general aim of this work is to add a battery of ecotoxicity tests to the current analyses defined by the WFD in order to obtain a better ecological characterisation of freshwater systems. The specific aims of this work are: (1) to compare the effectiveness and viability of different ecotoxicity tests performed with freshwater sediments (directly and with pore water), taking different aquatic species as target organisms, and (2) to evaluate the relationship between stream pollutant concentrations (organic pollutants and metals), biological and hydromorphological status, and sediment ecotoxicity. For this purpose, thirteen sampling sites within the Ebro river watershed were selected. Data on priority pollutants in water, sediment and fish, as well as the biological and hydromorphological status of each sampling point, will be obtained. Moreover, in each sampling reach, composite samples of sediment were collected using a Van Veen grab. Sediment samples were stored at 4ºC prior to the ecotoxicity analyses. The ecotoxicity of pore water was evaluated by different bioassays (Vibrio fischeri, Pseudokirchneriella subcapitata and Daphnia magna), while the ecotoxicity of whole sediment was evaluated in Vibrio fischeri, Nitzschia palea and Chironomus riparius. In addition, the concentration of total heavy metals and metal bioavailability was determined by a sequential extraction according to the Community Bureau of Reference (BCR) method. To distinguish the potentially toxic fraction associated with the heavy metal burden of sediments, an analysis of acid-volatile sulphide (AVS) and simultaneously extracted metals (SEM) was performed. Complementary sediment variables such as humidity, porosity, percentage of fines (<63 μm), organic carbon and organic matter were determined. This study is expected to demonstrate that the integration of chemical, biological and ecotoxicological analyses could be crucial to understand the hazard of pollutants in aquatic ecosystems, especially in freshwater sediments. Future research in this area is needed in order to obtain more data and to be able to establish a decision tree for the evaluation of freshwater analyses. The poster presents the methodology proposed for this study as well as the first preliminary results obtained from the ecotoxicity tests. The authors would like to thank the Spanish Ministry of Economy and Competitiveness for its financial support through the project SCARCE (Consolider-Ingenio 2010 CSD2009-00065).
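
    The AVS/SEM comparison mentioned above is, computationally, a simple screening criterion: when the molar sum of simultaneously extracted metals exceeds the acid-volatile sulphide, the excess metals may not be bound as insoluble sulphides and are flagged as potentially bioavailable. A minimal sketch with invented concentrations:

        # Sketch of the SEM/AVS screening criterion used to flag potentially
        # toxic sediment metal burdens. Concentrations below (in umol/g dry
        # weight) are invented for illustration.
        sem = {"Cd": 0.01, "Cu": 0.50, "Ni": 0.20, "Pb": 0.30, "Zn": 1.10}
        avs = 1.5

        excess = sum(sem.values()) - avs
        print(f"SEM - AVS = {excess:+.2f} umol/g")
        print("potentially toxic" if excess > 0
              else "metals likely bound as sulphides")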

    Kernel alignment for identifying objective criteria from brain MEG recordings in schizophrenia

    The current wide access to data from different neuroimaging techniques has made it possible to explore whether objective criteria can be found that are usable for diagnostic purposes. In order to decide which features of the data are relevant for the diagnostic task, we present in this paper a simple method for feature selection based on kernel alignment with the ideal kernel in support vector machines (SVM). The method presented shows state-of-the-art performance while being more efficient than other methods for feature selection in SVM. It is also less prone to overfitting due to the properties of the alignment measure. All these abilities are essential in neuroimaging studies, where the number of features representing recordings is usually very large compared with the number of recordings. The method has been applied to a dataset in order to determine objective criteria for the diagnosis of schizophrenia. The dataset analyzed was obtained from multichannel magnetoencephalogram (MEG) recordings made during the performance of a mismatch negativity (MMN) auditory task by a set of schizophrenia patients and a control group. All signal frequency bands are analyzed (from δ (1–4 Hz) to high-frequency γ (60–200 Hz)), and the signal correlations among the different sensors at these frequencies are used as features.
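
    The feature construction described above, per-band correlations among sensors, can be sketched as follows; only the band edges come from the text, and the filter design is an assumption for illustration.

        # Sketch of building the features described above: band-pass filter each
        # MEG sensor, then use the pairwise sensor correlations in that band as
        # features. The Butterworth filter choice is an assumption.
        import numpy as np
        from scipy.signal import butter, sosfiltfilt

        def band_correlation_features(X, fs, low, high):
            """X: (n_sensors, n_samples) -> upper-triangle sensor correlations."""
            sos = butter(4, [low, high], btype="band", fs=fs, output="sos")
            Xf = sosfiltfilt(sos, X, axis=1)
            C = np.corrcoef(Xf)
            return C[np.triu_indices_from(C, k=1)]   # one feature per sensor pair

        rng = np.random.default_rng(0)
        meg = rng.standard_normal((16, 4096))        # toy 16-sensor recording
        delta = band_correlation_features(meg, fs=1000, low=1, high=4)
        print(delta.shape)                           # 16*15/2 = 120 features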