    Gender bias in natural language processing: BioCorpus-5, a preliminary multilingual Gender-Balanced Corpus of in-domain wikipedia biographies

    In natural language processing (NLP), neural machine translation systems and the blind application of machine learning reflect the social biases and stereotypes present in training data. In this project, we build a corpus intended for future analyses of this bias. The corpus uses data extracted with Wiki-Tailor, a tool for obtaining multilingual biographies from Wikipedia. The extracted multilingual corpus of biographies of actors, linguists and physicists is analyzed and balanced by gender across five languages: Spanish, Catalan, French, English and German. To this end, we developed semi-automatic software consisting of two parts. First, the text of each biography is aligned manually, yielding five text files per author (one per language) in which each line is parallel across languages. Second, each parallel file is formatted in XML: the markup records each author's information (identifier, language, gender, etc.), and everything is presented in a single text file so that the corpus is simpler and more convenient to process. Finally, statistics are computed on the resulting corpus so that it can be used in future natural language processing or machine learning applications that require a multilingual parallel corpus, at either the sentence or the document level.
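
    The two-step formatting described in this abstract (line-aligned files per language, then XML carrying each author's metadata) can be sketched as follows. This is a minimal illustration under a hypothetical schema: the element and attribute names (`corpus`, `author`, `doc`, `s`, `id`, `gender`, `lang`, `n`) and the helpers `add_author` and `sentence_counts` are invented here and are not the actual BioCorpus-5 format. The second helper shows one way to derive per-gender, per-language sentence counts of the kind a statistics step might report.

```python
import xml.etree.ElementTree as ET
from collections import Counter
from typing import Dict, List


def add_author(corpus: ET.Element, author_id: str, gender: str,
               aligned: Dict[str, List[str]]) -> ET.Element:
    """Append one author's line-aligned biographies (hypothetical schema).

    `aligned` maps a language code (e.g. "en") to that language's lines;
    line i in every language is assumed to translate line i in the others.
    """
    if not aligned:
        raise ValueError("no languages given")
    lengths = {lang: len(lines) for lang, lines in aligned.items()}
    if len(set(lengths.values())) != 1:
        # Manual alignment must give every language the same number of lines.
        raise ValueError(f"files are not line-aligned: {lengths}")
    author = ET.SubElement(corpus, "author", {"id": author_id, "gender": gender})
    for lang, lines in aligned.items():
        doc = ET.SubElement(author, "doc", {"lang": lang})
        for n, line in enumerate(lines, start=1):
            seg = ET.SubElement(doc, "s", {"n": str(n)})  # n = aligned line index
            seg.text = line.strip()
    return author


def sentence_counts(corpus: ET.Element) -> Counter:
    """Count aligned sentences per (gender, language) pair."""
    counts: Counter = Counter()
    for author in corpus.iter("author"):
        for doc in author.iter("doc"):
            counts[(author.get("gender"), doc.get("lang"))] += len(doc.findall("s"))
    return counts


# Example with one hypothetical author and two languages.
corpus = ET.Element("corpus")
add_author(corpus, "a001", "F", {
    "en": ["She was born in 1900.", "She studied physics."],
    "de": ["Sie wurde 1900 geboren.", "Sie studierte Physik."],
})
print(ET.tostring(corpus, encoding="unicode"))
print(sentence_counts(corpus))
```

    Rejecting files with unequal line counts up front mirrors the requirement that each line be parallel across languages; sentence-level pairs can then be read off by matching `n` across `doc` elements.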

    NetemCG – IP packet-loss injection using a continuous-time Gilbert model

    Injection of IP packet loss is a versatile method for emulating real-world network conditions in performance studies. To reproduce realistic packet-loss patterns, stochastic fault models are used. In this report we describe our implementation of a Linux kernel module that uses a continuous-time Gilbert model for packet-loss injection.
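
    The Gilbert model is a two-state Markov chain that alternates between a GOOD state and a BAD (lossy) state; in the continuous-time variant, the time spent in each state is exponentially distributed. The user-space Python sketch below illustrates the idea. It is not the NetemCG kernel module: the class name, the parameters `rate_gb` and `rate_bg`, and the choice that every packet arriving in BAD is dropped (the simplest form of the model) are assumptions made for illustration.

```python
import random
from typing import Optional

GOOD, BAD = 0, 1


class ContinuousTimeGilbert:
    """Two-state continuous-time Markov loss model (hypothetical sketch).

    rate_gb: rate (1/s) of leaving GOOD for BAD; mean GOOD period = 1/rate_gb.
    rate_bg: rate (1/s) of leaving BAD for GOOD; mean BAD period = 1/rate_bg.
    Every packet that arrives while the chain is in BAD is dropped.
    """

    def __init__(self, rate_gb: float, rate_bg: float, seed: Optional[int] = None):
        if rate_gb <= 0 or rate_bg <= 0:
            raise ValueError("rates must be positive")
        self.rate_gb = rate_gb
        self.rate_bg = rate_bg
        self.rng = random.Random(seed)
        self.state = GOOD
        self.now = 0.0
        # Absolute time at which the chain next leaves its current state.
        self.next_switch = self.rng.expovariate(rate_gb)

    def drop(self, t: float) -> bool:
        """Return True if a packet arriving at time t (seconds) is lost."""
        if t < self.now:
            raise ValueError("arrival times must be non-decreasing")
        # Replay every state change that happened up to this arrival.
        while self.next_switch <= t:
            self.state = BAD if self.state == GOOD else GOOD
            leave_rate = self.rate_gb if self.state == GOOD else self.rate_bg
            self.next_switch += self.rng.expovariate(leave_rate)
        self.now = t
        return self.state == BAD

    def stationary_loss(self) -> float:
        """Long-run fraction of time spent in BAD."""
        return self.rate_gb / (self.rate_gb + self.rate_bg)


# Example: one packet every 1 ms for 500 s; mean GOOD period 1 s, mean BAD 1/9 s.
model = ContinuousTimeGilbert(rate_gb=1.0, rate_bg=9.0, seed=42)
n_packets = 500_000
loss_rate = sum(model.drop(i * 0.001) for i in range(n_packets)) / n_packets
print(f"observed {loss_rate:.3f}, stationary {model.stationary_loss():.3f}")
```

    Over long runs the fraction of dropped packets approaches the stationary probability of BAD, rate_gb / (rate_gb + rate_bg), while losses arrive in bursts whose mean duration is 1 / rate_bg. Because the state evolves in time rather than per packet, the loss pattern does not depend on the packet rate, which is a key difference from a discrete-time Gilbert model.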

    3-{1-[(2,4-Dinitrophenyl)hydrazino]ethylidene}-5-(1-methylpropyl)pyrrolidine-2,4-dione

    In the title compound, C16H19N5O6, two intramolecular N—H⋯O hydrogen bonds help to establish the conformation. In the crystal, intermolecular N—H⋯O links result in chains propagating in [010].

    Murine and human pluripotent stem cell-derived cardiac bodies form contractile myocardial tissue in vitro

    Aims: We explored the use of highly purified murine and human pluripotent stem cell (PSC)-derived cardiomyocytes (CMs) to generate functional bioartificial cardiac tissue (BCT) and investigated the roles of fibroblasts, ascorbic acid (AA), and mechanical stimuli in tissue formation, maturation, and functionality. Methods and results: Murine and human embryonic/induced PSC-derived CMs were genetically enriched to generate three-dimensional CM aggregates, termed cardiac bodies (CBs). To address a critical limitation, the major loss of CMs after single-cell dissociation, non-dissociated CBs were used for BCT generation, which resulted in a structurally and functionally homogeneous syncytium. Continuous in situ characterization of BCTs over 21 days revealed that three critical factors cooperatively improve BCT formation and function: (i) addition of fibroblasts and (ii) ascorbic acid supplementation both support extracellular matrix remodelling and CB fusion, and (iii) increasing static stretch supports sarcomere alignment and CM coupling. Together, these factors considerably enhanced the contractility of murine and human BCTs, leading to a so far unparalleled active tension of 4.4 mN/mm² in human BCTs under optimized conditions. Finally, advanced protocols were implemented for the generation of human PSC-derived cardiac tissue using a defined, animal-free matrix composition. Conclusion: BCT with contractile forces comparable to those of native myocardium can be generated from enriched PSC-derived CMs, based on a novel concept of tissue formation from non-dissociated cardiac cell aggregates. Combined with the successful generation of tissue using a defined animal-free matrix, this represents a major step towards the clinical applicability of stem cell-based heart tissue for myocardial repair. © 2013 The Author

    Systems Biology Markup Language (SBML): Language Specification for Level 3 Version 2 Core Release 2

    Computational models can help researchers to interpret data, understand biological functions, and make quantitative predictions. The Systems Biology Markup Language (SBML) is a file format for representing computational models in a declarative form that different software systems can exchange. SBML is oriented towards describing biological processes of the sort common in research on a number of topics, including metabolic pathways, cell signaling pathways, and many others. By supporting SBML as an input/output format, different tools can all operate on an identical representation of a model, removing opportunities for translation errors and assuring a common starting point for analyses and simulations. This document provides the specification for Release 2 of Version 2 of SBML Level 3 Core. The specification defines the data structures prescribed by SBML as well as their encoding in XML, the eXtensible Markup Language. Release 2 corrects some errors and clarifies some ambiguities discovered in Release 1. This specification also defines validation rules that determine the validity of an SBML document, and provides many examples of models in SBML form. Other materials and software are available from the SBML project website at http://sbml.org/
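
    To make the XML encoding concrete, the sketch below uses Python's standard `xml.etree.ElementTree` to emit a tiny SBML Level 3 Version 2 Core document with one compartment, two species and one reaction, assuming the core namespace URI `http://www.sbml.org/sbml/level3/version2/core`. The model content and the helper names `q` and `minimal_sbml` are hypothetical. The output has not been checked against the specification's validation rules and omits units and kinetic laws; for real models, a dedicated library such as libSBML is the safer choice.

```python
import xml.etree.ElementTree as ET

# Namespace URI of SBML Level 3 Version 2 Core.
SBML_NS = "http://www.sbml.org/sbml/level3/version2/core"
ET.register_namespace("", SBML_NS)  # serialize it as the default namespace


def q(tag: str) -> str:
    """Qualify a tag name with the SBML namespace ({uri}tag form)."""
    return f"{{{SBML_NS}}}{tag}"


def minimal_sbml(model_id: str) -> str:
    """Hypothetical toy model: compartment 'cell', species A and B, reaction A -> B."""
    sbml = ET.Element(q("sbml"), {"level": "3", "version": "2"})
    model = ET.SubElement(sbml, q("model"), {"id": model_id})

    compartments = ET.SubElement(model, q("listOfCompartments"))
    ET.SubElement(compartments, q("compartment"),
                  {"id": "cell", "size": "1", "constant": "true"})

    species = ET.SubElement(model, q("listOfSpecies"))
    for sid, amount in (("A", "10"), ("B", "0")):
        ET.SubElement(species, q("species"), {
            "id": sid, "compartment": "cell", "initialAmount": amount,
            "hasOnlySubstanceUnits": "true",
            "boundaryCondition": "false", "constant": "false",
        })

    reactions = ET.SubElement(model, q("listOfReactions"))
    reaction = ET.SubElement(reactions, q("reaction"),
                             {"id": "conversion", "reversible": "false"})
    # Reactants are listed before products, each as a speciesReference.
    for list_tag, sid in (("listOfReactants", "A"), ("listOfProducts", "B")):
        refs = ET.SubElement(reaction, q(list_tag))
        ET.SubElement(refs, q("speciesReference"),
                      {"species": sid, "stoichiometry": "1", "constant": "true"})

    return ET.tostring(sbml, encoding="unicode")


print(minimal_sbml("toy_model"))
```

    Because every tool that reads SBML sees this same declarative structure, a simulator and a visualizer can start from an identical representation of the model, which is the interoperability point the abstract makes.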

    Performance of Survivin mRNA as a Biomarker for Bladder Cancer in the Prospective Study UroScreen

    BACKGROUND: Urinary biomarkers have the potential to improve the early detection of bladder cancer. Most known markers, however, have only been evaluated in studies with a cross-sectional design. For proper validation, a longitudinal design would be preferable. We used the prospective study UroScreen to evaluate survivin, a potential biomarker with multiple functions in carcinogenesis. METHODS/RESULTS: Survivin was analyzed in 5,716 urine samples from 1,540 chemical workers previously exposed to aromatic amines. The workers participated in a surveillance program with yearly examinations between 2003 and 2010. RNA was extracted from urinary cells, and survivin was determined by real-time PCR. During the study, 19 bladder tumors were detected. Multivariate generalized estimating equation (GEE) models showed that β-actin, representing RNA yield and quality, had the strongest influence on survivin positivity. Inflammation, hematuria and smoking did not confound the results. Survivin had a sensitivity of 21.1% for all tumors and 36.4% for high-grade tumors. Specificity was 97.5%, the positive predictive value (PPV) 9.5%, and the negative predictive value (NPV) 99.0%. CONCLUSIONS: In this prospective study, the largest on survivin so far, the marker showed good NPV and specificity but low PPV and sensitivity. This was partly due to the low number of cases, which limits the validity of the results. Compliance, urine quality, problems with the assay, and mRNA stability influenced the performance of survivin; however, most of these issues could be addressed with a more reliable assay in the future. One important finding is that survivin was not influenced by confounders such as inflammation and produced relatively few false positives. Therefore, despite its low sensitivity, survivin may still be considered as a component of a multimarker panel.
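
    Sensitivity, specificity, PPV and NPV all come from the same 2×2 table of marker result versus tumor status. The sketch below computes them from counts and, via Bayes' rule, shows why PPV stays low when cases are rare even if specificity is high, the situation the conclusions describe. The class `Confusion`, the helper `ppv_from_rates`, and the counts in the example are hypothetical toy values, not UroScreen data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Confusion:
    tp: int  # marker positive, tumor present
    fp: int  # marker positive, no tumor
    fn: int  # marker negative, tumor present
    tn: int  # marker negative, no tumor

    @property
    def sensitivity(self) -> float:
        return self.tp / (self.tp + self.fn)

    @property
    def specificity(self) -> float:
        return self.tn / (self.tn + self.fp)

    @property
    def ppv(self) -> float:
        return self.tp / (self.tp + self.fp)

    @property
    def npv(self) -> float:
        return self.tn / (self.tn + self.fn)


def ppv_from_rates(sensitivity: float, specificity: float, prevalence: float) -> float:
    """Bayes' rule: PPV depends on prevalence as well as on the test itself."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)


# Hypothetical toy counts (not study data): 10 tumors among 100 tests.
toy = Confusion(tp=8, fp=5, fn=2, tn=85)
print(f"sens={toy.sensitivity:.3f} spec={toy.specificity:.3f} "
      f"PPV={toy.ppv:.3f} NPV={toy.npv:.3f}")
# Same test, rarer disease: PPV drops sharply while specificity is unchanged.
for prevalence in (0.1, 0.01):
    print(prevalence, round(ppv_from_rates(toy.sensitivity, toy.specificity, prevalence), 3))
```

    With few true cases, even a small false-positive rate applied to the many tumor-free samples outweighs the true positives, so PPV falls while NPV stays high.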