
    Differential Cell Sensitivity between OTA and LPS upon Releasing TNF-α

    The release of tumor necrosis factor α (TNF-α) induced by ochratoxin A (OTA) was studied in various macrophage and non-macrophage cell lines and compared with that induced by E. coli lipopolysaccharide (LPS), a standard TNF-α-releasing agent. Cells were exposed to 0, 2.5 or 12.5 µmol/L OTA, or to 0.1 µg/mL LPS, for up to 24 h. Viability markers showed that OTA at 2.5 µmol/L and LPS at 0.1 µg/mL were not toxic to the tested cells. TNF-α was detected in the incubation medium of rat Kupffer cells, rat peritoneal macrophages and the mouse monocyte/macrophage cell line J774A.1. TNF-α concentrations were 1,000 pg/mL, 1,560 pg/mL and 650 pg/mL, respectively, after exposure to 2.5 µmol/L OTA, and 3,000 pg/mL, 2,600 pg/mL and 2,115 pg/mL, respectively, after exposure to LPS. Rat liver sinusoidal endothelial cells, rat hepatocytes, human HepG2 cells and mouse L929 cells showed no cytokine response to OTA. All of them except HepG2 cells released significant amounts of TNF-α after LPS exposure. In non-responsive cells, as shown in mouse L929 cells, OTA neither activated NF-κB nor induced the translocation of activated NF-κB to the nucleus. In J774A.1 cells, OTA mediated TNF-α release via the pRaf/MEK 1/2-NF-κB and p38-NF-κB pathways. LPS, by contrast, used the pRaf/MEK 1/2-NF-κB pathway but not the p38-NF-κB pathway. In L929 cells, LPS used other pathways to activate NF-κB. Our data indicate that only macrophages and macrophage-derived cells respond to OTA, and that these cells should be considered the sources of TNF-α released upon OTA exposure.

    Instantiation of n-ary relations in scientific articles guided by a domain termino-ontological resource

    This thesis belongs to the research field of smart data, which seeks specific information within large documents. It proposes new methods for representing and extracting experimental data from scientific articles, specifically in the domain of food packaging. The experimental data can be represented as n-ary relations composed of symbolic and quantitative arguments; quantitative arguments consist of a numerical value and a unit of measurement. The objective of this thesis is to populate a knowledge base with instances of n-ary relations extracted from scientific textual documents. The proposed approach is based on an Ontological and Terminological Resource (OTR) and is divided into two phases: (1) the recognition and extraction of argument instances of interest, and (2) the linking of these instances into n-ary relations. Phase (1) proposes an original representation of the extracted argument instances, called SciPuRe (Scientific Publication Representation). It integrates ontological, lexical and structural descriptors that describe the context in which argument instances appear and allows them to be ranked by relevance. Phase (2) uses the information in the document tables, which are extracted automatically, to guide the extraction of partial n-ary relations, since tables contain a large share of the experimental data in scientific articles. These partial relations are then completed with the argument instances recognized in Phase (1). Three approaches are proposed and evaluated to identify the argument instances that should complete the relations: the use of document structure, the analysis of co-occurrences between argument instances in the text, and the use of word-embedding models that measure the similarity between candidate argument instances and the arguments already present in the partial relations. Our results show the importance of ranking instances by relevance after argument recognition in Phase (1), using SciPuRe features. Our experiments show that the two most important criteria for determining the relevance of a symbolic argument instance are the specificity of the concept associated with the argument in the OTR and its frequency in the document. For quantitative arguments, relevance is determined by the sections of the document in which the instance appears. Our experiments on Phase (2) confirm the usefulness of the relevance scores computed in Phase (1) for discriminating between instances. Comparing different relevance-based filters on the candidate argument instances shows a clear positive effect when the 20% of instances with the lowest relevance are filtered out. Our experiments also consider selecting multiple candidates for each missing argument instance in a partial relation, as a way to assist domain experts, who can then identify the valid instance. When a single candidate is selected, the co-occurrence-based approach gives the best results in detecting the valid candidate argument instance. With a larger selection of three or five candidates, semantic similarity computed with BERT word-embedding models gives good results for associating the argument instances present in partial relations with the candidate instances for relation completion. Finally, when ten candidates are selected, the experiments show that the approach based on document structure is effective for completing the n-ary relations.
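The abstract above describes relevance filtering, which discards the 20% of candidate argument instances with the lowest relevance. This can be sketched in a few lines of Python. It is a minimal illustration, not the thesis's implementation. The `ArgumentInstance` class and the `drop_least_relevant` function are hypothetical names. The relevance scores are assumed to be already computed, so the SciPuRe feature computation itself is not shown.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ArgumentInstance:
    """Hypothetical record for one recognized argument instance."""
    text: str         # surface form in the article, e.g. "LDPE" or "40 degC"
    relevance: float  # assumed precomputed relevance score (higher = more relevant)


def drop_least_relevant(
    instances: List[ArgumentInstance], fraction: float = 0.2
) -> List[ArgumentInstance]:
    """Discard the given fraction of instances with the lowest relevance.

    Returns the remaining instances sorted from most to least relevant.
    """
    if not 0.0 <= fraction < 1.0:
        raise ValueError("fraction must be in [0, 1)")
    # Rounded down, so very small candidate lists may lose nothing
    n_drop = int(len(instances) * fraction)
    ranked = sorted(instances, key=lambda inst: inst.relevance, reverse=True)
    return ranked[: len(ranked) - n_drop]
```

The survivors come back already ranked by descending relevance. This is convenient if only the top few candidates are passed on to the relation-completion step.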

    Instantiation of n-ary relations in scientific articles guided by a domain termino-ontological resource

    This thesis belongs to the research field of smart data, which seeks specific information within large documents. It proposes new methods for representing and extracting experimental data from scientific articles, specifically in the domain of food packaging. The experimental data can be represented as n-ary relations composed of symbolic and quantitative arguments; quantitative arguments consist of a numerical value and a unit of measurement. The objective of this thesis is to populate a knowledge base with instances of n-ary relations extracted from scientific textual documents. The proposed approach is based on an Ontological and Terminological Resource (OTR) and is divided into two phases: (1) the recognition and extraction of argument instances of interest, and (2) the linking of these instances into n-ary relations. Phase (1) proposes an original representation of the extracted argument instances, called SciPuRe (Scientific Publication Representation). It integrates ontological, lexical and structural descriptors that describe the context in which argument instances appear and allows them to be ranked by relevance. Phase (2) uses the information in the document tables, which are extracted automatically, to guide the extraction of partial n-ary relations, since tables contain a large share of the experimental data in scientific articles. These partial relations are then completed with the argument instances recognized in Phase (1). Three approaches are proposed and evaluated to identify the argument instances that should complete the relations: the use of document structure, the analysis of co-occurrences between argument instances in the text, and the use of word-embedding models that measure the similarity between candidate argument instances and the arguments already present in the partial relations. Our results show the importance of ranking instances by relevance after argument recognition in Phase (1), using SciPuRe features. Our experiments show that the two most important criteria for determining the relevance of a symbolic argument instance are the specificity of the concept associated with the argument in the OTR and its frequency in the document. For quantitative arguments, relevance is determined by the sections of the document in which the instance appears. Our experiments on Phase (2) confirm the usefulness of the relevance scores computed in Phase (1) for discriminating between instances. Comparing different relevance-based filters on the candidate argument instances shows a clear positive effect when the 20% of instances with the lowest relevance are filtered out. Our experiments also consider selecting multiple candidates for each missing argument instance in a partial relation, as a way to assist domain experts, who can then identify the valid instance. When a single candidate is selected, the co-occurrence-based approach gives the best results in detecting the valid candidate argument instance. With a larger selection of three or five candidates, semantic similarity computed with BERT word-embedding models gives good results for associating the argument instances present in partial relations with the candidate instances for relation completion. Finally, when ten candidates are selected, the experiments show that the approach based on document structure is effective for completing the n-ary relations.
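The co-occurrence approach to completing a partial relation, as summarized in the abstract above, can be illustrated with a simple counting scheme. This sketch is an assumption, not the method implemented in the thesis. Each sentence is reduced to the set of argument instances recognized in it. A candidate then earns one point for every known argument of the partial relation that appears in the same sentence. The function `rank_by_cooccurrence` and the sample values in the test are hypothetical.

```python
from collections import Counter
from typing import Iterable, List, Set


def rank_by_cooccurrence(
    sentences: Iterable[Set[str]],
    known_args: Set[str],
    candidates: Set[str],
    k: int = 1,
) -> List[str]:
    """Return the k candidates that co-occur most often with the known arguments.

    Each sentence is given as the set of argument instances recognized in it
    (a hypothetical pre-processing step).
    """
    counts = Counter()
    for sentence in sentences:
        n_known = len(sentence & known_args)  # known arguments present in this sentence
        if n_known == 0:
            continue
        for cand in sentence & candidates:
            counts[cand] += n_known  # one co-occurrence per known argument shared
    # Descending count; ties broken alphabetically so the output is deterministic
    return sorted(candidates, key=lambda c: (-counts[c], c))[:k]
```

Setting `k` to 1, 3, 5 or 10 mirrors the candidate-list sizes compared in the abstract. The counting rule itself is only one of many possible co-occurrence measures.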

    Instantiation of n-ary relations in scientific articles guided by a domain termino-ontological resource

    This thesis belongs to the research field of smart data, where we search for specific information within textual documents. It proposes new methods of representation and extraction of experimental data from scientific articles. These methods were evaluated on a corpus of articles in the food packaging domain.The experimental data can be represented as n-Ary relations composed of symbolic and quantitative arguments. The latter are composed of a numerical value and a unit of measurement. The objective of this thesis is to populate a knowledge base with instances of N-Ary relations extracted from scientific textual documents. The proposed approach is based on an Ontological and Terminological Resource (OTR) and is divided into two Phases: (1) the recognition and extraction of argument instances of interest and (2) the linking of these instances in n-Ary relations. Phase (1) proposes an original representation of the extracted argument instances, called SciPuRe (Scientific Publication Representation). It integrates ontological, lexical and structural descriptors that describe the context of the argument instances and allows to sort them by their relevance. Phase (2) relies on the information present in the tables of the documents, extracted automatically, to guide the extraction of partial n-Arye relations, the tables containing an important part of the experimental data in the scientific articles. These partial relations are then completed with the argument instances recognized in Phase (1). 
Three approaches are proposed and evaluated to identify the argument instances that should complete the relations: the use of document structure, the analysis of co-occurrences between argument instances in the texts, and the use of word-embedding models to measure the similarity between candidate argument instances and the arguments already filled in the partial relations. Our results show the importance of sorting the relevant instances after argument recognition in Phase (1) using SciPuRe features. Our experiments show that the two most important criteria for determining the relevance of a symbolic argument instance are the specificity of the concept associated with the argument in the OTR and its frequency in the document. For quantitative arguments, relevance is determined by the sections of the document in which the argument instance appears. Our experiments on Phase (2) confirm the usefulness of the relevance scores computed in Phase (1) for discriminating between instances. Analyzing the results under different filterings of the candidate argument instances by relevance shows a clear positive effect when the 20% of instances with the lowest relevance are filtered out. We also experimented with selecting multiple candidates for each missing argument instance in a partial relation, as a way to assist domain experts, who can then determine the valid instance. When a single candidate is selected, the approach based on co-occurrence analysis gives the best results in detecting the valid candidate argument instance. With a larger selection of three or five candidates, semantic similarity analysis based on BERT word-embedding models gives good results for detecting associations between the argument instances present in partial relations and the candidate argument instances for completing the relations.
Finally, when ten candidates are selected, the experiments show that the approach based on document structure is effective for completing the n-ary relations.
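The co-occurrence approach for completing partial relations can be illustrated with a minimal sketch. It assumes that co-occurrence is counted at sentence level and that a candidate's score is the number of sentences in which it appears together with at least one argument already filled in the partial relation; the window, the scoring rule, the example sentences and the function names are all hypothetical, not the thesis's exact method.

```python
def cooccurrence_scores(sentences, filled_args, candidates):
    """Count, for each candidate, the sentences in which it appears together with
    at least one argument already filled in the partial relation.

    Matching is a naive case-insensitive substring test (an assumption); a real
    system would match the recognized argument instances instead."""
    scores = {cand: 0 for cand in candidates}
    for sentence in sentences:
        lowered = sentence.lower()
        # Skip sentences that mention none of the already-filled arguments.
        if not any(arg.lower() in lowered for arg in filled_args):
            continue
        for cand in candidates:
            if cand.lower() in lowered:
                scores[cand] += 1
    return scores


def rank_candidates(sentences, filled_args, candidates, k=1):
    """Return the k best-scoring candidates; ties keep the input order."""
    scores = cooccurrence_scores(sentences, filled_args, candidates)
    return sorted(candidates, key=lambda c: -scores[c])[:k]


if __name__ == "__main__":
    sentences = [
        "The PLA film was tested at 23 C and 50% RH.",
        "Oxygen permeability of PLA decreased with nanoclay content.",
        "LDPE samples were stored at 4 C.",
    ]
    # Hypothetical partial relation with the packaging material ("PLA") filled in;
    # the missing argument is a test condition.
    print(rank_candidates(sentences, ["PLA"], ["23 C", "4 C", "50% RH"], k=2))
```

Here "4 C" scores zero because it only appears in a sentence about LDPE, which is the kind of signal the co-occurrence approach exploits; substring matching would, however, also fire on words that merely contain an argument, so it is only suitable as an illustration.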

    Food packaging permeability and composition dataset dedicated to text-mining

    Get PDF
    This dataset is composed of symbolic and quantitative entities concerning food packaging composition and gas permeability. It was created from 50 scientific articles in English, saved in HTML format, from several international journals on the ScienceDirect website. The files were annotated independently by three experts on a WebAnno server. The aim of the annotation task was to recognize all entities related to packaging permeability measurements and packaging composition. The annotation task is driven by an Ontological and Terminological Resource (OTR). An annotation guideline was designed collectively and iteratively with the annotators. This dataset can be used to train or evaluate natural language processing (NLP) approaches in experimental fields, such as specialized entity recognition (e.g. terms and their variants, units of measure, complex numerical values) or sentence-level binary relation extraction (e.g. value to unit, term to acronym).
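As an illustration of the value-to-unit relation mentioned above, the following minimal sketch pairs numerical values with an immediately following unit using a regular expression. The unit inventory, the pattern and the example sentence are assumptions made for this example; they are not taken from the dataset's annotation guideline, and real articles contain far more varied units and number formats.

```python
import re

# Hypothetical, deliberately small unit inventory (ASCII "um" stands for micrometre).
UNITS = ["cm3/m2/day", "g/m2/day", "degC", "MPa", "um", "mm", "%"]

# Try longer units first so that "cm3/m2/day" is not cut short.
_UNIT_ALT = "|".join(re.escape(u) for u in sorted(UNITS, key=len, reverse=True))

# A number (optional sign, decimals, exponent), optional whitespace, then a known
# unit that is not immediately followed by another letter or digit.
VALUE_UNIT = re.compile(
    r"(?P<value>[-+]?\d+(?:\.\d+)?(?:[eE][-+]?\d+)?)\s*"
    r"(?P<unit>" + _UNIT_ALT + r")(?![A-Za-z0-9])"
)


def value_unit_pairs(sentence):
    """Return the (value, unit) pairs found in a sentence, in order of appearance."""
    return [
        (float(m.group("value")), m.group("unit"))
        for m in VALUE_UNIT.finditer(sentence)
    ]


if __name__ == "__main__":
    text = ("The 25 um PLA film had an oxygen permeability of 1.2e-3 cm3/m2/day "
            "at 23 degC and 50 % RH.")
    print(value_unit_pairs(text))
```

A rule like this only covers adjacent value-unit pairs; annotated data such as this dataset is what allows learned models to handle ranges, shared units and values separated from their units.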

    Investigating the detection of Tortured Phrases in Scientific Literature

    No full text
    With the help of online tools, unscrupulous authors can today generate a pseudo-scientific article and attempt to publish it. Some of these tools work by replacing or paraphrasing existing texts to produce new content, but they tend to generate nonsensical expressions. A recent study introduced the concept of the "tortured phrase", an unexpected, odd phrase that appears instead of an established fixed expression, e.g. "counterfeit consciousness" instead of "artificial intelligence". The present study investigates how tortured phrases that are not yet listed can be detected automatically. We conducted several experiments, including non-neural binary classification, neural binary classification and cosine similarity comparison of the phrase tokens, yielding noticeable results.
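The cosine similarity comparison of phrase tokens can be sketched as follows, under loudly stated assumptions: the tiny hand-made vectors stand in for a real embedding model, and the two helpers (phrase-level averaging and position-by-position token comparison) are hypothetical illustrations. High token-wise similarity between a suspicious phrase and a known fixed expression is shown here as one possible signal of a paraphrased variant; the study's actual features, models and thresholds are not reproduced.

```python
import math

# Hypothetical 3-dimensional token vectors standing in for a trained embedding model.
TOY_VECTORS = {
    "artificial": [0.9, 0.1, 0.0],
    "counterfeit": [0.8, 0.2, 0.1],
    "intelligence": [0.1, 0.9, 0.2],
    "consciousness": [0.2, 0.7, 0.5],
    "packaging": [0.0, 0.1, 0.9],
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def phrase_vector(phrase, vectors=TOY_VECTORS):
    """Average the vectors of the known tokens of a phrase (unknown tokens are skipped)."""
    known = [vectors[t] for t in phrase.lower().split() if t in vectors]
    if not known:
        raise ValueError(f"no known tokens in {phrase!r}")
    return [sum(col) / len(known) for col in zip(*known)]


def tokenwise_similarity(phrase, expression, vectors=TOY_VECTORS):
    """Cosine similarity of aligned tokens of two phrases with the same token count."""
    a, b = phrase.lower().split(), expression.lower().split()
    if len(a) != len(b):
        raise ValueError("phrases must have the same number of tokens")
    return [cosine(vectors[x], vectors[y]) for x, y in zip(a, b)]


if __name__ == "__main__":
    suspect, fixed = "counterfeit consciousness", "artificial intelligence"
    print(tokenwise_similarity(suspect, fixed))
    print(cosine(phrase_vector(suspect), phrase_vector(fixed)))
```

With real embeddings, each token of a tortured phrase tends to be close to the corresponding token of the original expression even though the phrase as a whole is not an established term, which is why a token-level comparison can complement a phrase-level classifier.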