Sentence Bag Graph Formulation for Biomedical Distant Supervision Relation Extraction

Author: Guo Maozu
Liang Tianming
Liu Xiaoyan
Liu Yang
Sharma Gaurav
Xue Liang
Zhang Hao
Publication date: 29/10/2023

We introduce a novel graph-based framework for alleviating key challenges in distantly-supervised relation extraction and demonstrate its effectiveness in the challenging and important domain of biomedical data. Specifically, we propose a graph view of sentence bags referring to an entity pair, which enables message-passing based aggregation of information related to the entity pair over the sentence bag. The proposed framework alleviates the common problem of noisy labeling in distantly supervised relation extraction and also effectively incorporates inter-dependencies between sentences within a bag. Extensive experiments on two large-scale biomedical relation datasets and the widely utilized NYT dataset demonstrate that our proposed framework significantly outperforms the state-of-the-art methods for biomedical distant supervision relation extraction while also providing excellent performance for relation extraction in the general text mining domain.
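
The abstract above describes message-passing aggregation over a graph of the sentences in a bag. The following is only a minimal sketch of that general idea, not the authors' model: it assumes a fully connected sentence graph, fixed toy embeddings, mean aggregation, and mean pooling, and the names `message_pass` and `bag_representation` are hypothetical.

```python
from typing import List

Vector = List[float]


def mean(vectors: List[Vector]) -> Vector:
    """Element-wise mean of equally sized vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def message_pass(bag: List[Vector]) -> List[Vector]:
    """One round of message passing on a fully connected sentence graph.

    Assumption: each sentence node is updated to the average of its own
    embedding and the mean embedding of the other sentences in the bag.
    """
    if len(bag) == 1:
        return [list(bag[0])]
    updated = []
    for i, node in enumerate(bag):
        neighbours = [v for j, v in enumerate(bag) if j != i]
        message = mean(neighbours)
        updated.append([(a + b) / 2 for a, b in zip(node, message)])
    return updated


def bag_representation(bag: List[Vector], rounds: int = 1) -> Vector:
    """Run message passing, then mean-pool the nodes into one bag vector."""
    for _ in range(rounds):
        bag = message_pass(bag)
    return mean(bag)


# Toy bag: three sentences mentioning the same entity pair (made-up embeddings).
toy_bag = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
updated_nodes = message_pass(toy_bag)
bag_vec = bag_representation(toy_bag)
```

In a trained system the update and pooling steps would typically carry learned weights, so that noisy sentences contribute less to the bag vector; the uniform averaging here is purely illustrative.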

arXiv.org e-Print Archive

Cross-Domain Evaluation of Edge Detection for Biomedical Event Extraction

Author: Lombardo Rosario
Plank Barbara
Ramponi Alan
Publication venue: European Language Resources Association
Publication date: 01/05/2020

The IT University of Copenhagen's Repository

Ranking Medical Terms to Support Expansion of Lay Language Resources for Patient Comprehension of Electronic Health Record Notes: Adapted Distant Supervision Approach

Author: Chen Jinying
Fodeh Samah J.
Jagannatha Abhyuday N.
Yu Hong
Publication venue: eScholarship@UMassChan
Publication date: 31/10/2017

BACKGROUND: Medical terms are a major obstacle for patients to comprehend their electronic health record (EHR) notes. Clinical natural language processing (NLP) systems that link EHR terms to lay terms or definitions allow patients to easily access helpful information when reading through their EHR notes, and have been shown to improve patient EHR comprehension. However, high-quality lay language resources for EHR terms are very limited in the public domain. Because expanding and curating such a resource is a costly process, it is beneficial and even necessary to identify terms important for patient EHR comprehension first. OBJECTIVE: We aimed to develop an NLP system, called adapted distant supervision (ADS), to rank candidate terms mined from EHR corpora. We will give EHR terms ranked as high by ADS a higher priority for lay language annotation, that is, for creating lay definitions for these terms. METHODS: Adapted distant supervision uses distant supervision from consumer health vocabulary and transfer learning to adapt itself to solve the problem of ranking EHR terms in the target domain. We investigated 2 state-of-the-art transfer learning algorithms (i.e., feature space augmentation and supervised distant supervision) and designed 5 types of learning features, including distributed word representations learned from large EHR data for ADS. For evaluating ADS, we asked domain experts to annotate 6038 candidate terms as important or nonimportant for EHR comprehension. We then randomly divided these data into the target-domain training data (1000 examples) and the evaluation data (5038 examples). We compared ADS with 2 strong baselines, including standard supervised learning, on the evaluation data. RESULTS: The ADS system using feature space augmentation achieved the best average precision, 0.850, on the evaluation set when using 1000 target-domain training examples. 
The ADS system using supervised distant supervision achieved the best average precision, 0.819, on the evaluation set when using only 100 target-domain training examples. The 2 ADS systems both performed significantly better than the baseline systems (P < .001 for all measures and all conditions). Using a rich set of learning features contributed to ADS's performance substantially. CONCLUSIONS: ADS can effectively rank terms mined from EHRs. Transfer learning improved ADS's performance even with a small number of target-domain training examples. EHR terms prioritized by ADS were used to expand a lay language resource that supports patient EHR comprehension. The top 10,000 EHR terms ranked by ADS are available upon request.
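
The results above are reported as average precision over a ranked list of candidate terms. As a hedged illustration of how that metric is commonly computed for a single ranking (the abstract does not specify the exact variant used), here is a small sketch; the function name `average_precision` and the toy labels are assumptions, not data from the study.

```python
from typing import Sequence


def average_precision(ranked_labels: Sequence[int]) -> float:
    """Average precision of a ranking.

    ranked_labels lists the gold labels in ranked order (best first),
    with 1 for an important term and 0 for a nonimportant one.
    Precision is taken at each rank where an important term appears,
    and those precisions are averaged.
    """
    hits = 0
    precision_sum = 0.0
    for rank, label in enumerate(ranked_labels, start=1):
        if label:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / hits if hits else 0.0


# Toy ranking: important terms at ranks 1 and 3 (made-up labels).
toy_ap = average_precision([1, 0, 1, 0])
```

A perfect ranking, with every important term ahead of every nonimportant one, scores 1.0 under this definition.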

eScholarship@UMMS

Knowledge extraction from unstructured data

Author: Sakor Ahmad
Publication venue: Hannover : Institutionelles Repositorium der Leibniz Universität Hannover
Publication date: 01/01/2023

Data availability is becoming more essential, considering the current growth of web-based data. The data available on the web are represented as unstructured, semi-structured, or structured data. In order to make the web-based data available for several Natural Language Processing or Data Mining tasks, the data needs to be presented as machine-readable data in a structured format. Thus, techniques for addressing the problem of capturing knowledge from unstructured data sources are needed. Knowledge extraction methods are used by the research communities to address this problem; methods that are able to capture knowledge in a natural language text and map the extracted knowledge to existing knowledge presented in knowledge graphs (KGs). These knowledge extraction methods include Named-entity recognition, Named-entity Disambiguation, Relation Recognition, and Relation Linking. This thesis addresses the problem of extracting knowledge over unstructured data and discovering patterns in the extracted knowledge. We devise a rule-based approach for entity and relation recognition and linking. The defined approach effectively maps entities and relations within a text to their resources in a target KG. Additionally, it overcomes the challenges of recognizing and linking entities and relations to a specific KG by employing devised catalogs of linguistic and domain-specific rules that state the criteria to recognize entities in a sentence of a particular language, and a deductive database that encodes knowledge in community-maintained KGs. Moreover, we define a Neuro-symbolic approach for the tasks of knowledge extraction in encyclopedic and domain-specific domains; it combines symbolic and sub-symbolic components to overcome the challenges of entity recognition and linking and the limitation of the availability of training data while maintaining the accuracy of recognizing and linking entities. 
Additionally, we present a context-aware framework for unveiling semantically related posts in a corpus; it is a knowledge-driven framework that retrieves associated posts effectively. We cast the problem of unveiling semantically related posts in a corpus into the Vertex Coloring Problem. We evaluate the performance of our techniques on several benchmarks related to various domains for knowledge extraction tasks. Furthermore, we apply these methods in real-world scenarios from national and international projects. The outcomes show that our techniques are able to effectively extract knowledge encoded in unstructured data and discover patterns over the extracted knowledge presented as machine-readable data. More importantly, the evaluation results provide evidence of the effectiveness of combining the reasoning capacity of the symbolic frameworks with the power of pattern recognition and classification of sub-symbolic models.

Institutionelles Repositorium der Leibniz Universität Hannover

The clinical effectiveness of individual behaviour change interventions to reduce risky sexual behaviour after a negative human immunodeficiency virus test in men who have sex with men: systematic and realist reviews and intervention development

Author: Abraham
Aghaizu
Ajzen
Albarracín
Angus
Bailey
Bandura
Bartholomew
Becker
Bedoya
Berg
Blas
Borrelli
Bourne
Bowen
Brady
British Association for Sexual Health & HIV
Brown
Bull
Cane
Carey
Carpenter
Carrico
Centers for Disease Control and Prevention
Chesney
Chiasson
Christensen
ClinicalTrials.gov
Clutterbuck
Coffin
Cohen
Coia
Colfax
Conner
Craig
Crepaz
Das
Davidovich
DerSimonian
Desai
Dilley
Dixon-Woods
Duan
Eaton
Elford
Ellis
European Centre for Disease Prevention and Control
Fisher
Fletcher
Flowers
Forney
Frasca
French
Gardner
Gilbart
Glanz
Gold
Goodwin
Grantome
Gray
Guo
Hao
Health Protection Scotland
Herbst
Hidalgo
Higa
Higgins
Hightow-Weidman
Hirshfield
Hoaglin
Hosek
Huan
Hughes
Imrie
Janis
Jansen
Janz
Jbilou
Johnson
Jones
Kalichman
Kasatpibal
Kayser
Khosropour
Knussen
Ko
Koblin
Landovitz
Lau
Lawson
Lorimer
Lu
Lucas
Macdonald
Malek
May
McKee
Melendez-Torres
Menza
Metcalf
Metsch
Michie
Mikolajczak
Miller
Mimiaga
Miranda
Moher
Morgenstern
Murray
Mustanski
Nakagawa
National Institute for Health and Care Excellence
Neumann
New South Wales Sexually Transmitted Infection Program Unit
NHS Healthcare Improvement Scotland
Noar
Olander
Operario
Outlaw
O’Donnell
Painter
Parsons
Pawson
Phillips
Picciano
Prestwich
Public Health England
Rashbrook
Read
Reback
Rees
Reeves
Rhodes
Robinson
Rosser
Rycroft-Malone
Safren
Santos
Schnall
Schwarzer
Shoptaw
Simms
Skarbinski
Strömdahl
Tan
Taylor
Tobin
Tracy
van Kesteren
Varney
Warner
Webb
Wilkerson
Williams
Wolfers
Wolitski
Wong
Ye
Yin
Young
Zhang
Zule
Publication venue: National Institute for Health Research
Publication date: 01/01/2017

Background: Men who have sex with men (MSM) experience significant inequalities in health and well-being. They are the group in the UK at the highest risk of acquiring a human immunodeficiency virus (HIV) infection. Guidance relating to both HIV infection prevention, in general, and individual-level behaviour change interventions, in particular, is very limited. Objectives: To conduct an evidence synthesis of the clinical effectiveness of behaviour change interventions to reduce risky sexual behaviour among MSM after a negative HIV infection test. To identify effective components within interventions in reducing HIV risk-related behaviours and develop a candidate intervention. To host expert events addressing the implementation and optimisation of a candidate intervention. Data sources: All major electronic databases (British Education Index, BioMed Central, Cumulative Index to Nursing and Allied Health Literature, EMBASE, Educational Resource Index and Abstracts, Health and Medical Complete, MEDLINE, PsycARTICLES, PsycINFO, PubMed and Social Science Citation Index) were searched between January 2000 and December 2014. Review methods: A systematic review of the clinical effectiveness of individual behaviour change interventions was conducted. Interventions were examined using the behaviour change technique (BCT) taxonomy, theory coding assessment, mode of delivery and proximity to HIV infection testing. Data were summarised in narrative review and, when appropriate, meta-analysis was carried out. Supplemental analyses for the development of the candidate intervention focused on post hoc realist review method, the assessment of the sequential delivery and content of intervention components, and the social and historical context of primary studies. Expert panels reviewed the candidate intervention for issues of implementation and optimisation. 
Results: Overall, trials included in this review (n = 10) demonstrated that individual-level behaviour change interventions are effective in reducing key HIV infection risk-related behaviours. However, there was considerable clinical and methodological heterogeneity among the trials. Exploratory meta-analysis showed a statistically significant reduction in behaviours associated with high risk of HIV transmission (risk ratio 0.75, 95% confidence interval 0.62 to 0.91). Additional stratified analyses suggested that effectiveness may be enhanced through face-to-face contact immediately after testing, and that theory-based content and BCTs drawn from ‘goals and planning’ and ‘identity’ groups are important. All evidence collated in the review was synthesised to develop a candidate intervention. Experts highlighted overall acceptability of the intervention and outlined key ways that the candidate intervention could be optimised to enhance UK implementation. Limitations: There was a limited number of primary studies. All were from outside the UK and were subject to considerable clinical, methodological and statistical heterogeneity. The findings of the meta-analysis must therefore be treated with caution. The lack of detailed intervention manuals limited the assessment of intervention content, delivery and fidelity. Conclusions: Evidence regarding the effectiveness of behaviour change interventions suggests that they are effective in changing behaviour associated with HIV transmission. Exploratory stratified meta-analyses suggested that interventions should be delivered face to face and immediately after testing. There are uncertainties around the generalisability of these findings to the UK setting. However, UK experts found the intervention acceptable and provided ways of optimising the candidate intervention. 
Future work: There is a need for well-designed, UK-based trials of individual behaviour change interventions that clearly articulate intervention content and demonstrate intervention fidelity.
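
The review above reports its pooled effect as a risk ratio with a 95% confidence interval. As a rough illustration of how such a figure is read, the sketch below computes a risk ratio and its interval for one two-arm trial using the standard log-scale approximation. It is not the review's pooled meta-analysis; the event counts are hypothetical and `risk_ratio_ci` is an illustrative name.

```python
import math
from typing import Tuple


def risk_ratio_ci(
    events_treated: int,
    n_treated: int,
    events_control: int,
    n_control: int,
    z: float = 1.96,
) -> Tuple[float, float, float]:
    """Risk ratio and approximate 95% CI for one two-arm trial.

    Uses the usual large-sample standard error of log(RR):
    sqrt(1/a - 1/n1 + 1/c - 1/n2).
    """
    rr = (events_treated / n_treated) / (events_control / n_control)
    se_log_rr = math.sqrt(
        1 / events_treated - 1 / n_treated + 1 / events_control - 1 / n_control
    )
    lower = math.exp(math.log(rr) - z * se_log_rr)
    upper = math.exp(math.log(rr) + z * se_log_rr)
    return rr, lower, upper


# Hypothetical counts: 30 of 100 report risk behaviour after the
# intervention, versus 40 of 100 in the comparison arm.
rr, lower, upper = risk_ratio_ci(30, 100, 40, 100)
```

A risk ratio below 1 means the behaviour was less common in the intervention arm; an interval that excludes 1, like the one reported in the abstract, is what makes the reduction statistically significant at the 5% level.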

Crossref

Directory of Open Access Journals

PubMed Central

Enlighten

ResearchOnline@GCU

University of Queensland eSpace

Low-rank regularization for high-dimensional sparse conjunctive feature spaces in information extraction

Author: Primadhanty Audi
Publication venue: Universitat Politècnica de Catalunya
Publication date: 01/01/2017

One of the challenges in Natural Language Processing (NLP) is the unstructured nature of texts, in which useful information is not easily identifiable. Information Extraction (IE) aims to alleviate it by enabling automatic extraction of structured information from such text sources. The resulting structured information will facilitate easier querying, organizing, and analyzing of data from texts. In this thesis, we are interested in two IE related tasks: (i) named entity classification and (ii) template filling. Specifically, this thesis examines the problem of learning classifiers of text spans and explores its application for extracting named entities and template slot-fillers. In general, our goal is to construct a method to learn classifiers that: (i) require less supervision, (ii) work well with high-dimensional sparse feature spaces and (iii) are able to classify unseen items (i.e. named entities/slot-fillers not observed in training data). The key idea of our contribution is the utilization of unseen conjunctive features. A conjunctive feature is a combination of features from different feature sets. For example, to classify a phrase, one might have one feature set for the context and another set for the phrase itself. When learning a classifier, only a fraction of these conjunctive features will be observed in the training set, leaving the rest (i.e. unseen features) unusable for predicting items at test time. We hypothesize that utilizing such unseen conjunctions is useful to address all of the aspects of the goal. We develop a general regularization framework specifically designed for sparse conjunctive feature spaces. Our strategy is based on employing tensors to represent the conjunctive feature space, and forcing the model to induce low-dimensional embeddings of the feature vectors via low-rank regularization on the tensor parameters. 
Such compressed representation will help prediction by generalizing to novel examples where most of the conjunctions will be unseen in the training set. We conduct experiments on learning named entity classifiers and template filling, focusing on extracting unseen items. We show that when learning classifiers under minimal supervision, our approach is more effective in controlling model capacity than standard techniques for linear classification.
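
The abstract above argues that a low-rank parameterization lets a classifier score conjunctions of phrase and context features that were never observed together in training. The sketch below illustrates that point with an explicit rank-2 factorization `W = U Vᵀ` of a bilinear weight matrix. This is a simplification chosen for illustration: the thesis describes low-rank regularization on tensor parameters, not necessarily an explicit factorization, and all names and numbers here are made up.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def full_score(W: Matrix, phrase_feats: Sequence[int], context_feats: Sequence[int]) -> float:
    """Score as a sum of one weight per (phrase, context) feature conjunction."""
    return sum(W[i][j] for i in phrase_feats for j in context_feats)


def low_rank_score(U: Matrix, V: Matrix, phrase_feats: Sequence[int], context_feats: Sequence[int]) -> float:
    """Same score computed from rank-k embeddings, with W = U V^T.

    Active phrase features are summed into a k-dimensional embedding,
    active context features likewise, and the score is their dot product.
    """
    k = len(U[0])
    phrase_emb = [sum(U[i][r] for i in phrase_feats) for r in range(k)]
    context_emb = [sum(V[j][r] for j in context_feats) for r in range(k)]
    return sum(p * c for p, c in zip(phrase_emb, context_emb))


# Made-up rank-2 embeddings: 3 phrase features and 4 context features.
U = [[1.0, 0.0], [0.5, 1.0], [0.0, 2.0]]
V = [[1.0, 1.0], [0.0, 1.0], [2.0, 0.0], [1.0, -1.0]]

# The implied full weight matrix has 3 x 4 = 12 conjunction weights.
W = [[sum(u * v for u, v in zip(U[i], V[j])) for j in range(len(V))] for i in range(len(U))]

phrase_feats = [0, 2]   # indices of active phrase features
context_feats = [1, 3]  # indices of active context features

explicit = full_score(W, phrase_feats, context_feats)
factored = low_rank_score(U, V, phrase_feats, context_feats)
```

Because every conjunction weight is determined by the two embeddings, a pair of features that never co-occurred in training still receives a weight, which is the generalization behaviour the abstract attributes to the compressed representation.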

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

UPCommons. Portal del coneixement obert de la UPC

Tesis Doctorals en Xarxa

Low-rank regularization for high-dimensional sparse conjunctive feature spaces in information extraction

Author: Primadhanty Audi
Publication venue: Universitat Politècnica de Catalunya
Publication date: 17/11/2017

Version with two sections cut for publisher's rights reasons. One of the challenges in Natural Language Processing (NLP) is the unstructured nature of texts, in which useful information is not easily identifiable. Information Extraction (IE) aims to alleviate it by enabling automatic extraction of structured information from such text sources. The resulting structured information will facilitate easier querying, organizing, and analyzing of data from texts. In this thesis, we are interested in two IE related tasks: (i) named entity classification and (ii) template filling. Specifically, this thesis examines the problem of learning classifiers of text spans and explores its application for extracting named entities and template slot-fillers. In general, our goal is to construct a method to learn classifiers that: (i) require less supervision, (ii) work well with high-dimensional sparse feature spaces and (iii) are able to classify unseen items (i.e. named entities/slot-fillers not observed in training data). The key idea of our contribution is the utilization of unseen conjunctive features. A conjunctive feature is a combination of features from different feature sets. For example, to classify a phrase, one might have one feature set for the context and another set for the phrase itself. When learning a classifier, only a fraction of these conjunctive features will be observed in the training set, leaving the rest (i.e. unseen features) unusable for predicting items at test time. We hypothesize that utilizing such unseen conjunctions is useful to address all of the aspects of the goal. We develop a general regularization framework specifically designed for sparse conjunctive feature spaces. Our strategy is based on employing tensors to represent the conjunctive feature space, and forcing the model to induce low-dimensional embeddings of the feature vectors via low-rank regularization on the tensor parameters. 
Such compressed representation will help prediction by generalizing to novel examples where most of the conjunctions will be unseen in the training set. We conduct experiments on learning named entity classifiers and template filling, focusing on extracting unseen items. We show that when learning classifiers under minimal supervision, our approach is more effective in controlling model capacity than standard techniques for linear classification. Postprint (published version)

UPCommons. Portal del coneixement obert de la UPC