67 research outputs found

    Semi-supervised incremental learning with few examples for discovering medical association rules

    Background: Association rules are one of the main ways to represent structural patterns underlying raw data. They represent dependencies between sets of observations contained in the data. The associations established by these rules are very useful in the medical domain, for example in the predictive health field. Classic algorithms for association rule mining give rise to huge numbers of possible rules that should be filtered in order to select those most likely to be true. Most of the techniques proposed for these tasks are unsupervised. However, the accuracy provided by unsupervised systems is limited. Conversely, resorting to annotated data for training supervised systems is expensive and time-consuming. The purpose of this research is to design a new semi-supervised algorithm that performs like supervised algorithms but uses an affordable amount of training data. Methods: In this work we propose a new semi-supervised data mining model that combines unsupervised techniques (Fisher's exact test) with limited supervision. Starting with a small seed of annotated data, the model improves on the results (F-measure) obtained using a fully supervised system (standard supervised ML algorithms). The idea is based on utilising the agreement between the predictions of the supervised system and those of the unsupervised techniques in a series of iterative steps. Results: The new semi-supervised ML algorithm improves on the results of supervised algorithms, measured using the F-measure, in the task of mining medical association rules, while training with an affordable amount of manually annotated data. Conclusions: Using a small amount of annotated data (which is easily achievable) leads to results similar to those of a supervised system. The proposal may be an important step for the practical development of techniques for mining association rules and generating new valuable scientific medical knowledge.
    This work has been partially supported by projects DOTT-HEALTH (PID2019-106942RB-C32, MCI/AEI/FEDER, UE) (Design of the study. Analysis and interpretation of data) and EXTRAE II (IMIENS 2019) (Design of the study. Analysis and interpretation of data. HUF corpus manual tagging. Writing of the manuscript), PI18CIII/00004 “Infobanco para uso secundario de datos basado en estándares de tecnología y conocimiento: implementación y evaluación de un infobanco de salud para CoRIS (Info-bank for the secondary use of data based on technology and knowledge standards: implementation and evaluation of a health info-bank for CoRIS) – SmartPITeS” (Data collection and HUF corpus construction), and PI18CIII/00019 - PI18/00890 - PI18/00981 “Arquitectura normalizada de datos clínicos para la generación de infobancos y su uso secundario en investigación: solución tecnológica (Clinical data normalized architecture for the generation of info-banks and their secondary use in research: technological solution) – CAMAMA 4” (Data collection and HUF corpus construction) from Fondo de Investigación Sanitaria (FIS) Plan Nacional de I+D+i.
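The abstract above names Fisher's exact test as the unsupervised component used to filter candidate rules. As a minimal sketch, assuming each record is a set of items and a candidate rule A -> B is scored with a one-sided test on its 2x2 contingency table, the score could be computed as follows. The function names `contingency` and `fisher_exact_greater` are illustrative and not taken from the paper, and this is not the authors' implementation.

```python
from math import comb


def contingency(records, antecedent, consequent):
    """Count the 2x2 table (a, b, c, d) for a candidate rule A -> B.

    a: records with A and B     b: records with A but not B
    c: records with B but not A d: records with neither
    `records` is a list of sets; `antecedent` and `consequent` are sets.
    """
    a = b = c = d = 0
    for rec in records:
        has_a = antecedent <= rec
        has_b = consequent <= rec
        if has_a and has_b:
            a += 1
        elif has_a:
            b += 1
        elif has_b:
            c += 1
        else:
            d += 1
    return a, b, c, d


def fisher_exact_greater(a, b, c, d):
    """One-sided Fisher's exact p-value: P(X >= a) under independence.

    X follows the hypergeometric distribution obtained by fixing the
    margins of the table (records with A, records with B, total).
    """
    with_a = a + b
    with_b = a + c
    n = a + b + c + d
    upper = min(with_a, with_b)
    favourable = sum(
        comb(with_b, k) * comb(n - with_b, with_a - k)
        for k in range(a, upper + 1)
    )
    return favourable / comb(n, with_a)
```

A small p-value suggests that A and B co-occur more often than independence would predict, so the rule would survive an unsupervised filter of this kind. The threshold and any correction for multiple testing are left open here.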

    Discovering HIV related information by means of association rules and machine learning

    Acquired immunodeficiency syndrome (AIDS) is still one of the main health problems worldwide. It is therefore essential to keep making progress in improving the prognosis and quality of life of affected patients. One way to advance along this pathway is to uncover connections between other disorders associated with HIV/AIDS, so that they can be anticipated and possibly mitigated. We propose to achieve this by using Association Rules (ARs). They allow us to represent the dependencies between a number of diseases and other specific diseases. However, classical techniques systematically generate every AR meeting some minimal conditions on data frequency, hence generating a vast number of uninteresting ARs, which need to be filtered out. The lack of manually annotated ARs has favored unsupervised filtering techniques, even though they produce limited results. In this paper, we propose a semi-supervised system, able to identify relevant ARs among HIV-related diseases with a minimal amount of annotated training data. Our system has been able to extract a good number of relationships between HIV-related diseases that have been previously detected in the literature but are scattered and often little known. Furthermore, a number of plausible new relationships have shown up which deserve further investigation by qualified medical experts.
    This study has been partially supported by the Spanish Ministry of Science and Innovation within the DOTTHEALTH Project (MCI/AEI/FEDER, UE) under Grant PID2019-106942RB-C32, the OBSER-MENH Project (MCIN/AEI/10.13039/501100011033 and UE (“NextGenerationEU”/PRTR)) under Grant TED2021-130398B-C21 and the project RAICES (IMIENS 2022), PI18CIII/00004 “Infobanco para uso secundario de datos basado en estándares de tecnología y conocimiento: implementación y evaluación de un infobanco de salud para CoRIS (Info-bank for the secondary use of data based on technology and knowledge standards: implementation and evaluation of a health info-bank for CoRIS) - SmartPITeS” and PI18CIII/00019 - PI18/00890 - PI18/00981 “Arquitectura normalizada de datos clínicos para la generación de infobancos y su uso secundario en investigación: solución tecnológica (Clinical data normalized architecture for the generation of info-banks and their secondary use in research: technological solution) - CAMAMA 4” from Fondo de Investigación Sanitaria (FIS) Plan Nacional de I+D+i. The RIS cohort (CoRIS) is supported by the Instituto de Salud Carlos III through the Red Temática de Investigación Cooperativa en Sida (RD06/006, RD12/0017/0018 and RD16/0002/0006) as part of the Plan Nacional R+D+I and co-financed by ISCIII-Subdirección General de Evaluación and the Fondo Europeo de Desarrollo Regional (FEDER). The list of members of the Cohort of the Spanish HIV Research Network (CoRIS) is included in the Supplementary Material. Additional relationships between HIV-related diseases confirmed or discarded are included as Supplementary Material. This study would not have been possible without the collaboration of all patients, medical and nursing staff and data managers who have taken part in the Project.
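The abstract above notes that classical techniques generate every rule that meets minimal frequency conditions. The sketch below illustrates that behaviour with a brute-force enumeration over minimum support and minimum confidence. It is deliberately not an optimised algorithm such as Apriori, the function name `mine_rules` is hypothetical, and the item labels in the usage note are placeholders rather than clinical findings.

```python
from itertools import combinations


def mine_rules(records, min_support, min_confidence):
    """Enumerate rules X -> y (single-item consequent) by brute force.

    records: list of sets of items.
    Returns tuples (antecedent, consequent, support, confidence) for
    every rule whose itemset support and confidence meet the thresholds.
    """
    n = len(records)
    items = sorted(set().union(*records))

    def support(itemset):
        # Fraction of records that contain every item of `itemset`.
        return sum(1 for rec in records if itemset <= rec) / n

    rules = []
    for size in range(2, len(items) + 1):
        for combo in combinations(items, size):
            itemset = frozenset(combo)
            s = support(itemset)
            if s == 0 or s < min_support:
                continue
            for y in combo:
                antecedent = itemset - {y}
                # support(antecedent) >= s > 0, so the division is safe.
                confidence = s / support(antecedent)
                if confidence >= min_confidence:
                    rules.append((tuple(sorted(antecedent)), y, s, confidence))
    return rules
```

With the toy records `[{"a", "b"}, {"a", "b"}, {"a"}, {"c"}]`, a support threshold of 0.5 and a confidence threshold of 0.6, both `a -> b` and `b -> a` are returned. Lowering the thresholds on realistic data quickly produces the flood of rules that motivates a later filtering step.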

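    Diagnostico financiero y valoración de la empresa Jorge Lacouture Oñate, en la ciudad de Santa Marta [Financial diagnosis and valuation of the company Jorge Lacouture Oñate, in the city of Santa Marta]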

    This work presents a valuation of the company JORGE LACOUTURE OÑATE, a retailer selling fuels and petroleum derivatives. It currently operates two fuel service stations in the city of Santa Marta; the main site is on the Ciénaga - Santa Marta road, in the sector known as the “Y” of Ciénaga. Company valuation, as a method of financial analysis, is a fundamental tool for financial planning, given that in recent years the complexity of the fuel market has produced substantial changes affecting market conditions and, therefore, business activities. In addition, a valuation study supports strategic decisions in the short, medium and long term to ensure the company's sustainability in a highly mobile market; another reason for valuing a commercial organization is to determine whether it has created or destroyed value over its lifetime. For the owners, valuing the business is timely because the constant increases in oil prices worldwide, together with taxes, may negatively affect their growth expectations. Signing an exclusivity contract with a wholesaler prevents small business owners from diversifying, as do regulations such as Decree 4299 of 2005, which reduces the sales capacity of retailers of fuels and petroleum derivatives. Valuation is an objective process that quantifies or measures the endogenous and exogenous factors that determine the frame of reference within which the organization establishes and positions itself in a specific market. Accordingly, valuing a business aims to determine its intrinsic value, not its market value, and much less its price. From this perspective, it must be clear that value is only a circumstantial factor, whereas price is a reality. The method used for the valuation in this work is Free Cash Flow (Flujo de Caja Libre), which has been widely applied to company valuation in Colombia; applying the tools of this method is considered to yield precise results in the financial analysis of companies. The valuation process defined and identified here is applicable to any type of company regardless of its activity or the sector to which it belongs, that is, whether public or private.
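The abstract above names the Free Cash Flow method (Flujo de Caja Libre) without giving its arithmetic. A minimal sketch of a common textbook form is shown below: projected free cash flows are discounted at a fixed rate and a perpetuity-growth terminal value is added. The terminal-value formula, the parameter names and the function `free_cash_flow_value` are assumptions for illustration; the thesis may use a different variant, and no figures from the company are used.

```python
def free_cash_flow_value(cash_flows, discount_rate, terminal_growth):
    """Discounted free cash flow valuation with a perpetuity terminal value.

    cash_flows: projected free cash flows for years 1..N (hypothetical).
    discount_rate: cost of capital per year, e.g. 0.10 for 10%.
    terminal_growth: assumed perpetual growth rate after year N.
    """
    if not cash_flows:
        raise ValueError("at least one projected cash flow is required")
    if discount_rate <= terminal_growth:
        raise ValueError("discount_rate must exceed terminal_growth")

    # Present value of the explicitly projected years.
    pv_projection = sum(
        fcf / (1 + discount_rate) ** year
        for year, fcf in enumerate(cash_flows, start=1)
    )

    # Terminal value at year N (growing perpetuity), discounted to today.
    horizon = len(cash_flows)
    terminal_value = (
        cash_flows[-1] * (1 + terminal_growth) / (discount_rate - terminal_growth)
    )
    pv_terminal = terminal_value / (1 + discount_rate) ** horizon

    return pv_projection + pv_terminal
```

For a single projected cash flow of 110, a 10% discount rate and zero terminal growth, the projection contributes 100 and the terminal value 1000, giving 1100 in total. In practice the result is very sensitive to the discount rate and the growth assumption.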

    DOTT-HEALTH: Development of text-based technology to support diagnosis, prevention and health institutions management

    The combination of individual patient data and guidelines is conceptualized as clinical decision support systems. The increasing adoption of Electronic Health Records (EHR) by healthcare systems results in a massive collection of healthcare data that practitioners, having a limited capability to deal with a large amount of information, are unable to process. This, together with the increase in machine processing capabilities, leads to a scenario where automatic analysis of Electronic Health Records becomes essential to ascertain patterns, prevent errors, improve quality, reduce costs and save time for the health services. This proposal addresses two main challenges: the development of technologies to support clinical diagnosis and prevention, and the creation of technologies to support the management of medical services. With all this in mind, the project will focus on developing tools that advance the technology of support systems for medical decision-making.
    This work has been funded by the DOTT-HEALTH project (MCI/AEI/FEDER, UE) under references PID2019-106942RBC31, PID2019-106942RB-C32, PID2019-106942RB-C33.
    Peer Reviewed. Postprint (published version).

    Performance of Scheduling Policies in Adversarial Networks with Non-synchronized Clocks

    In this paper we generalize the Continuous Adversarial Queuing Theory (CAQT) model (Blesa et al. in MFCS, Lecture Notes in Computer Science, vol. 3618, pp. 144–155, 2005) by considering the possibility that the router clocks in the network are not synchronized. We name the new model Non Synchronized CAQT (NSCAQT). Clearly, this new extension to the model only affects those scheduling policies that use some form of timing. In a first approach we consider the case in which, although not synchronized, all clocks run at the same speed, maintaining constant differences. In this case we show that all universally stable policies in CAQT that use the injection time and the remaining path to schedule packets remain universally stable. These policies include, for instance, Shortest in System (SIS) and Longest in System (LIS). Then, we study the case in which clock differences can vary over time, but the maximum difference is bounded. In this model we show the universal stability of two families of policies related to SIS and LIS respectively (the priority of a packet in these policies depends on the arrival time and a function of the path traversed). The bounds we obtain in this case depend on the maximum difference between clocks. This is a necessary requirement, since we also show that LIS is not universally stable in systems without bounded clock difference. We then present a new policy that we call Longest in Queues (LIQ), which gives priority to the packet that has been waiting the longest in edge queues. This policy is universally stable and, if clocks maintain constant differences, the bounds we prove do not depend on them. To finish, we provide simulation results that compare the behavior of some of these policies in a network with stochastic injection of packets.
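The abstract above concerns policies such as LIS and SIS that order packets by injection time when router clocks disagree. The sketch below shows only the selection step at one queue, under the assumption that each packet carries a timestamp written by the clock of the router where it was injected, modelled as a global time plus a fixed offset. The class `Packet` and the functions `pick_lis` and `pick_sis` are hypothetical names, and the sketch says nothing about the stability results of the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Packet:
    name: str
    true_injection: float  # injection time on an idealised global clock
    source_offset: float   # assumed constant offset of the source router clock

    @property
    def stamped_injection(self) -> float:
        # The only time value a downstream router can actually read.
        return self.true_injection + self.source_offset


def pick_lis(queue):
    """Longest in System: serve the packet with the earliest stamp."""
    return min(queue, key=lambda p: p.stamped_injection)


def pick_sis(queue):
    """Shortest in System: serve the packet with the latest stamp."""
    return max(queue, key=lambda p: p.stamped_injection)
```

For example, with `Packet("p1", 10.0, 0.0)` and `Packet("p2", 12.0, -5.0)` in the same queue, `pick_lis` serves `p2` because its stamp reads 7.0, even though `p1` entered the network first on the global clock. With synchronized clocks (both offsets zero) `pick_lis` serves `p1`.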

    Why Are Outcomes Different for Registry Patients Enrolled Prospectively and Retrospectively? Insights from the Global Anticoagulant Registry in the FIELD-Atrial Fibrillation (GARFIELD-AF).

    Background: Retrospective and prospective observational studies are designed to reflect real-world evidence on clinical practice, but can yield conflicting results. The GARFIELD-AF Registry includes both methods of enrolment and allows analysis of differences in patient characteristics and outcomes that may result. Methods and Results: Patients with atrial fibrillation (AF) and ≥1 risk factor for stroke at diagnosis of AF were recruited either retrospectively (n = 5069) or prospectively (n = 5501) from 19 countries and then followed prospectively. The retrospectively enrolled cohort comprised patients with established AF (for at least 6, and up to 24 months before enrolment), who were identified retrospectively (and baseline and partial follow-up data were collected from the medical records) and then followed prospectively for 0-18 months (such that the total time of follow-up was 24 months; data collection between Dec-2009 and Oct-2010). In the prospectively enrolled cohort, patients with newly diagnosed AF (≤6 weeks after diagnosis) were recruited between Mar-2010 and Oct-2011 and were followed for 24 months after enrolment. Differences between the cohorts were observed in clinical characteristics, including type of AF, stroke prevention strategies, and event rates. More patients in the retrospectively identified cohort received vitamin K antagonists (62.1% vs. 53.2%) and fewer received non-vitamin K oral anticoagulants (1.8% vs. 4.2%). All-cause mortality rates per 100 person-years during the prospective follow-up (from the first study visit up to 1 year) were significantly lower in the retrospectively than in the prospectively identified cohort (3.04 [95% CI 2.51 to 3.67] vs. 4.05 [95% CI 3.53 to 4.63]; p = 0.016). Conclusions: Interpretations of data from registries that aim to evaluate the characteristics and outcomes of patients with AF must take account of differences in registry design and the impact of recall bias and survivorship bias that is incurred with retrospective enrolment. Clinical Trial Registration: URL: http://www.clinicaltrials.gov. Unique identifier for GARFIELD-AF: NCT01090362.
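The abstract above reports mortality as events per 100 person-years with 95% confidence intervals. As a rough sketch of how such a figure can be derived from an event count and total follow-up time, the function below uses a log-scale normal approximation for the interval. The method actually used by the registry is not stated in the abstract, so the approximation, the function name `rate_per_100_person_years` and the counts in the usage note are all assumptions.

```python
from math import exp, sqrt


def rate_per_100_person_years(events, person_years, z=1.96):
    """Incidence rate per 100 person-years with an approximate CI.

    Uses the log-scale approximation SE(log rate) = 1 / sqrt(events),
    which assumes a Poisson event count and requires events > 0.
    Returns (rate, lower, upper).
    """
    if events <= 0 or person_years <= 0:
        raise ValueError("events and person_years must be positive")
    rate = 100.0 * events / person_years
    factor = exp(z / sqrt(events))
    return rate, rate / factor, rate * factor
```

With hypothetical inputs of 100 deaths over 2500 person-years, the rate is 4.0 per 100 person-years and the interval is symmetric on the log scale around that value. Exact Poisson limits would differ slightly, mainly for small event counts.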

    Risk profiles and one-year outcomes of patients with newly diagnosed atrial fibrillation in India: Insights from the GARFIELD-AF Registry.

    BACKGROUND: The Global Anticoagulant Registry in the FIELD-Atrial Fibrillation (GARFIELD-AF) is an ongoing prospective noninterventional registry, which is providing important information on the baseline characteristics, treatment patterns, and 1-year outcomes in patients with newly diagnosed non-valvular atrial fibrillation (NVAF). This report describes data from Indian patients recruited in this registry. METHODS AND RESULTS: A total of 52,014 patients with newly diagnosed AF were enrolled globally; of these, 1388 patients were recruited from 26 sites within India (2012-2016). In India, the mean age was 65.8 years at diagnosis of NVAF. Hypertension was the most prevalent risk factor for AF, present in 68.5% of patients from India and in 76.3% of patients globally (P < 0.001). Diabetes and coronary artery disease (CAD) were prevalent in 36.2% and 28.1% of patients as compared with global prevalence of 22.2% and 21.6%, respectively (P < 0.001 for both). Antiplatelet therapy was the most common antithrombotic treatment in India. With increasing stroke risk, however, patients were more likely to receive oral anticoagulant therapy [mainly vitamin K antagonist (VKA)], but average international normalized ratio (INR) was lower among Indian patients [median INR value 1.6 (interquartile range {IQR}: 1.3-2.3) versus 2.3 (IQR 1.8-2.8) (P < 0.001)]. Compared with other countries, patients from India had markedly higher rates of all-cause mortality [7.68 per 100 person-years (95% confidence interval 6.32-9.35) vs 4.34 (4.16-4.53), P < 0.0001], while rates of stroke/systemic embolism and major bleeding were lower after 1 year of follow-up. CONCLUSION: Compared to previously published registries from India, the GARFIELD-AF registry describes clinical profiles and outcomes in Indian patients with AF of a different etiology. The registry data show that compared to the rest of the world, Indian AF patients are younger in age and have more diabetes and CAD. 
Patients with a higher stroke risk are more likely to receive anticoagulation therapy with VKA but are underdosed compared with the global average in the GARFIELD-AF. CLINICAL TRIAL REGISTRATION: URL: http://www.clinicaltrials.gov. Unique identifier: NCT01090362.

    Web Spam Detection: New Classification Features Based on Qualified Link Analysis and Language Models
