
    Matching records in multiple databases using a hybridization of several technologies.

    A major problem with integrating information from multiple databases is that the same data objects can exist in inconsistent data formats and with a variety of attribute variations across databases, making it difficult to identify matching objects using exact string matching. In this research, a variety of models and methods have been developed and tested to alleviate this problem. A major motivation is that health care providers still lack efficient tools for patient record matching. This research focuses on the approximate matching of patient records with third-party payer databases, a major need for medical treatment facilities and hospitals that try to match patient treatment records with the records of insurance companies, Medicare, Medicaid, and the Veterans Administration. The main objectives of this research effort are therefore to provide an approximate matching framework that can draw upon multiple input service databases, construct an identity, and match to third-party payers with the highest possible accuracy in object identification and minimal user interaction. This research describes an object identification system framework developed from a hybridization of several technologies, which compares objects' shared attributes in order to identify matching objects. Methodologies and techniques from other fields, such as information retrieval, text correction, and data mining, are integrated into a framework that addresses the patient record matching problem. The quality of a match across multiple databases is defined using quality metrics commonly used in information retrieval, such as precision, recall, and F-measure. The performance of the resulting decision models is evaluated through extensive experiments and found to be very good. The matching quality metrics (precision, recall, F-measure, and accuracy) are over 99%, the ROC index is over 99.50%, and the mismatch rate is below 0.18% for each model generated from the different data sets. This research also includes a discussion of the problems in patient record matching, an overview of the relevant literature on the record matching problem, and an extensive experimental evaluation of the methodologies utilized, such as string similarity functions and machine learning. Finally, potential improvements and extensions to this work are presented.
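The quality metrics named above have standard definitions over sets of record pairs. A minimal sketch (hypothetical record-pair IDs, not the thesis's data) of precision, recall, and F-measure for a set of predicted matches against a gold standard:

```python
def match_quality(true_pairs, predicted_pairs):
    """Precision, recall, and F-measure for predicted record-pair
    matches against a gold-standard set of matching pairs."""
    tp = len(true_pairs & predicted_pairs)   # correctly predicted matches
    fp = len(predicted_pairs - true_pairs)   # spurious matches
    fn = len(true_pairs - predicted_pairs)   # missed matches
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_measure = (2 * precision * recall / (precision + recall)
                 if precision + recall else 0.0)
    return precision, recall, f_measure

# Hypothetical (patient_id, payer_record_id) pairs, for illustration only.
gold = {(1, 101), (2, 102), (3, 103), (4, 104)}
pred = {(1, 101), (2, 102), (5, 105)}
p, r, f = match_quality(gold, pred)  # precision 2/3, recall 1/2
```

F-measure is the harmonic mean of precision and recall, so a single model can be ranked even when the two disagree.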

    Persistence of tuberculosis inferred from case and contact networks in Birmingham, UK

    Tuberculosis (TB) is a public health priority in urban cities of high income countries such as the UK, where the local incidence can be several times the national incidence. A large dataset of 63,620 individuals registered at a single centre in Birmingham, UK captured all TB treatment episodes (active disease and latent infection) and contained individual-level detail on contacts of TB cases from 1980 to 2011. Exploratory analysis of the pseudonymised research dataset revealed clusters of individuals presenting as cases and contacts through time. Repeated cases originated from the pool of known individuals treated for active or latent TB with a probability of 1.5% at five years and 2.7% at ten years, but routine recording of latent TB treatment episodes is not widespread enough to estimate the future burden of retreatment TB. When repeated contacts were examined, their probability of being diagnosed as a case was twice that of non-repeated contacts (3.9% versus 1.6% for active disease and 10.7% versus 3.7% for latent infection) at one year. Contact repetition should be recognised, but consistent recording of patient identity is lacking. In a further evaluation of the role of contact structure in case detection, only eigenvector centrality (connections to other highly scoring individuals) was associated with at least one case detection in the local network. Because the networks were viewed statically, and network metrics may reflect the effect rather than the cause of the contact tracing process, further interpretation was difficult. However, network visualisation identified a large cluster of 3,148 individuals, entering the dataset throughout the study period, that were linked through a superspreading event. Evaluating transmission was limited by the small sample of patients with mycobacterial interspersed repetitive unit-variable number tandem repeat (MIRU-VNTR) typing, but the available data suggested that the superspreading event was nested within a risk network rather than a transmission network.
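Eigenvector centrality, the one network metric found to be associated with case detection, can be computed by power iteration on the contact network's adjacency matrix: a node scores highly when its neighbours score highly. A minimal illustration on a toy contact network (not the Birmingham data):

```python
def eigenvector_centrality(adj, iterations=200):
    """Power iteration on a symmetric adjacency matrix (list of lists).
    Returns one score per node, normalised so the maximum is 1.0."""
    n = len(adj)
    x = [1.0] * n
    for _ in range(iterations):
        # multiply the adjacency matrix into the current score vector
        y = [sum(adj[i][j] * x[j] for j in range(n)) for i in range(n)]
        norm = max(abs(v) for v in y) or 1.0
        x = [v / norm for v in y]
    return x

# Toy network: node 1 is a hub (a superspreader-like contact),
# node 0 is a peripheral contact linked only to the hub.
adj = [
    [0, 1, 0, 0],
    [1, 0, 1, 1],
    [0, 1, 0, 1],
    [0, 1, 1, 0],
]
scores = eigenvector_centrality(adj)  # hub gets the highest score
```

The hub outranks its neighbours, and the peripheral node 0 scores lowest even though it, like nodes 2 and 3, has ties to the hub: its ties are to fewer well-connected nodes.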

    Automated image-based quality control of molecularly imprinted polymer films

    We present results of applying a feature extraction process to images of molecularly imprinted polymer (MIP) coatings on glass substrates for defect detection. Geometric features such as MIP side lengths, aspect ratio, internal angles, edge regularity, and edge strength are obtained using Hough transforms and Canny edge detection. A Self-Organizing Map (SOM) is used to classify the texture of MIP surfaces. The SOM is trained on a data set comprised of images of manufactured MIPs. The raw images are first processed using Hough transforms and Canny edge detection to extract just the MIP-coated portion of the surface, allowing for surface area estimation and a reduction in training set size. The training data set is comprised of 20-dimensional feature vectors, each calculated from a single section of a gray-scale image of a MIP. Haralick textures are among the quantifiers used as feature vector components. The training data is then processed using principal component analysis to reduce the number of dimensions of the data set. After training, the SOM is capable of classifying texture, including defects.
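Haralick texture quantifiers are derived from a gray-level co-occurrence matrix (GLCM): a joint distribution of gray levels of pixel pairs at a fixed offset. As an illustration of the kind of feature vector component described above, a minimal sketch of a GLCM and the Haralick contrast statistic (toy 4-level image, not the MIP data):

```python
def glcm(image, dx=1, dy=0, levels=4):
    """Gray-level co-occurrence matrix for one pixel offset (dx, dy),
    normalised to a joint probability distribution."""
    counts = [[0] * levels for _ in range(levels)]
    h, w = len(image), len(image[0])
    total = 0
    for y in range(h):
        for x in range(w):
            y2, x2 = y + dy, x + dx
            if 0 <= y2 < h and 0 <= x2 < w:
                counts[image[y][x]][image[y2][x2]] += 1
                total += 1
    return [[c / total for c in row] for row in counts]

def contrast(p):
    """Haralick contrast: sum over (i, j) of p(i, j) * (i - j)^2.
    Large values indicate abrupt gray-level transitions (e.g. defects)."""
    return sum(p[i][j] * (i - j) ** 2
               for i in range(len(p)) for j in range(len(p)))

# Toy 4x4 image quantised to 4 gray levels.
img = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [2, 2, 3, 3],
    [2, 2, 3, 3],
]
P = glcm(img)
c = contrast(P)  # 4 of 12 horizontal pairs differ by one level -> 1/3
```

A production pipeline would compute several such statistics (contrast, homogeneity, energy, correlation) over multiple offsets to fill the 20-dimensional feature vectors mentioned above.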

    Stream water quality and benthic macroinvertebrate ecology in a coal-mining, acid-sensitive region

    Acid mine drainage (AMD) and acid rain are important sources of impairment to streams in the Tygart Valley and Cheat River basins in north central West Virginia, USA. Due to a network of abandoned mined lands and bond forfeiture sites in this coal-mining region, AMD represents severe but rather localized impacts to water quality. AMD is a consequence of the chemical oxidation of reduced geological minerals (sulfides), usually associated with coal, during mining operations. The reactions produce aqueous solutions high in sulfates and dissolved metals when the minerals are exposed to the oxic environment through land disturbance. In addition, the weakly buffered, mostly acid-producing to circum-neutral mineral geology of this region makes surface waters susceptible to the chemical consequences of acid rain. Acid rain forms when gaseous compounds of nitrogen and sulfur from fossil fuel combustion react with atmospheric moisture.

    I tested a classification system based on water chemistry in streams of these two basins. Streams of the region ranged from very good water quality (reference type) to increasingly impaired by AMD (moderate to severe AMD types). Streams with soft water had characteristics associated with the impacts of acid rain, and streams with hard water were either natural occurrences or were influenced by alkaline materials injected into the water to treat acid sources. A transitional water quality type was recognized, which was very difficult to characterize because its chemistry grades across the spectrum from the reference and hard water types to waters increasingly influenced by AMD.

    It is commonly observed that benthic macroinvertebrates in unpolluted streams are distributed continuously without being organized into discrete communities. The discreteness of water quality observed in this research, however, suggests that benthic macroinvertebrates ought not to be distributed continuously, but rather should correspond discretely to water quality types as distinct communities. Therefore, I tested the expectation that macroinvertebrate communities should be distributed in concordance with water quality types in the Cheat River basin. Multivariate models suggested that water quality types significantly structured macroinvertebrate communities. Measures of the classification strength of water quality on community composition were weak but significant. Indicator species analysis found several important macroinvertebrate genera linked especially to the reference and soft water quality types.

    In the Cheat River mainstem, benthic macroinvertebrate communities and a measure of stream ecosystem health were highly correlated with spatial and temporal inputs of AMD and thermal effluent. However, when these stressors occurred simultaneously, stream health and community structure did not recover with downstream improvements in water quality as they did when the stressors occurred singly. In the Cheat River mainstem overall, AMD was responsible for most degradation, but AMD in combination with thermal effluent was also responsible for extensive loss of ecological integrity in the Cheat Canyon region. Consequently, local water chemistry accounts for the distributions of benthic macroinvertebrates in the Cheat basin, and macroinvertebrates may respond in predictable ways to restoration efforts that reduce the harmful chemical constituents associated with acidic impacts. Larger, watershed-scale attributes may be needed to explain variation in benthic macroinvertebrate communities not captured by local water quality types.
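Indicator species analysis typically scores each taxon with the Dufrene-Legendre indicator value: the product of its specificity to a site group (share of its mean abundance found in that group) and its fidelity within that group (fraction of the group's sites where it occurs). A minimal sketch on hypothetical site counts (not the Cheat basin data):

```python
def indicator_value(counts, groups, target):
    """Dufrene-Legendre IndVal (0-100) of one taxon for one site group.
    counts: {site: abundance of the taxon}; groups: {site: group label}."""
    labels = set(groups.values())
    # mean abundance of the taxon within each group
    means = {}
    for g in labels:
        sites = [s for s in groups if groups[s] == g]
        means[g] = sum(counts.get(s, 0) for s in sites) / len(sites)
    total = sum(means.values())
    specificity = means[target] / total if total else 0.0
    target_sites = [s for s in groups if groups[s] == target]
    fidelity = (sum(1 for s in target_sites if counts.get(s, 0) > 0)
                / len(target_sites))
    return 100.0 * specificity * fidelity

# Hypothetical mayfly counts at two reference and two AMD-impaired sites.
counts = {"r1": 10, "r2": 6, "a1": 0, "a2": 2}
groups = {"r1": "reference", "r2": "reference", "a1": "amd", "a2": "amd"}
v = indicator_value(counts, groups, "reference")  # high: a reference indicator
```

A taxon scores 100 only when it occurs at every site of one group and nowhere else, which is why strong indicators of the reference and soft water types are informative about water quality class membership.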

    Segmentation of the breast region with pectoral muscle suppression and automatic breast density classification

    Final degree project carried out in collaboration with the Université catholique de Louvain (Ecole Polytechnique de Louvain). Breast cancer is one of the major causes of death among women. Nowadays, screening mammography is the most widely adopted technique for early breast cancer detection, ahead of other procedures such as screen-film mammography (SFM) or ultrasound scanning. Computer-assisted diagnosis (CAD) of mammograms attempts to help radiologists by providing an automatic procedure to detect possible cancers in mammograms. Suspicious breast cancers appear as white spots in mammograms, indicating small clusters of micro-calcifications. Mammogram sensitivity decreases due to factors such as the density of the breast and the presence of labels, artifacts, or even the pectoral muscle. The pre-processing of mammogram images is a very important step in breast cancer analysis and detection because it can reduce the number of false positives. In this thesis we propose a method to segment mammograms and classify them automatically according to their density. We perform several procedures, including pre-processing (enhancement of the image, noise reduction, orientation finding, and border removal) and segmentation (separating the breast from the background, labels, and pectoral muscle present in the mammograms), in order to increase the sensitivity of our CAD system. The final goal is classification for diagnosis, in other words, finding the density class of an incoming mammogram in order to determine whether more tests are needed to find possible cancers in the image. This functionality will be included in a new clinical imaging annotation system for computer-aided breast cancer screening developed by the Communications and Remote Sensing Department at the Université catholique de Louvain. The source code for the pre-processing and segmentation steps has been programmed in C++ using the ITK image processing library and the CMake build system.
The method has been applied to medio-lateral oblique (MLO) view mammograms as well as to craniocaudal (CC) view mammograms belonging to different databases. The classification step has been implemented in Matlab. We have tested our pre-processing method, obtaining a 100% success rate in removing labels and artifacts from mammograms of the mini-MIAS database. The pectoral muscle removal rate has been evaluated subjectively, obtaining a good-removal rate of 57.76%. Finally, for the classification step, the best recognition rate we obtained was 76.25% using only pixel values, and 77.5% when texture features were added, classifying images belonging to the mini-MIAS database into 3 different density types. These results can be compared with the current state of the art in segmentation and classification of biomedical images.
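The thesis implements its segmentation pipeline in C++ with ITK; as a language-agnostic illustration of one common breast/background separation step, here is a minimal Otsu thresholding sketch on a synthetic bimodal intensity distribution (illustrative values only, not the mini-MIAS data):

```python
def otsu_threshold(pixels, levels=256):
    """Otsu's method: pick the gray level that maximises the
    between-class variance of the histogram, separating the dark
    background from the brighter breast tissue."""
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(i * hist[i] for i in range(levels))
    w0 = sum0 = 0
    best_t, best_var = 0, -1.0
    for t in range(levels):
        w0 += hist[t]                 # background weight (levels <= t)
        if w0 == 0:
            continue
        w1 = total - w0               # foreground weight
        if w1 == 0:
            break
        sum0 += t * hist[t]
        m0 = sum0 / w0                # background mean
        m1 = (sum_all - sum0) / w1    # foreground mean
        var = w0 * w1 * (m0 - m1) ** 2
        if var > best_var:
            best_var, best_t = var, t
    return best_t

# Synthetic bimodal "mammogram": background near 10-12, tissue near 200-220.
pixels = [10] * 500 + [12] * 300 + [200] * 150 + [220] * 50
t = otsu_threshold(pixels)  # lands between the two intensity modes
```

Thresholding alone does not remove labels or the pectoral muscle, which is why the thesis combines it with the additional pre-processing and segmentation steps described above.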

    A Data-driven Methodology Towards Mobility- and Traffic-related Big Spatiotemporal Data Frameworks

    Human population is increasing at unprecedented rates, particularly in urban areas. This increase, along with the rise of a more economically empowered middle class, brings new and complex challenges to the mobility of people within urban areas. To tackle these challenges, transportation and mobility authorities and operators are trying to adopt innovative Big Data-driven mobility- and traffic-related solutions. Such solutions will support decision-making processes that aim to ease the load on an already overloaded transport infrastructure. The information collected from day-to-day mobility and traffic can help to mitigate some of these mobility challenges in urban areas. Road infrastructure and traffic management operators (RITMOs) face several limitations in effectively extracting value from the exponentially growing volumes of mobility- and traffic-related Big Spatiotemporal Data (MobiTrafficBD) that are being acquired and gathered. Research on the topics of Big Data, Spatiotemporal Data, and especially MobiTrafficBD is scattered, and the existing literature does not offer a concrete, common methodological approach to set up, configure, deploy, and use a complete Big Data-based framework to manage the lifecycle of mobility-related spatiotemporal data, mainly focused on geo-referenced time series (GRTS) and spatiotemporal events (ST Events), extract value from it, and support the decision-making processes of RITMOs. This doctoral thesis proposes a data-driven, prescriptive methodological approach towards the design, development, and deployment of MobiTrafficBD Frameworks focused on GRTS and ST Events.
Besides a thorough literature review on Spatiotemporal Data, Big Data, and the merging of these two fields through MobiTrafficBD, the methodological approach comprises a set of general characteristics, technical requirements, logical components, data flows, and technological infrastructure models, as well as guidelines and best practices that aim to guide researchers, practitioners, and stakeholders, such as RITMOs, throughout the design, development, and deployment phases of any MobiTrafficBD Framework. This work is intended to be a supporting methodological guide, based on widely used Reference Architectures and guidelines for Big Data, but enriched with the inherent characteristics and concerns brought about by Big Spatiotemporal Data, as in the case of GRTS and ST Events. The proposed methodology was evaluated and demonstrated in various real-world use cases that deployed MobiTrafficBD-based data management, processing, analytics, and visualisation methods, tools, and technologies, under the umbrella of several research projects funded by the European Commission and the Portuguese Government.
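A geo-referenced time series couples a fixed location with an ordered sequence of timestamped samples, and the typical query is a time window over one series. A minimal sketch of such a record (field names, the sensor ID, and the coordinates are illustrative, not from the thesis):

```python
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class GeoRefTimeSeries:
    """Minimal GRTS record: one fixed sensor location plus an ordered
    list of (timestamp, value) samples, e.g. speeds from a road loop."""
    sensor_id: str
    lat: float
    lon: float
    samples: List[Tuple[float, float]] = field(default_factory=list)

    def append(self, ts: float, value: float) -> None:
        # keep samples ordered by timestamp so window queries stay simple
        assert not self.samples or ts >= self.samples[-1][0]
        self.samples.append((ts, value))

    def window(self, t0: float, t1: float) -> List[Tuple[float, float]]:
        """Samples with t0 <= timestamp < t1, the basic GRTS query."""
        return [(t, v) for t, v in self.samples if t0 <= t < t1]

# Hypothetical loop detector near Lisbon reporting average speeds (km/h).
grts = GeoRefTimeSeries("loop-42", 38.74, -9.15)
for ts, v in [(0.0, 55.0), (60.0, 48.0), (120.0, 30.0)]:
    grts.append(ts, v)
recent = grts.window(60.0, 180.0)  # the last two samples
```

An ST Event would differ mainly in carrying a single timestamp (or interval) and an event type instead of a continuous sample stream; a MobiTrafficBD framework manages the lifecycle of both kinds of record at scale.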

    Journal of Asian Finance, Economics and Business, v. 4, no. 3
