399 research outputs found

    Data driven Xpath generation

    The XPath query language offers a standard for information extraction from HTML documents. For this purpose, the DOM tree representation is typically used, which models the hierarchical structure of the document. One of the key aspects of HTML is the separation of data and the structure that is used to represent it. As a consequence, data extraction algorithms usually fail to identify data if the structure of a document is changed. This paper investigates how a set of tabular-oriented XPath queries can be adapted so that it copes with modifications in the DOM tree of an HTML document. The basic idea is that data that has already been extracted in the past can be used to reconstruct XPath queries that retrieve the data from a different DOM tree. Experimental results show the accuracy of our method.
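
As a rough illustration of the idea in this abstract (not the paper's algorithm), the sketch below shows a positional query that stops matching after the DOM changes, and a new path rebuilt from a value that was extracted earlier. The two pages, the query, and the helper `path_to_text` are hypothetical; Python's `xml.etree.ElementTree` supports only a subset of XPath and needs well-formed markup.

```python
import xml.etree.ElementTree as ET

# Hypothetical pages: the same table, but the new page wraps it in a <div>.
OLD_PAGE = "<html><body><table><tr><td>Alice</td><td>42</td></tr></table></body></html>"
NEW_PAGE = "<html><body><div><table><tr><td>Alice</td><td>42</td></tr></table></div></body></html>"

# Query that worked on the old page (paths are relative to the <html> root).
OLD_QUERY = "./body/table/tr/td[1]"


def path_to_text(root, text):
    """Return a positional path from root to the first element whose text equals `text`."""
    # ElementTree has no parent pointers, so build a child -> parent map first.
    parent_of = {child: parent for parent in root.iter() for child in parent}
    target = next((el for el in root.iter() if el.text == text), None)
    if target is None:
        return None
    steps = []
    node = target
    while node is not root:
        parent = parent_of[node]
        same_tag = [c for c in parent if c.tag == node.tag]
        steps.append(f"{node.tag}[{same_tag.index(node) + 1}]")
        node = parent
    return "./" + "/".join(reversed(steps))


old_root = ET.fromstring(OLD_PAGE)
new_root = ET.fromstring(NEW_PAGE)

known_value = old_root.find(OLD_QUERY).text        # data extracted in the past
old_query_still_works = new_root.find(OLD_QUERY) is not None
new_query = path_to_text(new_root, known_value)    # path rebuilt from the data
```

Matching on exact text is the simplest possible choice here; it assumes the previously extracted value still appears verbatim in the new document.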

    Supporting User-Defined Functions on Uncertain Data

    Uncertain data management has become crucial in many sensing and scientific applications. As user-defined functions (UDFs) become widely used in these applications, an important task is to capture result uncertainty for queries that evaluate UDFs on uncertain data. In this work, we provide a general framework for supporting UDFs on uncertain data. Specifically, we propose a learning approach based on Gaussian processes (GPs) to compute approximate output distributions of a UDF when evaluated on uncertain input, with guaranteed error bounds. We also devise an online algorithm to compute such output distributions, which employs a suite of optimizations to improve accuracy and performance. Our evaluation using both real-world and synthetic functions shows that our proposed GP approach can outperform the state-of-the-art sampling approach with up to two orders of magnitude improvement for a variety of UDFs.
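
For orientation, the sampling baseline that this abstract compares against can be sketched as plain Monte Carlo: draw inputs from the uncertain input's distribution, apply the UDF, and summarize the outputs. This is only that generic baseline, not the GP method of the paper. The UDF `square`, the Gaussian input, and the sample count are assumptions chosen so the result can be checked against the known identity E[X²] = μ² + σ².

```python
import random
import statistics


def sample_udf_output(udf, mean, stddev, n_samples, seed=0):
    """Monte Carlo estimate of a UDF's output distribution for a Gaussian input."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    return [udf(rng.gauss(mean, stddev)) for _ in range(n_samples)]


def square(x):
    """Hypothetical user-defined function."""
    return x * x


# Uncertain input X ~ N(2.0, 0.5^2); the exact output mean is 2.0**2 + 0.5**2 = 4.25.
outputs = sample_udf_output(square, mean=2.0, stddev=0.5, n_samples=20000)
estimated_mean = statistics.mean(outputs)
estimated_std = statistics.stdev(outputs)
```

Every sample costs one UDF call, which is why an approach that needs fewer evaluations can pay off when the UDF is expensive.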

    Monitoring System for Traffic Analysis Using Twitter Stream

    Social networks are often used as a source of data for detecting events such as traffic congestion and car accidents. The existing system is a real-time monitoring system for traffic event detection from Twitter. It fetches tweets from Twitter, processes them using text mining techniques, and finally classifies them. The aim of the system is to assign the appropriate class label to every tweet, indicating whether or not it is related to a traffic event. The existing system uses a support vector machine as its classification model. The proposed system is based on a semi-supervised approach, which is trained on a traffic-related dataset. We propose a clustering approach for classifying tweets into traffic-related and non-traffic-related tweets. We use the Euclidean distance to calculate the similarity between tweets.
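
The distance step mentioned at the end of this abstract can be sketched as follows. The abstract does not say how tweets are turned into vectors, so the bag-of-words representation, the example tweets, and the nearest-example labelling are all assumptions made for illustration.

```python
import math
from collections import Counter


def vectorize(text, vocabulary):
    """Bag-of-words count vector over a fixed vocabulary (assumed representation)."""
    counts = Counter(text.lower().split())
    return [counts[word] for word in vocabulary]


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


# Hypothetical labelled examples and a new tweet to classify.
examples = {
    "traffic": "heavy traffic jam on highway",
    "non-traffic": "great coffee this morning",
}
new_tweet = "traffic jam near the highway exit"

all_texts = list(examples.values()) + [new_tweet]
vocabulary = sorted({word for text in all_texts for word in text.lower().split()})

new_vector = vectorize(new_tweet, vocabulary)
distances = {
    label: euclidean(new_vector, vectorize(text, vocabulary))
    for label, text in examples.items()
}
# Assign the label of the closest example (smaller distance = more similar).
predicted_label = min(distances, key=distances.get)
```

Raw word counts are sensitive to tweet length; a weighting scheme such as TF-IDF is a common alternative, though the abstract does not say which was used.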

    U.S. prevalence of endocrine therapy-naive locally advanced or metastatic breast cancer

    Background: Variations in treatment choice, or late stage at first diagnosis, mean that, despite guideline recommendations, not all patients with hormone receptor (hr)-positive locally advanced or metastatic breast cancer (la/mbca) will have received endocrine therapy before disease progression. In the present study, we aimed to estimate the proportion of women with postmenopausal hr-positive la/mbca in the United States who are endocrine therapy-naive. Methods: Women in the Optum Electronic Health Record (ehr) database with a breast cancer (bca) diagnosis (January 2008-March 2015) were included. Patient and malignancy characteristics were identified using structured data fields and natural-language processing of free-text clinical notes. The proportion of women with postmenopausal hr-positive, human epidermal growth factor 2 (her2)-negative (or unknown) la/mbca who had not received prior endocrine therapy was determined. Results were extrapolated to the entire U.S. population using the U.S. National Cancer Institute's Surveillance, Epidemiology, and End Results database. Results are presented descriptively. Results: In the ehr database, 11,831 women with bca had discernible information on postmenopausal status, hr status, and disease stage. Of those women, 1923 (16.3%) had postmenopausal hr-positive, her2-negative (or unknown) la/mbca, and 70.7% of those 1923 patients (n = 1360) had not received prior endocrine therapy, accounting for 11.5% of the overall population. Extrapolating those estimates nationally suggests an annual incidence of 14,784 cases, and a 5-year limited duration prevalence of 50,638 cases. Conclusions: A substantial proportion of women with postmenopausal hr-positive la/mbca in the United States could be endocrine therapy-naive.
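
The three percentages in the Results follow directly from the reported counts, as this short check shows. Only the counts come from the abstract; the variable names are illustrative.

```python
# Counts reported in the abstract.
women_with_known_status = 11831   # bca patients with discernible status information
eligible_la_mbca = 1923           # postmenopausal hr-positive, her2-negative/unknown la/mbca
endocrine_naive = 1360            # of those, no prior endocrine therapy

# Share of the cohort with the disease profile of interest.
pct_eligible = round(100 * eligible_la_mbca / women_with_known_status, 1)

# Share of eligible patients who are endocrine therapy-naive.
pct_naive_of_eligible = round(100 * endocrine_naive / eligible_la_mbca, 1)

# Endocrine therapy-naive patients as a share of the whole cohort.
pct_naive_of_all = round(100 * endocrine_naive / women_with_known_status, 1)
```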

    Querying Factorized Probabilistic Triple Databases

    An increasing amount of data is becoming available in the form of large triple stores, with the Semantic Web’s linked open data cloud (LOD) as one of the most prominent examples. Data quality and completeness are key issues in many community-generated data stores, like LOD, which motivates probabilistic and statistical approaches to data representation, reasoning and querying. In this paper we address the issue from the perspective of probabilistic databases, which account for uncertainty in the data via a probability distribution over all database instances. We obtain a highly compressed representation using the recently developed RESCAL approach and demonstrate experimentally that efficient querying can be obtained by exploiting inherent features of RESCAL via sub-query approximations of deterministic views.
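
To make the compressed representation concrete: RESCAL factorizes each relation's adjacency slice as X_k ≈ A R_k Aᵀ, so a triple (subject, relation, object) is scored by the bilinear form a_sᵀ R_k a_o. The sketch below computes only that score. The entity vectors and the relation matrix are made-up toy numbers, and how the paper turns scores into probabilities or answers queries is not shown.

```python
def rescal_score(subject_vec, relation_matrix, object_vec):
    """Bilinear RESCAL score a_s^T R_k a_o for one (subject, relation, object) triple."""
    # First compute R_k a_o, then take the dot product with a_s.
    r_times_o = [
        sum(r_ij * o_j for r_ij, o_j in zip(row, object_vec))
        for row in relation_matrix
    ]
    return sum(s_i * v_i for s_i, v_i in zip(subject_vec, r_times_o))


# Toy latent factors (hypothetical values, rank 2).
entity = {
    "alice": [1.0, 0.5],
    "bob": [0.0, 1.0],
}
relation = {
    "knows": [[0.1, 0.9],
              [0.2, 0.1]],
}

score_alice_knows_bob = rescal_score(entity["alice"], relation["knows"], entity["bob"])
score_bob_knows_alice = rescal_score(entity["bob"], relation["knows"], entity["alice"])
```

Because R_k need not be symmetric, the two directions of a relation can receive different scores, as they do here.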

    Correlation between CD4 counts of HIV patients and enteric protozoan in different seasons – An experience of a tertiary care hospital in Varanasi (India)

    Background: Protozoan infections are the most serious of the superimposed infections in HIV patients and claim a number of lives every year. Because the line of treatment differs between parasites, a definitive diagnosis of the etiological agent is needed to avoid empirical treatment. The present study therefore aimed to elucidate the associations between diarrhea and CD4 counts and to study the effect of HAART along with management of diarrhea in HIV-positive patients. This study is the first of its kind in this area to attempt to correlate seasonal variation with intestinal protozoan infestations. Methods: The study period was from January 2006 to October 2007, during which stool samples were collected from 366 HIV-positive patients with diarrhea attending the ART centre, inpatient department and ICTC of S.S. hospital, I.M.S., B.H.U., Varanasi. Simultaneously, CD4 counts were recorded to assess the status of HIV infection vis-à-vis parasitic infection. Pathogens were identified by direct microscopy and different staining techniques. Results: Of the 366 patients, 112 had acute and 254 had chronic diarrhea. Intestinal protozoa were detected in 78.5% of acute and 50.7% of chronic cases, respectively. Immune restoration was observed in 36.6% of patients after treatment on the basis of clinical observation and CD4 counts. Cryptosporidium spp. was detected in 39.8% of HIV-positive cases, followed by Microsporidia spp. (26.7%). The highest incidence of intestinal infection was in the rainy season. However, infection with Cyclospora spp. was at its peak in the summer. Patients with chronic diarrhea had lower CD4 cell counts. The maximum parasitic isolation was in patients whose CD4 cell counts were below 200 cells/μl. Conclusion: There was an inverse relation between CD4 counts and duration of diarrhea. Cryptosporidium spp. was the most frequently isolated parasite in the HIV patients. The highest incidence of infection was seen in the rainy season.