11 research outputs found

    Website boundary detection via machine learning

    This thesis describes research undertaken in the field of web data mining. More specifically, this research is directed at investigating solutions to the Website Boundary Detection (WBD) problem. WBD, the problem of identifying all web pages that are part of a single website, remains open. Potential solutions to WBD can be beneficial with respect to tasks such as archiving web content and the automated construction of web directories. A prerequisite to any WBD approach is a definition of a website. This thesis commences with a discussion of previous definitions of a website, and subsequently proposes a definition of a website which is used with respect to the WBD solution approaches presented later in this thesis. The WBD problem may be addressed in either the static or the dynamic context. Both are considered in this thesis. Static approaches require all web page data to be available a priori in order to make a decision on what pages are within a website boundary, while dynamic approaches make decisions on portions of the web data and incrementally build a representation of the pages within a website boundary. There are three main approaches to the WBD problem presented in this thesis; the first two are static approaches, and the final one is a dynamic approach. The first static approach presented in this thesis concentrates on the types of features that can be used to represent web pages. This approach presents a practical solution to the WBD problem by applying clustering algorithms to various combinations of features. Further analysis investigates the "best" combination of features to be used in terms of WBD performance. The second static approach investigates graph partitioning techniques based on the structural properties of the web graph in order to produce WBD solutions. 
Two variations of the approach are considered: a hierarchical graph partitioning technique, and a method based on minimum cuts of flow networks. The final approach for the evaluation of WBD solutions presented in this research considers the dynamic context. The proposed dynamic approach uses both structural properties and various feature representations of web pages in order to incrementally build a website boundary as the pages of the web graph are traversed. The evaluation of the approaches presented in this thesis was conducted using web graphs from four academic departments hosted by the University of Liverpool. Both the static and dynamic approaches produce appropriate WBD solutions; however, the reported evaluation suggests that the dynamic approach to resolving the WBD problem offers additional benefits over a static approach due to the lower resource cost of gathering and processing typically smaller amounts of web data.
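
As a rough illustration of the minimum-cut idea mentioned in the abstract above, the following sketch partitions a small web graph with a plain Edmonds-Karp max-flow computation. It is not the thesis's method: the unit capacity per hyperlink, the choice of one seed page as the source and one known external page as the sink, and the names `undirected` and `min_cut_source_side` are all assumptions made for this example.

```python
from collections import deque


def undirected(edges):
    """Build a capacity map with capacity 1 in both directions per link (assumption)."""
    cap = {}
    for u, v in edges:
        cap.setdefault(u, {})[v] = 1
        cap.setdefault(v, {})[u] = 1
    return cap


def min_cut_source_side(capacity, source, sink):
    """Return the nodes on the source side of a minimum source-sink cut."""
    # Residual capacities; every edge gets a reverse edge (0 if absent).
    residual = {u: dict(nbrs) for u, nbrs in capacity.items()}
    for u, nbrs in capacity.items():
        for v in nbrs:
            residual.setdefault(v, {}).setdefault(u, 0)

    def bfs_parents():
        # Breadth-first search over edges that still have spare capacity.
        parents = {source: None}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v, cap in residual[u].items():
                if cap > 0 and v not in parents:
                    parents[v] = u
                    queue.append(v)
        return parents

    while True:
        parents = bfs_parents()
        if sink not in parents:
            # No augmenting path left: the reachable nodes form the source side.
            return set(parents)
        # Find the bottleneck capacity along the path found by the search.
        bottleneck = float("inf")
        v = sink
        while parents[v] is not None:
            u = parents[v]
            bottleneck = min(bottleneck, residual[u][v])
            v = u
        # Push that much flow along the path.
        v = sink
        while parents[v] is not None:
            u = parents[v]
            residual[u][v] -= bottleneck
            residual[v][u] += bottleneck
            v = u


# Toy graph: pages a, b, c link densely to each other; c has one link out to x.
toy_links = [("a", "b"), ("a", "c"), ("b", "c"),
             ("c", "x"), ("x", "y"), ("x", "z"), ("z", "y")]
boundary = min_cut_source_side(undirected(toy_links), "a", "y")
```

In this toy graph the single link between `c` and `x` is the cheapest cut, so the seed's side of the cut is the densely linked group of pages. A static approach of this kind needs the whole graph up front, which is the cost the abstract contrasts with the dynamic context.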

    Incidence of sight-threatening diabetic retinopathy in an established urban screening programme: An 11-year cohort study

    Aims: systematic annual screening to detect sight-threatening diabetic retinopathy (STDR) is established in the United Kingdom. We designed an observational cohort study to provide up-to-date data for policy makers and clinical researchers on incidence of key screening endpoints in people with diabetes attending one screening programme running for over 30 years. Methods: all people with diabetes aged ≥12 years registered with general practices in the Liverpool health district were offered inclusion. Data sources comprised: primary care (demographics, systemic risk factors), Liverpool Diabetes Eye Screening Programme (retinopathy grading), Hospital Eye Services (slit lamp biomicroscopy assessment of screen positives). Results: 133,366 screening episodes occurred in 28,384 people over 11 years. Overall incidences were: screen positive 6.7% (95% CI 6.5–6.8), screen positive for retinopathy 3.1% (3.0–3.1), unassessable images 2.6% (2.5–2.7), other significant eye diseases 1.0% (1.0–1.1). 1.6% (1.6–1.7) had sight-threatening retinopathy confirmed by slit lamp biomicroscopy. The annual incidence of screen positive and screen positive for retinopathy showed consistent declines from 8.8%–10.6% and 4.4%–4.6% in 2007/09 to 4.4%–6.8% and 2.3%–2.9% in 2013/17, respectively. Rates of STDR (true positive) were consistently below 2% after 2008/09. Screen positive rates were higher in first time attenders (9.9% [9.4–10.2] vs. 6.1% [6.0–6.2]) in part due to ungradeable images (4.1% vs. 2.3%) and other eye disease (2.4% vs. 0.8%). 4.5% (3.9–5.2) of previous non-attenders had sight-threatening retinopathy. Compared with people with type 2 diabetes, those with type 1 disease demonstrated higher rates of screen positive (11.9% vs. 6.0%) and STDR (6.4% vs. 1.2%). Overall prevalence of any retinopathy was 27.2% (27.0–27.4). Conclusions: in an established screening programme with a stable population, screen positive rates show a consistent fall over time to a low level. 
Of those who are screen positive, fewer than 50% are screen positive for diabetic retinopathy. Most are due to sight-threatening maculopathy. The annual incidence of STDR is under 2%, suggesting future work on redefining screen positive and supporting extended intervals for people at low risk. Higher rates of screen positive and STDR are seen in first time attenders. Those who have never attended for screening should be specifically targeted.

    A Dynamic Approach To The Website Boundary Detection Problem Using Random Walks

    Abstract: This paper presents an investigation into the Website Boundary Detection (WBD) problem in the dynamic context. In the dynamic context (as opposed to the static context), the web data to be considered is not fully available before the website boundary detection process starts. The dynamic approaches presented in this paper are all probabilistic and based on the concept of random walks; three variations are considered: (i) the standard Random Walk (RW), (ii) a Self-Avoiding RW and (iii) the Metropolis-Hastings RW. The reported evaluation demonstrates that the proposed technique produces good WBD solutions while at the same time reducing the number of "noise" pages visited. The best-performing variation was found to be the Metropolis-Hastings RW.
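
To make the random-walk idea concrete, here is a minimal sketch of a generic Metropolis-Hastings random walk over a graph whose links are only discovered when a page is fetched. It is the textbook degree-corrected walk, not necessarily the variant evaluated in the paper; `fetch_links` is a hypothetical callback standing in for a crawler, and the toy graph is invented for the example.

```python
import random


def metropolis_hastings_walk(fetch_links, seed, steps, rng=None):
    """Return visit counts per page after `steps` Metropolis-Hastings moves."""
    rng = rng or random.Random()
    cache = {}  # links of pages fetched so far: the only part of the graph known

    def links(page):
        if page not in cache:
            cache[page] = list(fetch_links(page))
        return cache[page]

    visits = {seed: 1}
    current = seed
    for _ in range(steps):
        neighbours = links(current)
        if not neighbours:
            break  # dead end: nowhere to move
        candidate = rng.choice(neighbours)
        cand_degree = len(links(candidate))
        # Accept with probability min(1, deg(current) / deg(candidate));
        # otherwise stay on the current page for this step.
        if cand_degree > 0 and rng.random() < min(1.0, len(neighbours) / cand_degree):
            current = candidate
        visits[current] = visits.get(current, 0) + 1
    return visits


# Hypothetical toy graph with symmetric links, used in place of real page fetches.
toy_graph = {
    "a": ["b", "c"],
    "b": ["a", "c"],
    "c": ["a", "b", "d"],
    "d": ["c"],
}
counts = metropolis_hastings_walk(lambda p: toy_graph[p], "a", 200, random.Random(0))
```

The acceptance rule removes the plain random walk's bias towards high-degree pages, at the price of fetching each candidate page in order to learn its degree.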

    Individualised variable-interval risk-based screening in diabetic retinopathy: the ISDR research programme including RCT

    Background Systematic annual screening for sight-threatening diabetic retinopathy is established in several countries but is resource intensive. Personalised (individualised) medicine offers the opportunity to extend screening intervals for people at low risk of progression and to target high-risk groups. However, significant concern exists among all stakeholders around the safety of changing programmes. Evidence to guide decisions is limited, with, to the best of our knowledge, no randomised controlled trials to date. Objectives To develop an individualised approach to screening for sight-threatening diabetic retinopathy and test its acceptability, safety, efficacy and cost-effectiveness. To estimate the changing incidence of patient-centred outcomes. Design A risk calculation engine; a randomised controlled trial, including a within-trial cost-effectiveness study; a qualitative acceptability study; and an observational epidemiological cohort study were developed. A patient and public group was involved in design and interpretation. Setting A screening programme in an English health district of around 450,000 people. Participants People with diabetes aged ≥ 12 years registered with primary care practices in Liverpool. Interventions The risk calculation engine estimated each participant’s risk at each visit of progression to screen-positive diabetic retinopathy (individualised intervention group) and allocated their next appointment at 6, 12 or 24 months (high, medium or low risk, respectively). Main outcome measures The randomised controlled trial primary outcome was attendance at first follow-up assessing the safety of individualised compared with usual screening. Secondary outcomes were overall attendance, rates of screen-positive and sight-threatening diabetic retinopathy, and measures of visual impairment. Cost-effectiveness outcomes were cost/quality-adjusted life year and incremental cost savings. 
Cohort study outcomes were rates of screen-positive diabetic retinopathy and sight-threatening diabetic retinopathy. Data sources Local screening programme (retinopathy), primary care (demographic, clinical) and hospital outcomes. Methods A seven-person patient and public involvement group was recruited. Data were linked into a purpose-built dynamic data warehouse. In the risk assessment, the risk calculation engine used patient-embedded covariate data, a continuous Markov model, 5-year historical local population data, and most recent individual demographic, retina and clinical data to predict risk of future progression to screen-positive. The randomised controlled trial was a masked, two-arm, parallel assignment, equivalence randomised controlled trial, with an independent trials unit and 1 : 1 allocation to individualised screening (6, 12 or 24 months, determined by risk calculation engine at each visit) or annual screening (control). Cost-effectiveness was assessed using a within-trial analysis over a 2-year time horizon, including NHS and societal perspectives and costs directly observed within the randomised controlled trial. Acceptability was assessed by purposive sampling of 60 people with diabetes and 21 healthcare professionals with semistructured interviews analysed thematically; this was a constant comparative method until saturation. The cohort was an 11-year retrospective/prospective screening population data set. Results In the randomised controlled trial, 4534 participants were randomised: 2097 out of 2265 in the individualised arm (92.6%) and 2224 out of 2269 in the control arm (98.0%) remained after withdrawals. Attendance rates at first follow-up were equivalent (individualised 83.6%, control 84.7%) (difference –1.0%, 95% confidence interval –3.2% to 1.2%). Sight-threatening diabetic retinopathy detection rates were non-inferior: individualised 1.4%, control 1.7% (difference –0.3%, 95% confidence interval –1.1% to 0.5%). 
In the cost-effectiveness analysis, the mean differences in complete-case quality-adjusted life years (EuroQol-5 Dimensions, five-level version, and Health Utilities Index Mark 3) did not significantly differ from zero. Incremental cost savings per person, not including treatment costs, were £17.34 (confidence interval £17.02 to £17.67) from the NHS perspective and £23.11 (confidence interval £22.73 to £23.53) from the societal perspective. In the individualised arm, 43.2% fewer screening appointments were required. In terms of acceptability, changing to variable intervals was acceptable for the majority of people with diabetes and healthcare professionals. Annual screening was perceived as unsustainable and an inefficient use of resources. Many people with diabetes and healthcare professionals expressed concerns that 2-year screening intervals may detect referable eye disease too late and might have a negative effect on perceptions about the importance of attendance and diabetes care. The 6-month interval was perceived positively. Among people with diabetes, there was considerable misunderstanding about eye-related appointments and care. In the cohort study, the numbers of participants (total 28,384) rose over the 11 years (2006/7, n = 6637; 2016/17, n = 14,864). Annual incidences ranged as follows: screen-positive 4.4–10.6%, due to diabetic retinopathy 2.3–4.6% and sight-threatening diabetic retinopathy 1.3–2.2%. The proportions of screen-positive fell steadily but sight-threatening diabetic retinopathy rates remained stable. Limitations Our findings apply to a single city-wide established English screening programme of mostly white people with diabetes. The cost-effectiveness analysis was over a short timeline for a long-standing disease; the study, however, was designed to test the safety and effectiveness of the screening regimen, not the cost-effectiveness of screening compared with no screening. 
Cohort data collection was partly retrospective: data were unavailable on people who had developed sight-threatening diabetic retinopathy or died prior to 2013. Conclusions Our randomised controlled trial can reassure stakeholders involved in diabetes care that extended intervals and personalised screening is feasible, where data linkage is possible, and can be safely introduced in established screening programmes with potential cost savings compared with annual screening. Rates of screen-positive diabetic retinopathy and sight-threatening diabetic retinopathy are low and show consistent falls over time. Involvement of patients in research is crucial to success. Future work Future work could include external validation with other programmes followed by scale-up of individualised screening outside a research setting and economic modelling beyond the 2-year time horizon. Trial registration This trial is registered as ISRCTN87561257. Funding This project was funded by the National Institute for Health and Care Research (NIHR) Programme Grants for Applied Research programme and will be published in full in Programme Grants for Applied Research; Vol. 11, No. 6. See the NIHR Journals Library website for further project information. Plain language summary Diabetic retinopathy remains a leading cause of vision loss for people with diabetes. Annual photographic screening allows early detection and prompt treatment in many countries. A new approach is to vary how often this is undertaken at each visit, after calculating each person’s risk of progression; we have called this ‘individualised’ screening. We tested this variable-interval risk-based approach against annual screening in a randomised controlled trial of 4534 people in Liverpool. Six-, 12- or 24-month intervals represented high, medium and low risks, respectively, of becoming screen-positive by the next appointment. 
With our patient and public involvement group, we designed a computer-based risk-calculation system using personal information from screening, general practitioners and hospitals. Attendance rates at the next screening appointment were similar in the individualised (84%) and annual (control) (85%) groups. Similar amounts of sight-threatening diabetic retinopathy were detected in the two arms (1.4% individualised, 1.7% control), with 43% fewer visits. In this study in a single geographical region, savings estimated to accrue to the NHS per person over 2 years were £17.34 and £23.11 for wider society, but estimates did not include treatment costs. During interviews, 60 people with diabetes and 21 healthcare professionals said that individualised screening would be acceptable because of increasing rates of diabetes and the chance to target high-risk people. Patients had anxieties about the reliability of the risk-calculation and restricting access, requesting opportunities for earlier screening if risks changed. Varying screening intervals based on a person’s own risk of progression appears feasible and safe. Low-risk people would be spared unnecessary appointments. Introducing an individualised approach could now move to wider testing and validation in other UK and international settings. The trial only ran for 2 years and was in a long-established programme with low rates of disease, so monitoring of attendance and retinopathy rates should be included as part of wider testing. Scientific summary Background Diabetic retinopathy remains the most common cause of visual impairment in working populations worldwide. Systematic annual screening to detect and treat sight-threatening diabetic retinopathy (STDR) is established in several countries including the United Kingdom (UK) and has greatly improved the detection of treatable disease. With the rapid increase in the prevalence of diabetes, new approaches to screening are required. 
Risk stratification, personalised (individualised) medicine, and clinical bioinformatics offer opportunities to extend the screening interval for people at low risk of progression and to target those at high risk. However there is significant concern among people with diabetes, healthcare professionals and health commissioners in the UK around the safety and acceptability of changing an established and successful programme. To ensure that health care is available for all it is imperative that it is not only efficacious but cost-efficient, that is, it delivers care at an affordable cost for commissioners. Evidence to guide these decisions is very limited and there are no randomised controlled trials (RCTs). Much of the data on prevalence and incidence of diabetic retinopathy and progression to more severe stages of disease come from the 1980s to 1990s and before the introduction of systematic screening and improvements in diabetes and blood pressure control. Data on the current progression rates in screening populations are limited and are reported variably. We conducted a programme of quantitative and qualitative research to develop and test the acceptability, safety, efficacy and cost-effectiveness of an individualised approach to screening for STDR. To provide data for future planning of early detection programmes and the design of future research studies we designed a longitudinal observational cohort study of the population of people with diabetes in Liverpool. We aimed to investigate the prevalence and incidence of the stages in the natural history of diabetic retinopathy, namely screen-positive, sight-threatening and treatable disease, and visual impairment, all key patient-centred outcomes in the disease. Changes to the programme and their implications A number of changes to the programme occurred in the early years. A systematic review was published at around the time of the programme start. 
We used these data to design aspects of the study and conducted a literature review instead. Data for the cohort study proved not to be available on people who had died prior to programme start. We adopted a mixed retrospective and prospective analysis but this added some difficulty in interpreting the findings. In 2016 resources were repurposed from the cohort study to support recruitment of the RCT, which was behind schedule. The scope of the cohort study was narrowed to focus only on the data available from the screening programme, meaning that we were unable to complete data collection on the patient-centred outcomes after sight-threatening retinopathy had been confirmed, namely treatment and visual impairment. Recruitment of non-attenders proved beyond the resources available and was abandoned. Further research is needed to address these unanswered questions. Methods The programme included the development of a bespoke data warehouse, the development of a risk-calculation engine, a RCT, a within-trial cost-effectiveness study, a qualitative study of acceptability, and an observational cohort study. We recruited and fully embedded a seven-person patient and public involvement group which was maintained throughout the programme. The setting was the screening programme of a single English health district (clinical commissioning group). All people with diabetes aged 12 years and over registered with general practices in the Liverpool city area were invited by letter to participate in the research programme. Consent was through an opt-out process. A purpose-built dynamic data warehouse was developed to support all aspects of the programme including bespoke processes addressing inconsistency in routinely collected data, multiple data platforms and lack of technical manuals. Data on eligible participants were sourced from primary care (demographic, clinical), Liverpool Diabetes Eye Screening Programme (retinopathy) and hospital outcomes (true screen-positive, STDR). 
Data were cross referenced against opt-out records before further use within the programme. A risk-calculation engine was developed for the assessment of individual risk of progressing to screen-positive using patient-focused covariate selection, a continuous-time Markov model, 5-year historical local population data, and the most recent individual demographic, retina and clinical data. The risk-calculation engine was linked to the data warehouse. The Patient and Public Involvement group determined the risk criterion of < 2.5% and screening intervals of 6, 12 and 24 months. These were set as high-, medium- and low-risk for progression to screen-positive respectively. We conducted a masked, two-arm, parallel assignment, equivalence, RCT with design input, monitoring and analysis by an independent trial unit. Recruitment took place at screening clinics with invitations given to all attenders. After informed (opt-in) consent, participants were allocated 1 : 1 to either annual (control) or variable-interval risk-based screening (individualised, intervention arm) at 6, 12 or 24 months determined by the risk-calculation engine. At each visit the risk was recalculated and the interval reassigned. Our primary hypothesis was that attendance rates at first follow-up (primary outcome) would be equivalent in the two arms with a 5% equivalence margin. We allowed up to 90 days for attendance after invitation. The estimated minimum sample size was 4460 for 90% statistical power, a 2.5% one-sided type 1 error, assuming the same attendance rate in both arms and allowing for 6% per annum loss over 24 months. Our secondary hypothesis was that detection of STDR was non-inferior in the individualised arm at a prespecified margin of 1.5%. Our analyses followed a per-protocol approach supported by secondary intention-to-treat and multiple imputation analyses. 
Other secondary outcomes measured efficacy (rates of screen-positive and STDR) and cost-effectiveness [cost/quality-adjusted life year (QALY), incremental cost savings]. A sample (n = 868) of the first participants enrolled into the RCT completed a set of health economics questionnaires. These included a detailed health resource use questionnaire and quality-of-life measures (EQ5D-5L and HUI3). We measured the current cost of screening and the cost of non-attendance using a detailed micro-costing approach to establish the NHS cost of diabetic retinopathy screening. The analysis was within the trial. Clinic costs were observed for each service delivery setting across a sample of clinics. Costs were also directly measured from the patient-based trial case record forms (CRF) and detailed NHS and hospital costs for areas such as photography and grading. As such this was a mixed-methods approach to obtain an accurate picture of screening costs. We developed a bespoke questionnaire to measure participant and companion costs and undertook a detailed workplace analysis to measure resources and staff time. We estimated the additional costs of running the risk-calculation engine using a screen population size of 22,000 (Liverpool). The within-trial cost-effectiveness of the options was investigated over a 2-year time horizon assessed as cost per QALY for NHS and societal costs using the EQ5D-5L and HUI3 instruments at baseline and follow-up visits including ingredient-based costing and patient resource-use data. Incremental cost savings per person screened were calculated. Multiple imputation was conducted with chained equations using available case data. For the qualitative study, semi-structured interviews were used with 60 people with diabetes (30 before and 30 after introduction of individualised screening) and 21 healthcare professional participants involved in eye screening identified using purposive sampling. 
Recruitment was by specific letter and included an additional ‘opt-in’ consent process. Interview data were analysed thematically using the constant comparative method until saturation. The observational cohort study was conducted on data in the data warehouse listed above after exclusion of people who opted out. The values for time-dependent variables closest to the screening episode were used. Incidences of screen-positive, screen-positive due to diabetic retinopathy and STDR in the population undergoing active screening were estimated. STDR was determined from hospital records of slit lamp examinations within 90 days of the screening episode. Data were corrected for the effects of censoring prior to the commencement of the programme in 2013 and recruitment to the individualised arm of the RCT. Reporting followed CONSORT 2016 guidelines for non-inferiority and equivalence trials, and CHEERS, COREQ and STROBE guidelines. Results Data warehouse: Accessing data from multiple platforms was challenging due to poor or absent documentation. Quality of data entry was highly variable, requiring bespoke solutions. Data were available by the end of the programme (January 2020) on 28,384 individuals with 318,053,075 data items. RCT: 4534 participants were randomised: 2097/2265 (individualised arm) and 2224/2269 (control arm) remained after withdrawals. Attendance rates at first follow-up were equivalent (control 84.7%, individualised 83.6%) (difference –1.0, 95% confidence interval –3.2 to 1.2). Within the individualised arm equivalence was found for the low-risk group [control 85.7%, individualised 85.1%, difference –0.6% (–2.9 to 1.7)]. For the medium-risk group the difference in attendance was small but equivalence was not confirmed due to the relatively wide confidence interval. In the high-risk group attendance was lower in the individualised group [control 77.3%, individualised 72.3%, difference –5.0% (–13.6 to 3.5)]. 
STDR detection rates were non-inferior: individualised 1.4%, control 1.7% (–0.3, –1.1 to 0.5). Sensitivity analyses confirmed these findings. No clinically significant worsening of diabetes control was detected and no effect on rates of visual acuity or visual impairment. 43.2% fewer screening appointments were required in the individualised arm. Within the individualised arm the high-risk group had the highest screen-positive rate [high 10.72% (34/317), medium 6.02% (15/249), low 3.7% (53/1442)]. The risk-calculation engine and data warehouse were stable. Cost-effectiveness study: Summary costs (2019/2020 values) associated with the screening programme to the NHS were £28.73 per attendance and £12.73 per non-attendance, while additional productivity losses and out-of-pocket payments by the patient accounted for £9.00. There was dominance of individualised screening in terms of QALYs gained (non-significant) and cost savings. Mean differences in complete case QALYs did not significantly differ from zero. Incremental cost savings per person (not including treatment costs) were: £17.34 (17.02 to 17.67), NHS perspective; £23.11 (22.73 to 23.53), societal perspective. Qualitative study: For the majority of both people with diabetes and healthcare professionals changing to variable intervals was perceived as acceptable. Annual screening was perceived as unsustainable against the increasing diabetes prevalence and to be an inefficient use of resources. Many people with diabetes and healthcare professionals expressed concerns that 2-year screening intervals might detect referable disease too late and would have a negative effect upon perceptions about the importance of attendance and diabetes care. The 6-month interval for the high-risk group was perceived positively as medical reassurance. Among people with diabetes, there was considerable conflation and misunderstanding about different eye-related appointments and care. 
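
The equivalence and non-inferiority statements above can be read as simple interval checks against the margins given in the Methods (5% for attendance, 1.5% for STDR detection). The sketch below shows that standard confidence-interval reading using only the intervals reported in this summary; it is not the trial's analysis code, the function names are invented for the example, and applying the 5% margin to the risk subgroups is an assumption.

```python
def is_equivalent(ci_low, ci_high, margin):
    """Equivalence: the whole confidence interval lies inside (-margin, +margin)."""
    return -margin < ci_low and ci_high < margin


def is_non_inferior(ci_low, margin):
    """Non-inferiority: the lower confidence bound stays above -margin."""
    return ci_low > -margin


# Differences are individualised minus control, in percentage points.
attendance_overall = is_equivalent(-3.2, 1.2, margin=5.0)     # reported as equivalent
attendance_low_risk = is_equivalent(-2.9, 1.7, margin=5.0)    # reported as equivalent
attendance_high_risk = is_equivalent(-13.6, 3.5, margin=5.0)  # interval crosses -5
stdr_detection = is_non_inferior(-1.1, margin=1.5)            # reported as non-inferior
```

Under these assumptions the overall and low-risk attendance intervals sit inside the margin, the high-risk interval does not, and the STDR lower bound of –1.1 stays above –1.5.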
Observational cohort study: Numbers of participants rose from 6637 (2006/7) to 14,796 (2016/17). After exclusions/ineligibility (opt-out was 7.1% over the programme) data from 28,384 PWD were available for analysis. Annual incidences (%) in the screened population were: screen-positive [4.4–10.6, due to diabetic retinopathy (2.3–4.6)], STDR (1.3–2.2). Rates of screen-positive and screen-positive from diabetic retinopathy dropped over the 11 years. STDR remained stable with 53.8% being true-positive. R