    Web Content Extraction - a Meta-Analysis of its Past and Thoughts on its Future

    In this paper, we present a meta-analysis of several Web content extraction algorithms and make recommendations for the future of content extraction on the Web. First, we find that nearly all Web content extractors do not consider a very large, and growing, portion of modern Web pages. Second, it is well understood that wrapper induction extractors tend to break as the Web changes; heuristic/feature engineering extractors were thought to be immune to a Web site's evolution, but we find that this is not the case: heuristic content extractor performance also tends to degrade over time due to the evolution of Web site forms and practices. We conclude with recommendations for future work that address these and other findings. Comment: Accepted for publication in SIGKDD Explorations.

    ArchiveSpark: Efficient Web Archive Access, Extraction and Derivation

    Web archives are a valuable resource for researchers in various disciplines. However, to use them as a scholarly source, researchers require a tool that provides efficient access to Web archive data for the extraction and derivation of smaller datasets. Besides efficient access, we identify five other objectives based on practical researcher needs, such as ease of use, extensibility and reusability. Towards these objectives, we propose ArchiveSpark, a framework for efficient, distributed Web archive processing that builds a research corpus by working on existing, standardized data formats commonly held by Web archiving institutions. Performance optimizations in ArchiveSpark, facilitated by the use of a widely available metadata index, result in significant speed-ups of data processing. Our benchmarks show that ArchiveSpark is faster than alternative approaches without depending on any additional data stores, while improving usability by seamlessly integrating queries and derivations with external tools. Comment: JCDL 2016, Newark, NJ, US

    Digital support interventions for the self-management of low back pain: a systematic review

    Background: Low back pain (LBP) is a common cause of disability and is ranked as the most burdensome health condition globally. Self-management, including components on increased knowledge, monitoring of symptoms, and physical activity, is consistently recommended in clinical guidelines as a cost-effective strategy for LBP management, and there is increasing interest in the potential role of digital health. Objective: The study aimed to synthesize and critically appraise published evidence concerning the use of interactive digital interventions to support self-management of LBP. The following specific questions were examined: (1) What are the key components of digital self-management interventions for LBP, including theoretical underpinnings? (2) What outcome measures have been used in randomized trials of digital self-management interventions in LBP, and what effect, if any, did the intervention have on these? and (3) What specific characteristics or components, if any, of interventions appear to be associated with beneficial outcomes? Methods: Bibliographic databases searched from 2000 to March 2016 included Medline, Embase, CINAHL, PsycINFO, Cochrane Library, DoPHER and TRoPHI, Social Science Citation Index, and Science Citation Index. Reference and citation searching was also undertaken. The search strategy combined the following concepts: (1) back pain, (2) digital intervention, and (3) self-management. Only randomized controlled trial (RCT) protocols or completed RCTs involving adults with LBP published in peer-reviewed journals were included. Two reviewers independently screened titles, abstracts and full-text articles, extracted data, and assessed risk of bias using the Cochrane risk of bias tool. An independent third reviewer adjudicated disagreements. Data were synthesized narratively. Results: Of the 7014 references identified, 11 were included, describing 9 studies: 6 completed RCTs and 3 protocols for future RCTs.
The completed RCTs included a total of 2706 participants (range 114 to 1343 participants per study) and varied considerably in the nature and delivery of the interventions, the duration/definition of LBP, the outcomes measured, and the effectiveness of the interventions. Participants were generally white and middle-aged; in 5 of the 6 RCT reports, the majority were female and most reported an educational level of some college or higher. Only one study reported between-group differences in favor of the digital intervention. There was considerable variation in the extent of reporting of the characteristics, components, and theories underpinning each intervention. None of the studies showed evidence of harm. Conclusions: The literature is extremely heterogeneous, making it difficult to understand what might work best, for whom, and in what circumstances. Participants were predominantly female, white, well educated, and middle-aged, and thus the wider applicability of digital self-management interventions remains uncertain. No information on cost-effectiveness was reported. The evidence base for interactive digital interventions to support patient self-management of LBP remains weak.

    Semantic learning webs

    By 2020, microprocessors will likely be as cheap and plentiful as scrap paper, scattered by the millions into the environment, allowing us to place intelligent systems everywhere. This will change everything around us, including the nature of commerce, the wealth of nations, and the way we communicate, work, play, and live. This will give us smart homes, cars, TVs, jewellery, and money. We will speak to our appliances, and they will speak back. Scientists also expect the Internet will wire up the entire planet and evolve into a membrane consisting of millions of computer networks, creating an “intelligent planet.” The Internet will eventually become a “Magic Mirror” that appears in fairy tales, able to speak with the wisdom of the human race. (Michio Kaku, Visions: How Science Will Revolutionize the Twenty-First Century, 1998)
    If the semantic web needed a symbol, a good one to use would be a Navaho dream-catcher: a small web, lovingly hand-crafted, [easy] to look at, and rumored to catch dreams; but really more of a symbol than a reality. (Pat Hayes, Catching the Dreams, 2002)
    Though it is almost impossible to envisage what the Web will be like by the end of the next decade, we can say with some certainty that it will have continued its seemingly unstoppable growth. Given the investment of time and money in the Semantic Web (Berners-Lee et al., 2001), we can also be sure that some form of semanticization will have taken place. This might be superficial, accomplished simply through the addition of loose forms of meta-data mark-up, or more principled, grounded in ontologies and formalised by means of emerging semantic web standards such as RDF (Lassila and Swick, 1999) or OWL (McGuinness and van Harmelen, 2003). Whatever the case, the addition of semantic mark-up will make at least part of the Web more readily accessible to humans and their software agents and will facilitate agent interoperability.
If current research is successful, there will also be a plethora of e-learning platforms making use of a varied menu of reusable educational material or learning objects. For the learner, the semanticized Web will, in addition, offer rich seams of diverse learning resources over and above the course materials (or learning objects) specified by course designers. For instance, the annotation registries, which provide access to marked-up resources, will enable more focussed, ontologically guided (or semantic) search. This much is already in development. But we can go much further. Semantic technologies make it possible not only to reason about the Web as if it were one extended knowledge base but also to provide a range of additional educational semantic web services, such as summarization, interpretation or sense-making, structure visualization, and support for argumentation.

    The clinical effectiveness of individual behaviour change interventions to reduce risky sexual behaviour after a negative human immunodeficiency virus test in men who have sex with men: systematic and realist reviews and intervention development

    Background: Men who have sex with men (MSM) experience significant inequalities in health and well-being. They are the group in the UK at the highest risk of acquiring a human immunodeficiency virus (HIV) infection. Guidance relating to both HIV infection prevention, in general, and individual-level behaviour change interventions, in particular, is very limited. Objectives: To conduct an evidence synthesis of the clinical effectiveness of behaviour change interventions to reduce risky sexual behaviour among MSM after a negative HIV infection test. To identify intervention components that are effective in reducing HIV risk-related behaviours, and to develop a candidate intervention. To host expert events addressing the implementation and optimisation of a candidate intervention. Data sources: All major electronic databases (British Education Index, BioMed Central, Cumulative Index to Nursing and Allied Health Literature, EMBASE, Educational Resource Index and Abstracts, Health and Medical Complete, MEDLINE, PsycARTICLES, PsycINFO, PubMed and Social Science Citation Index) were searched between January 2000 and December 2014. Review methods: A systematic review of the clinical effectiveness of individual behaviour change interventions was conducted. Interventions were examined using the behaviour change technique (BCT) taxonomy, theory coding assessment, mode of delivery and proximity to HIV infection testing. Data were summarised narratively and, when appropriate, meta-analysed. Supplemental analyses for the development of the candidate intervention focused on a post hoc realist review, the assessment of the sequential delivery and content of intervention components, and the social and historical context of primary studies. Expert panels reviewed the candidate intervention for issues of implementation and optimisation.
Results: Overall, trials included in this review (n = 10) demonstrated that individual-level behaviour change interventions are effective in reducing key HIV infection risk-related behaviours. However, there was considerable clinical and methodological heterogeneity among the trials. Exploratory meta-analysis showed a statistically significant reduction in behaviours associated with high risk of HIV transmission (risk ratio 0.75, 95% confidence interval 0.62 to 0.91). Additional stratified analyses suggested that effectiveness may be enhanced through face-to-face contact immediately after testing, and that theory-based content and BCTs drawn from the ‘goals and planning’ and ‘identity’ groups are important. All evidence collated in the review was synthesised to develop a candidate intervention. Experts highlighted the overall acceptability of the intervention and outlined key ways that the candidate intervention could be optimised to enhance UK implementation. Limitations: There was a limited number of primary studies. All were from outside the UK and were subject to considerable clinical, methodological and statistical heterogeneity. The findings of the meta-analysis must therefore be treated with caution. The lack of detailed intervention manuals limited the assessment of intervention content, delivery and fidelity. Conclusions: The evidence suggests that individual behaviour change interventions are effective in changing behaviour associated with HIV transmission. Exploratory stratified meta-analyses suggested that interventions should be delivered face to face and immediately after testing. There are uncertainties around the generalisability of these findings to the UK setting. However, UK experts found the intervention acceptable and provided ways of optimising the candidate intervention.
Future work: There is a need for well-designed, UK-based trials of individual behaviour change interventions that clearly articulate intervention content and demonstrate intervention fidelity.