Document boundary determination using structural and lexical analysis
This thesis proposes a method for determining the boundaries of sequentially presented documents, using parallel analyses drawn from structural document understanding and information retrieval. Specifically, the method is intended to serve as a trainable system for determining where one document ends and another begins. Content analysis methods include the Vector Space Model, as well as targeted analysis of content at the margins of document fragments. Structural analysis in this implementation is limited to simple, ubiquitous entities such as software-generated zones, simple format-specific lines, and the appearance of page numbers. The analysis focuses on changes in similarity between comparisons, with emphasis on the fact that the extremities of documents tend to contain significant structural and lexical changes that can be observed and quantified. We combine the various features using nonlinear approximation (a neural network) and experimentally test the usefulness of the combinations.
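To make the lexical signal concrete, the sketch below (an illustration, not the thesis's implementation) represents consecutive page fragments in a TF-IDF vector space and flags a candidate boundary wherever cosine similarity between neighbouring fragments drops below a threshold; the `drop_threshold` value and the TF-IDF weighting are assumptions made for the demo.

```python
# Minimal sketch of the lexical boundary signal; the thesis combines such
# signals with structural features (zones, format lines, page numbers) as
# inputs to a trainable neural network.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def boundary_candidates(fragments, drop_threshold=0.15):
    """Return indices i at which a new document may begin at fragments[i]."""
    tfidf = TfidfVectorizer(stop_words="english").fit_transform(fragments)
    candidates = []
    for i in range(1, len(fragments)):
        # Similarity between neighbouring fragments in the vector space;
        # document extremities tend to show a sharp lexical change.
        sim = cosine_similarity(tfidf[i - 1], tfidf[i])[0, 0]
        if sim < drop_threshold:
            candidates.append(i)
    return candidates

pages = ["minutes of the board meeting on the annual budget",
         "the board approved the budget after discussion",
         "invoice number 1187 for office supplies and postage"]
print(boundary_candidates(pages))  # e.g. [2] if page 3 starts a new document
```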
Disrupting White Fragility and Colorblind Racism: Using Games to Measure How Race and Ethnicity Courses Change Students’ Racial Ideologies
This research provides instructors teaching race and ethnicity with a tool to assess the racial ideologies of their students in the form of “race talk.” In particular, Bonilla-Silva’s (2010) concepts denoting colorblindness and DiAngelo’s (2018) concept of white fragility were measured before and after completion of one race and ethnicity course by having students play a live version of the game “Guess Who” (Hasbro Co.). At the end of the course, student responses during the game, and their subsequent reflections, revealed a significant decrease in white fragility. Using this game, instructors can assess students’ racial ideologies and whether they have acquired an improved understanding of systemic inequalities by analyzing changes in students’ race talk.
Quantifying Benthic Exchange of Fine Sediment via Continuous, Noninvasive Measurements of Settling Velocity and Bed Erodibility
Virginia Institute of Marine Science
Evidence for models of diagnostic service provision in the community: literature mapping exercise and focused rapid reviews
Background
Current NHS policy favours the expansion of diagnostic testing services in community and primary care settings.
Objectives
Our objectives were to identify current models of community diagnostic services in the UK and internationally, and to assess the evidence for the quality, safety and clinical effectiveness of such services. We were also interested in whether there is any evidence to support providing a broader range of diagnostic tests in the community.
Review methods
We performed an initial broad literature mapping exercise to assess the quantity and nature of the published research evidence. The results were used to inform selection of three areas for investigation in more detail. We chose to perform focused reviews on logistics of diagnostic modalities in primary care (because the relevant issues differ widely between different types of test); diagnostic ultrasound (a key diagnostic technology affected by developments in equipment); and a diagnostic pathway (assessment of breathlessness) typically delivered wholly or partly in primary care/community settings. Databases and other sources searched, and search dates, were decided individually for each review. Quantitative and qualitative systematic reviews and primary studies of any design were eligible for inclusion.
Results
We identified seven main models of service delivered in primary care/community settings, in most cases with the possible involvement of community/primary care staff. Not all of these models are relevant to all types of diagnostic test. Overall, the evidence base for community- and primary care-based diagnostic services was limited, with very few controlled studies comparing different models of service. We found evidence from different settings that these services can reduce referrals to secondary care and allow more patients to be managed in primary care, but the quality of the research was generally poor. Evidence on the quality (including diagnostic accuracy and appropriateness of test ordering) and safety of such services was mixed.
Conclusions
In the absence of clear evidence of superior clinical effectiveness and cost-effectiveness, the expansion of community-based services appears to be driven by other factors. These include policies to encourage moving services out of hospitals; the promise of reduced waiting times for diagnosis; the availability of a wider range of suitable tests and/or cheaper, more user-friendly equipment; and the ability of commercial providers to bid for NHS contracts. However, service development also faces a number of barriers, including issues related to staffing, training, governance and quality control.
Limitations
We have not attempted to cover all types of diagnostic technology in equal depth. Time and staff resources constrained our ability to carry out review processes in duplicate. Research in this field is limited by the difficulty of obtaining, from publicly available sources, up-to-date information about what models of service are commissioned, where and from which providers.
Future work
There is a need for research to compare the outcomes of different service models using robust study designs. Comparisons of ‘true’ community-based services with secondary care-based open-access services and rapid access clinics would be particularly valuable. There are specific needs for economic evaluations and for studies that incorporate effects on the wider health system. There appears to be no easy way of identifying what services are being commissioned from whom and keeping up with local evaluations of new services, suggesting a need to improve the availability of information in this area.
Funding
The National Institute for Health Research Health Services and Delivery Research programme.
Template Induction over Unstructured Email Corpora
Unsupervised template induction over email data is a central component in applications such as information extraction, document classification, and auto-reply. The benefits of automatically generating such templates are well known for structured data, e.g., machine-generated HTML emails; much less work, however, has addressed the same task over unstructured email data. We propose a technique for inducing high-quality templates from plain-text emails at scale, based on the suffix array data structure. We evaluate this method against an industry-standard approach for finding similar content based on shingling, running both algorithms over two corpora: a synthetically created email corpus, for a high level of experimental control, and user-generated emails from the well-known Enron email corpus. Our experimental results show that the proposed method is more robust to variations in cluster quality than the baseline, and that its templates contain more text from the emails, which would benefit extraction tasks by identifying the transient parts of the emails. Our study indicates that templates induced using suffix arrays contain approximately half as much noise (measured as entropy) as templates induced using shingling. Furthermore, the suffix array approach is substantially more scalable, proving to be an order of magnitude faster than shingling even for modestly sized training clusters. Public corpus analysis shows that email clusters contain on average 4 segments of common phrases, each containing on average 9 words, suggesting that templatization could save users an average of 35 words of writing effort per email in an assistance or auto-reply task.
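As a toy illustration of the suffix-array idea (a sketch, not the paper's system, which operates at scale over clusters), the function below concatenates two emails from the same cluster with a sentinel character, builds a suffix array, and reports long substrings shared by both emails; the `min_len` cutoff and the sentinel are assumptions for the demo. Shared phrases approximate the fixed template text, while the remainder is the transient, email-specific content.

```python
# Toy demonstration: find long substrings common to two similar emails via a
# suffix array over their concatenation. Returns overlapping phrases; a real
# system would merge them and generalise over whole clusters.

def suffix_array(s):
    # O(n^2 log n) construction via slice sorting; fine for a small demo only.
    return sorted(range(len(s)), key=lambda i: s[i:])

def common_phrases(a, b, min_len=12):
    sep = "\x00"                      # sentinel assumed absent from the emails
    s = a + sep + b
    sa = suffix_array(s)
    phrases = set()
    for x, y in zip(sa, sa[1:]):      # adjacent suffixes share longest prefixes
        # Keep only pairs where one suffix starts in `a` and the other in `b`.
        if (x < len(a)) == (y < len(a)):
            continue
        k = 0
        while (x + k < len(s) and y + k < len(s)
               and s[x + k] == s[y + k] and s[x + k] != sep):
            k += 1
        if k >= min_len:
            phrases.add(s[x:x + k])   # candidate fixed (template) phrase
    return phrases

e1 = "Hi Ann,\nYour order has shipped and will arrive soon.\nThanks, Store"
e2 = "Hi Bob,\nYour order has shipped and will arrive soon.\nThanks, Store"
print(common_phrases(e1, e2))
```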
Declarative Experimentation in Information Retrieval Using PyTerrier
The advent of deep machine learning platforms such as TensorFlow and PyTorch, developed in expressive high-level languages such as Python, has allowed more expressive representations of deep neural network architectures. We argue that such a powerful formalism is missing in information retrieval (IR), and propose a framework called PyTerrier that allows advanced retrieval pipelines to be expressed, and evaluated, in a declarative manner close to their conceptual design. Like the aforementioned frameworks, which compile deep learning experiments into primitive GPU operations, our framework targets IR platforms as backends for executing and evaluating retrieval pipelines. Further, we can automatically optimise retrieval pipelines to increase their efficiency to suit a particular IR platform backend. Our experiments, conducted on the TREC Robust and ClueWeb09 test collections, demonstrate the efficiency benefits of these optimisations for retrieval pipelines involving both the Anserini and Terrier IR platforms.
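The declarative style can be illustrated with PyTerrier's public API; the sketch below uses the small Vaswani test collection rather than the paper's TREC Robust/ClueWeb09 setups, and the choice of BM25 with Bo1 query expansion is an assumption for a self-contained demo.

```python
# Minimal sketch of a declarative PyTerrier pipeline (illustrative setup,
# not the paper's experimental configuration).
import pyterrier as pt
if not pt.started():
    pt.init()

dataset = pt.get_dataset("vaswani")        # small public IR test collection
index = dataset.get_index()

# Retrieval components are transformers; `>>` composes them into pipelines.
bm25 = pt.BatchRetrieve(index, wmodel="BM25")
qe_pipeline = bm25 >> pt.rewrite.Bo1QueryExpansion(index) >> bm25

# An experiment is declared over systems, topics and qrels, then evaluated.
print(pt.Experiment(
    [bm25, qe_pipeline],
    dataset.get_topics(),
    dataset.get_qrels(),
    eval_metrics=["map", "ndcg"],
    names=["BM25", "BM25+Bo1"],
))
```

Because the pipeline is declared rather than executed step by step, the framework can rewrite it for efficiency before dispatching retrieval to the Terrier or Anserini backend.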
