frances: a deep learning NLP and text mining web tool to unlock historical digital collections : a case study on the Encyclopaedia Britannica
Funding: This work was supported by the NLS Digital Fellowship and by the Google Cloud Platform research credit program.
This work presents frances, an integrated text mining tool that combines information extraction, knowledge graphs, NLP, deep learning, parallel processing and Semantic Web techniques to unlock the full value of historical digital textual collections. It gives researchers new capabilities to apply powerful analysis methods without being distracted by the details of the underlying technology and middleware. To demonstrate these capabilities, we use the first eight editions of the Encyclopaedia Britannica offered by the National Library of Scotland (NLS) as an example digital collection to mine and analyse. We have developed novel parallel heuristics to extract terms (alongside their metadata) from the original collection, which mixes unstructured and semi-structured input data, and we have populated a new knowledge graph with this information. Using the information stored in the knowledge graph, our Natural Language Processing models enable frances to perform advanced analyses that go well beyond simple search. The tool also supports creating and running complex text mining analyses at scale. Our results show that the novel computational techniques developed within frances give researchers a vehicle to formalize and connect findings and insights derived from the analysis of large-scale digital corpora such as the Encyclopaedia Britannica.
Postprint
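As a rough illustration of what "extracting terms and populating a knowledge graph" involves, the sketch below applies one assumed heuristic to OCR text: an entry begins on a line whose first token is an all-capitals headword followed by a comma (as in "ABACUS, an instrument..."), and any other line continues the current definition. Each entry is then turned into (subject, predicate, object) triples. The heuristic, the `ex:` prefix and the predicate names are hypothetical; frances's own heuristics run in parallel and are more elaborate, and its graph is built with Semantic Web tooling rather than Python tuples.

```python
import re
from typing import Iterator

# Hypothetical heuristic: a headword is an all-caps word or phrase (letters,
# spaces, hyphens or apostrophes) at the start of a line, followed by a comma.
HEADWORD = re.compile(r"^([A-Z][A-Z' -]*[A-Z]),\s*(.*)$")


def extract_terms(page_text: str) -> list[tuple[str, str]]:
    """Split OCR'd page text into (term, definition) pairs."""
    entries: list[tuple[str, str]] = []
    for line in page_text.splitlines():
        line = line.strip()
        if not line:
            continue
        match = HEADWORD.match(line)
        if match:
            entries.append((match.group(1), match.group(2)))
        elif entries:
            # Continuation line: append it to the current term's definition.
            term, definition = entries[-1]
            entries[-1] = (term, f"{definition} {line}")
    return entries


def to_triples(entries: list[tuple[str, str]], edition: int,
               page: int) -> Iterator[tuple[str, str, str]]:
    """Emit (subject, predicate, object) triples for each extracted term."""
    for term, definition in entries:
        subject = f"ex:{edition}/{term.replace(' ', '_')}"  # hypothetical URI scheme
        yield (subject, "ex:name", term)
        yield (subject, "ex:definition", definition)
        yield (subject, "ex:edition", str(edition))
        yield (subject, "ex:page", str(page))
```

Keeping the edition and page as separate triples is what lets later queries compare how a term's definition changes from one edition to the next.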
Digital methods to enhance the usefulness of patient experience data in services for long-term conditions: the DEPEND mixed-methods study
Background
Collecting NHS patient experience data is critical to ensure the delivery of high-quality services. Data are obtained from multiple sources, including service-specific surveys and widely used generic surveys. There are concerns that feedback is not timely, that some groups of patients and carers do not give feedback, and that free-text feedback, although potentially useful, is difficult to analyse.
Objective
To understand how to improve the collection and usefulness of patient experience data in services for people with long-term conditions using digital data capture and improved analysis of comments.
Design
The DEPEND study is a mixed-methods study with four parts: qualitative research to explore the perspectives of patients, carers and staff; use of computer science text-analytics methods to analyse comments; co-design of new tools to improve data collection and usefulness; and implementation and process evaluation to assess use of the tools and any impacts.
Setting
Services for people with severe mental illness and musculoskeletal conditions, chosen as exemplars of mental health and physical long-term conditions, at four sites: an acute trust (site A), a mental health trust (site B) and two general practices (sites C1 and C2).
Participants
A total of 100 staff members with diverse roles in patient experience management, clinical practice and information technology, 59 patients and 21 carers participated in the qualitative research components.
Interventions
The tools comprised a digital survey completed using a tablet device (kiosk) or a pen and paper/online version; guidance and information for patients, carers and staff; text-mining programs; reporting templates; and a process for eliciting and recording verbal feedback in community mental health services.
Results
We found a lack of understanding and experience of the process of giving feedback. People wanted more meaningful and informal feedback to suit local contexts. Text mining enabled systematic analysis, although challenges remained, and qualitative analysis provided additional insights. All sites managed to collect feedback digitally; however, there was a perceived need for additional resources, and engagement varied. Observation indicated that patients were apprehensive about using kiosks but would often participate with support. The process for collecting and recording verbal feedback in mental health services made sense to participants but was not successfully adopted, with staff workload and technical problems often highlighted as barriers. Staff thought that the new methods were insightful, but observation did not reveal changes in services during the testing period.
Conclusions
The use of digital methods can produce some improvements in the collection and usefulness of feedback. Context and flexibility are important, and digital methods need to be complemented with alternative methods. Text mining can provide useful analysis for reporting on large data sets within large organisations, but qualitative analysis may be more useful for small data sets and in small organisations.
Limitations
New practices need time and support to be adopted and this study had limited resources and a limited testing time.
Future work
Further research is needed to improve text-analysis methods for routine use in services and to evaluate the impact of methods (digital and non-digital) on service improvement in varied contexts and among diverse patients and carers.
Funding
This project was funded by the NIHR Health Services and Delivery Research programme and will be published in full in Health Services and Delivery Research; Vol. 8, No. 28. See the NIHR Journals Library website for further project information.
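The DEPEND abstract does not describe its text-analytics methods in detail. To give a concrete sense of the simplest kind of text mining that can be run over free-text patient comments, the sketch below tags each comment with themes from a small keyword dictionary and counts how many comments mention each theme. The themes and keywords are invented for illustration and are not taken from the study.

```python
import re
from collections import Counter

# Hypothetical theme dictionary; in practice it would be built with service staff.
THEMES: dict[str, set[str]] = {
    "waiting": {"wait", "waiting", "delay", "late"},
    "staff attitude": {"kind", "rude", "friendly", "helpful"},
    "communication": {"explain", "explained", "told", "informed"},
}


def tag_comment(comment: str) -> set[str]:
    """Return the themes whose keywords appear in the comment."""
    words = set(re.findall(r"[a-z']+", comment.lower()))
    return {theme for theme, keywords in THEMES.items() if words & keywords}


def theme_counts(comments: list[str]) -> Counter:
    """Count how many comments mention each theme."""
    counts: Counter = Counter()
    for comment in comments:
        counts.update(tag_comment(comment))
    return counts
```

A dictionary approach scales easily to the large data sets of big organisations, but it is blunt: "no delay at all" is tagged under "waiting" exactly like a complaint about delays, which is one reason qualitative reading of the comments still adds insight.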
Conceptual biology, hypothesis discovery, and text mining: Swanson's legacy
Innovative biomedical librarians and information specialists who want to expand their roles as expert searchers need to know about profound changes in biology and parallel trends in text mining. In recent years, conceptual biology has emerged as a complement to empirical biology. This is partly in response to the availability of massive digital resources such as the network of databases for molecular biologists at the National Center for Biotechnology Information. Developments in text mining and hypothesis discovery systems based on the early work of Swanson, a mathematician and information scientist, are coincident with the emergence of conceptual biology. Very little has been written to introduce biomedical digital librarians to these new trends. This paper presents background on data and text mining and on knowledge discovery in databases (KDD) and in text (KDT), briefly reviews Swanson's ideas, and then discusses recent approaches to hypothesis discovery and testing. 'Testing' in the context of text mining involves partially automated methods for finding evidence in the literature to support hypothetical relationships. Concluding remarks address (a) the limits of current strategies for evaluating hypothesis discovery systems and (b) the role of literature-based discovery in concert with empirical research. The paper also mentions a report of an informatics-driven literature review of biomarkers for systemic lupus erythematosus. Swanson's vision of the hidden value in the literature of science and, by extension, in biomedical digital databases, is still remarkably generative for information scientists, biologists, and physicians. © 2006 Bekhuis; licensee BioMed Central Ltd
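Swanson's core idea, usually called the ABC model, can be stated in a few lines of code: if the literature links a starting concept A to intermediate concepts B, and separately links those B concepts to a target C, while A and C never appear together, then the pair (A, C) is a candidate hypothesis. The sketch below runs this "open" form of discovery over a toy co-occurrence table loosely modelled on Swanson's well-known fish oil and Raynaud's disease example. The table is illustrative, not extracted from MEDLINE, and real systems weight and filter intermediates far more carefully.

```python
# Toy co-occurrence data: each concept maps to concepts it appears with
# in some hypothetical set of article titles and abstracts.
CO_OCCURS: dict[str, set[str]] = {
    "fish oil": {"blood viscosity", "platelet aggregation", "cholesterol"},
    "raynaud's disease": {"blood viscosity", "platelet aggregation", "vasospasm"},
    "cholesterol": {"fish oil", "heart disease"},
}


def linked(a: str, b: str) -> bool:
    """True if a and b co-occur in the literature, in either direction."""
    return b in CO_OCCURS.get(a, set()) or a in CO_OCCURS.get(b, set())


def open_discovery(a: str) -> list[tuple[str, set[str]]]:
    """Find targets C that share intermediates B with A but never co-occur
    with A directly, ranked by the number of shared intermediates."""
    b_terms = CO_OCCURS.get(a, set())
    vocabulary = set(CO_OCCURS).union(*CO_OCCURS.values())
    candidates = []
    for c in vocabulary:
        if c == a or linked(a, c):
            continue  # A and C already appear together: not a hidden link
        bridges = {b for b in b_terms if linked(b, c)}
        if bridges:
            candidates.append((c, bridges))
    # More shared intermediates suggests a stronger (still unproven) hypothesis.
    return sorted(candidates, key=lambda item: (-len(item[1]), item[0]))
```

Calling `open_discovery("fish oil")` ranks Raynaud's disease first, bridged by blood viscosity and platelet aggregation. The "testing" step the abstract describes would then search the literature for evidence supporting each ranked pair.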
Template Mining for Information Extraction from Digital Documents
Published or submitted for publication
Text Analytics for Android Project
The most advanced text analytics and text mining tasks include text classification, text clustering, ontology building, concept/entity extraction, document summarization, deriving patterns within structured data, producing granular taxonomies, sentiment and emotion analysis, entity relation modelling and interpretation of the output. Existing text analytics and text mining tools cannot develop alternative versions of text material (perform a multivariant design), perform multiple criteria analysis, automatically select the most effective variant according to different aspects (citation indexes of papers and authors in Scopus, ScienceDirect and Google Scholar, Top 25 papers, journal impact factors, supporting phrases, document name and contents, keyword density), or calculate utility degree and market value. The Text Analytics for Android Project, however, can perform these functions. To the best of our knowledge, they have not been implemented before, so this is the first attempt to do so. The Text Analytics for Android Project is briefly described in this article.
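The abstract does not say how the multiple criteria analysis or the "utility degree" is computed. One widely used method that produces a utility degree is COPRAS (complex proportional assessment), and the sketch below assumes that method. Each criterion is sum-normalised and weighted; benefit criteria (for example citation counts) are summed into S+ and cost criteria (for example document length) into S-; the relative significance Q combines them; and the utility degree expresses each Q as a percentage of the best one. The criteria, weights and scores are hypothetical.

```python
def copras(matrix: list[list[float]], weights: list[float],
           benefit: list[bool]) -> list[float]:
    """Return the COPRAS utility degree (0-100 %) of each alternative.

    matrix[j][i] is the score of alternative j on criterion i; weights sum
    to 1; benefit[i] is True if larger values of criterion i are better.
    Cost-criterion scores are assumed to be strictly positive."""
    n_alt, n_crit = len(matrix), len(weights)
    col_sums = [sum(row[i] for row in matrix) for i in range(n_crit)]
    s_plus, s_minus = [0.0] * n_alt, [0.0] * n_alt
    for j, row in enumerate(matrix):
        for i, value in enumerate(row):
            d = weights[i] * value / col_sums[i]  # weighted, sum-normalised score
            if benefit[i]:
                s_plus[j] += d
            else:
                s_minus[j] += d
    if not all(benefit):
        # Q_j = S+_j + sum(S-) / (S-_j * sum(1 / S-)), the usual COPRAS form
        # with S-_min cancelled out.
        total_minus = sum(s_minus)
        inv_sum = sum(1 / s for s in s_minus)
        q = [sp + total_minus / (sm * inv_sum) for sp, sm in zip(s_plus, s_minus)]
    else:
        q = s_plus  # no cost criteria: significance is just S+
    q_max = max(q)
    return [100 * qj / q_max for qj in q]
```

For three hypothetical papers scored on (citations, pages) as `[[30, 10], [10, 10], [30, 20]]` with equal weights, the first paper gets 100 %, the third about 75.9 % (same citations but twice as long) and the second about 65.5 %. Market value, also mentioned in the abstract, is not covered here.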
A comprehensive comparison study of document clustering for a biomedical digital library MEDLINE
Presented at the 2006 ACM/IEEE Joint Conference on Digital Libraries (JCDL 2006), June 11-15, 2006, Chapel Hill, NC, USA. Retrieved 6/26/2006 from http://www.ischool.drexel.edu/faculty/thu/My%20Publication/Conference-papers/JCDL06.pdf.
Document clustering has been used for better document retrieval, document browsing, and text mining in digital libraries. In this paper, we perform a comprehensive comparison study of various document clustering approaches, namely three hierarchical methods (single-link, complete-link, and average-link), Bisecting K-means, K-means, and Suffix Tree Clustering, in terms of efficiency, effectiveness, and scalability. In addition, we apply a domain ontology to document clustering to investigate whether an ontology such as MeSH improves clustering quality for MEDLINE articles. Because an ontology is a formal, explicit specification of a shared conceptualization for a domain of interest, using ontologies is a natural way to address traditional information retrieval problems such as synonymy, hypernymy and hyponymy. We conducted fairly extensive experiments, using evaluation metrics such as the misclassification index, F-measure, cluster purity, and entropy, on very large article sets from MEDLINE, the largest digital library in biomedicine.
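Two of the evaluation metrics named in this abstract, cluster purity and entropy, are easy to compute once each document has a predicted cluster and a reference class (for MEDLINE, reference classes might be derived from MeSH headings; that mapping is an assumption here). The sketch follows the standard definitions: purity is the fraction of documents that belong to the majority class of their cluster (higher is better), and entropy is the size-weighted average of each cluster's class entropy (lower is better). The paper's exact formulas, for instance the logarithm base, may differ.

```python
import math
from collections import Counter, defaultdict


def _group_by_cluster(labels_pred: list, labels_true: list) -> dict:
    """Collect the reference classes of the documents in each predicted cluster."""
    groups: dict = defaultdict(list)
    for cluster, cls in zip(labels_pred, labels_true):
        groups[cluster].append(cls)
    return groups


def purity(labels_pred: list, labels_true: list) -> float:
    """Fraction of documents that belong to the majority class of their cluster."""
    groups = _group_by_cluster(labels_pred, labels_true)
    majority = sum(Counter(members).most_common(1)[0][1] for members in groups.values())
    return majority / len(labels_true)


def entropy(labels_pred: list, labels_true: list) -> float:
    """Size-weighted average class entropy of the clusters, in bits."""
    groups = _group_by_cluster(labels_pred, labels_true)
    n = len(labels_true)
    total = 0.0
    for members in groups.values():
        size = len(members)
        h = -sum((c / size) * math.log2(c / size) for c in Counter(members).values())
        total += (size / n) * h
    return total
```

With six documents split into clusters `[0, 0, 0, 1, 1, 1]` and true classes `["a", "a", "b", "b", "b", "b"]`, purity is 5/6: the first cluster mixes two classes, the second is pure.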
Comparing the Use of Research Resource Identifiers and Natural Language Processing for Citation of Databases, Software, and Other Digital Artifacts
From manuscript catalogues to a handbook of Syriac literature: Modeling an infrastructure for Syriaca.org
Despite increasing interest in Syriac studies and the growing digital availability of Syriac texts, there is currently no up-to-date infrastructure for discovering, identifying, classifying, and referencing works of Syriac literature. The standard reference work (Baumstark's Geschichte) is over ninety years old, and the 20,000 or so Syriac manuscripts extant worldwide can be accessed only through disparate catalogues and databases. The present article proposes a tentative data model for Syriaca.org's New Handbook of Syriac Literature, an open-access digital publication that will serve both as an authority file for Syriac works and as a guide to accessing their manuscript representations, editions, and translations. By publishing a draft data model, the authors hope to receive feedback and incorporate suggestions into the next stage of the project.
Comment: Part of special issue: Computer-Aided Processing of Intertextuality in Ancient Languages. 15 pages, 4 figures
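The article's own data model is not given in this abstract. As a hedged sketch of the general shape such an authority file could take, the code below models a work with a stable identifier that links out to its manuscript witnesses, editions and translations. Every class, field, and the example URI are hypothetical and do not reproduce Syriaca.org's schema.

```python
from dataclasses import dataclass, field


@dataclass
class Manuscript:
    shelfmark: str       # holding library and shelfmark (hypothetical field)
    catalogue_ref: str   # pointer into an existing printed or digital catalogue


@dataclass
class Edition:
    citation: str
    is_translation: bool = False
    language: str = "syr"  # ISO 639-2 code of the text's language ("syr" = Syriac)


@dataclass
class Work:
    uri: str             # stable identifier: the core of an authority file
    titles: list[str]    # variant titles, preferred title first
    authors: list[str] = field(default_factory=list)
    manuscripts: list[Manuscript] = field(default_factory=list)
    editions: list[Edition] = field(default_factory=list)

    def translations(self) -> list[Edition]:
        """Editions flagged as translations."""
        return [e for e in self.editions if e.is_translation]
```

Giving each work a single stable identifier, with manuscripts, editions and translations hanging off it, is what lets one handbook entry replace lookups across many disparate catalogues.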
The European Commission's public consultation on the review of EU copyright rules: a response by the CREATe Centre
No abstract available