311 research outputs found
Evolving Lucene search queries for text classification
We describe a method for generating accurate, compact,
human-understandable text classifiers. Text datasets are indexed using Apache Lucene, and genetic programs are used to construct
Lucene search queries. Genetic programs acquire fitness by
producing queries that are effective binary classifiers for a
particular category when evaluated against a set of training
documents. We describe a set of functions and terminals and
provide results from classification tasks.
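The fitness idea above can be sketched in a few lines. This is only an illustration under stated assumptions: the abstract does not say which measure is used as fitness, so F1 is assumed here, and `term_query` is a hypothetical in-memory stand-in for a Lucene boolean query rather than a call into Lucene itself.

```python
from typing import Callable, Iterable, Sequence


def term_query(required: Iterable[str], forbidden: Iterable[str] = ()) -> Callable[[str], bool]:
    """Hypothetical stand-in for a boolean search query:
    matches when every required word is present and no forbidden word is."""
    required = list(required)
    forbidden = list(forbidden)

    def matches(doc: str) -> bool:
        words = set(doc.lower().split())
        return all(w in words for w in required) and not any(w in words for w in forbidden)

    return matches


def f1_fitness(query: Callable[[str], bool], docs: Sequence[str], labels: Sequence[bool]) -> float:
    """Score a query as a binary classifier for one category (F1 is an assumption)."""
    tp = fp = fn = 0
    for doc, in_category in zip(docs, labels):
        hit = query(doc)
        if hit and in_category:
            tp += 1
        elif hit:
            fp += 1
        elif in_category:
            fn += 1
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy training set, invented for illustration only.
docs = ["oil prices rise", "crude oil exports", "wheat harvest", "oil painting auction"]
labels = [True, True, False, False]
broad = f1_fitness(term_query(["oil"]), docs, labels)
narrow = f1_fitness(term_query(["oil"], ["painting"]), docs, labels)
```

In an evolutionary loop, a score like this would be computed for every candidate query, and the higher-scoring candidates would be favoured for reproduction.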
A tool for creating and visualising formal concept trees
This paper presents a tool for creating and visualising formal concept trees. The concept tree provides an alternative visualisation to the more commonly known concept lattice. The tool described here is an extension of the In-Close formal concept mining program, where concepts
are output in a format that can be visualised in a Web Browser using the Collapsible Tree Layout from the D3.js JavaScript library. Because the visualisation is expandable and collapsible, the tool is able to deal with large
trees and the user is able to explore branches with single mouse clicks and by panning and zooming the tree. So-called ‘iceberg trees’ can also be produced, by specifying a minimum support for objects.
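As a rough sketch of the iceberg idea, the snippet below prunes a concept tree by minimum object support and serialises the result as nested JSON. The node layout (`name`, `extent`, `children`) is a hypothetical format chosen for illustration, not the actual In-Close output, and the pruning assumes that a child's extent is never larger than its parent's.

```python
import json


def prune(node, min_objects):
    """Return a copy of the tree keeping only concepts whose extent
    (set of objects) has at least `min_objects` members."""
    if len(node["extent"]) < min_objects:
        return None  # below the support threshold: drop this whole branch
    children = []
    for child in node.get("children", []):
        kept = prune(child, min_objects)
        if kept is not None:
            children.append(kept)
    return {"name": node["name"], "extent": node["extent"], "children": children}


# Toy concept tree, invented for illustration only.
tree = {
    "name": "top",
    "extent": [1, 2, 3, 4],
    "children": [
        {"name": "a", "extent": [1, 2, 3],
         "children": [{"name": "a,b", "extent": [1], "children": []}]},
        {"name": "c", "extent": [4], "children": []},
    ],
}

iceberg = prune(tree, min_objects=2)
as_json = json.dumps(iceberg)  # nested name/children JSON, as tree layouts typically consume
```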
A comparison of Lucene search queries evolved as text classifiers
In this article, we use a genetic algorithm to evolve seven
different types of Lucene search query with the objective of
generating accurate and readable text classifiers. We compare
the effectiveness of each of the different types of query using
three commonly used text datasets. We vary the number of
words available for classification and compare results for 4, 8,
and 16 words per category. The generated queries can also be
viewed as labels for the categories and there is a benefit to a
human analyst in being able to read and tune the classifier.
The evolved queries also provide an explanation of the classification
process. We consider the consistency of the classifiers
and compare their performance on categories of different
complexities. Finally, various approaches to the analysis of
the results are briefly explored.
Document clustering with evolved search queries
Search queries define a set of documents located in a collection and can be used to rank the documents by assigning each document a score according to its closeness to the query in the multidimensional space of weighted terms. In this paper, we describe a system whereby an island model genetic algorithm (GA) creates individuals which can generate a set of Apache Lucene search queries for the purpose of text document clustering. A cluster is specified by the documents returned by a single query in the set. Each document that is included in only one of the clusters adds to the fitness of the individual, and each document that is included in more than one cluster reduces the fitness. The method can be refined by using the ranking score of each document in the fitness test. The system has a number of advantages; in particular, the final search queries are easily understood and offer a simple explanation of the clusters, meaning that an extra cluster labelling stage is not required. We describe how the GA can be used to build queries and show results for clustering on various data sets and with different query sizes. Results are also compared with clusters built using the widely used k-means algorithm.
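The fitness rule described above can be sketched as follows. This is a minimal illustration, assuming each query's result is already available as a set of document IDs and assuming a weight of +1 for a uniquely clustered document and -1 for an overlapping one; the abstract does not give the actual weights, and `cluster_fitness` is a name introduced here.

```python
from collections import Counter


def cluster_fitness(clusters):
    """Fitness of one individual: documents returned by exactly one query
    add to the score, documents returned by several queries subtract from it."""
    counts = Counter(doc for cluster in clusters for doc in set(cluster))
    unique = sum(1 for n in counts.values() if n == 1)
    overlapping = sum(1 for n in counts.values() if n > 1)
    return unique - overlapping  # unit weights are an assumption


# Toy result sets for three queries, invented for illustration only.
clusters = [{1, 2, 3}, {3, 4}, {5}]
score = cluster_fitness(clusters)
```

The refinement mentioned in the abstract would replace the unit weights with each document's ranking score.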
Optimal Insulin Delivery
Insulin therapy is only effective if it is delivered into the right tissue in the right way. Exogenous insulin is intended for the subcutaneous (SC) tissue, not the muscle or skin. If delivered into the latter, its absorption (pharmacokinetics (PK)) and action (pharmacodynamics (PD)) are unpredictable, which often leads to poor glucose control. Correct insulin therapy begins with matching the insulin to the site used. Typically, four sites are used for insulin injection or infusion: the abdomen lateral to the umbilicus all the way to the flanks, the anterior lateral upper half of the thigh, the deltoid region of the arm, and the upper outer quadrant of the buttocks. Regular insulin and neutral protamine Hagedorn (NPH) are both absorbed more rapidly from the arm and abdominal sites and more slowly from the thigh and buttocks. The newer insulin analogs, both rapid- and slow-acting, do not appear to be influenced by the site used for injection. In order to avoid intramuscular (IM) injections, patients should use the shortest needles currently available (the 4-mm pen needle and the 6-mm syringe needle). Very young children should raise a skin fold and inject into it even when using the 4-mm needle. Giving injections with the 6-mm needle at a 45° angle converts this needle into the equivalent of the 4 mm. Injection sites should be rigorously rotated, with the new injection being approximately 1 cm from previous injections. This measure helps prevent the most common complication of injection therapy, lipohypertrophy (LH). Injecting into LH leads to unstable PK and PD and deregulated glucose control, manifested as unexpected hypoglycemia, glycemic variability, and elevated HbA1c values. Comprehensive insulin delivery recommendations have recently been published.
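The geometry behind the 45° recommendation can be checked with basic trigonometry. Assuming a straight needle of length L inserted fully at an angle θ to a flat skin surface (a simplification; the symbols d, L and θ are introduced here and do not come from the source), the perpendicular depth d reached is:

```latex
d = L \sin\theta = 6\,\text{mm} \times \sin 45^\circ \approx 6\,\text{mm} \times 0.707 \approx 4.2\,\text{mm}
```

This is close to the 4-mm depth of a pen needle inserted perpendicularly.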
Elimination of pain improves specificity of clinical diagnostic criteria for adult chronic rhinosinusitis
Objective
Determine whether the elimination of pain improves accuracy of clinical diagnostic criteria for adult chronic rhinosinusitis.
Study Design
Retrospective cohort study.
Methods
History, symptoms, nasal endoscopy, and computed tomography (CT) results were analyzed for 1,186 adults referred to an academic otolaryngology clinic with presumptive diagnosis of chronic rhinosinusitis. Clinical diagnosis was rendered using the 1997 Rhinosinusitis Taskforce (RSTF) Guidelines and a modified version eliminating facial pain, ear pain, dental pain, and headache.
Results
Four hundred seventy-nine subjects (40%) met inclusion criteria. Among subjects positive by RSTF guidelines, 45% lacked objective evidence of sinonasal inflammation by CT, 48% by endoscopy, and 34% by either modality. Applying modified RSTF diagnostic criteria, 39% lacked sinonasal inflammation by CT, 38% by endoscopy, and 24% by either modality. Using either abnormal CT or endoscopy as the reference standard, modified diagnostic criteria yielded a statistically significant increase in specificity from 37.1% to 65.1%, with a nonsignificant decrease in sensitivity from 79.2% to 70.3%. Analysis of comorbidities revealed temporomandibular joint disorder, chronic cervical pain, depression/anxiety, and psychiatric medication use to be negatively associated with objective inflammation on CT or endoscopy.
Conclusion
Clinical diagnostic criteria overestimate the prevalence of chronic rhinosinusitis. Removing facial pain, ear pain, dental pain, and headache increased specificity without a concordant loss in sensitivity. Given the high prevalence of sinusitis, improved clinical diagnostic criteria may assist primary care providers in more accurately predicting the presence of inflammation, thereby reducing inappropriate antibiotic use or delayed referral for evaluation of primary headache syndromes.
Level of Evidence: 4. Laryngoscope, 127:1011-1016, 201
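For readers less familiar with the two measures reported above, the sketch below shows how sensitivity and specificity are derived from a 2x2 table of clinical diagnosis against a reference standard. The counts are made up purely for illustration and are not the study's data; the function name is also introduced here.

```python
def sensitivity_specificity(tp, fp, tn, fn):
    """Sensitivity = TP / (TP + FN): share of reference-positive cases the criteria detect.
    Specificity = TN / (TN + FP): share of reference-negative cases the criteria rule out."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return sensitivity, specificity


# Hypothetical counts, NOT taken from the study.
sens, spec = sensitivity_specificity(tp=80, fp=30, tn=70, fn=20)
```

Dropping a symptom that is common among reference-negative patients lowers the false-positive count, which raises specificity; sensitivity falls only if reference-positive patients were relying on that symptom to meet the criteria.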
Towards a social media research methodology: Defining approaches and ethical concerns
Social media research and suitable methodologies and ethical approaches for analysing social
media data are still emerging. This paper presents a
methodology for projects using social media data alongside
consideration of ethics within the social media analysis
context. Earlier stages of the methodology will be expanded
to develop a strategy for examining ethics alongside
consideration of the relevant analysis techniques that may be employed. This will yield a comprehensive methodology
that provides a springboard for the clear and ethically
sound scrutiny of social media data. We aim to present the
challenges of using social media data, while the inclusion of ethical and legal aspects in this paper aims to draw
researchers' attention to the peculiar issues involved in dealing with social media data.
Towards a cloud migration decision support system for Small and Medium enterprises in Tamil Nadu
Cloud computing is a promising computing paradigm which has the potential to speed up Information Technology adoption among SMEs in developing economies like India. The user-friendly, pay-per-use cloud computing model offers SMEs access to highly scalable and reliable cloud infrastructure without having to invest in buying and maintaining expensive Information Technology resources. However, moving data and applications to a cloud infrastructure is not straightforward and can be very challenging, as decision makers need to consider numerous aspects before deciding to adopt cloud infrastructure. A review of the literature reveals that there are frameworks available to support cloud migration. However, there are no frameworks, models or tools available to support the whole cloud migration process. This research aims to fill that gap by proposing a conceptual framework for a cloud migration decision support system targeted at SMEs in Tamil Nadu.
Evolving text classification rules with genetic programming
We describe a novel method for using genetic programming to create compact classification rules using combinations of N-grams (character strings). Genetic programs acquire fitness by producing rules that are effective classifiers in terms of precision and recall when evaluated against a set of training documents. We describe a set of functions and terminals and provide results from a classification task using the Reuters 21578 dataset. We also suggest that the rules may have a number of other uses beyond classification and provide a basis for text mining applications.
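To make the notion of an N-gram rule concrete, here is a small sketch. It assumes a rule is a simple "any of these N-grams, none of those" test on the lowercased text; the paper's actual function and terminal set is not given in the abstract, so `rule_matches` and its parameters are hypothetical.

```python
def char_ngrams(text, n):
    """All character N-grams (substrings of length n) occurring in the text."""
    return {text[i:i + n] for i in range(len(text) - n + 1)}


def rule_matches(text, any_of, none_of=()):
    """Hypothetical rule form: fire if any listed N-gram occurs and no excluded one does."""
    lowered = text.lower()
    return any(g in lowered for g in any_of) and not any(g in lowered for g in none_of)


def precision_recall(predictions, labels):
    """Precision and recall of boolean predictions against boolean labels."""
    tp = sum(1 for p, y in zip(predictions, labels) if p and y)
    fp = sum(1 for p, y in zip(predictions, labels) if p and not y)
    fn = sum(1 for p, y in zip(predictions, labels) if not p and y)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Toy documents, invented for illustration only.
docs = ["Grain exports rise", "corn and grains", "training budget"]
labels = [True, True, False]
loose = precision_recall([rule_matches(d, ["rain"]) for d in docs], labels)
tight = precision_recall([rule_matches(d, ["grai"]) for d in docs], labels)
```

The toy data shows why substrings can be useful terminals: "grai" covers both "grain" and "grains" while avoiding "training", which the shorter "rain" also matches.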