5 research outputs found

    Text mining with exploitation of user\u27s background knowledge : discovering novel association rules from text

    Get PDF
    The goal of text mining is to find interesting and non-trivial patterns or knowledge from unstructured documents. Both objective and subjective measures have been proposed in the literature to evaluate the interestingness of discovered patterns. However, objective measures alone are insufficient because such measures do not consider knowledge and interests of the users. Subjective measures require explicit input of user expectations which is difficult or even impossible to obtain in text mining environments. This study proposes a user-oriented text-mining framework and applies it to the problem of discovering novel association rules from documents. The developed system, uMining, consists of two major components: a background knowledge developer and a novel association rules miner. The background knowledge developer learns a user\u27s background knowledge by extracting keywords from documents already known to the user (background documents) and developing a concept hierarchy to organize popular keywords. The novel association rule miner discovers association rules among noun phrases extracted from relevant documents (target documents) and compares the rules with the background knowledge to predict the rule novelty to the particular user (useroriented novelty). The user-oriented novelty measure is defined as the semantic distance between the antecedent and the consequent of a rule in the background knowledge. It consists of two components: occurrence distance and connection distance. The former considers the co-occurrences of two keywords in the background documents: the more the shorter the distance. The latter considers the common connections of with others in the concept hierarchy. It is defined as the length of the connecting the two keywords in the concept hierarchy: the longer the path, distance. The user-oriented novelty measure is evaluated from two perspectives: novelty prediction accuracy and usefulness indication power. The results show that the useroriented novelty measure outperforms the WordNet novelty measure and the compared objective measures in term of predicting novel rules and identifying useful rules

    Web-based strategies in the manufacturing industry

    Get PDF
    The explosive growth of Internet-based architectures is allowing an efficient access to information resources over geographically dispersed areas. This fact is exerting a major influence on current manufacturing practices. Business activities involving customers, partners, employees and suppliers are being rapidly and efficiently integrated through networked information management environments. Therefore, efforts are required to take advantage of distributed infrastructures that can satisfy information integration and collaborative work strategies in corporate environments. In this research, Internet-based distributed solutions focused on the manufacturing industry are proposed. Three different systems have been developed for the tooling sector, specifically for the company Seco Tools UK Ltd (industrial collaborator). They are summarised as follows. SELTOOL is a Web-based open tool selection system involving the analysis of technical criteria to establish appropriate selection of inserts, toolholders and cutting data for turning, threading and grooving operations. It has been oriented to world-wide Seco customers. SELTOOL provides an interactive and crossed-way of searching for tooling parameters, rather than conventional representation schemes provided by catalogues. Mechanisms were developed to filter, convert and migrate data from different formats to the database (SQL-based) used by SELTOOL.TTS (Tool Trials System) is a Web-based system developed by the author and two other researchers to support Seco sales engineers and technical staff, who would perform tooling trials in geographically dispersed machining centres and benefit from sharing data and results generated by these tests. Through TTS tooling engineers (authorised users) can submit and retrieve highly specific technical tooling data for both milling and turning operations. Moreover, it is possible for tooling engineers to avoid the execution of new tool trials knowing the results of trials carried out in physically distant places, when another engineer had previously executed these trials. The system incorporates encrypted security features suitable for restricted use on the World Wide Web. An urgent need exists for tools to make sense of raw data, extracting useful knowledge from increasingly large collections of data now being constructed and made available from networked information environments. This explosive growth in the availability of information is overwhelming the capabilities of traditional information management systems, to provide efficient ways of detecting anomalies and significant patterns in large sets of data. Inexorably, the tooling industry is generating valuable experimental data. It is a potential and unexplored sector regarding the application of knowledge capturing systems. Hence, to address this issue, a knowledge discovery system called DISKOVER was developed. DISKOVER is an integrated Java-application consisting of five data mining modules, able to be operated through the Internet. Kluster and Q-Fast are two of these modules, entirely developed by the author. Fuzzy-K has been developed by the author in collaboration with another research student in the group at Durham. The final two modules (R-Set and MQG) have been developed by another member of the Durham group. To develop Kluster, a complete clustering methodology was proposed. Kluster is a clustering application able to combine the analysis of quantitative as well as categorical data (conceptual clustering) to establish data classification processes. This module incorporates two original contributions. Specifically, consistent indicators to measure the quality of the final classification and application of optimisation methods to the final groups obtained. Kluster provides the possibility, to users, of introducing case-studies to generate cutting parameters for particular Input requirements. Fuzzy-K is an application having the advantages of hierarchical clustering, while applying fuzzy membership functions to support the generation of similarity measures. The implementation of fuzzy membership functions helped to optimise the grouping of categorical data containing missing or imprecise values. As the tooling database is accessed through the Internet, which is a relatively slow access platform, it was decided to rely on faster Information retrieval mechanisms. Q-fast is an SQL-based exploratory data analysis (EDA) application, Implemented for this purpose

    Improving the Quality and Utility of Electronic Health Record Data through Ontologies

    Get PDF
    The translational research community, in general, and the Clinical and Translational Science Awards (CTSA) community, in particular, share the vision of repurposing EHRs for research that will improve the quality of clinical practice. Many members of these communities are also aware that electronic health records (EHRs) suffer limitations of data becoming poorly structured, biased, and unusable out of original context. This creates obstacles to the continuity of care, utility, quality improvement, and translational research. Analogous limitations to sharing objective data in other areas of the natural sciences have been successfully overcome by developing and using common ontologies. This White Paper presents the authors’ rationale for the use of ontologies with computable semantics for the improvement of clinical data quality and EHR usability formulated for researchers with a stake in clinical and translational science and who are advocates for the use of information technology in medicine but at the same time are concerned by current major shortfalls. This White Paper outlines pitfalls, opportunities, and solutions and recommends increased investment in research and development of ontologies with computable semantics for a new generation of EHRs

    Unlocking the mysterious interconnection between the gut microbiota and catestatin

    Get PDF
    The mammalian gut harbors a diverse microbial community with a vast metabolic capacity, collectively referred to as the gut microbiome. A unique interplay occurs between a subtype of cells of the gut epithelium, namely, enteroendocrine cells and the microbiota. Enteroendocrine cells secrete an extensive number of hormones and bioactive peptides, which play a key role in modulating mucosal immunity, gut motility, and metabolism. Among these bioactive peptides is catestatin (CST), which is derived from the prohormone chromogranin-A. CST has been shown to regulate innate mucosal immunity, gut permeability, hypertension, as well as a potential diabetes treatment.In this thesis, we showed that CST selects for the colonization of gut bacterial communities that are capable of resisting its antimicrobial effect, which in turn, impacts the levels of butyrate and other short-chain fatty acids and subsequently gut motility. Furthermore, we showed via fecal microbial transplantation, that the altered gut microbiota in mouse models with depletion in their CST play a causal role in the colonic dysfunction observed in those mice, characterized by increased gut permeability, fibrosis, and altered immune- and metabolic-related cellular pathways. Overall, this thesis exemplifies the pivotal interplay between the gut microbiota and CST, in the establishment and maintenance of mucosal homeostasis and opens new avenues for microbiota-targeted therapeutics in humans for specific diseases associated with altered levels of CST

    Towards a Metaquery Language for Mining the Web of Data

    No full text
    corecore