
    Doctor of Philosophy

    Biomedical data are a rich source of information and knowledge. Not only are they useful for direct patient care, but they may also offer answers to important population-based questions. Creating an environment where advanced analytics can be performed against biomedical data is nontrivial, however. Biomedical data are currently scattered across multiple systems with heterogeneous data, and integrating these data is a bigger task than humans can realistically do by hand; therefore, automatic biomedical data integration is highly desirable but has never been fully achieved. This dissertation introduces new algorithms devised to support automatic and semiautomatic integration of heterogeneous biomedical data. The algorithms combine data mining and biomedical informatics techniques to create "concept bags" that are used to compute similarity between data elements in the same way that "word bags" are compared in data mining. Concept bags are composed of controlled medical vocabulary concept codes that are extracted from text using named-entity recognition software. To test the new algorithm, three biomedical text similarity use cases were examined: automatically aligning data elements between heterogeneous data sets, determining degrees of similarity between medical terms using a published benchmark, and determining similarity between ICU discharge summaries. The method is highly configurable, and five different versions were tested. The concept bag method performed particularly well at aligning data elements, outperforming the compared algorithms by more than 5%. Another configuration that included hierarchical semantics performed particularly well at matching medical terms, meeting or exceeding 30 of 31 other published results on the same benchmark. Results for the third scenario, computing ICU discharge summary similarity, were less successful: correlations between the multiple methods were low, including between terminologists. Overall, the concept bag algorithms performed consistently and comparatively well and appear to be viable options for multiple scenarios. New applications of the method and ideas for improving the algorithm are discussed as future work, including several performance enhancements, configuration-based enhancements, and concept vector weighting using TF-IDF formulas.
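
    The core comparison the abstract describes, treating a text as a bag of extracted concept codes and comparing bags the way word bags are compared, can be sketched as below. The concept codes, the Jaccard measure, and the TF-IDF weighting are illustrative assumptions; the dissertation's exact configurations are not reproduced here.

```python
# Minimal sketch of "concept bag" similarity, assuming each text has already
# been reduced by a named-entity recognizer to a bag of controlled-vocabulary
# concept codes. The codes below are hypothetical placeholders, not real ones.
from collections import Counter
import math

def jaccard(bag_a, bag_b):
    """Set overlap between two concept bags."""
    if not (bag_a or bag_b):
        return 0.0
    return len(bag_a & bag_b) / len(bag_a | bag_b)

def tfidf_cosine(bags, i, j):
    """Cosine similarity between bags i and j with smoothed IDF weighting."""
    n = len(bags)
    df = Counter(code for bag in bags for code in bag)   # document frequency
    idf = {code: 1.0 + math.log(n / df[code]) for code in df}
    weight = lambda bag: {c: tf * idf[c] for c, tf in bag.items()}
    a, b = weight(bags[i]), weight(bags[j])
    dot = sum(w * b.get(c, 0.0) for c, w in a.items())
    norm = lambda v: math.sqrt(sum(w * w for w in v.values()))
    na, nb = norm(a), norm(b)
    return dot / (na * nb) if na and nb else 0.0

# Hypothetical concept codes extracted from two data-element descriptions:
elem_a = Counter({"C-HEART-RATE": 1, "C-PULSE": 1})
elem_b = Counter({"C-PULSE": 2, "C-BEATS-PER-MIN": 1})
print(jaccard(set(elem_a), set(elem_b)))      # 1 shared of 3 total -> ~0.33
print(tfidf_cosine([elem_a, elem_b], 0, 1))   # ~0.39
```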

    Use of IBM Collaborative Lifecycle Management Solution to Demonstrate Traceability for Small, Real-World Software Development Project

    The Standish Group's 1994 CHAOS study reported that 31 percent of software projects failed outright and another 53 percent were challenged by extreme budget and/or time overruns. Since then, different responses to the high rate of software project failures have been proposed; SEI's CMMI, ISO 9001:2000 for software development, and the IEEE's J-STD-016 are some examples. Traceability is the one common feature that these software development standards impose. Over the last decade, the software and systems engineering communities have been researching subjects such as more sophisticated tooling, information retrieval techniques capable of semi-automating trace creation and maintenance, new trace query languages and visualization techniques that use trace links, and the application of traceability in specific domains such as Model-Driven Development, product-line systems and agile project environments. These efforts have not been in vain: the 2012 CHAOS results show the project success rate rising to 39% (delivered on time, on budget, with required features and functions) and the failure rate falling to 18% (cancelled prior to completion, or delivered and never used). Since research has shown that traceability can improve a project's success rate, the main purpose of this thesis is to demonstrate traceability for a small, real-world software development project using IBM Collaborative Lifecycle Management. The objective of this research was fulfilled: the case study describes in detail how traceability was applied to the design and development of the Value Adjustment Board (VAB) project of the City of Jacksonville, using the Scrum development approach within the IBM Rational Collaborative Lifecycle Management solution. The results may benefit researchers and practitioners who are looking for evidence for using the IBM CLM solution to trace artifacts in a small project.
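
    As a concrete illustration of the kind of artifact linking the thesis demonstrates, the sketch below models traceability links and a coverage query in plain Python. Artifact names and link types are hypothetical; this is a tool-agnostic sketch and does not use IBM CLM's actual data model or APIs.

```python
# Artifacts linked across the lifecycle so that coverage gaps can be queried.
from collections import defaultdict

links = defaultdict(set)          # (artifact, link_type) -> linked artifacts

def trace(src, link_type, dst):
    links[(src, link_type)].add(dst)

# Requirement -> implementing task -> validating test case:
trace("REQ-1: export VAB hearing schedule", "implemented_by", "TASK-12")
trace("TASK-12", "validated_by", "TEST-7")
trace("REQ-2: record board decisions", "implemented_by", "TASK-14")

def uncovered(requirements):
    """Requirements whose implementing tasks have no validating test."""
    out = []
    for req in requirements:
        tasks = links[(req, "implemented_by")]
        if not any(links[(t, "validated_by")] for t in tasks):
            out.append(req)
    return out

print(uncovered(["REQ-1: export VAB hearing schedule",
                 "REQ-2: record board decisions"]))
# -> ['REQ-2: record board decisions']  (no test traces back to REQ-2)
```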

    Semantic multimedia modelling & interpretation for annotation

    The emergence of multimedia-enabled devices, particularly the incorporation of cameras in mobile phones, and the rapid advance of low-cost storage have drastically boosted the rate of multimedia data production. Witnessing such ubiquity of digital images and videos, the research community has turned its attention to their meaningful utilization and management. Stored in monumental multimedia corpora, digital data need to be retrieved and organized in an intelligent way, leaning on the rich semantics involved. The utilization of these image and video collections demands proficient image and video annotation and retrieval techniques. Recently, the multimedia research community has progressively shifted its emphasis to the personalization of these media. The main impediment in image and video analysis is the semantic gap: the discrepancy between a user's high-level interpretation of an image or video and its low-level computational interpretation. Content-based image and video annotation systems are remarkably susceptible to the semantic gap because they rely on low-level visual features to delineate semantically rich image and video content. Visual similarity, however, is not semantic similarity, so there is a demand to break through this dilemma in an alternative way. The semantic gap can be narrowed by including high-level and user-generated information in the annotation. High-level descriptions of images and videos are more capable of capturing the semantic meaning of multimedia content, but it is not always possible to collect this information. It is commonly agreed that the problem of high-level semantic annotation of multimedia is still far from solved. This dissertation puts forward approaches for intelligent multimedia semantic extraction for high-level annotation, intending to bridge the gap between visual features and semantics. It proposes a framework for annotation enhancement and refinement for object/concept-annotated image and video datasets. The overall theme is to first purify the datasets of noisy keywords and then expand the concepts lexically and commonsensically to fill the vocabulary and lexical gap, achieving high-level semantics for the corpus. The dissertation also explores a novel approach for high-level semantic (HLS) propagation through image corpora. HLS propagation takes advantage of semantic intensity (SI), the concept dominancy factor of an image: an image is a combination of various concepts, some more dominant than others, and the semantic similarity of two images is based on their SI values and the semantic similarity of their concepts. Moreover, HLS propagation exploits clustering techniques to group similar images, so that a single effort by a human expert to assign high-level semantics to a representative image can be propagated to the other images in the cluster. The investigation was made on the LabelMe image and LabelMe video datasets. Experiments show that the proposed approaches make a noticeable improvement towards bridging the semantic gap and that the proposed system outperforms traditional systems.
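
    A minimal sketch of the annotation-based similarity just described: each image is represented by its concepts weighted by semantic intensity, and two images are compared over those weighted concept vectors. Clustering images by such a similarity then lets a single expert-assigned high-level label be propagated across a whole cluster. Concept names and SI values are hypothetical, and the dissertation's lexical/commonsense concept similarity is reduced here to exact concept matching for brevity.

```python
import math

def si_similarity(img_a, img_b):
    """Cosine similarity over SI-weighted concept vectors."""
    dot = sum(w * img_b.get(c, 0.0) for c, w in img_a.items())
    na = math.sqrt(sum(w * w for w in img_a.values()))
    nb = math.sqrt(sum(w * w for w in img_b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Hypothetical annotations: concept -> semantic intensity (dominance) per image.
street = {"car": 0.6, "road": 0.3, "person": 0.1}
parking = {"car": 0.8, "tree": 0.2}
print(si_similarity(street, parking))  # ~0.86: 'car' dominates both images
```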

    Metainformation scenarios in Digital Humanities: Characterization and conceptual modelling strategies

    Requirements for the analysis, interpretation and reuse of information are becoming more and more ambitious as we generate larger and more complex datasets. This is leading to the development and widespread use of information about information, often called metainformation (or metadata), in most disciplines. The Digital Humanities are not an exception. We often assume that metainformation helps us document information for future reference by recording who created it, when and how, among other aspects. We also assume that recording metainformation will facilitate the task of interpreting the information at later stages. However, some works have identified issues with existing metadata approaches, related to 1) the proliferation of too many “standards” and the difficulty of choosing between them; 2) the generalized assumption that metadata and data (or metainformation and information) are essentially different, and the subsequent development of separate sets of languages and tools for each (introducing redundant models); and 3) the combination of conceptual and implementation concerns within most approaches, violating basic engineering principles of modularity and separation of concerns. Some of these problems are especially relevant in the Digital Humanities. In addition, we argue here that the lack of characterization of the scenarios in which metainformation plays a relevant role in humanistic projects often results in metainformation being recorded and managed without a specific purpose in mind. In turn, this hinders decision making on issues such as what metainformation must be recorded in a specific project and how it must be conceptualized, stored and managed. This paper presents a review of the metadata approaches most used in the Digital Humanities and, taking a conceptual modelling perspective, analyses their major issues as outlined above. It also describes the most common scenarios for the use of metainformation in the Digital Humanities, presenting a characterization that can assist in setting goals for metainformation recording and management in each case. Based on these two aspects, a new approach is proposed for the conceptualization, recording and management of metainformation in the Digital Humanities, using the ConML conceptual modelling language and adopting the overall view that metainformation is not essentially different from information. The proposal is validated in Digital Humanities scenarios through case studies employing real-world datasets. This work was partially supported by the Spanish Ministry of Economy, Industry and Competitiveness under its competitive Juan de la Cierva postdoctoral research programme (FJCI-2016-28032).
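
    The paper's central stance, that metainformation is not essentially different from information and so needs no separate modelling language, can be illustrated outside ConML with a small sketch in which a metadata record is just another record whose subject happens to be an information object. The class and field names are hypothetical and do not reproduce ConML's constructs.

```python
from dataclasses import dataclass, field

@dataclass
class Record:
    """Any information object in the dataset (a find, a text, an image...)."""
    identifier: str
    content: str
    # Metainformation uses the same modelling construct, not a separate one:
    about: list["Record"] = field(default_factory=list)

# An information object...
stela = Record("F-103", "Granite stela fragment, phase II context")
# ...and metainformation describing how that information was produced:
provenance = Record("M-17", "Recorded by J. Doe, 2014-05-02, total station",
                    about=[stela])

# Because both levels are Records, one query mechanism serves both:
def describing(records, target):
    return [r for r in records if target in r.about]

print([r.identifier for r in describing([stela, provenance], stela)])  # ['M-17']
```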

    TEXTUAL DATA MINING FOR NEXT GENERATION INTELLIGENT DECISION MAKING IN INDUSTRIAL ENVIRONMENT: A SURVEY

    This paper proposes textual data mining as a next-generation intelligent decision-making technology for sustainable knowledge management solutions in any industrial environment. A detailed survey of applications of data mining techniques for exploiting information from different data formats, and transforming this information into knowledge, is presented. The focus of the survey is to show the power of different data mining techniques for exploiting information from data. The literature surveyed shows that intelligent decision making is of great importance in many contexts within manufacturing, construction and business generally. Business intelligence tools, which can be interpreted as decision support tools, are of increasing importance to companies seeking success within competitive global markets. However, these tools depend on the relevancy, accuracy and overall quality of the knowledge on which they are based and which they use. The work presented in the paper therefore uncovers the importance and power of different data mining techniques, supported by text mining methods, for exploiting information from semi-structured or unstructured data formats. A great deal of information is available in these formats, and when it is exploited by the combined efforts of data and text mining tools, it helps decision makers take effective decisions that enhance an industrial business and discover useful knowledge for the next generation of intelligent decision making. The survey thus shows the power of textual data mining as a next-generation technology for intelligent decision making in the industrial environment.
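
    As a toy illustration of the survey's theme, mining unstructured text into structured evidence a decision maker can act on, consider the sketch below over hypothetical maintenance reports. Production systems would use the data and text mining techniques the survey covers rather than raw term counting.

```python
from collections import Counter
import re

# Hypothetical, unstructured maintenance reports:
reports = [
    "Spindle bearing overheating after 6h continuous run",
    "Coolant leak near spindle housing; bearing noise reported",
    "Conveyor belt misalignment, minor; bearing wear on drive end",
]

STOPWORDS = {"after", "near", "the", "and"}

def term_counts(texts):
    """Frequency of content-bearing terms across all reports."""
    tokens = (t for text in texts
              for t in re.findall(r"[a-z]+", text.lower())
              if len(t) > 2 and t not in STOPWORDS)
    return Counter(tokens)

# Recurring terms point the planner at a systemic issue around the spindle:
print(term_counts(reports).most_common(3))  # 'bearing' (3), 'spindle' (2) lead
```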

    Semi-Automated Development of Conceptual Models from Natural Language Text

    The process of converting natural language specifications into conceptual models requires detailed analysis of natural language text, and designers frequently make mistakes when undertaking this transformation manually. Although many approaches have been used to help designers translate natural language text into conceptual models, each has its limitations. One of the main limitations is the lack of a domain-independent ontology that can be used as a repository of entities and relationships to guide the transition from natural language processing to a conceptual model. Such an ontology is not currently available because it would be very difficult and time-consuming to produce. In this thesis, a semi-automated system for mapping natural language text into conceptual models is proposed. The system, called SACMES, combines a linguistic approach with an ontological approach and human intervention to achieve the task. It learns from the natural language specifications that it processes, storing what it learns in a conceptual model ontology and a user history knowledge database, and then uses the stored information to improve performance and reduce the need for human intervention. The evaluation conducted on SACMES demonstrates (1) that designers create better conceptual models when using the system than without it, and (2) that the system's performance improves as it processes more natural language requirements, thereby reducing the need for human intervention. These advantages may be improved further through development of the learning and retrieval techniques used by the system.
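
    To make the linguistic step concrete, here is a deliberately small, hypothetical heuristic in the spirit of (but much simpler than) SACMES: extract candidate entities and relationships from specification text with a subject-verb-object pattern. Real systems, SACMES included, combine full NLP parsing with an ontology and user feedback rather than a single regular expression.

```python
import re

# "A/An/Each/Every <Entity> <verb> one or more/many/a/an <Entity>",
# treating capitalized nouns as entity candidates:
PATTERN = re.compile(
    r"\b(?:A|An|Each|Every)\s+([A-Z]\w+)\s+(\w+)\s+"
    r"(?:one or more|many|a|an)\s+([A-Z]\w+)"
)

def extract(spec):
    """Return (entity, relationship, entity) triples found in the text."""
    return [m.groups() for m in PATTERN.finditer(spec)]

spec = ("A Customer places one or more Orders. "
        "Each Order contains one or more LineItems.")
for subject, verb, obj in extract(spec):
    print(f"Entity: {subject} --{verb}--> Entity: {obj}")
# Entity: Customer --places--> Entity: Orders
# Entity: Order --contains--> Entity: LineItems
```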

    Proceedings

    Proceedings of the Workshop CHAT 2011: Creation, Harmonization and Application of Terminology Resources. Editors: Tatiana Gornostay and Andrejs Vasiļjevs. NEALT Proceedings Series, Vol. 12 (2011). © 2011 The editors and contributors. Published by Northern European Association for Language Technology (NEALT) http://omilia.uio.no/nealt . Electronically published at Tartu University Library (Estonia) http://hdl.handle.net/10062/16956