3 research outputs found

    Enhancing Natural-Hazard Exposure Modeling Using Natural Language Processing: a Case-Study for Maltese Planning Applications

    The algorithmic processing of written language for tools such as predictive text, sentiment analysis, and translation services has become commonplace. The segment of computer science concerned with the interpretation of human language, NLP (Natural Language Processing), is a versatile and fast-developing field. In this paper, NLP is deployed unconventionally to gather insights into a building's multi-hazard exposure characteristics consistent with the GED4ALL attributes. NLP is used in this study to "read" the contents of digitally-submitted planning applications made on the Maltese archipelago. Maltese architects/engineers submit a concise but detailed description of the proposed works on any given site as part of a planning process. It is suggested that valuable insights exist within this description that can assist in classifying buildings within the bounds of the GED4ALL taxonomy. NLP can be used to layer additional, building-by-building information onto existing exposure models based on more conventional data. Although the results of this study are preliminary, NLP may prove a valuable tool for enhancing exposure modeling for multi-hazard risk quantification and management.

    Can a Machine Replace Humans in Building Regular Expressions? A Case Study

    Regular expressions are routinely used in a variety of different application domains. But building a regular expression involves a considerable amount of skill, expertise, and creativity. In this work, the authors investigate whether a machine can surrogate these qualities and automatically construct regular expressions for tasks of realistic complexity. They discuss a large-scale experiment involving more than 1,700 users on 10 challenging tasks. The authors compare the solutions constructed by these users to those constructed by a tool based on genetic programming that they recently developed and made publicly available. The quality of automatically constructed solutions turned out to be similar to the quality of those constructed by the most skilled user group; the time for automatic construction was likewise similar to the time required by human users.

    Doctor of Philosophy

    Electronic Health Records (EHRs) provide a wealth of information for secondary uses. Methods are developed to improve usefulness of free text query and text processing and demonstrate advantages to using these methods for clinical research, specifically cohort identification and enhancement. Cohort identification is a critical early step in clinical research. Problems may arise when too few patients are identified, or the cohort consists of a nonrepresentative sample. Methods of improving query formation through query expansion are described. Inclusion of free text search in addition to structured data search is investigated to determine the incremental improvement of adding unstructured text search over structured data search alone. Query expansion using topic- and synonym-based expansion improved information retrieval performance. An ensemble method was not successful. The addition of free text search compared to structured data search alone demonstrated increased cohort size in all cases, with dramatic increases in some. Representation of patients in subpopulations that may have been underrepresented otherwise is also shown. We demonstrate clinical impact by showing that a serious clinical condition, scleroderma renal crisis, can be predicted by adding free text search. A novel information extraction algorithm is developed and evaluated (Regular Expression Discovery for Extraction, or REDEx) for cohort enrichment. The REDEx algorithm is demonstrated to accurately extract information from free text clinical narratives. Temporal expressions as well as bodyweight-related measures are extracted. Additional patients and additional measurement occurrences are identified using these extracted values that were not identifiable through structured data alone. The REDEx algorithm transfers the burden of machine learning training from annotators to domain experts.
    We developed automated query expansion methods that greatly improve performance of keyword-based information retrieval. We also developed NLP methods for unstructured data and demonstrate that cohort size can be greatly increased, a more complete population can be identified, and important clinical conditions can be detected that are often missed otherwise. We found a much more complete representation of patients can be obtained. We also developed a novel machine learning algorithm for information extraction, REDEx, that efficiently extracts clinical values from unstructured clinical text, adding additional information and observations over what is available in structured text alone.