    Towards a Business Process Complexity Analysis Framework Based on Textual Data and Event Logs

    Being an established discipline, Business Process Management (BPM) confronts various challenges related to digitization and the rapid penetration of technologies into business processes (BPs). As a result, the data generated and used, such as textual data and event logs, grow exponentially, complicating decision-making. Event logs are typically used to analyze BPs from several perspectives, including complexity. Recent approaches to BP complexity analysis focus on BP models and event logs, leaving textual data largely unconsidered. Hence, we propose a BP complexity analysis framework that combines textual data and event logs. The framework was conceptualized based on an IT Service Management (ITSM) case study of an international telecom provider and further developed in the IT department of an academic institution; the latter has also been used to investigate the value of the framework. Our preliminary findings show that the framework can enable comprehensive process redesign and improvement.
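
    The paper itself does not publish code; as a rough illustration of the idea, the hypothetical sketch below combines simple event-log complexity signals (distinct activities, trace variants, mean trace length) with a textual signal (description length and vocabulary size). All metric names, weights, and data are invented for the example.

```python
# Hypothetical sketch: combining simple event-log complexity signals with a
# textual signal, in the spirit of the framework described above. The metrics
# and the toy data are illustrative assumptions, not the paper's method.
from collections import Counter

def log_complexity(traces):
    """traces: list of activity-name sequences, one per process instance."""
    variants = Counter(tuple(t) for t in traces)
    distinct_activities = len({a for t in traces for a in t})
    avg_trace_len = sum(len(t) for t in traces) / len(traces)
    return {
        "variants": len(variants),
        "activities": distinct_activities,
        "avg_length": avg_trace_len,
    }

def text_complexity(descriptions):
    """descriptions: free texts attached to process instances (e.g. tickets)."""
    avg_tokens = sum(len(d.split()) for d in descriptions) / len(descriptions)
    vocab = len({w.lower() for d in descriptions for w in d.split()})
    return {"avg_tokens": avg_tokens, "vocabulary": vocab}

traces = [["open", "triage", "fix", "close"],
          ["open", "triage", "escalate", "fix", "close"],
          ["open", "fix", "close"]]
descriptions = ["VPN drops every hour",
                "Mail client cannot sync the shared folder"]
print(log_complexity(traces), text_complexity(descriptions))
```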

    Durham - a word sense disambiguation system

    Ever since the 1950s, when machine translation first began to be developed, word sense disambiguation (WSD) has been considered a problem for developers. In more recent times, all NLP tasks that are sensitive to lexical semantics potentially benefit from WSD, although to what extent is largely unknown. The thesis presents a novel approach to the task of WSD on a large scale. In particular, a novel knowledge source named contextual information is presented. This knowledge source adopts a sub-symbolic training mechanism to learn information from the context of a sentence that is able to aid disambiguation. The system also takes advantage of frequency information, and these two knowledge sources are combined. The system is trained and tested on SEMCOR. A novel disambiguation algorithm is also developed. The algorithm must tackle the potentially large number of sense combinations in a sentence; the algorithm presented aims to strike an appropriate balance between accuracy and efficiency by directing the search at the word level. The performance achieved on SEMCOR is reported, and an analysis of the various components of the system is performed. The results achieved on this test data are pleasing but are difficult to compare with most of the other work carried out in the field. For this reason the system took part in the SENSEVAL evaluation, which provided an excellent opportunity to extensively compare WSD systems. SENSEVAL is a small-scale WSD evaluation using the HECTOR lexicon; despite this, few adaptations to the system were required. The performance of the system on the SENSEVAL task is reported and has also been presented in [Hawkins, 2000].
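
    The thesis's contextual knowledge source is sub-symbolic and learned from training data; as a stand-in, the toy sketch below combines a sense-frequency prior with a simple gloss-overlap context score. The sense inventory, weights, and scoring function are assumptions for illustration, not the system described above.

```python
# Illustrative sketch only: the thesis combines a learned contextual score
# with frequency information. Here a simple gloss-overlap score stands in
# for the contextual knowledge source, and the inventory is invented.
SENSES = {
    "bank": [
        {"id": "bank%1", "freq": 0.8,
         "gloss": "financial institution that accepts deposits"},
        {"id": "bank%2", "freq": 0.2,
         "gloss": "sloping land beside a river"},
    ],
}

def disambiguate(word, context_tokens, alpha=0.5):
    """Pick the sense maximizing a frequency/context combination."""
    context = {t.lower() for t in context_tokens}
    def score(sense):
        overlap = len(context & set(sense["gloss"].split()))
        return alpha * sense["freq"] + (1 - alpha) * overlap
    return max(SENSES[word], key=score)["id"]

print(disambiguate("bank", "he sat on the river bank fishing".split()))
# -> bank%2: the context overlap outweighs the frequency prior
```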

    Automated Readability Assessment for Spanish e-Government Information

    This paper automatically evaluates the readability of Spanish e-government websites; specifically, websites that explain e-government administrative procedures. The evaluation is carried out through the analysis of different linguistic characteristics that are presumably associated with a better understanding of these resources. To this end, texts from websites outside the government domain have been collected. These texts clarify the procedures published on the Spanish Government's websites and constitute the part of the corpus considered as the set of easy documents. The rest of the corpus has been completed with counterpart documents from government websites. The text of the documents has been processed, and their difficulty is evaluated through different classic readability metrics. At a later stage, machine learning methods are used to predict the difficulty of the texts. The results of the study show that government web pages exhibit high values for comprehension difficulty. This work contributes a new Spanish-language corpus of official e-government websites. In addition, a large number of combined linguistic attributes are applied, which improve the identification of the comprehensibility level of a text with respect to classic metrics. Work supported by the Spanish Ministry of Economy, Industry and Competitiveness (CSO2017-86747-R).
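
    As an example of the kind of classic readability metric such studies apply to Spanish text, the sketch below computes the Fernández Huerta index, a Spanish adaptation of Flesch Reading Ease; the vowel-run syllable counter is a rough heuristic assumed for illustration only. In a pipeline like the one described above, scores of this kind would be one feature among the many linguistic attributes fed to the learning algorithms.

```python
# Minimal sketch of a classic Spanish readability metric (Fernandez Huerta,
# an adaptation of Flesch Reading Ease). The syllable counter is a rough
# vowel-group heuristic, an assumption for illustration only.
import re

VOWELS = "aeiouáéíóúü"

def count_syllables(word):
    # Approximate: each maximal run of vowels counts as one syllable.
    return max(1, len(re.findall(f"[{VOWELS}]+", word.lower())))

def fernandez_huerta(text):
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[a-záéíóúüñ]+", text.lower())
    syllables = sum(count_syllables(w) for w in words)
    p = 100 * syllables / len(words)     # syllables per 100 words
    f = len(words) / len(sentences)      # mean words per sentence
    return 206.84 - 0.60 * p - 1.02 * f  # higher = easier to read

print(round(fernandez_huerta(
    "El tramite se solicita en linea. El plazo es de diez dias."), 1))
```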

    State of the art 2015: a literature review of social media intelligence capabilities for counter-terrorism

    Overview: This paper is a review of how information and insight can be drawn from open social media sources. It focuses on the specific research techniques that have emerged, the capabilities they provide, the possible insights they offer, and the ethical and legal questions they raise. These techniques are considered relevant and valuable insofar as they can help to maintain public safety by preventing terrorism, preparing for it, protecting the public from it, and pursuing its perpetrators. The report also considers how far this can be achieved against the backdrop of radically changing technology and public attitudes towards surveillance. This is an updated version of a 2013 report on the same subject, State of the Art. Since 2013, there have been significant changes in social media, how it is used by terrorist groups, and the methods being developed to make sense of it. The paper is structured as follows: Part 1 is an overview of social media use, focused on how it is used by groups of interest to those involved in counter-terrorism; this includes new sections on trends across social media platforms and on Islamic State (IS). Part 2 provides an introduction to the key approaches of social media intelligence (henceforth ‘SOCMINT’) for counter-terrorism. Part 3 sets out a series of SOCMINT techniques; for each technique, the capabilities and insights it offers are described, its validity and reliability assessed, and its possible application to counter-terrorism work explored. Part 4 outlines a number of important legal, ethical and practical considerations when undertaking SOCMINT work.

    Three Essays on Big Data Consumer Analytics in E-Commerce

    Consumers are increasingly spending more time and money online. Business-to-consumer e-commerce is growing on average 20 percent each year and reached 1.5 trillion dollars globally in 2014. Given the scale and growth of consumer online purchase and usage data, firms' ability to understand and utilize this data is becoming an essential competitive strategy. But large-scale data analytics in e-commerce is still at a nascent stage, and there is much to be learned in all aspects of e-commerce. Successful analytics on big data often requires a combination of both data mining and econometrics: data mining to reduce or structure (from unstructured data such as text, photo, and video) large-scale data, and econometric analyses to truly understand and assign causality to interesting patterns. In my dissertation, I study how firms can better utilize big data analytics and specific applications of machine learning techniques for improved e-commerce, using theory-driven econometric and experimental studies. I show that e-commerce managers can now formulate data-driven strategies for many aspects of business, from cross-selling via recommenders on sales sites to increasing brand awareness and leads via content-engineered marketing on social media. These results are readily actionable, with far-reaching economic consequences.
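
    The essays do not publish their models; as a toy illustration of cross-selling via recommenders, the sketch below ranks items by purchase co-occurrence. The basket data and item names are invented, and real recommenders are far more elaborate.

```python
# Toy sketch of cross-selling via a recommender: simple purchase
# co-occurrence counts stand in for the production-grade systems the
# essays study. All data below are invented for the example.
from collections import defaultdict
from itertools import permutations

baskets = [{"laptop", "mouse"}, {"laptop", "dock"}, {"mouse", "pad"},
           {"laptop", "mouse", "pad"}]

cooc = defaultdict(lambda: defaultdict(int))
for basket in baskets:
    for a, b in permutations(basket, 2):
        cooc[a][b] += 1  # count how often b is bought alongside a

def recommend(item, k=2):
    """Items most often bought together with `item`."""
    return sorted(cooc[item], key=cooc[item].get, reverse=True)[:k]

print(recommend("laptop"))  # e.g. ['mouse', 'dock']
```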

    Doctor of Philosophy

    Manual annotation of clinical texts is often used as a method of generating reference standards that provide data for training and evaluating Natural Language Processing (NLP) systems. Manually annotating clinical texts is time consuming, expensive, and requires considerable cognitive effort on the part of human reviewers. Furthermore, reference standards must be generated in ways that produce consistent and reliable data, but must also be valid in order to adequately evaluate the performance of those systems. The amount of labeled data necessary varies depending on the level of analysis, the complexity of the clinical use case, and the methods that will be used to develop automated machine systems for information extraction and classification. Evaluating methods that potentially reduce cost and manual human workload, introduce task efficiencies, and reduce the amount of labeled data necessary to train NLP tools for specific clinical use cases is an active area of research inquiry in the clinical NLP domain. This dissertation integrates a mixed-methods approach, using methodologies from cognitive science and artificial intelligence, with manual annotation of clinical texts. Aim 1 of this dissertation identifies factors that affect manual annotation of clinical texts. These factors are further explored by evaluating approaches that may introduce efficiencies into manual review tasks applied to two different NLP development areas: semantic annotation of clinical concepts and identification of information representing Protected Health Information (PHI) as defined by HIPAA. Both experiments integrate different priming mechanisms using noninteractive and machine-assisted methods. The main hypothesis for this research is that integrating pre-annotation or other machine-assisted methods within manual annotation workflows will improve the efficiency of manual annotation tasks without diminishing the quality of generated reference standards.
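
    As a hypothetical illustration of the kind of machine-assisted pre-annotation evaluated here, the sketch below uses a small regex dictionary to propose candidate PHI-like spans for human reviewers to accept or correct; the patterns and the sample note are toy examples, not the dissertation's actual methods.

```python
# Hypothetical pre-annotation pass: a dictionary/regex step proposes
# candidate spans that human annotators then review. Patterns here are
# toy examples assumed for illustration only.
import re

PATTERNS = {
    "DATE": r"\b\d{1,2}/\d{1,2}/\d{2,4}\b",
    "PHONE": r"\b\d{3}-\d{3}-\d{4}\b",   # PHI under HIPAA
    "MRN": r"\bMRN:\s*\d+\b",
}

def pre_annotate(text):
    """Return candidate (label, start, end, span) tuples for human review."""
    candidates = []
    for label, pattern in PATTERNS.items():
        for m in re.finditer(pattern, text):
            candidates.append((label, m.start(), m.end(), m.group()))
    return sorted(candidates, key=lambda c: c[1])

note = "Seen on 03/14/2019, MRN: 48213. Call 555-867-5309 to follow up."
for candidate in pre_annotate(note):
    print(candidate)
```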