756 research outputs found

    Recommender System Based on Process Mining

    Robotic Process Automation (RPA) automates repetitive tasks using scripts that encode fine-grained interactions with desktop and web applications. Several tools support this, and they let users record desktop activity together with its metadata. The recorded processes consist of very fine-grained steps, such as clicking buttons, typing text, selecting text, and changing focus. Automating these processes requires connectors that link them to the appropriate applications. Currently, users choose these connectors manually; they are not linked to processes automatically. In this thesis, we propose a method for recommending the top-k suitable connectors for each process based on its event log. The method applies process discovery to the training processes, whose connectors are known, to create process models, and then runs conformance checking between these models and the test event logs, whose connectors are unknown. We then select the connectors with the top-k conformance values and observe that the suitable connector is among the top-3 recommendations with 80% accuracy. The solution can be configured by changing the parameters and methods used for process discovery and conformance checking.
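
    The recommendation pipeline described in this abstract can be sketched roughly as follows. This is a minimal illustration only, not the thesis' implementation: it assumes a directly-follows relation as a stand-in for a discovered process model and a simple "fraction of allowed steps" score as a stand-in for conformance checking. The connector names, activity names, and the functions `directly_follows`, `fitness`, and `recommend_connectors` are all hypothetical.

```python
def directly_follows(traces):
    """Toy 'process discovery': collect every (a, b) pair where b directly follows a."""
    pairs = set()
    for trace in traces:
        pairs.update(zip(trace, trace[1:]))
    return pairs


def fitness(model_pairs, traces):
    """Toy 'conformance checking': fraction of steps in `traces` that the model allows."""
    steps = [pair for trace in traces for pair in zip(trace, trace[1:])]
    if not steps:
        return 0.0
    return sum(pair in model_pairs for pair in steps) / len(steps)


def recommend_connectors(train_logs, test_log, k=3):
    """Rank connectors by how well their model fits the test log; return the top k."""
    models = {name: directly_follows(traces) for name, traces in train_logs.items()}
    scores = {name: fitness(model, test_log) for name, model in models.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Hypothetical training logs: one list of traces per known connector.
train_logs = {
    "excel": [["open", "select", "type", "save"], ["open", "type", "save"]],
    "browser": [["open", "select", "click"]],
    "mail": [["open", "click", "type", "send"]],
}
# Hypothetical test log whose connector is unknown.
test_log = [["open", "select", "type", "save"]]

print(recommend_connectors(train_logs, test_log, k=2))
```

    In practice the two toy functions would be swapped for a real discovery algorithm and a real conformance measure, which is the configurability the abstract mentions.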

    Big data analytics for large-scale wireless networks: Challenges and opportunities

    © 2019 Association for Computing Machinery. The wide proliferation of wireless communication systems and wireless devices has brought the big data era to large-scale wireless networks. Big data in large-scale wireless networks is characterized by wide variety, high volume, real-time velocity, and huge value, which leads to research challenges that differ from those of existing computing systems. In this article, we present a survey of state-of-the-art big data analytics (BDA) approaches for large-scale wireless networks. In particular, we categorize the life cycle of BDA into four consecutive stages: Data Acquisition, Data Preprocessing, Data Storage, and Data Analytics. We then present a detailed survey of the technical solutions to the challenges in BDA for large-scale wireless networks at each stage of the life cycle. Moreover, we discuss the open research issues and outline future directions in this promising area.

    Dynamic segmentation techniques applied to load profiles of electric energy consumption from domestic users

    [EN] The electricity sector is currently undergoing a process of liberalization and separation of roles. This process is being implemented under the regulatory auspices of each Member State of the European Union and, therefore, at different speeds and with different perspectives and objectives. These must converge on a common horizon in which Europe benefits from an interconnected energy market where producers and consumers can participate in free competition. The process has one main consequence, from which a second follows by necessity. The main consequence is the increased complexity of managing and supervising the electrical system, which is increasingly interconnected and participatory, with distributed energy sources, many of them renewable, connected at different voltage levels and with different generation capacities at any point in the network. The second consequence is the need to communicate information between agents reliably, safely, and quickly, and to analyze that information as effectively as possible, so that it feeds the decision-making processes that improve the observability and controllability of a system that is growing in complexity and in number of agents. With the evolution of Information and Communication Technologies (ICT), and with investments both in improving the existing measurement and communications infrastructure and in extending measurement and actuation capacity to more points in medium- and low-voltage networks, the data available on the state of the network is increasingly abundant and complete. All these systems are part of the so-called Smart Grids, the intelligent networks of a future that is no longer far off.
One such source of information is the energy consumption of customers, measured at regular intervals (every hour, half hour, or quarter hour) and sent from Smart Meters to the Distribution System Operators through Advanced Metering Infrastructure (AMI). As a result, an increasing amount of information on customers' energy consumption is being stored in Big Data systems. This growing source of information demands specialized techniques that can exploit it and extract useful, summarized knowledge from it. This thesis deals with the use of energy consumption data from Smart Meters, and in particular with the application of data mining techniques to obtain temporal patterns that characterize electricity users. Users are grouped according to these patterns into a small number of groups, or clusters, which make it possible to evaluate how users consume energy, both during the day and over a sequence of days, and so to assess trends and predict future scenarios. To this end, current techniques are studied and, after showing that existing work does not cover this objective, clustering (dynamic segmentation) techniques applied to load profiles of electric energy consumption from domestic users are developed. These techniques are tested and validated on a database of hourly energy consumption values for a sample of residential customers in Spain during 2008 and 2009.
The results make it possible to observe both the consumption patterns that characterize the different types of residential energy consumers and their evolution over time, and to assess, for example, how the regulatory changes in the Spanish electricity sector during those years influenced the temporal patterns of energy consumption.
Benítez Sánchez, IJ. (2015). Dynamic segmentation techniques applied to load profiles of electric energy consumption from domestic users [Unpublished doctoral thesis]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/59236

    Data-driven conceptual modeling: how some knowledge drivers for the enterprise might be mined from enterprise data

    As organizations conduct their business, they analyze, design, and manage a variety of processes represented in models of differing scope and complexity. Specifying these processes requires a certain level of modeling competence, but the people responsible for defining and modeling an organization's or enterprise's operations do not always have it. At the same time, an enterprise typically collects records of all events that occur while its processes run. Records such as the start and end of tasks in a process instance, state transitions of objects affected by process execution, and the messages exchanged during execution are kept in enterprise repositories as event logs, process logs, effect logs, message logs, and so on. Furthermore, the volume of data generated by enterprise process execution has grown manyfold in just a few years. In addition, models are often regarded as the dashboard view of an enterprise. Models represent an abstraction of the underlying reality of an enterprise, and they also serve as the knowledge drivers through which an enterprise can be managed. Data-driven extraction offers the capability to mine these knowledge drivers from enterprise data and to use the mined models to establish the set of enterprise data that conforms to the desired behaviour. This thesis aims to generate models, or knowledge drivers, from enterprise data in order to provide a dashboard-style view of the enterprise that supports analysts. The rationale is the need to improve an existing process or to create a new one. Models can also serve as a collection of effectors through which an organization or enterprise can be managed.
The enterprise data referred to above comprises process logs, effect logs, message logs, and invocation logs. The approach in this thesis is to mine these logs to generate process, requirements, and enterprise architecture models, and to determine how goals are fulfilled based on the collected operational data. The research question is whether the knowledge drivers can be derived from the enterprise data that represents the running operation of the enterprise; in other words, is it possible to use the data available in the enterprise repository to generate the knowledge drivers? Chapter 2 reviews the literature that provides the background needed to explore this research question. Chapter 3 presents how process semantics can be mined. Chapter 4 suggests a way to extract a requirements model. Chapter 5 presents a way to discover the underlying enterprise architecture, and Chapter 6 presents a way to mine how goals are orchestrated. The overall findings are discussed in Chapter 7 to derive some conclusions.

    Developing Cyberspace Data Understanding: Using CRISP-DM for Host-based IDS Feature Mining

    Current intrusion detection systems generate a large number of specific alerts but do not provide actionable information. Often, these alerts must be analyzed by a network defender, a time-consuming and tedious task that can take place hours or days after an attack occurs. Improved understanding of the cyberspace domain can lead to great advancements in cyberspace situational awareness research and development. This thesis applies the Cross Industry Standard Process for Data Mining (CRISP-DM) to develop an understanding of a host system under attack. Data is generated by launching scans and exploits at a machine outfitted with a set of host-based data collectors. Through knowledge discovery, features are identified within the collected data that can be used to enhance host-based intrusion detection. Discovering relationships between the collected data and the events demonstrates human understanding of the activity. This method of searching for hidden relationships between sensors greatly enhances understanding of new attacks and vulnerabilities, bolstering our ability to defend the cyberspace domain.

    A Survey on Concept Drift Adaptation

    Concept drift primarily refers to an online supervised learning scenario in which the relation between the input data and the target variable changes over time. Assuming a general knowledge of supervised learning, in this paper we characterize the adaptive learning process, categorize existing strategies for handling concept drift, discuss the most representative, distinct, and popular techniques and algorithms, discuss the evaluation methodology for adaptive algorithms, and present a set of illustrative applications. This introduction to concept drift adaptation presents state-of-the-art techniques and a collection of benchmarks for researchers, industry analysts, and practitioners. The survey aims to cover the different facets of concept drift in an integrated way, reflecting on the existing scattered state of the art.
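
    To make the scenario concrete, the sketch below shows one simple way a learner can adapt when the input-target relation changes: it forgets old examples by keeping only a sliding window of recent labels. This is an illustrative assumption rather than a method taken from the survey. The `WindowedMajority` class, the `prequential_accuracy` helper, the window size, and the synthetic stream with one abrupt change are all hypothetical.

```python
from collections import Counter, deque


class WindowedMajority:
    """Predicts the most frequent label among the last `window_size` labels seen."""

    def __init__(self, window_size):
        # A bounded deque drops the oldest label automatically (forgetting).
        self.window = deque(maxlen=window_size)

    def predict(self):
        if not self.window:
            return None  # nothing learned yet
        return Counter(self.window).most_common(1)[0][0]

    def learn(self, label):
        self.window.append(label)


def prequential_accuracy(model, labels):
    """Test-then-train evaluation: predict each label first, then learn from it."""
    correct = 0
    for label in labels:
        if model.predict() == label:
            correct += 1
        model.learn(label)
    return correct / len(labels)


# Synthetic stream with an abrupt change halfway through.
stream = [0] * 50 + [1] * 50

adaptive = prequential_accuracy(WindowedMajority(10), stream)
static = prequential_accuracy(WindowedMajority(len(stream)), stream)
print(f"short window: {adaptive:.2f}, full history: {static:.2f}")
```

    The short window recovers a few steps after the change, while the learner that keeps the full history keeps predicting the old label for the rest of the stream.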

    Mining app reviews to support software engineering

    The thesis studies how mining app reviews can support software engineering. App reviews (short user reviews of an app in app stores) provide a potentially rich source of information to help software development teams maintain and evolve their products. Exploiting this information is difficult, however, because of the large number of reviews and the difficulty of extracting useful, actionable information from short, informal texts. A variety of app review mining techniques have been proposed to classify reviews and to extract information such as feature requests, bug descriptions, and user sentiments, but the usefulness of these techniques in practice is still unknown. Research in this area has grown rapidly, resulting in a large number of scientific publications (at least 182 between 2010 and 2020), but there has been almost no independent evaluation, and no description of how the diverse techniques fit together to support specific software engineering tasks. The thesis presents a series of contributions to address these limitations. We first report the findings of a systematic literature review of app review mining, exposing the breadth and limitations of research in this area. Using findings from the literature review, we then present a reference model that relates features of app review mining tools to specific software engineering tasks supporting requirements engineering, software maintenance, and evolution. We then present two additional contributions extending previous evaluations of app review mining techniques. We present a novel independent evaluation of opinion mining techniques using an annotated dataset created for our experiment. Our evaluation finds lower effectiveness than initially reported by the techniques' authors. The final part of the thesis evaluates approaches for searching for app reviews pertinent to a particular feature.
The findings show that a general-purpose search technique is more effective than the state-of-the-art purpose-built app review mining techniques, and suggest their usefulness for requirements elicitation. Overall, the thesis contributes to improving the empirical evaluation of app review mining techniques and their application in software engineering practice. Researchers and developers of future app mining tools will benefit from the novel reference model, detailed experiment designs, and publicly available datasets presented in the thesis.