60 research outputs found

    Big Data Security (Volume 3)

    Get PDF
    After a short description of the key concepts of big data, the book explores the secrecy and security threats posed especially by cloud-based data storage. It delivers conceptual frameworks and models, along with case studies of recent technology.

    MevaL: A Visual Machine Learning Model Evaluation Tool for Financial Crime Detection

    Get PDF
    Data Science and Machine Learning are two valuable allies in the fight against financial crime, the domain where Feedzai seeks to leverage its value proposition in support of its mission: to make banking and commerce safe. Data is at the core of both fields and of this domain, so structuring instances for visual consumption provides an effective way of understanding the data and communicating insights. The development of a solution for each project and use case requires a careful and effective Machine Learning Model Evaluation stage, as it is the major source of feedback before deployment. The tooling for this stage available at Feedzai can be improved, accelerated, visually supported, and diversified to enable data scientists to boost their daily work and the quality of their models. In this work, I propose to collect and compile internal and external input, in terms of workflow and Model Evaluation, into a proposal hierarchically segmented by well-defined objectives and tasks, to instantiate the proposal in a Python package, and to iteratively validate the package with Feedzai's data scientists. The first contribution is therefore MevaL, a Python package for Model Evaluation with visual support, integrated into Feedzai's Data Science environment by design. In fact, MevaL is already being leveraged as a visualization package on two internal reporting projects that serve some of Feedzai's major clients. In addition to MevaL, the second contribution of this work is the Model Evaluation Topology, developed to ensure clear communication and design of features.
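
    A hedged illustration: MevaL itself is internal to Feedzai and its API is not public, so the sketch below only shows the generic kind of visually supported Model Evaluation the abstract describes, using scikit-learn and matplotlib; the function plot_evaluation and its signature are hypothetical, not MevaL's actual interface.

        import matplotlib.pyplot as plt
        from sklearn.metrics import RocCurveDisplay, precision_recall_curve

        def plot_evaluation(y_true, y_score):
            # Plot ROC and precision-recall curves side by side: the kind of
            # visual feedback a Model Evaluation stage produces before deployment.
            fig, (ax_roc, ax_pr) = plt.subplots(1, 2, figsize=(10, 4))
            RocCurveDisplay.from_predictions(y_true, y_score, ax=ax_roc)
            precision, recall, _ = precision_recall_curve(y_true, y_score)
            ax_pr.plot(recall, precision)
            ax_pr.set_xlabel("Recall")
            ax_pr.set_ylabel("Precision")
            fig.tight_layout()
            return fig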

    Multi-sensor Evolution Analysis: an advanced GIS for interactive time series analysis and modelling based on satellite data

    Get PDF
    Archives of Earth remote sensing data, acquired from orbiting satellites, contain large amounts of information that can be used both for research activities and for decision support. Thematic categorization is one method of extracting meaningful information from satellite data that humans can directly comprehend. An interactive system for analysing geo-referenced thematic data and its evolution over time is proposed as a tool to efficiently exploit this vast and growing amount of data. This thesis describes the approach used in building the system and the data processing methodology, and details the architectural elements and graphical interfaces. Finally, it provides an evaluation of potential uses of the features provided, of performance levels, and of the usability of an implementation hosting an archive of 15 years of moderate-resolution (1 km, from the ATSR instrument) thematic data.
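
    As a toy illustration only (not the system described), the following assumes a 15-year archive of thematic class maps stored as integer rasters, and shows the kind of per-pixel time-series question such an archive supports:

        import numpy as np

        # Stand-in data: 15 yearly thematic maps over a tiny 4x4 tile,
        # each pixel holding one of five hypothetical land-cover classes.
        years = np.stack([np.random.randint(0, 5, (4, 4)) for _ in range(15)])

        changed = years[0] != years[-1]   # pixels whose class differs between first and last year
        n_changes = (np.diff(years, axis=0) != 0).sum(axis=0)  # per-pixel count of class transitions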

    Lääketieteellisen tiedon visualisointi web-teknologioin (Visualization of Medical Data with Web Technologies)

    Get PDF
    Over the years, the medical expenses caused by inpatients have been increasing all around the world. One reason for this is the increase in treatment costs: some new medicines and methods may be cheaper, while others can cost significantly more. Unless hospital funding is increased accordingly, ways to save money must be found. One way to reduce costs and free hospital resources is to discharge patients as soon as possible. The problem is how to do this without risking the patient's health, and possibly life, by discharging her too early. After patients are discharged, they are also almost invisible to hospital systems, making it hard to monitor their recovery. Traditionally, monitoring recovery means the patient visiting the hospital for a checkup, or possibly counselling via phone. Checkups, however, can put strain on the patient, especially if she lives far away from the hospital, while through phone counselling it is hard to get a full grasp of the situation. To solve these problems, patient remote monitoring solutions have been developed. Remote monitoring allows the patient to stay at home while a doctor or a nurse monitors her vital signs and other data at the hospital. In this thesis, a patient remote monitoring application was implemented for demonstration purposes using modern web technologies. The application was written mostly in JavaScript, using the AngularJS framework, cascading style sheets, and HTML5's canvas element and server-sent events. The Bootstrap CSS and JavaScript framework was also used to some extent. A generic Internet of Things cloud was used to store and retrieve data; additionally, the data sent to the cloud is relayed to the application using server-sent events. For the evaluation, mocked sensor data was sent to the cloud back-end. The results were mostly positive. In extreme situations AngularJS began to slow down, and depending on the platform and setup, the CPU usage of canvas rendering using the custom visualization library was considered too high. The end-to-end latency was mostly good, though because of occasional latency spikes, the system cannot be used in critical situations where latency must be consistent. Overall, the application and the end-to-end system worked well, and the work can be considered successful. Even though the application has some performance issues, some of them are very unlikely to occur in real usage. Therefore, it can be said that the application performed well and achieved its goal. It provides a good basis for further development and optimization, and proves that web technologies can be used in the medical domain, possibly even in hospital wards.
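
    The data path above relies on server-sent events (SSE). Below is a minimal sketch of that mechanism, shown in Python for brevity rather than the thesis's JavaScript; the endpoint URL and field names are hypothetical:

        import json
        import urllib.request

        def stream_events(url):
            # Yield JSON payloads from an SSE stream; a deliberately minimal
            # parser that only handles single-line "data:" fields.
            with urllib.request.urlopen(url) as resp:
                for raw in resp:
                    line = raw.decode("utf-8").strip()
                    if line.startswith("data:"):
                        yield json.loads(line[len("data:"):])

        # Hypothetical usage:
        # for sample in stream_events("https://iot.example.com/patients/42/events"):
        #     print(sample["timestamp"], sample["heart_rate"])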

    Glosarium Matematika

    Get PDF
    273 p.; 24 cm

    Deep R Programming

    Full text link
    Deep R Programming is a comprehensive course on one of the most popular languages in data science (statistical computing, graphics, machine learning, data wrangling and analytics). It introduces the base language in depth and is aimed at ambitious students, practitioners, and researchers who would like to become independent users of this powerful environment. This textbook is a non-profit project. Its online and PDF versions are freely available at . This early draft is distributed in the hope that it will be useful. Comment: Draft v0.2.1 (2023-04-27)

    Contributions to the science of controlled transformation

    Get PDF
    Writing completed in April 2013. My research activities pertain to "Informatics" and in particular "Interactive Graphics", i.e. dynamic graphics on a 2D screen that a user can interact with by means of input devices such as a mouse or a multitouch surface. I have conducted research on Interactive Graphics along three themes: interactive graphics development (how should developers design the architecture of the code corresponding to graphical interactions?), interactive graphics design (what graphical interactions should User Experience (UX) specialists use in their systems?) and the interactive graphics design process (how should UX specialists design? Which method should they apply?). I invented the MDPC architecture, which relies on Picking views and Inverse transforms. This improves the modularity of programs and the usability of the specification and implementation of interactive graphics, thanks to the simplification of description. In order to improve the performance of graphics-rich software using this architecture, I explored the concept of graphical compilers and led a PhD thesis on the topic. The thesis explored the approach and contributed both in terms of description simplification and of software engineering facilitation. Finally, I have applied the simplification-of-description principles to the problem of shape covering avoidance, by relying on new efficient hardware support for parallelized and memory-based algorithms. Together with my colleagues, we have explored the design and assessment of expanding targets, animation and sound, interaction with numerous tangled trajectories, multi-user interaction, and tangible interaction. I have identified and defined Structural Interaction, a new interaction paradigm that follows in the steps of the direct and instrumental interaction paradigms. I directed a PhD thesis on this topic, and together with my student we designed and assessed interaction techniques for structural interaction. I was involved in the design of the "Technology Probes" concept, i.e. runnable prototypes to feed the design process. Together with colleagues, I designed VideoProbe, one such Technology Probe. I became interested in more conceptual tools targeted at graphical representation. I led two PhD theses on the topic and explored the characterization of visualization, how to design representations with visual variables or ecological perception, and how to design visual interfaces to improve visual scanning. I discovered that those conceptual tools could be applied to programming languages and showed how the representation of code, be it textual or "visual", undergoes visual perception phenomena. This has led me to consider our discipline as the "Science of Controlled Transformations". The fifth chapter is an attempt at providing this new account of "Informatics", based on what users, programmers, and researchers actually do with interactive systems. I also describe how my work can be considered as contributing to the science of controlled transformations.
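
    The abstract gives no code, but the inverse-transform idea behind picking (mapping a mouse position back to model coordinates) can be conveyed with a small hypothetical sketch; the names and the affine form below are assumptions, not the MDPC implementation:

        import numpy as np

        def make_transform(scale, tx, ty):
            # A 2-D affine view transform (uniform scale plus translation)
            # mapping model coordinates to screen coordinates.
            return np.array([[scale, 0.0, tx],
                             [0.0, scale, ty],
                             [0.0, 0.0, 1.0]])

        def pick(screen_xy, transform):
            # Apply the inverse transform to a mouse position to recover
            # the model-space point under the cursor.
            x, y = screen_xy
            mx, my, _ = np.linalg.inv(transform) @ np.array([x, y, 1.0])
            return mx, my

        # Hypothetical usage: a view zoomed 2x and panned by (100, 50);
        # pick((140, 90), make_transform(2.0, 100.0, 50.0)) returns (20.0, 20.0).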

    Gridfields: Model-Driven Data Transformation in the Physical Sciences

    Get PDF
    Scientists' ability to generate and store simulation results is outpacing their ability to analyze them via ad hoc programs. We observe that these programs exhibit an algebraic structure that can be used to facilitate reasoning and improve performance. In this dissertation, we present a formal data model that exposes this algebraic structure, then implement the model, evaluate it, and use it to express, optimize, and reason about data transformations in a variety of scientific domains. Simulation results are defined over a logical grid structure that allows a continuous domain to be represented discretely in the computer. Existing approaches for manipulating these gridded datasets are incomplete. The performance of SQL queries that manipulate large numeric datasets is not competitive with that of specialized tools, and the up-front effort required to deploy a relational database makes them unpopular for dynamic scientific applications. Tools for processing multidimensional arrays can only capture regular, rectilinear grids. Visualization libraries accommodate arbitrary grids, but no algebra has been developed to simplify their use and afford optimization. Further, these libraries are data dependent: physical changes to data characteristics break user programs. We adopt the grid as a first-class citizen, separating topology from geometry and separating structure from data. Our model is agnostic with respect to dimension, uniformly capturing, for example, particle trajectories (1-D), sea-surface temperatures (2-D), and blood flow in the heart (3-D). Equipped with data, a grid becomes a gridfield. We provide operators for constructing, transforming, and aggregating gridfields that admit algebraic laws useful for optimization. We implement the model by analyzing several candidate data structures and incorporating their best features. We then show how to deploy gridfields in practice by injecting the model as middleware between heterogeneous, ad hoc file formats and a popular visualization library. In this dissertation, we define, develop, implement, evaluate, and deploy a model of gridded datasets that accommodates a variety of complex grid structures and a variety of complex data products. We evaluate the applicability and performance of the model using datasets from oceanography, seismology, and medicine, and conclude that our model-driven approach offers significant advantages over the status quo.
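
    As a purely illustrative toy (not the dissertation's implementation), the class below conveys the flavour of a gridfield: topology (cells over nodes) kept separate from per-node data, with a simplified restrict operator that preserves that structure; all names are hypothetical:

        from dataclasses import dataclass, field

        @dataclass
        class GridField:
            nodes: list   # node ids; topology only, geometry is just another attribute
            cells: list   # each cell is a tuple of node ids
            data: dict = field(default_factory=dict)  # attribute name -> one value per node

            def restrict(self, name, predicate):
                # Keep nodes whose attribute satisfies the predicate, plus the
                # cells all of whose nodes survive.
                keep = {n for n, v in zip(self.nodes, self.data[name]) if predicate(v)}
                nodes = [n for n in self.nodes if n in keep]
                cells = [c for c in self.cells if all(n in keep for n in c)]
                data = {k: [v for n, v in zip(self.nodes, vals) if n in keep]
                        for k, vals in self.data.items()}
                return GridField(nodes, cells, data)

        # Hypothetical usage: keep only the warm region of a sea-surface
        # temperature gridfield:
        # warm = gf.restrict("sst", lambda t: t > 20.0)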