642 research outputs found

    Structuring visual exploratory analysis of skill demand

    No full text
    The analysis of increasingly large and diverse data for meaningful interpretation and question answering is handicapped by human cognitive limitations. Consequently, semi-automatic abstraction of complex data within structured information spaces becomes increasingly important if its knowledge content is to support intuitive, exploratory discovery. Exploration of skill demand is an area where regularly updated, multi-dimensional data may be exploited to assess the capability of the workforce to manage the demands of the modern, technology- and data-driven economy. The knowledge derived may be employed by skilled practitioners in defining career pathways, to identify where, when and how to update their skillsets in line with advancing technology and changing work demands. This same knowledge may also be used to identify the combination of skills essential when recruiting for new roles. To address the challenges inherent in exploring the complex, heterogeneous, dynamic data that feeds into such applications, we investigate the use of an ontology to guide the structuring of the information space, allowing individuals and institutions to interactively explore and interpret the dynamic skill demand landscape for their specific needs. As a test case we consider the relatively new and highly dynamic field of Data Science, where insightful, exploratory data analysis and knowledge discovery are critical. We employ context-driven and task-centred scenarios to explore our research questions and to guide the iterative design, development and formative evaluation of our ontology-driven, visual exploratory discovery and analysis approach, measuring where it adds value to users’ analytical activity. Our findings reinforce the potential of our approach and point to future paths to build on

    Technical Research Priorities for Big Data

    Get PDF
    To drive innovation and competitiveness, organisations need to foster the development and broad adoption of data technologies, value-adding use cases and sustainable business models. Enabling an effective data ecosystem requires overcoming several technical challenges associated with the cost and complexity of the management, processing, analysis and utilisation of data. This chapter details a community-driven initiative to identify and characterise the key technical research priorities for research and development in data technologies. The chapter examines the systematic and structured methodology used to gather inputs from over 200 stakeholder organisations. The process identified five key technical research priorities in the areas of data management, data processing, data analytics, data visualisation and user interactions, and data protection, together with 28 sub-level challenges. The process also highlighted the important role of data standardisation, data engineering and DevOps for Big Data

    Using data analysis and Information visualization techniques to support the effective analysis of large financial data sets

    Get PDF
    There have been a number of technological advances in the last ten years, which have resulted in the amount of data generated in organisations increasing by more than 200% during this period. This rapid increase means that if financial institutions are to derive significant value from their data, they need to identify new ways to analyse it effectively. Given the considerable size of the data, financial institutions also need to consider how to visualise it effectively. Traditional tools such as relational database management systems have problems processing large amounts of data due to memory constraints, latency issues and the presence of both structured and unstructured data. The aim of this research was to use data analysis and information visualisation (IV) techniques to support the effective analysis of large financial data sets. In order to visually analyse the data effectively, the underlying data model must produce results that are reliable. A large financial data set was identified and used to demonstrate that IV techniques can support the effective analysis of large financial data sets. A review of the literature on large financial data sets, visual analytics, and existing data management and data visualisation tools identified the shortcomings of existing tools, which led to the requirements for the data management tool and the IV tool. The data management tool selected was a data warehouse and the IV toolkit selected was Tableau. The IV techniques used included the Overview, Dashboards and Colour Blending. The IV tool was implemented and published online and can be accessed through a web browser interface. The data warehouse and the IV tool were evaluated to determine their accuracy and effectiveness in supporting the analysis of the large financial data set. The experiment used to evaluate the data warehouse yielded positive results, showing that only about 4% of the records contained incorrect data. The results of the user study were also positive and no major usability issues were identified. The participants found the IV techniques effective for analysing the large financial data set
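
    As an illustration of the kind of accuracy check described above, the sketch below estimates the share of warehouse records failing simple validation rules, in the spirit of the reported ~4% error rate. It is a minimal Python/pandas sketch; the column names and validation rules are hypothetical, not taken from the thesis.

        import pandas as pd

        # Hypothetical sample of warehouse records; in the thesis the data came
        # from a large financial data set loaded into a data warehouse.
        records = pd.DataFrame({
            "transaction_id": [1, 2, 3, 4, 5],
            "amount": [120.50, -30.00, None, 87.25, 15.00],
            "currency": ["ZAR", "ZAR", "ZAR", "??", "ZAR"],
        })

        # Treat a record as incorrect if a mandatory field is missing or a value
        # fails a simple plausibility rule.
        invalid = records["amount"].isna() | ~records["currency"].isin(["ZAR", "USD", "EUR"])

        print(f"{invalid.mean() * 100:.1f}% of sampled records failed validation")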

    Enabling data-driven decision-making for a Finnish SME: a data lake solution

    Get PDF
    In the era of big data, data-driven decision-making has become a key success factor for companies of all sizes. Technological development has made it possible to store, process and analyse vast amounts of data effectively, and the availability of cloud computing services has lowered the costs of data analysis. Even small businesses have access to advanced technical solutions, such as data lakes and machine learning applications. Data-driven decision-making requires integrating relevant data from various sources: data has to be extracted from distributed internal and external systems and stored in a centralised system that enables processing and analysing it for meaningful insights. Data can be structured, semi-structured or unstructured. Data lakes have emerged as a solution for storing vast amounts of data, including a growing amount of unstructured data, in a cost-effective manner. The rise of the SaaS model has led companies to abandon on-premise software. This blurs the line between internal and external data, as the company’s own data is actually maintained by a third party. Most enterprise software targeted at small businesses is provided through the SaaS model, so small businesses face the challenge of adopting data-driven decision-making while having limited visibility into their own data. In this thesis, we study how small businesses can take advantage of data-driven decision-making by leveraging cloud computing services. We found that the reporting features of the SaaS-based business applications used by our case company, a sales-oriented SME, were insufficient for detailed analysis. Data-driven decision-making required aggregating data from multiple systems, causing excessive manual labour. A cloud-based data lake solution was found to be a cost-effective way of creating a centralised repository and automating data integration. It enabled management to visualise customer and sales data and to assess the effectiveness of marketing efforts. Better data analysis skills among the managers of the case company would, however, have been needed to obtain the full benefits of the solution
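
    A minimal sketch of the extract-and-land step of such a pipeline is shown below: records are pulled from a SaaS reporting API and stored unchanged in an object-store bucket acting as the raw zone of the data lake. The endpoint, access token and bucket name are hypothetical, not the case company's actual systems.

        import json

        import boto3
        import requests

        API_URL = "https://api.example-saas.com/v1/sales/orders"  # hypothetical SaaS endpoint
        BUCKET = "sme-data-lake-raw"                              # hypothetical raw-zone bucket

        # Extract: pull records from the SaaS reporting API.
        response = requests.get(API_URL, headers={"Authorization": "Bearer <token>"}, timeout=30)
        response.raise_for_status()
        orders = response.json()

        # Land: store the raw response unchanged ("schema on read"); cleaning and
        # aggregation for reporting happen in a later processing step.
        s3 = boto3.client("s3")
        s3.put_object(
            Bucket=BUCKET,
            Key="sales/orders/2024-01-01.json",
            Body=json.dumps(orders).encode("utf-8"),
        )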

    Reinventing the Social Scientist and Humanist in the Era of Big Data

    Get PDF
    This book explores the big data evolution by interrogating the notion that big data is a disruptive innovation that appears to be challenging existing epistemologies in the humanities and social sciences. Exploring various (controversial) facets of big data such as ethics, data power, and data justice, the book attempts to clarify the trajectory of the epistemology of (big) data-driven science in the humanities and social sciences

    MAJOR TECHNOLOGIES AND PRACTICAL ASPECTS OF THE DIGITAL TRANSFORMATION OF BUSINESS IN A BIG DATA ENVIRONMENT

    Get PDF
    In contemporary business development, digital transformation has become a major challenge for big data management. In an environment of increasing data volumes, big data processing and analysis have become crucial factors for business development. Open-source data processing technologies take an innovative approach to the design of data processing and data analytics tools through cooperation among developers, which ensures the transparency, accessibility and continuous improvement of those tools. The main objective of this article is to review popular open-source technologies for big data processing and to identify trends in employing big data in business applications

    Web technologies for environmental big data

    Get PDF
    Recent evolutions in computing science and web technology provide the environmental community with continuously expanding resources for data collection and analysis that pose unprecedented challenges to the design of analysis methods, workflows, and interaction with data sets. In the light of the recent UK Research Council funded Environmental Virtual Observatory pilot project, this paper gives an overview of currently available implementations related to web-based technologies for processing large and heterogeneous datasets and discusses their relevance within the context of environmental data processing, simulation and prediction. We found that the processing of the simple datasets used in the pilot proved to be relatively straightforward using a combination of R, RPy2, PyWPS and PostgreSQL. However, the use of NoSQL databases and more versatile frameworks, such as implementations based on OGC standards, may provide a wider and more flexible set of features that particularly facilitate working with larger volumes and more heterogeneous data sources
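
    The combination of R, RPy2 and PostgreSQL mentioned above can be sketched as follows: observations are read from PostgreSQL in Python and summarised with an R function via RPy2. The connection details and the table and column names are hypothetical, not those of the pilot project.

        import psycopg2
        import rpy2.robjects as robjects

        # Read environmental observations from PostgreSQL (hypothetical database,
        # table and column names).
        conn = psycopg2.connect(dbname="evo_pilot", user="analyst", host="localhost")
        with conn, conn.cursor() as cur:
            cur.execute("SELECT rainfall_mm FROM observations WHERE site_id = %s", ("site_01",))
            rainfall = [row[0] for row in cur.fetchall()]

        # Hand the values to R via RPy2 and let R compute a statistical summary.
        r_values = robjects.FloatVector(rainfall)
        print(robjects.r["summary"](r_values))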

    Industrial IT security management supported by an asset management database

    Get PDF
    Managing the security of Information Technology (IT) systems and assets throughout their lifecycle is a complex and important task for organisations, as the number of external threats and vulnerabilities in industrial systems continues to grow. Managing this complexity and maintaining a clear overview of all assets within industrial infrastructures are key challenges. Beyond that, there is no well-defined data structure for organising data about IT assets from different industrial sources. This study describes a solution to support the security management of industrial IT assets, developed in collaboration with the Siemens Corporate Technology Security Life-Cycle department. The database support tool aims to integrate asset data from the Component Object Server (COMOS) engineering tool and the Siemens Extensible Security Testing Appliance (SiESTA) security scanner, using a Neo4j graph database to store and visualise the relationships between assets as well as their security-relevant attributes. Five different database models were compared to assess which was most appropriate for the defined database requirements. A data model was defined based on the National Institute of Standards and Technology (NIST) Asset Identification Specification 1.1, and its objects and relationships were determined to support the following use cases: Host Discovery, Port Scanning and Vulnerability Management. Using a network blueprint exported from COMOS as input, the database support tool makes it possible to import and export data from the database and to automate the creation of input files that enable SiESTA to perform scan tests on industrial networks. The proposed solution was validated through a questionnaire administered to IT Security consultants, which yielded positive feedback on the tool's usefulness for managing assets in industrial environments.
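    A minimal sketch of how assets and their relationships can be stored in Neo4j from Python is shown below. The node labels, properties and connection details are illustrative assumptions, not the thesis' actual NIST-based data model.

        from neo4j import GraphDatabase

        # Connect to a local Neo4j instance (hypothetical credentials).
        driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

        with driver.session() as session:
            # Record a host discovered on the network, a service found by a port
            # scan, and the relationship between them.
            session.run(
                """
                MERGE (h:Host {ip: $ip, hostname: $hostname})
                MERGE (s:Service {port: $port, protocol: $protocol})
                MERGE (h)-[:EXPOSES]->(s)
                """,
                ip="192.168.1.10", hostname="plc-controller-01", port=502, protocol="modbus",
            )

        driver.close()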

    Web technologies for environmental Big Data

    No full text