
    Data mining systems based on web services

    Olga Kurasova, Virginijus Marcinkevičius, Viktor Medvedev, Aurimas Rapečka. In the paper, data mining systems based on web services are analysed. The main notions related to web services are defined. The possibilities of distributed data mining and the tools for implementing it, Grid and Hadoop, are introduced. An analytical review of data mining systems based on web services is provided, and criteria for comparing the systems are selected. According to these criteria, a comparative analysis of the most popular web-service-based data mining systems is carried out, establishing which systems are evaluated best and which fail to satisfy most of the criteria.
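    The request/response pattern behind such web-service-based mining systems can be sketched in a few lines. The endpoint, payload fields, and response shape below are hypothetical placeholders for illustration, not the interface of any system surveyed in the paper.

```python
# Illustrative only: submitting a mining task to a hypothetical data mining
# web service over HTTP. The URL and JSON fields are invented placeholders.
import json
import urllib.request

SERVICE_URL = "http://example.org/dm-service/cluster"  # hypothetical endpoint

def submit_clustering_task(records, k):
    """Send a dataset to a remote clustering service and return its result."""
    payload = json.dumps({"data": records, "clusters": k}).encode("utf-8")
    request = urllib.request.Request(
        SERVICE_URL,
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())

# Example: cluster a few 2-D points into two groups.
# result = submit_clustering_task([[1.0, 2.0], [8.0, 9.0], [1.5, 1.8]], k=2)
```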

    Concurrent software architectures for exploratory data analysis

    Decades ago, increased volume of data made manual analysis obsolete and prompted the use of computational tools with interactive user interfaces and a rich palette of data visualizations. Yet their classic, desktop-based architectures can no longer cope with the ever-growing size and complexity of data. Next-generation systems for exploratory data analysis will be developed on client–server architectures, which already run concurrent software for data analytics but are not tailored for the engaged, interactive analysis of data and models. In exploratory data analysis, the key is the responsiveness of the system and the prompt construction of interactive visualizations that can guide users to uncover interesting data patterns. In this study, we review the current software architectures for distributed data analysis and propose a list of features to be included in the next generation of frameworks for exploratory data analysis. The new generation of tools will need to address integrated data storage and processing, fast prototyping of data analysis pipelines supported by machine-proposed analysis workflows, preemptive analysis of data, interactivity, and user interfaces for intelligent data visualizations. These systems will rely on a mixture of concurrent software architectures to meet the challenge of seamlessly integrating exploratory data interfaces on the client side with the management of concurrent data mining procedures on the servers.
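    The server-side half of this pattern can be sketched with a worker pool: analysis jobs run concurrently while the client-facing layer stays responsive. This is a minimal illustration of the architectural idea, assuming Python's standard library; none of the names come from the reviewed frameworks.

```python
# A minimal sketch: long-running analysis executes on a worker pool so the
# interface can keep serving interactive visualization requests meanwhile.
from concurrent.futures import ThreadPoolExecutor

def mine_patterns(dataset):
    """Stand-in for an expensive server-side data mining procedure."""
    return sorted(dataset)  # placeholder computation

with ThreadPoolExecutor(max_workers=4) as pool:
    # submit() returns immediately; the result is collected only when needed.
    job = pool.submit(mine_patterns, [3, 1, 2])
    print("still running:", not job.done())  # non-blocking progress check
    print("result:", job.result())           # block only at the last moment
```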

    Methodological challenges and analytic opportunities for modeling and interpreting Big Healthcare Data

    Managing, processing and understanding big healthcare data is challenging, costly and demanding. Without a robust fundamental theory for representation, analysis and inference, a roadmap for uniform handling and analysis of such complex data remains elusive. In this article, we outline various big data challenges, opportunities, modeling methods and software techniques for blending complex healthcare data, advanced analytic tools, and distributed scientific computing. Using imaging, genetic and healthcare data, we provide examples of processing heterogeneous datasets using distributed cloud services, automated and semi-automated classification techniques, and open-science protocols. Despite substantial advances, new innovative technologies need to be developed that enhance, scale and optimize the management and processing of large, complex and heterogeneous data. Stakeholder investments in data acquisition, research and development, computational infrastructure and education will be critical to realize the huge potential of big data, to reap the expected information benefits and to build lasting knowledge assets. Multi-faceted proprietary, open-source, and community developments will be essential to enable broad, reliable, sustainable and efficient data-driven discovery and analytics. Big data will affect every sector of the economy, and its hallmark will be 'team science'.
    http://deepblue.lib.umich.edu/bitstream/2027.42/134522/1/13742_2016_Article_117.pd
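    As a toy illustration of the automated classification step mentioned above, the sketch below trains a classifier on a synthetic stand-in for a heterogeneous feature matrix. It assumes scikit-learn, which the article does not prescribe; real pipelines would load and harmonize actual imaging, genetic and clinical data.

```python
# Minimal sketch, not the article's method: automated classification on a
# synthetic dataset standing in for blended healthcare features.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic feature matrix; real studies would use harmonized imaging,
# genetic and clinical measurements.
X, y = make_classification(n_samples=500, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X_train, y_train)
print("held-out accuracy:", model.score(X_test, y_test))
```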

    Developing ontology-based decision-making framework for Middle Eastern region HEIs

    Decision making is one of the most challenging processes that higher education institutions continuously experience worldwide. Most educational decisions rely mainly on evaluating the academic profile of staff members, which usually includes the academic and research activities of the teacher. The massive amount of scattered educational data, if represented in traditional forms, causes ambiguity and inaccuracy in decisions. Educational institutions have recently been attempting to apply emerging technologies in the data engineering field to solve as many challenges as possible. In addition, online libraries continuously produce an enormous amount of open scholarly data, including publications, citations, and other research activity records, which could effectively improve the quality of academic decisions when linked with the local data of universities. This thesis represents academic profiles and course records semantically and employs them with a scientific knowledge graph as linked data to enrich the internal data and support the decision-making process within universities. The proposed approach is applied to assign courses to the most qualified academic staff as a proof-of-concept experiment. Traditionally, this process is performed manually by heads of departments and is considered time-consuming, especially when the data are in textual format; this research aims to address that challenge. To this end, courses and academic profiles are represented semantically in RDF format in order to improve the quality of the institutional data. To ensure the efficiency of this process, a survey was conducted to identify the key factors that influence decision making during the distribution of courses among staff members; it was distributed to heads of departments, who actively participated and provided valuable insights. The survey results indicated that the research areas of academic staff, and whether they had taught the course before, are the most important factors usually considered in this type of decision. Furthermore, this study demonstrates the importance of generating links between local data and external repositories with updated research records to improve the course–teacher assignment process. Linked data technology is applied to combine all the information affecting the course–teacher assignment decision from different resources, and the sufficiency of the linked data and the selection of external data are examined using data mining techniques. Two prediction models are developed to predict the most qualified academic teacher for each course, using data on 314 academic teachers and 119 courses from the Faculty of Computing and Information Technology at King Abdulaziz University. The accuracy of the models suggests that performance improves when the data are enriched with external scholarly open data using linked data (LD) techniques, with accuracy increasing from 80.95% to 93.26%. Additionally, adding the research records of academic members improved the sensitivity of the two models to 89.11% and 97.76%, respectively. These improvements demonstrate the importance of considering the research activities of academic members when distributing courses, especially when they are extracted from external repositories using LD.
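    The semantic representation step can be illustrated with rdflib: a course and a teacher become RDF resources whose properties carry the factors the survey identified (research area, prior teaching). The namespace and property names below are invented for illustration and are not the thesis's actual vocabulary.

```python
# Illustrative sketch of a course-teacher assignment expressed in RDF.
# The EDU vocabulary is a hypothetical placeholder.
from rdflib import RDF, Graph, Literal, Namespace

EDU = Namespace("http://example.org/edu#")  # hypothetical vocabulary
g = Graph()

teacher = EDU["teacher/42"]
course = EDU["course/CS101"]

g.add((teacher, RDF.type, EDU.AcademicStaff))
g.add((teacher, EDU.researchArea, Literal("data mining")))  # survey factor 1
g.add((course, RDF.type, EDU.Course))
g.add((course, EDU.previouslyTaughtBy, teacher))            # survey factor 2
g.add((course, EDU.assignedTo, teacher))     # the decision being supported

print(g.serialize(format="turtle"))
```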

    NEW ARTIFACTS FOR THE KNOWLEDGE DISCOVERY VIA DATA ANALYTICS (KDDA) PROCESS

    Recently, interest in the business application of analytics and data science has increased significantly. The popularity of data analytics and data science comes from the clear articulation of business problem solving as an end goal. To address limitations in the existing literature, this dissertation provides four novel design artifacts for Knowledge Discovery via Data Analytics (KDDA). The first artifact is a Snail Shell KDDA process model that extends existing knowledge discovery process models but addresses many of their limitations. At the top level, the KDDA process model highlights the iterative nature of KDDA projects and adds two new phases, namely Problem Formulation and Maintenance. At the second level, the generic tasks of the KDDA process model are presented in a comparative manner, highlighting the differences between the new KDDA process model and traditional knowledge discovery process models. Two case studies demonstrate how the KDDA process model can guide real-world KDDA projects. The second artifact, a methodology for theory building based on quantitative data, is a novel application of the KDDA process model. The methodology is evaluated using a theory-building case from the public health domain. It is not only an instantiation of the Snail Shell KDDA process model but also makes theoretical contributions to theory building, demonstrating how analytical techniques can be used as quantitative gauges to assess important construct relationships during the formative phase of theory building. The third artifact is a data mining ontology, the DM3 ontology, which bridges the semantic gap between business users and KDDA experts and facilitates analytical model maintenance and reuse. The DM3 ontology is evaluated using both a criteria-based and a task-based approach. The fourth artifact is a decision support framework for MCDA software selection. The framework enables users to choose relevant MCDA software based on a specific decision-making situation (DMS). A DMS modeling framework is developed to structure the DMS based on the decision problem and the users' decision preferences. The framework is implemented as a decision support system and evaluated using application examples from the real-estate domain.
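    The core of such an MCDA selection step is often a weighted aggregation of criterion scores. The sketch below shows the generic weighted-sum idea with invented criteria, weights and ratings; it is not the dissertation's framework, only the arithmetic such frameworks build on.

```python
# Generic weighted-sum MCDA scoring; all numbers are invented placeholders.
weights = {"usability": 0.40, "method_coverage": 0.35, "cost": 0.25}

candidates = {
    "Tool A": {"usability": 0.9, "method_coverage": 0.6, "cost": 0.7},
    "Tool B": {"usability": 0.7, "method_coverage": 0.9, "cost": 0.5},
}

def weighted_score(ratings):
    """Aggregate criterion ratings with the decision maker's weights."""
    return sum(weights[c] * ratings[c] for c in weights)

scores = {name: round(weighted_score(r), 3) for name, r in candidates.items()}
print(scores)
print("recommended:", max(scores, key=scores.get))
```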