4,584 research outputs found

    Data Mining Using Relational Database Management Systems

    Get PDF
    Software packages providing a whole set of data mining and machine learning algorithms are attractive because they allow experimentation with many kinds of algorithms in an easy setup. However, these packages are often based on main-memory data structures, limiting the amount of data they can handle. In this paper we use a relational database as secondary storage in order to eliminate this limitation. Unlike existing approaches, which often focus on optimizing a single algorithm to work with a database backend, we propose a general approach, which provides a database interface for several algorithms at once. We have taken a popular machine learning software package, Weka, and added a relational storage manager as back-tier to the system. The extension is transparent to the algorithms implemented in Weka, since it is hidden behind Weka’s standard main-memory data structure interface. Furthermore, some general mining tasks are transfered into the database system to speed up execution. We tested the extended system, refered to as WekaDB, and our results show that it achieves a much higher scalability than Weka, while providing the same output and maintaining good computation time

    Database Systems - Present and Future

    Get PDF
    The database systems have nowadays an increasingly important role in the knowledge-based society, in which computers have penetrated all fields of activity and the Internet tends to develop worldwide. In the current informatics context, the development of the applications with databases is the work of the specialists. Using databases, reach a database from various applications, and also some of related concepts, have become accessible to all categories of IT users. This paper aims to summarize the curricular area regarding the fundamental database systems issues, which are necessary in order to train specialists in economic informatics higher education. The database systems integrate and interfere with several informatics technologies and therefore are more difficult to understand and use. Thus, students should know already a set of minimum, mandatory concepts and their practical implementation: computer systems, programming techniques, programming languages, data structures. The article also presents the actual trends in the evolution of the database systems, in the context of economic informatics.database systems - DBS, database management systems – DBMS, database – DB, programming languages, data models, database design, relational database, object-oriented systems, distributed systems, advanced database systems

    Obvious: a meta-toolkit to encapsulate information visualization toolkits. One toolkit to bind them all

    Get PDF
    This article describes “Obvious”: a meta-toolkit that abstracts and encapsulates information visualization toolkits implemented in the Java language. It intends to unify their use and postpone the choice of which concrete toolkit(s) to use later-on in the development of visual analytics applications. We also report on the lessons we have learned when wrapping popular toolkits with Obvious, namely Prefuse, the InfoVis Toolkit, partly Improvise, JUNG and other data management libraries. We show several examples on the uses of Obvious, how the different toolkits can be combined, for instance sharing their data models. We also show how Weka and RapidMiner, two popular machine-learning toolkits, have been wrapped with Obvious and can be used directly with all the other wrapped toolkits. We expect Obvious to start a co-evolution process: Obvious is meant to evolve when more components of Information Visualization systems will become consensual. It is also designed to help information visualization systems adhere to the best practices to provide a higher level of interoperability and leverage the domain of visual analytics

    Digital system of quarry management as a SAAS solution: mineral deposit module

    Get PDF
    Purpose. Improving the efficiency of functioning the mining enterprises and aggregation of earlier obtained results into a unified digital system of designing and operative management by quarry operation. Methods. Both the traditional (analysis of scientific and patent literature, analytical methods of deposit parameters research, analysis of experience and exploitation of quarries, conducting the passive experiment and processing the statistical data) and new forms of scientific research - deposit modeling on the basis of classical and neural network methods of approximation – are used in the work. For the purpose of the software product realization on the basis of cloud technologies, there were used: for back-end implementation – server-based scripting language php; for the front-end – multi-paradigm programming language javascript, javascript framework jQuery and asynchronous data exchange technology Ajax. Findings. The target audience of the system has been identified, SWOT-analysis has been carried out, conceptual directions of 3D-quarry system development have been defined. The strategies of development and promotion of the software product, as well as the strategies of safety and reliability of the application both for the client and the owner of the system have been formulated. The modular structure of the application has been developed, and the system functions have been divided to implement both back-end and front-end applications. The Mineral Deposit Module has been developed: the geological structure of the deposit has been simulated and its block model has been constructed. It has been proved that the use of neural network algorithms does not give an essential increase in the accuracy of the block model for the deposits of 1 and 2 groups in terms of the geological structure complexity. The possibility and prospects of constructing the systems for subsoil users on the basis of cloud technologies and the concept of SaaS have been substantiated. Originality. For the first time, the modern software products for solving the problems of designing and operational management of mining operations have been successfully developed on the basis of the SaaS concept. Practical implications. The results are applicable for enterprises-subsoil users, working with deposits of 1 and 2 groups in terms of the geological structure complexity: design organizations, as well as mining and processing plants.Мета. Підвищення ефективності функціонування гірничорудних підприємств та агрегація раніше отриманих результатів в єдину цифрову систему проектування і оперативного управління роботою кар’єрів. Методика. У роботі використані як традиційні (аналіз науково-патентної літератури, аналітичні методи дослідження параметрів родовища, аналіз досвіду й експлуатації кар’єрів, проведення пасивного експерименту та статистичної обробки даних), так і нові форми наукового дослідження – моделювання родовища на основі класичних і нейромережевих методів апроксимації. Для реалізації програмного продукту на основі хмарних технологій використані: для реалізації back-end – серверна скриптова мова програмування php; для front-end – мультипарадігменна мова програмування javascript, javascript framework jQuery і технологія асинхронного обміну даними Ajax. Результати. Виявлено цільову аудиторію системи, проведено SWOT-аналіз, визначено концептуальні напрями розвитку системи 3D-кар’єр, розроблені стратегії розвитку та просування програмного продукту, розроблені стратегії безпеки й надійності додатки як для клієнта, так і власника системи. Розроблено модульну структуру програми, вироблено розподіл функцій системи для реалізації як back-end і front-end додатки. Розроблено модуль “Родовище”: проведено моделювання геологічної структури родовища та побудована його блокова модель. Доведено, що використання нейромережевих алгоритмів не дає принципового підвищення точності блокової моделі для родовищ 1 і 2 груп за складністю геологічної будови. Виявлено недоліки нейромережевих алгоритмів, такі як високі витрати обчислювальних ресурсів сервера і проблеми візуалізації великих масивів геоданих при використанні web-рішень, знайдені шляхи їх вирішення. Доведено можливість і перспективність побудови систем для надрокористувачів на основі хмарних технологій і концепції SaaS. Наукова новизна. Вперше на основі концепції ASP успішно побудовані сучасні програмні продукти для вирішення завдань проектування та оперативного керування гірничими роботами. Практична значимість. Результати корисні для підприємств-надрокористувачів, які працюють з родовищами 1 і 2 груп за складністю геологічної будови – проектних організацій і ГЗК.Цель. Повышение эффективности функционирования горнорудных предприятий и агрегация ранее полученных результатов в единую цифровую систему проектирования и оперативного управления работой карьеров. Методика. В работе использованы как традиционные (анализ научно-патентной литературы, аналитические методы исследования параметров месторождения, анализ опыта и эксплуатации карьеров, проведение пассивного эксперимента и статистической обработкой данных), так и новые формы научного исследования – моделирование месторождения на основе классических и нейросетевых методов аппроксимации. Для реализации программного продукта на основе облачных технологий использованы: для реализации back-end – серверный скриптовый язык программирования php; для front-end – мультипарадигменный язык программирования javascript, javascript framework jQuery и технология асинхронного обмена данными Ajax. Результаты. Выявлена целевая аудитория системы, проведен SWOT-анализ, определены концептуальные направления развития системы 3D-карьер, разработаны стратегии развития и продвижения программного продукта, разработаны стратегии безопасности и надежности приложения как для клиента, так и владельца системы. Разработана модульная структура приложения, произведено деление функций системы для реализации как back-end и front-end приложения. Разработан модуль “Месторождение”: проведено моделирование геологической структуры месторождения и построена его блочная модель. Доказано, что использование нейросетевых алгоритмов не дает принципиального повышения точности блочной модели для месторождений 1 и 2 групп по сложности геологического строения. Выявлены недостатки нейросетевых алгоритмов, такие как высокие затраты вычислительных ресурсов сервера и проблемы визуализации больших массивов геоданных при использовании web-решений, найдены пути их решения. Доказана возможность и перспективность построения систем для недропользователей на основе облачных технологий и концепции SaaS. Научная новизна. Впервые на основе концепции ASP успешно построены современные программные продукты для решения задач проектирования и оперативного управления горными работами. Практическая значимость. Результаты применимы для предприятий-недропользователей, работающих с месторождениями 1 и 2 групп по сложности геологического строения – проектных организаций и ГОКов.We express our profound gratitude to A.B. Naizabekov for his assistance in scientific research, to A.F. Tsekhovoy, P.A. Tsekhovoy, D.Sh. Akhmedov, V. V. Yankovenko and D.V. Nikitas for scientific advice in implementation of the program code. The research was carried out within the framework of the initiative research theme “Improving the Efficiency of Mining Enterprises” on the basis of the RSE at the Rudny Industrial Institute of the Ministry of Education and Science of the Republic of Kazakhstan

    Relational Algebra for In-Database Process Mining

    Get PDF
    The execution logs that are used for process mining in practice are often obtained by querying an operational database and storing the result in a flat file. Consequently, the data processing power of the database system cannot be used anymore for this information, leading to constrained flexibility in the definition of mining patterns and limited execution performance in mining large logs. Enabling process mining directly on a database - instead of via intermediate storage in a flat file - therefore provides additional flexibility and efficiency. To help facilitate this ideal of in-database process mining, this paper formally defines a database operator that extracts the 'directly follows' relation from an operational database. This operator can both be used to do in-database process mining and to flexibly evaluate process mining related queries, such as: "which employee most frequently changes the 'amount' attribute of a case from one task to the next". We define the operator using the well-known relational algebra that forms the formal underpinning of relational databases. We formally prove equivalence properties of the operator that are useful for query optimization and present time-complexity properties of the operator. By doing so this paper formally defines the necessary relational algebraic elements of a 'directly follows' operator, which are required for implementation of such an operator in a DBMS

    Challenging Ubiquitous Inverted Files

    Get PDF
    Stand-alone ranking systems based on highly optimized inverted file structures are generally considered ‘the’ solution for building search engines. Observing various developments in software and hardware, we argue however that IR research faces a complex engineering problem in the quest for more flexible yet efficient retrieval systems. We propose to base the development of retrieval systems on ‘the database approach’: mapping high-level declarative specifications of the retrieval process into efficient query plans. We present the Mirror DBMS as a prototype implementation of a retrieval system based on this approach

    Open issues in semantic query optimization in relational DBMS

    Get PDF
    After two decades of research into Semantic Query Optimization (SQO) there is clear agreement as to the efficacy of SQO. However, although there are some experimental implementations there are still no commercial implementations. We first present a thorough analysis of research into SQO. We identify three problems which inhibit the effective use of SQO in Relational Database Management Systems(RDBMS). We then propose solutions to these problems and describe first steps towards the implementation of an effective semantic query optimizer for relational databases
    corecore