
    Visualizing Decision Trees and Forests using Radial Trees

    Data visualization has become a major way for many companies to represent their data and schedules. In business meetings, people no longer rely only on simple bar graphs and pie charts; they draw on other fields of study and use more complex graphs. Displaying results and projects through multiple visualizations helps outsiders understand the work and can bring more viewpoints to the topic being displayed. Project schedules are also now displayed visually so that workers can see how much time each part of their project will take. With this increase in visualization, decision trees are starting to be visualized as well. Decision trees focus on object classification, finding a way to label or group objects. In this paper, complex decision trees that can be hard to understand are visualized using radial trees. The program takes advantage of what radial trees offer for data and creates an interactive display for users of decision trees and forests.

    The history, evolution, and future of big data & analytics: A bibliometric analysis of its relationship to performance in organizations

    Big data and analytics (BDA) are gaining momentum, particularly in the practitioner world. Research linking BDA to improved organizational performance seems scarce and widely dispersed, though, with the majority focused on specific domains and/or macro-level relationships. In order to synthesize past research and advance knowledge of the potential organizational value of BDA, the authors obtained a data set of 327 primary studies and 1252 secondary cited papers. This paper reviews this body of research using three bibliometric methods. First, it elucidates its intellectual foundations via co-citation analysis. Second, it visualizes the historical evolution of BDA and performance research and its substreams through algorithmic historiography. Third, it provides insights into the field's potential evolution via bibliographic coupling. The results reveal that academic attention to the BDA-performance link has been increasing rapidly. The study uncovered ten research clusters that form the field's foundation. While research seems to have evolved following two main, isolated streams, the past decade has witnessed more cross-disciplinary collaborations. Moreover, the study identified several research topics undergoing focused development, including financial and customer risk management, text mining, and evolutionary algorithms. The review concludes with a discussion of the implications for different functional management domains and the gaps for both research and practice.

    Predicting parking space availability based on heterogeneous data using Machine Learning techniques

    These days, smart cities are focused on improving their services and bringing quality to everyday life by leveraging modern ICT. To this end, data from connected IoT devices, environmental sensors, economic platforms, social networking sites, governance systems, and other sources can be gathered. The rapid increase in the number of vehicles in major cities of the world has made mobility in urban areas difficult, due to traffic congestion and parking availability issues. Finding a suitable parking space is often influenced by various factors such as weather conditions, traffic flows, and geographical information (markets, hospitals, parks, and others). In this study, a predictive analysis has been performed to estimate the availability of parking spaces using heterogeneous data from Cork County, Ireland. However, accumulating, processing, and analysing data produced by heterogeneous sources is itself a challenge, due to their diverse nature and different acquisition frequencies. Therefore, a data lake has been proposed in this study to collect, process, analyse, and visualize data from disparate sources. In addition, the proposed platform is used for predicting the available parking spaces using the collected data. The study includes the proposed design and implementation details of the data lake as well as the parking space availability prediction model developed using machine learning techniques.

    A Data Science Course for Undergraduates: Thinking with Data

    Data science is an emerging interdisciplinary field that combines elements of mathematics, statistics, computer science, and knowledge in a particular application domain for the purpose of extracting meaningful information from the increasingly sophisticated array of data available in many settings. These data tend to be non-traditional, in the sense that they are often live, large, complex, and/or messy. A first course in statistics at the undergraduate level typically introduces students to a variety of techniques to analyze small, neat, and clean data sets. However, whether they pursue more formal training in statistics or not, many of these students will end up working with data that is considerably more complex, and will need facility with statistical computing techniques. More importantly, these students require a framework for thinking structurally about data. We describe an undergraduate course in a liberal arts environment that provides students with the tools necessary to apply data science. The course emphasizes modern, practical, and useful skills that cover the full data analysis spectrum, from asking an interesting question to acquiring, managing, manipulating, processing, querying, analyzing, and visualizing data, as well as communicating findings in written, graphical, and oral forms. Comment: 21 pages total including supplementary material

    A Study of Text Mining Framework for Automated Classification of Software Requirements in Enterprise Systems

    Text classification is a rapidly evolving area of data mining, while requirements engineering is a less-explored area of software engineering that deals with the process of defining, documenting, and maintaining a software system's requirements. When researchers blended these two streams, research emerged on automating the classification of software requirements statements into categories easily comprehensible to developers, for faster development and delivery; until now this was mostly done manually by software engineers, a tedious job. However, most of the research focused on classifying non-functional requirements, which pertain to intangible features such as security, reliability, quality, and so on. It is a challenging task to automatically classify functional requirements, those pertaining to how the system will function, especially those belonging to different and large enterprise systems. This requires exploiting text mining capabilities. This thesis aims to investigate the results of text classification applied to functional software requirements by creating a framework in R and using algorithms and techniques such as k-nearest neighbors, support vector machines, boosting, bagging, maximum entropy, neural networks, and random forests in an ensemble approach. The study was conducted by collecting and visualizing relevant enterprise data that had previously been classified manually and was subsequently used for training the model. Key components for training included the frequency of terms in the documents and the level of cleanliness of the data. The model was applied to test data and validated by studying and comparing parameters such as precision, recall, and accuracy. Dissertation/Thesis. Masters Thesis, Engineering 201