6,633 research outputs found

    Handling Concept Drifts in Regression Problems -- the Error Intersection Approach

    Get PDF
    Machine learning models are omnipresent for predictions on big data. One challenge of deployed models is the change of the data over time, a phenomenon called concept drift. If not handled correctly, a concept drift can lead to significant mispredictions. We explore a novel approach for concept drift handling, which depicts a strategy to switch between the application of simple and complex machine learning models for regression tasks. We assume that the approach plays out the individual strengths of each model, switching to the simpler model if a drift occurs and switching back to the complex model for typical situations. We instantiate the approach on a real-world data set of taxi demand in New York City, which is prone to multiple drifts, e.g. the weather phenomena of blizzards, resulting in a sudden decrease of taxi demand. We are able to show that our suggested approach outperforms all regarded baselines significantly

    The emergence of information systems: a communication-based theory

    Get PDF
    An information system is more than just the information technology; it is the system that emerges from the complex interactions and relationships between the information technology and the organization. However, what impact information technology has on an organization and how organizational structures and organizational change influence information technology remains an open question. We propose a theory to explain how communication structures emerge and adapt to environmental changes. We operationalize the interplay of information technology and organization as language communities whose members use and develop domain-specific languages for communication. Our theory is anchored in the philosophy of language. In developing it as an emergent perspective, we argue that information systems are self-organizing and that control of this ability is disseminated throughout the system itself, to the members of the language community. Information technology influences the dynamics of this adaptation process as a fundamental constraint leading to perturbations for the information system. We demonstrate how this view is separated from the entanglement in practice perspective and show that this understanding has far-reaching consequences for developing, managing, and examining information systems

    Engineering Crowdsourced Stream Processing Systems

    Full text link
    A crowdsourced stream processing system (CSP) is a system that incorporates crowdsourced tasks in the processing of a data stream. This can be seen as enabling crowdsourcing work to be applied on a sample of large-scale data at high speed, or equivalently, enabling stream processing to employ human intelligence. It also leads to a substantial expansion of the capabilities of data processing systems. Engineering a CSP system requires the combination of human and machine computation elements. From a general systems theory perspective, this means taking into account inherited as well as emerging properties from both these elements. In this paper, we position CSP systems within a broader taxonomy, outline a series of design principles and evaluation metrics, present an extensible framework for their design, and describe several design patterns. We showcase the capabilities of CSP systems by performing a case study that applies our proposed framework to the design and analysis of a real system (AIDR) that classifies social media messages during time-critical crisis events. Results show that compared to a pure stream processing system, AIDR can achieve a higher data classification accuracy, while compared to a pure crowdsourcing solution, the system makes better use of human workers by requiring much less manual work effort

    A Framework for Discovery and Diagnosis of Behavioral Transitions in Event-streams

    Get PDF
    Date stream mining techniques can be used in tracking user behaviors as they attempt to achieve their goals. Quality metrics over stream-mined models identify potential changes in user goal attainment. When the quality of some data mined models varies significantly from nearby models—as defined by quality metrics—then the user’s behavior is automatically flagged as a potentially significant behavioral change. Decision tree, sequence pattern and Hidden Markov modeling being used in this study. These three types of modeling can expose different aspect of user’s behavior. In case of decision tree modeling, the specific changes in user behavior can automatically characterized by differencing the data-mined decision-tree models. The sequence pattern modeling can shed light on how the user changes his sequence of actions and Hidden Markov modeling can identifies the learning transition points. This research describes how model-quality monitoring and these three types of modeling as a generic framework can aid recognition and diagnoses of behavioral changes in a case study of cognitive rehabilitation via emailing. The date stream mining techniques mentioned are used to monitor patient goals as part of a clinical plan to aid cognitive rehabilitation. In this context, real time data mining aids clinicians in tracking user behaviors as they attempt to achieve their goals. This generic framework can be widely applicable to other real-time data-intensive analysis problems. In order to illustrate this fact, the similar Hidden Markov modeling is being used for analyzing the transactional behavior of a telecommunication company for fraud detection. Fraud similarly can be considered as a potentially significant transaction behavioral change
    • …
    corecore