6,633 research outputs found
Handling Concept Drifts in Regression Problems -- the Error Intersection Approach
Machine learning models are omnipresent for predictions on big data. One
challenge of deployed models is the change of the data over time, a phenomenon
called concept drift. If not handled correctly, a concept drift can lead to
significant mispredictions. We explore a novel approach for concept drift
handling: a strategy that switches between simple and complex machine
learning models for regression tasks. We assume that the approach exploits
the individual strengths of each model, switching to the simpler model when
a drift occurs and switching back to the complex model in
typical situations. We instantiate the approach on a real-world data set of
taxi demand in New York City, which is prone to multiple drifts, e.g. the
weather phenomena of blizzards, resulting in a sudden decrease of taxi demand.
We show that the suggested approach significantly outperforms all baselines
considered.
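The switching idea described in this abstract can be illustrated with a minimal sketch. It is not the authors' method: the class name `SwitchingRegressor`, the rolling-window comparison of absolute errors, and the `window` parameter are all assumptions made for illustration. The sketch routes each prediction to whichever of two models had the lower mean error over the most recent observations.

```python
from collections import deque


class SwitchingRegressor:
    """Hypothetical sketch: pick the model with the lower recent error.

    `simple` and `complex_` are any callables mapping a feature to a
    prediction. The rolling-window rule is an assumption, not the
    procedure from the paper.
    """

    def __init__(self, simple, complex_, window=3):
        self.simple = simple
        self.complex_ = complex_
        # Keep only the last `window` absolute errors of each model.
        self.err_simple = deque(maxlen=window)
        self.err_complex = deque(maxlen=window)

    def active(self):
        """Return the name of the model that would be used right now."""
        if not self.err_simple:
            return "complex"  # default before any feedback arrives
        mean_s = sum(self.err_simple) / len(self.err_simple)
        mean_c = sum(self.err_complex) / len(self.err_complex)
        # Ties go to the complex model (the "typical situation" default).
        return "simple" if mean_s < mean_c else "complex"

    def predict(self, x):
        model = self.simple if self.active() == "simple" else self.complex_
        return model(x)

    def update(self, x, y_true):
        """Record both models' errors once the true value is known."""
        self.err_simple.append(abs(self.simple(x) - y_true))
        self.err_complex.append(abs(self.complex_(x) - y_true))
```

In use, `update` would be called once the true demand for a period is known, and `predict` for the next period. A persistence model (predict the last observed value) is one plausible choice for the simple model, though the abstract does not specify it.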
The emergence of information systems: a communication-based theory
An information system is more than just the information technology; it is the system that emerges from the complex interactions and relationships between the information technology and the organization. However, what impact information technology has on an organization, and how organizational structures and organizational change influence information technology, remain open questions. We propose a theory to explain how communication structures emerge and adapt to environmental changes. We operationalize the interplay of information technology and organization as language communities whose members use and develop domain-specific languages for communication. Our theory is anchored in the philosophy of language. In developing it as an emergent perspective, we argue that information systems are self-organizing and that control of this ability is distributed throughout the system itself, to the members of the language community. Information technology influences the dynamics of this adaptation process as a fundamental constraint leading to perturbations for the information system. We demonstrate how this view differs from the entanglement-in-practice perspective and show that this understanding has far-reaching consequences for developing, managing, and examining information systems.
Engineering Crowdsourced Stream Processing Systems
A crowdsourced stream processing system (CSP) is a system that incorporates
crowdsourced tasks in the processing of a data stream. This can be seen as
enabling crowdsourcing work to be applied on a sample of large-scale data at
high speed, or equivalently, enabling stream processing to employ human
intelligence. It also leads to a substantial expansion of the capabilities of
data processing systems. Engineering a CSP system requires the combination of
human and machine computation elements. From a general systems theory
perspective, this means taking into account inherited as well as emerging
properties from both these elements. In this paper, we position CSP systems
within a broader taxonomy, outline a series of design principles and evaluation
metrics, present an extensible framework for their design, and describe several
design patterns. We showcase the capabilities of CSP systems by performing a
case study that applies our proposed framework to the design and analysis of a
real system (AIDR) that classifies social media messages during time-critical
crisis events. Results show that compared to a pure stream processing system,
AIDR can achieve a higher data classification accuracy, while compared to a
pure crowdsourcing solution, the system makes better use of human workers by
requiring much less manual effort.
A Framework for Discovery and Diagnosis of Behavioral Transitions in Event-streams
Data stream mining techniques can be used to track user behaviors as users attempt to achieve their goals. Quality metrics over stream-mined models identify potential changes in user goal attainment. When the quality of a data-mined model varies significantly from that of nearby models, as defined by the quality metrics, the user's behavior is automatically flagged as a potentially significant behavioral change. Decision tree, sequence pattern, and Hidden Markov modeling are used in this study. These three types of modeling can expose different aspects of a user's behavior. In the case of decision tree modeling, the specific changes in user behavior can be automatically characterized by differencing the data-mined decision-tree models. Sequence pattern modeling can shed light on how the user changes their sequence of actions, and Hidden Markov modeling can identify the learning transition points. This research describes how model-quality monitoring and these three types of modeling, combined as a generic framework, can aid recognition and diagnosis of behavioral changes in a case study of cognitive rehabilitation via emailing. The data stream mining techniques mentioned are used to monitor patient goals as part of a clinical plan to aid cognitive rehabilitation. In this context, real-time data mining aids clinicians in tracking user behaviors as patients attempt to achieve their goals. This generic framework is applicable to other real-time, data-intensive analysis problems. To illustrate this, similar Hidden Markov modeling is used to analyze transactional behavior at a telecommunication company for fraud detection. Fraud can similarly be considered a potentially significant change in transaction behavior.
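The quality-monitoring step described in this abstract can be sketched as follows. This is a hedged illustration and not the framework's actual procedure: the function name `flag_quality_changes`, the use of the preceding windows as the "nearby models", and the parameters `neighbors`, `k`, and `min_sigma` are all assumptions. Each score stands for the quality metric of one stream-mined model, and a model is flagged when its score deviates from the mean of the preceding scores by more than `k` standard deviations.

```python
from statistics import mean, stdev


def flag_quality_changes(scores, neighbors=3, k=3.0, min_sigma=0.02):
    """Return indices of models whose quality departs from nearby models.

    scores    -- one quality value per consecutive stream-mined model
    neighbors -- how many preceding models count as "nearby" (assumed)
    k         -- deviation threshold in standard deviations (assumed)
    min_sigma -- floor on the spread, so that a very stable history
                 does not flag negligible fluctuations (assumed)
    """
    flagged = []
    for i in range(neighbors, len(scores)):
        nearby = scores[i - neighbors:i]
        mu = mean(nearby)
        sigma = max(stdev(nearby), min_sigma)
        if abs(scores[i] - mu) > k * sigma:
            flagged.append(i)
    return flagged
```

A flagged index marks only where a change may have occurred. Characterizing the change, for example by differencing decision trees as the abstract describes, would be a separate step.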