66,057 research outputs found
A Workflow-oriented Language for Scalable Data Analytics
Proceedings of: First International Workshop on Sustainable Ultrascale Computing Systems (NESUS 2014). Porto (Portugal), August 27-28, 2014.Data in digital repositories are everyday more and more massive and distributed. Therefore analyzing them requires efficient data analysis techniques and scalable storage and computing platforms. Cloud computing infrastructures offer an effective support for addressing both the computational and data storage needs of big data mining and parallel knowledge discovery applications. In fact, complex data mining tasks involve data- and compute-intensive algorithms that require large and efficient storage facilities together with high performance processors to get results in acceptable times. In this paper we describe a Data Mining Cloud Framework (DMCF) designed for developing and executing distributed data analytics applications as workflows of services. We describe also a workflow-oriented language, called JS4Cloud, to support the design and execution of script-based data analysis workflows on DMCF. We finally present a data analysis application developed with JS4Cloud, and the scalability achieved executing it on DMCF.The work presented in this paper has been partially supported by EU under the COST programme Action IC1305, ’Network for Sustainable Ultrascale Computing (NESUS)’
Sequential Behavior Pattern Discovery with Frequent Episode Mining and Wireless Sensor Network
[EN] By recognizing patterns in occupants' daily activities, building systems are able to optimize and personalize services. Established technologies are available for data collection and pattern mining, but they all share the drawback that the methodology used for data collection tends to be ill suited for pattern recognition. For this research, we developed a bespoke WSN and combined it with a compact data format for frequent episode mining to overcome this obstacle. The proposed framework has been evaluated with both synthetic data from a smart home simulator and with real data from a self-organizing WSN in a student's home. We are able to demonstrate that the framework is capable of discovering sequential patterns in heterogeneous sensor data. With corresponding scenarios, patterns in daily activities can be deduced. The framework is self-contained, scalable, and energy-efficient, and is thus applicable in multiple building system settings.The authors gratefully acknowledge financial support from the National Natural Science Foundation of China (no. 51408442 and no. 61572231).Li; Li, X.; Lu, Z.; Lloret, J.; Song, H. (2017). Sequential Behavior Pattern Discovery with Frequent Episode Mining and Wireless Sensor Network. IEEE Communications Magazine. 55(6):205-211. https://doi.org/10.1109/MCOM.2017.160027620521155
An Advanced Conceptual Diagnostic Healthcare Framework for Diabetes and Cardiovascular Disorders
The data mining along with emerging computing techniques have astonishingly
influenced the healthcare industry. Researchers have used different Data Mining
and Internet of Things (IoT) for enrooting a programmed solution for diabetes
and heart patients. However, still, more advanced and united solution is needed
that can offer a therapeutic opinion to individual diabetic and cardio
patients. Therefore, here, a smart data mining and IoT (SMDIoT) based advanced
healthcare system for proficient diabetes and cardiovascular diseases have been
proposed. The hybridization of data mining and IoT with other emerging
computing techniques is supposed to give an effective and economical solution
to diabetes and cardio patients. SMDIoT hybridized the ideas of data mining,
Internet of Things, chatbots, contextual entity search (CES), bio-sensors,
semantic analysis and granular computing (GC). The bio-sensors of the proposed
system assist in getting the current and precise status of the concerned
patients so that in case of an emergency, the needful medical assistance can be
provided. The novelty lies in the hybrid framework and the adequate support of
chatbots, granular computing, context entity search and semantic analysis. The
practical implementation of this system is very challenging and costly.
However, it appears to be more operative and economical solution for diabetes
and cardio patients.Comment: 11 PAGE
21st Century Simulation: Exploiting High Performance Computing and Data Analysis
This paper identifies, defines, and analyzes the limitations imposed on Modeling and Simulation by outmoded
paradigms in computer utilization and data analysis. The authors then discuss two emerging capabilities to
overcome these limitations: High Performance Parallel Computing and Advanced Data Analysis. First, parallel
computing, in supercomputers and Linux clusters, has proven effective by providing users an advantage in
computing power. This has been characterized as a ten-year lead over the use of single-processor computers.
Second, advanced data analysis techniques are both necessitated and enabled by this leap in computing power.
JFCOM's JESPP project is one of the few simulation initiatives to effectively embrace these concepts. The
challenges facing the defense analyst today have grown to include the need to consider operations among non-combatant
populations, to focus on impacts to civilian infrastructure, to differentiate combatants from non-combatants,
and to understand non-linear, asymmetric warfare. These requirements stretch both current
computational techniques and data analysis methodologies. In this paper, documented examples and potential
solutions will be advanced. The authors discuss the paths to successful implementation based on their experience.
Reviewed technologies include parallel computing, cluster computing, grid computing, data logging, OpsResearch,
database advances, data mining, evolutionary computing, genetic algorithms, and Monte Carlo sensitivity analyses.
The modeling and simulation community has significant potential to provide more opportunities for training and
analysis. Simulations must include increasingly sophisticated environments, better emulations of foes, and more
realistic civilian populations. Overcoming the implementation challenges will produce dramatically better insights,
for trainees and analysts. High Performance Parallel Computing and Advanced Data Analysis promise increased
understanding of future vulnerabilities to help avoid unneeded mission failures and unacceptable personnel losses.
The authors set forth road maps for rapid prototyping and adoption of advanced capabilities. They discuss the
beneficial impact of embracing these technologies, as well as risk mitigation required to ensure success
Methods for frequent sequence mining with subsequence constraints
In this thesis, we study scalable and general purpose methods for mining frequent sequences that satisfy a given subsequence constraint. Frequent sequence mining is a fundamental task in data mining and has many real-life applications like information extraction, market-basket analysis, web usage mining, or session analysis. Depending on the underlying application, we are generally interested in discovering certain frequent sequences, which are described using subsequence constraints. There exists many tools and
algorithms for this task, however, they are not sufficiently scalable to deal with large amounts of data that may arise in applications and are generally not extensible across range of applications.
We propose scalable, distributed sequence mining algorithms that target MapReduce. Our work builds on MG-FSM, which is a distributed framework for frequent sequence mining. We propose novel algorithms that improve and extend the basic MG-FSM framework to efficiently support traditional subsequence constraints that arise in applications. Additionally, we show that many subsequence constraints---including and beyond the traditional ones considered in literature---can be unified in a single framework. A unified treatment
allows researchers to study jointly many types of subsequence constraints (instead of each one individually) and helps to improve usability of pattern mining systems for practitioners. To this end, we propose a general purpose framework that provides a set of simple and intuitive ``pattern expressions'', which allows to describe any subsequence constraint of interest and explore algorithms for efficiently mining frequent subsequences under such general constraints.
Our experimental study on real-world datasets indicates that our proposed algorithms are scalable and effective across wide range of applications
Efficient classification using parallel and scalable compressed model and Its application on intrusion detection
In order to achieve high efficiency of classification in intrusion detection,
a compressed model is proposed in this paper which combines horizontal
compression with vertical compression. OneR is utilized as horizontal
com-pression for attribute reduction, and affinity propagation is employed as
vertical compression to select small representative exemplars from large
training data. As to be able to computationally compress the larger volume of
training data with scalability, MapReduce based parallelization approach is
then implemented and evaluated for each step of the model compression process
abovementioned, on which common but efficient classification methods can be
directly used. Experimental application study on two publicly available
datasets of intrusion detection, KDD99 and CMDC2012, demonstrates that the
classification using the compressed model proposed can effectively speed up the
detection procedure at up to 184 times, most importantly at the cost of a
minimal accuracy difference with less than 1% on average
- …