1,525 research outputs found

    Review on OFS: Online Feature Selection based on Regression analysis and Clustering method along with its Application

    Feature selection is one of the main techniques in data mining, yet almost all existing work on feature selection is limited to batch learning. Unlike batch learning methods, online learning represents a promising family of efficient and scalable machine learning algorithms for large-scale applications. However, most online learning methods require access to all the features (attributes) of each instance. This article addresses the problem of Online Feature Selection (OFS), in which the online learner may only maintain a classifier that involves a small, fixed number of features. Two tasks of online feature selection are considered: learning with full input and learning with partial input. Sparsity regularization and truncation techniques are used to develop the algorithms. A key challenge in online feature selection is how to make accurate predictions for an instance using a small number of active features in a high-dimensional space. The proposed system applies multiclass classification, regression analysis and clustering methods to address each of the two problems and analyzes their performance.
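The truncation idea mentioned above can be sketched in Python. This is a minimal illustration, not the reviewed article's method. It assumes three things: a linear classifier with labels in {-1, +1}, a hinge-loss gradient step followed by projection onto an L2 ball, and a budget `B` of nonzero weights. The names `truncate` and `ofs_full_input` and the parameters `eta` and `lam` are hypothetical.

```python
import numpy as np


def truncate(w, B):
    """Keep the B weights with the largest magnitude and zero out the rest."""
    if np.count_nonzero(w) <= B:
        return w
    keep = np.argsort(np.abs(w))[-B:]   # indices of the B largest |w_i|
    out = np.zeros_like(w)
    out[keep] = w[keep]
    return out


def ofs_full_input(X, y, B, eta=0.2, lam=0.01):
    """Online learner (labels in {-1, +1}) that never keeps more than B nonzero weights.

    eta (step size) and lam (L2 regularization) are illustrative defaults, not values
    from the article.
    """
    w = np.zeros(X.shape[1])
    radius = 1.0 / np.sqrt(lam)
    for x_t, y_t in zip(X, y):
        # hinge-loss condition: update only when the margin is below 1
        if y_t * w.dot(x_t) <= 1:
            w = (1 - lam * eta) * w + eta * y_t * x_t   # regularized gradient step
            norm = np.linalg.norm(w)
            if norm > radius:                           # project onto the L2 ball
                w = w * (radius / norm)
            w = truncate(w, B)                          # sparsity: keep at most B features
    return w
```

The indices of the nonzero entries of the returned `w` are the selected features. Because truncation runs after every update, the classifier never depends on more than `B` attributes at any time.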

    Streaming Feature Grouping and Selection (SFGS) For Big Data Classification

    Real-time data has always been essential for organizations whose business depends on the speed of data delivery. Today, organizations understand the importance of real-time data analysis for gaining value from the data they generate. Real-time data analysis is also known as real-time analytics, streaming analytics, real-time streaming analytics, and event processing. Stream processing is the key to getting results in real time: it processes the data stream as it arrives. Streaming data are generated dynamically, and the full stream is unknown or even infinite. Such data become massive and diverse, which gives rise to a big data challenge. In machine learning, streaming feature selection has long been a preferred method for preprocessing streaming data. Recently, feature grouping, which can capture the hidden information between selected features, has begun to gain attention. The main contribution of this dissertation is to address the extremely high dimensionality of streaming big data with a streaming feature grouping and selection algorithm. The literature review also surveys current streaming feature selection approaches and highlights the state-of-the-art algorithms in this area. The proposed algorithm groups similar features together to reduce redundancy and handles the stream of features in an online fashion. It has been implemented and evaluated on benchmark datasets against state-of-the-art streaming feature selection algorithms and feature grouping techniques. The results show better prediction accuracy than the state-of-the-art algorithms.
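The grouping idea described above can be illustrated with a small Python sketch. It assumes that features arrive one at a time as numeric columns. Absolute Pearson correlation serves both as the relevance score against the class and as the redundancy score between features. The thresholds `relevance_min` and `redundancy_max` are hypothetical. This is not the dissertation's SFGS algorithm, whose grouping criterion the abstract does not give.

```python
import numpy as np


def abs_corr(a, b):
    """Absolute Pearson correlation; 0.0 if either column is constant."""
    if a.std() == 0 or b.std() == 0:
        return 0.0
    return abs(np.corrcoef(a, b)[0, 1])


def stream_group_select(feature_stream, y, relevance_min=0.1, redundancy_max=0.8):
    """Process (index, column) pairs one at a time, grouping redundant features.

    Each group keeps its most class-relevant member as the representative; the
    representatives are the selected features. Thresholds are illustrative.
    """
    groups = []    # each group: {"rep": feature index, "members": [feature indices]}
    columns = {}   # only features that passed the relevance test are kept
    for idx, f in feature_stream:
        rel = abs_corr(f, y)
        if rel < relevance_min:
            continue                      # irrelevant to the class: discard
        columns[idx] = f
        for g in groups:
            rep = columns[g["rep"]]
            if abs_corr(f, rep) >= redundancy_max:
                g["members"].append(idx)  # redundant with this group: join it
                if rel > abs_corr(rep, y):
                    g["rep"] = idx        # better representative for the group
                break
        else:
            groups.append({"rep": idx, "members": [idx]})  # start a new group
    selected = [g["rep"] for g in groups]
    return selected, groups
```

Grouping lets a redundant feature remain available as an alternative to its representative instead of being discarded outright, while only one feature per group enters the model.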

    Dimension Reduction in Big Data Environment-A Survey

    Relational database management systems can handle data sets that have some structure, and by querying the system the user gets a definite answer. If the data set itself has no structure, however, getting an answer to a query is generally very tedious. This has been a new challenge for scientists, researchers and industrialists over the last decade, and this new form of data is termed big data. Parallel computation, in terms of both hardware and application-dependent software, is now being developed to tackle such data sets and to solve the challenges generally attached to large data sets, such as data curation, search, querying and storage. Information-sensing devices, RFID readers and cloud storage are nowadays making data sets grow at an increasing rate. The goal of big data analytics is to help industry and organizations make intelligent decisions by analyzing huge numbers of transactions that conventional business intelligence systems have so far left untouched. As a data set grows large, and its redundancy grows with it, software and people need to analyze only the information that is useful for a particular application, and such a reduced data set is more useful than large, noisy data.
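The abstract does not name a specific technique. As one common example of dimension reduction, principal component analysis (PCA) projects redundant data onto a few directions of maximal variance. Below is a minimal NumPy sketch, computed through the singular value decomposition of the centered data. The function name `pca_reduce` is chosen for illustration.

```python
import numpy as np


def pca_reduce(X, k):
    """Project rows of X onto the top-k principal components.

    Returns the reduced data (n x k) and the fraction of total variance it retains.
    """
    Xc = X - X.mean(axis=0)                          # center each column
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:k]                              # top-k principal directions
    explained = (S ** 2)[:k].sum() / (S ** 2).sum()  # variance captured by them
    return Xc @ components.T, explained
```

When the columns are linear combinations of a few underlying variables, `explained` approaches 1 for a small `k`. In that case the reduced data keep almost all of the information in far fewer dimensions.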

    Understanding citizen science and environmental monitoring: final report on behalf of UK Environmental Observation Framework

    Citizen science can broadly be defined as the involvement of volunteers in science. Over the past decade there has been a rapid increase in the number of citizen science initiatives. The breadth of environmental-based citizen science is immense. Citizen scientists have surveyed for and monitored a broad range of taxa, and also contributed data on weather and habitats reflecting an increase in engagement with a diverse range of observational science. Citizen science has taken many varied approaches from citizen-led (co-created) projects with local community groups to, more commonly, scientist-led mass participation initiatives that are open to all sectors of society. Citizen science provides an indispensable means of combining environmental research with environmental education and wildlife recording. Here we provide a synthesis of extant citizen science projects using a novel cross-cutting approach to objectively assess understanding of citizen science and environmental monitoring including: 1. Brief overview of knowledge on the motivations of volunteers. 2. Semi-systematic review of environmental citizen science projects in order to understand the variety of extant citizen science projects. 3. Collation of detailed case studies on a selection of projects to complement the semi-systematic review. 4. Structured interviews with users of citizen science and environmental monitoring data focussing on policy, in order to more fully understand how citizen science can fit into policy needs. 5. Review of technology in citizen science and an exploration of future opportunities

    A study on machine learning algorithms for fall detection and movement classification

    Falls among the elderly are an important health issue, and fall detection and movement tracking techniques are therefore instrumental in addressing it. This thesis responds to the challenge of classifying different movement types as part of a system designed to meet the need for a wearable device that collects data for fall and near-fall analysis. Four fall activities (forward, backward, left and right), three normal activities (standing, walking and lying down) and near-fall situations are identified and detected. Several machine learning algorithms are compared, and the best one is used for real-time classification. The comparison is made using the Waikato Environment for Knowledge Analysis (WEKA). The system can also adapt to the different gaits of different people. A feature selection algorithm is also introduced to reduce the number of features required for the classification problem.
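The abstract does not describe the thesis's feature selection algorithm, and the thesis ran its classifier comparison in WEKA. As a hedged illustration of the general idea, the Python sketch below runs greedy forward selection. It scores each candidate feature subset by the hold-out accuracy of a simple nearest-centroid classifier. The classifier, the 70/30 split and all function names are assumptions.

```python
import numpy as np


def nearest_centroid_accuracy(X_tr, y_tr, X_te, y_te):
    """Accuracy of assigning each test row to the class with the closest mean."""
    classes = np.unique(y_tr)
    centroids = np.stack([X_tr[y_tr == c].mean(axis=0) for c in classes])
    d = ((X_te[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    pred = classes[d.argmin(axis=1)]
    return (pred == y_te).mean()


def forward_select(X, y, max_features, train_frac=0.7, seed=0):
    """Greedily add the feature that most improves hold-out accuracy."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))
    cut = int(train_frac * len(y))
    tr, te = idx[:cut], idx[cut:]                  # fixed train / hold-out split
    selected, best_acc = [], 0.0
    remaining = list(range(X.shape[1]))
    while remaining and len(selected) < max_features:
        scores = []
        for f in remaining:
            cols = selected + [f]
            acc = nearest_centroid_accuracy(X[tr][:, cols], y[tr], X[te][:, cols], y[te])
            scores.append((acc, f))
        acc, f = max(scores)                       # best candidate this round
        if acc <= best_acc:
            break                                  # no improvement: stop early
        selected.append(f)
        remaining.remove(f)
        best_acc = acc
    return selected, best_acc
```

Stopping as soon as no candidate improves accuracy keeps the feature set small. That matters for a wearable device that must compute features in real time.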

    Online feature selection for mining big data

    Ministry of Education, Singapore under its Academic Research Funding Tier

    Storage systems for mobile-cloud applications

    Mobile devices have become the major computing platform in today's world. However, some apps on mobile devices still suffer from insufficient computing and energy resources. A key solution is to offload resource-demanding computing tasks from mobile devices to the cloud, so that tasks of the same application run concurrently on both the mobile device and the cloud. This dissertation aims to ensure that the tasks of a mobile app that uses offloading can access and share files concurrently on the mobile device and the cloud in a manner that is efficient, consistent, and transparent to location. Existing distributed file systems and network file systems do not satisfy these requirements. Furthermore, current offloading platforms either do not support efficient file access for offloaded tasks or do not offload tasks that access files. The first part of the dissertation addresses this issue by designing and implementing an application-level file system named Overlay File System (OFS). OFS assumes that a cloud surrogate is paired with each mobile device for task and storage offloading. To achieve high efficiency, OFS maintains and buffers local copies of data sets on both the surrogate and the mobile device. OFS ensures consistency and guarantees that all reads get the latest data. To reduce network traffic and execution delay, OFS uses a delayed-update mechanism that combines write-invalidate and write-update policies. To guarantee location transparency, OFS creates a unified view of file data. OFS is tested on Android OS with a real mobile application and real mobile user traces. Extensive experiments show that OFS effectively supports consistent file access from computation tasks, no matter where they run. In addition, OFS effectively reduces both file access latency and the network traffic incurred by file accesses.
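The delayed-update idea can be illustrated with a toy two-replica model in Python. This is a hypothetical sketch, not OFS's actual protocol. It assumes one mobile copy and one surrogate copy per file. The first write after the peer's copy was up to date sends a small invalidation message (write-invalidate). Shipping the data is deferred until the peer reads the file or `flush()` is called (delayed write-update). Concurrent writes are resolved last-writer-wins. The class name `DelayedUpdateStore` and its methods are illustrative.

```python
class DelayedUpdateStore:
    """Toy model of two replicas (mobile, surrogate) kept consistent lazily.

    Hypothetical illustration only; not the OFS implementation.
    """

    def __init__(self):
        self.copies = {"mobile": {}, "surrogate": {}}     # side -> {path: data}
        self.stale = {"mobile": set(), "surrogate": set()}  # paths invalidated on a side
        self.dirty = {"mobile": set(), "surrogate": set()}  # local writes not yet shipped
        self.log = []                                     # network messages sent

    @staticmethod
    def _peer(side):
        return "surrogate" if side == "mobile" else "mobile"

    def write(self, side, path, data):
        peer = self._peer(side)
        self.copies[side][path] = data
        self.stale[side].discard(path)   # our own copy is now the latest
        self.dirty[side].add(path)
        self.dirty[peer].discard(path)   # last writer wins over an unshipped peer write
        if path not in self.stale[peer]:
            # write-invalidate: small control message, no file data yet
            self.stale[peer].add(path)
            self.log.append(("invalidate", side, peer, path))

    def read(self, side, path):
        if path in self.stale[side]:
            # stale copy: pull the latest data from the peer on demand
            self._push(self._peer(side), path)
        return self.copies[side][path]

    def _push(self, src, path):
        dst = self._peer(src)
        self.copies[dst][path] = self.copies[src][path]
        self.stale[dst].discard(path)
        self.dirty[src].discard(path)
        self.log.append(("update", src, dst, path))

    def flush(self):
        # delayed write-update: ship each dirty file once, however often it was written
        for side in ("mobile", "surrogate"):
            for path in sorted(self.dirty[side]):
                self._push(side, path)
```

In this model, three consecutive writes to the same file cost one invalidation and at most one data transfer. A read on the stale side always returns the latest data.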
While OFS allows offloaded tasks to access the required files in a consistent and transparent manner, file access by offloaded tasks can be improved further. Instead of retrieving the required files from its associated mobile device, a surrogate can discover and retrieve identical or similar files from the surrogates belonging to other users. This is based on two observations: 1) multiple users have the same or similar files, e.g., shared files or images/videos of the same object; 2) the need for a certain file content in mobile apps can usually be described by context features of the content, e.g., location or the objects in an image, so any file with the required context features can satisfy the need. Since files can be retrieved from surrogates rather than from the mobile device, this solution reduces latency and saves wireless bandwidth and power on mobile devices. The second part of the dissertation proposes and develops a Context-Aware File Discovery Service (CAFDS) that implements this idea. CAFDS uses a self-organizing map and k-means clustering to classify files into file groups based on file contexts. It then uses an enhanced decision tree to locate and retrieve files based on the file contexts defined by apps. To support the diverse file discovery demands of various mobile apps, CAFDS allows apps to add new file contexts and to update existing ones dynamically, without affecting the discovery process. To evaluate the effectiveness of CAFDS, a prototype was implemented on Android and Linux. The performance of CAFDS was tested against Chord, a DHT-based lookup scheme, and SPOON, a P2P file sharing system. The experiments show that CAFDS provides lower end-to-end latency for file search than Chord and SPOON, while providing scalability similar to Chord's.
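The clustering step of CAFDS can be approximated with plain k-means in Python. This sketch assumes that each file's context is already encoded as a numeric vector, for example a location. It omits the self-organizing map. In place of the enhanced decision tree, it uses a nearest-centroid lookup followed by a nearest-file search inside the chosen group. `kmeans` and `find_file` are hypothetical names.

```python
import numpy as np


def _assign(X, centroids):
    """Index of the closest centroid for each row of X."""
    d = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return d.argmin(axis=1)


def kmeans(X, k, iters=50, seed=0):
    """Lloyd's k-means on file-context vectors; returns centroids and group labels."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(iters):
        labels = _assign(X, centroids)
        # keep a centroid in place if its group becomes empty
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
                        for j in range(k)])
        if np.allclose(new, centroids):
            break
        centroids = new
    return centroids, _assign(X, centroids)


def find_file(query, X, names, centroids, labels):
    """Pick the closest file group, then the closest file within that group."""
    g = ((centroids - query) ** 2).sum(axis=1).argmin()
    members = np.flatnonzero(labels == g)
    best = members[((X[members] - query) ** 2).sum(axis=1).argmin()]
    return names[best]
```

Searching only within one group keeps each lookup proportional to the group size rather than to the total number of files, which is the motivation for grouping files by context first.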