
    Efficient Storage Management over Cloud Using Data Compression without Losing Searching Capacity

    Nowadays, thanks to social media, people communicate with each other and share their thoughts and moments of life in the form of text, images, and videos. We upload private data such as photos, videos, and documents to websites like Facebook, WhatsApp, Google+, and YouTube. In short, today's world is surrounded by large volumes of data in many forms, which creates a requirement for the effective management of billions of terabytes of electronic data, generally called big data. Handling such large data sets is a major challenge for data centers. One solution is to add as many hard disks as required, but if the data is kept in an unformatted state, the disk requirement becomes very high. Cloud technology is becoming popular, yet efficient storage management for large volumes of data on the cloud remains an open question. Many frameworks address this problem; Hadoop is one of them. Hadoop provides an efficient way to store and retrieve large volumes of data, but it is efficient only when individual files are large: Hadoop uses a large disk block to store data, which makes it inefficient where the total volume of data is large but each file is small. To meet both challenges, storing a large volume of data in less space and storing small files without wasting space, data must be stored not in its usual form but in compressed form, so that the block size can be kept small. Doing so, however, adds another dimension to the problem: searching the content of a compressed file is very inefficient. We therefore require an efficient algorithm that compresses files without disturbing the search capability of the data center. Here we describe how these challenges can be solved. Keywords: Cloud, Big Data, Hadoop, Data Compression, MapReduce
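    The abstract's core idea, keeping data in small compressed blocks while preserving searchability, can be sketched as a block-compressed store with a keyword-to-block index, so that a query decompresses only the blocks that can possibly match. This is an illustrative simplification (the class name, block size, and whitespace tokenization are invented here), not the paper's actual algorithm:

    ```python
    import zlib
    from collections import defaultdict

    class SearchableStore:
        """Toy sketch: compress documents in fixed-size blocks and keep a
        keyword-to-block index, so a search decompresses only matching blocks."""

        def __init__(self, block_size=4):
            self.block_size = block_size   # documents per compressed block
            self.blocks = []               # zlib-compressed blocks
            self.index = defaultdict(set)  # word -> ids of blocks containing it
            self._pending = []             # documents not yet compressed

        def add(self, doc):
            self._pending.append(doc)
            block_id = len(self.blocks)    # id the pending block will receive
            for word in doc.lower().split():
                self.index[word].add(block_id)
            if len(self._pending) == self.block_size:
                self.flush()

        def flush(self):
            if self._pending:
                raw = "\n".join(self._pending).encode("utf-8")
                self.blocks.append(zlib.compress(raw))
                self._pending = []

        def search(self, word):
            self.flush()
            hits = []
            for block_id in sorted(self.index.get(word.lower(), ())):
                raw = zlib.decompress(self.blocks[block_id]).decode("utf-8")
                hits.extend(d for d in raw.split("\n") if word.lower() in d.lower())
            return hits
    ```

    The trade-off shown here is the one the abstract describes: compression shrinks the on-disk footprint, while the small uncompressed index keeps search from degenerating into decompressing everything.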

    Discovering the core semantics of event from social media

    © 2015 Elsevier B.V. As social media platforms such as Twitter and Sina Weibo open up, large volumes of short texts are flooding the Web. This ocean of short texts dilutes the limited core semantics of an event in cyberspace with redundancy, noise, and irrelevant content, which makes the core semantics of the event difficult to discover. The major challenges are how to efficiently learn the semantic association distribution from small-scale association relations, and how to maximize the coverage of that distribution with the minimum number of redundancy-free short texts. To solve these issues, we explore a Markov random field based method for discovering the core semantics of an event. The method performs collaborative semantic computation to learn the association relation distribution, and information-gradient computation to discover k redundancy-free texts as the core semantics of the event. We evaluate our method against two state-of-the-art methods on the TAC dataset and a microblog dataset. The results show that our method outperforms the others in extracting core semantics accurately and efficiently. The proposed method can be applied to automatic short-text generation, event discovery, and summarization for big data analysis.
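    The selection of k redundancy-free texts can be illustrated with a greedy maximum-coverage heuristic: repeatedly pick the text whose words add the most not-yet-covered content. This is a deliberately simplified stand-in for the paper's information-gradient computation, not the MRF-based method itself, and the function name and word-level notion of coverage are assumptions made for the example:

    ```python
    def select_core_texts(texts, k):
        """Greedy max-coverage sketch: choose up to k texts, each maximizing
        the number of newly covered words; stop when only redundancy remains."""
        covered = set()
        chosen = []
        remaining = list(texts)
        for _ in range(min(k, len(remaining))):
            best = max(remaining, key=lambda t: len(set(t.lower().split()) - covered))
            gain = set(best.lower().split()) - covered
            if not gain:          # every remaining text is fully redundant
                break
            chosen.append(best)
            covered |= gain
            remaining.remove(best)
        return chosen
    ```

    Note that the loop can stop before reaching k, which mirrors the abstract's goal of covering the semantic distribution with the *minimum* number of redundancy-free texts.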

    Framework for classroom student grading with open-ended questions: a text-mining approach

    The purpose of this paper is to present a framework based on text-mining techniques to support teachers in their task of grading the texts, compositions, or essays that form the answers to open-ended questions (OEQ). The approach assumes that OEQ must be used as a learning and evaluation instrument with increasing frequency. Given the time-consuming grading process for those questions, their large-scale use is only possible when computational tools can help the teacher. This work assumes that the grading decision is entirely the teacher's responsibility, not the result of an automatic grading process. In this context, the teacher is responsible for authoring the questions included in the tests, administering them, and assessing the results, the entire cycle being noticeably short: a few days at most. An attempt is made to address this problem. The method is entirely exploratory, descriptive, and data-driven, the only data assumed as inputs being the texts of essays and compositions created by the students when answering OEQ in a single test on a specific occasion. Typically, the process involves exceedingly small data volumes measured against the power of current home computers, but big data when compared with human capabilities. The general idea is to use software to extract useful features from the texts, perform lengthy and complex statistical analyses, and present the results to the teacher, who, it is believed, will combine this information with his or her knowledge and experience to make decisions on mark allocation. A generic path model is formulated to represent that specific context and the kinds of decisions and tasks a teacher should perform, the estimated results being synthesised using graphic displays. The method is illustrated by analysing three corpora of 126 texts originating in three different real learning contexts, time periods, educational levels, and disciplines.
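    The feature-extraction step the framework depends on can be sketched as plain TF-IDF over the essay corpus, producing the numeric vectors a teacher-facing tool could then cluster or display graphically. Everything here (the function name, whitespace tokenization, the raw log-IDF weighting) is an illustrative assumption, not the authors' actual pipeline, which would add stemming, stop-word removal, and richer features:

    ```python
    import math
    from collections import Counter

    def tfidf_vectors(essays):
        """Map each essay to a {term: tf-idf weight} dict. Terms that occur in
        every essay get weight 0; rare, frequent-within-essay terms score high."""
        docs = [Counter(e.lower().split()) for e in essays]
        n = len(docs)
        df = Counter()                      # document frequency per term
        for d in docs:
            df.update(d.keys())
        vectors = []
        for d in docs:
            total = sum(d.values())
            vectors.append({
                term: (count / total) * math.log(n / df[term])
                for term, count in d.items()
            })
        return vectors
    ```

    Vectors of this kind are what downstream statistical analyses (clustering, ordination, outlier detection) would consume before the results are synthesised for the teacher.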


    Mapping Big Data into Knowledge Space with Cognitive Cyber-Infrastructure

    Big data research has attracted great attention in science, technology, industry, and society. It is developing alongside the evolving scientific paradigm, the fourth industrial revolution, and the transformational innovation of technologies. However, its nature and fundamental challenges have not been recognized, and its own methodology has not yet been formed. This paper explores and answers the following questions: What is big data? What are the basic methods for representing, managing, and analyzing big data? What is the relationship between big data and knowledge? Can we find a mapping from big data into knowledge space? What kind of infrastructure is required to support not only big data management and analysis but also knowledge discovery, sharing, and management? What is the relationship between big data and the scientific paradigm? What are the nature and fundamental challenges of big data computing? A multi-dimensional perspective is presented toward a methodology of big data computing.