513 research outputs found

    A survey of multiple classifier systems as hybrid systems

    Get PDF
    A current focus of intense research in pattern classification is the combination of several classifier systems, which can be built following either the same or different models and/or datasets building approaches. These systems perform information fusion of classification decisions at different levels overcoming limitations of traditional approaches based on single classifiers. This paper presents an up-to-date survey on multiple classifier system (MCS) from the point of view of Hybrid Intelligent Systems. The article discusses major issues, such as diversity and decision fusion methods, providing a vision of the spectrum of applications that are currently being developed

    Data-driven Computational Social Science: A Survey

    Get PDF
    Social science concerns issues on individuals, relationships, and the whole society. The complexity of research topics in social science makes it the amalgamation of multiple disciplines, such as economics, political science, and sociology, etc. For centuries, scientists have conducted many studies to understand the mechanisms of the society. However, due to the limitations of traditional research methods, there exist many critical social issues to be explored. To solve those issues, computational social science emerges due to the rapid advancements of computation technologies and the profound studies on social science. With the aids of the advanced research techniques, various kinds of data from diverse areas can be acquired nowadays, and they can help us look into social problems with a new eye. As a result, utilizing various data to reveal issues derived from computational social science area has attracted more and more attentions. In this paper, to the best of our knowledge, we present a survey on data-driven computational social science for the first time which primarily focuses on reviewing application domains involving human dynamics. The state-of-the-art research on human dynamics is reviewed from three aspects: individuals, relationships, and collectives. Specifically, the research methodologies used to address research challenges in aforementioned application domains are summarized. In addition, some important open challenges with respect to both emerging research topics and research methods are discussed.Comment: 28 pages, 8 figure

    Controversy trend detection in social media

    Get PDF
    In this research, we focus on the early prediction of whether topics are likely to generate significant controversy (in the form of social media such as comments, blogs, etc.). Controversy trend detection is important to companies, governments, national security agencies, and marketing groups because it can be used to identify which issues the public is having problems with and develop strategies to remedy them. For example, companies can monitor their press release to find out how the public is reacting and to decide if any additional public relations action is required, social media moderators can moderate discussions if the discussions start becoming abusive and getting out of control, and governmental agencies can monitor their public policies and make adjustments to the policies to address any public concerns. An algorithm was developed to predict controversy trends by taking into account sentiment expressed in comments, burstiness of comments, and controversy score. To train and test the algorithm, an annotated corpus was developed consisting of 728 news articles and over 500,000 comments on these articles made by viewers from CNN.com. This study achieved an average F-score of 71.3% across all time spans in detection of controversial versus non-controversial topics. The results suggest that it is possible for early prediction of controversy trends leveraging social media

    Metalearning

    Get PDF
    This open access book as one of the fastest-growing areas of research in machine learning, metalearning studies principled methods to obtain efficient models and solutions by adapting machine learning and data mining processes. This adaptation usually exploits information from past experience on other tasks and the adaptive processes can involve machine learning approaches. As a related area to metalearning and a hot topic currently, automated machine learning (AutoML) is concerned with automating the machine learning processes. Metalearning and AutoML can help AI learn to control the application of different learning methods and acquire new solutions faster without unnecessary interventions from the user. This book offers a comprehensive and thorough introduction to almost all aspects of metalearning and AutoML, covering the basic concepts and architecture, evaluation, datasets, hyperparameter optimization, ensembles and workflows, and also how this knowledge can be used to select, combine, compose, adapt and configure both algorithms and models to yield faster and better solutions to data mining and data science problems. It can thus help developers to develop systems that can improve themselves through experience. This book is a substantial update of the first edition published in 2009. It includes 18 chapters, more than twice as much as the previous version. This enabled the authors to cover the most relevant topics in more depth and incorporate the overview of recent research in the respective area. The book will be of interest to researchers and graduate students in the areas of machine learning, data mining, data science and artificial intelligence. ; Metalearning is the study of principled methods that exploit metaknowledge to obtain efficient models and solutions by adapting machine learning and data mining processes. While the variety of machine learning and data mining techniques now available can, in principle, provide good model solutions, a methodology is still needed to guide the search for the most appropriate model in an efficient way. Metalearning provides one such methodology that allows systems to become more effective through experience. This book discusses several approaches to obtaining knowledge concerning the performance of machine learning and data mining algorithms. It shows how this knowledge can be reused to select, combine, compose and adapt both algorithms and models to yield faster, more effective solutions to data mining problems. It can thus help developers improve their algorithms and also develop learning systems that can improve themselves. The book will be of interest to researchers and graduate students in the areas of machine learning, data mining and artificial intelligence

    SODE: Self-Adaptive One-Dependence Estimators for classification

    Full text link
    © 2015 Elsevier Ltd. SuperParent-One-Dependence Estimators (SPODEs) represent a family of semi-naive Bayesian classifiers which relax the attribute independence assumption of Naive Bayes (NB) to allow each attribute to depend on a common single attribute (superparent). SPODEs can effectively handle data with attribute dependency but still inherent NB's key advantages such as computational efficiency and robustness for high dimensional data. In reality, determining an optimal superparent for SPODEs is difficult. One common approach is to use weighted combinations of multiple SPODEs, each having a different superparent with a properly assigned weight value (i.e., a weight value is assigned to each attribute). In this paper, we propose a self-adaptive SPODEs, namely SODE, which uses immunity theory in artificial immune systems to automatically and self-adaptively select the weight for each single SPODE. SODE does not need to know the importance of individual SPODE nor the relevance among SPODEs, and can flexibly and efficiently search optimal weight values for each SPODE during the learning process. Extensive experiments and comparisons on 56 benchmark data sets, and validations on image and text classification, demonstrate that SODE outperforms state-of-the-art weighted SPODE algorithms and is suitable for a wide range of learning tasks. Results also confirm that SODE provides an appropriate balance between runtime efficiency and accuracy

    Document analysis by means of data mining techniques

    Get PDF
    The huge amount of textual data produced everyday by scientists, journalists and Web users, allows investigating many different aspects of information stored in the published documents. Data mining and information retrieval techniques are exploited to manage and extract information from huge amount of unstructured textual data. Text mining also known as text data mining is the processing of extracting high quality information (focusing relevance, novelty and interestingness) from text by identifying patterns etc. Text mining typically involves the process of structuring input text by means of parsing and other linguistic features or sometimes by removing extra data and then finding patterns from structured data. Patterns are then evaluated at last and interpretation of output is performed to accomplish the desired task. Recently, text mining has got attention in several fields such as in security (involves analysis of Internet news), for commercial (for search and indexing purposes) and in academic departments (such as answering query). Beyond searching the documents consisting the words given in a user query, text mining may provide direct answer to user by semantic web for content based (content meaning and its context). It can also act as intelligence analyst and can also be used in some email spam filters for filtering out unwanted material. Text mining usually includes tasks such as clustering, categorization, sentiment analysis, entity recognition, entity relation modeling and document summarization. In particular, summarization approaches are suitable for identifying relevant sentences that describe the main concepts presented in a document dataset. Furthermore, the knowledge existed in the most informative sentences can be employed to improve the understanding of user and/or community interests. Different approaches have been proposed to extract summaries from unstructured text documents. Some of them are based on the statistical analysis of linguistic features by means of supervised machine learning or data mining methods, such as Hidden Markov models, neural networks and Naive Bayes methods. An appealing research field is the extraction of summaries tailored to the major user interests. In this context, the problem of extracting useful information according to domain knowledge related to the user interests is a challenging task. The main topics have been to study and design of novel data representations and data mining algorithms useful for managing and extracting knowledge from unstructured documents. This thesis describes an effort to investigate the application of data mining approaches, firmly established in the subject of transactional data (e.g., frequent itemset mining), to textual documents. Frequent itemset mining is a widely exploratory technique to discover hidden correlations that frequently occur in the source data. Although its application to transactional data is well-established, the usage of frequent itemsets in textual document summarization has never been investigated so far. A work is carried on exploiting frequent itemsets for the purpose of multi-document summarization so a novel multi-document summarizer, namely ItemSum (Itemset-based Summarizer) is presented, that is based on an itemset-based model, i.e., a framework comprise of frequent itemsets, taken out from the document collection. Highly representative and not redundant sentences are selected for generating summary by considering both sentence coverage, with respect to a sentence relevance score, based on tf-idf statistics, and a concise and highly informative itemset-based model. To evaluate the ItemSum performance a suite of experiments on a collection of news articles has been performed. Obtained results show that ItemSum significantly outperforms mostly used previous summarizers in terms of precision, recall, and F-measure. We also validated our approach against a large number of approaches on the DUC’04 document collection. Performance comparisons, in terms of precision, recall, and F-measure, have been performed by means of the ROUGE toolkit. In most cases, ItemSum significantly outperforms the considered competitors. Furthermore, the impact of both the main algorithm parameters and the adopted model coverage strategy on the summarization performance are investigated as well. In some cases, the soundness and readability of the generated summaries are unsatisfactory, because the summaries do not cover in an effective way all the semantically relevant data facets. A step beyond towards the generation of more accurate summaries has been made by semantics-based summarizers. Such approaches combine the use of general-purpose summarization strategies with ad-hoc linguistic analysis. The key idea is to also consider the semantics behind the document content to overcome the limitations of general-purpose strategies in differentiating between sentences based on their actual meaning and context. Most of the previously proposed approaches perform the semantics-based analysis as a preprocessing step that precedes the main summarization process. Therefore, the generated summaries could not entirely reflect the actual meaning and context of the key document sentences. In contrast, we aim at tightly integrating the ontology-based document analysis into the summarization process in order to take the semantic meaning of the document content into account during the sentence evaluation and selection processes. With this in mind, we propose a new multi-document summarizer, namely Yago-based Summarizer, that integrates an established ontology-based entity recognition and disambiguation step. Named Entity Recognition from Yago ontology is being used for the task of text summarization. The Named Entity Recognition (NER) task is concerned with marking occurrences of a specific object being mentioned. These mentions are then classified into a set of predefined categories. Standard categories include “person”, “location”, “geo-political organization”, “facility”, “organization”, and “time”. The use of NER in text summarization improved the summarization process by increasing the rank of informative sentences. To demonstrate the effectiveness of the proposed approach, we compared its performance on the DUC’04 benchmark document collections with that of a large number of state-of-the-art summarizers. Furthermore, we also performed a qualitative evaluation of the soundness and readability of the generated summaries and a comparison with the results that were produced by the most effective summarizers. A parallel effort has been devoted to integrating semantics-based models and the knowledge acquired from social networks into a document summarization model named as SociONewSum. The effort addresses the sentence-based generic multi-document summarization problem, which can be formulated as follows: given a collection of news articles ranging over the same topic, the goal is to extract a concise yet informative summary, which consists of most salient document sentences. An established ontological model has been used to improve summarization performance by integrating a textual entity recognition and disambiguation step. Furthermore, the analysis of the user-generated content coming from Twitter has been exploited to discover current social trends and improve the appealing of the generated summaries. An experimental evaluation of the SociONewSum performance was conducted on real English-written news article collections and Twitter posts. The achieved results demonstrate the effectiveness of the proposed summarizer, in terms of different ROUGE scores, compared to state-of-the-art open source summarizers as well as to a baseline version of the SociONewSum summarizer that does not perform any UGC analysis. Furthermore, the readability of the generated summaries has also been analyzed
    • …
    corecore