33 research outputs found

    Preparing Low Cost Solution Based On Customized Process Of Parallel Clustering Solution

    Big Data analysis is the field of data processing concerned with collections of data sets that are so large and complex that traditional approaches and technologies struggle to process them, and for which no unified scientific solution exists. Handling large volumes of data and preparing them for the deep analysis required by the mining process is the most complex, and sometimes the most costly, task in real time. Many solutions exist for the data mining process, such as clustering, special mining, and k-means mining, to name a few. The real challenge is choosing the correct solution or algorithm for the input data and tuning the processing steps so that the entire mining process is cost effective. Some solutions mine efficiently but are costly to operate, and sometimes the reverse holds. Hence there is an ever-increasing demand for a solution that is both cost effective and efficient. This paper investigates how to implement a concept called parallel clustering, which offers greater benefits in cost and time for data mining without compromising the efficiency and accuracy of the expected result. It discusses one such custom algorithm and its performance compared to other solutions.

    Attributed Network Embedding for Learning in a Dynamic Environment

    Network embedding leverages the proximity between nodes to learn a low-dimensional vector representation for each node in the network. The learned embeddings can advance various learning tasks such as node classification, network clustering, and link prediction. Most, if not all, existing works are performed in the context of plain and static networks. In reality, however, network structure often evolves over time with the addition and deletion of links and nodes. Also, the vast majority of real-world networks are associated with a rich set of node attributes, and their attribute values also change naturally as new content patterns emerge and old ones fade. These changing characteristics motivate us to seek an effective embedding representation that captures the evolving patterns of both network and attributes, which is of fundamental importance for learning in a dynamic environment. To the best of our knowledge, we are the first to tackle this problem, which poses two challenges: (1) the inherently correlated network and node attributes can be noisy and incomplete, which necessitates a robust consensus representation to capture their individual properties and correlations; (2) the embedding learning needs to be performed in an online fashion to adapt to the changes accordingly. In this paper, we tackle this problem by proposing a novel dynamic attributed network embedding framework, DANE. In particular, DANE first provides an offline method for a consensus embedding and then leverages matrix perturbation theory to maintain the freshness of the end embedding results in an online manner. We perform extensive experiments on both synthetic and real attributed networks to corroborate the effectiveness and efficiency of the proposed framework.
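
The abstract above mentions matrix perturbation theory as the tool for keeping embeddings fresh online. As a rough illustration of that general idea only, the sketch below applies the textbook first-order perturbation update to the eigenpairs of a small symmetric matrix. It is not the DANE algorithm: the abstract does not give its update rules, so the standard (non-generalized) eigenproblem, the function name `perturb_eigenpairs`, and the toy matrices are all assumptions made for this example.

```python
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def mat_vec(m: Matrix, v: Vector) -> Vector:
    """Multiply a matrix by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def dot(u: Vector, v: Vector) -> float:
    """Inner product of two vectors."""
    return sum(a * b for a, b in zip(u, v))


def perturb_eigenpairs(
    eigvals: Vector, eigvecs: List[Vector], delta: Matrix
) -> Tuple[Vector, List[Vector]]:
    """First-order update of the eigenpairs of a symmetric matrix A
    after a small symmetric change A -> A + delta.

    Assumptions (hypothetical setup, not taken from the paper):
    eigvecs[i] is the unit eigenvector for eigvals[i], the eigenvectors
    are orthonormal, and the eigenvalues are distinct.
    """
    new_vals: Vector = []
    new_vecs: List[Vector] = []
    for i, (lam_i, v_i) in enumerate(zip(eigvals, eigvecs)):
        dv = mat_vec(delta, v_i)
        # Eigenvalue shift: lambda_i + v_i^T * delta * v_i
        new_vals.append(lam_i + dot(v_i, dv))
        # Eigenvector shift: sum over j != i of
        # (v_j^T * delta * v_i) / (lambda_i - lambda_j) * v_j
        updated = list(v_i)
        for j, (lam_j, v_j) in enumerate(zip(eigvals, eigvecs)):
            if j == i:
                continue
            coeff = dot(v_j, dv) / (lam_i - lam_j)
            updated = [x + coeff * y for x, y in zip(updated, v_j)]
        new_vecs.append(updated)
    return new_vals, new_vecs


if __name__ == "__main__":
    # Toy example: A = diag(2, 5), so the eigenvectors are the unit axes.
    vals, vecs = perturb_eigenpairs(
        [2.0, 5.0],
        [[1.0, 0.0], [0.0, 1.0]],
        [[0.1, 0.03], [0.03, -0.2]],
    )
    print(vals)
    print(vecs)
```

In the toy example the eigenvalues move to roughly 2.1 and 4.8 without recomputing a full decomposition, which is the appeal of a perturbation-based update when changes between time steps are small.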

    A Survey on Feature Selection Algorithms

    One major component of machine learning is feature analysis, which comprises mainly two processes: feature selection and feature extraction. Owing to its applications in several areas, including data mining, soft computing, and big data analysis, feature selection has gained reasonable importance. This paper presents an introduction to feature selection and its various inherent approaches. It surveys historical developments in feature selection with supervised and unsupervised methods, and it summarizes recent developments and the state of the art in ongoing feature selection algorithms, including their hybridizations. DOI: 10.17762/ijritcc2321-8169.16043

    A survey of feature selection in Internet traffic characterization

    In the last decade, the research community has focused on new classification methods that rely on statistical characteristics of Internet traffic, instead of the previously popular port-number-based or payload-based methods, which are subject to even greater constraints. Some research works based on statistical characteristics generated large feature sets of Internet traffic; however, it is now impossible to handle hundreds of features in big data scenarios, as doing so only leads to unacceptable processing time and misleading classification results due to redundant and correlated data. As a consequence, a feature selection procedure is essential in the process of Internet traffic characterization. This paper presents a survey of feature selection methods: feature selection frameworks are introduced, and different categories of methods are briefly explained and compared; several proposals on feature selection in Internet traffic characterization are shown; finally, a future application of feature selection to a concrete project is proposed.
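
To make the point about redundant and correlated features concrete, here is a minimal sketch of one generic filter-style approach: greedily dropping any feature whose Pearson correlation with an already kept feature exceeds a threshold. This is a common textbook technique and is not claimed to be a method from the survey; the function names, the 0.95 threshold, and the flow-feature names and values are hypothetical.

```python
import math
from typing import Dict, List


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation coefficient; returns 0.0 for a constant column."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if norm_x == 0 or norm_y == 0:
        return 0.0
    return cov / (norm_x * norm_y)


def drop_redundant(
    features: Dict[str, List[float]], threshold: float = 0.95
) -> List[str]:
    """Keep a feature only if its absolute correlation with every
    previously kept feature stays below the threshold.
    Features are visited in dictionary insertion order."""
    kept: List[str] = []
    for name, column in features.items():
        if all(abs(pearson(column, features[k])) < threshold for k in kept):
            kept.append(name)
    return kept


if __name__ == "__main__":
    # Hypothetical per-flow statistics; byte_count is a scaled copy of
    # pkt_count, so it carries no additional information.
    flows = {
        "pkt_count": [10.0, 20.0, 30.0, 40.0],
        "byte_count": [1000.0, 2000.0, 3000.0, 4000.0],
        "duration": [5.0, 1.0, 4.0, 2.0],
    }
    print(drop_redundant(flows))
```

A filter like this is cheap because it never trains a classifier, but it only detects pairwise linear redundancy, so it is a starting point rather than a complete selection procedure.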

    The influence of Twitter in the electoral processes. Analysis of the case of the primary elections in the PSOE, 2017

    This paper analyses influence on the social network Twitter with regard to the presence of political candidates during a party's primary elections. In this case, the study is based on the PSOE primary elections in 2017. Machine learning techniques and content quantification, using software tools related to Big Data, serve as the methodological tools.

    Communicative superabundance of political leaders on Twitter. The Spanish case

    Recent years have seen the discovery and increasingly professionalized use of social networks as another form of political communication and of influence on electoral processes, both in the social and electoral sphere and through their capacity to shape the political and media agenda. This text analyses the evolution of the total number of tweets that the main Spanish political leaders posted on Twitter in the months before the last two general elections, specifically November and December 2015 and May and June 2016. The work examines the accounts of the candidates for the presidency of the Government from the four main Spanish political parties, showing the evolution of the number of tweets published. This makes it possible to demonstrate the communicative superabundance, in several cases, of Spanish political leaders. In parallel, the study records the number and growth of the leaders' followers on the social network, looking for the relationship between the two factors. Using a quantitative methodology, it shows that communicative repetition without added value by many leaders does not necessarily mean effective communication or relevant influence on social networks. Consequently, the study finds among its initial conclusions that the main political leaders produce an overabundance of information that does not materialize in terms of audience.