33 research outputs found
Preparing Low Cost Solution Based On Customized Process Of Parallel Clustering Solution
Big Data analysis is the field of data processing concerned with collections of data sets so large and complex that they are difficult to process with traditional approaches and technologies, and for which no unified scientific solution exists. Handling large volumes of data and preparing them for deep analysis, so that the mining process receives the information it requires, is the most complex and sometimes the costliest task in real time. Many solutions exist for the data mining process, such as clustering, special mining, and k-means mining, to name a few. The real challenge is choosing the correct solution or algorithm for the input data and tuning the processing steps so that the entire mining process is cost effective. Some solutions mine efficiently but are costly to operate, and others are the reverse. Hence there is an ever-increasing demand for a solution that is both cost effective and efficient. This paper investigates how to implement a concept called parallel clustering, which offers greater benefits in cost and time for data mining without compromising the efficiency and accuracy of the expected result. It discusses one such custom algorithm and compares its performance with that of other solutions.
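The abstract does not specify the custom algorithm, so the following is only a generic illustration of what "parallel clustering" can mean: a minimal data-parallel k-means sketch in which each worker computes per-cluster sums and counts for its own chunk and the partial results are merged. The function names, the chunk count, the thread pool, and the synthetic data are all assumptions made for illustration, not the paper's method.

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np


def partial_stats(chunk, centroids):
    """Assign each point in one chunk to its nearest centroid and
    return the per-cluster coordinate sums and point counts."""
    dists = ((chunk[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    labels = dists.argmin(axis=1)
    k = centroids.shape[0]
    sums = np.zeros_like(centroids)
    counts = np.zeros(k, dtype=int)
    for c in range(k):
        mask = labels == c
        sums[c] = chunk[mask].sum(axis=0)
        counts[c] = mask.sum()
    return sums, counts


def parallel_kmeans(X, init_centroids, n_chunks=4, n_iter=10):
    """Hypothetical data-parallel k-means: chunks are processed by a
    thread pool, then their partial statistics are merged."""
    centroids = np.asarray(init_centroids, dtype=float).copy()
    chunks = np.array_split(X, n_chunks)
    with ThreadPoolExecutor(max_workers=n_chunks) as pool:
        for _ in range(n_iter):
            results = list(pool.map(lambda ch: partial_stats(ch, centroids), chunks))
            sums = sum(r[0] for r in results)
            counts = sum(r[1] for r in results)
            nonempty = counts > 0  # leave empty clusters where they are
            centroids[nonempty] = sums[nonempty] / counts[nonempty][:, None]
    return centroids


# Synthetic data (an assumption): two well-separated 2-D blobs.
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(loc=0.0, scale=1.0, size=(100, 2)),
    rng.normal(loc=10.0, scale=1.0, size=(100, 2)),
])
centers = parallel_kmeans(X, init_centroids=[[1.0, 1.0], [9.0, 9.0]])
```

Because only sums and counts are exchanged, the merge step costs the same regardless of chunk size, which is one reason this pattern is commonly used when cost and time matter.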
Attributed Network Embedding for Learning in a Dynamic Environment
Network embedding leverages the node proximity manifested to learn a
low-dimensional node vector representation for each node in the network. The
learned embeddings could advance various learning tasks such as node
classification, network clustering, and link prediction. Most, if not all, of
the existing works, are overwhelmingly performed in the context of plain and
static networks. Nonetheless, in reality, network structure often evolves over
time with addition/deletion of links and nodes. Also, a vast majority of
real-world networks are associated with a rich set of node attributes, and
their attribute values also change naturally, with the emergence of new
content patterns and the fading of old content patterns. These changing
characteristics motivate us to seek an effective embedding representation to
capture network and attribute evolving patterns, which is of fundamental
importance for learning in a dynamic environment. To the best of our knowledge, we are
the first to tackle this problem with the following two challenges: (1) the
inherently correlated network and node attributes could be noisy and
incomplete, which necessitates a robust consensus representation to capture their
individual properties and correlations; (2) the embedding learning needs to be
performed in an online fashion to adapt to the changes accordingly. In this
paper, we tackle this problem by proposing a novel dynamic attributed network
embedding framework - DANE. In particular, DANE first provides an offline
method for a consensus embedding and then leverages matrix perturbation theory
to maintain the freshness of the end embedding results in an online manner. We
perform extensive experiments on both synthetic and real attributed networks to
corroborate the effectiveness and efficiency of the proposed framework.
Comment: 10 pages
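The abstract mentions matrix perturbation theory as the tool for keeping embeddings fresh online. As a rough illustration of the underlying idea only (not DANE's actual update rule, which the abstract does not give), the sketch below uses the standard first-order result for a symmetric matrix: if v is a unit eigenvector of A with eigenvalue λ, then the matching eigenvalue of A + ΔA is approximately λ + vᵀ ΔA v. The small path-graph Laplacian-like matrix, the added link, and the function name are assumptions chosen for the example.

```python
import numpy as np


def first_order_eigen_update(eigval, eigvec, delta):
    """First-order estimate of an eigenvalue of A + delta, given an
    eigenpair (eigval, eigvec) of the symmetric matrix A."""
    v = eigvec / np.linalg.norm(eigvec)
    return eigval + v @ delta @ v


# Hypothetical 4-node operator (tridiagonal, symmetric).
A = np.array([
    [2.0, -1.0, 0.0, 0.0],
    [-1.0, 2.0, -1.0, 0.0],
    [0.0, -1.0, 2.0, -1.0],
    [0.0, 0.0, -1.0, 2.0],
])

# Small symmetric change, standing in for a weak new link between nodes 0 and 3.
delta = np.zeros((4, 4))
delta[0, 3] = delta[3, 0] = 0.01

eigvals, eigvecs = np.linalg.eigh(A)
approx = np.array([
    first_order_eigen_update(eigvals[i], eigvecs[:, i], delta)
    for i in range(len(eigvals))
])

# Reference values from a full recomputation, for comparison only.
exact = np.linalg.eigvalsh(A + delta)
max_error = float(np.max(np.abs(approx - exact)))
```

The estimate costs one matrix-vector product per eigenpair instead of a full eigendecomposition, which is what makes this style of update attractive in an online setting. Its accuracy degrades when the change is large or when eigenvalues lie close together.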
A Survey on Feature Selection Algorithms
One major component of machine learning is feature analysis, which comprises mainly two processes: feature selection and feature extraction. Because of its applications in several areas, including data mining, soft computing, and big data analysis, feature selection has gained importance. This paper introduces the concept of feature selection and its various approaches. It surveys the historical development of feature selection with supervised and unsupervised methods, and summarizes recent developments and the state of the art in feature selection algorithms, including their hybridizations.
DOI: 10.17762/ijritcc2321-8169.16043
A survey of feature selection in Internet traffic characterization
In the last decade, the research community has focused on new classification methods that rely on statistical characteristics of Internet traffic, instead of the previously popular port-number-based or payload-based methods, which face even greater constraints. Some research works based on statistical characteristics generated large feature sets of Internet traffic; however, handling hundreds of features in big data scenarios is now impractical, leading only to unacceptable processing times and misleading classification results caused by redundant and correlated data. As a consequence, a feature selection procedure is essential in the process of Internet traffic characterization. This paper presents a survey of feature selection methods: feature selection frameworks are introduced, and different categories of methods are briefly explained and compared; several proposals on feature selection in Internet traffic characterization are shown; finally, a future application of feature selection to a concrete project is proposed.
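To make the point about redundant and correlated features concrete, here is a minimal sketch of one simple filter-style approach: greedily dropping any feature whose absolute Pearson correlation with an already kept feature exceeds a threshold. It is not a method taken from the survey; the function name, the 0.95 threshold, and the synthetic "traffic" columns are assumptions for illustration.

```python
import numpy as np


def drop_correlated_features(X, threshold=0.95):
    """Return the indices of columns to keep. A column is kept only if
    its absolute Pearson correlation with every previously kept column
    is at most `threshold`. Assumes no column is constant."""
    corr = np.abs(np.corrcoef(X, rowvar=False))
    kept = []
    for j in range(X.shape[1]):
        if all(corr[j, k] <= threshold for k in kept):
            kept.append(j)
    return kept


# Synthetic flow statistics (hypothetical feature names).
rng = np.random.default_rng(0)
packet_size = rng.normal(size=200)
inter_arrival = rng.normal(size=200)
# Near-duplicate of packet_size: a rescaled copy with slight noise.
byte_count = 2.0 * packet_size + 0.01 * rng.normal(size=200)

X = np.column_stack([packet_size, inter_arrival, byte_count])
selected = drop_correlated_features(X)
```

Here `selected` keeps the first two columns and discards the third as redundant. A filter like this ignores the class labels entirely, so it is usually only a first pass before a supervised selection method.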
The influence of Twitter in the electoral processes. Analysis of the case of the primary elections in the PSOE, 2017
This paper analyses Twitter influence with regard to the presence of political candidates
during a party's primary elections. The study is based on the 2017 PSOE primary
elections. Machine learning techniques and content quantification, using software tools
associated with Big Data, serve as the methodological tools.
Communicative superabundance of political leaders on Twitter. The Spanish case
Recent years have seen the discovery and increasingly professionalized use of social
networks as another form of political communication and of influence on electoral processes,
both in the social and electoral sphere and through their capacity to shape the political and
media agenda. This text analyses the evolution of the total number of tweets that the main
Spanish political leaders posted on Twitter in the months before the last two general
elections, specifically November and December 2015 and May and June 2016.
In particular, the work tracks the accounts of the candidates for the presidency of the
Government from the four main Spanish political parties, showing the evolution of the number
of tweets published. This reveals the communicative superabundance, in several cases, of
Spanish political leaders. In parallel, the study records the number and growth of the
leaders' followers on the social network, looking for the relationship between the two
factors. Using a quantitative methodology, it shows that repetitive communication without
added value by many leaders does not necessarily mean effective communication or relevant
influence on social networks. Consequently, among its initial conclusions the study finds an
informational overabundance on the part of the main political leaders that does not
translate into audience.