2 research outputs found
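Categorização e classificação de notícias de big data em tecnologias segundo o Quadrante Mágico de Gartner (Categorization and classification of big data news into technologies according to the Gartner Magic Quadrant)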
The development of technology in recent years has led to a continuous increase in data and to its accumulation at an incalculable speed. These factors have made a new concept commonplace: Big Data.
In this study, 11,505 news articles about Big Data were extracted from Google News, and text mining techniques were applied to obtain relevant knowledge and to categorize the news through the Latent Dirichlet Allocation (LDA) algorithm. Big Data technologies are examined in relation to the Gartner Quadrants in order to understand the type of technologies in which companies of a specific quadrant invest. The study thus contributes to the literature by providing concrete results on market behavior derived from factual data.
The study confirms the strength of the companies in the Gartner leaders quadrant, revealing that they are increasingly market leaders and offer a very complete and diverse set of Big Data technology solutions. It also shows that companies in the challengers quadrant do not demonstrate an understanding of the direction in which the market is moving, and that a company in the visionaries quadrant that bets heavily on Big Data stream analytics technology will see its position in the Gartner Quadrant change, moving increasingly toward the leaders quadrant and, at the same time, toward the niche players quadrant.
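As an illustration of the topic categorization step mentioned in the abstract, the following is a minimal sketch of Latent Dirichlet Allocation fitted by collapsed Gibbs sampling. It is not the study's implementation: the function names `lda_gibbs` and `top_words`, the hyperparameters `alpha` and `beta`, and the toy headlines are all assumptions chosen for illustration.

```python
import random


def lda_gibbs(docs, num_topics, iterations=200, alpha=0.1, beta=0.01, seed=0):
    """Fit LDA on tokenized documents with collapsed Gibbs sampling.

    Returns (doc_topic, topic_word, vocab), where doc_topic[d][k] counts the
    tokens of document d assigned to topic k, and topic_word[k][v] counts how
    often vocabulary word v is assigned to topic k.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    vocab_size = len(vocab)

    doc_topic = [[0] * num_topics for _ in docs]
    topic_word = [[0] * vocab_size for _ in range(num_topics)]
    topic_total = [0] * num_topics
    assignments = []

    # Start from a random topic assignment for every token.
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            z = rng.randrange(num_topics)
            z_doc.append(z)
            doc_topic[d][z] += 1
            topic_word[z][word_id[w]] += 1
            topic_total[z] += 1
        assignments.append(z_doc)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                wid = word_id[w]
                z = assignments[d][i]
                # Remove the token's current assignment from the counts.
                doc_topic[d][z] -= 1
                topic_word[z][wid] -= 1
                topic_total[z] -= 1
                # Unnormalized conditional probability of each topic.
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][wid] + beta)
                    / (topic_total[k] + vocab_size * beta)
                    for k in range(num_topics)
                ]
                z = rng.choices(range(num_topics), weights=weights)[0]
                # Record the newly sampled assignment.
                assignments[d][i] = z
                doc_topic[d][z] += 1
                topic_word[z][wid] += 1
                topic_total[z] += 1

    return doc_topic, topic_word, vocab


def top_words(topic_word, vocab, n=3):
    """Return the n most frequent words for each topic."""
    return [
        [vocab[i] for i in sorted(range(len(vocab)), key=lambda i: -row[i])[:n]]
        for row in topic_word
    ]


# Hypothetical, already-tokenized headlines (not data from the study).
headlines = [
    ["stream", "analytics", "cloud", "platform"],
    ["cloud", "storage", "stream", "analytics"],
    ["hadoop", "spark", "cluster", "batch"],
    ["spark", "cluster", "batch", "hadoop"],
]
doc_topic, topic_word, vocab = lda_gibbs(headlines, num_topics=2)
print(top_words(topic_word, vocab))
```

On a real corpus, each row of `doc_topic` could be normalized to assign an article to its dominant topic, which is one plausible way to obtain a news categorization like the one described.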
Mining Online Text Data for Sentiment and News Impact Analysis
With the continuous growth of the Internet, an ever-increasing amount of information becomes available on the World Wide Web (WWW). Information on the WWW has grown so explosively that search engines using traditional keyword-based searching strategies can hardly meet people's needs to retrieve knowledge from massive online text data. The motivation of this thesis comes from the great demand for discovering implicit knowledge and rich semantics in online documents.
This thesis focuses on analyzing online business news, representative of objective information, and online customer reviews, representative of subjective information. For online business news, a topic-driven impact analysis model is proposed that quantifies the impact of the topic of a news article. With the proposed topic-driven impact analysis model, an explorative visual analysis system called ImpactWheel is developed to help users better navigate and understand topic-specific impact relationships among companies by mining the rich information source of online business news.
For online customer reviews, both overall document sentiment classification and attribute-based sentiment analysis are performed. For overall document sentiment classification, taking advantage of the high frequency of Co-occurring Term (CoT) patterns in customer reviews, a frequency-based algorithm is proposed to generate complex features that benefit sentiment classifiers. To search for effective features and ignore useless ones produced by the frequency-based complex feature generation algorithm, an Effective Feature Search (EFS) framework is proposed, which makes a novel connection between feature candidate generation and a Stochastic Local Search process. For attribute-based sentiment analysis, the concept of a Sentiment Ontology Tree (SOT) is proposed, which organizes a product's domain-specific knowledge as well as sentiments in a tree-like ontology structure.
With the concept of SOT, a Hierarchical Learning via Sentiment Ontology Tree (HL-SOT) approach is proposed to solve the sentiment analysis tasks in a hierarchical classification process. To enhance the classification performance and computational efficiency of the HL-SOT approach, which encodes texts using a globally unified index term space, a Localized Feature Selection (LFS) framework is developed that generates a customized index term space for each node of the SOT. Because the HL-SOT approach is estimated by an RLS estimator, which is not well suited to finding the maximum class separation, and because the statistical linear classifier has proven fallible in classifying sentiment, a more pragmatic Hybrid Hierarchical Classification Process (HHCP) is proposed. The HHCP approach employs a linear classifier that is capable of maximizing the class separation while minimizing the within-class variance for attribute detection and turns to a rule-based solution for sentiment orientation