5 research outputs found

    Modeling Content Lifespan in Online Social Networks Using Data Mining

    Online Social Networks (OSNs) are integrated into business, entertainment, politics, education, and nearly every other facet of everyday life. They have played essential roles in events ranging from milestones for humanity, such as social revolutions in certain countries, to day-to-day activities, such as streaming entertaining or educational material. Not surprisingly, social networks are a subject of study not only for computer scientists but also for economists, sociologists, political scientists, and psychologists, among others. In this dissertation, we build a model that classifies content on the OSNs Reddit, 4chan, Flickr, and YouTube according to the type of lifespan the content has and the popularity tier it reaches. The proposed model is evaluated using 10-fold cross-validation with four data mining algorithms: Sequential Minimal Optimization (SMO), which is a support vector machine algorithm, Decision Table, Naïve Bayes, and Random Forest. Run times and accuracies are compared across OSNs, models, and data mining algorithms. The peak/death category of content can be classified with 64% accuracy on Reddit, 76% on 4chan, and 65% on Flickr. We also used 10-fold cross-validation to measure the accuracy with which the popularity tier of content can be classified: 84% on Reddit, 70% on 4chan, 66% on Flickr, and only 48% on YouTube.
Our experiments compared the runtimes and accuracy of SMO, Naïve Bayes, Decision Table, and Random Forest in classifying the lifespan of content on Reddit, 4chan, and Flickr, and in classifying the popularity tier of content on Reddit, 4chan, Flickr, and YouTube. The experimental results indicate that SMO is capable of outperforming the other algorithms in runtime across all OSNs. Decision Table has the longest observed runtimes and in some cases failed to complete the analysis before the system crashed. The statistical analysis indicates, at the 95% confidence level, that there is no statistically significant difference in accuracy between the algorithms across all OSNs. Reddit was shown, at the 95% confidence level, to be the OSN whose content is least likely to be misclassified. All other OSNs were shown to have no statistically significant difference in how likely their content is to be misclassified when compared pairwise with each other.
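
The 10-fold cross-validation protocol mentioned in this abstract can be illustrated with a minimal sketch. This is not the dissertation's pipeline: the function names (`k_fold_indices`, `majority_baseline`, `cross_val_accuracy`) are hypothetical, the majority-class baseline stands in for SMO, Decision Table, Naïve Bayes, or Random Forest, and the labels are toy data.

```python
import random
from collections import Counter

def k_fold_indices(n, k=10, seed=0):
    """Shuffle indices 0..n-1 and deal them into k nearly equal folds (assumes n >= k)."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def majority_baseline(train_X, train_y, test_X):
    """Stand-in classifier (assumption): always predict the most common training label."""
    most_common = Counter(train_y).most_common(1)[0][0]
    return [most_common] * len(test_X)

def cross_val_accuracy(X, y, fit_predict, k=10, seed=0):
    """Mean accuracy over k folds; each fold is held out exactly once for testing."""
    scores = []
    for fold in k_fold_indices(len(y), k, seed):
        held_out = set(fold)
        train = [i for i in range(len(y)) if i not in held_out]
        preds = fit_predict([X[i] for i in train],
                            [y[i] for i in train],
                            [X[i] for i in fold])
        correct = sum(p == y[i] for p, i in zip(preds, fold))
        scores.append(correct / len(fold))
    return sum(scores) / len(scores)

# Toy data: 100 items, 70 labelled "high" and 30 labelled "low".
X = list(range(100))
y = ["high"] * 70 + ["low"] * 30
accuracy = cross_val_accuracy(X, y, majority_baseline)
```

Any classifier can be swapped in through `fit_predict`, as long as it takes training features, training labels, and test features and returns one predicted label per test item.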

    Antecipação na tomada de decisão com múltiplos critérios sob incerteza (Anticipation in multiple criteria decision-making under uncertainty)

    Advisor: Fernando José Von Zuben. Thesis (doctorate), Universidade Estadual de Campinas, Faculdade de Engenharia Elétrica e de Computação.
    Abstract: The presence of uncertainty in future outcomes can lead to indecision in choice processes, especially when eliciting the relative importance of multiple decision criteria and of long-term vs. near-term performance. Some decisions, however, must be taken under incomplete information, which may result in hasty actions with unforeseen consequences. When a solution must be selected under multiple conflicting views for operating in time-varying and noisy environments, implementing flexible provisional alternatives can be critical to circumvent the lack of complete information by keeping future options open. Anticipatory engineering can then be regarded as the strategy of designing flexible solutions that enable decision makers to respond robustly to unpredictable scenarios. This strategy can thus mitigate the risk of strong unintended commitments to uncertain alternatives, while increasing adaptability to future changes.
    In this thesis, the roles of anticipation and of flexibility in automating sequential multiple criteria decision-making processes under uncertainty are investigated. The dilemma of assigning relative importance to decision criteria and to immediate rewards under incomplete information is handled by autonomously anticipating flexible decisions predicted to maximally preserve the diversity of future choices. An online anticipatory learning methodology is then proposed for improving the range and quality of future trade-off solution sets. This goal is achieved by predicting maximal expected hypervolume sets, for which the anticipation capabilities of multi-objective metaheuristics are augmented with Bayesian tracking in both the objective and search spaces. The methodology has been applied to obtain investment decisions that are shown to significantly improve the future hypervolume of trade-off financial portfolios for out-of-sample stock data, when compared to a myopic strategy. Moreover, implementing flexible portfolio rebalancing decisions was confirmed as a significantly better strategy than randomly choosing an investment decision from the evolved stochastic efficient frontier in all tested artificial and real-world markets. Finally, the results suggest that anticipating flexible choices has led to portfolio compositions that are significantly correlated with the observed improvements in out-of-sample future expected hypervolume.
    Doctorate. Computer Engineering. Doutor em Engenharia Elétric
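
The hypervolume indicator that this abstract uses to score trade-off sets can be sketched for the two-objective case. The sketch makes several assumptions: both objectives are maximized, the reference point is the origin, and the function name `hypervolume_2d` and the sample points are hypothetical. It computes the plain deterministic hypervolume, not the expected hypervolume with Bayesian tracking described in the thesis.

```python
def hypervolume_2d(points, ref=(0.0, 0.0)):
    """Area dominated by `points` and bounded below by `ref` (both objectives maximized)."""
    # Keep only points that strictly improve on the reference point in both objectives.
    candidates = [p for p in points if p[0] > ref[0] and p[1] > ref[1]]
    # Sweep from the largest first objective downwards; ties broken by larger second objective.
    candidates.sort(key=lambda p: (-p[0], -p[1]))
    area = 0.0
    best_y = ref[1]
    for x, y in candidates:
        if y > best_y:
            # This point adds a new horizontal strip above everything counted so far.
            area += (x - ref[0]) * (y - best_y)
            best_y = y
    return area

# Hypothetical trade-off set: three mutually non-dominated points plus one dominated point.
front = [(3.0, 1.0), (2.0, 2.0), (1.0, 3.0), (1.0, 1.0)]
hv = hypervolume_2d(front)  # strips of area 3, 2, and 1 give 6.0
```

A larger hypervolume means the set covers more of the objective space, which is why it serves as a single score for both the range and the quality of a trade-off set.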

    Latent Dirichlet Conditional Naive-Bayes Models for Privacy-Preservation Clustering
