278 research outputs found

    Predictive Analysis on Twitter: Techniques and Applications

    Predictive analysis of social media data has attracted considerable attention from the research community as well as the business world because of the essential and actionable information it can provide. Over the years, extensive experimentation and analysis for insights have been carried out using Twitter data in various domains such as healthcare, public health, politics, social sciences, and demographics. In this chapter, we discuss techniques, approaches and state-of-the-art applications of predictive analysis of Twitter data. Specifically, we present fine-grained analysis involving aspects such as sentiment, emotion, and the use of domain knowledge in the coarse-grained analysis of Twitter data for making decisions and taking actions, and relate a few success stories

    Enhancing Privacy and Fairness in Search Systems

    Following a period of rapid progress in the capabilities of digital systems, society is beginning to realize that systems designed to assist people in various tasks can also harm individuals and society. Because they mediate access to information and explicitly or implicitly rank people in a growing number of applications, search systems have substantial potential to contribute to such unwanted outcomes. Since they collect vast amounts of data about both searchers and search subjects, they have the potential to violate the privacy of both of these groups of users. Moreover, in applications where rankings influence people's economic livelihood outside of the platform, such as sharing economy or hiring support websites, search engines have immense economic power over their users in that they control user exposure in ranked results. This thesis develops new models and methods broadly covering different aspects of privacy and fairness in search systems for both searchers and search subjects. Specifically, it makes the following contributions: (1) We propose a model for computing individually fair rankings where search subjects get exposure proportional to their relevance. The exposure is amortized over time using constrained optimization to overcome searcher attention biases while preserving ranking utility. (2) We propose a model for computing sensitive search exposure where each subject gets to know the sensitive queries that lead to her profile in the top-k search results. The problem of finding exposing queries is technically modeled as reverse nearest neighbor search, followed by a weakly supervised learning-to-rank model ordering the queries by privacy-sensitivity. (3) We propose a model for quantifying privacy risks from textual data in online communities. The method builds on a topic model where each topic is annotated by a crowdsourced sensitivity score, and privacy risks are associated with a user's relevance to sensitive topics. 
We propose relevance measures capturing different dimensions of user interest in a topic and show how they correlate with human risk perceptions. (4) We propose a model for privacy-preserving personalized search where search queries of different users are split and merged into synthetic profiles. The model mediates the privacy-utility trade-off by keeping semantically coherent fragments of search histories within individual profiles, while trying to minimize the similarity of any of the synthetic profiles to the original user profiles. The models are evaluated using information retrieval techniques and user studies over a variety of datasets, ranging from query logs, through social media and community question answering postings, to item listings from sharing economy platforms.

    Social Media Influencers- A Review of Operations Management Literature

    This literature review provides a comprehensive survey of research on Social Media Influencers (SMIs) across the fields of SMIs in marketing, seeding strategies, influence maximization and applications of SMIs in society. Specifically, we focus on examining the methods employed by researchers to reach their conclusions. Through our analysis, we identify opportunities for future research that align with emerging areas and unexplored territories related to theory, context, and methodology. This approach offers a fresh perspective on existing research, paving the way for more effective and impactful studies in the future. Additionally, gaining a deeper understanding of the underlying principles and methodologies of these concepts enables more informed decision-making when implementing these strategies.

    Quantitative Assessment of Factors in Sentiment Analysis

    Sentiment can be defined as a tendency to experience certain emotions in relation to a particular object or person. Sentiment may be expressed in writing, in which case determining that sentiment algorithmically is known as sentiment analysis. Sentiment analysis is often applied to Internet texts such as product reviews, websites, blogs, or tweets, where automatically determining published feeling towards a product or service is very useful to marketers or opinion analysts. The main goal of sentiment analysis is to identify the polarity of natural language text. This thesis sets out to examine quantitatively the factors that have an effect on sentiment analysis. The factors that are commonly used in sentiment analysis are text features, sentiment lexica or resources, and the machine learning algorithms employed. The main aim of this thesis is to investigate systematically the interaction between sentiment analysis factors and machine learning algorithms in order to improve sentiment analysis performance as compared to the opinions of human assessors. A software system known as TJP was designed and developed to support this investigation. The research reported here has three main parts. Firstly, the role of data pre-processing was investigated with TJP using a combination of features together with publicly available datasets. This considers the relationship and relative importance of superficial text features such as emoticons, n-grams, negations, hashtags, repeated letters, special characters, slang, and stopwords. The resulting statistical analysis suggests that a combination of all of these features achieves better accuracy with the dataset, and has a considerable effect on system performance. Secondly, the effect of human marked-up training data was considered, since this is required by supervised machine learning algorithms. The results gained from TJP suggest that training data greatly augments sentiment analysis performance. 
However, the combination of training data and sentiment lexica seems to provide optimal performance. Nevertheless, one particular sentiment lexicon, AFINN, contributed better than others in the absence of training data, and therefore would be appropriate for unsupervised approaches to sentiment analysis. Finally, the performance of two sophisticated ensemble machine learning algorithms was investigated. Both the Arbiter Tree and Combiner Tree were chosen since neither of them has previously been used with sentiment analysis. The objective here was to demonstrate their applicability and effectiveness compared to that of the leading single machine learning algorithms, Naïve Bayes and Support Vector Machines. The results showed that whilst either can be applied to sentiment analysis, the Arbiter Tree ensemble algorithm achieved better accuracy performance than either the Combiner Tree or any single machine learning algorithm.

    Semantic discovery and reuse of business process patterns

    Patterns currently play an important role in modern information systems (IS) development and their use has mainly been restricted to the design and implementation phases of the development lifecycle. Given the increasing significance of business modelling in IS development, patterns have the potential of providing a viable solution for promoting reusability of recurrent generalized models in the very early stages of development. As a statement of research-in-progress this paper focuses on business process patterns and proposes an initial methodological framework for the discovery and reuse of business process patterns within the IS development lifecycle. The framework borrows ideas from the domain engineering literature and proposes the use of semantics to drive both the discovery of patterns as well as their reuse

    Uloga društvenih medija u mjerenju TV gledanosti (The Role of Social Media in Measuring TV Viewership)

    For many decades, traditional broadcast has been the main entertainment focal point in households. Like all media and entertainment industries, television has been altered by the internet and new technologies. The internet has made new forms of participatory communication possible and has increased the amount of interpersonal communication for individuals – audiences and users – providing opportunities to share, create and collaborate together. It offers manifold opportunities to communicate in all directions, as well as the opportunity to transmit and receive simultaneously all kinds of content and formats such as music, films, pictures and texts and enables the user to interact with links. The development of social media is more than a technical innovation: it sustains and influences all forms of social organisations. Besides (high speed) internet itself, wireless connectivity has created a comfortable environment for the usage of different devices. Smartphones, tablets and/or laptops are conquering households and invite (connected) usage while people watch TV; audiences divide their attention between a second and first screen, becoming a user and audience at the same time. It enables participation and social interaction within social media while watching TV. “Actions of the participatory audience appear in the value chain in several phases: when the audience is creating content, when they are editing or reediting the available content and when they are disseminating the content to other audience members” (Noguera Vivo et al., 2014, p. 181). This new participation of TV audiences in social media leads to an integration of TV consumption within the social media context. 
The “people formerly known as the audience are those who were on the receiving end of a media system that ran one way, in a broadcasting pattern, with high entry fees and a few firms competing to speak very loudly while the rest of the population listened in isolation from one another […]” (Rosen, 2006); that audience has transformed into an active audience participating in the creation of (social) media content. The second screens enable virtual communication with friends about programs while watching and sharing what is liked and disliked, and television viewing coupled with audience interaction has gained popularity (Doughty, Rowland and Lawson, 2011). The audience can share, discuss, comment on and vote on certain programs. Broadcasters and other suppliers offer applications accompanying TV consumption and solicit simultaneous usage. Audiences engage with the program and socialise with friends and communities around their favourite content. Television audience researchers have discovered the internet as a source of audience data and are searching for approaches to analyse the online engagement of audiences. The main question of this work is whether new data can be found and used in a systematic manner in addition to traditional television audience research methods. It was discovered that the relationship between television broadcasters and their social audience is the key to this approach. Traditional media such as TV broadcasters are still huge content providers and play a major role in the social media world, where content is shared and creates buzz, and where users also generate content themselves. Broadcasters are challenged to keep the relationship with and the attention of the viewer by building social interaction around the program. This is the prerequisite for the researcher to approach social media analysis in the context of television.

    Multi-Dimensional-Personalization in mobile contexts

    During the dot com era the word “personalisation” was a hot buzzword. With the fall of the dot com companies the topic has lost momentum. As the killer application for UMTS or the mobile internet has yet to be identified, the concept of Multi-Dimensional-Personalisation (MDP) could be a candidate. Using this approach, a recommendation of mobile advertisement or marketing (i.e., recommendations or notifications), online content, as well as offline events, can be offered to the user based on their known interests and current location. Instead of having to request or pull this information, the new service concept would proactively provide the information and services, with the consequence that the right information or service could be offered at the right place, at the right time. The growing availability of “Location-based Services” for mobile phones is a new target for the use of personalisation. “Location-based Services” provide information, for example, about restaurants, hotels or shopping malls with offers that are within a short distance of the user. The lack of acceptance for such services in the past is based on the fact that early implementations required the user to pull the information from the service provider. A more promising approach is to actively push information to the user. This information must be of interest to the user and has to reach the user at the right time and at the right place. This raises new requirements on personalisation which go far beyond present requirements. It will extend beyond personalisation based only on the interests of the user. Besides those interests, the enhanced personalisation has to cover the location and movement patterns, the usage and the past, present and future schedule of the user. This new personalisation paradigm has to protect the user’s privacy, so an approach supporting anonymous recommendations through an extended “Chinese Wall” will be described.

    Opinion Detection, Sentiment Analysis and User Attribute Detection from Online Text Data

    With the growing use of the internet in most parts of the world today, users generate significant amounts of online text on different platforms such as online social networks, product review websites, and travel blogs, to name just a few. The variety of content on these platforms has made them an important resource for researchers to gauge user activity, determine their opinions and analyze their behavior, without having to perform monetarily and temporally expensive surveys. Gaining insights into user behavior enables us to better understand their likes and dislikes, which in turn is helpful for economic purposes such as marketing, advertising and recommendations. Further, owing to the fact that online social networks have recently been instrumental in socio-political revolutions such as the Arab Spring, and for awareness-generation campaigns by MoveOn.org and Avaaz.org, analysis of online data can uncover user preferences. The overarching goal of this Ph.D. thesis is to pose some research questions and propose solutions, mostly pertaining to user opinions and attributes, keeping in mind the large quantities of noise present in online textual data. This thesis illustrates that with the extraction of informative textual features and the use of robust NLP and machine learning techniques, it is possible to perform efficient signal extraction from online text data, and use it to better understand user behavior. The first research problem addressed is that of opinion detection and sentiment analysis of users on a given topic, from their self-generated tweets. The key idea is to select relevant hashtags and n-grams using an l1-regularized logistic regression model for opinion detection. The second research problem deals with temporal opinion detection from tweets, i.e., detecting user opinions on a topic in which the conversation evolves over time. 
For instance, on the widely-discussed topic of Obamacare (the Affordable Care Act in the U.S.), various issues became the focal points of discussion among users over time, as corresponding socio-political events and occurrences took place in real-time. We propose a machine-learning model based on seminal work from the sociological literature that is based on the premise that most opinion changes occur slowly over time. Our model is able to successfully capture opinions over time using publicly available tweets, as well as to uncover the key points of discussion as time progresses. In the third research problem, we utilize distributed representation of words in a method that determines, from user reviews, aspects of products and services that users like and dislike. We harness the contextual similarity between words and effectively build meta-features that capture user sentiment at a granular level. Finally in the fourth research problem, we propose a method to detect the age of users from their publicly available tweets. Using a method based on distributed representation of words and clustering, we are able to achieve high accuracies in age detection, as well as to simultaneously discover topics of conversation in which users of different age groups engage