
    Machine learning with big data to solve real-world problems

    Machine learning algorithms use big data to learn trends and predict future outcomes for businesses. Machine learning can be very effective for interpreting data in industries where understanding consumer patterns can lead to major improvements. Adopting machine learning can be a giant leap for a business; it cannot simply be added as a top layer. It requires redefining workflows, architecture, data collection and storage, analytics, and other modules. The magnitude of the system overhaul should be assessed and clearly communicated to the appropriate stakeholders. The main focus of machine learning is to develop computer programs that can access data and use it to learn. The learning process starts with observations or data, finds patterns in the data, and uses them to make better decisions. The main goal of data analysis with machine learning is to let the computer learn automatically, without human intervention or assistance, and adjust its actions accordingly. Given the many real-world applications of data analysis, this article reviews the basic applications of machine learning, one of the tools of artificial intelligence, with an emphasis on big data analysis. Its purpose is to understand the dimensions, components, applications, and challenges of using machine learning in the real world.
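    The abstract describes learning as starting from observations, finding a pattern, and then predicting. A minimal sketch of that loop, assuming the simplest possible "pattern" (a straight-line trend fitted by ordinary least squares); the function names and the monthly sales figures are hypothetical and are not taken from the article:

```python
from statistics import mean


def fit_trend(xs: list[float], ys: list[float]) -> tuple[float, float]:
    """Learn slope and intercept of y ~ slope * x + intercept by ordinary least squares."""
    x_bar, y_bar = mean(xs), mean(ys)
    # Covariance of x and y divided by the variance of x gives the OLS slope.
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


def predict(model: tuple[float, float], x: float) -> float:
    """Apply the learned pattern to a new, unseen input."""
    slope, intercept = model
    return slope * x + intercept


# Hypothetical monthly sales observations (illustrative data only).
months = [1, 2, 3, 4, 5]
sales = [10.0, 12.0, 14.0, 16.0, 18.0]
model = fit_trend(months, sales)
print(predict(model, 6))  # 20.0
```

    Real big-data systems replace the straight line with far richer models, but the observe, fit, predict structure is the same.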

    Self-disclosure model for classifying & predicting text-based online disclosure

    Social media and social networking sites have evolved into digital billboards for internet users due to their rapid expansion. Because these sites encourage consumers to expose personal information through profiles and posts, increased use of social media has raised privacy concerns. Researchers have made notable efforts to detect self-disclosure using information extraction (IE) techniques. Recent research on machine learning and natural language processing shows that understanding the contextual meaning of words can yield better accuracy than traditional data extraction methods. Since users are often unaware of how much personal information they publish in online forums, there is a need to detect various disclosures in natural language and to let users test the likelihood of disclosure before posting. For this purpose, this work proposes "SD_ELECTRA," a context-specific language model that detects Interest, Personal, Education and Work, Relationship, Personality, Residence, Travel plan, and Hospitality disclosures in social media data. The goal is to create a context-specific language model for a social media platform that performs better than general language models. Moreover, recent advances in transformer models have made it possible to train language models from scratch and achieve higher scores. Experimental results show that SD_ELECTRA outperformed the base model on all considered metrics for the standard text classification method. The results also show that training a language model on a smaller, context-specific pre-training corpus on a single GPU can improve its performance. An illustrative web application lets users test the disclosure possibilities in their social media posts. As a result, by using the suggested model, users can learn about self-disclosure in real time.
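    Detecting disclosure categories is a multi-class text classification task. SD_ELECTRA itself is a transformer language model and is not reproduced here; as a hedged illustration of the task only, the sketch below swaps in a much simpler technique, a multinomial naive Bayes classifier with add-one smoothing. Three category names come from the abstract; the class name `NaiveBayesDisclosure` and all training sentences are hypothetical:

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text: str) -> list[str]:
    """Lowercase bag-of-words tokenizer."""
    return re.findall(r"[a-z']+", text.lower())


class NaiveBayesDisclosure:
    """Multinomial naive Bayes baseline (hypothetical; NOT the SD_ELECTRA model)."""

    def fit(self, texts: list[str], labels: list[str]) -> "NaiveBayesDisclosure":
        self.class_counts = Counter(labels)
        self.word_counts: dict[str, Counter] = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        self.total = {lbl: sum(c.values()) for lbl, c in self.word_counts.items()}
        return self

    def predict(self, text: str) -> str:
        n_docs = sum(self.class_counts.values())
        v = len(self.vocab)
        best_label, best_score = "", -math.inf
        for label, count in self.class_counts.items():
            score = math.log(count / n_docs)  # log prior of the category
            for word in tokenize(text):
                # Add-one smoothing keeps unseen words from zeroing out the score.
                score += math.log((self.word_counts[label][word] + 1) / (self.total[label] + v))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical toy training posts; labels use category names from the abstract.
train_texts = [
    "I just moved to a new apartment downtown",
    "my house is on the corner of our street",
    "flying to Paris next week for vacation",
    "booked a trip to Rome this summer",
    "started my new job as a nurse today",
    "graduated from university with a degree in physics",
]
train_labels = ["Residence", "Residence", "Travel plan", "Travel plan",
                "Education and Work", "Education and Work"]

model = NaiveBayesDisclosure().fit(train_texts, train_labels)
print(model.predict("booked a flight to paris"))  # Travel plan
```

    A bag-of-words model like this ignores word order and context, which is exactly the limitation the abstract says contextual language models address.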

    All liaisons are dangerous when all your friends are known to us

    Online Social Networks (OSNs) are used by millions of users worldwide. Academically speaking, there is little doubt about the usefulness of demographic studies conducted on OSNs; hence, methods to label unknown users from small labeled samples are very useful. From the general public's point of view, however, this can be a serious privacy concern. Both topics are therefore tackled in this paper. First, a new algorithm to perform user profiling in social networks is described, and its performance is reported and discussed. Second, the experiments (conducted on information usually considered sensitive) reveal that merely publicizing one's contacts puts privacy at risk; measures to minimize privacy leaks due to social graph data mining are therefore outlined.
    Comment: 10 pages, 5 tables
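    The abstract does not describe the paper's profiling algorithm, so the following is only a generic illustration of why a public contact list leaks information: a hedged sketch of majority-vote label propagation, assuming users tend to share attributes with their friends. The graph, user names, and `club_A`/`club_B` attribute values are all hypothetical:

```python
from collections import Counter


def infer_labels(graph: dict[str, set[str]], known: dict[str, str],
                 iterations: int = 10) -> dict[str, str]:
    """Guess hidden attributes from friends' labels by majority vote.

    graph: each user mapped to their set of contacts (undirected).
    known: a small labeled sample of users and their attribute values.
    """
    labels = dict(known)
    for _ in range(iterations):
        updated = dict(labels)
        for user, friends in graph.items():
            if user in known:
                continue  # never overwrite the labeled seed users
            votes = Counter(labels[f] for f in friends if f in labels)
            if votes:
                updated[user] = votes.most_common(1)[0][0]
        if updated == labels:
            break  # no label changed: converged
        labels = updated
    return labels


# Hypothetical friendship graph; only three users disclose the attribute.
graph = {
    "alice": {"bob", "carol"},
    "bob": {"alice", "carol"},
    "carol": {"alice", "bob", "dave"},
    "dave": {"carol", "erin", "frank"},
    "erin": {"dave", "frank"},
    "frank": {"dave", "erin"},
}
known = {"alice": "club_A", "bob": "club_A", "erin": "club_B"}
print(infer_labels(graph, known))
```

    Here carol, dave, and frank never disclose the attribute, yet each receives a guess purely from who their contacts are.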

    A Review of Machine Learning-based Security in Cloud Computing

    Cloud Computing (CC) is revolutionizing the way IT resources are delivered to users, allowing them to access and manage their systems with increased cost-effectiveness and simplified infrastructure. However, the growth of CC brings a host of security risks, including threats to availability, integrity, and confidentiality. To address these challenges, Cloud Service Providers (CSPs) increasingly use Machine Learning (ML) to reduce the need for human intervention in identifying and resolving security issues. With its ability to analyze vast amounts of data and make high-accuracy predictions, ML can transform the way CSPs approach security. In this paper, we explore some of the most recent research on ML-based security in Cloud Computing. We examine the features and effectiveness of a range of ML algorithms, highlighting their unique strengths and potential limitations. Our goal is to provide a comprehensive overview of the current state of ML in cloud security and to shed light on the possibilities that this emerging field has to offer.
    Comment: This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

    How Machines Learn: Where Do Companies Get Data for Machine Learning and What Licenses Do They Need?

    Machine learning services ingest customer data in order to provide refined, customized services. Machine learning algorithms are increasingly prominent in multiple sectors of the software-as-a-service industry, including online advertising, health diagnostics, and travel. However, very little has been written on the rights a company using machine learning needs to obtain in order to use customer data to improve its own products or services. Machine learning encompasses multiple types of data use and analysis, including (a) supervised machine learning algorithms, which take specific data provided in a tagged and classified format to deliver a specific, predictable output; and (b) unsupervised machine learning algorithms, which process untagged data to look for patterns and correlations without a specified output. This Article introduces the reader to the types of data use involved in various machine learning models, the level of data retention normally required for each model, and the risks of using personal information or re-identifiable data in connection with machine learning. The Article also discusses the type of license a commercial provider and consumer would need to enter into for various types of machine learning software. Finally, it proposes best practices for ensuring that adequate rights are obtained through legal agreements so that machines may self-improve and innovate.
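    The distinction between (a) supervised learning on tagged data and (b) unsupervised learning on untagged data can be made concrete with two deliberately simple algorithms chosen for illustration (they are not drawn from the Article): a nearest-centroid classifier, which needs a label for every training point, and 1-D k-means, which groups the same points without any labels. The data values and labels are hypothetical:

```python
from statistics import mean


def nearest_centroid_fit(xs: list[float], labels: list[str]) -> dict[str, float]:
    """Supervised: tagged data -> one centroid per known class."""
    return {c: mean(x for x, y in zip(xs, labels) if y == c) for c in sorted(set(labels))}


def nearest_centroid_predict(centroids: dict[str, float], x: float) -> str:
    """Assign a new point to the class whose centroid is closest."""
    return min(centroids, key=lambda c: abs(x - centroids[c]))


def kmeans_1d(xs: list[float], k: int, iterations: int = 20) -> list[float]:
    """Unsupervised: untagged data -> k group centers found from the data alone."""
    centers = sorted(xs)[:: max(1, len(xs) // k)][:k]  # simple deterministic initialization
    for _ in range(iterations):
        clusters: list[list[float]] = [[] for _ in centers]
        for x in xs:
            nearest = min(range(len(centers)), key=lambda i: abs(x - centers[i]))
            clusters[nearest].append(x)
        # Move each center to the mean of its cluster (keep it if the cluster is empty).
        new_centers = [mean(c) if c else centers[i] for i, c in enumerate(clusters)]
        if new_centers == centers:
            break  # assignments stable: converged
        centers = new_centers
    return sorted(centers)


# Hypothetical measurements; the labels are only available to the supervised model.
xs = [1.0, 1.2, 0.8, 5.0, 5.2, 4.8]
labels = ["low", "low", "low", "high", "high", "high"]

centroids = nearest_centroid_fit(xs, labels)
print(nearest_centroid_predict(centroids, 4.5))  # high
print(kmeans_1d(xs, k=2))  # two centers, near 1.0 and 5.0
```

    Both models recover the same two groups, but only the supervised one can name them; the unsupervised one returns anonymous clusters, which mirrors the "specified output" versus "no specified output" contrast in the abstract.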