
    Training High Quality Spam-detection Models Using Weak Labels

    To be effective at detecting spam in online content-sharing networks, detection techniques need good precision, high recall, and the ability to adapt to new types of spam. A bottleneck in developing such machine learning techniques is the lack of high-quality labeled training data. Human labeling to obtain high-quality labels is expensive and does not scale. Current approaches such as unsupervised learning or semi-supervised learning can only produce low-quality labels. Generally, the present disclosure is directed to a weak supervision approach to train a machine learning model to detect spam content items. Weak labels are generated for content items in the training data using various techniques, such as rules that encode domain knowledge and/or anomaly detection techniques such as unsupervised machine learning (clustering) or semi-supervised machine learning. The accuracy of each technique is estimated from the observed agreements and disagreements among the weak labels. The weak labels are combined into a single value (e.g., per content item) that is used as a probabilistic training label to train a machine learning model using noise-aware supervised learning. In training, a penalty is applied for deviation from the probabilistic label such that the penalty is higher for a label associated with a higher confidence and lower for a label associated with a lower confidence. The model thus trained can be used to detect spam content.
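
    The pipeline described in this abstract can be illustrated with a minimal sketch. It is not the disclosure's implementation: the abstract does not specify how accuracies are estimated, how labels are combined, or the form of the penalty. The sketch assumes agreement with a majority vote as the accuracy estimate, a log-odds weighted vote as the combiner, and a confidence-weighted squared deviation as the noise-aware penalty. All function names are hypothetical.

```python
import math

ABSTAIN = None  # a weak-label source may decline to vote on an item


def majority_vote(votes):
    """Majority of non-abstaining votes (1 = spam, 0 = not spam); None on a tie or no votes."""
    cast = [v for v in votes if v is not ABSTAIN]
    ones = sum(cast)
    zeros = len(cast) - ones
    if ones == zeros:
        return None
    return 1 if ones > zeros else 0


def estimate_accuracies(vote_matrix):
    """Assumed estimator: each source's smoothed agreement rate with the majority vote."""
    n_sources = len(vote_matrix[0])
    agree = [0] * n_sources
    total = [0] * n_sources
    for votes in vote_matrix:
        mv = majority_vote(votes)
        if mv is None:
            continue  # ties carry no agreement signal
        for j, v in enumerate(votes):
            if v is ABSTAIN:
                continue
            total[j] += 1
            agree[j] += int(v == mv)
    # Laplace smoothing keeps every estimate strictly inside (0, 1).
    return [(agree[j] + 1) / (total[j] + 2) for j in range(n_sources)]


def probabilistic_label(votes, accuracies):
    """Assumed combiner: log-odds weighted vote mapped to P(spam) with a sigmoid."""
    score = 0.0
    for v, acc in zip(votes, accuracies):
        if v is ABSTAIN:
            continue
        weight = math.log(acc / (1.0 - acc))
        score += weight if v == 1 else -weight
    return 1.0 / (1.0 + math.exp(-score))


def noise_aware_loss(prediction, soft_label):
    """Assumed penalty: squared deviation scaled by label confidence |2p - 1|."""
    confidence = abs(2.0 * soft_label - 1.0)
    return confidence * (prediction - soft_label) ** 2


# Rows are content items, columns are three hypothetical weak-label sources.
votes = [
    [1, 1, ABSTAIN],
    [1, 1, 0],
    [0, 0, 0],
    [1, ABSTAIN, 0],
    [0, 0, 1],
]
accuracies = estimate_accuracies(votes)
soft_labels = [probabilistic_label(row, accuracies) for row in votes]
```

    With this choice of penalty, an item whose combined label is exactly 0.5 contributes nothing to the loss, while the same prediction error costs more on an item whose sources agree strongly. A real system would plug the per-item loss into whatever model and optimizer it already uses.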

    Discovering Employment Listings from Imagery

    Generally, the present disclosure is directed to discovering employment listings from imagery. In particular, in some implementations, the systems and methods of the present disclosure can include or otherwise leverage one or more machine-learned models to identify employment listings based on image data.

    Removing Breathing Artifacts from Audio

    Generally, the present disclosure is directed to removing breathing artifacts from audio. In particular, in some implementations, the systems and methods of the present disclosure can include or otherwise leverage one or more machine-learned models to generate a clean audio sample based on audio data from the audio sample.

    Predicting Delivery Time of Components in a Supply Chain

    Generally, the present disclosure is directed to predicting a delivery time of components in a supply chain. In particular, in some implementations, the systems and methods of the present disclosure can include or otherwise leverage one or more machine-learned models to predict a delivery time of a component based on enterprise data relating to the component.

    Applying machine learning techniques to determine product risks

    Generally, the present disclosure is directed to techniques to automatically determine risks associated with a product. In some implementations, the techniques of the present disclosure can include or otherwise leverage one or more machine-learned models to determine whether the release and continued sale of a product carries legal, privacy, and/or business vulnerabilities, based on identifying sensitive keywords in product-related documentation and product areas. This disclosure applies machine learning techniques to automate product audits and improve product review quality. Machine learning techniques are applied to proactively identify risks associated with a product. For example, for a software product or service, the techniques are applied to determine the risk of privacy failures or incidents in which user privacy may be violated. Application of machine learning as described herein can automate product review for risks. The techniques can help reduce the time employees spend reviewing products. Further, the techniques can substitute for or augment manual product review. Deploying automated product review techniques also reduces reliance on the limited number of subject matter experts who typically conduct product review. Product stakeholders can learn from the risks and vulnerabilities identified in the automated review. Applying the techniques described herein can help accelerate product launch and reduce risks associated with the product. The techniques can be applied for product review by companies and other entities, e.g., when the product is subject to regulatory review and/or public scrutiny.

    Predictive Cryptocurrency Mining and Staking

    Generally, the present disclosure is directed to determining the validity of a chain within a blockchain system. In particular, in some implementations, the systems and methods of the present disclosure can include or otherwise leverage one or more machine-learned models to predict the likelihood that a block within a blockchain system will be verified based on characteristics of the block.