2,571 research outputs found

Training High Quality Spam-detection Models Using Weak Labels

Authors: Luo Chong, Shao Haidong, Xia Cassandra
Publication venue: Technical Disclosure Commons
Publication date: 29/10/2020

To detect spam effectively in online content sharing networks, detection techniques need good precision, high recall, and the ability to adapt to new types of spam. A bottleneck in developing such machine learning techniques is the lack of high-quality labeled training data. Human labeling produces high-quality labels but is expensive and does not scale. Current alternatives, such as unsupervised or semi-supervised learning, can only produce low-quality labels. Generally, the present disclosure is directed to a weak supervision approach to train a machine learning model to detect spam content items. Weak labels are generated for content items in the training data using various techniques, such as rules that encode domain knowledge and/or anomaly detection techniques such as unsupervised machine learning (clustering) or semi-supervised machine learning. The accuracy of each technique is estimated from the observed agreements and disagreements among the weak labels. The weak labels are combined into a single value (e.g., per content item) that serves as a probabilistic training label for noise-aware supervised learning. During training, a penalty is applied for deviation from the probabilistic label; the penalty is higher for a label associated with higher confidence and lower for a label associated with lower confidence. The model thus trained can be used to detect spam content
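The pipeline described in this abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration, not the method from the disclosure: the vote encoding (+1 spam, -1 not spam, 0 abstain), the use of agreement with the majority vote as an accuracy estimate, the log-odds weighting used to combine votes, and the confidence-weighted cross-entropy used as the noise-aware penalty. The function names are hypothetical.

```python
import numpy as np

# Assumed vote encoding for each weak labeling source (hypothetical).
SPAM, NOT_SPAM, ABSTAIN = 1, -1, 0

def estimate_accuracies(votes):
    """Estimate each source's accuracy as its agreement rate with the
    majority vote, a simple stand-in for agreement-based estimation."""
    majority = np.sign(votes.sum(axis=1))  # 0 where the vote is tied
    accs = []
    for j in range(votes.shape[1]):
        mask = (votes[:, j] != ABSTAIN) & (majority != 0)
        agree = (votes[mask, j] == majority[mask]).mean() if mask.any() else 0.5
        accs.append(np.clip(agree, 0.05, 0.95))  # keep log-odds finite
    return np.array(accs)

def probabilistic_labels(votes, accs):
    """Combine weak labels into one P(spam) per item by weighting each
    vote with the log-odds of its source's estimated accuracy."""
    weights = np.log(accs / (1.0 - accs))
    return 1.0 / (1.0 + np.exp(-(votes @ weights)))

def noise_aware_loss(pred, soft_label):
    """Cross-entropy against the soft label, scaled by label confidence
    so that confident labels penalize deviation more."""
    eps = 1e-7
    pred = np.clip(pred, eps, 1.0 - eps)
    ce = -(soft_label * np.log(pred) + (1.0 - soft_label) * np.log(1.0 - pred))
    confidence = np.abs(2.0 * soft_label - 1.0)  # 0 at p=0.5, 1 at p in {0, 1}
    return confidence * ce

# Rows are content items, columns are three hypothetical weak sources.
votes = np.array([
    [SPAM,     SPAM,     SPAM],
    [SPAM,     SPAM,     ABSTAIN],
    [NOT_SPAM, NOT_SPAM, SPAM],
    [NOT_SPAM, NOT_SPAM, NOT_SPAM],
])
accs = estimate_accuracies(votes)
labels = probabilistic_labels(votes, accs)

# The same wrong prediction (0.1) against a confident and an uncertain label.
losses = noise_aware_loss(np.array([0.1, 0.1]), np.array([0.99, 0.6]))
```

In this sketch the third source disagrees with the majority on one item, so it receives a lower estimated accuracy and a smaller weight when votes are combined. The resulting `labels` could then serve as soft targets for any supervised classifier trained with `noise_aware_loss`.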

Technical Disclosure Commons

Discovering Employment Listings from Imagery

Author: Armstrong Charles
Publication venue: Technical Disclosure Commons
Publication date: 07/12/2017

Generally, the present disclosure is directed to discovering employment listings from imagery. In particular, in some implementations, the systems and methods of the present disclosure can include or otherwise leverage one or more machine-learned models to identify employment listings based on image data

Removing Breathing Artifacts from Audio

Author: N/A
Publication venue: Technical Disclosure Commons
Publication date: 07/12/2017

Generally, the present disclosure is directed to removing breathing artifacts from audio. In particular, in some implementations, the systems and methods of the present disclosure can include or otherwise leverage one or more machine-learned models to generate a clean audio sample based on audio data from the audio sample

Predicting Delivery Time of Components in a Supply Chain

Authors: Raman Venki, Reddy Sunil
Publication venue: Technical Disclosure Commons
Publication date: 07/12/2017

Generally, the present disclosure is directed to predicting a delivery time of components in a supply chain. In particular, in some implementations, the systems and methods of the present disclosure can include or otherwise leverage one or more machine-learned models to predict a delivery time of a component based on enterprise data relating to the component

Applying machine learning techniques to determine product risks

Author: N/A
Publication venue: Technical Disclosure Commons
Publication date: 28/08/2017

Generally, the present disclosure is directed to techniques to automatically determine risks associated with a product. In some implementations, the techniques of the present disclosure can include or otherwise leverage one or more machine-learned models to determine whether the release and continued sale of a product poses legal, privacy, and/or business vulnerabilities, based on identifying sensitive keywords in product-related documentation and product areas. This disclosure applies machine learning techniques to automate product audits and improve product review quality. Machine learning techniques are applied to proactively identify risks associated with a product. For a software product or service, for example, the techniques are applied to determine the risk of privacy failures or incidents in which user privacy may be violated. Application of machine learning as described herein can automate product review for risks. The techniques can reduce the time employees spend reviewing products, and can substitute for or augment manual product review. Deploying automated product review techniques also reduces reliance on the limited number of subject matter experts who typically conduct product review. Product stakeholders can learn from the risks and vulnerabilities identified in the automated review. Applying the techniques described herein can help accelerate product launch and reduce risks associated with the product. The techniques can be applied for product review by companies and other entities, e.g., when the product is subject to regulatory review and/or public scrutiny

Predictive Cryptocurrency Mining and Staking

Author: Price Thomas
Publication venue: Technical Disclosure Commons
Publication date: 07/12/2017

Generally, the present disclosure is directed to determining the validity of a chain within a blockchain system. In particular, in some implementations, the systems and methods of the present disclosure can include or otherwise leverage one or more machine-learned models to predict the likelihood that a block within a blockchain system will be verified based on characteristics of the block