Statistical learning methods for mining marketing and biological data
The value of data is now broadly recognized: more and more decisions are made on the basis of data and analysis rather than solely on experience and intuition. With the rapid development of networking, data storage, and data collection capacity, data volumes have grown dramatically across industry, science, and engineering, bringing both great opportunities and great challenges. To take advantage of this flood of data, new computational methods are needed to process, analyze, and understand these datasets.
This dissertation focuses on the development of statistical learning methods for online advertising and bioinformatics that model real-world data with temporal or spatial changes. First, a collaborative online change-point detection method is proposed to identify change-points in sparse time series. It leverages signals from auxiliary time series, such as engagement metrics, to compensate for sparse revenue data and improve detection efficiency and accuracy through smart collaboration. Second, a task-specific multi-task learning algorithm is developed to model ever-changing video viewing behaviors. With ℓ1-regularized task-specific features and jointly estimated shared features, it allows different models to seek common ground while preserving their differences. Third, an empirical Bayes method is proposed to identify 3' and 5' alternative splicing in RNA-seq data. It formulates alternative 3' and 5' splice site selection as a change-point problem and provides, for the first time, a systematic framework for pooling information across genes and integrating additional information when available, in particular the useful junction read information, in order to obtain better performance.
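The abstract does not spell out the detection machinery, but the basic task of flagging a change-point in a single stream can be illustrated with a standard one-sided CUSUM statistic. This is a generic sketch, not the collaborative multi-series method proposed in the dissertation, and all parameter values are illustrative:

```python
def cusum_changepoint(xs, target_mean=0.0, drift=0.5, threshold=5.0):
    """Detect the first upward mean shift in a stream via one-sided CUSUM.

    Returns the index at which the cumulative statistic first exceeds
    `threshold`, or None if no change is flagged. Parameters are
    illustrative, not taken from the dissertation.
    """
    s = 0.0
    for i, x in enumerate(xs):
        # Accumulate evidence of an upward shift, allowing `drift` slack
        # per observation so that noise alone does not trip the alarm.
        s = max(0.0, s + (x - target_mean) - drift)
        if s > threshold:
            return i
    return None

# A sparse series whose mean jumps from ~0 to ~3 at index 10.
series = [0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 3, 3, 4, 3, 3, 3]
alarm_at = cusum_changepoint(series)
```

The alarm fires a few samples after the true change, which is the usual detection-delay trade-off; the dissertation's contribution is to shrink that delay on sparse streams by borrowing evidence from correlated auxiliary streams.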
Machine learning for targeted display advertising: Transfer learning in action
This paper presents a detailed discussion of problem formulation and
data representation issues in the design, deployment, and operation of a
massive-scale machine learning system for targeted display advertising.
Notably, the machine learning system itself is deployed and has been in
continual use for years, for thousands of advertising campaigns (in
contrast to simply having the models from the system be deployed). In
this application, acquiring sufficient data for training from the ideal
sampling distribution is prohibitively expensive. Instead, data are
drawn from surrogate domains and learning tasks, and then transferred
to the target task. We present the design of this multistage transfer
learning system, highlighting the problem formulation aspects. We then
present a detailed experimental evaluation, showing that the different
transfer stages indeed each add value. We next present production
results across a variety of advertising clients from a variety of
industries, illustrating the performance of the system in use. We close
the paper with a collection of lessons learned from the work over half a
decade on this complex, deployed, and broadly used machine learning system.
Statistics Working Papers Series
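The paper's multistage production pipeline is not reproduced here, but the core idea of transfer learning — learn from a plentiful surrogate task, then adapt to a scarce, related target task — can be sketched with a warm-started logistic regression. The data-generating setup below is a hypothetical stand-in, not the authors' system:

```python
import random

def train_logreg(data, w=None, lr=0.5, epochs=200):
    """Plain stochastic-gradient logistic regression on ((x1, x2), y)
    pairs; passing `w` warm-starts the weights (the transfer step)."""
    if w is None:
        w = [0.0, 0.0, 0.0]  # bias, w1, w2
    for _ in range(epochs):
        for x, y in data:
            z = w[0] + w[1] * x[0] + w[2] * x[1]
            p = 1.0 / (1.0 + 2.718281828459045 ** (-z)) if False else None
            # sigmoid via math-free form kept simple:
            import math
            p = 1.0 / (1.0 + math.exp(-z))
            g = p - y  # gradient of the log loss w.r.t. z
            w[0] -= lr * g
            w[1] -= lr * g * x[0]
            w[2] -= lr * g * x[1]
    return w

def make_data(n, boundary, rng):
    """Label points in the unit square by whether x1 + x2 > boundary."""
    data = []
    for _ in range(n):
        x1, x2 = rng.random(), rng.random()
        data.append(((x1, x2), int(x1 + x2 > boundary)))
    return data

rng = random.Random(0)
source = make_data(500, 1.0, rng)  # surrogate domain: plentiful
target = make_data(10, 0.9, rng)   # target task: scarce but related

w_src = train_logreg(source)                 # stage 1: learn on surrogate data
w_tgt = train_logreg(target, w=list(w_src))  # stage 2: fine-tune on the target
```

The fine-tuned `w_tgt` starts from the surrogate solution rather than from zero, which is the essential mechanism the paper scales up across many surrogate sampling distributions and thousands of campaigns.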
ASSESSING PHOTOGRAMMETRY ARTIFICIAL INTELLIGENCE IN MONUMENTAL BUILDINGS' CRACK DIGITAL DETECTION
Natural and human-made disasters have significant impacts on monumental buildings, threatening them with deterioration; without rapid consolidation, such traumatic events endanger the existence of precious sites. In this context, Beirut's enormous explosion of 4 August 2020 damaged an estimated 640 historical monuments, and many volunteers spent more than a year assessing damage to prevent the more crucial risk of demolition. This research aims to combine photogrammetry, an Artificial Intelligence Model (AIM), and architectural coding to optimize the process, providing better coverage and a more scientific treatment of data specific to crack disorders in order to build a comprehensive model-based consolidation technique. Despite current technological improvements, restoring an existing monument remains a challenging and lengthy process in which reconstructing the actual site situation consumes enormous time, from assessing the damage to establishing the restoration, relying on human resources and manual drawings.
Q-PPG: Energy-Efficient PPG-Based Heart Rate Monitoring on Wearable Devices
Heart Rate (HR) monitoring is increasingly performed in wrist-worn devices using low-cost photoplethysmography (PPG) sensors. However, Motion Artifacts (MAs) caused by movements of the subject's arm degrade the performance of PPG-based HR tracking. This is typically addressed by coupling the PPG signal with acceleration measurements from an inertial sensor. Unfortunately, most standard approaches of this kind rely on hand-tuned parameters, which impair their generalization capabilities and their applicability to real data in the field. In contrast, methods based on deep learning, despite their better generalization, are considered too complex to deploy on wearable devices. In this work, we tackle these limitations by proposing a design space exploration methodology that automatically generates a rich family of deep Temporal Convolutional Networks (TCNs) for HR monitoring, all derived from a single "seed" model. Our flow involves a cascade of two Neural Architecture Search (NAS) tools and a hardware-friendly quantizer, whose combination yields both highly accurate and extremely lightweight models. When tested on the PPG-Dalia dataset, our most accurate model sets a new state of the art in Mean Absolute Error (MAE). Furthermore, we deploy our TCNs on an embedded platform featuring an STM32WB55 microcontroller, demonstrating their suitability for real-time execution. Our most accurate quantized network achieves an MAE of 4.41 Beats Per Minute (BPM), with an energy consumption of 47.65 mJ and a memory footprint of 412 kB. At the same time, the smallest network that obtains an MAE below 8 BPM, among those generated by our flow, has a memory footprint of only 1.9 kB and consumes just 1.79 mJ per inference.
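The paper's hardware-friendly quantizer is one stage of a larger NAS flow whose details are not given in the abstract. As a generic illustration of how weight quantization shrinks a model's memory footprint (32-bit floats down to 8-bit integers plus one scale), a symmetric per-tensor int8 scheme can be sketched; this is an assumption-laden textbook scheme, not the paper's quantizer:

```python
def quantize_int8(weights):
    """Symmetric per-tensor int8 quantization: w ~= scale * q, q in [-127, 127].

    A generic post-training scheme for illustration only; the paper's
    quantizer is integrated into its NAS flow and may differ.
    """
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127.0 if max_abs else 1.0
    # Each weight maps to the nearest representable integer step.
    q = [round(w / scale) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Recover approximate float weights from the int8 codes."""
    return [scale * v for v in q]

w = [0.12, -0.5, 0.33, 1.0, -1.27]          # toy float32-style weights
q, s = quantize_int8(w)
w_hat = dequantize(q, s)                     # reconstruction error <= s / 2
```

Storing `q` (1 byte per weight) instead of 4-byte floats gives the roughly 4x footprint reduction that, combined with architecture search, lets models of this kind fit in a few kB on a microcontroller.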
Location Data: Perils, Profits, Promise
Most of the modern online economy is based on websites offering free services and content in exchange for advertising access and user data. Web companies collect vast troves of data about their users in order to better target their advertisements. An important subset of this harvested data is the locations visited by users. Location data is valuable because it is a "real world" signal compared to online behaviors: a visit to a store is a stronger signal than a visit to a website, and location data can reveal user attributes that are interesting to advertisers. The collection of this data, however, raises many concerns. Location data can reveal important attributes that users may not wish to disclose: ZIP codes can reveal income and race, visits to places of worship may allow discrimination, and insurers may want to know about trips to hospitals. The risks exist at both an individual level, with location tied to physical safety, and at a collective level, with inference about group membership a necessary step towards discrimination. In this thesis, I examine issues of privacy and fairness in the use of location data. In the first portion, I empirically demonstrate new attacks on the anonymity and privacy of users, including a theoretical basis for user identification. In the second portion, I propose and analyze new solutions for dealing with privacy, anonymity, and fairness in the collection and use of location data. In contrast to previous work, which presents privacy in abstract ways or ignores the power of data aggregators, the work presented here focuses on concretely informing users and incorporates the economic incentives driving privacy and fairness concerns.
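One standard way to quantify the re-identification risk discussed here is k-anonymity over quasi-identifiers such as ZIP code and visit time: a record that is unique on those columns can be singled out. The check below is a minimal illustration on hypothetical toy data, not a method taken from the thesis:

```python
from collections import Counter

def k_anonymity(records, quasi_ids):
    """Smallest equivalence-class size over the quasi-identifier columns.

    A dataset is k-anonymous if every combination of quasi-identifier
    values is shared by at least k records; k = 1 means at least one
    record is uniquely identifiable from those columns alone.
    """
    classes = Counter(tuple(r[c] for c in quasi_ids) for r in records)
    return min(classes.values())

# Hypothetical visit records; "place" is the sensitive attribute.
visits = [
    {"zip": "10001", "hour": 9,  "place": "hospital"},
    {"zip": "10001", "hour": 9,  "place": "cafe"},
    {"zip": "10001", "hour": 22, "place": "church"},  # unique on (zip, hour)
    {"zip": "10002", "hour": 9,  "place": "store"},
]
k = k_anonymity(visits, ["zip", "hour"])
```

Here `k == 1`: the 22:00 visitor in ZIP 10001 is uniquely identifiable from coarse location and time alone, which is exactly the kind of individual-level risk the thesis's attacks exploit.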
Click Fraud Detection in Online and In-app Advertisements: A Learning Based Approach
Click fraud is the fraudulent act of clicking on pay-per-click advertisements to increase a site's revenue, to drain revenue from the advertiser, or to inflate the popularity of content on social media platforms. In-app advertisements on mobile platforms are among the most common targets for click fraud, which makes companies hesitant to advertise their products. Fraudulent clicks are supposed to be caught by ad providers as part of their service to advertisers, which is commonly done using machine learning methods. However: (1) there is a lack of research in the current literature addressing and evaluating the different techniques of click fraud detection and prevention; (2) threat models composed of active learning systems (smart attackers) can mislead the training process of the fraud detection model by polluting the training data; (3) current deep learning models have significant computational overhead; (4) training data is often imbalanced, and balancing it still yields noisy data that can train the classifier incorrectly; and (5) datasets with high dimensionality cause increased computational overhead and decreased classifier correctness -- while existing feature selection techniques address this issue, they have their own performance limitations. By extending the state-of-the-art techniques in the field of machine learning, this dissertation provides the following solutions: (i) To address (1) and (2), we propose a hybrid deep-learning-based model consisting of an artificial neural network, an auto-encoder, and a semi-supervised generative adversarial network. (ii) As a solution for (3), we present Cascaded Forest and Extreme Gradient Boosting with less hyperparameter tuning. (iii) To overcome (4), we propose a row-wise data reduction method, KSMOTE, which filters out noisy samples in both the raw data and the synthetically generated samples. (iv) For (5), we propose column-reduction methods such as multi-time-scale time series analysis for fraud forecasting using binary-labeled imbalanced datasets, and hybrid filter-wrapper feature selection approaches.
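KSMOTE itself is the dissertation's contribution and its filtering rule is not described in the abstract, but the baseline it extends, SMOTE-style interpolation between minority-class neighbours, can be sketched as follows (generic SMOTE on hypothetical toy data, not the KSMOTE variant):

```python
import random

def smote(minority, n_new, k=2, seed=0):
    """Generate `n_new` synthetic minority samples by interpolating each
    chosen point toward one of its k nearest minority-class neighbours.

    Generic SMOTE for illustration; KSMOTE additionally filters noisy
    samples, a step not reproduced here.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        x = rng.choice(minority)
        # k nearest neighbours of x within the minority class (excluding x).
        nbrs = sorted((p for p in minority if p is not x),
                      key=lambda p: sum((a - b) ** 2 for a, b in zip(x, p)))[:k]
        nb = rng.choice(nbrs)
        t = rng.random()
        # New point lies on the segment between x and its neighbour.
        synthetic.append(tuple(a + t * (b - a) for a, b in zip(x, nb)))
    return synthetic

# Toy minority class: three 2-D fraud-click feature vectors.
minority = [(1.0, 1.0), (1.2, 0.9), (0.9, 1.1)]
new_pts = smote(minority, n_new=5)
```

Because each synthetic point is a convex combination of two real minority points, oversampling near a noisy or mislabeled sample propagates that noise, which is the failure mode the dissertation's row-wise filtering is designed to suppress.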