Real Time Web Search Framework for Performing Efficient Retrieval of Data
With the rapidly growing amount of information on the internet, real-time systems are one of the key strategies for coping with information overload and helping users find highly relevant information. Real-time events and domain-specific information are important knowledge-base references on the Web that are frequently accessed by millions of users. Real-time systems are vital products, and to be reliable a technique must resolve contextual challenges, e.g. short data life cycles, heterogeneous user interests, strict time constraints, and context-dependent article relevance. Since real-time data have only a short time to live, real-time models have to be continuously adapted so that they always reflect up-to-date data. The focus of this manuscript is the design of a real-time web search approach that aggregates several web search algorithms at query time to tune search results for relevance. We learn a context-aware delegation algorithm that chooses the best real-time algorithm for each query request. The evaluation showed that the proposed approach outperforms the traditional models, as it allows us to adapt to the specific properties of the considered real-time resources. In the experiments, we found that it is highly relevant for most recently searched queries, consistent in its performance, and resilient to the drawbacks faced by other algorithms.
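The abstract does not say how the delegation algorithm is learned. As a hypothetical stand-in only, the idea of routing each query to whichever search algorithm has worked best for its context can be sketched as a per-context greedy selector. The class name `ContextDelegator`, the context labels, and the use of a numeric relevance reward (e.g. clicks) are all assumptions, not the paper's method.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Tuple

# A search algorithm maps a query string to a ranked list of result IDs.
SearchFn = Callable[[str], List[str]]


class ContextDelegator:
    """Hypothetical per-context delegator (not the paper's algorithm).

    For each query context it tries every registered algorithm once, then
    routes queries to the algorithm with the highest mean feedback so far.
    """

    def __init__(self, algorithms: Dict[str, SearchFn]) -> None:
        self.algorithms = algorithms
        # (context, algorithm name) -> [total reward, number of feedback events]
        self.stats: Dict[Tuple[str, str], List[float]] = defaultdict(lambda: [0.0, 0.0])

    def choose(self, context: str) -> str:
        # Explore: any algorithm with no feedback in this context is tried first.
        for name in self.algorithms:
            if self.stats[(context, name)][1] == 0:
                return name
        # Exploit: pick the best mean reward (ties go to the first registered).
        return max(
            self.algorithms,
            key=lambda n: self.stats[(context, n)][0] / self.stats[(context, n)][1],
        )

    def search(self, context: str, query: str) -> Tuple[str, List[str]]:
        # Return which algorithm served the query, plus its results.
        name = self.choose(context)
        return name, self.algorithms[name](query)

    def feedback(self, context: str, name: str, reward: float) -> None:
        # Record a relevance signal for the algorithm that served the query.
        entry = self.stats[(context, name)]
        entry[0] += reward
        entry[1] += 1
```

This greedy rule never re-explores once every algorithm has been tried; given the short data life cycles the abstract stresses, a real system would likely need decaying statistics or ongoing exploration.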
FLINT: A Platform for Federated Learning Integration
Cross-device federated learning (FL) has been well-studied from algorithmic,
system scalability, and training speed perspectives. Nonetheless, moving from
centralized training to cross-device FL for millions or billions of devices
presents many risks, including performance loss, developer inertia, poor user
experience, and unexpected application failures. In addition, the corresponding
infrastructure, development costs, and return on investment are difficult to
estimate. In this paper, we present a device-cloud collaborative FL platform
that integrates with an existing machine learning platform, providing tools to
measure real-world constraints, assess infrastructure capabilities, evaluate
model training performance, and estimate system resource requirements to
responsibly bring FL into production. We also present a decision workflow that
leverages the FL-integrated platform to comprehensively evaluate the trade-offs
of cross-device FL and share our empirical evaluations of business-critical
machine learning applications that impact hundreds of millions of users. Comment: Preprint for MLSys 202
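FLINT is a platform rather than a training algorithm, and the abstract does not describe its aggregation step. For readers new to cross-device FL, the server-side step that most such systems perform is federated averaging (FedAvg): client updates are averaged, weighted by local data size. The following is a minimal plain-Python sketch of that standard step, not FLINT's code; the function name `fed_avg` is illustrative.

```python
from typing import List, Sequence


def fed_avg(client_params: Sequence[Sequence[float]],
            num_examples: Sequence[int]) -> List[float]:
    """FedAvg-style aggregation: average client parameter vectors,
    weighting each client by its number of local training examples."""
    if not client_params or len(client_params) != len(num_examples):
        raise ValueError("need one example count per client, and at least one client")
    total = sum(num_examples)
    if total <= 0:
        raise ValueError("total example count must be positive")
    dim = len(client_params[0])
    global_params = [0.0] * dim
    for params, n in zip(client_params, num_examples):
        if len(params) != dim:
            raise ValueError("all clients must send parameter vectors of equal length")
        weight = n / total  # this client's share of all training examples
        for i, value in enumerate(params):
            global_params[i] += weight * value
    return global_params


print(fed_avg([[1.0, 2.0], [3.0, 4.0]], [1, 3]))  # → [2.5, 3.5]
```

Weighting by example count keeps clients with little data from pulling the global model as hard as data-rich ones.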
Building K-Anonymous User Cohorts with Consecutive Consistent Weighted Sampling (CCWS)
To retrieve personalized campaigns and creatives while protecting user
privacy, digital advertising is shifting from member-based identity to
cohort-based identity. Under such an identity regime, an accurate and efficient
cohort building algorithm is desired to group users with similar
characteristics. In this paper, we propose a scalable K-anonymous cohort
building algorithm called consecutive consistent weighted sampling
(CCWS). The proposed method combines the spirit of the (-powered) consistent
weighted sampling and hierarchical clustering, so that K-anonymity is
ensured by enforcing a lower bound on the size of cohorts. Evaluations on a
LinkedIn dataset consisting of M users and ads campaigns demonstrate that
CCWS achieves substantial improvements over several hashing-based methods,
including sign random projections (SignRP), minwise hashing (MinHash), and
the vanilla CWS.
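As a sketch of the building block only, not CCWS itself, the code below implements Ioffe's improved consistent weighted sampling (ICWS), a standard CWS variant in which two weighted sets produce the same sample with probability equal to their generalized Jaccard similarity. The hypothetical `build_cohorts` then buckets users by one sample and naively folds undersized buckets together until every cohort has at least `k` members, illustrating the "lower bound on cohort size" idea. The merging rule is an assumption and is much cruder than the paper's hierarchical procedure.

```python
import math
import random
from collections import defaultdict
from typing import Dict, List, Tuple


def icws_sample(weights: Dict[str, float], seed: int) -> Tuple[str, int]:
    """One ICWS sample (Ioffe, 2010) of a weighted set {feature: weight}."""
    best = None
    for key, s in weights.items():
        if s <= 0:
            continue
        # Randomness depends only on (seed, key), so equal inputs give equal samples.
        rng = random.Random(f"{seed}:{key}")
        r = rng.gammavariate(2.0, 1.0)
        c = rng.gammavariate(2.0, 1.0)
        beta = rng.random()
        t = math.floor(math.log(s) / r + beta)
        y = math.exp(r * (t - beta))
        a = c / (y * math.exp(r))
        if best is None or a < best[0]:
            best = (a, key, t)
    if best is None:
        raise ValueError("weighted set has no positive weights")
    return best[1], best[2]


def build_cohorts(users: Dict[str, Dict[str, float]], k: int,
                  seed: int = 0) -> List[List[str]]:
    """Hypothetical: bucket users by one ICWS sample, then merge small
    buckets so that every returned cohort has at least k users."""
    if k < 1 or len(users) < k:
        raise ValueError("need k >= 1 and at least k users")
    buckets: Dict[Tuple[str, int], List[str]] = defaultdict(list)
    for uid, feats in users.items():
        buckets[icws_sample(feats, seed)].append(uid)
    cohorts: List[List[str]] = []
    pending: List[str] = []
    # Largest buckets first; undersized ones accumulate in `pending`.
    for members in sorted(buckets.values(), key=len, reverse=True):
        if len(members) >= k:
            cohorts.append(sorted(members))
        else:
            pending.extend(members)
            if len(pending) >= k:
                cohorts.append(sorted(pending))
                pending = []
    if pending:
        # Fewer than k leftovers join the smallest existing cohort.
        smallest = min(cohorts, key=len)
        smallest.extend(pending)
        smallest.sort()
    return cohorts
```

Because the random draws are keyed on the feature name and seed, users with identical feature weights always land in the same bucket.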
Policy Implications of User-Generated Data Network Effects
User-generated data (UGD) network effects are an exciting and novel economic force. They upset conventional market competition dynamics, and they lead to the formation of dominant data platforms with market power that spans different and seemingly unrelated markets. This article explains that UGD network effects are a blessing and a curse. They provide dominant data platforms with the opportunity to generate welfare-enhancing efficiencies as well as welfare-reducing anticompetitive harms. After exploring the economic opportunities and social threats, this article explores the implications of UGD network effects for competition policy. Drawing on traditional network effects theory, this article proposes and critically examines a host of remedial approaches for policymakers to consider. These remedies include modernized public utility-style regulation, open access policies, and adjusted standards for anti-monopolization and merger scrutiny.
Fairness in Recommendation: Foundations, Methods and Applications
As one of the most pervasive applications of machine learning, recommender
systems play an important role in assisting human decision making. The
satisfaction of users and the interests of platforms are closely related to the
quality of the generated recommendation results. However, as highly
data-driven systems, recommender systems can be affected by data or algorithmic
bias and thus generate unfair results, which could weaken users' reliance on the
systems. As a result, it is crucial to address potential unfairness
problems in recommendation settings. Recently, there has been growing attention
to fairness considerations in recommender systems, with more and more literature
on approaches to promote fairness in recommendation. However, these studies are
rather fragmented and lack systematic organization, making it difficult
for new researchers to enter the domain. This motivates us to provide a
systematic survey of existing work on fairness in recommendation. This survey
focuses on the foundations of the fairness-in-recommendation literature. It first
briefly introduces fairness in basic machine learning tasks
such as classification and ranking to provide a general overview of
fairness research, and then introduces the more complex situations and
challenges that need to be considered when studying fairness in recommender
systems. After that, the survey introduces fairness in recommendation with
a focus on the taxonomies of current fairness definitions, the typical
techniques for improving fairness, and the datasets for fairness studies
in recommendation. The survey also discusses the challenges and opportunities
in fairness research, with the hope of promoting the fair recommendation
research area and beyond. Comment: Accepted by ACM Transactions on Intelligent Systems and Technology
(TIST)
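To make "fairness definitions" concrete, here is a sketch of one common family of item-side measures, group exposure in a ranked list. It assumes a logarithmic position discount of 1/log2(rank + 1), the same discount used in DCG. The function names and the "popular"/"niche" grouping are illustrative assumptions, not definitions taken from the survey.

```python
import math
from typing import Dict, Sequence


def group_exposure(ranking: Sequence[str], item_group: Dict[str, str]) -> Dict[str, float]:
    """Average position-discounted exposure per item group, with the
    exposure at 1-based rank r taken to be 1 / log2(r + 1)."""
    totals: Dict[str, float] = {}
    counts: Dict[str, int] = {}
    for rank, item in enumerate(ranking, start=1):
        group = item_group[item]
        totals[group] = totals.get(group, 0.0) + 1.0 / math.log2(rank + 1)
        counts[group] = counts.get(group, 0) + 1
    return {g: totals[g] / counts[g] for g in totals}


def exposure_gap(ranking: Sequence[str], item_group: Dict[str, str],
                 group_a: str, group_b: str) -> float:
    """Positive when group_a's items receive more average exposure than group_b's."""
    per_group = group_exposure(ranking, item_group)
    return per_group.get(group_a, 0.0) - per_group.get(group_b, 0.0)


groups = {"i1": "popular", "i2": "popular", "i3": "niche", "i4": "niche"}
print(exposure_gap(["i1", "i2", "i3", "i4"], groups, "popular", "niche"))
```

A fairness-aware re-ranker would try to drive this gap toward zero while limiting the loss in relevance. Other definitions, such as user-side or counterfactual fairness, measure different things.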
A reference architecture for big data systems
For decades, applying new IT technologies in organizations has been a major concern for businesses. Big data is a new concept that is exciting businesses. Accessing more data and enabling big data analysis require new big data platforms. However, reference architectures for big data systems remain limited. In this paper, building on existing reference architectures for big data systems, we propose a new high-level abstract reference architecture and related reference architecture notations that better express the overall architecture. The new reference architecture is verified using one existing use case and an additional new use case.
A novel approach towards skill-based search and services of Open Educational Resources
Ha, K.-H., Niemann, K., Schwertel, U., Holtkamp, P., Pirkkalainen, H., Börner, D. et al. (2011). A novel approach towards skill-based search and services of Open Educational Resources. In E. Garcia-Barriocanal, A. Öztürk, & M. C. Okur (Eds.), Metadata and Semantics Research: 5th International Conference MTSR 2011 (pp. 312-323), Izmir, Turkey, October 12-14, 2011. Springer.
Open educational resources (OER) have a high potential to address
the growing need for training materials in management education and training.
Today, a large number of OER in management are already available in many
repositories. However, users face barriers because they have to search
repository by repository, through different interfaces, to retrieve the appropriate
learning content. In addition, search criteria related to skills, such as
learning objectives and skill levels, are not generally supported. The European
co-funded project OpenScout addresses these barriers by intelligently
connecting leading European OER repositories and providing federated, skill-based
search and retrieval web services. On top of this content federation, the
project supports users with easy-to-apply tools that will accelerate the (re-)use
of open content.
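The core idea of federated, skill-based search (one query fanned out to several repositories, then filtered by skill metadata) can be sketched as follows. This is not OpenScout's actual service interface. The `Resource` record, its fields, the skill-level vocabulary, and the repository connectors are all hypothetical stand-ins.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, Iterable, List, Optional


@dataclass(frozen=True)
class Resource:
    """Hypothetical metadata record for one OER item."""
    title: str
    repository: str
    learning_objectives: FrozenSet[str]
    skill_level: str  # assumed scale, e.g. "beginner", "intermediate", "advanced"


# A repository connector takes a keyword query and yields matching resources.
Repository = Callable[[str], Iterable[Resource]]


def federated_search(query: str,
                     repositories: List[Repository],
                     objective: Optional[str] = None,
                     skill_level: Optional[str] = None) -> List[Resource]:
    """Send one query to every repository and apply skill-based filters to
    the merged results, so the user does not search repository by repository."""
    results: List[Resource] = []
    for search_repo in repositories:
        for res in search_repo(query):
            # Skill-based criteria are optional; None means "don't filter".
            if objective is not None and objective not in res.learning_objectives:
                continue
            if skill_level is not None and res.skill_level != skill_level:
                continue
            results.append(res)
    return results
```

Each real connector would wrap one repository's own search interface, which is how a federation layer hides the differing interfaces the abstract describes.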
- …