TWIN: TWo-stage Interest Network for Lifelong User Behavior Modeling in CTR Prediction at Kuaishou
Life-long user behavior modeling, i.e., extracting a user's hidden interests
from rich historical behaviors in months or even years, plays a central role in
modern CTR prediction systems. Conventional algorithms mostly follow two
cascading stages: a simple General Search Unit (GSU) for fast and coarse search
over tens of thousands of long-term behaviors and an Exact Search Unit (ESU)
for effective Target Attention (TA) over the small number of finalists from
GSU. Although efficient, existing algorithms mostly suffer from a crucial
limitation: the inconsistent target-behavior relevance metrics between
GSU and ESU. As a result, their GSU usually misses highly relevant behaviors
but retrieves ones considered irrelevant by ESU. In such a case, the TA in ESU,
no matter how attention is allocated, mostly deviates from the real user
interests and thus degrades the overall CTR prediction accuracy. To address
such inconsistency, we propose the TWo-stage Interest Network (TWIN),
where our Consistency-Preserved GSU (CP-GSU) adopts the identical
target-behavior relevance metric as the TA in ESU, making the two stages twins.
Specifically, to break TA's computational bottleneck and extend it from ESU to
GSU, that is, from the short list of finalists to the full sequence of tens of
thousands of long-term behaviors, we build a
novel attention mechanism by behavior feature splitting. For the video inherent
features of a behavior, we calculate their linear projection by efficient
pre-computing and caching strategies. For the user-item cross features, we
compress each into a one-dimensional bias term in the attention score
calculation to save the computational cost. The consistency between two stages,
together with the effective TA-based relevance metric in CP-GSU, contributes to
significant performance gains in CTR prediction. Comment: Accepted by KDD 202
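The feature-split attention described above can be sketched as follows. This is an illustrative reconstruction, not Kuaishou's implementation; all function and parameter names are assumptions. The key idea is that the projection of item-inherent features depends only on the items, so it can be pre-computed and cached, while each user-item cross feature collapses to a scalar bias added to the attention logits.

```python
import numpy as np

def split_attention(query, inherent_feats, cross_feats, W_k, w_bias):
    """Attention over a long behavior sequence via feature splitting
    (illustrative sketch; names are assumptions, not the paper's code).

    query          : (d,)    target-item representation
    inherent_feats : (L, di) item-inherent features of L behaviors; the
                     projection K = inherent_feats @ W_k is user-independent,
                     so it can be pre-computed and cached offline
    cross_feats    : (L, dc) user-item cross features; each is compressed
                     into a scalar bias instead of a full projection
    """
    K = inherent_feats @ W_k                      # cacheable, shape (L, d)
    logits = K @ query / np.sqrt(query.shape[0])  # scaled dot-product, (L,)
    logits += cross_feats @ w_bias                # one-dimensional bias term
    weights = np.exp(logits - logits.max())       # numerically stable softmax
    return weights / weights.sum()                # attention weights over L
```

Caching `K` is what makes it feasible to run the same relevance metric in both stages: the per-request cost for a sequence of length L drops to one (L, d)-by-(d,) product plus a cheap bias lookup.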
Visible relations in online communities: modeling and using social networks
The Internet represents a unique opportunity for people to interact with each other across time and space, and online communities have existed long before the Internet's solidification in everyday living. There are two inherent challenges that online communities continue to contend with: motivating participation and organizing information. An online community's success or failure rests on the content generated by its users. Specifically, users need to continually participate by contributing new content and organizing existing content so that others are attracted and retained. I propose both participation and organization can be enhanced if users have an explicit awareness of the implicit social network which results from their online interactions. My approach makes this normally "hidden" social network visible and shows users that these intangible relations have an impact on satisfying their information needs and vice versa. That is, users can more readily situate their information needs within social processes, understanding that the value of information they receive and give is influenced by, and has influence on, the mostly incidental relations they have formed with others. First, I describe how to model a social network within an online discussion forum and visualize the subsequent relationships in a way that motivates participation. Second, I show that social networks can also be modeled to generate recommendations of information items and that, through an interactive visualization, users can make direct adjustments to the model in order to improve their personal recommendations. I conclude that these modeling and visualization techniques are beneficial to online communities as their social capital is enhanced by "weaving" users more tightly together.
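The first step, deriving the implicit network from forum interactions, can be sketched minimally. This is a hypothetical illustration (the input format and function name are assumptions, not the thesis's actual model): each reply creates or strengthens a directed edge from the replier to the author being replied to.

```python
from collections import Counter

def reply_network(posts):
    """Build the implicit social network of a discussion forum
    (illustrative sketch). `posts` is a list of (author, parent_author)
    pairs; parent_author is None for thread-starting posts. A reply
    adds weight to the directed edge author -> parent_author.
    """
    edges = Counter()
    for author, parent in posts:
        if parent is not None and parent != author:  # skip roots, self-replies
            edges[(author, parent)] += 1
    return dict(edges)
```

Edge weights like these are the raw material for both uses described in the abstract: a visualization can scale tie thickness by weight, and a recommender can treat heavily weighted neighbors as more trusted information sources.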
CIRA annual report FY 2016/2017
Reporting period April 1, 2016-March 31, 2017
Distributed Synchronization Under Data Churn
Nowadays an increasing number of applications need to maintain local copies of remote data sources to provide services to their users. Because of the dynamic nature of the sources, an application has to synchronize its copies with remote sources constantly to provide reliable services. Instead of push-based synchronization, we focus on pull-based strategy because it doesn’t require source cooperation and has been widely adopted by existing systems.
The scalability of pull-based synchronization comes at the expense of increased inconsistency of the copied content. We model this system under non-Poisson update/refresh processes and obtain sample-path averages of various metrics of staleness cost, generalizing previous results and studying their statistical properties.
Computing staleness requires knowledge of the inter-update distribution at the source, which can only be estimated through blind sampling – periodic downloads and comparison against previous copies. We show that all previous approaches are biased unless the observation rate tends to infinity or the update process is Poisson. To overcome these issues, we propose four new algorithms that achieve various levels of consistency, which depend on the amount of temporal information revealed by the source and capabilities of the download process.
Then we focus on applying freshness to P2P replication systems. We extend our results to several more difficult settings: cascaded replication, cooperative caching, and redundant querying from the clients. Surprisingly, we discover that optimal cooperation involves just a single peer and that redundant querying can hurt the ability of the system to handle load (i.e., may lead to lower scalability).
Temporal update dynamics under blind sampling
Network applications commonly maintain local copies of remote data sources in order to provide caching, indexing, and data-mining services to their clients. Modeling performance of these systems and predicting future updates usually requires knowledge of the inter-update distribution at the source, which can only be estimated through blind sampling – periodic downloads and comparison against previous copies. In this paper, we first introduce a stochastic modeling framework for this problem, where the update and sampling processes are both renewal processes. We then show that all previous approaches are biased unless the observation rate tends to infinity or the update process is Poisson. To overcome these issues, we propose four new algorithms that achieve various levels of consistency, which depend on the amount of temporal information revealed by the source and capabilities of the download process.
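The bias that both abstracts above describe is easy to see in a toy simulation. This sketch (parameters and names are illustrative, not from the papers) samples a Poisson-updating source at a fixed period: at most one change is observable per download interval, so bursts of updates between downloads are collapsed and a naive change-counting estimator underestimates the true update rate unless sampling is very fast.

```python
import random

def blind_sampling_bias(rate, period, horizon, seed=0):
    """Simulate blind sampling of a Poisson-updating source (toy sketch).

    Returns (naive_rate, true_rate): the naive estimate counts intervals
    in which the downloaded copy changed, so it is capped at one detection
    per interval and falls below the true update rate.
    """
    rng = random.Random(seed)
    updates, t = [], 0.0
    while True:
        t += rng.expovariate(rate)        # exponential inter-update gaps
        if t > horizon:
            break
        updates.append(t)
    detected, prev = 0, 0.0
    for k in range(1, int(horizon / period) + 1):
        now = k * period
        if any(prev < u <= now for u in updates):
            detected += 1                 # copy changed since last download
        prev = now
    return detected / horizon, len(updates) / horizon
```

For a Poisson source the detection probability per interval is 1 - exp(-rate * period), which is why the bias vanishes only as the sampling rate tends to infinity, consistent with the claim above.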
Cost-Aware Resource Management for Decentralized Internet Services
Decentralized network services, such as naming systems, content
distribution networks, and publish-subscribe systems, play an
increasingly critical role and are required to provide high
performance, low latency service, achieve high availability in the
presence of network and node failures, and handle a large volume
of users. Judicious utilization of expensive system resources,
such as memory space, network bandwidth, and number of machines,
is fundamental to achieving the above properties. Yet, current
network services typically rely on less-informed, heuristic-based
techniques to manage scarce resources, and often fall short of
expectations.
This thesis presents a principled approach for building high
performance, robust, and scalable network services. The key
contribution of this thesis is to show that resolving the
fundamental cost-benefit tradeoff between resource consumption and
performance through mathematical optimization is practical in
large-scale distributed systems, and enables decentralized network
services to efficiently meet system-wide performance goals. This
thesis presents a practical approach for resource management in
three stages: analytically model the cost-benefit tradeoff as a
constrained optimization problem, determine a near-optimal
resource allocation strategy on the fly, and enforce the derived
strategy through light-weight, decentralized mechanisms. It
builds on self-organizing structured overlays, which provide
failure resilience and scalability, and complements them with
stronger performance guarantees and robustness under sudden
changes in workload. This work enables applications to meet
system-wide performance targets, such as low average response
times, high cache hit rates, and small update dissemination times
with low resource consumption. Alternatively, applications can
make the maximum use of available resources, such as storage and
bandwidth, and derive large gains in performance.
I have implemented an extensible framework called Honeycomb to
perform cost-aware resource management on structured overlays
based on the above approach and built three critical network
services using it. These services consist of a new name system for
the Internet called CoDoNS that distributes data associated with
domain names, an open-access content distribution network called
CobWeb that caches web content for faster access by users, and an
online information monitoring system called Corona that notifies
users about changes to web pages. Simulations and performance
measurements from a planetary-scale deployment show that these
services provide unprecedented performance improvement over the
current state of the art.
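The cost-benefit tradeoff at the heart of the approach above can be illustrated with a deliberately tiny stand-in. This greedy benefit-per-cost heuristic is a hypothetical sketch, not Honeycomb's actual optimizer (which the thesis formulates as a constrained optimization problem): it replicates the objects with the highest expected popularity per unit of storage until the budget is exhausted.

```python
def greedy_allocate(objects, budget):
    """Toy cost-benefit resource allocation (illustrative sketch).

    objects : list of (name, popularity, size) tuples, where popularity
              is a proxy for the hit-rate benefit of replicating the object
    budget  : total storage available

    Sort by benefit per unit cost and take objects while they fit.
    """
    ranked = sorted(objects, key=lambda o: o[1] / o[2], reverse=True)
    chosen, used = [], 0
    for name, popularity, size in ranked:
        if used + size <= budget:
            chosen.append(name)
            used += size
    return chosen, used
```

A real system would solve the constrained problem rather than apply this knapsack-style greedy, but the sketch captures the tradeoff being resolved: every byte of storage should buy the largest available performance benefit.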
Information Technology Project Update: 2011-2012
Marshall University Information Technology (MUIT) strives to provide seamless access to global resources, a robust infrastructure and current tools that support our faculty, staff and students, and high levels of technology to compete and excel in a world characterized by constant change and increased mobility. Through extensive research and planning endeavors, MUIT engages in collaborative relationships within the University and with the local community, acting as a trusted partner that anticipates needs and responds with innovative solutions in support of the University's mission of teaching, research, and service.
Semantic discovery and reuse of business process patterns
Patterns currently play an important role in modern information systems (IS) development, and their use has mainly been restricted to the design and implementation phases of the development lifecycle. Given the increasing significance of business modelling in IS development, patterns have the potential of providing a viable solution for promoting reusability of recurrent generalized models in the very early stages of development. As a statement of research-in-progress, this paper focuses on business process patterns and proposes an initial methodological framework for the discovery and reuse of business process patterns within the IS development lifecycle. The framework borrows ideas from the domain engineering literature and proposes the use of semantics to drive both the discovery of patterns as well as their reuse.
CIRA annual report FY 2015/2016
Reporting period April 1, 2015-March 31, 2016