
    TWIN: TWo-stage Interest Network for Lifelong User Behavior Modeling in CTR Prediction at Kuaishou

    Life-long user behavior modeling, i.e., extracting a user's hidden interests from rich historical behaviors spanning months or even years, plays a central role in modern CTR prediction systems. Conventional algorithms mostly follow two cascading stages: a simple General Search Unit (GSU) for fast and coarse search over tens of thousands of long-term behaviors and an Exact Search Unit (ESU) for effective Target Attention (TA) over the small number of finalists from the GSU. Although efficient, existing algorithms mostly suffer from a crucial limitation: the inconsistent target-behavior relevance metrics between GSU and ESU. As a result, their GSU usually misses highly relevant behaviors but retrieves ones considered irrelevant by the ESU. In such cases, the TA in the ESU, no matter how attention is allocated, mostly deviates from the real user interests and thus degrades the overall CTR prediction accuracy. To address this inconsistency, we propose the TWo-stage Interest Network (TWIN), whose Consistency-Preserved GSU (CP-GSU) adopts the identical target-behavior relevance metric as the TA in the ESU, making the two stages twins. Specifically, to break TA's computational bottleneck and extend it from the ESU to the GSU, i.e., from behavior length 10^2 to length 10^4-10^5, we build a novel attention mechanism by behavior feature splitting. For the inherent video features of a behavior, we calculate their linear projection with efficient pre-computing and caching strategies. For the user-item cross features, we compress each into a one-dimensional bias term in the attention score calculation to save computational cost. The consistency between the two stages, together with the effective TA-based relevance metric in CP-GSU, contributes to significant performance gains in CTR prediction. Comment: Accepted by KDD 202
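
    The split attention described in this abstract can be illustrated with a minimal NumPy sketch: the projection of the inherent video features is the part that can be precomputed and cached per item, while the user-item cross features collapse into a scalar bias per behavior. All tensor names and shapes below are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def split_attention_scores(target_emb, item_feats, cross_feats, W_q, W_h, w_c):
    """target_emb: (d,) target-item embedding; item_feats: (L, d_h) inherent
    video features of L historical behaviors; cross_feats: (L, d_c) user-item
    cross features. Returns (L,) attention weights over the behaviors."""
    # The projection of inherent features depends only on the video, so in a
    # real system it would be precomputed offline and cached per item.
    keys = item_feats @ W_h                         # (L, d)
    query = target_emb @ W_q                        # (d,)
    # Cross features are compressed into a one-dimensional bias per behavior.
    bias = cross_feats @ w_c                        # (L,)
    logits = keys @ query / np.sqrt(keys.shape[1]) + bias
    logits -= logits.max()                          # numerical stability
    weights = np.exp(logits)
    return weights / weights.sum()

# Toy usage with random tensors (purely illustrative sizes).
rng = np.random.default_rng(0)
L, d_h, d_c, d = 10_000, 64, 8, 32
alpha = split_attention_scores(
    rng.normal(size=d), rng.normal(size=(L, d_h)), rng.normal(size=(L, d_c)),
    rng.normal(size=(d, d)), rng.normal(size=(d_h, d)), rng.normal(size=d_c))
```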

    Visible relations in online communities: modeling and using social networks

    The Internet represents a unique opportunity for people to interact with each other across time and space, and online communities existed long before the Internet's solidification in everyday living. There are two inherent challenges that online communities continue to contend with: motivating participation and organizing information. An online community's success or failure rests on the content generated by its users. Specifically, users need to continually participate by contributing new content and organizing existing content so that others are attracted and retained. I propose that both participation and organization can be enhanced if users have an explicit awareness of the implicit social network which results from their online interactions. My approach makes this normally "hidden" social network visible and shows users that these intangible relations have an impact on satisfying their information needs and vice versa. That is, users can more readily situate their information needs within social processes, understanding that the value of information they receive and give is influenced by, and has influence on, the mostly incidental relations they have formed with others. First, I describe how to model a social network within an online discussion forum and visualize the subsequent relationships in a way that motivates participation. Second, I show that social networks can also be modeled to generate recommendations of information items and that, through an interactive visualization, users can make direct adjustments to the model in order to improve their personal recommendations. I conclude that these modeling and visualization techniques are beneficial to online communities as their social capital is enhanced by "weaving" users more tightly together
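
    As a rough illustration of the two ideas above (and only that; the thesis itself does not publish this code), one can derive tie strengths from reply interactions in a forum and then weight item recommendations by those strengths:

```python
from collections import defaultdict

def tie_strengths(replies):
    """replies: iterable of (author, replied_to) pairs from forum threads.
    Each reply incrementally strengthens the implicit tie between two users."""
    strength = defaultdict(float)
    for author, target in replies:
        if author != target:
            strength[(author, target)] += 1.0
    return strength

def recommend(user, strength, ratings, top_n=5):
    """ratings: dict user -> {item: rating}. Items are scored by the
    tie-strength-weighted ratings of the user's neighbours."""
    scores = defaultdict(float)
    seen = ratings.get(user, {})
    for (u, v), w in strength.items():
        if u == user:
            for item, r in ratings.get(v, {}).items():
                if item not in seen:
                    scores[item] += w * r
    return sorted(scores, key=scores.get, reverse=True)[:top_n]

# Hypothetical forum data for demonstration only.
replies = [("ann", "bob"), ("ann", "bob"), ("ann", "cat"), ("bob", "ann")]
ratings = {"bob": {"faq": 5, "howto": 3}, "cat": {"howto": 4}}
print(recommend("ann", tie_strengths(replies), ratings))
```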

    CIRA annual report FY 2016/2017

    Reporting period April 1, 2016 – March 31, 2017

    Distributed Synchronization Under Data Churn

    Nowadays, an increasing number of applications need to maintain local copies of remote data sources to provide services to their users. Because of the dynamic nature of the sources, an application has to synchronize its copies with remote sources constantly to provide reliable services. Instead of push-based synchronization, we focus on the pull-based strategy because it does not require source cooperation and has been widely adopted by existing systems. The scalability of pull-based synchronization comes at the expense of increased inconsistency of the copied content. We model this system under non-Poisson update/refresh processes and obtain sample-path averages of various metrics of staleness cost, generalizing previous results and studying their statistical properties. Computing staleness requires knowledge of the inter-update distribution at the source, which can only be estimated through blind sampling – periodic downloads and comparison against previous copies. We show that all previous approaches are biased unless the observation rate tends to infinity or the update process is Poisson. To overcome these issues, we propose four new algorithms that achieve various levels of consistency, which depend on the amount of temporal information revealed by the source and the capabilities of the download process. Then we focus on applying freshness to P2P replication systems. We extend our results to several more difficult algorithms – cascaded replication, cooperative caching, and redundant querying from the clients. Surprisingly, we discover that optimal cooperation involves just a single peer and that redundant querying can hurt the ability of the system to handle load (i.e., may lead to lower scalability)
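
    A toy simulation can make the staleness metric concrete. The sketch below (my own illustration under assumed Pareto inter-update times, not the dissertation's model) estimates the sample-path fraction of time a periodically pulled copy disagrees with its source:

```python
import numpy as np

def stale_fraction(n_updates=50_000, pull_interval=10.0, shape=1.5, scale=3.0, seed=0):
    """Source updates follow a non-Poisson renewal process (Pareto gaps);
    the client pulls every `pull_interval`. Returns the fraction of time the
    local copy is out of date."""
    rng = np.random.default_rng(seed)
    gaps = scale * (rng.pareto(shape, n_updates) + 1.0)   # inter-update times
    updates = np.cumsum(gaps)
    pulls = np.arange(0.0, updates[-1], pull_interval)
    stale = 0.0
    for k in range(len(pulls) - 1):
        lo, hi = pulls[k], pulls[k + 1]
        i = np.searchsorted(updates, lo)      # first update at or after this pull
        if i < len(updates) and updates[i] < hi:
            stale += hi - updates[i]          # stale from that update until the next pull
    return stale / (pulls[-1] - pulls[0])

print(stale_fraction())                       # longer pull intervals yield more staleness
```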

    Temporal update dynamics under blind sampling

    Network applications commonly maintain local copies of remote data sources in order to provide caching, indexing, and data-mining services to their clients. Modeling the performance of these systems and predicting future updates usually requires knowledge of the inter-update distribution at the source, which can only be estimated through blind sampling – periodic downloads and comparison against previous copies. In this paper, we first introduce a stochastic modeling framework for this problem, where the update and sampling processes are both renewal processes. We then show that all previous approaches are biased unless the observation rate tends to infinity or the update process is Poisson. To overcome these issues, we propose four new algorithms that achieve various levels of consistency, which depend on the amount of temporal information revealed by the source and the capabilities of the download process
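
    The bias can be reproduced numerically. In the sketch below (my assumption of the setup, not one of the paper's estimators), a naive estimate that counts detected changes per unit time understates the true update rate, because several updates falling between consecutive downloads are observed as a single change:

```python
import numpy as np

def naive_rate_estimate(update_gaps, sample_period):
    """Blind sampling: each download only reveals whether the source changed
    since the previous download, not how many times."""
    updates = np.cumsum(update_gaps)
    samples = np.arange(0.0, updates[-1], sample_period)
    changed = np.diff(np.searchsorted(updates, samples)) > 0
    return changed.sum() / (samples[-1] - samples[0])

rng = np.random.default_rng(1)
alpha = 2.5                                        # Pareto shape (non-Poisson updates)
gaps = rng.pareto(alpha, 100_000) * (alpha - 1.0)  # scaled so the true rate is 1
for T in (0.1, 1.0, 10.0):
    print(T, naive_rate_estimate(gaps, T))         # approaches 1 only as T -> 0
```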

    Cost-Aware Resource Management for Decentralized Internet Services

    Decentralized network services, such as naming systems, content distribution networks, and publish-subscribe systems, play an increasingly critical role and are required to provide high-performance, low-latency service, achieve high availability in the presence of network and node failures, and handle a large volume of users. Judicious utilization of expensive system resources, such as memory space, network bandwidth, and number of machines, is fundamental to achieving the above properties. Yet, current network services typically rely on less-informed, heuristic-based techniques to manage scarce resources, and often fall short of expectations. This thesis presents a principled approach for building high performance, robust, and scalable network services. The key contribution of this thesis is to show that resolving the fundamental cost-benefit tradeoff between resource consumption and performance through mathematical optimization is practical in large-scale distributed systems, and enables decentralized network services to efficiently meet system-wide performance goals. This thesis presents a practical approach for resource management in three stages: analytically model the cost-benefit tradeoff as a constrained optimization problem, determine a near-optimal resource allocation strategy on the fly, and enforce the derived strategy through lightweight, decentralized mechanisms. It builds on self-organizing structured overlays, which provide failure resilience and scalability, and complements them with stronger performance guarantees and robustness under sudden changes in workload. This work enables applications to meet system-wide performance targets, such as low average response times, high cache hit rates, and small update dissemination times with low resource consumption. Alternatively, applications can make maximum use of available resources, such as storage and bandwidth, and derive large gains in performance. I have implemented an extensible framework called Honeycomb to perform cost-aware resource management on structured overlays based on the above approach and built three critical network services using it. These services consist of a new name system for the Internet called CoDoNS that distributes data associated with domain names, an open-access content distribution network called CobWeb that caches web content for faster access by users, and an online information monitoring system called Corona that notifies users about changes to web pages. Simulations and performance measurements from a planetary-scale deployment show that these services provide unprecedented performance improvement over the current state of the art
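
    The cost-benefit tradeoff at the heart of this thesis can be caricatured in a few lines. The greedy allocator below (a toy under an assumed diminishing-returns model per replica, not Honeycomb's actual optimizer) spends a fixed memory budget where the marginal gain in expected hit rate per byte is largest:

```python
import heapq

def allocate(popularity, sizes, budget, max_replicas=5, hit_gain=0.4):
    """popularity[i]: request rate of object i; sizes[i]: bytes per replica.
    Assumes (illustratively) that each extra replica raises the hit
    probability by hit_gain * (1 - hit_gain)**replicas."""
    replicas = [0] * len(popularity)
    heap = [(-popularity[i] * hit_gain / sizes[i], i) for i in range(len(popularity))]
    heapq.heapify(heap)                        # max-heap on marginal benefit per byte
    used = 0
    while heap:
        _, i = heapq.heappop(heap)
        if used + sizes[i] > budget or replicas[i] >= max_replicas:
            continue                           # this object can no longer be grown
        used += sizes[i]
        replicas[i] += 1
        marginal = popularity[i] * hit_gain * (1 - hit_gain) ** replicas[i]
        heapq.heappush(heap, (-marginal / sizes[i], i))
    return replicas

# The popular object receives most of the budget.
print(allocate(popularity=[100, 10, 1], sizes=[1, 1, 1], budget=4))
```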

    Information Technology Project Update: 2011-2012

    Marshall University Information Technology (MUIT) strives to provide seamless access to global resources, a robust infrastructure and current tools to support our faculty, staff, and students, and high levels of technology to compete and excel in a world characterized by constant change and increased mobility. MUIT engages in collaborative relationships within the University and with the local community, acting as a trusted partner that anticipates needs and responds with innovative solutions in support of the University's mission of teaching, research, and service through extensive research and planning endeavors

    Semantic discovery and reuse of business process patterns

    Patterns currently play an important role in modern information systems (IS) development, but their use has mainly been restricted to the design and implementation phases of the development lifecycle. Given the increasing significance of business modelling in IS development, patterns have the potential to provide a viable solution for promoting reusability of recurrent generalized models in the very early stages of development. As a statement of research in progress, this paper focuses on business process patterns and proposes an initial methodological framework for the discovery and reuse of business process patterns within the IS development lifecycle. The framework borrows ideas from the domain engineering literature and proposes the use of semantics to drive both the discovery of patterns and their reuse

    CIRA annual report FY 2015/2016

    Reporting period April 1, 2015 – March 31, 2016