230 research outputs found

    Bonus Computing: An Evolution from and a Supplement to Volunteer Computing

    Despite its huge success in various worldwide projects, volunteer computing suffers from a possible lack of computing resources (one volunteered device can join one project at a time) and from unpredictable job interruptions (a volunteered device can crash or disconnect from the Internet at any time). To relieve these challenges, we have proposed bonus computing, which exploits the free quotas of public Cloud resources, particularly to deal with problems composed of fine-grained, short-running, and compute-intensive tasks. In addition to explaining the loosely-coupled functional architecture and six architectural patterns of bonus computing in this paper, we also employ the Monte-Carlo approximation of Pi (π) as a use case demonstration, both to facilitate understanding and to help validate its functioning mechanism. The results exhibit not only the effectiveness but also multiple advantages of bonus computing, which make it a valuable evolution from and supplement to volunteer computing.
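The Monte-Carlo approximation of Pi mentioned in this abstract splits naturally into fine-grained, short-running, independent tasks. The following is a minimal sketch under stated assumptions, not the authors' implementation: the names `count_hits` and `estimate_pi`, the per-task seeding, and the serial loop standing in for remote workers are all hypothetical choices made for illustration.

```python
import random


def count_hits(samples: int, seed: int) -> int:
    """One independent task: count random points inside the unit quarter circle."""
    rng = random.Random(seed)  # per-task generator, so tasks share no state
    hits = 0
    for _ in range(samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            hits += 1
    return hits


def estimate_pi(num_tasks: int, samples_per_task: int) -> float:
    """Combine the hit counts of all tasks into one estimate of Pi."""
    # Assumption: tasks run serially here; each call could instead be
    # dispatched to a separate worker, since only the hit count is returned.
    total_hits = sum(count_hits(samples_per_task, seed) for seed in range(num_tasks))
    return 4.0 * total_hits / (num_tasks * samples_per_task)


if __name__ == "__main__":
    print(estimate_pi(num_tasks=20, samples_per_task=10_000))
```

Because each task returns only an integer, an interrupted task can simply be rerun with the same seed without affecting the others.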

    OEBench: Investigating Open Environment Challenges in Real-World Relational Data Streams

    How to get insights from relational data streams in a timely manner is a hot research topic. This type of data stream can present unique challenges, such as distribution drifts, outliers, emerging classes, and changing features, which have recently been described as open environment challenges for machine learning. While existing studies have examined incremental learning for data streams, their evaluations are mostly conducted with manually partitioned datasets. Thus, a natural question is what those open environment challenges look like in real-world relational data streams and how existing incremental learning algorithms perform on real datasets. To fill this gap, we develop an Open Environment Benchmark named OEBench to evaluate open environment challenges in relational data streams. Specifically, we investigate 55 real-world relational data streams and establish that open environment scenarios are indeed widespread in real-world datasets, which presents significant challenges for stream learning algorithms. Through benchmarks with existing incremental learning algorithms, we find that increased data quantity may not consistently enhance model accuracy in open environment scenarios, where machine learning models can be significantly compromised by missing values, distribution shifts, or anomalies in real-world data streams. Current techniques are insufficient to mitigate these challenges effectively, and more research is needed to address real-world open environment challenges. All datasets and code are open-sourced at https://github.com/sjtudyq/OEBench

    TWIN: TWo-stage Interest Network for Lifelong User Behavior Modeling in CTR Prediction at Kuaishou

    Life-long user behavior modeling, i.e., extracting a user's hidden interests from rich historical behaviors in months or even years, plays a central role in modern CTR prediction systems. Conventional algorithms mostly follow two cascading stages: a simple General Search Unit (GSU) for fast and coarse search over tens of thousands of long-term behaviors and an Exact Search Unit (ESU) for effective Target Attention (TA) over the small number of finalists from GSU. Although efficient, existing algorithms mostly suffer from a crucial limitation: the inconsistent target-behavior relevance metrics between GSU and ESU. As a result, their GSU usually misses highly relevant behaviors but retrieves ones considered irrelevant by ESU. In such cases, the TA in ESU, no matter how attention is allocated, mostly deviates from the real user interests and thus degrades the overall CTR prediction accuracy. To address such inconsistency, we propose the TWo-stage Interest Network (TWIN), where our Consistency-Preserved GSU (CP-GSU) adopts the same target-behavior relevance metric as the TA in ESU, making the two stages twins. Specifically, to break TA's computational bottleneck and extend it from ESU to GSU, namely from behavior length 10^2 to length 10^4-10^5, we build a novel attention mechanism by behavior feature splitting. For the video inherent features of a behavior, we calculate their linear projection by efficient pre-computing & caching strategies. For the user-item cross features, we compress each into a one-dimensional bias term in the attention score calculation to save computational cost. The consistency between the two stages, together with the effective TA-based relevance metric in CP-GSU, contributes to significant performance gain in CTR prediction. Comment: Accepted by KDD 202
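The feature splitting described in this abstract can be illustrated with a small sketch: keys come from cached projections of the inherent features, and the cross features enter the attention score only as a scalar bias per behavior. This is an illustration under assumptions, not the TWIN implementation: the function names, the scaled dot-product form of the score, and the way the bias is supplied are all hypothetical.

```python
import math


def project(features, weight):
    """Linear projection of one feature vector (d_in) by a weight matrix (d_in x d_out)."""
    return [
        sum(f * row[j] for f, row in zip(features, weight))
        for j in range(len(weight[0]))
    ]


def attention_weights(query, cached_keys, cross_bias):
    """Softmax over scaled dot-product scores plus one scalar bias per behavior."""
    scale = math.sqrt(len(query))
    scores = [
        sum(q * k for q, k in zip(query, key)) / scale + bias
        for key, bias in zip(cached_keys, cross_bias)
    ]
    top = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Assumption: three behaviors with 2-dimensional inherent features.
inherent = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
w_key = [[1.0, 0.0], [0.0, 1.0]]  # hypothetical projection weights

# Projected keys depend only on the item, so they can be computed once and cached.
cached_keys = [project(f, w_key) for f in inherent]

# Assumption: each behavior's cross features are already compressed to one scalar.
cross_bias = [0.0, 0.5, -0.5]

weights = attention_weights([1.0, 0.0], cached_keys, cross_bias)
print(weights)
```

With the keys cached, the per-request cost for each behavior is one dot product and one addition, which is what makes scoring very long behavior sequences plausible in this sketch.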

    Impacts of coagulation on the appearance time method for new particle growth rate evaluation and their corrections

    The growth rate of atmospheric new particles is a key parameter that determines their survival probability of becoming cloud condensation nuclei and hence their impact on the climate. There have been several methods to estimate the new particle growth rate. However, due to the impact of coagulation and measurement uncertainties, it is still challenging to estimate the initial growth rate of new particles, especially in polluted environments with high background aerosol concentrations. In this study, we explore the influences of coagulation on the appearance time method to estimate the growth rate of sub-3 nm particles. The principle of the appearance time method and the impacts of coagulation on the retrieved growth rate are clarified via derivations. New formulae in both discrete and continuous spaces are proposed to correct for the impacts of coagulation. Aerosol dynamic models are used to test the new formulae. New particle formation in urban Beijing is used to illustrate the importance of considering the impacts of coagulation on the sub-3 nm particle growth rate and its calculation. We show that the conventional appearance time method needs to be corrected when the impacts of coagulation sink, coagulation source, and particle coagulation growth are non-negligible compared to the condensation growth. Under the simulation conditions with a constant concentration of non-volatile vapors, the corrected growth rate agrees with the theoretical growth rates. However, the uncorrected parameters, e.g., vapor evaporation and the variation in vapor concentration, may impact the growth rate obtained with the appearance time method. Under the simulation conditions with a varying vapor concentration, the average bias in the corrected 1.5-3 nm particle growth rate ranges from 6 % to 44 %, and the maximum bias in the size-dependent growth rate is 150 %. During the test new particle formation event in urban Beijing, the corrected condensation growth rate of sub-3 nm particles was in accordance with the growth rate contributed by sulfuric acid condensation, whereas the conventional appearance time method overestimated the condensation growth rate of 1.5 nm particles by 80 %. Peer reviewed

    Detection of Triphenylmethane Drugs in Fish Muscle by Surface-Enhanced Raman Spectroscopy Coupled with Au-Ag Core-Shell Nanoparticles

    Silver-coated gold bimetallic nanoparticles were synthesized and used as substrates for surface-enhanced Raman spectroscopy (SERS) in detecting prohibited triphenylmethane drugs (including crystal violet and malachite green) in fish muscle. The optical and physical properties of the bimetallic nanospheres were characterized by UV-Vis spectroscopy and transmission electron microscopy. The optimal nanospheres selected had a relatively uniform size (diameter: 33 ± 3 nm) with a silver layer coated on the surface of the gold seed (diameter: 18 ± 2 nm). For both crystal violet and malachite green, characteristic SERS spectral features could be identified at concentrations as low as 0.1 μg/L with these bimetallic nanospheres. Crystal violet and malachite green residues in fish muscle could also be detected at levels as low as 0.1 ng/g, which could meet the most restrictive regulatory requirements for the limit of detection of analytical methods for crystal violet or malachite green in fish muscle. This study provides a basis for applying SERS technology with bimetallic nanoparticles to the identification of trace amounts of prohibited substances in aquatic food products, and the methodology could be extended to analyses of other hazardous chemicals in complex food matrices like vegetables and meats.

    Acid-Base Clusters during Atmospheric New Particle Formation in Urban Beijing

    Molecular clustering is the initial step of atmospheric new particle formation (NPF) that generates numerous secondary particles. Using two online mass spectrometers with and without a chemical ionization inlet, we characterized the neutral clusters and the naturally charged ion clusters during NPF periods in urban Beijing. In ion clusters, we observed pure sulfuric acid (SA) clusters, SA-amine clusters, SA-ammonia (NH3) clusters, and SA-amine-NH3 clusters. However, only SA clusters and SA-amine clusters were observed in the neutral form. Meanwhile, oxygenated organic molecule (OOM) clusters charged by a nitrate ion and a bisulfate ion were observed in ion clusters. Acid-base clusters correlate well with the occurrence of sub-3 nm particles, whereas OOM clusters do not. Moreover, with increasing cluster size, amine fractions in ion acid-base clusters decrease, while NH3 fractions increase. This variation results from the reduced stability differences between SA-amine clusters and SA-NH3 clusters, which is supported by both quantum chemistry calculations and chamber experiments. The lower average number of dimethylamine (DMA) molecules in atmospheric ion clusters than the saturated value from controlled SA-DMA nucleation experiments suggests that there is insufficient DMA in urban Beijing to fully stabilize large SA clusters, and therefore, other basic molecules such as NH3 play an important role. Peer reviewed