172 research outputs found

    User identification and community exploration via mining big personal data in online platforms

    User-generated big data mining is vitally important for large online platforms in terms of security, profit improvement, product recommendation and system management. Personal attribute recognition, user behavior prediction, user identification, and community detection are among the most critical and interesting issues, and they remain challenging in many real applications in terms of accuracy, efficiency and data security. An online platform with tens of thousands of users is always vulnerable to malicious users who pose a threat to other innocent users and consume unnecessary resources, so accurate user identification is urgently required to prevent such malicious attempts. Meanwhile, accurate prediction of user behavior helps large platforms provide satisfactory recommendations to users and allocate resources efficiently among them. In addition to individual identification, community exploration of the large social networks formed by online databases could also help managers understand how a community evolves. Such large-scale and diverse social networks can also be used to validate network theories that were previously developed from synthetic networks or small real networks. In this thesis, we study several specific cases to address key challenges that remain in different types of large online platforms, such as user behavior prediction for cold-start users, privacy protection for user-generated data, and analysis of large-scale and diverse social communities. In the first case, online education, as an emerging business, has attracted tens of thousands of users because it can provide diverse courses that meet students' varied demands. Due to the limitations of public school systems, many students pursue private supplementary tutoring to improve their academic performance. 
Like an online shopping platform, an online education system is a user-product based service, where users usually have to select and purchase the courses that meet their demands. It is therefore important to construct a course recommendation and user behavior prediction system based on user attributes or user-generated data. Item recommendation in current online shopping systems is usually based on the interactions between users and products, since most personal attributes are unnecessary for online shopping services and users often provide false information during registration. As a result, such systems cannot recommend items by exploiting the similarity of personal attributes among users, such as education level, age, school and gender. Unlike most online shopping platforms, online education platforms have access to a large number of credible personal attributes, since accurate personal information is important in education services, and user behaviors could be predicted from user attributes alone. Moreover, previous work on learning individual attributes is based primarily on panel survey data, which ensures credibility but lacks efficiency, so most studies include only hundreds or thousands of users. With 3 years of learning data from more than 200,000 anonymous K-12 students on one of the world's largest online extra-curricular education platforms, we uncover students' online learning behaviors and infer the impact of students' home location, family socioeconomic situation and attended school's reputation/rank on their private tutoring course participation and learning outcomes. Further analysis suggests that such impact may be largely attributed to the inequality of access to educational resources in different cities and the inequality in family socioeconomic status. 
Finally, we study the predictability of students' performance and behaviors using machine learning algorithms with different groups of features, showing that students' online learning performance can be predicted from personal attributes and user-generated data with MAE < 10%. As mentioned above, user attributes on most online platforms are often false, and online platforms are usually vulnerable to malicious users. It is therefore very important to identify users or verify their attributes. Many studies have used user-generated mobile phone data (which includes sensitive information) to identify diverse user attributes, such as socioeconomic status, age, education level and profession. Most of these approaches leverage original sensitive user data to build feature-rich models that take private information as input, such as exact locations, app usage and call detail records. However, accessing users' raw mobile phone data may violate increasingly strict private data protection policies and regulations (e.g. GDPR). We observe that appropriate statistical methods offer an effective means to eliminate private information while preserving personal characteristics, thus enabling the identification of user attributes without privacy concerns. In particular, identifying an unfamiliar caller's profession is important for protecting citizens' personal safety and property. Due to the limited data protection of various popular online services in some countries, such as taxi hailing or takeout ordering, many users nowadays receive an increasing number of phone calls from strangers. The situation may be aggravated when criminals pretend to be service delivery staff, posing threats to individual users as well as to society. Additionally, more and more people suffer from excessive digital marketing and fraudulent phone calls because of personal information leakage. Therefore, real-time identification of unfamiliar callers is urgently needed. 
We explore the feasibility of user identification with privacy-preserved user-generated mobile phone data, and we develop CPFinder, a system that automatically identifies callers on end devices. The system identifies four main categories of users: taxi drivers, delivery and takeout staff, telemarketers and fraudsters, and normal users (other professions). Our evaluation over an anonymized dataset of 1,282 users covering a period of 3 months in Shanghai shows that CPFinder can achieve an accuracy above 75% for multi-class classification and above 92.35% for binary classification. In addition to the mining of personal attributes and behaviors, community mining of large groups of people based on online big data also attracts much attention due to the accessibility of large-scale social networks on online platforms. As an important branch of social networks, scientific collaboration networks have been studied for decades, since large online publication databases are easy to access and many user attributes are available. Academic collaborations have become regular and the connections among researchers have become closer due to the prosperity of globalized academic communication. It has been found that many computer science conferences are closed communities in terms of the acceptance of newcomers' papers, especially well-regarded conferences [cabot2018cs]. However, an in-depth study of how conferences differ in closeness and structural features, and of what causes these differences, is still missing. Also, in light of strong and weak tie theories, the combination of these two types of ties exerts multifaceted influences in different contexts. More analysis is needed to determine whether the network is closed or has other properties. We envision that social connections play an increasing role in academic society and influence the paper selection process. 
The influences are not restricted to visible links, but also extend to weak ties that connect two distant nodes. Previous studies of coauthor networks did not adequately consider the central role of some authors in the publication venues, such as the program committee (PC) chairs of conferences. Such people could influence the evolutionary patterns of coauthor networks because of their authority, their trusted role in selecting accepted papers, and their core positions in the community. Thus, in addition to the ratio of newcomers' papers, it would be interesting to quantify metrics related to PC chairs in order to measure the closure of a conference from the perspective of established authors' papers. Additionally, analyzing how conferences differ in the evolution of their coauthor networks and in their degree of closeness may reveal how closed communities form. Therefore, we present several different outcomes that arise from the various structural characteristics of several typical conferences. In this paper, using the DBLP dataset of computer science publications and a PC chair dataset, we show evidence of the existence of strong and weak ties in coauthor networks, and we confirm that the PC chairs' influence is related to tie strength and network structural properties. Several PC chair metrics based on coauthor networks are introduced to measure the closure and efficiency of a conference.
    2021-10-2

    Proxy Re-encryption based Fair Trade Protocol for Digital Goods Transactions via Smart Contracts

    With the massive amount of digital data generated every day, transactions of digital goods have become a trend. One of the essential requirements for such transactions is fairness, which means that either both the seller and the buyer get what they want, or neither does. Current fair trade protocols generally involve a trusted third party (TTP), and they achieve fairness by relying heavily on the TTP's behavior and on the two parties' trust in the TTP. With the emergence of blockchain, its decentralization and transparency make it a very good candidate to replace the TTP. In this work, we attempt to design a secure and fair protocol for digital goods transactions through smart contracts on a blockchain. To ensure the security of the digital goods, we propose an advanced passive proxy re-encryption (PRE) scheme, which enables smart contracts to transfer the decryption right to a buyer after receiving his/her payment. Furthermore, based on smart contracts and the proposed passive PRE scheme, a fair trade protocol for digital goods transactions is proposed, whose fairness is guaranteed by the arbitration protocol. The proposed protocol supports ciphertext publicity and repeatable sale, while involving fewer interactions. Comprehensive experimental results validate the feasibility and effectiveness of the proposed protocol.

    Research on Precipitation Prediction Model Based on Extreme Learning Machine Ensemble

    Precipitation is a significant index for measuring the degree of drought and flood in a region, and it directly reflects local natural changes and the ecological environment. Accurately grasping the characteristics and laws of precipitation change is very important for effectively reducing disaster losses and maintaining the stable development of the social economy. In order to predict precipitation accurately, a new precipitation prediction model based on an extreme learning machine ensemble (ELME) is proposed. The integrated model is based on extreme learning machines (ELMs) with different kernel functions and supporting parameters, and the submodel with the minimum root mean square error (RMSE) is selected to fit the test data. Due to the complex mechanisms and factors affecting precipitation change, the data have strong uncertainty and significant nonlinear variation characteristics. The mean generating function (MGF) is used to generate the continuation factor matrix, and principal component analysis is employed to reduce the dimension of the continuation matrix and extract the effective data features. Finally, the ELME prediction model is established using the June, July and August precipitation data of Liuzhou city from 1951 to 2021, and a comparative experiment is carried out using ELM, a long short-term memory (LSTM) neural network and a back propagation neural network based on a genetic algorithm (GA-BP). The experimental results show that the prediction accuracy of the proposed method is significantly higher than that of the other models, and that it has high stability and reliability, which provides a reliable method for precipitation prediction.
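The selection step described in this abstract (train several ELM submodels, keep the one with the lowest RMSE) can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a basic ELM with a random, fixed hidden layer and least-squares output weights, it uses different activation functions and hidden-layer sizes as stand-ins for the paper's "kernel functions and supporting parameters", it scores on a held-out validation split, and it runs on synthetic data rather than the Liuzhou precipitation series. The MGF and PCA preprocessing steps are omitted. All function and variable names are hypothetical.

```python
import numpy as np


def train_elm(X, y, n_hidden, activation, rng):
    """Fit a basic ELM: random fixed hidden layer, least-squares output weights."""
    W = rng.normal(size=(X.shape[1], n_hidden))  # random input weights (never trained)
    b = rng.normal(size=n_hidden)                # random hidden biases (never trained)
    H = activation(X @ W + b)                    # hidden-layer output matrix
    beta = np.linalg.pinv(H) @ y                 # output weights via pseudoinverse
    return lambda X_new: activation(X_new @ W + b) @ beta


def rmse(y_true, y_pred):
    """Root mean square error between two 1-D arrays."""
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


def select_best_elm(X_train, y_train, X_val, y_val, candidates, seed=0):
    """Train one ELM per candidate and return the one with the lowest validation RMSE."""
    rng = np.random.default_rng(seed)
    results = []
    for name, activation, n_hidden in candidates:
        model = train_elm(X_train, y_train, n_hidden, activation, rng)
        results.append((rmse(y_val, model(X_val)), name, model))
    best = min(results, key=lambda r: r[0])
    return best, results


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


# Synthetic regression data (an assumption, used only to make the sketch runnable).
data_rng = np.random.default_rng(42)
X = data_rng.uniform(-1, 1, size=(200, 3))
y = np.sin(X[:, 0]) + 0.5 * X[:, 1] ** 2
X_train, X_val = X[:150], X[150:]
y_train, y_val = y[:150], y[150:]

# Hypothetical candidate submodels: (label, activation, hidden units).
candidates = [
    ("sigmoid-20", sigmoid, 20),
    ("tanh-20", np.tanh, 20),
    ("sine-40", np.sin, 40),
]

best, results = select_best_elm(X_train, y_train, X_val, y_val, candidates)
best_rmse, best_name, best_model = best
print(f"selected submodel: {best_name}, validation RMSE = {best_rmse:.4f}")
```

Scoring on a separate validation split, rather than on the final test data, is a design choice of this sketch. It keeps the selection step from leaking information into the reported test error.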

    The interval-valued intuitionistic fuzzy geometric Choquet aggregation operator based on the generalized Banzhaf index and 2-additive measure

    Based on the operational laws on interval-valued intuitionistic fuzzy sets, the generalized Banzhaf interval-valued intuitionistic fuzzy geometric Choquet (GBIVIFGC) operator is proposed, whose aggregated value is also an interval-valued intuitionistic fuzzy value. It is worth pointing out that the GBIVIFGC operator can be seen as an extension of some geometric mean operators. Since the fuzzy measure is defined on the power set, the problem becomes exponentially complex. To reflect the overall interaction among elements and reduce the complexity of solving a fuzzy measure, we further introduce the GBIVIFGC operator w.r.t. 2-additive measures. Furthermore, if the information about the weights of experts and attributes is incompletely known, models for obtaining the optimal 2-additive measures on the criteria set and the expert set are given by using the introduced cross entropy measure and the Banzhaf index. Finally, approaches to pattern recognition and to multi-criteria group decision making under an interval-valued intuitionistic fuzzy environment are developed.
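To make the role of the Banzhaf index and the 2-additive measure more concrete, the following is a minimal sketch. It computes the classical single-element Banzhaf index of a fuzzy measure, i.e. the average marginal contribution of an element over all coalitions of the other elements. It does not implement the paper's generalized Banzhaf index or the GBIVIFGC operator itself. The measure is built from singleton and pair coefficients of its Möbius representation, which is one standard way to specify a 2-additive measure. The criteria names and coefficient values are invented for illustration.

```python
from itertools import combinations


def two_additive_measure(singletons, pairs):
    """Build mu(S) from Moebius coefficients on singletons and pairs only.

    singletons: dict mapping element -> coefficient m({i})
    pairs: dict mapping frozenset({i, j}) -> coefficient m({i, j})
    """
    def mu(subset):
        value = sum(singletons[e] for e in subset)
        value += sum(m for pair, m in pairs.items() if pair <= subset)
        return value
    return mu


def banzhaf_index(mu, elements, i):
    """Classical Banzhaf index: mean of mu(S | {i}) - mu(S) over all S not containing i."""
    others = [e for e in elements if e != i]
    total = 0.0
    for size in range(len(others) + 1):
        for combo in combinations(others, size):
            s = frozenset(combo)
            total += mu(s | {i}) - mu(s)
    return total / 2 ** len(others)


# Hypothetical criteria and coefficients (illustrative values only).
criteria = ["a", "b", "c"]
singletons = {"a": 0.2, "b": 0.3, "c": 0.3}
pairs = {frozenset({"a", "b"}): 0.2}  # positive interaction between a and b

mu = two_additive_measure(singletons, pairs)
indices = {c: banzhaf_index(mu, criteria, c) for c in criteria}
print(indices)
```

The brute-force loop visits every subset of the other elements, so it shows the exponential cost mentioned in the abstract. A 2-additive measure needs only singleton and pair coefficients, which is where the reduction in complexity comes from.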

    Acceleration for Timing-Aware Gate-Level Logic Simulation with One-Pass GPU Parallelism

    Given the growing scale and complexity of chip design and the availability of high-performance computing technologies, the simulation of Very Large Scale Integration (VLSI) circuits increasingly requires acceleration through parallel computing on GPU devices. However, conventional parallel strategies do not fully align with the capabilities of modern GPUs, which creates new challenges for parallel VLSI simulation on GPUs, despite some previous successful demonstrations of significant acceleration. In this paper, we propose a novel approach to accelerate 4-value timing-aware gate-level logic simulation using waveform-based GPU parallelism. Our approach uses a new strategy that effectively handles the dependencies between tasks during parallel execution, reducing the synchronization required between CPU and GPU when parallelizing the simulation of combinational circuits. This approach requires only one round of data transfer and hence achieves one-pass parallelism. Moreover, to overcome the difficulty of adopting our strategy on GPU devices, we design a series of data structures and tune them to dynamically allocate and store newly generated output of unpredictable size. Finally, experiments are carried out on industrial-scale open-source benchmarks to demonstrate the performance gain of our approach compared to several state-of-the-art baselines.

    Spatio-temporal variations and influencing factors of polycyclic aromatic hydrocarbons in atmospheric bulk deposition along a plain-mountain transect in western China

    Ten atmospheric bulk deposition (the sum of wet and dry deposition) samplers for polycyclic aromatic hydrocarbons (PAHs) were deployed along a plain-mountain transect (the PMT transect, from Daying to Qingping) in the Chengdu Plain, West China, from June 2007 to June 2008 over four consecutive seasons (about every three months). The bulk deposition fluxes of ∑15-PAHs ranged from 169.19 μg m−2 yr−1 to 978.58 μg m−2 yr−1, with a geometric mean of 354.22 μg m−2 yr−1. The most prevalent PAHs were 4-ring (39.65%) and 3-ring (35.56%) PAHs. The flux values were comparable to those in rural areas. Higher fluxes of total PAHs were observed in the middle of the PMT transect (SL, YX and JY, which were more urbanized than the other sites). The seasonal deposition fluxes in the sampling profile indicated that the seasonality of the contaminant source was an important factor controlling deposition fluxes. PAH bulk deposition was negatively correlated with meteorological parameters (temperature, wind speed, humidity, and precipitation). No significant correlations between soil concentrations and atmospheric deposition were found along this transect. PAHs in soil samples had combined sources of coal, wood and petroleum combustion, while PAHs in bulk deposition had a simpler source of coal, wood and grass combustion. There was a significant positive correlation (p < 0.05) between annual atmospheric bulk deposition and local PAH emission, with biomass burning as the major contributor to the total emission of PAHs. According to the deposition/emission ratio, this transect acts as an important PAH source rather than a sink. A mountain cold trap effect existed in this transect where the altitude was higher than 1000 m. Long-range transport had an impact on the bulk deposition in summer, and this transect was a source for the Tibetan region only in summer. The forward trajectory analysis showed that most air masses did not undergo long-range transport due to the blocking effect of the surrounding mountains. 
Only a few air masses (<10%) arrived at the eastern and northern regions of China or farther regions via long-range transport.

    Interleukin 6-regulated macrophage polarization controls atherosclerosis-associated vascular intimal hyperplasia

    Vascular intimal hyperplasia (VIH) is an important stage of atherosclerosis (AS), in which macrophages not only play a critical role in local inflammation, but also transform into foam cells to participate in plaque formation, where they appear to be heterogeneous. Recently, it was shown that CD11c+ macrophages were more associated with active plaque progression. However, the molecular regulation of phenotypic changes of plaque macrophages during VIH has not been clarified and was thus addressed in the current study. Since CD11c- cells were M2a-polarized anti-inflammatory macrophages, while CD11c+ cells were M1/M2b-polarized pro-inflammatory macrophages, we used bioinformatics tools to analyze CD11c+ versus CD11c- plaque macrophages, aiming to detect the differential genes associated with M1/M2 macrophage polarization. We obtained 122 differential genes that were significantly altered in CD11c+ versus CD11c- plaque macrophages, regardless of CD11b expression. Next, hub genes were predicted among these 122 genes, from which we detected 3 candidates: interleukin 6 (Il6), Decorin (Dcn) and Tissue inhibitor matrix metalloproteinase 1 (Timp1). The effects of these 3 genes on CD11c expression and on macrophage polarization were assessed in vitro, showing that only expression of Il6, but not expression of Dcn or Timp1, induced M1/M2b-like polarization in M2a macrophages. Moreover, only suppression of Il6, but not suppression of either Dcn or Timp1, induced M2a-like polarization in M1/M2b macrophages. Furthermore, pharmaceutical suppression of Il6 attenuated VIH formation and the progression of AS in a mouse model that combined apolipoprotein E knockout with a high-fat diet. Together, our data suggest that the formation of VIH can be controlled by modulating macrophage polarization, which is a promising therapeutic approach for preventing AS.