15 research outputs found

    A New Approach for Trust Prediction by using collaborative filtering based of Pareto dominance in Social Networks

    With the increasing popularity of social websites, users rely more on trustworthiness information for many online activities [24]. However, such social network data often suffers from two problems: (1) severe data sparsity, so that it cannot provide users with enough information, and (2) very large dataset size. Therefore, trust prediction has emerged as an important topic in social network research. In this paper we propose a new approach using collaborative filtering and the concept of Pareto dominance. We use Pareto dominance to perform a pre-filtering process that eliminates less representative users from the k-neighbour selection process while retaining the most promising ones. The results from experiments performed on FilmTrust dataset and Epinions dataset
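
The Pareto-dominance pre-filtering step mentioned in this abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the distance measure (absolute rating difference on items rated by the active user), the treatment of unrated items as infinitely distant, and the names `dominates` and `non_dominated` are all assumptions made for illustration.

```python
from typing import Dict, List

Ratings = Dict[str, float]  # item id -> rating


def dominates(active: Ratings, u: Ratings, v: Ratings) -> bool:
    """Return True if u Pareto-dominates v with respect to the active user.

    Assumption: u dominates v when u's rating is at least as close to the
    active user's rating as v's on every item the active user rated, and
    strictly closer on at least one. A missing rating counts as infinitely far.
    """
    strictly_closer = False
    for item, target in active.items():
        dist_u = abs(u[item] - target) if item in u else float("inf")
        dist_v = abs(v[item] - target) if item in v else float("inf")
        if dist_u > dist_v:
            return False  # u is worse than v on this item
        if dist_u < dist_v:
            strictly_closer = True
    return strictly_closer


def non_dominated(active: Ratings, candidates: Dict[str, Ratings]) -> List[str]:
    """Keep only candidates that no other candidate dominates."""
    return [
        name
        for name, ratings in candidates.items()
        if not any(
            dominates(active, other, ratings)
            for other_name, other in candidates.items()
            if other_name != name
        )
    ]
```

Only the users returned by `non_dominated` would then enter the k-neighbour selection, which shrinks the candidate set before any similarity computation.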

    Thai word segmentation on social networks with time sensitivity

    Social network services such as Twitter have had a huge impact on Thai culture. They have changed the behavior of many Thai people from using televisions to using computers or smartphones regularly. Thai people also share their experiences and get information such as news on social networks. With the increasing number of micro-blog messages that originate and are discussed on social networks, Thai word segmentation is becoming a compelling research issue, as it is an important task in natural language processing. However, existing Thai segmentation approaches are not designed to deal with short and noisy messages like tweets. In this paper, we propose a Thai word segmentation approach for social networks that exploits both the local context (in tweets) and the global context from Thai Wikipedia. We evaluate our approach on a real-world Twitter dataset. Our experiments show that the proposed approach segments Twitter messages more effectively than the baseline

    From Group Recommendations to Group Formation

    There has been significant recent interest in the area of group recommendations, where, given groups of users of a recommender system, one wants to recommend top-k items to a group that maximize the satisfaction of the group members, according to a chosen semantics of group satisfaction. Example semantics of satisfaction of a recommended itemset to a group include the so-called least misery (LM) and aggregate voting (AV). We consider the complementary problem of how to form groups such that the users in the formed groups are most satisfied with the suggested top-k recommendations. We assume that the recommendations will be generated according to one of the two group recommendation semantics, LM or AV. Rather than assuming groups are given, or relying on ad hoc group formation dynamics, our framework allows a strategic approach for forming groups of users in order to maximize satisfaction. We show that the problem is NP-hard to solve optimally under both semantics. Furthermore, we develop two efficient algorithms for group formation under LM and show that they achieve bounded absolute error. We develop efficient heuristic algorithms for group formation under AV. We validate our results and demonstrate the scalability and effectiveness of our group formation algorithms on two large real data sets. Comment: 14 pages, 22 figures
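
The two group-satisfaction semantics named in this abstract can be sketched in a few lines. This is an illustrative sketch under the common definitions (LM scores an item by the lowest rating among group members, AV by the sum of their ratings), not the paper's code; the function names, the nested-dict rating layout, and the tie-break by item id are assumptions.

```python
from typing import Callable, Dict, List

Ratings = Dict[str, Dict[str, float]]  # user id -> item id -> rating


def least_misery(ratings: Ratings, group: List[str], item: str) -> float:
    """LM: the group's score for an item is its least satisfied member's rating."""
    return min(ratings[user][item] for user in group)


def aggregate_voting(ratings: Ratings, group: List[str], item: str) -> float:
    """AV: the group's score for an item is the sum of its members' ratings."""
    return sum(ratings[user][item] for user in group)


def top_k(
    ratings: Ratings,
    group: List[str],
    items: List[str],
    k: int,
    score: Callable[[Ratings, List[str], str], float],
) -> List[str]:
    """Return the k best items for the group; ties are broken by item id."""
    ranked = sorted(items, key=lambda item: (-score(ratings, group, item), item))
    return ranked[:k]
```

The same group can receive different recommendations under the two semantics, which is why group formation has to be analysed separately for LM and AV.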

    Event detection in social networks

    Scalable Data Integration for Linked Data

    Linked Data describes an extensive set of structured but heterogeneous data sources where entities are connected by formal semantic descriptions. In the vision of the Semantic Web, these semantic links are extended towards the World Wide Web to provide as much machine-readable data as possible for search queries. The resulting connections allow an automatic evaluation to find new insights into the data. Identifying these semantic connections between two data sources with automatic approaches is called link discovery. We derive common requirements and a generic link discovery workflow based on similarities between entity properties and associated properties of ontology concepts. Most of the existing link discovery approaches disregard the fact that in times of Big Data, an increasing volume of data sources poses new demands on link discovery. In particular, the problem of complex and time-consuming link determination escalates with an increasing number of intersecting data sources. To overcome the restriction of pairwise linking of entities, holistic clustering approaches are needed to link equivalent entities of multiple data sources to construct integrated knowledge bases. In this context, the focus on efficiency and scalability is essential. For example, reusing existing links or background information can help to avoid redundant calculations. However, when dealing with multiple data sources, additional data quality problems must also be dealt with. This dissertation addresses these comprehensive challenges by designing holistic linking and clustering approaches that enable reuse of existing links. Unlike previous systems, we execute the complete data integration workflow via a distributed processing system. At first, the LinkLion portal will be introduced to provide existing links for new applications. These links act as a basis for a physical data integration process to create a unified representation for equivalent entities from many data sources.
We then propose a holistic clustering approach to form consolidated clusters for same real-world entities from many different sources. At the same time, we exploit the semantic type of entities to improve the quality of the result. The process identifies errors in existing links and can find numerous additional links. Additionally, the entity clustering has to react to the high dynamics of the data. In particular, this requires scalable approaches for continuously growing data sources with many entities as well as additional new sources. Previous entity clustering approaches are mostly static, focusing on the one-time linking and clustering of entities from few sources. Therefore, we propose and evaluate new approaches for incremental entity clustering that supports the continuous addition of new entities and data sources. To cope with the ever-increasing number of Linked Data sources, efficient and scalable methods based on distributed processing systems are required. Thus we propose distributed holistic approaches to link many data sources based on a clustering of entities that represent the same real-world object. The implementation is realized on Apache Flink. In contrast to previous approaches, we utilize efficiency-enhancing optimizations for both distributed static and dynamic clustering. An extensive comparative evaluation of the proposed approaches with various distributed clustering strategies shows high effectiveness for datasets from multiple domains as well as scalability on a multi-machine Apache Flink cluster