431 research outputs found

    Towards Name Disambiguation: Relational, Streaming, and Privacy-Preserving Text Data

    In the real world, our DNA is unique but many people share names. This phenomenon often causes erroneous aggregation of documents of multiple persons who are namesakes of one another. Such mistakes degrade the performance of document retrieval and web search and, more seriously, cause improper attribution of credit or blame in digital forensics. To resolve this issue, the name disambiguation task is designed to partition the documents associated with a name reference such that each partition contains documents pertaining to a unique real-life person. Existing algorithms for this task mainly suffer from the following drawbacks. First, the majority of existing solutions rely substantially on feature engineering, such as biographical feature extraction or construction of auxiliary features from Wikipedia. However, in many scenarios such features may be costly to obtain, or unavailable in privacy-sensitive domains. Instead, we solve the name disambiguation task in a restricted setting by leveraging only relational data in the form of anonymized graphs. Second, most existing works for this task operate in a batch mode, where all records to be disambiguated are initially available to the algorithm. However, more realistic settings require that name disambiguation be performed in an online streaming fashion in order to identify records of new ambiguous entities having no preexisting records. Finally, we investigate the potential disclosure risk of textual features used in name disambiguation and propose several algorithms to tackle the task in a privacy-aware scenario. In summary, in this dissertation, we present a number of novel approaches that address name disambiguation from these three aspects independently, namely relational, streaming, and privacy-preserving textual data.

    On relational learning and discovery in social networks: a survey

    The social networking scene has evolved tremendously over the years. It has grown in relational complexity and now extends across popular social media platforms on the internet. With the advance of sentimental computing and social complexity, relationships which were once thought to be simple have now become multi-dimensional and widespread in the online scene. This explosion in the online social scene has attracted much research attention. The main aims of this work revolve around the knowledge discovery and data mining processes of these feature-rich relations. In this paper, we provide a survey of relational learning and discovery through popular social analysis of different structure types which are integral to applications within the emerging field of sentimental and affective computing. It is hoped that this contribution will add to the clarity of how social networks are analyzed with the latest groundbreaking methods and provide certain directions for future improvements.

    An evaluation of identity in online social networking: distinguishing fact from fiction

    Online social networks are understood to replicate the real life connections between people. As the technology matures, more people are joining social networking communities such as MySpace (www.myspace.com) and Facebook (www.facebook.com). These online communities provide the opportunity for individuals to present themselves and maintain social interactions through their profiles. Such traces in profiles can be used as evidence in deciding the level of trust with which to imbue individuals in making access control decisions. However, online profiles have serious implications over the reality of identity disclosure. There are many reasons why someone may choose not to reveal their true self, which sometimes leads to misidentification or deception. On one hand, the structure of online profiles allows anonymity, which gives users the opportunity to create a persona that may not represent their true identity. On the other hand, we often play multiple identities in different contexts where such behaviour is acceptable. However, realizing the context for each identity representation depends on the individual. As a result, some represented identities will be essentially real, if edited for public view, some will be disguised, and others will be fictitious or humorous. The millions of social network profiles, and billions of connections between them, make it difficult to formalize an automated approach to differentiate fact from fiction in online self-described identities. How can we be sure with whom we are interacting, and whether these individuals or groups are being truthful with the online identities they present to the rest of the community? What tools and techniques can be used to gather, organize, and explore the available data for informing the level of honesty that should be entrusted to an individual? Can we verify the validity of the identity automatically, based on the available information online? 
We aim to evaluate identity representation online and examine how identity can be verified in a less trusted online community. We propose a personality classifier model to identify a user's personality (such as expressive, valid, active, positive, popular, sociable and traceable) using traces of 2.2 million profile features collected from MySpace. We use data mining techniques and social network analysis to extract significant patterns in the data and network structure, and improve the classifier during the cycle of development. We evaluate our classifier model on profiles with known identities such as 'real' and 'fake'. Our results indicate that by utilizing people's online, self-reported information, personality, and their network of friends and interactions, we are able to provide evidence for validating the type of identity in a manner that is both accurate and scalable.

    Standards as a driving force that influences emerging technological trajectories in the converging world of the Internet and things: An investigation of the M2M/IoT patent network

    While standards are said to create windows of opportunity by facilitating technological convergence, it is not clear how they affect technological trajectories and the strategic choices of firms in the face of convergence and in the process of catch-up. There is little research on the relationship between standards and technological trajectories, particularly in the age of convergence. This paper investigates how standards shape the emerging M2M/IoT technological trajectory and influence convergence in terms of technological importance and diversity. First, we found that standards are a driving force of technological convergence. Second, 3GPP standards assume a crucial role in setting the boundary conditions of M2M/IoT technological systems. Third, we identified strategic groups and strategic patents centered around the M2M/IoT trajectory. Fourth, standards serve as an important factor in the process of creating a new path for catch-up firms (e.g. Huawei). These findings contribute to innovation and standards studies by empirically examining the relationship between technological trajectories and standards. Furthermore, they cast light on ongoing cooperation and competition along the M2M/IoT trajectory, and offer practical implications for catch-up strategies.

    A finder and representation system for knowledge carriers based on granular computing

    In one of his publications Aristotle states "All human beings by their nature desire to know" [Kraut 1991]. This desire is initiated the day we are born and accompanies us for the rest of our life. While at a young age our parents serve as one of the principal sources of knowledge, this changes over the course of time. Technological advances, and particularly the introduction of the Internet, have given us new possibilities to share and access knowledge from almost anywhere at any given time. Being able to access and share large collections of written-down knowledge is only one part of the equation. Just as important is the internalization of it, which in many cases can prove difficult to accomplish. Hence, being able to request assistance from someone who holds the necessary knowledge is of great importance, as it can positively stimulate the internalization procedure. However, digitalization provides not only a larger pool of knowledge sources to choose from but also more people who can potentially be called upon to give personalized assistance with a given problem statement or question. While this is beneficial, it makes it hard to keep track of who knows what. For this task, so-called Expert Finder Systems have been introduced, which are designed to identify and suggest the most suited candidates to provide assistance. Throughout this Ph.D. thesis, a novel type of Expert Finder System will be introduced that is capable of capturing the knowledge that users within a community hold, from explicit and implicit data sources. This is accomplished with the use of granular computing, natural language processing and a set of metrics that have been introduced to measure and compare the suitability of candidates. Furthermore, the knowledge requirements of a problem statement or question are assessed, in order to ensure that only the most suited candidates are recommended to provide assistance.

    University patenting, licensing and technology transfer: how organizational context and available resources determine performance.

    The paper assesses the performance of the technology licensing offices (TLO) and technology transfer offices (TTO) which have been active in Portuguese higher education institutions. Data stemming from a survey of these entities was analyzed in successive steps through factor analysis, cluster analysis and estimation of a model using the Partial Least Squares methodology. It is shown that the institutional nature of each of the surveyed organizations implies different behaviours and outcomes. Further, it has also become clear that the type of resources and activities in the surveyed organizations determine both their "primary outcome" (patent applications and technology transfer processes) and their "final outcome" (technology licensing contracts and technology-based spin-offs). The results of this paper might be particularly relevant for other economies similar to Portugal, where high-tech and knowledge-intensive industries have not been dominant. Keywords: technology transfer; university-industry relationships; university patenting; university spin-offs

    Security in Data Mining- A Comprehensive Survey

    Data mining techniques, while allowing individuals to extract hidden knowledge on the one hand, introduce a number of privacy threats on the other. In this paper, we study some of these issues along with a detailed discussion of the applications of various data mining techniques for providing security. An efficient classification technique, when used properly, would allow a user to differentiate between a phishing website and a normal website, to classify users as normal users or criminals based on their activities on social networks (crime profiling), and to prevent users from executing malicious code by labelling it as malicious. One of the most important applications of data mining is the detection of intrusions, where different data mining techniques can be applied to detect an intrusion effectively and report it in real time so that necessary actions are taken to thwart the attempts of the intruder. Privacy Preservation, Outlier Detection, Anomaly Detection and Phishing Website Classification are discussed in this paper.

    Identifying experts and authoritative documents in social bookmarking systems

    Social bookmarking systems allow people to create pointers to Web resources in a shared, Web-based environment. These services allow users to add free-text labels, or "tags", to their bookmarks as a way to organize resources for later recall. Ease of use, low cognitive barriers, and a lack of controlled vocabulary have allowed social bookmarking systems to grow exponentially over time. However, these same characteristics also raise concerns. Tags lack the formality of traditional classificatory metadata and suffer from the same vocabulary problems as full-text search engines. It is unclear how many valuable resources are untagged or tagged with noisy, irrelevant tags. With few restrictions to entry, annotation spamming adds noise to public social bookmarking systems. Furthermore, many algorithms for discovering semantic relations among tags do not scale to the Web. Recognizing these problems, we develop a novel graph-based Expert and Authoritative Resource Location (EARL) algorithm to find the most authoritative documents and expert users on a given topic in a social bookmarking system. In EARL's first phase, we reduce noise in a Delicious dataset by isolating a smaller sub-network of "candidate experts", users whose tagging behavior shows potential domain and classification expertise. In the second phase, a HITS-based graph analysis is performed on the candidate experts' data to rank the top experts and authoritative documents by topic. To identify topics of interest in Delicious, we develop a distributed method to find subsets of frequently co-occurring tags shared by many candidate experts. We evaluated EARL's ability to locate authoritative resources and domain experts in Delicious by conducting two independent experiments. The first experiment relies on human judges' n-point scale ratings of resources suggested by three graph-based algorithms and Google.
The second experiment evaluated the proposed approach's ability to identify classification expertise through human judges' n-point scale ratings of classification terms versus expert-generated data.
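
The abstract names a HITS-based graph analysis as EARL's second phase but gives no detail of it. As a rough illustration only, the sketch below runs textbook HITS on a bipartite graph in which each edge is one bookmark from a user to a document, so that hub scores stand in for "experts" and authority scores for "authoritative documents". The function name `hits`, the edge format, the iteration count and the sample data are all assumptions made for this example, not taken from the EARL work, and none of EARL's candidate-expert filtering or topic discovery is reproduced.

```python
from math import sqrt


def hits(edges, iterations=50):
    """Textbook HITS by power iteration on a user -> document graph.

    edges: iterable of (user, document) pairs, one per bookmark
           (hypothetical input format chosen for this sketch).
    Returns (hub_scores, authority_scores), each L2-normalised.
    """
    edges = list(edges)
    hubs = {user: 1.0 for user, _ in edges}
    auths = {doc: 1.0 for _, doc in edges}

    for _ in range(iterations):
        # Authority update: sum the hub scores of users who bookmarked the document.
        new_auths = {doc: 0.0 for doc in auths}
        for user, doc in edges:
            new_auths[doc] += hubs[user]
        norm = sqrt(sum(v * v for v in new_auths.values())) or 1.0
        auths = {doc: v / norm for doc, v in new_auths.items()}

        # Hub update: sum the authority scores of documents the user bookmarked.
        new_hubs = {user: 0.0 for user in hubs}
        for user, doc in edges:
            new_hubs[user] += auths[doc]
        norm = sqrt(sum(v * v for v in new_hubs.values())) or 1.0
        hubs = {user: v / norm for user, v in new_hubs.items()}

    return hubs, auths


# Made-up bookmarks for illustration: doc1 is bookmarked by all three users.
bookmarks = [
    ("alice", "doc1"), ("alice", "doc2"),
    ("bob", "doc1"),
    ("carol", "doc1"), ("carol", "doc3"),
]
hub_scores, auth_scores = hits(bookmarks)
top_document = max(auth_scores, key=auth_scores.get)
```

In this toy input, a user who bookmarks more well-regarded documents earns a higher hub score, and a document bookmarked by stronger hubs earns a higher authority score; ranking either dictionary by value gives the ordering that a per-topic analysis of this kind would report.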