11,691 research outputs found

    Social Search with Missing Data: Which Ranking Algorithm?

    Online social networking tools are extremely popular, but can miss potential discoveries latent in the social 'fabric'. Matchmaking services which can do naive profile matching with old database technology are too brittle in the absence of key data, and even modern ontological markup, though powerful, can be onerous at data-input time. In this paper, we present a system called BuddyFinder which can automatically identify buddies who can best match a user's search requirements specified in a term-based query, even in the absence of stored user-profiles. We deploy and compare five statistical measures, namely, our own CORDER, mutual information (MI), phi-squared, improved MI and Z score, and two TF/IDF based baseline methods to find online users who best match the search requirements based on 'inferred profiles' of these users in the form of scavenged web pages. These measures identify statistically significant relationships between online users and a term-based query. Our user evaluation on two groups of users shows that BuddyFinder can find users highly relevant to search queries, and that CORDER achieved the best average ranking correlations among all seven algorithms and improved the performance of both baseline methods
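To make the idea of a "statistically significant relationship" between a user and a query term concrete, here is a minimal sketch of two of the measures named in the abstract, computed from a 2x2 co-occurrence table. It uses the textbook definitions of pointwise mutual information and phi-squared; the abstract does not give the paper's exact formulas, and CORDER is not reproduced here. The function name, the page-level counting scheme, and the example counts are all assumptions made for illustration.

```python
import math


def association_scores(n11, n10, n01, n00):
    """Score the association between a user and a query term.

    Hypothetical counting scheme (an assumption, not from the paper):
      n11: pages mentioning both the user and the term
      n10: pages mentioning the user only
      n01: pages mentioning the term only
      n00: pages mentioning neither
    Returns (pointwise mutual information in bits, phi-squared).
    """
    n = n11 + n10 + n01 + n00
    user_total, user_absent = n11 + n10, n01 + n00
    term_total, term_absent = n11 + n01, n10 + n00
    if 0 in (user_total, user_absent, term_total, term_absent):
        raise ValueError("every row and column of the table must be non-empty")

    # PMI = log2( P(user, term) / (P(user) * P(term)) )
    if n11 > 0:
        pmi = math.log2((n11 * n) / (user_total * term_total))
    else:
        pmi = float("-inf")

    # phi^2 = (n11*n00 - n10*n01)^2 / (product of the four marginal totals)
    phi2 = (n11 * n00 - n10 * n01) ** 2 / (
        user_total * user_absent * term_total * term_absent
    )
    return pmi, phi2


# Made-up counts: the user and the term co-occur more often than chance.
pmi, phi2 = association_scores(10, 10, 10, 70)
```

Ranking candidate users by such a score, one query term at a time, is one plausible reading of how a term-based query could be matched against scavenged web pages; the paper itself should be consulted for the actual procedure.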

    Mining Helpdesk Databases For Professional Development Topic Discovery

    This single-site, instrumental case study created and tested a methodological road map by which academic institutions can use text data mining techniques to derive technology skillset weaknesses and professional development topics from the site’s technical support helpdesk database. The methods employed were described in detail and applied to the helpdesk database of an independent, co-educational boarding high school in the northeastern United States. Standard text data mining procedures, including the formation of a wordlist (frequently occurring terms), and the creation and application of clustering (automated data grouping) and classification (automated data labeling) models generated meaningful and revealing themes from the helpdesk database. The results of the text mining procedures were bolstered and analyzed using human interpretation and spreadsheet-based summaries. Major findings included the discovery of four prominent technologies that warranted professional development at the site and a universally-applicable approach to undertaking successful helpdesk data mining endeavors. The case study’s conclusions included a call to action for researchers to leverage the methodology at other locations. Future data mining studies may yield practical and applicable knowledge at research sites. Shared methods, approaches, and findings from such studies will advance the field of helpdesk data mining used to glean professional development topics for the very people who have submitted technological support requests to helpdesk providers
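The wordlist step mentioned in the abstract (counting frequently occurring terms) can be sketched in a few lines. This is an illustrative assumption about what such a step might look like, not the study's actual tooling: the ticket texts, the stopword set, and the function name `build_wordlist` are all hypothetical.

```python
import re
from collections import Counter

# Small illustrative stopword set (an assumption; real studies use fuller lists).
STOPWORDS = {"the", "a", "to", "my", "is", "on", "in", "and", "i", "not", "for", "of"}


def build_wordlist(tickets, top_n=5):
    """Return the top_n most frequent non-stopword terms across ticket texts."""
    counts = Counter()
    for text in tickets:
        # Lowercase and keep alphabetic tokens only.
        tokens = re.findall(r"[a-z]+", text.lower())
        counts.update(t for t in tokens if t not in STOPWORDS)
    return counts.most_common(top_n)


# Hypothetical helpdesk tickets, invented for this example.
tickets = [
    "Printer is not printing",
    "Cannot connect to printer",
    "Projector not connecting to laptop",
]
wordlist = build_wordlist(tickets, top_n=3)
```

A wordlist like this is typically only a starting point; the clustering and classification models described in the abstract would operate on richer features derived from the same tokens.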

    Web Data Extraction, Applications and Techniques: A Survey

    Web Data Extraction is an important problem that has been studied by means of different scientific tools and in a broad range of applications. Many approaches to extracting data from the Web have been designed to solve specific problems and operate in ad-hoc domains. Other approaches, instead, heavily reuse techniques and algorithms developed in the field of Information Extraction. This survey aims at providing a structured and comprehensive overview of the literature in the field of Web Data Extraction. We provide a simple classification framework in which existing Web Data Extraction applications are grouped into two main classes, namely applications at the Enterprise level and at the Social Web level. At the Enterprise level, Web Data Extraction techniques emerge as a key tool to perform data analysis in Business and Competitive Intelligence systems as well as for business process re-engineering. At the Social Web level, Web Data Extraction techniques make it possible to gather a large amount of structured data continuously generated and disseminated by Web 2.0, Social Media and Online Social Network users, and this offers unprecedented opportunities to analyze human behavior at a very large scale. We also discuss the potential of cross-fertilization, i.e., the possibility of re-using Web Data Extraction techniques originally designed to work in a given domain, in other domains. Comment: Knowledge-based System

    A Feasibility Study of Azure Machine Learning for Sheet Metal Fabrication

    The research demonstrated that sheet metal fabrication machines can utilize machine learning to gain competitive advantage. Among the various possible applications of machine learning, it was decided to focus on predictive maintenance. The predictive service was implemented with Microsoft Azure Machine Learning. The aim was to demonstrate to the stakeholders at the case company the potential lying in machine learning. It was found that although machine learning technologies are founded on sophisticated algorithms and mathematics, they can still be utilized and bring benefits with moderate effort. The significance of this study lies in demonstrating the potential of machine learning for improving operations management, especially for sheet metal fabrication machines.

    The Perils and Promises of Big Data Research in Information Systems

    With the proliferation of “big data” and powerful analytical techniques, information systems (IS) researchers are increasingly engaged in what we label as big data research (BDR)—research based on large digital trace datasets and computationally intensive methods. The number of such research papers has been growing rapidly in the top IS journals during the last decade, with roughly 16% of papers in 2018 employing this approach. In this editorial, we propose five conjectures that articulate the potential consequences of increasing BDR prevalence for the IS field’s research goals and outputs. We discuss ways in which IS researchers may be able to better leverage big data and new analysis techniques to conduct more impactful research. Our intent with these conjectures and analyses is to stimulate debate in the IS community. Indeed, we need a productive discussion about how emerging new research methods, digital trace data, and the development of indigenous theory relate to and can support one another

    A Systematic Review of Computational Methods in and Research Taxonomy of Homophily in Information Systems

    Homophily is both a principle for social group formation with like-minded people and a mechanism for social interactions. Recent years have seen a growing body of management research on homophily, particularly on large-scale social media and digital platforms. However, the predominant traditional qualitative and quantitative methods employed face validity issues and/or are not well-suited for big social data. There are scant guidelines for applying computational methods to specific research domains concerning descriptive patterns, explanatory mechanisms, or predictive indicators of homophily. To fill this research gap, this paper offers a structured review of the emerging literature on computational social science approaches to homophily with a particular emphasis on their relevance, appropriateness, and importance to information systems research. We derive a research taxonomy for homophily and offer methodological reflections and recommendations to help inform future research

    Novel methods for multi-view learning with applications in cyber security

    Modern data is complex. It exists in many different forms, shapes and kinds. Vectors, graphs, histograms, sets, intervals, etc.: they each have distinct and varied structural properties. Tailoring models to the characteristics of various feature representations has been the subject of considerable research. In this thesis, we address the challenge of learning from data that is described by multiple heterogeneous feature representations. This situation arises often in cyber security contexts. Data from a computer network can be represented by a graph of user authentications, a time series of network traffic, a tree of process events, etc. Each representation provides a complementary view of the holistic state of the network, and so data of this type is referred to as multi-view data. Our motivating problem in cyber security is anomaly detection: identifying unusual observations in a joint feature space, which may not appear anomalous marginally. Our contributions include the development of novel supervised and unsupervised methods, which are applicable not only to cyber security but to multi-view data in general. We extend the generalised linear model to operate in a vector-valued reproducing kernel Hilbert space implied by an operator-valued kernel function, which can be tailored to the structural characteristics of multiple views of data. This is a highly flexible algorithm, able to predict a wide variety of response types. A distinguishing feature is the ability to simultaneously identify outlier observations with respect to the fitted model. Our proposed unsupervised learning model extends multidimensional scaling to directly map multi-view data into a shared latent space. This vector embedding captures both commonalities and disparities that exist between multiple views of the data. Throughout the thesis, we demonstrate our models using real-world cyber security datasets.