360 research outputs found

    ReferralWeb--a resource location system guided by personal relations

    Thesis (M.Eng.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 1997. Includes bibliographical references (p. 47-[48]). By Mehul A. Shah, M.Eng.

    Web and Semantic Web Query Languages

    A number of techniques have been developed to facilitate powerful data retrieval on the Web and Semantic Web. Three categories of Web query languages can be distinguished, according to the format of the data they can retrieve: XML, RDF and Topic Maps. This article introduces the spectrum of languages falling into these categories and summarises their salient aspects. The languages are introduced using common sample data and query types. Key aspects of the query languages considered are stressed in the conclusion.
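
    The XML category above can be illustrated with the restricted XPath subset built into Python's standard library; the element names and data here are invented for illustration, and full XQuery or SPARQL engines are far more expressive.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data in the spirit of the article's common examples.
doc = ET.fromstring("""
<catalogue>
  <book year="1997"><title>ReferralWeb</title></book>
  <book year="2008"><title>Semantic Web Primer</title></book>
</catalogue>
""")

# ElementTree supports only a limited XPath subset, e.g. child steps
# and attribute predicates.
titles = [b.findtext("title") for b in doc.findall("book")]
recent = [b.findtext("title") for b in doc.findall(".//book[@year='2008']")]
```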

    Utilising Target Adjacency Information for Multi-target Prediction

    In this paper, we explored how information on the cost of misprediction can be used to train supervised learners for multi-target prediction (MTP). In particular, our work uses depression, anxiety and stress severity level prediction as the case study. MTP describes problems that require the concurrent prediction of multiple targets. There is an increasing number of practical applications that involve MTP, including global weather forecasting, social network users’ interaction and the thriving of different species in a single habitat. Recent work in MTP suggests the utilization of “side information” to improve prediction performance. Side information has been used in other areas, such as recommender systems, information retrieval and computer vision. Existing side information includes matrices, rules, feature representations, etc. In this work, we review very recent work on MTP with side information and propose the use of knowledge on the cost of incorrect prediction as side information. We apply this notion in predicting the depression, anxiety and stress of 270,322 anonymous respondents to the DASS-21 psychometric scale in Malaysia. Predicting depression, anxiety and stress based on the DASS-21 fits an MTP problem well: a patient often experiences anxiety and depression at the same time, and the two have been found to co-exist to different degrees depending on the patient’s experience. Using existing machine learning algorithms to predict the severity levels of each category (i.e., depression, anxiety and stress), the results show improved precision when the cost matrix is used as side information in MTP.
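
    The role of a cost matrix as side information can be sketched as a cost-sensitive decision rule. This is a minimal illustration, not the authors' method: the three severity levels and the cost values below are hypothetical.

```python
# Hypothetical misprediction costs (rows = true severity, columns =
# predicted severity); under-predicting severe cases is penalised most.
COST = [
    [0, 1, 2],   # true: normal
    [3, 0, 1],   # true: moderate
    [6, 3, 0],   # true: severe
]

def min_cost_class(probs):
    """Return the class index minimising expected cost under COST,
    given a classifier's predicted class probabilities."""
    n = len(COST)
    expected = [sum(probs[t] * COST[t][p] for t in range(n)) for p in range(n)]
    return expected.index(min(expected))

# The classifier says "normal" (index 0) is most likely, but the
# cost-weighted decision hedges towards "moderate" (index 1).
choice = min_cost_class([0.5, 0.3, 0.2])
```

    The point of the sketch is that the argmax of the probabilities and the argmin of the expected cost can disagree, which is exactly where the side information changes the prediction.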

    Analysis of nutrition data by means of a matrix factorization method

    We present a factorization framework to analyze the data of a regression learning task with two peculiarities. First, inputs can be split into two parts that represent semantically significant entities. Second, the performance of regressors is very low. The basic idea of the approach presented here is to try to learn the ordering relations of the target variable instead of its exact value. Each part of the input is mapped into a common Euclidean space in such a way that the distance in the common space is the representation of the interaction of both parts of the input. The factorization approach obtains reliable models from which it is possible to compute a ranking of the features according to their responsibility in the variation of the target variable. Additionally, the Euclidean representation of data provides a visualization where metric properties have a clear semantics. We illustrate the approach with a case study: the analysis of a dataset about the variations of Body Mass Index for Age of children after a Food Aid Program deployed in poor rural communities in Southern México. In this case, the two parts of inputs are the vectorial representation of children and their diets. In addition to discovering latent information, the mapping of inputs allows us to visualize children and diets in a common metric space.
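
    The mapping of the two input parts into a common Euclidean space can be sketched as follows. The dimensions and the linear maps `Wc` and `Wd` are hypothetical; in the approach described above they would be learned from the ordering relations of the target (e.g. with a pairwise ranking loss), not drawn at random.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions: children described by 4 features, diets by 3,
# both mapped into a common 2-D Euclidean space by linear maps Wc and Wd.
Wc = rng.normal(size=(2, 4))
Wd = rng.normal(size=(2, 3))

def interaction(child, diet):
    """Score a (child, diet) pair as the negative squared distance in the
    common space; closer pairs represent a stronger interaction."""
    u, v = Wc @ child, Wd @ diet
    return -float(np.sum((u - v) ** 2))
```

    With such a scoring function, training only needs pairs to be ordered consistently with the observed target variations, which matches the idea of learning ordering relations rather than exact values.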

    Reporting serendipity in biomedical research literature : a mixed-methods analysis

    As serendipity is an unexpected, anomalous, or inconsistent observation that culminates in a valuable, positive outcome (McCay-Peet & Toms, 2018, pp. 4–6), it can be inferred that effectively supporting serendipity will result in a greater incidence of the desired positive outcomes (McCay-Peet & Toms, 2018, p. 22). In order to effectively support serendipity, however, we must first understand the overall process or experience of serendipity and the factors influencing its attainment. Currently, our understanding and models of the serendipitous experience are based almost exclusively on example collections, compilations of examples of serendipity that authors and researchers have collected as they encounter them (Gries, 2009, p. 9). Unfortunately, reliance on such collections can lead to an over-representation of more vivid and dramatic examples and possible underrepresentation of more common, but less noticeable, exemplars. By applying the principles of corpus research, which involves electronic compilation of examples in existing documents, we can alleviate this problem and obtain a more balanced and representative understanding of serendipitous experiences (Gries, 2009). This three-article dissertation describes the phenomenon of serendipity, as it is recorded in biomedical research articles indexed in the PubMed Central database, in a way that might inform the development of machine compilation systems for the support of serendipity. Within this study, serendipity is generally defined as a process or experience that begins with encountering some type of information. That information is subsequently analyzed and further pursued by an individual with related knowledge, skills, and understanding, and, finally, allows them to realize a valuable outcome. The information encounter that initiates the serendipity experience exhibits qualities of unexpectedness as well as value for the user. 
In this mixed-methods study, qualitative content analysis, supported by natural language processing and combined with statistical analysis, is applied to gain a robust understanding of the phenomenon of serendipity that may reveal features of serendipitous experience useful to the development of recommender system algorithms. Includes bibliographical references.

    Contributions to probabilistic non-negative matrix factorization - Maximum marginal likelihood estimation and Markovian temporal models

    Non-negative matrix factorization (NMF) has become a popular dimensionality reduction technique, and has found applications in many different fields, such as audio signal processing, hyperspectral imaging, or recommender systems. In its simplest form, NMF aims at finding an approximation of a non-negative data matrix (i.e., with non-negative entries) as the product of two non-negative matrices, called the factors. One of these two matrices can be interpreted as a dictionary of characteristic patterns of the data, and the other one as activation coefficients of these patterns. This low-rank approximation is traditionally retrieved by optimizing a measure of fit between the data matrix and its approximation. As it turns out, for many choices of measures of fit, the problem can be shown to be equivalent to the joint maximum likelihood estimation of the factors under a certain statistical model describing the data. This leads us to an alternative paradigm for NMF, where the learning task revolves around probabilistic models whose observation density is parametrized by the product of non-negative factors. This general framework, coined probabilistic NMF, encompasses many well-known latent variable models of the literature, such as models for count data. In this thesis, we consider specific probabilistic NMF models in which a prior distribution is assumed on the activation coefficients, but the dictionary remains a deterministic variable. The objective is then to maximize the marginal likelihood in these semi-Bayesian NMF models, i.e., the joint likelihood integrated over the activation coefficients. This amounts to learning the dictionary only; the activation coefficients may be inferred in a second step if necessary. We proceed to study in greater depth the properties of this estimation process. In particular, two scenarios are considered. In the first one, we assume the independence of the activation coefficients sample-wise. Previous experimental work showed that dictionaries learned with this approach exhibited a tendency to automatically regularize the number of components, a favorable property which was left unexplained. In the second one, we lift this standard assumption, and consider instead Markov structures to add statistical correlation to the model, in order to better analyze temporal data.
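
    The simplest form of NMF described above, optimizing a measure of fit between the data and the factor product, can be sketched with the classical Lee-Seung multiplicative updates for the Frobenius fit. This is only a baseline illustration with made-up random data; the thesis's maximum marginal likelihood estimation is considerably more involved.

```python
import numpy as np

rng = np.random.default_rng(42)

V = rng.random((6, 8))   # non-negative data matrix
K = 3                    # rank of the approximation
W = rng.random((6, K))   # dictionary of characteristic patterns
H = rng.random((K, 8))   # activation coefficients

eps = 1e-9               # guards against division by zero
for _ in range(200):
    # Multiplicative updates preserve non-negativity of both factors
    # and monotonically decrease the Frobenius measure of fit.
    H *= (W.T @ V) / (W.T @ W @ H + eps)
    W *= (V @ H.T) / (W @ H @ H.T + eps)

error = np.linalg.norm(V - W @ H)
```

    In the probabilistic reading, this Frobenius fit corresponds to joint maximum likelihood under an additive Gaussian observation model; other measures of fit (e.g. the KL divergence) correspond to other observation densities, such as Poisson models for count data.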