353 research outputs found

    Towards trust-aware recommendations in social networks

    Recommender systems have been heavily researched within the last decade. With the emergence and popularization of social networks, a new field has opened for social recommendations. Introducing new concepts such as trust and considering the network topology are some of the new strategies that recommender systems must adopt to suit these new scenarios. In this thesis, a simple model for recommendations on Twitter is developed to apply some of the known techniques and explore how well the state of the art performs in a real scenario. The thesis can serve as a basis for further social recommender system research.
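To make the trust-aware idea concrete, here is a minimal, purely illustrative sketch of trust-weighted rating aggregation; the users, ratings, and trust scores are invented, and this is not the thesis's actual model:

```python
# Trust-weighted rating prediction: a neighbour's rating counts in
# proportion to how much the target user trusts that neighbour.
def predict_rating(target_user, item, ratings, trust):
    num = den = 0.0
    for neighbour, score in trust.get(target_user, {}).items():
        r = ratings.get(neighbour, {}).get(item)
        if r is not None:
            num += score * r
            den += score
    return num / den if den else None

# Toy data: ratings on a 1-5 scale, trust edges in [0, 1].
ratings = {"bob": {"movie": 4}, "carol": {"movie": 2}}
trust = {"alice": {"bob": 0.9, "carol": 0.1}}
print(predict_rating("alice", "movie", ratings, trust))  # ≈ 3.8
```

Replacing the flat trust scores with values propagated along the network topology (for example, damped by path length) is one way such models incorporate the graph structure.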

    Development of a simulation tool for measurements and analysis of simulated and real data to identify ADLs and behavioral trends through statistics techniques and ML algorithms

    With a growing population of elderly people, the number of subjects at risk of pathology is rapidly increasing. Many research groups are studying pervasive solutions to continuously and unobtrusively monitor fragile subjects in their homes, reducing health-care costs and supporting medical diagnosis. Anomalous behaviors while performing activities of daily living (ADLs), or variations in behavioral trends, are of great importance. To measure ADLs, a significant number of parameters affecting the measurement need to be considered, such as sensor and environment characteristics or sensor placement. Since the sensor configuration that minimizes costs and maximizes accuracy cannot be studied in the real context, simulation tools are being developed as powerful means to do so. This thesis presents several contributions on this topic. In the following research work, a measurement chain aimed at measuring ADLs, composed of PIR sensors and an ML algorithm, is studied, and a simulation tool in the form of a Web Application has been developed to generate datasets and to simulate how the measurement chain reacts as the sensor configuration varies.
Starting from the results of the eWare project, the simulation tool is intended to support technicians, developers, and installers by speeding up analysis and monitoring, allowing rapid identification of changes in behavioral trends, guaranteeing system performance monitoring, and enabling the study of the best sensor-network configuration for a given environment. The UNIVPM Home Care Web App offers the chance to create ad hoc datasets related to ADLs and to conduct analyses with statistical algorithms applied to the data. To measure ADLs, machine learning algorithms have been implemented in the tool, and five different tasks have been identified. To test the validity of the developed instrument, six case studies divided into two categories have been considered. To the first category belong studies that 1) discover the best sensor configuration while keeping environmental characteristics and user behavior constant, and 2) identify the best-performing ML algorithms. The second category aims to prove the stability of the implemented algorithm and its collapse condition by varying user habits. Noise perturbation has been applied to the data in all case studies. Results show the validity of the generated datasets. By maximizing the sensor network, it is possible to reduce the ML error to 0.8%. Because cost is a key factor in this scenario, the fourth case study shows that minimizing the sensor configuration drastically reduces cost while keeping a more than reasonable ML error of around 11.8%. Results in ADL measurement can be considered more than satisfactory. (Pirozzi, Michel)
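As a purely hypothetical illustration of what generating such a dataset could look like, the sketch below invents room names, activity sequences, and a noise model; none of it is taken from the UNIVPM tool:

```python
import random

ROOMS = ["kitchen", "bedroom", "bathroom"]
ADL_SEQUENCES = {  # invented room sequences per activity
    "cooking": ["kitchen", "kitchen"],
    "sleeping": ["bedroom"],
    "hygiene": ["bathroom", "bedroom"],
}

def simulate(adl, noise=0.0, rng=None):
    """Return the PIR triggers for one ADL; with probability `noise`,
    a trigger is replaced by a random room (simulated sensor noise)."""
    rng = rng or random.Random(0)
    return [rng.choice(ROOMS) if rng.random() < noise else room
            for room in ADL_SEQUENCES[adl]]

print(simulate("cooking"))  # ['kitchen', 'kitchen'] when noise is 0
```

Sweeping the `noise` parameter is the kind of experiment the abstract describes: it shows at what perturbation level a downstream classifier's error collapses.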

    Query Expansion Techniques for Enterprise Search

    Although web search remains an active research area, interest in enterprise search has waned. This is despite the fact that the market for enterprise search applications is expected to triple within the next six years, and that knowledge workers spend an average of 1.6 to 2.5 hours each day searching for information. To improve search relevancy, and hence reduce this time, an enterprise-focused application must be able to handle the unique queries and constraints of the enterprise environment. The goal of this thesis research was to develop, implement, and study the query expansion techniques that are most effective at improving relevancy in enterprise search. The case-study instrument used in this investigation was a custom Apache Solr-based search application deployed at a local medium-sized manufacturing company. It was hypothesized that techniques specifically tailored to the enterprise search environment would prove most effective. Query expansion techniques leveraging entity recognition, alphanumeric term identification, intent classification, collection enrichment, and word vectors were implemented and studied using real enterprise data. They were evaluated against a test set of queries developed from relevance survey results from multiple users, using standard relevancy metrics such as normalized discounted cumulative gain (nDCG). Comprehensive analysis revealed that the current implementation of the collection enrichment and word vector query expansion modules did not demonstrate meaningful improvements over the baseline methods. However, the entity recognition, alphanumeric term identification, and query intent classification modules produced meaningful and statistically significant improvements in relevancy, allowing us to accept the hypothesis.
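For reference, nDCG follows a standard definition; a minimal sketch is below (the relevance grades are invented, and this is not the thesis's evaluation code):

```python
import math

def dcg(relevances):
    # Discounted cumulative gain: later ranks contribute less.
    return sum(rel / math.log2(i + 2) for i, rel in enumerate(relevances))

def ndcg(ranked_relevances):
    # Normalize by the DCG of the ideal (descending) ordering.
    ideal = dcg(sorted(ranked_relevances, reverse=True))
    return dcg(ranked_relevances) / ideal if ideal else 0.0

# Graded judgments for one query, in the order the system ranked them.
print(round(ndcg([3, 2, 0, 1]), 3))  # 0.985
```

A perfectly ordered result list scores exactly 1.0, which makes nDCG comparable across queries with different numbers of relevant documents.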

    Concept graphs: Applications to biomedical text categorization and concept extraction

    As science advances, the underlying literature grows rapidly, providing valuable knowledge mines for researchers and practitioners. The text content that makes up these knowledge collections is often unstructured and, thus, extracting relevant or novel information can be nontrivial and costly. In addition, human knowledge and expertise are being transformed into structured digital information in the form of vocabulary databases and ontologies. These knowledge bases hold substantial hierarchical and semantic relationships among common domain concepts. Consequently, automated learning tasks can be reinforced with those knowledge bases by constructing human-like representations of knowledge. This allows developing algorithms that simulate the human reasoning tasks of content perception, concept identification, and classification. This study explores the representation of text documents using concept graphs that are constructed with the help of a domain ontology. In particular, the target data sets are collections of biomedical text documents, and the domain ontology is a collection of predefined biomedical concepts and relationships among them. The proposed representation preserves those relationships and allows using the structural features of graphs in text mining and learning algorithms. Those features emphasize the significance of the relationship information that exists in the text content behind the interrelated topics and concepts of a document. The experiments presented in this study include text categorization and concept extraction applied to biomedical data sets. The experimental results demonstrate how the relationships extracted from text and captured in graph structures can be used to improve the performance of the aforementioned applications.
The discussed techniques can be used in creating and maintaining digital libraries through enhancing indexing, retrieval, and management of documents, as well as in a broad range of domain-specific applications such as drug discovery, hypothesis generation, and the analysis of molecular structures in chemoinformatics.
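A toy sketch of the representation idea follows; the mini-ontology and concepts below are invented, whereas the thesis works with a real biomedical ontology:

```python
# Map each mentioned concept to its ancestors via is-a links, so two
# documents that mention different drugs still share graph structure
# through common parent concepts.
ontology = {  # child -> parent (invented is-a hierarchy)
    "aspirin": "nsaid",
    "ibuprofen": "nsaid",
    "nsaid": "drug",
}

def concept_graph(doc_concepts):
    edges = set()
    for c in doc_concepts:
        while c in ontology:
            edges.add((c, ontology[c]))
            c = ontology[c]
    return edges

g = concept_graph({"aspirin", "ibuprofen"})
print(sorted(g))  # both drugs attach to the shared ancestor "nsaid"
```

Structural features of such graphs (shared ancestors, path lengths, node degrees) can then feed a categorization or concept extraction algorithm, which is the kind of use the abstract describes.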

    Short Text Categorization using World Knowledge

    The content of the World Wide Web is multiplying rapidly, and thus the amount of available online text data is increasing every day. Today, many users contribute to this massive global network via online platforms by sharing information in the form of short texts. Such an immense amount of data covers subjects from all existing domains (e.g., sports, economy, biology, etc.), and manually processing it is beyond human capabilities. As a result, Natural Language Processing (NLP) tasks, which aim to automatically analyze and process natural language documents, have gained significant attention. Among these tasks, due to its application in various domains, text categorization has become one of the most fundamental and crucial. However, standard text categorization models face major challenges in short text categorization due to the unique characteristics of short texts: insufficient text length, sparsity, ambiguity, etc. In other words, conventional approaches provide substandard performance when directly applied to the short text categorization task. Furthermore, for short texts, standard feature extraction techniques such as bag-of-words suffer from limited contextual information; hence, it is essential to enhance the text representations with an external knowledge source. Moreover, traditional models require a significant amount of manually labeled data, and obtaining labeled data is a costly and time-consuming task. Therefore, although recently proposed supervised methods, especially deep neural network approaches, have demonstrated notable performance, the requirement for labeled data remains their main bottleneck. In this thesis, we investigate the main research question of how to perform short text categorization effectively, without requiring any labeled data, using knowledge bases as an external source.
In this regard, novel short text categorization models, namely Knowledge-Based Short Text Categorization (KBSTC) and Weakly Supervised Short Text Categorization using World Knowledge (WESSTEC), have been introduced and evaluated in this thesis. The models do not require any hand-labeled data to perform short text categorization; instead, they leverage the semantic similarity between the short texts and the predefined categories. To quantify such semantic similarity, low-dimensional representations of entities and categories have been learned by exploiting a large knowledge base. To achieve this, a novel entity and category embedding model has also been proposed in this thesis. Extensive experiments have been conducted to assess the performance of the proposed short text categorization models and the embedding model on several standard benchmark datasets.
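The core mechanism of categorization by semantic similarity can be sketched as follows; the two-dimensional vectors are invented stand-ins for the embeddings that a model like KBSTC learns from a knowledge base:

```python
import math

def cos(u, v):
    # Cosine similarity between two vectors.
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) *
                  math.sqrt(sum(b * b for b in v)))

category_vecs = {"Sports": [1.0, 0.1], "Economy": [0.1, 1.0]}

def categorize(entity_vecs):
    # Average the entity vectors of the short text, then pick the most
    # similar category -- no labeled training data is involved.
    n, d = len(entity_vecs), len(entity_vecs[0])
    centroid = [sum(v[i] for v in entity_vecs) / n for i in range(d)]
    return max(category_vecs, key=lambda c: cos(centroid, category_vecs[c]))

print(categorize([[0.9, 0.2], [0.8, 0.0]]))  # Sports
```

Because the decision is a nearest-category lookup in the embedding space, adding a new category only requires its embedding, not any labeled examples.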

    Knowledge representation for data integration and exploration in translational medicine

    Doctoral thesis, Informatics (Bioinformatics), Universidade de Lisboa, Faculdade de Ciências, 2014. Biomedical research has evolved into a data-intensive science, where prodigious amounts of data can be collected from disparate resources at any time. However, the value of data can only be leveraged through its analysis, which ultimately results in the acquisition of knowledge. In domains such as translational medicine, data integration and interoperability are key requirements for efficient data analysis. The semantic web and its technologies have been proposed as a solution for the problems of data integration and interoperability. One of the tools of the semantic web is the representation of domain knowledge with ontologies, which provide a formal description of that knowledge in a structured manner. The thesis underlying this work is that the representation of domain knowledge in ontologies can be exploited to improve the current knowledge about a disease, as well as to improve the diagnosis and prognosis processes. Two objectives were defined to validate this thesis: 1) to create a semantic model that represents and integrates the heterogeneous sources of data necessary for the characterization of a disease and of its prognosis process, exploiting semantic web technologies and existing ontologies; 2) to develop a methodology that exploits the knowledge represented in existing ontologies to improve the results that knowledge exploration methods obtain on translational medicine datasets. The first objective was accomplished, resulting in the following contributions: a methodology for the creation of a semantic model in the OWL language; a semantic model of the disease hypertrophic cardiomyopathy; and a review of the exploitation of semantic web resources in translational medicine systems.
In the case of the second objective, also accomplished, the contributions are the adaptation of a standard enrichment analysis to use data from patients, and the application of the adapted enrichment analysis to improve the predictions made with a translational medicine dataset. Fundação para a Ciência e a Tecnologia (FCT) grant SFRH/BD/65257/2009.
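For context, a standard enrichment analysis reduces to a hypergeometric (one-sided Fisher) test; the sketch below uses invented patient counts and is only an illustration of the generic test, not the thesis's adapted method:

```python
from math import comb

def enrichment_pvalue(N, K, n, k):
    """P(X >= k): probability of drawing at least k term-annotated items
    in a sample of n, from a population of N containing K such items
    (hypergeometric upper tail)."""
    total = comb(N, n)
    return sum(comb(K, i) * comb(N - K, n - i)
               for i in range(k, min(K, n) + 1)) / total

# 5 of 10 sampled patients carry a term present in 20 of 100 overall.
p = enrichment_pvalue(N=100, K=20, n=10, k=5)
print(f"{p:.4f}")  # a small p-value suggests the term is enriched
```

The thesis's adaptation replaces the usual gene-set annotations with patient data, but the statistical skeleton of enrichment testing is the same.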

    An empirical study on credit evaluation of SMEs based on detailed loan data

    Small and micro-sized enterprises (SMEs) are an important part of the Chinese economic system. The establishment of a credit evaluation model for SMEs can effectively help financial intermediaries reveal the credit risk of enterprises and reduce the cost of acquiring enterprise information. Besides, it can also serve as a guide to investors and help companies with good credit. This thesis conducts an empirical study based on data from a Chinese bank on loans granted to SMEs. The study aims to develop a data-driven model that can accurately predict whether a given loan has an acceptable risk from the bank's perspective. Furthermore, we test different methods to deal with the problems of unbalanced classes and of samples that do not reflect the real problem being modeled. Lastly, the importance of the variables is analyzed: Remaining Unpaid Principal, Floating Interest Rate, Time Until Maturity Date, Real Interest Rate, and Amount of Loan all have significant effects on the final prediction. The main contribution of this study is to build a credit evaluation model for small and micro enterprises, which not only helps commercial banks accurately identify the credit risk of these enterprises, but also helps the enterprises overcome their credit difficulties.
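One standard remedy for the unbalanced-class problem is inverse-frequency class weighting; the sketch below uses invented loan labels and is only one of several possible techniques (resampling is another), not necessarily the one tested in the thesis:

```python
from collections import Counter

def class_weights(labels):
    """Weight each class inversely to its frequency so the minority
    (e.g. risky-loan) class is not drowned out during training."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * cnt) for c, cnt in counts.items()}

# 90 acceptable loans vs 10 risky ones.
w = class_weights(["acceptable"] * 90 + ["risky"] * 10)
print(w)  # the risky class gets 9x the weight of the acceptable class
```

These weights can be passed to most classifiers' loss functions, so misclassifying a rare risky loan costs proportionally more than misclassifying a common acceptable one.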

    Web Page Classification and Hierarchy Adaptation
