8 research outputs found

    The Morpheus Visualization System : a general-purpose RDF results browser

    Thesis (M. Eng.), Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2009. Cataloged from PDF version of thesis. Includes bibliographical references (p. 69-72). As the amount of information available on the deep web grows, making this information accessible is becoming increasingly difficult. Since some estimate that the deep web holds several orders of magnitude more content than the shallow web, there is a clear need for an effective tool to search it. While many have attempted a solution, none has effectively addressed the problem of deep web searching. The Morpheus project takes a distinct approach: it integrates the deep web with the shallow web while preserving the semantics of the deep web sites it accesses. At the heart of Morpheus is its visualization system, which allows users to access deep web information. The visualization system uses clustering algorithms, information visualization techniques, and the semantics of the deep web sites stored by Morpheus to present deep web results to users effectively. User testing was conducted to identify problematic areas of the system during development and to evaluate the usability of the system's design. Results indicate that users find the Morpheus visualization system a highly usable and learnable interface for searching the deep web and for processing the results. By Akash G. Shah. M.Eng.

    Handgrip pattern recognition

    There are numerous tragic gun deaths each year. Making handguns safer by personalizing them could prevent most such tragedies. Personalized handguns, also called smart guns, are handguns that can only be fired by the authorized user. Handgrip pattern recognition holds great promise for the development of the smart gun. Two algorithms, a static analysis algorithm and a dynamic analysis algorithm, were developed to find the patterns in how a person grasps a handgun. The static analysis algorithm measured 160 subjects' fingertip placements on a replica gun handle. Cluster analysis and discriminant analysis were applied to these fingertip placements, and a classification tree was built to find the fingertip pattern for each subject. The dynamic analysis algorithm collected and measured 24 subjects' handgrip pressure waveforms during the trigger-pulling stage. A handgrip recognition algorithm was developed to find the correct pattern. A DSP box was built so that handgrip pattern recognition could run in real time. A real gun was used to evaluate the handgrip recognition algorithm. The results show that such a handgrip recognition system works well as a prototype.

    SIMULATION DE LA FILIERE TEXTILE/HABILLEMENT/DISTRIBUTION : REDUCTION DE LA COMPLEXITE EN VUE D’UNE MEILLEURE PREVISION DES VENTES

    The "Quick Response" logistics method used by firms in the Textile/Apparel/Distribution (TAD) channel relies mainly on data exchange between partners in the form of EDI (Electronic Data Interchange) messages. However, several information-processing constraints limit its application among the partners of the TAD channel:
    * the lack of tools for modelling logistic flows between the different actors and for simulating their trading strategies,
    * the choice and use of a well-adapted forecasting method,
    * the large amount of information to process (the complexity of time series identification).
    Our approach proposes different methods and models to address these problems. A modelling scheme for the flows inside and between firms makes it possible to simulate several supply strategies. The simulation results highlight the simulator's ability to process the information and to find (or not) a feasible solution that respects all the initial constraints and conditions. A literature review of forecasting methods and models suited to the short-term horizon of textile sales is also proposed. A comparison of six well-known forecasting methods, evaluated on real sales data with several measures of forecasting error, shows that the most accurate procedures are the smoothing methods that self-adapt their own parameters. Finally, the complexity problem can be addressed with data analysis methods that reduce the volume of information while minimising information loss. Two clustering methods and a symbolic description of items are proposed to provide the minimal, essential information for model identification. A genetic clustering algorithm, whose strength lies in exploring the whole solution space, also helps to reach the globally optimal partition.

    Bayesian nonparametric multilevel modelling and applications

    Our research aims to contribute to multilevel modeling in data analytics. We address the tasks of multilevel clustering, multilevel regression, and classification. We provide state-of-the-art solutions for the critical problem

    Protocols for the efficient dissemination of context-aware messages

    Context-aware applications are able to react and adapt to the context of their users. This context includes, for instance, location, properties of the user or their surroundings, and nearby devices. In recent years, powerful mobile devices such as smartphones and tablet computers have become an important part of many people's computing life. Most of these devices maintain a continuous high-speed network connection, which makes it possible to provide distributed applications with an uninterrupted stream of data. Additionally, a huge number of sensors, both in these mobile devices and deployed in our surroundings, enable the creation of comprehensive context models. Such large-scale context models open up new possibilities for the development of context-aware applications by providing access to relevant context information from providers all over the world. However, until now, applications have had to query the context model for relevant information or register for events or messages; it is not possible to "push" information to the mobile devices, either from the infrastructure or from other mobile devices. To support application developers, we propose Contextcast, a novel communication paradigm that allows for the dissemination of context-aware (or contextual) messages in a system of context-aware routers. This includes the fundamental semantics for addressing clients using context constraints and a reference dissemination scheme for such messages. To enable Contextcast to grow to scales similar to the context-aware systems that it is intended to be used with, we also propose two optimized routing approaches. They are designed to reduce the number of maintenance messages that are necessary for the dissemination of contextual messages. The first optimized routing algorithm uses coarse context information to reduce the number of context updates propagated to routers. To this end, routers use the similarity of contexts to automatically find groups of similar clients, whose information can then be propagated as a single, coarse context. While this reduces the amount of context information to be propagated, the resulting information loss causes more messages to be forwarded, since routers no longer possess exact information to match against the constraints in contextual messages. A configurable similarity threshold allows for various trade-offs between the coarseness of the context information and the resulting additional message load. The second, orthogonal routing approach relies on statistics to determine the characteristics of contexts and messages in the system. Without context knowledge, routers must assume the presence of a matching recipient and forward a message speculatively to disseminate it to all recipients. Using statistics, routers can determine how often certain messages occur and then calculate the benefit of propagating contexts corresponding to these messages. Several parameters enable an administrator to adjust how fast the system reacts to changes, depending on the observed messages and context updates. Additionally, temporal support extends Contextcast with a powerful mechanism that allows application developers and clients to address messages to certain contexts in the past or future. This includes an additional context attribute, time, and a constraint with various easy-to-use temporal operators. We also propose efficient routing approaches for historical and future messages. Routing historical messages focuses on efficient routing while effectively protecting the clients' privacy, i.e., their respective context history. The routing approach for future messages delays forwarding messages until a matching context is registered, thus preventing needlessly forwarded messages.

    Os modelos de exposição necessários à aquisição de publicidade no sector televisivo

    Television advertising investment depends on the development of ratings models or on the identification of alternative methodological approaches for predicting television exposure. In this study, we evaluate the contribution of Symbolic Analysis and Data Mining to the construction of quantitative exposure models, which support the activity of media planning. Audience measurement databases contain information with considerable explanatory capacity for the evolution of ratings, which can reach up to 90%. However, the predictive potential of univariate and multivariate linear and non-linear Regression analyses is considerably lower, lying at most in the 70%-80% range. Several methodologies were tested: Neural Networks (MLP and RBF), Regression Trees (CART and CHAID), IBL, segmentation and clustering of time series, and local Regression models. The construction of explanatory models of "structural" television consumption behaviours showed that the panel exhibits low to moderate audience duplication, that all of the loyalty behaviours are present, and that there is some trend towards the specialisation of audiences. The development of a structural explanatory model of television exposure demonstrates the multiple contexts of intentional and unintentional exposure and supports an alternative proposal for the construction of exposure models, using symbolic methodologies, Sequential, Temporal and Multi-relational Data Mining, and Bayesian and Non-Linear Regression algorithms, which is applicable in contexts where ratings data are more irregular or when new content is broadcast. For audience segments whose exposure is strongly irregular, the construction of association and sequence rules is proposed. These rules will allow the identification of the most appropriate media vehicles for the advertising message, with the subsequent construction of Bayesian Networks and multi-relational Classification Rules to reduce the uncertainty of the results over a given period. When viewers have established television consumption habits, it may be sufficient to use Sequential Data Mining, Binomial Logistic models or Bayes classification. In the context of the broadcast of sports events, the construction of causal models is very difficult. Therefore, we must turn to Temporal Rules in order to identify relevant information in the multivariate ratings time series, enabling a better negotiation with the TV stations.
