48 research outputs found

    Processing Fuzzy Relational Queries Using Fuzzy Views

    This paper proposes two original approaches to processing fuzzy queries in a relational database context. The general idea is to use views, either materialized or not. In the first case, materialized views store the satisfaction degrees of user-defined fuzzy predicates, instead of computing them at runtime through user functions embedded in the query (which induces significant overhead). In the second case, abstract views are used to efficiently access the tuples that belong to the α-cut of the query result, by means of a derived Boolean selection condition.
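As a rough illustration of the second idea, the sketch below derives a Boolean range condition that selects the α-cut of a fuzzy predicate. The trapezoidal membership function, its parameters `a, b, c, d`, and all function names are assumptions made for this example; the paper's own derivation and its view definitions are not reproduced here.

```python
def trapezoid(x, a, b, c, d):
    """Membership degree of x in a trapezoidal fuzzy set (assumes a < b <= c < d)."""
    if x <= a or x >= d:
        return 0.0
    if x < b:
        return (x - a) / (b - a)   # rising edge
    if x <= c:
        return 1.0                 # core of the fuzzy set
    return (d - x) / (d - c)       # falling edge


def alpha_cut_bounds(alpha, a, b, c, d):
    """Crisp interval [lo, hi] on which the degree is >= alpha (assumes 0 < alpha <= 1)."""
    return a + alpha * (b - a), d - alpha * (d - c)


def select_alpha_cut(rows, key, alpha, a, b, c, d):
    """Filter rows with a plain Boolean range test instead of evaluating degrees."""
    lo, hi = alpha_cut_bounds(alpha, a, b, c, d)
    return [row for row in rows if lo <= row[key] <= hi]
```

The range test returns the same rows as computing the degree of every tuple and keeping those at or above α, but a range condition of this kind is something an ordinary index can serve.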

    A CRISP-DM-based Methodology for Assessing Agent-based Simulation Models using Process Mining

    Agent-based simulation (ABS) models are potent tools for analyzing complex systems. However, understanding and validating ABS models can be a significant challenge. To address this challenge, data-driven techniques offer sophisticated capabilities for analyzing the outcomes of ABS models. One such technique is process mining, which encompasses a range of methods for discovering, monitoring, and enhancing processes by extracting knowledge from event logs. However, applying process mining to event logs derived from ABSs is not trivial, and deriving meaningful insights from the resulting process models adds a further layer of complexity. Although process mining is invaluable for extracting insights from ABS models, the research landscape lacks comprehensive methodological guidance for applying it to ABS evaluation. In this paper, we propose a methodology, based on the CRoss-Industry Standard Process for Data Mining (CRISP-DM), to assess ABS models using process mining techniques. We incorporate process mining techniques into the stages of CRISP-DM, facilitating the analysis of ABS model behaviors and their underlying processes. We demonstrate our methodology using an established agent-based model, the Schelling model of segregation. Our results show that the proposed methodology can effectively assess ABS models through the event logs they produce, potentially paving the way for enhanced agent-based model validity and more insightful decision-making.

    Search beyond traditional probabilistic information retrieval

    This thesis focuses on search beyond probabilistic information retrieval. Three approaches are proposed that go beyond traditional probabilistic modelling. First, term association is examined in depth. Term association accounts for term dependency using a factor-analysis-based model, instead of treating each term independently. Latent factors, considered the same as the hidden variables of "eliteness" introduced by Robertson et al. to explain the relation between term occurrences and relevance, are measured by the dependencies and occurrences of term sequences and subsequences. Second, an entity-based ranking approach is proposed in an entity system named "EntityCube", which has been released by Microsoft for public use. A summarization page summarizes the entity information over multiple documents, so that truly relevant entities are more likely to be retrieved from multiple documents by integrating the local relevance contributed by proximity and the global enhancement contributed by a topic model. Third, multi-source fusion sets up a meta-search engine to combine the "knowledge" from different sources. Meta-features, distilled as high-level categories, are deployed to diversify the baselines. Three modified fusion methods are employed: reciprocal, CombMNZ and CombSUM, with three expanded versions. Through extensive experiments on the standard large-scale TREC Genomics data sets, the TREC HARD data sets and the Microsoft EntityCube Web collections, the proposed extended models beyond probabilistic information retrieval show their effectiveness and superiority.
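For readers unfamiliar with the fusion methods named in this abstract, here is a minimal sketch of the textbook forms of CombSUM, CombMNZ and a reciprocal-rank score. It does not reproduce the thesis's modified or expanded versions, which the abstract does not describe. The function names, the representation of each run as a `{doc_id: score}` dictionary, and the smoothing constant `k` are assumptions made for this example.

```python
from collections import defaultdict


def comb_sum(runs):
    """CombSUM: add up the scores a document receives across all runs."""
    scores = defaultdict(float)
    for run in runs:
        for doc, score in run.items():
            scores[doc] += score
    return dict(scores)


def comb_mnz(runs):
    """CombMNZ: CombSUM multiplied by the number of runs that retrieved the document."""
    sums = comb_sum(runs)
    hits = defaultdict(int)
    for run in runs:
        for doc in run:
            hits[doc] += 1
    return {doc: sums[doc] * hits[doc] for doc in sums}


def reciprocal_rank(runs, k=60):
    """Reciprocal-rank fusion: sum 1 / (k + rank) over runs; k is an assumed constant."""
    scores = defaultdict(float)
    for run in runs:
        ranked = sorted(run, key=run.get, reverse=True)  # best score first
        for rank, doc in enumerate(ranked, start=1):
            scores[doc] += 1.0 / (k + rank)
    return dict(scores)
```

In practice the scores of each run are usually normalized to a common range before CombSUM or CombMNZ is applied, since different engines score on different scales; that step is omitted here for brevity.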

    Data-driven conceptual modeling: how some knowledge drivers for the enterprise might be mined from enterprise data

    As organizations perform their business, they analyze, design and manage a variety of processes represented in models with different scopes and scales of complexity. Specifying these processes requires a certain level of modeling competence. However, this requirement is not always matched by the capability of the person(s) responsible for defining and modeling an organization's or enterprise's operation. On the other hand, an enterprise typically collects various records of all events that occur during the operation of its processes. Records such as the start and end of the tasks in a process instance, state transitions of objects impacted by the process execution, and the messages exchanged during the process execution are maintained in enterprise repositories as various logs, such as event logs, process logs, effect logs and message logs. Furthermore, the volume of data generated by enterprise process execution has increased manyfold in just a few years. On top of this, models are often considered the dashboard view of an enterprise. Models represent an abstraction of the underlying reality of an enterprise. Models also serve as the knowledge drivers through which an enterprise can be managed. Data-driven extraction offers the capability to mine these knowledge drivers from enterprise data and to leverage the mined models to establish the set of enterprise data that conforms with the desired behaviour. This thesis aims to generate models, or knowledge drivers, from enterprise data to enable a type of dashboard view of the enterprise that supports analysts. The rationale for this is the requirement to improve an existing process or to create a new process. Models can also serve as a collection of effectors through which an organization or an enterprise can be managed. 
The enterprise data referred to above have been identified as process logs, effect logs, message logs, and invocation logs. The approach in this thesis is to mine these logs to generate process, requirement, and enterprise architecture models, and to discover how goals get fulfilled based on collected operational data. The research question is whether it is possible to derive the knowledge drivers from the enterprise data, which represent the running operation of the enterprise; in other words, is it possible to use the available data in the enterprise repository to generate the knowledge drivers? Chapter 2 reviews the literature that provides the background knowledge needed to explore this research question. Chapter 3 presents how process semantics can be mined. Chapter 4 suggests a way to extract a requirements model. Chapter 5 presents a way to discover the underlying enterprise architecture, and Chapter 6 presents a way to mine how goals get orchestrated. The overall findings are discussed in Chapter 7 to derive some conclusions.

    Personalized information retrieval based on time-sensitive user profile

    Recently, search engines have become the main source of information for many users and have been widely used in different fields. However, Information Retrieval Systems (IRS) face new challenges due to the growth and diversity of available data. An IRS analyses the query submitted by the user and explores collections of unstructured or semi-structured data (e.g. text, images, video, Web pages) in order to deliver items that best match the user's intent and interests. 
In order to achieve this goal, IRS also consider the user context instead of query-document matching alone. In fact, the user profile has been considered in the literature as the most important contextual element for improving the accuracy of the search. It is integrated in the information retrieval process in order to improve the user experience while searching for specific information. As the time factor has gained increasing importance in recent years, temporal dynamics are introduced to study the evolution of the user profile, which consists mainly in capturing the changes in the user's behavior, interests and preferences, and updating the profile accordingly. Prior work distinguished short-term and long-term profiles. The first profile type is limited to interests related to the user's current activities, while the second represents the user's persisting interests extracted from prior activities, excluding the current ones. However, for users who are not very active, the short-term profile can eliminate relevant results that are more related to their personal interests, because their activities are few and separated over time. For users who are very active, aggregating recent activities without ignoring old interests would be valuable because this kind of profile usually changes over time. Unlike those approaches, we propose in this thesis a generic time-sensitive user profile that is implicitly constructed as a vector of weighted terms in order to find a trade-off by unifying both current and recurrent interests. User profile information can be extracted from multiple sources. Among the most promising ones, we propose to use, on the one hand, search history. Data from search history can be extracted implicitly without any effort from the user and include issued queries, their corresponding results, reformulated queries and click-through data that has relevance feedback potential. 
On the other hand, the popularity of social media makes it an invaluable source of data, which users employ to express, share and mark as favorite the content that interests them. First, we modeled a user profile not only according to the content of the user's activities but also according to their freshness, under the assumption that terms used recently in the user's activities reflect new interests, preferences and thoughts and should be weighted more than old interests. In fact, many prior works have shown that user interest decreases as time goes by. In order to evaluate the time-sensitive user profile, we used a set of data collected from Twitter, a social networking and microblogging service. Then, we applied our re-ranking process to a Web search system in order to adapt the originally retrieved results to the user's online interests. Second, we studied the temporal dynamics within session search, where recently submitted queries contain additional information that better explains the user intent and indicate that the user has not found the information sought from previously submitted ones. We integrated current and recurrent interactions within a unique session model, giving more importance to terms that appeared in recently submitted queries and clicked results. We conducted experiments using the 2013 TREC Session track and the ClueWeb12 collection that showed the effectiveness of our approach compared to state-of-the-art ones. Overall, through these contributions and experiments, we show that our time-sensitive user profile ensures better personalization performance and helps to analyze user behavior in both session search and social media contexts.
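To make the notion of a time-sensitive profile concrete, the sketch below builds a vector of weighted terms in which recent activities count more than old ones, then uses it to re-rank results. The exponential half-life decay, the `(day, terms)` activity format, and all names are assumptions chosen for illustration; the abstract does not state which weighting function the thesis uses.

```python
from collections import defaultdict


def build_profile(activities, now, half_life_days=30.0):
    """Build a term-weight vector in which each term occurrence decays with age.

    activities: iterable of (day, terms) pairs; `day` is a number on the same
    scale as `now`. The half-life decay is an assumed weighting scheme.
    """
    profile = defaultdict(float)
    for day, terms in activities:
        weight = 0.5 ** ((now - day) / half_life_days)  # 1.0 for today's activity
        for term in terms:
            profile[term] += weight
    return dict(profile)


def rerank(results, profile):
    """Order (doc_id, terms) results by the summed profile weight of their terms."""
    def score(result):
        _, terms = result
        return sum(profile.get(term, 0.0) for term in terms)
    return sorted(results, key=score, reverse=True)
```

A single decayed vector of this kind keeps old but recurring interests alive (their weights accumulate) while still favouring fresh terms, which is the trade-off between current and recurrent interests that the abstract describes.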