593 research outputs found

    The 10th Jubilee Conference of PhD Students in Computer Science


    Top-k web services compositions: A fuzzy-set-based approach

    Data as a Service (DaaS) is a flexible way for enterprises to expose their data, and composing DaaS services provides bridges to answer queries. User preferences are becoming increasingly important for personalizing the composition process. In this paper, we propose an approach to compose DaaS services in the context of preference queries, where preferences are modeled by means of fuzzy sets that allow for a large variety of flexible terms such as 'cheap', 'affordable' and 'fairly expensive'. The proposed approach relies on RDF-based query rewritings to take into account the partial matching between individual DaaS services and parts of the user query. Matching degrees between DaaS services and fuzzy preference constraints are computed by means of different constraint inclusion methods; such degrees express to what extent a service is relevant to the resolution of the query. A fuzzification of Pareto dominance is also proposed to better rank composite services by computing a score for each service. The resulting scores are then used to compute the top-k DaaS service compositions that cover the user query.
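    The fuzzy-preference machinery can be made concrete with a small sketch. The snippet below is illustrative only and not the paper's implementation: it encodes terms such as 'cheap' as trapezoidal membership functions, turns them into matching degrees, and ranks candidate compositions with a fuzzified Pareto dominance count. All attribute names, thresholds, and the eps tolerance are hypothetical.

```python
# Illustrative sketch only (not the paper's implementation): fuzzy preference
# terms become trapezoidal membership functions, compositions get matching
# degrees, and a fuzzified Pareto dominance count ranks them.

def trapezoid(x, a, b, c, d):
    """Trapezoidal membership: 1 on [b, c], 0 outside (a, d), linear in between."""
    if b <= x <= c:
        return 1.0
    if x <= a or x >= d:
        return 0.0
    if x < b:
        return (x - a) / (b - a)
    return (d - x) / (d - c)

# Hypothetical fuzzy vocabulary over a 'price' attribute.
CHEAP = lambda p: trapezoid(p, 0, 0, 30, 60)
AFFORDABLE = lambda p: trapezoid(p, 30, 60, 90, 120)

def matching_degrees(attrs, preferences):
    """Degree to which each fuzzy preference is satisfied by a composition."""
    return [mu(attrs[name]) for name, mu in preferences]

def fuzzy_dominates(u, v, eps=0.1):
    """Fuzzified Pareto dominance: at least as good everywhere within a
    tolerance eps, and strictly better somewhere."""
    return all(a >= b - eps for a, b in zip(u, v)) and any(a > b for a, b in zip(u, v))

def top_k(compositions, preferences, k):
    degrees = {name: matching_degrees(attrs, preferences)
               for name, attrs in compositions.items()}
    # Score each composition by how many rivals it fuzzily dominates.
    scores = {name: sum(fuzzy_dominates(d, degrees[other])
                        for other in degrees if other != name)
              for name, d in degrees.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]

# Toy composite services with made-up attribute values.
comps = {"c1": {"price": 25, "availability": 0.9},
         "c2": {"price": 70, "availability": 0.6}}
prefs = [("price", CHEAP), ("availability", lambda x: x)]
print(top_k(comps, prefs, k=1))  # ['c1']
```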

    Faster Multidimensional Data Queries on Infrastructure Monitoring Systems

    Analytics in online performance monitoring systems has often been limited by the query performance of large-scale multidimensional data. In this paper, we introduce a faster query approach using the bit-sliced index (BSI). Our study covers multidimensional grouping and preference top-k queries with the BSI, algorithm design, time complexity evaluation, and query time comparison on a real-time production performance monitoring system. We extend the BSI algorithms to cover attribute filtering and multidimensional grouping, and we evaluate query time with single-attribute queries, multi-attribute queries, feature filtering, and multidimensional grouping. To compare with prior art, we benchmark against bitmap indexing, sequential scan, and collection-stream grouping. In our experiments with large-scale production data, the proposed BSI approach outperforms the existing approaches: it is 3 times faster than bitmap indexing on single-attribute top-k queries and 10 times faster than the collection-stream approach on multidimensional grouping. Compared with the baseline sequential scan, the proposed BSI algorithms are faster by a factor of 10 on multi-attribute queries and by a factor of 100 on single-attribute queries. In previous work, we evaluated the BSI's time and space complexity on simulated data with various distributions; this work further studies and evaluates the query performance of the BSI approach on real production data.
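    As a rough illustration of the indexing idea (not the monitored system itself), the sketch below builds a bit-sliced index over a single integer attribute using Python ints as bitmaps and answers a top-k query by scanning the slices from the most significant bit downwards; the row values and the 8-bit width are assumptions.

```python
# Hedged sketch of a bit-sliced index (BSI) with a classic slice-by-slice
# top-k scan; Python ints serve as bitmaps.

def build_bsi(values, bits=8):
    """One bitmap per bit position; bit `row` of slice b is set when
    values[row] has bit b set."""
    slices = [0] * bits
    for row, v in enumerate(values):
        for b in range(bits):
            if (v >> b) & 1:
                slices[b] |= 1 << row
    return slices

def popcount(x):
    return bin(x).count("1")

def bsi_topk(slices, n_rows, k):
    """Bitmap of rows whose attribute values are among the k largest."""
    G = 0                       # rows guaranteed to be in the top-k
    E = (1 << n_rows) - 1       # rows still tied / undecided
    for b in reversed(range(len(slices))):
        X = G | (E & slices[b])
        n = popcount(X)
        if n > k:
            E &= slices[b]      # too many: only rows with this bit stay tied
        elif n < k:
            G = X               # too few: keep them all, ties continue among the 0-bit rows
            E &= ~slices[b]
        else:
            return X            # exactly k rows found
    return G | E                # may still hold ties; break them as needed

values = [13, 7, 25, 25, 2, 19]
slices = build_bsi(values)
winners = bsi_topk(slices, len(values), k=3)
print([row for row in range(len(values)) if (winners >> row) & 1])  # [2, 3, 5]
```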

    Technical debt-aware and evolutionary adaptation for service composition in SaaS clouds

    Composing and delivering software applications in the cloud-based Software as a Service (SaaS) model offers cost-effective solutions with minimal resource management. However, many functionally equivalent web services with diverse Quality of Service (QoS) values have emerged in the SaaS cloud, and tenant-specific requirements make it difficult to select suitable web services for composing the application. Moreover, given the changing workload from the tenants, it is not uncommon for a service composition running in the multi-tenant SaaS cloud to encounter under-utilisation and over-utilisation of its component services, which reduce service revenue and violate the service level agreement respectively. This raises challenging decision-making tasks: (i) when to recompose the composite service, and (ii) how to select new component services that maximise service utility over time, while keeping the operating cost of the composition low in the SaaS cloud. In this context, this thesis contributes an economics-driven service composition framework to address these challenges. The framework draws on the principle of technical debt (a well-known software engineering concept), an evolutionary algorithm, and a time-series forecasting method to predictively handle service provider constraints and SaaS dynamics, creating added value in the service composition. We emulate a SaaS environment and conduct several experiments using an e-commerce system, realistic datasets, and a workload trace, and we evaluate the framework against other state-of-the-art approaches on diverse quality metrics.
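    The technical-debt framing can be illustrated with a toy decision rule. The sketch below is not the thesis framework, only one hedged reading of "when to recompose": treat the recomposition cost as the debt principal and the per-interval utility gap as interest, and recompose once the forecast interest outweighs the principal. The function name, utilities, and horizon are hypothetical.

```python
# Hedged illustration, not the thesis framework: a technical-debt-style
# recomposition decision. The debt principal is the one-off cost of
# recomposing; the interest is the utility lost per monitoring interval by
# keeping the current, sub-optimal composition.

def should_recompose(current_utility, best_utility, recomposition_cost, horizon):
    """Recompose when the interest forecast over `horizon` future intervals
    exceeds the recomposition cost (the debt principal)."""
    interest_per_interval = max(0.0, best_utility - current_utility)
    forecast_interest = interest_per_interval * horizon   # naive flat forecast
    return forecast_interest > recomposition_cost

# Latest monitored utility of the running composition vs. the best composition
# an optimiser (e.g. an evolutionary search) currently finds; numbers made up.
print(should_recompose(current_utility=0.70, best_utility=0.90,
                       recomposition_cost=0.5, horizon=4))   # True: recomposing pays off
```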

    Information filtering over large-scale databases using skyline queries

    Conventional SQL queries take exact input and produce a complete result set. However, with the massive increase in data volume across applications, the large result sets returned by traditional SQL queries are not well suited for users who need to make effective decisions. There is therefore increasing interest in queries such as top-k queries and skyline queries, which produce a more concise result set. Top-k queries rely on scores to evaluate the usefulness of objects: users define their own scoring function reflecting their interests, and the system sorts the objects by score and returns the k highest-ranked objects as the result. However, the need for a user-defined scoring function is a major drawback of top-k queries, because in large data sets with many conflicting criteria it is very difficult for users to define such a function themselves.

Hiroshima University, Doctor of Engineering
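    To make the contrast above concrete, the following sketch compares a top-k query, which needs an explicit user-defined scoring function, with a skyline query, which only needs per-attribute preference directions. The hotel data and the weight in the scoring function are made up.

```python
# Hedged sketch contrasting top-k and skyline queries on a toy data set.

def topk(items, score, k):
    """Top-k: rank by an explicit scoring function."""
    return sorted(items, key=score, reverse=True)[:k]

def dominates(a, b):
    """a dominates b: no worse on every attribute, strictly better on one
    (lower price and higher rating are preferred here)."""
    no_worse = a["price"] <= b["price"] and a["rating"] >= b["rating"]
    better = a["price"] < b["price"] or a["rating"] > b["rating"]
    return no_worse and better

def skyline(items):
    """Skyline: every item not dominated by any other item."""
    return [x for x in items if not any(dominates(y, x) for y in items)]

hotels = [
    {"name": "A", "price": 80, "rating": 4.2},
    {"name": "B", "price": 120, "rating": 4.8},
    {"name": "C", "price": 95, "rating": 4.0},   # dominated by A
]
# Top-k forces the user to weigh price against rating explicitly...
print([h["name"] for h in topk(hotels, lambda h: h["rating"] - 0.01 * h["price"], k=2)])  # ['B', 'A']
# ...whereas the skyline needs no weights and keeps all non-dominated hotels.
print([h["name"] for h in skyline(hotels)])  # ['A', 'B']
```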

    Searching and mining in enriched geo-spatial data

    The emergence of new data collection mechanisms in geo-spatial applications, paired with a heightened tendency of users to volunteer information, provides an ever-increasing flow of data that is high in volume, complex in nature, and often associated with inherent uncertainty. Such mechanisms include crowdsourcing, automated knowledge inference, tracking, and social media data repositories. Data bearing additional information from multiple sources, such as probability distributions, textual or numerical attributes, social context, or multimedia content, can be called multi-enriched. Searching and mining this abundance of information holds many challenges if the data's full potential is to be realised. This thesis addresses several major issues arising in that field, namely path queries using multi-enriched data, trend mining in social media data, and handling uncertainty in geo-spatial data. In all cases, the developed methods have made significant contributions and have appeared in or been accepted at renowned international peer-reviewed venues. A common use of geo-spatial data is path queries in road networks, where traditional methods optimise results based on absolute and often singular metrics, e.g., finding the shortest path by distance or the best trade-off between distance and travel time. Integrating additional aspects such as qualitative or social data, by enriching the data model with knowledge derived from the sources mentioned above, allows queries to fit a broader scope of needs or preferences. This thesis presents two implementations of incorporating multi-enriched data into road networks. In one case, a range of qualitative data sources is evaluated to gain knowledge about user preferences, which is subsequently matched with locations represented in a road network and integrated into its components; several methods are presented for highly customisable path queries that incorporate a wide spectrum of data. In a second case, a framework is described for resource distribution with reappearance in road networks to serve one or more clients, resulting in paths that provide maximum gain based on a probabilistic evaluation of available resources; applications include finding parking spots. Social media trends are an emerging research area giving insight into user sentiment and important topics. Such trends consist of bursts of messages concerning a certain topic within a time frame, significantly deviating from the average appearance frequency of that topic. By investigating the dissemination of such trends in space and time, this thesis presents methods to classify trends into archetypes and predict their future dissemination. Processing and querying uncertain data is particularly demanding given the additional knowledge required to yield results with probabilistic guarantees. Since such knowledge is not always available and queries do not easily scale to larger datasets due to the #P-complete nature of the problem, many existing approaches reduce the data to a deterministic representation of its underlying model to eliminate uncertainty. However, data uncertainty can also provide valuable insight into the nature of the data that cannot be represented in a deterministic manner.
This thesis presents techniques for clustering and query processing over uncertain data that take the additional information from uncertainty models into account while preserving scalability through a sampling-based approach; previous approaches could provide only one of the two. The given solutions enable the application of various existing clustering techniques and query types within a framework that manages the uncertainty.
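    As a minimal sketch of the path-query idea (not the thesis' algorithms), the snippet below runs Dijkstra over a toy road network whose edges carry an enrichment score, blending distance and preference into a single edge cost via a user-chosen alpha. The graph, scores, and cost formula are assumptions.

```python
# Hedged sketch: personalised path query over an enriched road network.
# Edge cost = alpha * normalised_length + (1 - alpha) * (1 - preference_score).
import heapq

def personalised_path(graph, source, target, alpha=0.5):
    """Dijkstra over a blended distance/preference edge cost."""
    dist = {source: 0.0}
    prev = {}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == target:
            break
        if d > dist.get(u, float("inf")):
            continue
        for v, length, pref in graph.get(u, []):
            cost = alpha * length + (1 - alpha) * (1.0 - pref)
            nd = d + cost
            if nd < dist.get(v, float("inf")):
                dist[v], prev[v] = nd, u
                heapq.heappush(heap, (nd, v))
    path, node = [], target
    while node in prev:
        path.append(node)
        node = prev[node]
    return [source] + list(reversed(path))

# Edges: (neighbour, normalised length, preference score in [0, 1]); made up.
graph = {
    "s": [("a", 0.2, 0.1), ("b", 0.4, 0.9)],
    "a": [("t", 0.2, 0.1)],
    "b": [("t", 0.3, 0.8)],
}
print(personalised_path(graph, "s", "t", alpha=0.3))  # favours the well-rated b-route
```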

    Full Solution Indexing and Efficient Compressed Graph Representation for Web Service Composition

    Service-oriented computing enhances business scalability and flexibility; as providers seek to benefit from it, the number of available web services grows explosively. Searching for an optimal composition that satisfies both functional and non-functional requirements is computationally demanding: the time and space requirements may be infeasible due to the high number of available services. In this thesis, we study QoS-aware service composition problems that satisfy functional as well as non-functional requirements, and we use optimization techniques to enhance the accuracy of our search algorithms. In the first approach, we use a relational database to search for a service composition solution. Current in-memory methods are limited by expensive and volatile physical memory; to deal with this problem, we use the large space available in a relational database on persistent disk. In our database-based approach, all possible service combinations are generated beforehand and stored in a relational database. When a user request arrives, SQL queries are composed to search the database and the K best solutions are returned. We test the performance of the proposed approach with a service challenge data set; experimental results demonstrate that this approach always finds the top-K valid solutions. We offer three main contributions in this approach. First, we overcome the disadvantages of in-memory composition algorithms, namely volatility and cost, and provide a solution suitable for cloud environments. Second, we fetch the top-K solutions so that backup solutions are available to the user if the optimal solution is not. Third, compared with other pre-computing composition methods, we use a single SQL query: there is no need to eliminate spurious services iteratively. Then, we propose the application of a skyline operator to reduce the search space and improve scalability. Skyline analysis returns all elements that are not dominated by another element; we use it to find a set of candidate services referred to as "skyline services", so that less competitive services are pruned. This allows us to solve large composition problems with less storage and at higher speed. In practice, different users may issue the same requests, which motivates us to pick popular requests and pre-generate paths for fast delivery. These paths are stored in a separate table of the relational database; when a user request arrives, we first search for a nearly ready-made solution, and only as a last resort do we search the table of whole paths. Finally, to deal with the problem that the search space may explode, we apply a compressed data structure to represent the service composition graph, with the goal of allowing the algorithms to run in memory over larger graphs. In this approach, we use compact K2-trees to represent the service composition graph. When a user request arrives, we search the K2-tree for a satisfactory solution. We use an array to store the values in the last level of the compact tree, which represents relationships between services and concepts; in our algorithms, we find services' inputs (resp. outputs) by locating elements in this array directly, so decompressing the graph is unnecessary. To the best of our knowledge, our work is the first attempt to apply compact structures to web service composition problems. Experimental results demonstrate that this approach takes less space and scales well when handling a large number of web services. We thus provide different ways to search for a solution: a user who wants an optimal solution with fewer services may use the database-based approach, while a user who needs a solution quickly may choose the in-memory approach.
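    The database-based, top-K part of the approach can be sketched as follows. This is not the thesis implementation, only an illustration of the idea that pre-computed composition paths live in a relational table and a single SQL query returns the K best matches. The schema, column names, and data are hypothetical, with sqlite3 standing in for the actual RDBMS.

```python
# Hedged sketch: pre-computed compositions in a relational table, fetched with
# one SQL top-K query. Schema and data are made up; sqlite3 is a stand-in.
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("""CREATE TABLE compositions (
                  id INTEGER PRIMARY KEY,
                  services TEXT,      -- chained component services
                  provides TEXT,      -- concept produced by the composition
                  requires TEXT,      -- concept the user must supply
                  qos REAL)           -- aggregated QoS score (higher is better)
           """)
db.executemany("INSERT INTO compositions VALUES (?,?,?,?,?)", [
    (1, "s1>s4",    "Invoice", "Order", 0.91),
    (2, "s2>s3>s4", "Invoice", "Order", 0.87),
    (3, "s5",       "Invoice", "Cart",  0.95),
])

def top_k_compositions(requires, provides, k):
    """A single SQL query returns the K best pre-computed compositions."""
    return db.execute(
        """SELECT services, qos FROM compositions
           WHERE requires = ? AND provides = ?
           ORDER BY qos DESC LIMIT ?""",
        (requires, provides, k)).fetchall()

print(top_k_compositions("Order", "Invoice", k=2))
# [('s1>s4', 0.91), ('s2>s3>s4', 0.87)]
```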
