Ideal Keyword Match in a Big Data Application Using Keyword Aware Service Recommendation Method
The big data movement has also influenced service recommender systems. The growing number of alternative providers makes it a significant research challenge to give clients relevant suggestions for the services they want. Service recommender systems have proven to be helpful tools that let users manage the multitude of services at their disposal and receive pertinent recommendations. Because the quantity of customers, services, and other online information grows exponentially, service recommender systems operate in a "Big Data" context, which poses serious challenges for these systems. In this work, we address these challenges by proposing KASR, a keyword-aware service recommendation method. In KASR, keywords extracted from user reviews serve as indicators of users' preferences, and recommendations are produced by a user-based Collaborative Filtering algorithm that considers both customer reviews and provider rankings. A domain thesaurus and a keyword-candidate list are provided to help better capture customer preferences: the active user indicates their preferences by selecting keywords from the keyword-candidate list. To improve scalability and efficiency, we implement KASR on Hadoop, a distributed computing platform, using the MapReduce programming model.
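The recommendation flow described above can be sketched in a few lines: keyword sets stand in for preferences, set similarity stands in for user similarity, and similar users' ratings are aggregated. All data, function names, and the choice of Jaccard similarity here are illustrative assumptions, not the paper's exact algorithm.

```python
# Sketch of keyword-aware, user-based collaborative filtering in the
# spirit of KASR. Data and helper names are illustrative, not the
# paper's implementation.

def jaccard(a: set, b: set) -> float:
    """Similarity between two users' keyword sets."""
    return len(a & b) / len(a | b) if a | b else 0.0

def recommend(active_keywords, user_keywords, user_ratings, top_k=2):
    """Score services by similarity-weighted ratings of similar users.

    active_keywords: keywords the active user picked from the candidate list
    user_keywords:   {user: set of keywords mined from that user's reviews}
    user_ratings:    {user: {service: rating}}
    """
    scores, weights = {}, {}
    for user, kws in user_keywords.items():
        sim = jaccard(active_keywords, kws)
        if sim == 0:
            continue
        for service, rating in user_ratings[user].items():
            scores[service] = scores.get(service, 0.0) + sim * rating
            weights[service] = weights.get(service, 0.0) + sim
    ranked = sorted(scores, key=lambda s: scores[s] / weights[s], reverse=True)
    return ranked[:top_k]

# Toy example with two reviewers and a hypothetical hotel domain.
user_keywords = {"u1": {"wifi", "clean", "cheap"}, "u2": {"luxury", "spa"}}
user_ratings = {"u1": {"hotelA": 5, "hotelB": 3}, "u2": {"hotelC": 4}}
print(recommend({"wifi", "cheap"}, user_keywords, user_ratings))  # ['hotelA', 'hotelB']
```

In the paper's setting the similarity computation and the aggregation would each map naturally onto MapReduce stages, which is what makes the method amenable to Hadoop.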
Network Structures, Concurrency, and Interpretability: Lessons from the Development of an AI Enabled Graph Database System
This thesis describes the development of the SmartGraph, an AI-enabled graph database. The need for such a system has been independently recognized in the isolated fields of graph databases, graph computing, and computational graph deep learning systems, such as TensorFlow. Though prior works have investigated some relationships between these fields, we believe that the SmartGraph is the first system designed from conception to incorporate the most significant and useful characteristics of each. Examples include the ability to store graph-structured data, run analytics natively on this data, and run gradient descent algorithms. It is the synergistic aspects of combining these fields that provide the most novel results presented in this dissertation. Key among them is how the notion of "graph querying" as used in graph databases can be used to solve a problem that has plagued deep learning systems since their inception; rather than attempting to embed graph-structured datasets into restrictive vector spaces, we instead allow the deep learning functionality of the system to natively perform graph querying in memory during optimization as a way of interpreting (and learning) the graph. This results in a concept of natural and interpretable processing of graph-structured data.
Graph computing systems have traditionally used distributed computing across multiple compute nodes (e.g., separate machines connected via Ethernet or the internet) to deal with large-scale datasets whilst working sequentially on problems over entire datasets. In this dissertation, we outline a distributed graph computing methodology that facilitates all the above capabilities (even in an environment consisting of a single physical machine) while allowing for a workflow more typical of a graph database than a graph computing system: massive concurrent access allowing for arbitrarily asynchronous execution of queries and analytics across the entire system. Further, we demonstrate how this methodology is key to the artificial intelligence capabilities of the system.
Performance Analysis and Improvement for Scalable and Distributed Applications Based on Asynchronous Many-Task Systems
As the complexity of recent and future large-scale data and exascale system architectures grows, so do the challenges of productivity, portability, software scalability, and efficient utilization of system resources facing both industry and the research community. Software solutions and applications are expected to scale in performance on such complex systems. Asynchronous many-task (AMT) systems, which take advantage of multi-core architectures with light-weight threads, asynchronous execution, and smart scheduling, are showing promise in addressing these challenges.
In this research, we implement several scalable and distributed applications based on HPX, an exemplar AMT runtime system. First, a distributed HPX implementation of the parameterized benchmark Task Bench is introduced. The performance bottleneck is analyzed: repeated HPX thread creation costs and a global barrier across all threads limit performance. Methodologies that keep the spawned threads alive and overlap communication with computation are presented. The evaluation results demonstrate the effectiveness of the improved approach, in which HPX is comparable with the prevalent programming models and takes advantage of multi-task scenarios. Second, HPX support for SHAD, an algorithms and data-structures library, is introduced. Methodologies to support local and remote operations in both synchronous and asynchronous manners are developed, and the HPX implementation backing the SHAD library is provided. Performance results demonstrate that the proposed system achieves performance similar to SHAD with Intel TBB (Threading Building Blocks) support for shared-memory parallelism and exploits distributed-memory parallelism better than SHAD with GMT (Global Memory and Threading) support. Third, Phylanx, an asynchronous array processing framework, is introduced. Methodologies that support a distributed alternating least squares algorithm are developed, and the implementation of this algorithm along with a number of distributed primitives is provided. The performance results show that the Phylanx implementation exhibits good scalability. Finally, a scalable second-order method for optimization is introduced, with an implementation of a Krylov-Newton second-order method via the PyTorch framework. Evaluation results illustrate the scalability, convergence, and robustness to hyper-parameters of the proposed method.
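The first optimization described above, keeping worker threads alive across iterations instead of respawning them, can be illustrated outside of HPX. The sketch below contrasts the two patterns in plain Python threading; it mirrors the idea only, and is in no way HPX code.

```python
# Illustrative contrast between per-iteration thread creation (with its
# implicit global barrier at every join) and a persistent worker pool
# that is reused across iterations. Task contents are placeholders.
import threading
from concurrent.futures import ThreadPoolExecutor

def task(i):
    return i * i

def naive(n_iters, n_tasks):
    # Spawn fresh threads each iteration and join them all: pays thread
    # creation cost plus a barrier every iteration. (Return values are
    # discarded; this variant only illustrates the spawning pattern.)
    for _ in range(n_iters):
        threads = [threading.Thread(target=task, args=(i,)) for i in range(n_tasks)]
        for t in threads:
            t.start()
        for t in threads:
            t.join()

def pooled(pool, n_iters, n_tasks):
    # Reuse long-lived workers; only lightweight futures are created
    # per iteration, so no threads are torn down between iterations.
    out = []
    for _ in range(n_iters):
        futures = [pool.submit(task, i) for i in range(n_tasks)]
        out = [f.result() for f in futures]
    return out

with ThreadPoolExecutor(max_workers=4) as pool:
    print(pooled(pool, n_iters=100, n_tasks=8))  # last iteration's results
```

The second optimization, overlapping communication with computation, corresponds to submitting the next iteration's work before collecting the previous iteration's results, rather than alternating strictly between the two phases.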
Social Search: retrieving information in Online Social Platforms -- A Survey
Social Search research deals with studying methodologies exploiting social
information to better satisfy user information needs in Online Social Media
while simplifying the search effort and consequently reducing the time spent
and the computational resources utilized. Starting from previous studies, in
this work, we analyze the current state of the art of the Social Search area,
proposing a new taxonomy and highlighting current limitations and open research
directions. We divide the Social Search area into three subcategories, where
the social aspect plays a pivotal role: Social Question&Answering, Social
Content Search, and Social Collaborative Search. For each subcategory, we
present the key concepts and selected representative approaches in the
literature in greater detail. We found that, up to now, a large body of studies
models users' preferences and their relations by simply combining the social
features made available by social platforms. This leaves room for significant
research that exploits more structured information about users' social profiles
and behaviors (as inferred from the data available on social platforms) to
better satisfy their information needs.
Towards Soft Circuit Breaking in Service Meshes via Application-agnostic Caching
Service meshes factor out code dealing with inter-micro-service
communication, such as circuit breaking. Circuit breaking actuation is
currently limited to an "on/off" switch, i.e., a tripped circuit breaker will
return an application-level error indicating service unavailability to the
calling micro-service. This paper proposes a soft circuit breaker actuator,
which returns cached data instead of an error. The overall resilience of a
cloud application is improved if constituent micro-services return stale data,
instead of no data at all. While caching is widely employed for serving web
service traffic, its usage in inter-micro-service communication is lacking.
Micro-service responses are highly dynamic, which requires carefully chosen
adaptive time-to-live (TTL) caching algorithms. We evaluate our approach
through two experiments. First, we quantify the trade-off between traffic
reduction and data staleness using a purpose-built service, thereby
identifying algorithm configurations that keep data staleness at about 3% or
less while reducing network load by up to 30%. Second, we quantify the network
load reduction on Hipster Shop, a micro-service benchmark by Google Cloud. Our
approach results in caching of about 80% of requests. The results show the
feasibility and efficiency of our approach, which encourages implementing
caching as a circuit-breaking actuator in service meshes.
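The soft-breaker idea, serving stale cached data instead of an error once the breaker trips, can be sketched compactly. The failure threshold and fixed TTL below are simplifying assumptions; the paper's contribution involves adaptive TTL algorithms, which this sketch does not reproduce.

```python
# Sketch of a "soft" circuit breaker: when the breaker is tripped, serve
# the last cached response (possibly stale) instead of an error.
# Threshold and TTL policy are simplified assumptions.
import time

class SoftCircuitBreaker:
    def __init__(self, call, ttl=5.0, max_failures=3):
        self.call = call                  # the upstream micro-service call
        self.ttl = ttl                    # cache time-to-live in seconds
        self.max_failures = max_failures
        self.failures = 0
        self.cache = {}                   # key -> (value, timestamp)

    def request(self, key):
        if self.failures >= self.max_failures:       # breaker tripped
            if key in self.cache:
                return self.cache[key][0]            # stale data beats no data
            raise RuntimeError("unavailable and no cached fallback")
        try:
            value = self.call(key)
            self.cache[key] = (value, time.time())
            self.failures = 0
            return value
        except Exception:
            self.failures += 1
            entry = self.cache.get(key)
            if entry and time.time() - entry[1] < self.ttl:
                return entry[0]                      # fresh-enough cached copy
            raise

# Usage with a hypothetical upstream that fails after its first call.
calls = {"n": 0}
def upstream(key):
    calls["n"] += 1
    if calls["n"] > 1:
        raise ConnectionError("upstream down")
    return {"price": 42}

cb = SoftCircuitBreaker(upstream, ttl=60.0)
print(cb.request("item"))   # live response
print(cb.request("item"))   # upstream fails -> cached copy served instead
```

In a real service mesh this logic would sit in the sidecar proxy, transparent to the calling micro-service, which is what makes the cache application-agnostic.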
Wireless Sensor Network Virtualization: A Survey
Wireless Sensor Networks (WSNs) are the key components of the emerging
Internet-of-Things (IoT) paradigm. They are now ubiquitous and used in a
plurality of application domains. WSNs are still domain specific and usually
deployed to support a specific application. However, as WSN nodes become
increasingly powerful, it is ever more pertinent to investigate how multiple
applications could share the same WSN infrastructure.
Virtualization is a technology that can potentially enable this sharing. This
paper is a survey on WSN virtualization. It provides a comprehensive review of
the state-of-the-art and an in-depth discussion of the research issues. We
introduce the basics of WSN virtualization and motivate its pertinence with
carefully selected scenarios. Existing works are presented in detail and
critically evaluated using a set of requirements derived from the scenarios.
The pertinent research projects are also reviewed. Several research issues are
also discussed with hints on how they could be tackled.
Comment: Accepted for publication on 3rd March 2015 in a forthcoming issue of
IEEE Communications Surveys and Tutorials. This version has NOT been
proof-read and may have some inconsistencies. Please refer to the final
version published in IEEE Xplore.
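The core idea the survey motivates, several applications sharing one physical sensor node, can be illustrated with a toy node-level virtualization sketch. The class, application names, and dispatch scheme are purely illustrative and not drawn from any surveyed system.

```python
# Toy illustration of node-level WSN virtualization: one physical
# sensor node hosts tasks belonging to different applications, and a
# single physical sample is shared among all of them. Names are
# illustrative only.

class SensorNode:
    def __init__(self, node_id):
        self.node_id = node_id
        self.tasks = {}              # application name -> per-reading callback

    def register(self, app, callback):
        """Each application deploys its own task on the shared node."""
        self.tasks[app] = callback

    def sense(self, reading):
        # One physical temperature sample is dispatched to every hosted
        # application, instead of each app needing its own deployment.
        return {app: cb(reading) for app, cb in self.tasks.items()}

node = SensorNode("n1")
node.register("fire_alarm", lambda t: t > 60)          # threshold detector
node.register("climate_log", lambda t: {"celsius": t}) # raw logger
print(node.sense(25))  # {'fire_alarm': False, 'climate_log': {'celsius': 25}}
```

Real node-level virtualization must additionally isolate the applications and arbitrate radio and energy budgets, which is where the research issues discussed in the survey arise.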
Enhancing Usability and Explainability of Data Systems
The recent growth of data science has expanded its reach to an ever-growing user base of nonexperts, increasing the need for usability, understandability, and explainability in these systems. Enhancing usability makes data systems accessible to people with different skills and backgrounds alike, leading to the democratization of data systems. Furthermore, a proper understanding of data and data-driven systems is necessary for users to trust the function of systems that learn from data. Finally, data systems should be transparent: when a data system behaves unexpectedly or malfunctions, the users deserve a proper explanation of what caused the observed incident. Unfortunately, most existing data systems offer limited usability and support for explanations: these systems are usable only by experts with sound technical skills, and even expert users are hindered by the lack of transparency into the systems' inner workings and functions. The aim of my thesis is to bridge the usability gap between nonexpert users and complex data systems, aid all sorts of users, including expert ones, in data and system understanding, and provide explanations that help reason about unexpected outcomes involving data systems. Specifically, my thesis has the following three goals: (1) enhancing the usability of data systems for nonexperts, (2) enabling data understanding that can assist users in a variety of tasks, such as achieving trust in data-driven machine learning and performing data cleaning, and (3) explaining the causes of unexpected outcomes involving data and data systems.
For enhancing usability, we focus on example-driven user intent discovery. We develop systems based on example-driven interactions in two different settings: querying relational databases and personalized document summarization. Towards data understanding, we develop a new data-profiling primitive that can characterize tuples for which a machine-learned model is likely to produce untrustworthy predictions. We also develop an explanation framework to explain the causes of such untrustworthy predictions. Additionally, this new data-profiling primitive enables interactive data cleaning. Finally, we develop two explanation frameworks tailored to provide explanations for debugging data system components, including the data itself. These frameworks focus on explaining the root cause of a concurrent application's intermittent failure and exposing issues in the data that cause a data-driven system to malfunction.
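The flavor of the data-profiling primitive described above can be conveyed with a deliberately simple sketch: learn per-column value ranges from the training data, then flag tuples that fall outside those ranges as candidates for untrustworthy predictions. This is a simplification for illustration, not the thesis's actual method.

```python
# Minimal sketch of a data-profiling primitive: profile the training
# data, then flag tuples that violate the profile as likely to yield
# untrustworthy model predictions. A real primitive would use much
# richer profiles than simple min/max ranges.

def learn_profile(rows):
    """rows: list of dicts with numeric columns; returns per-column ranges."""
    profile = {}
    for col in rows[0]:
        vals = [r[col] for r in rows]
        profile[col] = (min(vals), max(vals))
    return profile

def flag_untrustworthy(profile, row):
    """Return the columns on which the tuple violates the learned profile."""
    return [col for col, (lo, hi) in profile.items()
            if not lo <= row[col] <= hi]

train = [{"age": 25, "income": 40}, {"age": 60, "income": 90}]
profile = learn_profile(train)
print(flag_untrustworthy(profile, {"age": 35, "income": 70}))  # []
print(flag_untrustworthy(profile, {"age": 95, "income": 70}))  # ['age']
```

The same primitive doubles for interactive cleaning: the flagged columns point a user directly at the values that make a tuple anomalous.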
HIGH PERFORMANCE AGENT-BASED MODELS WITH REAL-TIME IN SITU VISUALIZATION OF INFLAMMATORY AND HEALING RESPONSES IN INJURED VOCAL FOLDS
The introduction of clusters of multi-core and many-core processors has played a major role in recent advances in tackling a wide range of new challenging applications and in enabling new frontiers in Big Data. However, as the computing power increases, the programming complexity required to take optimal advantage of the machine's resources has significantly increased. High-performance computing (HPC) techniques are crucial in realizing the full potential of parallel computing. This research is an interdisciplinary effort focusing on two major directions. The first involves the introduction of HPC techniques to substantially improve the performance of complex biological agent-based model (ABM) simulations, specifically simulations related to the inflammatory and healing responses of vocal folds at the physiological scale in mammals. The second direction involves improvements and extensions of the existing state-of-the-art vocal fold repair models. These improvements and extensions include comprehensive visualization of large data sets generated by the model and a significant increase in user-simulation interactivity.
We developed a highly interactive remote simulation and visualization framework for vocal fold (VF) agent-based modeling (ABM). The 3D VF ABM was verified through comparisons with empirical vocal fold data, and representative trends of biomarker predictions in surgically injured vocal folds were observed. The physiologically representative human VF ABM consisted of more than 15 million mobile biological cells, and the model maintained and generated 1.7 billion signaling and extracellular matrix (ECM) protein data points in each iteration. The VF ABM employed HPC techniques to optimize its performance by concurrently utilizing the power of a multi-core CPU and multiple GPUs. The optimization techniques included the minimization of data transfer between the CPU host and the rendering GPU; these techniques also reduced transfers between peer GPUs in multi-GPU setups. The data transfer minimization techniques were combined with a scheduling scheme aimed at load balancing, maximum overlap of computation and communication, and a high degree of interactivity. This scheduling scheme achieved optimal interactivity by hyper-tasking the available GPUs (GHT). In comparison to the original serial implementation on a popular ABM framework, NetLogo, these schemes have shown substantial performance improvements of 400x and 800x for the 2D and 3D models, respectively. Furthermore, the combination of data footprint and data transfer reduction techniques with GHT achieved high-interactivity visualization with an average framerate of 42.8 fps. This performance enabled users to perform real-time data exploration on large simulated outputs and steer the course of their simulation as needed.
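The overlap of computation and communication mentioned above can be illustrated with a small double-buffering sketch: while a renderer consumes one frame of simulation output, the next step is computed concurrently. Plain Python threading stands in here for the CPU/GPU pipeline; the "agent update" and "render" bodies are placeholders, not the VF ABM's actual kernels.

```python
# Sketch of overlapping simulation and visualization via a bounded
# queue (double buffer): the producer computes the next frame while the
# consumer "renders" earlier ones. Stage bodies are placeholders.
import threading
import queue

def simulate_step(state):
    # Placeholder agent update: every "cell" ages by one tick.
    return [c + 1 for c in state]

def run(steps, state):
    buffers = queue.Queue(maxsize=2)    # bounded queue = double buffer
    rendered = []

    def renderer():
        while True:
            buf = buffers.get()
            if buf is None:             # sentinel: simulation finished
                break
            rendered.append(sum(buf))   # stand-in for drawing a frame

    t = threading.Thread(target=renderer)
    t.start()
    for _ in range(steps):
        state = simulate_step(state)    # compute the next frame...
        buffers.put(state)              # ...while the renderer consumes prior ones
    buffers.put(None)
    t.join()
    return rendered

print(run(3, [0, 0, 0, 0]))  # [4, 8, 12]
```

The bounded queue is the key design choice: it keeps the two stages decoupled yet prevents the simulation from racing arbitrarily far ahead of the display, which is the same balance the GPU scheduling scheme strikes at much larger scale.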
GraphScope Flex: LEGO-like Graph Computing Stack
Graph computing has become increasingly crucial in processing large-scale
graph data, with numerous systems developed for this purpose. Two years ago, we
introduced GraphScope as a system addressing a wide array of graph computing
needs, including graph traversal, analytics, and learning in one system. Since
its inception, GraphScope has achieved significant technological advancements
and gained widespread adoption across various industries. However, one key
lesson from this journey has been understanding the limitations of a
"one-size-fits-all" approach, especially when dealing with the diversity of
programming interfaces, applications, and data storage formats in graph
computing. In response to these challenges, we present GraphScope Flex, the
next iteration of GraphScope. GraphScope Flex is designed to be both
resource-efficient and cost-effective, while also providing flexibility and
user-friendliness through its LEGO-like modularity. This paper explores the
architectural innovations and fundamental design principles of GraphScope Flex,
all of which are direct outcomes of the lessons learned during our ongoing
development process. We validate the adaptability and efficiency of GraphScope
Flex with extensive evaluations on synthetic and real-world datasets. The
results show that GraphScope Flex achieves 2.4X throughput and up to 55.7X
speedup over other systems on the LDBC Social Network and Graphalytics
benchmarks, respectively. Furthermore, GraphScope Flex accomplishes up to a
2,400X performance gain in real-world applications, demonstrating its
effectiveness across a wide range of graph computing scenarios.
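The LEGO-like modularity described above amounts to putting storage, compute, and interface behind small contracts so that a deployment assembles only the pieces it needs. The sketch below illustrates that composition pattern in miniature; all class names are hypothetical and bear no relation to the GraphScope Flex API.

```python
# Toy illustration of LEGO-like modularity: interchangeable storage and
# compute modules coupled only through a minimal contract (`neighbors`),
# so stacks can be assembled per deployment. Names are illustrative.

class CSRStore:
    """One interchangeable storage module (adjacency-list flavor)."""
    def __init__(self, edges):
        self.adj = {}
        for src, dst in edges:
            self.adj.setdefault(src, []).append(dst)

    def neighbors(self, v):
        return self.adj.get(v, [])

class AnalyticsEngine:
    """One interchangeable compute module; depends only on `neighbors`,
    so any store honoring that contract can be plugged in."""
    def __init__(self, store):
        self.store = store

    def out_degree(self, v):
        return len(self.store.neighbors(v))

# Assemble a stack from just the modules this deployment needs.
store = CSRStore([(1, 2), (1, 3), (2, 3)])
engine = AnalyticsEngine(store)
print(engine.out_degree(1))  # 2
```

Swapping in a different store (say, a mutable transactional one) or a different engine (traversal instead of analytics) would leave the rest of the stack untouched, which is the resource- and cost-efficiency argument the paper makes.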