2,911 research outputs found

    Evaluation of Storage Systems for Big Data Analytics

    abstract: Recent trends in big data storage systems show a shift from disk centric models to memory centric models. The primary challenges faced by these systems are speed, scalability, and fault tolerance. It is interesting to investigate the performance of these two models with respect to some big data applications. This thesis studies the performance of Ceph (a disk centric model) and Alluxio (a memory centric model) and evaluates whether a hybrid model provides any performance benefits with respect to big data applications. To this end, an application TechTalk is created that uses Ceph to store data and Alluxio to perform data analytics. The functionalities of the application include offline lecture storage, live recording of classes, content analysis and reference generation. The knowledge base of videos is constructed by analyzing the offline data using machine learning techniques. This training dataset provides knowledge to construct the index of an online stream. The indexed metadata enables the students to search, view and access the relevant content. The performance of the application is benchmarked in different use cases to demonstrate the benefits of the hybrid model. Dissertation/Thesis. Masters Thesis. Computer Science 201

    Future of networking is the future of Big Data, The

    2019 Summer. Includes bibliographical references. Scientific domains such as Climate Science, High Energy Particle Physics (HEP), Genomics, Biology, and many others are increasingly moving towards data-oriented workflows where each of these communities generates, stores and uses massive datasets that reach into terabytes and petabytes, and are projected soon to reach exabytes. These communities are also increasingly moving towards a global collaborative model where scientists routinely exchange a significant amount of data. The sheer volume of data and the complexities associated with maintaining, transferring, and using it continue to push the limits of the current technologies in multiple dimensions - storage, analysis, networking, and security. This thesis tackles the networking aspect of big-data science. Networking is the glue that binds all the components of modern scientific workflows, and these communities are becoming increasingly dependent on high-speed, highly reliable networks. The network, as the common layer across big-science communities, provides an ideal place for implementing common services. Big-science applications also need to work closely with the network to ensure optimal usage of resources and intelligent routing of requests and data. Finally, as more communities move towards data-intensive, connected workflows, adopting a service model where the network provides some of the common services reduces not only application complexity but also the necessity of duplicate implementations. Named Data Networking (NDN) is a new network architecture whose service model aligns better with the needs of these data-oriented applications. NDN's name-based paradigm makes it easier to provide intelligent features at the network layer rather than at the application layer. This thesis shows that NDN can push several standard features to the network. This work is the first attempt to apply NDN in the context of large scientific data; in the process, this thesis touches upon scientific data naming, name discovery, real-world deployment of NDN for scientific data, feasibility studies, and the designs of in-network protocols for big-data science

    Analytical study based on issues of Routing & Security in Wireless sensor networks

    Wireless Sensor Networks (WSNs) are gaining significant importance at present owing to their unlimited potential and worldwide applications. Routes in the network are determined by secure and energy-efficient routing protocols; the energy-efficient protocols employed for WSNs are the hierarchical, or cluster-based, routing protocols that are essential for path computation in sensor networks. Since most hierarchical routing protocols are designed primarily for energy efficiency, security issues are often given little importance. But in certain applications, such as military or battlefield use, data must be kept secret while being communicated between sensor nodes and basin, so security issues must also be addressed when developing routing protocols. With this in view, this paper presents the various security issues involved in designing a hierarchical routing protocol for a specific WSN, and the design challenges encountered while studying different hierarchical routing protocols. Keywords: Wireless Sensor Networks (WSNs), Hierarchical routing, Security issues

    BIOZON: a system for unification, management and analysis of heterogeneous biological data

    BACKGROUND: Integration of heterogeneous data types is a challenging problem, especially in biology, where the number of databases and data types increases rapidly. Amongst the problems that one has to face are integrity, consistency, redundancy, connectivity, expressiveness and updatability. DESCRIPTION: Here we present a system (Biozon) that addresses these problems, and offers biologists a new knowledge resource to navigate through and explore. Biozon unifies multiple biological databases consisting of a variety of data types (such as DNA sequences, proteins, interactions and cellular pathways). It is fundamentally different from previous efforts as it uses a single extensive and tightly connected graph schema wrapped with a hierarchical ontology of documents and relations. Beyond warehousing existing data, Biozon computes and stores novel derived data, such as similarity relationships and functional predictions. The integration of similarity data allows propagation of knowledge through inference and fuzzy searches. Sophisticated query methods that span multiple data types were implemented, and first-of-a-kind biological ranking systems were explored and integrated. CONCLUSION: The Biozon system is an extensive knowledge resource of heterogeneous biological data. Currently, it holds more than 100 million biological documents and 6.5 billion relations between them. The database is accessible through an advanced web interface that supports complex queries, "fuzzy" searches, data materialization and more, online at

    Software Development in the Fintech Industry: A Literature Review

    Background: The digital transformation of the financial industry led by technological advances, together with changes in regulations, has created opportunities for companies to provide innovative new financial services. Fintech applies the latest technological innovations to provide better and more efficient financial products and business models, disrupting the traditionally rigid banking industry. Objective: The objective of this study was to find out what kind of digital services are provided by the fintech industry, what software or technology related competencies and skills are needed in the development of fintech software, and what the special requirements of fintech software are. Method: The method of this study was a systematic literature review. Following the defined review protocol, 31 primary studies published between 2015 and 2021 were identified as relevant for this review by queries to three scientific databases. The selected primary studies were categorized by services, competencies and requirements. Results: Most of the identified digital services were payment applications, but robo-advisors, budgeting tools and compliance automation tools were also found. The technologies and related skills extracted from the studies were divided into software development skills and data science skills and further categorized. Compliance with laws and regulations and various reporting and auditing practices were found to be unique domain requirements for fintech. Security was the most mentioned non-functional requirement of a financial system. Conclusions: Fintech is a cross-disciplinary field with unique requirements for business critical software. However, research on fintech software development is still limited

    CHORUS Deliverable 2.1: State of the Art on Multimedia Search Engines

    Based on the information provided by European projects and national initiatives related to multimedia search, as well as domain experts who participated in the CHORUS Think-tanks and workshops, this document reports on the state of the art related to multimedia content search from a technical and a socio-economic perspective. The technical perspective includes an up-to-date view on content-based indexing and retrieval technologies, multimedia search in the context of mobile devices and peer-to-peer networks, and an overview of current evaluation and benchmark initiatives to measure the performance of multimedia search engines. From a socio-economic perspective we inventory the impact and legal consequences of these technical advances and point out future directions of research

    Re-examining and re-conceptualising enterprise search and discovery capability: towards a model for the factors and generative mechanisms for search task outcomes.

    Many organizations are trying to re-create the Google experience, to find and exploit their own corporate information. However, there is evidence that finding information in the workplace using search engine technology has remained difficult, with socio-technical elements largely neglected in the literature. Explication of the factors and generative mechanisms (ultimate causes) of effective search task outcomes (user satisfaction, search task performance and serendipitous encountering) may provide a first step in making improvements. A transdisciplinary (holistic) lens was applied to Enterprise Search and Discovery capability in one of the world's largest corporations, combining critical realism and activity theory with complexity theories. Data collection included an in-situ exploratory search experiment with 26 participants, focus groups with 53 participants and interviews with 87 business professionals. Thousands of user feedback comments and search transactions were analysed. Transferability of findings was assessed through interviews with eight industry informants and ten organizations from a range of industries. A wide range of informational needs were identified for search filters, including a need to be intrigued. Search term word co-occurrence algorithms facilitated serendipity to a greater extent than existing methods deployed in the organization surveyed. No association was found between user satisfaction (or self-assessed search expertise) and search task performance, and overall performance was poor, although most participants had been satisfied with their performance. Eighteen factors were identified that influence search task outcomes, ranging from user and task factors, informational and technological artefacts, through to a wide range of organizational norms. Modality Theory (Cybersearch culture, Simplicity and Loss Aversion bias) was developed to explain the study observations. This proposes that at all organizational levels there are tendencies for reductionist (unimodal) mind-sets towards search capability, leading to fixes that fail. The factors and mechanisms were identified in other industry organizations, suggesting some theory generalizability. This is the first socio-technical analysis of Enterprise Search and Discovery capability. The findings challenge existing orthodoxy, such as the criticality of search literacy (agency), which has been neglected in the practitioner literature in favour of structure. The resulting multifactorial causal model and strategic framework for improvement present opportunities to update existing academic models in the IR, LIS and IS literature, such as the DeLone and McLean model for information system success. There are encouraging signs that Modality Theory may enable a reconfiguration of organizational mind-sets that could transform search task outcomes and ultimately business performance

    Extending information retrieval system model to improve interactive web searching.

    The research set out with the broad objective of developing new tools to support Web information searching. A survey showed that a substantial number of interactive search tools were being developed, but that there was little work on how these new developments fitted into the general aim of helping people find information. As a result, it proved difficult to compare and analyse how tools help and affect users and where they belong in a general scheme of information search tools. A key reason for the lack of better information searching tools was identified in the ill-suited nature of existing information retrieval system models. The traditional information retrieval model is extended by synthesising work in information retrieval and information seeking research. The purpose of this new holistic search model is to assist information system practitioners in identifying, hypothesising, designing and evaluating Web information searching tools. Using the model, a term relevance feedback tool called ‘Tag and Keyword’ (TKy) was developed in a Web browser, and it was hypothesised that it could improve query reformulation and reduce unnecessary browsing. The tool was tested in a laboratory experiment, and quantitative analysis showed statistically significant increases in query reformulations and reductions in Web browsing (per query). Subjects were interviewed after the experiment, and qualitative analysis revealed that they found the tool useful and that it saved time. Interestingly, exploratory analysis of the collected data identified three different ways in which subjects had utilised the TKy tool. The research developed a holistic search model for Web searching and demonstrated that it can be used to hypothesise, design and evaluate information searching tools. Information system practitioners using it can better understand the context in which their search tools are developed and how these relate to users’ search processes and other search tools