1,602 research outputs found

    Multimodal Approach for Big Data Analytics and Applications

    Get PDF
    The thesis presents multimodal conceptual frameworks and their applications in improving the robustness and the performance of big data analytics through cross-modal interaction or integration. A joint interpretation of several knowledge renderings such as stream, batch, linguistics, visuals and metadata creates a unified view that can provide a more accurate and holistic approach to data analytics compared to a single standalone knowledge base. Novel approaches in the thesis involve integrating the multimodal framework with state-of-the-art computational models for big data, cloud computing, natural language processing, image processing, video processing, and contextual metadata. The integration of these disparate fields has the potential to improve computational tools and techniques dramatically. Thus, the contributions place multimodality at the forefront of big data analytics; the research aims at mapping and understanding multimodal correspondence between different modalities. The primary contribution of the thesis is the Multimodal Analytics Framework (MAF), a collaborative ensemble framework for stream and batch processing, along with cues from multiple input modalities like language, visuals and metadata, to combine the benefits of both low latency and high throughput. The framework is a five-step process:
    1. Data ingestion. As a first step towards Big Data analytics, a high-velocity, fault-tolerant streaming data acquisition pipeline is proposed through a distributed big data setup, followed by mining and searching for patterns while the data is still in transit. The data ingestion methods are demonstrated using Hadoop ecosystem tools like Kafka and Flume as sample implementations.
    2. Decision making on the ingested data to use the best-fit tools and methods. In Big Data analytics, the primary challenge often lies in processing heterogeneous data pools with a one-method-fits-all approach. The research introduces a decision-making system to select the best-fit solutions for the incoming data stream. This is the second step towards building the data processing pipeline presented in the thesis. The decision-making system introduces a Fuzzy Graph-based method to provide real-time and offline decision-making.
    3. Lifelong incremental machine learning. In the third step, the thesis describes a Lifelong Learning model at the processing layer of the analytical pipeline, following the data acquisition and decision making of the previous steps. Lifelong learning iteratively increments the training model using a proposed Multi-agent Lambda Architecture (MALA), a collaborative ensemble architecture between stream and batch data. As part of the proposed MAF, MALA is one of the primary contributions of the research. The work introduces a general-purpose and comprehensive approach to hybrid learning over batch and stream processing to achieve lifelong learning objectives.
    4. Improving machine learning results through ensemble learning. As an extension of the Lifelong Learning model, the thesis proposes a boosting-based ensemble method as the fourth step of the framework, improving lifelong learning results by reducing the learning error in each iteration of a streaming window. The strategy is to incrementally boost the learning accuracy on each iterating mini-batch, enabling the model to accumulate knowledge faster. The base learners adapt more quickly in smaller intervals of a sliding window, improving the machine learning accuracy rate by countering concept drift; a simplified sketch of this step follows the abstract.
    5. Cross-modal integration between text, image, video and metadata for more comprehensive data coverage than a text-only dataset. The final contribution of this thesis is a new multimodal method where three different modalities (text, visuals, i.e. image and video, and metadata) are intertwined along with real-time and batch data for more comprehensive input data coverage than text-only data. The model is validated through a detailed case study on the contemporary and relevant topic of the COVID-19 pandemic. While the remainder of the thesis deals with text-only input, the COVID-19 case study analyzes textual and visual information in integration. Following the completion of this research work, multimodal machine learning is investigated as a future research direction and an extension to the current framework.
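
    As a rough illustration of step 4, the sketch below maintains a small ensemble of base learners over streaming mini-batches, training one learner per batch and evicting the oldest so that predictions track the most recent concepts. The class name, window policy, and unweighted majority vote are illustrative assumptions; this stands in for, and does not reproduce, the thesis's boosting-based method.

        # A simplified sliding-window voting ensemble for streaming mini-batches.
        # Illustrative stand-in only: names and the eviction/voting policy are
        # assumptions, not details taken from the thesis.
        import numpy as np
        from sklearn.tree import DecisionTreeClassifier

        class SlidingWindowEnsemble:
            def __init__(self, max_learners=5):
                self.max_learners = max_learners  # how many recent windows to retain
                self.learners = []

            def update(self, X, y):
                # Train one shallow base learner on the newest mini-batch.
                self.learners.append(DecisionTreeClassifier(max_depth=3).fit(X, y))
                # Evict the oldest learner so stale concepts fade out (concept drift).
                if len(self.learners) > self.max_learners:
                    self.learners.pop(0)

            def predict(self, X):
                # Unweighted majority vote; assumes non-negative integer class labels.
                preds = np.stack([clf.predict(X) for clf in self.learners])
                return np.apply_along_axis(
                    lambda col: np.bincount(col).argmax(), 0, preds)

    Feeding each windowed mini-batch through update() and scoring the following batch with predict() mimics the test-then-train evaluation that is common in stream learning.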

    A Business Intelligence Solution, based on a Big Data Architecture, for processing and analyzing the World Bank data

    Get PDF
    The rapid growth in data volume and complexity has necessitated the adoption of advanced technologies to extract valuable insights for decision-making. This project addresses that need by developing a comprehensive framework that combines Big Data processing, analytics, and visualization techniques to enable effective analysis of World Bank data. The problem addressed in this study is the need for a scalable and efficient Business Intelligence solution that can handle the vast amounts of data generated by the World Bank. Therefore, a Big Data architecture is implemented on a real use case for the International Bank for Reconstruction and Development. The findings of this project demonstrate the effectiveness of the proposed solution. Through the integration of Apache Spark and Apache Hive, data is processed using Extract, Transform and Load (ETL) techniques, allowing for efficient data preparation. The use of Apache Kylin enables the construction of a multidimensional model, facilitating fast and interactive queries on the data. Moreover, data visualization techniques are employed to create intuitive and informative visual representations of the analysed data. The key conclusions drawn from this project highlight the advantages of a Big Data-driven Business Intelligence solution in processing and analysing World Bank data. The implemented framework showcases improved scalability, performance, and flexibility compared to traditional approaches. In conclusion, this bachelor's thesis presents a Business Intelligence solution based on a Big Data architecture for processing and analysing World Bank data. The project findings emphasize the importance of scalable and efficient data processing techniques, multidimensional modelling, and data visualization for deriving valuable insights. The application of these techniques contributes to the field by demonstrating the potential of Big Data Business Intelligence solutions in addressing the challenges associated with large-scale data analysis.
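
    A minimal sketch of the Spark-plus-Hive ETL stage described above, assuming a CSV export of World Bank indicators. The file path, database, and column names are illustrative assumptions, and the Kylin cube build that would consume the resulting table is configured in Kylin itself, not shown here.

        # Extract-Transform-Load with PySpark into a Hive table (illustrative;
        # paths, schema and table names are assumptions, not project artefacts).
        from pyspark.sql import SparkSession
        from pyspark.sql import functions as F

        spark = (SparkSession.builder
                 .appName("worldbank-etl")
                 .enableHiveSupport()          # use the Hive metastore for tables
                 .getOrCreate())

        # Extract: load the raw indicator export.
        raw = spark.read.csv("worldbank_indicators.csv", header=True, inferSchema=True)

        # Transform: drop empty measures and enforce numeric types.
        clean = (raw
                 .filter(F.col("value").isNotNull())
                 .withColumn("value", F.col("value").cast("double"))
                 .withColumn("year", F.col("year").cast("int")))

        # Load: persist as a partitioned Hive table (assumes a 'worldbank'
        # database already exists) that Kylin can later build cubes from.
        (clean.write
              .mode("overwrite")
              .partitionBy("year")
              .saveAsTable("worldbank.indicators"))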

    Assessing learners’ satisfaction in collaborative online courses through a big data approach

    Get PDF
    Monitoring learners' satisfaction (LS) is a vital activity for collecting valuable information and designing worthwhile online collaborative learning (CL) experiences. Today's CL platforms allow students to perform many online activities, generating a huge mass of data that can be processed to provide insights about the level of satisfaction with contents, services, community interactions, and effort. Big Data is a suitable paradigm for real-time processing of large data sets concerning LS, with the final aim of providing valuable information that may improve the CL experience. Besides, the adoption of Big Data offers the opportunity to implement a non-intrusive, in-process evaluation strategy for online courses that complements the traditional and time-consuming ways of collecting feedback (e.g. questionnaires or surveys). Although the application of Big Data in the CL domain is a recently explored research area with limited applications, it may have an important role in the future of online education. By adopting the design science research methodology, this article describes a novel method and approach to analyse individual students' contributions in online learning activities and assess the level of their satisfaction towards the course. A software artefact is also presented, which leverages Learning Analytics in a Big Data context, with the goal of providing in real time valuable insights that people and systems can use to intervene properly in the program. The contribution of this paper can be of value for both researchers and practitioners: the former may be interested in the approach and method used for LS assessment; the latter may find of interest the system implemented and how it was tested in a real online course.
    Elia, G.; Solazzo, G.; Lorenzo, G.; Passiante, G.
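
    To make the real-time side concrete, the hedged sketch below uses Spark Structured Streaming to turn raw learner activity events into windowed per-student activity counts, a crude engagement signal that a satisfaction model could consume downstream. The event schema, landing directory, and window length are assumptions for illustration, not details from the paper.

        # Windowed learner-activity counts with Spark Structured Streaming
        # (illustrative; the paper's artefact is not reproduced here).
        from pyspark.sql import SparkSession
        from pyspark.sql import functions as F
        from pyspark.sql.types import StructType, StructField, StringType, TimestampType

        spark = SparkSession.builder.appName("ls-monitor").getOrCreate()

        schema = StructType([
            StructField("student_id", StringType()),
            StructField("activity", StringType()),    # e.g. post, comment, download
            StructField("ts", TimestampType()),
        ])

        # Events land as JSON files in a directory watched by the stream.
        events = spark.readStream.schema(schema).json("events/")

        # Ten-minute tumbling windows of activity per student.
        engagement = (events
                      .withWatermark("ts", "10 minutes")
                      .groupBy(F.window("ts", "10 minutes"), "student_id")
                      .count())

        # Print the rolling counts; a real deployment would write to a sink
        # that dashboards or alerting systems read from.
        query = (engagement.writeStream
                 .outputMode("append")
                 .format("console")
                 .start())
        query.awaitTermination()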

    Faculty Achievements, April 2017

    Get PDF

    NEVIS'22: A Stream of 100 Tasks Sampled from 30 Years of Computer Vision Research

    Full text link
    We introduce the Never Ending VIsual-classification Stream (NEVIS'22), a benchmark consisting of a stream of over 100 visual classification tasks, sorted chronologically and extracted from papers sampled uniformly from computer vision proceedings spanning the last three decades. The resulting stream reflects what the research community thought was meaningful at any point in time. Despite being limited to classification, the stream has a rich diversity of tasks, from OCR to texture analysis, crowd counting, scene recognition, and so forth. The diversity is also reflected in the wide range of dataset sizes, spanning over four orders of magnitude. Overall, NEVIS'22 poses an unprecedented challenge for current sequential learning approaches due to the scale and diversity of its tasks, yet it has a low entry barrier, as it is limited to a single modality and each task is a classical supervised learning problem. Moreover, we provide a reference implementation, including strong baselines, and a simple evaluation protocol to compare methods in terms of their trade-off between accuracy and compute. We hope that NEVIS'22 can be useful to researchers working on continual learning, meta-learning, AutoML and, more generally, sequential learning, and help these communities join forces towards more robust and efficient models that adapt to a never-ending stream of data. Implementations have been made available at https://github.com/deepmind/dm_nevis.
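
    A hedged sketch of the kind of sequential protocol the benchmark implies: tasks arrive in chronological order, the learner may carry knowledge forward from task to task, and both accuracy and compute are logged so that methods can be compared on that trade-off. The task and learner interfaces below are invented for illustration; the benchmark's actual API lives in the linked repository.

        # Sequential (chronological) task-stream evaluation, logging the
        # accuracy/compute trade-off. Task and learner interfaces are
        # illustrative assumptions, not the NEVIS'22 API.
        import time

        def run_stream(tasks, learner):
            results = []
            for task in tasks:                         # chronologically ordered
                start = time.perf_counter()
                learner.fit(task.train_data)           # may reuse earlier knowledge
                accuracy = learner.evaluate(task.test_data)
                compute = time.perf_counter() - start  # crude proxy for compute spent
                results.append((task.name, accuracy, compute))
            return results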

    e-Skills: The International dimension and the Impact of Globalisation - Final Report 2014

    Get PDF
    In today’s increasingly knowledge-based economies, new information and communication technologies are a key engine for growth, fuelled by the innovative ideas of highly-skilled workers. However, obtaining adequate numbers of employees with the necessary e-skills is a challenge. This is a growing international problem, with many countries having insufficient numbers of workers with the right e-skills. For example:
    Australia: “Even though there’s 10,000 jobs a year created in IT, there are only 4500 students studying IT at university, and not all of them graduate” (Talevski and Osman, 2013).
    Brazil: “Brazil’s ICT sector requires about 78,000 [new] people by 2014. But, according to Brasscom, there are only 33,000 youths studying ICT related courses in the country” (Ammachchi, 2012).
    Canada: “It is widely acknowledged that it is becoming increasingly difficult to recruit for a variety of critical ICT occupations – from entry level to seasoned” (Ticoll and Nordicity, 2012).
    Europe: “It is estimated that there will be an e-skills gap within Europe of up to 900,000 (main forecast scenario) ICT practitioners by 2020” (Empirica, 2014).
    Japan: It is reported that 80% of IT and user companies report an e-skills shortage (IPA, IT HR White Paper, 2013).
    United States: “Unlike the fiscal cliff where we are still peering over the edge, we careened over the ‘IT Skills Cliff’ some years ago as our economy digitalized, mobilized and further ‘technologized’, and our IT skilled labour supply failed to keep up” (Miano, 2013).