11 research outputs found

    Towards a Conceptual Model for developing a Career Prediction System for Students’ Subject Selection at Secondary School Level

    Career choice prediction is a complex phenomenon in both developed and developing countries. Although various theories describing career prediction have emerged, their practical implementation as a system has been hampered by the shortcomings of each of them. Moreover, no existing theoretically grounded holistic model merges these theories in a way that can inform the development of such a system in a developing-world context. This paper therefore proposes a holistic conceptual model that integrates a number of variables to inform the development of a career prediction system in the developing world. The study draws on the strengths of the various theories identified in the literature to develop the conceptual model. Model verification and validation will be undertaken after data collection in the proposed study.

    An intelligent approach for data pre-processing and analysis in predictive maintenance with an industrial case study

    Recent developments in the predictive maintenance field have focused on incorporating artificial intelligence techniques in the monitoring and prognostics of machine health. Current predictive maintenance applications in manufacturing depend increasingly on data-driven Machine Learning algorithms, which require an intelligent and effective analysis of large amounts of historical and real-time data coming from multiple streams (sensors and computer systems) across multiple machines. This article therefore addresses issues of data pre-processing that have a significant impact on the generalization performance of a Machine Learning algorithm. We present an intelligent approach using unsupervised Machine Learning techniques for data pre-processing and analysis in predictive maintenance, in order to obtain qualified and structured data. We also demonstrate the applicability of the formulated approach through an industrial case study in manufacturing. Data sets from the manufacturing industry are analyzed to identify data quality problems and to detect interesting subsets containing hidden information. With the formulated approach, useful diagnostic information about component/machine behavior can be obtained in a systematic way, as a basis for decision support and prognostic model development in predictive maintenance.
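    The abstract does not state which unsupervised techniques the authors use, so the following is only a minimal sketch of the general pattern it describes: scale historical sensor readings, cluster them into candidate operating-mode subsets, and flag likely data-quality outliers for inspection. The file name, column names, and parameters (k-means with three clusters, an isolation forest) are illustrative assumptions, not the paper's pipeline.

```python
# Minimal sketch (assumed data layout): unsupervised pre-processing of
# machine sensor data prior to prognostic modelling.
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans
from sklearn.ensemble import IsolationForest

# Hypothetical CSV of historical sensor readings, one row per timestamp.
df = pd.read_csv("machine_sensors.csv")
sensor_cols = ["temperature", "vibration", "pressure", "rpm"]  # assumed columns

# Basic cleaning: drop rows with missing sensor values.
clean = df.dropna(subset=sensor_cols).copy()

# Scale features so clustering is not dominated by a single unit of measure.
X = StandardScaler().fit_transform(clean[sensor_cols])

# Cluster readings into candidate operating modes (k chosen arbitrarily here).
clean["mode"] = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)

# Flag potential data-quality outliers / anomalous behaviour for inspection.
clean["outlier"] = IsolationForest(random_state=0).fit_predict(X) == -1

# Summarise each discovered subset as input for later diagnostic analysis.
print(clean.groupby("mode")[sensor_cols].mean())
print("flagged outliers:", int(clean["outlier"].sum()))
```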

    Adapting a quality model for a Big Data application: the case of a feature prediction system

    In the last decade we have witnessed a considerable increase in projects based on Big Data applications. Some of the most popular types of these applications are recommendation systems, feature prediction, and decision making. In this context, several quality models have been proposed for Big Data applications, but their great heterogeneity makes it difficult to select the ideal quality model for developing a specific type of Big Data application. In this Master's thesis, a Systematic Mapping Study (SMS) is conducted, starting from two key research questions. The first concerns the state of the art in identifying risks, issues, problems, or challenges in Big Data applications. The second concerns which quality models have been applied to date to Big Data applications, specifically to feature prediction systems. The main objective is to analyse the available quality models and to adapt, from the existing ones, a quality model that can be applied to a specific type of Big Data application: feature prediction systems. The defined model comprises a set of quality characteristics and a set of quality metrics to evaluate them. Finally, the model is applied to a case study, the defined quality characteristics are evaluated through their quality metrics, and the results are presented and discussed. Departamento de Informática (Arquitectura y Tecnología de Computadores, Ciencias de la Computación e Inteligencia Artificial, Lenguajes y Sistemas Informáticos). Máster en Ingeniería Informática.
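    The thesis abstract mentions quality characteristics and metrics without listing them, so the sketch below is only an illustration of how two generic data-quality metrics (completeness and freshness) might be computed for the input data of a feature prediction system. The metric choices, column names, and dataclass are assumptions, not the quality model defined in the thesis.

```python
# Illustrative sketch (assumed metrics): simple data-quality measurements
# for the input data of a feature prediction system.
from dataclasses import dataclass
import pandas as pd

@dataclass
class QualityReport:
    completeness: float    # share of non-missing cells in the feature columns
    freshness_days: float  # age in days of the most recent record

def assess_quality(df: pd.DataFrame, feature_cols: list, ts_col: str) -> QualityReport:
    cells = df[feature_cols]
    completeness = 1.0 - cells.isna().to_numpy().mean()
    latest = pd.to_datetime(df[ts_col]).max()
    freshness = (pd.Timestamp.now() - latest).days
    return QualityReport(completeness=round(completeness, 3),
                         freshness_days=float(freshness))

# Usage with a tiny synthetic dataset (one missing feature value).
data = pd.DataFrame({
    "feature_a": [1.0, None, 3.0],
    "feature_b": [0.2, 0.4, 0.6],
    "ingested_at": ["2024-01-01", "2024-01-02", "2024-01-03"],
})
print(assess_quality(data, ["feature_a", "feature_b"], "ingested_at"))
```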

    Generation and use of unstructured data in the social, behavioural, and economic sciences: challenges and recommendations

    The increasing digital transformation of society in recent decades has resulted in a number of new data sources for the social, behavioural, and economic sciences. Among many others, they include unstructured data, which are characterised by not being available in a fixed data format and are therefore not easy to process for data analysis (e.g., Facebook posts, Instagram images, YouTube videos, Twitter messages). The use of unstructured data is linked to specific challenges, which arise precisely because the data are not typically collected as part of a controlled, scientific study but are often created in people's natural environments. Building on the results of an expert workshop, we describe the specific challenges of generating and using unstructured data and formulate recommendations for their use. Our recommendations are based on the total error framework and take into account data generation (definition of the units of analysis, coverage and sampling error, non-response, and missing data error), post-collection processing (specification error, validity, measurement error, and errors in terms of content), and, lastly, data analysis (record linkage and processing errors, modelling errors, and analytical errors). Finally, we discuss open questions and challenges in research using unstructured data. This paper is aimed at students and researchers in the social, behavioural, and economic sciences on the one hand, and at everyone working with unstructured data and drawing inferences from them for practical applications on the other.
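    The abstract lists error sources from the total error framework without showing how they might be quantified in practice. Purely as an illustration, the sketch below computes two simple indicators for a batch of scraped posts: a missing-data rate and a coverage rate over an assumed set of target regions. The field names and target units are hypothetical, not taken from the paper.

```python
# Illustrative sketch (assumed fields): quantifying missing-data and coverage
# indicators for a small extract of unstructured social media posts.
import pandas as pd

# Hypothetical extract: one row per post, with partially missing fields.
posts = pd.DataFrame({
    "text":   ["post a", "post b", None, "post d"],
    "region": ["north", None, "south", "north"],
})
TARGET_REGIONS = {"north", "south", "east", "west"}  # assumed units of analysis

# Missing-data error: share of posts with at least one missing field.
missing_rate = posts.isna().any(axis=1).mean()

# Coverage: share of target regions that appear at least once in the data.
coverage = len(set(posts["region"].dropna()) & TARGET_REGIONS) / len(TARGET_REGIONS)

print(f"missing-data rate: {missing_rate:.0%}")  # 50%
print(f"region coverage:   {coverage:.0%}")      # 50%
```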

    Big data analytics as a management tool: An overview, trends and challenges

    Innovative digital technologies and an ever-changing business environment have transformed, and will continue to transform, businesses and industries around the world. This transformation will become even more evident with forthcoming technological breakthroughs and advances in big data analytics, machine learning algorithms, cloud-computing solutions, artificial intelligence, the internet of things, and the like. As we live in a data-driven world, these technologies are altering work and work-related activities as well as everyday activities and interactions. This paper focuses on big data and big data analytics (BDA), viewed from an organisational perspective as a means of improving firm performance and competitiveness. Based on a review of selected literature and research, the paper explores the extent to which big data analytics is utilized in companies and highlights the valuable role big data analytics may play in achieving better business outcomes. Furthermore, the paper briefly presents the main challenges that accompany the adoption of big data analytics in companies.

    Measuring Data Quality of Theses and Dissertations in the Data Preparation Stage of Registration Systems

    Today, academic research plays an influential role in the economic development of countries. This research is often recorded and disseminated in the form of theses and dissertations at scientific institutes. The better the quality of this data in the systems that collect and distribute it, the more it can be used and exploited by organizations and businesses. Providing this data therefore requires proper monitoring so that the output of the recording and dissemination process is of good quality. This paper offers a framework for evaluating the data quality of theses and dissertations. In the framework, a coding structure for data inconsistencies is introduced, covering Word and PDF files as well as metadata (bibliographic information). The approaches presented in data quality methodologies (TDQM and DWQ) are also used to provide solutions for improving data quality in the provisioning phase. At this stage, approaches such as assigning owners to data or processes, root-cause analysis, process control, and continuous monitoring are considered. A focus group method is used to determine the operational strategies for quality improvement. Finally, process-oriented techniques, such as quality-control checklists and image processing, and data-driven approaches, such as data cleansing, are adapted and developed to improve the quality of thesis and dissertation documents. The proposed improvement solutions fall into two groups: guiding the user through the theses/dissertations registration process is a process-driven solution, while introducing a specific file format for theses/dissertations and resolving the quality issues of PDF files are data-driven solutions.
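    The paper's inconsistency coding structure is not spelled out in the abstract, so the following is only a hypothetical sketch of how simple completeness and consistency checks on thesis metadata records could be automated. The required fields, issue codes, and rules are assumptions made for illustration.

```python
# Hypothetical sketch (assumed fields and rules): completeness/consistency
# checks on bibliographic metadata for thesis and dissertation records.
import re

REQUIRED_FIELDS = ["title", "author", "supervisor", "year", "department", "pdf_path"]

def check_record(record: dict) -> list:
    """Return a list of data-quality issue codes for one thesis record."""
    issues = []
    for field in REQUIRED_FIELDS:
        if not record.get(field):
            issues.append(f"missing:{field}")
    year = str(record.get("year", ""))
    if year and not re.fullmatch(r"(19|20)\d{2}", year):
        issues.append("invalid:year")
    pdf = record.get("pdf_path", "")
    if pdf and not pdf.lower().endswith(".pdf"):
        issues.append("format:not_pdf")
    return issues

# Example record with an empty supervisor, malformed year, and non-PDF file.
sample = {"title": "Sample thesis", "author": "A. Student", "supervisor": "",
          "year": "20x1", "department": "CS", "pdf_path": "thesis.docx"}
print(check_record(sample))  # ['missing:supervisor', 'invalid:year', 'format:not_pdf']
```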

    Collection and use of unstructured data in the social, behavioural, and economic sciences: challenges and recommendations

    The increasing digitalisation of our everyday lives in recent decades has led to a number of new data sources for the social, behavioural, and economic sciences. These include, above all, unstructured data, which are characterised by not being available in a fixed data format and therefore cannot easily be processed for data analysis (e.g., Facebook texts, Instagram images, YouTube videos, Twitter messages). The use of unstructured data is linked to specific challenges, which arise precisely because the data are typically not collected as part of a controlled scientific study but often accrue in people's natural environments. Building on the results of an expert workshop, the specific challenges of collecting and using unstructured data are described and recommendations are formulated. These are based on the Total Error Framework and relate to data generation (definition of the units of analysis, coverage and sampling error, non-response and missing data error), data preparation (specification error, validity, measurement error, and content errors), and data analysis (record linkage and processing errors, modelling errors, analytical errors). Finally, open questions and challenges in research using unstructured data are discussed. This output is aimed at students and researchers in the social, behavioural, and economic sciences on the one hand, and at everyone who works with unstructured data and draws conclusions from them for practical applications on the other.

    Data Mining and Twitter in information management during the pandemic in Brazil

    Master's dissertation, Cultura Científica e Divulgação das Ciências, Universidade de Lisboa, Faculdade de Ciências, Instituto de Ciências Sociais, Instituto de Educação, 2021.
    The large amount and rapid growth of data produced on social networks empowers their users. Against this background, and considering the Brazilian pandemic context, this work applies analytical techniques of social network analysis to Twitter. These analyses make it possible to understand users' concerns and perceptions and to identify the sentiments expressed by users of this social network regarding the Covid-19 pandemic. The 138,648 tweets collected and analysed from Brazil between January 2020 and March 2021 show that posts used words or ideas referring mainly to prophylactic measures to prevent the spread of the disease, such as testing and vaccines, and to the social and economic impacts of the progressive shutdown of the economy. The set of word clouds generated monthly revealed words that point to the main events of the period in Brazil and in the world. Posts about the emotional and psychological consequences of social isolation also stand out, with a majority of comments classified as negative. The data show a similar growth pattern between the cumulative curve of confirmed Covid-19 cases and the number of mentions of the keywords used to extract the tweets. The trend analysis of the main words used (more than 1,000 mentions throughout the analysis period) revealed the most cited: "vacina" (vaccine), with 8,037 mentions; "casa" (home), with 6,816; "Deus" (God), with 5,130; "contra" (against), with 5,115; "caso" (case), with 4,920; "mortes" (deaths), with 4,617; "teste" (test), with 3,410; "Bolsonaro", with 2,868; and "tratamento" (treatment), with 1,082. These words reveal, on the one hand, concern about how the pandemic and the "treatment" of the disease were managed and about their consequences, and, on the other, the need for something that goes beyond scientific help, as well as the contradictions and the importance of the president's actions. The sentiment analysis showed a predominantly negative polarity (50%), followed by neutral polarity (33%). For the geographic analysis, only 6% of the tweets carried geolocation; the Southeast Region posted the most tweets (49%), most of them also classified as negative (56%). The accuracy obtained (64%) points to the potential of these techniques for Twitter analysis in Portuguese. The evident relationship between tweets and events or statements reported by responsible bodies and other media confirms the importance that risk communication can have for a community, and shows how an analysis similar to the one carried out in this study could contribute to decision-making and to evaluating the impact of a communication.
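    The abstract does not describe the collection or classification pipeline in detail, so the sketch below is only an illustration of two of the operations it mentions: counting word frequencies (the basis for the monthly word clouds) and assigning a crude lexicon-based polarity to each tweet. The lexicons, stop-word list, and column names are assumptions, not the dissertation's method.

```python
# Illustrative sketch (assumed lexicons): word frequencies and a crude
# polarity label for a handful of Portuguese tweets.
import re
from collections import Counter
import pandas as pd

# Tiny illustrative polarity lexicons; the study's actual classifier is not specified.
POSITIVE = {"vacina", "esperança", "cura", "recuperação"}
NEGATIVE = {"mortes", "morte", "medo", "crise", "perda"}
STOPWORDS = {"de", "a", "o", "que", "e", "do", "da", "em", "um", "para", "com", "não"}

def tokenize(text: str) -> list:
    tokens = re.findall(r"[a-záâãàéêíóôõúç]+", text.lower())
    return [t for t in tokens if t not in STOPWORDS]

def polarity(tokens: list) -> str:
    score = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
    return "positive" if score > 0 else "negative" if score < 0 else "neutral"

# Assumed input: one tweet per row in a 'text' column.
tweets = pd.DataFrame({"text": [
    "Vacina chegando, esperança para todos",
    "Mais mortes registradas hoje, muito medo da crise",
]})
tokens = tweets["text"].apply(tokenize)
print(Counter(t for toks in tokens for t in toks).most_common(5))  # word frequencies
print(tokens.apply(polarity).tolist())                             # ['positive', 'negative']
```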