Combining Labelled and Unlabelled Data in the Design of Pattern Classification Systems
There has been much interest in applying techniques that incorporate knowledge from unlabelled data into a supervised learning system, but less effort has been made to compare the effectiveness of different approaches on real-world problems and to analyse the behaviour of the learning system when using different amounts of unlabelled data. In this paper we present an analysis of the performance of supervised methods reinforced by unlabelled data and of some semi-supervised approaches using different ratios of labelled to unlabelled samples. The experimental results show that, when supported by unlabelled samples, much less labelled data is generally required to build a classifier without compromising classification performance. If only a very limited amount of labelled data is available, the results show high variability, and the performance of the final classifier depends more on how reliable the labelled data samples are than on the use of additional unlabelled data. Semi-supervised clustering utilising both labelled and unlabelled data has been shown to offer the most significant improvements when natural clusters are present in the considered problem.
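The kind of semi-supervised scheme the abstract compares can be illustrated with a minimal self-training sketch (all data points, the nearest-centroid classifier, and the confidence margin below are invented for illustration, not taken from the paper): a classifier trained on the labelled samples pseudo-labels unlabelled points it is confident about, and leaves ambiguous points alone.

```python
# Hypothetical self-training sketch: a nearest-centroid classifier that
# pseudo-labels confident unlabelled points and retrains on them.

def centroid(points):
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(len(points[0])))

def dist2(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def self_train(labelled, unlabelled, rounds=5, margin=2.0):
    """labelled: dict class -> list of points; unlabelled: list of points."""
    pool = list(unlabelled)
    for _ in range(rounds):
        cents = {c: centroid(pts) for c, pts in labelled.items()}
        still = []
        for p in pool:
            d = sorted((dist2(p, cen), c) for c, cen in cents.items())
            # accept a pseudo-label only when the best class is clearly closer
            if len(d) > 1 and d[1][0] > margin * d[0][0]:
                labelled[d[0][1]].append(p)
            else:
                still.append(p)
        if len(still) == len(pool):
            break   # nothing new was labelled; stop
        pool = still
    return labelled, pool

labelled = {"a": [(0.0, 0.0), (0.2, 0.1)], "b": [(3.0, 3.0), (3.2, 2.9)]}
unlabelled = [(0.1, 0.3), (2.8, 3.1), (1.5, 1.5)]
labelled, leftover = self_train(labelled, unlabelled)
```

The point mirrors the abstract's finding: confidently clustered unlabelled points augment the labelled set cheaply, while ambiguous points (here the one midway between the clusters) are left unlabelled rather than risked.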
Automation and orchestration of hardware and firmware data mining using a smart data analytics platform
Effective data mining is going to be important for differentiating and succeeding in the digital economy, especially with increased commoditization and a reduced barrier to entry for infrastructure devices such as servers, storage and networking systems. There is a lot of telemetry data from manufacturing facilities and customers that can be used to drive an improved supportability experience and unmatched product quality and reliability of infrastructure devices such as servers and storage devices. Currently, data mining of hardware, firmware and platform logs is a challenging task, as the domain knowledge is complex and the expertise of a large multinational organization is distributed across the world. With increasing complexity, and with data mining remaining a very time-consuming task that requires math/statistics skills, diverse programming and machine learning skills, and cross-domain knowledge, it is important to look at a next-generation analytics solution tailored to infrastructure vendors to improve supportability, quality, reliability, performance and security. In this publication we propose a smart, automated and generic data analytics platform that enables a 24/7 data mining solution using a built-in platform domain modeler, an expert system for analyzing hardware and firmware logs, and a policy manager that allows user-defined hypotheses to be verified round the clock based on policies and configurable triggers. This smart data analytics platform will help democratize data mining of hardware and firmware logs, help troubleshoot complex issues, improve the supportability experience, reliability and quality, and reduce warranty costs.
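The policy-manager idea can be sketched as follows; the class names, record fields, and thresholds are invented for illustration and are not the platform's actual API. A user-defined hypothesis is a predicate over log records, and a trigger fires whenever an ingested record satisfies it.

```python
# Illustrative sketch of a policy manager that checks user-defined
# hypotheses against incoming hardware/firmware log records.

class Policy:
    def __init__(self, name, predicate, action):
        self.name, self.predicate, self.action = name, predicate, action

class PolicyManager:
    def __init__(self):
        self.policies = []
        self.fired = []          # audit trail of (policy name, record)

    def register(self, policy):
        self.policies.append(policy)

    def ingest(self, record):
        # evaluate every registered hypothesis against each new log record
        for p in self.policies:
            if p.predicate(record):
                self.fired.append((p.name, record))
                p.action(record)

alerts = []
pm = PolicyManager()
pm.register(Policy(
    "disk-temp-hypothesis",
    predicate=lambda r: r.get("component") == "disk" and r.get("temp_c", 0) > 60,
    action=lambda r: alerts.append(f"check {r['serial']}"),
))
pm.ingest({"component": "disk", "serial": "X1", "temp_c": 71})
pm.ingest({"component": "fan", "serial": "F9", "rpm": 0})
```

Because policies are plain data registered at runtime, new hypotheses can be added and verified round the clock without redeploying the ingestion pipeline, which is the property the abstract emphasizes.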
HUMAN RESOURCE INFORMATION SYSTEMS: IMPLEMENTING DATA ANALYTICS TECHNIQUES IN HUMAN RESOURCE FUNCTIONS
This project investigated analytical techniques used by organizations across many industries for HR functions and how data analytical techniques can help HR departments work efficiently. The goal was to look into the obstacles and opportunities that companies face when using HR analytics as a tool in their businesses. This project used secondary data acquired from earlier research articles, journals from the years 2016 to 2019, blogs, and websites to investigate theories and applications of HR analytics. It also examined the need for analytics to assist HR leaders in thinking about the implications of these technologies for future work and how data analytics will shape the HR system in the future. The project examined the implementation of different AI and data analytics techniques - predictive, prescriptive, descriptive and diagnostic analytics - that are already in use by organizations to maximize efficiency in the HR department. The project also uncovered the following important issues faced by organizations - a lack of technical skills and the cost required for training - and provided possible solutions using data analytical techniques.
Building nonlinear data models with self-organizing maps
We study the extraction of nonlinear data models in high-dimensional spaces with modified self-organizing maps. Our algorithm maps a lower-dimensional lattice into a high-dimensional space without topology violations by tuning the neighborhood widths locally. The approach is based on a new principle exploiting the specific dynamical properties of the first-order phase transition induced by the noise of the data. The performance of the algorithm is demonstrated for one- and two-dimensional principal manifolds and for sparse data sets.
Effective Knowledge Representation Through Data Modelling Approaches
Data modelling can be seen as knowledge representation in the sense that the two share the same philosophical assumptions. In the data modelling process, recognising the philosophical background of human inquiry and the nature of the knowledge pertinent to appreciating the problems is important, as different ontological views lead to different conceptions of data models. Recognising and incorporating different forms of organisational knowledge are also important in the data modelling process, since a data model is a formal representation of some subset of the knowledge that the organisation needs to carry out its business. This paper discusses two distinct philosophical foundations for the effective representation of organisational knowledge.
Designing a customer data model and defining customer master data in a Finnish SaaS company
In this study a logical customer data model is designed and the customer master data in the data model is defined for a case company. During the process of defining customer data and the customer itself, a business glossary of the customer is compiled to provide clear definitions of a customer and to unify the vocabulary across the case company. Defining the important vocabulary provides the basis for defining the customer data and customer master data. In addition, quality aspects are studied for ensuring high-quality customer data in the future.

This study aims to understand what customer data in the case company is and to model it as a logical data model to unify siloed operations, systems, and data. The case company is a Finnish Software as a Service company. It is in the middle of a merging process due to recent company acquisitions. The case company wants to have common customer data and customer master data, but it does not yet have master data defined. It is important to identify which data is critical to the business, so that the case company can have one truth and development activities can be targeted in the right direction to gain the most advantage.

The research method of this study is design research. The empirical part of this study is carried out in two rounds of workshops. The first round analyses the current situation based on the processes and the different functions in the company that work with customer data. The outcome of the first workshop round is the customer terminology with its definitions and the customer data model. The second round concentrates on iterative development of the terminology and customer data model, and on further identifying the development needs, restrictions, and possibilities of having a common customer data model and master data. After the workshops, the terminology and data model are developed further with internal experts. Lastly, there is a review event where the participants can see and comment on the designed customer data model and the identified customer master data.
After this study, the case company has a clear definition of what a customer is and how it should be modeled in a logical data model in the future, so as to have one common customer data structure that unifies the case company. The case company has also defined its customer master data: the most important, necessary, common customer data. The next step after this study is to plan the implementation of the customer data designs of this study, taking into account the quality principles that were defined in this study to support the sustainability of the designs.
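The separation the thesis draws between master data and system-local customer data can be sketched in code; the attributes below are invented for illustration, not the case company's actual model.

```python
# Illustrative logical customer model: the master-data subset (shared,
# business-critical attributes) is separated from system-local data.
from dataclasses import dataclass, field

@dataclass
class CustomerMasterData:
    # the "one truth" shared across all merged systems
    customer_id: str          # globally unique identifier
    legal_name: str
    business_id: str          # official company registration number
    country: str

@dataclass
class Customer:
    master: CustomerMasterData
    # system-local attributes may vary per source system without
    # affecting the shared master record
    segment: str = "unclassified"
    contacts: list = field(default_factory=list)

acme = Customer(
    master=CustomerMasterData("C-001", "Acme Oy", "1234567-8", "FI"),
    segment="SMB",
)
acme.contacts.append("billing@acme.example")
```

Keeping the master record as its own type makes the critical-data boundary explicit: merging systems must agree on `CustomerMasterData`, while each system remains free to extend `Customer` locally.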
Synchronic Curation for Assessing Reuse and Integration Fitness of Multiple Data Collections
Data-driven applications often require using data integrated from different, large, and continuously updated collections. Each of these collections may present gaps, contain overlapping or conflicting information, or complement the others. Thus, a curation need is to continuously assess whether data from multiple collections are fit for integration and reuse. To assess different large data collections at the same time, we present the Synchronic Curation (SC) framework. SC involves processing steps that map the different collections to a unifying data model representing research problems in a scientific area. The data model, which includes the collections' provenance and a data dictionary, is implemented in a graph database where collections are continuously ingested and can be queried. SC has a collection analysis and comparison module to track updates and to identify gaps, changes, and irregularities within and across collections. Assessment results can be accessed through a web-based interactive graph. In this paper we introduce SC as an interdisciplinary enterprise and illustrate its capabilities through its implementation in ASTRIAGraph, a space sustainability knowledge system.
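The core SC idea can be sketched in a few lines; the data structures, record fields, and catalog names below are invented for illustration (SC itself uses a graph database, which this dict-based sketch stands in for): map records from each collection onto one unifying model keyed by entity, keep provenance per value, and run a comparison pass that flags conflicts and gaps across collections.

```python
# Schematic sketch of synchronic assessment across two collections.

def ingest(store, collection, records):
    for rec in records:
        entity = store.setdefault(rec["id"], {})
        for attr, value in rec.items():
            if attr == "id":
                continue
            # keep every (value, provenance) pair for later assessment
            entity.setdefault(attr, []).append((value, collection))

def assess(store, collections):
    issues = []
    for eid, attrs in store.items():
        for attr, vals in attrs.items():
            if len({v for v, _ in vals}) > 1:
                issues.append(("conflict", eid, attr))      # collections disagree
            missing = set(collections) - {src for _, src in vals}
            for src in sorted(missing):
                issues.append(("gap", eid, attr, src))      # attribute absent there
    return issues

store = {}
ingest(store, "catalog_A", [{"id": "sat-1", "inclination": 53.0}])
ingest(store, "catalog_B", [{"id": "sat-1", "inclination": 53.2, "owner": "X"}])
issues = assess(store, ["catalog_A", "catalog_B"])
```

Because ingestion only appends value/provenance pairs, the assessment can be rerun continuously as collections update, which is the "synchronic" property the framework is named for.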
A Resource-Aware and Time-Critical IoT Framework
Internet of Things (IoT) systems produce great amounts of data but usually have insufficient resources to process them at the edge. Several time-critical IoT scenarios have emerged, creating the challenge of supporting low-latency applications. At the same time, cloud computing has become a success in delivering computing as a service at an affordable price with great scalability and high reliability. We propose an intelligent resource allocation system that optimally selects the important IoT data streams to transfer to the cloud for processing. The optimization runs on utility functions computed by predictor algorithms that forecast future events with some probabilistic confidence based on a dynamically recalculated data model. We investigate ways of reducing specifically the upload bandwidth of IoT video streams and propose techniques to compute the corresponding utility functions. We built a prototype for a smart squash court and simulated multiple courts to measure the efficiency of dynamic allocation of network and cloud resources for event detection during squash games. By continuously adapting to the observed system state and maximizing the expected quality of detection within the resource constraints, our system can save up to 70% of the resources compared to the naive solution.
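The allocation step can be illustrated with a simplified sketch; the utilities, bandwidths, and stream names are invented numbers, and a greedy utility-per-bandwidth heuristic stands in for whatever optimizer the paper actually uses.

```python
# Illustrative stream selection: maximize expected detection utility
# within an upload bandwidth budget, greedily by utility density.

def allocate(streams, bandwidth_budget):
    """streams: list of (name, expected_utility, bandwidth).
    Returns the names of the streams chosen for upload."""
    chosen, used = [], 0.0
    # rank by utility per unit of bandwidth, take while the budget allows
    for name, util, bw in sorted(streams, key=lambda s: s[1] / s[2], reverse=True):
        if used + bw <= bandwidth_budget:
            chosen.append(name)
            used += bw
    return chosen

streams = [
    ("court1_video", 0.9, 5.0),   # high-utility but expensive video feed
    ("court2_video", 0.4, 5.0),
    ("court1_audio", 0.3, 1.0),   # cheap stream with high utility density
]
picked = allocate(streams, bandwidth_budget=6.0)
```

In the paper's setting, the utilities would be recomputed continuously by the event predictors, so the selection adapts as games progress; the greedy rule here is only a stand-in for that dynamic optimization.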
Modeling Black Literature: Behind the Screen with the Black Bibliography Project
The Black Bibliography Project (BBP) plans to produce a bibliographic database of printed works by Black writers from the eighteenth to the twenty-first centuries. With the support of the Beinecke Library and a grant from the Mellon Foundation, project co-PIs and codirectors Jacqueline Goldsby and Meredith McGill collaborated with a team of librarians from Yale to develop the data model for their database. Drawing on Beinecke's James Weldon Johnson Memorial Collection for case studies, the team of librarians developed a Linked Data model for BBP in an instance of Wikibase and trained and supported a group of graduate student bibliographers in a pilot phase of data entry. This essay details our collaboration with the BBP codirectors and other contributing faculty, as well as our training of the graduate student bibliographers. It also explores how a project conceived as a scholarly intervention additionally became an intervention in the historic inequalities and gaps in cataloging description and access.