1,034 research outputs found

    Outlier Detection using Hybrid Approach for Mixed datasets

    Get PDF
    Several approaches of outlier detection are employed in many study areas amongst which distance based and density based outlier detection techniques have gathered most attention of researchers .So we are using hybrid of these two methods. The proposed model uses hybrid of distance and density outlier detection methods and weighted squeezer method for clustering. Advantages of both outlier methods will be combined giving higher result. The clustering algorithm which does not require to specify number of clusters as input which is drawback of many clustering algorithms. Most of the models deals with only single datatype datasets. Here the project deals with mixed datatype datasets. Here we will compare hybrid system with single method system. From performance measures it will be cleared how hybrid system gives better results as compared to single method. DOI: 10.17762/ijritcc2321-8169.150610

    Factor Analysis Of Ordinal Data Based On Weighted Ranking And Its Application To Reduce Perception Variables To Math Lessons Of Senior High School Student

    Get PDF
    The objectives of research are to reduce dimension of ordinal data using factor analysis based on weighted ranking correlation. In general, the correlation is used Spearman ranking correlation. This papers will discuss about application of both methods on the case of simplify of student’s perception variables on math lessons. The samples of this research are 791 students which consist of questionnaire. Factor analysis based on the weighted ranking correlation have given the results the number of factors and variables domination on factors more good than Spearman ranking correlation. The factors that influence perception of senior high school students towards Math lesson are an internal motivation of students, negative assessment of the Math lesson, teaching methods of Math teacher, habit of learning of students in Math and the support of parents, and knowledge of impact and benefits of Math. Key Words: factor analysis, weighted ranking, perception, mat

    Similarity for Binary Data Based on the Value of Entropy and Structure Patterns Categories

    Get PDF
    Similarity of two objects that have a form of binary data, usually calculated based on the frequencies in thecontingency table that includes all discrete random variables. In this article we will discuss the similaritymeasures for binary data based on entropy values and structural patterns of the two object categories. Measuringsimilarity based on the value of entropy and structural pattern of categories can be used as a validation measureof similarity for binary data

    Semantic Exploration of Text Documents with Multi-Faceted Metadata Employing Word Embeddings: The Patent Landscaping Use Case

    Get PDF
    Die Menge der Veröentlichungen, die den wissenschaftlichen Fortschritt dokumentieren, wächst kontinuierlich. Dies erfordert die Entwicklung der technologischen Hilfsmittel für eine eziente Analyse dieser Werke. Solche Dokumente kennzeichnen sich nicht nur durch ihren textuellen Inhalt, sondern auch durch eine Menge von Metadaten-Attributen verschiedenster Art, unter anderem Beziehungen zwischen den Dokumenten. Diese Komplexität macht die Entwicklung eines Visualisierungsansatzes, der eine Untersuchung der schriftlichen Werke unterstützt, zu einer notwendigen und anspruchsvollen Aufgabe. Patente sind beispielhaft für das beschriebene Problem, weil sie in großen Mengen von Firmen untersucht werden, die sich Wettbewerbsvorteile verschaffen oder eigene Forschung und Entwicklung steuern wollen. Vorgeschlagen wird ein Ansatz für eine explorative Visualisierung, der auf Metadaten und semantischen Embeddings von Patentinhalten basiert ist. Wortembeddings aus einem vortrainierten Word2vec-Modell werden genutzt, um Ähnlichkeiten zwischen Dokumenten zu bestimmen. Darüber hinaus helfen hierarchische Clusteringmethoden dabei, mehrere semantische Detaillierungsgrade durch extrahierte relevante Stichworte anzubieten. Derzeit dürfte der vorliegende Visualisierungsansatz der erste sein, der semantische Embeddings mit einem hierarchischen Clustering verbindet und dabei diverse Interaktionstypen basierend auf Metadaten-Attributen unterstützt. Der vorgestellte Ansatz nimmt Nutzerinteraktionstechniken wie Brushing and Linking, Focus plus Kontext, Details-on-Demand und Semantic Zoom in Anspruch. Dadurch wird ermöglicht, Zusammenhänge zu entdecken, die aus dem Zusammenspiel von 1) Verteilungen der Metadatenwerten und 2) Positionen im semantischen Raum entstehen. Das Visualisierungskonzept wurde durch Benutzerinterviews geprägt und durch eine Think-Aloud-Studie mit Patentenexperten evaluiert. Während der Evaluation wurde der vorgestellte Ansatz mit einem Baseline-Ansatz verglichen, der auf TF-IDF-Vektoren basiert. Die Benutzbarkeitsstudie ergab, dass die Visualisierungsmetaphern und die Interaktionstechniken angemessen gewählt wurden. Darüber hinaus zeigte sie, dass die Benutzerschnittstelle eine deutlich größere Rolle bei den Eindrücken der Probanden gespielt hat als die Art und Weise, wie die Patente platziert und geclustert waren. Tatsächlich haben beide Ansätze sehr ähnliche extrahierte Clusterstichworte ergeben. Dennoch wurden bei dem semantischen Ansatz die Cluster intuitiver platziert und deutlicher abgetrennt. Das vorgeschlagene Visualisierungslayout sowie die Interaktionstechniken und semantischen Methoden können auch auf andere Arten von schriftlichen Werken erweitert werden, z. B. auf wissenschaftliche Publikationen. Andere Embeddingmethoden wie Paragraph2vec [61] oder BERT [32] können zudem verwendet werden, um kontextuelle Abhängigkeiten im Text über die Wortebene hinaus auszunutzen

    Automatization of incident resolution

    Get PDF
    Incident management is a key IT Service Management sub process in every organization as a way to deal with the current volume of tickets created every year. Currently, the resolution process is still extremely human labor intensive. A large number of incidents are not from a new, never seen before problem, they have already been solved in the past and their respective resolution have been previously stored in an Incident Ticket System. Automation of repeatable tasks in IT is an important element of service management and can have a considerable impact in an organization. Using a large real-world database of incident tickets, this dissertation explores a method to automatically propose a suitable resolution for a new ticket using previous tickets’ resolution texts. At its core, the method uses machine learning, natural language parsing, information retrieval and mining. The proposed method explores machine learning models like SVM, Logistic Regression, some neural networks architecture and more, to predict an incident resolution category for a new ticket and a module to automatically retrieve resolution action phrases from tickets using part-of-speech pattern matching. In the experiments performed, 31% to 41% of the tickets from a test set was considered as solved by the proposed method, which considering the yearly volume of tickets represents a significant amount of manpower and resources that could be saved.A Gestão de incidentes é um subprocesso chave da Gestão de Serviços de TI em todas as organizações como uma forma de lidar com o volume atual de tickets criados todos os anos. Atualmente, o processo de resolução ainda exige muito trabalho humano. Um grande número de incidentes não são de um problema novo, nunca visto antes, eles já foram resolvidos no passado e sua respetiva resolução foi previamente armazenada em um Sistema de Ticket de Incidentes. A automação de tarefas repetíveis em TI é um elemento importante do Gestão de Serviços e pode ter um impacto considerável em uma organização. Usando um grande conjunto de dados reais de tickets de incidentes, esta dissertação explora um método para propor automaticamente uma resolução adequada para um novo ticket usando textos de resolução de tickets anteriores. Em sua essência, o método usa aprendizado de máquina, análise de linguagem natural, recuperação de informações e mineração. O método proposto explora modelos de aprendizagem automática como SVM, Regressão Logística, arquitetura de algumas redes neurais e mais, para prever uma categoria de resolução de incidentes para um novo ticket e um módulo para extrair automaticamente ações de resolução de tickets usando padrões de classes gramaticais. Nas experiências realizados, 31% a 41% dos tickets de um conjunto de testes foram considerados como resolvidos pelo método proposto, que considerando o volume anual de tickets representa uma quantidade significativa de mão de obra e recursos que poderiam ser economizados

    AAAI Workshop on Artificial Intelligence with Biased or Scarce Data (AIBSD)

    Get PDF
    This book is a collection of the accepted papers presented at the Workshop on Artificial Intelligence with Biased or Scarce Data (AIBSD) in conjunction with the 36th AAAI Conference on Artificial Intelligence 2022. During AIBSD 2022, the attendees addressed the existing issues of data bias and scarcity in Artificial Intelligence and discussed potential solutions in real-world scenarios. A set of papers presented at AIBSD 2022 is selected for further publication and included in this book

    Dynamic segmentation techniques applied to load profiles of electric energy consumption from domestic users

    Full text link
    [EN] The electricity sector is currently undergoing a process of liberalization and separation of roles, which is being implemented under the regulatory auspices of each Member State of the European Union and, therefore, with different speeds, perspectives and objectives that must converge on a common horizon, where Europe will benefit from an interconnected energy market in which producers and consumers can participate in free competition. This process of liberalization and separation of roles involves two consequences or, viewed another way, entails a major consequence from which other immediate consequence, as a necessity, is derived. The main consequence is the increased complexity in the management and supervision of a system, the electrical, increasingly interconnected and participatory, with connection of distributed energy sources, much of them from renewable sources, at different voltage levels and with different generation capacity at any point in the network. From this situation the other consequence is derived, which is the need to communicate information between agents, reliably, safely and quickly, and that this information is analyzed in the most effective way possible, to form part of the processes of decision taking that improve the observability and controllability of a system which is increasing in complexity and number of agents involved. With the evolution of Information and Communication Technologies (ICT), and the investments both in improving existing measurement and communications infrastructure, and taking the measurement and actuation capacity to a greater number of points in medium and low voltage networks, the availability of data that informs of the state of the network is increasingly higher and more complete. All these systems are part of the so-called Smart Grids, or intelligent networks of the future, a future which is not so far. One such source of information comes from the energy consumption of customers, measured on a regular basis (every hour, half hour or quarter-hour) and sent to the Distribution System Operators from the Smart Meters making use of Advanced Metering Infrastructure (AMI). This way, there is an increasingly amount of information on the energy consumption of customers, being stored in Big Data systems. This growing source of information demands specialized techniques which can take benefit from it, extracting a useful and summarized knowledge from it. This thesis deals with the use of this information of energy consumption from Smart Meters, in particular on the application of data mining techniques to obtain temporal patterns that characterize the users of electrical energy, grouping them according to these patterns in a small number of groups or clusters, that allow evaluating how users consume energy, both during the day and during a sequence of days, allowing to assess trends and predict future scenarios. For this, the current techniques are studied and, proving that the current works do not cover this objective, clustering or dynamic segmentation techniques applied to load profiles of electric energy consumption from domestic users are developed. These techniques are tested and validated on a database of hourly energy consumption values for a sample of residential customers in Spain during years 2008 and 2009. The results allow to observe both the characterization in consumption patterns of the different types of residential energy consumers, and their evolution over time, and to assess, for example, how the regulatory changes that occurred in Spain in the electricity sector during those years influenced in the temporal patterns of energy consumption.[ES] El sector eléctrico se halla actualmente sometido a un proceso de liberalización y separación de roles, que está siendo aplicado bajo los auspicios regulatorios de cada Estado Miembro de la Unión Europea y, por tanto, con distintas velocidades, perspectivas y objetivos que deben confluir en un horizonte común, en donde Europa se beneficiará de un mercado energético interconectado, en el cual productores y consumidores podrán participar en libre competencia. Este proceso de liberalización y separación de roles conlleva dos consecuencias o, visto de otra manera, conlleva una consecuencia principal de la cual se deriva, como necesidad, otra consecuencia inmediata. La consecuencia principal es el aumento de la complejidad en la gestión y supervisión de un sistema, el eléctrico, cada vez más interconectado y participativo, con conexión de fuentes distribuidas de energía, muchas de ellas de origen renovable, a distintos niveles de tensión y con distinta capacidad de generación, en cualquier punto de la red. De esta situación se deriva la otra consecuencia, que es la necesidad de comunicar información entre los distintos agentes, de forma fiable, segura y rápida, y que esta información sea analizada de la forma más eficaz posible, para que forme parte de los procesos de toma de decisiones que mejoran la observabilidad y controlabilidad de un sistema cada vez más complejo y con más agentes involucrados. Con el avance de las Tecnologías de Información y Comunicaciones (TIC), y las inversiones tanto en mejora de la infraestructura existente de medida y comunicaciones, como en llevar la obtención de medidas y la capacidad de actuación a un mayor número de puntos en redes de media y baja tensión, la disponibilidad de datos sobre el estado de la red es cada vez mayor y más completa. Todos estos sistemas forman parte de las llamadas Smart Grids, o redes inteligentes del futuro, un futuro ya no tan lejano. Una de estas fuentes de información proviene de los consumos energéticos de los clientes, medidos de forma periódica (cada hora, media hora o cuarto de hora) y enviados hacia las Distribuidoras desde los contadores inteligentes o Smart Meters, mediante infraestructura avanzada de medida o Advanced Metering Infrastructure (AMI). De esta forma, cada vez se tiene una mayor cantidad de información sobre los consumos energéticos de los clientes, almacenada en sistemas de Big Data. Esta cada vez mayor fuente de información demanda técnicas especializadas que sepan aprovecharla, extrayendo un conocimiento útil y resumido de la misma. La presente Tesis doctoral versa sobre el uso de esta información de consumos energéticos de los contadores inteligentes, en concreto sobre la aplicación de técnicas de minería de datos (data mining) para obtener patrones temporales que caractericen a los usuarios de energía eléctrica, agrupándolos según estos mismos patrones en un número reducido de grupos o clusters, que permiten evaluar la forma en que los usuarios consumen la energía, tanto a lo largo del día como durante una secuencia de días, permitiendo evaluar tendencias y predecir escenarios futuros. Para ello se estudian las técnicas actuales y, comprobando que los trabajos actuales no cubren este objetivo, se desarrollan técnicas de clustering o segmentación dinámica aplicadas a curvas de carga de consumo eléctrico diario de clientes domésticos. Estas técnicas se prueban y validan sobre una base de datos de consumos energéticos horarios de una muestra de clientes residenciales en España durante los años 2008 y 2009. Los resultados permiten observar tanto la caracterización en consumos de los distintos tipos de consumidores energéticos residenciales, como su evolución en el tiempo, y permiten evaluar, por ejemplo, cómo influenciaron en los patrones temporales de consumos los cambios regulatorios que se produjeron en España en el sector eléctrico durante esos años.[CA] El sector elèctric es troba actualment sotmès a un procés de liberalització i separació de rols, que s'està aplicant davall els auspicis reguladors de cada estat membre de la Unió Europea i, per tant, amb distintes velocitats, perspectives i objectius que han de confluir en un horitzó comú, on Europa es beneficiarà d'un mercat energètic interconnectat, en el qual productors i consumidors podran participar en lliure competència. Aquest procés de liberalització i separació de rols comporta dues conseqüències o, vist d'una altra manera, comporta una conseqüència principal de la qual es deriva, com a necessitat, una altra conseqüència immediata. La conseqüència principal és l'augment de la complexitat en la gestió i supervisió d'un sistema, l'elèctric, cada vegada més interconnectat i participatiu, amb connexió de fonts distribuïdes d'energia, moltes d'aquestes d'origen renovable, a distints nivells de tensió i amb distinta capacitat de generació, en qualsevol punt de la xarxa. D'aquesta situació es deriva l'altra conseqüència, que és la necessitat de comunicar informació entre els distints agents, de forma fiable, segura i ràpida, i que aquesta informació siga analitzada de la manera més eficaç possible, perquè forme part dels processos de presa de decisions que milloren l'observabilitat i controlabilitat d'un sistema cada vegada més complex i amb més agents involucrats. Amb l'avanç de les tecnologies de la informació i les comunicacions (TIC), i les inversions, tant en la millora de la infraestructura existent de mesura i comunicacions, com en el trasllat de l'obtenció de mesures i capacitat d'actuació a un nombre més gran de punts en xarxes de mitjana i baixa tensió, la disponibilitat de dades sobre l'estat de la xarxa és cada vegada major i més completa. Tots aquests sistemes formen part de les denominades Smart Grids o xarxes intel·ligents del futur, un futur ja no tan llunyà. Una d'aquestes fonts d'informació prové dels consums energètics dels clients, mesurats de forma periòdica (cada hora, mitja hora o quart d'hora) i enviats cap a les distribuïdores des dels comptadors intel·ligents o Smart Meters, per mitjà d'infraestructura avançada de mesura o Advanced Metering Infrastructure (AMI). D'aquesta manera, cada vegada es té una major quantitat d'informació sobre els consums energètics dels clients, emmagatzemada en sistemes de Big Data. Aquesta cada vegada major font d'informació demanda tècniques especialitzades que sàpiguen aprofitar-la, extraient-ne un coneixement útil i resumit. La present tesi doctoral versa sobre l'ús d'aquesta informació de consums energètics dels comptadors intel·ligents, en concret sobre l'aplicació de tècniques de mineria de dades (data mining) per a obtenir patrons temporals que caracteritzen els usuaris d'energia elèctrica, agrupant-los segons aquests mateixos patrons en una quantitat reduïda de grups o clusters, que permeten avaluar la forma en què els usuaris consumeixen l'energia, tant al llarg del dia com durant una seqüència de dies, i que permetent avaluar tendències i predir escenaris futurs. Amb aquesta finalitat, s'estudien les tècniques actuals i, en comprovar que els treballs actuals no cobreixen aquest objectiu, es desenvolupen tècniques de clustering o segmentació dinàmica aplicades a corbes de càrrega de consum elèctric diari de clients domèstics. Aquestes tècniques es proven i validen sobre una base de dades de consums energètics horaris d'una mostra de clients residencials a Espanya durant els anys 2008 i 2009. Els resultats permeten observar tant la caracterització en consums dels distints tipus de consumidors energètics residencials, com la seua evolució en el temps, i permeten avaluar, per exemple, com van influenciar en els patrons temporals de consums els canvis reguladors que es van produir a Espanya en el sector elèctric durant aquests anys.Benítez Sánchez, IJ. (2015). Dynamic segmentation techniques applied to load profiles of electric energy consumption from domestic users [Tesis doctoral no publicada]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/59236TESI

    Identifying Optimal Technical and Tactical Performance Characteristics in Australian Football

    Full text link
    This study identified the optimal technical and tactical performance characteristics of Australian football teams. The application of machine learning approaches identified the key indicators of successful AFL teams. The main findings of this research provide an evidence-base for key stakeholders to inform their training and match day decisions

    Geodemographics: creating a classification at the level of the individual

    Get PDF
    This research challenges the existing geodemographics ethos by investigating the benefit to be gained from a move away from conventional areal unit categorisation to systems capable of classifying at the individual level. This research will present a unique framework through which classifications can be developed at this level of resolution. Inherently methodological, a local classification for Leeds (UK) will be presented plus further examples of this applied framework. Issues such as ecological fallacy, Modifiable Areal Unit Problem and generalisation are aspects to be considered when interpreting spatially aggregated data. A move away from such problems is one of the central objectives of this research. Data variables from the UK’s 2001 Small Area Microdata file underpin this research. These variables undergo transformation from categorical states into scale variables based on gross monthly income data present in the British Household Panel Survey therefore enabling effective clustering. Micro-simulation is then employed to create an individual-level population. The framework presented comprises entirely census variables but also demonstrates a linkage capability to other non-census datasets, such as the British Household Panel Survey (now Understanding Society), for deeper profiling, classification validation and enrichment
    corecore