11 research outputs found

    SMOTE for Learning from Imbalanced Data: Progress and Challenges, Marking the 15-year Anniversary

    Get PDF
    The Synthetic Minority Oversampling Technique (SMOTE) preprocessing algorithm is considered \de facto" standard in the framework of learning from imbalanced data. This is due to its simplicity in the design of the procedure, as well as its robustness when applied to di erent type of problems. Since its publication in 2002, SMOTE has proven successful in a variety of applications from several di erent domains. SMOTE has also inspired several approaches to counter the issue of class imbalance, and has also signi cantly contributed to new supervised learning paradigms, including multilabel classi cation, incremental learning, semi-supervised learning, multi-instance learning, among others. It is standard benchmark for learning from imbalanced data. It is also featured in a number of di erent software packages | from open source to commercial. In this paper, marking the fteen year anniversary of SMOTE, we re ect on the SMOTE journey, discuss the current state of a airs with SMOTE, its applications, and also identify the next set of challenges to extend SMOTE for Big Data problems.This work have been partially supported by the Spanish Ministry of Science and Technology under projects TIN2014-57251-P, TIN2015-68454-R and TIN2017-89517-P; the Project 887 BigDaP-TOOLS - Ayudas Fundaci on BBVA a Equipos de Investigaci on Cient ca 2016; and the National Science Foundation (NSF) Grant IIS-1447795

    Applied Metaheuristic Computing

    Get PDF
    For decades, Applied Metaheuristic Computing (AMC) has been a prevailing optimization technique for tackling perplexing engineering and business problems, such as scheduling, routing, ordering, bin packing, assignment, facility layout planning, among others. This is partly because the classic exact methods are constrained with prior assumptions, and partly due to the heuristics being problem-dependent and lacking generalization. AMC, on the contrary, guides the course of low-level heuristics to search beyond the local optimality, which impairs the capability of traditional computation methods. This topic series has collected quality papers proposing cutting-edge methodology and innovative applications which drive the advances of AMC

    Applied Methuerstic computing

    Get PDF
    For decades, Applied Metaheuristic Computing (AMC) has been a prevailing optimization technique for tackling perplexing engineering and business problems, such as scheduling, routing, ordering, bin packing, assignment, facility layout planning, among others. This is partly because the classic exact methods are constrained with prior assumptions, and partly due to the heuristics being problem-dependent and lacking generalization. AMC, on the contrary, guides the course of low-level heuristics to search beyond the local optimality, which impairs the capability of traditional computation methods. This topic series has collected quality papers proposing cutting-edge methodology and innovative applications which drive the advances of AMC

    Contributions to time series data mining towards the detection of outliers/anomalies

    Get PDF
    148 p.Los recientes avances tecnológicos han supuesto un gran progreso en la recogida de datos, permitiendo recopilar una gran cantidad de datos a lo largo del tiempo. Estos datos se presentan comúnmente en forma de series temporales, donde las observaciones se han registrado de forma cronológica y están correlacionadas en el tiempo. A menudo, estas dependencias temporales contienen información significativa y útil, por lo que, en los últimos años, ha surgido un gran interés por extraer dicha información. En particular, el área de investigación que se centra en esta tarea se denomina minería de datos de series temporales.La comunidad de investigadores de esta área se ha dedicado a resolver diferentes tareas como por ejemplo la clasificación, la predicción, el clustering o agrupamiento y la detección de valores atípicos/anomalías. Los valores atípicos o anomalías son aquellas observaciones que no siguen el comportamiento esperado en una serie temporal. Estos valores atípicos o anómalos suelen representar mediciones no deseadas o eventos de interés, y, por lo tanto, detectarlos suele ser relevante ya que pueden empeorar la calidad de los datos o reflejar fenómenos interesantes para el analista.Esta tesis presenta varias contribuciones en el campo de la minería de datos de series temporales, más específicamente sobre la detección de valores atípicos o anomalías. Estas contribuciones se pueden dividir en dos partes o bloques. Por una parte, la tesis presenta contribuciones en el campo de la detección de valores atípicos o anomalías en series temporales. Para ello, se ofrece una revisión de las técnicas en la literatura, y se presenta una nueva técnica de detección de anomalías en series temporales univariantes para la detección de fugas de agua, basada en el aprendizaje autosupervisado. Por otra parte, la tesis también introduce contribuciones relacionadas con el tratamiento de las series temporales con valores perdidos y demuestra su aplicabilidad en el campo de la detección de anomalías

    The Proceedings of the 23rd Annual International Conference on Digital Government Research (DGO2022) Intelligent Technologies, Governments and Citizens June 15-17, 2022

    Get PDF
    The 23rd Annual International Conference on Digital Government Research theme is “Intelligent Technologies, Governments and Citizens”. Data and computational algorithms make systems smarter, but should result in smarter government and citizens. Intelligence and smartness affect all kinds of public values - such as fairness, inclusion, equity, transparency, privacy, security, trust, etc., and is not well-understood. These technologies provide immense opportunities and should be used in the light of public values. Society and technology co-evolve and we are looking for new ways to balance between them. Specifically, the conference aims to advance research and practice in this field. The keynotes, presentations, posters and workshops show that the conference theme is very well-chosen and more actual than ever. The challenges posed by new technology have underscored the need to grasp the potential. Digital government brings into focus the realization of public values to improve our society at all levels of government. The conference again shows the importance of the digital government society, which brings together scholars in this field. Dg.o 2022 is fully online and enables to connect to scholars and practitioners around the globe and facilitate global conversations and exchanges via the use of digital technologies. This conference is primarily a live conference for full engagement, keynotes, presentations of research papers, workshops, panels and posters and provides engaging exchange throughout the entire duration of the conference

    Systems between information and knowledge : In a memory management model of an extended enterprise

    Get PDF
    The research question of this thesis was how knowledge can be managed with information systems. Information systems can support but not replace knowledge management. Systems can mainly store epistemic organisational knowledge included in content, and process data and information. Certain value can be achieved by adding communication technology to systems. All communication, however, can not be managed. A new layer between communication and manageable information was named as knowformation. Knowledge management literature was surveyed, together with information species from philosophy, physics, communication theory, and information system science. Positivism, post-positivism, and critical theory were studied, but knowformation in extended organisational memory seemed to be socially constructed. A memory management model of an extended enterprise (M3.exe) and knowformation concept were findings from iterative case studies, covering data, information and knowledge management systems. The cases varied from groups towards extended organisation. Systems were investigated, and administrators, users (knowledge workers) and managers interviewed. The model building required alternative sets of data, information and knowledge, instead of using the traditional pyramid. Also the explicit-tacit dichotomy was reconsidered. As human knowledge is the final aim of all data and information in the systems, the distinction between management of information vs. management of people was harmonised. Information systems were classified as the core of organisational memory. The content of the systems is in practice between communication and presentation. Firstly, the epistemic criterion of knowledge is not required neither in the knowledge management literature, nor from the content of the systems. Secondly, systems deal mostly with containers, and the knowledge management literature with applied knowledge. Also the construction of reality based on the system content and communication supports the knowformation concept. Knowformation belongs to memory management model of an extended enterprise (M3.exe) that is divided into horizontal and vertical key dimensions. Vertically, processes deal with content that can be managed, whereas communication can be supported, mainly by infrastructure. Horizontally, the right hand side of the model contains systems, and the left hand side content, which should be independent from each other. A strategy based on the model was defined.Tutkimuksen tavoitteena oli määrittää, miten tietojärjestelmiä voidaan käyttää organisaatioiden tietämyksen hallintaan. Johtopäätöksenä voidaan sanoa, että järjestelmillä voidaan tukea, mutta ei korvata tietojohtamista. Tietojärjestelmiä voidaan käyttää lähinnä organisaation episteemisen tiedon muistina, prosessoitavan tiedon varastointiin. Oleellista lisäarvoa saadaan, jos viestintäteknologiaa käytetään tietojärjestelmien tukena. Kommunikaatiota ei kuitenkaan voida johtaa, sillä se ei perustu prosesseihin, vaan enintään työnkulkuun ja sitä vapaampaan viestintään. Hallitun informaation ja viestinnän välille syntyy knowformaatioksi nimetty kerros, lähinnä organisaatioiden lyhytkestoiseen muistiin. Uusi knowformaatio-käsite on käytännön tapaustutkimusten tulos. Vastaavaa ei aiemmissa tietojohtamisen tutkimuksissa ole esitetty. Tietojohtamisen kirjallisuuden taustaksi tutkittiin fysiikan, filosofian, viestinnän ja tietojenkäsittelytieteen luokitukset. Tapaustutkimuksissa tarkasteltiin useita datan hallinnan, dokumentaation ja tietojohtamisen järjestelmiä organisaation sisäisissä ryhmissä, organisaation laajuisesti sekä organisaation yhteistyökumppaneiden kanssa. Tapauksissa tutkittiin niin järjestelmien ominaisuudet kuin myös eri sidosryhmien kokemukset. Tutkimuksessa tietojärjestelmät luokiteltiin organisaation muistin ytimeen. Knowformaation kerrosta tarvitaan toisaalta koska filosofisen tiedon episteemistä kriteeriä ei edellytetä järjestelmien sisällöltä (eikä tietojohtamisen kirjallisuuden käsitemäärittelyissä) ja toisaalta koska tiedon uudelleenkonstruoinnissa merkitys muuttuu. Tulevien järjestelmien suunnitteluun tarvitaan uusi näkökulma, koska data, informaatio ja knowledge tasojen hierarkia ei erotu eri järjestelmätyyppien käyttäjien sosiaalisesti konstruoidussa todellisuudessa. Tieteen filosofian skaala positivistisesta konstruktivistiseen oli mallin muodostuksessa oleellinen, ja sen validiuden todentamisen jälkeen eksplisiittinen piiloinen -dikotomia mallinettiin uudelleen knowformaatio-käsitteen avulla. Uusi tietomalli ja knowformaatio-käsite tarvitaan työn päätuloksessa, jatketun organisaation muistin hallintamallissa. Sen ääripäihin kuluvat kommunikaatio, jota tuetaan, ja toisessa päässä prosessit, joita hallitaan. Kahden muun entiteetin, järjestelmien ja niiden sisällön, tulisi olla riippumattomia toisistaan. Knowformaatio elää näiden kokonaisuuksien implisiittisillä rajoilla, informaation ja tiedon välisellä harmaalla alueella

    XX Workshop de Investigadores en Ciencias de la Computación - WICC 2018 : Libro de actas

    Get PDF
    Actas del XX Workshop de Investigadores en Ciencias de la Computación (WICC 2018), realizado en Facultad de Ciencias Exactas y Naturales y Agrimensura de la Universidad Nacional del Nordeste, los dìas 26 y 27 de abril de 2018.Red de Universidades con Carreras en Informática (RedUNCI

    XX Workshop de Investigadores en Ciencias de la Computación - WICC 2018 : Libro de actas

    Get PDF
    Actas del XX Workshop de Investigadores en Ciencias de la Computación (WICC 2018), realizado en Facultad de Ciencias Exactas y Naturales y Agrimensura de la Universidad Nacional del Nordeste, los dìas 26 y 27 de abril de 2018.Red de Universidades con Carreras en Informática (RedUNCI

    WICC 2017 : XIX Workshop de Investigadores en Ciencias de la Computación

    Get PDF
    Actas del XIX Workshop de Investigadores en Ciencias de la Computación (WICC 2017), realizado en el Instituto Tecnológico de Buenos Aires (ITBA), el 27 y 28 de abril de 2017.Red de Universidades con Carreras en Informática (RedUNCI
    corecore