Search CORE

6 research outputs found

Developing a Machine Learning based Systematic Investment Startegy: A case study for the Construction Industry

Author: Brea García Eric
Publication venue: Universitat Politècnica de Catalunya
Publication date: 28/01/2020
Field of study

In this research work, an end-to-end systematic investment strategy based on machine learning models and leveraging the construction industry operational and management practices knowledge, is implemented. First, a literature research in the field of behavioral finance is done, presenting the current state of the knowledge and trends in the industry. A suitable investment opportunity exploiting prevailing market inefficiencies around earnings announcements is identified. Second, an extensive literature research is performed identifying the most relevant characteristics of construction companies’ operations and major risk factors they are exposed to. These insights are used to engineer a set of relevant variables. Third, advanced statistical techniques are used to select the most relevant subset of features, which includes market and analysts’ expectation data, macroeconomic indicators, the delay in reporting earnings, and the most important financial dimensions for construction firms. Fourth, the earnings’ surprise classification problem is characterized by a class imbalance and asymmetric misclassification costs. These issues are a consequence of the desired business application, and are addressed by selecting an appropriate evaluation metric. Additionally, considerations on the temporal dimension and generative process of the data are made to select an appropriate validation scheme. Five different state-of-the-art machine learning algorithms are considered: a multinomial logistic regression, a bagging classifier, a random forest, an XGBoost and a linear Support Vector Machine. The multinomial logistic regression is found to be the most suitable model, exhibiting a bias towards predicting positive earnings’ surprises over the rest of classes. The firm size, and the profitability and valuation measures, portrayed by the Return on Assets and Enterprise Value multiples, are found to be the most important variables when predicting earnings surprises. To conclude, the systematic investment strategy based on the investment signals produced by the selected machine learning model is back-tested, being the performance of the long-short portfolio driven by the positive surprise one as a consequence of the selected model bias. Keywords: Quantitative Investing, Machine Learning, Behavioral Financ

UPCommons. Portal del coneixement obert de la UPC

Deep learning approach for epileptic seizure detection

Author: Mäkinen N. (Niina)
Publication venue: University of Oulu
Publication date: 11/05/2020
Field of study

Abstract. Epilepsy is the most common brain disorder that affects approximately fifty million people worldwide, according to the World Health Organization. The diagnosis of epilepsy relies on manual inspection of EEG, which is error-prone and time-consuming. Automated epileptic seizure detection of EEG signal can reduce the diagnosis time and facilitate targeting of treatment for patients. Current detection approaches mainly rely on the features that are designed manually by domain experts. The features are inflexible for the detection of a variety of complex patterns in a large amount of EEG data. Moreover, the EEG is non-stationary signal and seizure patterns vary across patients and recording sessions. EEG data always contain numerous noise types that negatively affect the detection accuracy of epileptic seizures. To address these challenges deep learning approaches are examined in this paper. Deep learning methods were applied to a large publicly available dataset, the Children’s Hospital of Boston-Massachusetts Institute of Technology dataset (CHB-MIT). The present study includes three experimental groups that are grouped based on the pre-processing steps. The experimental groups contain 3–4 experiments that differ between their objectives. The time-series EEG data is first pre-processed by certain filters and normalization techniques, and then the pre-processed signal was segmented into a sequence of non-overlapping epochs. Second, time series data were transformed into different representations of input signals. In this study time-series EEG signal, magnitude spectrograms, 1D-FFT, 2D-FFT, 2D-FFT magnitude spectrum and 2D-FFT phase spectrum were investigated and compared with each other. Third, time-domain or frequency-domain signals were used separately as a representation of input data of VGG or DenseNet 1D. The best result was achieved with magnitude spectrograms used as representation of input data in VGG model: accuracy of 0.98, sensitivity of 0.71 and specificity of 0.998 with subject dependent data. VGG along with magnitude spectrograms produced promising results for building personalized epileptic seizure detector. There was not enough data for VGG and DenseNet 1D to build subject-dependent classifier.Epileptisten kohtausten havaitseminen syväoppimisella lähestymistavalla. Tiivistelmä. Epilepsia on yleisin aivosairaus, joka Maailman terveysjärjestön mukaan vaikuttaa noin viiteenkymmeneen miljoonaan ihmiseen maailmanlaajuisesti. Epilepsian diagnosointi perustuu EEG:n manuaaliseen tarkastamiseen, mikä on virhealtista ja aikaa vievää. Automaattinen epileptisten kohtausten havaitseminen EEG-signaalista voi potentiaalisesti vähentää diagnoosiaikaa ja helpottaa potilaan hoidon kohdentamista. Nykyiset tunnistusmenetelmät tukeutuvat pääasiassa piirteisiin, jotka asiantuntijat ovat määritelleet manuaalisesti, mutta ne ovat joustamattomia monimutkaisten ilmiöiden havaitsemiseksi suuresta määrästä EEG-dataa. Lisäksi, EEG on epästationäärinen signaali ja kohtauspiirteet vaihtelevat potilaiden ja tallennusten välillä ja EEG-data sisältää aina useita kohinatyyppejä, jotka huonontavat epilepsiakohtauksen havaitsemisen tarkkuutta. Näihin haasteisiin vastaamiseksi tässä diplomityössä tarkastellaan soveltuvatko syväoppivat menetelmät epilepsian havaitsemiseen EEG-tallenteista. Aineistona käytettiin suurta julkisesti saatavilla olevaa Bostonin Massachusetts Institute of Technology lastenklinikan tietoaineistoa (CHB-MIT). Tämän työn tutkimus sisältää kolme koeryhmää, jotka eroavat toisistaan esikäsittelyvaiheiden osalta: aikasarja-EEG-data esikäsiteltiin perinteisten suodattimien ja normalisointitekniikoiden avulla, ja näin esikäsitelty signaali segmentoitiin epookkeihin. Kukin koeryhmä sisältää 3–4 koetta, jotka eroavat menetelmiltään ja tavoitteiltaan. Kussakin niistä epookkeihin jaettu aikasarjadata muutettiin syötesignaalien erilaisiksi esitysmuodoiksi. Tässä tutkimuksessa tutkittiin ja verrattiin keskenään EEG-signaalia sellaisenaan, EEG-signaalin amplitudi-spektrogrammeja, 1D-FFT-, 2D-FFT-, 2D-FFT-amplitudi- ja 2D-FFT -vaihespektriä. Näin saatuja aika- ja taajuusalueen signaaleja käytettiin erikseen VGG- tai DenseNet 1D -mallien syötetietoina. Paras tulos saatiin VGG-mallilla kun syötetietona oli amplitudi-spektrogrammi ja tällöin tarkkuus oli 0,98, herkkyys 0,71 ja spesifisyys 0,99 henkilöstä riippuvaisella EEG-datalla. VGG yhdessä amplitudi-spektrogrammien kanssa tuottivat lupaavia tuloksia henkilökohtaisen epilepsiakohtausdetektorin rakentamiselle. VGG- ja DenseNet 1D -malleille ei ollut tarpeeksi EEG-dataa henkilöstä riippumattoman luokittelijan opettamiseksi

University of Oulu Repository - Jultika

Prediction and Ranking of Highly Popular Web Content

Author: Nuno Miguel Pereira Moniz
Publication venue
Publication date: 18/07/2017
Field of study

Repositório Aberto da Universidade do Porto

Políticas de Copyright de Publicações Científicas em Repositórios Institucionais: O Caso do INESC TEC

Author: Soraia Teixeira Ramos
Publication venue
Publication date: 23/07/2019
Field of study

A progressiva transformação das práticas científicas, impulsionada pelo desenvolvimento das novas Tecnologias de Informação e Comunicação (TIC), têm possibilitado aumentar o acesso à informação, caminhando gradualmente para uma abertura do ciclo de pesquisa. Isto permitirá resolver a longo prazo uma adversidade que se tem colocado aos investigadores, que passa pela existência de barreiras que limitam as condições de acesso, sejam estas geográficas ou financeiras. Apesar da produção científica ser dominada, maioritariamente, por grandes editoras comerciais, estando sujeita às regras por estas impostas, o Movimento do Acesso Aberto cuja primeira declaração pública, a Declaração de Budapeste (BOAI), é de 2002, vem propor alterações significativas que beneficiam os autores e os leitores. Este Movimento vem a ganhar importância em Portugal desde 2003, com a constituição do primeiro repositório institucional a nível nacional. Os repositórios institucionais surgiram como uma ferramenta de divulgação da produção científica de uma instituição, com o intuito de permitir abrir aos resultados da investigação, quer antes da publicação e do próprio processo de arbitragem (preprint), quer depois (postprint), e, consequentemente, aumentar a visibilidade do trabalho desenvolvido por um investigador e a respetiva instituição. O estudo apresentado, que passou por uma análise das políticas de copyright das publicações científicas mais relevantes do INESC TEC, permitiu não só perceber que as editoras adotam cada vez mais políticas que possibilitam o auto-arquivo das publicações em repositórios institucionais, como também que existe todo um trabalho de sensibilização a percorrer, não só para os investigadores, como para a instituição e toda a sociedade. A produção de um conjunto de recomendações, que passam pela implementação de uma política institucional que incentive o auto-arquivo das publicações desenvolvidas no âmbito institucional no repositório, serve como mote para uma maior valorização da produção científica do INESC TEC.The progressive transformation of scientific practices, driven by the development of new Information and Communication Technologies (ICT), which made it possible to increase access to information, gradually moving towards an opening of the research cycle. This opening makes it possible to resolve, in the long term, the adversity that has been placed on researchers, which involves the existence of barriers that limit access conditions, whether geographical or financial. Although large commercial publishers predominantly dominate scientific production and subject it to the rules imposed by them, the Open Access movement whose first public declaration, the Budapest Declaration (BOAI), was in 2002, proposes significant changes that benefit the authors and the readers. This Movement has gained importance in Portugal since 2003, with the constitution of the first institutional repository at the national level. Institutional repositories have emerged as a tool for disseminating the scientific production of an institution to open the results of the research, both before publication and the preprint process and postprint, increase the visibility of work done by an investigator and his or her institution. The present study, which underwent an analysis of the copyright policies of INESC TEC most relevant scientific publications, allowed not only to realize that publishers are increasingly adopting policies that make it possible to self-archive publications in institutional repositories, all the work of raising awareness, not only for researchers but also for the institution and the whole society. The production of a set of recommendations, which go through the implementation of an institutional policy that encourages the self-archiving of the publications developed in the institutional scope in the repository, serves as a motto for a greater appreciation of the scientific production of INESC TEC

Repositório Aberto da Universidade do Porto