60 research outputs found
Contextual Outlier Interpretation
Outlier detection plays an essential role in many data-driven applications to
identify isolated instances that are different from the majority. While many
statistical learning and data mining techniques have been used for developing
more effective outlier detection algorithms, the interpretation of detected
outliers has not received much attention. Interpretation is becoming
increasingly important to help people trust and evaluate the developed models
by providing intrinsic reasons why certain outliers are chosen. It is
difficult, if not impossible, to simply apply feature selection for explaining
outliers due to the distinct characteristics of various detection models,
complicated structures of data in certain applications, and imbalanced
distribution of outliers and normal instances. In addition, the role of
contrastive contexts in which outliers are located, as well as the relation between
outliers and contexts, is usually overlooked in interpretation. To tackle the
issues above, in this paper, we propose a novel Contextual Outlier
INterpretation (COIN) method to explain the abnormality of existing outliers
spotted by detectors. The interpretability for an outlier is achieved from
three aspects: outlierness score, attributes that contribute to the
abnormality, and contextual description of its neighborhoods. Experimental
results on various types of datasets demonstrate the flexibility and
effectiveness of the proposed framework compared with existing interpretation
approaches.
An Anomaly Detection Framework for Heterogeneous and Streaming Data
Anomaly detection has become one of the most important research areas due to its wide range of uses, such as abnormal behavior detection in network traffic, disease detection in MRI images, and fraud detection in credit card transactions. In many real-world anomaly detection problems, we face heterogeneous data comprising different types of attributes, including categorical and continuous attributes. The heterogeneity of the data makes it difficult to compare data instances. Furthermore, the behavior of data may change over time in streaming environments. Finally, labels are hard to obtain because too much data arrives each day to classify manually. To tackle these challenges, we propose an anomaly detection framework for heterogeneous and streaming data. By introducing our own distance metric for categorical features and using an ensemble of two outlier detection methods, we effectively deal with both heterogeneous and streaming data. Furthermore, the ensemble model keeps updating its backend information during classification tasks so as to adapt to changing data behaviors. The framework also provides interpretations of detected outliers to reduce the effort human experts need to produce labeled data. Finally, we train a supervised machine learning algorithm using the feedback from human experts for anomaly detection tasks. Our experimental results show the efficacy of the proposed framework.
Effect of Extraction Methods on Antifungal Activity of Sea Cucumber (Stichopus Japonicus)
The objective of this study was to investigate the antifungal activity of the soluble matter (SM) and crude saponins (CS) extracted from Stichopus japonicus using pressurized solvent extraction (PSE) with water or aqueous ethanol as a solvent, in comparison with traditional heat reflux extraction (HRE). The extraction yields were also determined for the SM and CS and compared for each extraction process and solvent. The antifungal activity of the SM and CS, extracted from the body wall of Stichopus japonicus using PSE or HRE with water or 70% aqueous ethanol, was investigated. SM and CS exhibited their highest antifungal activity when extracted by HRE with 70% ethanol and by HRE with water, respectively, while their highest yields were obtained when extracted by PSE with water. SM showed stronger antifungal activity than potassium sorbate but weaker activity than propyl paraben, while CS showed stronger antifungal activity than both of these antifungal agents.
XCluSim: a visual analytics tool for interactively comparing multiple clustering results of bioinformatics data
This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Abstract
Background
The primary goal of pathway analysis using transcriptome data is to find significantly perturbed pathways. However, pathway analysis is not always successful in identifying pathways that are truly relevant to the context under study. A major reason for this difficulty is that a single gene is involved in multiple pathways. In the KEGG pathway database, there are 146 genes, each of which is involved in more than 20 pathways. Thus activation of even a single gene will result in activation of many pathways. This complex relationship often makes the pathway analysis very difficult. While we need much more powerful pathway analysis methods, a readily available alternative way is to incorporate the literature information.
Results
In this study, we propose a novel approach for prioritizing pathways by combining results from both pathway analysis tools and literature information. The basic idea is as follows. Whenever there are enough articles that provide evidence on which pathways are relevant to the context, we can be assured that the pathways are indeed related to the context, which is termed as relevance in this paper. However, if there are few or no articles reported, then we should rely on the results from the pathway analysis tools, which is termed as significance in this paper. We realized this concept as an algorithm by introducing Context Score and Impact Score and then combining the two into a single score. Our method ranked truly relevant pathways significantly higher than existing pathway analysis tools in experiments with two data sets.
Conclusions
Our novel framework was implemented as ContextTRAP by utilizing two existing tools, TRAP and BEST. ContextTRAP will be a useful tool for the pathway based analysis of gene expression data since the user can specify the context of the biological experiment in a set of keywords. The web version of ContextTRAP is available at
http://biohealth.snu.ac.kr/software/contextTRA
The Seoul National University AGN Monitoring Project. IV. Hα Reverberation Mapping of Six AGNs and the Hα Size–Luminosity Relation
The broad-line region (BLR) size–luminosity relation has paramount importance for estimating the mass of black holes in active galactic nuclei (AGNs). Traditionally, the size of the Hβ BLR is often estimated from the optical continuum luminosity at 5100 Å, while the size of the Hα BLR and its correlation with the luminosity is much less constrained. As a part of the Seoul National University AGN Monitoring Project, which provides 6 yr photometric and spectroscopic monitoring data, we present our measurements of the Hα lags of high-luminosity AGNs. Combined with the measurements for 42 AGNs from the literature, we derive the size–luminosity relations of the Hα BLR against the broad Hα and 5100 Å continuum luminosities. We find the slope of the relations to be 0.61 ± 0.04 and 0.59 ± 0.04, respectively, which are consistent with the Hβ size–luminosity relation. Moreover, we find a linear relation between the 5100 Å continuum luminosity and the broad Hα luminosity across 7 orders of magnitude. Using these results, we propose a new virial mass estimator based on the Hα broad emission line, finding that the previous mass estimates based on scaling relations in the literature are overestimated by up to 0.7 dex at masses lower than 10⁷ M⊙.
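For orientation, a size–luminosity relation of this kind is conventionally written as a power law, which becomes a straight line in log space. The sketch below uses generic symbols: the normalization K, the reference luminosity L_0, and the virial factor f are placeholders that the abstract does not specify, and only the two slope values are taken from the text above.

```latex
% Generic log-linear form of a BLR size--luminosity relation.
% K and L_0 are unspecified placeholders; the slopes are quoted from the abstract.
\log_{10}\!\left(\frac{R_{\mathrm{H}\alpha}}{1\ \text{lt-day}}\right)
  = K + \alpha \,\log_{10}\!\left(\frac{L}{L_0}\right),
\qquad
\alpha = 0.61 \pm 0.04 \ \ (L = L_{\mathrm{H}\alpha}),
\quad
\alpha = 0.59 \pm 0.04 \ \ (L = L_{5100}).

% Standard virial form that such a size estimate feeds into
% (f is a dimensionless virial factor, \Delta V the broad-line width).
M_{\mathrm{BH}} = f \, \frac{R_{\mathrm{BLR}} \, \Delta V^{2}}{G}
```

A slope near 0.6 means that a tenfold increase in luminosity corresponds to roughly a fourfold increase in BLR size, since 10^0.6 ≈ 4.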
The Seoul National University AGN Monitoring Project IV: Hα reverberation mapping of 6 AGNs and the Hα Size-Luminosity Relation
The broad line region (BLR) size-luminosity relation has paramount importance
for estimating the mass of black holes in active galactic nuclei (AGNs).
Traditionally, the size of the Hβ BLR is often estimated from the optical
continuum luminosity at 5100 Å, while the size of the Hα BLR
and its correlation with the luminosity is much less constrained. As a part of
the Seoul National University AGN Monitoring Project (SAMP), which provides
six-year photometric and spectroscopic monitoring data, we present our
measurements of the Hα lags of 6 high-luminosity AGNs. Combined with the
measurements for 42 AGNs from the literature, we derive the size-luminosity
relations of the Hα BLR against the broad Hα and 5100 Å
continuum luminosities. We find the slope of the relations to be 0.61 ± 0.04
and 0.59 ± 0.04, respectively, which are consistent with the Hβ
size-luminosity relation. Moreover, we find a linear relation between the
5100 Å continuum luminosity and the broad Hα luminosity across
7 orders of magnitude. Using these results, we propose a new virial mass
estimator based on the Hα broad emission line, finding that the previous
mass estimates based on the scaling relations in the literature are
overestimated by up to 0.7 dex at masses lower than ~10⁷ M⊙.
Comment: Accepted for publication in ApJ (Jun. 25th, 2023). 21 pages, 12
figures
The multiplex bead array approach to identifying serum biomarkers associated with breast cancer
Introduction Breast cancer is the most common type of cancer seen in women in western countries. Thus, diagnostic modalities sensitive to early-stage breast cancer are needed. Antibody-based array platforms of a data-driven type, which are expected to facilitate more rapid and sensitive detection of novel biomarkers, have emerged as a direct, rapid means for profiling cancer-specific signatures using small samples. In line with this concept, our group constructed an antibody bead array panel for 35 analytes that were selected during the discovery step. This study was aimed at testing the performance of this 35-plex array panel in profiling signatures specific for primary non-metastatic breast cancer and validating its diagnostic utility in this independent population. Methods Thirty-five analytes were selected from more than 50 markers through screening steps using a serum bank consisting of 4,500 samples from various types of cancer. An antibody-bead array of 35 markers was constructed using the Luminex (TM) bead array platform. A study population consisting of 98 breast cancer patients and 96 normal subjects was analysed using this panel. Multivariate classification algorithms were used to find discriminating biomarkers and validated with another independent population of 90 breast cancer and 79 healthy controls. Results Serum concentrations of epidermal growth factor, soluble CD40-ligand and proapolipoprotein A1 were increased in breast cancer patients. High-molecular-weight-kininogen, apolipoprotein A1, soluble vascular cell adhesion molecule-1, plasminogen activator inhibitor-1, vitamin-D binding protein and vitronectin were decreased in the cancer group. Multivariate classification algorithms distinguished breast cancer patients from the normal population with high accuracy (91.8% with random forest, 91.5% with support vector machine, 87.6% with linear discriminant analysis). 
Combinatorial markers also detected breast cancer at an early stage with greater sensitivity. Conclusions The current study demonstrated the usefulness of the antibody-bead array approach in finding signatures specific for primary non-metastatic breast cancer and illustrated the potential for early, high sensitivity detection of breast cancer. Further validation is required before array-based technology is used routinely for early detection of breast cancer.
Overseas direct investment and exports in Korea: A time series approach
Overseas direct investment is one of the most important and influential variables in a globalizing economy, and it continues to be a driving force of the globalization process that characterizes the modern world economy. Among its effects, those related to exports are prominent. In this study, the relationship between overseas direct investment and exports is investigated using quarterly data and a time series approach. First, the stationarity of the variables is examined using a unit root test, and the adequacy of a VAR model is tested with a co-integration test. Next, the relationship between overseas direct investment and exports is analyzed using the Granger causality method. The reliability of Granger causality tests depends on the correct specification of the information universe: if a relevant third variable that drives both overseas direct investment and exports is omitted, the causality tests may be spurious, reflecting the influence of the omitted variable. To avoid this problem, the exchange rate is included as an exogenous variable. Finally, impulse response analysis is performed to assess the quantitative impact of overseas direct investment on exports and vice versa.
Essays in Financial Innovation and FinTech
In my thesis, Essays in Financial Innovation and FinTech, I investigate two growing innovative asset classes, structured retail products (SRPs) and cryptocurrencies.
Chapter 1 examines the market for SRPs. This market has grown rapidly in sales volume and complexity in the last two decades. Using a comprehensive dataset on SRPs, I examine investor extrapolation as an explanation of this phenomenon. I find that products with higher past returns have enjoyed higher sales growth, even though past returns do not predict future performance. This effect is stronger for more complex products, so more complex products that happen to have delivered better past performance become more popular. While there is some evidence of financial intermediaries exploiting investors in the early part of the sample, my results further suggest that the rapid market growth induced by extrapolation has led to more competition among intermediaries, which in turn disciplines exploitation.
In Chapter 2, my coauthors and I study Initial Coin Offerings (ICOs). Certification by a crowd of online analysts and early investors can generate excitement among potential token investors, leading to successful ICOs. We test the "wisdom of crowds" using novel data on over 1,500 ICOs, including sequential investor subscriptions during token sales. We find that favorable analyst opinions on the underlying project are associated with aggressive initial token subscriptions, which in turn predict subsequent token sales. Overall, our results suggest that the wisdom of crowds could mitigate information asymmetry in the ICO market.
In Chapter 3, my coauthors and I study a pervasive form of market manipulation, pump-and-dump schemes (P&Ds), in the cryptocurrency market. We study these events using trade-by-trade data and a sample of P&Ds with precisely identified starting times. We find that P&Ds lead to short-term cryptocurrency bubbles featuring dramatic increases in prices, volume, and volatility. Prices reach their peak in one minute, and a quick reversal follows. Evidence, including a significant price run-up before the start of P&Ds, implies significant wealth transfers from insiders to outsiders. Bittrex, a cryptocurrency exchange, banned P&Ds on November 24, 2017. Using a difference-in-differences approach, we provide causal evidence that P&Ds are detrimental to the liquidity and price of cryptocurrencies.
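The difference-in-differences comparison mentioned in Chapter 3 can be illustrated with a minimal sketch of the basic two-group, two-period estimator. This is not the thesis's specification: the function name `did_estimate`, the choice of treated and control groups, and every number below are hypothetical and serve only to show the arithmetic.

```python
from statistics import mean


def did_estimate(treated_pre, treated_post, control_pre, control_post):
    """Basic 2x2 difference-in-differences on group means.

    Returns (change in treated group) minus (change in control group).
    """
    treated_change = mean(treated_post) - mean(treated_pre)
    control_change = mean(control_post) - mean(control_pre)
    return treated_change - control_change


# Made-up liquidity measures, NOT data from the thesis.
# "Treated" stands for coins affected by a policy change (hypothetical),
# "control" for comparable coins that were not affected.
treated_pre = [1.0, 1.2, 0.8]    # mean 1.0
treated_post = [1.5, 1.7, 1.3]   # mean 1.5
control_pre = [2.0, 2.2, 1.8]    # mean 2.0
control_post = [2.1, 2.3, 1.9]   # mean 2.1

effect = did_estimate(treated_pre, treated_post, control_pre, control_post)
print(f"DiD estimate: {effect:.2f}")
```

The estimate is only as credible as the assumption that both groups would have followed parallel trends without the policy change; a real analysis would typically use a regression with controls and clustered standard errors rather than raw means.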