287 research outputs found

    Effective use of multi-omics data for prediction of cancer outcomes

    3β,12β,14α-Trihydroxypregnan-20-one

    The title compound, C21H34O4, is a steroid of the pregnane family prepared by the sequential oxidation and reduction of 3β,12β-diacetoxy-20-ethylenedioxypregnan-14-ene. The conformations of the six-membered rings are close to chair forms, while the five-membered ring adopts an envelope conformation. All the rings are trans-fused and an intramolecular O—H⋯O hydrogen bond occurs. In the crystal structure, intermolecular O—H⋯O hydrogen bonds link the molecules into a two-dimensional network.

    An Empirical Examination of IPO Underpricing in Hong Kong and Singapore

    The objective of this thesis is to investigate the main determinants of IPO underpricing for firms listed in Hong Kong and Singapore from 2004 to 2008. Data collected from Datastream and Reuters, together with the information disclosed on both stock exchanges, are used to examine the significance of different variables in explaining the level of IPO underpricing. We find that operating margin, financial leverage, firm size, IPO offer size, and exercise of the overallotment option influence, to some extent, IPO underpricing in both markets. Based on the regressions, we conclude that the difference between the levels of IPO underpricing in Hong Kong and Singapore can be explained by financial leverage and firm size, with firm size being the primary determinant.

    Profiling and Evolution of Intellectual Property

    In recent years, with the rapid growth of Internet data, the number and types of scientific and technological resources are also rapidly expanding. However, the increase in the number and category of information data will also increase the cost of information acquisition. For technology-based enterprises or users, in addition to general papers, patents, etc., policies related to technology or the development of their industries should also belong to a type of scientific and technological resources. The cost and difficulty of acquiring users. Extracting valuable science and technology policy resources from a huge amount of data with mixed contents and providing accurate and fast retrieval will help to break down information barriers and reduce the cost of information acquisition, which has profound social significance and social utility. This article focuses on the difficulties and problems in the field of science and technology policy, and introduces related technologies and developments. Comment: 11 pages. arXiv admin note: text overlap with arXiv:2203.1259

    The Political Economy of Property Rights In China: Local Officials, Incentive Structure, And Private Enterprises

    My dissertation addresses two puzzles in China’s economic miracle. First, why do some private firms enjoy better property rights protection than others under similar political and economic institutional conditions? Second, under the incentive structure designed by the central government, why do local officials in some localities protect private property rights more actively than others and hence foster local economic development? The existing literature either emphasizes social networks or focuses on fiscal decentralization and the personnel control system, but neither of these accounts can systematically explain the considerable variation in private property rights protection across regions and over time. To explain these empirical puzzles, I develop a game-theoretical model of the political economy of property rights in China: a bargaining game between a firm and a local political official. Under the current institutional arrangements, the resources and constraints created by the incentive structure of the Cadre Evaluation System (CES) affect the goals and strategies of both local political officials and private investors in that bargaining relationship. A firm with a high level of Firm Specific Assets (FSAs) may possess strong post-entry bargaining power and thus enjoy better protection from the local official, whereas a weak firm is vulnerable to the local official’s predatory activities and therefore needs to rely on other mechanisms, such as its social networks or bribery, to overcome the commitment problem. In particular, all else being equal, the availability of an exit option greatly enhances a private firm’s post-entry bargaining power vis-à-vis local political actors. In addition, the degree of symbiosis between local officials and indigenous private entrepreneurs determines the level of private property protection.
Compared to rotated officials, native local officials have fewer political connections with higher-level officials and therefore have less chance of being promoted in the power hierarchy. This condition makes them seek the support of local economic actors for their political survival. Moreover, native local political leaders who were born in their jurisdiction have lower transaction costs, and therefore a symbiotic relationship between native local officials and local private entrepreneurs is feasible. Native leaders were likely to have close ties to local economic actors and to face stricter social constraints; therefore, they were more likely to protect local property rights. I statistically test the effects of firms’ characteristics on the security of property rights using different types of evidence, including nationwide survey data from 2012 and media reports on government-related property rights expropriation cases. Using a proxy for property rights protection, I also provide systematic empirical evidence from Guangdong province, both cross-sectionally and over time, to test the effects of different types of local political leaders by investigating the characteristics of 203 prefecture-level political leaders between 1992 and 2008. A detailed case study of local B&B industry policymaking in Xiamen, Fujian province, utilizing government reports, news reports, and fieldwork interview notes, further supports the model.

    Benchmark study of feature selection strategies for multi-omics data

    BACKGROUND: In the last few years, multi-omics data, that is, datasets containing different types of high-dimensional molecular variables for the same samples, have become increasingly available. To date, several comparison studies focused on feature selection methods for omics data, but to our knowledge, none compared these methods for the special case of multi-omics data. Given that these data have specific structures that differentiate them from single-omics data, it is unclear whether different feature selection strategies may be optimal for such data. In this paper, using 15 cancer multi-omics datasets we compared four filter methods, two embedded methods, and two wrapper methods with respect to their performance in the prediction of a binary outcome in several situations that may affect the prediction results. As classifiers, we used support vector machines and random forests. The methods were compared using repeated fivefold cross-validation. The accuracy, the AUC, and the Brier score served as performance metrics. RESULTS: The results suggested that, first, the chosen number of selected features affects the predictive performance for many feature selection methods but not all. Second, whether the features were selected by data type or from all data types concurrently did not considerably affect the predictive performance, but for some methods, concurrent selection took more time. Third, regardless of which performance measure was considered, the feature selection methods mRMR, the permutation importance of random forests, and the Lasso tended to outperform the other considered methods. Here, mRMR and the permutation importance of random forests already delivered strong predictive performance when considering only a few selected features. Finally, the wrapper methods were computationally much more expensive than the filter and embedded methods. 
CONCLUSIONS: We recommend the permutation importance of random forests and the filter method mRMR for feature selection using multi-omics data, where, however, mRMR is considerably more computationally costly. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s12859-022-04962-x
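The three performance metrics named in the abstract above (accuracy, AUC, and Brier score) can be illustrated with a minimal standard-library sketch. The function names and the toy labels and probabilities are assumptions made for illustration only; they are not the authors' code or data, and the benchmark itself computed these metrics within repeated fivefold cross-validation.

```python
def accuracy(y_true, p_hat, threshold=0.5):
    """Share of samples whose thresholded probability matches the 0/1 label."""
    hits = sum((p >= threshold) == bool(y) for y, p in zip(y_true, p_hat))
    return hits / len(y_true)


def brier_score(y_true, p_hat):
    """Mean squared difference between predicted probability and 0/1 label."""
    return sum((p - y) ** 2 for y, p in zip(y_true, p_hat)) / len(y_true)


def auc(y_true, p_hat):
    """Probability that a random positive is scored above a random negative.

    Ties count as one half (the Mann-Whitney formulation of the AUC).
    """
    pos = [p for y, p in zip(y_true, p_hat) if y == 1]
    neg = [p for y, p in zip(y_true, p_hat) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: four samples with binary outcomes and predicted probabilities.
y_toy = [1, 0, 1, 0]
p_toy = [0.9, 0.2, 0.4, 0.6]

print(accuracy(y_toy, p_toy))     # two of four samples are classified correctly
print(brier_score(y_toy, p_toy))  # lower is better
print(auc(y_toy, p_toy))          # three of four positive/negative pairs are ordered correctly
```

Accuracy depends on the chosen threshold, whereas the AUC and the Brier score use the predicted probabilities directly, which is one reason a benchmark may report all three.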

    Synergistic Effects of Different Levels of Genomic Data for the Staging of Lung Adenocarcinoma: An Illustrative Study

    Lung adenocarcinoma (LUAD) is a common and very lethal cancer. Accurate staging is a prerequisite for its effective diagnosis and treatment. Therefore, improving the accuracy of the stage prediction of LUAD patients is of great clinical relevance. Previous works have mainly focused on single genomic data information or a small number of different omics data types concurrently for generating predictive models. A few of them have considered multi-omics data from genome to proteome. We used a publicly available dataset to illustrate the potential of multi-omics data for stage prediction in LUAD. In particular, we investigated the roles of the specific omics data types in the prediction process. We used a self-developed method, Omics-MKL, for stage prediction that combines an existing feature ranking technique Minimum Redundancy and Maximum Relevance (mRMR), which avoids redundancy among the selected features, and multiple kernel learning (MKL), applying different kernels for different omics data types. Each of the considered omics data types individually provided useful prediction results. Moreover, using multi-omics data delivered notably better results than using single-omics data. Gene expression and methylation information seem to play vital roles in the staging of LUAD. The Omics-MKL method retained 70 features after the selection process. Of these, 21 (30%) were methylation features and 34 (48.57%) were gene expression features. Moreover, 18 (25.71%) of the selected features are known to be related to LUAD, and 29 (41.43%) to lung cancer in general. Using multi-omics data from genome to proteome for predicting the stage of LUAD seems promising because each omics data type may improve the accuracy of the predictions. Here, methylation and gene expression data may play particularly important roles

    A Diagnostic Model for Kawasaki Disease Based on Immune Cell Characterization From Blood Samples

    Background: Kawasaki disease (KD) is the leading cause of acquired heart disease in children. However, distinguishing KD from febrile infections early in the disease course remains difficult. Our goal was to estimate the immune cell composition in KD patients and febrile controls (FC), and to develop a tool for KD diagnosis. Methods: We used a machine-learning algorithm, CIBERSORT, to estimate the proportions of 22 immune cell types based on blood samples from children with KD and FC. Using these immune cell compositions, a diagnostic score for predicting KD was then constructed based on LASSO regression for binary outcomes. Results: In the training set (n = 496), a model was fit which consisted of eight types of immune cells. The area under the curve (AUC) values for diagnosing KD in a held-out test set (n = 212) and an external validation set (n = 36) were 0.80 and 0.77, respectively. The most common cell types in KD blood samples were monocytes, neutrophils, CD4(+)-naïve and CD8(+) T cells, and M0 macrophages. The diagnostic score was highly correlated to genes that had been previously reported as associated with KD, such as interleukins and chemokine receptors, and enriched in reported pathways, such as IL-6/JAK/STAT3 and TNFα signaling pathways. Conclusion: Altogether, the diagnostic score for predicting KD could potentially serve as a biomarker. Prospective studies could evaluate how incorporating the diagnostic score into a clinical algorithm would improve diagnostic accuracy further

    Distributed Graph Neural Network Training: A Survey

    Graph neural networks (GNNs) are a type of deep learning model that is trained on graphs and has been successfully applied in various domains. Despite the effectiveness of GNNs, it remains challenging to scale them efficiently to large graphs. As a remedy, distributed computing has become a promising solution for training large-scale GNNs, since it can provide abundant computing resources. However, the dependencies imposed by the graph structure make high-efficiency distributed GNN training difficult, as it suffers from massive communication and workload imbalance. In recent years, many efforts have been made on distributed GNN training, and an array of training algorithms and systems have been proposed. Yet there is a lack of systematic review of the optimization techniques for the distributed execution of GNN training. In this survey, we analyze three major challenges in distributed GNN training: massive feature communication, the loss of model accuracy, and workload imbalance. We then introduce a new taxonomy for the optimization techniques in distributed GNN training that address these challenges. The new taxonomy classifies existing techniques into four categories: GNN data partition, GNN batch generation, GNN execution model, and GNN communication protocol. We carefully discuss the techniques in each category. Finally, we summarize existing distributed GNN systems for multi-GPUs, GPU-clusters, and CPU-clusters, respectively, and discuss future directions for distributed GNN training.