30 research outputs found

    Exact Single-Source SimRank Computation on Large Graphs

    SimRank is a popular measure of node-to-node similarity based on graph topology. In recent years, single-source and top-k SimRank queries have received increasing attention due to their applications in web mining, social network analysis, and spam detection. However, a fundamental obstacle in studying SimRank has been the lack of ground truths. The only exact algorithm, the Power Method, is computationally infeasible on graphs with more than 10^6 nodes. Consequently, no existing work has evaluated the actual trade-offs between query time and accuracy on large real-world graphs. In this paper, we present ExactSim, the first algorithm that computes exact single-source and top-k SimRank results on large graphs. With high probability, this algorithm produces ground truths with a rigorous theoretical guarantee. We conduct extensive experiments on real-world datasets to demonstrate the efficiency of ExactSim. The results show that ExactSim provides the ground truth for any single-source SimRank query with a precision of up to 7 decimal places within a reasonable query time. Comment: ACM SIGMOD 202
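
To see why the Power Method does not scale, here is a minimal sketch of the naive all-pairs SimRank iteration. It is not ExactSim. It uses the textbook definition s(a, a) = 1 and s(a, b) = c / (|I(a)| |I(b)|) times the sum of s(i, j) over in-neighbours i of a and j of b. The decay factor c = 0.6, the iteration count and the function name are assumptions chosen for illustration. The full n x n matrix is the bottleneck: at 10^6 nodes it has 10^12 entries, roughly 8 TB as 64-bit floats.

```python
from typing import Dict, List


def simrank_power_method(in_nbrs: Dict[int, List[int]], n: int,
                         c: float = 0.6, iters: int = 10) -> List[List[float]]:
    """Naive all-pairs SimRank by fixed-point iteration (hypothetical helper).

    in_nbrs maps each node id in range(n) to its list of in-neighbours.
    Memory is O(n^2), which is what makes this infeasible on large graphs.
    """
    # S_0 is the identity: each node is maximally similar to itself.
    s = [[1.0 if a == b else 0.0 for b in range(n)] for a in range(n)]
    for _ in range(iters):
        nxt = [[0.0] * n for _ in range(n)]
        for a in range(n):
            nxt[a][a] = 1.0  # s(a, a) = 1 by definition
            for b in range(a + 1, n):
                ia, ib = in_nbrs.get(a, []), in_nbrs.get(b, [])
                if not ia or not ib:
                    continue  # a node with no in-neighbours keeps similarity 0
                total = sum(s[i][j] for i in ia for j in ib)
                val = c * total / (len(ia) * len(ib))
                nxt[a][b] = nxt[b][a] = val  # SimRank is symmetric
        s = nxt
    return s
```

For example, with edges 0 -> 1 and 0 -> 2, nodes 1 and 2 share their only in-neighbour, so their similarity is c. Node 0 has no in-neighbours, so its similarity to every other node is 0.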

    Inefficiency of K-FAC for Large Batch Size Training

    In stochastic optimization, using large batch sizes during training can leverage parallel resources to produce faster wall-clock training times per training epoch. However, for both training loss and testing error, recent results analyzing large-batch Stochastic Gradient Descent (SGD) have found sharply diminishing returns beyond a certain critical batch size. In the hope of addressing this, it has been suggested that the Kronecker-Factored Approximate Curvature (K-FAC) method allows for greater scalability to large batch sizes for non-convex machine learning problems such as neural network optimization, as well as greater robustness to variation in model hyperparameters. Here, we perform a detailed empirical analysis of large batch size training for both K-FAC and SGD, evaluating performance in terms of both wall-clock time and aggregate computational cost. Our main results are twofold. First, we find that neither K-FAC nor SGD has ideal scalability behavior beyond a certain batch size, and that K-FAC does not exhibit improved large-batch scalability compared to SGD. Second, we find that K-FAC, in addition to requiring more hyperparameters to tune, suffers from hyperparameter sensitivity similar to that of SGD. We discuss extensive results using ResNet and AlexNet on CIFAR-10 and SVHN, respectively, as well as more general implications of our findings.
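
The abstract does not say what "Kronecker-factored" means. The derivation below sketches the approximation as it is commonly formulated for one fully connected layer. Assume weights W (d_out x d_in), input activations a, and back-propagated pre-activation gradients g. The per-example gradient is an outer product, and that outer product is what lets the layer's Fisher block factor into two small matrices. The symbols A and G are the conventional names, not taken from this paper.

```latex
\begin{align*}
\nabla_W \mathcal{L} &= g\,a^{\top}, \qquad \operatorname{vec}(g\,a^{\top}) = a \otimes g \\
F &= \mathbb{E}\big[(a \otimes g)(a \otimes g)^{\top}\big]
   = \mathbb{E}\big[a a^{\top} \otimes g g^{\top}\big]
   \approx \mathbb{E}[a a^{\top}] \otimes \mathbb{E}[g g^{\top}] = A \otimes G \\
F^{-1} \operatorname{vec}(\nabla_W \mathcal{L})
  &\approx (A^{-1} \otimes G^{-1}) \operatorname{vec}(\nabla_W \mathcal{L})
   = \operatorname{vec}\big(G^{-1}\, \nabla_W \mathcal{L}\, A^{-1}\big)
\end{align*}
```

With this factorization, a preconditioned step only needs to invert a d_in x d_in matrix and a d_out x d_out matrix, rather than a (d_in d_out) x (d_in d_out) one. Implementations typically add damping to A and G and refresh them periodically. These are the kinds of extra hyperparameters the abstract refers to.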

    Q-BERT: Hessian Based Ultra Low Precision Quantization of BERT

    Transformer-based architectures have become the de facto models used for a range of Natural Language Processing tasks. In particular, BERT-based models have achieved significant accuracy gains on GLUE tasks, CoNLL-03, and SQuAD. However, BERT-based models have a prohibitive memory footprint and latency. As a result, deploying BERT-based models in resource-constrained environments has become a challenging task. In this work, we perform an extensive analysis of fine-tuned BERT models using second-order Hessian information, and we use our results to propose a novel method for quantizing BERT models to ultra-low precision. In particular, we propose a new group-wise quantization scheme, and we use a Hessian-based mixed-precision method to compress the model further. We extensively test our proposed method on the BERT downstream tasks of SST-2, MNLI, CoNLL-03, and SQuAD. We achieve performance comparable to the baseline with at most 2.3% performance degradation, even with ultra-low precision quantization down to 2 bits, corresponding to up to 13× compression of the model parameters and up to 4× compression of the embedding table and activations. Among all tasks, we observed the highest performance loss for BERT fine-tuned on SQuAD. Through Hessian-based analysis and visualization, we show that this is related to the fact that the current training/fine-tuning strategy of BERT does not converge for SQuAD.
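
The abstract names a group-wise quantization scheme but does not define it. Here is a minimal sketch under a simple assumption. The rows of a weight matrix, for example those belonging to one attention head, are split into contiguous groups, and each group gets its own symmetric uniform quantizer scale. Q-BERT's actual grouping and quantizer may differ, and both function names are hypothetical.

```python
from typing import List


def quantize_group(values: List[float], bits: int) -> List[float]:
    """Symmetric uniform quantization of one group using a single shared scale."""
    if bits < 2:
        raise ValueError("a symmetric quantizer needs at least 2 bits")
    qmax = 2 ** (bits - 1) - 1  # bits=2 -> integer levels {-1, 0, 1}
    max_abs = max(abs(v) for v in values)
    if max_abs == 0.0:
        return [0.0] * len(values)
    scale = max_abs / qmax
    # Round each value to the nearest integer level, clamp it, and map it back to a float.
    return [max(-qmax, min(qmax, round(v / scale))) * scale for v in values]


def quantize_groupwise(rows: List[List[float]], num_groups: int,
                       bits: int) -> List[List[float]]:
    """Split rows into contiguous groups; each group is quantized with its own scale."""
    if num_groups <= 0 or len(rows) % num_groups != 0:
        raise ValueError("number of rows must be divisible by num_groups")
    group_size = len(rows) // num_groups
    out: List[List[float]] = []
    for g in range(num_groups):
        group = rows[g * group_size:(g + 1) * group_size]
        flat = [v for row in group for v in row]
        q = quantize_group(flat, bits)
        # Re-split the flattened group into rows (all rows assumed to have equal width).
        width = len(group[0])
        out.extend(q[i * width:(i + 1) * width] for i in range(len(group)))
    return out
```

This shows why per-group scales matter at 2 bits. Suppose a matrix has one block of small weights and one block of large weights. With a single scale, the large values set the step size and every small weight rounds to zero. With two groups, the small block keeps its own, finer step size.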

    Social media mining under the COVID-19 context: Progress, challenges, and opportunities

    Social media platforms allow users worldwide to create and share information, forming vast sensing networks through which information on specific topics can be rapidly collected, stored, mined, and analyzed. During the COVID-19 pandemic, extensive social media mining efforts have been undertaken to tackle COVID-19 challenges from various perspectives. This review summarizes the progress of social media data mining studies in the COVID-19 context and groups them into six major domains: early warning and detection, human mobility monitoring, communication and information conveying, public attitudes and emotions, infodemic and misinformation, and hatred and violence. We further document essential features of publicly available COVID-19-related social media data archives that will help research communities conduct replicable and reproducible studies. In addition, we discuss seven challenges in social media analytics, along with their potential impacts on derived COVID-19 findings. We then set out our vision of possible paths forward for social media-based COVID-19 investigations. This review serves as a reference that recaps social media mining efforts in COVID-19-related studies and outlines future directions for using information harnessed from social media to address public health emergencies.

    Identifying the Conformational Isomers of Single-Molecule Cyclohexane at Room Temperature

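    Conformational isomerism is a fundamental problem in chemistry. For flexible molecules such as cyclohexane, however, interconversion at room temperature is extremely fast. Ensemble-based characterization methods such as NMR can therefore only yield the averaged contribution of all conformers. The research groups of Prof. Wenjing Hong and Prof. Haiping Xia at the College of Chemistry and Chemical Engineering set out to analyze and characterize the conformations of flexible molecules quantitatively at room temperature. They achieved the electrical characterization and ratio identification of the two chair conformers of cyclohexane at room temperature. In addition, the nanoscale electrode gap confines the molecule. Under this confinement, the twist-boat intermediate, which is highly unstable at the macroscopic scale, was found to exist stably at the single-molecule scale. This provides an important characterization approach for studying unstable intermediates. The work was carried out under the joint supervision of Prof. Wenjing Hong and Prof. Haiping Xia of the College of Chemistry and Chemical Engineering. The co-first authors are Chun Tang, an iChEM direct-PhD student, and Yongxiang Tang, a graduate student in the Department of Chemical Engineering. Associate Professor Jia Shi and Associate Research Fellow Junyang Liu provided guidance. Postdoctoral researcher Zhixin Chen, PhD student Lijue Chen, and graduate students Yiling Ye, Zhewei Yan, and Longyi Zhang also contributed to the work.
    Abstract: Isomerism reflects the ubiquitous fact that molecules with the same molecular formula can have different structures. Interconversion between the conformational isomers of flexible molecules is very fast owing to low barriers of around 10 kcal mol−1, so ensemble methods yield an average signal contributed by all possible isomers. Identifying the conformational isomers of flexible molecules at room temperature is therefore a substantial challenge. Here, we develop a single-molecule approach to identify the conformational isomers of cyclohexane at room temperature through single-molecule electrical characterization. Using noise analysis and feature extraction on the conductance of single-molecule junctions, we quantitatively identified the two chair isomers of cyclohexane at room temperature. With ensemble characterization, such identification is only feasible at low temperatures. This strategy of applying a single-molecule approach to identify conformational isomers paves the way for investigating the isomerization of flexible molecules beyond ensemble methods. This work was supported by the National Natural Science Foundation of China (nos. 21722305, 21673195, 21703188, and U1705254), the National Key R&D Program of China (2017YFA0204902), the China Postdoctoral Science Foundation (no. 2017M622060), and the Fundamental Research Funds for Xiamen University (20720190002).
    The work was funded by projects including the National Key R&D Program of the Ministry of Science and Technology and the National Natural Science Foundation of China. It was also supported by the State Key Laboratory of Physical Chemistry of Solid Surfaces and the Collaborative Innovation Center of Chemistry for Energy Materials.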

    Deep learning assisted diagnosis system: improving the diagnostic accuracy of distal radius fractures

    Objectives: To explore an intelligent detection technology based on deep learning algorithms that assists the clinical diagnosis of distal radius fractures (DRFs), and to compare it with human performance to verify the feasibility of this method.
    Methods: A total of 3,240 patients (fracture: n = 1,620; normal: n = 1,620) were included in this study, with 3,276 wrist joint anteroposterior (AP) X-ray films (1,639 fractured, 1,637 normal) and 3,260 wrist joint lateral X-ray films (1,623 fractured, 1,637 normal). The patients were divided into training, validation, and test sets in a ratio of 7:1.5:1.5. The deep learning models were developed using data from the training and validation sets, and their effectiveness was then evaluated on the test set. Diagnostic performance was measured with receiver operating characteristic (ROC) curves, area under the curve (AUC), accuracy, sensitivity, and specificity, and compared with that of medical professionals.
    Results: The deep learning ensemble model had excellent accuracy (97.03%), sensitivity (95.70%), and specificity (98.37%) in detecting DRFs. For the AP view, the accuracy was 97.75%, the sensitivity 97.13%, and the specificity 98.37%. For the lateral view, the accuracy was 96.32%, the sensitivity 94.26%, and the specificity 98.37%. When counted per wrist joint, the accuracy was 97.55%, the sensitivity 98.36%, and the specificity 96.73%. On these measures, the ensemble model outperformed both the orthopedic attending physician group and the radiology attending physician group.
    Conclusion: This deep learning ensemble model performs excellently in detecting DRFs on plain X-ray films. Using it as a second expert to assist clinical diagnosis is expected to improve the accuracy of DRF diagnosis and enhance clinical work efficiency.
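
The evaluation above relies on accuracy, sensitivity, specificity and AUC. This is a minimal sketch of how these standard quantities are computed, assuming label 1 means fracture and 0 means normal. AUC is computed with the pairwise (Mann-Whitney) formulation, which equals the area under the empirical ROC curve when ties count one half. The function names and the toy labels and scores in the test are hypothetical, not data from the study.

```python
from typing import Dict, Sequence


def diagnostic_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, sensitivity and specificity for binary labels (1 = fracture, 0 = normal)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # fractures correctly flagged
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # fractures missed
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # normal films correctly cleared
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # normal films wrongly flagged
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "sensitivity": tp / (tp + fn),  # true positive rate
        "specificity": tn / (tn + fp),  # true negative rate
    }


def auc(pos_scores: Sequence[float], neg_scores: Sequence[float]) -> float:
    """Probability that a random positive outscores a random negative (ties count 1/2)."""
    wins = 0.0
    for sp in pos_scores:
        for sn in neg_scores:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))
```

The pairwise AUC loop is O(P x N). That is fine for a test set of a few hundred films. A sort-based rank computation would be the usual choice for larger sets.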

    Electric-Field-Induced Connectivity Switching in Single-Molecule Junctions

    Summary: The manipulation of molecule-electrode interactions is essential for fabricating molecular devices, because it determines the connectivity between the electrodes and the molecular components. The connectivity of molecular devices can be controlled through molecular design, by placing anchor groups at different positions on the molecular backbone. However, reversibly switching such connectivity remains challenging. Here, we develop an electric-field-induced strategy to switch the connectivity of single-molecule junctions reversibly, enabling different connectivities to be accessed within the same molecular backbone. Our results offer a new concept for single-molecule manipulation and provide a feasible strategy to regulate molecule-electrode interactions.

    Tunable near-infrared epsilon-near-zero and plasmonic properties of Ag-ITO co-sputtered composite films

    A series of co-sputtered silver-indium tin oxide (Ag-ITO) films are systematically fabricated. By tuning the atomic ratio of silver, the composite films are shown to have different microstructures even at low silver content (<3 at.%). Two stages of morphological change are proposed to describe the different film states and growth mechanisms. The introduction of silver significantly enhances the preferred orientation of the In₂O₃ component. Remarkably, the dielectric permittivity of Ag-ITO films is highly adjustable: rapid post-annealing can shift the cross-over wavelength λ_c by more than 300 nm, giving tunable epsilon-near-zero and plasmonic properties in the near-infrared region. The imaginary permittivity is lower than that of pure metal films, and λ_c is more tunable than in pure ITO films. Together these suggest that Ag-ITO films are potential alternative near-infrared plasmonic materials. An extended Maxwell-Garnett model is applied as the effective medium approximation, and it fits well the red shift of the epsilon-near-zero region with increasing silver content. Variable-angle prism coupling measurements reveal the surface plasmon polariton (SPP) features of the films at optical communication wavelengths. Broad dips in the reflectance curves around 52–56° correspond to the SPP in the Ag-ITO films.
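
The abstract applies an extended Maxwell-Garnett model and tracks the cross-over wavelength λ_c, where Re(ε) = 0. The extension is not described here, so this sketch uses only the classic Maxwell-Garnett formula for a dilute fraction of spherical inclusions in a host. A bisection search then locates λ_c for any permittivity function. The Drude helper and every parameter value in the test are hypothetical illustrations, not fitted values for Ag or ITO from this work.

```python
def maxwell_garnett(eps_incl: complex, eps_host: complex, fill: float) -> complex:
    """Classic Maxwell-Garnett effective permittivity for spherical inclusions."""
    if not 0.0 <= fill <= 1.0:
        raise ValueError("fill fraction must lie in [0, 1]")
    diff = eps_incl - eps_host
    num = eps_incl + 2 * eps_host + 2 * fill * diff
    den = eps_incl + 2 * eps_host - fill * diff
    return eps_host * num / den


def drude(wavelength_nm: float, eps_inf: float, plasma_ev: float, gamma_ev: float) -> complex:
    """Hypothetical Drude permittivity eps_inf - Ep^2 / (E^2 + i*Gamma*E), with E in eV."""
    energy = 1239.84 / wavelength_nm  # photon energy in eV
    return eps_inf - plasma_ev ** 2 / (energy ** 2 + 1j * gamma_ev * energy)


def crossover_wavelength(eps_of_lambda, lo_nm: float, hi_nm: float, tol: float = 1e-6) -> float:
    """Bisection for the wavelength at which Re(eps) changes sign."""
    f_lo = eps_of_lambda(lo_nm).real
    if f_lo * eps_of_lambda(hi_nm).real > 0:
        raise ValueError("Re(eps) must change sign between lo_nm and hi_nm")
    while hi_nm - lo_nm > tol:
        mid = 0.5 * (lo_nm + hi_nm)
        f_mid = eps_of_lambda(mid).real
        if f_lo * f_mid <= 0:
            hi_nm = mid  # sign change lies in [lo, mid]
        else:
            lo_nm, f_lo = mid, f_mid  # sign change lies in [mid, hi]
    return 0.5 * (lo_nm + hi_nm)
```

For a composite, pass a closure such as `lambda w: maxwell_garnett(eps_ag(w), eps_ito(w), f)` to `crossover_wavelength`. Here `eps_ag` and `eps_ito` stand for whatever dispersion models are available. The classic formula is only reliable at small fill fractions, which is one reason extended variants exist.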