Mapping Computer Science Research: Trends, Influences, and Predictions
This paper explores the current trending research areas in the field of
Computer Science (CS) and investigates the factors contributing to their
emergence. Leveraging a comprehensive dataset comprising papers, citations, and
funding information, we employ advanced machine learning techniques, including
Decision Tree and Logistic Regression models, to predict trending research
areas. Our analysis reveals that the number of references cited in research
papers (Reference Count) plays a pivotal role in determining trending research
areas, making reference count the most relevant factor driving trends in the
CS field. Additionally, the influence of NSF grants and patents on trending
topics has increased over time. The Logistic Regression model outperforms the
Decision Tree model in predicting trends, exhibiting higher accuracy,
precision, recall, and F1 score. By surpassing a random guess baseline, our
data-driven approach demonstrates higher accuracy and efficacy in identifying
trending research areas. The results offer valuable insights into the trending
research areas, providing researchers and institutions with a data-driven
foundation for decision-making and future research directions.
Comment: 7 pages, 8 figures, 1 table
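As a toy illustration of the modelling setup the abstract describes (entirely synthetic data and a single hypothetical feature, not the authors' dataset), a logistic regression on a reference-count-like signal can be fit by gradient descent and checked against the random-guess baseline:

```python
import random
import math

# Synthetic, illustrative data only: topics whose papers cite more references
# (higher "reference count") are generated as trending (label 1) more often.
random.seed(0)
data = [(random.gauss(60, 10), 1) for _ in range(100)] + \
       [(random.gauss(40, 10), 0) for _ in range(100)]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# One-feature logistic regression trained by batch gradient descent;
# the feature is centered at 50 for numerical stability.
w, b, lr = 0.0, 0.0, 0.01
for _ in range(2000):
    gw = gb = 0.0
    for x, y in data:
        p = sigmoid(w * (x - 50.0) + b)
        gw += (p - y) * (x - 50.0)
        gb += (p - y)
    w -= lr * gw / len(data)
    b -= lr * gb / len(data)

preds = [1 if sigmoid(w * (x - 50.0) + b) >= 0.5 else 0 for x, _ in data]
accuracy = sum(p == y for p, (_, y) in zip(preds, data)) / len(data)
print(f"accuracy = {accuracy:.2f}")  # clearly above the 0.5 random baseline
```

The paper's actual models (scikit-learn-style Decision Tree and Logistic Regression on real bibliometric features) are richer; this sketch only shows why a single discriminative feature can beat a random guess.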
Hybrid Analog-Digital Co-Processing for Scientific Computation
In the past 10 years computer architecture research has moved toward more heterogeneity and less adherence to conventional abstractions. Scientists and engineers hold an unshakable belief that computing holds keys to unlocking humanity's Grand Challenges. Acting on that belief, they have looked deeper into computer architecture to find specialized support for their applications. Likewise, computer architects have looked deeper into circuits and devices in search of untapped performance and efficiency. The lines between computer architecture layers---applications, algorithms, architectures, microarchitectures, circuits, and devices---have blurred. Against this backdrop, a menagerie of computer architectures is on the horizon: architectures that forgo basic assumptions about computer hardware and require new thinking about how such hardware supports problems and algorithms.
This thesis is about revisiting hybrid analog-digital computing in support of diverse modern workloads. Hybrid computing had extensive applications in early computing history, and has been revisited for small-scale applications in embedded systems. But architectural support for using hybrid computing in modern workloads, at scale and with high accuracy solutions, has been lacking.
I demonstrate solving a variety of scientific computing problems, including stochastic ODEs, partial differential equations, linear algebra, and nonlinear systems of equations, as case studies in hybrid computing. I solve these problems on a system of multiple prototype analog accelerator chips built by a team at Columbia University. On that team I made contributions toward programming the chips, building the digital interface, and validating the chips' functionality. The analog accelerator chip is intended for use in conjunction with a conventional digital host computer.
The appeal and motivation for using an analog accelerator is efficiency and performance, but it comes with limitations in accuracy and problem sizes that we have to work around.
The first problem is how to express problems in this unconventional computation model. Scientific computing phrases problems as differential equations and algebraic equations. Differential equations are a continuous view of the world, while algebraic equations are a discrete one. Prior work in analog computing focused mostly on differential equations; algebraic equations played only a minor role. The key to using the analog accelerator to support modern workloads on conventional computers is that these two viewpoints are interchangeable. The algebraic equations that underlie most workloads can be solved as differential equations,
and differential equations are naturally solvable in the analog accelerator chip. A hybrid analog-digital computer architecture can focus on solving linear and nonlinear algebra problems to support many workloads.
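A minimal digital sketch of this interchangeability (my own illustration, not the thesis's code): the algebraic system Ax = b can be solved by integrating the ODE dx/dt = -(Ax - b), whose steady state is the solution. A digital computer takes explicit Euler steps; the analog accelerator evolves the same dynamics continuously in hardware.

```python
A = [[3.0, 1.0],
     [1.0, 2.0]]          # symmetric positive definite, so the ODE converges
b = [9.0, 8.0]            # exact solution is x = [2, 3]

x = [0.0, 0.0]            # initial condition
dt = 0.01                 # Euler step size
for _ in range(5000):
    # residual r = A x - b
    r = [sum(A[i][j] * x[j] for j in range(2)) - b[i] for i in range(2)]
    # integrate dx/dt = -r
    x = [x[i] - dt * r[i] for i in range(2)]

print(x)  # converges to approximately [2.0, 3.0]
```

For a positive definite A this gradient flow always settles at the algebraic solution, which is exactly what lets a differential-equation machine stand in for a linear-algebra solver.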
The second problem is how to get accurate solutions using hybrid analog-digital computing. The analog computation model gives less accurate solutions because it gives up representing numbers as digital binary numbers, and instead uses the full range of analog voltage and current to represent real numbers. Prior work has established that encoding data in analog signals gives an energy efficiency advantage as long as the analog data precision is limited. While the analog accelerator alone may be useful for energy-constrained applications where inputs and outputs are imprecise, we are more interested in using analog in conjunction with digital for precise solutions. This thesis offers the novel insight that the way to do so is to solve nonlinear problems, where low-precision guesses are useful starting points for conventional digital algorithms.
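The refinement idea can be sketched in a few lines (an assumption-labelled illustration, not the thesis's code): a coarse guess, standing in for a low-precision analog solve, seeds a digital Newton iteration, which converges quadratically to full precision. Here f(x) = x^2 - 2.

```python
import math

def newton(f, df, x, steps):
    """Refine an initial guess x by Newton's method: x <- x - f(x)/f'(x)."""
    for _ in range(steps):
        x = x - f(x) / df(x)
    return x

analog_guess = 1.4   # pretend this came from an analog solve good to ~1%
root = newton(lambda x: x * x - 2.0, lambda x: 2.0 * x, analog_guess, 5)
print(root, math.sqrt(2.0))  # both approximately 1.41421356...
```

Because Newton's method roughly doubles the number of correct digits per step, even a 1%-accurate analog seed reaches machine precision in a handful of cheap digital iterations.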
The third problem is how to solve large problems using hybrid analog-digital computing. The analog computation model cannot handle large problems because it gives up step-by-step discrete-time operation, instead allowing variables to evolve smoothly in continuous time. To make that happen, the analog accelerator chains hardware for mathematical operations end-to-end. During computation, analog data flows through the hardware with no overheads from control logic and memory accesses. The downside is that the needed hardware size grows with problem size. While scientific computing researchers have long split large problems into smaller subproblems to fit digital computer constraints, this thesis is a first attempt to treat these divide-and-conquer algorithms as an essential tool in using the analog model of computation.
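A miniature of the divide-and-conquer idea (my own illustration with a hypothetical two-unknown hardware limit, not the thesis's algorithm): a 4-unknown system is split into two 2-unknown subproblems, each small enough for the accelerator, and block Gauss-Seidel sweeps alternate between them until the global solution converges.

```python
A = [[4.0, 1.0, 0.0, 0.0],
     [1.0, 4.0, 1.0, 0.0],
     [0.0, 1.0, 4.0, 1.0],
     [0.0, 0.0, 1.0, 4.0]]
b = [5.0, 6.0, 6.0, 5.0]   # exact solution is x = [1, 1, 1, 1]

def solve2(a, rhs):
    # exact 2x2 solve, standing in for one small-capacity accelerator call
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    return [(rhs[0] * a[1][1] - rhs[1] * a[0][1]) / det,
            (a[0][0] * rhs[1] - a[1][0] * rhs[0]) / det]

x = [0.0] * 4
for _ in range(50):
    # subproblem 1: unknowns 0-1, holding unknowns 2-3 fixed
    r1 = [b[0] - A[0][2] * x[2] - A[0][3] * x[3],
          b[1] - A[1][2] * x[2] - A[1][3] * x[3]]
    x[0], x[1] = solve2([[A[0][0], A[0][1]], [A[1][0], A[1][1]]], r1)
    # subproblem 2: unknowns 2-3, holding unknowns 0-1 fixed
    r2 = [b[2] - A[2][0] * x[0] - A[2][1] * x[1],
          b[3] - A[3][0] * x[0] - A[3][1] * x[1]]
    x[2], x[3] = solve2([[A[2][2], A[2][3]], [A[3][2], A[3][3]]], r2)

print(x)  # converges to approximately [1, 1, 1, 1]
```

The hardware capacity caps the subproblem size while the sweep count absorbs the coupling between blocks, which is the trade the thesis examines at accelerator scale.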
As we enter the post-Moore’s law era of computing, unconventional architectures will offer specialized models of computation that uniquely support specific problem types. Two prominent examples are deep neural networks and quantum computers. Recent trends in computer science research suggest these unconventional architectures will soon see broad adoption. In this thesis I show that another such specialized, unconventional architecture is the analog accelerator for solving problems in scientific computing. Computer architecture researchers will discover other important models of computation in the future. This thesis is an example of the discovery process, implementation, and evaluation of how an unconventional architecture supports specialized workloads.
PGB: A PubMed Graph Benchmark for Heterogeneous Network Representation Learning
There has been a rapid growth in biomedical literature, yet capturing the
heterogeneity of the bibliographic information of these articles remains
relatively understudied. Although graph mining research via heterogeneous graph
neural networks has taken center stage, it remains unclear whether these
approaches capture the heterogeneity of the PubMed database, a vast digital
repository containing over 33 million articles. We introduce PubMed Graph
Benchmark (PGB), a new benchmark dataset for evaluating heterogeneous graph
embeddings for biomedical literature. PGB is one of the largest heterogeneous
networks to date and consists of 30 million English articles. The benchmark
contains rich metadata including abstract, authors, citations, MeSH terms, MeSH
hierarchy, and other fields. The benchmark defines three
evaluation tasks encompassing systematic reviews, node classification, and node
clustering. In PGB, we aggregate the metadata associated with the biomedical
articles from PubMed into a unified source and make the benchmark publicly
available for future work.
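The heterogeneity such a benchmark exposes can be pictured with a toy typed graph (hypothetical identifiers and edge types of my own choosing, not PGB's actual schema): nodes carry types such as paper, author, and MeSH term, and edges carry relation types, which is the basic structure a heterogeneous graph neural network consumes.

```python
# Toy heterogeneous bibliographic graph: typed nodes and typed edges.
nodes = {
    "pmid:1": "paper", "pmid:2": "paper",
    "author:doe": "author",
    "mesh:D003920": "mesh",     # a MeSH descriptor id, for illustration only
}
edges = [
    ("author:doe", "writes", "pmid:1"),
    ("author:doe", "writes", "pmid:2"),
    ("pmid:1", "cites", "pmid:2"),
    ("pmid:1", "has_mesh", "mesh:D003920"),
]

def neighbors(node, edge_type=None):
    """Typed neighborhood lookup, the basic query a heterogeneous GNN needs."""
    out = [dst for src, et, dst in edges
           if src == node and (edge_type is None or et == edge_type)]
    out += [src for src, et, dst in edges
            if dst == node and (edge_type is None or et == edge_type)]
    return out

print(neighbors("pmid:1", "cites"))       # citation neighbors of pmid:1
print(neighbors("author:doe", "writes"))  # papers written by author:doe
```

Homogeneous graph methods collapse all of these relations into one edge set; benchmarks like PGB exist precisely to test whether models that keep the types distinct do better.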
The coverage of Microsoft Academic: Analyzing the publication output of a university
This is the first detailed study on the coverage of Microsoft Academic (MA).
Based on the complete and verified publication list of a university, the
coverage of MA was assessed and compared with two benchmark databases, Scopus
and Web of Science (WoS), on the level of individual publications. Citation
counts were analyzed, and issues related to data retrieval and data quality
were examined. A Perl script was written to retrieve metadata from MA based on
publication titles. The script is freely available on GitHub. We find that MA
covers journal articles, working papers, and conference items to a substantial
extent and indexes more document types than the benchmark databases (e.g.,
working papers, dissertations). MA clearly surpasses Scopus and WoS in covering
book-related document types and conference items but falls slightly behind
Scopus in journal articles. The coverage of MA is favorable for evaluative
bibliometrics in most research fields, including economics/business,
computer/information sciences, and mathematics. However, MA shows biases
similar to Scopus and WoS with regard to the coverage of the humanities,
non-English publications, and open-access publications. Rank correlations of
citation counts are high between MA and the benchmark databases. We find that
the publication year is correct for 89.5% of all publications and the number of
authors is correct for 95.1% of the journal articles. Given the fast and
ongoing development of MA, we conclude that MA is on the verge of becoming a
bibliometric superpower. However, comprehensive studies on the quality of MA
metadata are still lacking.
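The kind of field-level agreement check behind figures such as "publication year correct for 89.5% of publications" can be sketched as follows (entirely synthetic records, my own illustration, not the study's Perl script): retrieved metadata is compared field by field against a verified publication list.

```python
# Ground truth from the (hypothetical) verified university publication list.
verified = {
    "Paper A": 2015, "Paper B": 2016, "Paper C": 2016, "Paper D": 2017,
}
# What the bibliographic database returned for the same titles.
retrieved = {
    "Paper A": 2015, "Paper B": 2016, "Paper C": 2017, "Paper D": 2017,
}

# Share of records whose publication year matches the verified value.
matches = sum(retrieved.get(title) == year for title, year in verified.items())
share = matches / len(verified)
print(f"publication year correct for {share:.1%} of records")  # 75.0% here
```

The same pattern extends to any metadata field (author counts, document types), with the verified list serving as the gold standard.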
Machine Learning for Actionable Warning Identification: A Comprehensive Survey
Actionable Warning Identification (AWI) plays a crucial role in improving the
usability of static code analyzers. With recent advances in Machine Learning
(ML), various approaches have been proposed to incorporate ML techniques into
AWI. These ML-based AWI approaches, benefiting from ML's strong ability to
learn subtle and previously unseen patterns from historical data, have
demonstrated superior performance. However, a comprehensive overview of these
approaches is missing, which could hinder researchers/practitioners from
understanding the current process and discovering potential for future
improvement in the ML-based AWI community. In this paper, we systematically
review the state-of-the-art ML-based AWI approaches. First, we employ a
meticulous survey methodology and gather 50 primary studies from 2000/01/01 to
2023/09/01. Then, we outline the typical ML-based AWI workflow, including
warning dataset preparation, preprocessing, AWI model construction, and
evaluation stages. In such a workflow, we categorize ML-based AWI approaches
based on the warning output format. Besides, we analyze the techniques used in
each stage, along with their strengths, weaknesses, and distribution. Finally,
we provide practical research directions for future ML-based AWI approaches,
focusing on aspects like data improvement (e.g., enhancing the warning labeling
strategy) and model exploration (e.g., exploring large language models for
AWI).
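The surveyed workflow can be shown in miniature (synthetic warnings and a hypothetical feature of my own choosing, not any surveyed system): dataset preparation, feature preprocessing, model construction, and evaluation on held-out warnings, with actionable warnings labelled 1.

```python
# 1. Dataset preparation: warnings labelled actionable (1) or not (0).
warnings = [
    {"rule": "NullDeref", "depth": 1, "label": 1},
    {"rule": "NullDeref", "depth": 3, "label": 1},
    {"rule": "UnusedVar", "depth": 7, "label": 0},
    {"rule": "UnusedVar", "depth": 9, "label": 0},
    {"rule": "NullDeref", "depth": 2, "label": 1},   # held out
    {"rule": "UnusedVar", "depth": 8, "label": 0},   # held out
]

# 2. Preprocessing: one numeric feature (call depth of the warning site).
def features(w):
    return w["depth"]

# 3. Model construction: a one-feature decision stump fit on the train split.
def train_stump(data):
    best = None
    for t in sorted({features(w) for w in data}):
        acc = sum((features(w) <= t) == (w["label"] == 1)
                  for w in data) / len(data)
        if best is None or acc > best[1]:
            best = (t, acc)
    return best[0]

threshold = train_stump(warnings[:4])   # train split
test = warnings[4:]                     # test split

# 4. Evaluation: accuracy on the held-out warnings.
preds = [1 if features(w) <= threshold else 0 for w in test]
acc = sum(p == w["label"] for p, w in zip(preds, test)) / len(test)
print(f"held-out accuracy = {acc:.2f}")
```

Real ML-based AWI systems replace each stage with far richer choices (warning context features, deep models, cross-project evaluation), which is exactly the design space the survey maps.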