The Impact of Name-Matching and Blocking on Author Disambiguation
In this work, we address the problem of blocking in the context of author name disambiguation. We describe a framework that formalizes different ways of name-matching to determine which names could potentially refer to the same author. We focus on name variations that arise from specifying a name with different completeness (i.e., full first name or only an initial). We extend this framework with a simple way to define traditional, new, and custom blocking schemes. Then, we evaluate different old and new schemes in the Web of Science. In this context, we define and compare a new type of blocking scheme. Based on these results, we discuss whether name-matching can be used in blocking evaluation as a replacement for annotated author identifiers. Finally, we argue that blocking can have a strong impact on the application and evaluation of author disambiguation.
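To make name-matching with different completeness concrete, here is a minimal Python sketch. It is not the framework from the paper. It assumes names arrive as "Surname, Given" strings and uses hypothetical helpers `names_compatible` and `block_by_surname_initial`. An initial is treated as compatible with any full first name that starts with the same letter, and blocks are formed with one traditional key, surname plus first initial.

```python
from collections import defaultdict


def parse_name(name: str) -> tuple[str, str]:
    """Split a 'Surname, Given' string into lowercase (surname, given)."""
    surname, _, given = name.partition(",")
    # Drop a trailing period so that "J." and "J" are treated alike.
    return surname.strip().lower(), given.strip().lower().rstrip(".")


def names_compatible(a: str, b: str) -> bool:
    """Hypothetical name-matching rule: True if a and b could be the same author.

    Surnames must be equal; first names must agree up to the completeness of
    the less complete one (an initial matches any full name with that letter).
    """
    sa, ga = parse_name(a)
    sb, gb = parse_name(b)
    if sa != sb:
        return False
    if not ga or not gb:
        return True  # a missing first name cannot rule out a match
    if len(ga) == 1 or len(gb) == 1:
        return ga[0] == gb[0]  # initial vs. initial or initial vs. full name
    return ga == gb  # both full first names: require equality


def block_by_surname_initial(names: list[str]) -> dict[str, list[str]]:
    """Group names by the blocking key 'surname_firstinitial'."""
    blocks: dict[str, list[str]] = defaultdict(list)
    for name in names:
        surname, given = parse_name(name)
        blocks[f"{surname}_{given[:1]}"].append(name)
    return dict(blocks)


if __name__ == "__main__":
    names = ["Smith, J.", "Smith, John", "Smith, Jane", "Smith, A."]
    print(block_by_surname_initial(names))
    print(names_compatible("Smith, J.", "Smith, John"))
    print(names_compatible("Smith, John", "Smith, Jane"))
```

With these inputs, the block `smith_j` groups "Smith, J.", "Smith, John" and "Smith, Jane", even though `names_compatible` rejects the John/Jane pair. Because only pairs inside a block are ever compared, the choice of blocking key bounds which candidates a later disambiguation step can merge.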
QueryVis: Logic-based diagrams help users understand complicated SQL queries faster
Understanding the meaning of existing SQL queries is critical for code maintenance and reuse. Yet SQL can be hard to read, even for expert users or the original creator of a query. We conjecture that it is possible to capture the logical intent of queries in automatically generated visual diagrams that can help users understand the meaning of queries faster and more accurately than SQL text alone. We present initial steps in that direction with visual diagrams that are based on the first-order logic foundation of SQL and can capture the meaning of deeply nested queries. Our diagrams build upon a rich history of diagrammatic reasoning systems in logic and were designed using a large body of human-computer interaction best practices: they are minimal, in that no visual element is superfluous; they are unambiguous, in that no two queries with different semantics map to the same visualization; and they extend previously existing visual representations of relational schemata and conjunctive queries in a natural way. An experimental evaluation involving 42 users on Amazon Mechanical Turk shows that, with only a 2–3 minute static tutorial, participants could interpret queries meaningfully faster with our diagrams than when reading SQL alone. Moreover, we have evidence that our visual diagrams result in participants making fewer errors than with SQL. We believe that more regular exposure to diagrammatic representations of SQL can give rise to a pattern-based and thus more intuitive use and re-use of SQL. All details on the experimental study, the evaluation stimuli, raw data, analyses, and source code are available at https://osf.io/mycr2.
Comment: Full version of paper appearing in SIGMOD 202
When Did We Begin to Spell 'Heteros*edasticity' Correctly?
Using digitized texts scanned by Google and subjected to optical character recognition, I show that heteroskedasticity overtook heteroscedasticity as the preferred spelling in 2001 and has continued to dominate, except for 2005, up to 2008. The latest trends indicate that writers are moving toward the k variant. However, for words such as homoskedasticity, heteroskedastic, and homoskedastic, the corresponding spellings using c are still overwhelmingly dominant, albeit slowly shifting.
The coming decade of digital brain research: a vision for neuroscience at the intersection of technology and computing
In recent years, brain research has indisputably entered a new epoch, driven by substantial methodological advances and digitally enabled data integration and modelling at multiple scales, from molecules to the whole brain. Major advances are emerging at the intersection of neuroscience with technology and computing. This new science of the brain combines high-quality research, data integration across multiple scales, a new culture of multidisciplinary large-scale collaboration, and translation into applications. As pioneered in Europe’s Human Brain Project (HBP), a systematic approach will be essential for meeting the coming decade’s pressing medical and technological challenges. The aims of this paper are to: develop a concept for the coming decade of digital brain research, discuss it with the research community at large, identify points of convergence, and derive common scientific goals from them; provide a scientific framework for the current and future development of EBRAINS, a research infrastructure resulting from the HBP’s work; inform and engage stakeholders, funding organisations, and research institutions regarding future digital brain research; identify and address the transformational potential of comprehensive brain models for artificial intelligence, including machine learning and deep learning; and outline a collaborative approach that integrates reflection, dialogue, and societal engagement on ethical and societal opportunities and challenges as part of future neuroscience research.
Query Optimization
Imagine yourself standing in front of an exquisite buffet filled with numerous delicacies. Your goal is to try them all out, but you need to decide in what order. What exchange of tastes will maximize the overall pleasure of your palate? Although much less pleasurable and subjective, that is the type of problem that query optimizers are called to solve. Given a query, there are many plans that a database management system (DBMS) can follow to process it and produce its answer. All plans are equivalent in terms of their final output but vary in their cost, i.e., the amount of time that they need to run. What is the plan that needs the least amount of time? Such query optimization is absolutely necessary in a DBMS. The cost difference between two alternatives can be enormous. For example, consider the following database schema, which will be..
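The core idea, many equivalent plans with very different costs, can be illustrated with a small Python sketch. It is a toy under stated assumptions rather than the example from the text: the relations `R`, `S`, `T`, their cardinalities, the join selectivities, and the cost model (the sum of the sizes of the join results a left-deep join order produces) are all hypothetical. The sketch enumerates every join order and keeps the cheapest.

```python
from itertools import permutations

# Hypothetical statistics: base-table cardinalities and pairwise join
# selectivities (a pair not listed is treated as a cross product, 1.0).
CARD = {"R": 1000, "S": 100, "T": 10}
SEL = {
    frozenset({"R", "S"}): 0.01,
    frozenset({"S", "T"}): 0.1,
    frozenset({"R", "T"}): 1.0,
}


def join_size(joined: list[str], size: float, new_rel: str) -> float:
    """Estimated size after joining new_rel to an intermediate result."""
    sel = 1.0
    for rel in joined:
        # Independence assumption: multiply the selectivities of all
        # predicates connecting new_rel to relations already joined.
        sel *= SEL.get(frozenset({rel, new_rel}), 1.0)
    return size * CARD[new_rel] * sel


def plan_cost(order: tuple[str, ...]) -> float:
    """Cost of a left-deep join order: sum of the sizes of all join results
    it produces, including the final one."""
    joined = [order[0]]
    size = float(CARD[order[0]])
    cost = 0.0
    for rel in order[1:]:
        size = join_size(joined, size, rel)
        joined.append(rel)
        cost += size
    return cost


def best_plan(relations: list[str]) -> tuple[str, ...]:
    """Exhaustively enumerate join orders and return the cheapest one."""
    return min(permutations(relations), key=plan_cost)


if __name__ == "__main__":
    for order in permutations(CARD):
        print(order, plan_cost(order))
    print("best:", best_plan(list(CARD)))
```

Under these made-up statistics every order yields the same final result of 1,000 tuples, but joining `S` and `T` first costs 1,100 while starting with `R` and `T` costs 11,000. Exhaustive enumeration grows factorially with the number of relations, which is why practical optimizers prune the search space, for example with dynamic programming over subsets of relations.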