Comparison of methods for clustering scientific publications based on citations

Event Date: Wednesday, 29 March 2017, 13:00
Assist Prof Lovro Šubelj, PhD
There is an extensive literature on graph partitioning and community detection in networks. This literature studies methods for partitioning the nodes in a network into groups or clusters, where nodes belonging to the same cluster should be relatively strongly connected to each other. However, the literature does not provide a clear answer on which methods perform best in practice.
In this lecture, we compare clustering methods in one specific context. We are interested in grouping scientific publications into clusters based on their direct citation relations and we expect each cluster to represent a set of publications that are topically related to each other. We therefore compare the performance of clustering methods when applied to citation networks collected from the Web of Science bibliographic database.
First, we conduct a pairwise comparison of the clusterings obtained using different methods. Although we consider a large number of methods, they fall into only a handful of truly different classes. Next, we compare standard statistical properties of the clusterings, and we also examine a number of properties that are of special relevance in the context of citation networks of publications. Finally, to obtain a deeper understanding of the differences between the methods, we perform an expert-based assessment of the clusterings for publications in the field of Library & information science.
Since none of the considered methods performs satisfactorily according to all desired criteria, we discuss the strengths and weaknesses of the different methods and highlight those that seem to provide a reasonable trade-off.
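As a rough illustration of what a pairwise comparison of clusterings can look like, the sketch below computes normalized mutual information (NMI) between every pair of clusterings. The abstract does not say which similarity measure is used in the lecture, so NMI is an assumption here, and the method names and cluster labels are made-up toy data for six publications.

```python
from collections import Counter
from itertools import combinations
from math import log


def nmi(labels_a, labels_b):
    """Normalized mutual information between two clusterings.

    Each clustering is a list of cluster labels, one per publication,
    in the same publication order. Returns a value in [0, 1].
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("clusterings must cover the same publications")
    n = len(labels_a)
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    count_ab = Counter(zip(labels_a, labels_b))

    # Entropy of each clustering (natural logarithm).
    h_a = -sum(c / n * log(c / n) for c in count_a.values())
    h_b = -sum(c / n * log(c / n) for c in count_b.values())

    # Mutual information from the joint cluster sizes.
    mi = sum(
        c / n * log(c * n / (count_a[x] * count_b[y]))
        for (x, y), c in count_ab.items()
    )

    # Two single-cluster partitions are identical by definition.
    if h_a == 0 and h_b == 0:
        return 1.0
    return 2 * mi / (h_a + h_b)


def pairwise_nmi(clusterings):
    """NMI for every unordered pair of named clusterings."""
    return {
        (p, q): nmi(clusterings[p], clusterings[q])
        for p, q in combinations(sorted(clusterings), 2)
    }


# Hypothetical toy data: cluster labels assigned by three methods.
clusterings = {
    "method_a": [0, 0, 0, 1, 1, 1],
    "method_b": [1, 1, 1, 0, 0, 0],  # same partition as method_a, relabelled
    "method_c": [0, 1, 0, 1, 0, 1],  # cuts across method_a's clusters
}

scores = pairwise_nmi(clusterings)
for pair, score in scores.items():
    print(pair, round(score, 3))
```

Because NMI ignores the label names themselves, method_a and method_b score as identical, while method_c, which splits both of their clusters, scores close to zero. Methods whose clusterings score consistently high against each other would be candidates for the same class of methods.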
Joint work with Nees Jan van Eck and Ludo Waltman from Leiden University.

About IBMI

The Institute for Biostatistics and Medical Informatics (IBMI), formerly the Institute for BioMedical Informatics (hence still IBMI), was founded by the Faculty of Medicine in response to the need for a unit that would perform or coordinate tasks related to data analysis and to providing information relevant to research in medicine. The institute's programme and its development have adjusted over time to changes in financing and to technological progress, but the basic aim remains the same: to support research in medicine.


Institute for Biostatistics and Medical Informatics
University of Ljubljana, Faculty of Medicine
Vrazov trg 2, 1000 Ljubljana

tel: +386 1 543-77-70
fax: +386 1 543-77-71
email: ibmi (at)