Comparison of methods for clustering scientific publications based on citations

Event Date: Wednesday, 29 March, 2017 - 13:00
Assist Prof Lovro Šubelj, PhD
There is an extensive literature on graph partitioning and community detection in networks. This literature studies methods for partitioning the nodes in a network into groups or clusters, where nodes belonging to the same cluster should be relatively strongly connected to each other. However, the literature does not provide a clear answer on which methods perform best in practice.
In this lecture, we compare clustering methods in one specific context. We are interested in grouping scientific publications into clusters based on their direct citation relations and we expect each cluster to represent a set of publications that are topically related to each other. We therefore compare the performance of clustering methods when applied to citation networks collected from the Web of Science bibliographic database.
First, we conduct a pairwise comparison of the clusterings obtained using different methods. Although a large number of methods are considered, they fall into only a handful of truly different classes. Next, we compare standard statistical properties of the clusterings and also examine a number of properties that are of special relevance in the context of citation networks of publications. Finally, to obtain a deeper understanding of the differences between the methods, we perform an expert-based assessment of the clusterings for publications in the field of library and information science.
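To make the idea of comparing clusterings pair by pair concrete, the sketch below scores every pair of clusterings with normalized mutual information (NMI). This is only an illustration: the abstract does not say which similarity measure is used in the lecture, so NMI is an assumption here, and the method names and the six-publication label lists are hypothetical.

```python
from collections import Counter
from itertools import combinations
from math import log


def normalized_mutual_information(labels_a, labels_b):
    """NMI of two clusterings given as equal-length lists of cluster labels.

    Position i in both lists refers to the same publication.
    Returns a value in [0, 1]; 1 means the partitions are identical
    up to a renaming of the clusters.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("clusterings must cover the same publications")
    n = len(labels_a)
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    count_ab = Counter(zip(labels_a, labels_b))

    def entropy(counts):
        return -sum(c / n * log(c / n) for c in counts.values())

    h_a, h_b = entropy(count_a), entropy(count_b)
    # Mutual information: sum over co-occurring label pairs of
    # p(a, b) * log(p(a, b) / (p(a) * p(b))).
    mutual_info = sum(
        c / n * log(c * n / (count_a[a] * count_b[b]))
        for (a, b), c in count_ab.items()
    )
    if h_a + h_b == 0:
        return 1.0  # both clusterings place everything in a single cluster
    return 2 * mutual_info / (h_a + h_b)


# Hypothetical clusterings of six publications by three hypothetical methods.
clusterings = {
    "method_a": [0, 0, 0, 1, 1, 1],
    "method_b": [1, 1, 1, 0, 0, 0],  # same partition as method_a, relabeled
    "method_c": [0, 0, 1, 1, 2, 2],
}

# Similarity score for every unordered pair of methods.
pairwise = {
    (x, y): normalized_mutual_information(clusterings[x], clusterings[y])
    for x, y in combinations(sorted(clusterings), 2)
}

for pair, score in pairwise.items():
    print(pair, round(score, 3))
```

A matrix of such scores is one possible way to see which methods produce near-identical partitions and could therefore be grouped into the same class.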
Since none of the considered methods performs satisfactorily according to all desired criteria, we discuss the strengths and weaknesses of the different methods and highlight those that seem to provide a reasonable trade-off.
Joint work with Nees Jan van Eck and Ludo Waltman from Leiden University.

