Citation network analysis and the social production of knowledge

I’m currently enrolled in a social network analysis (SNA) class at BU, and it’s proving both extremely difficult and very interesting. My primary interest in learning this method and its corresponding theories is to someday look at the network of global health delivery NGOs and try to map the field of action in a way that could provide some structural explanation of NGO policy, structure, and action.

For now, though, I’m working on a project to understand the citation network of academic / scientific papers written about global noncommunicable diseases. This builds on my previous work with the Lancet Commission on Reframing NCDIs Amongst the Poorest Billion. Specifically, I’m hoping to explore the network of citation connections across different domains of knowledge production and look at which forms, framings, and issues tend to dominate.

I was able to scrape the Web of Science for all papers whose topic included one of the noncommunicable diseases (list generated from those included by the IHME) and also included the term “global health”. The search generated 9,809 total articles. I used the bibliometrix R package from CRAN to turn this data set into a sociomatrix and plotted it. Here is the result:

[Figure: NCD paper citation network]

Basically, each node / vertex is a paper and the size of the node is proportional to the number of times it has been cited by another paper in this network.
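To make the sociomatrix and node-sizing idea concrete, here is a minimal sketch in plain Python. It is not the R package's implementation; the paper IDs are made-up placeholders, and the `node_size` helper (with its `base` and `scale` parameters) is a hypothetical choice of mine that adds a small base size so uncited papers stay visible.

```python
# Toy citation data: each key is a citing paper, each value lists the papers it cites.
# The IDs "A".."E" are hypothetical placeholders, not records from the real data set.
citations = {
    "A": ["C", "D"],
    "B": ["C"],
    "C": ["D"],
    "D": [],
    "E": ["C"],
}

papers = sorted(citations)
index = {paper: i for i, paper in enumerate(papers)}
n = len(papers)

# Sociomatrix: matrix[i][j] == 1 means paper i cites paper j.
matrix = [[0] * n for _ in range(n)]
for citing, cited_list in citations.items():
    for cited in cited_list:
        matrix[index[citing]][index[cited]] = 1

# Times cited within the network = in-degree = column sum of the sociomatrix.
in_degree = {
    paper: sum(matrix[i][index[paper]] for i in range(n))
    for paper in papers
}


def node_size(times_cited, base=2.0, scale=3.0):
    """Hypothetical sizing rule: grows linearly with in-network citations."""
    return base + scale * times_cited


sizes = {paper: node_size(count) for paper, count in in_degree.items()}

for paper in papers:
    print(paper, in_degree[paper], sizes[paper])
```

The key point is that the count only includes citations from papers that are themselves in the network, so it will generally be lower than a paper's overall citation count.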

The top ten cited papers are:

AARONSON NK, 1993, J NATL CANCER I, V85, P365, DOI 10.1093/JNCI/85.5.365
Citations: 367
WARE JE, 1992, MED CARE, V30, P473, DOI 10.1097/00005650-199206000-00002
Citations: 351
AMERICAN PSYCHIATRIC ASSOCIATION, 1994, DIAGN STAT MAN MENT
Citations: 321
AMERICAN PSYCHIATRIC ASSOCIATION, 2000, DIAGN STAT MAN MENT
Citations: 221
FRIES JF, 1980, ARTHRITIS RHEUM, V23, P137, DOI 10.1002/ART.1780230202
Citations: 204
MURRAY CJL, 2012, LANCET, V380, P2197, DOI 10.1016/S0140-6736(12)61689-4
Citations: 201
LOZANO R, 2012, LANCET, V380, P2095, DOI 10.1016/S0140-6736(12)61728-0
Citations: 192
COHEN J., 1988, STAT POWER ANAL BEHA
Citations: 179
MURRAY CJL, 1997, LANCET, V349, P1498, DOI 10.1016/S0140-6736(96)07492-2
Citations: 170
LIM SS, 2012, LANCET, V380, P2224, DOI 10.1016/S0140-6736(12)61766-8
Citations: 157

I was really happy that I was able to make it work, and it looks pretty cool in my opinion! Unfortunately, it doesn’t really tell us much about the true structure of the network, so I’ll have to do much more analysis. I’m going to try blockmodeling and perhaps exponential random graph models (ERGMs) to see what stands out about this network.

Here are some plots that I was able to easily make with the CRAN package:

[Figures: rplot, rplot02, rplot03, rplot04]

Some quick takeaways? Chris Murray is dominant, especially in the modern literature on global NCDs. Similarly, the US is dominant in terms of production of these papers. Finally, there has been a steady increase in the number of papers published annually, after a burst in citations of some subset of papers in the network around 1993. It will be interesting to see what those papers are and what might have triggered this sudden citation boom and the subsequent growth in the volume of literature. It will also be very interesting to see which domain these papers fall into (clinical, basic science, social sciences, engineering, etc.) and whether we can develop some measure of their “framing” of global NCDs.
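One rough way to start chasing down that 1993 burst is to tally in-network citations by the publication year of the cited reference. The sketch below does this in plain Python for the top ten references listed earlier. It assumes the year is always the second comma-separated field of the reference string, which holds for these ten entries but may not hold for every record, and the `cited_year` helper is my own name for illustration.

```python
from collections import defaultdict

# (cited reference string, in-network citation count), copied from the top-ten list.
top_cited = [
    ("AARONSON NK, 1993, J NATL CANCER I, V85, P365, DOI 10.1093/JNCI/85.5.365", 367),
    ("WARE JE, 1992, MED CARE, V30, P473, DOI 10.1097/00005650-199206000-00002", 351),
    ("AMERICAN PSYCHIATRIC ASSOCIATION, 1994, DIAGN STAT MAN MENT", 321),
    ("AMERICAN PSYCHIATRIC ASSOCIATION, 2000, DIAGN STAT MAN MENT", 221),
    ("FRIES JF, 1980, ARTHRITIS RHEUM, V23, P137, DOI 10.1002/ART.1780230202", 204),
    ("MURRAY CJL, 2012, LANCET, V380, P2197, DOI 10.1016/S0140-6736(12)61689-4", 201),
    ("LOZANO R, 2012, LANCET, V380, P2095, DOI 10.1016/S0140-6736(12)61728-0", 192),
    ("COHEN J., 1988, STAT POWER ANAL BEHA", 179),
    ("MURRAY CJL, 1997, LANCET, V349, P1498, DOI 10.1016/S0140-6736(96)07492-2", 170),
    ("LIM SS, 2012, LANCET, V380, P2224, DOI 10.1016/S0140-6736(12)61766-8", 157),
]


def cited_year(reference):
    """Return the publication year, assumed to be the second comma-separated field."""
    fields = [field.strip() for field in reference.split(",")]
    return int(fields[1])


# Sum citation counts per cited-reference publication year.
citations_by_year = defaultdict(int)
for reference, count in top_cited:
    citations_by_year[cited_year(reference)] += count

for year in sorted(citations_by_year):
    print(year, citations_by_year[year])
```

With only ten references this is just a starting point; running the same tally over every cited reference in the data set would show whether the early 1990s really stand out.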