Tuesday, May 30, 2006

OA debate - Eysenbach and Harnad

I’ve been following a rather lively debate on the American Scientist Open Access Forum between Gunther Eysenbach (a professor at the University of Toronto and editor-in-chief of JMIR) and Stevan Harnad (a professor at the University of Southampton and Open Archives "archivangelist"). Eysenbach published an article showing the citation benefits of OA publishing: OA articles (articles which are freely accessible to the public) in Proceedings of the National Academy of Sciences (PNAS) were more than twice as likely to be cited one year later as non-OA articles (articles that readers must pay to access) published in PNAS.

Although Eysenbach and Harnad are both OA proponents, what appears to have stirred up the trouble is that Eysenbach’s article criticized several of the studies that Harnad was involved in (and failed to point to two recent studies), pointing out that they lacked a certain amount of statistical rigor and had some inherent fallacies. Eysenbach gives a detailed account on his website of the methodology of his paper, which used multivariate analysis to account for known confounders (variables which are associated with both the factor being studied and the outcome of interest) like the number of co-authors of a paper. Eysenbach argues that if a paper has multiple authors, it is more likely to be self-archived (green OA, see below). This is intuitively true (my paper on search engine coverage of the OAI-PMH corpus was self-archived by Xiaoming before I even gave it a second thought). But a paper is also more likely to be cited if it has more authors, since each author has a vested interest in citing their own work. It’s also possible that papers with multiple authors are of higher caliber (and hence will get cited more often) since there were more heads looking at the problem. Confounding factors like this one definitely need to be considered when trying to determine whether OA is causing the increase in citations.
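To make the confounding argument concrete, here is a toy simulation I put together. It is only a sketch on made-up data, not Eysenbach's method: his paper used multivariate regression, while this uses simple stratification by author count, which is a cruder way of adjusting for the same kind of variable. Every number and function name in it (simulate, crude_gap, adjusted_gap, the 0.1 archiving rate) is my own assumption. In this fake world OA status has no effect on citations at all, yet a naive comparison still shows an "OA advantage" because author count drives both.

```python
import random
from collections import defaultdict
from statistics import mean


def simulate(n_papers=20000, seed=0):
    """Generate synthetic (n_authors, is_oa, citations) records.

    All parameters are invented for illustration only.
    """
    rng = random.Random(seed)
    papers = []
    for _ in range(n_papers):
        n_authors = rng.randint(1, 6)
        # More authors -> more chances that someone self-archives.
        is_oa = rng.random() < 0.1 * n_authors
        # Citations depend on author count only, NOT on OA status.
        citations = n_authors + rng.gauss(0, 1)
        papers.append((n_authors, is_oa, citations))
    return papers


def crude_gap(papers):
    """Mean citations of OA papers minus non-OA papers, ignoring authors."""
    oa = [c for _, is_oa, c in papers if is_oa]
    non_oa = [c for _, is_oa, c in papers if not is_oa]
    return mean(oa) - mean(non_oa)


def adjusted_gap(papers):
    """Same gap computed within each author-count stratum, then averaged
    with weights proportional to stratum size."""
    strata = defaultdict(lambda: {True: [], False: []})
    for n_authors, is_oa, citations in papers:
        strata[n_authors][is_oa].append(citations)

    weighted_sum = 0.0
    total_weight = 0
    for groups in strata.values():
        if groups[True] and groups[False]:
            gap = mean(groups[True]) - mean(groups[False])
            weight = len(groups[True]) + len(groups[False])
            weighted_sum += gap * weight
            total_weight += weight
    return weighted_sum / total_weight


if __name__ == "__main__":
    papers = simulate()
    print("crude OA gap:   ", round(crude_gap(papers), 2))
    print("adjusted OA gap:", round(adjusted_gap(papers), 2))
```

The crude gap comes out clearly positive while the adjusted gap sits near zero, which is the pattern you would expect when the apparent OA effect is entirely due to the confounder. Real data is messier, of course, and a real OA effect could survive the adjustment.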

A big part of the argument centers on what counts as OA. There are two different flavors:
1. green OA - articles (including dissertations and preprints) are published in closed-access journals but are self-archived in an OA repository/archive or personal website. Green journals explicitly allow authors to self-archive their work.
2. gold OA - articles are published in OA journals where they are immediately accessible to the public for free. A gold journal may make all articles freely accessible or make only certain articles freely accessible by charging a fee to the author (which is usually paid by the author's institution or research foundation).

Although green OA is currently the most popular form of OA (5% gold, 90% green), it is sometimes difficult to test for, since an author might make their article publicly accessible the day it is accepted for publication or months after it has been published. Gold OA is easier to test since an article's status is determined the day it is published. Eysenbach tested for gold vs. green to see if papers that were self-archived but had closed access were any more likely to be cited than articles that were gold OA (it’s not clear how he discovered if a paper was self-archived; maybe he searched Google or maybe there was a way for an author to indicate if the paper was self-archived). He found that “self-archiving OA status did not remain a significant predictor for being cited.” This finding also appears to have really bothered Harnad.

I’ve learned a lot about OA from this debate. I just wish there were a little less animosity (zealousness?) from both sides. It’s a he-said/I-didn’t-say exchange that is now well documented on a public email forum (archived on the Web), on a blog, and in a letter to the editor: a prime example of how scientists air their differences today.

By the way, I just came across a really cool slide illustrating the access-impact problem between the Harvards and the have-nots (nice pun!). It is on page 4 of Leslie Chin’s slides.