These are my current thoughts about how I plan to run POPL '12. The first part is an overview of the whole process. The second part is further discussion (linked from the first part) of why I've decided what I have, along with some indication of what I'm not sure about. Your comments are very welcome!
Higher standard: at least one champion and no strong detractors. The PC Chair gets the final say on whether the paper is in the top half of accepted papers; it is accepted only if so.
In addition to the studies, I find the claims that single-blind reviewing leads to (perhaps unintentional) bias believable based on what I know of human nature. Though we may think we make decisions on a completely rational basis, sometimes we don't. I see this in my own behavior, and behavioral economists have been pointing out this phenomenon for years. If you haven't read the book Predictably Irrational by Dan Ariely, I'd recommend it! He points out that we are strongly influenced by first impressions in many circumstances, even to our detriment. There may be no malice or intent in many decisions we make, but those decisions can nevertheless fail to depend on the claimed criteria under consideration. The more a process systematically avoids the roots of unfair or irrational judgments, the fairer that process will be.
Stands on its own and does not attempt to inherit the credibility of prior papers, as much as possible. Revealing authorship after the review is submitted can ameliorate concerns about credibility if they come up during the first look. If an external review might be desirable, a committee member can suggest it after the initial review.
Overall my feeling is that double-blind reviewing may help and does not hurt, so it is worth the extra effort. To restate: my goal with double-blind reviewing is to have reviewers approach a paper with as few preconceptions as possible. I think there is a benefit to eliminating the authors' names and citing prior work in the third person; there is enough uncertainty that I think reviewers will be a bit more measured. I am trying to account for unconscious bias, not conscious bias, so I am not worried about reviewers searching around the web to attempt to unmask an author. By revealing the authors to a reviewer immediately after she submits her first review (just as other reviews for the same paper are not revealed until the reviewer's own review is submitted), problems with the review due to mistaken assumptions about the authors (e.g., that there is no conflict, or that the authors' claims about system X are not believable) can be corrected directly.
Here are some anecdotes I've received from various people that led them to be in favor of double-blind reviewing.
Andrew Myers shared this brief note with me about his experience:

I have only been program chair once with anonymized submissions, and I have mixed feelings about the approach. A few observations:
1. It makes a difference. Some discussions that took place during the program committee meeting would have gone quite differently if authors were revealed. In the cases where this was apparent to me, it seemed to be helpful that the paper was anonymous. On the other hand, Kathryn raises some good points about how keeping the paper anonymous can be a problem. If papers are unblinded during the meeting, I think it is important to set clear ground rules for what kinds of arguments regarding acceptance or rejection are acceptable.
2. The effect of anonymous submissions on reviewers differs by reviewer. Some reviewers feel that it makes no difference to their views, whereas others report that they feel it helps them review more fairly. So I think there is a good case to be made that submissions should be anonymous during the writing of reviews.
He also expressed this concern, addressed by unblinding papers prior to the PC meeting:
One fear is that in a blinded process someone will review a paper, realize who the likely authors are, and intentionally bias their review because they think they can get away with it since others don't know who the authors are. I have seen this happen, even so far as this: the program chair of a major conference pushed papers across the reject-accept border on which a conflict might easily be said to exist. Hard to detect unless you know whose paper is being pushed. To me this is a serious concern with blinding.

David Wagner wrote this e-mail to the steering committee of USENIX Security in support of double-blind reviewing.
-------- Original Message --------
Subject: Usenix: blind/non-blind submissions
Date: Tue, 5 Oct 2010 16:57:43 -0700 (PDT)
From: David Wagner
To: sec11pc@taverner.cs.berkeley.edu

Hi everyone, I want to solicit the committee's views on the question of blind vs non-blind submissions to Usenix Security. I see that there has already been some discussion on this, and that's great. Let me share a little more background about why I'm raising this issue.

About a year or two ago, I had an eye-opening experience with several female grad students working in computer security, who initially approached me to raise concerns about the use of non-blind reviewing in computer security conferences. That's led me on an interesting exploration of the literature, and since then I've had some unexpected conversations with folks working in the field, which I wanted to share with you. The female grad students raised concerns about the potential for gender bias in our reviewing, pointing out that there are well-documented cases of gender bias in other areas of human endeavor -- even in the sciences. For instance:

* There's the famous story of gender bias in orchestra try-outs, where moving to blind auditions seems to have increased the hiring of female musicians by up to 33% or so. http://www.economics.harvard.edu/faculty/goldin/files/orchestra.pdf Today some orchestras even go so far as to ask musicians to remove their shoes (or roll out thick carpets) before auditioning, to try to prevent gender-revealing cues from the sound of the auditioner's shoes.

* One study mailed out c.v.'s for a faculty position, but randomly swapped the gender of the name on some of them. They found that both men and women reviewers ranked supposedly-male job applicants higher than supposedly-female applicants -- even though the contents of the c.v. were identical. http://web.archive.org/web/20060910151719/http://www.case.edu/president/aaction/ImpactofGender.pdf Presumably, none of the reviewers thought of themselves as biased, yet their evaluations in fact exhibited gender bias. (However: in contrast to the gender bias at hiring time, if the reviewers were instead asked to evaluate whether a candidate should be granted tenure, the big gender differences disappeared. For whatever that's worth.)

* The Implicit Association Test illustrates how factors can bias our decision-making, without us realising it. For instance, a large fraction of the population has a tendency to associate men with career (professional life) and women with family (home life), without realizing it. The claim is that we have certain gender stereotypes and schemas which unconsciously influence the way we think. The interesting thing about the IAT is that you can take it yourself. If you want to give it a try, select the Gender-Career IAT here: https://implicit.harvard.edu/implicit/demo/ There's evidence that these unconscious biases affect our behavior. For instance, one study of recommendation letters written for 300 applicants (looking only at the ones who were eventually hired) found that, when writing about men, letter-writers were more likely to highlight the applicant's research and technical skills, while when writing about women, letter-writers were more likely to mention the applicant's teaching and interpersonal skills.

* There's a study of postdoctoral funding applications in Sweden, which found that women needed to be about 2.5 times as productive (in terms of papers published) as men, to be ranked equivalently.
http://www.advancingwomen.org/files/7/127.pdf Other studies have suggested that the Swedish experience may be an anomaly. (For instance, one meta-analysis I saw estimated that, on average, it appears men win about 7% more grant applications than women, but since this is not controlled according to the objective quality of the application, it does not necessarily imply the presence of gender bias in reviewing of grant applications.) Maybe many of you had heard of these examples before, but several of them were new to me.

The students raised the obvious question about whether these effects occur in our field, or in scientific peer reviewing. The story there seems to be less clear. As I studied the issue further, I discovered that there is some literature on gender bias in peer reviewing. I first found this study: http://www.onepoint.ca/Budden%20et%20al%202008.pdf The study reports experience from an ecology journal that switched from non-blind to blind reviewing. After the switch, they found a significant (~8%) increase in the acceptance rate for female-first-authored submissions. To put it another way, they saw a 33% increase in the fraction of published papers whose first author is female (28% -> 37%). Keep in mind that this is not a controlled experiment, so it proves correlation but not causation, and there appears to be controversy in the literature about the work. So it is at most a plausibility result that gender bias could be present in the sciences, but far from definitive.

There are many other studies in the literature on gender bias in peer reviewing. Taken as a whole, they seem... inconclusive. For instance:

* A study of a neurology journal found no significant differences in accept rates by author's gender.

* A study of an economics journal found no statistically significant difference in acceptance rates by gender. However the data also suggested that, while women's acceptance rates were the same under blind and non-blind review, men's acceptance rates were measurably higher under non-blind review, thus blind review seemed to reduce the difference between male and female acceptance rates.

* Snodgrass reports a third field that experienced significant increases in acceptance rates of female-authored papers after a change to blind reviewing.

* Some studies have had readers evaluate essays where the authors' names were randomly chosen, and found that essays supposedly written by women were on average rated lower than essays supposedly written by men. But, other studies failed to replicate those effects. And still other studies suggested that the magnitude of the effect may depend upon how male-dominated the field is and the expertise of the reviewer.

In any case, the first big eye-opener for me was the weight of evidence that unconscious gender bias can affect even contexts that we might expect to be free of bias. Overall, I'd say it's far from clear what the situation is for our field. I don't know of anyone who has attempted to measure whether unconscious gender bias affects reviewing in security conferences, so we don't know whether it is an issue for us. On the one hand, there is sufficient evidence of gender bias in some other contexts to make it at least plausible that our reviewing could be affected by unconscious gender bias. On the other hand, it's also entirely plausible that it's not an issue for us. The second big eye-opener for me was the strength of feelings that surround this issue, at least for some folks.
After hearing initial concerns from a few women grad students, I met with a local organization of female Berkeley CS grad students to hear their thoughts on blind vs non-blind reviewing. I was thoroughly surprised by the extent of their feelings on this issue. Out of about 15 female grad students, they were almost unanimous in opposition to, and skepticism about, non-blind reviewing. I was peppered with questions about how our field can defend non-blind reviewing -- phrased in a polite but clearly skeptical way. The tenor of the conversation was something along the lines of: "If there's even a chance of bias, how can program committees justify allowing non-blind reviewing to persist? Do PCs really allow administrative efficiency to trump equality? Why do they discount the potential for bias, when we know it affects so many other walks of life? Am I missing something? Are PCs just unaware of these issues? What's the best way that we can raise awareness about these concerns?" It was this meeting, more than anything else, which prompted me to learn more about the issue.

At the meeting, the students cited two main concerns about non-blind reviewing: (1) the potential for gender bias, and (2) the potential for 'name bias', i.e., bias in favor of well-known, well-established authors over unknowns, students, and junior researchers. They acknowledged that they don't know whether there is in fact any bias at all, but argued that we don't know there isn't any bias, either, and it seems naive to ignore the possibility. They also argued that program committee members are, by their very nature, selected from among well-established researchers with a good reputation, and thus are least likely to be personally concerned or feel strongly about the potential for 'name bias'.

My take: I wonder if we have a perception issue here. I wonder if others share these views. Even if our process is 100% fair, if it triggers this kind of negative reaction among a substantial fraction of authors, that seems undesirable. And, despite my focus on gender bias issues above, it's possible that the perception issue and the 'name bias' issue may be at least as important as gender bias. Of course, this was a conversation with a small group of students at one institution only, and may not be representative of the community as a whole, so it must be taken with a large grain of salt.

Anyway, all of this makes me think that it might be a good time for us to review our submission policy. So, I call for your continued input and views on what would be best for the community. Thank you to everyone who has already weighed in -- and I welcome further discussion! -- David

P.S. Sorry for the length of this email. At risk of really blowing past all reasonable length limits, I'll mention a few of my reactions to some arguments I've heard in this context, and some of the students' reactions:

* "I don't feel biased." or "I haven't noticed bias in reviewing." My view: By definition, you won't notice your own unconscious biases. Nor is it clear you'd notice other reviewers' biases. The concern is not that reviewers are deliberately biased; but rather that unconscious effects may be in play. I would bet that, in most of the examples I cited earlier where researchers eventually measured significant gender bias effects, the evaluators thought at the time they were unbiased and had no idea about their own unconscious biases.

* "But reviewers can often guess who the authors are anyway." My view: Agreed. On the other hand, the question is whether blind submission is better than non-blind, not whether it is perfect. If papers are effectively anonymous 50% or 75% of the time, is that better than 0%? If author names are not explicitly in front of the reviewer on the front page, does that help at all even for the remaining submissions where it would be possible to guess?

* "Blind submission sometimes creates a funny dance during PC meetings where some reviewers know who the authors are, but have to pretend they don't, and others don't know. I'd rather have all reviewers on a level playing field." My view: Point well-taken.

* "Blind submission makes it impossible to cite your own prior work without compromising your own anonymity." My view: I don't buy it. This is a solved problem. You cite your own work in the third person. It's true that in some cases it may be possible to guess your identity based upon the content of the paper (e.g., if the latest submission is a continuation of your past work in a direction that no one else is looking at), but it's just not true that blind submission prohibits citing your own work. We can certainly provide guidelines to authors on how to cite their own work; other conferences have done so.

* "Blind submission has created injustices in the past, where a paper was inappropriately rejected based upon supposedly-prior work which was actually by the same authors and not previously published." My view: This is indeed a serious issue. Perhaps this can be addressed by the program chair (since the chair has access to author names). I'm willing to take on the responsibility and extra work to watch out for such injustices, if the PC would like to move to blind submissions.

* "Blind submissions may increase the number of submissions, since authors won't be embarrassed to submit crap, so blind submissions may increase committee workload." My view: I don't know how to evaluate this one. It would be interesting to see some quantitative data on this. Did CCS 2010 get fewer submissions than CCS 2009? If yes, how much less? Does OSDI get fewer submissions than SOSP? If yes, how much less? Does Oakland get many fewer submissions than Usenix Security?

* "A blind submission policy probably won't change much anyway, so either way makes little difference." My view: Could well be right. It's an unknown, for sure.
Emery Berger shared the following experience:

Unfortunately, I don't think reviewers (or people, in general) are any good at self-reporting much of anything, especially not bias. This is a well-known problem in any survey-based approach (e.g., people report consuming 500-1000 fewer calories per day on average than they actually consume). Based on surveys, you would probably discover that everyone picks up litter, helps old ladies across the street, and drives courteously. Based on real life, you might conclude differently :). In any event, asking for a scientific answer to this question seems like an awfully high bar considering how completely unscientific and imperfect the entire reviewing process is. What are we trying to do here? We are trying to make the reviewing process more fair, not *perfectly* fair.

I also don't care if it is occasionally possible to correctly ascertain who wrote a paper in certain cases. I know that in one instance, I was *sure* a paper was written by a particular group, only to discover later that I was wrong. Of course, there are cases where, due to out-of-band knowledge (like a talk at their institution), a reviewer *can* know for sure who wrote a given paper. But who cares? No process is perfect. The point here is to do whatever one can to make sure that technical papers are being judged fairly. Blinding all the way through the decision process helps a lot more than the opposite. Managing conflicts is not hard (e.g., send each reviewer a list of who claims a conflict with them for them to vet such conflicts).

In any event, I think I do have some anecdotal evidence about double-blind reviewing and its effects. On one committee I was on, a lot of people were surprised by the relative obscurity of the sources of some of the accepted papers. A number of papers that were (correctly, IMHO) savaged by reviewers were by "famous" people, who subsequently got their papers accepted at other, non-blinded venues -- not proof, but suggestive. I have also seen noticeable influence on acceptances at non-blind PC meetings by features such as where a particular paper comes from. Whether or not you agree that such features deserve or do not deserve to be taken into account, I think it is difficult to argue that it does not make a difference.

In the interest of fairness, I personally prefer double-blind reviewing both as a reviewer and a committee member. I don't want to be affected (even subconsciously) by people's names / institutions when reviewing papers, since I believe they should be judged on their merit. I also don't want other people unduly influenced, especially people who say things like "yeah, that theorem *does* look wrong, but I am sure it works, because [so-and-so] does good work" (sadly, true story - and I have more). Another data point that unintentionally supports my belief that DBR is the way to go comes from two people who were each horrified by the idea of DBR: one (a European) because he was convinced that this would mean fewer European papers would get accepted, because "Europeans don't know how to write papers as well" (!), and another because "but if we reject papers from person X, he won't come to the conference!" It's true: DBR discriminates against poorly written papers regardless of who wrote them. Sounds great to me.
Kathryn McKinley shared these experiences with me that reveal gender bias, and problems with knowing paper authors.

Hi Mike, Did you read some of the source material [of Snodgrass's paper]? It is astonishingly compelling. [2] does a statistical analysis of citation rates for double- and single-blind venues and finds that double-blind venues are cited more frequently, controlling for other possible influences (authors, etc.). Snodgrass is not correct if he says that [2] is not conclusive on the difference between single and double. [7] and [8] show statistical bias against women researchers, and [8] also shows nepotism is rampant--we are however more sexist than nepotistic. For example, even if your advisor leaves the room, your paper is more likely to be accepted when your advisor or other senior mentor is on the selection committee (which actually argues for keeping the process blind to the end). I much prefer these statistics to anecdotes, but I have plenty of those as well.

** I was on two PCs (that did not employ double-blind) at which I watched senior people, who had not read the submissions, trash a woman's papers with no technical claims, saying things like she had enough PLDI papers on this topic already!?, and in one case got her paper rejected. I think it was ranked in the top ten.

** I was on a paper award committee (not blinded) where a woman's paper was vastly more influential than any other paper that year, but some crazy person argued for 15 minutes that an order of magnitude more citations than any other paper that year did not mean it had more influence!

** I was on another PC in which a senior guy, formerly of U, argued that we should _not_ hold the paper by a researcher at U to the same experimental standards as the rest of the papers. (I think this conflict was hidden because the PC did not reveal the names, so I only discovered that they were close friends later.)

** On a recent NSF panel, an MIT researcher wrote a bad proposal, and researcher X just kept repeating that it was an interesting and important area, a good researcher, etc., rather than saying why the actual research idea or evaluation plan was any good.

** On a PC where conflict-of-interest did not always require leaving the room, I argued to reject a paper of person X. The former PhD student of X stayed in the room and then made an enemy of X for me. Thanks so much. If you reveal names at the meeting, ask people with declared conflicts to leave, and anyone with an undeclared conflict to leave as well; that way the reviewers of a rejected paper are also more likely to remain private.

Kathryn also expressed an opinion about when to unblind the papers:
I think both never unblinding and unblinding part way through can work, but here was my reasoning.
1) Scientists want to think they are unbiased, but they are not.
2) Blinding helps them be less biased; their initial paper score counts a lot with ranking, etc.
3) PC members are unlikely to change their score immediately upon getting author names. You could audit when scores change, but I did not.
4) The community is small so some PC members will _know_ who the authors are. [Fact]
5) Unblinding at the meeting puts everyone in command of similar facts. For example, if the PC members are biased by "experience" or "MIT" or whatever, the other PC members will see it at the meeting. I had gone to Architecture PC meetings that were blind all the way through in which PC members advocated for papers they had conflicts with, but since it was blinded, no one knew until after the fact. Revealing at some point protects against PC member abuse.
Unblinding after rebuttal, vs. at the meeting, helps reviewers put the work in context of the authors' own work. If author A did X and then X++, they should probably compare with X, whereas if author B did X++, it might not be possible to replicate X.
Here I compare an ERC with two alternatives: ad-hoc external reviewers, and a light PC, as used in some systems conferences.
External review committee:
Advantages:
Light PC, as used in some systems conferences with tiered reviewing:
Advantages:
Heavy PC members in proceedings.
Ad hoc reviewers: The advantage here is highly targeted reviews: the best reviewer can be picked for each paper once we know what the papers are. The main disadvantage is that, with double-blind reviewing, doing this is too much work for the PC Chair, and he is unlikely to do a good job of it. In addition, it loses the additional benefit of a separate committee to handle PC papers.
I am persuaded that an in-person PC meeting, preceded by electronic discussion, is preferable to not having one.
Advantages:
Evaluation
A. Strong accept: possible award paper
B. Accept: will argue to accept
C. Weak accept: will not argue for
D. Weak reject: will not argue against
E. Reject: will argue to reject
F. Strong reject: hopeless
Novelty
A: Extreme. Exposes a new field or way of thinking about a field.
B: Solid. A new approach in an established field.
C: Incremental. A straightforward next step to an existing idea.
D: Known. This paper does not have anything new.
Worth solving
A: Critical. The paper is in an area that desperately needs a solution.
B: Useful. The paper is in an area that already has reasonable solutions, but prior solutions are not great.
C: Okay. The paper is in an area that already has good solutions.
D: Irrelevant. The paper solves a problem that is not worth solving.
Convincing
A: Totally convincing. The paper presents bullet-proof evidence (argument, proof, or data) to demonstrate its main points.
B: Typical. The evidence is not bullet-proof but is typical for papers in the area.
C: Weak. The paper presents weak evidence to demonstrate its main points.
D: Inadequate. I don't believe the main points in the paper.
Expertise
A. Expert - this is my area
B. Very knowledgeable - this is a familiar topic
C. Passing knowledge - know something about the area
D. Little knowledge - seen this area at a distance, at best
The first is the overall score. The next three are the reasons underlying the overall score: Is the idea in the paper new? Does it address an important problem? Is the paper convincing? This last criterion speaks not just to what the authors did (what experiments, proofs, etc. they describe) but also to issues like the paper's exposition. As such, an expert can temper her review if the paper is poorly written: e.g., if a paper is in a reviewer's field, she can proclaim herself an expert (A) but indicate that the paper is not convincing (D) if it is too hard to follow.
Amer Diwan, the PLDI 2009 chair, also had each reviewer give a score for each of the three criteria (convincing, worth solving, and novelty) according to how important they considered them to be in general, with the three scores adding up to 100. Note that this is one set per PC member, not one per paper. This allows the PC Chair to calibrate how a reviewer, in general, might use these three criteria to determine the overall score and to calibrate reviews from different sorts of reviewers. I think this is a useful exercise for reviewers, too, since it helps them expose otherwise unconscious motivations for decisions.
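To make the calibration idea concrete, here is a minimal sketch (in Python) of how a chair might combine a reviewer's per-paper criterion grades with that reviewer's self-declared weights to get a weighted score that can be compared against the reviewer's overall A-F score. The A-D to 4-1 numeric mapping, the function names, and the example data are my own assumptions for illustration; they are not part of Diwan's or POPL's actual tooling.

    # Hypothetical sketch: predict a reviewer's overall score from their
    # criterion grades and their self-declared weights (which sum to 100).
    # The grade-to-number mapping and example data below are assumptions.

    GRADE = {"A": 4.0, "B": 3.0, "C": 2.0, "D": 1.0}  # assumed numeric mapping

    def weighted_score(grades, weights):
        """grades: letter grade per criterion; weights: per-reviewer, summing to 100."""
        assert abs(sum(weights.values()) - 100) < 1e-9
        return sum(GRADE[grades[c]] * weights[c] / 100.0 for c in weights)

    # One reviewer's declared importance of each criterion (one set per PC member).
    reviewer_weights = {"convincing": 50, "worth_solving": 30, "novelty": 20}

    # That reviewer's grades for two made-up papers.
    papers = {
        "paper_17": {"convincing": "B", "worth_solving": "A", "novelty": "C"},
        "paper_42": {"convincing": "D", "worth_solving": "B", "novelty": "B"},
    }

    for pid, grades in papers.items():
        # Prints 3.1 for paper_17 and 2.0 for paper_42 with the data above.
        print(pid, round(weighted_score(grades, reviewer_weights), 2))

A chair could compare these weighted numbers with the overall scores the reviewer actually assigned, to see whether the reviewer's stated priorities match their scoring in practice, and to adjust for reviewers who weigh the criteria very differently.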