Over the last decade, there have been increasing calls for robust impact evaluations of voluntary agricultural sustainability standards (VSS). In response, this study reviews the literature on 13 major agricultural standards, asking: Where are certified crops being studied? Which sustainability outcomes and indicators are measured? And finally, what does the current evidence base suggest about VSS outcomes? The analysis of 45 peer-reviewed articles suggests a mismatch between what is certified and what is studied. Some crops and standards are over-represented in the literature relative to their amount of certified production (e.g. coffee and Fairtrade certification), while others are under-represented (cotton, sugar, cocoa, soy, and palm oil, in addition to Organic certification). The review also identifies countries that appear to be under-represented in the literature, including Brazil, Australia, Malaysia, the Ivory Coast, and the United States. When measuring success, economic indicators are the most frequently evaluated, and only 20% of studies analyze economic, social, and environmental indicators simultaneously. When grouped by case, indicator results are most often positive (51% on average), followed by no difference (41%) and negative (8%). There are no significant differences among sustainability pillars in the average proportion of positive and negative results. These findings should be interpreted carefully, since the evidence base is heavily weighted towards coffee certification (75% of cases analyzed) and impacts are highly context-dependent. Finally, the review identifies best practices for conducting robust evaluations, including the importance of addressing sustainability trade-offs and appropriately measuring environmental outcomes. While significant gaps remain, the findings indicate an increase in research that credibly measures VSS impacts.
Published under a Creative Commons BY-NC-ND 4.0 license (https://creativecommons.org/licenses/by-nc-nd/4.0/).