How to Detect Attacks - Bot Prevention

All data collected online should be checked for bots or other fraudulent responses. In fact, we argue that journals should require researchers who collect data online (including via crowdsourcing platforms like Prolific) to briefly describe in their article how they checked for bots. There is no perfect (or easy) way to check data, so researchers should consider the multiple factors listed below.

Time to Complete Survey

Prior to collecting data, estimate how quickly participants could plausibly fill out the survey. This is challenging to do, but we knew that no one could reasonably complete our survey in less than 5 minutes, so we removed every response completed faster than that from our dataset.
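A minimal sketch of this check in Python, assuming each response records a start time and a submission time. The response IDs, the timestamps, and the function name are hypothetical, and the 5-minute floor should be replaced with whatever is plausible for your own survey.

```python
from datetime import datetime, timedelta

# Assumed threshold: the 5-minute floor used as an example; adjust per survey.
MIN_DURATION = timedelta(minutes=5)

def is_too_fast(started_at, submitted_at, minimum=MIN_DURATION):
    """Return True when the survey was completed faster than the minimum."""
    return (submitted_at - started_at) < minimum

# Hypothetical responses: (response id, start time, submit time).
responses = [
    ("r1", datetime(2024, 1, 5, 10, 0, 0), datetime(2024, 1, 5, 10, 3, 30)),
    ("r2", datetime(2024, 1, 5, 10, 0, 0), datetime(2024, 1, 5, 10, 12, 0)),
]

# Split responses into those to review or remove and those to keep.
too_fast = [rid for rid, start, end in responses if is_too_fast(start, end)]
kept = [rid for rid, start, end in responses if not is_too_fast(start, end)]
```

Here r1 took 3.5 minutes and lands in too_fast, while r2 took 12 minutes and is kept.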

Batches of Surveys

Check whether surveys were completed around the same time. We found that fraudulent data often came from surveys submitted at the same time or at regular intervals (e.g., all about a minute apart).
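One way to look for such batches is to sort the submission times and search for runs of submissions separated by nearly equal gaps. The sketch below assumes you have one timestamp per response; the function name, the run length of 4, and the 5-second tolerance are illustrative choices, not values from our study.

```python
from datetime import datetime, timedelta

def find_regular_runs(timestamps, min_run=4, tolerance=timedelta(seconds=5)):
    """Return runs of consecutive submissions whose gaps are nearly equal.

    Submissions arriving at the same moment have gaps near zero, so they
    are caught as well. min_run and tolerance are assumptions to tune.
    """
    times = sorted(timestamps)
    runs = []
    current = times[:1]
    prev_gap = None
    for earlier, later in zip(times, times[1:]):
        gap = later - earlier
        if prev_gap is None or abs(gap - prev_gap) <= tolerance:
            current.append(later)
        else:
            # The rhythm broke: keep the run if it is long enough.
            if len(current) >= min_run:
                runs.append(current)
            current = [earlier, later]
        prev_gap = gap
    if len(current) >= min_run:
        runs.append(current)
    return runs

# Hypothetical submission times: four about a minute apart, then two stragglers.
submitted = [
    datetime(2024, 1, 5, 10, 0, 0),
    datetime(2024, 1, 5, 10, 1, 0),
    datetime(2024, 1, 5, 10, 2, 1),
    datetime(2024, 1, 5, 10, 3, 0),
    datetime(2024, 1, 5, 10, 17, 42),
    datetime(2024, 1, 5, 10, 40, 5),
]
runs = find_regular_runs(submitted)
```

A flagged run is only a prompt for closer inspection; legitimate participants recruited at the same moment can also submit close together.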

Data Inconsistencies

Make sure the data make sense. For example, some of our respondents reported being employed in their current occupation longer than they had been employed in their field. If you ask participants the same question multiple ways (see methods to prevent bots), check that the responses are consistent.
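Checks like these can be written as simple rules. The sketch below assumes hypothetical field names (years_in_occupation, years_in_field, and an age question asked once as typed text and once as a selection); substitute the items from your own survey.

```python
def find_inconsistencies(response):
    """Return a list of human-readable problems found in one response."""
    problems = []
    # Time in the current occupation cannot exceed time in the field.
    if response["years_in_occupation"] > response["years_in_field"]:
        problems.append("occupation tenure exceeds field tenure")
    # The same question asked two ways should yield matching answers.
    if response["age_typed"] != response["age_selected"]:
        problems.append("age answers disagree")
    return problems

# Hypothetical responses keyed by response id.
responses = {
    "r1": {"years_in_occupation": 12, "years_in_field": 4,
           "age_typed": 35, "age_selected": 35},
    "r2": {"years_in_occupation": 3, "years_in_field": 8,
           "age_typed": 41, "age_selected": 41},
    "r3": {"years_in_occupation": 2, "years_in_field": 5,
           "age_typed": 29, "age_selected": 52},
}

# Collect only the responses that have at least one problem.
flagged = {}
for rid, response in responses.items():
    problems = find_inconsistencies(response)
    if problems:
        flagged[rid] = problems
```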

Answered Honeypot Questions

Honeypot questions are hidden from human participants, so an answered honeypot question means the response did not come from a real person.
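As a rough sketch, this check amounts to testing whether any honeypot field is non-blank. The field name hp_website and the sample responses are hypothetical; use the names of the hidden items in your own survey export.

```python
def answered_honeypot(response, honeypot_fields=("hp_website",)):
    """Return True if any honeypot field contains a non-blank answer."""
    for field in honeypot_fields:
        value = response.get(field)
        if value is not None and str(value).strip() != "":
            return True
    return False

# Hypothetical responses keyed by response id.
responses = {
    "r1": {"hp_website": ""},
    "r2": {"hp_website": "http://example.com"},
    "r3": {},  # honeypot field missing from the export entirely
}

flagged = [rid for rid, response in responses.items() if answered_honeypot(response)]
```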

Too Polished or Confusing Open-ended Question Responses

Previously, bots’ open-ended question responses often didn’t make sense. However, generative AI makes it much easier for bots to give reasonable responses to open-ended questions. In our study, we found these responses to be too polished and formal, with perfect grammar and punctuation and wording that sounded like it came straight from a scale measuring a similar construct. We also found that when surveys were completed around the same time (in batches), the structure of the responses was the same, which gave us reason to believe the data were fraudulent.

Suspicious Email Addresses

Email addresses (submitted by participants for gift card payment) that contain seemingly random strings of letters and numbers, especially with more than 4 digits, can be a red flag.
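A simple screen for this pattern might count the digits in the part of the address before the @ sign. This is one possible reading of the rule, and the function name, the threshold parameter, and the example addresses are all hypothetical. It does not try to judge whether the letters look random, which still needs a human eye.

```python
def has_many_digits(email, max_digits=4):
    """Return True if the part before '@' contains more than max_digits digits."""
    local_part = email.split("@", 1)[0]
    digit_count = sum(ch.isdigit() for ch in local_part)
    return digit_count > max_digits

# Hypothetical addresses for illustration only.
emails = [
    "jane.doe@example.com",
    "sam1990@example.com",
    "xkqz82719fj@example.com",
]

suspicious = [email for email in emails if has_many_digits(email)]
```

Treat a match as a reason to look more closely at the response, not as proof of fraud, since real participants also use addresses with long numbers.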

Bot Detecting Algorithms

These can serve as a resource or first step for detecting bots, but they cannot detect bots with 100% accuracy. Researchers should follow up on their results with additional manual checking.