How we learnt to battle the bots
When Charlotte Betts, Dr Nicola Power and Dr Dermot Lynott discovered their data was in danger, they resolved to strengthen their detection methods. Here, they share their advice.
23 January 2024
Can you trust your own survey data? How can you tell if the responses are genuine or not? The escalating presence of bots is a concerning trend within online academic research. Bots are computer programs that can automatically complete online surveys, and, as we found in two of our recent research projects, distinguishing between a genuine human response and an automated input is becoming increasingly difficult.
Worryingly, the prevalence of bot responses seems to be on the rise, perhaps related to the latest advances in artificial intelligence. Researchers must now adopt innovative strategies to reduce the impact of bots on survey data and to preserve the integrity of the research and the validity of the underlying data.
Here, we share our own recent experiences of encountering bots with practical advice on how to spot them.
Raising the alarm
We came together after a chance encounter on X (formerly Twitter) where Dermot reached out to Nikki and Charlotte having noticed that they were collecting online survey data. Dermot, having recently battled the bots in his own research (see Silverstein et al., 2023), raised the alarm that survey responses were at risk. Nikki and Charlotte looked at their data and realised something was not right.
Their survey had been live for a matter of days, and they'd had an influx of responses. Considering that the survey was targeted at a hard-to-reach population (the emergency services), it was evident that something was amiss. The survey was closed immediately, and the responses were examined to check validity. From a total of 657 responses, an overwhelming 95 per cent were found to have been completed by bots.
Dermot's experience was similar in his research exploring the open science practices of psychologists. Following initially slow recruitment, in which participants were emailed directly and invited to take part, the research team created a social media version of the survey. On reviewing the data, 87 per cent of responses were classed as suspicious. By contrast, surveys completed via direct email invitation contained zero cases identified as bots or considered suspicious.
In both cases we were shocked by the extent of the bot responses, but saw that our experience was shared by others. For example, in two online surveys of the US beekeeping industry, Goodrich and colleagues (2023) found 72 per cent and 96 per cent of responses to be fraudulent. Evidently, survey bots are agnostic to the disciplines they target and have become an increasing problem for online data collection, posing a threat to data and research integrity.
Based on our experiences, we share some examples of detection methods that can be used to spot a bot in your online data. These methods were generated from mixed-methods research involving both quantitative and qualitative online surveys. We recommend that researchers combine detection methods to maximise identification, as bots can be very convincing! We also suggest some strategies that researchers can adopt to prevent bots from sneaking into their data.
Bot-spotting examples
We propose that one of the key methods of bot detection is to examine responses to open-text questions. We found that bots would provide inconsistent, duplicate, inappropriate and/or grammatically suspicious open-text responses. Such responses were not always immediately suspicious; however, examination of the entire dataset revealed important patterns indicative of bot activity.
For example, in Power and Betts' survey, participants were asked an open-text question about which features of training they deemed important for joint emergency working. The response 'Strengthen cooperation and communication: Promote cooperation and exchange among institutions to share best practices and experiences to enhance interoperability training' might seem genuine at first glance; however, upon examination of the full dataset, this response was repeated verbatim by 10 different 'participants'. We had spotted a bot.
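As a minimal sketch of this kind of duplicate check, the Python snippet below flags open-text answers repeated verbatim across participants. The column name open_text and the example data are our own invented stand-ins, not the actual survey fields.

```python
import pandas as pd

# Hypothetical survey export: each row is one participant's response.
df = pd.DataFrame({
    "participant_id": [1, 2, 3, 4, 5],
    "open_text": [
        "Strengthen cooperation and communication to enhance training",
        "More joint exercises with local services",
        "Strengthen cooperation and communication to enhance training",
        "Strengthen cooperation and communication to enhance training",
        "n/a",
    ],
})

# Normalise whitespace and case so trivially restyled duplicates still match.
normalised = df["open_text"].str.strip().str.lower()

# Count how many participants gave each exact answer.
counts = normalised.value_counts()

# Flag any answer given verbatim by more than one participant.
df["duplicate_text"] = normalised.map(counts) > 1
print(df[df["duplicate_text"]])
```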
Quantitative responses can also be examined to identify potential bots. Dermot identified that, when responding on a 1 to 7 rating scale, bots tended to present peculiar rating patterns and exhibited lower variance than human responses. For example, while a set of ratings on a 1-7 Likert scale from a genuine participant might look like this: [3, 7, 2, 2, 4, 4, 6, 1], a bot response might look more like this: [5, 5, 6, 5, 5, 5, 6, 5].
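One rough way to operationalise this is to compute the variance of each participant's ratings and flag unusually flat patterns, as in the sketch below. The item names and the threshold are arbitrary placeholders; in practice any cut-off would need calibrating against known human responses.

```python
import pandas as pd

ITEMS = [f"q{i}" for i in range(1, 9)]

# Hypothetical wide-format ratings: one row per participant, items q1-q8 on a 1-7 scale.
ratings = pd.DataFrame(
    [[3, 7, 2, 2, 4, 4, 6, 1],   # plausible human pattern
     [5, 5, 6, 5, 5, 5, 6, 5]],  # suspiciously flat pattern
    columns=ITEMS,
)

# Per-participant variance across all rating items.
ratings["rating_variance"] = ratings[ITEMS].var(axis=1)

# Flag participants whose variance falls below an (arbitrary) threshold.
VARIANCE_THRESHOLD = 1.0  # placeholder; calibrate on verified human data
ratings["low_variance_flag"] = ratings["rating_variance"] < VARIANCE_THRESHOLD
print(ratings[["rating_variance", "low_variance_flag"]])
```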
Further, fraudulent responses can be identified from the metadata recorded by a survey platform or from contact information participants provide. Several responses may start and/or end at the same or similar times: for example, one survey recorded 10 participants starting at 10:17am, then 8 starting at 10:18am, then 8 more at 10:19am. Additionally, in both surveys participants could provide their email address to enter a prize draw; bot-supplied addresses were characterised by unusual formatting, capitalisation, and the inclusion of numbers or random additional letters.
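The sketch below illustrates both metadata checks: grouping responses that start in the same minute, and applying a crude heuristic to prize-draw email addresses. The column names, cluster size, and the patterns flagged are illustrative assumptions, not the rules we used.

```python
import re
import pandas as pd

df = pd.DataFrame({
    "start_time": pd.to_datetime([
        "2023-06-01 10:17", "2023-06-01 10:17", "2023-06-01 10:17",
        "2023-06-01 10:18", "2023-06-01 14:02",
    ]),
    "email": [
        "JaNe.DOE1234xq@example.com",
        "mt7781kkq@example.com",
        "alex.smith@example.com",
        "RRobertt999a@example.com",
        "pat.jones@example.com",
    ],
})

# Flag clusters of responses starting within the same minute.
starts_per_minute = df.groupby(df["start_time"].dt.floor("min"))["start_time"].transform("size")
df["clustered_start"] = starts_per_minute >= 3  # placeholder cluster size

# Crude email heuristics: odd internal capitalisation or long digit runs.
def suspicious_email(addr: str) -> bool:
    local = addr.split("@")[0]
    mixed_caps = re.search(r"[a-z][A-Z]", local) is not None
    digit_run = re.search(r"\d{3,}", local) is not None
    return mixed_caps or digit_run

df["suspicious_email"] = df["email"].map(suspicious_email)
print(df[["start_time", "clustered_start", "email", "suspicious_email"]])
```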
Based on our shared experiences, we have created a list of detection methods that can be used to spot bots in your dataset.
• Exploring qualitative responses. The content, structure, patterns and possible inconsistencies of open-text responses can reveal whether the respondent is a human or a bot. This may be immediately apparent or may require closer inspection of the dataset.
• Verifying quantitative responses. Bots may be more likely to go undetected in quantitative responses. We recommend that such data are analysed and compared with genuine human responses to identify peculiar rating patterns (e.g., low variability).
• Considering the data holistically. Researchers should look at their dataset holistically to uncover unusual patterns, such as repeated verbatim qualitative responses and similar survey start and end times.
• Working as a team. Because bots are increasingly able to resemble human responses, it is very easy for one person to miss something. We recommend that a second researcher reviews the data. Even better is to blind-code a sample and compare responses, iteratively sifting the data to increase confidence that the right data are being retained.
• Spotting multiple red flags. Do not consider red flags in isolation. Rather, if a researcher sees a constellation of such flags for any given participant, it is more likely to be a bot response: for example, an unusual email address combined with a word-perfect qualitative response, or a preponderance of responses claiming affiliation with a particular institution. Bots can be very convincing, so combining detection methods is vital to maximise identification; one simple way to combine flags into a per-participant score is sketched after this list.
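A simple way to operationalise a 'constellation of flags' is to sum individual boolean flags into a score per participant and review anyone above a threshold. In this sketch the flag columns are assumed to come from checks like those above, and the threshold of two flags is our own illustrative choice, not a validated cut-off.

```python
import pandas as pd

# Hypothetical per-participant flags produced by earlier checks.
flags = pd.DataFrame({
    "duplicate_text":   [True,  False, True,  False],
    "low_variance":     [True,  False, False, False],
    "clustered_start":  [False, False, True,  False],
    "suspicious_email": [True,  False, False, True],
})

# A response with several independent red flags is more likely to be a bot
# than one with a single flag in isolation.
flags["red_flag_count"] = flags.sum(axis=1)
flags["review_as_bot"] = flags["red_flag_count"] >= 2  # illustrative threshold

print(flags)
```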
Can you block a bot?
In the face of such a diverse array of bot patterns, how should researchers protect themselves from a bot assault? Can we devise methods that will block bots completely? Or, at least, help us to more easily spot those bots that get through the net?
We implemented several measures in our surveys to prevent and identify bots, ranging from using Captcha and fraud detection tools, to implementing additional questions, and planning how our surveys would be distributed. Some of the methods we used within our own research include:
• Use of survey fraud tools. We included Captcha verification at the beginning of our surveys, which presented respondents with words or characters that had to be correctly typed out before proceeding with the survey. Additionally, fraud-detection tools can prevent participants using the same IP address from completing the survey multiple times. However, such measures are not infallible: we found that some bot responses received low fraud-detection scores.
• Survey distribution. We created two versions of our surveys: one for email distribution and one for social media. This strategy ensured that the email survey was entirely bot-free and could be treated as reliable data. We recommend that recipients of survey links are explicitly asked not to forward the link or share it via social media, to protect the email version.
• Implementing additional questions. Attention-check questions can help detect bots that get through the net. For example, in both our surveys we asked whether participants had recently been on a trip to Mars; those answering 'yes' or 'unsure' could be quickly removed from the dataset. In addition, including cross-validation questions, such as asking for both country of residence and current university, allowed us to identify incompatible responses (a minimal filtering sketch follows this list).
• Mixed-methods approach. Open-text responses were extremely useful in detecting bots. Even if you are conducting quantitative research, we recommend including at least one open-text question, such as asking participants for 'any final comments?' at the end of the survey.
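As a minimal sketch of the attention-check filtering and cross-validation described above (the column names, response coding, and lookup table are our invented stand-ins):

```python
import pandas as pd

# Hypothetical responses to the attention check
# 'Have you recently been on a trip to Mars?'
df = pd.DataFrame({
    "participant_id": [1, 2, 3, 4],
    "mars_trip": ["no", "yes", "no", "unsure"],
    "country": ["UK", "UK", "Ireland", "UK"],
    "university": ["University of Liverpool", "Maynooth University",
                   "University of Liverpool", "Maynooth University"],
})

# Remove anyone answering 'yes' or 'unsure' to the impossible question.
clean = df[df["mars_trip"] == "no"].copy()

# Cross-validation: flag country/university combinations that should not
# co-occur (this lookup table is an invented illustration).
expected_country = {"University of Liverpool": "UK", "Maynooth University": "Ireland"}
clean["mismatch"] = clean["university"].map(expected_country) != clean["country"]
print(clean)
```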
A bot-free future?
Given the persistent growth of artificial intelligence, particularly in conjunction with large language models like ChatGPT, researchers must continually refine strategies to discern human responses from bot-generated ones. This task is not easy, but raising awareness of how to spot a bot is a first step in our collective effort to protect the integrity of psychological research.
Although several preventative measures can be implemented with some success, we have learned that online surveys continue to be plagued by bots. Additionally, we anticipate that it will become increasingly difficult to distinguish between genuine responses and bots in the future, as they learn how to evade verification measures and generate more realistic responses. Our list of preventative measures is by no means exhaustive, so we'd love to hear what methods other researchers have used.
Miss Charlotte Betts¹, Dr Nicola Power¹, Dr Dermot Lynott²
¹University of Liverpool Management School, University of Liverpool
²Department of Psychology, Maynooth University
References
Goodrich, B., Fenton, M., Penn, J., Bovay, J. & Mountain, T. (2023). Battling bots: Experiences and strategies to mitigate fraudulent responses in online surveys. Applied Economic Perspectives and Policy, 45(2), 762-784.
Griffin, M., Martino, R. J., LoSchiavo, C., Comer-Carruthers, C., Krause, K. D., Stults, C. B. & Halkitis, P. N. (2021). Ensuring survey research data integrity in the era of internet bots. Quality & Quantity, 1-12.
Silverstein, P., Pennington, C. R., Branney, P., O'Connor, D. B., Lawlor, E., O'Brien, E., & Lynott, D. (2023, May 22). A Registered Report Survey of Open Research Practices in Psychology Departments in the UK and Ireland.
Storozuk, A., Ashley, M., Delage, V. & Maloney, E. A. (2020). Got bots? Practical recommendations to protect online survey data from bot attacks. The Quantitative Methods for Psychology, 16(5), 472-481.