A new study, published in the journal Science, cautions that researchers who mine data from social networks such as Facebook and Twitter should be wary of serious pitfalls that can arise from working with such data.
The resulting errors can have huge implications, as thousands of research papers based on data gleaned from social media are now published each year.
“Publicly available data feeds used in social media research do not always provide an accurate representation of the platform’s overall data – and researchers are generally in the dark about when and how social media providers filter their data streams,” explained Derek Ruths, assistant professor at McGill University in Montreal, Canada.
“A large number of spammers and bots, which masquerade as normal users on social media, get mistakenly incorporated into many measurements and predictions of human behaviour,” Ruths said.
The design of social media platforms can dictate how users behave and, therefore, what behaviour can be measured.
“For instance, on Facebook the absence of a ‘dislike’ button makes negative responses to content harder to detect than positive ‘likes’,” added study co-author Jurgen Pfeffer of Carnegie Mellon University’s Institute for Software Research.
Researchers often report results only for groups of easy-to-classify users, topics and events, which makes new methods seem more accurate than they actually are.
For instance, efforts to infer the political orientation of Twitter users achieve barely 65 percent accuracy for typical users, even though studies focusing on politically active users have claimed 90 percent accuracy, the authors contended.
“The common thread in all these issues is the need for researchers to be more acutely aware of what they are actually analysing when working with social media data,” Ruths concluded.