Developers at IBM on Friday released information on a new algorithm that predicts Twitter users’ location using the metadata contained in their last 200 non-geotagged tweets.
The researchers claim the algorithm is about 70 percent accurate. The basic idea is that users’ tweets contain valuable clues about their probable location.
“We present a new algorithm for inferring the home location of Twitter users at different granularities, including city, state, time zone or geographic region, using the content of users’ tweets and their tweeting behavior,” the researchers explained.
“Unlike existing approaches, our algorithm uses an ensemble of statistical and heuristic classifiers to predict locations and makes use of a geographic gazetteer dictionary to identify place-name entities.”
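As a rough illustration of what an ensemble of classifiers means in practice, the sketch below combines the guesses of several classifiers by simple majority vote. The article does not say how IBM's ensemble weighs its classifiers, so the voting rule, the function name `ensemble_predict` and the sample city names are all assumptions made for this example.

```python
from collections import Counter


def ensemble_predict(predictions):
    """Return the location that most classifiers voted for.

    `predictions` holds one guess per classifier; a classifier that
    cannot decide contributes None, which is ignored. Majority voting
    is an assumption here, not the researchers' documented method.
    """
    votes = Counter(p for p in predictions if p is not None)
    if not votes:
        return None  # no classifier produced a guess
    return votes.most_common(1)[0][0]


# Hypothetical outputs from three classifiers for a single user.
guess = ensemble_predict(["Austin", "Austin", "Dallas"])
print(guess)
```

A real system would likely weight each classifier by how reliable it proved on held-out data rather than counting every vote equally.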
Research head Jalal Mahmud said the IBM team began by testing whether it could predict the location of a Twitter account by analysing tweets and matching their content against geotagged metadata.
The team, made up of researchers Jalal Mahmud, Jeffrey Nichols and Clemens Drews, started by tracking geotagged tweets from the 100 largest cities in America between July and August 2011, and isolated 100 users from each city.
The last 200 tweets from each user were then examined by the team. Discounting private tweets from the mix, the team was left with 1.5 million geotagged tweets from almost 10,000 users.
Setting aside 10 percent of the data for later testing, the researchers analysed the remaining 90 percent, layer upon layer, to build the location-estimating algorithm.
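The 90/10 division described above is a standard hold-out split. A minimal sketch follows, assuming the split is made per user so that no account appears in both sets; the article does not say whether the researchers split by user or by tweet, and `split_users`, its parameters and the numeric user IDs are hypothetical.

```python
import random


def split_users(user_ids, test_fraction=0.1, seed=0):
    """Shuffle user IDs and hold back a fraction of them for testing.

    Returns (train_ids, test_ids). Splitting per user is an assumption
    made for this sketch.
    """
    ids = sorted(user_ids)            # fixed starting order
    random.Random(seed).shuffle(ids)  # reproducible shuffle
    n_test = round(len(ids) * test_fraction)
    return ids[n_test:], ids[:n_test]


# Roughly 10,000 users, as in the article; the IDs are placeholders.
train_ids, test_ids = split_users(range(10000))
print(len(train_ids), len(test_ids))
```

Holding the test users out before any analysis begins keeps the later accuracy figure from being inflated by data the algorithm has already seen.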
According to the research paper, the key to the algorithm is the additional information that the Twitter users include in their tweets.
Of the tweets in the team’s data collection, 100,000 were posted by users who had linked their Twitter accounts to the popular Foursquare location-based social networking platform. In another 300,000 tweets, users included the names of cities listed in the U.S. Geological Survey gazetteer.
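Spotting city names in a tweet amounts to a dictionary lookup against the gazetteer. The sketch below shows one simple way to do it, assuming a tiny hand-made set of names in place of the real gazetteer; `GAZETTEER`, `find_cities` and the sample tweet are invented for illustration and are not taken from the research paper.

```python
import re

# Toy stand-in for a gazetteer: a handful of lowercase city names.
GAZETTEER = {"austin", "san jose", "new york"}


def find_cities(tweet, gazetteer=GAZETTEER, max_words=3):
    """Return gazetteer city names mentioned in a tweet.

    Checks every run of 1 to `max_words` consecutive words, longest
    runs first, so multi-word names such as "san jose" are matched.
    """
    words = re.findall(r"[a-z]+", tweet.lower())
    found = []
    for n in range(max_words, 0, -1):
        for i in range(len(words) - n + 1):
            candidate = " ".join(words[i:i + n])
            if candidate in gazetteer and candidate not in found:
                found.append(candidate)
    return found


print(find_cities("Flying from San Jose to New York tonight"))
```

A plain lookup like this cannot tell a city from a common word or a person's name, which is one reason a production system would combine it with other signals.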