by Daniel Velkov, for comments go to Hacker News
The story starts with an insightful graphic published in The Economist. The original work was done by the McKinsey Global Institute and is based on data collected by the University of Groningen (link to the raw spreadsheet).
The graphic plots the movement of the world's geographic center of economic activity for the years between 1 and 2025. One thing bothered me, and I hope it bothers you too: the points from the 20th century are all positioned in or above Scandinavia, which seems unlikely to be the center of anything in the world. You can find the reason in the caption on the McKinsey webpage. Their report treats the Earth as a sphere and finds the economic center of gravity, which falls somewhere inside the sphere. To plot it on the map, they draw a radius through the center of gravity and intersect it with the surface. To see why this is a problem, consider the extreme case where only the USA and China exist and their economies are equally sized. In this setup the economic center will be close to the North Pole, because the two countries happen to be on almost exactly opposite sides of the Earth and at roughly the same latitude.
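To make the USA/China thought experiment concrete, here is a minimal sketch of the spherical method as I understand it from the caption. The two coordinates are rough country centroids that I picked for illustration, and `surface_center` is a hypothetical helper, not code from the McKinsey report.

```python
import math

def to_xyz(lat_deg, lon_deg):
    """Convert latitude/longitude in degrees to a point on the unit sphere."""
    lat, lon = math.radians(lat_deg), math.radians(lon_deg)
    return (math.cos(lat) * math.cos(lon),
            math.cos(lat) * math.sin(lon),
            math.sin(lat))

def surface_center(points):
    """points is a list of (lat, lon, weight) tuples.

    Returns the latitude and longitude where the radius through the 3D
    center of gravity meets the surface, plus the distance of the center
    of gravity from the Earth's center as a fraction of the radius.
    """
    total = sum(w for _, _, w in points)
    x = y = z = 0.0
    for lat, lon, w in points:
        px, py, pz = to_xyz(lat, lon)
        x += w * px / total
        y += w * py / total
        z += w * pz / total
    lat = math.degrees(math.atan2(z, math.hypot(x, y)))
    lon = math.degrees(math.atan2(y, x))
    return lat, lon, math.sqrt(x * x + y * y + z * z)

# Assumed approximate centroids, equal economic weight
usa = (39.8, -98.6, 1.0)
china = (35.9, 104.2, 1.0)
lat, lon, radius_fraction = surface_center([usa, china])
print(lat, lon, radius_fraction)
```

With these inputs the projected point lands well north of both countries, even though each of them sits below 40 degrees of latitude.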
I had an idea for visualizing the same data in a better and more intuitive way. Instead of the original method, I did all the calculations in 2D with respect to the usual map coordinates. This has the nice property that in the USA/China case from above, the center ends up somewhere near Spain, which is the midway point between the USA and China on most maps. The result of this approach is:
You can see that it shows the same directional trends as the first chart. In the year 1 AD the world's economic center is close to China and India. Looking back at the spreadsheet for that year, you can see that the total economic output of Asia is 5 times larger than Europe's. There is a slight move to the east in the years 1-1600, which is due to the decline of the Roman Empire and of Europe in general during that time. As time progresses, Europe's economy rises and the center starts shifting west. With the help of the US, this movement continues until 1950, when the trend reverses. The return of the Asian economies starts pushing the point east, and in 2000 the position is at the same level on the east-west axis as it was in 1900.
The rest of this post describes how to gather the necessary data and produce the plot. It includes processing the raw spreadsheet data with the help of pandas and plotting with matplotlib and its basemap toolkit.
from mpl_toolkits.basemap import Basemap
import numpy as np
import matplotlib.pyplot as plt
from matplotlib import cm
%matplotlib inline
import pandas as pd
The first step was to find a source for historical GDP data by country. One of the top results when searching for "historical gdp data by country" was this spreadsheet which is what The Economist and McKinsey are using.
# skipfooter is only supported by the Python parsing engine
gdp = pd.read_csv('horizontal-file_03-2007 - GDP.csv', skipfooter=4, thousands=',', engine='python')
gdp.iloc[:5, :10]
The values are in inflation adjusted 1990 US dollars.
gdp.rename(columns = {'Unnamed: 0':'Country'}, inplace=True)
Some of the country names had an extra space at the end:
gdp.Country[:10].values
Let's clean those up. Also set the Country column as an index which we'll use when joining with the geo coordinates data later.
gdp.Country = gdp.Country.map(lambda s: s.strip())
gdp.set_index('Country', inplace=True)
As you saw in the table above there are some NaN values. To ensure a smooth series of values for each country I fill the missing values with the nearest known value to the left. To handle the countries with an NaN in the first column we can set those cells to 0.
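# keep only the cells that were parsed as floats, everything else becomes NaN
gdp_fill = gdp.applymap(lambda v: v if type(v) == np.float64 else np.nan)
# astype returns a new table, so the result has to be assigned back
gdp_fill = gdp_fill.astype(float)
# countries with no value in the first year start from 0
gdp_fill.iloc[:, 0] = gdp_fill.iloc[:, 0].fillna(0)
# pad every remaining gap with the nearest known value to the left
gdp_fill = gdp_fill.ffill(axis=1)
gdp_fill.iloc[:5, :10]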
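To see what the padding does, here is a small sketch on a made-up table (the countries `A` and `B` and all the numbers are invented for illustration):

```python
import numpy as np
import pandas as pd

# Toy GDP table: rows are countries, columns are years
toy = pd.DataFrame(
    {"1": [np.nan, 10.0],
     "1000": [5.0, np.nan],
     "1500": [np.nan, np.nan],
     "1600": [8.0, 12.0]},
    index=["A", "B"])

# Missing values in the first year become 0 ...
first_col = toy.columns[0]
toy[first_col] = toy[first_col].fillna(0)
# ... and every other gap takes the nearest known value to its left
filled = toy.ffill(axis=1)
print(filled)
```

Country `A` starts at 0 and then carries 5 forward until its next known value, while country `B` carries its initial 10 forward across both gaps.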
The second piece of data that I need is a list of the geographic centers for each country. A quick search led me to http://gothos.info/resources/ which has exactly what I need.
coord = pd.read_table('country_centroids_primary.csv', sep='\t')
coord.rename(columns = {'SHORT_NAME':'Country'}, inplace=True)
coord.iloc[:5]
One problem that I found was that there were countries present in one table but not the other. Here are the lists of all those cases:
print("Has coordinates but no GDP: %s" % sorted(set(coord.Country) - set(gdp_fill.index)))
print("\nHas GDP but no coordinates: %s" % sorted(set(gdp_fill.index) - set(coord.Country)))
Looking through the two lists, the only country worth fixing was Russia (or Russian Federation).
coord.loc[coord.Country == 'Russia', 'Country'] = 'Russian Federation'
A related problem was the separate entries for Russia and the USSR. To reconcile those, I filled the empty values for Russia with the ones from the USSR.
rf = gdp.loc['Russian Federation']
ussr = gdp_fill.loc['Total Former USSR']
# a single .loc assignment writes into gdp_fill itself, not into a temporary copy
gdp_fill.loc['Russian Federation', rf.isnull()] = ussr[rf.isnull()].values
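The same fill-from-another-row pattern on a made-up table, as a sketch (the row names `Successor` and `Union` and the values are invented):

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame(
    {"1900": [np.nan, 150.0],
     "1950": [np.nan, 500.0],
     "2000": [900.0, 1300.0]},
    index=["Successor", "Union"])

# Columns where the successor state has no data of its own
missing = toy.loc["Successor"].isnull()
# Borrow the union's values for exactly those columns
toy.loc["Successor", missing] = toy.loc["Union", missing].values
print(toy)
```

Only the two empty cells are overwritten; the value the successor already had for 2000 is left alone.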
Set the Country column as the index for the coordinates table and join the two tables.
coord.set_index('Country', inplace=True)
joined = pd.merge(gdp_fill, coord, left_index=True, right_index=True, how='inner', sort=True)
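A quick sketch of what the inner join does to names that appear in only one table, using invented country names `A` to `D`:

```python
import pandas as pd

gdp_toy = pd.DataFrame({"2000": [10.0, 20.0, 30.0]}, index=["A", "B", "C"])
coord_toy = pd.DataFrame({"LAT": [1.0, 2.0], "LONG": [3.0, 4.0]}, index=["B", "D"])

# Names present in only one of the two tables
only_gdp = sorted(set(gdp_toy.index) - set(coord_toy.index))
only_coord = sorted(set(coord_toy.index) - set(gdp_toy.index))

# An inner join on the index keeps only the names present in both
joined_toy = pd.merge(gdp_toy, coord_toy, left_index=True, right_index=True,
                      how="inner", sort=True)
print(only_gdp, only_coord)
print(joined_toy)
```

Rows without a match are dropped silently, which is why it pays to check the two difference lists before joining.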
The new table named joined now contains the columns from both tables. We'll need to know where the GDP table's columns end:
joined.columns[187:191]
N_years = 189
Now we have everything ready to plot the data. First let's look at the number of GDP entries per year (before we padded the gdp table):
plt.plot([sum(gdp.iloc[:, x].notnull()) for x in range(N_years)])
Those initial years with the lowest counts will result in overlapping points in the plot, because the padding we did takes the values from the previous column. When a year has only a few original entries, consecutive centers end up almost identical. To remove them, I only plot the center for years with at least 15 valid entries.
threshold = 14
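Here is the filtering step on a made-up table, as a sketch. `toy_threshold` plays the role of `threshold`, and the strict `>` comparison means a threshold of 1 keeps the years with at least 2 entries, just as 14 keeps the years with at least 15.

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame(
    {"1": [1.0, np.nan, np.nan],
     "1500": [2.0, 3.0, np.nan],
     "1900": [4.0, 5.0, 6.0]})

toy_threshold = 1
# Number of original (non-padded) entries in each year column
counts = toy.notnull().sum()
# Positions of the year columns that have enough data to be plotted
keep = [i for i in range(toy.shape[1]) if counts.iloc[i] > toy_threshold]
print(counts.tolist(), keep)
```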
It's time to plot what we have. The most important lines in the next code block are:
lat, lon = year_gdp * joined.LAT, year_gdp * joined.LONG
total = sum(year_gdp)
x, y = world(sum(lon)/total, sum(lat)/total)
To find the economic center we use the formula for the center of mass. We weight the coordinates for each country by the respective GDP, take the sum of the weighted coordinates and divide by the sum of all the GDPs.
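Written out on its own, the weighted average looks like the sketch below. `economic_center` is a hypothetical helper, and the two coordinates are rough centroids for the USA and China that I picked for illustration.

```python
import numpy as np

def economic_center(gdp, lat, lon):
    """GDP-weighted mean of latitude and longitude, in degrees."""
    gdp = np.asarray(gdp, dtype=float)
    lat = np.asarray(lat, dtype=float)
    lon = np.asarray(lon, dtype=float)
    total = gdp.sum()
    return (gdp * lat).sum() / total, (gdp * lon).sum() / total

# Two equally sized economies at assumed approximate centroids
center_lat, center_lon = economic_center(
    gdp=[1.0, 1.0],
    lat=[39.8, 35.9],
    lon=[-98.6, 104.2])
print(center_lat, center_lon)
```

With equal weights the result is just the midpoint of the two coordinate pairs, a little east of Spain. Giving one economy more weight pulls the point toward it in proportion.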
def plot1(llon, llat, ulon, ulat):
    plt.figure(figsize=(20, 12))
    plt.title('World\'s Center of Economic Activity (years 1-2003)', fontsize=24)
    # Initialize the map and configure the style
    world = Basemap(resolution='l', projection='merc', area_thresh=10000,
                    llcrnrlon=llon, llcrnrlat=llat, urcrnrlon=ulon, urcrnrlat=ulat)
    world.drawcoastlines(linewidth=0.1)
    world.drawcountries(linewidth=0.1)
    world.drawlsmask(land_color='#E1E1D1', ocean_color='#F0F0E8')
    # indices of year columns with enough GDP data
    years = [x for x in range(N_years)
             if sum(gdp.iloc[:, x].notnull()) > threshold]
    N = len(years)
    for (c, i) in enumerate(years):
        year_gdp = joined.iloc[:, i]
        # weight the coordinates for each country by the corresponding GDP
        lat, lon = year_gdp * joined.LAT, year_gdp * joined.LONG
        total = sum(year_gdp)
        # find the center of mass and convert to map coordinates
        x, y = world(sum(lon)/total, sum(lat)/total)
        world.plot(x, y, 'o', color=cm.Spectral(float(c)/N),
                   markersize=10, label=joined.columns[i])
    # Pick the first 4 points and then every 20th for the legend
    handles_labels = list(zip(*plt.gca().get_legend_handles_labels()))
    handles_labels = [handles_labels[i] for i in list(range(4)) + list(range(5, N, 20))]
    handles, labels = zip(*handles_labels)
    legend = plt.legend(handles, labels, title='Year', fontsize=16, numpoints=1)
    plt.setp(legend.get_title(), fontsize=18)
    return world
plot1(-110, -40, 140, 65)
The plot shows the expected east-west-east move of the economic center. There is a big cluster of points between 1900 and 1940. To see it better we can zoom in:
plot1(-10, 35, 30, 45)
My guess is that the small reversal in the trend is caused by World War I and the Great Depression. Those events didn't affect Asia as much as the US and Europe.
Another zoomed-in version of the map omits the legend and instead plots year labels next to a sample of the points.
def plot2(llon, llat, ulon, ulat):
    plt.figure(figsize=(20, 15))
    plt.title('World\'s Center of Economic Activity (years 1-2003)', fontsize=24)
    world = Basemap(resolution='l', projection='merc', area_thresh=10000,
                    llcrnrlon=llon, llcrnrlat=llat, urcrnrlon=ulon, urcrnrlat=ulat)
    world.drawcoastlines(linewidth=0.1)
    world.drawcountries(linewidth=0.1)
    world.drawlsmask(land_color='#E1E1D1', ocean_color='#F0F0E8')
    t = 0
    # a sample of the year columns, again keeping only those with enough GDP data
    years = [x for x in list(range(45)) + list(range(50, N_years, 5)) + [N_years - 1]
             if sum(gdp.iloc[:, x].notnull()) > threshold]
    N = len(years)
    # positions (within the sampled years) of the points that get a year label
    labels = [0, 1, 3, 5, 7, 11, 18, 24, 28, 36]
    for (c, i) in enumerate(years):
        year_gdp = joined.iloc[:, i]
        lat, lon = year_gdp * joined.LAT, year_gdp * joined.LONG
        total = sum(year_gdp)
        x, y = world(sum(lon)/total, sum(lat)/total)
        world.plot(x, y, 'o', color=cm.Spectral(float(c)/N),
                   markersize=12, label=joined.columns[i])
        if t in labels:
            plt.annotate(joined.columns[i], xy=(x-150000, y+140000),
                         family='cursive', size=23)
        t += 1
    return world
plot2(-20, 20, 80, 50)