Tutorial: Using Tabular Data in QGIS

Tutorial moved to  http://www.qgistutorials.com/en/docs/performing_table_joins.html

Not every dataset you want to use comes as a shapefile, or in a spatial format. Often the data would come as a table or a spreadsheet and you would need to link it with your existing spatial data for use in your analysis. For example, you may be working with census data for a country where the data is in a table, csv, excel file, access database table or any other non-spatial format. To map this data, you have to merge this with a shapefile containing boundaries of census tracts. This operation is known as ‘Table Join’ and this tutorial will cover how to carry out table joins in QGIS.
For this tutorial, we will use country polygons layer 10m_admin_0_map_units from the Natural Earth dataset and merge attributes from a table from World Bank database to create a ‘wealth distribution map’. I downloaded the 2008, 2009 and 2010 data for Gross National Income (GNI) per Capita from the World Bank’s Indicator website. (Click on Databank, and follow the wizard to export the data) This data is offered in several tabular formats, and I chose CSV as the output format.
  • Examine the tabular data. You can open the CSV file in a text editor, or use any spreadsheet program such as Open Office. You will notice that there is a header row with name of the column, such as, ‘Indicator Name’. The GNI data that we want to map is contained in the columns - 2008,2009 and 2010. Also there is a 3-letter ‘Country Code’ column which assigns a unique 3 digit identifier to each country.
  • Open QGIS and load the 10m_admin_0_map_units layer via Layer → Add Vector Layer. This layer contains polygons representing countries.
  • Open the attribute table of the vector layer by right clicking on the layer and choosing ‘Open Attribute Table’. You will notice that the attribute table also contains a 3-letter country code. We can use this unique identifier to join the GNI data to this vector layer. You can use any common field between the two layers to perform the Table Join.
  • Before we go ahead and join these table, you will need to be aware of some limitations of importing tabular data to QGIS. These limitations are mostly due to the OGR’s CSV driver. I encourage you to refer to this if your import does not work as expected. I have outlined some common limitations and workarounds. 
    • Compatible formats: The tabular data must be in CSV format. Most spreadsheet programs or databases will allow you to export the data as ‘Comma Separated Values(CSV)’ file. Make sure your CSV format is compatible with OGR’s CSV Format. In our example, we downloaded the data from the website is a compatible csv format, so we need not make any changes.
    • Header rows: You must have noticed that our data had a header row which specifies the title of the data column. These are useful in identifying the columns. The OGR driver recognizes header rows, but if your header row contains any number, such as “2008”, it will treat the row as ‘data’ and not as ‘header’. To prevent this, you must put number-only headers in double quotes “”. Open the CSV file in a text editor and add the quotes as follows.
    • Numbers: By default, OGR driver treats all data in CSV file as Text. If your data contains numbers it will be treated as text and it will be useless for analysis or mapping. To overcome this, you will need to create a ‘CSVT’ file. See this post for more details on using CSVT files. Essentially, you will have to create a text file specifying what is the data type for each column. For the above dataset the CSVT file will look as follows. Remember to name the csv file the same way as the original CSV. So if our data is in gni.csv, the CSVT file should be named gni.csvt
  • Now we are ready to add this data to QGIS. It’s a bit odd, but CSV data tables can be imported via the ‘Add Vector Layer’ menu item. Go to Layers → Add Vector Layer. Select your CSV file and click Open.
  • The table will appear as a layer in QGIS Table of Contents.
  • Right click on the gni layer and select Open Attribute Table. Make sure all the columns in the CSV files are correctly imported.
  • Now we will join this table layer with the spatial layer. Double-click on the 10m_admin_0_map_units layer to open Properties dialog. Go to the ‘Joins’ tab.
  • Click on the ‘+’ to create a new join. The ‘Join field’ will be the unique identifier from the tabular data. The target field will be the unique identifier that will link the tabular data to the polygon data. Click OK. Once the join is created, click OK and return to the QGIS canvas.
  • Open the attribute table for the 10m_admin_0_map_units layer. You will notice that the table now contains additional fields from the CSV file.
  • Remember that this ‘join’ is temporary. It is not part of the attribute table for the 10m_admin_0_map_units layer, but just linked dynamically to the CSV layer. If you want to permanently join the attributes, you must save it as a new layer. Right click on the 10m_admin_0_map_units layer, select “Save As ...”. Name the new layer ‘gni_ne.shp’
  • Add the new layer to QGIS. Now you can style this layer to create a map showing the wealth distribution in the world. Use the technique in this tutorial for instructions on selecting appropriate styling method.
Hope this tutorial helps you in working with new and interesting datasets for your project. Let me know in the comments what obstacles you faced in importing tabular data and how you solved them.