Tutorial: Nearest Neighbor Analysis using QGIS

Tutorial moved to http://www.qgistutorials.com/en/docs/nearest_neighbor_analysis.html

GIS is very useful in analyzing spatial relationships between features. One such analysis is finding out which features are closest to a given feature. QGIS has a tool called ‘Distance Matrix’ which helps with such analysis. In this tutorial, we will use 2 datasets and find out which point from the second layer is closest to each point in the first layer.




Let’s get started. In this tutorial, we will walk through the process of answering the following question: given the locations of all known significant earthquakes, which populated place is nearest to each earthquake location?

We will be using the Natural Earth Populated Places dataset along with NOAA’s National Geophysical Data Center’s Significant Earthquake Database.

Follow the instructions in this tutorial to import the Significant Earthquake CSV file into QGIS. Also open the Natural Earth populated places layer using Layer → Add Vector Layer.



In the screenshot, each green point represents the location of a significant earthquake and each blue point represents the location of a populated place. We need a way to find out the nearest point from the populated places layer for each of the points in the earthquake layer.
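To make the idea concrete, here is a minimal sketch in plain Python of what "nearest populated place" means for a single earthquake. It is not what QGIS runs internally. It assumes the points are stored as latitude/longitude pairs and uses the haversine formula for great-circle distance. The coordinates below are approximate, illustrative values rather than records from the tutorial's datasets, and the function names are made up for this example.

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat/lon points."""
    r = 6371.0  # mean Earth radius in km
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def nearest_place(quake, places):
    """Return (name, distance_km) of the place closest to the quake."""
    candidates = (
        (name, haversine_km(quake[0], quake[1], lat, lon))
        for name, (lat, lon) in places.items()
    )
    return min(candidates, key=lambda pair: pair[1])

# Illustrative, approximate coordinates (lat, lon); not taken from the datasets.
places = {"Tokyo": (35.68, 139.69), "Lima": (-12.05, -77.04), "Rome": (41.90, 12.50)}
quake = (38.3, 142.4)  # hypothetical epicentre off the east coast of Japan

name, dist_km = nearest_place(quake, places)
print(name, round(dist_km))
```

Checking every place for every earthquake is fine for a few thousand points. For much larger layers a spatial index would be the usual choice.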



We will use a tool called ‘Distance Matrix’ for this analysis. Open the tool from Vector → Analysis Tools → Distance Matrix.



Here, select the earthquake layer as the Input point layer and the populated places layer as the target layer. You also need to select a unique field from each of these layers; the values of these fields identify the features in your results. In this analysis, we are looking to get only the 1 nearest point, so check the box next to ‘Use only the nearest(k) target points’ and enter 1. Name your output file ‘matrix.csv’ and click OK.
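If you are curious what the ‘nearest (k)’ option amounts to, the sketch below builds a small distance matrix in plain Python and keeps only the k closest targets for each input point. The IDs and planar coordinates are hypothetical, the distance is a simple straight-line distance via `math.dist` (Python 3.8+), and the column names mirror the InputID, TargetID and Distance fields of the output. Treat it as an illustration of the idea, not as the tool's actual implementation.

```python
import csv
import heapq
import io
import math

def distance_matrix(inputs, targets, k=1):
    """Yield (input_id, target_id, distance) for the k nearest targets of each input."""
    for in_id, in_xy in inputs.items():
        nearest = heapq.nsmallest(
            k, targets.items(), key=lambda item: math.dist(in_xy, item[1])
        )
        for tgt_id, tgt_xy in nearest:
            yield in_id, tgt_id, math.dist(in_xy, tgt_xy)

# Hypothetical IDs and planar coordinates, for illustration only.
quakes = {"Q1": (0.0, 0.0), "Q2": (10.0, 10.0)}
places = {"A": (1.0, 0.0), "B": (9.0, 10.0), "C": (50.0, 50.0)}

rows = list(distance_matrix(quakes, places, k=1))

# Write the rows as CSV text with the same three columns as the tool's output.
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["InputID", "TargetID", "Distance"])
writer.writerows(rows)
print(buffer.getvalue())
```

With k=1 you get exactly one row per input point. A larger k gives k rows per input point, ordered from nearest to farthest.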



Once your file is generated, you can view it in Notepad or any text editor. QGIS can import CSV files as well, so we will add it to QGIS and view it there. Click Layer → Add Vector Layer. Navigate to the path where you saved ‘matrix.csv’ and click OK.



You will see the CSV file loaded as a table.



Right-click on the table layer and select ‘Open Attribute Table’.



Now you will be able to see the content of our results. The InputID field contains the value of the unique field you selected from the Earthquake layer. The TargetID field contains the name of the feature from the Populated Places layer that was the closest to the earthquake point. The Distance field is the distance between the 2 points.
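The same table is easy to inspect outside QGIS. Below is a small sketch that reads a file with the InputID, TargetID and Distance columns using Python's `csv` module. The sample rows are invented stand-ins for the real ‘matrix.csv’, and the helper name `load_matrix` is hypothetical. It assumes k was set to 1, so each InputID appears only once.

```python
import csv
import io

# Invented sample standing in for matrix.csv; only the column names follow the tutorial.
sample = """InputID,TargetID,Distance
Q1,Tokyo,378.2
Q2,Lima,91.5
"""

def load_matrix(handle):
    """Map each InputID to its (TargetID, distance) pair."""
    result = {}
    for row in csv.DictReader(handle):
        result[row["InputID"]] = (row["TargetID"], float(row["Distance"]))
    return result

matrix = load_matrix(io.StringIO(sample))
for quake_id, (place, dist) in matrix.items():
    print(f"{quake_id}: nearest place is {place} at distance {dist}")

# To read the real file instead:
# with open("matrix.csv", newline="") as f:
#     matrix = load_matrix(f)
```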



This is very close to the result we were looking for. For some uses, this table would be sufficient. Next, I will demonstrate how we can use Table Joins to integrate these results into our original Earthquake layer. Look at this tutorial for more details on Table Joins. Right-click on the Earthquake layer and select Properties.



Go to the Joins tab and click on the ‘+’ button.


We want to join the data from our analysis result (matrix.csv) to this layer. We need to select a field from each of the layers that has the same values. Select the fields as shown below.



You will see the join appear in the Joins tab. Click OK.



Now open the attribute table of the Earthquakes layer.



You will see that for every Earthquake feature, we now have an attribute which is the nearest neighbor (closest populated place) and the distance to the nearest neighbor.
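Conceptually, the table join is a lookup on a shared key. The sketch below shows the idea in plain Python: each earthquake record picks up the nearest place and distance from the matrix row whose InputID matches its ID. The field name `id`, the record values and the function name `join_nearest` are hypothetical, and this is not how QGIS implements joins internally.

```python
def join_nearest(features, matrix_rows, key="id"):
    """Attach TargetID and Distance from matrix_rows to each feature by matching key."""
    # CSV values are text, so the lookup compares keys as strings.
    lookup = {row["InputID"]: row for row in matrix_rows}
    joined = []
    for feature in features:
        match = lookup.get(str(feature[key]))
        merged = dict(feature)
        merged["nearest_place"] = match["TargetID"] if match else None
        merged["distance"] = float(match["Distance"]) if match else None
        joined.append(merged)
    return joined

# Hypothetical records for illustration only.
earthquakes = [
    {"id": 1, "magnitude": 7.1},
    {"id": 2, "magnitude": 6.4},
    {"id": 3, "magnitude": 5.9},  # has no row in the matrix
]
matrix_rows = [
    {"InputID": "1", "TargetID": "Tokyo", "Distance": "378.2"},
    {"InputID": "2", "TargetID": "Lima", "Distance": "91.5"},
]

joined = join_nearest(earthquakes, matrix_rows)
for record in joined:
    print(record)
```

Records without a matching row keep their original attributes and get empty values for the joined fields, much like a left join.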


A useful thing to note is that you can even perform the analysis with only 1 layer. Select the same layer as both Input and Target. The result would be a nearest neighbor from the same layer instead of a different layer as we used here.
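For the single-layer case, a plain Python sketch looks like this. The point IDs and coordinates are hypothetical, and the distance is a simple planar one. Note that the sketch explicitly skips each point itself, since otherwise every point would be its own nearest neighbor at distance 0. This describes the sketch only, not necessarily how the QGIS tool handles it.

```python
import math

def nearest_within_layer(points):
    """For each point, find the closest *other* point in the same collection."""
    result = {}
    for pid, xy in points.items():
        others = (
            (oid, math.dist(xy, oxy))
            for oid, oxy in points.items()
            if oid != pid  # skip the point itself
        )
        result[pid] = min(others, key=lambda pair: pair[1])
    return result

# Hypothetical planar coordinates; the layer needs at least 2 points.
points = {"A": (0.0, 0.0), "B": (3.0, 4.0), "C": (10.0, 0.0)}
neighbors = nearest_within_layer(points)
print(neighbors)
```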

Let me know in the comments how you have used this tool and what kind of cool applications you can think of using it.