Spatial files can quickly balloon to a huge file size. This can use up valuable storage space, slow data processing, increase processing costs, and needlessly delay your project.
In this article, we examine methods to simplify your polygons in QGIS 3, consider the trade-offs, and demonstrate the efficiency gains. We will do this with a hypothetical example, using the fictional game company FishFlop Games.
FishFlop Games is creating a new open world underground racing game that reflects the real-world built environment. Naturally, they have chosen the street racing capital of the world: Tasmania.
For efficiency's sake, they only want to use buildings data and create models for the world in close proximity to the driveable areas. So, they have added a 100 metre buffer to their planned road network, and saved this as a shapefile.
tas-roads-7-highways-etc-buffer.zip
Rather than digitise these buildings manually, FishFlop Games decides to purchase the accurate and authoritative Geoscape Buildings data for all of Tasmania. And so, a developer goes to their database to extract the building footprints that intersect with their shapefile. They are dismayed to see that after 30 minutes, the query isn't even 1% complete. It could take days to select and export the buildings within the buffer. Why is this?
To compare the locations of two sets of polygons, the database has to work through the vertices of each shape, calculating where their edges cross and which areas overlap. In the worst case, every vertex of one shape is compared with every vertex of the other, so more vertices means a longer query.
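As a rough illustration of why vertex count matters, here is a minimal brute-force sketch of a point-in-polygon test (ray casting). It is not how any particular database implements spatial queries; real engines use spatial indexes and more refined algorithms. The function names and the two test circles are made up for this example. The point is simply that the work per test grows with the number of vertices in the polygon.

```python
import math


def circle_ring(radius, n_vertices):
    """Approximate a circle with n_vertices evenly spaced points (illustrative only)."""
    return [
        (radius * math.cos(2 * math.pi * i / n_vertices),
         radius * math.sin(2 * math.pi * i / n_vertices))
        for i in range(n_vertices)
    ]


def point_in_polygon(point, ring):
    """Brute-force ray casting: returns (inside, number_of_edges_checked)."""
    x, y = point
    inside = False
    edges_checked = 0
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]  # wrap around to close the ring
        edges_checked += 1
        # Does this edge straddle the horizontal line through the point?
        if (y1 > y) != (y2 > y):
            # x coordinate where the edge crosses that horizontal line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside, edges_checked


coarse = circle_ring(100.0, 8)     # a crude 8-vertex "circle"
fine = circle_ring(100.0, 2000)    # a finely drawn 2000-vertex circle

inside_coarse, checked_coarse = point_in_polygon((10.0, 5.0), coarse)
inside_fine, checked_fine = point_in_polygon((10.0, 5.0), fine)

print(inside_coarse, checked_coarse)
print(inside_fine, checked_fine)
```

Both shapes give the same answer for this point, but the finely drawn one needs 250 times as many edge checks to get there.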
We can see part of the problem by making the shape editable, and then selecting the end of one of the roads with QGIS’s vertex tool.
Click Toggle Editing at the top.
Then click Vertex Tool, and you can click and drag over an area to select vertices.
Here the vertices at the end of a line are selected.
That graphic is not a blue curved line. Rather, it is a series of some 200 overlapping blue circles; each circle marks one vertex, and together they form this single end cap. Our example shapefile has thousands of such end caps.
Because of how finely-drawn this polygon is, it has a whopping 1,057,656 vertices! For context, the original road network this buffer was based on had only 190,375 vertices.
The developer is not concerned with capturing which buildings are within exactly 100.00 metres of the roads; modern graphics processors can handle the additional models, and so in this case an approximation is appropriate. Since a small loss of accuracy won't adversely affect the result, we can solve the problem of the query length by simplifying the buffer shape and reducing the pool of vertices to compare.
The developer decides that 10 metres is within their tolerances; they don't think a player of their game is likely to notice a missing or extra building from 90-110m away while racing.
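To put the roughly 200 vertices of a single end cap in perspective, the short calculation below estimates how far a many-sided polygon strays from a true circle. It is only a back-of-the-envelope sketch: it assumes the end cap is a semicircle of radius 100 metres with evenly spaced vertices (so about 400 segments per full circle), which may not match exactly how the buffer was generated. The function names are made up for this example.

```python
import math

RADIUS_M = 100.0      # buffer distance used in the article
CAP_VERTICES = 200    # approximate vertex count of one end cap
TOLERANCE_M = 10.0    # tolerance the developer is happy with


def max_chord_error(radius, segments_per_circle):
    """Largest gap between a true circle and a regular polygon inscribed in it."""
    return radius * (1 - math.cos(math.pi / segments_per_circle))


def min_segments_for_tolerance(radius, tolerance):
    """Fewest segments per full circle that keep the gap within the tolerance."""
    return math.ceil(math.pi / math.acos(1 - tolerance / radius))


# Assumption: ~200 vertices on a half circle is ~400 segments on a full circle.
error_m = max_chord_error(RADIUS_M, 2 * CAP_VERTICES)
min_segments = min_segments_for_tolerance(RADIUS_M, TOLERANCE_M)

print(f"Deviation of the finely drawn cap: {error_m * 1000:.1f} mm")
print(f"Segments per full circle needed for a 10 m tolerance: {min_segments}")
```

Under these assumptions the existing end cap follows the true curve to within a few millimetres, while a 10 metre tolerance could be met with only a handful of segments.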
There are a number of algorithms which are used for simplifying complex shapes. The specifics of these algorithms are beyond the scope of this article.
The ‘tolerance’ parameter when simplifying refers to how closely the simplified shape will match the original shape. In other words, with a tolerance of 10 metres, any point along the simplified shape could be up to 10 metres inside or outside the original shape.
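For readers who want a feel for how a tolerance is applied, here is a minimal sketch of the classic Douglas-Peucker idea on a simple open line. It is a textbook-style illustration, not the implementation QGIS uses, and the test shape (a 200-vertex semicircle of radius 100 metres standing in for an end cap) is an assumption made for this example.

```python
import math


def distance_to_segment(p, a, b):
    """Shortest distance from point p to the line segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return math.hypot(px - ax, py - ay)
    # Project p onto the segment and clamp to its end points
    t = ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def douglas_peucker(points, tolerance):
    """Drop vertices that lie within `tolerance` of the simplified line."""
    if len(points) < 3:
        return list(points)
    first, last = points[0], points[-1]
    max_dist, index = 0.0, 0
    for i in range(1, len(points) - 1):
        d = distance_to_segment(points[i], first, last)
        if d > max_dist:
            max_dist, index = d, i
    if max_dist <= tolerance:
        # Every vertex in between is close enough: keep only the end points
        return [first, last]
    # Otherwise keep the farthest vertex and simplify each half separately
    left = douglas_peucker(points[: index + 1], tolerance)
    right = douglas_peucker(points[index:], tolerance)
    return left[:-1] + right


# Hypothetical end cap: 200 vertices along a semicircle of radius 100 m
cap = [
    (100 * math.cos(-math.pi / 2 + math.pi * i / 199),
     100 * math.sin(-math.pi / 2 + math.pi * i / 199))
    for i in range(200)
]
simplified = douglas_peucker(cap, 10.0)

print(len(cap), "vertices before,", len(simplified), "after")
```

Every vertex that was removed lies within 10 metres of the simplified line, which is the sense in which the tolerance bounds the difference between the two shapes.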
To simplify a polygon in QGIS, we'll use the built-in Simplify function. If it's not already open, click Toolbox on the top toolbar.
This opens the Processing Toolbox panel. Type 'simplify' into the Search box. There may be more than one function with 'simplify' in the name; double-click Simplify under Vector geometry (the entry with the QGIS logo).
First, confirm you have the correct layer selected as the Input Layer. Then choose a simplification method from the dropdown, set a tolerance distance, and click Run.
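The same steps can also be scripted from the QGIS Python console. The sketch below assumes the Simplify tool is exposed as the `native:simplifygeometries` processing algorithm, with the method dropdown mapped to the integers 0, 1 and 2 in the order shown in the dialog; check the algorithm's help in your QGIS version before relying on this. The file names are hypothetical, and the tolerance is interpreted in the units of the layer's coordinate system, so a value of 10 only means 10 metres if the layer uses a metre-based projection.

```python
# Assumed mapping of the Simplify dialog's method dropdown to integer codes
SIMPLIFY_METHODS = {
    "douglas-peucker": 0,  # "Distance (Douglas-Peucker)"
    "snap-to-grid": 1,     # "Snap to grid"
    "area": 2,             # "Area (Visvalingam)"
}


def build_simplify_params(input_path, method, tolerance, output_path):
    """Build the parameter dictionary for the Simplify processing algorithm."""
    return {
        "INPUT": input_path,
        "METHOD": SIMPLIFY_METHODS[method],
        "TOLERANCE": tolerance,  # in layer CRS units (assumed to be metres here)
        "OUTPUT": output_path,
    }


def run_simplify(params):
    """Run the algorithm; only works inside QGIS, where `processing` is available."""
    import processing  # imported lazily so this file also loads outside QGIS
    return processing.run("native:simplifygeometries", params)


# Hypothetical file names for the buffer layer and its simplified copy
params = build_simplify_params(
    "tas-roads-7-highways-etc-buffer.shp",
    "douglas-peucker",
    10.0,
    "buffer-simplified-dp.shp",
)
print(params)
# Inside the QGIS Python console you would then call: run_simplify(params)
```

Scripting the step this way makes it easy to run all three methods with the same tolerance and compare the outputs.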
We’ll now compare the outputs of each of the available simplification methods in QGIS. The original area, shown in blue, is 16.1MB with 1,057,656 vertices.
Douglas-Peucker
41,899 vertices, 657kB, 3.97% of original file size, 98.40% coverage of original area.
Snap-to-grid
386,012 vertices, 5.89MB, 36.51% of original file size, 99.98% coverage of original area.
This simplification matches the original shape very closely, so it’s difficult to see any discrepancies from a distance. You can see some small discrepancies with the zoomed-in version on the right.
Area
103,746 vertices, 1.58MB, 9.82% of original file size, 99.74% coverage of original area.
The Douglas-Peucker simplification doesn't match the original area as accurately as the others, but it has far fewer vertices. The developer assumes this will make the query time faster. They run the query with the Douglas-Peucker simplified buffer, which takes only 5 minutes and 30 seconds to complete.
The FishFlop developers are finished extracting the data using the Douglas-Peucker simplification, but they're curious to see how long it would have taken with the other simplification types which have more vertices.
Douglas-Peucker: 5 minutes 30 seconds
Snap-to-grid: 31 minutes
Area: 9 minutes
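The figures reported above can be lined up with a few lines of arithmetic. This sketch only re-uses the vertex counts and query times quoted in this article; the derived columns (share of original vertices and milliseconds per vertex) are our own rough indicators, not measurements.

```python
ORIGINAL_VERTICES = 1_057_656  # vertex count of the unsimplified buffer

# Vertex counts and query times as reported in the article
results = {
    "Douglas-Peucker": {"vertices": 41_899, "seconds": 5 * 60 + 30},
    "Snap-to-grid": {"vertices": 386_012, "seconds": 31 * 60},
    "Area": {"vertices": 103_746, "seconds": 9 * 60},
}

for name, r in results.items():
    # Share of the original vertex count that survives simplification
    r["vertex_share_pct"] = 100 * r["vertices"] / ORIGINAL_VERTICES
    # Rough indicator of query time spent per remaining vertex
    r["ms_per_vertex"] = 1000 * r["seconds"] / r["vertices"]
    print(
        f"{name}: {r['vertex_share_pct']:.2f}% of original vertices, "
        f"{r['seconds']} s query, {r['ms_per_vertex']:.1f} ms per vertex"
    )
```

In this example, the ordering by vertex count matches the ordering by query time, although the time does not scale in strict proportion to the number of vertices.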
Now the query can run in a fraction of the time while still covering 98.4% to 99.98% of the original area, depending on the simplification method.