Approaches to aerial photo quality with GDAL + openCV

For the past while I’ve worked with FrontierSI on a quality assurance (QA) project for aerial photographs (see: It is a huge undertaking, attempting to build tooling which lets people step back from manually inspecting massive datasets and let computers do some of the work for them. Essentially this project was all about ingesting pixels and returning some kind of score, or indication. Millions of pixels to a single number…

Quick heads up – it isn’t a story about machine learning or 3D amazingness, it’s about embedding some really well trodden principles of digital imagery / photogrammetry into some code which can run on large geo-datasets. It’s also a little about working from ideas to ready-to-MVP.

I also need to send a massive shout to PyImageSearch and Adrian Rosebrock – a lot of what we built is based directly on his published tutorials. Go and read them if you’re interested in Python-based image analysis; and buy his books/courses if you can.

OpenCV and GDAL images are numpy arrays

This is really the key for getting geo-stuff happening with Python, GDAL and openCV: both GDAL’s Python bindings and openCV create numpy arrays. Conveniently both use the same array coordinate system – top left is [0,0], bottom right is [N_X_Pixels, N_Y_Pixels] .

A key difference is thinking about band orders and numbers. OpenCV thinks in terms of three bands, GDAL can have a lot. Using OpenCV functionality, you need to decide what bands OpenCV should think of as ‘red, green and blue’.

For RGB aerial photos this is kinda straightforward – since the input imagery we deal with have 4 bands at most – RGB and Alpha – and we can drop our alpha band by rearranging the numpy array we got from GDAL’s readAsArray():

    imchunk = np.dstack((image_chunk[0],image_chunk[1],image_chunk[2]))

Designing for more bands, we would need to think a little harder about how to use OpenCV tooling. Most likely we’d look to split analysis along single bands and figure out how to relay the results efficiently.

Design for many things…

When we write code to do a QA check, it has two modes of operation we know about:

  1. Read in a single photo before any orthorectification (ie straight off a camera) and do some checks
  2. Read in an orthorectified, georeferenced image and do some checks

In the first case all we need is openCV – straightforward ‘read image and do stuff’ work. In the second case, we can use openCV but we’re better off to use GDAL, since we can perform additional geo-checks on the image. And in the second case, we may be dealing with a huge mosaic we can’t process all at once…

With this in mind, all the image analysis functionality is built to work on OpenCV style images, translating from GDAL style images as needed.

Chunking things up

Orthorectified, georeferenced imagery is a key deliverable for aerial imagery programmes. Sometimes these come tiled, sometimes these come as giant mosaics – it depends on what the client asks for. Sometimes tiles can be ingested all at once, although lets think about it a bit – a single 1 x 1 km tile at 5 cm resolution is 20000 pixels by 20000 pixels, so quite a chunk to suck into memory and do stuff to.

Mosaics can be massive! Tens to hundreds of kilometres in extent, and getting far too big to rationally process as a chunk. Because these are also the product of merging many images, it’s also possible (common) that QA results will be different between different parts of the image. With this in mind a big part of this work was developing a chunking strategy, and a QA traffic light scheme for the code running on each chunk.

We had an additional constraint – mosaics in proprietary formats. The GDAL used by the QA tooling needed to be built with a specific driver; and convenience wrappers for GDAL (eg RasterIO) couldn’t be used. In a way its fortunate we’d been down this road just a few months earlier on another project.

Raiding the work we did there (and we made a tile index and chopped little bits from mosaics to run QA tests on. Results were compiled in a GeoJSON compatible layer, which was then directly use-able in a desktop GIS or web viewer. Here’s a sample output, showing QA traffic lights for chunks of a reasonably large mosaic:

Ingesting this entire mosaic into a single image process is impractical from both compute and validity viewpoints. We could test a GeoOBIA process to create segments, however that takes time that QA processors don’t have – a simple grid seemed to be the best balance between granularity and efficiency.

When to vector and when to raster and when to array

GDAL + Numpy gives us another analytical superpower – we can do a lot of the work on fast Numpy arrays and metadata, applying analytical tools to pixels at the last possible moment, when we’ve figured out exactly what pixels we need to work on.

We use Shapely and PyProj a lot here also – deriving geometries from pixel values using GDAL, then pushing them to Shapely for translation to EPSG:4326 for tasks where we need a common CRS (finding image overlaps) and for GeoJSON standard-meeting. PyProj handles straightforward point coordinate transformation. A lot of the geometry creation is done long hand, writing out the parts we need rather than using a converter library.

This is done mainly to expose the inner workings and keep dependencies really low. From my perspective this project was all about doing enough to hand over easily, not make my own life as convenient as possible at the expense of downstream application developers and, well, people I like and want to keep working with!

Keeping to a mantra of ‘compute late and compute light’, Numpy does most of the heavy lift to figure out what parts of the raster we want to look at and analyse. We use values from image metadata plus Numpy indexes to create vectors for some tasks (and output). We dig out just the raster parts we need to analyse-at-once.

…and of course, working in indices and metadata allows for a really low overhead switch to parallel processing of image parts, if thats where the application developers go…

Scoring imagery

The cats are already out of the bag about using a ‘traffic light’ approach for QA scoring. To explain further, it means for a given grid square, a green tile means ‘definitely good’; an orange tile means ‘near some threshold’ and a red tile means ‘definitely inspect’. For QA purposes, ‘fail’ is tricky – so we step back and say ‘please look at this part’. These tools are designed to guide a human operator to problematic areas rather than make an absolute decision; and always open to some interpretation.

For most of these processes, thresholds are set along well known photography principles. What means ‘overexposed’? a histogram that shows a lot of pixels at the high end of the dynamic range. What means ‘low contrast’? pixels spread across the dynamic range with a low intensity peak. What means ‘blurry’? examining the result of passing a laplacian kernel across an image. For all of these, the decisions about where to draw lines are data-informed but also mutable – an end user needs to be able to draw their own lines in the sand about what is acceptable.

It would also be horrifically frustrating for a vendor to deal with false negatives based on some computational misinterpretation – so we need to design for the possibility that our pre-guesses are wrong. So we also bake in adjustability!

Not everything worked so well…

One of the QA tests was about ‘offsets to reference data’. The first cut at this task measures differences between ORB keypoint detectors to see how far identified features were from each other. It’s experimental, and may be fully replaceable by a simpler image correlation metric. It did offer some interesting challenges, including how to report the results and how to georeference detected keypoints.

This worked really well for some test imagery – deliberately shifting an image by a know distance, it reliably returned the right numbers in terms of keypoint offsets. Throwing it out into the wild for real imagery? Not so great. Some of the issues stem from finding enough keypoints, others from the chain-of-processing in terms of locating the keypoints in space. Here’s an example of what has been going awry – all the blue and yellow dots are meant to be keypoints detected from one image ‘block’ in space. The blocks are in the right place in each of these images (with different pixel resolution), not so for keypoints. It’s a work in progress, and will get there!

One thing that did work really well was identifying the intersection between test and reference imagery, to only test overlapping parts. Working with different resolutions also works reasonably well, with accounting for different keypoint pixel locations worked in.

Unpacking and improving the processing strategy, though, will be straightforward, by design – which leads to the next section neatly!


Designing research code for general use is a bit of an artform. In a way it helps that I’m not a software engineer – if my code gets too complicated I can’t figure out what on earth I did, so I like to keep things as simple as possible. This likely means I’m not taking advantage of some advanced capabilities of Python – however, it also means I can hand my work directly to another part of the project team and they ‘get’ it. Or, I can walk through everything with stakeholders who can see and understand most of what’s going on.

Which is actually perfect for my role here – productionising research. I can think of so many things in terms of what knobs users need to tweak, what should be hard coded, what needs to be parameterised, how things should speak to each other. For the rest, I need advice from people who are building other parts, and who are going to be using the thing – and for good advice I need to explain easily what on earth I’m doing!

The imagery shown in this post are all drawn from rendering output products in QGIS. The ultimate aim, however, is a set of web and desktop applications which will render the same results. This drove a decision to do the hard yards and make all the output standard GeoJSON. It would have been a lot easier to make some bespoke JSON result file, but that way leads to downstream nightmares. Stakeholders can ingest results in whatever desktop GIS they like, application developers have an open standard to work from, everyone is happy.

Learnings from, agile, open projects

This is pretty normal stuff for many operations, I’m laying it out in the hope that it will help people gain confidence about remote, agile, open ways of work.

Spatialised worked fully remotely on this project, and helped guide FrontierSI’s practice for remote team interactions. The basic structure was a fortnightly (sprint) catchup with stakeholders, at which I presented all the work undertaken, issues, goals, and gathered feedback – alongside other parts of the project. The key contribution from this end was an asynchronous daily standup – or ‘slack standup’. A video call version was too time consuming and tedious, so we did a daily three dot point dump in the team slack on ‘yesterday’s work, today’s plan, and blockers’.

This worked really effectively – it sometimes led to more conversation, and generally led to excellent visibility of who was doing what, all consumable whenever it worked best. It’s not a new tactic, I’ve used it a lot before – and I hope FrontierSI gain from having it introduced (having said that, it may have already been a thing in other projects, just not any I’ve worked on). And I’ll definitely do this again. Being a fully remote worker who sometimes needs to coordinate multiple projects at once, I really appreciate asynchronous communications – they let me design my ‘space’ for work well.


This project was both fun and frustrating. Getting computers to see is hard – translating ‘oh this is what I mean’ to code is hard. Hopefully in this project I’ve done OK, and provided some useful tools. In the end these will all be thrown away/modified/changed beyond recognition – this part was all about a first cut of translating between loose ideas and something a computer can do. The rest is beyond me!

I really appreciated working with FrontierSI over a very difficult fire season – while accountability is key for any small operation, it was great to know that I could drop the work ball and focus on safety from time to time, without risking my place in this project. And if you’re in the business of obtaining and assessing aerial imagery, go and talk to them about the product! See:

The sales pitch

Spatialised is a fully independent, full time consulting business. The tutorials and write-ups here are free for you to use, without ads or tracking.

If you find the content here useful to your business or research or billion dollar startup idea, you can support production of more words, ideas and open source geo-recipes via Paypal or Patreon; or hire me to do stuff; or hire me to talk about stuff; or just give me a seat on your advisory board and a 1% stake. Enjoy!