In May and June 2019, Spatialised teamed up with 2Pi Software to deliver a new image processing service to the Australian Capital Territory government Emergency Services Agency (ACT ESA). The client problem was delivering web map tiles to an air-gapped Emergency Management System (EMS), coupled with a burgeoning city.
The ACT government responded to the burgeoning city issue by increasing aerial imagery collection frequency to four times a year. However, ACT ESA had no efficient way of getting these images into their EMS. This resulted in 000 (911) call responders dealing with interesting location issues, because imagery and maps in the air-gapped system were not updated as quickly as suburbs were being built.
Spatialised (me) and 2Pi Software had been talking a while about working on some projects – so we pushed in a tender for this job and won! For our first cooperative foray, it was a big one – and we learned a lot. Here’s the story of our short, sharp venture into massive image manipulation…
One of the initial choices we faced was ‘how will we do this task?’. There were a few requirements: solutions needed to be open source, and scalable using Amazon Web Services. After thinking hard about running a pile of GeoServer instances to drive GeoWebCache, we decided on a pure GDAL + Python system.
What we wanted was a system where slippy map tile creation engines could be spawned essentially infinitely, each scaled according to the requirements of the image chopping task at hand, and each totally independent of the other. We also wanted complete control over memory and hardware capacity. Finally, we needed a system which could be spun up ‘in cloud’ or ‘at home’ depending on the user needs. It all seemed so straightforward…
Reinventing a few wheels
I first looked into the existing gdal2tiles.py – thinking ‘here is my solution!’. But not so fast. It turns out the air-gapped EMS had very specific requirements about how slippy map tiles were named and chopped up. It was an OpenLayers-based viewer, but very snarky about its tile naming and spacing.
We were provided with a GeoWebCache tiling schema, which I reverse-engineered in Python to make that happen.
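To give a feel for what reverse-engineering a tiling schema involves, here is a minimal sketch of the arithmetic. It is not the code from the project: the `GridSet` class, its field names and the example values are all hypothetical, and it assumes a grid whose origin is the bottom-left corner with rows counted northwards. The client's real schema is not reproduced here.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class GridSet:
    """Hypothetical tiling grid: an origin plus one resolution per zoom level."""
    min_x: float            # grid origin, bottom-left corner (assumption)
    min_y: float
    resolutions: tuple      # map units per pixel, indexed by zoom level
    tile_size: int = 256    # pixels per tile side

    def tile_span(self, zoom):
        """Width (and height) of one tile in map units at this zoom level."""
        return self.resolutions[zoom] * self.tile_size

    def tile_for_point(self, x, y, zoom):
        """Return the (col, row) of the tile containing the point (x, y)."""
        span = self.tile_span(zoom)
        col = math.floor((x - self.min_x) / span)
        row = math.floor((y - self.min_y) / span)
        return col, row

    def tile_bounds(self, col, row, zoom):
        """Return (min_x, min_y, max_x, max_y) of a tile in map units."""
        span = self.tile_span(zoom)
        x0 = self.min_x + col * span
        y0 = self.min_y + row * span
        return (x0, y0, x0 + span, y0 + span)
```

With a toy grid such as `GridSet(0.0, 0.0, (1.0, 0.5))`, the point (300, 10) falls in tile (1, 0) at zoom 0 and tile (2, 0) at zoom 1. A viewer that is fussy about naming and spacing is really being fussy about these three things: the origin, the resolution list and the direction rows are counted in.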
I also plundered gdal2tiles.py to work out basic stuff like converting geospatial queries to pixel queries and back.
It was not the cleanest process – but it worked.
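For readers who have not met the geospatial-to-pixel conversion before, the sketch below shows the usual affine arithmetic on a GDAL-style six-element geotransform. The function names are mine for illustration, not taken from gdal2tiles.py or from the project code.

```python
def pixel_to_world(gt, col, row):
    """Map a (col, row) pixel position to (x, y) map coordinates.

    gt is a GDAL-style geotransform:
    (origin_x, pixel_width, row_rotation, origin_y, col_rotation, pixel_height)
    where pixel_height is negative for north-up images.
    """
    x = gt[0] + col * gt[1] + row * gt[2]
    y = gt[3] + col * gt[4] + row * gt[5]
    return x, y


def world_to_pixel(gt, x, y):
    """Invert the geotransform: map (x, y) back to fractional (col, row)."""
    det = gt[1] * gt[5] - gt[2] * gt[4]
    if det == 0:
        raise ValueError("geotransform is not invertible")
    dx = x - gt[0]
    dy = y - gt[3]
    col = (gt[5] * dx - gt[2] * dy) / det
    row = (-gt[4] * dx + gt[1] * dy) / det
    return col, row
```

For a north-up image with `gt = (100.0, 0.5, 0.0, 200.0, 0.0, -0.5)`, pixel (10, 20) sits at map position (105.0, 190.0), and `world_to_pixel` takes that position back to (10.0, 20.0).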
GDAL + AWS
Here’s where 2Pi Software came into the picture. They designed and delivered a scalable image munching architecture based on AWS Batch and Elastic Compute Services. So everything I built needed to work in that architecture.
…and then we discovered GDAL’s shiny new (at the time) /vsis3/ virtual file system. This allowed us to forget all about local storage for compute instances, dump everything in S3 buckets, and focus on compute power. All the puzzle pieces started to come together.
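As a minimal sketch of what /vsis3/ looks like from Python (the bucket and key names are placeholders, and the helper functions are hypothetical rather than taken from the project), a raster in S3 is opened by prefixing the bucket and key with `/vsis3/`:

```python
def vsis3_path(bucket, key):
    """Build the GDAL virtual file system path for an object in S3."""
    return "/vsis3/{}/{}".format(bucket, key.lstrip("/"))


def open_remote_raster(bucket, key):
    """Open a raster straight from S3, with no local copy.

    GDAL is imported lazily so the path helper above works without it.
    """
    from osgeo import gdal

    gdal.UseExceptions()  # raise Python exceptions instead of returning None
    return gdal.Open(vsis3_path(bucket, key))
```

Calling `open_remote_raster("my-imagery-bucket", "2019/tile_001.tif")` would return a normal GDAL dataset. GDAL looks for credentials in the usual AWS places, such as the `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY` environment variables or an instance role.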
2Pi Software were able to identify and solve myriad issues about S3 access, permission passing, long-running process issues – in short, their AWS expertise was absolutely key to the success of the project. Getting the GDAL parts right was just the start…
The final stack
So here’s what we did with it. The client was able to upload new imagery to a location on S3, then copy it into a processing bucket in our system and press go. What happened next?
- A Python process used GDAL + Shapely to create an index of aerial imagery inside the new data location, and store it in an S3 temporary store
- The index was used to generate a series of GDAL virtual rasters (VRTs), again stored in S3. These VRTs covered 0.1 x 0.1 degrees, approximately 10 x 10 km.
- Each virtual raster was transformed from its native CRS to the ACT government system target CRS, resulting in a stack of two virtual rasters – one listing input images with no transformation, and another saying how to transform input pixels.
- A compute engine was fired up for each generated mosaic (generally 0.1 x 0.1 degrees, or about 10 x 10 km), and each zoom level
- PNG tiles were cut and stored using the VRTs as input, delivering results straight to an S3 location, ignoring any tiles with partial nodata regions.
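The fan-out in the steps above (one compute job per mosaic cell per zoom level) can be sketched in a few lines. This is an illustration under assumptions, not the project's scheduler: the function names, the job dictionary layout and the example extent are all hypothetical.

```python
import math


def mosaic_cells(min_lon, min_lat, max_lon, max_lat, cells_per_degree=10):
    """Split an extent into grid-aligned cells (0.1 x 0.1 degrees by default).

    Cells are computed from integer indices, so their edges do not drift
    the way repeatedly adding 0.1 to a float would.
    """
    n = cells_per_degree
    col0, col1 = math.floor(min_lon * n), math.ceil(max_lon * n)
    row0, row1 = math.floor(min_lat * n), math.ceil(max_lat * n)
    cells = []
    for row in range(row0, row1):
        for col in range(col0, col1):
            # (west, south, east, north) in degrees
            cells.append((col / n, row / n, (col + 1) / n, (row + 1) / n))
    return cells


def plan_jobs(cells, zoom_levels):
    """Return one independent job description per (cell, zoom level) pair."""
    return [{"cell": cell, "zoom": zoom} for cell in cells for zoom in zoom_levels]
```

An extent of 0.25 x 0.25 degrees produces nine cells, and three zoom levels turn those into 27 jobs. Because no job depends on any other, each one can be handed to its own compute instance.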
People familiar with slippy map tiling will spot one issue right away: what happens at low zoom levels (e.g. zoom level 12/11/10)? Don’t you need to create a huge computer to spit out one tiny tile?
…yes. Every pixel in the raw imagery for a 10 x 10 km mosaic needed to be read into RAM to create a single 256 x 256 pixel output tile! The gdal2tiles.py approach is to cut the highest zoom level from raw imagery, then build each lower zoom level by resampling the tiles from the level above it. We discussed this a lot with our client – however, they were continually impressed with the image quality delivered by our approach. In essence, we used a classic ‘delayed compute’ pattern.
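To put a rough number on ‘huge computer’, here is a back-of-envelope calculation. The 10 cm pixel size, three bands and 8-bit samples are assumptions chosen for illustration; the real imagery specification is not given in this post.

```python
def raw_bytes_for_mosaic(side_m, pixel_size_m, bands=3, bytes_per_sample=1):
    """Uncompressed size of a square mosaic if every pixel is held at once."""
    pixels_per_side = round(side_m / pixel_size_m)
    return pixels_per_side ** 2 * bands * bytes_per_sample


# Assumed values: 10 km mosaic side, 10 cm pixels, RGB, 8 bits per sample
mosaic_bytes = raw_bytes_for_mosaic(10_000, 0.1)

# Number of source pixels contributing to each pixel of one 256 x 256 tile
source_pixels_per_output_pixel = (100_000 ** 2) // (256 * 256)

print(mosaic_bytes / 1e9, "GB uncompressed")
print(source_pixels_per_output_pixel, "source pixels per output pixel")
```

Under those assumptions a single mosaic is about 30 GB uncompressed, and each output pixel summarises roughly 150,000 source pixels. Coarser imagery shrinks these numbers quickly, since the size scales with the inverse square of the pixel size.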
By stacking up a bunch of VRTs we delayed any image processing until a single interpolation step from the raw airphotos to the output slippy map tile. This significantly improves image quality, especially using GDAL’s more complex interpolators, at the expense of compute time. We felt this was a fair choice when the output is used to guide emergency responders around.
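The VRT stacking idea can be sketched with GDAL’s Python bindings as below. This is a simplified illustration, not the code in the repository: the function names, file naming and parameters are hypothetical, the target CRS is left as an argument, and the cubic resampler is just an example choice.

```python
def proj_win(min_x, min_y, max_x, max_y):
    """Reorder tile bounds into GDAL's projWin order: [ulx, uly, lrx, lry]."""
    return [min_x, max_y, max_x, min_y]


def cut_tile(image_paths, tile_bounds, target_srs, out_png, work_prefix,
             tile_size=256, resample="cubic"):
    """Cut one PNG tile through a stack of two VRTs.

    tile_bounds is (min_x, min_y, max_x, max_y) in the target CRS.
    GDAL is imported lazily so proj_win can be used without it.
    """
    from osgeo import gdal

    gdal.UseExceptions()
    mosaic_vrt = work_prefix + "_mosaic.vrt"
    warped_vrt = work_prefix + "_warped.vrt"

    # VRT 1: lists the input images, no pixels are read or transformed
    ds = gdal.BuildVRT(mosaic_vrt, list(image_paths))
    ds = None  # close the dataset so the VRT is written out

    # VRT 2: describes the reprojection to the target CRS, still no pixel work
    ds = gdal.Warp(warped_vrt, mosaic_vrt, format="VRT",
                   dstSRS=target_srs, resampleAlg=resample)
    ds = None

    # Only now are pixels read, when the tile window is rendered
    gdal.Translate(out_png, warped_vrt, format="PNG",
                   projWin=proj_win(*tile_bounds),
                   width=tile_size, height=tile_size, resampleAlg=resample)
```

Both VRTs are small XML descriptions, so building them is cheap. The expensive part is the final `gdal.Translate` call, which is where the memory and compute cost of the low zoom levels shows up.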
More so, by using GeoJSON-like data as the common language of indexes, we were able to strip dependencies right down. We started out heading into Rasterio and Fiona and a bunch of stuff. We ended up with Python’s standard library, AWS boto3, shapely and GDAL.
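As an illustration of how far plain GeoJSON-like dictionaries can go, the sketch below builds an image index and queries it with nothing but the standard library. The function names and property keys are hypothetical, and it only compares bounding boxes, where the project used shapely for geometry handling.

```python
import json


def footprint_feature(path, min_x, min_y, max_x, max_y):
    """Describe one image as a GeoJSON-like Feature with a rectangular footprint."""
    ring = [[min_x, min_y], [max_x, min_y], [max_x, max_y],
            [min_x, max_y], [min_x, min_y]]
    return {
        "type": "Feature",
        "bbox": [min_x, min_y, max_x, max_y],
        "properties": {"path": path},
        "geometry": {"type": "Polygon", "coordinates": [ring]},
    }


def build_index(features):
    """Wrap features in a FeatureCollection, ready for json.dumps."""
    return {"type": "FeatureCollection", "features": list(features)}


def images_touching(index, cell):
    """Return paths of images whose bounding box overlaps the cell.

    cell is (west, south, east, north); boxes that only share an edge
    are not counted as overlapping.
    """
    cw, cs, ce, cn = cell
    hits = []
    for feature in index["features"]:
        w, s, e, n = feature["bbox"]
        if w < ce and e > cw and s < cn and n > cs:
            hits.append(feature["properties"]["path"])
    return hits
```

Because the index is only dictionaries and lists, `json.dumps(index)` is all it takes to write it out, and any tool that reads GeoJSON can inspect it.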
Removing dependencies felt GREAT!
The whole thing is available here: https://github.com/Spatialised/gdal-tiler , and we’re happy to take critical insights and pull requests!
Team building insights
A key feature of this work was merging two organisations with established practices / ways of work into a single unit over a really short time frame! Most of the job was undertaken remotely, with a couple of client site visits and a ‘team day’ held in 2Pi Software offices in the wonderful town of Bega.
My biggest lesson was that project managers are not always interruptions to work (let’s just say I’ve had a number of pretty bad experiences with project managers being more or less a spare wheel getting paid more than anyone else on the team for doing a lot less). 2Pi Software have very established engineering practice and my ‘way of work’ is very much in the research-and-development oriented ‘press buttons till something works’ vein. In the end, working via 2Pi Software’s project manager made a lot of things run more smoothly.
‘Become like water’ – Bruce Lee
I’m not sure Spatialised will grow engineering rigour anytime soon – it’s still very much a ‘do the weird new stuff’ shop. However, we know how to bend with the current when we need to! And become less frustrated when our co-conspirators do have that need for more structure. Like water, we fit the shape that makes itself apparent.
A second insight is that we can all take less on. At the start, both 2Pi Software and I were attempting to wrestle the same problems, which ended up in a bunch of wrestling each other. Clarifying roles was really useful – figuring out what each of us was great at, and where we could help each other out.
There is always a discovery phase where this happens. But it has to end! I tend to be overly democratic, and basically had to draw on some more authoritarianism in order to get all the wheels turning in time to get the job done.
And finally – be honest! 2Pi Software and I shared many terse words over this short period; however, to borrow from 2Pi Software CEO Liam O’Duibhir, we had eloped, and now we had to work out how to live with each other.
Would we do this again?
This project happened really fast. Spatialised gained an enormous amount of experience in handling GDAL and S3 in a really short time. 2Pi Software delivered a new-to-them processing stack which ran everything and was incredibly simple to fire up and tear down.
If we did this again we would pay more attention to tuning some of the GDAL + Python code to reduce memory requirements. We would also pay some more attention to tuning the AWS Batch environment, with careful matching of machinery requirements to jobs at hand. This part is limited by how the client has their AWS account set up – and 2Pi were excellent at advising the client on how to gear up for processing massive amounts of imagery for this job.
We learned a lot – about what worked and what could be better – and will adjust any future work / pricing / expertise requirements accordingly.
…and as a credit to all, the Spatialised + 2Pi Software + ACT ESA partnership survived this pressure cooker, and we look forward to working together again!
In addition to the usual sales pitch (below), we’d love to hear from you about your large-scale image processing needs. We’ve delivered an awesome (we think), fully open source, easy to pull apart process and would be happy to do it again! If you’re after an AWS partner certified software engineering team from regional Australia, I will happily recommend heading to 2Pi Software for help. And of course,
The sales pitch
Spatialised is a fully independent, full time consulting business. The tutorials and write-ups here are free for you to use, without ads or tracking.
If you find the content here useful to your business or research or billion dollar startup, you can support production of more words, ideas and open source geo-recipes via Paypal or Patreon; or hire me to do stuff; or hire me to talk about stuff; or just give me a seat on your advisory board and a 1% stake. Enjoy!