Software carpentry – achievement unlocked

Last week I became a certified Carpentries instructor. This means that I have done some training, and demonstrated some skill in, teaching grownups how to write programs and wrangle data.

It also means I’ve joined a worldwide community of people who are dedicated to helping other people avoid mistakes they’ve made. Which is an excellent goal! If you’re a scientist, analyst of any sort, or even someone needing to wrestle with excel spreadsheets which are getting out of hand, I’d urge you to check the materials out. There are streams for Software Carpentry and Data Carpentry , more or less explained here.

I’m excited! Perhaps I’ll see you at a workshop someplace – or even organise one for you!

16 bit to 8 bit RGB colours with PDAL and Python

Sometimes LAS files have RGB values stored as 16 bit colours. This currently upsets the Potree viewer, and maybe some other things. The following recipe using Python and PDAL ‘eightbitifies’ colours. It also compresses incoming LAS files to LAZ.

PDAL has a neat filters.python option, which passes every incoming point into a python function. Flexible huh?

First, a Python function to scale colours:

import numpy as np

def eightbitify(colour):
    notzero = np.where(colour > 0)
    colour[notzero] = (colour[notzero]/255) - 1
    return colour

def scale_colour(ins,outs):
    outs['Red'] = eightbitify(ins['Red'])
    outs['Green'] = eightbitify(ins['Green'])
    outs['Blue'] = eightbitify(ins['Blue'])
    return True

ins is a numpy array of incoming points from PDALs reader. PDAL dimensions define what’s in there – so here I’ve asked filters.python to read Red, Green, and Blue into numpy arrays to work on. The entire set of data will be loaded up, making python array operations useful and fast (if you have the memory). PDAL’s —stream operator isn’t enabled for filters.python –  if you are memory constrained, think about filters.splitter to make manageable chunks first.  This actually makes sense – PDAL has no way of knowing what you’re going to write into your python function!

Next, construct a JSON pipeline to run the conversion:


    "pipeline": [
            "type" : "readers.las",
            "filename" : "file.las"
            "type" : "filters.python",
            "script": "/opt/data/",
            "function": "scale_colour",
            "module": "scalecolour"
            "type" : "writers.las",
            "filename" : "outfile.laz"


…and finally invoke PDAL invocation to make it go. Here it’s wrapped in a shell script to process a bunch of data. Highlighted lines show the actual PDAL invocation. I’ll replace the shell script to loop over files with a python function once it’s written.

for f in lasfiles/*.las;
    #echo $f
    fileout=$(basename $f ".las")
    #echo $fileout
    docker run -it -v /set/your/working/path:/opt/data \
                    pdal/pdal pdal \
                    pipeline /opt/data/scale_colour.json \
                    --readers.las.filename=/opt/data/${f} \



I ran this using the pdal/pdal:latest docker image:

docker pull pdal/pdal


A geospatial tinkerer’s feast: FOSS4G 2017

A random Boston photo – city skyline with a public fireplace. Awesome!

Between 11 and 19 August 2017 I was in Boston (USA) for the OSGeo foundation’s FOSS4G conference. To write a one sentence summary of the event? It’s like a cornucopia of open source geospatial ideas, applications and inspirations! Variously the event has been compared to disneyland, summer camp, and a few other things. It’s my second round of FOSS4G and I have only this to add: Most of my working life right now is defined by things I learned at FOSS4G 2016 – and my experience this year has compounded/grown/inspired even wilder ideas to implement. It’s going to absolutely consume my waking/working hours.
The event really is that good!

My attendance is funded by my employer (Australia’s National Computational Infrastructure), on the basis of a talk on point cloud data services using open standards. This makes my life hard because NCI has fingers in many geospatial pies, and as one human I need to cover my own domain expertise, plus things-of-wider-organisational-interest. However, it’s OK and I am very grateful to be able to travel across the world to talk shop. I hope I do a reasonable job of representing my workplace as well!

The beauty of going to the conference is that I can look at the program and try to find people to talk to face to face, or representatives from their organisations. I also get to survey the scene and figure out which talks I really need to watch later. Talks are all recorded, and made freely available later (I’m waiting eagerly for this year’s crop to arrive).

So let’s dive into what floated my boat. Please bear in mind that in general, the standard of idea presented at FOSS4G is really high! So don’t be disappointed if you didn’t make this list – it was really hard to filter what to see, and my list of catchup videos to watch is long.

Things I got really excited about


I played with fundamental spatial algorithms in Python, I was comforted in my old age using GDAL on the command line, bash scripts and awk; my mind was blown using  geonotebooks and geopyspark,  and I got to play with Digitalglobe’s GBDX tools for an afternoon. Unfortunately this conflicted with LIDAR processing in GRASS, and I will definitely step through that one when I can. I really also wanted to do one on cartography, but that’s too far out of my day job for work to pay for.

In short, the workshops were amazing, educational, and delivered by people with deep expertise. My own workshop proposal did not get up, but hey – I am an OSgeo minnow compared to the leviathans I got to go and learn from.

So workshops – yes – do. As many as you can! These are all awesome.

Talks and talks and more talks

Here’s where things got really hard – I made a list of every talk I went to, what I took away, and what I could equally have gone to for work.  It’s five pages long, I’m not going to repeat it. Here is the highlight reel – what sticks in my head a week after the fact, in no particular order:

Paul Ramsey’s keynote on the economies of free and open source software. I really liked it, because I could some new ways of thinking about how I interact with the OSgeo ecosystem in a productive way. I’m a trader in the attention economy, and dive a bit into the giving economy – and I’m feeling a lot less guilty about not being able to write c++ or contribute financially. However – the cash part is still important – while software is given away, it is totally fair to not be taken advantage of.

Maria Aria de Reyna on diversity in tech – this is really important and applies to all our lives. The more we talk about diversity the harder it is for bigotry to take root. Everywhere. In tech, walking down the street free of harassment, going shopping without raising eyebrows, even taking a piss in the toilet that is right for you – many of these things are taken for granted by some of us, but needlessly difficult for others. Even though the FOSS4G community could do better, I contrast it with an AWS summit I went to recently, and… well… that community has more work to do than the OSgeo ecosystem.

Connor Manning on majicking with point clouds. Seriously. Majicking. There is no other word for it. In this same ecosystem, talks on PDAL by Brad Chambers and Michael Smith were also very clear, digestible and useful in my daily life. I have a long list of things to try…

Planetlabs – I went to see two of their talks – amazing to see so much done with relatively simple tools – GDAL, python, stuff literally glued together as they go. And it works! Their successful operational model is a massive, massive  thumbs up to the OSgeo ecosystem.

Matt Lammers on visualising weather in CesiumJS. This was really awesome – I tracked Matt down to talk shop, since NCI have similar data and we are always after ways to make it engaging. I have an even longer list of stuff to try…

Helena Mitasova on viewsheds, vsual complexity and other higher order products using the GRASS GIS ecosystem. I really liked how her group is working at the type of ideas that marketers or social scientists think about – I’m always way too far down in the weeds, and this talk filled me with ideas.

Claire Porter on the ArcticDEM project. I appreciated her really down to earth presentation style and obviously deep knowledge – and it’s also an amazing project!

To finish the week off,  Maria Aria de Reyna on cats. or was it metadata? metacats? But yes – the point: making data infrastructure useful for everybody. I’m on this bandwagon a lot, but nowhere near as succinctly, expertly or cat-ly. Why do all this OSgeo stuff in the first place if we can’t find and use data, right?

…and more. Tom Holderness on PetaBencana, Dan Joseph on DIY drone mapping, Hoard Butler on the epic and seminal Proj.4, and .. well .. almost everything. At a high risk of repeating myself I’m really waiting for the videos to turn up – and go through my review list – there were many talks I could not get to in person!

Also, at some point I’ll go through and add links to all the talks. When all the videos turn up…

A comment on diversity

Since I am really not an expert on diversity, and it came up in the conference with a bathroom queue tweet, I asked Claire Trenham for her two cents on diversity in tech. She was formerly my boss, and we talked a lot about basically the importance of not being an asshat. Her comments reflected a lot about existing efforts within the OSgeo community and and how changing the world can be non-obvious. To sum up her thoughts:

Code of conduct, and code of conduct again. Her worst experiences were at conference dinners – which doesn’t mean we should ban alcohol, just reinforce the code of conduct, and again. She rated the FOSS4G code of conduct highly – and had the following thought to digest on it’s application (I quote):

But, realistically, could you be sure that if the proportion of gender and racial minorities rose, they would get equal speaking time during questions? Could you be sure that none of the guys that currently seem like great guys wouldn’t subconsciously switch to mansplaining if there were more women in the room, so it goes from being a room of equals to a room containing people who need to be ‘taught’ and ‘enabled’

This is really important. From my point of view I didn’t see this happening at FOSS4G – and wonder – as an OSgeo community member and FOSS4G attendee would I be able to call any such behaviour out in a way which was respectful? Do I fall into those patterns myself? How can I do better next time? What are my internal biases, how do they manifest, how can I expose them and break them down?

Remote attendance –  some amazing potential contributors skipped just because they can’t get there, or don’t feel comfortable in a conference environment. FOSS4G does an amazing job of making content accessible. Is properly interactive remote attendance a possibility? (conference organisers, please educate here! I’m sure it’s been investigated…)

Equity of access – for people who can make the conference, what barriers exist? cost? (for example is the travel grant enough to get people from, say Africa, to the US?) childcare? would offering *good* childcare be an enabler for parents who may or may not have immediate family to help out?

Stuff to think about going forward. Having never been a conference organiser, the second two points may be really hard to implement and at some point a compromise is needed, but could we do it as a community? I’d love to hear your thoughts.

…and I really, really hope to make it to Dar Es Salaam – do I want to hear about solving African and global issues, African style? oh hell yeah I do!

Sea ice and beer

A while ago I mentioned that I would write something about sea ice. The context was a talk I gave at the 2017 Pint of Science festival in Canberra. It was really quite fun, despite being totally terrified and full of the ‘what ifs’… ten minutes before stepping in. Thanks to some great tips from a Canberra Innovation Network science communication workshop I managed to get it done.

Anyway, The aim of that talk was to give a brief sea ice overview and then show what it’s like as a scientist working in the field – how we think, how we try to solve problems and potential new techniques we see which can help.

The audience was quite impressive – they grasped the material with two hands and had insightful, interesting questions. It was heartwarming.

…but none of that is about sea ice at all. That was all about a talk! So go and read it yourself here and learn more: – it uses Reveal.js – so you can use your arrow keys to move around it either linearly (down, then right, then down) or go around how you please. Press ‘s’ to see speaker notes, I’ll finish those in the next week while I’m travelling to FOSS4G.

Most importantly, it explains the sea ice/beer cycle. This is really an under-studied earth system component and needs massive grant funding and more fieldwork. I mean it! Seriously! Without detailed knowledge of sea ice thickness gained by combining many instruments and rigorous field surveys, we will always be nervous that our beer is at risk.

You can also see a deeper dive ( a PhD thesis in 20 or so slides) here – also in Reveal.js: .

I hope those slide decks are digestible and leave you with something you didn’t know before – the sea ice/beer cycle for one; but also how sea ice gets measured and just a few of the issues surrounding how to make a realistic assessment about how much sea ice exists on the southern ocean. It’s super-complex! It’s also about the most mind blowing place I’ve ever been – we don’t need to go to Mars.. the pack ice zone is much closer and still a totally foreign world.

What a planet we live on…

Browsing MH370

Last week Geoscience Australia released a vast bathymetric survey dataset from phase one of an intensive search for the missing aircraft, flight MH370. Read the full story here.

I’ve been ruminating on the idea of treating bathymetric datasets in the same way I handle LiDAR surveys – as massive point clouds. So this dataset presented an opportunity to try some things out.

I used the Python library Siphon to extract data from NCI’s THREDDS catalogue – ending up with roughly 100gb of ASCII files on my working machinery. It was easy to see what these files contained – but they’re no good for my use case as lines of text. I had in mind dumping them all into a postgres-pointcloud database – but then, I got excited by the idea of visualising it all.

So I did.

The first step was to clean up the data. I needed to convert very nice descriptive headers into something that described the data in a less verbose way.

sed -i took care of that task. It also handled removing leading 0’s from two number longitudes. I still had ASCII data, but now I can do something with it!

Enter the Point Data Abstraction Library (PDAL). My new ASCII headers describe PDAL dimensions. My cleaned numbers left no doubt about what an extra 0 means. With a quick pipeline I turned all my data into LAS files, reprojected from lat/long to metres in EPSG:3577 – GDA94 / Australian Albers. I used this because it was the only cartesian projection which could feasibly swallow the whole region without any weirdness (for example writing out things that are in UTM44 as if they were in UTM48).

But wait – what? Why LAS?

…because it can be read by PotreeConverter. You can likely see where I’m headed now. After some wrangling to get PotreeConverter working on CentOS, I burned a bunch of RAM and CPU time to generate octree indexes of all 4ish billion bathymetry points. With some scaling fun and experimenting with just how much data I could throw in at once, I eventually rendered out and glued together visualisation. A screenshot is below. The interactive visualisation is no longer live, I’m working on a new hosting site and a better version – and will update this post when done!

It’s not perfect by any stretch, but you can browse the entire dataset in 3D in your (Chrome or Firefox) web browser. It is a Potree 1.5 viewer, and doesn’t like Safari  (yet). Given the number of individual indexes being loaded, it also sometimes overwhelms the capacity of a browser to open connection sockets. Ideally I’d have a nice basemap and some more context in there as well, but as a proof-of-concept it isn’t bad!

The whole thing is built with a mix of pretty simple freely available tools – from linux built-ins to cutting-edge webGL.

Why did I do all this, and could it be better?

I’m grateful to my employer for the time I get to spend on side projects like this, but there is a real purpose to it. When data get massive, it makes sense to shift big processing off of desktops and onto dedicated high performance compute hardware. Point cloud data are on the cusp of being workable – data as a service in the raster domain has been around for a while.

Point clouds have some special considerations, the primary one being lack of data topology. Creating visualisations like this demonstrates one way of organising data, and makes light of the traditionally difficult task of investigating entire surveys. It also makes us ask hard questions about how to store the data on disk, and how to generate products from the original datasets on demand – without storing derivatives.

For this project, it would have been great to skip the LAS creation part and render straight from an underlying binary data source to the octree used for visualisation. And then, run an on-demand product delivery (rasterisation/gridding, analytical products) from the same data store. In it’s current form this is not possible. As-is, the data are designed for users to make their own copies and then do stuff – which is limited by the size of local compute, or the size of your public cloud account budget.

What next?

Prepare for testing with a point cloud data delivery service I’m working on. Tune in to FOSS4G (August, Boston) to hear about that.

In the meantime, play with it yourself! You can obtain the data shown here for free – it is at: I used the data in ‘bathymetry processed -> clean point clouds’ (direct link). The data are also held on AWS (see the Geoscience Australia data description) if that’s easier for you. Tinker with it, have look at the viewer, see what you can come up with!

Oh, and let me know if you spot anything unusual. WebGL lets all kinds of things happen in the ocean deeps


Thanks to Geoscience Australia for releasing the dataset to the public! And thanks to the National Computational Infrastructure (NCI) for my time and the hardware used to develop this technology demonstration.

The MH370 dataset is hosted on NCI hardware. However – I used the same methods for data access anyone in the general public would (my download stats were a big blip for a while..)

EGU 2017

I’m going to Vienna next week for EGU17 – which is awesome, and I’m extraordinarily excited under my ‘I’m a badass science ninja’ veneer! I’ll be listening a lot, and talking a little bit about work I’m doing at the National Computational Infrastructure on some ideas for data services around massive, dense point data – standing in front of the poster pictured above.

Hope to see you there!

ps – I promise, I’ll write about sea ice real soon. I need to – for Pint of Science (Australia)  next month…

Drifting sea ice and 3D photogrammetry

3D photogrammetry has been a hobby horse for ages, and I’ve been really excited to watch it grow from an experimental idea [1] to a full-blown industrial tool. It took a really short time from research to production for this stuff. Agisoft Photoscan turned up in 2009 or 2010, and we all went nuts! It is cheap, super effective, and cross-platform. And then along came a bunch of others.

Back to the topic – for my PhD research I was tinkering with the method for a long time, since I had a lot of airborne imagery to play with. I started by handrolling Bundler + PMVS, and then my University acquired a Photoscan Pro license – which made my life a lot simpler!

My question at the time was: how can we apply this to sea ice? or can we at all?

The answer is yes! Some early thoughts and experiments are outlined here, and below are  some results from my doctoral thesis, using imagery captured on a 2012 research voyage (SIPEX II). Firstly, a scene overview because it looks great:

Next, stacking up elevations with in situ measurements from drill holes from a 100m drill hole line on the ice. The constant offset is a result of less-than-great heighting in the local survey – I focussed heavily on getting horizontal measurements right, at the expense of height. Lesson learned for next time!

And finally, checking that we’re looking good in 3D, using a distributed set of drill holes to validate the heights we get from photogrammetric modelling. All looks good except site 7 – which is likely a transcription error.

How did we manage all this? In 2012 I deployed a robotic total station and a farm of GPS receivers on drifting ice, and used them to make up a lagrangian reference frame (fancy word for ‘reference frame which moves with the ice’) – so we can measure everything in cartesian (XYZ) coordinates relative to the ice floe, as well as using displacement and rotation observations to translate world-coordinates to the local frame and vice-versa. Here’s a snapshot:

I don’t know if this will ever make it to publication outside my thesis – I think the method should be applied to bigger science questions rather than just saying ‘the method works and we can publish because nobody put Antarctica in the title yet’ – because we know that from other works already (see [2] for just one example).

So what science questions would I ask? Here’s a shortlist:

  • can we use this method to extract ridge shapes and orientations in detail?
  • can we differentiate between a snow dune and a ridge using image + topographic characteristics?

These are hard to answer with lower-density LiDAR – and are really important for improving models of snow depth on sea ice (eg [3]).

For most effective deployment, this work really needs to be done alongside a raft of in situ observations – previous experience with big aircraft shows that it is really hard to accurately reference moving things from a ship. That’s a story for beers 🙂



[2] Nolan, M., Larsen, C., and Sturm, M.: Mapping snow depth from manned aircraft on landscape scales at centimeter resolution using structure-from-motion photogrammetry, The Cryosphere, 9, 1445-1463, doi:10.5194/tc-9-1445-2015, 2015

[3] Steer, A., et al., Estimating small-scale snow depth and ice thickness from total freeboard for East Antarctic sea ice. Deep-Sea Res. II (2016),

Data sources


Dr Jan Lieser (University of Tasmania) instigated the project which collected the imagery used here, let me propose all kinds of wild ideas for it, and was instrumental in getting my PhD done. Dr Christopher Watson (University of Tasmania) provided invaluable advice on surveying data collection, played a massive part in my education on geodesy and surveying, and also was instrumental in getting my PhD done. Dr Petra Heil and Dr Robert Massom (Australian Antarctic Division) trusted me to run logistics, operate a brand new (never done before in the AAD program) surveying operation and collect the right data on a multi-million dollar investment.  The AAD engineering team got all the instruments talking to each other and battled aircraft certification engineers to get it all in the air. Helicopter Resources provided safe and reliable air transport for instruments and operators; the management and ship’s crew aboard RSV Aurora Australis kept everyone safe, relatively happy, and didn’t get too grumpy when I pushed the operational boundaries too far on the ice; and Walch Optics (Hobart) worked hard to make sure the total station exercise went smoothly.






The LiDAR uncertainty budget part III: data requirements

Part I of this series described the LiDAR georeferencing process (for one, 2D, single return scanner). Part II showed how point uncertainties are computed and why they are useful.

What hasn’t been covered so far is what data you need to do this stuff! Here is your shopping list:

Scanner data

  • Raw LiDAR ranges
  • Scanner mirror angles
  • Waveforms with times if you are going to decide range-to-returns for yourself
  • Intensity
  • anything else (RGB, extra bands, etc)

Essentially you need the full data package from the LiDAR including any corrections for mirror wobble or other sensor oddness – and a way to decode it. You also need engineering  drawings which show the LiDAR instrument’s reference point.

Aircraft trajectory data

  • GPS/GNSS based aircraft trajectory (positions in space)
  • Attitude data  (relationship between the aircraft’s orientation and some reference frame) normally collected by a ‘strapdown navigator’ – which is an inertial motion sensor (often called IMU), and often integrated with GNSS.

Hopefully you don’t have to do the work combining these yourself, but you need a high-temporal-resolution data stream of position (XYZ) and attitude (heading pitch roll, or omega phi kappa, or a quaternion describing rotation of the airframe relative to the earth). This needs to be as accurate as possible (post-processed dual-frequency GPS with preferably tightly-coupled processing of IMU and GPS observations).

Navigation instrument and aircraft data

  • Engineering drawings which show the navigation instrument’s reference point
  • The lever arm between the navigation instrument reference point and the LIDAR reference point, in IMU-frame-coordinates
  • The rotation matrix between the IMU and the LIDAR coordinate systems
  • If necessary, the rotation matrix describing the difference between the aircraft’s body frame and the IMU’s internal coordinate system (quite often, this gets simplified by assuming that the IMU frame and the aircraft frame are equivalent – but many operators still account for this – which is awesome!)
  • Any boresight misalignment information you can get (tiny angles representing the difference between how we think instruments were mounted and how they actually were mounted)

Of course, you could also just push your contractors to deliver point-by-point uncertainty or some per-point quality factor as part of the end product.

This series might have one more post, on how to deal with these data – or more accurately a tutorial on how not to. And then I run out of expertise – brain = dumped, and over to people with newer, more up to date experience.

The LiDAR uncertainty budget II: computing uncertainties

In part 1, we looked at one way that a LIDAR point is created. Just to recap, we have 14 parameters (for the 2D scanner used in this example) each with their own uncertainty. Now, we work out how to determine the geolocation uncertainty of our points.

First, let’s talk about what those uncertainties are. An obvious target is GPS positioning uncertainty, and this is the source that comes to mind first. Using a dual-frequency GPS gives the advantage of sub-decimetre positioning, after some post-processing is done. Merging those positions with observations from the IMU constrains those positions even further.

For a quick mental picture of how this works, consider that the navigation unit is collecting a GPS fix every half a second. The accelerometers which measure pitch, roll and heading take a sample every 1/250th of a second. They also keep track of how far they think they’ve moved since the last GPS fix – plus the rate of motion is tracked, putting a limit on how far it is possible to move between each GPS fix. The navigation unit compares the two data sources. If the GPS fix is wildly unexpected, a correction is added to GPS positions and we carry on.

So we get pretty accurate positions (~5cm).

But is that the uncertainty of our point positions? Nope. There’s more. The laser instrument comes with specifications about how accurate laser ranging is, and how accurate it’s angular encoder is.

Even more, the navigation device has specifications about how accurate it’s accelerometers are, and all of these uncertainties contribute! How?

Variance-covariance propagation to the rescue

Glennie [1] and Schaer [2] used Variance-covariance propagation to estimate the uncertainty in geolocation of LiDAR points. This sounds wildly complex, but at it’s root is a simple idea:

uncert(thing1 + thing2) = \sqrt{uncert(thing1)^2 + uncert(thing2)^2}

See that? we figured out the uncertainty of a thing made from adding thing1 and thing2, by knowing what the uncertainties of things1 and thing2 are.

Going from simple addition to a few matrix multiplications was a quantum leap, handled neatly by symbolic math toolboxes – but only after I nearly quit my PhD trying to write out the uncertainty equation by hand.

Here is the uncertainty equation for LiDAR points, whereU is the uncertainty:

U = FC_uF^t

What!! Is that all? I’m joking, right?

Nearly. F is a 14-element vector containing partial derivatives of each the parameters that go into the LiDAR georeferencing equation (the so-called Jacobians). C is a 14 x 14 matrix on which the uncertainties associated with each element occupy the diagonal.

Writing this set of equations out by hand was understandably a nightmare, but more clever folks than myself have achieved it – and while I certainly learned a lot about linear algebra in this process, I took advice from an old astronomer and used a symbolic maths toolbox to derive the Jacobians.

…which meant that the work actually got done, and I could move on with research!

Now my brain is fully exploded – what do uncertainties look like?

Glennie [1] and Schaer [2] both report that at some point, angular motion uncertainty overtakes GPS position uncertainty as the primary source of doubt about where a point is. Fortunately I found the same thing. Given the inherent noise of the system I was using, this occurred pretty quickly. In the Kalman filter which integrates GPS and IMU observations, a jittery IMU is assigned a higher uncertainty at each epoch. This makes sense, but also means that angular uncertainties need to be minimised (for example, by flying instruments in a very quiet aircraft or an electric-powered UAS)

I made the following map years ago to check that I was getting georeferencing right, and also getting uncertainty estimates working properly. 

It could be prettier, but you see how the components all behave – across-track and ‘up’ uncertainties are dominated by the angular component not far off nadir. Along-track uncertainties are more consistent across-track, because the angular measurement components (aircraft pitch and a bit of yaw) are less variable.

The sample below shows LiDAR point elevation uncertainties (relative to an ITRF08 ellipsoid) during level flight over sea ice. At instrument nadir, height uncertainty is more or less equivalent to instrument positioning uncertainty. Increasing uncertainties toward swath edges are a function of angular measurement uncertainty.

But normally LiDAR surveys come with an averaged accuracy level. Why bother?

i. why I bothered:

In a commercial survey the accuracy figure is determined mostly by comparing LiDAR points with ground control data – and the more ground control there is, the better (you have more comparison points and can make a stronger estimate of the survey accuracy).

Over sea ice this is impossible. I was also using these points as input for an empirical model which attempts to estimate sea-ice thickness. As such, I needed to also propagate uncertainties from my LiDAR points through the model to sea-ice thickness estimates.

In other words, I didn’t want to guess what the uncertainties in my thickness estimates were. As far as plausible, I need to know what the input uncertainty is for each thickness estimate is – so every single point. It’s nonsensical to suggest that every observation and therefore every thickness estimate comes with the same level of uncertainty.

Here is another map from my thesis. It’s pretty dense, but shows the process in action:

The top panel is LIDAR ‘height above sea level’ for sea ice. Orange points are ‘sea level reference’ markers, and the grey patch highlights an intensive survey plot. The second panel is uncertainty associated with each height measurment. In panel  three we see modelled sea ice thickness (I’ll write another post about that later), and the final panel shows the uncertainty associated with each thickness estimate. Thickness uncertainties are greater than LIDAR height uncertainties because we’re also accounting for uncertainties in each of the other model parameters (LIDAR elevations are just one). So, when I get to publishing sea-ice thickness estimates, I can put really well made error bars around them!

ii. why you, as a commercial surveyor or an agency contracting a LiDAR survey or a LIDAR end user should bother:

The computation of uncertainties is straightforward and quick once the initial figuring of the Jacobians is done – and these only need to be recomputed when you change your instrument configuration. HeliMap (Switzerland) do it on-the-fly and compute a survey quality metric (see Schaer et al, 2007 [2]) which allows them to repeat any ‘out of tolerance’ areas before they land. Getting an aircraft in the air is hard, and keeping it there for an extra half an hour is easy – so this capability is really useful in terms of minimising costs to both contracting agencies and surveyors. This analysis in conjunction with my earlier post on ‘where exactly to measure in LiDAR heights‘ shows you where you can assign greater confidence to a set of LiDAR points.

It’s also a great way to answer some questions – for example are surveyors flying over GCP’s at nadir, and therefore failing to meet accuracy requirements off-nadir? (this is paranoid ‘all the world is evil’ me speaking – I’m sure surveyors are aware of this stuff and collect off-nadir ground control matches as well). Are critical parts of a survey being captured off-nadir, when it would be really useful to get the best possible data over them? (this has implications for flight planning). As a surveyor, this type of thinking will give you fewer ‘go back and repeat’ jobs – and as a contracting agency, you might spend a bit more on modified flight plans, but not a lot more to get actually really great data instead of signing off and then getting grief from end users.

As an end user of LiDAR products, If you’re looking for data quality thresholds – for example ‘points with noise < N m over a flat surface’, this type of analysis will help you out. I’ve also talked to a number of end users who wonder about noisy flight overlaps, and why some data don’t appear to be well-behaved. Again, having some quality metric around each point will help an end-user determine which data are useful, and which should be left alone.


I certainly stand on the shoulders of giants here, and still incur brain-melting when I try to come at these concepts from first-principles (linear algebra is still hard for me!). However, the idea of being able to estimate in an a-priori quality metric is, in my mind, really useful.

I don’t have much contact with commercial vendors, so I can only say ‘this is a really great idea, do some maths and make your LiDAR life easer!’.

I implemented this work in MATLAB, and you can find it here:

With some good fortune it will be fully re-implemented in Python sometime this year. Feel free to clone the repository and go for it. Community efforts rock!

And again, refer to these works. I’ll see if I can find any more recent updates, and would really appreciate hearing about any recent work on this stuff:

[1] Glennie, C. (2007). Rigorous 3D error analysis of kinematic scanning LIDAR systems. Journal of Applied Geodesy, 1, 147–157. (accessed 19 January 2017)

[2] Schaer, P., Skaloud, J., Landtwing, S., & Legat, K. (2007). Accuracy estimation for laser point cloud including scanning geometry. In Mobile Mapping Symposium 2007, Padova. (accessed 19 January 2017)

The LiDAR uncertainty budget I: georeferencing points

This is part 1 of 2, explaining how uncertainties in LiDAR point geolocation can be estimated for one type of scanning system. We know LiDAR observations of elevation/range are not exact (see this post), but a critical question of much interest to LiDAR users is ‘how exact are the measurements I have’?

As an end-used of LiDAR data I get a bunch of metadata that is provided by surveyors. One of the key things I look for are the accuracy estimates. Usually these come as some uncertainty in East, North and Up measurements, in metres, relative to the spatial reference system the point measurements are expressed in. What I don’t get is any information about how these figures are arrived at, or if they apply equally to every point. It’s a pretty crude measure.

As a LiDAR maker, I was concerned with the uncertainty of each single point – particularly height – because I use these data to feed a model for estimating sea ice thickness. I also need to feed in an uncertainty – so that I can put some boundaries around how good my sea ice thickness estimate is. However, there was no way of doing so in an off the shelf software package – so I implemented the LiDAR geoferencing equations and a variance-covariance propagation method for them in MATLAB  and used these. This was a choice of convenience at the time, and I’m now slowly porting my code to Python, so that you don’t need a license to make LiDAR points and figure out their geolocation uncertainties.

My work was based on two pieces of fundamental research: Craig Glennie’s work on rigorous propagation of uncertainties in 3D [1], and Phillip Schaer’s implementation of the same equations [2]. Assuming that we have a 2D scanner, the LiDAR georeferencing equation is:

\begin{bmatrix}  x \\  y \\  z \\  \end{bmatrix}^m=  \begin{bmatrix}  X\\  Y\\  Z\\  \end{bmatrix} + R^b_m  \left[  R^b_s  \rho  \left(  \begin{gathered}  sin\Theta\\  0\\  cos\Theta  \end{gathered}  \right)  +  \begin{bmatrix}  a_x\\  a_y\\  a_z\\  \end{bmatrix}^b  \right]

The first term on the right is the GPS position of the vehicle carrying the LiDAR:

\begin{bmatrix}  X\\  Y\\  Z\\  \end{bmatrix}

The next term is made up of a few things. Here we have points in LiDAR scanner coordinates:

\rho\left(  \begin{gathered}  sin\Theta\\  0\\  cos\Theta  \end{gathered}  \right)

…which means ‘range from scanner to target’ (\rho ) multiplied by sin\theta to give an X coordinate and cos\theta to give a Z coordinate of the point measured.

Note that there is no Y coordinate! This is a 2D scanner, observing an X axis (across track) and a Z axis (from ground to scanner). The Y coordinate is provided by the forward motion of a vehicle, in this case a helicopter.

For a 3D scanner, or a canner with an elliptical scan pattern, there will be additional terms describing where a point lies in the LiDAR frame. Whatever term is used at this point, the product is the position of a reflection-causing object in the LiDAR instrument coordinate system which is rotated to the coordinate system of the vehicle’s navigation device using the matrix:


The point observed also has a lever arm offset added (the distance in 3 axes between the navigation device’s reference point and the LiDAR’s reference point), so we pretend we’re putting our navigation device exactly on the LiDAR instrument reference point:

\begin{bmatrix}  a_x\\  a_y\\  a_z\\  \end{bmatrix}^b

This mess of terms is finally rotated to a mapping frame using euler angles in three axes (essentially heading, pitch, roll) recorded by the navigation device:


…and added to the GPS coordinates of the vehicle (which are really the GPS coordinates of the navigation system’s reference point).

There are bunch of terms there – 14 separate parameters which go into producing a LiDAR point, and that’s neglecting beam divergence, and only showing single returns. Sounds crazy – but the computation is actually pretty efficient.

Here’s a cute diagram of the scanner system I was using – made from a 3D laser scan and some engineering drawings. How’s that? Using a 3D scanner to measure a 2D scanner. Even better, the scan was done on the helipad of a ship in the East Antarctic pack ice zone!

You can see there the relationships I’ve described above. The red box is our navigation device – a dual-GPS, three-axis-IMU strapdown navigator, which provides us with the relationship between the aircraft body and the world. The green cylinder is the LiDAR, which provides us ranges and angles in its own coordinate system. The offset between them is the lever arm, and the orientation difference between the axes of the two instruments is the boresight matrix.

Now consider that each of those parameters from each of those instruments and the relationships between them have some uncertainty associated with them, which contributes to the overall uncertainty about the geolocation of a given liDAR point.

Mind warped yet? Mine too. We’re all exhausted from numbers now, so part 2 will examine how we take all of that stuff and determine, for every point, a geolocation uncertainty.

Feel free to ask questions, suggest corrections, or suggest better ways to clarify some of the points here.

There’s some code implementing this equation here: – it’s Apache 2.0 licensed so feel free to fork the code and make pull requests to get it all working, robust and community-driven!

Meanwhile, read these excellent resources:

[1] Glennie, C. (2007). Rigorous 3D error analysis of kinematic scanning LIDAR systems. Journal of Applied Geodesy, 1, 147–157. (accessed 19 January 2017)

[2] Schaer, P., Skaloud, J., Landtwing, S., & Legat, K. (2007). Accuracy estimation for laser point cloud including scanning geometry. In Mobile Mapping Symposium 2007, Padova. (accessed 19 January 2017)