Colouring point clouds with PDAL

Late last year I published a couple of full-RGB lidar datasets made from open data. While both had plenty of ways to generate false-colour interpretive insights (classification, elevation, intensity, flight line), seeing the data in ‘real colour’ lets us identify with it more easily. It helps us find both analytical insight and context.

As an example, the next image is a colourised version of the title shot. Which do you find easier to ‘get’?

The points shown here didn’t come with colour attached, I added it from near-coincident airborne imagery, using PDAL’s filters.colorization method. The short version is this:

Given a raster with band values in cells, and a lidar point cloud in the same coordinate system, filters.colorization searches the XY coordinates of each point in the raster, and applies a configurable set of colour bands it finds to the point.

Importantly, filters.colorization does not raytrace – you can’t currently intersect off-nadir imagery with, say, building walls. It assumes top down points and top down imagery. We’d need to know a lot more about the relative orientations of the camera and the lidar to do anything else.

It also assumes that your imagery is also orthorectified – that is, warped such that it is a ‘top down’ view of the world, rather than oblique. Things like building sides mess things up a bit – you might find that points you think should be on top of a structure are coloured as (for example) wall features. I’ve seen this a lot with satellite-based imagery applied to point cloud data.

With all that in mind, let’s press on and take a look at colourisation methods. I found two different scenarios to work with – tiled images, tiled lidar; and mosaicked imagery, tiled lidar. Here’s how I approached both.

Scenario A: corresponding imagery and tiles

For the ACT dataset, both lidar and orthophotos were already chopped into corresponding 2 kilometre by 2 kilometre tiles In this case, the first task was to construct a pipeline and save it as colorizationpipeline.json:

    "pipeline": [
            "type": "filters.colorization",
            "raster": “input-raster.tif"
            "type": "filters.range",
            "limits": "Red[1:]"
            "type": "writers.las",
            "minor_version": "2",
            "dataformat_id": "3",

Next, identify common patterns in tile naming, and use those to construct a script which looped over all the lidar tiles, and all the imagery, applying colours from the appropriate imagery tile to the lidar.


for f in ./Orthometric/*.LAS
    #extract a file name
    file=$(basename "$f")

    echo $file
    #extract the parts which help us match lidar and image tiles

    echo $tileindex
    #run pdal
    pdal pipeline colorizationpipeline.json \
              —readers.las.filename="./Orthometric/$file” \
              —filters.colorization.raster="./2kmx2km_RGB_TIFF_Tiles/ACT2015-RGB-0_2_$tileindex.tif” \


…run the script, grab a coffee, and wait. Note also, this task is embarrassingly parallel, as in you could colourise as many tiles as you have processing cores at once, provided you have enough memory-per-core to fit the uncompressed lidar tile and raster.

You can explore it all interactively here:, with a screenshot below..

Scenario B: lidar tiles and one giant raster

In Australia, we have competitions to see who can make the most gigantic mosaic images containing the biggest number of pixels. It’s part of a national obsession with biggestness. The result is that sometimes we end up with huge rasters instead of tilesets. For the Murray Darling Basin, I ended up with a 90GB ECW file which I converted using Klokantech’s GDAL-with-everything docker image into an even larger GeoTIFF.

My PDAL pipeline didn’t change much – but my raster processing approach did. In this case, because I didn’t have hundreds of gigabytes of RAM to consume I went for a subsetting approach.

I clipped segments from the raster using PDAL tile boundaries and GDAL magic, wrote out little geotiffs, used them to colourise points, and deleted them. If I were to do it again I’d use either GDAL’s VRT virtual file system – or potentially wrap the whole thing in Python and use in-memory rasters.

Here’s the shell script for chipping up the massive mosaic, and colourising tiles:


# loop over all my laz files (this should be parallel, tsk)
for i in $lazdir/*.laz;
     #collect a filename
     thefile=$(basename $i .laz)

     #print it’s name - as a visual check
     echo $thefile

     #collect a pdal boundingbox and xmin/xmax/ymin/ymax
     # note that the bbox limits are padded by 10m, to ensure that
     # the raster fully covers all lidar points.

     pdal info --summary $i > pdalinfotemp.json
     minx=$(cat pdalinfotemp.json | jq '.summary.bounds.minx')
     minxb=$(echo $minx-10 | bc)
     maxx=$(cat pdalinfotemp.json | jq '.summary.bounds.maxx')
     maxxb=$(echo $maxx+10 | bc)
     miny=$(cat pdalinfotemp.json | jq '.summary.bounds.miny')
     minyb=$(echo $miny-10 | bc)
     maxy=$(cat pdalinfotemp.json | jq '.summary.bounds.maxy')
     maxyb=$(echo $maxy+10 | bc)
     #remove the json info file
     rm ./pdalinfotemp.json

     #print out minmax
     echo $minxb $minyb $maxxb $maxyb
     gdalwarp -te $minxb $minyb $maxxb $maxyb $bigmosaic tiles1km/$thefile.tif
     #flags for SRS translation if we want to do that
     #-s_srs EPSG:7854 -t_srs EPSG:28354

     # for each of our cut out images, colourise some points!
     pdal pipeline colour-mdb.json --readers.las.filename=$i \
                                   --filters.colorization.raster=$thefile.tif \

    # clean up the image chips, if we don’t need them anymore
    rm $thefile.tif


…and here’s the PDAL pipeline:

    "pipeline": [
            "type": "filters.colorization",
            "raster": "inraster.tiff"
            "type": "filters.range",
            "limits": "Red[1:]"
            "type": "writers.las",
            "compression": "true",
            "minor_version": "4",

You can explore the result interactively here:


Making colour images is awesome, we understand them a lot more easily than black and white or false colour things (note – in both the datasets here you can still use tagged colours, like classification or intensity to inspect the data. Adding ‘real colour’ gives us context, and it’s pretty!

…and it ain’t that hard – both these datasets contain about 30 billion points each, and collectively represent over 9 000 square kilometres of land. It should be way harder to process them like this, right?

Nope. Not with the right tools and a bit of code-fu.

As always, much gratitude to the team behind PDAL and Entwine for making this possible – and the open source geospatial vision which creates the motivation to do so.

The sales pitch

Spatialised is a fully independent consulting business. The tutorials and write-ups here are free for you to use, without ads or tracking.

If you find the content here useful to your business or research or billion dollar startup idea, you can support production of ideas and open source geo-recipes via Paypal or Patreon; or hire me to do stuff; or hire me to talk about stuff; or buy stuff from the store; or just give me a seat on your advisory board and a 1% stake.