Tag Archives: python


An interesting github repo to keep an eye on – a way to run a private LLM instance and train it on a local document repository – with no internet connection required (once it’s installed!). It uses GPT4all, LlamaCpp, LangChain and a local vector database (not Pinecone, I assume).


I can imagine lots of use cases for this: training upon your own files could enable a useful personal assistant that is ~private~. Imagine it as a memory assistant for all those millions of forgotten files on your computer.

Or as a Zotero assistant for working through academic papers and making connections.

Or as a way of making searchable documentation of software – one could scrape all the Unreal Engine documentation using beautifulsoup and then train on it.

Or for an Arts organisation (like ANAT!) – train on years of office data to create a kind of corporate memory.

Of course, it would still have the hallucination problem, but this could be mitigated by combining it with some sort of web search or other confirmation mechanism.


Visualising Satellite data using Google Colab

Having spent a few hours reading documentation and having an ongoing conversation with chatGPT, I’m getting the hang of the hdf5 file structure and can now visualise some multispectral data in Google Colab:

from google.colab import drive

import h5py
import numpy as np
import matplotlib.pyplot as plt

# Get the HDF5 file from your Google Drive
file_id = '/content/drive/MyDrive/DATA/file_name.he5'

with h5py.File(file_id, "r") as f:
# List all groups
print("Keys: %s" % f.keys())
a_group_key = list(f.keys())[0]

# Get the data
data = list(f[a_group_key])

#This gives us some idea about the groups/keys in the hdf file and some idea about the datasets contained therein - but will become more detailed as we go along

# Open the HDF5 file
with h5py.File(file_id, 'r') as f:
# Open the data field
#currently this is hard-coded as I know from hdfView that this is the path I want to look at - but really we want to find this programmatically.
data_field = f['/path_to/Data Fields/Cube']
# Print the shape of the data field
print(f'Shape: {data_field.attrs}')
print(f'Shape: {data_field.dtype}')
print(f'Shape: {data_field.shape}')
#This gives us some idea about the data cube we are examining - such as its attributes, data type and shape (typically rows and columns) - it'll print them to output

# Open the HDF5 file
with h5py.File(file_id, 'r') as f:
# Open the data field
data_field = f['/path_to/Data Fields/Cube']
# Get the data and reshape it to 2D
data = np.array(data_field[:]).reshape(data_field.shape[0], data_field.shape[1])
# Scale the data to fit within an 800x800 pixels
data = np.uint8(np.interp(data, (data.min(), data.max()), (0, 255)))
# Create a figure with the specified size
fig = plt.figure(figsize=(8, 8))
# Add the data to the figure
plt.imshow(data, cmap='viridis')
# Display the figure

Satellite data plotted with viridis colourmap

Next steps involve developing a way of iteratively traversing the hdf5 directory structure, so that I can identify relevant data fields within the file – they’re not explicitly identified as ‘image files’. This can be done using h5py functions. Another thing to explore is GDAL: once I’ve identified the correct data in geolocation fields, it should become possible to output geotiffs or UE-friendly png files with geolocation metadata.

Here’s some other matplotlib colourmaps applied to the same dataset.

It’s all pretty crude at this point – just figuring out how this stuff might work.

ChatGPT, Google Colab and Hyperspectral Data

It occurs to me that, rather than going to all the hassle of figuring out how to write all the code for colab myself, that I can use my chatGPT assistant to do it for me:

Q: write a script for google colab to extract images from an hdf5 file located upon my google drive

Chat GPT:

Here is an example script that demonstrates how to extract images from an HDF5 file located on your Google Drive using the h5py library in Python, and the PyDrive library to interact with Google Drive:

# Install the PyDrive library
!pip install -U PyDrive

# Import the necessary libraries
import h5py
import numpy as np
from pydrive.auth import GoogleAuth
from pydrive.drive import GoogleDrive
from google.colab import auth
from oauth2client.client import GoogleCredentials

# Authenticate and create the PyDrive client
gauth = GoogleAuth()
gauth.credentials = GoogleCredentials.get_application_default()
drive = GoogleDrive(gauth)

# Get the HDF5 file from your Google Drive
file_id = '<YOUR FILE ID>'
downloaded = drive.CreateFile({'id': file_id})

# Open the HDF5 file
with h5py.File('hyperspectral_image_data.hdf5', 'r') as hdf5_file:
# Access the dataset containing the images
images = hdf5_file['images']
# Extract an image by its index
image = np.array(images[0])

Make sure to replace <YOUR FILE ID> with the actual file ID of the HDF5 file on your Google Drive. You can find the file ID by opening the file in Google Drive and looking at the URL – the file ID is the string of letters and numbers after the last forward slash.

You can also use the gdal_translate command with the PyDrive library as well, just add the following lines of code to download the file and then use gdal_translate command to convert to tiff or geotiff

# Get the HDF5 file from your Google Drive
file_id = '<YOUR FILE ID>'
downloaded = drive.CreateFile({'id': file_id})
downloaded.GetContentFile('hyperspectral_image_data.hdf ')#Use gdal_translate
!gdal_translate -of GTiff HDF5:"hyperspectral_image_data.hdf5"://dataset output.tif

You can then use any image processing library to open and read the image.

Lots of interesting things to unpack in this advice – and lots of errors as I expect to debug the code.

So, as a first step, let’s see if we can just get access to the file and some basic metadata.

# Install the PyDrive library
!pip install -U PyDrive
# Import the necessary libraries
import h5py
import numpy as np
from pydrive.auth import GoogleAuth
from pydrive.drive import GoogleDrive
from google.colab import auth
from oauth2client.client import GoogleCredentials
# Authenticate and create the PyDrive client
gauth = GoogleAuth()
gauth.credentials = GoogleCredentials.get_application_default()
drive = GoogleDrive(gauth)
# Get the HDF5 file from your Google Drive
file_id = '/content/drive/MyDrive/DATA/XXXXXXX.he5'
with h5py.File(file_id, "r") as f:
    # List all groups
    print("Keys: %s" % f.keys())
    a_group_key = list(f.keys())[0]
    # Get the data
    data = list(f[a_group_key])


Success! We can see the HDF keys returned below:

Keys: <KeysViewHDF5 ['HDFEOS', 'HDFEOS INFORMATION', 'Info', 'KDP_AUX']>

These match the top-level directory structure revealed by HDFView:

This is the first step in identifying the file contents, before we can drill down into the file structure to identify a whole bunch of parameters about the datacube that we can do interesting things with 🙂

Working with ChatGPT & Hyperspectral Data

Thus far I have been playing around with HDFView to extract hdf5 image data – but I am very interested in working with advanced machine learning systems and developing processes to interface them with Unreal Engine and on-line data sources. It seems the most tractable way to do this is through the use of a Python API – which is pretty much the lingua franca of scientific computing.

As I am a neophyte with using hyperspectral image data, let’s use ChatGPT to find out what might be an ‘easy’ or ‘best’ way to access image data within an hdf5 file:

Q: what is the best way to extract images from an hdf5 file containing hyperspectral image data?


One way to extract images from an HDF5 file containing hyperspectral image data is to use the h5py library in Python. This library allows you to open the HDF5 file and access the data stored within it, including the images. You can then use the numpy library to manipulate and extract the images as needed. Here is an example of how you might extract an image from an HDF5 file using h5py:

import h5py
import numpy as np
# Open the HDF5 file
with h5py.File(‘hyperspectral_image_data.hdf5’, ‘r’) as hdf5_file:
# Access the dataset containing the images
images = hdf5_file[‘images’]
# Extract an image by its index
image = np.array(images[0])

Another way is to use the gdal library in python, specifically with the gdal_translate utility, which can convert HDF5 to other raster formats like tiff or geotiff, and then use any image processing library to open and read the image.

gdal_translate -of GTiff HDF5:"file.hdf5"://dataset output.tif

In this way you will be able to use any library of your preference to read and extract the images.

I have no idea currently whether this will work, so lets look up the relevant documentation:

Both look promising – and has saved me a heap of time looking for ways to do it!

The first outputs images as a numPy array – meaning that we can examine/export each image by its index – which would be useful for selecting for certain  λ (wavelength)) values and conducting operations upon them.

The second uses GDAL (Geospatial Data Abstraction Library), which provides powerful utilities for the translation of geospatial metadata – enabling correct geolocation of the hyperspectral image data, for instance.

So perhaps a combination of both will be useful as we proceed.

But of course, any code generated by chatGPT or OpenAI Codex or other AI assistants must be taken with several grains of salt. For instance – a recent study by MIT shows that users may write more insecure code when working with an AI code assistant (https://doi.org/10.48550/arXiv.2211.03622). Perhaps there are whole API calls and phrases that are hallucinated? I simply don’t know at this stage.

So – my next step will be to fire up a python environment – probably Google Colab or Anaconda and see what happens.

A nice overview of Codex here:

OpenAI Codex Live Demo

No Description

Scripting the Unreal Editor Using Python

An important aspect of our visualisation pipeline involves chaining together Unreal Engine and Python workflows.

The UE 5.1 Python introductory documentation lists some common examples of what might be done:

  • Construct larger-scale asset management pipelines or workflows that tie the Unreal Editor to other 3D applications that you use in your organization.
  • Automate time-consuming Asset management tasks in the Unreal Editor, like generating Levels of Detail (LODs) for Static Meshes.
  • Procedurally lay out content in a Level.
  • Control the Unreal Editor from UIs that you create yourself in Python.

Extensive UE Python API documentation is listed here.

A useful introductory video on Youtube: