Mirror, Mirror on the Wall: Working with AI

Mirror, mirror on the wall – imaged generated by Stable Diffusion. Image credit: Peter Morse

It’s important when working with the current generation of AI systems (principally Large Language Models or LLMs) to approach them with caution and circumspection. The reason for this is essentially human fallibility – both by the end-user and by the system itself.

The computer science behind generative AI is moving so fast that innovations emerge every month. How researchers choose to use them will dictate their, and our, future.

(Stokel-Walker and Van Noorden, 2023: 214-16)

The system I’m currently using – ChatGPT – is genuinely revolutionary in terms of how helpful it is for programming assistance – it has saved me hours and hours of time working out some approaches I can develop to programming problems. The underpinning technology is, in fact, several years old, but the way it has been implemented and its availability is what sets it apart (expect many more of these sorts of systems over the next few years).

But it has also hallucinated things that simply don’t exist – and presented them in a convincing matter-of-fact way:

Q: How can I import a multispectral geotiff into Unreal Engine 5 and break it into different layers for selected wavebands all correctly geolocated?

ChatGPT: To import a multispectral GeoTIFF into Unreal Engine 5 and break it into different layers for selected wavebands, you will need to use a plugin or script that is capable of handling multispectral imagery. One example is the “GeoTIFF Import” plugin, which is available on the Unreal Engine Marketplace. This plugin allows you to import a GeoTIFF file and separate it into different layers for each band.

Once you’ve imported the GeoTIFF, you can use the “Landscape” tool to create a terrain that corresponds to the image. You can then use the “Landscape Layer Blend” node in the material editor to control the blending between the different layers of the image.

To correctly geolocate the image, you will need to use the “World Aligned Blocking” feature, which allows you to align the image with the terrain using GPS coordinates. In the Unreal Engine, you can use the “World Settings” panel to set the GPS coordinates for your image.

Alternatively, you can use Python scripts that are designed to handle the import of multispectral GeoTIFFs and separate them into different layers, such as rasterio and gdal libraries. You can then use these layers as textures in the Unreal Engine to create a 3D representation of the multispectral data with accurate geolocation.

I’ve highlighted the plausible falsehoods – for instance, there is no GeoTIFF import plugin for Unreal Engine in the UE Marketplace. There are off-the-shelf solutions for this such as Landscaping, Cesium and Terraform Pro (amongst others out there on the interwebs) – and it would be great if UE could directly import GeoTIFFs via a native plugin. But there isn’t – instead ChatGPT hallucinates a solution. It sounds plausible – just like using a ‘World Aligned Blocking’ feature, or simply using the ‘World Settings’ panel to set GPS coordinates.

Essentially the system is confabulating. Large Language Models are pejoratively referred to as ‘stochastic parrots’ – they act basically as ‘token prediction’ machines, where there is no awareness of on-going context within a given text generated by the system, and certainly no concept of inference or common-sense. Long passages of text or programming are simply realisations of the likelihood of these generated token streams on a larger scale than individual words**, articulated within the interpretive reception of the ‘user’ that may perceive ‘seeming’ coherence:

We say seemingly coherent because coherence is in fact in the eye of the beholder. Our human understanding of coherence derives from our ability to recognize interlocutors’ beliefs [30, 31] and intentions [23, 33] within context [32]. That is, human language use takes place between individuals who share common ground and are mutually aware of that sharing (and its extent), who have communicative intents which they use language to convey, and who model each others’ mental states as they communicate. As such, human communication relies on the interpretation of implicit meaning conveyed between individuals….

Text generated by an LM is not grounded in communicative intent, any model of the world, or any model of the reader’s state of mind. It can’t have been, because the training data never included sharing thoughts with a listener, nor does the machine have the ability to do that. This can seem counter-intuitive given the increasingly fluent qualities of automatically generated text, but we have to account for the fact that our perception of natural language text, regardless of how it was generated, is mediated by our own linguistic competence and our predisposition to interpret communicative acts as conveying coherent meaning and intent, whether or not they do [89, 140]. The problem is, if one side of the communication does not have meaning, then the comprehension of the implicit meaning is an illusion arising from our singular human understanding of language (independent of the model).

(Bender and Gebru, 2021:616)

Nevertheless, even with these caveats, the system provides a valuable and useful distillation of a hugely broad-range of knowledge, and can present it to the end user in an actionable way. This has been demonstrated by my use of it in exploring approaches toward Python programming for the manipulation of GIS data. It has been a kind of dialogue – as it has provided useful suggestions, clarified the steps taken in the programming examples it has supplied, and helped me correct processes that do not work.

But it is not a dialogue with an agent – seeming more akin to a revealing mirror, or a complex echo, from which I can bounce back and forth ideas, attempting to discern a truth for my questions. This brings with it a variety of risks, depending upon the context and domain in which it is applied:

The fundamental problem is that GPT-3 learned about language from the Internet: Its massive training dataset included not just news articles, Wikipedia entries, and online books, but also every unsavory discussion on Reddit and other sites. From that morass of verbiage—both upstanding and unsavory—it drew 175 billion parameters that define its language. As Prabhu puts it: “These things it’s saying, they’re not coming out of a vacuum. It’s holding up a mirror.” Whatever GPT-3’s failings, it learned them from humans.

(Strickland, 2021)

Moving beyond this current state, the path to ‘true’ AI, human-level AI, AGI (Artificial General Intelligence) and ASI (Artificial Super-Intelligence), may be shortish (20 years) or longish (50 years) – but given the current pace of development, my impression is that it will be measured in decades, not centuries. Domain experts have already mapped out research programs that encompass many of the conceptual and scientific breakthroughs that need to be made for this to occur (Hutter, 2005; LeCun, 2022), neatly adumbrated by Friston et al. (2022):

Academic research as well as popular media often depict both AGI and ASI as singular and monolithic AI systems, akin to super-intelligent, human individuals. However, intelligence is ubiquitous in natural systems—and generally looks very different from this. Physically complex, expressive systems, such as human beings, are uniquely capable of feats like explicit symbolic communication or mathematical reasoning. But these paradigmatic manifestations of intelligence exist along with, and emerge from, many simpler forms of intelligence found throughout the animal kingdom, as well as less overt forms of intelligence that pervade nature. (p.4)

…AGI and ASI will emerge from the interaction of intelligences networked into a hyper-spatial web or ecosystem of natural and artificial intelligence. We have proposed active inference as a technology uniquely suited to the collaborative design of an ecosystem of natural and synthetic sensemaking, in which humans are integral participants—what we call shared intelligence. The Bayesian mechanics of intelligent systems that follows from active inference led us to define intelligence operationally, as the accumulation of evidence for an agent’s generative model of their sensed world—also known as self-evidencing. (p.19)

In the meantime, it is the role of the human interlocutor to establish the inferential framework with which we work with these systems. It is remarkable that what until recently seemed like science-fictional concepts are now available for use.

A critical awareness of machine learning and machine intelligence capabilities seems to me to be a prudent mindset to develop for any engagement with technology that interfaces with Earth observation systems – indeed, any observational system, because it is up to us human beings to develop frameworks for designing goals for these systems, and developing the capacity to interrogate and understand them in accessible ways, discern objective and/or consensual truth and to deploy them for good.

For argument’s sake – here’s some hallucinated images of bushfires taken from a satellite, created using Stable Diffusion 1.5. Who’s to say they aren’t real images of real places? How would you be able to tell?

Notes:

**This is a supposition I have made that may or may not be correct (I don’t know, so I am rephrasing this/correcting this as my knowledge increases). Tokens are word fragments or components, and, apparently (see next posts) each is added singly on a stochastic basis – but presumably the probabilistic value is informed by larger-scale probabilistic structures than individual words in sequence. There must be syntagmatic and paradigmatic values at play.

References:

Bender, Emily M., Timnit Gebru, Angelina McMillan-Major, and Shmargaret Shmitchell. “On the Dangers of Stochastic Parrots: Can Language Models Be Too Big? 🦜.” In Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency, 610–23. Virtual Event Canada: ACM, 2021. https://doi.org/10.1145/3442188.3445922.

Friston, Karl J, Maxwell J D Ramstead, Alex B Kiefer, Alexander Tschantz, Christopher L Buckley, Mahault Albarracin, Riddhi J Pitliya, et al. “Designing Ecosystems of Intelligence from First Principles,” December 2022. https://doi.org/10.48550/arXiv.2212.01354.

Hutter, Marcus. Universal Artificial Intelligence: Sequential Decisions Based on Algorithmic Probability. Germany: Springer, 2005.

LeCun, Yann. “A Path Towards Autonomous Machine Intelligence Version 0.9.2, 2022-06-27,” July 2022. https://openreview.net/forum?id=BZ5a1r-kVsf&fbclid=IwAR2lFOl7Uy0lyfYbOya3sExmULL9uZopKuPbSJuNBqLYB2G9aob4j32fxqQ.

Stokel-Walker, Chris, and Richard Van Noorden. “What ChatGPT and Generative AI Mean for Science.” Nature 614, no. 7947 (February 6, 2023): 214–16. https://doi.org/10.1038/d41586-023-00340-6.

Strickland, Eliza. “OpenAI’s GPT-3 Speaks! (Kindly Disregard Toxic Language).” Accessed February 7, 2023. https://spectrum.ieee.org/open-ais-powerful-text-generating-tool-is-ready-for-business.

Peter Morse 2022-2023

Mirror, Mirror on the Wall: Working with AI

Leave a Reply Cancel reply

Creative Research Journal