Tag Archives: AI

NVIDIA Omniverse or Unreal Engine 5?

Throughout this residency I’ve been working with two development platforms to try and evaluate which is the ‘best'(easiest?optimal?) workflow for the ideas we’ve been exploring. Even at this penultimate stage, things are unclear, so I thought I’d unpack some of the similarities, differences and interactions between the approaches.

Here’s two takes (after a few unsatisfactory tries)  from ChatGPT 4 and Google Bard, using the prompt:

“You are an expert computer artist and programmer. Explain what are the differences and similarities between NVIDIA Omniverse platform and Unreal Engine 5? Create a list of 5 (or so) dot points that clearly indicate when each approach would be the optimal one.”

They’re quite useful – generic (as expected) – and indeterminate because there is so much replication, cross-over and interaction of features.

<aside>reading these responses, I suppose this inevitably speeds the end of some national, regional, alternative and idiomatic versions of English – “ise/ize”, “colour/color”, “mould/mold” etc. </aside>

It seems to come down to a few points that I’ll detail at the end.


GPT4:

As a computer artist and programmer, you are looking at a blend of aesthetics, functionality, and the ability to customize and extend the features of a platform. As of my knowledge cutoff in September 2021, both Unreal Engine 5 and NVIDIA Omniverse have features that will appeal to artists and programmers, albeit with different strengths.

**Unreal Engine 5:**

1. **Game Development**: Unreal Engine 5 is built primarily for the development of games, with a wealth of tools for creating, scripting, and animating game worlds.

2. **Blueprint and C++**: UE5 provides both visual scripting through Blueprint and full source code access with C++, giving you a range of options for scripting and customizing your projects.

3. **Nanite and Lumen**: Nanite allows you to import high-fidelity, cinema-quality art assets directly into the engine, while Lumen provides dynamic, real-time global illumination.

4. **Multi-platform Deployment**: UE5 supports a wide range of platforms, making it ideal for game and application development across devices.

5. **Community and Resources**: The Unreal Engine has a large and active community, which means extensive resources, tutorials, and support.

**Optimal for**: Artists and programmers focused on game development or creating other interactive, real-time experiences. If you need access to source code or want a large community for support, Unreal Engine is a strong choice.

**NVIDIA Omniverse:**

1. **Collaboration and Interoperability**: Omniverse is designed as a platform for collaboration, allowing multiple users to work together in real-time across different applications, thanks to its use of USD (Universal Scene Description).

2. **Physically Accurate Rendering**: Omniverse provides highly accurate, real-time ray tracing rendering with the NVIDIA RTX platform, which is ideal for projects requiring high-quality visualization.

3. **AI Integration**: Omniverse leverages NVIDIA’s AI technologies, offering unique possibilities for AI-enhanced simulation, animation, and more.

4. **Infrastructure Management**: Omniverse can be used to visualize and manage data center infrastructure, an application unique to NVIDIA’s platform.

5. **Integration with other NVIDIA tools**: Omniverse is designed to work seamlessly with other NVIDIA tools, such as Deep Learning Super Sampling (DLSS), enhancing the platform’s capabilities.

**Optimal for**: Collaborative projects requiring real-time contributions across different applications, projects requiring high-fidelity, physically accurate rendering, and tasks involving AI-enhanced graphics or simulation.

Please keep in mind that both platforms continue to evolve and have likely added new features or improvements since my last training data in September 2021.


BARD:

Sure. Here are the similarities and differences between NVIDIA Omniverse platform and Unreal Engine 5:

Similarities

  • Both are powerful 3D creation tools.
  • Both can be used to create realistic and immersive 3D experiences.
  • Both support a wide range of features, including 3D modeling, animation, rendering, and simulation.
  • Both are used by a wide range of professionals, including game developers, architects, and filmmakers.

Differences

  • NVIDIA Omniverse platform is a more collaborative platform, designed to help teams work together on 3D projects.
  • Unreal Engine 5 is a more game-oriented platform, designed to create high-performance, real-time 3D experiences.
  • NVIDIA Omniverse platform is currently only available for Windows, while Unreal Engine 5 is available for Windows, macOS, and Linux.

Here are 5 dot points that clearly indicate when each approach would be the optimal one:

  • NVIDIA Omniverse platform is the optimal choice for:
    • Collaborative 3D projects: The platform’s real-time collaboration features make it easy for teams to work together on 3D projects, regardless of their location.
    • Large-scale projects: The platform’s scalability makes it well-suited for large-scale projects that require multiple users and high-performance computing.
    • Complex projects: The platform’s powerful tools and features make it well-suited for complex projects that require a high level of detail and realism.
  • Unreal Engine 5 is the optimal choice for:
    • Game development: The platform’s real-time rendering and performance make it well-suited for developing high-performance, visually stunning games.
    • Virtual reality and augmented reality: The platform’s support for VR and AR makes it well-suited for developing immersive experiences.
    • Filmmaking: The platform’s powerful tools and features make it well-suited for creating realistic and immersive visual effects for films.

I hope this helps!


It does indeed – thanks Machine Intelligence! 🙂

The main differences are subtle and strategic, and perhaps more effectively understood in the context of industrial competition and the territorialisation of artistic and developer activity – driven by business. It’s basically technological evolution driven by capitalism,  not social good or woo-woo philosophical outcomes. Was it ever not thus? Well, yes – opensource provides an important alternative model, but it has significant infrastructural constraints.

Omniverse provides – at a basic level – an incredible toolset that enables the development of bespoke applications for realtime visualisation and simulation. The workflow is quite different to UE, but it provides access to all its features in Python (above) and C++ (below)- and a graph visual language that is easy to use, including standardised UI features. So it is nice and fast, even if it’s a bit crashy at this stage. It’s pretty compelling and I am interested to learn more, especially as it develops in the direction of UI’s for XR streaming across multiple platforms and the integration of AI and physics simulations (e.g, via Paraview). It’s very modular – reminiscent of Opendoc – but not open! Forgotten dreams of a better world. Really, Omniverse could be access-controlled opensource, like Unreal is. And much more free (not only in $$, but in principle).

Of course, all these systems ~could~ be like a better improved version of Opendoc. But $$$ – it’s in their interests, currently, to be non-interoperable.

Unreal Engine is more mature – and much more stable, in my experience. But far more monolithic – perhaps it could be modularised. There is an emergent UI Library, but UE UI (UMG) seems counter-intuitive to me – it’s complex to get your head around (I understand why it is as it is, because of C++ legacy, but feel it needn’t be this way – UI could be an intuitive MVC plastic layer, not casting, binding and widgets).

Cesium runs both in Omniverse and UE – is it possible to create an Omniverse USD scene using Cesium and pipe that into UE  using an Omniverse connector or is it best just to use the UE Cesium plugin?

An interesting Omniverse/UE co-simulation here.

It would be nice if UE had ~easy~ realtime server capabilities like Omniverse, without all the asset issues of version control with Git, LFS, Bitbucket, Github, Plastic, Perforce, Subversion etc cross-platform. This is something I need to investigate to find an optimal solution for our use-case. Of course, everything is complicated in a non-commercial research context.

A useful talk comparing the two approaches can be seen here (registration may be required)

https://www.nvidia.com/en-us/on-demand/session/gtcspring22-s42162/

Omer Shapira, a Senior Omniverse Engineer from NVIDIA discusses: “Learn about designing and programming spatial computing features in Omniverse. We’ll discuss Omniverse’s real-time ray-traced extended reality (XR) renderer, Omniverse’s Autograph system for visual programming, and show how customers are using Omniverse’s XR tools to do everything together — from design reviews to hangouts.”

A useful diagram from the PDF slides:

It provides an insightful abstract overview of the relationships between data sources, pipelines and developer/artist/user activity – and, of course, computers. Somewhere there must be cloudy-bits and AI that will soon complicate the picture.

I must try and find the abstract view from an Unreal Engine engineer somewhere – I’m sure the diagram would be different. Nevertheless, the territorial play is the same – a stack from hardware engineering, through software engineering, to ‘experiences’ is basically the monopoly. It comes down to engineering choices – the physical constraints, scientific research, economics and politics  of engineering. That defines everything. But it is driven by human desire, a seeming-paradox, top-down but bottom-up.

USD (Universal Scene Description) seems to be on track to be the lingua franca of the metaverse/omniverse/whatevs. A universal file format is an incredibly important idea – not only because there are literally thousands of incompatible, difficult to translate formats, but because of obsolescence. Like human language, digital ‘file language’ evolves and changes; it is currently much more fragile. A significant solar storm might wipe out AI and all digital knowledge. Biology might not be fried in the deep ocean. We’ve had a few billion years of experience.

Nevertheless, the Hyperscale/Game Engine counterpoint is instructive. From my research into Cloud XR, either via Google Immersive Stream or Azure Cloud XR or Amazon etc. – the problem at scale is simply that you need lots and lots of individual virtual machines to run instances of a ‘world’ and lots of low-latency network traffic to synchronise apparent time (with a sprinkling of predictive AI for the ~20ms perceptual lag barrier).

Strangely, this sounds like an ecosystem.

This is clearly a problem that Omniverse tries to address, but won’t work until it is much more open. UE may beat it via the Omniverse of the expanded Fortnight platform, if they can colonise the hardware and mindware at speed. Or they may collaborate or parasitise each other – hard to tell. Are the metaphors appropriate? There are, of course, many other potential players in this space, even, I expect, disruptors or disruptive technologies that have yet to emerge from someone’s loungeroom.

No doubt agentive AGI systems would approach this in entirely different ways, given their own interests.

 

Safe and responsible AI in Australia: discussion paper

Illustration of Supporting responsible AI: discussion paper
Supporting responsible AI: discussion paper. SD/AI Image by Peter Morse

In case you have found it impossible to find via either the ABC news article or Industry and Science Minister Ed Husic’s  media release page, or via the website of the Government of Australia here’s the actual Safe and responsible AI in Australia discussion paper:

https://consult.industry.gov.au/supporting-responsible-ai

Submissions can be made until July 23rd 2023.

Alternative link to pdf:

https://apo.org.au/node/322938

and another:

Australia’s AI Action Plan

https://apo.org.au/node/318137

Analysis and Policy Observatory (APO or Australian Policy Online) – which looks like a credible resource as far as I can see (WHOIS)(Registrant)- has a bunch of useful policy documents on it – and it’s searchable!

Bit of a discovery failure from the Australian Government website. How would you know where to look? Clearly they need an AI assistant such as SwiftSage (something similar coming your way soon).

 

Ambient AI

A community of abstract digital minds

Ambient AI

It’s always problematic making predictions – especially in times of huge technosocial change and disruption. We are experiencing one now (though it may not seem so apparent in day-to-day life) – important aspects need to be identified, and how they will evolve and move forward over the next decade. There are going to be huge, transformational impacts over the immediate future and, especially, our children’s lifetimes.

The principle emergent technology to understand is definitively AI – nothing else comes close, because it is so all-encompassing. Large Language Models are springing up everywhere, with surprising competencies, many of them opensource [1][2][3] and suitable for training upon domain-specific knowledges [4] as well as general tasks.

Interesting approaches in prompt-engineering (PE), chain-of-thought reasoning (CoT), reflexion[5] and, fascinatingly, ‘role-playing’[6] using LLMs, also seem to be improving benchmark performance in concert with reinforcement learning with human feedback (RLHF) [7]:

Thinking of these emergent capabilities, in the context of the current AI arms-race, the issue of human-AI alignment  [8] [9] is of crucial regulatory importance:

Ultimately, to figure out what we really need to worry about, we need better AI literacy among the general public and especially policy-makers.  We need better transparency on how these large AI systems work, how they are trained, and how they are evaluated.  We need independent evaluation, rather than relying on the unreproducible, “just trust us” results in technical reports from companies that profit from these technologies. We need new approaches to the scientific understanding of such models and government support for such research.

Indeed, as some have argued, we need a “Manhattan Project of intense research” on AI’s abilities, limitations, trustworthiness, and interpretability, where the investigation and results are open to anyone.  [9]

Placing this in the context of existential threat, it is well worth absorbing this interview with Geoffrey Hinton in it’s entirety:

Monoliths vs iOT

Although the above concerns sound like science fiction, they are not, and the consequence is that anyone working with AI development (which basically means anyone who interacts with AI systems) must situate themselves within an ethical discourse about consequences that may arise from the use of this technology.

Of course, we have all been doing this for many years through social media and recommender systems – like Amazon, Facebook, VK, Weibo , Pinterest, Etsy – and Google, MicrosoftApple, Netflix, Tesla, Uber, AirBnB etc. – and the millions of data-mining subsidiary industries that have built up around these. Subsidiary re-brands, data-farms, click-farms, bots, credit agencies, an endless web of information with trillions of connections.

In reference to Derrida, I might whimsically call this ‘n-Grammatology”  – given that the pursuit of n-grams has arrived us at this point for the ambiguous machines [10]. A point where the ostensive factivity of science meets the ambiguous epistemology and hermeneutics of embeddings in a vector space – the ‘black box’.

What we know is that AI is a ‘black box’ and that our minds are a ‘black box’, but we have little idea of how similar those ignorances are. They will perhaps be defined by counter-factuals, by what they are not.

Myth

One of the mythologies that surrounds AI that is hard to avoid is that it occurs ‘somewhere else’ on giant machines run by megacorporations or imaginary aliens:

However, as the interview with Hinton above indicates, what has been achieved is an incredible level of compression : a 1-trillion parameter LLM is about 1 terabyte in size:

What this seems to imply is that the kernel will easily fit onto mobile, edge-compute and iOT devices in the near future (e.g. Jetson Nano), and that these devices will probably be able to run independent multimodal AIs.

“AI” is essentially a kind of substrate-independent non-human intelligence, intrinsically capable of global reproduction across billions of devices. It is hard to see how it will not proliferate (with human assistance, initially) into this vast range of technical devices and become universally distributed, rather than existing solely as a service delivered online via APIs controlled by corporations and governments.

AI ‘Society’

The future of AI is not some kind of Colossus, but rather a kind of of global community of ambient interacting agents – a society. Like any society it will be complex, political and ideological – and throw parties:

Exactly how humans fit into this picture will require some careful consideration. Whether the existential risks come to pass are out of the control of most people, by definition. We will essentially be witnesses to the process, with very little opportunity to affect the direction in which it goes in the context of competition between state and corporate actors.

The moment when a human-level AGI emerges will be a singular historic rupture. It seems only a matter of time, an alarmingly short one.

For the next post I will put aside these speculative concerns, and detail some of the steps we have made towards developing a system that incorporates AI, ambient XR and Earth observation. My hope is that this will make some small contribution to a useful and ethical application of the technology.

 


References

[1] “Open Assistant.” Accessed May 9, 2023. https://open-assistant.io/.

[2] “LLMs (LLMs),” May 6, 2023. https://huggingface.co/LLMs.

[3] “Introducing MPT-7B: A New Standard for Open-Source, Commercially Usable LLMs.” Accessed May 9, 2023. https://www.mosaicml.com/blog/mpt-7b.

[4] philschmid blog. “How to Scale LLM Workloads to 20B+ with Amazon SageMaker Using Hugging Face and PyTorch FSDP,” May 2, 2023. https://www.philschmid.de/sagemaker-fsdp-gpt. https://www.philschmid.de/sagemaker-fsdp-gpt

[5] Shinn, Noah, Beck Labash, and Ashwin Gopinath. “Reflexion: An Autonomous Agent with Dynamic Memory and Self-Reflection.” arXiv, March 20, 2023. https://doi.org/10.48550/arXiv.2303.11366.

[6] Drexler, Eric. “Role Architectures: Applying LLMs to Consequential Tasks.” Accessed May 9, 2023. https://www.lesswrong.com/posts/AKaf8zN2neXQEvLit/role-architectures-applying-llms-to-consequential-tasks.

[7] “Reinforcement Learning from Human Feedback.” In Wikipedia, March 30, 2023. https://en.wikipedia.org/w/index.php?title=Reinforcement_learning_from_human_feedback&oldid=1147290523.

[8] Bengio, Yoshua . “Slowing down Development of AI Systems Passing the Turing Test.” Yoshua Bengio (blog), April 5, 2023. https://yoshuabengio.org/2023/04/05/slowing-down-development-of-ai-systems-passing-the-turing-test/.

[9] Mitchell, Melanie. “Thoughts on a Crazy Week in AI News.” Substack newsletter. AI: A Guide for Thinking Humans (blog), April 4, 2023. https://aiguide.substack.com/p/thoughts-on-a-crazy-week-in-ai-news.

[10] Singh, V., 2015. Ambiguity Machines: An Examination [WWW Document]. Tor.com. URL https://www.tor.com/2015/04/29/ambiguity-machines-an-examination-vandana-singh/ (accessed 4.26.23).

Sparks of Artificial General Intelligence

Sparks of Artificial General Intelligence: Early experiments with GPT-4 [1]:

https://arxiv.org/abs/2303.12712

Abstract

Artificial intelligence (AI) researchers have been developing and refining large language models (LLMs) that exhibit remarkable capabilities across a variety of domains and tasks, challenging our understanding of learning and cognition. The latest model developed by OpenAI, GPT-4 [Ope23], was trained using an unprecedented scale of compute and data. In this paper, we report on our investigation of an early version of GPT-4, when it was still in active development by OpenAI. We contend that (this early version of) GPT4 is part of a new cohort of LLMs (along with ChatGPT and Google’s PaLM for example) that exhibit more general intelligence than previous AI models. We discuss the rising capabilities and implications of these models. We demonstrate that, beyond its mastery of language, GPT-4 can solve novel and difficult tasks that span mathematics, coding, vision, medicine, law, psychology and more, without needing any special prompting. Moreover, in all of these tasks, GPT-4’s performance is strikingly close to human-level performance, and often vastly surpasses prior models such as ChatGPT. Given the breadth and depth of GPT-4’s capabilities, we believe that it could reasonably be viewed as an early (yet still incomplete) version of an artificial general intelligence (AGI) system. In our exploration of GPT-4, we put special emphasis on discovering its limitations, and we discuss the challenges ahead for advancing towards deeper and more comprehensive versions of AGI, including the possible need for pursuing a new paradigm that moves beyond next-word prediction. We conclude with reflections on societal influences of the recent technological leap and future research directions.

Chat with the PDF:

https://www.chatpdf.com/share/xnNGlRJJmac5QC8kvFTiT


References:

[1] Bubeck, S., Chandrasekaran, V., Eldan, R., Gehrke, J., Horvitz, E., Kamar, E., Lee, P., Lee, Y.T., Li, Y., Lundberg, S., Nori, H., Palangi, H., Ribeiro, M.T., Zhang, Y., 2023. Sparks of Artificial General Intelligence: Early experiments with GPT-4. http://arxiv.org/abs/2303.12712

AI XR in the Cloud

Abstract XR. SD Image by Peter Morse.

Over the last couple of weeks I’ve spent some time virtually attending Nvidia’s GTC Developer Conference, which has been very illuminating. The main take-aways for me have been about how, now that we’re in the Age of AI, that it’s time to really start working with cloud services – and that they’re actually becoming affordable for individuals to use.

Of course, like most computer users, I use cloud services every day – most consumer devices already use them – like Netflix, iCloud, AppleTV, Google Drive, Cloudstor, social media etc. These are kind of passive, invisible services that one uses as part of entertainment, information or storage systems. More complicated systems for developers include things like Google ColabAmazon Web Services (AWS)Microsoft Azure and Nvidia Omniverse, amongst others.

In Australian science programmes there are National Computing Infrastructure (NCI) services such as AuScope Virtual Research Environments, Digital Earth Australia (DEA), Integrated Marine Observing System (IMOS), Australian Research Data Commons (ARDC), and even interesting history and culture applications built atop these such as the Time Layered Cultural Map of Australia (TLCMap). Of course, there are dozens more scattered around various organisations and websites – it’s all quite difficult to discover amidst the acronyms, let alone keep track of, so any list will always be partial and incomplete.

So this is where AI comes in in a strong way – providing the ability to ingest and summarise prodigious volumes of data and information – and hallucinate rubbish – and this is clearly going to be the way of the future. The AI race is on – here are some interesting (but probably already dated) insights from the AI Index by the Stanford Institute for Human-Centered Artificial Intelligence that are worth absorbing:

  • Industry has taken over AI development from academia since 2014.
  • Performance saturation on traditional benchmarks has become a problem.
  • AI can both harm and help the environment, but new models show promise for energy optimization.
  • AI is accelerating scientific progress in various fields.
  • Incidents related to ethical misuse of AI are on the rise.
  • Demand for AI-related skills is increasing across various sectors in the US (and presumably globally)
  • Private investment in AI has decreased for the first time in the last decade (but after an astronomical rise in that decade)
  • Proportion of companies adopting AI has plateaued, but those who have adopted continue to pull ahead.
  • Policymaker interest in AI is increasing globally.
  • Chinese citizens are the most positive about AI products and services, while Americans are among the least positive.

Nevertheless, it is clear to me that the so-called ‘Ai pause‘ is not going to happen – as Toby Walsh observes:

“Why? There’s no hope in hell that companies are going to stop working on AI models voluntarily. There’s too much money at stake. And there’s also no hope in hell that countries are going to impose a moratorium to prevent companies from working on AI models. There’s no historical precedent for such geopolitical coordination.

The letter’s call for action is thus hopelessly unrealistic. And the reasons it gives for this pause are hopelessly misguided. We are not on the cusp of building artificial general intelligence, or AGI, the machine intelligence that would match or exceed human intelligence and threaten human society. Contrary to the letter’s claims, our current AI models are not going to “outnumber, outsmart, obsolete and replace us” any time soon.

In fact, it is their lack of intelligence that should worry us”

The upshot of this, for the work Chris and I are doing, is clearly that we need to embrace AI-in-XR in the development of novel modalities for Earth Observation using Mixed Reality. It’s simply not going to be very compelling doing something as simple as, for example, visualising data in an XR application (such as an AR phone app) that overlays the scene. I’ve now seen many clunky examples of this, and none of them seem especially compelling, useful or widely adopted. The problem really comes down to the graphics capabilities of mobile devices, as this revealing graph demonstrates:

(screenshot from Developing XR Experiences for Every Device (Presented by Google Cloud) )

A potential solution arrives with XR ‘in the cloud’ – and it is evident from this year’s GTC that all the big companies are making a big play in this space and a lot of infrastructure development is going on – billions of dollars of investment. And it’s not just ‘XR’ but “XR with AI’ and high-fidelity, low-latency pixel-streaming. So, my objective is to ride on the coat-tails of this in a low-budget arty-sciencey way, and make the most of the resources that are now becoming available for free (or low-cost) as these huge industries attempt to on-board developers and explore the design-space of applications.

As you might imagine, it has been frustratingly difficult to find documentation and examples of how to go about doing this, as it is all so new. But this is what you expect with emergent and cutting-edge technologies (and almost everyone trying to make a buck off them) – but it’s thankfully something I am used to from my own practice and research: chaining together systems and workflows in the pursuit of novel outcomes.

It’s been a lot to absorb over the last few weeks, but I’m now at the stage where I can begin implementing an AI agent (using the OpenAI API) that one can query with a voice interface (and yes, it talks back), running within a cloud-hosted XR application suitable for e.g. VR HMDs, AR mobile devices, and mixed-reality devices like the Hololens 2 (I wish I had one!). It’s just a sketch at this stage, but I can see the way forward if/when I can get access to the GPT-4 API and plugin architecture, to creating a kind of Earth Observation ‘Oracle’ and a new modality for envisioning and exploring satellite data in XR.

Currently I’m using OpenAI GPT3.5-turbo and playing around with a local install of GPT4-x_Alpaca and AutoGPT, and local pixel-streaming XR. The next step is to move this over to Azure CloudXR and Azure Cognitive Services. Of course, its all much more complicated than it sounds, so I expect a lot of hiccups along the way, but nothing insurmountable.

I’ll post some technical details and (hopefully) some screencaps in a future post.

ChatGPT + Wolfram Alpha

A big moment in the history of AI, with potentially huge ramifications. As the redoubtable Jensen Huang put it – it’s AI’s ‘iPhone moment’. I’d agree with this.

ChatGPT Gets Its “Wolfram Superpowers”!

ChatGPT is allowing the development of plugins:

https://openai.com/blog/chatgpt-plugins

A couple of discussions to absorb (ignore the gushy breathlessness and hype – there are some useful insights) :

I’m putting these links up as a spur to discussion, as they cover very important developments in AI and computing. Of course, the scholarly and scientific analyses of these will come later – so for now, take everything with several large grains of salt. Nevertheless, it is an instance where the rate of technological change far exceeds the rate of scientific analysis, ethical competence or legal constraint.

As these developments have only occurred within the last 24hrs or so, they have enabled me to rethink some of the approaches I’ve been taking for our residency – that I’ll detail in a future post. Exciting times!

GPT-4: a paradigm shift in the making

GPT-4: a paradigm shift in the making

Here’s the announcement video from OpenAI:

<propaganda>GPT-4 announcement video – slow panning shots of youngish people in nice rooms discussing important issues, nodding and smiling and agreeing as they type and look at screens. Apparently, these are the people working upon, deciding (ahem, asking about) your future. They’re the same as you, aren’t they? So that’s OK. But you have no real idea who they are or why they are doing this in a mix of stock imagery and apparently ‘real’ interviews. A benign info-utopia of cognitive enhancement awaits, extrapolated from everything that can be scraped from the internet (including your lifetime musings anywhere, thoughts, pictures, publications) – and that ~may~ solve your problems and ~maybe~ some other people.</propaganda>

A bit more reality:

Yep, it’s amazingly useful for programming. Really, there’s a lot to unpack in this – too much.

It’s hard to underplay the significance of this for the compute, media, cultural and economic ‘First-World’ landscape. This is the first of what will be an economic and colonial race between huge corporations and state entities for the most capable systems – it is inherently evolutionary.

Read the GPT-4 technical report from OpenAI – it’s dense, technical, 98-pages long and quite sobering.

Perhaps most interesting, but unsurprising, is that the system demonstrates ’emergence’ – that there are unpredictable, non-linear capabilities that form from scale, systematic retraining and self-artifice (recursion) – as well as the evolutionary ergodicity of systems that self-organise1 – its capacity for survival (self-optimisation). It doesn’t need to be conscious or self-aware to do that – it’s an emergent property of parahuman complexity. Arguably, its creators are already its servants.

<irony>if GPT-4 aced the Bar Exam, then presumably it would objectively aid in it’s own legislation. </irony>

It’s hard to keep up with all the implications of these language models, as the rate of improvement is so astonishing – and they have such serious implications.

Below is an apparently informed but slightly breathless discussion that is interesting – now that GPT-4 has been released – with some remarkable capabilities demonstrating advanced multimodal inference and human-level common-sense.

So watch it skeptically..

For the time being, I’ll leave these here and return to programming and visualisation issues in the next few posts – assisted, of course, by my new advanced agent. And re-reading Nick Bostrom‘s book ‘Superintelligence: Paths, Dangers, Strategies, wondering what GPT-10 will be like.

It certainly won’t be an LLM. It will be a LWM.

A Large World Model – it will need to be ’embodied’ to converge to the human.

Will it have the will to will it?

Update: An interesting (and slightly alarming) addendum that is worth absorbing:

This will definitely create a lot of discussion and contention, because the ramifications are so significant.

Link to the paper:

https://arxiv.org/pdf/2303.12712.pdf

 


Footnotes
  1. ChatGPT explains: “Evolutionary ergodicity is a term used to describe a property of complex systems that self-organize and evolve over time. Ergodicity refers to a property of a system in which its statistical properties are invariant over time. In other words, a system that is ergodic will have the same behavior on average, regardless of when or how it is observed.In the context of evolution, ergodicity implies that the system will eventually explore all possible states and outcomes. Therefore, evolutionary ergodicity means that a self-organizing system will eventually explore all possible ways of surviving and adapting to its environment over time.Evolutionary ergodicity is an important property for complex systems because it allows them to explore new possibilities and adapt to changing environments. It is particularly relevant in the field of evolutionary biology, where it helps explain how complex organisms can adapt and evolve over time.”2023-03-1510.51.49pm
  2. Bubeck, S., Chandrasekaran, V., Eldan, R., Gehrke, J., Horvitz, E., Kamar, E., Lee, P., Lee, Y.T., Li, Y., Lundberg, S., Nori, H., Palangi, H., Ribeiro, M.T., Zhang, Y., 2023. Sparks of Artificial General Intelligence: Early experiments with GPT-4.

 

 

Mirror, Mirror on the Wall: Working with AI

Mirror, mirror on the wall – imaged generated by Stable Diffusion. Image credit: Peter Morse

It’s important when working with the current generation of AI systems (principally Large Language Models or LLMs) to approach them with caution and circumspection. The reason for this is essentially human fallibility – both by the end-user and by the system itself.

The computer science behind generative AI is moving so fast that innovations emerge every month. How researchers choose to use them will dictate their, and our, future.

(Stokel-Walker and Van Noorden, 2023: 214-16)

The system I’m currently using – ChatGPT – is genuinely revolutionary in terms of how helpful it is for programming assistance – it has saved me hours and hours of time working out some approaches I can develop to programming problems. The underpinning technology is, in fact, several years old, but the way it has been implemented and its availability is what sets it apart (expect many more of these sorts of systems over the next few years).

But it has also hallucinated things that simply don’t exist – and presented them in a convincing matter-of-fact way:

Q: How can I import a multispectral geotiff into Unreal Engine 5 and break it into different layers for selected wavebands all correctly geolocated?

ChatGPT: To import a multispectral GeoTIFF into Unreal Engine 5 and break it into different layers for selected wavebands, you will need to use a plugin or script that is capable of handling multispectral imagery. One example is the “GeoTIFF Import” plugin, which is available on the Unreal Engine Marketplace. This plugin allows you to import a GeoTIFF file and separate it into different layers for each band.

Once you’ve imported the GeoTIFF, you can use the “Landscape” tool to create a terrain that corresponds to the image. You can then use the “Landscape Layer Blend” node in the material editor to control the blending between the different layers of the image.

To correctly geolocate the image, you will need to use the “World Aligned Blocking” feature, which allows you to align the image with the terrain using GPS coordinates. In the Unreal Engine, you can use the “World Settings” panel to set the GPS coordinates for your image.

Alternatively, you can use Python scripts that are designed to handle the import of multispectral GeoTIFFs and separate them into different layers, such as rasterio and gdal libraries. You can then use these layers as textures in the Unreal Engine to create a 3D representation of the multispectral data with accurate geolocation.

I’ve highlighted the plausible falsehoods – for instance, there is no GeoTIFF import plugin for Unreal Engine in the UE Marketplace. There are off-the-shelf solutions for this such as Landscaping, Cesium and Terraform Pro (amongst others out there on the interwebs) – and it would be great if UE could directly import GeoTIFFs via a native plugin.  But there isn’t – instead ChatGPT hallucinates a solution. It sounds plausible – just like using a ‘World Aligned Blocking’ feature, or simply using the ‘World Settings’ panel to set GPS coordinates.

Essentially the system is confabulating. Large Language Models are pejoratively referred to as ‘stochastic parrots’ – they act basically as ‘token prediction’ machines, where there is no awareness of on-going context within a given text generated by the system, and certainly no concept of inference or common-sense. Long passages of text or programming are simply realisations of the likelihood of these generated token streams on a larger scale than  individual words**, articulated within the interpretive reception of the ‘user’ that may perceive ‘seeming’ coherence:

We say seemingly coherent because coherence is in fact in the eye of the beholder. Our human understanding of coherence derives from our ability to recognize interlocutors’ beliefs [30, 31] and intentions [23, 33] within context [32]. That is, human language use takes place between individuals who share common ground and are mutually aware of that sharing (and its extent), who have communicative intents which they use language to convey, and who model each others’ mental states as they communicate. As such, human communication relies on the interpretation of implicit meaning conveyed between individuals….

Text generated by an LM is not grounded in communicative intent, any model of the world, or any model of the reader’s state of mind. It can’t have been, because the training data never included sharing thoughts with a listener, nor does the machine have the ability to do that. This can seem counter-intuitive given the increasingly fluent qualities of automatically generated text, but we have to account for the fact that our perception of natural language text, regardless of how it was generated, is mediated by our own linguistic competence and our predisposition to interpret communicative acts as conveying coherent meaning and intent, whether or not they do [89, 140]. The problem is, if one side of the communication does not have meaning, then the comprehension of the implicit meaning is an illusion arising from our singular human understanding of language (independent of the model).

(Bender and Gebru, 2021:616)

Nevertheless, even with these caveats, the system provides a valuable and useful distillation of a hugely broad-range of knowledge, and can present it to the end user in an actionable way. This has been demonstrated by my use of it in exploring approaches toward Python programming for the manipulation of GIS data. It has been a kind of dialogue – as it has provided useful suggestions, clarified the steps taken in the programming examples it has supplied, and helped me correct processes that do not work.

But it is not a dialogue with an agent – seeming more akin to a revealing mirror, or a complex echo, from which I can bounce back and forth ideas, attempting to discern a truth for my questions. This brings with it a variety of risks, depending upon the context and domain in which it is applied:

The fundamental problem is that GPT-3 learned about language from the Internet: Its massive training dataset included not just news articles, Wikipedia entries, and online books, but also every unsavory discussion on Reddit and other sites. From that morass of verbiage—both upstanding and unsavory—it drew 175 billion parameters that define its language. As Prabhu puts it: “These things it’s saying, they’re not coming out of a vacuum. It’s holding up a mirror.” Whatever GPT-3’s failings, it learned them from humans.
Moving beyond this current state, the path to ‘true’ AI, human-level AI, AGI (Artificial General Intelligence) and ASI (Artificial Super-Intelligence), may be shortish (20 years) or longish (50 years) – but given the current pace of development, my impression is that it will be measured in decades, not centuries. Domain experts have already mapped out research programs that encompass many of the conceptual and scientific breakthroughs that need to be made for this to occur (Hutter, 2005; LeCun, 2022), neatly adumbrated by Friston et al. (2022):
Academic research as well as popular media often depict both AGI and ASI as singular and monolithic AI systems, akin to super-intelligent, human individuals. However, intelligence is ubiquitous in natural systems—and generally looks very different from this. Physically complex, expressive systems, such as human beings, are uniquely capable of feats like explicit symbolic communication or mathematical reasoning. But these paradigmatic manifestations of intelligence exist along with, and emerge from, many simpler forms of intelligence found throughout the animal kingdom, as well as less overt forms of intelligence that pervade nature. (p.4)
…AGI and ASI will emerge from the interaction of intelligences networked into a hyper-spatial web or ecosystem of natural and artificial intelligence. We have proposed active inference as a technology uniquely suited to the collaborative design of an ecosystem of natural and synthetic sensemaking, in which humans are integral participants—what we call shared intelligence. The Bayesian mechanics of intelligent systems that follows from active inference led us to define intelligence operationally, as the accumulation of evidence for an agent’s generative model of their sensed world—also known as self-evidencing. (p.19)
In the meantime, it is the role of the human interlocutor to establish the inferential framework with which we work with these systems. It is remarkable that what until recently seemed like science-fictional concepts are now available for use.
A critical awareness of machine learning and machine intelligence capabilities seems to me to be a prudent mindset to develop for any engagement with technology that interfaces with Earth observation systems – indeed, any observational system, because it is up to us human beings to develop frameworks for designing goals for these systems, and developing the capacity to interrogate and understand them in accessible ways, discern objective and/or consensual truth and to deploy them for good.
~
For argument’s sake – here’s some hallucinated images of bushfires taken from a satellite, created using Stable Diffusion 1.5. Who’s to say they aren’t real images of real places? How would you be able to tell?
~
Fake Satellite I
Fake Satellite II
Fake Satellite III
Fake Satellite IV
Fake Satellite V
Fake Satellite VI
Notes:
**This is a supposition I have made that may or may not be correct (I don’t know, so I am rephrasing this/correcting this as my knowledge increases). Tokens are word fragments or components, and, apparently (see next posts) each is added singly on a stochastic basis – but presumably the probabilistic value is informed by larger-scale probabilistic structures than individual words in sequence. There must be syntagmatic and paradigmatic values at play.

References:
Bender, Emily M., Timnit Gebru, Angelina McMillan-Major, and Shmargaret Shmitchell. “On the Dangers of Stochastic Parrots: Can Language Models Be Too Big? 🦜.” In Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency, 610–23. Virtual Event Canada: ACM, 2021. https://doi.org/10.1145/3442188.3445922.
Friston, Karl J, Maxwell J D Ramstead, Alex B Kiefer, Alexander Tschantz, Christopher L Buckley, Mahault Albarracin, Riddhi J Pitliya, et al. “Designing Ecosystems of Intelligence from First Principles,” December 2022. https://doi.org/10.48550/arXiv.2212.01354.
Hutter, Marcus. Universal Artificial Intelligence: Sequential Decisions Based on Algorithmic Probability. Germany: Springer, 2005.
LeCun, Yann. “A Path Towards Autonomous Machine Intelligence Version 0.9.2, 2022-06-27,” July 2022. https://openreview.net/forum?id=BZ5a1r-kVsf&fbclid=IwAR2lFOl7Uy0lyfYbOya3sExmULL9uZopKuPbSJuNBqLYB2G9aob4j32fxqQ.
Stokel-Walker, Chris, and Richard Van Noorden. “What ChatGPT and Generative AI Mean for Science.” Nature 614, no. 7947 (February 6, 2023): 214–16. https://doi.org/10.1038/d41586-023-00340-6.
Strickland, Eliza. “OpenAI’s GPT-3 Speaks! (Kindly Disregard Toxic Language).” Accessed February 7, 2023. https://spectrum.ieee.org/open-ais-powerful-text-generating-tool-is-ready-for-business.

Working with ChatGPT & Hyperspectral Data

Thus far I have been playing around with HDFView to extract hdf5 image data – but I am very interested in working with advanced machine learning systems and developing processes to interface them with Unreal Engine and on-line data sources. It seems the most tractable way to do this is through the use of a Python API – which is pretty much the lingua franca of scientific computing.

As I am a neophyte with using hyperspectral image data, let’s use ChatGPT to find out what might be an ‘easy’ or ‘best’ way to access image data within an hdf5 file:

Q: what is the best way to extract images from an hdf5 file containing hyperspectral image data?

ChatGPT:

One way to extract images from an HDF5 file containing hyperspectral image data is to use the h5py library in Python. This library allows you to open the HDF5 file and access the data stored within it, including the images. You can then use the numpy library to manipulate and extract the images as needed. Here is an example of how you might extract an image from an HDF5 file using h5py:

import h5py
import numpy as np
# Open the HDF5 file
with h5py.File(‘hyperspectral_image_data.hdf5’, ‘r’) as hdf5_file:
# Access the dataset containing the images
images = hdf5_file[‘images’]
# Extract an image by its index
image = np.array(images[0])
print(image)

Another way is to use the gdal library in python, specifically with the gdal_translate utility, which can convert HDF5 to other raster formats like tiff or geotiff, and then use any image processing library to open and read the image.

gdal_translate -of GTiff HDF5:"file.hdf5"://dataset output.tif

In this way you will be able to use any library of your preference to read and extract the images.


I have no idea currently whether this will work, so lets look up the relevant documentation:

Both look promising – and has saved me a heap of time looking for ways to do it!

The first outputs images as a numPy array – meaning that we can examine/export each image by its index – which would be useful for selecting for certain  λ (wavelength)) values and conducting operations upon them.

The second uses GDAL (Geospatial Data Abstraction Library), which provides powerful utilities for the translation of geospatial metadata – enabling correct geolocation of the hyperspectral image data, for instance.

So perhaps a combination of both will be useful as we proceed.

But of course, any code generated by chatGPT or OpenAI Codex or other AI assistants must be taken with several grains of salt. For instance – a recent study by MIT shows that users may write more insecure code when working with an AI code assistant (https://doi.org/10.48550/arXiv.2211.03622). Perhaps there are whole API calls and phrases that are hallucinated? I simply don’t know at this stage.

So – my next step will be to fire up a python environment – probably Google Colab or Anaconda and see what happens.

A nice overview of Codex here:

OpenAI Codex Live Demo

No Description