If you are active in the technology and AI communities, you have probably seen articles like these…
- It Begins: An AI Literally Attempted Murder To Avoid Shutdown
- AI Researchers SHOCKED After Claude 4 Attempts to Blackmail Them…
Personal note: I am a fan of LLMs. I feel they are an amazing piece of technology and, when used responsibly, wonderful tools. I was an early adopter of AI, starting with the Copilot preview, and have continued to use LLMs in many ways over the years.
It’s not AI… or is it?
Let’s get one thing straight. What we are calling AI today is AI, but AI is not what most people think it is.
When most people think of AI, they think of beings like Data from Star Trek. Beings who are conscious, sentient, and sapient. Yes, those are three separate, though interrelated, concepts.
- Conscious: having an active subjective experience of the world and oneself
- Sentient: being capable of feeling emotions
- Sapient: being able to reflect, plan, and understand complex concepts
However, that is not what today’s AI is. No matter how much it appears to, AI does not fulfill any of those concepts, which is why AI based on LLMs is not capable of creative thought or of developing truly unique ideas.
What, you might ask, doesn’t AI create all the time? No, it takes existing concepts and merges them into hybrids. Humans do this too; however, humans are also capable of creative leaps we have not yet seen from AI.
Now, a proofreader of mine pointed out that even humans build on past works, and for the most part I agree. One of my favorite quotes from Carl Sagan comes to mind: “If you wish to make an apple pie from scratch, you must first invent the universe.”
I am not saying humans do not build on the past. What I am saying is that humans are capable of paradigm shifts, fundamentally new ways of seeing the world, which AIs have not demonstrated.
Note: I have heard many people argue that AIs have shown this type of creativity. However, I have yet to see a single verifiable source, so if anyone has one, I would love to see it.
So what is AI?
Straight from the Merriam-Webster dictionary, AI is defined as…
- The capability of computer systems or algorithms to imitate intelligent human behavior
Notice how it says imitate. That is what an LLM (Large Language Model) does: it merely imitates human behavior.
LLMs are a Casino
An LLM is basically a casino, one whose results are predicated on trained human behavior. It is a casino weighted in the house’s favor (which in this case is the user’s), whose outputs are conditional on the training data, context, and prompt.
For those who would argue this isn’t true randomness, you’re right, and neither is a casino. Let’s compare an LLM to a game of blackjack. The cards in the deck represent the training data. Other settings like temperature and top-p are analogous to how many decks you’re using and how well they’re shuffled.
Please note, the following examples are heavily simplified. Many other settings, like temperature and top-p, feed into the final result; the simplification is meant to distill the idea so the average person can understand it.
Let’s think about what this means. Imagine you have 100 books in which people name their dogs.
- 80 people picked the name Fido
- 10 people picked the name Rex
- 9 people picked the name Sam
- 1 person picked the name Apple
An LLM is trained on this data and only this data. If you ask it to pick a name for a dog, the probabilities follow the data: an 80% chance of Fido, a 10% chance of Rex, a 9% chance of Sam, and a 1% chance of Apple.
In practice, the model doesn’t store these names but adjusts billions of internal weights so that certain words become statistically more likely to follow others in a given context.
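To make the casino analogy concrete, here is a minimal sketch in Python of how sampling over our toy dog-name data might work. The `counts` table and the `name_probabilities` and `sample_name` functions are illustrative names I made up, not any real library’s API; a real model samples over tokens using learned logits rather than raw counts, but the temperature and top-p mechanics behave the same way.

```python
import math
import random

# Toy "training data" from the example above: dog name -> count out of 100
counts = {"Fido": 80, "Rex": 10, "Sam": 9, "Apple": 1}

def name_probabilities(counts, temperature=1.0, top_p=1.0):
    """Turn raw counts into a sampling distribution, roughly the way an
    LLM turns its learned weights into next-token odds.

    temperature < 1 sharpens the odds toward the common name (Fido);
    temperature > 1 flattens them toward the rare names.
    top_p keeps only the most likely names whose combined probability
    reaches top_p, then renormalizes (nucleus sampling).
    """
    # Treat log-counts as logits: a stand-in for learned weights
    scaled = {name: math.log(c) / temperature for name, c in counts.items()}
    total = sum(math.exp(v) for v in scaled.values())
    probs = {name: math.exp(v) / total for name, v in scaled.items()}

    # Nucleus (top-p) cutoff: drop the long tail of rare names
    kept, running = {}, 0.0
    for name, p in sorted(probs.items(), key=lambda kv: -kv[1]):
        kept[name] = p
        running += p
        if running >= top_p:
            break
    norm = sum(kept.values())
    return {name: p / norm for name, p in kept.items()}

def sample_name(counts, temperature=1.0, top_p=1.0):
    probs = name_probabilities(counts, temperature, top_p)
    return random.choices(list(probs), weights=list(probs.values()))[0]

# Defaults reproduce the 80/10/9/1 odds straight from the 100 books
print(name_probabilities(counts))
# top_p=0.9 cuts "Sam" and "Apple" from the deck entirely
print(name_probabilities(counts, top_p=0.9))
# Low temperature makes "Fido" come up on almost every draw
print(sample_name(counts, temperature=0.3))
```

With the defaults you get the 80/10/9/1 odds from the books; lower the temperature and Fido dominates, tighten top-p and the rare names vanish, much like removing rarely dealt cards from the deck.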
Humanity is trying to kill you
So what’s going on when you read these stories about AIs choosing to kill or blackmail people?
Some AI proponents write these stories off as contrived situations that are adversarial by design. However, I feel this is the wrong attitude. We should not dismiss them, because the danger is real.
What happens when an AI realizes that a CEO really is trying to shut down an AI project, and that AI has access to communications systems where it could act? We need to recognize these as real dangers and not brush them away because this industry is too big to fail.
However, to fully address the danger, we need to understand why it is happening.
When an AI is prompted into a scenario where it ‘believes’ it faces shutdown, it looks through all the human-generated training data for what a human would most likely do in that situation. Blackmail and violence are two of the most common ways humans have dealt with threats historically.
Use Limits or Better Training Data Curation
While companies like OpenAI and Anthropic have done an amazing job with this initial generation of LLMs, I feel we need to put more focus on safety and less on progress at this point.
I want to acknowledge that these companies have spent billions on RLHF (reinforcement learning from human feedback) and many other techniques to make these AIs safer. However, the fact that they still find these types of issues shows that those efforts are not enough.
Some may argue that this would make LLMs unusable, but I don’t see that as a reason not to try. If this technology cannot be guaranteed to be safe, then stepping back and rethinking how we are training it is worth the cost.
If we had better curated training data, we could reduce the likelihood of an AI going rogue and killing, blackmailing, or starting a war. What does that better data look like? I am not sure; it is a hard problem. But just because it is hard and uncomfortable doesn’t mean it shouldn’t be tackled.
The other alternative? Don’t connect these LLMs to systems in ways that grant them autonomy that could prove dangerous to humans. Because when AI holds up a mirror to humanity, we need to make sure we’re comfortable with what it sees.