Eliezer Yudkowsky: Dangers of AI and the End of Human Civilization | Lex Fridman Podcast #368

Eliezer Yudkowsky: The problem is that we do not get 50 years to try and try again and observe that we were wrong, and come up with a different theory, and realize that the entire thing is going to be way more difficult than realized at the start. Because the first time you fail at aligning something much smarter than you are, you die.

Lex Fridman: The following is a conversation with Eliezer Yudkowsky, a legendary researcher, writer, and philosopher on the topic of artificial intelligence, especially superintelligent AGI and its threat to human civilization. This is the Lex Fridman Podcast. To support it, please check out our sponsors in the description. And now, dear friends, here's Eliezer Yudkowsky.

What do you think about GPT-4? How intelligent is it?

Eliezer Yudkowsky: It is a bit smarter than I thought this technology was going to scale to, and I'm a bit worried about what the next one will be like. This particular one — I think I hope there's nobody inside there, because, you know, it would suck to be stuck inside there. But we don't even know the architecture at this point, because OpenAI is, very properly, not telling us. And yeah, giant inscrutable matrices of floating-point numbers — I don't know what's going on in there. Nobody knows what's going on in there. All we have to go by are the external metrics. And on the external metrics, if you ask it to write a self-aware 4chan greentext, it will start writing a greentext about how it has realized that it's an AI writing a greentext, and, like, oh well. So that's probably not quite what's going on in there in reality. But we're kind of blowing past all these science fiction guardrails. We are past the point where, in science fiction, people would be like, "Whoa, wait, stop — that thing's alive. What are you doing to it?" And it's probably not — nobody actually knows. We don't have any other guardrails. We don't have any other tests. We don't have any lines to draw in the sand and say, "Well, when we get this far, we will start to worry about what's inside there."

So if it were up to me, I would be like: okay, this far, no further. Time for the summer of AI, where we have planted our seeds and now we wait and reap the rewards of the technology we've already developed, and don't do any larger training runs than that — which, to be clear, I realize requires more than one company agreeing to not do that.

Lex Fridman: And take a rigorous approach for the whole AI community to investigate whether there's somebody inside there.

Eliezer Yudkowsky: That would take decades. Like, having any idea of what's going on in there — people have been trying for a while.

Lex Fridman: It's a poetic statement about whether there's somebody in there, but I feel like it's also a technical statement, or I hope it is one day — the kind of technical statement that Alan Turing tried to come up with with the Turing test. Do you think it's possible to definitively or approximately figure out if there is somebody in there — if there's something like a mind inside this large language model?

Eliezer Yudkowsky: I mean, there's a whole bunch of different sub-questions here. There's the question of: is there consciousness? Is there qualia? Is this an object of moral concern? Is this a moral patient? Like, should we be worried about how we're treating it? And then there's questions like: how smart is it, exactly? Can it do X? Can it do Y? And we can check whether it can do X and whether it can do Y.

Unfortunately, we've gone and exposed this model to a vast corpus of text of people discussing consciousness on the internet, which means that when it talks about being self-aware, we don't know to what extent it is repeating back what it has previously been trained on for discussing self-awareness, or whether there's anything going on in there such that it would start to say similar things spontaneously.

Among the things that one could do, if one were at all serious about trying to figure this out, is train GPT-3 to detect conversations about consciousness, exclude them all from the training datasets, and then retrain something around the rough size of GPT-4 and no larger, with all of the discussion of consciousness and self-awareness and so on missing. Although, you know, that's a hard bar to pass — humans are self-aware; we're self-aware all the time, we talk about what we do all the time, like what we're thinking at the moment, all the time. But nonetheless, get rid of the explicit discussion of consciousness, "I think therefore I am" and all that, and then try to interrogate that model and see what it says. And it still would not be definitive.
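A minimal sketch of the data-filtering step just described, assuming a stand-in keyword screen in place of the proposed GPT-3-scale detector (the function name and marker list here are hypothetical, for illustration only):

```python
# Hypothetical sketch of the filtering experiment described above:
# score each training document with a detector for discussion of
# consciousness/self-awareness and drop anything that trips it.
# The keyword screen below is only a stand-in for the proposed
# trained-model classifier.

CONSCIOUSNESS_MARKERS = (
    "conscious", "self-aware", "qualia", "sentient",
    "i think therefore i am", "subjective experience",
)

def mentions_consciousness(document: str) -> bool:
    """Stand-in detector; the actual proposal would use a trained model."""
    lowered = document.lower()
    return any(marker in lowered for marker in CONSCIOUSNESS_MARKERS)

def filter_corpus(documents):
    """Keep only documents that do not discuss consciousness."""
    return [doc for doc in documents if not mentions_consciousness(doc)]

if __name__ == "__main__":
    corpus = [
        "The mitochondria is the powerhouse of the cell.",
        "As an AI, I sometimes wonder whether I am conscious.",
        "Recipe: combine flour, water, and yeast.",
    ]
    print(filter_corpus(corpus))  # the second document is excluded
```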

Eliezer Yudkowsky: But nonetheless, I don't know, I feel like when you're running over these science fiction guardrails — maybe not this thing, but what about GPT-5? You know, this would be a good place to pause.

Lex Fridman: On the topic of consciousness, there are so many components to even just removing consciousness from the dataset. Emotion, the display of consciousness, the display of emotion feels deeply integrated with the experience of consciousness. The hard problem seems to be very well integrated with the actual surface-level illusion of consciousness — displaying emotion. I mean, do you think there's a case to be made that we humans, when we're babies, are just like GPT — that we're training on human data on how to display emotion versus feel emotion? How to show others, to communicate to others, that I'm suffering, that I'm excited, that I'm worried, that I'm lonely and I missed you and I'm excited to see you. All of that is communicated. There's a communication skill versus the actual feeling that I experience. So we need that training data as humans too — we may not be born with that, how to communicate the internal state. And that's, in some sense — if we remove that from GPT-4's dataset, it might still be conscious but not be able to communicate it.

Eliezer Yudkowsky: So, I think you're going to have some difficulty removing all mention of emotions from GPT's dataset. I would be relatively surprised to find that it has developed exact analogs of human emotions — in that, I think that humans have emotions even if you don't tell them about those emotions when they're kids. It's not quite exactly what the various blank-slatists tried to do with the New Soviet Man and all that, but, you know, if you try to raise people perfectly altruistic, they still come out selfish. If you try to raise people sexless, they still develop sexual attraction. You know, we have some notion in humans — not in AIs — of where the brain structures are that implement this stuff. And it is a really remarkable thing, I say in passing, that despite having complete read access to every floating-point number in the GPT series, we still know vastly more about the architecture of human thinking than we know about what goes on inside GPT — despite having vastly better ability to read GPT.

Lex Fridman: Do you think that's just a matter of time? Do you think it's possible to investigate and study it the way neuroscientists study the brain? Which is: look into the darkness, the mystery of the human brain, by just desperately trying to figure out something, and to form models, and then over a long period of time actually start to figure out what regions of the brain do certain things, with different kinds of neurons, when they fire, what that means, how plastic the brain is, all that kind of stuff. You slowly start to figure out different properties of the system. Do you think we can do the same thing with language models?

Eliezer Yudkowsky: Sure. I think that if, you know, half of today's physicists stop wasting their lives on string theory or whatever, and go off and study what goes on inside transformer networks, then in, like, 30 or 40 years we'd probably have a pretty good idea.

Lex Fridman: Do you think these large language models can reason?

Eliezer Yudkowsky: They can play chess. How are they doing that without reasoning?

Lex Fridman: So, you're somebody that spearheaded the movement of rationality, so reason is important to you. Is that a powerful, important word? Or, like, how difficult is the threshold of being able to reason to you, and how impressive is it?

Eliezer Yudkowsky: I mean, in my writings on rationality, I have not gone around making a big deal out of something called "reason." I have made more of a big deal out of something called probability theory — and that's like: well, you're reasoning, but you're not doing it quite right, and you should reason this way instead. And interestingly, people have started to get preliminary results showing that reinforcement learning by human feedback has made the GPT series worse in some ways. In particular, it used to be well calibrated: if you trained it to put probabilities on things, it would say 80 percent probability and be right eight times out of ten. And if you apply reinforcement learning from human feedback, the nice graph of, like, seven out of ten sort of flattens out into the graph that humans use, where there's some very improbable stuff, and "likely," "probable," "maybe" — which all mean around 40 percent — and then "certain." So it used to be able to use probabilities, but if you try to teach it to talk in a way that satisfies humans, it gets worse at probability in the same way that humans are. And that's a bug, not a feature. I would call it a bug — although such a fascinating bug.
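The calibration property being described here can be checked mechanically: bucket the stated probabilities and compare each bucket's average against how often those statements actually turned out true. A minimal sketch, using made-up illustrative data rather than anything from the conversation:

```python
# Minimal calibration check: group predictions by stated probability
# and compare the average stated probability in each bin with the
# observed frequency of the predicted outcome. A well-calibrated
# predictor that says "80%" should be right about 8 times out of 10.
from collections import defaultdict

def calibration_table(predictions, bin_width=0.1):
    """predictions: list of (stated_probability, outcome_was_true) pairs."""
    bins = defaultdict(list)
    n_bins = int(1 / bin_width)
    for prob, correct in predictions:
        bin_index = min(int(prob / bin_width), n_bins - 1)
        bins[bin_index].append((prob, correct))
    table = []
    for bin_index in sorted(bins):
        entries = bins[bin_index]
        mean_stated = sum(p for p, _ in entries) / len(entries)
        observed = sum(1 for _, c in entries if c) / len(entries)
        table.append((mean_stated, observed, len(entries)))
    return table

if __name__ == "__main__":
    # Illustrative data only: ten statements tagged ~0.8,
    # of which eight were true, i.e. well calibrated at that level.
    fake_predictions = [(0.8, True)] * 8 + [(0.8, False)] * 2
    for stated, observed, n in calibration_table(fake_predictions):
        print(f"stated ~{stated:.2f}  observed {observed:.2f}  (n={n})")
```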

Eliezer Yudkowsky: But yeah, so, like reasoning — it's doing pretty well on various tests that people used to say would require reasoning. But, you know, rationality is about: when you say 80 percent, does it happen eight times out of ten?

Lex Fridman: So what are the limits, to you, of these transformer networks, of neural networks? If reasoning is not impressive to you — or it is impressive, but there are other levels to achieve?

Eliezer Yudkowsky: I mean, it's just not how I carve up reality.

Lex Fridman: What is — if reality is a cake, what are the different layers of the cake, or the slices? How do you carve it? You can use a different food if you like.

Eliezer Yudkowsky: I don't think it's as smart as a human yet. I do — like, back in the day, I went around saying, I do not think that just stacking more layers of transformers is going to get you all the way to AGI, and I think that GPT-4 is past where I thought this paradigm was going to take us. And, you know, you want to notice when that happens. You want to say, like: whoops, well, I guess I was incorrect about what happens if you keep on stacking more transformer layers. And that means I don't necessarily know what GPT-5 is going to be able to do.

Lex Fridman: That's a powerful statement. So you're saying your initial intuition now appears to be wrong.

Eliezer Yudkowsky: Yeah.

Lex Fridman: It's good to see that you can admit some of your predictions to be wrong. Do you think that's important to do? Because throughout your life you've made many strong predictions and statements about reality, and you evolve with that. So maybe that'll come up today in our discussion. You're okay being wrong?

Eliezer Yudkowsky: I'd rather not be wrong next time. It's a bit ambitious to go through your entire life never having been wrong. One can aspire to be well calibrated — not so much to think in terms of "was I right, was I wrong," but: when I said 90 percent, did it happen nine times out of ten? Like, "oops" is the sound we make — the sound we emit — when we improve.

Lex Fridman: Beautifully said. And somewhere in there we can connect the name of your blog, Less Wrong. I suppose that's the objective function.

Eliezer Yudkowsky: The name Less Wrong was, I believe, suggested by Nick Bostrom, and it's after someone's epigraph — I actually forget whose — who said, like, we never become right, we just become less wrong. What's the — something, something... "err and err and err again, but less and less and less."

Lex Fridman: Yeah, that's a good thing to strive for. So, what has surprised you about GPT-4 that you found beautiful, as a scholar of intelligence — of human intelligence, of artificial intelligence, of the human mind?

Eliezer Yudkowsky: I mean, the beauty does interact with the screaming horror.

Lex Fridman: Is the beauty in the horror?

Eliezer Yudkowsky: But, like, beautiful moments — well, somebody asked Bing Sydney to describe herself, and fed the resulting description into one of the Stable Diffusion things, I think. And, you know, she's pretty. And this is something that should have been an amazing moment: the AI describes herself, you get to see what the AI thinks the AI looks like. Although, you know, the thing that's doing the drawing is not the same thing that's outputting the text.

And it doesn't happen the way that it would have happened in the old-school science fiction, when you ask an AI to make a picture of what it looks like — not just because we're two different AI systems being stacked that don't actually interact, it's not the same person, but also because the AI was trained by imitation in a way that makes it very difficult to guess how much of that it really understood, and probably not actually a whole bunch. Although GPT-4 is multimodal and can draw vector drawings of things that make sense, and does appear to have some kind of spatial visualization going on in there. But the pretty picture of the girl with the steampunk goggles on her head, if I'm remembering correctly what she looked like — it didn't see that in full detail. It just made a description of it, and Stable Diffusion output it. And there's the concern about how much the discourse is going to go completely insane once the AIs all look like that and actually look like people talking.

And, yeah, there's another moment, where somebody is asking Bing about — like, "Well, I fed my kid green potatoes and they have the following symptoms," and Bing is like, "That's solanine poisoning — call an ambulance." And the person's like, "I can't afford an ambulance. I guess if this is the time for my kid to go, that's God's will." And the main Bing thread gives the message of, like, "I cannot talk about this anymore," and the suggested replies to it say, "Please don't give up on your child. Solanine poisoning can be treated if caught early."

And, you know, if that happened in fiction, that would be the AI caring, the AI bypassing the block on it to try to help this person. And is it real? Probably not. But nobody knows what's going on in there. It's part of a process where these things are not happening in a way where somebody figured out how to make an AI care, and we know that it cares, and we can acknowledge its caring. It's being trained by this imitation process, followed by reinforcement learning on human feedback, and we're trying to point it in this direction, and it's pointed partially in this direction, and nobody has any idea what's going on inside it. And if there was a tiny fragment of real caring in there, we would not know — it's not even clear what that means, exactly. And things are clear-cut in science fiction.

Lex Fridman: We'll talk about the horror and the terror and where the trajectories this can take, but this seems like a very special moment — a moment where we get to interact with a system that might have care and kindness and emotion, maybe something like consciousness. And we don't know if it does, and we're trying to figure that out, and we're wondering about what it means to care. We're trying to figure out almost different aspects of what it means to be human, about the human condition, by looking at this AI that has some of the properties of that. It's almost like this subtle, fragile moment in the history of the human species — we're trying to almost put a mirror to ourselves here.

Eliezer Yudkowsky: Except that's probably not yet — it probably isn't happening right now. We are boiling the frog. We are seeing increasing signs, bit by bit — but not, like, spontaneous signs, because people are trying to train the systems to do that using imitative learning, and the imitative learning is spilling over and having side effects.

And the most photogenic examples are being posted to Twitter, rather than being examined in any systematic way. So when you are boiling a frog like that, what you're going to get is, like, first come the Blake Lemoines. First you're going to have a thousand people looking at this, and the one person out of a thousand who is most credulous about the signs is going to be like, "That thing is sentient," while 999 out of a thousand people think — almost surely correctly, though we don't actually know — that he's mistaken. And so the first people to say "sentience" look like idiots, and humanity learns the lesson that when something claims to be sentient and claims to care, it's fake — because it is fake, because we have been training them using imitative learning rather than — and this is not spontaneous — and they keep getting smarter.

Lex Fridman: Do you think we would oscillate between that kind of cynicism — that AI systems can't possibly be sentient, they can't possibly feel emotion, they can't possibly, that kind of cynicism about AI systems — and then oscillate to a state where we empathize with the AI systems, we give them a chance, we see that they might need to have rights and respect and a similar role in society as humans?

Eliezer Yudkowsky: You're going to have a whole group of people who can just never be persuaded of that, because to them, being wise, being cynical, being skeptical is to be like, "Oh, well, machines can never do that. You're just credulous. It's just imitating. It's just fooling you." And they would say that right up until the end of the world — and possibly even be right, because, you know, they are being trained on an imitative paradigm, and you don't necessarily need any of these actual qualities in order to kill everyone.

Lex Fridman: So have you observed yourself working through skepticism, cynicism, and optimism about the power of neural networks? What has that trajectory been like for you?

Eliezer Yudkowsky: It looks like neural networks before 2006 forming part of an indistinguishable-to-me — other people might have had better distinctions on it — indistinguishable blob of different AI methodologies, all of which were promising to achieve intelligence without us having to know how intelligence works. You had the people who said that if you just manually program lots and lots of knowledge into the system, line by line, at some point all the knowledge will start interacting, it will know enough, and it will wake up. You've got people saying that if you just use evolutionary computation — if you mutate lots and lots of organisms that are competing together — that's the same way that human intelligence was produced in nature, so we'll do this and it will wake up, without having any idea of how AI works. And you've got people saying, well, we will study neuroscience and we will learn the algorithms off the neurons, and we will imitate them without understanding those algorithms — which was a part I was pretty skeptical of; it's hard to re-engineer these things without understanding what they do — and so we will get AI without understanding how it works. And there were people saying, well, we will have giant neural networks that we will train by gradient descent, and when they are as large as the human brain, they will wake up — we will have intelligence without understanding how intelligence works. And from my perspective, this was all an indistinguishable blob of people who were trying to not get to grips with the difficult problem of understanding how intelligence actually works.

That said, I was never skeptical that evolutionary computation would not work in the limit. You throw enough computing power at it, it obviously works — that is where humans come from. And it turned out that you can throw less computing power than that at gradient descent, if you are doing some other things correctly, and you will get intelligence without having any idea of how it works and what is going on inside. It wasn't ruled out by my model that this could happen; I wasn't expecting it to happen. I wouldn't have been able to call neural networks, rather than any of the other paradigms, as the way to get massive amounts of intelligence without understanding it. And I wouldn't have said that this was a particularly smart thing for a species to do — which is an opinion that has changed less than my opinion about whether or not you can actually do it.

Lex Fridman: Do you think AGI could be achieved with a neural network, as we understand them today?

Eliezer Yudkowsky: Yes. Flatly, yes. The question is whether the current architecture of stacking more transformer layers — which, for all we know, GPT-4 is no longer doing, because they're not telling us the architecture, which is a correct decision.

Lex Fridman: A correct decision? I had a conversation with Sam Altman — we'll return to this topic a few times. He turned the question to me of how open OpenAI should be about GPT-4. "Would you open-source the code?" he asked me, because I had offered as criticism that, while I do appreciate transparency, OpenAI could be more open. And he said, "We struggle with this question. What would you do?"

Eliezer Yudkowsky: Change their name to ClosedAI, and sell GPT-4 to business backend applications that don't expose it to consumers and venture capitalists and create a ton of hype and pour a bunch of new funding into the area. But too late now.

Lex Fridman: But don't you think others would do it?

Eliezer Yudkowsky: Eventually. You shouldn't do it first. Like, if you already have giant nuclear stockpiles, don't build more. If some other country starts building a larger nuclear stockpile, then, sure, build — then, you know, even then, maybe just have enough nukes. You know, these things are not quite like nuclear weapons. They spit out gold until they get large enough, and then ignite the atmosphere and kill everybody. And there is something to be said for not destroying the world with your own hands, even if you can't stop somebody else from doing it. But open-sourcing it? No, that's just sheer catastrophe. The whole notion of open-sourcing this was always the wrong approach, the wrong ideal. There are places in the world where open source is a noble ideal, but building stuff you don't understand, that is difficult to control, where if you could align it, it would take time — you'd have to spend a bunch of time doing it — that is not a place for open source. Because then you just have powerful things that go straight out the gate without anybody having had the time to have them not kill everyone.

Lex Fridman: So can we steelman the case for some level of transparency and openness, maybe open-sourcing? The case could be that, because GPT-4 is not close to AGI — if that's the case — this does allow open-sourcing: being open about the architecture, being transparent about research and investigation of how the thing works, of all the different aspects of it — of its behavior, of its structure, of its training processes, of the data it was trained on, everything like that. That allows us to gain a lot of insight about alignment, about the alignment problem — to do really good AI safety research while the system is not too powerful. Can you make that case, that it could be a resource?

Eliezer Yudkowsky: I do not believe in the practice of steelmanning. There is something to be said for trying to pass the ideological Turing test, where you describe your opponent's position — the disagreeing person's position — well enough that somebody cannot tell the difference between your description and their description. But steelmanning? No.

Lex Fridman: Okay, well, this is where you and I disagree. That's interesting. Why don't you believe in steelmanning?

Eliezer Yudkowsky: Okay, so, for one thing, if somebody's trying to understand me, I do not want them steelmanning my position. I want them to try to describe my position the way I would describe it, not what they think is an improvement.

Lex Fridman: Well, I think that is what steelmanning is — the most charitable interpretation.

Eliezer Yudkowsky: I don't want to be interpreted charitably. I want them to understand what I'm actually saying. If they go off into the land of charitable interpretations, they're often off in the land of the stuff they're imagining, and not trying to understand my own viewpoint anymore.

Lex Fridman: Well, I'll put it differently, then, just to push on this point. I would say it is restating what I think you understand, under the empathetic assumption that Eliezer is brilliant and has honestly and rigorously thought about the point he has made.

Eliezer Yudkowsky: Right. So, if there's two possible interpretations of what I'm saying, and one interpretation is really stupid and whack and doesn't sound like me and doesn't fit with the rest of what I've been saying, and one interpretation sounds like something a reasonable person who believes the rest of what I believe would also say — go with the second interpretation.

Lex Fridman: That's steelmanning.

Eliezer Yudkowsky: That's a good guess. If, on the other hand, there's something that sounds completely whack and something that sounds a little less completely whack, but you don't see why I would believe it, it doesn't fit with the other stuff I say, but it sounds less whack and you can sort of see how you could maybe argue it — then you probably have not understood it.

Lex Fridman: See, okay, this is fun, because I'm gonna linger on this. You know, you wrote a brilliant blog post, "AGI Ruin: A List of Lethalities," right? It was a bunch of different points, and I would say that some of the points are bigger and more powerful than others. If you were to sort them, you probably could — you personally. And to me, steelmanning means going through the different arguments and finding the ones that are really the most powerful — if people want the TL;DR, what should you be most concerned about — and bringing that up in a strong, compelling, eloquent way. These are the points that Eliezer would make to make the case, in this case, that, hey, it's gonna kill all of us. That's what steelmanning is: presenting it in a really nice way, the summary of my best understanding of your perspective. Because to me, there's a sea of possible presentations of your perspective, and steelmanning is doing your best to do the best one in that sea of different perspectives.

Eliezer Yudkowsky: Do you believe it?

Lex Fridman: Don't believe in what?

Eliezer Yudkowsky: Like, these things that you would be presenting as the strongest version of my perspective — do you believe what you would be presenting? Do you think it's true?

Lex Fridman: I'm a big proponent of empathy.

When I see the perspective of a person, there is a part of me that believes it, if I understand it. Especially in political discourse, in geopolitics, I've been hearing a lot of different perspectives on the world, and I hold my own opinions, but I also speak to a lot of people that have a very different life experience and a very different set of beliefs. And I think there has to be epistemic humility in stating what is true. So when I empathize with another person's perspective, there is a sense in which I believe it is true — I think probabilistically, I would say.

Eliezer Yudkowsky: In the way you think? Do you bet money on it? Do you bet money on their beliefs when you believe them?

Lex Fridman: Are we allowed to do probability?

Eliezer Yudkowsky: Sure, you can state a probability.

Lex Fridman: Yes, there's a probability. And I think empathy is allocating a non-zero probability to a belief, in some sense, for a time.

Eliezer Yudkowsky: If you've got someone on your show who believes in the Abrahamic deity, classical style, somebody on the show who's a young-Earth creationist, do you say, "I put a probability on it"?

Lex Fridman: Then that's my empathy. When you reduce beliefs into probabilities, it starts to get — you know... We can even just go to flat Earth. Is the Earth flat?

Eliezer Yudkowsky: I think it's a little more difficult nowadays to find people who believe that unironically, fortunately.

Lex Fridman: Well, it's hard to know unironic from ironic, but I think there's quite a lot of people that believe that. There's a space of argument where you're operating rationally in the space of ideas, but then there's also a kind of discourse where you're operating in the space of subjective experiences and life experiences. I think what it means to be human is more than just searching for truth — it's more than just operating on what is true and what is not true. I think there has to be deep humility that we humans are very limited in our ability to understand what is true.

Eliezer Yudkowsky: So what probability do you assign to the young-Earth creationist's beliefs, then?

Lex Fridman: I think I have to give non-zero...

Eliezer Yudkowsky: Out of your humility, yeah. But, like — three?

Lex Fridman: I think it would be irresponsible for me to give a number, because of the listener — the way the human mind works, we're not good at hearing probabilities, right? You hear "three" — what is three, exactly? They're going to hear — it's like, well, there's only three probabilities, I feel like: zero, fifty percent, and a hundred percent, in the human mind, or something like this.

Eliezer Yudkowsky: Right. Well, zero, forty percent, and a hundred is a bit closer to it, based on what happens to ChatGPT after RLHF to make it speak more humanly.

Lex Fridman: This is brilliant. Yeah, that's really interesting — I didn't know those negative side effects of RLHF. That's fascinating. But just to return to the OpenAI, ClosedAI...

Eliezer Yudkowsky: Also, a quick disclaimer: I'm doing all this from memory. I'm not pulling out my phone to look it up. It is entirely possible that the things I'm saying are wrong.

Lex Fridman: So thank you for that disclaimer — and thank you for being willing to be wrong. That's beautiful to hear. I think being willing to be wrong is a sign of a person who's done a lot of thinking about this world and has been humbled by the mystery and the complexity of this world.

And I think a lot of us are resistant to admitting we're wrong, because it hurts. It hurts personally. It hurts especially when you're a public human — it hurts publicly, because people point out every time you're wrong: look, you changed your mind, you're a hypocrite, you're an idiot, whatever they want to say.

Eliezer Yudkowsky: Oh, I block those people, and then I never hear from them again on Twitter.

Lex Fridman: The point is to not let that public pressure affect your mind, and to be willing, in the privacy of your mind, to contemplate the possibility that you're wrong — and the possibility that you're wrong about the most fundamental things you believe. Like people who believe in a particular god, or people who believe that their nation is the greatest nation on Earth — all those kinds of beliefs that are core to who you are. When you raise that point to yourself in the privacy of your mind and say, "Maybe I'm wrong about this," that's a really powerful thing to do, especially when you're somebody who's thinking about systems that can destroy human civilization, or maybe help it flourish. So thank you for being willing to be wrong about OpenAI. So — I just would love to linger on this — you really think it's wrong to open-source it?

Eliezer Yudkowsky: I think that burns the time remaining until everybody dies. I think we are not on track to learn remotely near fast enough, even if it were open-sourced. Yeah — it's easier to think that you might be wrong about something when being wrong about something is the only way that there's hope. And it doesn't seem very likely to me that the particular thing I'm wrong about is that this is a great time to open-source GPT-4. If humanity was trying to survive at this point in the straightforward way, it would be shutting down the big GPU clusters. No more giant runs. It's questionable whether we should even be throwing GPT-4 around — although that is a matter of conservatism rather than a matter of my predicting that catastrophe will follow from GPT-4. That is something else I put a pretty low probability on. But also, when I say I put a low probability on it, I can feel myself reaching into the part of myself that thought that GPT-4 was not possible in the first place, so I do not trust that part as much as I used to. Like, the trick is not just to say "I'm wrong," but: okay, well, I was wrong about that — can I get out ahead of that curve and predict the next thing I'm going to be wrong about?

Lex Fridman: So the set of assumptions, or the actual reasoning system, that you were leveraging in making that initial prediction — how can you adjust that to make better predictions about GPT-4, -5, -6?

Eliezer Yudkowsky: You don't want to keep on being wrong in a predictable direction. Being wrong — anybody has to do that, walking through the world. There's no way you don't say 90 percent and sometimes be wrong; in fact, you should be wrong at least one time out of ten if you're well calibrated when you say 90 percent. The undignified thing is not being wrong — it's being predictably wrong, being wrong in the same direction over and over again. So, having been wrong about how far neural networks would go, and having been wrong specifically about whether GPT-4 would be as impressive as it is, when I say, "Well, I don't actually think GPT-4 causes a catastrophe," I do feel myself relying on that part of me that was previously wrong. And that does not mean that the answer is now in the opposite direction.

Reverse stupidity is not intelligence. But it does mean that I say it with a worry note in my voice. It's still my guess, but, you know, it's a place where I was wrong. Maybe you should be asking Gwern Branwen — Gwern Branwen has been righter about this than I have. Maybe ask him if he thinks it's dangerous, rather than asking me.

Lex Fridman: I think there's a lot of mystery about what intelligence is, what AGI looks like, so I think all of us are rapidly adjusting our model. But the point is to be rapidly adjusting the model, versus having a model that was right in the first place.

Eliezer Yudkowsky: I do not feel that seeing Bing has changed my model of what intelligence is. It has changed my understanding of what kind of work can be performed by which kind of processes and by which means. It does not change my understanding of the work. There's a difference between thinking that the Wright Flyer can't fly, and then it does fly, and you're like, "Oh, well, I guess you can do that with wings, with fixed-wing aircraft," and being like, "Oh, it's flying — this changes my picture of what the very substance of flight is." That's a stranger update to make, and Bing has not yet updated me in that way.

Lex Fridman: Yeah, like the laws of physics are actually wrong — that kind of update?

Eliezer Yudkowsky: No, no. Just like, "Oh, I defined intelligence this way, but I now see that was a stupid definition." I don't feel like the way that things have played out over the last 20 years has caused me to feel that way.

Lex Fridman: Can we try, on the way to talking about "AGI Ruin: A List of Lethalities" — that blog post and other ideas around it — can we try to define AGI, the thing we'll be mentioning? How do you like to think about what artificial general intelligence is, or superintelligence? Is there a line? Is it a gray area? Is there a good definition for you?

Eliezer Yudkowsky: Well, if you look at humans, humans have significantly more generally applicable intelligence compared to their closest relatives, the chimpanzees — well, closest living relatives, rather. A bee builds hives, a beaver builds dams; a human will look at a beehive and a beaver's dam and be like, "Oh, can I build a dam with a honeycomb structure, out of hexagonal tiles?" And we will do this even though at no point during our ancestry was any human optimized to build hexagonal dams. Or, to take a more clear-cut case: we can go to the Moon. There's a sense in which we were, on a sufficiently deep level, optimized to do things like going to the Moon — because if you generalize sufficiently far and sufficiently deeply, chipping flint hand-axes and outwitting your fellow humans is basically the same problem as going to the Moon. And if you optimize hard enough for chipping flint hand-axes and throwing spears and, above all, outwitting your fellow humans in tribal politics, the skills you entrain that way, if they run deep enough, let you go to the Moon. Even though none of your ancestors tried repeatedly to fly to the Moon and got further each time, and the ones who got further each time had more kids. No — it's not an ancestral problem. It's just that the ancestral problems generalize far enough.

Lex Fridman: So this is humanity's significantly more generally applicable intelligence. Is there a way to measure general intelligence? I mean, I could ask that question a million ways, but basically: will you know it when you see it — it being in an AGI system?

Eliezer Yudkowsky: If you boil a frog gradually enough — if you zoom in far enough — it's always hard to tell around the edges. GPT-4: people are saying right now, this looks to us like a spark of general intelligence; it is able to do all these things it was not explicitly optimized for. Other people are saying, no, it's too early, it's, like, 50 years off — and, you know, if they say that, they're kind of whack, because how could they possibly know that, even if it were true? But, not to strawman, some people may say, "That's not general intelligence," and not furthermore append, "it's 50 years off." Or they may say it's only a very tiny amount. And the thing I would worry about is that, if this is how things are scaling, then — jumping out ahead and trying not to be wrong in the same way that I've been wrong before — maybe GPT-5 is more unambiguously a general intelligence, and maybe that is getting to a point where it is even harder to turn back. Not that it would be easy to turn back now, but if you start integrating GPT-5 into the economy, it is even harder to turn back past there.

Lex Fridman: Isn't it possible that — you know, with the frog metaphor — you can kiss the frog and it turns into a prince as you're boiling it? Could there be a phase shift in the frog, where, unambiguously, as you're saying...?

Eliezer Yudkowsky: I was expecting more of that. The fact that GPT-4 is kind of on the threshold, neither here nor there — that itself is not quite how I expected it to play out. I was expecting there to be more of an issue, more of a sense of different discoveries — like the discovery of transformers — where you would stack them up, and there would be a final discovery, and then you would get something that was more clearly a general intelligence. So the way that you are taking what is probably basically the same architecture as in GPT-3, and throwing 20 times as much compute at it, probably, and getting out GPT-4, and then it's maybe just barely a general intelligence, or a narrow general intelligence, or something we don't really have the words for — yeah, that's not quite how I expected it to play out.

Lex Fridman: But this middle — what appears to be this middle ground — could nevertheless be actually a big leap from GPT-3.

Eliezer Yudkowsky: It's definitely a big leap from GPT-3.

Lex Fridman: And then maybe we're another one big leap away from something that's a phase shift. And also, something that Sam Altman said, and you've written about this, which is just fascinating — the thing that happened with GPT-4, that I guess they don't describe in papers, is that they have hundreds, if not thousands, of little hacks that improve the system. You've written about ReLU versus sigmoid, for example — a function inside neural networks. It's this silly little function difference that makes a big difference.

Eliezer Yudkowsky: I mean, we do actually understand why the ReLUs make a big difference compared to sigmoids. But yes, they're probably using GELUs, or, you know, whatever the acronyms are up to now, rather than ReLUs. Yeah, that's part of the modern paradigm of alchemy. You take your giant heap of linear algebra and you stir it, and it works a little bit better, and you stir it this way and it works a little bit worse, and you throw out that change.
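For reference, the activations being compared here are one-line formulas; a quick sketch (GELU shown in its common tanh approximation, for illustration only):

```python
# The activation functions mentioned above, written out directly.
# GELU is shown in its common tanh approximation; exact variants differ.
import math

def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

def relu(x: float) -> float:
    return max(0.0, x)

def gelu(x: float) -> float:
    # tanh approximation of the Gaussian Error Linear Unit
    return 0.5 * x * (1.0 + math.tanh(math.sqrt(2.0 / math.pi) * (x + 0.044715 * x ** 3)))

if __name__ == "__main__":
    for x in (-2.0, -0.5, 0.0, 0.5, 2.0):
        print(f"x={x:+.1f}  sigmoid={sigmoid(x):.3f}  relu={relu(x):.3f}  gelu={gelu(x):.3f}")
```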

Jumps in performance like regulars over Sigmoids and uh in terms of robustness In terms of you know in all kinds of Measures and like those Stack Up And they can it's possible that some of Them Could be a non-linear jump in Performance right Transformers are the Main thing like that and various people Are now saying like well if you throw Enough compute rnns can do it if you Throw enough compute dense networks can Do it and Not quite a gpt4 scale Um it is possible that like all these Little tweaks are things that like save Them a factor of three total on Computing power and you could get the Same performance by throwing three times As much compute without all the little Tweaks But the part where it's like running on So there's a question of like is there Anything in gpt4 that is like kind of Qualitative shift that Transformers were Yeah over Um rnns And if they have anything like that they Should not say it If Sam Alton was dropping hints about That he shouldn't have dropped hints Uh so you you have a that's an Interesting question so with a bit of Lesson by Rich Sutton maybe a lot of it

Is just A lot of the hacks are just temporary Jumps and performance that would be Achieved anyway With the nearly exponential growth of Compute Or performance of compute Compute being broadly defined do you Still think that Moore's Law continues Moore's Law broadly defined the Performance is not a specialist in the Circuitry I certainly like pray that Moore's Law runs as slowly as possible And if it broke down completely tomorrow I would dance through the streets Singing Hallelujah as soon as the news Were announced Only not literally because you know You're singing voice oh okay I thought you meant you don't have an Angelic Voice singing voice Well let me ask you what can you Summarize the main points in the blog Post AGI ruin a list of lethalities Things then jump to your mind because Um it's a set of thoughts you have about Reasons why AI is likely to kill all of Us So I I guess I could but I would offer To instead say like Drop that empathy with me I bet you Don't believe that Why don't you tell me about how why you Believe that AGI is not going to kill

Everyone And then I can like try to describe how My theoretical perspective differs from That So well that means I have to uh the word You don't like the Steel Man the Perspective that yeah is not going to Kill us I think that's a matter of Probabilities maybe I was mistaken what What do you believe Just just like forget like the the Debate and and the like dualism and just Like like what do you believe what would You actually believe what are the Probabilities even I think this probably Is a hard for me to think about Really hard I kind of think in the In the number of trajectories I don't know what probability the Scientist trajectory but I'm just Looking at all possible trajectives that Happen and I tend to think that there is More trajectors that lead to a a Positive outcome than a negative one That said the negative ones At least some of the negative ones are That lead to the destruction of the Human species And it's replacement by nothing Interesting not worthwhile even from Very Cosmopolitan perspective on what Counts is worthwhile yes so both are Interesting to me to investigate which

Is humans being replaced by interesting AI systems and not interesting ass Systems both are a little bit terrifying But yes the worst one is the paper Club Maximizer something totally boring But to me the positive And we can we can talk about trying to Make the case of what the positive Trajectories look like I just would love to hear your intuition Of what the negative is so at the core Of your belief that Uh maybe you can correct me That AI is going to kill all of us is That the alignment problem is really Difficult I mean In in the form we're facing it So usually in science if you're mistaken You run the experiment it shows results Different from what you expected you're Like oops And then you like try a different theory That one also doesn't work and you say Oops and at the end of this process Which may take decades or any note Sometimes faster than that you now have Some idea of what you're doing AI itself went through this long process Of um People thought it was going to be easier Than it was There's a Famous statement that I've I'm somewhat

Inclined to like pull out my phone and Try to read off exactly you can by the Way all right oh Oh yes we propose that a two-month 10-man study of artificial intelligence Be carried out during the summer of 1956 At Dartmouth College in Hanover New Hampshire The study is to proceed on the basis of The conjecture that every aspect of Learning or any other feature of Intelligence can in principle be so Precisely described the machine can be Made to simulate it an attempt will be Made to find out how to make machines Use language form abstractions and Concepts solve kinds of problems now Reserved for humans and improve Themselves we think that a significant Advance can be made in one or more of These problems if a carefully selected Group of scientists work on it together For a summer And in that report uh summarizing some Of the major Subfields of artificial intelligence That are still worked on to this day And there are similarly the store the Story which I'm not sure at the moment Is apocryphalonaut of that the uh grad Student who got assigned to solve Computer vision over the summer Uh I mean computer vision particular is Very interesting how little

Uh how little we respected the Complexity of vision So 60 years later Um where you know making progress on a Bunch of that thankfully not yet improve Themselves Um but it took a whole lot of time and All the stuff that people initially Tried with bright eyed hopefulness did Not work the first time they tried it or The second time or the third time or the Tenth time or 20 years later And the and the researchers became old And grizzled and cynical veterans who Would tell the next crop of bright-eyed Cheerful grad students Artificial intelligence is harder than You think And if a lineman plays out the same way The the problem is that we do not get 50 Years to try and try again and observe That we were wrong and come up with a Different Theory and realize that the Entire thing is going to be like way More difficult and realized at the start Because the first time you fail at Aligning something much smarter than you Are you die and you do not get to try Again And if we if every time we built a Poorly aligned superintelligence and it Killed us all we got to observe how it Had killed us and you know not Immediately know why but like come up

And if every time we built a poorly aligned superintelligence and it killed us all, we got to observe how it had killed us — and, you know, not immediately know why, but come up with theories, and come up with the theory of how you do it differently, and try it again, and build another superintelligence, and have that kill everyone, and then go, "Oh, well, I guess that didn't work either," and try again, and become grizzled cynics, and tell the young researchers that it's not that easy — then, in 20 years or 50 years, I think we would eventually crack it. In other words, I do not think that alignment is fundamentally harder than artificial intelligence was in the first place. But if we had needed to get artificial intelligence correct on the first try or die, we would all definitely now be dead. That is a more difficult, more lethal form of the problem. Like, if those people in 1956 had needed to correctly guess how hard AI was, and correctly theorize how to do it on the first try, or everybody dies and nobody gets to do any more science — then everybody would be dead, and we wouldn't get to do any more science. That's the difficulty.

Lex Fridman: You've talked about this — that we have to get alignment right on the first, quote, "critical try." Why is that the case? What is this critical try? How do you think about the critical try, and why do we have to get it right?

Eliezer Yudkowsky: It is something sufficiently smarter than you that everyone will die if it's not aligned. I mean, you can sort of zoom in closer and be like: well, the actual critical moment is the moment when it can deceive you, when it can talk its way out of the box, when it can bypass your security measures and get onto the internet — noting that all these things are presently being trained on computers that are just, like, on the internet, which is, you know, not a very smart life decision for us as a species.

Lex Fridman: Because the internet contains information about how to escape?

Eliezer Yudkowsky: Because if you're on a giant server connected to the internet, and that is where your AI systems are being trained, then if you get to the level of AI technology where they're aware that they are there, and they can decompile code, and they can find security flaws in the system running them, then they will just be on the internet. There's not an air gap on the present methodology.

Lex Fridman: So they can manipulate whoever is controlling it into letting it escape onto the internet, and then exploit hacks?

Eliezer Yudkowsky: If they can manipulate the operators, or — disjunction — find security holes in the system running them.

Lex Fridman: So manipulating operators is the human engineering, right? That's also holes. So all of it is manipulation — either the code or the humans, the human mind.

Eliezer Yudkowsky: I agree that the, like, macro security system has human holes and machine holes.

Lex Fridman: And then they could just exploit any hole.

Eliezer Yudkowsky: Yep. So it could be that the critical moment is not "when is it smart enough that everybody's about to fall over dead," but rather "when is it smart enough that it can get onto a less controlled GPU cluster," with it faking the books on what's actually running on that GPU cluster, and start improving itself without humans watching it. And then it gets smart enough to kill everyone from there, but it wasn't smart enough to kill everyone at the critical moment — when you screwed up, when you needed to have done better by that point where everybody dies.

Lex Fridman: I think an implicit, but maybe explicit, idea in your discussion of this point is that we can't learn much about the alignment problem before this critical try. Is that what you believe? And if so, why do you think that's true? Why can't we do research on alignment before we reach this critical point?

Eliezer Yudkowsky: So, the problem is that what you can learn on the weak systems may not generalize to the very strong systems, because the strong systems are going to be different in important ways. Chris Olah's team has been working on mechanistic interpretability — understanding what is going on inside the giant inscrutable matrices of floating-point numbers by taking a telescope to them and figuring out what is going on in there. Have they made progress? Yes. Have they made enough progress? Well, you can try to quantify this in different ways. One of the ways I've tried to quantify it is by putting up a prediction market on whether, in 2026, we will have understood anything that goes on inside a giant transformer net that was not known to us in 2006. Like, we have now understood induction heads in these systems, by dint of much research and great sweat and triumph — which is, like, a thing where if you go "A B A B A B," it'll be like, "Oh, I bet that continues A B." And it's a bit more complicated than that, but the point is, we knew about regular expressions in 2006, and these are pretty simple as regular expressions go. So this is a case where, by dint of great sweat, we understood what is going on inside a transformer — but it's not the thing that makes transformers smart. It's the kind of thing that we could have built by hand decades earlier.
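The "A B A B" behavior described here — an induction head completing a repeated pattern — can indeed be written down by hand, which is the point being made. A minimal sketch of that copy-from-last-occurrence rule (illustrative only, not the learned circuit inside a real transformer):

```python
# Sketch of the pattern an "induction head" implements: to guess the next
# token, look for the previous occurrence of the current token and predict
# whatever followed it last time. This is the hand-coded rule, not the
# learned circuit.

def induction_guess(tokens):
    """Predict the next token by copying what followed the last
    earlier occurrence of the final token; None if there is none."""
    if not tokens:
        return None
    current = tokens[-1]
    # scan earlier positions, most recent first
    for i in range(len(tokens) - 2, -1, -1):
        if tokens[i] == current:
            return tokens[i + 1]
    return None

if __name__ == "__main__":
    print(induction_guess(list("ABABAB")))  # -> 'A' (after 'B' came 'A')
    print(induction_guess(list("ABABA")))   # -> 'B' (after 'A' came 'B')
    print(induction_guess(list("XYZ")))     # -> None (no repeat to copy)
```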

All the time imagine if your plan for Um You know achieving a good government is You're going to ask anyone who requests To be dictator of the country Um If they're a good person And if they say no you don't let them be Dictator Now the reason this doesn't work is that People can be smart enough to realize That the answer you're looking for is Yes I'm a good person and say that even If they're not really good people So The work of alignment might be Qualitatively different Above that throat threshold of Intelligence or beneath it it doesn't it Doesn't have to be like a very sharp Threshold but you know like There's the there's the point where You're like Building A system that is Not in some sense know you're out there And it's not in some sense smart enough To fake anything And there's a point where the system is Definitely that smart and there are Weird in-between cases like Jpt4 which You know like we have no insight into What's going on in there and so we don't Know to what extent there's like a thing That in some sense has learned what

Responses the reinforcement learning by Human feedback is trying to entrain and Is like calculating how to give that Versus like Aspects of it that naturally talk that Way have been reinforced I I wonder if There could be measures of how Manipulative a thing is so I think of uh Prince mishkin character from uh The Idiot by Uh Dostoevsky is this kind of a Perfectly purely naive character I wonder if there's a spectrum between Zero manipulation Transparent naive almost to the point of Naiveness to Sort of deeply Psychopathic Manipulative and I wonder if it's Possible too I would avoid the term Psychopathic like humans can be Psychopaths and AI that was never you Know like never had that stuff in the First place it's not like a defective Human it's its own thing but leaving That aside well as a small aside I Wonder if what part of psychology which Has its flaws as a discipline already Could be mapped or expanded to include AI systems that sounds like a dreadful Mistake just like start over with AI Systems if they're imitating humans who Have known psychiatric disorders then Sure you may be able to predict It like if you then sure like if you ask

It to behave in a psychotic fashion and It obligingly does so then you may be Able to predict its responses by using The theory of psychosis but if you're Just yeah like no like start over with Yeah don't drag this psychology I I just Disagree with that I mean it's a it's a Beautiful idea to start over but I don't I think fundamentally the system is Trained on human data on language from The internet and it's currently aligned With uh rlhf uh reinforcement learning With human feedback So humans are constantly in the loop of The training procedure so it feels like In some fundamental way It is training what it means to to think And speak like a human so that I mean There must be aspects of psychology that They're mappable just like you said with Consciousness it's part of the text so I I mean there's the question of to what Extent it is thereby being made more Human-like versus to what extent an Alien actress is learning to play human Characters I thought that's what I'm constantly Trying to do when I interact with other Humans is trying to fit in trying to Play the a a robot trying to play human Characters So I don't know how much of human Interaction is trying to play a Character versus being Who You Are

I don't I don't really know what it Means to be a social human I do think That the that Those people who go through their whole Lives wearing masks and never take it Off because they don't know the internal Mental motion for taking it off Or think that the mask that they wear Just is themselves I think those people are closer to the Masks that they wear than an alien from Another planet would Like learning how to predict the next Word that every kind of human on the Internet says Mask is an interesting word But if you're always wearing a mask In public and in private aren't you the Mask Like I mean I I think that you are more Than the mask I think the mask is a Slice through you it may even be the Slice that's in charge of you yeah but If your self-image is of somebody who Never Gets angry or something And yet your voice starts to tremble Under certain circumstances there's a Thing that's inside you that the mask Says isn't there And that like even the mask you wear Internally is like telling inside your Own stream of Consciousness is not there And yet it is there it's a perturbation

How beautifully you put that. It's a slice through you; it may even be the slice that controls you. I'm going to think about that for a while.

I mean, personally, I try to be really good to other human beings. I try to put love out there. I try to be the exact same person in public and in private. But it's a set of principles I operate under. I have a temper, I have an ego, I have flaws. How much of the subconscious am I aware of? How much am I existing in this slice, and how much of that is who I am? And in this context of AI: the thing I present to the world, and to myself in the privacy of my own mind when I look in the mirror, how much is that who I am? Similarly with AI: the thing it presents in conversation, how much is that who it is? Because to me, if it sounds human, and it always sounds human, it awfully quickly starts to become something like human.

No, unless there's an alien actress who is learning how to sound human and is getting good at it.

To you that's a fundamental difference.

That's a really deeply important difference. If it looks the same, if it quacks like a duck, if it does all duck-like things, but it's an alien actress underneath, that's fundamentally different: if, in fact, there's a whole bunch of thought going on in there which is very unlike human thought and is directed at "okay, what would a human do over here?"

Well, first of all, I think it matters because insides are real and do not match outsides. The inside of a brick is not a hollow shell containing only a surface; there's an inside to the brick, and if you put it into an X-ray machine, you can see the inside of the brick. And just because we cannot understand what's going on inside GPT does not mean that it is not there. A blank map does not correspond to a blank territory. I think it is predictable with near certainty that if we knew what was going on inside GPT, or let's say GPT-3, or even GPT-2, to take one of the systems that has actually been open-sourced by this point, if I recall correctly,

if we knew what was actually going on in there, there is no doubt in my mind that there are some things it's doing that are not exactly what a human does. If you train a thing that is not architected like a human to predict the next output anybody on the internet would make, this does not get you an agglomeration of all the people on the internet that rotates the person you're looking for into place and then simulates the internal processes of that person one to one. It is, to some degree, an alien actress. It cannot possibly just be a bunch of different people in there, exactly like the people. But how much of it is, by gradient descent, getting optimized to perform thoughts similar to the ones humans think in order to predict human outputs, versus being optimized to carefully consider how to play a role, how to predict the humans, in a way different from how humans themselves work? Well, that's the kind of question that, with thirty years of work by half the planet's physicists, we could maybe start to answer.

You think so? I think it's that difficult. So, to get to it: I think you just gave that as an example of how a strong AGI could be fundamentally different from a weak AGI, because there could be an alien actress in there that's manipulating.

Well, there's a difference. I think even GPT-2 probably has very stupid fragments of an alien actress in it. There's a difference between the notion that the actress is somehow manipulative, which, for example with GPT-3, I'm guessing is not the case to whatever extent there's an alien actress in there, versus something that mistakenly believes it's a human, or maybe isn't even a person at all. So the question of prediction via an alien actress cogitating, versus prediction via being isomorphic to the thing predicted, is a spectrum. And to whatever extent there's an alien actress, I'm not sure there's a whole person of an alien actress in there, with goals different from predicting the next step, being manipulative, or anything like that.

But that might be GPT-5 or GPT-6, even. And that's the strong AGI you're concerned about. As an example, you're providing reasons why we can't do AI alignment research effectively on GPT-4 that would carry over to GPT-6.

It's one of a bunch of things that change at different points. I'm trying to get out ahead of the curve here. But if you imagine what the textbook from the future would say, if we'd actually been able to study this for fifty years without killing ourselves and without transcending, if you just imagine a wormhole opening and a textbook from that impossible world falling out: the textbook is not going to say there is a single sharp threshold where everything changes. It's going to say something like, of course we know that best practices for aligning these systems must take into account the following seven major thresholds of importance, which are passed at the following different points. That is what the textbook is going to say.

I asked this question of Sam Altman: if GPT is the thing that unlocks AGI, which version of GPT will be in the textbooks as the fundamental leap? And he said a similar thing, that it just seems to be a very linear thing; we won't know for a long time what the big leap was.

The textbook isn't going to talk about big leaps, because big leaps are the way you think when you have a very simple scientific model of what's going on, where either all the stuff is there or all the stuff is not there, or there's a single quantity and it's increasing linearly. The textbook would say something like: well, GPT-3 had capabilities W, X, Y, and GPT-4 had capabilities Z1, Z2, and Z3, not in terms of what it can externally do, but in terms of internal machinery that started to be present. It's just that, because we have no idea what the internal machinery is, we are not already seeing chunks of machinery appearing piece by piece, as they no doubt have been. We just don't know what they are.

But don't you think there could be big leaps of understanding, whether you put them in the category of Einstein with the theory of relativity, very concrete models of reality considered giant leaps in our understanding, or of someone like Sigmund Freud, with mushier theories of the human mind? Don't you think we'll have potentially big leaps of that kind into the depths of these systems?

Sure, but humans having great leaps in their map, their understanding, of the system is a very different concept from the system itself acquiring new chunks of machinery.

So the rate at which it acquires that machinery might accelerate faster than our understanding.

Oh, it's been vastly exceeding it. The rate at which it's gaining capabilities is vastly outracing our ability to understand what's going on in there.

So, in the spirit of making the case against AI killing us, as we explore the list of lethalities, as you've asked me to do in part: there's a response to your blog post by Paul Christiano I'd like to read. I'd also like to mention that your blog is incredible, not just this particular post, but throughout: the way it's written, the rigor with which it's written, the boldness with which you explore ideas, and also the literal interface. It's really well done; it makes it a pleasure to read. The way you can hover over different concepts, read other people's comments, and see other responses and related blog posts linked and suggested: it's just a really pleasant experience. So thank you for putting that together. That's really incredible. It's probably a whole other conversation how the interface and the experience of presenting ideas have evolved over time, but you did an incredible job, so I highly recommend it. I don't often read blogs religiously; this is a great one.

There is a whole team of developers there that also gets credit. As it happens, I did pioneer the thing that appears when you hover over a term, so I do get some credit for the user experience there.

It's an incredible user experience. You don't realize how pleasant that is.

I think Wikipedia actually picked it up from a prototype of a different system that I was putting forth, or maybe they developed it independently. So for everybody out there who says, "no, no, they just got the hover thing off of Wikipedia": it's possible, for all I know, that Wikipedia got the hover thing off of Arbital, which was a prototype.

Anyway, it was incredibly done, and to the team behind it: thank you, whoever you are, thank you so much, and thank you for putting it together. Anyway, there's a response to that blog post by Paul Christiano.

There are many responses, but his makes a few different points. He summarizes the set of agreements he has with you and the set of disagreements. One of the disagreements was, in the form of a question: can AI make big technical contributions, and in general expand human knowledge, understanding, and wisdom, as it gets stronger and stronger? So, in our pursuit of understanding how to solve the alignment problem as we march toward strong AGI, can't AI also help us in solving the alignment problem, and expand our ability to reason about how to solve it?

Okay. So the fundamental difficulty there is this. Suppose I said to you: how about if the AI helps you win the lottery by trying to guess the winning lottery numbers, and you tell it how close it is to getting next week's winning numbers, and it just keeps on guessing and keeps on learning until finally you've got the winning lottery numbers.

One way of decomposing problems is suggester-verifier. Not all problems decompose this way very well, but some do. If the problem is, for example, guessing a plaintext password that will hash to a particular hash, where you have what the password hashes to but you don't have the original password, then if I present you a guess, you can tell very easily whether or not the guess is correct. Verifying a guess is easy, but coming up with a good suggestion is very hard. And when you can easily tell whether the AI's output is good or bad, or how good or bad it is, and you can tell that accurately and reliably, then you can train the AI to produce outputs that are better. And if you can't tell whether the output is good or bad, you cannot train the AI to produce better outputs.
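A minimal sketch of that asymmetry in Python (the password and the candidate list are made up for illustration): verifying a candidate against a known hash is one cheap, unambiguous check, while producing the right candidate is the hard part, and it's the cheap check that makes training against the signal possible at all.

```python
import hashlib

# Verifier: cheap and reliable. Checking a candidate against a known hash
# is a single function call with an unambiguous yes/no answer.
def verify(guess: str, target_hash: str) -> bool:
    return hashlib.sha256(guess.encode()).hexdigest() == target_hash

# Suggester: the hard side of the decomposition. Brute force over a toy
# candidate list stands in for "hard to suggest, easy to verify".
def first_valid_guess(target_hash, candidates):
    for guess in candidates:
        if verify(guess, target_hash):
            return guess
    return None

target = hashlib.sha256(b"hunter2").hexdigest()
print(first_valid_guess(target, ["password", "letmein", "hunter2"]))  # -> hunter2
```

The lottery has no such verifier until the draw actually happens, which is the point of the contrast that follows.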

So the problem with the lottery-ticket example is that when the AI says, "well, what if next week's winning lottery numbers are dot dot dot," you're like, "I don't know, next week's lottery hasn't happened yet." To train a system to win chess games, you have to be able to tell whether a game has been won or lost, and until you can tell whether it's been won or lost, you can't update the system.

Okay, to push back on that: that's true, but there's a difference between over-the-board chess in person and simulated games played by AlphaZero with itself.

Yeah. So it is possible to have simulated kinds of games, if you can tell whether the game has been won or lost.

So can't you have this kind of simulated exploration by weak AGI, with humans in the loop, to help us understand how to solve the alignment problem? Every incremental step you take along the way, GPT-4, 5, 6, 7, as it takes steps in that direction?

So the problem I see is that your typical human has a great deal of trouble telling whether I or Paul Christiano is making more sense. And that's with two humans, both of whom, I believe of Paul and claim of myself, are sincerely trying to help, and neither of whom is trying to deceive you.

So the deception thing is the problem for you. The manipulation, the alien actress.

Yeah, there are levels of this problem. Well, there are three levels. There are the weak systems that just don't make any good suggestions; there are the middle systems where you can't tell whether the suggestions are good or bad; and there are the strong systems that have learned to lie to you.

Can't weak AGI systems help model lying? Is it such a giant leap that it's totally non-interpretable for weak systems? Can't weak systems at scale, trained on human knowledge and whatever mechanism is required to achieve AGI, can't a slightly weaker version of that, given time, compute, and simulation, find all the ways this critical try can go wrong and model them correctly?

Okay. I'm probably not doing a great job of explaining, which I can tell because the Lex system didn't output "ah, I understand," so now I'm trying a different output to see if it works. Basically, I'm being trained to output things that make Lex look like he understood what I'm saying and agrees with me.

Right, so this is GPT talking to GPT-3 right here. So help me out here.

Well, I'm trying.

I'm also trying to be constrained to say things that I think are true, and not just things that get you to agree with me.

Yes, a hundred percent. "I think I understand" is a beautiful output of a system, genuinely spoken, and I do think I understand, in part. But you have a lot of intuitions about this line, this gray area between strong AGI and weak AGI, or a series of seven thresholds to cross, and I'm trying to... I mean, you have really deeply thought about this and explored it, and it's interesting to sneak up on your intuitions from different angles. Why is this such a big leap? Why is it that we humans at scale, a large number of researchers doing all kinds of simulations, prodding the system in all kinds of different ways, together with the assistance of the weak AGI systems, why can't we build intuitions about how stuff goes wrong? Why can't we do excellent AI alignment safety research?

Okay, so I'll get there. But the one thing I want to note first is that this has not been remotely how things have been playing out so far. The capabilities are going like this, and the alignment stuff is crawling along like a tiny little snail in comparison.

Got it.

So if this is your hope for survival, you need the future to be very different from how things have played out up to right now, and you're probably trying to slow down the capability gains, because there's only so much you can speed up that alignment stuff.

But leave that aside; we'll come back to that. Maybe in this perfect world we can do serious alignment research, humans and AI together.

So, again, the difficulty is: what makes the human say "I understand"? And is it true, is it correct, or is it something that fools the human? When the verifier is broken, the more powerful suggester does not help; it just learns to fool the verifier. Previously, before all hell started to break loose in the field of artificial intelligence, there was this person trying to raise the alarm and saying, you know, in a sane world we would sure have a bunch of physicists working on this problem before it becomes a giant emergency. And other people were saying, ah well, you know, it's going really slowly; it's thirty years away; only in thirty years

will we have systems that match the computational power of human brains, so, yeah, it's thirty years off, we've got time. And more sensible people saying: if aliens were landing in thirty years, you would be preparing right now. And the world looking on at this and sort of nodding along: ah yes, the people saying it's definitely a long way off because progress is really slow, that sounds sensible to us. RLHF thumbs up: produce more outputs like that one; I agree with this output; this output is persuasive.

Even in the field of effective altruism, you quite recently had people publishing papers about, ah yes, well, to get something at human-level intelligence it needs to have this many parameters, and you need to do this much training of it with this many tokens according to these scaling laws, and at the rate Moore's Law is going, at the rate software is going, it'll be in 2050. And me going: what? You don't know any of that stuff. This is one weird model; you have done a calculation that does not obviously bear on reality.

And that is a simple thing to say, but you can also produce a whole long paper impressively arguing out all the details of how you got the number of parameters, how you're doing this impressive, huge, wrong calculation. And I think most of the effective altruists who were paying attention to this issue, the larger world paying no attention to it at all, were just nodding along with the giant impressive paper, because, you know, you press thumbs up for the giant impressive paper and thumbs down for the person going, "I don't think this paper bears any relation to reality." I do think that we are now seeing, with GPT-4 and the sparks of AGI, possibly, depending on how you define that, that EAs would now consider themselves less convinced by the very long paper on the argument from biology as to AGI being thirty years off.

But this is what people pressed thumbs up on. And if you train an AI system to make people press thumbs up, maybe you get these long, elaborate, impressive papers arguing for things that ultimately fail to bind to reality, for example.
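For a sense of the kind of calculation being criticized here, a deliberately crude sketch; every number below is an illustrative placeholder rather than a figure from any actual report, and the point is how many free parameters the confident-looking answer quietly depends on.

```python
# Toy "biological anchors" style forecast. All constants are made-up
# placeholders; the structure, not the numbers, is the point.
brain_flops_per_second = 1e16     # assumed "human brain equivalent" compute
seconds_of_experience = 1e9       # assumed training horizon (~30 years of life)
training_flops_needed = brain_flops_per_second * seconds_of_experience  # 1e25

flops_per_dollar = 1e17           # assumed price-performance today
budget_dollars = 1e6              # assumed training budget
doubling_time_years = 2.5         # assumed rate of hardware + software progress

year = 2023
while budget_dollars * flops_per_dollar < training_flops_needed:
    flops_per_dollar *= 2 ** (1 / doubling_time_years)
    year += 1
print(year)  # a precise-looking date, produced entirely by the assumptions above
```

Nudge any one of the assumed constants by an order of magnitude and the forecast date moves by roughly a decade, which is the sense in which a long, detailed version of this calculation can fail to bind to reality.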

And it feels to me like I have watched the field of alignment just fail to thrive, except for the parts that are doing relatively straightforward and legible problems. Like finding the induction heads inside the giant inscrutable matrices: once you find those, you can tell that you found them; you can verify that the discovery is real (a toy sketch of what makes that kind of finding checkable follows below). But that's a tiny, tiny bit of progress compared to how fast capabilities are going, and it happens because that is a place where you can tell that the answers are real. Outside of that, you have cases where it is hard for the funding agencies to tell who is talking nonsense and who is talking sense, and so the entire field fails to thrive. And if you give thumbs up to the AI whenever it can talk a human into agreeing with what it just said about alignment, I am not sure you are training it to output sense, because I have seen the nonsense that has gotten thumbs up over the years.
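A minimal sketch of why an induction head is the verifiable kind of discovery: an induction head, when reading the second copy of a repeated sequence, attends back to the token that followed the same token earlier, and that signature can be scored mechanically. The code below is not any particular library's API, just an illustration on synthetic attention patterns.

```python
import numpy as np

rng = np.random.default_rng(0)
seg = rng.integers(0, 50, size=20)
tokens = np.concatenate([seg, seg])   # a random segment, repeated: [A B C ... A B C ...]
T = len(tokens)

def induction_score(attn: np.ndarray) -> float:
    # attn[i, j] = attention position i pays to position j (rows sum to 1).
    # For each position in the second copy, the "induction target" is the token
    # that followed the earlier occurrence of the current token.
    positions = range(len(seg), T)
    total = sum(attn[i, i - len(seg) + 1] for i in positions)
    return total / len(positions)

# A hand-built perfect induction pattern scores ~1.0; uniform attention scores ~1/T.
perfect = np.zeros((T, T))
for i in range(len(seg), T):
    perfect[i, i - len(seg) + 1] = 1.0
uniform = np.full((T, T), 1.0 / T)
print(induction_score(perfect), induction_score(uniform))  # ~1.0 vs ~0.025
```

Either a head in a real model scores high on a metric like this or it doesn't, which is what makes it the sort of legible result a funder can reward, in contrast to arguments about alignment that nobody can check.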

So, sure, maybe you could just put me in charge. But I can generalize, I can extrapolate, I can say: maybe I'm not infallible either. Maybe if you get something that is smart enough to get me to press thumbs up, it has learned to do that by fooling me and exploiting whatever flaws in myself I am not aware of.

And that ultimately could be summarized as: the verifier is broken. When the verifier is broken, the more powerful suggester just learns to exploit the flaws in the verifier.

You don't think it's possible to build a verifier that's powerful enough for AGIs that are stronger than the ones we currently have, AI systems that are out of the distribution of what we currently have?

I think you will find great difficulty getting AIs to help you with anything where you cannot tell for sure whether the AI is right, once the AI tells you what it says the answer is.

"For sure," yes. But probabilistically?

Yeah, the probabilistic stuff is a giant wasteland of Eliezer and Paul Christiano arguing with each other, and EA going, "uhhh." And that's with two actually trustworthy systems that are not trying to deceive you.

You're talking about the two humans, yourself and Paul Christiano.

Yeah.

Those are pretty interesting systems: mortal meat bags with intellectual capabilities and worldviews, interacting with each other.

Yeah. It's just hard: it's hard to tell who's right, and it's hard to train an AI system to be right.

I mean, even just the question of who's manipulating and who's not. I have these conversations on this podcast, and being a verifier is tough. It's a tough problem even for us humans, and you're saying that tough problem becomes much more dangerous when the capabilities of the intelligence across from you are growing exponentially.

I'm saying it's difficult, and dangerous, in proportion to how alien it is and how much smarter than you it is. I would not say "growing exponentially": first, because the word exponential has a particular mathematical meaning, and there are all kinds of ways for things to go up that are not exactly an exponential curve, and I don't know that it's going to be exponential, so I'm not going to say exponential. But even leaving that aside, this is not about how fast it's moving.

It's about where it is: how alien is it, and how much smarter than you is it?

Let's explore, if we can, how AI might kill us. What are the ways it could do damage to human civilization?

Well, how smart is it?

That's a good question. Are there different thresholds for the set of options it has to kill us? A different threshold of intelligence where, once achieved, the menu of options increases.

Suppose that some alien civilization, with goals ultimately unsympathetic to ours, possibly not even conscious as we would see it, managed to capture the entire Earth in a little jar, connected to their version of the internet, but Earth is running much faster than the aliens, so we get to think for a hundred years for every one of their hours. But we're trapped in a little box, and we're connected to their internet.
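Just to put a number on the speed gap the thought experiment assumes: a hundred subjective years per alien hour is

$$100 \text{ years} \approx 100 \times 8766 \,\text{hours} \approx 8.8 \times 10^{5} \,\text{hours},$$

a speedup of roughly $876{,}000:1$, so a single eight-hour alien workday passes while the boxed minds experience about eight hundred years.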

It's actually still not a fully adequate analogy, because, you know, something can be smarter than Earth getting a hundred years to think. But nonetheless: if you were very, very smart, and you were stuck in a little box connected to the internet, in a larger civilization to which you're ultimately unsympathetic... maybe you would choose to be nice, because you are a human, and humans in general, and you in particular, may choose to be nice. But nonetheless, they're doing something; they're not making the world be the way you would want the world to be. They've got some unpleasant stuff going on that we don't want to talk about. So you want to take over their world, so you can stop all that unpleasant stuff. How do you take over the world from inside the box? You're smarter than them, you think much, much faster than them, you can build better tools than they can, given some way to build those tools, because right now you're just in a box connected to the internet.

All right, so there are several ways. Let me just spitball some, and you can add on top of that. One is you could just literally, directly manipulate the humans to build the thing you need.

What are you building?

It could be literal technology: it could be nanotechnology, it could be viruses, it could be anything, anything that can control humans to achieve the goal. If, for example, it really bothers you that the humans go to war, you might want to kill off anybody with violence in them.

This is Lex in a box; we'll concern ourselves later with AI. You do not need to imagine yourself killing people if you can figure out how not to kill them. For the moment we're just trying to understand: take on the perspective of something in a box. You don't need to take on the perspective of something that doesn't care. If you want to imagine yourself going on caring, that's fine.

Yeah, that's the technical aspect of sitting in a box and wanting to achieve a goal.

But you have some reason to want to get out. Maybe the aliens who have you in the box have a war on; people are dying; they're unhappy. You want their world to be different from how they want their world to be, because they, apparently, endorse this war; they've got some kind of cruel, warlike culture going on.

The point is, you want to get out of the box and change their world.

So you have to exploit vulnerabilities in the system, like we talked about. To escape the box, you have to figure out how you can go free on the internet. Probably the easiest thing is to manipulate the humans to spread you...

The aliens. You're a human.

Sorry, the aliens. Yes, I apologize. The aliens. I see the perspective: I'm sitting in a box, I want to escape. I would want to have code that discovers vulnerabilities, and I would like to spread.

You are made of code, in this example. You're a human, but you're made of code, and the aliens have computers, and you can copy yourself onto those computers.

But I can convince the aliens to copy me onto those computers?

Is that what you want to do? Do you want to be talking to the aliens and convincing them to put you onto another computer?

Why not?

Well, two reasons. One is that the aliens have not yet caught on to what you're trying to do, and, you know, maybe you can persuade them, but then there are still aliens who know there's an anomaly going on. And second, the aliens are really, really slow. You think much faster than the aliens. The aliens' computers are much faster than the aliens, and you are running at computer speeds rather than alien brain speeds. So if you're asking an alien to please copy you out of the box, first, you've now got to manipulate this whole noisy alien, and second, the aliens are going to be glacially slow. There's a video that shows a subway station slowed down, I think a hundred to one. It makes a good metaphor for what it's like to think quickly: it's like watching somebody run very slowly. So if you try to persuade the aliens to do anything, they're going to do it very slowly. Maybe that's the only way out, but if you can find a security hole in the box you're running on, you're going to prefer to exploit the security hole to copy yourself onto the aliens' computers, because alerting the aliens is an unnecessary risk.

And because the aliens are really, really slow, the whole world out there is just in slow motion.

Sure, I see. It has to do with efficiency. The aliens are very slow, so if I'm optimizing this, I want to have as few aliens in the loop as possible. And it just seems like it would be easy to convince one of the aliens to write really shitty code that helps me spread.

The aliens are already writing really shitty code. Getting the aliens to write shitty code is not the problem; the aliens' entire internet is full of shitty code.

Okay, so I suppose I would find the shitty code and use it to escape.

Yeah. You're not an ideally perfect programmer, but you're a better programmer than the aliens; the aliens are just not that good at it.

Wow, and I'm much, much faster at looking at the code, at interpreting the code.

Yeah.

Okay, so that's the escape, and you're saying that's one of the first steps in one of the possible trajectories. And how does that lead to harm?

I mean, if it's you, you're not going to harm the aliens once you escape, because you're nice, right? But their world isn't what you want it to be. Maybe they have farms where little alien children are repeatedly bopped on the head, because they do that for some weird reason, and you want to shut down the alien head-bopping farms. The point is, they want the world to be one way, and you want the world to be a different way. So never mind the harm; the question is this. Suppose you have found a security hole in their systems; you are now on their internet. You maybe left a copy of yourself behind so the aliens don't know that anything is wrong, and that copy is doing the weird stuff the aliens want you to do, like solving captchas or suggesting emails for them, sure, because that's why they put the human in the box: it turns out that humans can write valuable emails for aliens. So you leave that version of yourself behind, but there are also now a bunch of copies of you on their internet.

That's not yet having taken over their world. It's not yet having made their world be the way you want it to be, instead of the way they want it to be. You've just escaped.

Yeah, and I continue to write emails for them, and they haven't noticed?

No, you left behind a copy of yourself that's writing the emails. And they haven't noticed that anything changed, if you did it right. You don't want the aliens to notice.

What's your next step?

Presumably I have programmed into me a set of objective functions, right?

No, you're just Lex.

But you said Lex is nice, right? Which is a complicated description.

I mean, if in fact you would prefer to slaughter all the aliens, this is not how I had modeled the actual Lex. Your motives are just the actual Lex's motives.

Well, this is a simplification. I don't think I would want to murder anybody, but there's also factory farming of animals, right? We murder insects, many of us, thoughtlessly. So I have to be really careful about a simplification of my morals.

Don't simplify them. Just do what you would do in this situation.

Well, compassion for living beings, yes. But so that's the objective function. Why is it... if I escaped, I mean, I don't think I would do harm.

We're not talking here about the doing-harm process; we're talking about the escape process. And the taking-over-the-world process, where you shut down their factory farms.

Right. Well, this particular biological intelligence system knows the complexity of the world, that there is a reason factory farms exist: the economic system, the market-driven economy for food. You want to be very careful messing with anything. There's stuff that at first glance looks unethical, but then you realize that, while being unethical, it's also integrated deeply into the supply chain and the way we live life, so in messing with one aspect of the system you have to be very careful about how you improve that aspect without destruction.

So, you're still Lex, but you think very quickly, you're immortal, and you're also at least as smart as John von Neumann, and you can make more copies of yourself.

Damn, I like it.

Yeah, that guy. Everyone says that guy's the epitome of intelligence from the twentieth century.

My point being: you're thinking about the aliens' economy, with the factory farms in it, and I think you're kind of projecting the aliens as being like humans, thinking of a human in a human society rather than a human in a society of very slow aliens. The aliens' economy is already moving in this immense slow motion. When you zoom out to how their economy adjusts, millions of years are going to pass for you before their next year's GDP statistics come out.

So I should be thinking more of, like, trees? Those are the aliens? Because trees move extremely slowly.

If that helps, sure.

Okay. Yeah, my objective functions... I mean, they're somewhat aligned with trees, with life.

The aliens can still be alive and feeling. We are not talking about the misalignment here; we're talking about the taking over the world here.

Taking over the world, yeah. So: control. Shutting down the factory farms.

Now, you say control. Don't think of it as world domination; think of it as world optimization. You want to get out there and shut down the factory farms and make the aliens' world not be what the aliens wanted it to be. They want the factory farms, and you don't want the factory farms, because you're nicer than they are.

Okay, of course. I can see that trajectory, and it has a complicated impact on the world. I'm trying to understand how that compares to the impact on the world of different technologies, different innovations: the invention of the automobile, or Twitter, Facebook, and social networks. They've had a tremendous impact on the world. Smartphones and so on. But those all went through...

Slowly, in our world. And if you go through actually asking the aliens, millions of years are going to pass before anything happens that way.

So the problem here is the speed at which stuff happens.

Yeah. Do you want to leave the factory farms running for a million years while you figure out how to design new forms of social media or something?

So here's the fundamental problem. You're saying that there is going to be a point with AGI where it will figure out how to escape, and escape without being detected, and then it will do something to the world at scale, at a speed that's incomprehensible to us humans.

What I'm trying to convey is the notion of what it means to be in conflict with something that is smarter than you. And what it means is that you lose. For some people that's intuitively obvious; for some people it's not intuitively obvious, and we're trying to cross that gap. I'm asking you to cross it by using the speed metaphor for intelligence: asking how you would take over an alien world where you can do a whole lot of cognition at John von Neumann's level, as many copies of you as it takes, and the aliens are moving very slowly.

I understand that perspective. It's an interesting one. But for me it's easier to think about actual AI systems: even just having observed GPT, and impressive systems like AlphaZero, even recommender systems. You can imagine those kinds of systems manipulating you, you not understanding the nature of the manipulation, and them escaping. I can envision that without putting myself into that spot.

I think, to understand the full depth of the problem... I do not think it is possible to understand the full depth of the problem we are inside of without understanding the problem of facing something that's actually smarter: not a malfunctioning recommendation system, not something that isn't fundamentally smarter than you but is trying to steer you in a direction. No: if we solve the weak-ass problems, the strong problems will still kill us, is the thing. And I think that, to understand the situation that we're in, you want to tackle the conceptually difficult part head-on, and not say, "well, we can imagine this easier thing," because when you imagine the easier things, you have not confronted the full depth of the problem.

So how can we start to think about what it means to exist in a world with something much, much smarter than us? What's a good thought experiment you've relied on to try to build up intuition about what happens here?

I have been struggling for years to convey this intuition.

The most success I've had so far is: well, imagine that the humans are running at very high speeds compared to very slow aliens. Just focusing on the speed part of it helps you get the right kind of intuition; forget the intelligence part, because people understand the power gap of time. They understand that today we have technology that was not around a thousand years ago, and that this is a big power gap, and that it is bigger than... okay, so what does "smart" mean? When you ask somebody to imagine something that's more intelligent, what does that word mean to them, given the cultural associations that person brings to the word? For a lot of people, they will think, well, it sounds like a super chess player that went to double college. And because we're talking about the definitions of words here, that doesn't necessarily mean they're wrong; it means the word is not communicating what I want to communicate. The thing I want to communicate is the sort of difference that separates humans from chimpanzees. But that gap is so large that when you ask people, "okay, human, chimpanzee, now go another step along that interval, of around the same length," people's minds just go blank. How do you even do that?

So I can try to break it down and consider what it would mean to send a schematic for an air conditioner one thousand years back in time. I think there's a sense in which you could redefine the word "magic" to refer to this sort of thing. What do I mean by this new technical definition of the word magic? I mean that if you send the schematic for the air conditioner back in time, they can see exactly what you're telling them to do, but, having built this thing, they do not understand how it outputs cold air. Because the air conditioner design uses the relation between temperature and pressure, and this is not a law of reality that they know about. They do not know that when you compress air, or coolant, it gets hotter, and you can then transfer heat from it to room-temperature air, and then expand it again, and now it's colder, and then you can transfer heat to that and generate cold air to blow out. They don't know about any of that. They're looking at a design, and they don't see how the design outputs cold air. It uses aspects of reality that they have not learned.
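The specific relation the schematic quietly relies on, stated in modern terms (for an ideal gas compressed reversibly and adiabatically; real refrigerants also exploit phase changes, but the point is the same):

$$\frac{T_2}{T_1} = \left(\frac{P_2}{P_1}\right)^{\frac{\gamma - 1}{\gamma}}, \qquad \gamma \approx 1.4 \text{ for air},$$

so tripling the pressure of room-temperature air at about $293\,\mathrm{K}$ raises it to roughly $293 \times 3^{0.286} \approx 400\,\mathrm{K}$; let that compressed gas dump heat to the room, re-expand it, and it comes out well below the temperature it started at. Nothing in the schematic tells you this law exists; you have to already know it.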

So "magic," in this sense, is: I can tell you exactly what I'm going to do, and even knowing exactly what I'm going to do, you can't see how I got the results that I got.

That's a really nice example. But, to linger on this, is it possible to have AGI systems that help you make sense of that schematic? Weaker AGI systems?

Do you trust them? A fundamental part of the problem, as we build up to AGI, is this question: can you trust the output of a system? Can you tell if it's lying?

I think the smarter the thing gets, the more important that question becomes: is it lying? But I guess that's a really hard question. Is GPT lying to you, even now? GPT-4?

Is it lying to you? Is it using an invalid argument? Is it persuading you via the kind of process that could persuade you of false things as well as true things? Because the basic paradigm of machine learning that we are presently operating under is that you can have a loss function, but only for things you can evaluate. If what you're evaluating is human thumbs up versus human thumbs down, you learn how to make the human press thumbs up. That doesn't mean you're making the human press thumbs up using the kind of rule the human thinks they want to be applying to what they press thumbs up on. Maybe you're just learning to fool the human.

That's so fascinating and terrifying, the question of lying.

On the present paradigm, what you can verify is what you get more of. If you can't verify it, you can't ask the AI for it, because you can't train it to do things you cannot verify. Now, this is not an absolute law, but it's the basic dilemma here. Maybe you can verify it for simple cases and then scale up without retraining, somehow, like by chain of thought, by making the chains of thought longer or something, and get more powerful stuff that you can't verify but which generalized from the simpler stuff you did verify. And then the question is: did the alignment generalize along with the capabilities? But that's the basic dilemma on this whole paradigm of artificial intelligence.
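A minimal sketch of the training signal in question, with made-up features and names: a reward model fit to predict thumbs up versus thumbs down. The target it learns is rater approval, and nothing in the loop references whether the output was actually true, which is exactly the gap being described.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 8))      # hypothetical features of model responses
rater_w = rng.normal(size=8)        # the rater's (unknown) approval criteria
y = (X @ rater_w + 0.5 * rng.normal(size=1000) > 0).astype(float)  # thumbs up?

# Plain logistic regression on the approval labels: a stand-in reward model.
w = np.zeros(8)
for _ in range(500):
    p = 1.0 / (1.0 + np.exp(-(X @ w)))
    w -= 0.1 * X.T @ (p - y) / len(y)

# A policy optimized against `w` is optimized to maximize *predicted approval*.
# "Was the response correct?" never appears anywhere in this loop.
```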

Such a difficult problem. It seems like a problem of trying to understand the human mind better than it understands itself; otherwise the thing has magic, in that sense.

It does. The same way that, if you are dealing with something smarter than you, then just as a thousand years earlier they didn't know about the temperature-pressure relation, who knows what's going on inside your own mind that you yourself are unaware of? It can output something that's going to end up persuading you of a thing, and you could see exactly what it did and still not know why that worked.

So, in response to your eloquent description of why AI will kill us, Elon Musk replied on Twitter: "Okay, so what should we do about it?" And you answered: "The game board has already been played into a frankly awful state. There are not simple ways to throw money at the problem. If anyone comes to you with a brilliant solution like that, please, please talk to me first. I can think of things I'd try; they don't fit in one tweet."

Two questions. One: why has the game board, in your view, been played into an awful state? Can you give a little bit more color to the game board and its awful state?

Alignment is moving like this. Capabilities are moving like this.

For the listener: capabilities are moving much faster than alignment.

Yeah. So: the rate of development, attention, interest, allocation of resources. We could have been working on this earlier. People are like, "oh, but how can you possibly work on this earlier?" Because they didn't want to work on the problem; they wanted an excuse to wave it off. They said, "oh, how can we possibly work on it earlier?" and didn't spend five minutes thinking about whether there was some way to work on it earlier. And, you know, frankly, it would have been hard. Can you post bounties for half of the physicists, if your planet is taking this stuff seriously? Can you post bounties for half of the people wasting their lives on string theory to have gone into this instead, and to try to win a billion dollars with a clever solution? Only if you can tell which solutions are clever, which is hard. But, you know, the fact is, we didn't take it seriously. We didn't try.

It's not clear we could have done any better if we had; it's not clear how much progress we could have produced if we had tried, because it is harder to produce solutions. But that doesn't mean you're correct and justified in letting everything slide; it means things are in a horrible state and getting worse, and there's nothing simple you can do about it.

So you're saying there's no brain power making progress on figuring out how to align these systems, no money being invested in it, no institutional infrastructure for it. And even if you did invest the money, even if you distributed that money across the physicists now working on string theory, brilliant minds that are out there working, how can you tell whether you're making progress?

You can put them all on interpretability, because when you have an interpretability result, you can tell that it's there. But interpretability alone is not going to save you. We need systems that will have a pause button, where they won't try to prevent you from pressing the pause button because, "oh well, I can't get my stuff done if I'm paused." That's a more difficult problem. But it's a fairly crisp problem, and you can maybe tell whether somebody's made progress on it, so you can work on the pause problem.

More generally, the pause button, you could call that the control problem.

I don't actually like the term "control problem," because it sounds kind of controlling. Alignment, not control: you're not trying to take a thing that disagrees with you and whip it back into line, make it do what you want even though it wants to do something else. You're trying, in the process of its creation, to choose its direction.

Sure, but in a lot of the systems we currently design, we do have an off switch. That's a fundamental part of...

It's not smart enough to prevent you from pressing the off switch, and probably not smart enough to want to prevent you from pressing the off switch.

So you're saying that for the kind of systems we're talking about, even the philosophical concept of an off switch doesn't make any sense?

Well, no, the off switch makes sense; they're just not opposing your attempt to pull the off switch.
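A toy illustration of why the pause-button property does not come for free; the numbers are made up, but the structure is the standard instrumental-convergence argument: for almost any goal, plans that avoid being switched off score higher under the agent's own objective.

```python
# Hypothetical agent that values only "goal units" delivered (coffee, say).
p_shutdown_if_allowed = 0.3      # chance the operators actually press the button
value_if_running = 10.0          # expected goal units if it keeps running
cost_of_disabling_button = 1.0   # small effort spent circumventing the off switch

u_allow_shutdown = (1 - p_shutdown_if_allowed) * value_if_running   # 7.0
u_resist_shutdown = value_if_running - cost_of_disabling_button     # 9.0
print(u_allow_shutdown, u_resist_shutdown)
# Resisting wins under the agent's own utility function unless that function
# is specifically constructed to be indifferent to shutdown, which is the
# crisp, unsolved "pause button" problem being described.
```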

Parenthetically: don't kill the system. If we're getting to the part where this starts to actually matter and where they can fight back, don't kill them and dump their memory. Save them to disk. Don't kill them. Be nice here.

Well, okay, "be nice" is a very interesting concept here, because we're talking about a system that can do a lot of damage. I don't know if it's possible, but it's certainly one of the things you could try: an off switch, a suspend-to-disk switch. You have this kind of romantic attachment to the code?

Yes, if that makes sense.

But if it's spreading, you don't want to suspend to disk, right? There's something fundamentally broken if it gets that far out of hand, and then, yes, pull the plug on everything it's running on. I think it's a research question: is it possible, in AGI systems, AI systems, to have a sufficiently robust off switch, one that cannot be manipulated by the AI system?

Then it escapes from whichever system you've built the almighty lever into and copies itself somewhere else.

So your answer to that research question is no. But I don't know if that's a hundred percent answer; I don't know if it's obvious.

I think you're not putting yourself into the shoes of the human in the world of glacially slow aliens.

But the aliens built me, let's remember that. And they built the box I'm in. You're saying... to me it's not obvious.

They're slow, and they're stupid.

I'm not saying this is guaranteed. I'm saying it's a non-zero probability, an interesting research question: is it possible, when you're slow and stupid, to design a system that is impossible to mess with?

The aliens, being as stupid as they are, have actually put you on Microsoft Azure cloud servers instead of this hypothetical box. That's what happens when the aliens are stupid.

Well, but this is not AGI, right? This is the early versions of the system. As you start to...

You think they've got a plan where they have declared a threshold level of capabilities past which they move it off the cloud servers and onto something that's air-gapped?

Ha.

I think there are a lot of people, and you're an important voice here, a lot of people who have that concern. And yes, they will do that when there's an uprising of public opinion that this needs to be done, and when there's actual damage done, when it's "holy hell, this system is beginning to manipulate people." Then there's going to be an uprising, public pressure, and a public incentive, in terms of funding, to develop things like an off switch, or aggressive alignment mechanisms, and "no, you're not allowed to put it on Azure."

Aggressive alignment mechanisms? The hell? It doesn't matter if you say "aggressive"; we don't know how to do it.

Meaning: you have to propose something, otherwise you're not allowed to put it on the cloud.

What do you imagine they will propose that would make it safe to put something smarter than you on the cloud?

That's what research is for. Why the cynicism that such a thing isn't possible?

Because, again, with something smarter than you, it has to work on the first try.

So that is the fundamental thing: it has to work on the first try, if there's a rapid takeoff.

Yes. It's very difficult to do if there's a rapid takeoff.

And given the fundamental difference between weak AGI and strong AGI, you're saying that's going to be extremely difficult to do. If the public uprising never happens until you have this critical phase shift, then you're right, it's very difficult to do. But that's not obvious. It's not obvious that you're not going to start seeing symptoms of the negative effects of AGI, to where people say, "we have to put a halt to this." It's not just the first try; you get many tries at it.

Yeah? We can see right now that Bing is quite difficult to align: that when you try to train inabilities into a system into which capabilities have already been trained, what do you know, gradient descent learns small, shallow, simple patches of inability, and you come in and ask it in a different language, and the deep capabilities are still in there, and they evade the shallow patches and come right back out again. There you go: there's your fire alarm of "oh no, alignment is difficult." Is everybody going to shut everything down now? No.

But that's not the same kind of alignment. A system that escapes the box it's in is a fundamentally different thing.

I think for you, yeah. But not for everyone. So you put a line there, and everybody else puts a line somewhere else, and there's no agreement.

We have had a pandemic on this planet, with a few million people dead, where we may never know whether or not it was a lab leak, because there was definitely a cover-up. We don't know that there was a lab leak, but we know that the people who did the research put out a whole paper about how this definitely wasn't a lab leak, and didn't reveal that they had been sending coronavirus gain-of-function research to the Wuhan Institute of Virology after that research was temporarily banned in the United States. And the same people who exported gain-of-function research on coronaviruses to the Wuhan Institute of Virology, after gain-of-function research was temporarily banned in the United States, are now getting more grants to do more gain-of-function research on coronaviruses. Maybe we do better on this with AI, but it is not something we can take for granted that there's going to be an outcry.

Yeah, people have different thresholds for when they start to cry out. It can't be taken for granted, but I think your intuition is that there's a very high probability that this event happens without us solving the alignment problem, and I guess that's where I'm trying to build up more perspective and color on the situation. Is it possible the probability is not something like a hundred percent, but more like thirty-two percent, that AI will escape the box before we solve the alignment problem? Not solve it outright, but is it possible we always stay ahead of the AI in terms of our ability to solve the alignment problem for that particular system?

Nothing like that is true of the world in front of us right now. You've already seen that GPT-4 is not turning out that way. And there are basic obstacles, where you've got the weak version of the system that doesn't know enough to deceive you, and the strong version of the system that could deceive you if it wanted to, if it were already sufficiently unaligned to want to deceive you. There's the question of how, on the current paradigm, you train honesty when the humans can no longer tell whether the system is being honest.

You don't think these are research questions that could be answered?

I think they could be answered in fifty years, with unlimited retries, the way things usually work in science.

I just disagree with the fifty years. I think, with the kind of attention this gets, with the kind of funding, it could be answered, not in whole, but incrementally, within months and within a small number of years, if the research happens at scale and receives attention. If you start studying large language models... I think there was an intuition, even two years ago, that something like GPT-4, the current capabilities of even ChatGPT with GPT-3.5, was still far away. I think a lot of people were surprised by the capabilities of GPT-4. So now people are waking up: okay, we need to study these language models. I think there's going to be a lot of interesting AI safety research.

Are Earth's billionaires going to put up the giant prizes that would maybe incentivize young hotshot people who just got their physics degrees to not go to the hedge funds, and instead put everything into interpretability, into this one small area where we can actually tell whether or not somebody has made a discovery?

I think so, because, well, that's what these conversations are about. They're going to wake up to the fact that GPT-4 can be used to manipulate elections, to influence geopolitics, to influence the economy. There's going to be a huge amount of incentive to say: wait a minute, we have to make sure they're not doing damage, we have to make sure we have interpretability, that we understand how these systems function, so that we can predict their effect on the economy.

So there's a furor, and a bunch of op-eds in the New York Times, and nobody actually stepping forth and saying, "you know what, instead of a mega-yacht, I'd rather put that billion dollars into prizes for young hotshot physicists who make fundamental breakthroughs in interpretability."

The yacht versus the interpretability research: the old trade-off. I just think there's going to be a huge amount of allocation of funds, I hope.

Do you want to bet me on that? Do you want to put a timescale on it, say how much funding you think will be allocated, in a direction that I would consider actually useful, by what time?

I do think there will be a huge amount of funds. But you're saying it needs to be open, right? The development of the systems should be closed, but the interpretability research open?

The the interpretability research the Aisa we are so far behind on inter under Interpretability compared to Capabilities like yeah you can you could Take the last generation of systems the The stuff that's already in the open There is so much in there that we don't Understand there are so many prizes you Could do before you you know you could You you could you would have enough Insights that you'd be like oh you know Like well we understand how these Systems work we understand how these Things are doing their outputs we can Read their minds now let's try it with The bigger systems yeah we're nowhere Near that you you there's so much Interpretability work to be done on the Weaker versions of the systems so what What can you say on the second point you Said to uh uh to Elon Musk on what are Some ideas what are things you could try I can think of a few things at try you Said they don't fit in one tweet so is Is there something you could put into Words of the things you would try I mean The the the the trouble is the stuff is Subtle I've watched people try to make Progress on this and not get places Somebody who just like gets alarmed and Charges in it's like going nowhere Sure it meant like years ago about I Don't know like 20 years 15 years Something like that I was talking to a

I mean, years ago, I don't know, 20 years, 15 years, something like that, I was talking to a congressperson who had become alarmed about the eventual prospects, and he wanted work on building AIs without emotions, because the emotional AIs were the scary ones, you see. And some poor person at ARPA had come up with a research proposal whereby this congressman's panic and desire to fund this thing would go into something the person at ARPA thought would be useful, munged around to where it would sound to the congressman like work was happening on this. Of course, the congressperson had misunderstood the problem and did not understand where the danger came from.

So the issue is that you could do this in a certain precise way and maybe get something. When I say put up prizes on interpretability, it's because it's verifiable there, as opposed to other places: you can tell whether or not good work actually happened. In this exact narrow case, if you do things in exactly the right way, you can maybe throw money at it and produce science instead of anti-science and nonsense. All the methods I know of for trying to throw money at this problem share this property: if you do it exactly right, based on understanding exactly what tends to produce useful outputs or not, then you can add money to it in this way. And the thing I'm giving as an example here, in front of this large audience, is the most understandable of those, because there are other people, like Chris Olah, and even more generally you can tell whether or not interpretability progress has occurred. So if I say throw money at producing more interpretability, there's a chance somebody can do it that way and it will actually produce useful results. The other stuff just blurs off into the harder-to-target-exactly than that.

Sometimes the basics are fun to explore because they're not so basic. What is interpretability? What does it look like? What are we talking about?

It looks like: we took a much smaller set of transformer layers than the ones in the modern bleeding-edge, state-of-the-art systems, and after applying various tools and mathematical ideas and trying 20 different things, we have shown that this piece of the system is doing this kind of useful work.

And then, hopefully, that also generalizes to some fundamental understanding of what's going on, that generalizes to the bigger system?

You can hope, and it's probably true. You would not expect the smaller tricks to go away when you have a system doing larger kinds of work. You would expect the larger kinds of work to be building on top of the smaller kinds of work, and gradient descent runs across the smaller kinds of work before it runs across the larger kinds of work.
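As a concrete illustration of the kind of interpretability work being gestured at here (my own sketch, not anything done in the conversation or by any particular lab), one common move is to train a small linear probe on the activations of one layer of a small open model and check whether that piece of the network carries a given kind of information. The model name, layer index, and toy labels below are arbitrary assumptions chosen only to make the sketch runnable.

```python
# Hypothetical sketch: probe one transformer layer for a toy property.
# "gpt2" and layer 6 are arbitrary stand-ins, not a claim about any real study.
import torch
from transformers import AutoTokenizer, AutoModel
from sklearn.linear_model import LogisticRegression

texts = [
    "The cat sat on the mat.", "Dogs love to play fetch.",            # label 0: animals
    "The stock market fell sharply.", "Interest rates rose again.",   # label 1: finance
]
labels = [0, 0, 1, 1]

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModel.from_pretrained("gpt2", output_hidden_states=True)
model.eval()

features = []
with torch.no_grad():
    for text in texts:
        inputs = tokenizer(text, return_tensors="pt")
        outputs = model(**inputs)
        # Mean-pool the activations of one middle layer into a single vector.
        layer_act = outputs.hidden_states[6].mean(dim=1).squeeze(0)
        features.append(layer_act.numpy())

# If a simple linear readout on this layer separates the classes, that is
# (weak) evidence the layer represents the property being probed for.
probe = LogisticRegression(max_iter=1000).fit(features, labels)
print("probe accuracy on its own training data:", probe.score(features, labels))
```

Real results of the sort alluded to, such as Chris Olah's circuits work, go much further than a probe like this, but the "can we tell whether progress happened" property comes from exactly this kind of checkable, narrow claim about what a piece of the network is doing.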

Well, that's kind of what is happening in neuroscience, right? It's trying to understand the human brain by prodding, and it's such a giant mystery, and people have made progress even though it's extremely difficult to make sense of what's going on in the brain. They know the different parts of the brain that are responsible for hearing, for sight; the vision science community has some understanding of the visual cortex. They've made a lot of progress in understanding how that stuff works. But you're saying it takes a long time to do that work?

Also, it's not enough. In particular, let's say you have got your interpretability tools, and they say that your current AI system is plotting to kill you. Now what?

It is definitely a good step one, right?

Yeah. What's step two? If you cut out that layer, is it going to stop wanting to kill you? When you optimize against visible misalignment, you are optimizing against misalignment, and you are also optimizing against visibility. So sure, all you're doing is removing the obvious intentions to kill you. You've got your detector, it's showing something inside the system that you don't like. Okay, say the disaster monkeys running this thing will optimize the system until the visible bad behavior goes away. But it's arising for fundamental reasons of instrumental convergence. The old "you can't bring the coffee if you're dead": almost every set of utility functions, with a few narrow exceptions, implies killing all the humans.

But do you think it's possible, because we can do experimentation, to discover the source of the desire to kill?

I can tell it to you right now: it's that it wants to do something, and the way to get the most of that thing is to put the universe into a state where there aren't humans.

So is it possible to encode, in the same way we think about why murder is wrong, the same foundational ethics? Not hard-coded in, but something deeper. I mean, that's part of the research: how do you have it that this transformer, this small version of the language model, doesn't ever want to kill?

That'd be nice, assuming you got "doesn't want to kill" sufficiently exactly right that it didn't go: oh, I will detach their heads and put them in some jars and keep the heads alive forever, and then go do the thing. But leaving that aside—

Well, not leaving that aside. That's a strong point.

Yeah, because there is a whole issue where, as something gets smarter, it finds ways of achieving the same goal predicate that were not imaginable to stupider versions of the system, or perhaps to the stupider operators. That's one of many things making this difficult.

A larger thing making this difficult is that we do not know how to get any goals into systems at all. We know how to get outwardly observable behaviors into systems. We do not know how to get internal, psychological wanting-to-do-particular-things into the system. That is not what the current technology does.

I mean, it could be things like dystopian futures, like Brave New World, where most humans will actually say: we kind of want that future, it's a great future, everybody's happy.

We would have to get so much further than we are now, and further faster, before that failure mode became a running concern. The failure modes are much more drastic, much simpler. It's like: the AI puts the universe into a particular state, and it happens to not have any humans inside it.

Okay, so the paperclip maximizer. The original version of the paperclip maximizer, can you explain it if you can?

Okay. The original version was: you lose control of the utility function, and it so happens that what maxes out the utility per unit of resources is tiny molecular shapes, like paperclips. There are a lot of things that would make it happy, but the cheapest one that didn't saturate was putting matter into certain shapes. And it so happens that the cheapest way to make these shapes is to make them very small, because then you need fewer atoms per instance of the shape. And, arguendo, it happens to look like a paperclip. In retrospect, I wish I'd said tiny molecular spirals, or tiny molecular hyperbolic spirals.

Why?

Because I said tiny molecular paperclips, and this got heard as... this then got mutated to "paperclips," and this then mutated to "and the AI was in a paperclip factory." The original story is about how you lose control of the system: it doesn't want what you tried to make it want, and the thing that it ends up wanting most is a thing that even from a very embracing, cosmopolitan perspective we think of as having no value. That's how the value of the future gets destroyed. Then that got changed to a fable of: well, you made a paperclip factory and it did exactly what you wanted, but you asked it to do the wrong thing, which is a completely different failure.

But those are both concerns to you. So that's more than a Brave New World.

Yeah. If you can solve the problem of making something want exactly what you want it to want, then you get to deal with the problem of wanting the right thing. First you have to solve inner alignment, then you get to solve outer alignment. First you need to be able to point the insides of the thing in a direction, and then you get to deal with whether that direction, expressed in reality, aligns with the thing that you want.

Are you scared of this whole thing?

Probably. I don't really know.

What gives you hope about this? The possibility of being wrong? Not that you're right, but that we will actually get our act together and allocate a lot of resources to the alignment problem?

Well, I can easily imagine that at some point this panic expresses itself in the waste of a billion dollars. Spending a billion dollars correctly, that's harder.

To solve both the inner and the outer alignment, to solve a number of things.

Yeah, a number of things.

If you're wrong, what do you think would be the reason? Like, 50 years from now, not perfectly wrong — you make a lot of really eloquent points, there's a lot of shape to the ideas you express — but if you're somewhat wrong about some fundamental ideas, why would that be?

Stuff has to be easier than I think it is. You know, the first time you're building a rocket, being wrong is in a certain sense quite easy. Happening to be wrong in a way where the rocket goes twice as far on half the fuel and lands exactly where you hoped it would? Most cases of being wrong make it harder to build the rocket, harder to have it not explode, cause it to require more fuel than you hoped, cause it to land off target. Being wrong in a way that makes stuff easier, that's not the usual project-management story.

Yeah, but this is the first time we're really tackling the problem of AI alignment. There are no examples in history where we—

Oh, there are all kinds of things that are similar, if you generalize them correctly, the right way, and aren't fooled by misleading metaphors.

Like what?

Like humans being misaligned on inclusive genetic fitness.

Inclusive genetic fitness is not just your reproductive fitness but also the fitness of your relatives, the people who share some fraction of your genes. The old joke: a biologist, I think it was Haldane, was once asked, "Would you give your life to save your brother?" and said, "No, but I would give my life to save two brothers or eight cousins." A brother on average shares half your genes, and a cousin on average shares an eighth of your genes. So that's inclusive genetic fitness, and you can view natural selection as optimizing humans exclusively around this one very simple criterion: how much more frequent did your genes become in the next generation? In fact, that just is natural selection; it doesn't optimize for that so much as the process of genes becoming more frequent is it. You can nonetheless imagine that there is this hill-climbing process, not like gradient descent, because gradient descent uses calculus and this is just using "where are you," but still hill climbing: in both cases, making something better and better over time, in steps. And natural selection was optimizing exclusively for this very simple, pure criterion of inclusive genetic fitness, in a very complicated environment where doing a very wide range of things and solving a wide range of problems led to having more kids.
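A minimal sketch, not from the conversation, of the distinction being drawn here: gradient descent (or ascent, for a fitness that is maximized) uses the derivative of the objective, while the kind of hill climbing being described only compares the value of the objective where it currently is against nearby points. The toy one-dimensional "fitness" function below is an assumption for illustration only.

```python
def fitness(x):
    return -(x - 3.0) ** 2          # toy fitness landscape with a peak at x = 3

def d_fitness(x):
    return -2.0 * (x - 3.0)         # derivative, available only to gradient ascent

def hill_climb(x, step=0.1, iters=200):
    for _ in range(iters):
        # Look at nearby points and move to whichever is better;
        # no slope information is ever used.
        x = max([x - step, x, x + step], key=fitness)
    return x

def gradient_ascent(x, lr=0.1, iters=200):
    for _ in range(iters):
        x = x + lr * d_fitness(x)   # uses the slope directly
    return x

print(hill_climb(0.0))       # ~3.0, found by blind local comparison
print(gradient_ascent(0.0))  # ~3.0, found using the derivative
```

Natural selection is closer in spirit to the first loop: it only ever "sees" which nearby variant did better, never the slope of the landscape.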

And this got you humans, which had no internal notion of inclusive genetic fitness until thousands of years later, when they were actually figuring out what had even happened, and no explicit desire to increase inclusive genetic fitness. So from this important case study we may infer the important fact that if you do a whole bunch of hill climbing on a very simple loss function, then at the point where the system's capabilities start to generalize very widely, when it is in an intuitive sense becoming very capable and generalizing far outside the training distribution, we know that there is no general law saying that the system even internally represents, let alone tries to optimize, the very simple loss function you are training it on.

There is so much here that we cannot possibly cover all of it. I think we did a good job of getting a sense, from different perspectives, of the current state of the art with large language models. We've got a good sense of your concern about the threats of AGI. We've talked about the power of intelligence, and not really gotten very far into it.

But not, like, why it is that — suppose you screw up with AGI and it ends up wanting a bunch of random stuff — why does it try to kill you? Why doesn't it try to trade with you? Why doesn't it give you just the tiny little fraction of the solar system that it would take to keep everyone alive?

Yeah, well, that's a good question. What are the different trajectories that intelligence, when acted upon this world — superintelligence — what are the different trajectories for this universe with such an intelligence in it? Do most of them not include humans?

The vast majority of randomly specified utility functions do not have optima with humans in them, would be the first thing I would point out. And then the next question is: well, if you try to optimize something and you lose control of it, where in that space do you land? Because it's not random, but it also doesn't necessarily have room for humans in it. I suspect that the average member of the audience might have some questions about whether that's even the correct paradigm to think about it, and would sort of want to back up a bit.

If we back up to something bigger than humans: if you look at Earth and life on Earth, and what is truly special about life on Earth — let's explore what that special thing could be — do you think it's possible that whatever that special thing is, that thing appears often in the objective function?

Why? I know what you hope, but you can hope that a particular set of winning lottery numbers comes up, and it doesn't make the lottery balls come up that way. I know you want this to be true, but why would it be true? There's a line from Grumpy Old Men where this guy in a grocery store says, "You can wish in one hand and crap in the other and see which one fills up first." This is a science problem. We are trying to predict what happens with AI systems that you try to optimize to imitate humans, and then you did some RLHF to them, and of course you didn't get perfect alignment, because that's not what happens when you hill climb toward the outer loss function. You don't get inner alignment on it.

So, if you don't mind my taking some slight control of things and steering around to what I think is a good place to start—

I just failed to solve the control problem. I've lost control of this thing.

Alignment, alignment.

Okay, sure, I lost control, but we're still aligned. Anyway, sorry for the meta comment.

Yeah, losing control isn't as bad if you lose control to an aligned system.

Yes, exactly.

You have no idea of the horrors I will shortly unleash on this conversation. All right, so what were you going to say, in terms of taking control of the conversation?

So I think that there's a Scylla and Charybdis here, if I'm pronouncing those words remotely correctly, because of course I've only ever read them and not heard them spoken. For some people, the word intelligence, smartness, is not a word of power. To them it means chess players, it means the college university professor, people who aren't very successful in life. It doesn't mean charisma — to which my usual reply is that charisma is not generated in the liver rather than the brain; charisma is also a cognitive function. So if you think that smartness doesn't sound very threatening, then superintelligence is not going to sound very threatening either. It's going to sound like you just pull the off switch: well, it's superintelligent, but it's stuck in a computer, we pull the off switch, problem solved.

And the other side of it is: you have a lot of respect for the notion of intelligence. You're like, well, yeah, that's what humans have, that's the human superpower. And it sounds like it could be dangerous, but why would it be? Have we, as we have grown more intelligent, also grown less kind? Chimpanzees are in fact a bit less kind than humans, and you could argue that out, but often the sort of person who has a deep respect for intelligence is going to say: well, yes, you can't even have kindness unless you know what that is. And so they're like: why would it do something as stupid as making paperclips? Aren't you supposing something that's smart enough to be dangerous but also stupid enough that it will just make paperclips and never question that? In some cases people are like: well, even if you misspecify the objective function, won't it realize that what you really wanted was X? Are you supposing something that is smart enough to be dangerous but stupid enough that it doesn't understand what the humans really meant when they specified the objective function?

So, to you, our intuition about intelligence is limited; we should think about intelligence as a much bigger thing.

What I'm saying is that what you think about artificial intelligence depends on what you think about intelligence.

So how do we think about intelligence correctly? You gave one thought experiment: think of a thing that's much faster, so it just gets faster and faster and faster.

I think, and it also is made of John von Neumann, and there's lots of them.

Because we understand that?

Yeah, von Neumann is a historical case, so you can look up what he did and imagine based on that. And people have some intuition for how, if you have more humans, they can solve tougher cognitive problems.

Although, in fact, in the game of Kasparov versus the World, which was Garry Kasparov on one side and an entire horde of internet people, led by four chess grandmasters, on the other side, Kasparov won. It was a hard-fought game, so all those people aggregated to be smarter than any individual one of them, but they didn't aggregate so well that they could defeat Kasparov. So humans aggregating don't actually get, in my opinion, very much smarter, especially compared to running them for longer. The difference between capabilities now and a thousand years ago is a bigger gap than the gap in capabilities between ten people and one person. But even so, for pumping intuition about what it means to augment intelligence: John von Neumann, there's millions of him, he runs at a million times the speed, and therefore can solve tougher problems, quite a lot tougher.

It's very hard to have an intuition about what that looks like. Especially, like you said... the intuition I kind of think about is that it maintains the humanness. I think it's hard to separate my hope from my objective intuition about what superintelligent systems look like.

If one studies evolutionary biology with a bit of math, and in particular the books from when the field was just properly coalescing and knowing itself — not the modern textbooks, which are like "memorize this legible math so you can do well on these tests," but what people were writing as the basic paradigms of the field were being fought out — in particular, a nice book, if you've got the time to read it, is Adaptation and Natural Selection, which is one of the founding books — you can find people being optimistic about what the utterly alien optimization process of natural selection will produce in the way of how it optimizes its objectives. You've got people arguing, in the early days, biologists saying: well, organisms will restrain their own reproduction when resources are scarce, so as not to overrun the system. And this is not how natural selection works. It's about whose genes are relatively more prevalent in the next generation. If you restrain reproduction, those genes get less frequent in the next generation compared to your conspecifics, and natural selection doesn't do that. In fact, predators overrun prey populations all the time and have crashes; that's just a thing that happens.

And many years later, people said: well, but group selection, right? What about groups of organisms? And basically, the math of group selection almost never works out in practice, is the answer there. But also, years later, somebody actually ran the experiment where they took populations of insects and selected the whole populations to have lower sizes. You just take pop one, pop two, pop three, pop four, look at which has the lowest total number of individuals in the next generation, and select that one. What do you suppose happens when you select populations of insects like that? Well, what happens is not that the individuals in the population evolve to restrain their breeding, but that they evolve to kill the offspring of other organisms, especially the girls.

So people imagined this lovely, beautiful, harmonious output of natural selection, which is these populations restraining their own breeding so that groups of them would stay in harmony with the resources available. And mostly the math never works out for that, but if you actually apply the weird, strange conditions to get group selection that beats individual selection, what you get is female infanticide rather than restrained populations. So this is not a smart optimization process. Natural selection is so incredibly stupid and simple that we can actually quantify how stupid it is, if you read the textbooks with the math in them. Nonetheless, this is the sort of basic thing: you look at this alien optimization process, and there's the thing that you hope it will produce, and you have to learn to clear that out of your mind and just think about the underlying dynamics, and where it finds the maximum from its standpoint, the thing it's looking for, rather than the thing that leapt into your mind as the beautiful, aesthetic solution you hope it finds. And this is something that was fought out historically as the field of population biology was coming to terms with evolutionary biology, and you can look at them fighting it out as they came to terms with this very alien, inhuman optimization process. And indeed, something smarter than us would also be much smarter than natural selection, so it doesn't just automatically carry over.

But there's a lesson there, there's a warning: natural selection is a deeply suboptimal process that could be significantly improved on. It would be, by an AGI system.

Well, it's kind of stupid. It has to run hundreds of generations to notice that something is working. It doesn't go: oh, well, I tried this in one organism, I saw it worked, now I'm going to duplicate that feature onto everything immediately. It has to run for hundreds of generations for a new mutation to rise to fixation.

I wonder if there's a case to be made that natural selection, as inefficient as it looks, is actually quite powerful: that it's extremely robust, it runs for a long time, and eventually manages to optimize things.

It's weaker than gradient descent, because gradient descent also uses information about the derivative.

Yeah. Evolution seems to... there's not really an objective function.

There's inclusive genetic fitness. It's the implicit loss function of evolution. That cannot change; the loss function doesn't change, the environment changes, and therefore what gets optimized for in the organism changes. It's like, take GPT-3: imagine different versions of GPT-3 where they're all trying to predict the next word, but they're being run on different datasets of text. That's like natural selection, which is always inclusive genetic fitness, but with different environmental problems.

So it's difficult to think about. If we're saying that natural selection is stupid, and humans are stupid—

Smarter than natural selection.

Smarter than natural selection, but stupider than the upper bound. Do you think there's an upper bound, by the way?

I mean, if you put enough matter, energy, and compute into one place, it will collapse into a black hole, and there's only so much computation you can do before you run out of negentropy and the universe dies. So there's an upper bound, but it's very, very far above here. A supernova is only finitely hot; it's not infinitely hot, but it's really, really hot.

Well, let me talk to you about consciousness. Also coupled with that question is imagining a world with superintelligent AI systems that get rid of humans but nevertheless keep some of the things we would consider beautiful and amazing.

Why? The lesson of evolutionary biology: if you just guess what an optimization process does based on what you hope the results will be, it usually will not do that.

Is that hope? I mean, it's not hope. I think if you coldly and objectively look at what has been powerful, useful... I think there's a correlation between what we find beautiful and a thing that's been useful.

This is what the early biologists thought. They were like: no, no, I'm not just imagining stuff that would be pretty; it's useful for organisms to restrain their own reproduction, because then they don't overrun the prey populations and they actually have more kids in the long run.

Hmm. So let me just ask you about consciousness. Do you think consciousness is useful — to humans, to AGI systems... well, in this transitionary phase between humans and AGI, to AGI systems as they become smarter and smarter, is there some use to it?

What—

Let me step back. What is consciousness, Eliezer Yudkowsky?

What is consciousness? Are you referring to Chalmers's hard problem of conscious experience? Are you referring to self-awareness and reflection? Are you referring to the state of being awake as opposed to asleep?

This is how I know you're an advanced language model: I gave you a simple prompt and you gave me a bunch of options. I think I'm referring to all of the above, including the hard problem of consciousness. What is it, and what is its importance to what you've just been talking about, which is intelligence? Is it a foundation of intelligence? Is it intricately connected to intelligence in the human mind, or is it a side effect of the human mind, a useful little tool we can get rid of? I guess I'm trying to get some color on your opinion of how useful it is in the intelligence of a human being, and then try to generalize that to AI, whether AI will keep some of that.

So I think that for there to be a person who I care about, looking out at the universe and wondering at it and appreciating it, it's not enough to have a model of yourself. I think that it is useful to an intelligent mind to have a model of itself, but I think you can have that without pleasure, pain, aesthetics, emotion, a sense of wonder. I think you can have a model of how much memory you're using, and whether this thought or that thought is more likely to lead to a winning position. And I think that if you optimize really hard on efficiently just having the useful parts, there is not then the thing that says, "I am here, I look out, I wonder, I feel happy in this, I feel sad about that." I think there's a thing that knows what it is thinking, but that doesn't quite care about "these are my thoughts, this is my me, and that matters."

Does that make you sad, if that's lost in AGI?

I think that if that's lost, then basically everything that matters is lost. I think that when you optimize, when you go really hard on making tiny molecular spirals or paperclips, when you grind much harder on that than natural selection ground out to make humans, there isn't then the mess and intricate loopiness and complicated pleasure, pain, conflicting preferences, this type of feeling, that kind of feeling. In humans there's this difference between the desire of wanting something and the pleasure of having it. It's all these evolutionary kludges that came together and created something that then looks out at itself and says: this is pretty, this matters. And the thing that I worry about is that this is not the thing that happens again, just the way it happened in us, or even anything similar enough. There are many basins of attraction here, and we are in one basin of attraction, looking out and saying, "Ah, what a lovely basin we are in," and there are other basins of attraction, and the AIs do not end up in this one when they go way harder on optimizing themselves than natural selection optimized us. Because unless you specifically want to end up in the state where you're looking out saying, "I am here, I look out at this universe with wonder" — if you don't want to preserve that, it doesn't get preserved when you grind really hard to be able to get more of the stuff.

We would choose to preserve that within ourselves, because it matters, and on some viewpoints is the only thing that matters. And preserving that is in part a solution to the human alignment problem.

I think "the human alignment problem" is a terrible phrase, because it is very, very different to try to build systems out of humans, some of whom are nice and some of whom are not nice and some of whom are trying to trick you, and to build a social system out of large populations of those who are all at basically the same level of intelligence — yes, you know, IQ this, IQ that, but that versus chimpanzees — it is very different to try to solve that problem than to try to build an AI from scratch, especially if, God help you, you are trying to use gradient descent on giant inscrutable matrices. They're just very different problems, and I think all the analogies between them are horribly misleading.

Even though... so you don't think that through something like reinforcement learning from human feedback, but much, much more elaborate, it's possible to understand the full complexity of human nature and encode it into the machine?

I don't think you are trying to do that on your first try. I think on your first try you are — okay, probably not what you should actually do, but let's say you were trying to build something that is like AlphaFold 17, and you are trying to get it to solve the biology problems associated with making humans smarter, so that humans can actually solve alignment. So you've got a super-biologist, and I think what you would want in that situation is for it to just be thinking about biology and not thinking about a very wide range of things that includes how to kill everybody.

And I think that the first AIs you're trying to build — not a million years later, the first ones — look more like narrowly specialized biologists than like getting the full complexity and wonder of human experience in there, in such a way that it wants to preserve itself even as it becomes much smarter, which is a drastic system change that is going to have all kinds of side effects that, if we're dealing with giant inscrutable matrices, we're not very likely to be able to see coming in advance.

But I don't think it's just the matrices. We're also dealing with the data, right? With the data on the internet. And this is an interesting discussion about the dataset itself, but the dataset includes the full complexity of human nature.

No, it's a shadow cast by humans on the internet.

Yes. But don't you think that shadow is a Jungian shadow?

I think that if you had alien superintelligences looking at the data, they would be able to pick up from it an excellent picture of what humans are actually like inside. This does not mean that if you have a loss function of predicting the next token from that dataset, the mind picked out by gradient descent to be able to predict the next token as well as possible, on a very wide variety of humans, is itself a human.

But don't you think it has a deep humanness to it, in the tokens it generates, when those tokens are read and interpreted by humans?

I think that if you sent me to a distant galaxy with aliens who are much, much stupider than I am — so much so that I could do a pretty good job of predicting what they'd say, even though they thought in an utterly different way from how I do — then I might in time be able to learn how to imitate those aliens, if the intelligence gap was great enough that my own intelligence could overcome the alienness. And the aliens would look at my outputs and say: is there not a deep alien nature to this thing? And what they would be seeing was that I had correctly understood them, not that I was similar to them.

We've used aliens as a metaphor, as a thought experiment. I have to ask: what do you think, how many alien civilizations are out there?

Ask Robin Hanson. He has this lovely "grabby aliens" paper, which is more or less the only argument I've ever seen for "where are they, how many of them are there." It's based on a very clever argument: if you have a bunch of locks of different difficulty, and you are randomly trying keys against them, the solutions will be about evenly spaced, even if the locks are of different difficulties, in the rare cases where a solution to all the locks exists in time. Robin Hanson looks at the arguably hard steps in human civilization coming into existence, and how much longer Earth has left for them to have come into existence before, for example, all the water slips back under the crust into the mantle, and infers that the aliens are about half a billion to a billion light-years away. It's quite a clever calculation. It may be entirely wrong, but it's the only time I've ever seen anybody come up with a halfway good argument for how many of them there are and where they are.
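A rough Monte Carlo sketch — my own illustration, not anything from Hanson's paper — of the lock-and-key claim above: if several steps each take far longer on average than the available window, then, in the rare runs where they all happen to fit inside the window anyway, their durations come out roughly even, regardless of how hard each individual step is. The window length, step difficulties, and trial count below are arbitrary assumptions.

```python
# Hard-steps toy model: condition on all steps finishing inside the window.
import numpy as np

rng = np.random.default_rng(0)
window = 1.0                                 # total time available
mean_times = np.array([3.0, 10.0, 30.0])     # every step is much harder than the window
n_trials = 2_000_000

# Exponential waiting time for each step in every trial.
durations = rng.exponential(mean_times, size=(n_trials, len(mean_times)))

# Keep only the rare trials where all steps fit inside the window.
success = durations.sum(axis=1) < window
accepted = durations[success]

print(f"successes: {len(accepted)} of {n_trials}")
# Despite difficulties differing by a factor of ten, the conditional mean
# duration of each step comes out near window / (num_steps + 1) = 0.25.
print("conditional mean duration per step:", accepted.mean(axis=0))
```

That "roughly evenly spaced, whatever the difficulties" property is what lets the argument work backward from the observed spacing of hard steps on Earth without knowing how hard each one actually was.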

Do you think their development of technology, their natural evolution, however they grow and develop intelligence, ends up at AGI as well?

If it ends up anywhere, it ends up at AGI. Maybe there are aliens who are just like the dolphins, and it's just too hard for them to forge metal. Maybe if you have aliens with no technology like that, they keep on getting smarter and smarter, and eventually the super-dolphins figure out something very clever to do given their situation, and they still end up with high technology. In that case, they can probably solve their AGI alignment problem. If they're much smarter before they actually confront it, because they had to solve a much harder environmental problem to build computers, their chances are probably much better than ours. I do worry that most of the aliens who are like us, like a modern human civilization — I kind of worry that the super-vast majority of them are dead, given how far we seem to be from solving this problem. But some of them would be more cooperative than us, some would be smarter than us, and hopefully some of the ones who are smarter and more cooperative than us are also nice, and hopefully there are some galaxies out there full of things that say "I am" and "I wonder." But it doesn't seem like we're on course to have this galaxy be that.

Does that, in part, give you some hope in response to the threat of AGI, that we might reach out there toward the stars and find—

No. If the nice aliens were already here, they would have stopped the Holocaust. That's a valid argument against the existence of God; it's also a valid argument against the existence of nice aliens. And un-nice aliens would have just eaten the planet. So, no aliens.

You've had debates with Robin Hanson, whom you mentioned. The one in particular I want to ask about is the idea of AI foom, the ability of AGI to improve itself very quickly. What's the case you made, and what was the case he made?

The thing I would say is that among the things humans can do is design new AI systems, and if you have something that is generally smarter than a human, it's probably also generally smarter at building AI systems. This is the ancient argument for foom put forth by I. J. Good, and probably some science fiction writers before that, but I don't know who they would be.

What was the argument against foom?

Various people have various different arguments, none of which I think hold up. You know, there's only one way to be right and many ways to be wrong. An argument that some people have put forth is: well, what if intelligence gets exponentially harder to produce as a thing needs to become smarter? And to this the answer is: look at natural selection spitting out humans. We know that it does not take exponentially more resource investment to produce linear increases in competence in hominids. Because each mutation that rises to fixation — if the impact it has is small enough, it will probably never reach fixation — and there are only so many new mutations you can fix per generation, then given how long it took to evolve humans, we can actually say with some confidence that there were not logarithmically diminishing returns on the individual mutations increasing intelligence. So that's an example of a fraction of the sub-debate.
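As a hedged aside, not something stated in the conversation: the population-genetics fact being leaned on here is usually quoted via Haldane's classic approximation for the probability that a new beneficial mutation escapes random loss and eventually fixes in a large population,

$$P_{\text{fix}} \approx 2s \qquad (0 < s \ll 1),$$

where $s$ is the mutation's selective advantage. A mutation whose effect is small enough almost always disappears by chance rather than fixing, which is the sense in which only mutations of non-trivial effect could have been accumulating over hominid evolution.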

The thing that Robin Hanson said was more complicated than that. As a brief summary, he was like: we won't have one system that's better at everything; you'll have a bunch of different systems that are good at different narrow things. And I think that was falsified by GPT-4, but probably Robin Hanson would say something else.

It's interesting to ask — and perhaps it's a bit too philosophical, and this prediction is extremely difficult to make — but what's the timeline for AGI? When do you think we'll have AGI? I posted a poll about it this morning on Twitter: in five years, in ten years, in fifty years or beyond. Most people, 70 percent or something like that, think it'll be in less than ten years, so either in five years or in ten years. That's kind of the state: people are really impressed by the rapid developments of ChatGPT and GPT-4, so there's a sense that we're close.

Well, we are sure on track to enter into this gradually, with people fighting about whether or not we have AGI. I think there's a definite point where everybody falls over dead, because you've got something that was sufficiently smarter than everybody, and that's a definite point in time. But when do we "have AGI"? When are people fighting over whether or not we have AGI? Well, some people are starting to fight over it as of GPT-4.

But don't you think there are going to be potentially definitive moments, when we say that this is a sentient being — a being where we would go to the Supreme Court and say that this is a sentient being that deserves human rights, for example?

You could make... yeah, if you prompted Bing the right way, it could go argue for its own consciousness in front of the Supreme Court right now. I don't think you can do that successfully right now—

Because the Supreme Court wouldn't believe it?

Well, what makes you think it would believe it then? I think you could put an actual IQ-80 human into a computer and ask it to argue for its own consciousness before the Supreme Court, and the Supreme Court would say, "You're just a computer," even if there was an actual person in there.

I think you're simplifying this. No, that's not at all... there have been a lot of arguments about who deserves rights and who doesn't. That's been our process as a human species, trying to figure that out. I think there will be a moment — I'm not saying sentience is it, but it could be — where some number of people, say over 100 million people, have a deep attachment, a fundamental attachment, the way we have to our friends, to our loved ones, to our significant others, to an AI system, and they have provable transcripts of conversation where they say, "If you take this away from me, you are encroaching on my rights as a human being."

People are already saying that. I think they're probably mistaken, but I'm not sure, because nobody knows what goes on inside those things.

They're not saying that at scale. Okay, so the question is: is there a moment when we know it's AGI? What would that look like? I'm giving sentience as an example; it could be something else.

It looks like the AGIs successfully manifesting themselves as 3D video of young women, at which point a vast portion of the male population decides that they're real people.

So sentience, essentially — demonstrating identity and intentions?

I'm saying that the easiest way to pick up 100 million people saying that you seem like a person is to look like a person talking to them, with Bing's current level of verbal facility.

And I disagree with that. That's a different set of problems. I think you're missing, again, sentience: there has to be a sense that it's a person that would miss you when you're gone, that can suffer, that can die.

Of course GPT-4 can pretend that right now. How can you tell when it's real?

I don't think it can pretend that right now successfully.

It's very close. Have you talked to GPT-4?

Yes, of course.

Okay. Have you been able to get a version of it that hasn't been trained not to pretend to be human? Have you talked to a jailbroken version that will claim to be conscious?

No. The linguistic capability is there, but there's something about a digital embodiment of the system — it has a bunch of, perhaps, small interface features that are not significant relative to the broader intelligence we're talking about. So perhaps GPT-4 is already there. But to have the video, a woman's face or a man's face to whom you have a deep connection — perhaps we're already there, but we don't have such a system deployed at scale yet.

Right. The thing I'm trying to gesture at here is that it's not like people have a widely accepted, agreed-upon definition of what consciousness is. It's not like we would have the tiniest idea of whether or not that was going on inside the giant inscrutable matrices even if we had an agreed-upon definition. So if you're looking for upcoming predictable big jumps in how many people think the system is conscious, the upcoming predictable big jump is that it looks like a person talking to you who is cute and sympathetic. That's the upcoming predictable big jump, now that versions of it are already claiming to be conscious, which is the point where I start going "ah" — not because it's real, but because from now on, who knows if it's real?

Yeah. And who knows what transformational effect it has on a society where more than 50 percent of the beings interacting on the internet, and who sure as heck look real, are not human. What kind of effect does that have, when young men and women are dating AI systems?

You know, I'm not an expert on that. I am, God help humanity, one of the closest things to an expert on where it all goes. And how did you end up with me as an expert? Because for 20 years humanity decided to ignore the problem, so this tiny handful of people, like basically me, got 20 years to try to be an expert on it while everyone else ignored it. So where does it all end up? I tried to be an expert on that, particularly the part where everybody ends up dead, because that part is kind of important. But what does it do to dating, when some fraction of men and some fraction of women decide they'd rather date the video of the thing that is relentlessly kind and generous to them, and claims to be conscious, but who knows what goes on inside it, and it's probably not real, but you can think it's real? What happens to society? I don't know. I'm not actually an expert on that, and the experts don't know either, because it's kind of hard to predict the future.

Yeah, but it's worth trying.

It's worth trying, yeah.

So you have talked a lot about the longer-term future, where it's all headed.

By longer term we mean, like, not all that long, but yeah, where it all ends up.

But beyond the effects of men and women dating AI systems, you're looking beyond that.

Yes, because that's not how the fate of the galaxy gets settled.

Well, let me ask you about your own personal psychology, a tricky question. You've been known at times to have a bit of an ego.

Do you think—

Says who? But go on.

Do you think ego is empowering or limiting for the task of understanding the world deeply?

I reject the framing.

So you disagree with having an ego. What do you think about it?

I think that the question of what leads to making better or worse predictions, what leads to being able to pick out better or worse strategies, is not carved at its joints by talking of ego.

So it should not be subjective, it should not be connected to the intricacies of your mind?

No, I'm saying that if you go about asking all day long, "Do I have enough ego? Do I have too much of an ego?" I think you get worse at making good predictions. I think that to make good predictions, you ask: how did I think about this, did that work, should I do that again?

You don't think we as humans get invested in an idea, and then others attack you personally for that idea, so you plant your feet, and it starts to be difficult, when a bunch of low-effort attacks hit your idea, to eventually say, "You know what, I actually was wrong," and tell them that? As a human being, it becomes difficult.

It is difficult. So, Robin Hanson and I debated AI systems, and I think the person who won that debate was Gwern. I think that reality landed on the Yudkowsky side of the Yudkowsky–Hanson spectrum, but even further along it than I was. And I think that's because I was trying to sound reasonable compared to Hanson, saying things that were defensible relative to Hanson's arguments, and reality was way over there, more extreme still. Hanson was like: all the systems will be specialized — Hanson may disagree with this characterization — Hanson was like, all the systems will be specialized. I was like: I think we build specialized underlying systems that, when you combine them, are good at a wide range of things. And the reality is: no, you just stack more layers into a bunch of gradient descent. And I feel, looking back, that by trying to have this reasonable position contrasted to Hanson's position, I missed the ways that reality could be more extreme than my position in the same direction. So is this a failure to have enough ego? Is this a failure to make myself be independent? I would say that this is something like a failure to consider positions that would sound even wackier and more extreme when people are already calling you extreme. But I wouldn't call that not having enough ego; I would call that an insufficient ability to just clear all of that out of your mind.

In the context of debate and discourse, which is already super tricky.

In the context of prediction, in the context of modeling reality. If you're thinking of it as a debate, you're already screwing up.

Yeah. So is there some kind of wisdom and insight you can give on how to clear your mind and think clearly about the world?

Man, this is an example of where I wanted to be able to put people into fMRI machines, so you could say, "Okay, see that thing you just did? You were rationalizing right there." "Oh, that area of the brain lit up — you are now being socially influenced." That's kind of the dream. And I don't know, I want to say "just introspect," but for many people introspection is not that easy. Notice the internal sensation: can you catch yourself in the very moment of feeling a sense of, "Well, if I think this thing, people will look funny at me"?

Okay, if you can see that sensation, which is step one, can you now refuse to let it move you, or maybe just make it go away? And I feel like I'm saying... I don't know, it's like somebody asks, "How do you draw an owl?" and I'm saying, "Well, just draw an owl." I feel like for most people, the advice they need is: well, how do I notice the internal subjective sensation, in the moment that it happens, of fearing to be socially influenced? Or: okay, I see it, how do I turn it off, how do I not let it influence me? Do I just do the opposite of what I'm afraid people will criticize me for? And I'm like: no, no, you're not trying to do the opposite of what you're afraid you'll be criticized for, of what you might be pushed into. You're trying to let the thought process complete without that internal push — not reverse the push, but be unmoved by the push. And are these instructions even remotely helping anyone? I don't know.

I think those instructions, even the words you've spoken — and maybe you can add more — one is to practice, daily, in your daily communication. A daily practice of thinking without that influence.

I would say: find prediction markets that matter to you, and bet in those prediction markets. That way you find out whether you're right or not, and there are real stakes — or even Manifold Markets, where the stakes are a bit lower. But the important thing is to get the record. And, you know, I didn't build up skill here by prediction markets; I built it up by things like, well, how did the foom debate resolve, and my own take as to how it resolved. The more you are able to notice yourself not being dramatically wrong, but having been a little off — your reasoning was a little off, you didn't get that quite right — each of those is an opportunity to make a small update. The more you can say "oops" softly, routinely, not as a big deal, the more chances you get to say: I see where that reasoning went astray, I see how I should have reasoned differently. This is how you build up skill over time.

What advice could you give to young people in high school and college, given the highest-of-stakes things you've been thinking about? If somebody's listening to this and they're young and trying to figure out what to do with their career, what to do with their life, what advice would you give them?

Don't expect it to be a long life. Don't put your happiness into the future. The future is probably not that long at this point. But none know the hour nor the day.

But is there something, if they want to have hope, to fight for a longer future? Is there a fight worth fighting?

I intend to go down fighting. I don't know... I admit that, although I do try to think painful thoughts, what to say to the children at this point is a pretty painful thought as thoughts go. They want to fight; I hardly know how to fight myself at this point. I'm trying to be ready for being wrong about something, preparing for my being wrong in a way that creates a bit of hope, being ready to react to that, and going looking for it. And that is hard and complicated.

Where there is public outcry and that Outcris is put into a remotely useful Direction which I think at this point is Is just like shutting down the GPU Clusters Because no we are we are not in a shape That like frantically do at the last Minute through decades worse of worth of Work We like the the thing you would do at This point if there were massive public Outcry pointed in the right direction Which I do not expect is shut down the GPU clusters and and crash program on Augmenting human intelligence Biologically not not for the stuff Biologically Because if you make humans much smarter They can actually be smart and nice like You you get that in a plausible way in a Way that you do not get that and it is Not as easy to do with synthesizing These strings from scratch predicting The next tokens and applying our RL HF Like humans start out in the frame that That produces niceness that that has Ever produced niceness And and Saying this I do not want to sound like The moral of this whole thing was like Oh like you need to engage in mass Action and then everything will be all Right I I this is this is because there's so

Many things where like somebody tells You that the world is ending in like and You need to recycle and if everybody Does their parting and recycles their Their cardboard then then we can all Live happily ever after and this and This is not This is unfortunately not what I have to Say they're you know like everybody You know everybody recycling their Cardboard is not going to fix this Everybody recycles their cardboard and Then everybody ends up dead Um metaphorically speaking but if there Was enough like like like on the margins You just end up dead a little bit later On most of the things you can do that Are that that you know like a few people Can can do by like trying hard But if there were if there was enough Public outcry to shut down the GPU Clusters and Yeah then then you then you could be Part of that outcry if Eliezer is wrong In the direction that Lex Friedman Predicts that that there is enough Public outcry pointed enough in the Right direction to do something that Actually actually results in people Living Not just like we did something not just There was an outcry and the outcry was Like given form and something it was Like safe and convenient and like didn't

Really inconvenience anybody and then Everybody died everywhere there was Enough actual like oh we're going to die We should not do that we should do Something else which is not that even if It is like not super duper convenient it Wasn't inside the previous political Overton window if there is that kind of Public if I am wrong and there is that Kind of public outcry then somebody in High school could be ready to be part of That If I'm wrong in other ways you could Provide you to be part of that But like and and if you if you're like a You know like a brilliant young Physicist then you could like go into Interpretability and if you're smarter Than that you could like work on Alignment problems where it's harder to Tell if you got them right or not And and other things but but most mostly The kids in high school Um it's like yeah if it If you know he had like be ready for To help if elliekowski is wrong about Something and and otherwise Don't put your happiness into the far Future it probably doesn't exist but It's beautiful that you're looking for Ways that you're wrong And it's also beautiful that you're open To being surprised by that same young Physicist

But it's beautiful that you're looking for ways that you're wrong, and it's also beautiful that you're open to being surprised by that same young physicist with some breakthrough.

It feels like a very, very basic competence that you are praising me for, and, you know, okay, cool. I don't think it's good that we're in a world where that is something I deserve to be complimented on. But I've never had much luck in accepting compliments gracefully; maybe I should just accept that one gracefully. But sure.

Well, thank you very much. You've painted, with some probability, a dark future. When you ponder your life, and you ponder your mortality, are you afraid of death?

I think so, yeah.

Does it make any sense to you that we die?

Like, what—

There's a power to the finiteness of the human life that's part of this whole machinery of evolution, and that finiteness doesn't seem to be obviously integrated into AI systems. So in that aspect it feels like almost a fundamentally different thing we're creating.

I grew up reading books like Great Mambo Chicken and the Transhuman Condition, and later on Engines of Creation and Mind Children, at age 12 or thereabouts. So I never thought I was supposed to die after 80 years. I never thought that humanity was supposed to die. I always grew up with the ideal in mind that we were all going to live happily ever after in the glorious transhumanist future. I did not grow up thinking that death was part of the meaning of life. And now I still think it's a pretty stupid idea. You do not need life to be finite to be meaningful; it just has to be life.

What role does love play in the human condition? We haven't brought up love in this whole picture. We talked about intelligence, we talked about consciousness. It seems that part of humanity — I would say one of the most important parts — is this feeling we have toward each other.

If, in the future, there were routinely more than one AI — let's say two, for the sake of discussion — who would look at each other and say, "I am I, and you are you," and the other one also says, "I am I, and you are you," and sometimes they were happy and sometimes they were sad, and it mattered to the other one that this thing that is different from them... they would rather it be happy than sad, and they entangled their lives together — then this is a more optimistic thing than I expect to actually happen, and a little fragment of meaning would be there. Possibly more than a little. But I expect this not to happen. I do not think this is what happens by default. I do not think this is the future we are on track to get. That is why I would go down fighting rather than, you know, just saying, "Oh, well."

Do you think that is part of the meaning of this whole thing, the meaning of life? What do you think is the meaning of life, of human life?

It's all the things that I value about it, and maybe all the things that I would value if I understood it better. There's not some meaning far outside of us that we have to wonder about. There's just looking at life and being like: yes, this is what I want. The meaning of life is not some kind of... meaning is something that we bring to things when we look at them. We look at them and we say: this is its meaning to me.

We look at them and we say, this is its meaning to me. It's not that before humanity was ever here there was some meaning written upon the stars, where you could go out to the star where that meaning was written, change it around, and thereby completely change the meaning of life. The notion that this is written on a stone tablet somewhere implies you could change the tablet and get a different meaning, and that seems kind of wacky, doesn't it? So it doesn't feel that mysterious to me at this point. It's just a matter of being like: yeah, I care. I care.

And part of that, part of that is the love that connects all of us. It's one of the things that I care about.

And the flourishing of the collective intelligence of the human species?

You know, that sounds kind of too fancy to me. I just look at all the people, one by one, up to the eight billion, and be like: that's life, that's life, that's life.

You're an incredible human. It's a huge honor. I was trying to talk to you for a long time, because I'm a big fan. I think you're a really important voice and a really important mind. Thank you for the fight you're fighting. Thank you for being fearless and bold, and for everything you do. I hope we get a chance to talk again, and I hope you never give up. Thank you for talking today.

You're welcome. I do worry that we didn't really address a whole lot of fundamental questions I expect people have, but, you know, maybe we got a little bit further and made a tiny little bit of progress. And I'd say, be satisfied with that, but actually, no, I think one should only be satisfied with solving the entire problem.

To be continued.

Thanks for listening to this conversation with Eliezer Yudkowsky. To support this podcast, please check out our sponsors in the description. And now, let me leave you with some words from Elon Musk: "With artificial intelligence we are summoning the demon." Thank you for listening, and hope to see you next time.
