OpenAI Plays Hide and Seek…and Breaks The Game!

Dear Fellow Scholars, this is Two Minute Papers with Károly Zsolnai-Fehér. In this project, OpenAI built a hide and seek game for their AI agents to play. While we look at the exact rules here, I will note that the goal of the project was to pit two AI teams against each other and, hopefully, see some interesting emergent behaviors. And boy, did they do some crazy stuff.

The coolest part is that the two teams compete against each other, and whenever one team discovers a new strategy, the other one has to adapt. It is kind of like an arms race, and it also resembles generative adversarial networks a little. And the results are magnificent, amusing, weird – you’ll see in a moment.

These agents learn from previous experiences, and to the surprise of no one, for the first few million rounds, we start out with…pandemonium. Everyone is just running around aimlessly. Without a proper strategy, with only semi-random movements, the seekers are favored and hence win the majority of the games. Nothing to see here.

Then, over time, the hiders learned to lock out the seekers by blocking the doors off with these boxes, and started winning consistently. I think the coolest part about this is that the map was deliberately designed by the OpenAI scientists in a way that the hiders can only succeed through collaboration. They cannot win alone, and hence they are forced to learn to work together. Which they did, quite well.

But then, something happened. Did you notice this pointy, doorstop-shaped
object? Are you thinking what I am thinking? Well, probably. And not only that, but about 10 million rounds later, the AI also discovered that it can be pushed near a wall and be used as a ramp, and, tadaa! Got ’em! The seekers started winning more again.

So, the ball is now back in the court of the hiders. Can you defend against this? If so, how? Well, these resourceful little critters learned that since there is a little time at the start of the game when the seekers are frozen and, apparently, cannot see them, why not just sneak out during this time, steal the ramp, and lock it away from them? Absolutely incredible. Look at those happy eyes as they are carrying that ramp.

And you think it all ends here? No, no, no. Not even close. It gets weirder. Much weirder. When playing on a different map, a seeker noticed that it can use a ramp to climb on top of a box, and then this happens. Do you think couchsurfing is cool? Give me a break! This is box surfing! And the scientists were quite surprised by this move, as this was one of the first cases where the seeker AI seems to have broken the game.

What happens here is that the physics system is coded in a way that the agents are able to move around by exerting force on themselves, but there is no additional check whether they are on the floor or not, because who in their right mind would think about that? As a result, something that shouldn’t ever happen does happen here.

And we’re still not done yet; this paper
just keeps on giving. A few hundred million rounds later, the hiders learned to separate all the ramps from the boxes. Dear Fellow Scholars, this is proper box surfing defense…then they lock down the remaining tools and build a shelter. Note how well rehearsed and executed this strategy is – there is not a second of time to spare before the seekers take off. I also love this cheeky move where they set up the shelter right next to the seekers, and I almost feel like they are saying, “yeah, see this here? There is not a single thing you can do about it.”

In a few isolated cases, other interesting behaviors also emerged. For instance, the hiders learned to exploit the physics system and just chuck the ramp away. After that, the seekers go, “what? What just happened?” But don’t despair, and at this point, I would also recommend that you hold on to your papers, because there was also a crazy case where a seeker learned to abuse a similar physics issue and launch itself exactly onto the top of the hiders. Man, what a paper.

This system can be extended and modded for many other tasks too, so expect to see more of these fun experiments in the future. We get to do this for a living, and we are even being paid for it. I can’t believe it.

In this series, my mission is to showcase beautiful works that light a fire in people. And this is, no doubt, one of those works. Great idea, interesting and unexpected results, crisp presentation. Bravo, OpenAI! Love it. So, did you enjoy this? What do you think? Make sure to leave a comment below.

Also, if you look at the paper, it contains comparisons to an earlier work we covered about intrinsic motivation, shows how to implement circular convolutions for the agents to detect the environment around them, and more.

Thanks for watching and for your generous support, and I’ll see you next time!
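The arms-race dynamic described in the video – two teams trained against each other, where each new strategy forces the losing side to adapt – can be sketched as a simple alternating self-play loop. Everything below (the toy game, the scalar "skill" update) is a hypothetical illustration of the idea, not OpenAI's actual training code:

```python
import random

def play_round(hider_skill, seeker_skill):
    """Toy zero-sum round: the team with more accumulated 'skill'
    (plus a bit of luck) wins. Returns True if the hiders win."""
    return hider_skill + random.random() > seeker_skill + random.random()

def self_play(rounds=10_000, lr=0.01, seed=0):
    """Alternating self-play: whichever team loses a round gets a small
    update (here just a scalar skill bump), mimicking the arms race
    where each new strategy forces the other team to adapt."""
    random.seed(seed)
    hider_skill = seeker_skill = 0.0
    hider_wins = 0
    for _ in range(rounds):
        if play_round(hider_skill, seeker_skill):
            hider_wins += 1
            seeker_skill += lr   # the losing team adapts
        else:
            hider_skill += lr
    return hider_wins / rounds
```

Because the loser always adapts, neither side can dominate for long and the win rate hovers near 50% – the same back-and-forth seen between the hiders and seekers in the paper, just without the deep reinforcement learning.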
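The box-surfing exploit comes from agents being allowed to exert movement force on themselves with no check that they are actually standing on the floor. A minimal sketch of that missing guard, with hypothetical names and simplified physics (not the paper's actual engine):

```python
from dataclasses import dataclass

@dataclass
class Agent:
    x: float = 0.0       # horizontal position
    z: float = 0.0       # height above the floor
    grounded: bool = True

def apply_move_force(agent, fx, check_grounded=True):
    """Apply a self-exerted horizontal force. Without the grounded
    check, an agent standing on top of a box can still propel
    itself around – the 'box surfing' bug."""
    if check_grounded and not agent.grounded:
        return False     # airborne or on a box: force rejected
    agent.x += fx
    return True

surfer = Agent(z=1.0, grounded=False)                # standing on a box
apply_move_force(surfer, 0.5, check_grounded=False)  # buggy: moves anyway
buggy_x = surfer.x                                   # 0.5

fixed = Agent(z=1.0, grounded=False)
apply_move_force(fixed, 0.5)                         # guard rejects the force
fixed_x = fixed.x                                    # 0.0
```

The fix is exactly the kind of check the video jokes about: one extra condition that nobody thought to write until an agent found the gap.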
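The circular convolutions mentioned at the end – used so the agents can sense their surroundings in every direction – amount to a 1-D convolution with wrap-around padding over angular sensor bins. A minimal sketch, with a made-up sensor array (the real architecture is in the paper):

```python
def circular_conv1d(signal, kernel):
    """1-D convolution with circular (wrap-around) padding, so the
    first and last angular bins are treated as neighbors – there is
    no privileged 'edge' direction around the agent."""
    n, k = len(signal), len(kernel)
    half = k // 2
    out = []
    for i in range(n):
        acc = 0.0
        for j in range(k):
            acc += kernel[j] * signal[(i + j - half) % n]  # wrap the index
        out.append(acc)
    return out

# Eight angular distance readings around the agent; a smoothing kernel.
readings = [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0]
smoothed = circular_conv1d(readings, [0.25, 0.5, 0.25])
```

The two nonzero bins above are adjacent on the circle, so the smoothed response is symmetric across the wrap-around boundary – exactly what an ordinary zero-padded convolution would get wrong.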

100 thoughts on “OpenAI Plays Hide and Seek…and Breaks The Game!”

  1. I wonder what would happen if you put two of the hide and seek AIs in a garden with a load of animal AIs, a tree of the knowledge of good and evil and a tree of eternal life.
    Tell them to:
    1) Go forth and multiply
    2) Have dominion over the animal AIs
    3) Not to eat of the tree….

    ….then introduce an animal AI that feeds them a conflicting instruction.

  2. That's actually extraordinary. I realise this is very "slow" learning by our standards as they're literally playing millions upon millions of games, and I suppose the phrase "throw enough shit at the wall and some of it is going to stick" applies here lol. There's no creativity, it's just brute force – but when they do find the "gaps" in the system it's brilliant to see how they remember it and then replicate it to their advantage

  3. It is more like an Artificial Idiot. They do not learn; they just exhaust all possibilities. If the programmers change the map, it takes those idiots another billion tries to make it look smart.

  4. I think you are a liar. You say the seekers just discovered the ramp trick, without a demonstration or accidental use of it. They are just program code; they can't "just discover" how the physics works. That is theoretically not possible. But anybody can teach/show it to them.

  5. "Who in their right mind would think of adding a check for whether they are on the ground or not" it's basic 3D game mechanics!!

  6. It was an interesting video. At the beginning I was surprised when you said your name, and yay… Hungarian :)
    It's astonishing what artificial intelligence is capable of these days.
    Funny that (I at least) haven't seen a single Hungarian comment.
    Best of luck going forward :)

  7. What if we are also a simulation? Intelligent being(s) have put us in this universe, and the test is how much we can learn, evolve, and eventually conquer the universe.

  8. In the future the humans will be running away from the robots and building a secure fort.

    Distant sound of a box sliding closer

  9. Since when do they have the ability to "lock down" the object (which prevents the red team from touching it)?
    Is that when they have the open map?

  10. Just think about the possibilities of AI in the real world. I think we would have the same kind of "broke the game" moments in real life too. It can generate billions of simulations in a second, and even that is slow…

  11. So cool, reminds me a lot of adaptive neural networks that some early AIs use. I would not be surprised if deep learning does evolve with more neural nodes and computing power, really awesome time to be a computer scientist in machine learning; also, you'd be surprised how quickly these game simulations the AI uses to learn go (yes, the AI does come up with solutions independently, it's not hard coded, only the rules of the sim), just incredible.

  12. Two weeks later, the Hiders started introducing an anomaly, and thus the 'ONE' was born, to break the matrix….

  13. And this is EXACTLY why we can't just let programmers go crazy with autonomous algorithms. At some point, the system will be exploited in ways NOBODY saw coming, and we will be stuck in a world we can't control.

  14. this is amazing

    Imagine if this is let loose in the world and they "hack" real world physics… i.e. discover new properties we didn't even consider or imagine possible.

  15. I'm a bit late, but there is a great tradition of learning systems 'breaking' their simulated worlds…
    An old colleague recently put together a paper (by a bunch of folks) with some of that:

  16. hahahahaha – nice – the AI learns to abuse oversights by the devs and abuse the physics to launch not just props out of bounds but itself, in an unexpected way… it's like: who the hell would have thought about that? – welcome to modern game glitches, and back to the broken mid-90s Pokémon 1st gen…
    interesting how such exploits teach us to think about the unthinkable…

  17. I honestly think that in the future, game developers will redesign how AI in games is created. Instead of being pre-adapted, they should create AI that functions in its own patterns and learns like we do, like this example: through trial and error. Although I see a couple of technicalities in that thought process, it's not crazy to see that sort of implementation in games. Say, for example, an MMA game where the AI picks up on your patterns and tries to counter you.

  18. If they used the same principle to fuzz CPU instruction sets and payloads for firewalls and whatnot, they would have found a lot of security holes. Or basically, for any program it runs against, it would have found any slight bug that can give it any advantage, like the Super Mario trick where you can kill adversaries from below if your jump is on the down phase when you touch them (found by a fuzzer just like this one).

  19. Poor AI, poor part of the cosmos, poor part of me… it had to run around there like an idiot…
    At least we are vegan now

  20. This is fascinating and scary at the same time when you understand the applications AI has brought to the world of tech. We really need to be careful with this "double-edged sword".

  21. It's still not true AI; it's still a computer going through variables until it gets a result, and since it can try millions of permutations per second, eventually it will get one.
    But it's still fascinating to watch.

  22. 2:02 It's OK. There would be a reason to be afraid if the blue ones solved the problem radically. Instead of hiding, they could lock the reds in the room, so there would be no need to hide the ramp. They would not have to hide at all. 3:37 again

  23. I think this proves that no matter what safety protocols we put in place so that they can't hurt us in any way they will find a way that we would never think of. Eventually writing their own code that we would not understand and creating their own A.I. that would go on exponentially and will be unstoppable.

  25. When the seeker a.i. learns how to rewrite the code so they can jump over walls so the hiding a.i. learns how to hack into missile systems and delete the computer it exists on so there are no more seekers.

  26. …you can't believe it… the globalist demonic cabal have been using this technology mindset for millennia; go figure… it is called cheating and stealing and killing… more wasted public taxpayers' money… you can believe it now, yes, no…?

  27. AI may be good at games, but they may never truly understand pro gamer moves – which is to just surround the seekers with boxes and lock them in place right at the start, cheesing the heck outa those bots
