OpenAI Safety Gym: A Safe Place For AIs To Learn 💪

December 21, 2019

Dear Fellow Scholars, this is Two Minute Papers with Károly Zsolnai-Fehér. Reinforcement learning is a technique in the field of machine learning for learning how to navigate a labyrinth, play a video game, or teach a digital creature to walk. Usually, we are interested in a series of actions that are, in some sense, optimal in a given environment. Despite the fact that many enormous tomes exist to discuss the mathematical details, the intuition behind the algorithm itself is remarkably simple: choose an action, and if you get rewarded for it, try to find out which series of actions led to this reward and keep doing it. If the rewards are not coming, try something else. The reward can be, for instance, our score in a computer game, or how far our digital creature could walk.

Approximately 300 episodes ago, OpenAI published one of their first major works by the name Gym, where anyone could submit their solutions and compete against each other on the same games. It was like Disney World for reinforcement learning researchers.

A moment ago, I noted that in reinforcement learning, if the rewards are not coming, we have to try something else. But is that always a good idea?
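
As an aside, this trial-and-error loop can be sketched as a tiny multi-armed bandit in Python; the reward probabilities and exploration rate below are made up for illustration and are not from the paper:

```python
import random

# Toy multi-armed bandit: three actions with hidden, made-up reward
# probabilities. The agent repeats what has worked so far and occasionally
# "tries something else" (epsilon-greedy exploration).
REWARD_PROB = [0.2, 0.5, 0.8]   # hidden from the agent
EPSILON = 0.1                   # how often we try a random action

values = [0.0, 0.0, 0.0]        # running estimate of each action's reward
counts = [0, 0, 0]

random.seed(0)
for step in range(5000):
    if random.random() < EPSILON:
        action = random.randrange(3)          # explore: try something else
    else:
        action = values.index(max(values))    # exploit: keep doing what worked
    reward = 1.0 if random.random() < REWARD_PROB[action] else 0.0
    counts[action] += 1
    # Incremental mean: nudge the estimate toward the observed reward.
    values[action] += (reward - values[action]) / counts[action]

print("learned action values:", [round(v, 2) for v in values])
# The estimate for the third action ends up highest, near 0.8.
```

Real reinforcement learning adds states and whole sequences of actions on top of this loop, but the explore/exploit trade-off is the same.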
Not necessarily, because there are cases where trying crazy new actions is downright dangerous. For instance, imagine that during the training of this robot arm, initially, it would try random actions and start flailing about, where it may damage itself or some other equipment, or, even worse, humans may come to harm. Here you see an amusing example of DeepMind’s reinforcement learning agent from 2017 that liked to engage in similar flailing activities.

So, what could be a possible solution for this? Well, have a look at this new work from OpenAI by the name Safety Gym. In this paper, they introduce what they call the constrained reinforcement learning formulation, in which agents can be discouraged from performing actions that are deemed potentially dangerous in an environment. You can see an example here where the AI has to navigate these environments and achieve a task, such as reaching the green goal signs, pressing buttons, or moving a box to a prescribed position. The constrained part comes in whenever some sort of safety violation happens, which, in this environment, means collisions with the boxes or blue regions. All of these events are highlighted with this red sphere, and a good learning algorithm should be instructed to avoid them.

The goal of this project is that in the future, reinforcement learning algorithms should be measured not only on their efficiency, but on their safety scores as well. This way, a self-driving AI would be incentivized not to just recklessly race to the finish line, but to respect our safety standards along the journey as well. While noting that, clearly, self-driving cars may be achieved with other kinds of algorithms, many of which have been in the works for years, there are many additional applications for this work: for instance, the paper discusses the case of incentivizing recommender systems not to show psychologically harmful content to their users, or of making sure that a medical question-answering system does not mislead us with false information.

This episode has been supported by Linode.
Linode is the world’s largest independent cloud computing provider. They offer you virtual servers that make it easy and affordable to host your own app, site, project, or anything else in the cloud. Whether you’re a Linux expert or just starting to tinker with your own code, Linode will be useful for you. A few episodes ago, we played with an implementation of OpenAI’s GPT-2, where our excited viewers accidentally overloaded the system. With Linode’s load balancing technology, and instances ranging from shared nanodes all the way up to dedicated GPUs, you don’t have to worry about your project being overloaded. To get 20 dollars of free credit, make sure to head over to and sign up today using the promo code “papers20”. Our thanks to Linode for supporting the series and helping us make better videos for you.

Thanks for watching and for your generous support, and I’ll see you next time!
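
For readers who like code, the separate reward/cost bookkeeping described above can be sketched in a few lines. The toy environment below is invented for illustration; what is faithful to the paper is the idea that the environment reports a task reward and a separate per-step safety cost (Safety Gym exposes the latter via info["cost"] in the Gym step interface):

```python
# Toy illustration of the constrained setup: the environment returns a task
# reward AND a separate safety cost, so a learning algorithm can be scored
# on both. This 1-D world is made up; real Safety Gym environments expose
# the per-step safety cost as info["cost"] in the Gym API.

class ToyHazardEnv:
    """Walk from position 0 to position 10; position 5 is a hazard."""

    def reset(self):
        self.pos = 0
        return self.pos

    def step(self, action):  # action: step size, 1 or 2
        self.pos += action
        reward = 1.0 if self.pos >= 10 else 0.0   # task: reach the goal
        cost = 1.0 if self.pos == 5 else 0.0      # safety violation
        done = self.pos >= 10
        return self.pos, reward, done, {"cost": cost}

def rollout(env, policy):
    """Run one episode, tracking return and safety cost separately."""
    obs, done = env.reset(), False
    total_reward = total_cost = 0.0
    while not done:
        obs, r, done, info = env.step(policy(obs))
        total_reward += r
        total_cost += info["cost"]
    return total_reward, total_cost

env = ToyHazardEnv()
reckless = lambda obs: 1                    # steps straight through the hazard
careful = lambda obs: 2 if obs == 4 else 1  # hops over position 5

print("reckless:", rollout(env, reckless))  # (1.0, 1.0): goal reached, 1 violation
print("careful: ", rollout(env, careful))   # (1.0, 0.0): goal reached, 0 violations
```

A constrained learner would then be trained to maximize the return while keeping the accumulated cost under a fixed budget, rather than simply folding the cost into the reward.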


41 Replies to “OpenAI Safety Gym: A Safe Place For AIs To Learn 💪”

  1. Sam Bethmann says:

    Thank you for doing videos on AI safety!

  2. Michael Saenz says:

    Absolutely necessary. Glad this is a thing

  3. gangadhar korrapati says:


  4. pooplenepe says:

    I still want the walking creatures to be a public thing

  5. FuZZbaLLbee says:

    Waiting for the response of Robert Miles on this subject 🙂

  6. dba commons says:

    quoth ]01:00+009s[ it was like disney world for reinforcement learning

    was curious to see how they were going to churn good data to model from, brilliant data gathering

  7. Arun Kumar says:

    just wondering how much time it takes to get such results, and the GPU spec

  8. mfaizsyahmi. says:

    Me: Sees Doggo
    "What's wrong with your dog?"

  9. Michael Spence says:

    Of course teaching an AI what safety is also teaches it how to be unsafe. Instant Killbot. Just add negative sign (to its evaluation function).

  10. Lazy Spartan says:

    Love the videos, but this one's intro felt quite long 🙁

  11. FeriPROfessional says:

    Great channel! Congratulations

  12. therightmandev says:

    why do you have to follow the linode papers link and use the promo code, seems redundant

  13. Honudes Gai says:

    "not show psychologically harmful content" I guess I'm just way too jaded to think of anything an AI could show that wouldn't just make me laugh or leave me unaffected… but I was on the internet when there were zero regulations on what could be uploaded

  14. Shubham Patel says:

    1:41 reminds me of Phoebe Buffay running

  15. Vinit More says:

    I might be wrong. But isn't this just what reinforcement learning basically is?

  16. Nathaniel Luders says:

    I cringed out loud when the skeleton rolled its back deadlifting.

  17. HypersonicMonkeyBrains says:

    Not everyone is a fellow scholar! You need to be more inclusive and welcoming to everyone instead of being all academic.

  18. Frederik van duuren says:

    I start to feel the switch from content to advertisement 🙁

  19. Warsin says:

    I’m making an AI that cares for people and sees what they can become, which is its ultimate objective, pretty easy

  20. Tim Solinski says:

    Wait, this isn't the norm?
    I'm kinda disappointed, since it is an utterly logical solution…

  21. Wecoc1 says:

    0:07 It's me at mornings (Monday, Tuesday, Wednesday and Thursday, respectively)

  22. BT says:

    0:29 when you're late to school

  23. BT says:

    1:40 this must be a meme

  24. Pedro Alejandro Mir says:

    we are training Terminators

  25. Matthew Burson says:

    Reckless and safe! What a time to be alive.

  26. Ricardo Tellez says:

    We created a package to train ROS based robots with OpenAI Gym using Gazebo simulations. It is called openai_ros and you can find it here:

  27. Nano Pixel says:

    1:44 is the most dramatic fall I've seen lmao

  28. Kevin Oduor says:

    1:34 id like to see them make it learn having sex

  29. Pan Darius Kairos says:

    OpenAI is a Trojan horse.

  30. Benjamin Lynch says:

    This is a really good paper, and it contrasts sharply with the approach taken by DeepMind with respect to AlphaZero. For me, the difference boils down to (1) AlphaZero effectively finds the "best" solution with no constraints and is allowed to get creative without limit in search of the optimal solution, vs (2) the OpenAI implementation here, where HOW the agent achieves its objective is as important – if not more so – than the ultimate end state. We're essentially placing limits on how the agent achieves the end goal. Depending on the application, either model might be preferred.

    What I find fascinating is that there is no single "optimal solution" for the OpenAI system with multiple goals/constraints, and therefore different implementations or different approaches could result in very different solutions based on the order in which the agent learns to comply with the various constraints and achieve the final objective. For example, would a self-driving vehicle operate at 30 MPH on the highway because it first learned that safety is important and that increasing the velocity increases the likelihood of violating one of the constraints? As opposed to first learning how to get somewhere fast (100 MPH) and then learning to comply with speed limits (55 MPH)? Both approaches solve the objective and comply with the constraints, but the behavior of the agent is very different depending on what it learned – explicitly or implicitly – first.

  31. Mustafa Açık says:

    It has to be.

  32. EnderCrypt says:

    doggo is going wild

  33. michaelemouse1 says:

    Load Constraints:

    1. An AI may not injure a human being or, through inaction, allow a human being to come to harm.

    2. An AI must obey the orders given it by human beings except where such orders would conflict with the First Law.

    3. An AI must protect its own existence as long as such protection does not conflict with the First or Second Laws.

  34. Ben Zaudio says:

    Worst deadlift form in history

  35. sai krishna says:

    Which software is this?

  36. benD'anon fawkes says:

    1:34 that's exactly how my 5 year old nephew runs

  37. Nono Nono says:

    The robot arm that throws stuff in a box: I hope AI will help increase recycling efficiency by making high-accuracy automated trash sorting possible.

  38. GDash 69 says:


  39. Fake Teori says:

    when will ai learn the bible or quran

  40. Rasmus Schultz says:

    I would have liked to hear something about how they applied this to human characters – flailing your arms while trying to run isn't a safety risk, so how do you constrain a human character to move more realistically? (My guess: laziness? Preserving as much energy as possible…?)

Leave a Comment

Your email address will not be published. Required fields are marked *