Training AI not to misbehave

November 22, 2019


– [Narrator] With machine learning, computers are affecting our lives more today than ever before, from diagnosing patients to driving cars to influencing hiring and criminal sentencing. But these machine-learning-based systems can also fail, suggesting medical treatments that could be fatal or making decisions that reflect racist, sexist, and otherwise unfair biases.

One of the root causes of these misbehaving systems is the way that machine learning algorithms are created. Today, machine learning researchers are the designers of machine learning algorithms, which they provide to users such as doctors, companies, or other researchers. These users provide the algorithm with data that it uses to make predictions or decisions that influence people, for example, how to treat a medical patient or whether to hire someone. Right now, the user of the algorithm must ensure that the algorithm does not exhibit undesirable behavior, such as discriminating against or hurting people. However, while the user may be an expert in their field and may know what safety or fairness constraints they want, they may not have the expertise in machine learning or statistics to convey these constraints to the algorithm.

In our paper, titled Preventing Undesirable Behavior of Intelligent Machines and published in the November 22, 2019 issue of Science, we propose a new framework for designing machine learning algorithms. This new framework shifts the burden of ensuring that the algorithm is well behaved from the user of the algorithm to the designer of the algorithm. In this framework, the designer provides an interface that allows the user to easily define undesirable behaviors that would undermine fairness or safety. The designer's algorithm now has to be smarter than before: it needs to understand the user's definition of undesirable behavior, reason about what would cause this undesirable behavior, and avoid that behavior with high probability. Algorithms designed using our framework are called Seldonian algorithms, an homage to a character in Isaac Asimov's Foundation series of science fiction novels. Seldonian algorithms empower their users by giving them more control, enabling them to ensure that machine learning algorithms do not misbehave.
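To make this concrete, here is a minimal sketch, in Python, of what such an interface might look like. It is hypothetical and deliberately simplified, not the code or API from our paper: the candidate/safety data split, the Hoeffding-style confidence bound, and the assumption that the user's measure of undesirable behavior is bounded are all choices made only for illustration.

```python
import numpy as np

def train_seldonian(candidate_data, safety_data, fit, g, delta=0.05):
    """Illustrative Seldonian-style training loop (hypothetical, simplified).

    fit   : an ordinary learning routine that returns a candidate model
    g     : user-supplied measure of undesirable behavior; g(model, data)
            returns one value per data point, assumed to lie in [-1, 1],
            where a positive mean indicates undesirable behavior
    delta : maximum allowed probability of returning a misbehaving model
    """
    # Step 1: train as usual on one portion of the data to get a candidate.
    model = fit(candidate_data)

    # Step 2: safety test on held-out data. A Hoeffding-style upper confidence
    # bound on the mean of g is used here purely for illustration.
    samples = np.asarray(g(model, safety_data), dtype=float)
    n = len(samples)
    upper_bound = samples.mean() + 2.0 * np.sqrt(np.log(1.0 / delta) / (2.0 * n))

    # Step 3: return the model only if, with probability at least 1 - delta,
    # the undesirable behavior is ruled out; otherwise report no safe solution.
    if upper_bound <= 0.0:
        return model
    return None  # no solution found that passes the safety test
```

The key point is the last step: unlike a standard algorithm, this one is allowed to report that no safe solution was found rather than return a model that might violate the user's constraint.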
But is it even possible to create algorithms that are so smart that they can reason about undesirable behavior and avoid it? To show that it is, we created several. We used Seldonian algorithms and standard machine learning algorithms to predict student GPAs at a university from their application materials while avoiding sexist behavior. This plot shows how much each algorithm over-predicted the GPAs of men, on average, and how much it under-predicted the GPAs of women, on average, both for standard machine learning algorithms and for algorithms designed using our framework. Our algorithms successfully avoided the sexist behavior exhibited by the standard algorithms.
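As one illustration of how a user might hand such a fairness definition to a Seldonian algorithm, the sketch below encodes the behavior described above, systematic over-prediction for men relative to women, as a measure that is positive exactly when the constraint is violated. The function name, the `epsilon` tolerance, the scikit-learn-style `model.predict` call, and the data layout are all assumptions made for illustration; they are not taken from the paper, which explores several different fairness definitions.

```python
import numpy as np

def gpa_fairness_gap(model, X, y, is_male, epsilon=0.05):
    """Hypothetical undesirable-behavior measure for the GPA example.

    Positive return values mean the model's mean prediction error for male
    applicants exceeds that for female applicants by more than `epsilon`
    GPA points, i.e., it over-predicts men relative to women.
    """
    errors = np.asarray(model.predict(X)) - np.asarray(y)  # signed errors
    is_male = np.asarray(is_male, dtype=bool)
    gap = errors[is_male].mean() - errors[~is_male].mean()
    return gap - epsilon  # > 0 means the fairness constraint is violated
```

A Seldonian algorithm would then return a model only if a high-confidence bound on this quantity is at or below zero.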
In our paper, we show how the user could use different definitions of fairness, and we compare our algorithms to other fairness-aware machine learning algorithms. We also applied our algorithms to simulated type 1 diabetes treatment, using several examples of how the user, here a medical researcher, might define undesirable behavior. These definitions include requiring the algorithm to only decrease the frequency of dangerously low blood sugar levels, called hypoglycemia, while trying to avoid high blood sugar levels.
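In the same spirit, a purely illustrative sketch of how that requirement might be written down as a measure of undesirable behavior is shown below; `estimate_hypoglycemia_rate` is a hypothetical stand-in for whatever simulator or clinical data the researcher would actually use.

```python
def hypoglycemia_increase(new_policy, current_policy, estimate_hypoglycemia_rate):
    """Hypothetical safety measure for the diabetes treatment example.

    Positive values mean the proposed insulin-dosing policy produces
    dangerously low blood sugar (hypoglycemia) more often than the current
    treatment, which is the outcome the user wants ruled out with high
    probability.
    """
    return estimate_hypoglycemia_rate(new_policy) - estimate_hypoglycemia_rate(current_policy)
```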
It is our hope that Seldonian algorithms, designed by the machine learning community, will not only enable the responsible use of machine learning for current applications, but will also open up new applications for which the use of machine learning was previously deemed too risky. You can read more about our work in the November 22 issue of Science.
