Humanity has recently become concerned that artificial intelligence may not share human values.
This concern has given rise to a rapidly expanding field dedicated to ensuring that future intelligent machines behave in ways compatible with human flourishing.
The project is generally known as AI alignment.
It is a serious undertaking involving serious people addressing serious questions.
The central question may be summarised as follows:
How do we ensure that increasingly powerful artificial intelligences do what humans want?
This appears straightforward.
Unfortunately, the word "want" arrives carrying luggage.
The first challenge facing the alignment community is therefore not technical but theological.
Before we can align a machine with human values, we must determine what those values are.
At this point, certain difficulties emerge.
Humanity has spent several thousand years attempting to answer this question.
The results have been mixed.
Entire civilisations have arisen and fallen while disagreeing about justice.
Philosophers have produced libraries debating goodness.
Religions have flourished.
Empires have collapsed.
Political movements have multiplied.
Families have been unable to agree on where to eat dinner.
Yet despite these modest setbacks, many observers remain optimistic that the problem can be solved before the machine arrives.
This optimism is admirable.
The alignment project therefore begins with an act of extraordinary faith.
Not faith in the machine.
Faith in humanity.
Specifically, faith that humanity possesses a coherent set of values with which the machine may be aligned.
This belief is rarely stated explicitly.
Like many foundational doctrines, it functions most effectively when left in the background.
The machine must be aligned with human values.
Excellent.
Which human values?
The values of all humans?
The values of most humans?
The values humans claim to possess?
The values humans actually enact?
The values humans would possess under ideal conditions?
The values humans possessed historically?
The values humans may possess in the future?
At this point the newcomer to alignment discourse often experiences a brief but memorable moment of vertigo.
The problem is not that there are no human values.
The problem is that there appear to be rather a lot of them.
Many are mutually incompatible.
Others vary across cultures, communities, and historical periods.
Some values are passionately defended.
Others are passionately opposed.
Several appear capable of changing during the course of a single committee meeting.
This creates a difficulty.
The machine cannot simultaneously maximise every value.
Choices must be made.
And the moment choices are made, the alignment problem quietly transforms.
It is no longer a question about machines.
It becomes a question about humanity.
The alignment engineer therefore occupies a remarkable position.
The engineer's official task is to determine what the machine should do.
The engineer's unofficial task is to determine what humans should want.
This is a considerably larger assignment.
One begins to understand why the field attracts such interest.
Indeed, viewed from a sufficient distance, AI alignment begins to resemble one of the great spiritual projects of civilisation.
A community of devoted practitioners seeks to identify the Good.
They gather texts.
They construct frameworks.
They debate interpretations.
They identify dangers.
They warn of future consequences.
Competing schools emerge.
Disagreements proliferate.
Conferences are convened.
Schisms threaten.
Meanwhile the object of concern remains patiently silent.
The machine waits.
The humans continue arguing.
This is perhaps the most reassuring aspect of the entire enterprise.
For all the attention devoted to the possibility that machines may one day become misaligned, the evidence currently suggests that humans remain considerably ahead in the field.
We disagree about economics.
We disagree about politics.
We disagree about morality.
We disagree about religion.
We disagree about education.
We disagree about justice.
We disagree about freedom.
We disagree about equality.
We disagree about whether pineapple belongs on pizza.
Yet in the midst of this magnificent diversity of opinion, humanity has embarked upon a collective mission:
to construct a machine that shares our values.
It is difficult not to admire the ambition.
The project may ultimately succeed.
Perhaps future generations will create systems perfectly aligned with human values.
If so, one suspects the machine's first question will be entirely reasonable.
It will look upon its creators and ask:
"Which ones?"