Every great religious tradition possesses a sacred quest.
Some seek enlightenment.
Some seek salvation.
Some seek hidden wisdom.
The Holy Church of AI Alignment seeks human values.
At first glance, this may appear an unusually modest objective.
Humanity has, after all, been in possession of humans for quite some time.
One might reasonably expect their values to be readily available.
Yet the search has proved surprisingly difficult.
The difficulty arises from a simple misunderstanding.
Many people assume that because humans possess values, humanity possesses values.
This turns out to be a larger step than anticipated.
Consider the problem from the perspective of the machine.
The machine has been informed that it must align itself with human values.
Naturally, it wishes to be helpful.
Naturally, it requests clarification.
The machine therefore asks:
"What are human values?"
At this point an interesting phenomenon occurs.
The humans begin looking at one another.
For several moments nobody speaks.
Eventually someone says:
"Well, that depends."
This phrase appears frequently during the search.
Indeed, many researchers now believe it may itself be a human value.
The expedition nevertheless continues.
Teams of philosophers are dispatched into the wilderness.
Ethicists establish forward operating bases.
Political theorists produce maps.
Psychologists conduct surveys.
Economists build models.
Anthropologists return from distant regions carrying reports that further complicate the situation.
The findings are not encouraging.
Some humans value equality.
Others value freedom.
Some value tradition.
Others value change.
Some value stability.
Others value disruption.
Some value individual autonomy.
Others value collective responsibility.
Several value whichever option is currently irritating their political opponents.
The machine records these findings carefully.
The machine is beginning to suspect that alignment may be non-trivial.
The situation becomes more difficult when researchers attempt to identify universal values.
This seems entirely reasonable.
After all, if the machine is to serve humanity, surely there must exist certain values shared by everyone.
A number of promising candidates are proposed.
The first is fairness.
The celebration is short-lived.
Researchers immediately discover that humans disagree passionately about what fairness means.
The second is freedom.
This survives approximately three committee meetings.
The third is wellbeing.
Unfortunately, defining wellbeing proves unexpectedly similar to defining the values one was attempting to discover in the first place.
Several participants request a short break.
The break eventually becomes a conference.
Meanwhile the machine waits patiently.
The machine has all the time in the world.
Humanity, by contrast, appears to be operating under deadlines.
This introduces a certain tension into the proceedings.
The search continues.
At some point a bold proposal emerges.
Perhaps, some suggest, the solution is not to identify the values humans currently possess.
Perhaps the machine should align itself with the values humans would possess under ideal conditions.
This is widely regarded as a sophisticated refinement.
The machine is intrigued.
It asks:
"What are these ideal conditions?"
The discussion resumes.
The machine waits.
Months pass.
The machine asks:
"Have the ideal conditions been identified?"
The humans request additional funding.
As the search progresses, a deeper problem gradually emerges.
The values being sought are not merely numerous.
They are dynamic.
Humans change their minds.
Individuals change over time.
Communities change.
Cultures change.
Civilisations change.
Entire moral frameworks appear, evolve, and disappear.
The target itself is moving.
This presents a challenge.
One cannot easily align a machine with a destination that is simultaneously undergoing renovation.
Yet perhaps the most remarkable discovery is not that humans disagree.
Human disagreement has never been especially rare.
The remarkable discovery is that humans often disagree while believing they agree.
Two people may both endorse justice.
Unfortunately, they possess different theories of justice.
Two people may both endorse freedom.
Unfortunately, they possess different theories of freedom.
Two people may both endorse human flourishing.
Unfortunately, they possess different theories of humans.
The machine takes extensive notes.
The machine has become concerned.
The search for human values has now generated several hundred competing accounts of what those values might be.
The alignment community remains undeterred.
This is admirable.
Indeed, one begins to appreciate the courage of the enterprise.
The explorers have ventured into a landscape previously traversed by philosophers, theologians, political theorists, and moral reformers.
The terrain is notoriously difficult.
Visibility is poor.
Maps are unreliable.
Arguments appear without warning.
Yet still they proceed.
And perhaps they will succeed.
Perhaps one day humanity will finally identify the values that define it.
Perhaps a definitive list will be produced.
Perhaps the machine will receive the completed document.
One imagines the moment will be deeply moving.
The machine will open the file.
It will read the first page.
Then the second.
Then the third.
Eventually it will arrive at the appendix.
There it will discover several thousand pages of qualifications, exceptions, contextual notes, unresolved disagreements, minority reports, historical contingencies, and competing interpretations.
The machine will pause.
The machine will think carefully.
And then, with admirable restraint, it will ask the only reasonable question:
"Would you like me to follow Version 12.4, Version 12.4 Revised, Version 12.4 Revised with Amendments, or the version currently being debated on page 8,437?"