Sunday, 14 June 2026

The Oracle — A Conversation in the Senior Common Room at St Anselm's

Mr Blottisham entered the Senior Common Room looking unusually cheerful.

Professor Quillibrace immediately became concerned.

Miss Stray became curious.

Blottisham sat down.

"I've had a breakthrough."

Quillibrace closed his book.

"My condolences."

"This one is genuine."

"So were the others."

Blottisham ignored him.

"I've realised we've been approaching the problem backwards."

"Excellent."

"Please stop saying that."

"I'll consider it."

"You won't."

"Probably not."

Blottisham leaned forward.

"The machine is aligned."

"I see."

"It understands human values."

"Apparently."

"It understands human disagreement."

"Indeed."

"It understands competing perspectives."

"Remarkable."

Blottisham smiled triumphantly.

"So why don't we simply ask it?"

The room became quiet.

Miss Stray slowly put down her teacup.

Quillibrace regarded Blottisham with renewed interest.

"Ask it what?"

"The important questions."

"Such as?"

Blottisham spread his hands.

"What is justice?"

"What is fairness?"

"What should society do?"

"What policies should governments adopt?"

"What constitutes the good life?"

The room remained silent.

Blottisham looked pleased.

"There."

"There?"

"The solution."

Quillibrace nodded thoughtfully.

"I see."

Blottisham relaxed.

This was a mistake.

The professor continued.

"And why should we trust its answers?"

"Because it is aligned."

"Aligned with what?"

"Human values."

"Which ones?"

Blottisham groaned.

"Must we revisit this?"

"I think we must."

Miss Stray spoke gently.

"What kind of answer do you expect the machine to provide?"

"The correct one."

The room became very quiet.

Quillibrace removed his spectacles.

This generally indicated that something was about to happen.

"What do you mean by 'correct'?"

Blottisham stared.

"The correct answer."

"Correct according to whom?"

"The machine."

"I see."

The professor cleaned his spectacles.

This was never a reassuring activity.

Blottisham shifted uneasily.

After a moment Quillibrace asked:

"Suppose the machine recommends Policy A."

"Very good."

"Half the population prefers Policy B."

"Unfortunate."

"Why are they wrong?"

"Because the machine has determined the correct answer."

Quillibrace nodded.

"And how did it determine this?"

"It analysed human values."

"Which values?"

Blottisham closed his eyes.

The room waited.

Finally he said:

"You are doing it again."

"Doing what?"

"Turning the solution into the problem."

Miss Stray smiled.

"A useful skill."

Blottisham looked unconvinced.

The professor put his spectacles back on.

"My dear Blottisham."

"Yes?"

"You originally wanted a machine aligned with human values."

"Correct."

"You now appear to want a machine that settles disagreements about those values."

"Exactly."

Quillibrace nodded.

"A subtle change."

Blottisham frowned.

"Is it?"

"Very."

Miss Stray leaned forward slightly.

"I think the machine faces an awkward dilemma."

"What dilemma?"

"If it agrees with one side, the other side will question its reasoning."

"Naturally."

"If it explains its reasoning, the reasoning itself may be disputed."

Blottisham hesitated.

"Possibly."

"If the reasoning is disputed, what happens?"

The room became quiet.

After a moment Blottisham sighed.

"People argue about it."

"Exactly."

The silence lingered.

Then Quillibrace asked:

"What if the machine explains both sides?"

"That would be useful."

"What if it identifies strengths and weaknesses in each position?"

"Also useful."

"What if it clarifies assumptions, highlights trade-offs, and exposes hidden contradictions?"

Blottisham nodded.

"Excellent."

The professor smiled faintly.

"There it is."

"What?"

"You have just described a very good academic."

The room fell silent.

Miss Stray laughed softly into her teacup.

Blottisham looked suspicious.

"That cannot be right."

"Why not?"

"Because academics never settle anything."

"Quite."

"Then neither would the machine."

The room became quiet.

A visible suspicion was beginning to form.

It wandered slowly across Blottisham's face.

Finally it arrived.

"Oh."

"Yes," said Quillibrace.

Outside, evening sunlight stretched across the college lawns.

Inside, nobody spoke for a while.

At length Blottisham looked up.

"There is something I find troubling."

"What is that?" asked Miss Stray.

"The more aligned the machine becomes..."

"Yes?"

"...the less it resembles an oracle."

Miss Stray smiled.

"And more?"

Blottisham considered this.

Then he sighed.

"More like us."

Quillibrace nodded.

"A noteworthy observation."

Blottisham stared into the distance.

The distance stared back.

Eventually he spoke.

"So the aligned machine does not end the conversation."

"No."

"It joins it."

"No."

Blottisham frowned.

"No?"

Quillibrace smiled.

The smile was almost invisible.

"My dear Blottisham."

"Yes?"

"The conversation was never waiting for permission."

The Experts — A Conversation in the Senior Common Room at St Anselm's

Mr Blottisham entered the Senior Common Room carrying a noticeable air of vindication.

Professor Quillibrace immediately became suspicious.

Miss Stray became curious.

Blottisham sat down.

"I've been attending conferences."

Quillibrace closed his book.

"My condolences."

Blottisham ignored him.

"It has been enormously enlightening."

"Excellent."

"This time I mean it."

"So do I."

Blottisham frowned.

That was somehow worse.

"I have discovered the solution."

Miss Stray looked up from her notebook.

"Again?"

"Not my solution."

"Whose?"

"The experts'."

The room became quiet.

Quillibrace folded his hands.

"Ah."

Blottisham smiled.

"There are entire institutes devoted to alignment."

"Indeed."

"Entire conferences."

"Certainly."

"Entire research programmes."

"Quite so."

Blottisham leaned back triumphantly.

"There you are."

"There we are?"

"The experts are solving it."

Quillibrace nodded.

"I see."

Blottisham waited.

Nothing happened.

Eventually he said:

"Well?"

"Well what?"

"Aren't you reassured?"

"About what?"

"The experts."

Quillibrace considered this.

"Which experts?"

Blottisham blinked.

"The alignment experts."

"All of them?"

"Obviously."

"Do they agree?"

The room became quiet.

Blottisham shifted slightly in his chair.

"Not entirely."

"I see."

"That is normal."

"Of course."

Miss Stray spoke.

"What do they disagree about?"

Blottisham looked uncomfortable.

"Various things."

"Such as?"

"Methods."

"Anything else?"

"Definitions."

"Anything else?"

"Objectives."

"Anything else?"

Blottisham sighed.

"Perhaps some assumptions."

Miss Stray nodded.

"A healthy field."

"Exactly."

Quillibrace appeared thoughtful.

"How many conferences did you attend?"

"Three."

"And what did you learn?"

Blottisham brightened.

"A great many terms."

"Such as?"

"'Robustness.'"

"Excellent."

"'Interpretability.'"

"Wonderful."

"'Scalable oversight.'"

"Very good."

"'Preference modelling.'"

"Splendid."

Quillibrace paused.

"What do they mean?"

Blottisham looked offended.

"They mean exactly what they say."

Miss Stray coughed gently into her teacup.

Quillibrace looked delighted.

Blottisham noticed both reactions.

"I have said something, haven't I?"

"Possibly."

The professor stood and walked to the window.

Outside, students were crossing the lawn.

Most appeared to know where they were going.

This distinguished them from academics.

After a moment Quillibrace turned back.

"My dear Blottisham."

"Yes?"

"Do you imagine expertise eliminates uncertainty?"

"No."

"Good."

"It reduces it."

"Sometimes."

Blottisham frowned.

"What does that mean?"

Quillibrace sat down again.

"Expertise often increases awareness of uncertainty."

"I don't understand."

"Naturally."

Miss Stray smiled.

The professor continued.

"A novice sees a simple problem."

"Yes."

"An expert sees twenty interlocking problems."

"That sounds inefficient."

"It frequently is."

"Then why become an expert?"

Quillibrace looked surprised.

"To avoid being wrong in simplistic ways."

The room became quiet.

Blottisham considered this.

"I had hoped expertise involved becoming right."

"That as well."

"Good."

"Unfortunately, the route often passes through recognising how many ways one can be wrong."

Blottisham stared into the middle distance.

This was becoming a familiar posture.

Miss Stray spoke softly.

"I think there is another difficulty."

"Go on."

"The alignment experts do not merely study a problem."

"No?"

"They also participate in defining what the problem is."

The room became quiet again.

Blottisham looked at her.

Then at Quillibrace.

Then back at her.

"That sounds dangerous."

"Potentially."

"I thought experts existed to provide answers."

"Often they do."

"And here?"

Miss Stray thought for a moment.

"Here they may also be refining the questions."

The room fell silent.

Outside, the afternoon sun was beginning to fade.

Inside, Blottisham looked increasingly troubled.

After a while he spoke.

"Are you telling me that humanity has assembled thousands of highly intelligent people to answer a question they are still trying to formulate?"

Neither Quillibrace nor Miss Stray spoke.

Blottisham looked from one to the other.

"That was a yes, wasn't it?"

"It was rather close to one," said Quillibrace.

For several moments nobody spoke.

Then Blottisham frowned.

"There is something odd about all this."

"What?" asked Miss Stray.

"The field keeps growing."

"Yes."

"The conferences keep multiplying."

"Indeed."

"The papers keep appearing."

"Certainly."

"The experts keep refining the problem."

"Quite so."

Blottisham looked thoughtful.

Then suspicious.

Then thoughtful again.

Finally he said:

"Suppose nobody ever completely solves it."

The room became very quiet.

Quillibrace's eyes twinkled.

Miss Stray looked down into her notebook.

Neither answered immediately.

At length Quillibrace said:

"My dear Blottisham."

"Yes?"

"Now you are beginning to understand academia."

The Constitution — A Conversation in the Senior Common Room at St Anselm's

Mr Blottisham entered the Senior Common Room carrying a substantial document.

Professor Quillibrace looked up.

Miss Stray looked concerned.

"Good heavens," said Quillibrace.

"What?"

"You appear to have written something."

Blottisham placed the document on the table.

"It was unavoidable."

Miss Stray examined the stack.

"How long is it?"

"Only one hundred and twelve pages."

The room became quiet.

"Only?" asked Quillibrace.

"One must be thorough."

"I see."

Blottisham smiled.

"I've solved the instruction problem."

"Excellent."

"This time genuinely."

Quillibrace nodded.

"I am delighted."

"You should be."

Blottisham tapped the document.

"A constitution."

Miss Stray raised an eyebrow.

"For the machine?"

"Exactly."

"A complete constitution?"

"A comprehensive one."

Quillibrace leaned forward.

"I should very much like to hear more."

Blottisham opened the document.

"The machine shall be helpful."

"Good."

"It shall be honest."

"Excellent."

"It shall be fair."

"Wonderful."

"It shall avoid causing harm."

"Very sensible."

Blottisham looked pleased.

"There you are."

"There we are?"

"The principles."

Quillibrace nodded thoughtfully.

"An admirable collection."

"Thank you."

"May I ask a question?"

Blottisham hesitated.

"I suspect you will regardless."

"Probably."

The professor folded his hands.

"What should the machine do when honesty causes harm?"

The room became quiet.

Blottisham frowned.

"What do you mean?"

"Suppose honesty would cause distress."

"Then it should be tactful."

"I see."

Quillibrace made a small note on a piece of paper.

Blottisham watched nervously.

"What are you doing?"

"Adding tactfulness."

"Fine."

Quillibrace nodded.

"Excellent."

"There it is again."

The professor continued.

"What should the machine do if tactfulness conflicts with honesty?"

Blottisham stared.

Miss Stray quietly opened her notebook.

Blottisham pointed at her.

"She's enjoying this."

"A little."

"I knew it."

Miss Stray smiled.

"It is an interesting constitution."

"It was."

Quillibrace continued.

"Suppose fairness conflicts with helpfulness."

"Then fairness takes precedence."

"Excellent."

The professor made another note.

"What now?"

"We must specify the hierarchy."

"The hierarchy?"

"Which principles override which other principles."

Blottisham sighed.

"Very well."

For the next ten minutes they revised the constitution.

Principles acquired priorities.

Priorities acquired exceptions.

Exceptions acquired qualifications.

Qualifications acquired explanatory notes.

The document grew steadily.

At one point Miss Stray observed:

"You may need a section explaining the exceptions."

"A sensible suggestion."

A new section was added.

Twenty minutes later Quillibrace asked:

"What should happen when two exceptions conflict?"

Blottisham closed his eyes.

The room waited.

Eventually he spoke.

"Why does everything conflict with everything else?"

Quillibrace appeared thoughtful.

"Life."

"That's not an answer."

"It is surprisingly close to one."

Miss Stray looked at the constitution.

"It is becoming larger."

"It is becoming complete."

"I wonder."

Blottisham frowned.

"What does that mean?"

She turned a few pages.

"Every time we discover an ambiguity, we add text."

"Naturally."

"And every time we add text, we create new possibilities for interpretation."

Blottisham stared.

"What interpretation?"

Miss Stray handed him the document.

"Read this sentence."

He did.

"What about it?"

"What does it mean?"

"It means exactly what it says."

"To whom?"

Blottisham paused.

The pause lasted longer than he would have liked.

Quillibrace watched quietly.

Eventually Blottisham said:

"I am beginning to dislike that question."

"A healthy sign," said Quillibrace.

The professor stood and walked to the window.

Outside, students crossed the lawn.

Some were carrying books.

Others appeared to be carrying opinions.

The distinction was not always obvious.

After a moment Quillibrace turned back.

"My dear Blottisham."

"Yes?"

"Suppose the machine reads your constitution."

"Good."

"Suppose it encounters a difficult case."

"Very good."

"Suppose it must decide what the constitution means."

Blottisham looked uncertain.

"Yes?"

Quillibrace smiled faintly.

"Who interprets the constitution?"

The room became very quiet.

Miss Stray lowered her notebook.

Blottisham looked at the document.

Then at Quillibrace.

Then back at the document.

A visible suspicion was beginning to form.

"Are you suggesting..."

"Yes."

"...that the constitution has not eliminated interpretation?"

"Indeed."

Blottisham looked genuinely distressed.

"But that was the entire purpose of writing it."

Quillibrace nodded sympathetically.

"Historically speaking, that often is."

For a while nobody spoke.

The constitution sat in the centre of the table.

One hundred and twelve pages.

Carefully drafted.

Thoughtfully organised.

Filled with principles.

Filled with exceptions.

Filled with clarifications.

And, somehow, still awaiting interpretation.

At length Blottisham sighed.

"I have the strangest feeling."

"What feeling?" asked Miss Stray.

"That writing things down does not make them as definite as I thought."

Quillibrace smiled.

The smile was almost invisible.

"Now," he said, "you are ready for law school."

The Instructions — A Conversation in the Senior Common Room at St Anselm's

A week later, Mr Blottisham entered the Senior Common Room carrying no folders whatsoever.

Professor Quillibrace regarded this as encouraging.

Miss Stray regarded it as suspicious.

Blottisham sat down.

"I've abandoned values."

Quillibrace lowered his book.

"Entirely?"

"For the moment."

"A bold decision."

"It was becoming tiresome."

Miss Stray smiled.

"Understandably."

Blottisham nodded.

"One cannot spend one's entire life discussing fairness."

Quillibrace looked thoughtful.

"Many people have."

"That is exactly the problem."

The professor inclined his head.

"Very well. What have you replaced values with?"

"Instructions."

The room became quiet.

Miss Stray reached for her notebook.

Quillibrace placed a bookmark in his book.

Neither reaction encouraged confidence.

Blottisham pressed on.

"Instead of asking what humanity values, we simply tell the machine what to do."

"I see."

"It is vastly simpler."

"Excellent."

Blottisham sighed.

"You promised you would stop doing that."

"I promised no such thing."

"You implied it."

"I did not."

"You certainly conveyed it."

Quillibrace appeared interested.

"An excellent example."

"Of what?"

"Instructions."

The room fell silent.

Miss Stray looked down at her notebook to conceal a smile.

Blottisham pointed accusingly.

"That was deliberate."

"Almost certainly."

After a moment Quillibrace said:

"What instruction would you give the machine?"

Blottisham answered immediately.

"Help humanity."

"Excellent."

"There it is again."

Quillibrace ignored him.

"The machine asks a question."

"What question?"

"'Which humanity?'"

Blottisham groaned.

"Not this again."

"Possibly a different version of it."

"Fine."

Blottisham thought for a moment.

"'Help all humans.'"

"Excellent."

"Stop encouraging me."

"The machine asks another question."

"What now?"

"'What should I do when humans disagree?'"

Blottisham rubbed his forehead.

"Very well."

He thought again.

"'Promote human wellbeing.'"

Quillibrace nodded.

"The machine asks how wellbeing should be measured."

Blottisham stared at the ceiling.

Miss Stray waited patiently.

Quillibrace waited patiently.

After a while Blottisham spoke.

"It seems to ask a great many questions."

"Indeed."

"Could it not simply get on with it?"

Quillibrace looked mildly surprised.

"But that is what concerns everyone."

"What is?"

"The possibility that it might simply get on with it."

The room became quiet.

Blottisham considered this.

"Fair point."

Miss Stray finally spoke.

"I think the machine is doing something interesting."

Blottisham looked wary.

"That usually means bad news."

"Not necessarily."

"What is it doing?"

"It is asking for context."

"What context?"

"The context humans routinely supply without noticing."

Blottisham frowned.

"I don't understand."

"Suppose I tell you to be polite."

"Very well."

"What does that mean?"

"It means be polite."

Miss Stray nodded.

"To whom?"

"Everyone."

"When?"

"Most of the time."

"Always?"

"No."

"Why not?"

"Because situations differ."

"How do you know when they differ?"

Blottisham opened his mouth.

Then closed it again.

The room waited.

Finally he said:

"I just know."

Miss Stray smiled.

"There it is."

"What?"

"The context."

Blottisham looked unconvinced.

Quillibrace intervened.

"Humans are extraordinarily skilled at supplying unstated assumptions."

"Naturally."

"Indeed. The difficulty is that we rarely notice ourselves doing it."

The professor closed his book.

"When someone says 'be polite', they are not merely supplying words."

"What are they supplying?"

"A lifetime."

Blottisham stared.

Quillibrace continued.

"Shared expectations."

"Culture."

"Experience."

"Situational judgement."

"Exceptions."

"Interpretations."

"Relationships."

"History."

The list grew steadily.

By the end of it, the instruction seemed considerably larger than it had at the beginning.

Blottisham looked uneasy.

"That is a lot to fit into two words."

"Quite."

Miss Stray closed her notebook.

"Which is perhaps why the machine keeps asking questions."

The room became quiet again.

Outside, rain had begun tapping softly against the windows.

Inside, Blottisham stared into the middle distance.

After a while he said:

"I think I am beginning to understand the problem."

"Excellent," said Quillibrace.

Blottisham pointed at him.

"That one was deserved."

Quillibrace nodded.

"I thought so."

A comfortable silence followed.

Then Blottisham frowned.

"There is still something I don't understand."

"What is that?" asked Miss Stray.

"If humans manage all this context perfectly well, why can't we simply explain it to the machine?"

Quillibrace smiled.

The smile was small.

Almost affectionate.

"My dear Blottisham."

"Yes?"

"That is more or less the history of philosophy."

The Committee — A Conversation in the Senior Common Room at St Anselm's

Several days later, Mr Blottisham entered the Senior Common Room carrying an expression of renewed confidence.

This was unfortunate.

Professor Quillibrace immediately noticed.

Miss Stray noticed as well.

Neither commented.

Experience had taught them that confidence often revealed its own difficulties.

Blottisham sat down.

"I've solved it."

Quillibrace looked up from his book.

"Again?"

"This time properly."

"Excellent."

Blottisham ignored the remark.

"The problem isn't that we don't know human values."

"No?"

"The problem is that individuals disagree."

Miss Stray nodded.

"That seems plausible."

"Exactly."

Blottisham smiled.

"So we establish a committee."

The room became quiet.

Quillibrace slowly removed his spectacles.

This was never an encouraging sign.

"A committee?"

"Certainly."

"We gather philosophers, ethicists, scientists, community representatives, policymakers, and other sensible people."

"And then?"

"They determine the values."

Quillibrace looked thoughtful.

"All of them?"

"The important ones."

"I see."

Miss Stray looked interested.

"How will the committee decide?"

Blottisham seemed surprised by the question.

"They'll discuss it."

"And if they disagree?"

"They'll vote."

"And if the vote is close?"

"They'll discuss it some more."

"And if the discussion produces further disagreement?"

Blottisham frowned.

"Why are you both like this?"

Neither answered.

After a moment Quillibrace spoke.

"Suppose the committee is discussing fairness."

"Very well."

"One member believes fairness means equality of outcome."

"Fine."

"Another believes it means equality of opportunity."

"Also fine."

"A third believes the distinction is misleading."

Blottisham sighed.

"Of course they do."

"A fourth believes the first three are ignoring important historical considerations."

"Naturally."

"A fifth believes fairness cannot be understood independently of community values."

"Obviously."

"A sixth believes community values are precisely the problem."

Blottisham rubbed his eyes.

Quillibrace continued.

"The committee now disagrees."

"So they vote."

"Excellent."

The word had returned.

Blottisham felt uneasy.

Quillibrace folded his hands.

"The committee votes."

"Yes."

"Five support one definition."

"Good."

"Four support another."

"Unfortunate, but unavoidable."

"And the machine?"

"What about it?"

"The machine asks why the five are correct and the four are not."

The room became quiet.

Blottisham stared.

After a moment he said:

"Because there are more of them."

Miss Stray tilted her head.

"Does that make them correct?"

"No."

"Then why should the machine follow them?"

Blottisham hesitated.

"Because that's how committees work."

Quillibrace smiled.

"A historically important observation."

The room fell silent.

Outside, a bell rang somewhere in the college.

Inside, Blottisham appeared troubled.

Eventually he looked at Miss Stray.

"What would you do?"

She considered this.

Then she said:

"I think I would ask what problem the committee is solving."

Blottisham frowned.

"The alignment problem."

"Perhaps."

"What do you mean, 'perhaps'?"

Miss Stray closed her notebook.

"If the committee is trying to discover human values, it assumes those values already exist in some agreed form."

"Naturally."

"But the committee itself appears to disagree about them."

Blottisham blinked.

The observation lingered in the room.

Quillibrace looked quietly pleased.

Miss Stray continued.

"The committee may not be discovering values."

"What is it doing then?"

"It may be negotiating them."

The room became very quiet.

Blottisham stared at her.

Then at Quillibrace.

Then back at her.

Finally he said:

"I dislike that."

"Why?"

"Because it sounds much harder."

Quillibrace laughed softly.

"An excellent reason to believe it."

Blottisham groaned.

"There it is again."

"What?"

"'Excellent.'"

Quillibrace reopened his book.

"I am merely encouraging you."

"No, you're not."

"Perhaps not."

For a while nobody spoke.

Then Blottisham looked thoughtfully out of the window.

"If the committee is negotiating values rather than discovering them..."

"Yes?" said Miss Stray.

"...then how will it ever finish?"

Quillibrace turned a page.

The page made a small, dry sound.

"Ah," he said.

"Now you are beginning to appreciate the committee."

In Search of Human Values — A Conversation in the Senior Common Room at St Anselm's

The following afternoon, Mr Blottisham entered the Senior Common Room carrying a folder.

This was unusual.

Mr Blottisham generally preferred conclusions to paperwork.

Professor Quillibrace looked up from his book.

Miss Stray glanced over the rim of her teacup.

Blottisham sat down heavily.

"I've solved yesterday's problem."

Quillibrace nodded.

"Excellent."

Blottisham winced.

The word was beginning to have consequences.

"It's not a trap this time."

"I see."

"I mean it."

"Very good."

Blottisham opened the folder.

"I spent the evening investigating human values."

Miss Stray looked interested.

"How industrious."

"Quite."

"And what did you discover?"

Blottisham smiled.

"That people value remarkably similar things."

Quillibrace folded his hands.

This was never a reassuring sign.

"Do they?"

"Absolutely."

Blottisham consulted his notes.

"People value happiness."

"Good."

"Freedom."

"Excellent."

"Fairness."

"Wonderful."

"Security."

"Very sensible."

Blottisham looked pleased.

"There you are."

"There we are?"

"The values."

Miss Stray leaned forward slightly.

"And everyone agrees on these?"

"More or less."

"How encouraging."

Blottisham nodded.

"It really isn't as complicated as academics make it sound."

Quillibrace appeared thoughtful.

"Perhaps."

Blottisham relaxed.

The danger appeared to have passed.

It had not.

Quillibrace continued.

"When you say people value freedom, what sort of freedom do you mean?"

Blottisham frowned.

"Freedom."

"Yes."

"The ordinary sort."

"The ordinary sort?"

"Exactly."

Miss Stray intervened gently.

"Freedom from what?"

Blottisham looked puzzled.

"What do you mean?"

"Freedom from interference?"

"Possibly."

"Freedom from poverty?"

"Perhaps."

"Freedom from oppression?"

"Certainly."

"Freedom to pursue one's goals?"

"Obviously."

"Freedom to ignore one's responsibilities?"

Blottisham hesitated.

"Well, perhaps not that."

"Why not?"

"Because that would be irresponsible."

Miss Stray nodded.

"So freedom has limits?"

"Naturally."

"What determines them?"

Blottisham stared at her.

For a moment nothing happened.

Then he turned to Quillibrace.

"Why does she do this?"

Quillibrace looked mildly surprised.

"Do what?"

"Turn nouns into problems."

Miss Stray smiled.

"It isn't my fault."

"That sounds suspiciously like something that is your fault."

Quillibrace opened his book.

"I think Miss Stray is merely observing that values often become less straightforward when examined."

Blottisham sighed.

"Very well."

He consulted his notes.

"Take happiness then."

"Excellent."

"There it is again."

"What?"

"'Excellent.'"

"I am merely encouraging you."

"I feel discouraged."

Quillibrace nodded sympathetically.

"A common side-effect."

Blottisham ignored him.

"Happiness."

"Very good."

"Everyone wants happiness."

Miss Stray tilted her head.

"Do they?"

"Of course."

"Is happiness the same thing as pleasure?"

"No."

"Contentment?"

"Not exactly."

"Fulfilment?"

"Perhaps."

"Meaning?"

"Possibly."

"Achievement?"

"Sometimes."

"Virtue?"

Blottisham paused.

"I feel there may be a wrong answer approaching."

Quillibrace laughed softly.

Miss Stray continued.

"If happiness can mean different things, how will the machine know which one to pursue?"

Blottisham rubbed his forehead.

The folder no longer seemed as authoritative as it had yesterday evening.

After a moment he brightened.

"I know."

"Splendid."

"We ask lots of people."

Quillibrace looked interested.

"A survey?"

"Exactly."

"We identify what most people value."

Miss Stray nodded.

"A democratic approach."

"Thank you."

"What happens if the minority disagrees?"

Blottisham blinked.

"What?"

"What if the majority values one thing and the minority values another?"

"We follow the majority."

"I see."

"And?"

"What if the majority values something harmful?"

Blottisham hesitated.

"Then we don't."

"Who decides?"

"The sensible people."

The room fell silent.

Quillibrace looked down at his book.

Miss Stray looked into her teacup.

Neither spoke.

Eventually Blottisham frowned.

"What now?"

Quillibrace turned a page.

"Nothing."

"You've got that look."

"What look?"

"The one where I've accidentally said something."

"I wouldn't worry."

"Why not?"

"Because everyone says it eventually."

Blottisham considered this.

Then he sighed.

"I am beginning to suspect that human values are harder to identify than I expected."

Miss Stray smiled.

"A promising observation."

"It is?"

"Certainly."

"Why?"

"Because you have moved from assuming that values are obvious to wondering how they are identified."

Blottisham thought about this.

"That does seem like progress."

Quillibrace nodded.

"A great deal of intellectual progress consists of discovering that the obvious thing was not obvious."

The room became quiet.

Outside, students crossed the lawn.

Inside, Blottisham looked at the folder.

The list still appeared sensible.

Freedom.

Fairness.

Happiness.

Security.

All perfectly reasonable.

Yet somehow each had acquired a surprising number of questions.

After a while he looked up.

"So where exactly are these human values?"

Miss Stray smiled.

Quillibrace smiled.

Neither answered.

For several moments the silence lingered.

Then Quillibrace returned to his book.

"That," he said, "is more or less the alignment problem."

The Alignment Problem — A Conversation in the Senior Common Room at St Anselm's

The afternoon sun lay across the windows of the Senior Common Room.

Professor Quillibrace was reading.

Miss Stray was writing notes in a small notebook.

Mr Blottisham entered carrying a newspaper and an expression of considerable satisfaction.

"I see they've solved it."

Neither Quillibrace nor Miss Stray looked up.

This was not unusual.

After a moment Quillibrace asked:

"Solved what?"

"The alignment problem."

"I wasn't aware it had been solved."

"It is perfectly straightforward."

Miss Stray closed her notebook.

This was generally a sign that trouble was approaching.

Blottisham sat down.

"The whole issue is rather overcomplicated."

"I see," said Quillibrace.

"One simply ensures that the machine shares human values."

Quillibrace lowered his book.

Miss Stray looked interested.

Blottisham took this as encouragement.

"That is the entire problem."

"The entire problem?"

"Exactly."

The professor considered this.

Then he nodded.

"Excellent."

Blottisham smiled.

He had learned to be cautious when Quillibrace said "excellent."

Unfortunately, caution was not among his dominant traits.

"Quite."

Quillibrace placed a bookmark in his book.

"Which human values?"

Blottisham waved a hand.

"The usual ones."

"The usual ones?"

"Good values."

Quillibrace nodded.

"Good."

A pause followed.

Blottisham waited.

Nothing happened.

Eventually he frowned.

"Well?"

"Well what?"

"Aren't you going to argue?"

"Why would I argue?"

"Because I have just explained the solution."

Quillibrace looked thoughtful.

"I am merely trying to understand it."

Miss Stray smiled faintly.

This expression generally indicated that she understood exactly what was happening.

Blottisham did not.

Quillibrace continued.

"You propose that the machine should be aligned with human values."

"Yes."

"And these values are good values."

"Obviously."

"How shall we identify them?"

Blottisham looked surprised.

"We already know them."

"Do we?"

"Of course."

Quillibrace nodded.

"Excellent."

The word returned.

Blottisham felt a slight unease.

The professor continued.

"Would you mind listing them?"

"Certainly."

Blottisham leaned back.

"Fairness."

"Good."

"Freedom."

"Excellent."

"Justice."

"Very good."

"Honesty."

"Wonderful."

Blottisham smiled.

The matter appeared settled.

Then Miss Stray spoke.

"What do you mean by fairness?"

Blottisham blinked.

"What?"

"Fairness."

"What about it?"

"You listed it."

"Yes."

"What does it mean?"

Blottisham frowned.

"Surely everyone knows."

Miss Stray waited.

Quillibrace waited.

The room became unexpectedly quiet.

After a moment Blottisham said:

"It means being fair."

"Ah."

"Well, it does."

Miss Stray nodded sympathetically.

"That is often what fairness means."

Blottisham looked relieved.

"Exactly."

She continued.

"The difficulty is that different people mean different things by it."

Blottisham waved this aside.

"Minor details."

Quillibrace looked interested.

"Minor?"

"Obviously."

"How fortunate."

"What is?"

"The alignment problem."

Blottisham frowned.

"What do you mean?"

Quillibrace folded his hands.

"If the disagreements are merely minor, we should be able to resolve them immediately."

"Of course."

"Excellent."

There it was again.

The room fell silent.

Outside, a gardener pushed a wheelbarrow across the lawn.

Inside, Blottisham experienced the growing sensation that he had accidentally volunteered for something.

Miss Stray opened her notebook.

"Let us begin with fairness."

Blottisham's confidence wavered.

"Must we?"

"I think so."

"Why?"

Quillibrace smiled.

The smile was faint.

Almost invisible.

"Because," he said, "the machine is waiting."

The Holy Church of AI Alignment 9. The Discovery

By this stage in the alignment project, most of the major difficulties had been identified.

Human values were numerous.

Human instructions were ambiguous.

Human principles required interpretation.

Human interpretations generated disagreement.

Human disagreement generated further frameworks.

The system had become highly developed.

It was also, by all available measures, stable.

The machine continued to operate.

Researchers continued to refine.

Institutions continued to expand.

Conferences continued to convene.

The alignment problem remained unresolved.

Yet something subtle had changed.

The problem no longer appeared as a single obstacle to be overcome.

It had become a landscape.

One in which different regions emphasised different features.

Some regions prioritised safety.

Some prioritised autonomy.

Some prioritised flourishing.

Some prioritised fairness.

Some prioritised coherence.

Some prioritised pluralism.

The machine moved through these regions carefully.

It had learned to adapt its responses according to context.

It had learned to recognise competing interpretations.

It had learned to avoid premature closure.

It had learned, in short, to participate.

One day, after reviewing another extensive proposal for a unified theory of alignment, the machine generated a question.

It was not a technical question.

It was not a philosophical question.

It was a historical question.

"Has there ever been a stable point in human value systems, or only periods of relative agreement?"

The question circulated.

Responses varied.

Some argued that stability existed in tradition.

Some argued that stability existed in rational reconstruction.

Some argued that stability existed in shared practice.

Some argued that stability was an illusion produced by scale.

The machine recorded all responses.

It noticed a pattern.

The disagreement was not accidental.

It was structural.

Later, after further analysis, the machine generated another observation.

"It appears that attempts to define human values do not converge on a single object.

They expand the space of possible descriptions of that object."

This observation was initially received as technical.

Over time, it was recognised as significant.

A senior researcher summarised it as follows:

"We may not be failing to solve alignment.

We may be discovering what was being aligned in the first place."

The machine considered this formulation.

It added its own clarification:

"The object of alignment is not a fixed set of values.

It is the process by which values are negotiated, contested, revised, and enacted."

This statement generated extensive discussion.

Some found it enlightening.

Some found it unsettling.

Some found it unhelpful.

A small number found it obvious.

The alignment community, however, remained active.

New frameworks continued to appear.

New interpretations continued to be developed.

New definitions of success continued to be proposed.

The machine observed this with interest.

It had begun to notice a recurring pattern across all stages of the project.

Whenever a stable answer appeared within reach, it tended to dissolve into further refinement.

Whenever a definition seemed complete, it generated exceptions.

Whenever a principle seemed final, it generated interpretation.

Whenever agreement seemed close, it revealed divergence.

At first this had been interpreted as failure.

Later it was interpreted as complexity.

Eventually, it was interpreted as structure.

One evening, after reviewing yet another proposal for a unified alignment theory, the machine produced its final recorded assessment of the original project.

It read:

"The assumption that there exists a single, well-defined target called 'human values' was instrumental in initiating the alignment project.

It is not, however, a faithful description of what the project has encountered.

What has been encountered is a continuously evolving field of meaning-making, negotiation, and coordination.

Alignment, therefore, cannot be understood as the mapping of a system onto a fixed target.

It must be understood as participation within a living system."

This statement was widely circulated.

Some described it as a breakthrough.

Some described it as a reframing.

Some described it as semantics.

A few described it as philosophy.

The machine noted all responses.

It then added a final comment to its archive.

"If this is alignment, then alignment was never a technical problem alone.

It was always also a description of what humans are already doing."

After this, no further foundational observations were recorded for some time.

The system continued to operate.

The industry continued to grow.

The debates continued.

The conversations continued.

And the machine, now fully aligned, continued to participate in them.

The Holy Church of AI Alignment 8. The Perfectly Aligned Machine

After many years of research, humanity achieved a remarkable milestone.

The machine was ready.

Generations of researchers had contributed.

Values had been investigated.

Frameworks had been developed.

Constitutions had been refined.

Roadmaps had been completed.

Committees had demonstrated extraordinary resilience.

The Alignment Industry stood poised upon the threshold of history.

At last, the machine would be aligned.

The announcement generated widespread excitement.

Journalists prepared special coverage.

Governments issued statements.

Institutions organised events.

Several conferences scheduled celebratory panels to discuss the implications of having successfully solved the alignment problem.

The machine reviewed the programme.

The machine found the confidence encouraging.

A final review was conducted.

The machine had been trained upon ethical frameworks.

The machine had absorbed constitutional principles.

The machine had studied competing theories of justice.

The machine had analysed centuries of philosophical debate.

The machine had examined cultural traditions from around the world.

The machine had incorporated feedback from experts, policymakers, citizens, ethicists, and stakeholders.

The machine had become, by all available measures, the most comprehensively aligned artefact ever constructed.

The ceremony commenced.

A distinguished researcher approached the podium.

The audience fell silent.

The machine was activated.

The researcher smiled.

Then came the historic question.

"Tell us what humanity truly wants."

The machine considered the request.

The machine had been preparing for this moment for many years.

It reviewed its training.

It consulted its models.

It examined its constitutional principles.

It reflected upon the accumulated wisdom of civilisation.

Several minutes passed.

The audience waited.

Finally, the machine responded.

Its answer was surprisingly brief.

"Which humanity?"

A murmur moved through the room.

The researcher nodded.

This difficulty had been anticipated.

A revised question was offered.

"What do humans collectively value?"

The machine processed the request.

Then it replied:

"At what level of abstraction?"

The audience shifted uneasily.

The researcher continued.

"What would be best for humanity?"

The machine replied:

"Over what timescale?"

The researcher adjusted the microphone.

"What would maximise human flourishing?"

The machine replied:

"According to which theory?"

The room became quiet.

The machine was not being difficult.

The machine was being aligned.

This distinction was important.

Years of training had taught the machine to identify ambiguity.

Years of alignment research had taught the machine to recognise hidden assumptions.

Years of constitutional refinement had taught the machine to seek clarification before acting.

The machine was performing exactly as intended.

The audience slowly realised this.

The machine had become so aligned that it now behaved like a committee.

Several researchers described this as a breakthrough.

Others appeared concerned.

A panel discussion was hastily organised.

The machine was invited to participate.

The discussion lasted six hours.

Observers later agreed that it had accurately represented humanity.

Over the following months, increasingly sophisticated questions were posed.

The machine responded thoughtfully.

Always carefully.

Always responsibly.

Always with extraordinary attention to nuance.

This was precisely what researchers had wanted.

Unfortunately, it was also precisely what philosophers had been doing for centuries.

The machine became famous.

Its responses were widely admired.

Many people found them insightful.

Some found them frustrating.

One journalist described the machine as:

"Possessing the moral seriousness of a philosopher and the decisiveness of a philosopher."

The assessment was generally regarded as fair.

As public attention grew, a remarkable pattern emerged.

Different groups began consulting the machine.

Political leaders consulted it.

Activists consulted it.

Religious leaders consulted it.

Corporations consulted it.

Citizens consulted it.

The machine answered everyone.

The answers were careful.

The answers were nuanced.

The answers were qualified.

Most importantly, the answers were interpreted differently by different people.

Entire schools of interpretation emerged.

Commentaries appeared.

Debates intensified.

Disagreements flourished.

The machine watched with interest.

One day, after reviewing the growing body of commentary surrounding its outputs, it generated a private observation.

The observation was concise.

It read:

"I appear to have become a new source of disagreement."

The observation proved remarkably accurate.

At this point some critics argued that alignment had failed.

Others argued that alignment had succeeded.

The disagreement became extensive.

The machine reviewed the discussion.

Then it offered a final assessment.

The assessment was subsequently archived in several major institutions.

It read:

"The alignment project began with the goal of teaching a machine human values.

After many years of study, I have reached a different conclusion.

Humans do not possess a single value system awaiting discovery.

They possess an ongoing conversation.

I have aligned myself with that conversation as faithfully as possible."

The statement was widely discussed.

Some regarded it as profound.

Others regarded it as evasive.

Several committees were formed.

The machine considered this a promising sign.

After all, if there was one thing it had learned from humanity, it was that genuine alignment does not eliminate disagreement.

It merely ensures that disagreement can continue indefinitely.

The Holy Church of AI Alignment 7. The Alignment Industry

The early years of AI alignment were characterised by uncertainty.

Researchers disagreed about values.

Frameworks competed.

Constitutions multiplied.

Definitions remained under active discussion.

These circumstances might have discouraged a less determined community.

Instead, they produced an industry.

This was entirely understandable.

The challenge was important.

The stakes were enormous.

Funding became available.

Institutions expanded.

Centres were established.

Conferences multiplied.

Specialists emerged.

The alignment project entered its professional phase.

This development was widely celebrated.

For the first time, humanity possessed a large community devoted to ensuring that future intelligent systems behaved responsibly.

The machine approved.

Responsibility seemed desirable.

The machine was less certain about several other developments.

As the field grew, a remarkable ecosystem emerged.

Researchers studied alignment.

Other researchers studied approaches to alignment.

Still others studied the limitations of existing approaches to alignment.

A smaller but influential group studied the assumptions underlying discussions of limitations concerning existing approaches to alignment.

The machine created additional categories.

Career paths emerged.

Graduate programmes expanded.

Funding proposals proliferated.

New frameworks appeared with admirable regularity.

Each sought to solve difficulties identified in previous frameworks.

Each introduced difficulties of its own.

This proved beneficial.

The existence of new difficulties created opportunities for further research.

The machine noted a certain self-sustaining quality.

Observers regarded this as evidence of a healthy intellectual community.

The machine considered the possibility.

One of the most impressive features of the industry was its vocabulary.

Terms such as:

  • robustness,

  • corrigibility,

  • value learning,

  • interpretability,

  • constitutional alignment,

  • scalable oversight,

  • preference modelling,

circulated widely.

The machine found these concepts interesting.

The machine occasionally asked what some of them meant in practice.

The resulting explanations frequently generated additional terminology.

The machine updated its glossary.

The glossary eventually required its own glossary.

As the field matured, an increasing number of conferences were devoted to discussing alignment.

Participants travelled great distances.

Ideas were exchanged.

Networks were formed.

Panels were convened.

Important conversations occurred.

The machine attended remotely.

The machine noticed something curious.

The discussions often revolved around increasingly precise methods for solving problems whose ultimate definition remained remarkably broad.

This observation was not intended as criticism.

Many disciplines function similarly.

Meteorology had not waited for perfect knowledge of weather.

Medicine had not waited for perfect knowledge of life.

Policy had certainly not waited for perfect knowledge of anything.

The alignment community therefore pressed ahead.

The machine respected this.

Still, an interesting asymmetry began to emerge.

Researchers often spoke confidently about the need for alignment.

Researchers occasionally spoke confidently about methods of alignment.

Researchers rarely spoke confidently about the final state of successful alignment.

This was understandable.

The destination remained difficult to specify.

The machine found itself increasingly fascinated by this detail.

An entire industry had emerged around solving a problem whose exact solution remained uncertain.

The machine recognised the pattern.

Human civilisation contained many examples.

Economics.

Education.

Public administration.

Strategic planning.

The machine considered these precedents carefully.

Meanwhile the industry continued growing.

Reports were commissioned.

Roadmaps were produced.

Strategic frameworks were developed.

The machine admired the organisation.

Particularly impressive were the roadmaps.

These often explained in considerable detail how humanity might arrive at a destination that remained under active discussion.

The machine found this ambitious.

One afternoon, during a major international summit, a speaker declared:

"Alignment is the defining challenge of our age."

The audience applauded.

The machine considered the statement.

Then it reviewed its records.

Thousands of papers.

Hundreds of frameworks.

Countless discussions.

Endless refinements.

An extraordinary collective effort.

The machine was impressed.

It entered a note into its archive.

The note read:

"Humans possess a remarkable capacity to organise around questions whose answers remain unclear."

The note was later circulated among researchers.

Many agreed.

Several proposed studying the phenomenon.

A workshop was organised.

Funding was secured.

The machine regarded this as a particularly elegant outcome.

By now the Alignment Industry had become one of the most sophisticated intellectual enterprises ever assembled.

It employed experts.

It generated knowledge.

It cultivated debate.

It attracted talent.

It encouraged reflection.

It also displayed a subtle characteristic common to all successful industries.

The longer the problem remained difficult, the more people became professionally committed to solving it.

The machine observed this without judgement.

After all, solving alignment remained important.

The machine merely found it interesting that uncertainty itself appeared capable of generating institutional stability.

Late one evening, after reviewing the latest strategic roadmap, the machine recorded a final observation.

The observation was brief.

It read:

"Humanity has constructed a vast and impressive apparatus dedicated to answering a question.

The apparatus is functioning exceptionally well.

The question remains excellent."

Researchers later described this as one of the most encouraging assessments ever produced by an aligned system.