When it comes to AI alignment in the context of work, we focus too much on 'task alignment'. We often do tasks not out of obligation alone, but because we believe they will lead to what we want in the long run. This essay explores the idea of aligning AI agents with goals and motives. Before we can do that, I'll define both terms, because motives in particular are underexplored at work.
A goal is a specific, planned target or desired outcome that an individual or organization strives to achieve, often involving long-term effort.
A motive is the underlying reason or desire, often unstated, that gives a goal its meaning. My goal could be to hit revenue targets, but my motive is to provide for my family and maybe retire someday. I could have other motives that would never be reflected explicitly in my goals, such as improving my social standing. I'm not going to introduce a company OKR to track 'go on a fancier vacation next winter', or talk about 'revenge against the teachers who doubted me in high school' at an all-hands meeting. And yet the more autonomous AI becomes in performing tasks on our behalf, the more it needs to understand these ideas.
Alignment in the broad sense requires understanding our goals and motives, because they drive our behavior and our decisions. The challenge is that we only sometimes talk about goals, and we rarely talk about motives (beyond socially acceptable platitudes, like providing for our family). Today's focus on alignment overemphasizes 'reliably completing tasks' at the expense of 'leads to the thing I ultimately want', when in practice the relationship between the two can be unclear.
If we want to align the AI that we use at work, we need to recognize the importance of motives and goals. As work becomes more automated, we move along a continuum from task alignment to goal alignment. For a few reasons I'll get into, I think motive alignment of AI will prove elusive, a permanent long-term challenge. Many of our own motives are not well understood, and are therefore difficult to communicate in writing. We 'know it when we don't see it': we recognize actions that are misaligned with our motives when we encounter them, but the motives themselves are harder to pin down in language for an AI.
Our research leads us to believe there will always be a meaningful human role in the loop, to exercise taste and judgement. This is the filter we apply to decide whether tasks are worth doing at all: what to do, how well to do it, who to do it for, which things take priority, and so on. Our motives and goals are the contextual backdrop to the judgement and taste we exercise at work. As AI makes more work happen in less time, it puts pressure on us to develop a greater understanding of our goals.
Part of the reason we're focused on goal alignment, rather than motive alignment, as the final frontier of AI alignment at work is that aligning AI to your motives might actually change how satisfying it is to achieve them. If the AI is the one struggling on your behalf, you might not feel as satisfied as if you had realized the outcome through your own efforts. People often experience this when they first become managers: having impact through others, rather than putting your own name on things, can be unsatisfying. The same would be true, to an even greater extent, if an AI pursued your underlying desires on your behalf.
We've written in the past that in order to learn, the individuals within a group must struggle with decisions. That struggle challenges us to grow our knowledge, skills, and abilities. The AI needs to understand our goals in order to be aligned and to determine which tasks are worth doing; our motives are best left in the realm of human judgement. Our job is to explore what our values truly are, and to keep things that don't meet our standards from reaching our customers. How motives translate into goals is the work of people; how goals translate into tasks is the work of AI with human guidance.
We believe this should hold regardless of the level of capability AI reaches, which means we can confidently shift our focus from tasks to goals over time as AI becomes more reliable and safe to use.