On Voice
Or, why we are not asking you to write
For Mother’s Day one year, I gave my mother a leather notebook. It was one of those nice Moleskine ones, the kind that comes wrapped in plastic. On the accompanying card, I wrote something about how I wanted her to write down the stories and thoughts she had, whether from the past or the present. The big things, the little things. The things about her mother. The things I knew I’d want to read 50 years from now, hunched over a set of boxes in the living room, when I was feeling sentimental. She thanked me. She said she would. She put it on a shelf of the bookcase in her room.
It is still there. As far as I know, it is empty.
I do not blame her. It was a bad gift, in the specific way a gift can be bad when it asks too much of someone. I have not written my stories down either. Almost nobody has. The notebook on my mother’s shelf is the same notebook on a million shelves, in a million rooms, waiting for a Tuesday afternoon that does not come.
The blank page is not neutral. It is hostile. It asks for a sentence before you have one. It asks for a shape before you know what you are doing. It asks you to decide how you feel before you have finished feeling it. Writers spend their whole lives learning to push past this and still describe the act of starting as something close to pain. My mother is not a writer. She raised two children. She ran a household. The idea that she would also, in her spare hours, compose prose about her childhood was always one of the polite fictions we tell ourselves about our parents.
She was never going to fill the notebook.
Most of what our parents and grandparents know about themselves dies because the cost of getting it out is just slightly higher than the cost of leaving it in. That is the whole thing. It is not laziness. It is not a lack of love. It is not the particular failure of your family or mine. It is a small tax the page charges on every story it touches, and the tax is too high for most people, most of the time.
But talking costs almost nothing. You sit. You talk. You are done. Someone asks you about your father and you are three sentences in before you notice you have begun. No margin. No cursor. No spelling. Nothing to delete. The story is already arriving.
We all know this. Your mom will call you on a Sunday for no reason and end up, somewhere in the middle of it, telling you about her mother’s restaurant. The porridge they served. The way she ate it every day after school, and how she could bring as many friends as she wanted for a free snack, which made her the most popular girl in school. These are exactly the things we wanted in the notebook, and they come out because the phone is an excuse to talk, not a demand to write.
We have been handing notebooks to people who needed phone calls.
And yet voice, on its own, was never quite enough. Not because the talking failed. Because nothing survived it.
My grandfather was recorded once, by a cousin, at a family gathering in the nineties. Three hours of him. The tape exists somewhere. I have never seen it, and neither, as far as I know, has anyone else. This is what raw voice has always been: not a story but a box. Hours of audio with no shape, no way in, no moment marked as the one that matters. You cannot hand it down because no one can bear to sit through it, and so it becomes one more thing the family feels guilty about and never plays. The talking was real. It just had nowhere to go.
So voice was always the right medium. That was never the question. The question was what could be done with it once it left the room, and for all of human history the answer was: almost nothing. You could keep it. You could not use it.
That is the part that is changing.
What AI adds is not a better recorder. The cousin with the camcorder was already a recorder. What it adds is the other half of the conversation. Something that asks the next question, that hears the offhand sentence and says wait, go back to that, that draws the story out the way a good listener does at a kitchen table. The whole thing rests on that word. Conversation. Not capture, not transcription. And once those conversations are stored, they can be put to use: turned into written stories, images, even videos. What we make from them is limited mostly by our imagination.
I want to be careful here, because the temptation to oversell what AI can do is enormous, and the cost of overselling is that people stop believing you. So let me name the part that is broken.
The hardest problem in voice AI is not the listening. It is the interrupting. When two people talk, they read each other’s micro-pauses. The small breath that means I am not done. The falling tone that means your turn. Dozens of times a minute, with a fluency they never had to learn. The machine cannot do this yet. My mother pauses in the middle of a story because the next sentence is the one that matters and she is reaching for it, and the model hears the pause, assumes she is finished, and jumps in. The sentence is gone. The interruption has reminded her she is talking to a machine, and the spell that voice was supposed to create has broken in her hand. The technical name for this is turn detection, and it is, today, unsolved.
But it is clearly solvable. There is no law of physics in the way. Only a hard problem with a clear training signal, and labs making real progress on it every quarter. I have watched these models improve, in real time, for two years. The gap between where they were and where they are is larger than the gap between where they are and where they need to be. AI video went from the sloppy 2023 clips of Will Smith eating spaghetti to scenes that could pass for a $500 million blockbuster. The awkward interruptions will go the way of tape hiss, a thing you notice only in old recordings. Not yet. But soon.
I do not think the right response is to wait for the technology to finish. The people whose stories you want are not waiting. My grandmother did not wait. Yours probably will not. The time you have with the people you love is the time you have. The tools are imperfect, and they are good enough, and good enough is what you have.
The notebook on my mother’s shelf is not going to fill itself. It is not going to fill at all. But last week she sat in her kitchen and talked on Ember, for ten minutes, about her mother’s hands, and the street she grew up on in Jeonju, and the smell of the lunchbox her father used to carry. She did not have to find a sentence. She did not have to find a shape. She just talked.
We have her on tape. We have her in writing. We have her in something that will outlast both of us.
We did not ask her to write.
We asked her to talk.


