Receiver experience

Read the question, speak the answer.

The fastest way to ask is to write. The fastest way to answer is to talk. HeySpeak combines both into a 60-second exchange that beats forms, calls, and email threads.

Try it on a real question

5 free responses, no credit card

Why the read-and-speak pattern works

Reading is the fastest way to load a question into your head. The words stay on screen the whole time, so nothing has to be remembered. Speaking is the fastest way to get an answer out, around four times faster than thumb-typing on a phone. Reading-in plus speaking-out is the lowest-friction shape asynchronous communication can take.

250 wpm

average silent reading speed for adults

150 wpm

average speaking speed in conversation

40 wpm

average mobile typing speed

Why reading and speaking together beat typing alone

Form-based feedback pairs a fast step with a slow one: read the question, then translate the answer into typed characters on a glass screen. Typing on a phone is the slowest input a literate adult uses in a normal week. It is also the most effortful, which is why so many forms get abandoned halfway.

A phone call removes the typing problem but introduces three new ones. Both people need to be free at the same time. The question has to be held in memory. And once the call starts, hanging up after 90 seconds feels rude, so the call drags on. That is why so many calendars are full and so few decisions get made.

Read-and-speak keeps the best half of each. The question stays on screen as a visual anchor, so there is no memory load. The answer goes out through the mouth, the channel humans were built for. The recipient stays in control of when they answer, and the sender gets a recording and a transcript they can scan in seconds.

The cognitive math behind it

The speeds are not close. Adults read silently at around 250 words per minute. They speak in conversation at around 150 words per minute. They type on a mobile keyboard at around 40 words per minute. The asymmetry is the whole point.

A 30-word question takes about 7 seconds to read. A 200-word answer takes about 80 seconds to speak. The same answer, typed on a phone, takes about 5 minutes and feels like a chore halfway through. That extra effort is why typed responses are shorter, more guarded, and more likely to be abandoned. The voice answer is longer and more honest because the friction never spiked.
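For anyone who wants to check the arithmetic, here is a minimal sketch in Python. It uses only the rates and word counts quoted above; the function and variable names are illustrative and not part of HeySpeak.

```python
# Rates quoted in the text, in words per minute.
READ_WPM = 250
SPEAK_WPM = 150
TYPE_WPM = 40


def seconds_for(words: int, wpm: int) -> float:
    """Seconds needed to get through `words` at `wpm` words per minute."""
    return words / wpm * 60


question_words = 30   # length of the written question
answer_words = 200    # length of the answer, spoken or typed

read_s = seconds_for(question_words, READ_WPM)
speak_s = seconds_for(answer_words, SPEAK_WPM)
type_s = seconds_for(answer_words, TYPE_WPM)

print(f"read the question: {read_s:.1f} s")
print(f"speak the answer:  {speak_s:.0f} s")
print(f"type the answer:   {type_s / 60:.0f} min")
print(f"speaking vs typing: {SPEAK_WPM / TYPE_WPM:.2f}x")
```

Swap in your own question and answer lengths to see how the gap changes. The ratio between speaking and typing stays the same at any length, so the absolute time saved grows with the answer.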

There is a second effect that does not show up in raw words per minute. Speaking bypasses the editor. People say what they mean before they have time to soften it. Typed answers go through a second pass before the send button gets pressed, and the second pass is where the useful detail gets sanded off.

Where the magic compounds: going asynchronous

The read-and-speak pattern would still be useful in a live setting, but it gets dramatically better when nothing has to happen at the same time. The sender writes the question once. The receiver answers when their morning is calm enough. The sender listens to the answer when their own morning is calm enough. Three separate windows of attention, no calendar collision.

The transcription handles the rest. By the time the sender opens the dashboard, the answer is already text, already summarised, already searchable. Skimming the summary of a 90-second answer takes about 12 seconds. That is the part that lets one person actually process feedback from twenty different respondents in a single sitting, instead of booking twenty calls and burning a week.

When voice is the wrong choice

Read-and-speak is not for every situation. If your audience is hearing-impaired, voice cuts them out of the conversation. A text-based form is the right call there, full stop. HeySpeak should not be the only option you offer if accessibility is part of the brief.

Voice also struggles in some physical contexts. People on a quiet commuter train, in an open-plan office, or in a library will not record. The Magic Link handles this by offering a second option, booking a call, but expect slower responses from audiences who spend most of their day in those settings. Knowledge workers in offices tend to reply later in the day, from home.

And some people just do not like recording themselves. The split across HeySpeak campaigns runs about 70 to 80 percent voice and 20 to 30 percent call bookings. Plan for both. The point of the pattern is not to force voice on anyone. It is to remove the typing tax for the majority who would rather skip it.

See the pattern in real use cases

Two pages that show what read-and-speak looks like in the wild.

Playbook

Voice customer feedback: the why behind the score

NPS gives you a number. Voice gives you the reason. How to collect honest feedback in 60 seconds.

Use case

Event reviews and venue feedback via QR code

A QR code on the table or at the exit. Customers scan, speak, submit. Reviews land in the dashboard before the night is over.

Common questions

Is voice feedback actually faster than typing?

Yes, by a wide margin. Most people type around 40 words per minute on a phone keyboard. The same people speak around 150 words per minute without effort. A 90-second voice note carries roughly the same content as 6 minutes of typing. The difference is also cognitive: speaking pulls answers out before the internal editor kicks in, which is exactly the version you want.

What if the recipient is in a quiet place and cannot speak?

The receiver page always offers a second option: book a call. If someone is on a quiet train, in a library, or just does not want to talk out loud, they pick a slot from your calendar instead. The choice is theirs. Some respondents also wait until they get home and answer there. The Magic Link does not expire after one visit.

Why not just ask people to record a video instead?

Video adds friction without adding signal. The recipient has to think about lighting, framing, and what they look like. That selfie-camera self-consciousness is exactly the editor you want to switch off. Voice keeps the eyes on the question and the brain on the answer. You get a more honest response and a smaller file to deal with.

Does the transcription work for accents and non-native speakers?

HeySpeak uses Voxtral by Mistral AI, with OpenAI Whisper as a fallback. Both handle accented English and most major European languages well. Transcripts are not perfect, but they are good enough to skim. The original audio stays in the dashboard, so if a transcript looks off, you click play and listen to the source.
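For the technically curious, a primary-plus-fallback setup can be sketched roughly as below. This is an illustration of the general pattern only, not HeySpeak's actual code: the function names are hypothetical, and the two engines are stand-in stubs that do not call Voxtral or Whisper.

```python
from typing import Callable, Optional, Sequence, Tuple

# A transcriber takes raw audio bytes and returns text (hypothetical shape).
Transcriber = Callable[[bytes], str]


def transcribe_with_fallback(
    audio: bytes, engines: Sequence[Tuple[str, Transcriber]]
) -> Tuple[str, str]:
    """Try each engine in order; return (engine_name, transcript) from the first success."""
    last_error: Optional[Exception] = None
    for name, engine in engines:
        try:
            text = engine(audio)
        except Exception as err:  # engine unavailable or failed: try the next one
            last_error = err
            continue
        if text.strip():  # treat an empty transcript as a failure too
            return name, text
    raise RuntimeError("all transcription engines failed") from last_error


# Stand-in stubs for illustration; real engines would call a speech-to-text service.
def primary_stub(audio: bytes) -> str:
    raise TimeoutError("primary engine unavailable")


def fallback_stub(audio: bytes) -> str:
    return "The onboarding was fine, pricing was the confusing part."


engine_used, transcript = transcribe_with_fallback(
    b"\x00\x01", [("primary", primary_stub), ("fallback", fallback_stub)]
)
print(engine_used, "->", transcript)
```

Whichever engine produces the text, the original audio remains the source of truth when a transcript looks off.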

What about people who hate voice notes?

Some people will never record one, and that is fine. They click the second button and book a call instead. Across HeySpeak campaigns, the split is roughly 70 to 80 percent voice and 20 to 30 percent call bookings. Both are useful: a voice note in 60 seconds, or a 15-minute call you would have had anyway, but with the right person.

Can the receiver re-listen to their own answer before submitting?

Yes. After recording, the receiver can play the answer back, re-record if they want, and only then submit. This matters for completion rates. People who are unsure whether they sounded clear are more likely to abandon. Letting them check first removes that worry.

Send one question. Hear the answer in your inbox.

Five free responses to start. Setup takes 2 minutes. No credit card.

Create your first Magic Link