Speech-to-text transcription has historically been fraught with hilarious and sometimes controversial mishaps. Controversial among linguists, at least.
Neil Armstrong, by his own admission, fudged his first words on the moon when he said, “That's one small step for man, one giant leap for mankind.” The official NASA transcripts now reflect the corrected statement instead, with the missing ‘a’ attributed to “static in the transmission.”
“That's one small step for a man, one giant leap for mankind.”
Linguists care less about the hilarious ones.
He drank so much, he fell into a B̶o̶l̶i̶v̶i̶a̶n̶ Oblivion.
The therapist taught me to be m̶i̶c̶e̶ e̶l̶f̶ myself.
The parcel was secured by a g̶r̶e̶a̶t̶ a̶p̶e̶ grey tape.
Speech transcription is tricky, as NASA showed, even with humans tuned in to the transmission. Today, transcriptions of NASA’s space-to-ground transmissions are powered instead by a company born in… (there’s no guessing this one) one of the quietest places on planet Earth. Deep underground, and next to the site identified for building the tallest dam in the world.
If you haven’t heard of Deepgram yet, one way or the other, you’re about to.
Started underground now we here 🌏
What happens when you put two particle physicists 2 miles underground in a government-controlled region of China?
Bleeding-edge multilingual end-to-end automatic speech recognition and real-time transcription, apparently.
Back in 2015, Deepgram founders Scott Stephenson and Noah Shutty were building dark matter detectors in tunnels deep underground when they turned to AI to find particle events and pinpoint valuable timestamps in their recordings. The duo turned to the Microsofts, Amazons, and Googles of the world for a tool that would help them organize and understand their “life logs” underground and found that the tech just did not exist at the time.
So they built it themselves.
“We decided to build one using the same AI we were using to find dark matter particle events, and that was how Deepgram was created.” - Scott Stephenson, CEO & Co-founder
The duo raised $1.8M in financing from Metamorphic Ventures and Y Combinator and took their underground side project upmarket to the enterprise. And it doesn’t get more “enterprise” than NASA, Spotify, Auth0, and Citi - just some of the logos Deepgram works with today.
What is Deepgram?
A majority (~80%) of enterprise data still lives in voice. The signal-to-noise ratio in that data is low, and access to insights is limited. The Googles, Amazons, and Microsofts of the world all have general models that are, in the words of the Deepgram CEO, “the jack of all trades and master of none.”
Deepgram’s API-based speech AI helps developers leverage speech-recognition models that can transcribe AND understand industry-specific brands, jargon, accents, and languages to adapt to challenging audio environments - like NASA’s comms between Mission Control and the International Space Station, for example.
On November 26, Deepgram announced the second tranche of their Series B round, which brought their total Series B raise to a whopping $72M - the largest Series B raise by a speech AI company ever.
This round extends the Series B that Tiger Global led back in February 2021. The extension itself was led by Madrona Venture Group, with participation from Alkeon Capital Management and Citi Ventures.
What did Deepgram do differently? Let’s take a peek under the hood of the hottest speech AI tool in town.
Technical Bedtime 🛏
Six years ago, Deepgram threw the old way of doing speech recognition out of the window and replaced it with an end-to-end deep learning approach, fundamentally altering the voice intelligence industry.
The voice revolution didn’t happen overnight, though.
It took the team 2 years to put the “tech risk” behind them and emerge with a product that had up to 30% higher speech recognition accuracy compared with industry baselines while speeding up transcription 200 times AND adding on the ability to handle thousands of simultaneous audio streams.
Deepgram scored better than the incumbents on accuracy, latency, and scalability by the time they came out of what Scott calls “technical bedtime” and strapped on GTM afterburners.
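For the curious: accuracy figures like the ones above are commonly scored with word error rate (WER), the number of word substitutions, insertions, and deletions needed to turn a transcript into the reference, divided by the number of reference words. Which metric sits behind the 30% figure isn't stated here, so treat this as an illustrative sketch rather than Deepgram's own benchmark. The function name `word_error_rate` is a hypothetical helper, and the sample sentence is the "grey tape" mishap from earlier.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edits needed to turn the reference words seen so far
    # into the first j hypothesis words (classic Levenshtein, row by row).
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


reference = "the parcel was secured by a grey tape"
hypothesis = "the parcel was secured by a great ape"
print(word_error_rate(reference, hypothesis))  # 2 wrong words out of 8 -> 0.25
```

Two misheard words in an eight-word sentence is a 25% error rate, which is why small per-word gains add up quickly over hours of audio.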
Why was this a masterstroke? Improving existing tech would have left them with low gross margins, but building fundamentally new tech that brought a step-function change in accuracy to speech recognition left their competition in the dust.
Next up - GTM.
How to win developers and influence relationships 👨🏻‍💻
Deepgram’s GTM motion takes a multi-pronged approach to reaching their Ideal Customer Profile (ICP): developers.
The DevRel team led by Michael Jolley evangelizes Deepgram through a near-exhaustive list of developer watering holes:
- Conferences and hackathons
- Twitch developer forums
- GitHub Discussions
- Twitter Spaces
- Stack Overflow
The DevRel team channels the community to showcase demos & apps built using Deepgram. This is where DevRel meets influencer marketing: the team takes the concept of the case study and pumps it full of steroids. “Built With Deepgram” is a goldmine of speech AI use cases such as “the hoodie that subtitles everything you say,” “coding a website using only your voice,” and the “LED dress that turns you into the Disney princess you are singing.”
Step 1: Land
Although the GTM is aimed mostly at large enterprise customers, Deepgram’s adoption within organizations is largely bottom-up, through developer teams. The Deepgram masterclass in bottom-up, product-led adoption begins seconds after signup.
The Land: The “Time to Value” (TTV) optimizations kick in as soon as you sign up.
Signing up presents you with a gamified, mission-driven onboarding ladder with a leaderboard and a points system. Expect your race to the aha moment to be packed with sweet dopamine hits.
Step 2: Expand
Expansions are product-led and powered through virality mechanisms. Invite other developers from your team to the Deepgram experience, and before you know it, your whole team’s heard of the new funky speech transcription tool with industry-leading accuracy, latency, and throughput.
Step 3: Embed
What starts with $150 worth of free delight credits granted to your account (WAY more than what’s required for you and your team to experience value) ends with proof of value to entire developer teams working on speech AI.
Once primed by the organic expansion motion, teams are met with the enterprise nuance of Deepgram’s sales team (led by Chris Dyer - VP of Sales) and sales-assist teams.
Coming from a world of particle physics with no GTM background, CEO Scott Stephenson learned very quickly that in a crowded market, the technically superior product doesn’t always win.
With the strong tailwinds of Product-Led Growth at their backs (especially among developer communities), Deepgram now has the go-to-market steroids its superior tech deserves.
On the back of their Series B round, Deepgram hinted at the next frontier for the team: speech understanding. The “how” and “why” behind what was said, encompassing:
- Smart formatting - Detection of smart text, including phone numbers, emails, and addresses
- Replacement - Keeping private information private, fluid translations, and accessibility improvements
- Identification - Registering speaker changes, identifying important keywords and phrases
- Analysis - Actionable summaries, intents, and topics
A $72M Series B round in a market where Amazon’s Alexa division is reportedly on course to accrue a $10B loss this year, and Google is rumored to be trimming its Google Assistant division, is a testament to Deepgram’s technical mettle and foundational importance.
The speech-recognition market is estimated to be worth $48.8 billion by the end of the decade. It should come as no surprise if, by then, a chunk of that nearly $50 billion cake is taken home by the team that gave up the search for dark matter to do this.