The man the California Gold Rush of the 1840s made richer than all others was a man who wasn't looking for gold at all. Samuel Brannan quickly became one of the richest men in the valley by selling shovels and wheelbarrows to the thousands who rushed to the west coast to pan for gold. His store at Sutter's Fort was the only store between San Francisco and the goldfields. It was a stroke of genius.
Fast forward to today. The old goldfields may have dried up in San Francisco, but there's plenty of gold still going around. The rush for gold deposits has been replaced by the rush to mine and use today's most valuable resource: data. The modern data stack has evolved to proactively uncover opportunities, risks, and insights from large data lakes at blazing fast speeds. The lowering of barriers to entry and improved end-user accessibility have opened the doors of this new age gold rush to anyone with a laptop and an internet connection.
The question is no longer one of finding relevant data. It is one of plumbing, and whether your data is flowing into the right places. And here, the hottest companies are the ones selling the piping. Brannan's store at Sutter's Fort, between San Francisco and the goldfields, has taken on the shape of one of the fastest-growing data integration platforms in the world: Airbyte.
Lending and financial services for immigrants, teleoperated autonomous vehicles, tax-accounting-banking services for the gig economy, a customer portal for exchanging datasets, a tool that hacks around adblockers to reveal 100% of customer data... nope, not a Twitter thread that starts with "💡 Startup idea".
These are a few of the many ideas that Frenchmen Michel Tricot and John Lafleur 'invalidated' on their way to Airbyte.
In the process, the pre-product entrants to YC met with 45 data companies and spotted flaws with data integration - the part of the modern data stack that deals with moving data across the many siloes in which it's generated. Despite established ETL/ELT (extract, load 🔁 transform) solutions, data engineers were still spending considerable hours building and maintaining connectors (the pipes of the data world).
In just 3 months, the team put together an MVP with 6 connectors.
Today, the open-source product is a prominent data plumber competing against the likes of Fivetran and Stitch, offering:
- Extract and load: Replicate data from API sources, databases, and files using pre-built and custom connectors (more about this later 🤫)
- Transform: Convert data into friendly formats
- Embed: plug and play integrations for your product
- Orchestrate, monitor, and debug deployments
and has grown into a $1.5B+ company used by 10,000+ customers, raising $180M+ across 3 rounds in 1 year 🤯 from marquee investors like Benchmark and Accel!
But wait, why are the VCs so excited for a new entrant in an already established market? 🤔
∞ The long tail of data integration
The key differentiators for Airbyte are rooted in the flaws that the founders spotted in incumbent solutions:
- Maintaining connectors was difficult, which limited the number of pre-built connectors that existing solutions offered.
- Pre-built connectors couldn't handle custom requirements.
- Cloud-averse customers and use cases didn't want to send their data to the cloud.
- Volume-based pricing could send invoices haywire.
The solution? An open-source data integration platform.
Airbyte offers open-source pre-built & customizable connectors that can be self-hosted, and a Connector Development Kit (CDK) that enables open-source contributors to build and edit connectors for a long tail of use cases while maintaining development standards.
The team is building towards 1000+ connectors by the end of 2022 (as opposed to the 150-170 offered by the competition) by incentivizing contributors under a "participative model", where contributors build and maintain connectors to defined service levels in exchange for shared revenue.
As one would guess, success for Airbyte, like any other open-source project, rests in the hands of an engaged contributor community...
👨🏻💻 "Made by engineers for engineers" 👨🏻💻
With ETL/ELT being the core use case, Airbyte's ICP comprises data analysts (consumers) & data engineers. The overlap between future customers & contributors makes the community a must-win battle for the company. And boy, are they killing it! 🔥
'Deploy Airbyte now': Airbyte's connector codebase is completely open-source under the MIT license. Anyone who is interested has free and complete access to connectors, the CDK, integrations, APIs, and Slack support. Contributors are encouraged to contribute to:
- the connector codebase
- documentation - comprehensive and up-to-date
- community content - tutorials and guides
With 5.7k+ stars on GitHub and 500+ contributors, it’s safe to say that the OSS community has received Airbyte with open arms. 🤗
Within Airbyte, we define “user success” as our team’s focus to help our users be successful in whatever project they want to build around data, whether it be with Airbyte or another tool. We believe the best way to build trust with our community is by aligning our goals and incentives with theirs; we want them to know we have their back and always will. - John Lafleur
🗣 Join the conversation: Airbyte boasts the largest Slack channel around data engineering, with 4,000+ members. The Slack channel is a very active medium for solving user queries, sourcing feedback, and facilitating conversation within the community, a feat that was achieved by:
- Defining and chasing user success metrics: Time to value, Time to resolution.
- Dedicated community, outreach, and support resources: User success engineer, developer advocates, and cloud marketing managers.
- Community engagement processes and tools like Orbit.
Airbyte's product roadmap is public and always welcomes feature requests and feedback. The technical support vertical, led by Erica Struthers, and the demand generation vertical, led by Edward Farraye, are the pillars behind this success.
Events: The Connector contest, the Airbyte community call, conferences, webinars, and office hours are community events that acquire and retain members of the community. This vertical is led by Chris Rose.
Marketplace and ecosystem: In line with the participative model, Airbyte hosts a connector marketplace where contributors can list connectors, and is building a self-sustained ecosystem around open-source data integration.
📈 Bottom-up adoption
Airbyte employs a "differentiated hosting" open-source business model: it charges customers for managed hosting bundled with additional features, which include a GUI workspace for managing connectors (data consumers rejoice 🥳). These core features are covered by a separate, non-resalable license, the 'Elastic License v2'.
Regarding monetization, our philosophy is the following: if it helps individual contributors or small teams, then it should be free and open-source; if it serves an organization’s needs, then it should be monetized.
Free open-source adoption, driven by the community, combines with demand-gen efforts (content such as long-tail keyword SEO and data engineering blogs, plus digital acquisition) to drive user acquisition. A frictionless adoption process, owned by Growth Product Manager Natalie Kwong, then works to convert free-trial users to Airbyte Cloud via a self-serve flow.
Removing price barriers and reducing time to value become the key levers to lift users over the aha moment wall and into the promised land. For Airbyte, this aha moment is the first time a user syncs a replication instance using Airbyte Cloud.
🥘 Recipes, demos, and onboarding guides
Pre-sign up, users can take a tour of the dashboard via a live demo, and once in, a new user is greeted by the cutest guide in the business 😍. Octavia Squidington III ensures that you are well versed with your dashboard workflows with non-intrusive and timely tips.
Early activation is further strengthened by Recipes and How-to guides on the infinite possibilities of pre-built connectors, urging users to make that holy first sync! ⭐️
💸 Free trial followed by simple and transparent pricing
30 days & $1000 free credits are at the disposal of a new user exploring the product. Bye-bye price barrier 👋🏻
For activated users planning to stick around, Airbyte offers a category-disrupting pricing model. Because Airbyte offers both build and buy options, a cost-based pricing model was absolutely necessary for Airbyte Cloud (users can compare the cost of buying vs building using Airbyte). Speaking the language most familiar to data professionals (thanks to all those Snowflake invoices 🙈), the company identified 'compute time' as the unit metric for cost calculation. A derived unit called a "credit" converts compute time across data sources, and the cloud product is priced linearly on credits consumed.
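To make the arithmetic concrete, here is a minimal sketch of how a linear, credit-based bill could be computed. Every number and name in it (the per-source conversion rates, the price per credit, `SyncUsage`, `credits_consumed`, `monthly_cost_usd`) is a hypothetical placeholder for illustration, not Airbyte's actual rate card or API.

```python
from dataclasses import dataclass

# Hypothetical conversion rates: credits charged per hour of compute,
# by source type. Illustrative values only, not Airbyte's real rates.
CREDITS_PER_COMPUTE_HOUR = {"api": 1.0, "database": 0.5, "file": 0.5}

# Hypothetical flat price of one credit, in USD.
PRICE_PER_CREDIT_USD = 2.50


@dataclass
class SyncUsage:
    source_type: str      # key into CREDITS_PER_COMPUTE_HOUR
    compute_hours: float  # compute time spent syncing this source


def credits_consumed(usage: list[SyncUsage]) -> float:
    """Convert compute time across sources into one common unit: credits."""
    return sum(
        CREDITS_PER_COMPUTE_HOUR[u.source_type] * u.compute_hours for u in usage
    )


def monthly_cost_usd(usage: list[SyncUsage]) -> float:
    """Linear pricing: the bill is credits consumed times a flat price."""
    return credits_consumed(usage) * PRICE_PER_CREDIT_USD


usage = [SyncUsage("api", 10), SyncUsage("database", 20)]
print(credits_consumed(usage), monthly_cost_usd(usage))
```

Because the price per credit is flat, doubling compute time doubles the bill, which is what lets a user line the number up against the cost of building and running the same pipelines in-house.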
💰 Product-led sales
Linear pricing stops making sense for large customers needing replication of 1TB+ data a month. Airbyte recognizes this and attends to these enterprise clients through a sales process.
User behavior across the free trial is monitored to identify product-qualified leads that have hit the aha moment (albeit only with one criterion). These users are further segmented by credit usage, and larger accounts (>$10k) are passed to Adam Knobel's sales team. The budding team (actively hiring for a growth account executive) will own discovery, qualification, assessment, negotiation, and closed-won for BANT (budget, authority, need, timeline) qualified leads.
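The routing described above can be sketched as a small rule. This is an illustration under assumptions, not Airbyte's actual system: `TrialAccount`, `route`, and the segment labels are hypothetical names, and since the text does not say what period or measure the ">$10k" figure covers, the sketch simply treats it as the account's credit spend in USD.

```python
from dataclasses import dataclass

# Threshold taken from the text's ">$10k"; what exactly it measures
# (period, actual vs projected spend) is an assumption here.
SALES_THRESHOLD_USD = 10_000


@dataclass
class TrialAccount:
    name: str
    completed_first_sync: bool  # the single aha-moment criterion
    credit_spend_usd: float     # assumed measure of credit usage


def route(account: TrialAccount) -> str:
    """Return a hypothetical segment label for a free-trial account."""
    if not account.completed_first_sync:
        return "nurture"      # not yet a product-qualified lead
    if account.credit_spend_usd > SALES_THRESHOLD_USD:
        return "sales"        # larger account, handed to the sales team
    return "self_serve"       # qualified, stays in the self-serve flow


print(route(TrialAccount("acme", True, 25_000)))
```

Checking the aha moment before the spend threshold keeps heavy but unactivated accounts out of the sales queue, which matches the text's order of qualifying first and segmenting second.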
⏩ Next steps - become the standard for modern data stacks
Flush with VC capital, the Airbyte team has big plans for 2022 🤩 The north star for the year of the Tiger is to provide the most reliable and ubiquitous platform for moving data, and the KRs to watch out for include -
👫 Team - Increase headcount from 40 to 300 without compromising talent density
🤝 Connectors - Increase connector count from ~170 to 1,000 via the participative model
🔁 New data movements - Having built a solid contributor base around ETL/ELT, Airbyte will expand into new data movements: reverse ETL and event streaming.
💰 Revenue, deployments, and customer counts - Grow, grow, grow, by building a strong product-led self-serve funnel assisted by sales.
If 2021 is a yardstick for what this amazing team can achieve, the ascent to become the standard for the modern data stack is imminent and we are super excited to watch it unfold 🥳🚀