How Estuary helps companies leverage historical and real-time data pipelines

It is often said that the world’s most valuable resource today is data, given the role it plays in driving all kinds of business decisions. But combining data from countless different sources, such as SaaS applications, to unlock insights is a big task, one made even more difficult when real-time, low-latency data streaming is the name of the game.

This is something that New York-based Estuary is solving with a “data operations platform” that combines the benefits of “batch” and “stream” processing in data pipelines.

“There’s a Cambrian explosion of databases and other data tools that are extremely valuable to businesses but difficult to use,” Estuary co-founder and CEO David Yaffe told VentureBeat. “We help clients get their data out of their current systems and into these cloud-based systems without having to maintain infrastructure in a way that is optimized for each of them.”

To help with its mission, Estuary today announced that it has raised $7 million in a seed round of funding led by FirstMark Capital, with participation from a number of angel investors, including Datadog CEO Olivier Pomel and Cockroach Labs CEO Spencer Kimball.

State of play

Batch data processing, for the uninitiated, describes the practice of processing data in batches at fixed intervals; this can be useful for crunching last week’s sales data to prepare a departmental report. Stream data processing, on the other hand, is about using data in real time as it is generated, which is more useful when a company wants quick insights into sales as they happen, or when customer support teams need all the latest data about a customer, e.g., their purchases and interactions with the site.
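
To make the distinction concrete, here is a minimal Python sketch of the two styles; the helper names and event shapes are illustrative assumptions rather than any vendor’s API:

```python
from datetime import datetime

# Hypothetical helpers to illustrate the two styles; neither
# function corresponds to a real vendor API.

def batch_sales_report(fetch_sales, since: datetime) -> dict:
    """Batch: pull a fixed window of records on a schedule (say,
    nightly) and aggregate them for a report."""
    records = fetch_sales(since)  # one bulk query per interval
    return {
        "count": len(records),
        "total": sum(r["amount"] for r in records),
    }

def streaming_sales_total(sale_events):
    """Stream: consume events as they are generated, keeping a
    running total that is current the moment each sale lands."""
    total = 0.0
    for event in sale_events:  # e.g., an unbounded event iterator
        total += event["amount"]
        yield total  # up-to-the-moment insight
```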

Although significant progress has been made in batch data processing when it comes to extracting data from SaaS systems with minimal technical support, the same cannot be said of real-time data. “Engineers working on lower-latency operational systems still have to manage and maintain a massive infrastructure burden,” Yaffe said. “At Estuary, we bring the best of both worlds to data integrations: the simplicity and data storage of batch systems, and the [low] latency of streaming.”

Above: An Estuary conceptualization

Achieving all of the above is, of course, already possible with existing technologies. If a company wants low-latency data capture, it can use open source tools such as Pulsar or Kafka and build and manage its own infrastructure. Or it can use existing vendor tools such as HVR, which Fivetran recently acquired, though that is mostly focused on capturing real-time data from databases, with limited support for SaaS applications.
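
For a sense of what the do-it-yourself route involves, here is a minimal sketch using the open source kafka-python client; the topic name and broker address are assumptions, and provisioning, scaling, and monitoring the brokers would all remain in-house work:

```python
import json
from kafka import KafkaConsumer  # third-party kafka-python package

# Topic name and broker address are assumptions for illustration.
consumer = KafkaConsumer(
    "sales-events",
    bootstrap_servers=["localhost:9092"],  # brokers you run yourself
    value_deserializer=lambda v: json.loads(v.decode("utf-8")),
    auto_offset_reset="earliest",
)

for message in consumer:
    event = message.value
    # Schema handling, retries, backfills, and scaling all remain
    # your team's responsibility in this self-managed model.
    print(event)
```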

This is where Estuary enters the fray, offering a fully managed ELT (extract, load, transform) service “that combines both millisecond latency and point-and-click simplicity,” the company said, bringing Airbyte-style open source connectors to low-latency use cases.

“We’re creating a new paradigm,” Yaffe said. “Until now, there have been no products for pulling data out of SaaS applications in real time; for the most part, this is a new concept. We’re essentially bringing to market a millisecond-latency version of Airbyte that works across SaaS, database, pub/sub, and filestore systems.”

There has been an explosion of activity across the data integration space of late, with dbt Labs raising $150 million to help analysts transform data in the warehouse, while Airbyte closed a $26 million funding round. Elsewhere, GitLab unveiled an open source data integration platform called Meltano. Estuary certainly jibes with all of these technologies, but its focus on both batch and stream data processing is where it stands out, covering more use cases in the process.

“It’s such a different focus that we don’t see ourselves as competitive with them, though some of the same use cases can be served by both systems,” Yaffe said.

The story so far

Yaffe previously cofounded and served as CEO of Arbor, a data-focused martech company that he sold to LiveRamp in 2016. It was at Arbor that the team created Gazette, the backbone on which Estuary’s managed commercial service Flow, which is currently in private beta, is built.

Businesses can use Gazette “as a replacement for Kafka,” according to Yaffe, and it has been fully open source since 2018. Gazette builds a real-time data lake that stores data as regular files in the cloud and allows users to integrate with other tools. That can be a useful solution on its own, but it still demands significant technical resources to use as part of a holistic ELT toolkit, which is where Flow comes into play. Companies use Flow to integrate all the systems they use to generate, process, and consume data, uniting the “batch vs. streaming paradigms” to ensure that a company’s current and future systems are “synchronized around the same datasets.”
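
As a rough illustration of the “data lake as regular files” idea, and emphatically not Gazette’s actual API, here is a conceptual Python sketch that reads hypothetical newline-delimited JSON fragments from cloud storage:

```python
import json
import boto3  # third-party AWS SDK

s3 = boto3.client("s3")
BUCKET = "example-data-lake"  # hypothetical bucket name

def read_fragment(key: str):
    """Read one persisted fragment as newline-delimited JSON records."""
    body = s3.get_object(Bucket=BUCKET, Key=key)["Body"]
    for line in body.iter_lines():
        yield json.loads(line)

# Because fragments are ordinary files, batch tools (a warehouse
# loader, Spark) and streaming readers can share the same data.
for record in read_fragment("sales/2021/10/part-000.json"):
    print(record)
```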

Flow is source-available, meaning it offers many of the freedoms associated with open source, except that its Business Source License (BSL) prevents developers from building competing products from the source code. On top of that, Estuary sells a fully managed version of Flow.

“Gazette is a great solution compared to what many companies do today, but it still requires talented engineering teams to build and operate applications that move and process their data; we still think that’s too much of a hurdle compared to the simpler ergonomics of tools in the batch space,” Yaffe explained. “Flow takes the streaming concept that Gazette enables and makes it as simple as Fivetran for capturing data. Companies use it to get that kind of advantage without having to manage infrastructure or be experts in building and operating streaming data pipelines.”

Although Estuary doesn’t publish its prices, Yaffe said it charges based on the amount of incoming data that Flow captures and processes each month. As for existing customers, Yaffe wasn’t at liberty to reveal specific names, but he said its typical client operates in martech or adtech, while companies also use it to migrate data from an on-premises database to the cloud.
