
OpenSearch: How the Project Went From Fork to Foundation

The open source analytics engine is highly flexible and getting better all the time, says Anandhi Bumstead of AWS in this episode of The New Stack Makers.
Nov 26th, 2024 10:00am

RALEIGH, N.C. — OpenSearch, a flexible open source data ingestion and analytics engine, was transferred in September by its creator, Amazon Web Services, to the Linux Foundation. In this On the Road episode of The New Stack Makers, Anandhi Bumstead, director of software engineering at AWS, talked to me about the project’s genesis, its journey to foundation sponsorship, and what comes next.

This episode of Makers was recorded at All Things Open in October.

The OpenSearch project began in 2021 as a fork of Elasticsearch, after Elasticsearch’s open source Apache 2.0 license was changed to a more restrictive one. (This September, Elastic changed the licensing of Elasticsearch and its visualization dashboard, Kibana, once again, introducing the open source GNU Affero General Public License as an option for users.)

For the OpenSearch project maintainers, the move to the Linux Foundation offered a number of advantages, Bumstead said.

“We really looked at a neutral foundation for neutral governance and also to bring in a broader community,” she said. “And if you look at foundations, like most foundations, they really facilitate neutral governance and also enabling companies to collaborate in open source.

“Linux Foundation has been very successful. I mean, tons of open source projects Linux Foundation has been governing. So we really wanted to lean into that.”

OpenSearch as a ‘Swiss Army Knife’

In his keynote address at the Open Source Summit in Vienna in September, Carl Meadows, director of product management for OpenSearch, likened the project to a “Swiss Army knife,” suitable for a variety of uses. Bumstead echoed that assessment.

OpenSearch, she told the Makers audience, builds on a core search and analytics engine and also provides visualization out of the box.

She added, “People widely use it for observability, log analytics. They use it for security analytics, for alert detection, and they’re heavily used, especially with the [generative AI] momentum, as a vector database. OpenSearch is also used for a lot of search scenarios. We have what is called semantic search, and we have this thing called hybrid search, where you can do a keyword search and you can do semantic search.”
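The hybrid search Bumstead describes combines a keyword (lexical) clause with a semantic (vector) clause in a single OpenSearch `hybrid` query. A minimal sketch of such a request body follows; the field names (`text`, `embedding`) and the model ID are hypothetical, and a real cluster would also need a search pipeline with a normalization processor to blend the two score scales:

```python
# Sketch of an OpenSearch hybrid-search request body, combining a
# keyword (BM25) sub-query with a semantic (neural/k-NN) sub-query.
# Field names and the model ID below are placeholders, not values
# from the article.

def hybrid_search_body(query_text, model_id, k=10):
    """Build the JSON body for an OpenSearch `hybrid` query."""
    return {
        "query": {
            "hybrid": {
                "queries": [
                    # Lexical match against a plain text field
                    {"match": {"text": {"query": query_text}}},
                    # Semantic match: the deployed model embeds the query
                    # text and runs k-NN against a vector field
                    {
                        "neural": {
                            "embedding": {
                                "query_text": query_text,
                                "model_id": model_id,
                                "k": k,
                            }
                        }
                    },
                ]
            }
        }
    }

body = hybrid_search_body("failed login alerts", "my-model-id")
print(len(body["query"]["hybrid"]["queries"]))  # two sub-queries: keyword + semantic
```

The body would be sent to an index’s `_search` endpoint; OpenSearch scores each sub-query independently and the search pipeline normalizes and combines the results.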

These capabilities and the use cases they enable, she said, account for its reputation for flexibility.

However, OpenSearch has faced criticism from users for being slower than Elasticsearch in terms of indexing speed and handling complex queries.

“Earlier, we were focused on stabilizing and bringing the community on board,” Bumstead acknowledged. “But our primary focus continues to be performance, and we spent a lot of time tuning performance.”

Among the highlights of that “tuning,” she mentioned:

  • OpenSearch Benchmark, released in 2023, lets users measure the project’s performance across their workloads. “You can look at the query, how the query is doing, how is indexing doing, and all of the measurement,” she said.
  • Also in 2023, the project added segment replication. “We have about 25% throughput on indexing improvement compared to a default, like document replication,” Bumstead said.
  • In September, OpenSearch released its latest version, 2.17. In that version, Bumstead said, “The performance of the query, of the most complex queries, is 6.5x faster than our first OpenSearch release … so it’s a significant shift we’ve done in terms of the query performance and from an indexing perspective.”
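The segment replication mode mentioned above is chosen per index at creation time, via the `index.replication.type` setting. A minimal sketch of the settings body one would send when creating such an index (the replica count is an illustrative choice):

```python
# Sketch of index settings that enable segment replication instead of
# the default document replication. With segment replication, primaries
# copy finished Lucene segment files to replicas rather than having
# each replica re-index every document.

def segment_replicated_index_settings(replicas=1):
    """Settings body for creating an index that uses segment replication."""
    return {
        "settings": {
            "index": {
                "replication.type": "SEGMENT",  # default is "DOCUMENT"
                "number_of_replicas": replicas,
            }
        }
    }

settings = segment_replicated_index_settings()
print(settings["settings"]["index"]["replication.type"])  # SEGMENT
```

This body would accompany a `PUT` request that creates the index; replication type cannot be changed on an existing index.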

As for what’s next, she said, work continues on indexing, search and storage, and vector capabilities. “We’re doing a ton of work with performance and cost optimization in mind,” Bumstead added.

A big question the project maintainers are currently exploring, she said, is “How do we be more efficient in cost and storage? The thing which really excites me is some of the work we’re doing in that area. We’re doing work around [optimizing] storage in the vector space. We would love to collaborate with more people to innovate in this area and really focus on vector performance and cost efficiency.”

Check out the full episode to learn more about OpenSearch and how it can be useful in conjunction with generative AI. Go to opensearch.org to contribute to the project.
