When Pinecone launched last year, the company’s message was about building a serverless vector database designed specifically for the needs of data scientists. While that database remains at the heart of what the company does, it is evolving toward a more sophisticated use case built around AI-driven search, helping those data scientists find the proverbial needle in a haystack.
When we spoke to Pinecone founder and CEO Edo Liberty last year at the time of his $10 million seed round, his company was still feeling its way and building out the database. He came from Amazon, where he helped build the SageMaker database service. He says they’ve come a long way since then.
“A lot has changed since our seed announcement, so first of all we launched our true paid production service in October, and it’s grown rapidly since then, both in adoption and revenue, and so it’s going really well,” said Liberty.
He described the rationale for a purpose-built database for data scientists at the time of the seed funding as follows:
“The data that a machine learning model expects is not a JSON record, it is a high dimensional vector that is either a list of features or a so-called embedding which is a numerical representation of the items or the objects in the world. [This format] is much more semantically rich and usable for machine learning,” he explained.
He says that semantically rich approach is what drives customers to use Pinecone today. “The predominant use of the vector databases is for search, and search in the broad sense of the word. It’s search through documents, but you can think of search as information retrieval in general, discovery, recommendation, anomaly detection and so on,” he said.
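To make the idea of search over embeddings concrete, here is a minimal sketch in plain Python. It is not Pinecone's API or implementation: the item names, the three-dimensional vectors and the `top_k` helper are all made up for illustration, and the search is a brute-force scan where a production vector database would presumably use an index.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat an all-zero vector as similar to nothing
    return dot / (norm_a * norm_b)

def top_k(query, items, k=2):
    """Return the k (item_id, score) pairs most similar to the query."""
    scored = [(item_id, cosine_similarity(query, vec))
              for item_id, vec in items.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]

# Hypothetical embeddings; real ones typically have hundreds of dimensions.
ITEMS = {
    "doc-cats": [0.9, 0.1, 0.0],
    "doc-dogs": [0.8, 0.3, 0.1],
    "doc-taxes": [0.0, 0.2, 0.9],
}

if __name__ == "__main__":
    for item_id, score in top_k([1.0, 0.2, 0.0], ITEMS):
        print(item_id, round(score, 3))
```

The same nearest-neighbor pattern could underlie recommendation (the query is a user or item vector) or anomaly detection (flagging a query whose best match still scores low).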
The system is organized in pods, which are collections of resources designed to process the data in the Pinecone database. The company offers a single pod for free to help customers familiarize themselves with the product and conduct a simple proof of concept. After that, they start paying based on the number of pods.
He is convinced that the company has designed the system to scale up to billions of objects. “You can scale up to as much as your software can handle and you can really orchestrate. We designed the system so that there is no clearly defined limit to the amount of data you can index and use,” he said.
Since it is a serverless database, customers don’t have to worry about provisioning, but they must tell Pinecone how much they are willing to spend each month, based on the amount of data they need to process.
“They’re kind of pushing the back of the envelope to figure out that x pods will be enough for what we’re using in terms of the data it can hold and the performance it would give me and that’s it,” he said. After that, the customer just signs in, and with a few clicks in the console and an API call to create the index, the database is up and running.
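The back-of-the-envelope sizing described above might look something like the following sketch. The per-pod capacity and the monthly price used here are placeholder numbers chosen for illustration, not figures from Pinecone, and the function names are hypothetical.

```python
def pods_needed(num_vectors, vectors_per_pod):
    """Smallest whole number of pods that can hold num_vectors."""
    if vectors_per_pod <= 0:
        raise ValueError("vectors_per_pod must be positive")
    # Integer ceiling division: round up without going through floats.
    return max(1, -(-num_vectors // vectors_per_pod))

def monthly_cost(pods, price_per_pod):
    """Total monthly bill when pricing is a flat rate per pod."""
    return pods * price_per_pod

if __name__ == "__main__":
    # Assumed values: 2.5 million vectors, 1 million vectors per pod,
    # and a made-up price of 50 (in any currency) per pod per month.
    pods = pods_needed(2_500_000, 1_000_000)
    print(pods, monthly_cost(pods, 50))
```

A real estimate would also need to account for query performance, not just storage, as the quote suggests.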
Liberty didn’t want to share growth figures or headcount, but he says he expects to double the workforce (whatever that number may be) over the next year. It is worth noting that the startup had 10 employees at the time of the seed announcement.
In terms of diversity, he said last year: “We have instructed our recruiters to be proactive [in finding more diverse applicants], to make sure they don’t miss out on great candidates and bring us a diverse set of candidates.” In practice, he says, this has resulted in 50% of new tech hires (as opposed to the total workforce) being women this year.
The company today announced a $28 million Series A led by Menlo Ventures with the participation of new investor Tiger Global, along with previous investors including Wing Venture Capital, which led the company’s seed funding. The company has now raised $38 million.