April 24, 2023

Vector Databases

Vector databases store vector representations of data, known as vector embeddings. Data such as audio recordings and images is converted into vectors of numerical values that capture an object and its features. These representations make it easier to find items similar to a given object within a large set of unstructured data. Vector databases are most useful for semantic search, recommendation engines, and anomaly detection.

Unstructured data is growing exponentially, and we are all part of a huge unstructured data workforce. This blog post is unstructured data; your visit here produces unstructured and semi-structured data with every web interaction, as does every photo you take or email you send. The global datasphere is projected to grow to 165 zettabytes by 2025, and about 80% of that will be unstructured. At the same time, the rising demand for AI is vastly outpacing existing infrastructure. Around 90% of machine learning research results fail to reach production because of a lack of tools.

Thankfully, there’s a new generation of tools that let developers work with unstructured data in the form of vector embeddings, which are deep representations of objects obtained from a neural network model. A vector database, also known as a vector similarity search engine or approximate nearest neighbour (ANN) search database, is a database designed to store, manage, and search high-dimensional vectors, each with an additional payload attached.

On Unstructured Data, Vector Databases, New AI Age, and Our Seed Round, Qdrant
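
To make the idea of storing vectors with a payload and searching by similarity concrete, here is a minimal sketch in Python. It is a toy, not how Qdrant or any other product works internally: the `ToyVectorStore` class, its method names, and the three-dimensional vectors are all hypothetical, and it scores every stored vector exactly (brute force) where a real vector database would typically use an ANN index to avoid scanning everything.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


class ToyVectorStore:
    """Hypothetical in-memory store: exact brute-force search, no ANN index."""

    def __init__(self):
        self._points = []  # list of (id, vector, payload) tuples

    def upsert(self, point_id, vector, payload=None):
        # Replace any existing point with the same id, otherwise add a new one.
        self._points = [p for p in self._points if p[0] != point_id]
        self._points.append((point_id, list(vector), payload or {}))

    def search(self, query, limit=3, payload_filter=None):
        # Score every stored vector against the query. If a filter is given,
        # keep only points whose payload matches all of its key/value pairs.
        payload_filter = payload_filter or {}
        scored = [
            (cosine_similarity(query, vec), pid, payload)
            for pid, vec, payload in self._points
            if all(payload.get(k) == v for k, v in payload_filter.items())
        ]
        scored.sort(key=lambda item: item[0], reverse=True)
        return scored[:limit]


# Made-up embeddings; in practice these would come from a neural network model.
store = ToyVectorStore()
store.upsert(1, [0.9, 0.1, 0.0], {"kind": "image", "label": "cat"})
store.upsert(2, [0.8, 0.2, 0.1], {"kind": "image", "label": "kitten"})
store.upsert(3, [0.0, 0.1, 0.9], {"kind": "audio", "label": "engine noise"})

results = store.search([1.0, 0.0, 0.0], limit=2)
audio_only = store.search([1.0, 0.0, 0.0], payload_filter={"kind": "audio"})
```

Here `results` ranks the two image points first because their vectors point in nearly the same direction as the query, while `audio_only` shows how a payload filter narrows the candidates before ranking. Scanning every vector like this scales linearly with the collection size, which is the cost ANN indexes are designed to avoid.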

Funding

Oblivious, a framework for building apps that run inside enclaves, which are useful for processing confidential data, raised €5.35m in Seed funding.

Qdrant, a vector database and vector similarity search engine, raised $7.5m in Seed funding.

Fluree, an open-source semantic graph database, raised $10m in Series A funding.

Groundlight, a startup combining natural language and computer vision, raised $10m in Seed funding.

Ditto, a cross-platform peer-to-peer database that allows apps to sync with or without internet connectivity, raised $45m in Series A funding.

Weaviate, an open-source vector database allowing teams to store data objects and vector embeddings from ML models, raised $50m in Series B funding.

Semgrep, an open-source static analysis engine for finding bugs and vulnerabilities in code, raised $53m in Series C funding.

CoreWeave, a GPU-focused cloud provider, raised $221m in Series B funding.