Oracle AI Vector Search : Demonstration

Oracle AI Vector Search is designed for Artificial Intelligence (AI) workloads and allows you to query data based on semantics, rather than keywords.

Why Use Oracle AI Vector Search?

One of the biggest benefits of Oracle AI Vector Search is that semantic search on unstructured data can be combined with relational search on business data in one single system.
This is not only powerful but also significantly more effective because you don’t need to add a specialized vector database, eliminating the pain of data fragmentation between multiple systems.

Oracle AI Vector Search Workflow

  • Generate Vector Embeddings from Your Unstructured Data
  • Store Vector Embeddings, Unstructured Data, and Relational Business Data in Oracle Database
  • Create Vector Indexes
  • Query Data with Similarity Searches
  • Generate a Prompt and Send it to an LLM for a Full RAG Inference
Demo

The demo provides an overview of the VECTOR data type, which captures the semantic meaning of your unstructured data. Additionally, you will learn to use Vector Search SQL operators to perform similarity searches, identifying vectors and related content with similar characteristics. Vector indexes are also created to enhance the efficiency of similarity searches through approximate methods.

To execute this script, you will need three files similar to the ones below:

SCP files to your directory
 
 scp my_embedding_model.onnx /ADAT/datadir
 scp database-concepts.pdf /ADAT/datadir
 scp oracle-ai-vector-search-users-guide.pdf /ADAT/datadir

 [oracle@localhost datadir]$ ls -rlt
 total 103452
 -rwxrwx---. 1 root vboxsf 2688358 Jun 20 11:58 oracle-ai-vector-search-users-guide.pdf 
 -rwxrwx---. 1 root vboxsf 90621438 Jun 20 19:19 my_embedding_model.onnx
 -rwxrwx---. 1 root vboxsf 12616683 Jun 23 11:03 database-concepts.pdf

Create tablespace, user, and grant privileges

Create a new tablespace and user account. Assign the DB_DEVELOPER_ROLE and define an Oracle directory for accessing the PDF files.

 
 CREATE TABLESPACE VECTORDEMO
 DATAFILE '/opt/oracle/oradata/FREE/FREEPDB1/FREE_VECTORDEMO_01.dbf' SIZE 5G AUTOEXTEND ON 
 EXTENT MANAGEMENT LOCAL
 SEGMENT SPACE MANAGEMENT AUTO;

 create user AITEST identified by welcome1 
 DEFAULT TABLESPACE VECTORDEMO 
 quota unlimited on VECTORDEMO;

 grant DB_DEVELOPER_ROLE to AITEST;

 create or replace directory AITEST_VEC_DUMP as '/ADAT/datadir/';

 grant read, write on directory AITEST_VEC_DUMP to AITEST;

Load embedding model into Database

Using the DBMS_VECTOR package, load your embedding model into Database. You must specify the directory where you stored your model in ONNX format as well as describe what type of model it is and how you want to use it.

 
 connect AITEST/welcome1

 exec dbms_vector.drop_onnx_model(model_name => 'AIDEMO_DOC_MODEL', force => true);

 EXECUTE dbms_vector.load_onnx_model('AITEST_VEC_DUMP', 'my_embedding_model.onnx', 
                    'AIDEMO_DOC_MODEL', 
                    JSON('{"function" : "embedding", "embeddingOutput" : "embedding" , "input": {"input": ["DATA"]}}')); 

Create table to store books in the PDF format

You now create a table containing all the books you want to chunk and vectorize. You associate each new book with an ID and a pointer to your local directory where the books are stored.

 
 drop table if exists AITEST_DOCUMENTATION_TAB purge;
 create table AITEST_DOCUMENTATION_TAB (id number, data blob);

 insert into AITEST_DOCUMENTATION_TAB values(1, to_blob(bfilename('AITEST_VEC_DUMP','database-concepts.pdf')));
 insert into AITEST_DOCUMENTATION_TAB values(2, to_blob(bfilename('AITEST_VEC_DUMP','oracle-ai-vector-search-users-guide.pdf'))); 
 commit;

 select dbms_lob.getlength(data) 
 from AITEST_DOCUMENTATION_TAB;

Create table to store unstructured data chunks and associated vector embeddings

You start by creating the table structure using the VECTOR data type.

The INSERT statement processes each PDF file from AITEST_DOCUMENTATION_TAB by converting it to text, segmenting the text into chunks, and creating vector embeddings for each chunk. This entire workflow is executed within a single INSERT SELECT operation.

Here you choose to use Vector Utility PL/SQL package DBMS_VECTOR_CHAIN to convert, chunk, and vectorize your unstructured data in one end-to-end pipeline.

 
 drop table if exists AITEST_DOCUMENTATION_CHUNKS purge;

 CREATE TABLE AITEST_DOCUMENTATION_CHUNKS
   (
     doc_id          NUMBER,
     chunk_id        NUMBER,
     chunk_data      VARCHAR2(4000),
     chunk_embedding VECTOR
   );

 INSERT INTO AITEST_DOCUMENTATION_CHUNKS
 SELECT dt.id                      doc_id,
        et.embed_id                chunk_id,
        et.embed_data              chunk_data,
        To_vector(et.embed_vector) chunk_embedding
 FROM   AITEST_DOCUMENTATION_TAB dt,
        dbms_vector_chain.utl_to_embeddings( dbms_vector_chain.utl_to_chunks(dbms_vector_chain.utl_to_text(dt.data), json('{"normalize":"all"}')),
        json('{"provider":"database", "model":"AIDEMO_DOC_MODEL"}')) t,
        json_table(t.column_value, '$[*]' COLUMNS (embed_id NUMBER path '$.embed_id', embed_data VARCHAR2(4000) path '$.embed_data', embed_vector clob  
        path '$.embed_vector')) et;

 commit;

Generate a query vector for use in a similarity search

For a similarity search you will need query vectors. Here you enter your query text and generate an associated vector embedding.

For example, you can use the following text: ‘different methods of backup and recovery’.You use the VECTOR_EMBEDDING SQL function to generate the vector embeddings from the input text. The function takes an embedding model name and a text string to generate the corresponding vector.

 
 ACCEPT text_input CHAR PROMPT 'Enter text: '

 VARIABLE text_variable VARCHAR2(1000)
 VARIABLE query_vector VECTOR
 BEGIN
 :text_variable := '&text_input';
 SELECT vector_embedding(AIDEMO_DOC_MODEL using :text_variable as data) 
 into :query_vector;
 END;
 /

 PRINT query_vector 

Search your books for the top 2 chunks most relevant to backup and recovery

Using the generated query vector, you search similar chunks in the AITEST_DOCUMENTATION_CHUNKS table. For this, you use the VECTOR_DISTANCE SQL function and the FETCH SQL clause to retrieve the most similar chunks.

 
 SELECT doc_id, chunk_id, chunk_data
 FROM AITEST_DOCUMENTATION_CHUNKS
 ORDER BY vector_distance(chunk_embedding , :query_vector, COSINE) 
 FETCH FIRST 2 ROWS ONLY;

You can also add a WHERE clause to further filter your search, for instance if you only want to look at one particular book.

 
 SELECT doc_id, chunk_id, chunk_data
 FROM AITEST_DOCUMENTATION_CHUNKS
 WHERE doc_id=1
 ORDER BY vector_distance(chunk_embedding , :query_vector, COSINE)
 FETCH FIRST 2 ROWS ONLY;

Use the EXPLAIN PLAN command to determine how the optimizer execute the query

Leveraging the generated query vector, you identify similar chunks within the AITEST_DOCUMENTATION_CHUNKS table by employing the VECTOR_DISTANCE SQL function and the FETCH SQL clause to obtain the most closely matching chunks.

 
 EXPLAIN PLAN FOR
 SELECT doc_id, chunk_id, chunk_data
 FROM AITEST_DOCUMENTATION_CHUNKS
 ORDER BY vector_distance(chunk_embedding , :query_vector, COSINE) 
 FETCH FIRST 2 ROWS ONLY;

 select * from table(dbms_xplan.display);

Run a multi-vector similarity search to identify the topmost relevant chunk within the most relevant books
 
 SELECT doc_id, chunk_id, chunk_data
 FROM AITEST_DOCUMENTATION_CHUNKS
 ORDER BY vector_distance(chunk_embedding , :query_vector, COSINE) 
 FETCH FIRST 2 PARTITIONS BY doc_id, 1 ROWS ONLY;

Create an In-Memory Neighbor Graph Vector Index

When working with large vector embedding spaces, creating vector indexes can significantly speed up similarity searches. Rather than examining every single vector embedding in your dataset, a vector index employs heuristics to narrow down the search space, thus accelerating the process. This technique is known as approximate similarity search.

 
 create vector index IDX_ADC_CHUNK_EMBED on AITEST_DOCUMENTATION_CHUNKS(chunk_embedding) 
 organization INMEMORY NEIGHBOR GRAPH
 distance COSINE
 with target accuracy 95;

 col index_name for a30
 SELECT INDEX_NAME, INDEX_TYPE, INDEX_SUBTYPE
 FROM USER_INDEXES
 WHERE INDEX_NAME='IDX_ADC_CHUNK_EMBED';

 SELECT JSON_SERIALIZE(IDX_PARAMS returning varchar2 PRETTY)
 FROM VECSYS.VECTOR$INDEX
 where IDX_NAME = 'IDX_ADC_CHUNK_EMBED';

Determine the memory allocation in the vector memory area

To get an idea about the size of your In-Memory Neighbor Graph Vector Index in memory, you can use the V$VECTOR_MEMORY_POOL view.

 
 select CON_ID, POOL, 
		ALLOC_BYTES/1024/1024 as ALLOC_BYTES_MB, 
		USED_BYTES/1024/1024 as USED_BYTES_MB
 from V$VECTOR_MEMORY_POOL 
 order by 1,2;

Perform an approximate similarity search to find the two most relevant chunks within your books

Using the previously generated query vector, you search chunks in the AITEST_DOCUMENTATION_CHUNKS table that are similar to your query vector. For this, you use the VECTOR_DISTANCE function and the FETCH APPROX SQL clause to retrieve the most similar chunks using your vector index.

 
 SELECT doc_id, chunk_id, chunk_data
 FROM AITEST_DOCUMENTATION_CHUNKS
 ORDER BY vector_distance(chunk_embedding , :query_vector, COSINE) 
 FETCH APPROX FIRST 2 ROWS ONLY 
 WITH TARGET ACCURACY 80;

Use EXPLAIN PLAN to see how the optimizer processes the query
 
 EXPLAIN PLAN FOR
 SELECT doc_id, chunk_id, chunk_data
 FROM AITEST_DOCUMENTATION_CHUNKS
 ORDER BY vector_distance(chunk_embedding , :query_vector, COSINE) 
 FETCH APPROX FIRST 2 ROWS ONLY WITH TARGET ACCURACY 80;

 select * from table(dbms_xplan.display);

Determine vector index performance for approximate similarity searches

The index accuracy reporting feature allows you to determine the accuracy of your vector indexes. After creating a vector index, you may want to know the accuracy of your approximate vector searches.

The DBMS_VECTOR.INDEX_ACCURACY_QUERY PL/SQL procedure provides an accuracy report for a top-K index search for a specific query vector and a specific target accuracy. In this case you keep using the query vector generated previously.

 
 SET SERVEROUTPUT ON
 declare
 report varchar2(128);
 begin
	report := dbms_vector.index_accuracy_query( 
	OWNER_NAME => 'AITEST',
	INDEX_NAME => 'IDX_ADC_CHUNK_EMBED',
	qv => :query_vector,
	top_K => 10,
	target_accuracy => 90 );
	dbms_output.put_line(report);
 end;
 /

The report looks like the following: Accuracy achieved (100%) is 10% higher than the Target Accuracy requested (90%).

 

Stay tuned for surplus Oracle 23ai demos !!!

 

2 thoughts on “Oracle AI Vector Search : Demonstration”

Leave a Reply

Your email address will not be published. Required fields are marked *