Abstract
In a previous article, we saw an example of the integration of SingleStoreDB with LangChain. Another emerging framework is LlamaIndex. In this short article, we'll take an example from the LlamaIndex documentation and apply it to SingleStoreDB.
Introduction
In this short article, we'll test the ability of SingleStoreDB to work with LlamaIndex using an example from the documentation. This should be very straightforward. In later articles, we'll see if we can achieve tighter integration.
As described in a previous article, we'll follow the instructions to create a SingleStoreDB Cloud account, Workspace Group, Workspace, and Notebook.
Create the Database
From the left nav in our SingleStoreDB Cloud account, we'll select the SQL Editor and run the following commands:
CREATE DATABASE IF NOT EXISTS demo_db;
USE demo_db;
DROP TABLE IF EXISTS city_stats;
Fill out the Notebook
We'll follow the example described in the documentation and modify it, where required, to work with SingleStoreDB.
First, we'll suppress some warnings that result from SQLAlchemy, as we'll use the version installed by LlamaIndex rather than the version installed by SingleStore:
import warnings
warnings.filterwarnings("ignore")
Next, we'll install a recent version of LlamaIndex:
!pip uninstall sqlalchemy-singlestoredb -y --quiet
!pip install llama-index==0.6.35 --quiet
We'll now enter and save our OpenAI API Key:
import os
import getpass
os.environ["OPENAI_API_KEY"] = getpass.getpass("OpenAI API Key:")
We'll keep the logging level commented out:
#import logging
#import sys
#logging.basicConfig(stream=sys.stdout, level=logging.INFO)
#logging.getLogger().addHandler(logging.StreamHandler(stream=sys.stdout))
And directly import from IPython.display:
from IPython.display import Markdown, display
Create Database Schema
We'll get the appropriate imports from SQLAlchemy:
from sqlalchemy import (
    create_engine,
    MetaData,
    Table,
    Column,
    String,
    Integer,
    select,
    column,
)
Next, we'll create the connection:
engine = create_engine("mysql+pymysql://admin:<password>@<host>:3306/demo_db")
metadata_obj = MetaData()
We'll replace the <password> and <host> with the values from our SingleStoreDB Cloud account.
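If the password contains special characters such as @ or /, it must be URL-encoded before being embedded in the connection URL. A minimal sketch of a helper that assembles the URL (the function name and sample values here are our own, purely for illustration):

```python
from urllib.parse import quote_plus

def build_connection_url(user: str, password: str, host: str,
                         port: int = 3306, database: str = "demo_db") -> str:
    # URL-encode the password so characters like '@' don't break the URL
    return f"mysql+pymysql://{user}:{quote_plus(password)}@{host}:{port}/{database}"

url = build_connection_url("admin", "p@ss/word", "svc-example.singlestore.com")
print(url)
# mysql+pymysql://admin:p%40ss%2Fword@svc-example.singlestore.com:3306/demo_db
```

The resulting string can then be passed directly to create_engine.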
Now we'll create a table:
# create city SQL table
table_name = "city_stats"
city_stats_table = Table(
    table_name,
    metadata_obj,
    Column("city_name", String(16), primary_key=True),
    Column("population", Integer),
    Column("country", String(16), nullable=False),
)
metadata_obj.create_all(engine)
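We can confirm the table was actually created with SQLAlchemy's inspect(). The sketch below is standalone and uses an in-memory SQLite engine purely for illustration; in the notebook, the same calls work against the SingleStoreDB engine created above:

```python
from sqlalchemy import create_engine, inspect, MetaData, Table, Column, String, Integer

# Illustration only: an in-memory SQLite engine stands in for SingleStoreDB
sqlite_engine = create_engine("sqlite://")
demo_metadata = MetaData()
Table(
    "city_stats",
    demo_metadata,
    Column("city_name", String(16), primary_key=True),
    Column("population", Integer),
    Column("country", String(16), nullable=False),
)
demo_metadata.create_all(sqlite_engine)

# inspect() confirms the table now exists
print(inspect(sqlite_engine).get_table_names())  # ['city_stats']
```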
Define SQL Database
We'll continue with the LlamaIndex documentation:
from llama_index import SQLDatabase, ServiceContext
from langchain import OpenAI
from llama_index import LLMPredictor
llm = OpenAI(temperature=0, model="text-davinci-002")
service_context = ServiceContext.from_defaults(llm=llm)
sql_database = SQLDatabase(engine, include_tables=["city_stats"])
sql_database.table_info
The result should be as follows:
'\nCREATE TABLE city_stats (\n\tcity_name VARCHAR(16) CHARACTER SET utf8 COLLATE utf8_general_ci NOT NULL, \n\tpopulation INTEGER(11), \n\tcountry VARCHAR(16) CHARACTER SET utf8 COLLATE utf8_general_ci NOT NULL\n)\n\n/*\n3 rows from city_stats table:\ncity_name\tpopulation\tcountry\n\n*/'
Next, we'll continue with the code:
from sqlalchemy import insert
rows = [
    {"city_name": "Toronto", "population": 2930000, "country": "Canada"},
    {"city_name": "Tokyo", "population": 13960000, "country": "Japan"},
    {"city_name": "Chicago", "population": 2679000, "country": "United States"},
    {"city_name": "Seoul", "population": 9776000, "country": "South Korea"},
]
for row in rows:
    stmt = insert(city_stats_table).values(**row)
    with engine.connect() as connection:
        cursor = connection.execute(stmt)
        connection.commit()
# view current table
stmt = select(city_stats_table.c["city_name", "population", "country"]).select_from(
    city_stats_table
)
with engine.connect() as connection:
    results = connection.execute(stmt).fetchall()
    print(results)
The result should be as follows:
[('Seoul', 9776000, 'South Korea'), ('Toronto', 2930000, 'Canada'), ('Tokyo', 13960000, 'Japan'), ('Chicago', 2679000, 'United States')]
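Notice that the rows come back in SingleStoreDB's distribution order rather than insertion order. Since the rows are plain tuples, ordinary Python can already answer the question we'll later pose in natural language; a quick sanity check using the values inserted above:

```python
# The rows fetched above (order may vary across partitions)
results = [
    ("Seoul", 9776000, "South Korea"),
    ("Toronto", 2930000, "Canada"),
    ("Tokyo", 13960000, "Japan"),
    ("Chicago", 2679000, "United States"),
]

# The population column is at index 1; max() finds the most populous city
largest = max(results, key=lambda row: row[1])
print(largest)  # ('Tokyo', 13960000, 'Japan')
```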
Query Index
We'll now run the following query:
from sqlalchemy import text
with engine.connect() as con:
    rows = con.execute(text("SELECT city_name from city_stats"))
    for row in rows:
        print(row)
The result should be as follows:
('Toronto',)
('Tokyo',)
('Chicago',)
('Seoul',)
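When a query takes user-supplied values, text() also supports bound parameters, which avoids string formatting and the injection risk that comes with it. The sketch below is standalone, with an in-memory SQLite engine standing in for the SingleStoreDB engine so it runs on its own:

```python
from sqlalchemy import create_engine, text

# Illustration only: an in-memory SQLite engine stands in for SingleStoreDB
sqlite_engine = create_engine("sqlite://")
with sqlite_engine.connect() as con:
    con.execute(text("CREATE TABLE city_stats (city_name TEXT, population INTEGER)"))
    con.execute(
        text("INSERT INTO city_stats VALUES (:name, :pop)"),
        [{"name": "Tokyo", "pop": 13960000}, {"name": "Chicago", "pop": 2679000}],
    )
    # :p is a bound parameter, filled in safely at execution time
    rows = con.execute(
        text("SELECT city_name FROM city_stats WHERE population > :p"),
        {"p": 5000000},
    ).fetchall()
print(rows)  # [('Tokyo',)]
```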
Natural language SQL
We can also prepare a natural language query:
from llama_index.indices.struct_store.sql_query import NLSQLTableQueryEngine
query_engine = NLSQLTableQueryEngine(
    sql_database=sql_database,
    tables=["city_stats"],
)
query_str = "Which city has the highest population?"
response = query_engine.query(query_str)
Building our Table Index
import openai
openai.api_key = os.environ["OPENAI_API_KEY"]
from llama_index.indices.struct_store.sql_query import SQLTableRetrieverQueryEngine
from llama_index.objects import SQLTableNodeMapping, ObjectIndex, SQLTableSchema
from llama_index import VectorStoreIndex
# set Logging to DEBUG for more detailed outputs
table_node_mapping = SQLTableNodeMapping(sql_database)
# add a SQLTableSchema for each table
table_schema_objs = [SQLTableSchema(table_name="city_stats")]
obj_index = ObjectIndex.from_objects(
    table_schema_objs,
    table_node_mapping,
    VectorStoreIndex,
)
query_engine = SQLTableRetrieverQueryEngine(
    sql_database, obj_index.as_retriever(similarity_top_k=1)
)
Now we can use the natural language query:
response = query_engine.query("Which city has the highest population?")
display(Markdown(f"<b>{response}</b>"))
The result should be similar to the following:
Tokyo has the highest population with 13,960,000 people.
The raw output can also be obtained using the following:
# you can also fetch the raw result from SQLAlchemy!
response.metadata["result"]
The result should be as follows:
[('Tokyo', 13960000)]
Additional context info can be added as follows:
# manually set context text
city_stats_text = (
    "This table gives information regarding the population and country of a given city.\n"
    "The user will query with codewords, where 'foo' corresponds to population and 'bar' "
    "corresponds to city."
)
table_node_mapping = SQLTableNodeMapping(sql_database)
table_schema_objs = [
    SQLTableSchema(table_name="city_stats", context_str=city_stats_text)
]
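To see what the context string is asking for, here is a hypothetical pure-Python sketch of the codeword substitution ('foo' for population, 'bar' for city) that we'd expect the LLM to perform; the dictionary and sample query are illustrative only, not part of LlamaIndex:

```python
# Hypothetical codeword mapping described by the context string
codewords = {"foo": "population", "bar": "city"}
query = "Which bar has the highest foo"

# Replace each codeword with its real meaning, word by word
translated = " ".join(codewords.get(word, word) for word in query.split())
print(translated)  # Which city has the highest population
```

With the context string attached to the table schema, the query engine can resolve such codeword queries against the real column names.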
Using LangChain for Querying
LangChain can also be used:
from langchain import OpenAI, SQLDatabase, SQLDatabaseChain
llm = OpenAI(temperature=0)
# set Logging to DEBUG for more detailed outputs
db_chain = SQLDatabaseChain(llm=llm, database=sql_database)
db_chain.run("Which city has the highest population?")
The result should be similar to the following:
'Tokyo has the highest population with 13960000 people.'
Summary
In this quick example, we have seen how to use LlamaIndex with SingleStoreDB. Only a few minor code modifications were required to the example from the LlamaIndex documentation.
License
The MIT License
Copyright © Jerry Liu
Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.