Date: 2023-08-04

Many people in the community have shown interest in using Semantic Kernel to query relational databases with natural language. In response, we have developed NL2SQL, a sandbox for exploring the capabilities of GPT-4 in generating SQL queries.

Our approach is guided by a few key principles. First, we avoid data-movement and consistency concerns by not synchronizing the database to vector storage. Second, we steer clear of hardcoded prompts tied to a particular database schema or platform. Third, we support multiple schemas, which accommodates multiple data sources as well as large schemas.

We use an abbreviated object model to express schema metadata, focusing primarily on table and column names. Descriptions are optional and can be added for clarity. The schema model also names the target platform so that the correct SQL variant is generated.
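To make the shape of such a model concrete, here is a minimal sketch in Python. It is not the sandbox's actual object model: the class names (`Column`, `Table`, `Schema`), the `to_text` rendering, and the sample `sales` schema are all assumptions made up for illustration. It only mirrors what is described above: table and column names, optional descriptions, and a platform hint.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Column:
    name: str
    description: Optional[str] = None  # optional, only for extra clarity


@dataclass
class Table:
    name: str
    columns: List[Column] = field(default_factory=list)
    description: Optional[str] = None


@dataclass
class Schema:
    name: str
    platform: str  # hypothetical field: steers which SQL variant is generated
    tables: List[Table] = field(default_factory=list)

    def to_text(self) -> str:
        """Render a compact, prompt-friendly description of the schema."""
        lines = [f"schema: {self.name}", f"platform: {self.platform}"]
        for table in self.tables:
            suffix = f"  # {table.description}" if table.description else ""
            lines.append(f"table: {table.name}{suffix}")
            for column in table.columns:
                suffix = f"  # {column.description}" if column.description else ""
                lines.append(f"  column: {column.name}{suffix}")
        return "\n".join(lines)


# Illustrative schema; the names and descriptions are invented for this sketch.
sales = Schema(
    name="sales",
    platform="SQLite",
    tables=[
        Table(
            name="orders",
            description="one row per customer order",
            columns=[
                Column("order_id"),
                Column("customer_id"),
                Column("total", "order total in USD"),
            ],
        )
    ],
)

print(sales.to_text())
```

Keeping the model this small means the whole schema can be rendered into a prompt as a few lines of text, without carrying types, keys, or constraints that a fuller model would include.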

Query generation in NL2SQL starts from schema expressions stored in semantic memory and relies on two prompts. The “IsQuery” prompt screens whether the objective can be solved with a SELECT statement against the schema. The “GenerateQuery” prompt generates the query from the schema and the objective. We intentionally use few-shot or zero-shot approaches for these prompts to highlight the model’s core capabilities.
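The two-step flow described above can be sketched as follows. This is an illustration under stated assumptions, not the sandbox's code: the prompt wording, the `generate_sql` function, and the `complete` callable are hypothetical, and `fake_complete` is a canned stand-in for a model call so that the example runs offline.

```python
from typing import Callable, Optional

# Hypothetical prompt templates; the sandbox's actual prompt text is not shown here.
IS_QUERY_PROMPT = (
    "Schema:\n{schema}\n\nObjective: {objective}\n\n"
    "Can this objective be solved with a single SELECT statement "
    "against the schema? Answer YES or NO."
)
GENERATE_QUERY_PROMPT = (
    "Schema:\n{schema}\n\nObjective: {objective}\n\n"
    "Write one SELECT statement for the platform named in the schema. "
    "Return only SQL."
)


def generate_sql(
    objective: str, schema_text: str, complete: Callable[[str], str]
) -> Optional[str]:
    """Screen the objective first, then generate a query only if it passes."""
    verdict = complete(IS_QUERY_PROMPT.format(schema=schema_text, objective=objective))
    if not verdict.strip().upper().startswith("YES"):
        return None  # screened out: not answerable with a SELECT
    sql = complete(GENERATE_QUERY_PROMPT.format(schema=schema_text, objective=objective))
    return sql.strip()


def fake_complete(prompt: str) -> str:
    """Canned stand-in for a model call (assumption, for offline use only)."""
    if "Answer YES or NO" in prompt:
        return "NO" if "delete" in prompt.lower() else "YES"
    return "SELECT customer_id, SUM(total) FROM orders GROUP BY customer_id;"


schema_text = "platform: SQLite\ntable: orders\n  column: customer_id\n  column: total"
print(generate_sql("Total spend per customer", schema_text, fake_complete))
print(generate_sql("Delete all orders", schema_text, fake_complete))
```

Separating the screening step from generation keeps each prompt short and single-purpose, and it gives the caller a clear signal (`None` in this sketch) when an objective falls outside what a SELECT statement can answer.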

Although NL2SQL demonstrates impressive capabilities, it can still produce false positives and false negatives when directions are ambiguous or conflicting. We have observed that the model responds well to semantic clarity and nuance in the objective. It can reason over the schema and generate appropriate queries based on semantic cues. It can also handle objectives that specify the desired shape of the resulting dataset, which leads to more specific and informative outputs.

However, the model has limitations when it comes to reasoning over the actual data. It has no semantic understanding of the data content and cannot apply reasoning beyond the structure of the schema. Consequently, it may generate invalid queries in certain cases.
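One way this limitation can show up is a query that is well-formed against the schema but does not match how values are actually stored. The sketch below uses Python's built-in `sqlite3` module with an invented `customers` table; the table, its rows, and both queries are hypothetical and were written by hand for illustration, not produced by the model.

```python
import sqlite3

# In-memory database with an invented table and rows (illustration only).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (customer_id INTEGER, country TEXT)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?)",
    [(1, "US"), (2, "US"), (3, "DE")],
)

# A plausible query for "how many customers are in the United States?"
# when only table and column names are known.
schema_only_sql = "SELECT COUNT(*) FROM customers WHERE country = 'United States'"

# What the stored values actually require, which the schema alone does not reveal.
data_aware_sql = "SELECT COUNT(*) FROM customers WHERE country = 'US'"

schema_only_count = conn.execute(schema_only_sql).fetchone()[0]
data_aware_count = conn.execute(data_aware_sql).fetchone()[0]
print(schema_only_count, data_aware_count)  # prints "0 2"
```

Both statements run without error, yet only the second reflects the data. Column descriptions in the schema model (for example, noting that `country` holds two-letter codes) are one place where such hints could be supplied.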

Overall, NL2SQL serves as a powerful tool for querying relational databases using natural language expressions. It provides a glimpse into the capabilities and limitations of GPT-4 in generating relevant SQL queries.