City Research Online

Example-driven semantic-similarity-aware query intent discovery: Empowering users to cross the SQL barrier through query by example

Fariha, A., Cousins, L., Mahyar, N. ORCID: 0000-0003-1781-0029 & Meliou, A. (2026). Example-driven semantic-similarity-aware query intent discovery: Empowering users to cross the SQL barrier through query by example. Information Systems, 138, article number 102687. doi: 10.1016/j.is.2026.102687

Abstract

Traditional relational data interfaces require precise structured queries over potentially complex schemas. These rigid data retrieval mechanisms pose hurdles for nonexpert users, who typically lack programming language expertise and are unfamiliar with the details of the schema. Existing tools assist in formulating queries through keyword search, query recommendation, and query auto-completion, but still require some technical expertise. An alternative method for accessing data is query by example (QBE), where users express their data exploration intent simply by providing examples of their intended data, and the system infers the intended query. However, existing QBE approaches focus on the structural similarity of the examples and ignore the richer context present in the data. As a result, they typically produce queries that are too general and fail to capture the user’s intent effectively. In this article, we present SQuID, a system that performs semantic-similarity-aware query intent discovery from user-provided example tuples. Our work makes the following contributions: (1) We design SQuID: an end-to-end system that automatically formulates select-project-join queries with optional group-by aggregation and intersection operators (a much larger class than what prior QBE techniques support) from user-provided examples, in an open-world setting. (2) We express the problem of query intent discovery using a probabilistic abduction model that infers a query as the most likely explanation of the provided examples. (3) We introduce the notion of an abduction-ready database, which precomputes semantic properties and related statistics, allowing SQuID to achieve real-time performance. (4) We present an extensive empirical evaluation on three real-world datasets, including user intent case studies, demonstrating that SQuID is efficient and effective, and that it outperforms machine learning methods as well as the state of the art in the related query reverse engineering problem. (5) We contrast SQuID with traditional SQL querying through a comparative user study, which demonstrates that users with varying expertise are significantly more effective and efficient with SQuID than with SQL. We find that SQuID eliminates the barriers of studying the database schema, formalizing task semantics, and writing syntactically correct SQL queries, and thus substantially alleviates the need for technical expertise in data exploration.
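The abstract frames query intent discovery as abduction: the system picks the query that is the most likely explanation of the user's example tuples. The code below is a minimal sketch of that general idea only, not SQuID's actual model. It assumes a hypothetical toy `movies` table, considers only conjunctions of equality filters shared by all examples, uses a likelihood in which examples are drawn uniformly from the query result, and applies a hypothetical per-filter prior penalty `alpha`. All names (`abduce_query`, `to_sql`, `MOVIES`) are illustrative. SQuID's model also draws on the semantic properties and statistics precomputed in the abduction-ready database, which this sketch omits.

```python
from itertools import combinations
from math import log

# Hypothetical toy relation (not one of the paper's datasets).
MOVIES = [
    {"title": "Alien", "genre": "SciFi", "decade": 1970},
    {"title": "Blade Runner", "genre": "SciFi", "decade": 1980},
    {"title": "The Thing", "genre": "Horror", "decade": 1980},
    {"title": "Aliens", "genre": "SciFi", "decade": 1980},
    {"title": "Heat", "genre": "Crime", "decade": 1990},
    {"title": "The Matrix", "genre": "SciFi", "decade": 1990},
    {"title": "Die Hard", "genre": "Action", "decade": 1980},
    {"title": "Fargo", "genre": "Crime", "decade": 1990},
]


def candidate_filters(examples):
    """Equality filters (attribute, value) satisfied by every example tuple."""
    first = examples[0]
    return [(a, first[a]) for a in first
            if a != "title" and all(e[a] == first[a] for e in examples)]


def evaluate(filters, table):
    """Rows satisfying the conjunction of equality filters."""
    return [row for row in table if all(row[a] == v for a, v in filters)]


def log_posterior(filters, examples, table, alpha):
    """Unnormalized log P(query | examples) = log P(examples | query) + log P(query)."""
    result = evaluate(filters, table)
    if not result or not all(e in result for e in examples):
        return float("-inf")  # query cannot explain the examples
    # Likelihood: each example drawn uniformly at random from the query result,
    # so smaller (more specific) results explain the examples better.
    log_likelihood = -len(examples) * log(len(result))
    # Prior: each extra filter multiplies the prior by alpha < 1 (favors simpler queries).
    log_prior = len(filters) * log(alpha)
    return log_likelihood + log_prior


def abduce_query(example_titles, table, alpha=0.5):
    """Return the filter set with the highest posterior (exhaustive search)."""
    examples = [row for row in table if row["title"] in example_titles]
    cands = candidate_filters(examples)
    best, best_score = frozenset(), float("-inf")
    for k in range(len(cands) + 1):
        for subset in combinations(cands, k):
            score = log_posterior(subset, examples, table, alpha)
            if score > best_score:
                best, best_score = frozenset(subset), score
    return best


def to_sql(filters):
    """Render the inferred filters as a SELECT query over the toy table."""
    where = " AND ".join(f"{a} = {v!r}" for a, v in sorted(filters))
    return "SELECT title FROM movies" + (f" WHERE {where}" if where else "")


if __name__ == "__main__":
    q = abduce_query({"Blade Runner", "Aliens"}, MOVIES)
    print(to_sql(q))
```

With `alpha=0.5`, narrowing the result from 8 rows to the 2 sci-fi films of the 1980s outweighs the prior penalty, so both filters are kept; with a much smaller `alpha` such as 0.1, the penalty dominates and the most general (empty) query wins. This illustrates the tension between overly general and overly specific explanations that the abstract attributes to purely structural QBE approaches.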

Publication Type: Article
Additional Information: © 2026 The Author(s). Published by Elsevier Ltd. This article is available under the Creative Commons CC-BY-NC-ND license and permits non-commercial use of the work as published, without adaptation or alteration provided the work is fully attributed.
Publisher Keywords: Query by example, Abductive reasoning, User studies
Subjects: Q Science > QA Mathematics > QA75 Electronic computers. Computer science
Departments: School of Science & Technology
School of Science & Technology > Department of Computer Science