Functions for basic SQL validation

Motivation

Sometimes, users may make some syntax errors that in turn makes the formatter fail. We want therefore to catch this errors before formatting and give the user a hint to where the error lies

Missing semicolon to separate queries

One mistake that would make the formatter fail is if the SQL queries are not properly delimited by semicolon. We therefore make a basic validation and look for the keyword CREATE appearing twice in a query. As this cannot happen within one query, the validation should fail and point the user out that she / he may have forgotten a semicolon

validate_semicolon[source]

validate_semicolon(s)

Validate query s by looking for forgotten semicolon. The implication could be the keyword CREATE appearing twice

assert_and_print(
    validate_semicolon(
"""
create or replace table my_table as
select asdf, qwer from table1

create view my_view as select asdf from my_table
"""
    ), {"exit_code": 1, "val_lines": [2, 5], "total_lines": 5}
)
{'exit_code': 1, 'total_lines': 5, 'val_lines': [2, 5]}

This should not throw an error because it is not CREATE TABLE / VIEW twice but CREATE TASK + CREATE TABLE

assert_and_print(
    validate_semicolon(
"""
create or replace task my_task as
create or replace table my_table as
select asdf, qwer from table1;

"""
    ), {"exit_code": 0, "total_lines": 5}
)
{'exit_code': 0, 'total_lines': 5}
assert_and_print(
    validate_semicolon(
"""
create or replace table my_table as
select asdf, qwer from table1

"""
    ), {"exit_code": 0, "total_lines": 4}
)
{'exit_code': 0, 'total_lines': 4}

Unbalanced parenthesis

validate_balanced_parenthesis[source]

validate_balanced_parenthesis(s)

Validate query s by looking for unbalanced parenthesis

exit_code:

  • 0 = balanced parenthesis
  • 1 = unbalanced parenthesis, too many (
  • 2 = unbalanced parenthesis, too many )
    validate_balanced_parenthesis("() () ( () )")
{'exit_code': 0, 'total_lines': 0}
assert_and_print(
    validate_balanced_parenthesis("() () ( () )"),
    {"exit_code": 0, "total_lines": 0}
)
{'exit_code': 0, 'total_lines': 0}
assert_and_print(
    validate_balanced_parenthesis(
"""
(
(
)
(
"""
    ),
    {"exit_code": 1, "val_lines": [2, 5], "total_lines": 5}
)
{'exit_code': 1, 'val_lines': [2, 5], 'total_lines': 5}
assert_and_print(
    validate_balanced_parenthesis(
"""
( )
-- ) ( )( ) ()
)
"""
    ),
    {"exit_code": 1, "val_lines": [4], "total_lines": 4}
)
{'exit_code': 1, 'val_lines': [4], 'total_lines': 4}
assert_and_print(
    validate_balanced_parenthesis("( )( )"),
    {"exit_code": 0, "total_lines": 0}
)
{'exit_code': 0, 'total_lines': 0}

Unbalanced case ... end

Sometimes we may forget to write the end of a case statement

validate_case_when[source]

validate_case_when(s)

Validate query s looking for unbalanced case ... end

assert_and_print(
    validate_case_when(
"""
select asdf,
case when bla bla as asdf, -- some case when in comments
qwer
from table1
"""
    ),
    {"exit_code": 1, "val_lines": [3], "total_lines": 5}
)
{'exit_code': 1, 'total_lines': 5, 'val_lines': [3]}
assert_and_print(
    validate_case_when(
"""
select asdf,
case when bla bla end as asdf, -- some case when in comments
qwer
from table1
"""
    ),
    {"exit_code": 0, "total_lines": 5}
)
{'exit_code': 0, 'total_lines': 5}
assert_and_print(
    validate_case_when(
"""
select asdf,
case when bla bla end as asdf, -- some case when in comments
qwer,
case when something else as qwer
from table1
"""
    ),
    {"exit_code": 1, "val_lines": [5], "total_lines": 6}
)
{'exit_code': 1, 'total_lines': 6, 'val_lines': [5]}
assert_and_print(
    validate_case_when(
"""
select asdf,
when bla bla end as asdf, -- some case when in comments
qwer
from table1
"""
    ),
    {"exit_code": 1, "val_lines": [3], "total_lines": 5}
)
{'exit_code': 1, 'total_lines': 5, 'val_lines': [3]}
assert_and_print(
    validate_case_when(
"""
create or replace transient table my_table as /* some table */
select asdf,
qwer,
case when asdf >=1 
and -- some comment
asdf<=10 and substr(qwer, 1, 2) = 'abc' 
and -- some comment
substr(qwer, 3, 2) = 'qwerty' then 1 /* another comment */
    else 0 end as case_field,
substr(case when asdf=1 then 'a' else 'b' end, 1, 2) as end_file,
asdf2,
asdf3
from table1
"""
    ),
    {"exit_code": 0, "total_lines": 14}
)
{'exit_code': 0, 'total_lines': 14}