May 10, 2019 · software

Insights on SQL from Hacker News

Structured Query Language, or SQL, is the de facto standard used to create, update, and query large-scale relational databases. It is ubiquitous in business analytics, data science, finance, etc. and is probably part of your daily life as well, albeit with some degree of separation... For example, SQLite, the most used database engine in the world (and by extension most used SQL database engine), is found in Android, iOS, Windows, and Mac operating systems as well as Firefox, Chrome, and Safari browsers. Don't tell me you aren't using any of those.

SQL, being almost 50 years old, has seen many alternative databases and query languages introduced over the years, with some completely abandoning the relational structure and opting for a key/value or document structure, e.g. MongoDB, and others opting for a graph structure, e.g. Neo4j. Even with this myriad of alternative database structures, many companies still stick to SQL databases, i.e. the tried and true.

As I was learning about SQL, I stumbled upon a Hacker News discussion on why SQL is one of the most valuable skills, which is surprising to me. The reasons listed in the article as to why SQL is so valuable are SQL's versatility, SQL's long-term outlook, and the SQL skill gap between workers.

Most of the Hacker News comments agree with the author's sentiment, saying that there is immense power to be gained through SQL knowledge. One commenter even says that SQL gives power not only to programmers, but to non-programmers alike, assisted by SQL's user-friendly syntax. Some believe that this user-friendliness came at the expense of verbosity, but one commenter claims otherwise, giving an example that a well composed SQL query can save pages of code in comparison to a Java or VBA implementation.

The main consensus in the comment section was that SQL was powerful but not without fault, one example being SQL's inconsistent grammars, with SELECT (columns) coming before FROM (table), and UPDATE requiring the table to come first. One commenter wished that SQL was exposed not through a text/shell interface, but through built-in functions in the programming language, thus negating the need to poorly replicate functions such as string manipulation. Others noted the need for intermediate variables within a SQL query, and how it is solved by the introduction of WITH clauses, also known as Common Table Expressions (CTEs). Another solution to the problem was to create views (or materialized views for performance) instead of sub-queries, although this results in a secondary problem of slow rebuilds when the database changes.

Finally, an interesting theory raised by Hacker News commenters as to why SQL will remain so prevalent, is the timelessness of the math that laid the foundation for SQL, specifically, the relational algebra that described how to logically apply set theory to databases and tuple structures. In effect, SQL wasn't created through trial and error, but manifested from a set of mathematical axioms which (probably) will not change. However, this doesn't mean SQL is the magic bullet for all use cases, e.g. MapReduce is a much better solution when conducting highly-distributed processing of data.

To conclude, here are some SQL tips that were mentioned in the comment section. When carrying out any operation on a production database, make sure to prepend your SQL statement with BEGIN TRANSACTION, as this allows you to undo your changes using ROLLBACK, or commit the changes using COMMIT. When analyzing datasets, play to the strengths of both SQL and programming languages, by using the highly optimized SQL queries to shrink your dataset to the minimum size possible, and then using your libraries (such as pandas) and data structures to perform the subsequent analysis. Lastly, learn advanced SQL statements that help you create more efficient queries, including lateral joins, recursive CTEs, window functions, and the CREATE TABLE AS statement.

Further Reading