Feature image Blog post Top 5 SQL challenges

Top 5 SQL challenges preventing you from mastering it - Feedback from 100+ Data professionals!

By Brian Laleye · July 19, 2022 · 12 min read

SQL is one of the most important data languages in the world.

It’s used by millions of people and it’s everywhere. It stands for Structured Query Language. It is the most common language used to interact with databases and by extension, this is the go-to language for data transformation.

All this makes SQL seem easy to use, right?

Unfortunately, using SQL is not as easy as it looks like. It’s a complex programming language and mastering it is not an easy task.

This is why I’ve contacted during several weeks more than 1,000 data professionals (Data Engineers, ETL & DBA experts, Data analysts and Data scientists) through Linkedin and asked them the following question: “Could you help me by sharing your 3 most important obstacles when using SQL?”

If you’re just getting started with SQL, then you’ll probably find that it takes some time to get used to the language and its nuances.

If you’ve been using SQL for a while, then I’m sure you’ve experienced these obstacles.

So, let’s dive into the main obstacles encountered while using classic SQL editors based on 100+ data enthusiasts’ answers!

Top 5 challenges shared by 100+ Data Professionals

There are many reasons why SQL is hard to master:

  • It’s a declarative language, which means that it doesn’t contain procedural constructs like loops and conditionals.
  • SQL is a relational language, which means that it has no data types or object orientation.
  • The vocabulary of SQL is very large and diverse; you have to learn all about relational algebra, SQL standards, database management systems (DBMS) and other topics before you can even start writing queries. Especially, when it comes to cleaning and transforming data from different databases.
 

Let’s look at the main obstacles that make learning SQL so difficult.

SQL dialects and syntax: Learning SQL dialects & syntax is cumbersome

The first obstacle is the syntax.

It’s a shame that so many people have problems with SQL because it has a very simple syntax and it’s not that hard to learn.

In fact, you can learn it in less than a day if you have someone showing you the basic commands.

SQL has a very complex syntax because it was designed to be relational, which means you can join data from multiple tables in the same database.

But this also makes it hard to understand if you’re not familiar with it.

Examples of answers from Data professionals:

“Long complex query syntax makes it difficult to come back to.”

“Difference of syntax depending on the type of database.”

“Missing some functions or the code of tools (Snowflake, BigQuery, PostgreSQL) have a slightly specific syntax for the same object.”

“They are not flexible and hard to remember the syntax…”

“Initially it was a little different to also learn the syntax of the tool I use (because the language is similar), but for being different I was lost in the beginning.”

“SQL now offers so many choices for grouping and aggregation that even experienced users can become confused.”

Interoperability of versions: Need to learn and adapt to each version

The second obstacle is the fact that there are many versions of SQL.

There are two main versions of SQL: 

  • ANSI SQL (the most common version)
  • ISO SQL (the international standard)
 

Additionally, there are other variations that include MySQL, Microsoft Access, Oracle and PostgreSQL etc., which means that if you want to learn how to use these different tools, then there will be different syntaxes that you need to know about.

This makes mastering SQL even harder because you have to learn all these different variations at once!

If you are just starting out, then you will want to start with basic SQL queries and then move on to MySQL or PostgreSQL.

However, if you want to go beyond that and use other database engines such as Oracle or Microsoft SQL Server, then you will need more training and experience in order to make those work properly. 

Examples of answers from Data professionals:

“Different datatypes and codepages(collating) when you merge data: e.g. datetime in SQL Server has 3 digits offset, in DB2 we have 6. You have to pay attention in every type, depending on database engine and casting data.”

“Versioning issues and difficulties associated when co-working are also major obstacles.”

“Some functions are specific for SQL dialect - e.g. in SQL server there are no RegEx string operation embedded. You must write some code (function) to solve it. It is easy, when you have sufficient rights to database, but try to do it without writing function…”

Query speed and efficiency

The SQL language has been around for decades and is still a popular way to query data in a relational database.

But the language wasn’t designed for today’s big data applications.

SQL databases are optimized to work with predefined, fixed-length columns.

This makes it easy to write a query that retrieves the data you need, but it doesn’t support queries that require dynamic filtering or sorting.

While many database management systems have ways to optimize these queries, they are rarely as efficient as they could be because they don’t have information about how your data is structured or what operations you will perform on it.

 

Thus, the biggest problem with SQL is that it’s not very efficient.

If you’re trying to run a query across millions of rows, it can take forever to return results. That’s because each row needs to be read from disk and then processed by the CPU before returning the next row.

Consequently, many developers don’t know how to use it properly or efficiently because they just don’t have enough experience with SQL queries yet.

They might have spent some time working with databases but they still don’t know how to use them properly and this can lead to problems later on when trying to scale.

Examples of answers from Data professionals:

“Recent graduates will face a problem in writing the query in a efficient way as it need some time for them to get experience.”

“DB are resource expensive to store procedures and views.”

“High CPU usage which can occur for different reasons such as Poorly-Written Queries, Temporary table usage, extreme compilations and recompilations, system Threads (spikes), etc.”

“Limitations of memory querying.”

“Handling data volumes (exponential growth).”

“SQL not to be very effective when it comes to large and distributed data.”

“Subqueries that may slow down the performance.”

“Performances are important when operating with quite amount of data. A simple working query is not the same as a good performant state-of-the-art query. Experience required in this field too.”

“Performance factors are not obvious, since the SQL optimizer calculates the data retrieval method, and folks get poorly performing queries when they try to do too much in one step.”

Understanding table structure and table relationships in SQL is hard

One of the most common mistakes that could be made when learning SQL is to jump right in and start writing queries without learning the basics first.

It’s like trying to ride a bike without ever having pedaled one before.

SQL is a declarative language, which means it doesn’t specify how to perform operations on data.

Instead, it specifies what operations should be performed and leaves it up to the database system to figure out how to do it.

This is great in theory because it allows databases to be portable across platforms, but in practice it means that SQL queries can become very complex because they need to take into account all possible ways that data can be related.

For example, if you’re trying to find all sales made by user A and then filter those sales by date range and product category, you’ll have to write a separate query for each way that user A’s sales data could be related (e.g., by date range or product category).

In contrast, a procedural language would allow you to write one query that says something like “Find all sales made by user A and then filter them by date range.”

This would work because procedural languages give you more control over how operations are performed.

Examples of answers from Data professionals:

“A visualized way of being able to join tables based on columns needed and have the interface be able to manage all that.”

“Many tables available and not a clear view of how they are connected.”

“Finding the right table, working out how the table is meant to work to check it does what I need.”

“When reading / writing a SQL query, you would typically start from the middle, then go to the end and then read the very beginning. It's unlike a programming code.”

“Another thing would be to have a preview of the interconnection, or suggestions on what are other tables that can potentially be useful for joining and getting additional data from. Especially, when I might not know what other information could be relevant and available in e.g. a full data warehouse.”

“One of the most important points is to know how to connect the tables in such a way that the estimated cost is optimal. This is not always trivial since, in large clients, the maintenance of the applications is usually done by patching, including fields in the joins, without doing a previous analysis and this is then complicated to 'fix'.”

Non-intuitive user interface

The rigid structure of SQL statements makes it difficult to write long scripts.

You have to break up long scripts into multiple statements, which can make them more complex than necessary and harder to read.

SQL doesn’t provide native support for functions or procedures, so you must use stored procedures written in other languages such as Java or C++ instead of using native SQL functions such as SUM() or AVG().

This can make your queries harder to maintain because they require changes throughout your application whenever there are updates made to the stored procedure codebase.

 

SQL doesn’t provide a way to pass parameters between stored procedures or database triggers (which run automatically after certain events occur), so developers often resort to using global variables, which makes their code harder to understand and maintain over time as well as increasing the risk of bugs due to shared state between different modules in an application.

Thus, in most cases, the interface consists of lots of text with no clear structure.

This can be really confusing for beginners and even for more experienced users who have never seen this type of interface before.

Examples of answers from Data professionals:

“The biggest challenge for me - speaking as a developer now - in using SQL is that the interface between SQL and other languages seems really clunky.”

“The interface to be not very user friendly.”

“User interface and its lack of visibility into errors, finding schemas.”

Another challenge: Sharing knowledge with teammates

Sharing your queries and insights with your teammates can be very beneficial.

It helps you to understand the problem from different perspectives and get business context, and it also increases the number of minds working on the same problem.

It should be possible to do the same thing with queries, but this is not the case.

Why is it challenging to share knowledge with teammates when using SQL queries? 

Here are the main reasons: 

  • Queries are written individually not by teams
  • Queries are not standardized
  • Queries are not comparable
 

Another challenge is that people don’t have any easy way of comparing two different queries (like git diff does for code) since there is an infinite way to write a query in SQL.

This makes it difficult for team members who want to collaborate on projects or who want to learn from each other.

Examples of answers from Data professionals:

“Scalability and portability to non technical users/other users when documentation is scarce or original owners are no longer there.”

“The need to have business context.”

“Discrepancies when different people write different queries.”

“The main challenge I face with SQL is understanding the existing query if there are hundreds of line and if no comments are added and the other will be fine tuning the query.”

“Analyzing the complex queries written by people who have left the organization and optimizing it while revamping the tableau dashboards.”

“If you have also poor documentation without collating description, you must guess. It is really a Babel Tower if you work on many databases with local languages support.”

“Readability of complex queries written in the past.”

At RestApp, we have built the Data Activation Platform for modern data teams and designed our next-gen data modeling editor to be intuitive and easy-to-use.

If you’re interested in starting with your data journey, check out the RestApp website or book a demo.

Category

Share

Subscribe to our newsletter

Brian Laleye
Brian Laleye
Brian is the co-founder of RestApp. He is a technology evangelist and passionate about innovation. He has an extensive experience focusing on modern data stack.
Share this article
Subscribe to our newsletter
Ready to experience data activation
without code?