top 10 sql function

Top 10 SQL functions to clean your data in No Code.
A Complete Guide with Examples

By Othmane Lamrani · May 3, 2022 · 13 min read

In this guide, we will look at the Top 10 SQL functions to clean your data in No Code.

A Complete Guide with Examples to help you get started.

We’ll discuss why data cleansing is important and how you should go with it.

If you’re fed up writing too many queries or if you have never used SQL, you will find in this post how to use drag-and-drop No Code SQL functions in your workflow to clean large amounts of data.

Here’s a list of 10 SQL functions that you can consider using to clean your data in No Code.

You may have heard some of these terms and wondered what they meant prior to reading this article so let’s begin!

Why is data cleansing important?

Have you ever wanted to drop the duplicates in your data set? Or maybe you wanted to remove the non-numeric values from your column?

I know that’s why you’re here.

You have a data set with dirty rows and you want to clean that data as quickly as possible to activate it.

Well, cleaning data is not an easy task. If you have a small dataset, then you can easily do it by looking at it for some time and then removing the unwanted rows manually.

But what if your dataset is big? What if you have millions of rows in the dataset and cleaning them manually is not possible?

What if you want to combine multiple datasets from different data sources?

You need some SQL functions to help you out of this situation.

So let’s see which functions will help us easily clean our datasets with a No Code editor.

We are going to take this ‘raw’ Order table database with some ‘errors’ where we are going to operate SQL functions to clean the data.

scheme sql table
ordnumordamountadvanceamountorddatecustcodeagentcode
20010010006002022/05/02C00013A003
20011030035002022/04/18C00019A010
20010745009002022/03/02C00007A010
20011220004002022/05/02C00016A011
20011340006002022/04/09C00022A012
20010220093002022/05/02C00012A012
200114350020002022/05/07C00002a1006
20012225004002022/04/09C00003a1006
2001185001002022/05/05C00023a1006
2001195001102022/04/22C00025a1006
2001195001102022/04/22C00025a1006
20011120004002022/03/02C00011  A0199
200222  2022/03/12C00033A003

 

Column operations (Select columns, Sort columns, Drop columns, Rename columns)

Columns operations are a set of simple functions that let you manipulate the columns of your data frame.

They are used to select, sort, drop, and rename the columns of your data frame.

Column operations are very important for cleaning up messy data, so it’s good to become familiar with them.

The four basic column operations are:

  • SELECT COLUMNS: To clean up your columns, there are ways to select only some of them and sort them.
  • SORT COLUMNS: This is quite simple, you write something like this and select(col1, col2) from the table and the result will be a new table with those two columns, in that order.
  • DROP COLUMNS: Or if you prefer to drop some columns, you can write something like this drop(col4, col5) from the table and the result will be a new table with those columns deleted.
  • RENAME COLUMNS: To rename them you can say something like this rename(col1 = col_X) from the table and the result will be a new table with col1 renamed as col_X.
 

Example Select columns:

We want to calculate the running average revenue and total revenue for each agent from the first of April 2022.

SQL Query:

Select columns SQL query

No Code:

select function nocode

Results: 

Result select

Handle duplicates (Distinct, Drop duplicate)

Removing duplicate rows is a common thing that you will have to do in your data cleaning process.

In fact, the first thing that I do when I am given a new dataset is to check for duplicate rows.

This can be done with SQL queries as well. The queries below will get rid of all the duplicates, returning only unique rows. How does it work?

You can use the Drop duplicate function, it returns the table without the duplicated element.

Besides, you can also use the DISTINCT keyword which returns only distinct (different) values.

So, when you put DISTINCT before the SELECT statement, it returns only one row for each group of duplicates (here the group of duplicates is defined by all columns).

We want to calculate the running average revenue and total revenue for each agent.

Moreover, we want to uppercase the letters of the agent code a1006.

Example Drop duplicate:

We want to display the columns ordnum, ordamount, orddate, agentcode by deleting the duplicates  (ordnum 200119 is duplicated in the table).

SQL Query:

				
					SELECT DISTINCT(ordnum),
ordamount,
orddate,
agentcode, 
FROM Purchase

				
			
Select Distinct SQL query

No Code:

Nocode drop duplicate

Results: 

Remove spaces (Trim, Left trim, Right trim)

When dealing with data, it’s important to remove extra spaces in the relevant fields.

This is especially true when using SQL functions to clean your data.

In some cases, you’ll have redundant spaces between words, or even at the beginning and end of the string.

You can use a number of methods to remove these unwanted spaces for a cleaner dataset.

We’ll go through the “remove spaces” family of functions:

  • TRIM: Removes all leading and trailing spaces from a string.
  • LTRIM (aka Left Trim): Removes all leading spaces from a string.
  • RTRIM (aka Right Trim): Removes all trailing spaces from a string.
 

Example Left trim:

We want to calculate the running average revenue and total revenue for each agent.

Besides, we want to remove the space in the cell of the agent A0199.

SQL Query:

				
					SELECT orddate, LTRIM(agentcod), AVG(ordamount) 
OVER (
   PARTITION BY agentcode
   ORDER BY orddate
) runningagent_avgrevenue,
   SUM (ordamount) OVER (
       PARTITION BY agentcode
       ORDER BY orddate
   ) running_agent_total_revenue
FROM Purchase

				
			
Left Trim SQL query

No Code:

left trim nocode

Results: 

Results LTRIM

Cleaning Functions for Number fields (Is numeric, Convert, Round, Trunc)

The following functions are used to clean your data that are in number format.

This can be done by removing all invalid values or by converting a number to a string.

  • ISNUMERIC: Returns true if the expression has a numeric value and false if it doesn’t.
  • CONVERT: Converts a character string into a number. It takes as argument the character string to convert and returns its numerical value.
  • ROUND: Rounds numeric values according to the rule specified by the second argument (which must be an integer between 0 and 7) or if it is missing it rounds towards zero.
  • TRUNC: Truncates an expression to a specified length (by default 6).
 

Example Round:

We want to calculate the running average revenue and total revenue for each agent.

In addition, we want to round to 0 decimal the average revenue.

SQL Query:

				
					SELECT orddate, agentcode, ROUND(AVG(ordamount)) OVER (
   PARTITION BY agent_code
   ORDER BY ord_date
) running_agent_avg_revenue,
   SUM (ord_amount) OVER (
       PARTITION BY agent_code
       ORDER BY ord_date
   ) running_agent_total_revenue
FROM Purchase

				
			
Round SQL query

No Code:

no code round scale

Results: 

Result ROUND

Cleaning Functions for Text fields (Capitalize, Lower, Upper, Sub strings, Reverse, Split string)

There are a few functions available in SQL that will allow you to clean up your text values and make them consistent.

  • CAPITALIZE: Capitalizes the first letter of each word. You can use this function if you want your column data to be displayed in such a manner.
  • LOWER: Converts all the string values in the column to lowercase letters.
  • UPPER: Converts all the string values in the column to uppercase letters.
  • SUBSTRING: Returns a portion of a text based on starting and ending positions. You can use substring if you want to extract a portion of string from a given column value.
  • REVERSE: Reverse strings all the characters in reverse order.
  • SPLIT STRING: Breaks up text into multiple rows using a delimiter (delimiter is a character that separates words).
 

Example Upper:

We want to calculate the running average revenue and total revenue for each agent. Moreover, we want to uppercase the letters of the agent code a1006.

SQL Query:

				
					SELECT orddate, UPPER(agentcode), AVG(ordamount) OVER (
   PARTITION BY agentcode
   ORDER BY orddate
) running_agent_avg_revenue,
   SUM (ordamount) OVER (
       PARTITION BY agentcode
       ORDER BY orddate
   ) running_agent_total_revenue
FROM Purchase

				
			
Upper SQL query

No Code:

Upper function

Results: 

result upper

Cleaning Functions for Date fields (Cast to date, Convert date, Extract)

There are three functions that are very helpful in cleaning up data fields that include dates.

These functions help you if your dates are stored as strings, or if you need to change the format of a date field.

  • CAST TO DATE: Converts a string or integer into a date. If you have your date written as “DD/MM/YYYY”, then you will need to use this function.
  • CONVERT TO DATE: This is similar to cast to date, but instead of requiring you to enter the exact format of your column, it uses the current language and locale on your computer to determine how the strings should be interpreted.
  • EXTRACT: This extracts only certain parts of a given timestamp, such as year, month, or day.
 

Example Cast to date, Extract:

We want to display the columns ordnum, ordamount, orddate.

In addition, we want to transform orddate variable(string) into date so we can add a column year.

SQL Query:

				
					SELECT ordnun,
ordamount,
orddate,
CAST(orddate AS DATE)as orddate2,
EXTRACT CAST(orddate AS DATE) as Year

FROM Purchase

				
			
Cast to date SQL query

No Code:

extract function

Results: 

Extract Result

Handle missing values (Fill NA, Drop NA, If null)

Let’s see how you can use SQL functions to handle missing values (Fill NA, Drop NA, and If null): 

  • Fill NA: Fills missing values with a specified value.
  • Drop NA: Drops all rows where there are any missing values.
  • IF NULL: Returns one of two results based on whether or not a column is null.
 

Example If null:

We want to display the columns ordnum, ordamount, orddate, agentcode by replacing the empty value by 0 of ordamount for the ordnum 200222).

SQL Query:

				
					SELECT ordnun,
IFNULL(ordamount,0),
orddate,
agentcode, 
FROM Purchase

				
			
If null SQL query

No Code:

ifnull nocode

Results: 

Coding verification (Encode, Decode)

When you first use a new database or table, it will often contain columns that are encoded incorrectly.

This problem can easily be solved with the DECODE function, which takes two arguments: the encoded value and the desired value.

  • DECODE: Decodes the inputColumn (binary data) in the charset type.
  • ENCODE: Encodes column input in binary data.

Any data type (Left pad, Right pad, Length)

One of the problems you might run into with columnar data like that in a table is that its entries may have inconsistent lengths.

This can make it difficult to read if each row has a different number of characters.

There are three functions that can help you to handle this problem.

  • LEFT PAD: is used to add a specified number of characters to the left side of a text. The syntax for the function is LEFT PAD(string, length, character). So if I wanted to add “0” characters to all IDs in my table until they were four digits long, I would use LEFT PAD(ID, 4, 0).
 
  • RIGHT PAD:  is used to add a specified number of characters to the right side of a string. The syntax for the function is RIGHT PAD(string, length, character). So if I wanted to add “0” characters until all IDs in my table were four digits long, I would use RIGHT PAD(ID, 4, 0).
 
  • LENGTH: is used to return the length (number of characters) for a given text. The syntax for the function is LENGTH(string).
 

Example Length:

We want to display the columns ordnum, ordamount, orddate, custocde and the length of characters of the variable custcode.

SQL Query:

				
					SELECT ordnun,
ordamount,
orddate,
custcode, 
Length(custcode) as custcode_length
FROM Purchase

				
			
Length SQL query

No Code:

Length nocode

Results: 

Length results

Bonus: 3 SQL Advanced functions to normalize your data, still in No Code: JSON Normalize, Pivot, Unpivot

  • JSON NORMALIZE

It is an important function that allows you to transform a JSON string into a normalized format

This function is used to convert paths and arrays into single-value elements, or vice versa. It also removes any trailing commas from objects and arrays.

  • PIVOT 

It allows you to transpose rows and columns in a table so that each value appears only once for each group of values with the same grouping identifier.

This technique is commonly used when exporting data from a database into a spreadsheet or report as it allows us to easily analyze large amounts of data at once.

  • UNPIVOT 

It is the opposite of Pivot. It returns rotating columns of a table-valued expression into column values.

 

If you’re interested in starting with a Drag&Drop SQL Editor, check out the RestApp website or book a demo.

Share
Share on linkedin
Share on twitter
Share on facebook

Subscribe to our newsletter

Othmane Lamrani
Othmane Lamrani
Othmane is the Growth Manager of RestApp. He's passionate about Entrepreneurship, Data, and Growth. His previous experience in data-position made him have a data-driven vision of acquisition marketing.
Share this article
Share on linkedin
Share on twitter
Share on facebook
Subscribe to our newsletter
Ready to experience data activation
without code?

Product

Activate and combine any data sources without code

Transform your data with our No Code SQL editor

Automate your operational analytics with your business apps

Share your pipelines and collaborate smarter with your teammates

Discover how Data Transformation really means

Find out the new data architecture concept of Data Mesh

Learn how Operational Analytics actives your data

Learn how to deliver great customer experience with real-time data

Solutions

Empower your teammates

Prevent churn

Generate more revenue

Run better campaigns

Drive your metrics

Resources

Stay always up to date on data activation

Get access to tips and tricks to model your data

Discover our always evolving and regularly updated documentation

Find out how we keep your data safe