
Top 10 SQL functions
to clean your data in No Code.
A Complete Guide with Examples
By Brian Laleye · October 28, 2022 · 13 min read
In this guide, we will look at the Top 10 SQL functions to clean your data in No Code.
A Complete Guide with Examples to help you get started.
We’ll discuss why data cleansing is important and how you should go with it.
If you’re fed up writing too many queries or if you have never used SQL, you will find in this post how to use drag-and-drop No Code SQL functions in your workflow to clean large amounts of data.
Here’s a list of 10 SQL functions that you can consider using to clean your data in No Code.
You may have heard some of these terms and wondered what they meant prior to reading this article so let’s begin!
Summary
Why is data cleansing important?
Have you ever wanted to drop the duplicates in your data set? Or maybe you wanted to remove the non-numeric values from your column?
I know that’s why you’re here.
You have a data set with dirty rows and you want to clean that data as quickly as possible to activate it.
Well, cleaning data is not an easy task. If you have a small dataset, then you can easily do it by looking at it for some time and then removing the unwanted rows manually.
But what if your dataset is big? What if you have millions of rows in the dataset and cleaning them manually is not possible?
What if you want to combine multiple datasets from different data sources?
You need some SQL functions to help you out of this situation.
So let’s see which functions will help us easily clean our datasets with a No Code editor.
We are going to take this ‘raw’ Order table database with some ‘errors’ where we are going to operate SQL functions to clean the data.
Column operations (Select columns, Sort columns, Drop columns, Rename columns)
Columns operations are a set of simple functions that let you manipulate the columns of your data frame.
They are used to select, sort, drop, and rename the columns of your data frame.
Column operations are very important for cleaning up messy data, so it’s good to become familiar with them.
The four basic column operations are:
- SELECT COLUMNS: To clean up your columns, there are ways to select only some of them and sort them.
- SORT COLUMNS: This is quite simple, you write something like this and select(col1, col2) from the table and the result will be a new table with those two columns, in that order.
- DROP COLUMNS: Or if you prefer to drop some columns, you can write something like this drop(col4, col5) from the table and the result will be a new table with those columns deleted.
- RENAME COLUMNS: To rename them you can say something like this rename(col1 = col_X) from the table and the result will be a new table with col1 renamed as col_X.
Example Select columns:
We want to calculate the running average revenue and total revenue for each agent from the first of April 2022.
SQL Query:
SELECT orddate, agentcode, AVG(ordamount) OVER(
PARTITION BY agentcode
ORDER BY orddate
) running_agent_avg_revenue,
SUM(ordamount) OVER (
PARTITION BY agentcode
ORDER BY orddate
) running_agent_total_revenue
FROM Purchase
WHERE orddate > '2022/04/01'
No Code:
Results:
Handle duplicates (Distinct, Drop duplicate)
Removing duplicate rows is a common thing that you will have to do in your data cleaning process.
In fact, the first thing that I do when I am given a new dataset is to check for duplicate rows.
This can be done with SQL queries as well. The queries below will get rid of all the duplicates, returning only unique rows. How does it work?
You can use the Drop duplicate function, it returns the table without the duplicated element.
Besides, you can also use the DISTINCT keyword which returns only distinct (different) values.
So, when you put DISTINCT before the SELECT statement, it returns only one row for each group of duplicates (here the group of duplicates is defined by all columns).
We want to calculate the running average revenue and total revenue for each agent.
Moreover, we want to uppercase the letters of the agent code a1006.
Example Drop duplicate:
We want to display the columns ordnum, ordamount, orddate, agentcode by deleting the duplicates (ordnum 200119 is duplicated in the table).
SQL Query:
SELECT DISTINCT(ordnum),
ordamount,
orddate,
agentcode,
FROM Purchase
No Code:
Results:
Remove spaces (Trim, Left trim, Right trim)
When dealing with data, it’s important to remove extra spaces in the relevant fields.
This is especially true when using SQL functions to clean your data.
In some cases, you’ll have redundant spaces between words, or even at the beginning and end of the string.
You can use a number of methods to remove these unwanted spaces for a cleaner dataset.
We’ll go through the “remove spaces” family of functions:
- TRIM: Removes all leading and trailing spaces from a string.
- LTRIM (aka Left Trim): Removes all leading spaces from a string.
- RTRIM (aka Right Trim): Removes all trailing spaces from a string.
Example Left trim:
We want to calculate the running average revenue and total revenue for each agent.
Besides, we want to remove the space in the cell of the agent A0199.
SQL Query:
SELECT orddate, LTRIM(agentcod), AVG(ordamount)
OVER (
PARTITION BY agentcode
ORDER BY orddate
) runningagent_avgrevenue,
SUM (ordamount) OVER (
PARTITION BY agentcode
ORDER BY orddate
) running_agent_total_revenue
FROM Purchase
No Code:
Results:
Cleaning Functions for Number fields (Is numeric, Convert, Round, Trunc)
The following functions are used to clean your data that are in number format.
This can be done by removing all invalid values or by converting a number to a string.
- ISNUMERIC: Returns true if the expression has a numeric value and false if it doesn’t.
- CONVERT: Converts a character string into a number. It takes as argument the character string to convert and returns its numerical value.
- ROUND: Rounds numeric values according to the rule specified by the second argument (which must be an integer between 0 and 7) or if it is missing it rounds towards zero.
- TRUNC: Truncates an expression to a specified length (by default 6).
Example Round:
We want to calculate the running average revenue and total revenue for each agent.
In addition, we want to round to 0 decimal the average revenue.
SQL Query:
SELECT orddate, agentcode, ROUND(AVG(ordamount)) OVER (
PARTITION BY agent_code
ORDER BY ord_date
) running_agent_avg_revenue,
SUM (ord_amount) OVER (
PARTITION BY agent_code
ORDER BY ord_date
) running_agent_total_revenue
FROM Purchase
No Code:
Results:
With RestApp, be your team’s data hero
by activating insights from raw data sources.
Cleaning Functions for Text fields (Capitalize, Lower, Upper, Sub strings, Reverse, Split string)
There are a few functions available in SQL that will allow you to clean up your text values and make them consistent.
- CAPITALIZE: Capitalizes the first letter of each word. You can use this function if you want your column data to be displayed in such a manner.
- LOWER: Converts all the string values in the column to lowercase letters.
- UPPER: Converts all the string values in the column to uppercase letters.
- SUBSTRING: Returns a portion of a text based on starting and ending positions. You can use substring if you want to extract a portion of string from a given column value.
- REVERSE: Reverse strings all the characters in reverse order.
- SPLIT STRING: Breaks up text into multiple rows using a delimiter (delimiter is a character that separates words).
Example Upper:
We want to calculate the running average revenue and total revenue for each agent. Moreover, we want to uppercase the letters of the agent code a1006.
SQL Query:
SELECT orddate, UPPER(agentcode), AVG(ordamount) OVER (
PARTITION BY agentcode
ORDER BY orddate
) running_agent_avg_revenue,
SUM (ordamount) OVER (
PARTITION BY agentcode
ORDER BY orddate
) running_agent_total_revenue
FROM Purchase
No Code:
Results:
Cleaning Functions for Date fields (Cast to date, Convert date, Extract)
There are three functions that are very helpful in cleaning up data fields that include dates.
These functions help you if your dates are stored as strings, or if you need to change the format of a date field.
- CAST TO DATE: Converts a string or integer into a date. If you have your date written as “DD/MM/YYYY”, then you will need to use this function.
- CONVERT TO DATE: This is similar to cast to date, but instead of requiring you to enter the exact format of your column, it uses the current language and locale on your computer to determine how the strings should be interpreted.
- EXTRACT: This extracts only certain parts of a given timestamp, such as year, month, or day.
Example Cast to date, Extract:
We want to display the columns ordnum, ordamount, orddate.
In addition, we want to transform orddate variable(string) into date so we can add a column year.
SQL Query:
SELECT ordnun,
ordamount,
orddate,
CAST(orddate AS DATE)as orddate2,
EXTRACT CAST(orddate AS DATE) as Year
FROM Purchase
No Code:
Results:
Handle missing values (Fill NA, Drop NA, If null)
Let’s see how you can use SQL functions to handle missing values (Fill NA, Drop NA, and If null):
- Fill NA: Fills missing values with a specified value.
- Drop NA: Drops all rows where there are any missing values.
- IF NULL: Returns one of two results based on whether or not a column is null.
Example If null:
We want to display the columns ordnum, ordamount, orddate, agentcode by replacing the empty value by 0 of ordamount for the ordnum 200222).
SQL Query:
SELECT ordnun,
IFNULL(ordamount,0),
orddate,
agentcode,
FROM Purchase
No Code:
Results:
Coding verification (Encode, Decode)
When you first use a new database or table, it will often contain columns that are encoded incorrectly.
This problem can easily be solved with the DECODE function, which takes two arguments: the encoded value and the desired value.
- DECODE: Decodes the inputColumn (binary data) in the charset type.
- ENCODE: Encodes column input in binary data.
Any data type (Left pad, Right pad, Length)
One of the problems you might run into with columnar data like that in a table is that its entries may have inconsistent lengths.
This can make it difficult to read if each row has a different number of characters.
There are three functions that can help you to handle this problem.
- LEFT PAD: is used to add a specified number of characters to the left side of a text. The syntax for the function is LEFT PAD(string, length, character). So if I wanted to add “0” characters to all IDs in my table until they were four digits long, I would use LEFT PAD(ID, 4, 0).
- RIGHT PAD: is used to add a specified number of characters to the right side of a string. The syntax for the function is RIGHT PAD(string, length, character). So if I wanted to add “0” characters until all IDs in my table were four digits long, I would use RIGHT PAD(ID, 4, 0).
- LENGTH: is used to return the length (number of characters) for a given text. The syntax for the function is LENGTH(string).
Example Length:
We want to display the columns ordnum, ordamount, orddate, custocde and the length of characters of the variable custcode.
SQL Query:
SELECT ordnun,
ordamount,
orddate,
custcode,
Length(custcode) as custcode_length
FROM Purchase
No Code:
Results:
Bonus: 3 SQL Advanced functions to normalize your data, still in No Code: JSON Normalize, Pivot, Unpivot
- JSON NORMALIZE
It is an important function that allows you to transform a JSON string into a normalized format.
This function is used to convert paths and arrays into single-value elements, or vice versa. It also removes any trailing commas from objects and arrays.
- PIVOT
It allows you to transpose rows and columns in a table so that each value appears only once for each group of values with the same grouping identifier.
This technique is commonly used when exporting data from a database into a spreadsheet or report as it allows us to easily analyze large amounts of data at once.
- UNPIVOT
It is the opposite of Pivot. It returns rotating columns of a table-valued expression into column values.
Conclusion
At RestApp, we’re building a Data Activation Platform for modern data teams with our large built-in library of connectors to databases, data warehouses and business apps.
We have designed our next-gen data modeling editor to be intuitive and easy to use.
If you’re interested in starting with connecting all your favorite tools, check out the RestApp website or try it for free with a sample dataset.
Discover the next-gen end-to-end data pipeline platform with our built-in No Code SQL, Python and NoSQL functions. Data modeling has never been easier and safer thanks to the No Code revolution, so you can simply create your data pipelines with drag-and-drop functions and stop wasting your time by coding what can now be done in minutes!
Discover Data modeling without code with our 14-day free trial!
Category
Subscribe to our newsletter
Related articles

- Brian Laleye
- August 9, 2022
· 9 min read

- Brian Laleye
- June 28, 2022
· 8 min read

- Othmane Lamrani
- May 10, 2022
· 6 min read
Build better data pipelines
With RestApp, be your team’s data hero by activating insights from raw data sources.