Stay up to date to the latest news
on Data Activation
By Othmane Lamrani · May 3, 2022 · 12 min read
In this guide, we will look at the Top 10 SQL functions to clean your data in No Code.
A Complete Guide with Examples to help you get started.
We’ll discuss why data cleansing is important and how you should go with it.
If you’re fed up writing too many queries or if you have never used SQL, you will find in this post how to use drag-and-drop No-Code SQL functions in your workflow to clean large amounts of data.
Here’s a list of 10 SQL functions that you can consider using to clean your data in No Code.
You may have heard some of these terms and wondered what they meant prior to reading this article so let’s begin!
Have you ever wanted to drop the duplicates in your data set? Or maybe you wanted to remove the non-numeric values from your column?
I know that’s why you’re here.
You have a data set with dirty rows and you want to clean that data as quickly as possible to activate it.
Well, cleaning data is not an easy task. If you have a small dataset, then you can easily do it by looking at it for some time and then removing the unwanted rows manually.
But what if your dataset is big? What if you have millions of rows in the dataset and cleaning them manually is not possible?
What if you want to combine multiple datasets from different data sources?
You need some SQL functions to help you out of this situation.
So let’s see which functions will help us easily clean our datasets with a No-Code editor.
We are going to take this ‘raw’ Order table database with some ‘errors’ where we are going to operate SQL functions to clean the data.
ordnum | ordamount | advanceamount | orddate | custcode | agentcode |
200100 | 1000 | 600 | 2022/05/02 | C00013 | A003 |
200110 | 3003 | 500 | 2022/04/18 | C00019 | A010 |
200107 | 4500 | 900 | 2022/03/02 | C00007 | A010 |
200112 | 2000 | 400 | 2022/05/02 | C00016 | A011 |
200113 | 4000 | 600 | 2022/04/09 | C00022 | A012 |
200102 | 2009 | 300 | 2022/05/02 | C00012 | A012 |
200114 | 3500 | 2000 | 2022/05/07 | C00002 | a1006 |
200122 | 2500 | 400 | 2022/04/09 | C00003 | a1006 |
200118 | 500 | 100 | 2022/05/05 | C00023 | a1006 |
200119 | 500 | 110 | 2022/04/22 | C00025 | a1006 |
200119 | 500 | 110 | 2022/04/22 | C00025 | a1006 |
200111 | 2000 | 400 | 2022/03/02 | C00011 | A0199 |
200222 | 2022/03/12 | C00033 | A003 |
Columns operations are a set of simple functions that let you manipulate the columns of your data frame.
They are used to select, sort, drop, and rename the columns of your data frame.
Column operations are very important for cleaning up messy data, so it’s good to become familiar with them.
The four basic column operations are:
Example Select:
We want to calculate the running average revenue and total revenue for each agent from the first of April 2022.
SQL Query:
SELECT orddate, agentcode, AVG(ordamount) OVER (
PARTITION BY agentcode
ORDER BY orddate
) running_agent_avg_revenue,
SUM (ordamount) OVER (
PARTITION BY agentcode
ORDER BY orddate
) running_agent_total_revenue
FROM Purchase
WHERE orddate > ‘2022/04/01’
No-code:
Results
Removing duplicate rows is a common thing that you will have to do in your data cleaning process.
In fact, the first thing that I do when I am given a new dataset is to check for duplicate rows.
This can be done with SQL queries as well. The queries below will get rid of all the duplicates, returning only unique rows. How does it work?
You can use the Drop Duplicate function, it returns the table without the duplicated element.
Besides, you can also use the DISTINCT keyword which returns only distinct (different) values.
So, when you put DISTINCT before the SELECT statement, it returns only one row for each group of duplicates (here the group of duplicates is defined by all columns).
We want to calculate the running average revenue and total revenue for each agent.
Moreover, we want to uppercase the letters of the agent code a1006
Example Drop duplicate:
We want to display the columns ordnum, ordamount, orddate, agentcode by deleting the duplicates (ordnum 200119 is duplicated in the table).
SQL Query:
SELECT DISTINCT(ordnum),
ordamount,
orddate,
agentcode,
FROM Purchase
No-code:
Results
When dealing with data, it’s important to remove extra spaces in the relevant fields.
This is especially true when using SQL functions to clean your data.
In some cases, you’ll have redundant spaces between words, or even at the beginning and end of the string.
You can use a number of methods to remove these unwanted spaces for a cleaner dataset.
We’ll go through the “remove spaces” family of functions:
Example Left Trim:
We want to calculate the running average revenue and total revenue for each agent
Besides, we want to remove the space in the cell of the agent A0199.
SQL Query:
SELECT orddate, LTRIM(agentcod), AVG(ordamount)
OVER (
PARTITION BY agentcode
ORDER BY orddate
) runningagent_avgrevenue,
SUM (ordamount) OVER (
PARTITION BY agentcode
ORDER BY orddate
) running_agent_total_revenue
FROM Purchase
No-code:
Results
The following functions are used to clean your data that are in number format.
This can be done by removing all invalid values or by converting a number to a string.
Example Round:
We want to calculate the running average revenue and total revenue for each agent.
In addition, we want to round to 0 decimal the average revenue.
SQL Query:
SELECT orddate, agentcode, ROUND(AVG(ordamount)) OVER (
PARTITION BY agent_code
ORDER BY ord_date
) running_agent_avg_revenue,
SUM (ord_amount) OVER (
PARTITION BY agent_code
ORDER BY ord_date
) running_agent_total_revenue
FROM Purchase
No-code:
Results
There are a few functions available in SQL that will allow you to clean up your text values and make them consistent.
Example Upper:
We want to calculate the running average revenue and total revenue for each agent. Moreover, we want to uppercase the letters of the agent code a1006.
SQL Query:
SELECT orddate, UPPER(agentcode), AVG(ordamount) OVER (
PARTITION BY agentcode
ORDER BY orddate
) running_agent_avg_revenue,
SUM (ordamount) OVER (
PARTITION BY agentcode
ORDER BY orddate
) running_agent_total_revenue
FROM Purchase
No-code:
Results
There are three functions that are very helpful in cleaning up data fields that include dates.
These functions help you if your dates are stored as strings, or if you need to change the format of a date field.
Example Cast to date, Extract:
We want to display the columns ordnum, ordamount, orddate.
In addition, we want to transform orddate variable(string) into date so we can add a column year.
SQL Query:
SELECT ordnun,
ordamount,
orddate,
CAST(orddate AS DATE)as orddate2,
EXTRACT CAST(orddate AS DATE) as Year
FROM Purchase
No-code:
Results
Let’s see how you can use SQL functions to handle missing values (Fill NA, Drop NA, and If null):
Example If null:
We want to display the columns ordnum, ordamount, orddate, agentcode by replacing the empty value by 0 of ordamount for the ordnum 200222).
SQL Query:
SELECT ordnun,
IFNULL(ordamount,0),
orddate,
agentcode,
FROM Purchase
No-code:
Results
When you first use a new database or table, it will often contain columns that are encoded incorrectly.
This problem can easily be solved with the DECODE function, which takes two arguments: the encoded value and the desired value.
One of the problems you might run into with columnar data like that in a table is that its entries may have inconsistent lengths.
This can make it difficult to read if each row has a different number of characters.
There are three functions that can help you to handle this problem.
Example Length:
We want to display the columns ordnum, ordamount, orddate, custocde and the length of characters of the variable custcode.
SQL Query:
SELECT ordnun,
ordamount,
orddate,
custcode,
Length(custcode) as custcode_length
FROM Purchase
No-code:
Results
It is an important function that allows you to transform a JSON string into a normalized format.
This function is used to convert paths and arrays into single-value elements, or vice versa. It also removes any trailing commas from objects and arrays.
It allows you to transpose rows and columns in a table so that each value appears only once for each group of values with the same grouping identifier.
This technique is commonly used when exporting data from a database into a spreadsheet or report as it allows us to easily analyze large amounts of data at once.
It is the opposite of Pivot. It returns rotating columns of a table-valued expression into column values.
If you’re interested in starting with a Drag&Drop SQL Editor, check out the RestApp website or book a demo directly.
Additional resources:
Subscribe to our newsletter
Product
Solutions
Company
Rest Solution © 2022, The Data Activation Platform with No Code.
Privacy Overview
Cookie | Duration | Description |
---|---|---|
cookielawinfo-checkbox-analytics | 11 months | This cookie is set by GDPR Cookie Consent plugin. The cookie is used to store the user consent for the cookies in the category "Analytics". |
cookielawinfo-checkbox-functional | 11 months | The cookie is set by GDPR cookie consent to record the user consent for the cookies in the category "Functional". |
cookielawinfo-checkbox-necessary | 11 months | This cookie is set by GDPR Cookie Consent plugin. The cookies is used to store the user consent for the cookies in the category "Necessary". |
cookielawinfo-checkbox-others | 11 months | This cookie is set by GDPR Cookie Consent plugin. The cookie is used to store the user consent for the cookies in the category "Other. |
cookielawinfo-checkbox-performance | 11 months | This cookie is set by GDPR Cookie Consent plugin. The cookie is used to store the user consent for the cookies in the category "Performance". |
viewed_cookie_policy | 11 months | The cookie is set by the GDPR Cookie Consent plugin and is used to store whether or not user has consented to the use of cookies. It does not store any personal data. |
Stay up to date to the latest news
on Data Activation