Clean input data
Pay serious attention to clean input data as it is the foundation of your spreadsheet! If possible, clean it in such a way that it becomes an automated process for the next time you use the same spreadsheet, because garbage in means garbage out!
Check your input
Does the following situation sound familiar to you?
Your spreadsheet has several input sources, from different systems, the format of this data is different for each system. The data received seems to be incomplete, and it also includes a lot duplicates. One system uses points for decimals, the other uses commas. It contains complete tables that are of no use to you at all, and the date used is 11.03.2018, does this mean 3 November 2018 or 11 March 2018? So many issues and the deadline for your data analysis is tomorrow…
Most Excel users dislike the process data cleansing and prefer to start their analysis straight away, but don’t underestimate the importance of clean input data! No matter how much pressure there is from your organization to deliver the analysis quickly.
Quick wins in the dataset?
Often there is a lot you can do in the dataset itself to prepare it for analysis: replace points with commas, delete lines, delete duplicates, etc. Please note: a manually modified dataset is very difficult to validate by controllers and accountants. The connection between the original input data and the modified “Excel source data” is a well-known frustration. So, when possible, clean your data using formulas and reserve separate tabs for the original source data (1 source, 1 tab) and the final, clean input data.
Do not import more than necessary
Do you get more information from your input source than you actually need? Don’t burden your file with it. Imported tables that you don’t use shouldn’t be included in your spreadsheet. If possible, run the export scripts of other systems in such a way that they don’t export useless, additional columns. At the very least, make sure that the tab with the clean input data does not contain the unnecessary data.
Have numbers been recognized as numbers?
In some import files numbers are not recognized as such in Excel. Figures remain text fields, with all kinds of annoying consequences for more complex analyses. A one-time trick is to select the column in question and convert it to numbers via the menu Data-> text to data (don’t ask us why it works, but it works). You will have to do this trick over and over again and then you will have to deal with the verifiability issues again. Fortunately, there is the formula, =NUMBERVALUE(A2), which helps you indicate which symbol is used for decimals in the source.
Read more about this subject in the article Pay attention to units and number formats
What to do with empty fields?
There are datasets in which, for example, a “main category” of something is listed in column A, but only if the main category is not equal to the main category of the row above. The human eye understands this, but Excel doesn’t. Fill these empty cells in the enriched data set with functions such as =IF(SOURCE!A3=”;A2;SOURCE!A3).
Unneeded spaces in text fields?
Unneeded spaces in text fields might not seem important at a first glance, but imported text data in Excel often ends up with one or more spaces. If this happens inconsistently, you’ll have problems with search functions and pivot tables. The “=TRIM(SOURCE!A2)” function is a powerful way to correct this. If you want to combine this with consistent capitalization, you can use “=PROPER(TRIM(SOURCE!A2))”.
Devote time and attention to prepare clean input data by using formulas. This will save you a lot of misery later on. Garbage in can still result in a great and clean analysis.
- It’s fast & easy to use
- Get detailed insights
- Save frustration and time
Read more about exploring Excel files
On this page you will find a step-by-step walkthrough of the PerfectXL Explore Tool. This page will give you information on how you can use explore to gain a better understanding of complex Excel files.
This is the technical page for PerfectXL Explore. Here you will find information like system requirements, release notes, and answers to most questions about our software. If you have more questions please use the form at the bottom of the page.
In this video we’ll show you how to run a quick basic check of the input sources of your Excel files and how to determine if there is any unknown input. We’ll also show you a powerful way to visualize your spreadsheet’s information flow to help determine the function and dependability of each worksheet.
When we read a book or a report or a website, we read from top to bottom and from left to right. Keep this in mind when you create an Excel file. Try your best to follow that structure both within each worksheet and between them. We advise you to distinguish separate input & output sheets as well as sheets for calculations, documentation and dashboards.
Numbers without meaning are worthless. Excel is a program for numbers, but it is only useful if it is clear to the reader or user what these numbers mean. So, make sure that you provide textual information for all data and formulas, let’s call these pieces of text information labels.
Treat transferring a spreadsheet like you would treat the process of renting your house via Airbnb. Most people realize it will stay neater if you clean it first. Mess stimulates mess, and neatness stimulates neatness. Remember that in an average spreadsheet there will be 12 different users that have to use it after you! What do you think happens when this chain starts with a messy file?
Understanding the workings of a spreadsheet is key to correctly interpreting its results. Spreadsheet visualization is an essential feature of PerfectXL Explore. It allows you to create a graphical model of the general flow of information, which also shows of risks, input & output and data & formula flows.
Curious to see what your spreadsheet’s all about? PerfectXL Explore gives you a great starting point. An overview of worksheets, input sources and potential risks are all part of our general spreadsheet characteristics section. It also gives you an insight into the amount of formulas used in your spreadsheet.
Be honest with yourself, how much time does it take you to spit through an Excel file? The opportunities in Excel are endless but this comes with a downside. PerfectXL has the solution. Starting now you can gain a better understanding of your spreadsheets, to help you in improving, documenting, and cleaning up your files. Here are 9 tasks that PerfectXL Explore makes easy.