Are we wasting time while working in Excel?
Excel is all about data and formulas. In general we follow this process while working with Excel
- Get data from somewhere
- Clean it up if it not the way you want
- Summarize it and make it readable and easy to interpret
- Analyze, forecast, decide, plan, monitor … the real use of Excel
- Make reports and send to others
Step 4 is the most important part of Excel. But do we spend maximum time on using the data? Probably not. We end up spending more time on the getting / cleaning / summarizing part of it… which is really a waste of time. It is non-value-adding part of our work. Unfortunately, this needs to be done in order to analyse the data.
Why do we spend so much time in cleaning up and creating reports? The primary reason is because the source data is not in an optimal format. If you begin with a wrong format and then continue with it, the time required for any activity increases and the complexity also increases.
This way you as well as everyone who needs the data is wasting time.
Before we can clean up bad data, we must understand what is good data.
What is good data?
I hope all of you understand and use Excel tables by now. The answer is simple… If your data can be converted to an Excel table, then it is good data. Any data which is not good data is bad data. Simple.
This is good data.
I wish it was so simple. The concept is simple, but there are too many varieties of bad data… some even look like good data but actually it is bad.
Examples of bad data
It is important for us to understand how bad data looks – so that you can recognize it when you see it.
Blank rows
This type of data looks tabular but every row has an empty row below it. Why did someone add those extra rows? Probably because they wanted to space the display of rows of data. But this data cannot be used for sorting, filtering, pivot tables or even simple formulas. Why? Because the blank rows interfere with everything.
Cross tab
This is one of the most commonly occurring bad format. Here we have column headings as well as row headings. This actually looks like a report. But data has been entered into it manually.
Most people think that if they enter data in the same format as the final report, their work is reducing. Unfortunately it is not true. Because we always need more analysis than the final report.
In this example, the attendance has been put exactly as the attendance register looks.
First column (employee id) is good because it has heading in the first row and data in the remaining rows. The problem starts from the second column. Its heading is number 1. What is this? Actually it is 1st Jan 2012. Number one is not really a heading. It is actually data.
To make matters worse, the data is also not complete. It is not even a date. It is just number 1. How do we know which month? Month is mentioned as a sheet name. For every month a new sheet will be added and for every year a new file will be created.
Now if you want to do analysis across months or years it is a complete nightmare. You will require hours of complex formula editing to do any analysis.
Ideally this data should have been in just three columns.
This way, the data would grow vertically – across months and even years if required. Excel can now handle 1 million rows… so vertically growing data is very easy to handle. There would be only 1 sheet and one file.
All the analysis can now be done instantly using Pivot tables. Is that not nice?
Good data in multiple sheets = Bad data
Sometimes we have good data to start with, but we spoil it by splitting it into multiple sheets… Why? May be because the data comes from multiple locations – and each location data is kept in separate sheets…
This way, you will need to create separate pivot tables for each sheet or end up with complex formulas which work across sheets.
The solution is simple… what goes into sheet names should become a column. This way you can instantly analyze the data using Pivot Tables.
Data with more than one meaning in one column
Each column should have only 1 meaning. If you put two meanings into one column it again creates problems…
category in the same column. This way, getting subtotals for category becomes manual work. If these two were in two different columns, Pivot Table would give the subtotals instantly.
Sometimes, especially in financial data or BOM type of data there can be upto 4 to 5 levels of categories in a single column.
The best way is to give a separate column for each meaning.
Try to make good data
Now, start noticing the data you have and see if you can convert it into better format… In the next article, we will see how to convert Cross Tab data into simple, good data.
Pingback: TechNet Blogs