Pesky Date Formats

January 29, 2018

If you’ve worked with Excel and/or with data for a while, you’ve probably dealt with pesky date formats. Dates come in a variety of makes and models including “d/m/y”, “mm/dd/yy”, “yyyy-mm-dd” and on and on. There’s nothing worse than opening Excel, applying filters, then dropping the filter menu down to find that Excel is reading a date column as strings.

Excel date format screenshot

Normally we would see the correct month and day but instead we see a string of numbers and slashes. Aside from not being able to convert this to your preferred format, perhaps the bigger problem is that you cannot easily sort or filter this data by date and who knows if a database can correctly parse it should you try to upload the data there.

Excel

My personal favorite date format is yyyy-mm-dd so we’re going to go through how I would fix this in Excel, R, and Python. The date format for this data is “dd/mm/yyyy hh:mm.” For Excel, I would do the following:

  1. Add a column to the right of the Created column.
  2. Use the text to Column function to break apart the cell by the space.
  3. Delete the Time Column.
  4. Add 3 columns to the right of the Created column.
  5. Use the text to column function to break apart the column by the forward slash.
  6. In the empty column, use the Date function and specify the year, month, and day.

Now, we should have the proper date format we need and we can check this by applying a filter and looking for the month names and days in the filter options. If you’re quick with the keyboard, this may not be so bad, but many people rely on their mouse, making this a lengthy process.

R

This manipulation can be done quite a bit easier with R and the lubridate package. After reading the csv into R and calling head(), we see that the Created column is a character.

R data head screenshot

First, we need to explain how to convert the data type to dttm (date time) and we’ll use the dmy_hm() function to do so.

data$`Cleaned Date` <- dmy_hm(data$Created)

I’m placing the “Cleaned Date” in its own column so that we can see the change, but you could overwrite the Created column as well.

R data head screenshot

If we want to get this in the exact format as Excel,”yyyy-mm-dd”, we can call as_date() to hack off the time at the end.

data$`Cleaned Date` <- as_date(data$`Cleaned Date`)

R data head screenshot

Note that the data type of Cleaned Date is now date and in just a few lines of code, we’ve converted these dates to the format we want. Here’s a link to the Lubridate GitHub Page so that you can see some of the functions that exist and another link to R for Data Science – Dates and Times Chapter. By the way, R for Data Science is a free book written by Garrett Grolemund and Hadley Wickham.

Python

Using Pandas and Datetime, we can make this fix in just about one line of code. Reading the csv and calling head, we see that we still have the bad date format.

Python data head screenshot

import pandas as pd

 
df = pd.read_csv(“Date Format Data.csv”)
 
df['Cleaned Date'] = pd.to_datetime(df['Created'],format="%d/%m/%Y %H:%M").dt.date

Here, we’ve used pandas.to_datetime to convert to the date time format, the format argument to explain the date format we’re expecting, and then dt.date to keep only the date. If you've got a particularly funky format, check out this strftime cheatsheet to figure out the formatting string to pass. This one line is certainly much easier and faster than the 6 step Excel fix.

Python data head screenshot

While this is a relatively simple example that doesn’t scratch the surface for all of the date formats that you might see, we can see the power of program languages. In my opinion, dealing with date formats is reason enough to begin learning Python or R!

Code Snippets

R

library(tidyverse)
library(lubridate)
 
data <- read_csv("Date Format Data.csv")
 
head(data)
 
data$`Cleaned Date` <- dmy_hm(data$Created)
 
data$`Cleaned Date`<- as_date(data$`Cleaned Date`)

Python

import pandas as pd

 
df = pd.read_csv('Date Format Data.csv')
 
df.head()
 
df['Cleaned Date'] = pd.to_datetime(df['Created'],format="%d/%m/%Y %H:%M").dt.date

Questions, comments, concerns? Please feel free to leave a note below.