One crucial aspect to
consider in data-related projects is the conversion of various date formats
into a single, unique format when processing data from source to target
systems. In such projects, it is not uncommon to encounter diverse date formats
in the source system, but it is essential to maintain a consistent format that
is easily understood by customers.
For instance, imagine
receiving a file with multiple columns, each containing a different date
format. If the customer requires all dates to be converted into a uniform
format during the data processing, the following code can be employed to
achieve this transformation:
import pandas as pd
from datetime import
datetime
def
convert_date(date_str):
try:
#
List possible date formats to handle different formats
date_formats = ["%d/%m/%y",
"%d-%m-%Y","%d-%b-%Y"]
for date_format in date_formats:
try:
return
datetime.strptime(date_str, date_format).strftime("%Y-%m-%d")
except ValueError:
pass
# If none of the formats match, return the
original string
return date_str
except Exception as e:
return date_str
input_csv_file =
'G:\ETL_Automation\data\deptd.csv'
output_csv_file =
'G:\ETL_Automation\data\'output_file.csv'
# Read
the CSV file
df =
pd.read_csv(input_csv_file)
#
Iterate through columns and apply date conversion if it's a date column
for column in df.columns:
df[column] = df[column].apply(convert_date)
# Save
the modified DataFrame to a new CSV file
print(df)
This code demonstrates a
Python implementation using the pandas library. It reads the data file into a
DataFrame object and specifies the desired unique date format using the
'%Y-%m-%d' syntax as an example. Next, it iterates over each date column in the
DataFrame, attempting to convert each date to the unique format. In case a
column cannot be converted due to incompatible formats, an exception is handled
gracefully.
0 Comments
Thanks for your message.