Data Engineering Project Scenario
In this post, I will share a real-time project scenario designed for practice purposes.
Sample JSON file format:

[
  {
    "author": "xyz",
    "file_name": "customer",
    "file_type": "csv",
    "table": "customer"
  }
]
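Before ingestion starts, the consumer has to parse this message and pull out the fields it needs. A minimal sketch in Python, assuming the message arrives as a JSON string (the variable names here are illustrative, not part of the scenario):

```python
import json

# Example Kafka message payload, matching the sample format above.
message = '''[
  {
    "author": "xyz",
    "file_name": "customer",
    "file_type": "csv",
    "table": "customer"
  }
]'''

# The payload is a JSON array, so json.loads returns a list of dicts.
records = json.loads(message)
for rec in records:
    print(rec["file_name"], rec["file_type"], rec["table"])
```

Each dict in `records` then drives one ingestion run: the file to read, its type, and the target table.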
File process scenarios
1. The user uploads a CSV, Excel, or text file through the frontend application.
2. The file is then stored in an S3 bucket.
3. A Kafka message is generated in JSON format containing the relevant file information.
4. Once the Kafka message is produced, the ingestion process begins, reading the JSON details and loading the data into the target table.
5. Read the JSON message to access its content.
6. Read the CSV file from the specified S3 bucket path (for practice, use a local folder).
7. If the file type is CSV, XLS, or TXT, process the file accordingly. For CSV files, ingest the data into the related table.
8. The same approach applies to XLS and TXT files.
9. If the file name, file type, or table information is missing, display an error message.
10. If the file contains dates in formats other than YYYY-MM-DD, convert those date fields to YYYY-MM-DD while loading the data into the target table.
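The steps above can be sketched as a small ingestion function. This is only one possible shape: the function name, the list of accepted date formats, and the in-memory "load" (returning rows instead of writing to a database) are all assumptions for practice purposes:

```python
import csv
import io
from datetime import datetime

# Assumed set of incoming date formats; extend as needed.
# Ambiguous dates (e.g. 03/04/2024) resolve to the first matching format.
DATE_FORMATS = ["%Y-%m-%d", "%d-%m-%Y", "%m/%d/%Y", "%d/%m/%Y"]

def normalize_date(value):
    """Convert a recognised date string to YYYY-MM-DD; leave other values alone."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(value, fmt).strftime("%Y-%m-%d")
        except ValueError:
            continue
    return value  # not a date field, keep as-is

def ingest(meta, raw_text):
    """Validate the Kafka metadata, then parse and normalise the CSV rows."""
    # Step 9: fail when file name, file type, or table information is missing.
    for key in ("file_name", "file_type", "table"):
        if not meta.get(key):
            raise ValueError(f"Missing required field: {key}")
    # Step 7: only CSV, XLS, and TXT are supported (CSV shown here).
    if meta["file_type"] not in ("csv", "xls", "txt"):
        raise ValueError(f"Unsupported file type: {meta['file_type']}")
    rows = []
    for row in csv.DictReader(io.StringIO(raw_text)):
        # Step 10: normalise date fields while loading.
        rows.append({k: normalize_date(v) for k, v in row.items()})
    return rows  # in practice: insert into the table named by meta["table"]
```

For example, `ingest({"file_name": "customer", "file_type": "csv", "table": "customer"}, "id,signup_date\n1,15/03/2024\n")` returns one row with `signup_date` rewritten to `"2024-03-15"`.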
File validation
1. Ensure all sheet names are in lowercase only.
2. If a sheet name contains any special characters, numbers, or double spaces, throw an error.
3. In an Excel workbook with multiple sheets, the sheets should appear in the expected sequence (e.g., 1, 2, 3).
4. Verify that the columns are in the correct order.
5. If any column contains data that doesn't match the defined data type, throw an error and do not load the data.