Audience: economists and data analysts who work with large administrative datasets and want to improve their data processing workflows.
Welcome back to the mini-series on Cloud Code for applied economists. Today, we're taking off the training wheels.
We'll focus on handling big data in a structured way using Cloud Code, especially large administrative datasets.
We'll use the Home Mortgage Disclosure Act (HMDA) dataset, which contains mortgage origination data. It's publicly available and around 70 gigabytes.
Cloud Code makes it easier to adopt better data storage and working practices, which improves replicability and opens up new research avenues.
We'll start by going to the command line to set up the project and describe the goals for downloading and harmonizing data.
The goal is to download and harmonize the HMDA data from 2007-2024 to study fintech mortgage lenders' geographic expansion.
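To make "harmonizing" concrete, here is a minimal sketch of what that step might look like, assuming the file layout changes at some point in the sample. The column names, the 2018 cutoff, and the thousands-of-dollars unit below are illustrative assumptions for this sketch, and `COLUMN_MAPS`, `schema_for_year`, and `harmonize_row` are hypothetical helpers; check all of them against the actual file documentation before relying on them.

```python
# Illustrative column maps: raw column name -> harmonized column name.
# These raw names and the 2018 cutoff are assumptions, not a verified schema.
COLUMN_MAPS = {
    "pre_2018": {
        "as_of_year": "activity_year",
        "respondent_id": "lender_id",
        "loan_amount_000s": "loan_amount",
    },
    "post_2018": {
        "activity_year": "activity_year",
        "lei": "lender_id",
        "loan_amount": "loan_amount",
    },
}


def schema_for_year(year):
    """Pick which column map applies to a given data year."""
    return "pre_2018" if year < 2018 else "post_2018"


def harmonize_row(raw, year):
    """Rename columns to the common schema and put loan amounts in dollars."""
    schema = schema_for_year(year)
    mapping = COLUMN_MAPS[schema]
    out = {new: raw[old] for old, new in mapping.items()}
    out["activity_year"] = int(out["activity_year"])
    amount = float(out["loan_amount"])
    if schema == "pre_2018":
        # Assumed unit: the older layout reports amounts in thousands of dollars.
        amount *= 1000
    out["loan_amount"] = amount
    return out


old_row = harmonize_row(
    {"as_of_year": "2010", "respondent_id": "A1", "loan_amount_000s": "150"}, 2010
)
new_row = harmonize_row(
    {"activity_year": "2020", "lei": "X9", "loan_amount": "150000"}, 2020
)
print(old_row)
print(new_row)
```

Both rows come out with the same keys and the same units, which is the point of the exercise. Geographic identifiers would need the same treatment before studying lenders' expansion across areas.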
We'll set up a DuckDB database, a relational database well suited to managing relationships between tables and storing metadata efficiently.