Aidan King

For this project, I analyzed a dataset of more than 9,000 Berlin Airbnb listings across 79 variables to build a price prediction model and derive actionable insights for the platform. I started with exploratory data analysis and data preparation: cleaning the price column, log-transforming the skewed price distribution, imputing missing values with medians, and one-hot encoding categorical variables such as room type and district. I then ran correlation analysis and built visualizations across key variables including neighborhood, availability, superhost status, and review scores.

From there, I trained three models on a standardized 70/30 train-test split with consistent outlier removal: Multiple Linear Regression as an interpretable baseline, LightGBM as the primary model, selected for its speed and scalability at production scale, and XGBoost as a validation model. LightGBM and XGBoost converged on nearly identical performance, and both significantly outperformed the regression baseline.

The business framing centers on information asymmetry in two-sided marketplaces: hosts price arbitrarily, and guests accept or reject without context. I proposed a dual-sided pricing tool in which the same underlying model powers a host-facing price optimizer and a guest-facing price transparency badge. The badge, similar to those on used car platforms like CarGurus, surfaces whether a listing is priced fairly relative to comparable properties.

The project was built entirely in Python using pandas, scikit-learn, LightGBM, XGBoost, and matplotlib, with all modeling and EDA conducted in Google Colab.
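The preparation and baseline steps described above can be sketched roughly as follows. This is a minimal illustration and not the project's actual notebook: the column names (`price`, `room_type`, `district`, `review_scores_rating`, `availability_365`), the use of `log1p` for the log transform, and the synthetic rows are all assumptions made so the example runs on its own. Outlier removal is omitted, and only the linear regression baseline is fitted.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split


def preprocess(df: pd.DataFrame) -> pd.DataFrame:
    """Clean price, log-transform it, impute medians, one-hot encode categoricals."""
    out = df.copy()
    # Strip currency symbols and thousands separators, e.g. "$1,200.00" -> 1200.0
    cleaned = out["price"].astype(str).str.replace(r"[^0-9.]", "", regex=True)
    out["price"] = pd.to_numeric(cleaned, errors="coerce")
    # Rows without a usable target cannot be used for training
    out = out.dropna(subset=["price"])
    # log1p reduces the right skew (assumed variant of the log transform)
    out["log_price"] = np.log1p(out["price"])
    # Median imputation for the remaining numeric columns
    num_cols = out.select_dtypes(include="number").columns
    out[num_cols] = out[num_cols].fillna(out[num_cols].median())
    # One-hot encode the categorical columns as float indicator columns
    return pd.get_dummies(out, columns=["room_type", "district"], dtype=float)


def fit_baseline(features: pd.DataFrame, seed: int = 42):
    """Fit a linear regression on a 70/30 split and return (model, test R^2)."""
    X = features.drop(columns=["price", "log_price"])
    y = features["log_price"]
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.30, random_state=seed
    )
    model = LinearRegression().fit(X_train, y_train)
    return model, r2_score(y_test, model.predict(X_test))


# Synthetic stand-in data (NOT the Berlin dataset), only to exercise the code
rng = np.random.default_rng(0)
n = 200
room = rng.choice(["Entire home/apt", "Private room"], size=n)
district = rng.choice(["Mitte", "Pankow", "Neukoelln"], size=n)
price = np.where(room == "Entire home/apt", 90.0, 45.0) * rng.lognormal(0.0, 0.3, size=n)
review = rng.uniform(3.5, 5.0, size=n)
review[rng.random(n) < 0.1] = np.nan  # inject some missing review scores

raw = pd.DataFrame(
    {
        "price": [f"${p:,.2f}" for p in price],
        "room_type": room,
        "district": district,
        "review_scores_rating": review,
        "availability_365": rng.integers(0, 366, size=n),
    }
)

features = preprocess(raw)
model, r2 = fit_baseline(features)
print(f"Baseline test R^2 on synthetic data: {r2:.3f}")
```

The gradient-boosted models would slot into `fit_baseline` in place of `LinearRegression`, for example through the scikit-learn style `LGBMRegressor` and `XGBRegressor` classes that LightGBM and XGBoost provide.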

Figure: A mock-up of what the host pricing tool could look like

Figure: A mock-up of what the "fair-pricing" marker could look like for guests
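One way the guest-facing marker could be driven is by comparing a listing's asking price with the model's estimate for comparable properties. The sketch below is hypothetical: the function name `price_badge`, the 15% tolerance band, and the badge labels are illustrative assumptions, and it assumes the model predicts `log1p(price)`, so the prediction is converted back with `expm1`.

```python
import math


def price_badge(listed_price: float, predicted_log_price: float,
                tolerance: float = 0.15) -> str:
    """Label a listing by comparing its price with the model's estimate.

    `tolerance` is a hypothetical band around the estimate (15% by default);
    the labels are placeholders, not the project's actual wording.
    """
    # Undo the assumed log1p transform to get the estimate in price units
    fair_price = math.expm1(predicted_log_price)
    if fair_price <= 0:
        raise ValueError("predicted price must be positive")
    ratio = listed_price / fair_price
    if ratio < 1 - tolerance:
        return "Great deal"
    if ratio > 1 + tolerance:
        return "Priced high"
    return "Fair price"


# Example: the model's estimate corresponds to a nightly price of about 100
estimate = math.log1p(100.0)
for listed in (60.0, 100.0, 150.0):
    print(listed, price_badge(listed, estimate))
```

The host-facing optimizer could reuse the same estimate by showing the host where their current price falls within this band.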