Multivariate Linear Regression Prediction Model
Motivation
For our price-prediction model on eBay auction time series data, we drew inspiration from the paper "An Integrated Model for Whether, Who, When and How Much in Internet Auctions" by Park and Bradlow. Following that paper's model exactly would have meant that the final price could only be predicted after first determining the full dynamics of the auction, which we did not need. However, in that paper the underlying parameter that guides consumer behavior and drives the modeling process is found by linear regression on the auction's features, so we decided to apply a similar regression to predict the final price directly. Because our dataset has fewer features, we also use the pricing dynamics up to the point where 9/10 of the auction time has elapsed, well before the flurry of bidding right near the deadline.
Our best linear model uses five features: the duration of the auction, the square root of the car's age, the square root of the car's mileage, the starting price, and the price at 9/10 of the auction time.
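As a rough illustration of how the five predictors could be assembled from one auction record, here is a minimal sketch. The record layout (`duration`, `car_age`, `mileage`, `start_price`, `bids`) and the helper names are hypothetical and not taken from our dataset. The sketch also assumes that the "price at 9/10 of the auction time" is the most recent bid placed at or before that moment, falling back to the starting price if there are no bids yet.

```python
import math

def price_at_fraction(bids, start_price, duration, fraction=0.9):
    """Return the standing price once `fraction` of the auction time has elapsed.

    `bids` is a list of (time, price) pairs, with time measured from the
    auction start in the same units as `duration` (assumed layout).
    """
    cutoff = fraction * duration
    price = start_price  # no bids yet: the standing price is the start price
    for t, p in sorted(bids):
        if t > cutoff:
            break
        price = p
    return price

def build_features(auction):
    """Map one auction record (a dict with a hypothetical schema) to the five predictors."""
    return [
        auction["duration"],
        math.sqrt(auction["car_age"]),
        math.sqrt(auction["mileage"]),
        auction["start_price"],
        price_at_fraction(auction["bids"], auction["start_price"], auction["duration"]),
    ]

# Made-up example record, for illustration only.
example = {
    "duration": 7.0, "car_age": 9.0, "mileage": 40000.0, "start_price": 500.0,
    "bids": [(1.0, 600.0), (5.5, 900.0), (6.9, 1500.0)],
}
features = build_features(example)
```

In this example the last bid arrives after the 9/10 mark, so it is excluded from the features; that late bid is exactly the kind of deadline activity the model does not get to see.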
Below is a visualization of the relationships between each predictor and the end price:
[Figure panels, each plotted against end price: Duration, Square Root Car Age, Square Root Mileage, Start Price, 9/10 Price]
Implementation
Goal: Predict the final price of the auction given the first 90% of the time series bidding data, as well as the auction features that are known before the auction begins.
1. Split the data into a training set and a test set.
2. Fit a linear regression on the training data.
3. Make predictions on the test data and compute the error.
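The three steps above can be sketched as follows. This is not our actual implementation: it is a dependency-free illustration that fits ordinary least squares through the normal equations, and all function names, the 80/20 split, and the synthetic data are assumptions made for the example. In practice a library such as NumPy or scikit-learn would normally do this work.

```python
import random

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def fit_linear(X, y):
    """Ordinary least squares; returns [intercept, coef_1, ..., coef_k]."""
    Xb = [[1.0] + list(row) for row in X]  # prepend a column of ones
    k = len(Xb[0])
    XtX = [[sum(r[i] * r[j] for r in Xb) for j in range(k)] for i in range(k)]
    Xty = [sum(r[i] * t for r, t in zip(Xb, y)) for i in range(k)]
    return solve(XtX, Xty)

def predict(beta, X):
    return [beta[0] + sum(b * v for b, v in zip(beta[1:], row)) for row in X]

def train_test_split(X, y, test_fraction=0.2, seed=0):
    """Step 1: shuffle the row indices and hold out a fraction for testing."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    n_test = int(len(idx) * test_fraction)
    test, train = idx[:n_test], idx[n_test:]
    return ([X[i] for i in train], [y[i] for i in train],
            [X[i] for i in test], [y[i] for i in test])

# Synthetic, noise-free data standing in for the auction features.
rng = random.Random(1)
X = [[rng.uniform(0, 10), rng.uniform(0, 10)] for _ in range(50)]
y = [3.0 + 2.0 * a - 1.0 * b for a, b in X]

X_train, y_train, X_test, y_test = train_test_split(X, y)
beta = fit_linear(X_train, y_train)   # step 2: fit on the training set
y_pred = predict(beta, X_test)        # step 3: predict on the test set
```

Because the synthetic target is an exact linear function of the inputs, the fitted coefficients recover the generating values up to rounding; real auction data would of course leave a residual error.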
Results
Below is a visualization of the predicted end price vs actual end price.
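[Figure: Linear Regression Results, predicted end price vs. actual end price]
The median absolute error is about 30% and the mean absolute error is about 55%. The mean being well above the median indicates that a minority of auctions are predicted very poorly.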
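To make the two error figures concrete, the sketch below computes a median and a mean absolute percentage error. It assumes the percentages are measured relative to the actual end price, which is our reading of the metric rather than something fixed by the text, and the prices are made up for illustration.

```python
import statistics

def absolute_percentage_errors(actual, predicted):
    """Absolute error of each prediction as a percentage of the actual price."""
    return [100.0 * abs(p - a) / abs(a) for a, p in zip(actual, predicted)]

# Made-up prices: three reasonable predictions and one large miss.
actual = [1000.0, 2000.0, 4000.0, 500.0]
predicted = [1100.0, 1800.0, 4000.0, 1000.0]

errors = absolute_percentage_errors(actual, predicted)
median_ape = statistics.median(errors)
mean_ape = statistics.mean(errors)
```

A single badly mispredicted auction pulls the mean far above the median here, which is the same pattern as in our results.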
Analysis
Since the error is relatively high, we conclude that it may be worthwhile to look at the overall shape of the bidding dynamics, not just at the prices at certain points of the auction (see 3-Factor Price Velocity Acceleration Model).
Further work related to this model or line of thought includes:
1. Coming up with better linear predictors.
2. Constructing individual regressions for each model of car.
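As a sketch of what the second item might look like, the code below groups auctions by car model and fits a separate regression within each group. To keep it short it uses a single predictor (for instance the 9/10 price) instead of all five, and the record layout, the `min_auctions` threshold, and the data are hypothetical.

```python
from collections import defaultdict

def fit_simple(xs, ys):
    """Least-squares slope and intercept for a single predictor.

    Assumes the predictor values are not all identical.
    """
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx

def fit_per_model(records, min_auctions=3):
    """Fit one regression per car model; models with too few auctions are skipped."""
    groups = defaultdict(list)
    for model, x, y in records:
        groups[model].append((x, y))
    fits = {}
    for model, pairs in groups.items():
        if len(pairs) >= min_auctions:
            xs, ys = zip(*pairs)
            fits[model] = fit_simple(xs, ys)
    return fits

# Made-up (model, predictor, end price) records, for illustration only.
records = [
    ("Civic", 1000.0, 1300.0), ("Civic", 2000.0, 2500.0), ("Civic", 3000.0, 3700.0),
    ("Mustang", 5000.0, 8000.0), ("Mustang", 6000.0, 9500.0), ("Mustang", 7000.0, 11000.0),
    ("Rare", 9000.0, 9500.0),
]
fits = fit_per_model(records)
```

One design question this raises is what to do with models that have too few auctions to fit reliably; the sketch simply skips them, but falling back to the pooled regression would be another option.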