The corona crisis is a heavy burden for us all, with 10.1 million people in Germany on restricted working hours, a hefty blow to their ability to meet monthly rent demands. Our goal was to investigate the number of households under extreme conditions. What fraction of German households are at risk, and where are they? These are valuable questions for those responsible for the control and safety of the public.
AI by PREA means: mercury. Based on the most advanced database technologies and algorithms, mercury sees, analyses and predicts all important scenarios in the real estate industry. With mercury, we are able to detect and evaluate the smallest changes in the real estate industry to maximise future investment opportunities.
Using current data on rental offers in conjunction with socio-demographic data from 2019, we designed, trained and applied a model to predict the fraction of households within a given latitude, longitude and radius that will pay more than 50% of their income on rent. The score is calculated under an extreme assumption: everyone is now earning a reduced share of their salary, such as 80%. To find the households that struggle to meet their rental bills, we take the reduced income of each household (a fixed fraction of the original income, such as 0.6 times income), compare it with the rent, and count, for each offer, the households that would pay more than 50% of their reduced income on rent. Finally, we divide the number of afflicted households by the total number of households, yielding a fraction.
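The calculation described above can be sketched in a few lines of Python. This is a minimal illustration, not the production code: the function name `afflicted_fraction`, its parameters and the sample incomes are hypothetical, and the income factor is left as an argument because different values are used for different scenarios.

```python
def afflicted_fraction(incomes, rent, income_factor, rent_share_threshold=0.5):
    """Fraction of households paying more than `rent_share_threshold`
    of their reduced income on rent.

    incomes: monthly household incomes before the reduction (hypothetical data)
    rent: monthly rent of one offer
    income_factor: assumed share of income still earned, e.g. 0.6 or 0.8
    """
    if not incomes:
        raise ValueError("incomes must not be empty")
    afflicted = 0
    for income in incomes:
        reduced_income = income_factor * income
        # A household counts as afflicted if rent exceeds half of its reduced income.
        if rent > rent_share_threshold * reduced_income:
            afflicted += 1
    return afflicted / len(incomes)


# Illustrative usage with made-up incomes and a rent of 700.
example = afflicted_fraction([1000, 2000, 3000, 4000], rent=700, income_factor=0.6)
```

With these made-up numbers, the two lowest incomes fall above the 50% threshold, so half of the households count as afflicted.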
We wanted our model to take specific inputs for prediction: rent, latitude, longitude and area. Location is an essential feature for mapping, the area (radius) determines the total number of households, and rent acts as a scaling factor: higher rent results in more afflicted households. This also makes the model suitable for public use, for example to estimate how far a rent reduction would lower the number of troubled households. In the model we train on the center of each residential area and the area calculated from that residential area. The number of afflicted households is Tweedie distributed: the majority of households are not at risk, and those at risk follow a Poisson distribution.
The model we trained was a boosted decision tree ensemble with a Tweedie objective. This approach was necessary due to the mixed nature of the data: normalisation destroys interpretability, whereas trees keep the data intact. Such a model uses intuitive and explainable steps, splitting on binary questions: is the rent greater than 400, and if yes, is the latitude less than 52? If so, we expect, say, 5% of households at this location and rent to be at risk. An added bonus is its use in operations research: once trained, the splitting criteria are fixed, allowing decisions to be made via a graphical representation known as a tree diagram.
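To give a feel for what a Tweedie objective optimises, the sketch below computes the mean Tweedie deviance between observed and predicted values in plain Python. It is an illustration only: the text does not say which library or variance power was used, so the function name `tweedie_deviance` and the default `power=1.5` are assumptions. For a power between 1 and 2 the distribution allows exact zeros alongside positive values, which matches data where most households are not at risk.

```python
def tweedie_deviance(y_true, y_pred, power=1.5):
    """Mean Tweedie deviance for a variance power strictly between 1 and 2.

    y_true: observed non-negative values (zeros allowed)
    y_pred: strictly positive predictions
    power: assumed variance power; 1.5 is an illustrative choice
    """
    if not 1 < power < 2:
        raise ValueError("this sketch only covers 1 < power < 2")
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    total = 0.0
    for y, mu in zip(y_true, y_pred):
        if y < 0 or mu <= 0:
            raise ValueError("need y >= 0 and mu > 0")
        # Unit deviance: zero when the prediction equals the observation.
        total += 2 * (
            y ** (2 - power) / ((1 - power) * (2 - power))
            - y * mu ** (1 - power) / (1 - power)
            + mu ** (2 - power) / (2 - power)
        )
    return total / len(y_true)
```

A boosted tree ensemble with a Tweedie objective fits each new tree to reduce a loss of this kind, rather than the squared error.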
We trained our model on seven locations in Germany: Frankfurt am Main, Cologne, Munich, Berlin, Stuttgart, Hamburg and Dusseldorf.
Additionally, one factor of extremity is varied: the restricted monthly income fraction, set to 0.65, 0.7 and 0.8. The metrics for measuring model performance were the R2 score and the mean absolute error (MAE). R2 measures the proportion of variance explained by the model, capturing shape and trend. MAE is the average absolute difference between predicted and observed values. Over all 7 models with k-fold cross-validation, where k = 10, the average values over all 70 evaluations were 0.95 for R2 and 0.5% for MAE. This means that each prediction deviates by 0.5% on average. Additionally, BIC and AIC values were recorded to compare the model against other viable choices.
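For readers who want to reproduce this kind of evaluation, here is a minimal sketch of the two metrics and of a 10-fold split in plain Python. The helper names (`r2_score`, `mean_absolute_error`, `k_fold_indices`) are chosen for illustration and are not taken from the original analysis; the folds here are contiguous and unshuffled, which is an assumption.

```python
from statistics import mean


def r2_score(y_true, y_pred):
    """Proportion of variance in y_true explained by y_pred."""
    y_bar = mean(y_true)
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    ss_tot = sum((y - y_bar) ** 2 for y in y_true)
    if ss_tot == 0:
        raise ValueError("R2 is undefined when y_true is constant")
    return 1 - ss_res / ss_tot


def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between observed and predicted values."""
    return mean(abs(y - p) for y, p in zip(y_true, y_pred))


def k_fold_indices(n_samples, k=10):
    """Yield (train_idx, test_idx) pairs for k contiguous, unshuffled folds."""
    if not 2 <= k <= n_samples:
        raise ValueError("need 2 <= k <= n_samples")
    # Spread the remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, test_idx
        start += size
```

Each fold yields one R2 and one MAE value on its held-out samples; with 7 city models and 10 folds each, averaging gives the 70 evaluations mentioned above.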
The main drawback of such a model is bias: we worked under the extreme assumption that everyone is on restricted hours. This is simply not true. We intend the model to be used as a “worst case scenario”. In the future, if needed, up-to-date employment records could be collected to develop a more precise model. Additionally, regional biases affect predictions due to the nature of a tree-based model, for instance west being rated better than east, and biases in area. The area is also assumed to be circular, which just isn’t true. Despite its high scores, this model should be used VERY CAREFULLY and not wholeheartedly trusted.
The risk assessment is available for the seven biggest cities in Germany.
Our analysis is also ready for download, see the link below.