Indexed on: 30 Mar '17
Published on: 30 Mar '17
Published in: arXiv - Computer Science - Computers and Society
In this paper, we show how publicly available data streams and machine learning algorithms can be used to develop practical data-driven services without prior knowledge supplied by domain experts. We report the initial steps toward the development of a real estate portal in Switzerland. Based on continuous web crawling of publicly available real estate advertisements and on building data from OpenStreetMap, we developed a system that roughly estimates the rental and sale price indexes of 1.7 million buildings across the country. In addition to these rough estimates, we developed a web-based API for accurate automated valuation of the rental prices of individual properties and for spatial sensitivity analysis of the rental market. We evaluated several established function approximation methods on held-out test data to check the quality of the rental price estimates. In our experiments, Random Forest gives reasonable results, with a median absolute relative error of 6.57 percent, which is comparable with the state of the art in the industry. We argue that, although there have recently been successful real estate portals based on Big Data, the majority of existing solutions are expensive, limited to certain users, and mostly built on non-transparent underlying systems. As an alternative, we discuss how the crawled data sets, combined with other open data sets provided by different institutes, make it possible to develop data-driven services for spatial and temporal sensitivity analysis of the real estate market, for use by different stakeholders. We believe that this kind of digital literacy can disrupt many other existing business concepts across many domains.
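The median absolute relative error quoted in the abstract can be illustrated with a minimal sketch. It assumes the usual definition of the metric: the median, over all test properties, of |predicted − actual| / actual. The abstract does not spell out the formula, so this definition, the function name `median_absolute_relative_error`, and the rent figures are assumptions made for illustration and are not taken from the paper.

```python
from statistics import median


def median_absolute_relative_error(actual, predicted):
    """Median of |predicted - actual| / actual over paired observations.

    Assumed definition of the metric; the abstract does not state the formula.
    """
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    if not actual:
        raise ValueError("at least one observation is required")
    if any(a <= 0 for a in actual):
        raise ValueError("actual prices must be positive")
    return median(abs(p - a) / a for a, p in zip(actual, predicted))


# Made-up monthly rents in CHF, for illustration only (not data from the paper).
actual_rents = [1500.0, 2000.0, 2500.0, 1000.0]
predicted_rents = [1575.0, 1900.0, 2500.0, 1100.0]

mare = median_absolute_relative_error(actual_rents, predicted_rents)
print(f"{mare:.2%}")  # prints 5.00%
```

Because the metric is a median rather than a mean, a few badly mispriced properties have little effect on it, which is one plausible reason to prefer it when advertised prices are noisy.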