Advanced Analytics and Deep Learning Models. Группа авторов
Abstract
For a long time since the very beginning, a continuous paradigm of selling and buying houses/land has continued to exist. The wealth of a man is often determined by the kind of house he/she buys, but this process had multiple people intermediate. However, with the increase in technology, this barter system has also changed a lot. With PropTech being the new upcoming thing to disrupt in the real estate market, using technology to complete the operations has made buying property very simple. It is seen as part of a digital transformation in the real estate industry, focuses on both the technological and psychological changes of the people involved, and could lead to new functions such as transparency, unprecedented data, statistical data, machine learning, blockchain, and sensors that are part of PropTech.
In India, there are number of websites, which collect the data for properties that are to sell, but there are cases where on different sites price vary for the same apartment, and as a result, there is a lot of obscurity [1, 2]. This project uses machine learning to predict house prices. One heuristic data commonly used in the analysis of housing price deficits is the Bangalore city suburban housing data. Recent analysis has found that prices in that database are highly dependent on size and location. To date, basic algorithms such as linear regression can eliminate errors using both internal and local features. The previous function of forecasting housing prices are basis of retrospective analysis and machine learning [6, 7]. A linear regression model and a decision tree model, using vague assumptions. In addition, a multi-dimensional object model with two training items is used to evaluate house prices where something that predicts the “internal” cost of a house is used, and the non-objective component can count neighbors’ preferences. The aim is to solve the problems of relapse where the target variable is the value and the independent variable region. We have used hot code coding in each of our institutions. The business application of this algorithm is that classified websites can directly use this algorithm to predict the values of new properties that are listed by taking variable input and predicting the correct and appropriate value.
Keywords: Machine learning, clustering algorithm, linear regression, LASSO regression, decision tree, support vector machine, random forest regressor
2.1 Introduction
We are in want of a right prediction at the real estate and the housing marketplace discipline. We see a mechanism that runs all through the residence shopping and promoting; buying a house may be a lifetime purpose for maximum of the people. There are lot of individuals making big errors when buying the houses; the majority are shopping for homes from the people they recognise with the aid of seeing the classified ads and everywhere in the grooves coming across the India. One of the not unusual hassles is shopping for the residences, which are too high priced and no longer really worth it [3]. From claiming valuation structures, additional techniques mirror those natures of asset and those conditions that are provided for [8, 9]. The assets would possibly properly, at the manner, alternate in open market underneath many situations and instances; people are unaware about the contemporary conditions and they start losing their cash [10]. The exchange in cost of residences would affect both the common people together with the financial system of country; to avoid such situations, there is a want of rate prediction. Many techniques are to use within the price prediction.
2.2 Literature Review
Statistical fashions have been a method to analyze and are expecting property expenses for a long term. In the work of Fik et al. (2003), a study to explain the housing costs version was carried out with the aid of studying the impact of vicinity capabilities at the property charges [11] (Piazzesi and Schneider; 2009). For those who foresee product costs in a different way, the association can be quite complicated. Price forecasts are number one within the import commercial enterprise quarter. But, forecasting from deliver call for can be complex due to the fact that there may be a consolidation energy alongside the way. A neural programming model wishes to predict inventory price. This gives an overlap between those shares and blessings.
Authors (Selim, 2009) [12] compared a few studies of artificial neural network deflection using 60% of residential price calculations, and a lot of comparisons have been made by estimating the performance of all their comparisons with different education sizes and choosing statistical lengths.
Authors (Wu and Brynjolfsson, 2009) [15] from MIT made an estimate of the way Google searches for global loan and income. The author is well aware about the near encounters between them in the fee of houses and the love for much priced houses. Data taken from net seek manner search queries the use of Google procedures and with the assistance of actual countrywide harmony-information gather each present of states.
The author provides a brief overview of how a random wooded algorithm is use for retrofitting and phase, power boost, and bag loading used as methods. It generates a lot of distinctions, and the difference between lifting and bagging as stated by Liaw et al. (2002) is the successive trees, calculating the weights of the objects and most will take predictions. Throughout the year 2001, Nghiep and Al (2001) proposed a randomized start-up that included fundraising and provided more randomly the entire random planning and postponement process, which is mentioned here in retrospect.
Eric Slone et al. (2014) improved the relationship among the various home factors and the number of residential queries analyzed using a simple linear regression and multiple linear regressions using a standard square method. Home square images have been used as descriptive variables in simple queues, and multi-line retouches include an increase in the measurement of the parcel of land, number of bedrooms, year of construction, and more descriptive.
2.3 Proposed Work
2.3.1 Methodology
There are classified websites where properties are inconsistent in terms of pricing of an apartment, and there are some cases where similar apartments are priced at different price point, and thus, there are a lot of intransparencies. Sometimes, the consumers feels that the pricing is not justified for a particular listed apartment, but there no way to confirm that either. We propose to use three machine learning algorithms: linear regression, LASSO regression, and decision tree algorithm. The tools required for the project are as follows: Python, Sklearn for model building, Jupyter notebook, visual studio code and Pycharm as IDE, Python flask for http server, HTML/CSS/Javascript for UI, Numpy and Pandas for data cleaning, and Matplotlib for data visualization.
2.3.2 Work Flow
Figure 2.1 Flow of work.
2.3.3 The Dataset
The selected dataset has element of the metropolis Bengaluru; it consists of nine columns with contents that is point out under in Table 2.1 and has 13,321 instances. Enforcement of real estate, infidelity in real estate builders inside the city, and actual property sales throughout India in 2017 have dropped by 7%. As an example, for a potential house owner, greater than 9,000 apartments and flats for sale vary between 42 and 52 lakh, and it is observed that more than 7,100 apartments are within the budget 52 to 62 lakh, in step with a property file website Makaan.
Table 2.1 Columns of dataset.
Column name | Description |
---|---|
Area type | The kind of area the flat/plot is in. |
|