Applied Modeling Techniques and Data Analysis 2. Collective of authors
them to the tax claim. More formally, the two ratios are computed; then, the minimum among these two ratios and 1 is taken. This minimum is the value of the variable Z, which thus ranges from 0 to 1.
Now, for both tax claim (TC) and Z, we calculate the 25th percentile (Q1), the median value (Q2) and the 75th percentile (Q3). We then state that a taxpayer may be considered interesting if he satisfies one of the following conditions:
The three above-mentioned rules can be represented as in Figure 1.3.
Figure 1.3. Determining interesting and not interesting taxpayers. For a color version of this figure, see www.iste.co.uk/dimotikalis/analysis2.zip
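The construction of Z and of the quartile thresholds can be sketched in Python as follows. This is only an illustrative sketch under stated assumptions: the two ratios are passed in as already-computed, non-negative numbers, the helper names `z_value` and `quartiles` are hypothetical, the input numbers are made up, and the "inclusive" quantile method is an assumption, since the text does not say which percentile convention is used. The rules that combine these thresholds are not encoded here.

```python
from statistics import quantiles


def z_value(ratio_a: float, ratio_b: float) -> float:
    """Minimum among the two ratios and 1 (hypothetical helper)."""
    return min(ratio_a, ratio_b, 1.0)


def quartiles(values):
    """Return (Q1, Q2, Q3); the 'inclusive' method is an assumption."""
    q1, q2, q3 = quantiles(values, n=4, method="inclusive")
    return q1, q2, q3


# Made-up illustrative numbers, not taken from the dataset.
ratios = [(0.30, 0.70), (1.50, 2.00), (0.05, 0.02), (0.90, 0.60), (0.45, 0.55)]
z = [z_value(a, b) for a, b in ratios]

tc = [1.0, 2.0, 3.0, 4.0, 5.0]  # tax claims, in thousands of euros

print("TC quartiles:", quartiles(tc))
print("Z quartiles:", quartiles(z))
```

Capping at 1 inside `min` keeps Z within the stated range even when both ratios exceed 1.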
Once the population of our dataset is entirely divided into interesting and not interesting taxpayers, we can see from Table 1.1 that the interesting ones are far more profitable than the others (tax claim values are in thousands of euros). A machine learning tool able to distinguish these two kinds of taxpayers fairly well would then be very useful.
Our first model task will then be that of identifying, with a certain degree of confidence, the taxpayers who are more likely to have evaded (both in absolute terms and as a percentage of revenues or turnover).
The literature on tax fraud detection, although it uses different methods and algorithms, is usually concerned only with this issue, i.e. finding the best way to identify the most relevant cases of tax evasion (Bonchi et al. 1999; Wu et al. 2012; Gonzalez and Velasquez 2013; de Roux et al. 2018).
There is another crucial issue that has to be taken into account, i.e. the tax authorities’ effective ability to collect the tax debt arising from the tax notices sent to all of the unfaithful taxpayers.
Table 1.1. Tax claim, interesting and not interesting taxpayers
Tax claim (thousands of euros) | Not interesting: Num | Not interesting: Total tax claim | Not interesting: Average | Interesting: Num | Interesting: Total tax claim | Interesting: Average |
[0 - 1] | 736 | 322 | 0.44 | 0 | 0 | 0.00 |
[1 - 2] | 631 | 942 | 1.49 | 0 | 0 | 0.00 |
[2 - 5] | 1,607 | 5,409 | 3.37 | 138 | 563 | 4.08 |
[5 - 10] | 1,127 | 7,727 | 6.86 | 517 | 4,157 | 8.04 |
[10 - 20] | 446 | 5,911 | 13.25 | 902 | 13,139 | 14.57 |
[20 - 50] | 0 | 0 | 0.00 | 1,164 | 36,056 | 30.98 |
[50 - 100] | 0 | 0 | 0.00 | 433 | 30,055 | 69.41 |
[100+] | 0 | 0 | 0.00 | 327 | 101,987 | 311.89 |
Total | 4,547 | 20,311 | 4.47 | 3,481 | 185,957 | 53.42 |
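The gap between the two groups can be checked directly from the "Total" row of Table 1.1. The short Python sketch below recomputes the two averages and derives two summary figures that the table does not report explicitly (the ratio between the averages and the interesting group's share of the total tax claim); the variable and function names are illustrative only.

```python
# "Total" row of Table 1.1 (tax claims in thousands of euros).
groups = {
    "not interesting": {"num": 4547, "total_claim": 20311},
    "interesting": {"num": 3481, "total_claim": 185957},
}


def average_claim(group):
    """Average tax claim per taxpayer in a group."""
    return group["total_claim"] / group["num"]


avg_not = average_claim(groups["not interesting"])
avg_int = average_claim(groups["interesting"])

# Share of the overall tax claim attributable to interesting taxpayers.
grand_total = sum(g["total_claim"] for g in groups.values())
share_int = groups["interesting"]["total_claim"] / grand_total

print(f"average, not interesting: {avg_not:.2f}")
print(f"average, interesting:     {avg_int:.2f}")
print(f"ratio of averages:        {avg_int / avg_not:.1f}")
print(f"share of total claim:     {share_int:.1%}")
```

On these totals, the average claim of an interesting taxpayer is roughly twelve times that of a not interesting one, and the interesting group accounts for about 90% of the total tax claim.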
1.2.3. Enforced tax recovery proceedings
What happens if a taxpayer does not spontaneously pay the additional tax amount he is charged? Well, after a while, coercive collection procedures will be deployed by the tax authorities. However, as we have seen above, these procedures are highly ineffective, as they only collect about 5% of the overall credits claimed against the audited taxpayers.
Indeed, the data show that coercive procedures take place in almost 40% of cases, although their distribution is not uniform: they are more frequent when the tax bill is high, as reported in Table 1.2 (again, tax claim values are in thousands of euros).
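The non-uniform distribution can be made concrete by turning the counts of Table 1.2 into shares. The Python sketch below does this for the first four tax claim intervals only, using the "No" and "Yes" counts as reported in the table; the names `rows` and `coercive_share` are illustrative.

```python
# First four intervals of Table 1.2: (tax claim interval, no, yes).
rows = [
    ("[0 - 1]", 578, 158),
    ("[1 - 2]", 476, 155),
    ("[2 - 5]", 1268, 477),
    ("[5 - 10]", 1072, 572),
]


def coercive_share(no: int, yes: int) -> float:
    """Fraction of cases in an interval that ended in a coercive procedure."""
    return yes / (no + yes)


shares = [coercive_share(no, yes) for _, no, yes in rows]

for (interval, _, _), share in zip(rows, shares):
    print(f"{interval}: {share:.1%}")
```

For these four intervals the share rises steadily with the size of the tax claim, from about 21% to about 35%.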
Table 1.2. Number of coercive procedures per tax claim interval
Tax claim | Coercive procedures: No | Coercive procedures: Yes | Total |
[0 - 1] | 578 | 158 | 736 |
[1 - 2] | 476 | 155 | 631 |
[2 - 5] | 1,268 | 477 | 1,745 |
[5 - 10] | 1,072 | 572 | 1,644 |
[10 - 20] | 745 |
|