Applied Modeling Techniques and Data Analysis 2. Group of authors
Figure 1.6. First model statistics and confusion matrix
In terms of tax claim amounts, the model appears to perform quite well: the average additional tax due from the selected taxpayers amounts to € 49,094, whereas the average over the entire test set is € 22,339.
So far, we have shown that our model is able, on average, to distinguish serious tax evasion phenomena from less significant ones. But what about the tax collection issue? To deal with this matter, we should investigate what kind of taxpayers we have just selected. Table 1.3 shows that the majority of the taxpayers the model would select would also be subject to coercive procedures (the values in each column sum to 100%).
Table 1.3. Predicted values versus actual coercive procedures
Actual \ Predicted | Interesting | Not interesting |
Procedure | 70.12% | 32.24% |
No procedure | 29.88% | 67.76% |
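A column-normalised cross-tabulation of this kind can be sketched as follows. This is only an illustration in Python (the chapter does not say which tool was used); the function name `column_percentages` and the toy labels are hypothetical and are not the chapter's data.

```python
from collections import Counter


def column_percentages(predicted, actual):
    """Cross-tabulate actual vs. predicted labels as column percentages.

    Each predicted class (column) sums to 100%, as in Table 1.3.
    """
    counts = Counter(zip(actual, predicted))   # joint counts per (actual, predicted) cell
    column_totals = Counter(predicted)         # number of records in each predicted class
    return {
        (act, pred): 100.0 * n / column_totals[pred]
        for (act, pred), n in counts.items()
    }


# Toy labels, invented for illustration only.
predicted = ["interesting", "interesting", "interesting",
             "not interesting", "not interesting"]
actual = ["procedure", "procedure", "no procedure",
          "no procedure", "procedure"]

table = column_percentages(predicted, actual)
for (act, pred), pct in sorted(table.items()):
    print(f"{act:>12} | {pred:<15} | {pct:6.2f}%")
```

Normalising by column rather than by row answers the question asked in the text: among the taxpayers the model would select, what share ends up in a coercive procedure?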
Thus, many of the selected taxpayers have a debt payment issue. This jeopardizes the efficiency and effectiveness of the overall selection process. As pointed out by the Italian Court of Auditors, coercive procedures are able to collect, on average, only about 5% of the overall claimed credits.
To evaluate the extent of the problem, we can replace the actual tax claim of each problematic taxpayer with the estimated collectable tax, that is, the tax claim discounted by 95% (multiplied by 0.05), and compare the two scenarios, as in Figures 1.7 and 1.8, where we depict both the total tax claim and the average tax claim arising from the taxpayers’ notices in the entire test set.
Figure 1.7. Total tax claim and discounted tax claim. For a color version of this figure, see www.iste.co.uk/dimotikalis/analysis2.zip
Taxpayers are ordered, from left to right, according to their probability of being interesting, as calculated by our model. Recall that, when associating a class label with a record, the model also provides a probability, which indicates how confident the model is about its own prediction. To a certain extent, this probability sets a ranking among taxpayers, which we exploit to draw Figures 1.7 and 1.8. Figure 1.7 depicts the cumulative tax claim charged up to a certain taxpayer: the red line refers to the additional taxes requested with the tax notices, while the black line is drawn using the discounted values. The dashed vertical line marks the last taxpayer selected by the model (in our case, the 415th). As we can easily observe, the overall tax claim charged to the selected taxpayers plummets from € 20 million to € 5 million, and the average tax claim, depicted in Figure 1.8, from € 49,000 to € 12,000. Thus, the selection process, which relied on our data mining model and at first sight seemed very efficient, shows some important flaws that we need to address: tax collectability is not adequately guaranteed.
Figure 1.8. Average total tax claim and discounted tax claim. For a color version of this figure, see www.iste.co.uk/dimotikalis/analysis2.zip
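The curves in Figures 1.7 and 1.8 can be reproduced in outline as follows. This is a minimal Python sketch under stated assumptions: the 5% collection rate comes from the text, while the function name `claim_curves`, the record layout and the toy figures are hypothetical.

```python
from itertools import accumulate

COLLECTION_RATE = 0.05  # about 5% is collected under coercive procedures (from the text)


def claim_curves(records, collection_rate=COLLECTION_RATE):
    """Build cumulative and running-average claim curves.

    `records` is an iterable of (probability, claim, under_procedure) tuples.
    Taxpayers are ranked by decreasing model probability; the claim of a
    taxpayer under a coercive procedure is replaced by its collectable share.
    """
    ranked = sorted(records, key=lambda r: r[0], reverse=True)
    nominal = [claim for _, claim, _ in ranked]
    discounted = [
        claim * collection_rate if procedure else claim
        for _, claim, procedure in ranked
    ]
    cum_nominal = list(accumulate(nominal))        # total claim up to each taxpayer
    cum_discounted = list(accumulate(discounted))  # collectable total up to each taxpayer
    avg_nominal = [total / k for k, total in enumerate(cum_nominal, start=1)]
    avg_discounted = [total / k for k, total in enumerate(cum_discounted, start=1)]
    return cum_nominal, cum_discounted, avg_nominal, avg_discounted


# Invented toy records: (model probability, tax claim in EUR, coercive procedure?)
records = [
    (0.90, 60_000.0, True),
    (0.40, 10_000.0, False),
    (0.75, 40_000.0, False),
    (0.20, 5_000.0, True),
]
cum_nominal, cum_discounted, avg_nominal, avg_discounted = claim_curves(records)
```

Reading the four lists at the index of the last selected taxpayer gives the values marked by the dashed vertical line in the figures.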
A second model may then help us by predicting which taxpayers would not be subject to coercive procedures, by focusing on a set of features concerning their assets.
Again, with a precision rate of about 80%, as shown in Figure 1.9, the model appears to be successful.
Figure 1.9. Second model statistics and confusion matrix
Table 1.4. Predicted coercive procedures versus actual interesting taxpayers
Actual \ Predicted | Procedure | No procedure |
Interesting | 46.94% | 32.73% |
Not interesting | 53.06% | 67.27% |
This second model could be useful for our purposes, even though it has some caveats. Most notably, most of the taxpayers that the model classifies as not facing a coercive procedure are also not interesting, as shown in Table 1.4. Again, the values in each column sum to 100%.
In fact, this second model’s performance in terms of tax claim is worse than that of the first: the average additional tax due from the no-procedure taxpayers, calculated on the first 415 taxpayers (according to the ranking set by this model, which is, obviously, dramatically different from the one set by the first model), is equal to € 20,388. However, the average collectable tax claim is equal to € 13,493, which is slightly better than the € 12,000 obtained with the first model.
We point out that throughout this chapter, we have compared sets of selected taxpayers with the same cardinality, for two reasons. First, tax authorities reasonably have a fixed budget of audits to perform, so comparisons between models should be made for a given number of audits. Second, comparability requires it, since smaller sets tend to perform better (see Figure 1.8, where the average tax claim decreases as the number of selected taxpayers increases).
Therefore, in this second model, the high rate of not interesting taxpayers, on the one hand, causes a drop in the average tax claim (from € 49,000 to € 20,000), but, on the other, contributes to the slight improvement in the discounted average tax claim (from € 12,000 to € 13,000), since only a few of the not interesting taxpayers go through a coercive procedure. Figure 1.10 compares, for each number of selected taxpayers, the different coercive procedure rates arising
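Comparing two rankings at the same cardinality can be sketched as below. This is an illustrative Python example only: the helper `top_k_average`, the two score lists and the collectable amounts are hypothetical, not the chapter's models or data.

```python
def top_k_average(scores, values, k):
    """Average of `values` over the k records with the highest `scores`."""
    if not 0 < k <= len(scores):
        raise ValueError("k must be between 1 and the number of records")
    ranked = sorted(zip(scores, values), key=lambda pair: pair[0], reverse=True)
    return sum(value for _, value in ranked[:k]) / k


# Invented toy data: one collectable claim (EUR) per taxpayer and the
# scores assigned by two hypothetical models to the same taxpayers.
collectable = [3_000.0, 40_000.0, 10_000.0, 250.0, 20_000.0]
scores_model_1 = [0.90, 0.75, 0.40, 0.20, 0.10]  # probability of being interesting
scores_model_2 = [0.10, 0.80, 0.60, 0.05, 0.90]  # probability of no coercive procedure

k = 3  # the same audit budget is applied to both models
avg_1 = top_k_average(scores_model_1, collectable, k)
avg_2 = top_k_average(scores_model_2, collectable, k)
print(f"model 1: {avg_1:,.2f}  model 2: {avg_2:,.2f}")
```

Fixing `k` for both models keeps the comparison fair, because an average taken over a shorter, higher-ranked list would otherwise look better simply for being shorter.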