Posted: February 28th, 2022

UNCLASSIFIED

Paper 1047-2021

SAS® Time Series Analysis & Forecasting

(TSAF) at the Canada Revenue Agency (CRA), with COVID impacts

Jason A. Oliver, Senior Compliance Analyst, Canada Revenue Agency (CRA)

ABSTRACT

It may well be a recurring theme of this year’s SAS Global Forum that we are faced with more pressure to use flexible thinking – not just critical thinking – and when it comes to

time series analysis and forecasting (TSAF) in SAS, it’s all about “rethinking the curve”.

At the Canada Revenue Agency (CRA) Compliance Programs Branch (CPB), we have grappled with reliable forecasting for macro-level tax variables on a month-to-month basis, even before the COVID-19 pandemic hit. But now we face a particularly difficult

challenge. As with many large organizations, it is not easy to foretell what the fallout may be from such a cataclysm.

In setting up SAS to right the trajectory, we must be extra cautious about some of the fallacies in applying TSAF in this context: the lagged effect for tax revenues realized based on audits of the previous tax year, the need to differentiate average tax recovery per case from the sum of tax recovery (month-to-month), the fact that industry sectors are not “one size fits all”, and the need to account for the relatively temporary effects of staffing re-orientation in the conversion to a virtual workplace versus the more enduring effects of business disruptions. With SAS Enterprise Miner’s abilities to continuously adjust forecasts, sub-categorize data points by tax office or industry sector, and apply lagged cross-correlation analysis, we are suitably equipped, and the approach can provide abstract learnings for other large organizations.

INTRODUCTION

The Canada Revenue Agency (CRA) is Canada’s federal tax administration. As with all tax

jurisdictions, the CRA has been challenged to keep pace with COVID-19 shocks and

manifestations, which began in March 2020 (the last month of our fiscal year).

Fortunately, SAS® Enterprise Miner™ has been an invaluable aid in gauging these impacts.

Enterprise Miner™ includes a highly versatile set of functional nodes for configuring and

processing time series data. It can decompose time series components such as seasonality

and trend, show trend lines and expected forecast within configurable prediction intervals,

and demonstrate complex correlation analyses.

While this has been of great benefit to the CRA in gauging the trajectory of macro-variables

related to tax revenues and auditor performance, the findings of this research paper could

conceivably be applied in the abstract to large organizations with process-oriented

functions, and not just to other foreign tax jurisdictions.

Let us provide a glossary of terms to set the stage:

• TSAF: Time Series Analysis & Forecasting.

• TEBA: tax earned by audit, the amount of tax collectible that is agreed upon in the course of a taxpayer audit. It is expressed in NPV (Net Present Value).

• TAR: tax-at-risk, the amount that CRA risk assessors arrive at as the precursor to auditing activity.

• C/AR ratio: the ratio of [audit] cases completed to action requests [submitted] for assistance. It is a tentative measure of auditor productivity.

• Integras: the tool used by CRA auditors to process cases.

TIME SERIES FUNCTIONAL NODES & SETUP

In SAS® Enterprise Miner™, you have six TSAF nodes in the “Time Series” ribbon; but we’re

only going to use four of them. Below is the Time Series ribbon with the functional nodes in

question:

Figure 1. Time Series Functional Nodes

• TS Data Preparation: this node allows you to specify basic time series properties, including interval, cycle, start/end time, and accumulation (i.e. by total, min or max, mean, etc.)

o Below, the interval is “automatic”, so we specify “Month” as the interval.

o We can leave the seasonal cycle and start/end time as “Default”, as SAS® Enterprise Miner™ will auto-determine these parts from the data.

o In our case, the data was pre-accumulated in SAS® Enterprise Guide™ row-by-row on a per-month basis, so we can leave Accumulation = “Total” (else, we would have to set it to “Average”).

Figure 2. TS Data Preparation node – basic properties
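As a rough illustration of what the Accumulation setting means (this is not Enterprise Miner's internal code), the Python sketch below collapses case-level records into one value per month, first by total and then by average. The record values and the `accumulate` helper are hypothetical, introduced only for this example.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical case-level records: (month, tax earned by audit in dollars).
records = [
    ("2020-01", 40_000.0),
    ("2020-01", 60_000.0),
    ("2020-02", 30_000.0),
    ("2020-02", 50_000.0),
    ("2020-02", 70_000.0),
]


def accumulate(rows, how):
    """Collapse rows to one value per month using the aggregate function `how`."""
    by_month = defaultdict(list)
    for month, value in rows:
        by_month[month].append(value)
    return {month: how(values) for month, values in sorted(by_month.items())}


monthly_total = accumulate(records, sum)
monthly_average = accumulate(records, mean)

# Pre-accumulated data has exactly one row per month, so accumulating it again
# by total or by average returns the same series.
pre_accumulated = list(monthly_average.items())
same_either_way = accumulate(pre_accumulated, sum) == accumulate(pre_accumulated, mean)

print(monthly_total)
print(monthly_average)
print(same_either_way)
```

Once the data holds a single row per month, total and average coincide, which is why leaving Accumulation = “Total” does no harm for pre-accumulated data.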

• TS Decomposition: this node allows you to specify basic settings similar to those of the TS Data Preparation node, but the Number of Periods can be configured, and moreover, you can configure which Export Components you want to display.

o By default, it will only display the “Trend-Cycle” component (=Yes), which is generally regarded as the most salient one.

o However, in our case, we want to view ALL Components, so we would set that value to “Yes”.

Figure 3. TS Decomposition node – properties

TS Correlation: this node allows you to set up your TSA for autocorrelation analysis, or alternatively for CCA (cross-correlation analysis). When you select one of those methods, the other one’s properties will be greyed out.

Figure 4. TS Correlation node – properties

Both the TS Correlation and TS Decomposition nodes must be preceded by a TS Data

Preparation node (which occurs right after the source data node).

TS Exponential Smoothing: this node allows you to conduct forecasting based on your known data; as such, you would connect it to a TS Data Preparation node, not directly to your source data node.

• The interval is automatic (which will be month in the case of our pre-accumulated data), and the accumulation defaults to “Total” (which is OK in our case, for the same reason).

• SAS will pick what it deems to be the best forecasting method.

• The default selection criterion is MSE, or Mean Squared Error.

• We will see more on the Forecast Lead, Forecast Back, and Significance Level parameters during the forecast demonstration in this paper.

Figure 5. TS Exponential Smoothing node – properties

For our initial workspace setup, we can focus on the C/AR (Case to Action Request) ratio, which as per our glossary is a tentative measure of tax auditor performance. The initial diagram workspace is called “Aggreg_Integras_27mths”, which runs from January 2018 to March 2020. It is arranged this way for a reason: it ends on the month of the COVID shutdown.

Our dataset name is “TSA_AGGREG_SINGLE_LINE_27MTHS”.

So, when I bring this in, I need to set all variables to Role = “Rejected” except a) C/AR ratio

and b) my MONTH (Time ID) variable.

Figure 6. Variable Role selection from data source

You would set your variables once you bring the data source to your diagram (workspace).

Figure 7. TS Data Source to Diagram flow

NOTE: I do not cover the mechanics behind bringing in a data source, as the principal focus is on conducting TSAF in SAS® Enterprise Miner™. All we need to be concerned with is that as Data Sources become available in the top-left menu, we can drag-and-drop them to our diagram workspace (diagrams are also created by right-clicking ‘Diagrams’ in the left panel).

Examining the TS Data Preparation node is fairly simple: we see the known trajectory of the C/AR variable by right-clicking the node → Run → Results.

Figure 8. Time Series Plot, for C/AR ratio variable

We can see that the C/AR ratio has fallen off as of mid-2018, and continued on a very gradual downward path. This means that case auditors are completing proportionally fewer cases relative to the action requests they submit for help, albeit with a seasonal factor and some rebounding of the trend line in March 2020.

So, we can examine the more specific components of the time series line by using a TS Decomposition node.

DECOMPOSITION OF TIME SERIES

In running our TS Decomposition node and viewing the results, the first one to examine is the Seasonal Component Plot. When it comes to the C/AR ratio, the seasonal index ranges from a high of about 1.3 down to a low of about 0.75.

Figure 9. Seasonal Component Plot, for C/AR ratio variable

During the months of March and December, we see fairly high seasonality. This is normal

for the time, since the push to complete cases is higher at the end of the CRA fiscal year

(March), and ostensibly at the end of the calendar year, also. Auditors are completing

proportionally more cases vs. the number of action requests they submit to the service

desk. So it is likely that they are fulfilling cases that do not require as many interventions

during those months. Even in March 2020, C/AR still remained high – it was

resilient to the initial COVID effects, due to being a ratio variable and not an absolute

sum variable.
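To make the idea of a seasonal index concrete, here is a simplified Python sketch. It divides each calendar month's average by the overall average, which is a ratio-to-mean shortcut and not necessarily the decomposition method Enterprise Miner applies. The `pattern` and `series` values are hypothetical, chosen only to resemble the 0.75 to 1.3 range described above.

```python
from statistics import mean

# Hypothetical monthly seasonal pattern (January..December); it averages 1.0.
pattern = [1.0, 1.0, 1.3, 0.9, 0.75, 1.0, 1.0, 0.9, 1.0, 1.0, 0.95, 1.2]

# Two hypothetical years of a ratio variable whose level drifts from 10 down to 9.
series = [10 * p for p in pattern] + [9 * p for p in pattern]


def seasonal_indices(values, period=12):
    """Average of each seasonal position divided by the overall average."""
    overall = mean(values)
    return [mean(values[i::period]) / overall for i in range(period)]


indices = seasonal_indices(series)
for month_number, index in enumerate(indices, start=1):
    print(month_number, round(index, 2))
```

An index above 1.0 marks a month that typically runs above the series average (March and December in this made-up pattern); an index below 1.0 marks a month that runs below it.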

In the decomposed results, we can also examine combinatory components; for instance, the

Trend-Cycle Component Plot:

Figure 10. Trend-Cycle Component Plot, for C/AR ratio variable

This tells us what we had surmised from the initial data preparation, that the series has

been on a steadily downwards trajectory. Now when it comes to tax-related time series

data, there is no real cycle per se; at best, it is an inherited cycle from world economy

fluctuations. The proper definition of cycle in a TSA context is not the entity’s operational

lifecycle; rather, it refers to the boom-and-bust business cycles which are largely

unpredictable. Ergo, we are mainly concerned about trend here.

Now, if we substitute the Average TEBA (tax earned by audit) variable for C/AR [using the

Data Source node shown in figure 6 earlier], we can see what emerges in our decomposed

time series results.

Figure 11. Paneled Component Plots, TS Decomp. for Avg. TEBA

This time, as per the panel graph at bottom-left, we see that our seasonality index is

broader than that of C/AR ratio; it goes from a high of about 1.8 to a low of ~0.7. This is

largely attributable to the heightened pressures towards fiscal year-end to increase

realization of TEBA, which we see in Feb.-March. At the opposite end, we see rather low

seasonality for May, August, and November.

For the original series plot, bottom-right, the trend continues gradually upwards with

seasonality readily apparent. In the trend-cycle component plot, at top-left, we see that the

trend (with cycle, such as it is) is rising steadily upwards but then reaches a virtual plateau.

The key challenge then, has been to resolve and reconcile the expected forecast as of March

2020 with the new COVID-19 realities.

FORECASTING MACRO TAX VARIABLES

AVERAGE TEBA

We can proceed to evaluate the expected trajectory of the AVG. TEBA variable, on a

monthly interval. Recall that this variable is pre-accumulated at data source.

When we conduct our forecast, we use the TS Exponential Smoothing node.

Figure 12. TS Exponential Smoothing node in the TSAF diagram

We let SAS® pick the best forecasting method, as well as selection criterion (forecast

measure). In this case, the latter value is the MSE [Mean Squared Error] as you can see at

the bottom of the properties of the node.

Figure 13. Properties of the TS Exponential Smoothing node

For our Significance Level, we set this to 0.5; it governs the blue band around the forecast line, a.k.a. the prediction interval. So it is a confidence band of sorts. This figure is the “alpha” value, i.e. the complement of the confidence level that many of us quote for frequentist confidence intervals, so it runs in the opposite direction: the lower the “alpha” value, the wider the band (prediction interval). An “alpha” of 0.01 would produce a very wide band, and an “alpha” of 0.99 would be virtually limited to just the forecast line itself. So we aim in the middle (which is actually closer to the outline of the trend line, as the band width does not shrink linearly with “alpha”; it is more “log-like” in its manifestation).
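The relationship between the significance level and the band width can be sketched in Python, assuming the prediction limits are built from normal quantiles (an assumption for illustration; the paper does not state how the node computes them). The helper name `band_multiplier` is hypothetical.

```python
from statistics import NormalDist


def band_multiplier(alpha):
    """z such that a two-sided normal interval covers 100 * (1 - alpha) percent.

    Under the normality assumption, the half-width of the band is this
    multiplier times the forecast standard error.
    """
    return NormalDist().inv_cdf(1 - alpha / 2)


for alpha in (0.01, 0.25, 0.5, 0.99):
    print(alpha, round(band_multiplier(alpha), 3))
```

Under that assumption the multiplier is roughly 2.58 at alpha = 0.01, about 0.67 at alpha = 0.5, and close to zero at alpha = 0.99, so the midpoint setting already sits much nearer the narrow end of the range.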

Figure 14. TEBA_NPV_Mean: forecast line from trend

SAS logically expects the trend will continue upwards (while maintaining seasonality, of

course) due to “series momentum”. Had we begun our time series at, say, January 2016

rather than Jan. 2018, that momentum might have been more pronounced. The clichés of

“future behavior is governed by past behavior” and “you can’t know where you’re going,

unless you know where you’ve been” have never been truer. However, enter COVID-19,

and that is a whole new wrench in the gears of the tax-auditing apparatus.

As for the selection of “Best” Forecasting Method: you could try to experiment with

different models – there are eight in all, as per fundamental TSAF science – but I can tell

from the shape of the forecast line that it’s based, appropriately, on the Additive Winters

method1. I ascertained this by running the node with this method selected, and the

resulting graph was identical to that of the “Best” method. Unlike the Multiplicative Winters method,

this forecast line is predicated on fairly consistent seasonal “inverted V” shapes in the curve.

If those inverted V shapes became noticeably larger (or smaller) over time, then Multiplicative Winters

would likely be the “best” method that SAS would auto-select.
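The distinction between the two Winters variants can be sketched with the textbook forecast equations: the additive form adds a seasonal offset to the trend line, while the multiplicative form scales the trend line by a seasonal factor. The function name and all numeric states below are hypothetical, and the smoothing step that would estimate those states from data is omitted.

```python
def winters_forecast(level, trend, seasonals, horizon, multiplicative=False):
    """h-step-ahead Winters forecast from already-smoothed states.

    `seasonals` holds one full cycle of seasonal estimates, ordered so that
    seasonals[0] applies to the first forecast period.
    """
    season = seasonals[(horizon - 1) % len(seasonals)]
    base = level + trend * horizon
    return base * season if multiplicative else base + season


# Hypothetical final states with a two-period seasonal cycle.
level, trend = 100.0, 10.0
horizons = range(1, 5)
additive = [winters_forecast(level, trend, [20.0, -20.0], h) for h in horizons]
multiplicative = [winters_forecast(level, trend, [1.2, 0.8], h, True) for h in horizons]

# Seasonal swing: distance of each forecast from the trend line alone.
trend_line = [level + trend * h for h in horizons]
additive_swing = [abs(f - t) for f, t in zip(additive, trend_line)]
multiplicative_swing = [abs(f - t) for f, t in zip(multiplicative, trend_line)]

print(additive_swing)
print(multiplicative_swing)
```

With a rising trend, the additive swings stay the same size while the multiplicative swings grow, which matches the "consistent inverted V" versus "larger inverted V" distinction described above.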

Figure 15. Available Forecasting Methods, properties of TS Exp. Smoothing node

We see that the resulting forecast predicts ahead exactly 12 months. This is the difference between the “Forecast Lead” and “Forecast Back” values in the properties. We saw on the previous page that “Forecast Back” = 6; this acts as our validation partition, using the last six months of known data (i.e. Oct. 2019 to March 2020). So this gets subtracted from the “Forecast Lead” value of 18 to arrive at 12 periods out. Ideally, you want your “back” [validation] period to be between 20-25% of your known data, which it is at 6 out of 27 months (about 22%); even when we increase the known months to 30, it will still be 20%.
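The arithmetic behind the lead and back settings can be written out as a small Python sketch. It follows this paper's description of the two properties rather than any SAS documentation, and the function name `forecast_layout` is hypothetical.

```python
def forecast_layout(known_periods, lead, back):
    """Return (periods forecast beyond the known data, validation share).

    `back` periods at the end of the known data are held out for validation,
    so only `lead - back` forecast periods fall after the last known value.
    """
    beyond_known = lead - back
    validation_share = back / known_periods
    return beyond_known, validation_share


for known in (27, 30):
    beyond, share = forecast_layout(known, lead=18, back=6)
    print(known, beyond, f"{share:.1%}")
```

With 27 known months the six-month holdout is about 22% of the data, and with 30 known months it is exactly 20%, so both stay inside the 20-25% guideline.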

SUM OF TEBA

When we run a TSAF experiment on the SUM of TEBA – as opposed to its average – we

realize a drastic difference in the scale. Because TEBA is a sum value, not a ratio (i.e.

C/AR, or [Average] TEBA/case), it is simply not as resilient to sudden shocks like COVID-19

– as we will later see when adjusting the forecast based on incremental months (April, May,

June) of known values.

1 The essence of the Winters method is to combine discernible trend with seasonality.

Figure 16. TEBA SUM Forecast (post-March 2020)

Note that the MSE selection criterion (default) graphs a trend line around the known values

(which are represented by the red dots here). The SUM TEBA for Feb. 2020 is nearly double

what it was for March 2020, as you can see by the relatively large separation of the red dots

from the blue dots (on trendline) for those two months. Yet SAS® “thinks” that the trend

will continue positively, as it is “COVID-agnostic”.

What may also seem shocking to the reader is that the lower limit of the prediction interval for April 2020 (at ~$674.5M) actually exceeds the actual value for April 2019, which was slightly below $500 million. It is not until the fall that we see the midpoint of actual 2019 data approximate the LCL (lower confidence limit) of the forecasted band, for Sept. 2020. This is ostensibly due to the “positive momentum” of the time series that I alluded to earlier.

C/AR RATIO

Next, we switch out the SUM of TEBA for the C/AR ratio, once again. In forecasting a relatively low continuous ratio variable such as C/AR, the prediction interval can be less reliable. We have to examine the midpoint distribution. While the midpoint post-March 2020 tends to be at or above the 10.0 line, this is rare for 2019 data points.

Figure 17. C/AR ratio Forecast

I used the Mean Relative Abs. Error as the forecast metric (selection criterion), which I

found to be more appropriate. Regardless, what we see in the actuals for the spring of 2020

is a very low C/AR ratio, telling us that case throughput has suffered as a result of the

pandemic AND that Action Requests for help did not decline proportionally; there was still

an apparent high need for action requests.

FORECASTING AVG. HOURS PER CASE

For forecasting average hours per [audit] case, I determined that the more ideal Selection

Criterion was “Median Relative Abs. Error”. No matter what Selection Criterion I used (or

Significance Level), the prediction interval still dipped into the negative range. Sometimes,

this is unavoidable. But then the prediction interval becomes spurious; you can’t have

negative hours. So we tend to just focus on the midpoint values in this situation.

Figure 18. Average hours per case Forecast

We can see that the midpoint goes very subtly upwards for the first few forecasted points

(post-March 2020), then sharply up for summer. As it turns out, this is a fairly good

approximation of the reality, since the Avg. Hours per case during the middle of 2020 is

about 1.5-2.0 times that of the previous year. What is especially pronounced is that the

Average Hours for March 2019 were only 6.25, whereas for March 2020, they were 35.44. This

was predicated on an Agency policy-induced change; refer to the link and passage below:

https://www.mondaq.com/canada/audit/1030308/cra-moves-forward-with-international-audits-despite-continued-backlog-?email_access=on

In March 2020, the CRA announced that it was suspending the vast majority of audit activity for a

minimum of four weeks, other than audits involving the very largest taxpayers. This suspension meant

that the CRA ceased requests for information relating to existing audits, finalizing existing audits, and

issuing reassessments. Further, deadlines for information or document requests were suspended and no

action was required from taxpayers under audit during this time. This suspension remained in effect until

June 2020, though audits of small and medium businesses did not resume until late fall.

This is also arguably responsible for the “pulse” effect we see in actual Avg. TEBA for July

2020, as per the monthly incremental analysis that comes next.

INCREMENTAL ALIGNMENT

APRIL 2020, KNOWN VALUES

Now when we add the month of April 2020 to our data (making it 28 mths total), we would

expect the AVG. TEBA actuals for subsequent months to become closer to / within forecast

range. As an example in the graph cross-section that follows, the forecast for September,

October, and December 2020 becomes more within range of later-known actuals, once we

add April 2020 data. However, the July 2020 actual (~$122,000) is still above the forecast

band for this incremental dataset’s forecast. This was likely due to the resumption of

standard large business audit as of June 2020 (see previous page article/passage).

Figure 19. Revised AVG. TEBA forecast, incremental inclusion of APRIL 2020

Again, we typically use the measure of MSE [Mean Squared Error] in gauging efficacy or

proximity of a forecast to actual [values]. See the Appendix tables at the end of this paper

for a breakdown of this analysis, where I illustrate monthly incremental effect on accuracy

of the last six months of the calendar year (i.e. from July to Dec. 2020).

MAY 2020, KNOWN VALUES

Clearly, the addition of April wasn’t enough to right the trajectory of the expanding “COVID window”. So in continuing our analysis of monthly incremental effect, I added May 2020’s known data and I changed the forecast significance level from 0.5 to 0.25. But it makes no difference: the July actual is still out of forecast range. We must simply accept that July 2020 Avg. TEBA is an irregular value (~$122K), since July 2018 had Avg. TEBA of ~$45K, and July 2019’s Avg. TEBA was ~$57K. It is clear that this is a COVID-adjustment spike.

Figure 20. Revised AVG. TEBA forecast, incremental inclusion of MAY 2020

We can therefore define July 2020 as a pulse, or a one-time brief event, that caused a

spike in the accumulated time series value for that month. This emphasis on larger

business for audit while suspending SMB audits at the time is further substantiated by the

fact that in July 2020, there was an average of 50.75 hrs per case completed, which is

extremely high. For April, which had a very high Average TEBA of $185.5K, the figure was

52.16 average hours per case.

JUNE 2020, KNOWN VALUES

Predictably, the addition of June 2020 didn’t improve the forecast band enough to include the

actual Avg. TEBA for July. So this strengthens the theory that July’s value was a one-time

event, or pulse, in the time series. It also strengthens the theory that Avg. TEBA was more

resilient to initial COVID-19 transition measures (being a ratio value, in essence). To wit:

observe below that the April-May-June line for the original forecast (left) and actual data

points (right) is just above the $50K line, and follows the same trajectory.

Figure 21. Comparing Q1 of FY2020-21 forecast vs. actual data points

In taking MSE and RMSE (R is “root”) measurements for both the as-of-March and as-of-June forecasts, we only note a slight improvement (reduction) in that value. This also goes to show the resilience of this variable, and the “pulse” nature of July’s spike.

MEASURE / as of MONTH     MARCH 2020            JUNE 2020
AVG. TEBA (MSE)           $ 954,467,257.64      $ 888,454,004.34
RMSE                      $ 30,894.45           $ 29,806.95

Table 1. Point-in-time [R]MSE for AVG. TEBA forecast-to-actual: July to Dec. 2020

Refer to the Appendix at the end of this paper for a more detailed month-by-month

breakdown of these calculations.
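As a quick check on Table 1, the RMSE figures follow from the MSE figures by taking the square root. The Python sketch below reproduces that step and the relative improvement; the MSE values are copied from the table, while the `mean_squared_error` helper is shown only to recall the definition and is a hypothetical name, not something from the paper.

```python
import math


def mean_squared_error(actuals, forecasts):
    """Average of squared forecast-to-actual differences."""
    return sum((a - f) ** 2 for a, f in zip(actuals, forecasts)) / len(actuals)


# MSE values for AVG. TEBA, July to Dec. 2020, as reported in Table 1.
mse = {"March 2020": 954_467_257.64, "June 2020": 888_454_004.34}

rmse = {as_of: math.sqrt(value) for as_of, value in mse.items()}
mse_reduction = 1 - mse["June 2020"] / mse["March 2020"]

for as_of, value in rmse.items():
    print(as_of, round(value, 2))
print(f"MSE reduction: {mse_reduction:.1%}")
```

The square roots agree with the RMSE row of the table, and the reduction in MSE between the two forecasts works out to roughly 7%, consistent with the "slight improvement" noted above.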

FALLACY: COMPARING SUM OF TEBA SHIFT TO AVG. TEBA CHANGES

TSAF works best when you accumulate data records by average, not by sum total. If we

tried this exercise using SUM TEBA per month, it would not turn out very well, because sum

totals are immediately impacted by any severe transition, i.e. auditor work re-arrangements

and temporary audit case policy due to COVID-19 fallout as of March 2020.
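A stylized Python example shows why a sum reacts to a shock that a ratio can absorb. All figures are hypothetical and deliberately simple: throughput halves, but the mix of cases and the rate of action requests stay the same.

```python
# Hypothetical monthly figures before and after a shock that halves throughput.
before = {"cases": 1000, "action_requests": 100, "teba_sum": 50_000_000.0}
after = {"cases": 500, "action_requests": 50, "teba_sum": 25_000_000.0}


def summarize(month):
    """Derive the sum-type and ratio-type macro-variables for one month."""
    return {
        "teba_sum": month["teba_sum"],
        "avg_teba": month["teba_sum"] / month["cases"],
        "c_ar": month["cases"] / month["action_requests"],
    }


summary_before = summarize(before)
summary_after = summarize(after)
print(summary_before)
print(summary_after)
```

In this stylized case the SUM of TEBA drops by half while average TEBA per case and the C/AR ratio do not move at all. Real data will not be this clean, but it illustrates why the ratio-type variables are less exposed to an abrupt change in volume.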

Evaluating the March 2019-2020 comparison in the following table, the TEBA_SUM and

Case Count have dropped significantly in March 2020, yet the C/AR ratio has increased.

Table 2. Year-over-Year March comparison, key macro-variables in TSA

However, as the staffing situation has attempted to stabilize in the intervening months

(April to June 2020), the C/AR ratio has dropped dramatically. (Not shown in above table.)

The same is true for the TEBA/AR pattern.

SUM OF TEBA: DRASTIC CHANGE

We now compare the SUM TEBA forecast as of March 2020 (left image) and that of June

2020 known data points (right image).

Figure 22. Comparison of SUM of TEBA forecast as of March vs. as of June (2020)

For the first image, none of the actuals for the last six months of 2020 fall in the forecast band, whereas for the second image, two of them (Oct., Nov.) do.

Also observe how some of the accumulated data points in the forecast are more “depressed”

in the latter graph; while there is a discernible peak, it doesn’t quite have the same

buoyancy or upwards momentum as the former graph. (We must keep in mind, though,

that this is still using the MSE method, i.e. taking a line of best fit, where the red dots are

the actual values.)

So, there is little point in using the MSE to gauge efficacy of the monthly adjustment, simply

because the values would be so huge (as opposed to those in the Avg. TEBA MSE).

ADVERSE IMPACTS AND DELAYED EFFECTS

LATENT EFFECTS OF SHOCKS

We would also expect that lower Avg. TEBA wouldn’t manifest until much later in the fiscal

year 2020-21, due to most of 2020 consisting of past year audits. The graph below covers

known Avg. TEBA trend data points right up to December 2020, the lowest point.

Figure 23. Calendar-year-end (2020) Avg. TEBA; lowest point

This extremely low Average TEBA of ~$32,000 per case could be a harbinger of further average TEBA decline, but we’d have to observe the last quarter of the fiscal year (January to March 2021, once available) to validate that theory. (Then we might apply an intervention to the time series line.)

Incidentally, when it comes to SUM of TEBA with actuals up to Dec. 2020, the forecast trend

line for 2021 is far more credible, showing all datapoints as being well under $1 billion, and

mostly under $500 million.

INTERVENTIONS

As alluded to before, a TSAF exercise may use interventions, if the extreme or irregular

event is known in advance (or shortly thereafter). This is an adjustment to the “regular”

time series, using a “dummy” variable for the period of observation. In this case study,

we’d recommend an intervention for the SUM of TEBA as of March 2020, and possibly for

AVG TEBA as of Dec. 2020. Plus, we might use a “pulse effect” for July 2020. However,

programming an intervention requires SAS® Studio™, which is out of scope for this paper.

Figure 24. Basic denotation of input variables (interventions) by type

Lowest actual in 3 years; Dec. 2020 Avg. TEBA of $32,404

A step would work best as an intervention (for March 2020 and Dec. 2020), since the trend line shift is sudden and sustained; it does not happen gradually then return to baseline.
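The dummy variables behind a pulse and a step intervention can be sketched in Python as follows. This only builds the 0/1 input series for the months discussed in this paper; it does not fit any model, and the helper names `month_range`, `pulse` and `step` are hypothetical.

```python
def month_range(start, end):
    """All (year, month) pairs from start to end, inclusive."""
    year, month = start
    months = []
    while (year, month) <= end:
        months.append((year, month))
        year, month = (year + 1, 1) if month == 12 else (year, month + 1)
    return months


def pulse(months, at):
    """1 in the single month of the event, 0 everywhere else."""
    return [1 if m == at else 0 for m in months]


def step(months, start):
    """0 before the event, 1 from the event month onward."""
    return [1 if m >= start else 0 for m in months]


months = month_range((2018, 1), (2020, 12))
july_2020_pulse = pulse(months, (2020, 7))
march_2020_step = step(months, (2020, 3))

print(sum(july_2020_pulse), sum(march_2020_step))
```

The pulse flags July 2020 alone, while the step switches on in March 2020 and stays on, reflecting a shift that is sudden and sustained rather than a one-month spike.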

TS CORRELATION NODE

AUTOCORRELATION

When we deal with a significant seasonal and/or trend component, we usually find a greater degree of autocorrelation (measured by the autocorrelation function, abbreviated “ACF”). As the name suggests, this is the tendency of a variable to self-influence. It could also be regarded as momentum, or “muscle memory”.

In a similar vein, when frontline auditing teams are performing well, some of that

momentum carries over from one period to the next, as they build “muscle memory” and

are better-equipped to deal with more trying scenarios that have [abstract] aspects in

common with recent cases worked on. This presents opportunities for “boilerplate” copying

and pasting of common findings from one case to another, adjusting for specifics, and

accelerating average time to complete as well as garnering more average TEBA per case.

Clearly, during the current COVID-19 climate at this writing, and the embargo of SMB case

audit during the spring 2020 period, we can expect some of that momentum to be adversely

impacted – since auditors were working on more complex large business cases overall. But

first, let us examine a baseline from the years 2018-2019, below:

Figure 25. ACF Plot, three key tax-related macro-variables (2018-2019)

From the three variables plotted above, Est. TAR-AI (tax-at-risk – audit issue) has low ACF,

TEBA has moderately high ACF, and Total [Avg. Case] Hours has very high ACF. To wit: at

lag t=5, TEBA reaches the zero line; but Total Hours is still at ACF=0.45.
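For readers who want to see what an ACF value at a given lag measures, here is a minimal Python sketch of the usual sample autocorrelation estimator. The two input series are hypothetical and are not CRA data; the TS Correlation node may apply further options that are not reproduced here.

```python
def autocorrelation(series, lag):
    """Sample autocorrelation of `series` with itself shifted by `lag` periods."""
    n = len(series)
    average = sum(series) / n
    denominator = sum((x - average) ** 2 for x in series)
    numerator = sum(
        (series[t] - average) * (series[t + lag] - average) for t in range(n - lag)
    )
    return numerator / denominator


# Hypothetical series: one with steady momentum, one that flips every month.
trending = list(range(1, 25))
alternating = [1, -1] * 12

print(round(autocorrelation(trending, 1), 3))
print(round(autocorrelation(alternating, 1), 3))
```

A series with momentum keeps a high positive ACF at short lags, whereas a series with no carry-over from one month to the next drops toward (or below) zero, which is the contrast between the baseline and 2020 plots.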

By stark contrast, in 2020 (below), the ACF for both Avg. TEBA and Case Hours is very

weak overall. In fact, both drop precipitously at the very outset of 2020, just prior to

COVID-19.

Figure 26. ACF Plot, same macro-variables, for 2020

CCA – CROSS-CORRELATION ANALYSIS

When we explore lagged effects between risk-related variables, in this case TAR (tax-at-risk) and TEBA (tax earned by audit), we would use a CCA plot. We are also considering Total Hours (on audit cases) here. The plots below are at t=3 months and t=12 months out, with the influencing variables on the vertical axis, and the influenced variables on the horizontal axis. The color shading is somewhat counterintuitive, whereby red means more positively cross-correlated, and blue means less so. Again, we set a baseline of expectations using tax data from 2016 to 2019 (48 months) here.

Figure 27. CCA Map, at time lags 3 and 12, key macro-variables

Note the pronounced difference in CCA factor: for time lag 3, the Estimated TAR has

virtually no effect on TEBA or Total Hours per case (because it’s too close time-wise), but 12

months out (at right) it has a very pronounced effect on total case hours, and a moderate

effect on TEBA (~22%). Also, in the first graph for time lag 3, TEBA highly influences Total

Hours and to a noticeable degree vice-versa too. But when we get to 12 months out, Total

Hours has virtually no lagged effect on TEBA, and vice-versa.
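The notion of a lagged cross-correlation can be sketched in Python as below: the influencing series at month t is compared with the influenced series at month t + lag. The two short series are hypothetical (the second is simply the first delayed by three months), and the values in an Enterprise Miner CCA map may be scaled or defined differently.

```python
from math import sqrt


def cross_correlation(x, y, lag):
    """Sample correlation between x at time t and y at time t + lag."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    numerator = sum((x[t] - mean_x) * (y[t + lag] - mean_y) for t in range(n - lag))
    denominator = sqrt(
        sum((v - mean_x) ** 2 for v in x) * sum((v - mean_y) ** 2 for v in y)
    )
    return numerator / denominator


# Hypothetical influencing series, and an influenced series that repeats it
# three months later (the first three values wrap around from the end).
x = [1, 5, 2, 8, 3, 9, 4, 7, 6, 10, 2, 6]
y = x[-3:] + x[:-3]

print(round(cross_correlation(x, y, 0), 3))
print(round(cross_correlation(x, y, 3), 3))
```

For this made-up pair, the same-month correlation is negative while the three-month-lag correlation is clearly positive, which is the kind of pattern a CCA map is meant to surface.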

If we repeat the experiment with data from 2018 up to 2020 (the COVID window), evaluating lagged effects of TAR on TEBA for 2020, we find a very different pattern at t=3 and t=12. For time lag=3, the best we get is ~3% influence; for t=12, it’s absolutely nothing.

Figure 28. CCA Map, at time lags 3 and 12, inclusive of COVID-19 period

SUBSETTED ANALYSIS

INDUSTRY PROFILING ANALYSIS

Using the same data for CCA, we can subdivide our dataset by industry sector, or NAICS

code. I can set this input to “Cross ID” in the data source’s variables list, then re-run the

flow. From the TS Data Prep node’s Results, right-click in the Time Series Plot and select

Data Options. We’ll pick a NAICS code at random. You can see that its Avg. TEBA fell at the outset

of COVID and struggled to regain its footing, yet exceeded its pre-COVID level by calendar year-end.

Figure 29. Industry Profile (NAICS) subsetting of Avg. TEBA in TS Plot (in 2020)

Note that when you have over 100 categorical values – as in the case of NAICS industry

codes here – it will only allow you to select from the first 100. In my

experience, SAS VIYA is preferable when it comes to subsetting TSA by key categories.

BY TSO (TAX SERVICES OFFICE)

So let us examine a subsetted TSA for a categorical variable with fewer than 100 values. I use the TSO, or Tax Services Office, parameter, so again I set the Case_TSO_ID input to “Cross ID” at the data source node. Then I re-run the flow and access the Results.

Figure 30. Tax Services Office (TSO) subsetting of Avg. TEBA in TS Plot (in 2020)

By default, this will display all TSO IDs in the Input TS Plot; so I have to right-click the plot

area and select “Data Options” to specify filters (WHERE TSO = 5, 18, or 40). Note that

while all of these TSOs converge at various points, in the month of April we find a very

strange anomaly: TSO 18 has AVG. TEBA =~ $600K, but the other two TSOs have TEBA

just under $10,000. Yet all three of them re-converge later in 2020.

CONCLUSION

We have seen the power and versatility of SAS® Enterprise Miner™ for conducting TSAF

exercises. It is clear that not all macro-variables in the Canada Revenue Agency exhibit the

same behaviors or resilience at various points in the turbulent COVID-19 period, but a good

deal of this can be attributed to whether they were pure sum variables, or derived ratio-like

variables. Some disruptions, which prompted the insertion of intervention effects, were

ostensibly due to policies in place to “take the edge off” for more vulnerable businesses.

Many readers can also take away abstract learnings from this paper, even if they

are not employed in the tax sector, because in the end it is all about maintaining a certain

buoyancy of the macro-variables that matter most, to the extent possible. These are not

easy times to navigate, and we wish those adversely impacted the most clement journey to

a regained prosperity.

ACKNOWLEDGMENTS

I am grateful to my family for their encouragement on this endeavor. I am also grateful to

the numerous staff of the CRA who were the audience in my internal presentation of this

TSAF subject matter. I also admit defeat to the spell checker, which insists

on the spelling “endeavor” rather than what it ought to be, as it is on the space shuttle.

Which, unlike CRA time series, must be expected to follow a known trajectory.

RECOMMENDED READING

• Milhøj, Anders. Practical Time Series Analysis Using SAS®. Copyright © 2013, SAS

Institute Inc., Cary, NC, USA.

• Shumway, Robert H. and Stoffer, David S. Time Series Analysis and its Applications. 4th

ed. © Springer International Publishing AG, 2017, Univ. of California at Davis. Davis,

CA, USA.

• Brocklebank, John C., Dickey, David A., and Choi, Bong S. SAS® for Forecasting Time

Series. 3rd ed. Copyright © 2018, SAS Institute Inc., Cary, NC, USA.

• Svolba, Gerhard. Applying Data Science: Business Case Studies Using SAS®. Copyright

© 2017, SAS Institute Inc., Cary, NC, USA.

CONTACT INFORMATION

Your comments and questions are valued and encouraged. Contact the author at:

Jason A. Oliver, Senior Compliance Analyst & Data Scientist

Canada Revenue Agency

Jason.oliver@cra-arc.gc.ca

APPENDIX: TABLES OF ACTUAL-TO-FORECAST ANALYSIS

This appendix contains detailed breakdowns of the incremental monthly additions of accumulated

data to the COVID-19 observation window.

AVERAGE TEBA

We begin with Average TEBA, which is evaluated with both MSE (Mean Squared Error) and

RMSE (Root Mean Squared Error).

At this juncture, between the April and May 2020 known data, the MSE / RMSE actually

worsens slightly, telling us that we might as well have gone straight to June 2020’s data.

In the end, this substantiates our earlier findings: because Average TEBA is in essence

a ratio variable and more resilient to the initial COVID window, especially since it is predicated

on audits of the past year’s tax filings, there was no real near-future benefit to forecast

alignment based on incremental monthly additions for spring.
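
The relationship between the two measures can be sketched in a few lines of Python. This is an illustrative sketch rather than the node’s internal computation: the `mse` and `rmse` helper names are hypothetical, and the only data used are the two MSE values reported in Table 1 for the July to Dec. 2020 forecast-to-actual comparison.

```python
from math import sqrt


def mse(actuals, forecasts):
    """Mean squared error between paired actual and forecast values."""
    if len(actuals) != len(forecasts) or not actuals:
        raise ValueError("need two non-empty sequences of equal length")
    return sum((a - f) ** 2 for a, f in zip(actuals, forecasts)) / len(actuals)


def rmse(actuals, forecasts):
    """Root mean squared error: the square root of the MSE."""
    return sqrt(mse(actuals, forecasts))


# MSE values as reported in Table 1 (forecast-to-actual, July to Dec. 2020).
table1_mse = {"March 2020": 954_467_257.64, "June 2020": 888_454_004.34}
for as_of, value in table1_mse.items():
    print(as_of, round(sqrt(value), 2))
```

Taking the square root of the two MSE values gives approximately 30,894.45 and 29,806.95, which matches the RMSE row of Table 1. Because RMSE is expressed in dollars rather than squared dollars, it is the easier of the two to compare against an Avg. TEBA in the tens of thousands.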

C/AR RATIO

This, once again, is the Cases [Completed] to Action Requests [Submitted] ratio. Here I

break down the monthly forecast measure, using MSE (no RMSE), for the last six months of

calendar year 2020, incrementing the known months from March up to June. For the March to

May increments, I also include the spring months that were not yet known at the time of each forecast.

After adding April’s known data, the forecast actually worsens; this is arguably because the

model has been accustomed to high C/AR values for so long. It is not until we add May that it

becomes more realistic.

Given this extremely low MSE value, brought on by the actual 2.57 C/AR value of May, we

have reached the optimum point – as evidenced by adding June to known values:

CASE HOURS

Lastly, for the hours per [audit] case forecast, I provide a condensed analysis using

a simplified MAE [Mean Absolute Error] criterion.

• As of March 2020; forecast of April to Dec. 2020: MAE = 78.52

• As of April 2020; forecast of May to Dec. 2020: MAE = 95.83

• As of May 2020; forecast of June to Dec. 2020: MAE = 107.99

• As of June 2020; forecast of July to Dec. 2020: MAE = 71.51

So, all in all, this proved a very difficult variable to effectively forecast.
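
To make the comparison above concrete, here is a minimal Python sketch of the MAE criterion and of picking the as-of month with the lowest error. The `mae` helper is a hypothetical name, not an Enterprise Miner function, and the dictionary simply restates the four MAE values listed above.

```python
def mae(actuals, forecasts):
    """Mean absolute error between paired actual and forecast values."""
    if len(actuals) != len(forecasts) or not actuals:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(abs(a - f) for a, f in zip(actuals, forecasts)) / len(actuals)


# MAE values as listed above, keyed by the last known month.
reported_mae = {
    "March 2020": 78.52,
    "April 2020": 95.83,
    "May 2020": 107.99,
    "June 2020": 71.51,
}

best_as_of = min(reported_mae, key=reported_mae.get)
print(best_as_of, reported_mae[best_as_of])
```

The lowest error comes from the forecast made as of June 2020, but the four figures cover horizons of different lengths (nine months down to six), so they are not strictly like-for-like.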

1

Paper 1047-2021

SAS® Time Series Analysis & Forecasting

(TSAF) at the Canada Revenue Agency (CRA), with COVID impacts

Jason A. Oliver, Senior Compliance Analyst, Canada Revenue Agency (CRA)

ABSTRACT

It may well be a recurring theme of this year’s SAS Global Forum that we are faced with more pressure to use flexible thinking – not just critical thinking – and when it comes to

time series analysis and forecasting (TSAF) in SAS, it’s all about “rethinking the curve”.

At the Canada Revenue Agency (CRA) Compliance Programs Branch (CPB), we have grappled with reliable forecasting for macro-level tax variables on a month-to-month basis, even before the COVID-19 pandemic hit. But now we face a particularly difficult

challenge. As with many large organizations, it is not easy to foretell what the fallout may be from such a cataclysm.

In setting up SAS to right the trajectory, we must be extra cautious about some of the fallacies in applying TSAF in this context: the lagged effect for tax revenues realized based on audits of the previous tax year, the need to differentiate average tax recovery

per case from sum of tax recovery (month-to-month), realizing that industry sectors are not “one size fits all”, and accounting for relatively temporary effects of staffing re-

orientation in the conversion to a virtual workplace versus the more enduring effects of business disruptions. With SAS Enterprise Miner’s abilities to continuously adjust forecasts, sub-categorize datapoints by tax office or industry sector, and apply lagged

cross-correlation analysis, we are suitably equipped with the right tools and this can provide abstract learnings for other large organizations.

INTRODUCTION

The Canada Revenue Agency (CRA) is Canada’s federal tax administration. As with all tax

jurisdictions, the CRA has been challenged to keep pace with COVID-19 shocks and

manifestations, which began in March 2020 (the last month of our fiscal year).

Fortunately, SAS® Enterprise Miner™ has been an invaluable aid in gauging these impacts.

Enterprise Miner™ includes a highly versatile set of functional nodes for configuring and

processing time series data. It can decompose time series components such as seasonality

and trend, show trend lines and expected forecast within configurable prediction intervals,

and demonstrate complex correlation analyses.

While this has been of great benefit to the CRA in gauging the trajectory of macro-variables

related to tax revenues and auditor performance, the findings of this research paper could

UNCLASSIFIED

2

conceivably be applied in the abstract to large organizations with process-oriented

functions, and not just to other foreign tax jurisdictions.

Let us provide a Glossary of terms to set the stage:

 TSAF: Time Series Analysis & Forecasting.

 TEBA: tax earned by audit, which is the amount of tax collectible that is agreed upon in the course of a taxpayer audit. It is in NPV (Net Present Value).

 TAR: the tax-at-risk, which is the amount that CRA risk assessors arrive at as the precursor to auditing activity.

 C/AR ratio: the ratio of [audit] cases completed, to action requests [submitted]

for assistance. It is a tentative measure of auditor productivity.

 Integras: the tool used by CRA auditors to process cases.

TIME SERIES FUNCTIONAL NODES & SETUP

In SAS® Enterprise Miner™, you have six TSAF nodes in the “Time Series” ribbon; but we’re

only going to use four of them. Below is the Time Series ribbon with the functional nodes in

question:

Figure 1. Time Series Functional Nodes

 TS Data Preparation: this node allows you to specify basic time series properties

including interval, cycle, start/end time, and accumulation (i.e. by total, min or max,

mean, etc.)

o Below, the interval is “automatic”, so we specify “Month” as the interval.

o We can leave the seasonal cycle and start/end time as “Default”, as SAS®

Enterprise Miner™ will auto-determine these parts from the data.

o In our case, the data was pre-accumulated in SAS® Enterprise Guide™ row-

by-row on a per-month basis, so we can leave Accumulation = “Total” (else,

we would have to set it “Average”).

Figure 2. TS Data Preparation node – basic properties

UNCLASSIFIED

3

 TS Decomposition: this node allows you to specify similar basic settings to that of

the TS Data Prep node, but the Number of Periods can be configured, and moreover,

you can configure which Export Components you want to display.

o By default, it will only display “Trend-Cycle” component (=Yes), which is

generally regarded as the most salient one.

o However, in our case, we want to view ALL Components, so we would set that

value to “Yes”.

Figure 3. TS Decomposition node –properties

TS Correlation: this node allows you to set up your TSA for autocorrelation analysis, or

alternatively for CCA (Cross-correlation analysis). When you select one of those methods,

the other one’s properties will be greyed out.

Figure 4. TS Correlation node –properties

Both the TS Correlation and TS Decomposition nodes must be preceded by a TS Data

Preparation node (which occurs right after the source data node).

UNCLASSIFIED

4

TS Exponential Smoothing: this node allows you to conduct forecasting based on your

known data; as such, you would connect it to a TS Data Preparation node, not directly to

your source data node.

 The interval is automatic (which will be month in the case of our pre-accumulated

data), and the accumulation defaults to “Total” (which is OK in our case, for the

same reason).

 SAS will pick what it deems to be the best forecasting method.

 The default selection criterion is MSE, or Mean Squared Error.

 We will see more on the Forecast lead, back, and significance level parameters

during the forecast demonstration in this paper.

Figure 5. TS Exponential Smoothing node –properties

For our initial workspace setup, we can scrutinize on the C/AR (Case to Action Request)

ratio, which as per our glossary is a tentative measure of tax auditor performance. The

initial diagram workspace is called “Aggreg_Integras_27mths”, which runs from January

2018: 2024 – Write My Essay For Me | Essay Writing Service For Your Papers Online to March 2020. This is arranged this way for a reason: because it ends on the month

of the COVID shutdown.

Our dataset name is “TSA_AGGREG_SINGLE_LINE_27MTHS”.

So, when I bring this in, I need to set all variables to Role = “Rejected” except a) C/AR ratio

and b) my MONTH (Time ID) variable.

Figure 6. Variable Role selection from data source

UNCLASSIFIED

5

You would set your variables once you bring the data source to your diagram (workspace).

Figure 7. TS Data Source to Diagram flow

NOTE: I do not cover the mechanics behind bringing in a data source, as the principal focus

is on conducting TSAF in SAS® Enterprise Miner™. All we need to be concerned with is that

as Data Sources become available in the top-left menu, we can drag-and-drop them to our

diagram workspace (which are also created by right-clicking ‘Diagrams’ in the left panel).

In examining the TS Data Preparation node, it is fairly simple: we see the known trajectory of the C/AR variable, simply by right-clicking the node  Run  Results.

Figure 8. Time Series Plot, for C/AR ratio variable

We can see that the C/AR ratio has fallen off as of mid-2018: 2024 – Write My Essay For Me | Essay Writing Service For Your Papers Online, and continued on a very

gradual downward path. Which means that case auditors are completing disproportionately

less cases to the action requests they submit for help, albeit with a seasonal factor and

some rebounding of the trend-line in March 2020.

So, we can scrutinize on the more specific components of the time series line by using a TS

Decomposition node.

UNCLASSIFIED

6

DECOMPOSITION OF TIME SERIES

In running our TS Decomposition node, and viewing the results, the first one to examine

is the Seasonal Component Plot. When it comes to the C/AR ratio, the seasonal index range

is between a high of about 1.3 down to about 0.75.

Figure 9. Seasonal Component Plot, for C/AR ratio variable

During the months of March and December, we see fairly high seasonality. This is normal

for the time, since the push to complete cases is higher at the end of the CRA fiscal year

(March), and ostensibly at the end of the calendar year, also. Auditors are completing

proportionally more cases vs. the number of action requests they submit to the service

desk. So it is likely that they are fulfilling cases that do not require as many interventions

during those months. Even in March 2020, C/AR still remained high – it was

resilient to the initial COVID effects, due to being a ratio variable and not an absolute

sum variable.

In the decomposed results, we can also examine combinatory components; for instance, the

Trend-Cycle Component Plot:

Figure 10. Trend-Cycle Component Plot, for C/AR ratio variable

UNCLASSIFIED

7

This tells us what we had surmised from the initial data preparation, that the series has

been on a steadily downwards trajectory. Now when it comes to tax-related time series

data, there is no real cycle per se; at best, it is an inherited cycle from world economy

fluctuations. The proper definition of cycle in a TSA context is not the entity’s operational

lifecycle; rather, it refers to the boom-and-bust business cycles which are largely

unpredictable. Ergo, we are mainly concerned about trend here.

Now, if we substitute the Average TEBA (tax earned by audit) variable for C/AR [using the

Data Source node shown in figure 6 earlier], we can see what emerges in our decomposed

time series results.

Figure 11. Paneled Component Plots, TS Decomp. for Avg. TEBA

This time, as per the panel graph at bottom-left, we see that our seasonality index is

broader than that of C/AR ratio; it goes from a high of about 1.8 to a low of ~0.7. This is

largely attributable to the heightened pressures towards fiscal year-end to increase

realization of TEBA, which we see in Feb.-March. At the opposite end, we see rather low

seasonality for May, August, and November.

For the original series plot, bottom-right, the trend continues gradually upwards with

seasonality readily apparent. In the trend-cycle component plot, at top-left, we see that the

trend (with cycle, such as it is) is rising steadily upwards but then reaches a virtual plateau.

The key challenge then, has been to resolve and reconcile the expected forecast as of March

2020 with the new COVID-19 realities.

FORECASTING MACRO TAX VARIABLES

AVERAGE TEBA

We can proceed to evaluate the expected trajectory of the AVG. TEBA variable, on a

monthly interval. Recall that this variable is pre-accumulated at data source.

When we conduct our forecast, we use the TS Exponential Smoothing node.

UNCLASSIFIED

8

Figure 12. TS Exponential Smoothing node in the TSAF diagram

We let SAS® pick the best forecasting method, as well as selection criterion (forecast

measure). In this case, the latter value is the MSE [Mean Squared Error] as you can see at

the bottom of the properties of the node.

Figure 13. Properties of the TS Exponential Smoothing node

For our Significance Level, we set this to 0.5; it governs the blue bracket around the

forecast line, a.k.a. the prediction interval. So it is a confidence band of sorts. The way this

figure works is the opposite of what some of us might know from frequentist confidence

intervals; that is, the lower the “alpha” value, the wider the band (prediction interval) so an

“alpha” of 0.01 would produce a very wide band, and an “alpha” value = 0.99 would be

virtually limited to just the forecast line itself. So we aim in the middle (which actually is

closer to the outline of the trend line, as this figure is more “log-like” in its manifestation).

Figure 14. TEBA_NPV_Mean: forecast line from trend

SAS logically expects the trend will continue upwards (while maintaining seasonality, of

course) due to “series momentum”. Had we began our time series at, say, January 2016: 2024 – Do my homework – Help write my assignment online

rather than Jan. 2018: 2024 – Write My Essay For Me | Essay Writing Service For Your Papers Online, that momentum might have been more pronounced. The clichés of

UNCLASSIFIED

9

“future behavior is governed by past behavior” and “you can’t know where you’re going,

unless you know where you’ve been” have never been truer. However, enter COVID-19,

and that is a whole new wrench in the gears of the tax-auditing apparatus.

As for the selection of “Best” Forecasting Method: you could try to experiment with

different models – there are eight in all, as per fundamental TSAF science – but I can tell

from the shape of the forecast line that it’s based, appropriately, on the Additive Winters

method1. I ascertained this by running the node with this method selected, and the

resulting graph was identical to “best” method. Unlike the Multiplicative Winters method,

this forecast line is predicated on fairly consistent seasonal “inverted V” shapes in the curve.

If those inverted V shapes became noticeable larger (or smaller), then Multiplicative Winters

would likely be the “best” method that SAS would auto-select.

Figure 15. Available Forecasting Methods, properties of TS Exp. Smoothing node

We see that in the resulting forecast, it predicts ahead exactly 12 months. This is the

difference between the figures of “Forecast Lead” and “Forecast Back” in the properties. We

saw on the previous page that the “Forecast Back” = 6; this acts as our validation partition,

using the last six months of known data (i.e. Oct. 2019: 2024 – Online Assignment Homework Writing Help Service By Expert Research Writers to March 2020). So this gets

subtracted from the “Forecast Back” value of 18 to arrive at 12 periods out. Ideally, you

want your “back” [validation] period to be between 20-25% of your known data, which it is

out of 27 months; even when we increase the known months to 30, it will still be 20% of

this.

SUM OF TEBA

When we run a TSAF experiment on the SUM of TEBA – as opposed to its average – we

realize a drastic difference in the scale. Because TEBA is a sum value, not a ratio (i.e.

C/AR, or [Average] TEBA/case), it is simply not as resilient to sudden shocks like COVID-19

– as we will later see when adjusting the forecast based on incremental months (April, May,

June) of known values.

1 The essence of the Winters method is to combine discernible trend with seasonality.

UNCLASSIFIED

10

Figure 16. TEBA SUM Forecast (post-March 2020)

Note that the MSE selection criterion (default) graphs a trend line around the known values

(which are represented by the red dots here). The SUM TEBA for Feb. 2020 is nearly double

what it was for March 2020, as you can see by the relatively large separation of the red dots

from the blue dots (on trendline) for those two months. Yet SAS® “thinks” that the trend

will continue positively, as it is “COVID-agnostic”.

What may also seem shocking to the reader is that the lower limit of the prediction interval

for April 2020 (at ~$674.5M) actually exceeds the actual value for April 2019: 2024 – Online Assignment Homework Writing Help Service By Expert Research Writers, which was

slightly below $500 million. It is not until the fall until we see that the midpoint of actual

2019: 2024 – Online Assignment Homework Writing Help Service By Expert Research Writers data approximates the LCL (lower confidence limit) of the forecasted band for Sept.

2020. This is ostensibly due to the “positive momentum” of the time series that I alluded to

earlier.

C/AR RATIO

Next, we switch out the SUM of TEBA for the C/AR ratio, once again. In forecasting a

relatively low continuous ratio variable such as C/AR, the prediction interval can be less

reliable. We have to examine the midpoint distribution. While the midpoint post-March

2020 tends to be at or above the 10.0 line, this is rare for 2019: 2024 – Online Assignment Homework Writing Help Service By Expert Research Writers datapoints.

Figure 17. C/AR ratio Forecast

UNCLASSIFIED

11

I used the Mean Relative Abs. Error as the forecast metric (selection criterion), which I

found to be more appropriate. Regardless, what we see in the actuals for the spring of 2020

is a very low C/AR ratio, telling us that case throughput has suffered as a result of the

pandemic AND that Action Requests for help did not decline proportionally; there was still

an apparent high need for action requests.

FORECASTING AVG. HOURS PER CASE

For forecasting average hours per [audit] case, I determined that the more ideal Selection

Criterion was “Median Relative Abs. Error”. No matter what Selection Criterion I used (or

Significance Level), the prediction interval still dipped into the negative range. Sometimes,

this is unavoidable. But then the prediction interval becomes spurious; you can’t have

negative hours. So we tend to just focus on the midpoint values in this situation.

Figure 18. Average hours per case Forecast

We can see that the midpoint goes very subtly upwards for the first few forecasted points

(post-March 2020), then sharply up for summer. As it turns out, this is a fairly good

approximation of the reality, since the Avg. Hours per case during the middle of 2020 is

about 1.5-2.0 times that of the previous year. What is especially pronounced is that the

Average Hours of March 2019: 2024 – Online Assignment Homework Writing Help Service By Expert Research Writers were only 6.25, whereas for March 2020, it was 35.44. This

was predicated on an Agency policy-induced change; refer to the link and passage below:

https://www.mondaq.com/canada/audit/1030308/cra-moves-forward-with-international-audits- despite-continued-backlog-?email_access=on In March 2020, the CRA announced that it was suspending the vast majority of audit activity for a

minimum of four weeks, other than audits involving the very largest taxpayers. This suspension meant

that the CRA ceased requests for information relating to existing audits, finalizing existing audits, and

issuing reassessments. Further, deadlines for information or document requests were suspended and no

action was required from taxpayers under audit during this time. This suspension remained in effect until

June 2020, though audits of small and medium businesses did not resume until late fall.

This is also arguably responsible for the “pulse” effect we see in actual Avg. TEBA for July

2020, as per the monthly incremental analysis that comes next.

https://www.mondaq.com/canada/audit/1030308/cra-moves-forward-with-international-audits-despite-continued-backlog-?email_access=on
https://www.mondaq.com/canada/audit/1030308/cra-moves-forward-with-international-audits-despite-continued-backlog-?email_access=on
UNCLASSIFIED

12

INCREMENTAL ALIGNMENT

APRIL 2020, KNOWN VALUES

Now when we add the month of April 2020 to our data (making it 28 mths total), we would

expect the AVG. TEBA actuals for subsequent months to become closer to / within forecast

range. As an example in the graph cross-section that follows, the forecast for September,

October, and December 2020 becomes more within range of later-known actuals, once we

add April 2020 data. However, the July 2020 actual (~$122,000) is still above the forecast

band for this incremental dataset’s forecast. This was likely due to the resumption of

standard large business audit as of June 2020 (see previous page article/passage).

Figure 19. Revised AVG. TEBA forecast, incremental inclusion of APRIL 2020

Again, we typically use the measure of MSE [Mean Squared Error] in gauging efficacy or

proximity of a forecast to actual [values]. See the Appendix tables at the end of this paper

for a breakdown of this analysis, where I illustrate monthly incremental effect on accuracy

of the last six months of the calendar year (i.e. from July to Dec. 2020).

MAY 2020, KNOWN VALUES

Clearly, the addition of April wasn’t enough to right the trajectory of the expanding “COVID

window”. So in continuing our analysis of monthly incremental effect, I added May 2020’s

known data and I changed the forecast significance level from 0.5 to 0.25. But it makes no

difference: July actual is still out of forecast range. We must simply accept that July 2020

Avg. TEBA is an irregular value (~$122K), since July 2018: 2024 – Write My Essay For Me | Essay Writing Service For Your Papers Online had Avg. TEBA =~$45K, and July

2019: 2024 – Online Assignment Homework Writing Help Service By Expert Research Writers’s Avg. TEBA was ~$57K. It is clear that this is a COVID-adjustment spike.

Figure 20. Revised AVG. TEBA forecast, incremental inclusion of MAY 2020

UNCLASSIFIED

13

We can therefore define July 2020 as a pulse, or a one-time brief event, that caused a

spike in the accumulated time series value for that month. This emphasis on larger

business for audit while suspending SMB audits at the time is further substantiated by the

fact that in July 2020, there was an average of 50.75 hrs per case completed, which is

extremely high. For April, which had a very high Average TEBA of $185.5K, the figure was

52.16 average hours per case.

JUNE 2020, KNOWN VALUES

Predictably, for the addition of June 2020, it didn’t improve the forecast band to include the

actual Avg. TEBA for July. So this strengthens the theory that July’s value was a one-time

event, or pulse, in the time series. It also strengthens the theory that Avg. TEBA was more

resilient to initial COVID-19 transition measures (being a ratio value, in essence). To wit:

observe below that the April-May-June line for the original forecast (left) and actual data

points (right) is just above the $50K line, and follows the same trajectory.

Figure 21. Comparing Q1 of FY2020-21 forecast vs. actual data points

In taking MSE and RMSE (R is “root”) measurements for both the as-of-March and as-of-

June forecasts, we only note a slight improvement (reduction) in that value. Which also

goes to show the resilience of this variable, and the “pulse” nature of July’s spike.

MEASURE / as of MONTH MARCH 2020 JUNE 2020

AVG. TEBA (MSE) $ 954,467,257.64 $ 888,454,004.34

RMSE $ 30,894.45 $ 29,806.95

Table 1. Point-in-time [R]MSE for AVG. TEBA forecast-to-actual: July to Dec. 2020

Refer to the Appendix at the end of this paper for a more detailed month-by-month

breakdown of these calculations.

FALLACY: COMPARING SUM OF TEBA SHIFT TO AVG. TEBA CHANGES

TSAF works best when you accumulate data records by average, not by sum total. If we

tried this exercise using SUM TEBA per month, it would not turn out very well, because sum

totals are immediately impacted by any severe transition, i.e. auditor work re-arrangements

and temporary audit case policy due to COVID-19 fallout as of March 2020.

Evaluating the March 2019: 2024 – Online Assignment Homework Writing Help Service By Expert Research Writers-2020 comparison in the following table, the TEBA_SUM and

Case Count have dropped significantly in March 2020, yet the C/AR ratio has augmented.

UNCLASSIFIED

14

Table 2. Year-over-Year March comparison, key macro-variables in TSA

However, as the staffing situation has attempted to stabilize in the intervening months

(April to June 2020), the C/AR ratio has dropped dramatically. (Not shown in above table.)

The same is true for the TEBA/AR pattern.

SUM OF TEBA: DRASTIC CHANGE

We now compare the SUM TEBA forecast as of March 2020 (left image) and that of June

2020 known data points (right image).

Figure 22. Comparison of SUM of TEBA forecast as of March vs. as of June (2020)

For the first image, none of the actuals of the last six months of 2020 fall in the forecast

band. Whereas, for the second image, two of the actuals of the last six months (Oct., Nov.)

fall in the forecast band.

Also observe how some of the accumulated data points in the forecast are more “depressed”

in the latter graph; while there is a discernible peak, it doesn’t quite have the same

buoyancy or upwards momentum as the former graph. (We must keep in mind, though,

that this is still using the MSE method, i.e. taking a line of best fit, where the red dots are

the actual values.)

So, there is little point in using the MSE to gauge efficacy of the monthly adjustment, simply

because the values would be so huge (as opposed to those in the Avg. TEBA MSE).

UNCLASSIFIED

15

ADVERSE IMPACTS AND DELAYED EFFECTS

LATENT EFFECTS OF SHOCKS

We would also expect that lower Avg. TEBA wouldn’t manifest until much later in the fiscal

year 2020-21, due to most of 2020 consisting of past year audits. The graph below covers

known Avg. TEBA trend data points right up to December 2020, the lowest point.

Figure 23. Calendar-year-end (2020) Avg. TEBA; lowest point

This extremely low Average TEBA of ~$32,000 per case could be a harbinger of further

average TEBA decline, but we’d have to observe the last quarter of the fiscal year – January

to March 2020, once available – and validate that theory. (Then we might apply an

intervention to the time series line.)

Incidentally, when it comes to SUM of TEBA with actuals up to Dec. 2020, the forecast trend

line for 2021 is far more credible, showing all datapoints as being well under $1 billion, and

mostly under $500 million.

INTERVENTIONS

As alluded to before, a TSAF exercise may use interventions, if the extreme or irregular

event is known in advance (or shortly thereafter). This is an adjustment to the “regular”

time series, using a “dummy” variable for the period of observation. In this case study,

we’d recommend an intervention for the SUM of TEBA as of March 2020, and possibly for

AVG TEBA as of Dec. 2020. Plus, we might use a “pulse effect” for July 2020. However,

programming an intervention requires SAS® Studio™, which is out of scope for this paper.

Figure 24. Basic denotation of input variables (interventions) by type

Lowest actual in 3 years; Dec. 2020 Avg. TEBA of $32,404

A step would work best as an intervention (for March 2020 and Dec. 2020), since the trend line shift is sudden and sustained; it does not happen gradually then return to baseline.

UNCLASSIFIED

16

TS CORRELATION NODE

AUTOCORRELATION

When we deal with a significant seasonal and/or trend component, we usually find a greater

degree of autocorrelation factor (abbreviated “ACF”). As the name suggests, this is the

tendency of a variable to self-influence. It could also be regarded as momentum, or

“muscle memory”.

In a similar vein, when frontline auditing teams are performing well, some of that

momentum carries over from one period to the next, as they build “muscle memory” and

are better-equipped to deal with more trying scenarios that have [abstract] aspects in

common with recent cases worked on. This presents opportunities for “boilerplate” copying

and pasting of common findings from one case to another, adjusting for specifics, and

accelerating average time to complete as well as garnering more average TEBA per case.

Clearly, during the current COVID-19 climate at this writing, and the embargo of SMB case

audit during the spring 2020 period, we can expect some of that momentum to be adversely

impacted – since auditors were working on more complex large business cases overall. But

first, let us examine a baseline from the years 2018-2019, below:

Figure 25. ACF Plot, three key tax-related macro-variables (2018-2019)

From the three variables plotted above, Est. TAR-AI (tax-at-risk – audit issue) has low ACF,

TEBA has moderately high ACF, and Total [Avg. Case] Hours has very high ACF. To wit: at

lag t=5, TEBA reaches the zero line; but Total Hours is still at ACF=0.45.

By stark contrast, in 2020 (below), the ACF for both Avg. TEBA and Case Hours is very

weak overall. In fact, both drop precipitously at the very outset of 2020, just prior to

COVID-19.

Figure 26. ACF Plot, same macro-variables, for 2020

CCA – CROSS-CORRELATION ANALYSIS

When we explore lagged effects between risk-related variables – in this case, TAR (tax-at-

risk) and TEBA (tax earned by audit) – we would use a CCA plot. We are also considering

Total Hours (on audit cases) here. The plots below are at t=3 months and t=12 months

out, with the influencing variables on the vertical axis, and the influenced variables on the

horizontal axis. The color shading is somewhat counterintuitive, whereby red means more positively

cross-correlated, and blue means less so. Again, we set a baseline of expectations using

tax data from 2016 to 2019 (48 months) here.

Figure 27. CCA Map, at time lags 3 and 12, key macro-variables

Note the pronounced difference in CCA factor: for time lag 3, the Estimated TAR has

virtually no effect on TEBA or Total Hours per case (because it’s too close time-wise), but 12

months out (at right) it has a very pronounced effect on total case hours, and a moderate

effect on TEBA (~22%). Also, in the first graph for time lag 3, TEBA highly influences Total

Hours and to a noticeable degree vice-versa too. But when we get to 12 months out, Total

Hours has virtually no lagged effect on TEBA, and vice-versa.
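
The lagged comparison behind a CCA map can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the TS Correlation node's own algorithm: it takes the plain Pearson correlation of the overlapping pairs (driver at month t, response at month t + lag), and the names pearson and lagged_cross_correlation, as well as the synthetic series, are hypothetical.

```python
from math import sqrt

def pearson(a, b):
    """Pearson correlation of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_a * var_b)

def lagged_cross_correlation(driver, response, lag):
    """Correlate driver[t] with response[t + lag] over the overlapping months."""
    if len(driver) != len(response):
        raise ValueError("series must have the same length")
    if not 0 <= lag < len(driver) - 1:
        raise ValueError("lag must leave at least two overlapping points")
    if lag == 0:
        return pearson(driver, response)
    return pearson(driver[:-lag], response[lag:])

# Synthetic example (not CRA data): the response simply repeats the
# driver three months later, so the lag-3 correlation is essentially 1.
driver = [(7 * t) % 11 for t in range(24)]
response = [0, 0, 0] + driver[:-3]

r_lag3 = lagged_cross_correlation(driver, response, 3)
r_lag0 = lagged_cross_correlation(driver, response, 0)
```

In the paper's terms, the driver would play the role of Estimated TAR and the response the role of TEBA or Total Hours, evaluated at lags 3 and 12.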

If we repeat the experiment with data from 2018 up to 2020 (the COVID window), evaluating

lagged effects of TAR on TEBA for 2020, we find a very different pattern at t=3 and t=12.

For time lag=3, the best we get is ~3% influence; for t=12, it’s absolutely nothing.

Figure 28. CCA Map, at time lags 3 and 12, inclusive of COVID-19 period

SUBSETTED ANALYSIS

INDUSTRY PROFILING ANALYSIS

Using the same data for CCA, we can subdivide our dataset by industry sector, or NAICS

code. I can set this input to “Cross ID” in the data source’s variables list, then re-run the

flow. From the TS Data Prep node’s Results, right-click in the Time Series Plot and select

Data Options. We’ll pick a NAICS code at random. As the plot shows, Avg. TEBA for that sector fell at the

outset of COVID and struggled to regain its footing, yet exceeded its pre-COVID level by calendar year-end.

Figure 29. Industry Profile (NAICS) subsetting of Avg. TEBA in TS Plot (in 2020)

Note that when you have over 100 categorical values – as in the case of NAICS industry

codes here – Enterprise Miner will only allow you to select from the first 100. In my

experience, SAS® Viya® is preferable when it comes to subsetting TSA by key categories.

BY TSO (TAX SERVICES OFFICE)

So let us examine subsetted TSA for a categorical set with fewer than 100 values. I use the TSO, or Tax

Services Office parameter, so again I set the Case_TSO_ID input to “Cross ID” at the data

source node. Then I re-run the flow and access the Results.

Figure 30. Tax Services Office (TSO) subsetting of Avg. TEBA in TS Plot (in 2020)

By default, this will display all TSO IDs in the Input TS Plot; so I have to right-click the plot

area and select “Data Options” to specify filters (WHERE TSO = 5, 18, or 40). Note that

while all of these TSOs converge at various points, in the month of April we find a very

strange anomaly: TSO 18 has Avg. TEBA of roughly $600K, but the other two TSOs have Avg. TEBA

just under $10,000. Yet all three of them re-converge later in 2020.
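
Outside the Enterprise Miner interface, the same kind of subsetting (treat an ID as the cross-section, filter to a few offices, average per month) could be sketched as follows. This is a hypothetical Python illustration: the function name average_by_group and every value in the sample records are made up for the example and are not CRA figures.

```python
from collections import defaultdict

def average_by_group(cases, keep_ids):
    """Average the amount per (office, month), keeping only selected offices.

    cases is an iterable of (office_id, month, amount) tuples.
    """
    totals = defaultdict(lambda: [0.0, 0])  # (office, month) -> [sum, count]
    for office, month, amount in cases:
        if office in keep_ids:
            acc = totals[(office, month)]
            acc[0] += amount
            acc[1] += 1
    return {key: total / count for key, (total, count) in totals.items()}

# Made-up records for illustration only.
cases = [
    (5, "2020-04", 100.0), (5, "2020-04", 300.0),
    (18, "2020-04", 900.0),
    (40, "2020-05", 50.0),
    (7, "2020-04", 999.0),   # office not selected, so it is filtered out
]
avg = average_by_group(cases, keep_ids={5, 18, 40})
```

Averaging per office and month, rather than summing, is what makes a single large case in one office stand out the way the April value for TSO 18 does in the plot.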

CONCLUSION

We have seen the power and versatility of SAS® Enterprise Miner™ for conducting TSAF

exercises. It is clear that not all macro-variables in the Canada Revenue Agency exhibit the

same behaviors or resilience at various points in the turbulent COVID-19 period, but a good

deal of this can be attributed to whether they were pure sum variables, or derived ratio-like

variables. Some disruptions – prompting the insertion of intervention effects – were

ostensibly due to policies in place to “take the edge off” for more vulnerable businesses.

Readers can also take away abstract learnings from this paper, even if they are not

employed in the tax sector, because in the end it is all about maintaining a certain

buoyancy of the macro-variables that matter most, to the extent possible. These are not

easy times to navigate, and we wish those adversely impacted the most clement journey to

a regained prosperity.

REFERENCES

Sarma, Kattamuri S., PhD. Copyright © 2017. Predictive Modeling with SAS® Enterprise

Miner™: Practical Solutions for Business Applications, Third Edition. Cary, NC, USA: SAS

Institute, Inc.

ACKNOWLEDGMENTS

I am grateful to my family for their encouragement on this endeavor. I am also grateful to

the numerous staff of the CRA who were the audience in my internal presentation of this

TSAF subject matter. I also acknowledge and admit defeat to the spell checker in insisting

on the spelling of “endeavor” as it is, not like it ought to be as it is on the space shuttle.

Which, unlike CRA time series, must be expected to follow a known trajectory.

RECOMMENDED READING

• Milhøj, Anders. Practical Time Series Analysis Using SAS®. Copyright © 2013, SAS Institute Inc., Cary, NC, USA.

• Shumway, Robert H. and Stoffer, David S. Time Series Analysis and its Applications. 4th ed. © Springer International Publishing AG, 2017, Univ. of California at Davis. Davis, CA, USA.

• Brocklebank, John C., Dickey, David A, and Choi, Bong S. SAS® for Forecasting Time Series. 3rd ed. Copyright © 2018, SAS Institute Inc., Cary, NC, USA.

• Svolba, Gerhard. Applying Data Science: Business Case Studies Using SAS®. Copyright © 2017, SAS Institute Inc., Cary, NC, USA.

CONTACT INFORMATION

Your comments and questions are valued and encouraged. Contact the author at:

Jason A. Oliver, Senior Compliance Analyst & Data Scientist

Canada Revenue Agency

Jason.oliver@cra-arc.gc.ca

APPENDIX: TABLES OF ACTUAL-TO-FORECAST ANALYSIS

This contains detailed breakdowns of the incremental monthly additions of accumulated

data to the COVID-19 observation window.

AVERAGE TEBA

This begins with Average TEBA, evaluated with both MSE and RMSE (Mean Squared

Error and Root Mean Squared Error).
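
As a reminder of what these two measures compute, here is a minimal Python sketch. The function names and the sample numbers are illustrative assumptions, not values from the tables in this appendix.

```python
from math import sqrt

def mse(actual, forecast):
    """Mean Squared Error: the average of the squared forecast errors."""
    if len(actual) != len(forecast) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    return sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual)

def rmse(actual, forecast):
    """Root Mean Squared Error: square root of MSE, in the variable's own units."""
    return sqrt(mse(actual, forecast))

# Made-up numbers: each forecast misses by 5, so MSE = 25 and RMSE = 5.
example_mse = mse([10.0, 20.0], [15.0, 25.0])
example_rmse = rmse([10.0, 20.0], [15.0, 25.0])
```

Because RMSE is expressed in the same units as the variable (dollars, for Average TEBA), it is usually the easier of the two to interpret.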

Between the April and May 2020 known data, the MSE / RMSE actually

regresses slightly, telling us that we might as well have gone straight to June 2020’s data.

In the end, this substantiates our earlier findings, that because Average TEBA is in essence

a ratio variable and more resilient to the initial COVID window – especially since it is predicated

on audits of past year’s tax filings – there was no real near-future benefit to forecast

alignment based on incremental monthly additions for spring.

C/AR RATIO

This, once again, is the Cases [Completed] to Action Requests [Submitted] ratio. Here I

break down the monthly forecast measure, using MSE (no RMSE), of the last six months of

calendar year 2020 and incrementing known months from March up to June. For March to

May, I include the spring months not yet arrived at in each incremental forecast.

From adding April known data, the forecast actually worsens; this is arguably due to having

been accustomed to high C/AR values for so long. It is not until we add May that it becomes

more realistic.

Given this extremely low MSE value, brought on by the actual 2.57 C/AR value of May, we

have reached the optimum point – as evidenced by adding June to known values:

CASE HOURS

Lastly, for the Hours per [audit] case forecast, I provide a condensed analysis using

a simplified MAE [Mean Absolute Error] criterion.

• As of March 2020; forecast of April to Dec. 2020: MAE = 78.52

• As of April 2020; forecast of May to Dec. 2020: MAE = 95.83

• As of May 2020; forecast of June to Dec. 2020: MAE = 107.99

• As of June 2020; forecast of July to Dec. 2020: MAE = 71.51

So, all in all, this proved a very difficult variable to effectively forecast.
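
The incremental evaluation above (add one more known month, re-forecast the rest of the year, score the result) is sometimes called a rolling-origin evaluation. A minimal Python sketch follows. It is only an illustration of the scoring loop: the naive "repeat the last known value" forecaster is a stand-in chosen for simplicity, not the Enterprise Miner model used in this paper, and the function names and the short series are made up.

```python
def mae(actual, forecast):
    """Mean Absolute Error: the average of the absolute forecast errors."""
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)

def naive_forecast(history, horizon):
    """Stand-in forecaster (assumption): repeat the last known value."""
    return [history[-1]] * horizon

def rolling_origin_mae(series, first_origin, last_origin, forecaster=naive_forecast):
    """Score forecasts made with series[:origin] known, for each origin.

    Returns {origin: MAE over the remaining, not-yet-known points}.
    """
    results = {}
    for origin in range(first_origin, last_origin + 1):
        history, future = series[:origin], series[origin:]
        results[origin] = mae(future, forecaster(history, len(future)))
    return results

# Made-up series with a level shift after the third point.
series = [10, 12, 11, 20, 22, 21]
scores = rolling_origin_mae(series, first_origin=3, last_origin=4)
```

In this toy series the error drops sharply once the first post-shift point is known, which is the kind of improvement one hopes to see as months are added; the MAE figures listed above show that Case Hours did not behave so neatly.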
