Situational forecasting of electricity demand in the region

The process of forecasting the volume of electricity sales on the wholesale market is considered. To improve forecast quality, the Random Forests machine learning method is proposed as part of the solution to the task of situational forecasting of electricity consumption, and it is compared with simple linear regression. The forecast is based on historical data on electricity consumption in Ukraine, on changes in the cost per hour of consumption, and on a number of key factors: weather conditions and macro-financial and economic characteristics. The software implementation uses the Spark MLlib library, which specializes in machine learning methods and provides the forecasting algorithms. Training samples were created from historical data found in different open sources. The introduction justifies the problem of forecasting electricity demand and explains why standard approaches cannot take into account all the factors that affect it. The mathematics section describes and analyzes the Random Forests algorithm. The results section calculates a number of indicators of forecast accuracy: the mean squared error, the mean relative error and the mean absolute error. Graphs show the forecast results for different time periods (one day, one week, one year), and the results are compared with the original historical data. Tables show the input data and the results obtained using linear regression and the Random Forests algorithm. The conclusions assess the effectiveness of the Random Forests algorithm and point out a possible problem that arises when working with machine learning algorithms.


Introduction.
Operational planning and effective management of the electricity system (ES) are impossible without a reliable forecast of the load at the nodes of the calculation scheme, made from a few minutes to several days in advance. The forecast of nodal loads is necessary for optimizing and correcting current modes, for considering operational dispatch requests related to taking electrical equipment out of service for repair or inspection, and for other purposes. The problem has become particularly important in connection with the transition to a competitive electricity market. The load at a node is the consumption of active and reactive power in the equivalent circuit of the electrical network. The load at a node for each time period is determined by the loads of the set of electricity consumers connected to this ES node and by the power losses incurred when transmitting electricity to this node [1]. The need for high-quality forecasting arises for both technological and economic reasons [2]. The technological reasons are connected with the key role of forecasting in planning the energy balances and the capacity of the power system, which are determined by the mode parameters: the technical and economic indicators of electricity-consuming objects and the calculated electrical loads in power nodes and sections.
Вісник Національного технічного університету «ХПІ». Серія: Системний аналіз, управління та інформаційні технології, № 21 (1297), 2018.
Forecasting future electricity consumption makes it possible to satisfy the most important principle of reliable and efficient operation of the Unified Energy System: a precise balance between the production and consumption of electric energy, given that these processes occur simultaneously. The balance of electricity production and consumption is the basis of the technological stability of the power system; it determines the reliability of the system and the continuity of its operation. If the balance conditions are violated, the quality of the electricity deteriorates (the frequency and voltage of the network change), which reduces the efficiency of end users' equipment.
There are a number of economic reasons that make high-quality forecasting necessary. Accurate calculations provide an optimal distribution of the load between the power plants in the grid and improve the quality of electricity. For large consumers, forecasting and planning electricity consumption makes it possible to manage the cost of purchasing electricity by regulating equipment loading through production process management and by shifting the main volumes of consumption to the hours with the lowest cost, thereby reducing the cost of production and the payments to energy supply organizations.
The forecasting task became particularly relevant after the emergence of the wholesale electricity market, whose rules require an accurate forecast of the volume of electricity in order to purchase it on the wholesale market [3].
In accordance with these rules, market participants must forecast their planned hourly consumption and, on the day before the operating day, submit to the system administrator an hourly request for electricity consumption for every hour of the operating day. This means that all market players must produce electricity consumption forecasts with hourly resolution for a few days ahead. Poor-quality requests for electricity submitted by a participating enterprise can lead to significant economic losses. Forecasting mistakes reduce the quality of electricity supply management and worsen the cost-effectiveness of its complex regimes. If the actual hourly values deviate from the forecast, penalties are imposed on the participants, which increases the cost of the purchased electricity. This is due to the specific nature of electricity as a commodity. Underestimating the forecast leads to the need to use emergency power. Overestimating the forecast increases the cost of keeping reserve power in the operating state. A consumer that buys electricity on the wholesale and (or) retail market therefore faces the problem of accurately requesting power consumption for some time ahead, because excessive or insufficient consumption relative to the enterprise's previous requests leads to unscheduled costs for the supplier at the point of generation. Consequently, increasing the accuracy of the forecast even by a tenth of a percent can significantly reduce the cost of paying for deviations from the electricity supply plan [4]. The forecasting task is thus highly relevant for a large number of players operating in the wholesale electricity market: guaranteeing suppliers serving the consumers of entire regions, independent energy sales organizations serving individual industrial enterprises in different regions, and large economic entities that buy electricity on the wholesale market for their own consumption needs. High-quality forecasting of electricity consumption is therefore economically justified for wholesale market participants, and in today's highly competitive conditions it is becoming increasingly relevant. This article describes the Random Forests algorithm in detail and, based on historical data for previous years, makes a forecast that is compared with the forecast of linear regression and with real historical data.
Task setting.
One of the distinguishing features of the technological process of making decisions on planning electricity supply is its cyclicality (repetition) and the interrelation of the problems being solved through their input and output data. As a result, the accuracy and reliability of the electricity consumption forecast determine the accuracy of the solution of the tasks of optimal load distribution between generating capacities, the efficiency of managing the combined energy system and power consumption and, which is especially important in market conditions, the wholesale price of purchases from generating companies and hence the wholesale price of sales to electricity suppliers on the wholesale electricity market.
When moving to the free trade sector, in addition to gaining from participation in competitive bidding, a wholesale market entity takes on risks associated with the impossibility of planning its electricity consumption request exactly. Deviations of actual consumption above the declared values by more than a certain percentage lead to the purchase of electricity from the balancing market at a higher price. Deviation in the downward direction is also penalized by payment for undelivered electricity, determined by the difference between the declared and actual consumption at the established rates. The forecast is especially critical for power systems that have no generating capacities of their own and no possibility of influencing the electricity loads of consumers. The complexity of forecasting electricity consumption is due to the large number of consumers and the need to take into account many factors affecting consumption. These are the ambient temperature, the degree of illumination, the length of the day, the day of the week, transitions from winter to summer time and back, the presence of extraordinary events (catastrophes, mass actions), weather forecasts, the planned connection or disconnection of energy-intensive industries, and other factors that affect consumption according to the data obtained from processing consumption statistics. To solve the problem of forecasting power consumption, traditional statistical models (regression models and time series models), models based on expert systems, artificial neural networks (ANN), and machine learning algorithms can be used.

Ways of solution.
Choosing the optimal method for solving a particular practical problem is a separate and rather difficult task. To this day, many power systems in the world build forecasting models using statistical methods of analysis of time series, that is, ordered sequences of observations of a process that changes over time. A sufficient number of stochastic mathematical methods for forecasting electric load schedules are known, but to reduce the calculation error their practical implementation requires collecting and then using significant volumes of retrospective source data on real hourly electricity production. Determining and correcting the errors that affect the forecast result, that is, maintaining a certain level of accounting correctness, requires an additional amount of data.
In connection with the transition to the balancing market and to the market of bilateral agreements between direct producers and suppliers of electricity, the problem of predicting the electrical load in the ES, including one day ahead, can be solved with new information technologies developed on the basis of expert systems, artificial neural networks, cellular automata, machine learning and others.
Mathematics.
A decision tree is built by a greedy algorithm that constructs a chain of if-then-else rules and serves as a model for data forecasting. Decision trees give stable solutions comparable to those of SVM [5] and neural networks [6] without requiring the high computing power that those methods need.
The most effective method of improving algorithms based on decision trees was proposed by Leo Breiman and Adele Cutler in 2001 [7,8]. They proposed an algorithm called Random Forests. Its main idea is to use, instead of one tree, a whole ensemble of decision trees built with a slightly modified algorithm. The problem of ineffective feature selection is addressed by using random samples in the process of constructing a tree, which removes the determinism of tree construction and makes this process stochastic. Let us proceed to the description of the algorithm and first state the problem in the general case. Let a set of $N$ objects be given that forms the training sample, each object having $(M + 1)$ attributes. For every object of this set, all $(M + 1)$ attributes are known. For other (new) elements, the first $M$ attributes are known and the target $(M + 1)$-th attribute needs to be found. Several parameters are given as input: a number that is at least 1, the parameters $n \le N$ and $m \le M$, a parameter in the range $(0, 1]$, and the number of trees in the ensemble, which is at least 1. A generalized prediction algorithm based on Random Forests looks as follows. From the original training set, a random sample of size $n$ with repetitions is generated. From the generated sample, a decision tree is constructed (in the general case, any algorithm for constructing decision trees can be used for this task). During the construction of each node of the tree, $m$ of the $M$ existing features on which the node can be split are chosen at random. The split decision is made on the basis of the best of the $m \le M$ selected features (i.e., by applying the branching criterion to $m$ features).
The procedure is repeated as many times as there are trees in the ensemble. The resulting random trees are combined into an ensemble. The class of a new element (the value of its $(M + 1)$-th attribute) is determined using all constructed trees: the resulting class is the one that received the most tree "votes". Each tree is constructed using the following algorithm (fig. 1):
• Let $m$ be the number of input variables used to determine the decision at a node of the tree; $m$ should be much less than $M$.
• Choose a training set for this tree by choosing $n$ times with replacement from all $N$ available training cases (i.e. take a bootstrap sample). Use the rest of the cases to estimate the error of the tree by predicting their classes.
• For each node of the tree, randomly choose $m$ variables on which to base the decision at that node. Calculate the best split [9] based on these $m$ variables in the training set.
• Each tree is fully grown and not pruned (as may be done in constructing a normal tree classifier).
For prediction, a new sample is pushed down the tree. It is assigned the label of the training sample in the terminal node it ends up in. This procedure is iterated over all trees in the ensemble, and the average vote of all trees is reported as the Random Forests prediction.
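The tree-building steps above can be illustrated with a minimal pure-Python sketch. It is not the Spark MLlib implementation used in this work: it assumes a regression setting (tree outputs are averaged rather than voted on), it assumes reduction of the sum of squared deviations as the branching criterion, and it omits the out-of-bag error estimate. The function names (`build_tree`, `build_forest`, `predict_forest`) and the `seed` parameter are hypothetical and introduced only for illustration.

```python
import random
from statistics import mean


def squared_deviation(values):
    """Sum of squared deviations of the values from their mean."""
    mu = mean(values)
    return sum((v - mu) ** 2 for v in values)


def build_tree(rows, targets, m, rng):
    """Grow an unpruned regression tree; each node examines at most m random features."""
    # Only features that still vary inside this node can be used for a split.
    candidates = [j for j in range(len(rows[0])) if len({r[j] for r in rows}) > 1]
    if len(set(targets)) == 1 or not candidates:
        return mean(targets)  # leaf: average target of the cases that reached it
    best = None
    for j in rng.sample(candidates, min(m, len(candidates))):
        for threshold in {r[j] for r in rows}:
            left = [i for i, r in enumerate(rows) if r[j] <= threshold]
            right = [i for i, r in enumerate(rows) if r[j] > threshold]
            if not left or not right:
                continue
            score = (squared_deviation([targets[i] for i in left])
                     + squared_deviation([targets[i] for i in right]))
            if best is None or score < best[0]:
                best = (score, j, threshold, left, right)
    _, j, threshold, left, right = best
    return (j, threshold,
            build_tree([rows[i] for i in left], [targets[i] for i in left], m, rng),
            build_tree([rows[i] for i in right], [targets[i] for i in right], m, rng))


def predict_tree(node, x):
    """Push a sample down the tree until a leaf value is reached."""
    while isinstance(node, tuple):
        j, threshold, left, right = node
        node = left if x[j] <= threshold else right
    return node


def build_forest(rows, targets, n_trees, m, seed=0):
    """Train n_trees trees, each on its own bootstrap sample of the training set."""
    rng = random.Random(seed)
    n = len(rows)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # sampling with replacement
        forest.append(build_tree([rows[i] for i in idx],
                                 [targets[i] for i in idx], m, rng))
    return forest


def predict_forest(forest, x):
    """Average the predictions of all trees in the ensemble."""
    return mean(predict_tree(tree, x) for tree in forest)
```

In this sketch a tree is a nested tuple `(feature index, threshold, left subtree, right subtree)` and a leaf is a plain number, which keeps prediction to a short loop. Restricting the random choice to features that still vary in the node is a design choice of the sketch, made so that every internal node has at least one valid split.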
Consider the complexity of the proposed algorithm based on random forests. We consider the variant of the algorithm in which the parameter bounded by 1 is set to 1 and $m = M$. With these parameter values, the computational complexity of the algorithm is greatest. The complexity of constructing a single tree then coincides with the complexity of constructing the original decision tree without modifications. In this case the complexity of the entire algorithm for constructing a random forest is determined by the complexity of constructing a single tree and the number of trees in the ensemble. This complexity can be reduced if the entire forest is not built at once and only those branches of each tree in the forest are constructed that are needed to predict the particular element. Taking the proposed adaptive algorithm for constructing trees in a random forest as a basis, we asymptotically reduce the complexity to a value that depends on the number of elements that need to be predicted. We call the resulting modification of the Random Forests algorithm, as applied to the task of forecasting time series, the adaptive Random Forests algorithm.
Methods.
The method used here is simple: a training sample was created from historical data taken from different public sources. This training sample was used by the Random Forests algorithm. In our case, the Spark MLlib [10] software library was used to solve the regression problem; it supports different algorithms, Random Forests in particular. The training sample was presented in the LIBSVM [11] format. In this format, each line represents a single observation: the result value followed by the attribute values.
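As an illustration of the LIBSVM text format mentioned above, the following sketch encodes and decodes one observation. A LIBSVM line has the form `label index:value index:value ...` with 1-based feature indices, and zero-valued features may be omitted. The helper names and the meaning of the example features (temperature, humidity, a holiday flag, wind speed) are assumptions made for illustration and are not taken from the data set used in this work.

```python
def to_libsvm_line(label, features):
    """Encode one observation; indices are 1-based and zero values are skipped."""
    parts = [f"{label}"]
    parts += [f"{i}:{v}" for i, v in enumerate(features, start=1) if v != 0]
    return " ".join(parts)


def parse_libsvm_line(line, n_features):
    """Decode one line into (label, dense feature list of length n_features)."""
    label, *pairs = line.split()
    features = [0.0] * n_features
    for pair in pairs:
        index, value = pair.split(":")
        features[int(index) - 1] = float(value)
    return float(label), features


# Hypothetical observation: consumption 1520.5 with four illustrative features.
line = to_libsvm_line(1520.5, [-3.0, 0.82, 0.0, 5.0])
print(line)  # 1520.5 1:-3.0 2:0.82 4:5.0
print(parse_libsvm_line(line, 4))
```

The third feature is zero, so it does not appear in the encoded line; the parser restores it because the total number of features is passed explicitly.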
Then a simple program was implemented that is able to load the training sample and make a prediction using Spark MLlib [10].
After the training sample was loaded and the training was completed, the algorithm could make a prediction based on new data given as a vector in the LIBSVM format. The main difficulty of working with Spark MLlib comes down to finding a relevant training sample.

Results.
Overall, the results presented below show that forecasting electric energy consumption involves a number of factors, among which weather conditions are of considerable importance: temperature, humidity, wind speed, and others. Weather indicators are essential factors that influence the consumption of electricity: the analysis of historical data showed that electricity consumption increases at low temperatures and high wind speeds. Thus, the indicators for winter periods exceed those for the warmer seasons, fall and summer. The main task when working with machine learning algorithms is the collection of historical data. The following data were collected to compute the results. The weather data were taken from the site [12], and the data on electricity consumption were obtained from the site [13] for the 3-year interval 2006-2008; the algorithm was trained on these data. It is also worth mentioning that weather conditions are far from the only indicators that affect consumption, but weather is a significant criterion in forecasting the demand for electricity in the region. Another important factor is a large amount of historical data on which the algorithm can be trained.
To test the Random Forests algorithm a sample in the LIBSVM data format was created.
Tab. 1 contains the data for the weekly chart. For the monthly forecast, the data have a similar structure, with the month and year added. The table also shows the forecast obtained using linear regression, in order to compare the Random Forests algorithm with standard forecasting approaches.
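The graphs in fig. 2 and fig. 3 show that the forecast results are rather good. To verify this, the mean squared error (MSE) is calculated by the formula:

$$MSE = \frac{1}{n} \sum_{i=1}^{n} \Delta_i^2, \quad (1)$$

where $n$ is the number of forecasts and $\Delta_i$ is the difference between the historical and the predicted value.

The average absolute value of the error is calculated as:

$$E_{abs} = \frac{1}{n} \sum_{i=1}^{n} \left| y_i - y_i^* \right|. \quad (2)$$

The average relative error (approximation error) is calculated as:

$$E_{rel} = \frac{1}{n} \sum_{i=1}^{n} \frac{\left| y_i - y_i^* \right|}{y_i} \cdot 100\%, \quad (3)$$

where $y_i$ is the historical value, $y_i^*$ is the predicted value and $n$ is the number of forecasts.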
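The three accuracy indicators named above (mean squared error, average absolute error and average relative error) can be computed as in the following sketch. It assumes the standard definitions of these measures, with the relative error expressed in percent; the function names and the sample values are hypothetical and are not taken from the tables of this work.

```python
def mean_squared_error(actual, predicted):
    """Average of squared differences between historical and predicted values."""
    return sum((y - p) ** 2 for y, p in zip(actual, predicted)) / len(actual)


def mean_absolute_error(actual, predicted):
    """Average of absolute differences between historical and predicted values."""
    return sum(abs(y - p) for y, p in zip(actual, predicted)) / len(actual)


def mean_relative_error(actual, predicted):
    """Average absolute difference relative to the historical value, in percent."""
    return 100.0 * sum(abs(y - p) / abs(y)
                       for y, p in zip(actual, predicted)) / len(actual)


# Hypothetical historical and predicted consumption values.
actual = [100.0, 200.0, 400.0]
predicted = [110.0, 190.0, 400.0]
print(mean_squared_error(actual, predicted))   # about 66.67
print(mean_absolute_error(actual, predicted))  # about 6.67
print(mean_relative_error(actual, predicted))  # about 5.0 (percent)
```

The relative error is undefined when a historical value is zero, so this sketch assumes strictly non-zero consumption values.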
The MSE is calculated by formula (1) for the forecast of the Random Forests (RF) algorithm and for the forecast of linear regression (REG), which represents the standard forecasting approaches.
To estimate the accuracy of the Random Forests algorithm, we calculate the MSE for regression from the data in tab. 1.
Tab. 2 presents the test data for forecasting the cost of electricity on the wholesale market.
For the forecast, a number of indicators influencing the price were selected: the cost of raw materials (natural gas and coal) was taken from the sites [14,15], the dollar exchange rate for the current month was obtained from the site [16], and the average monthly wholesale price by hourly segments was taken from the site [13]. A few essential macroeconomic parameters were also used: gross domestic product [17], GDP per capita [18] and foreign investments [19].
Figure 3 shows the forecast made by the Random Forests algorithm based on the data for 2006-2008. The forecast was made for a period of 10 months. The X axis shows the months of the year; the Y axis shows the demand for energy in MW·h.
Figure 4 shows the forecast of the wholesale price for each hour (kop/kW·h), predicted by the Random Forests algorithm based on the data for 2006-2008. The X axis shows the hours of the day; the Y axis shows the price on the wholesale market. Consequently, the mean squared error, the average relative error and the average absolute error of the Random Forests algorithm are smaller than those of the linear regression algorithm, which confirms that the Random Forests algorithm gives more precise results.
The MSE of the regression algorithm was calculated for the price forecast as well, and again $MSE_{RF} < MSE_{REG}$.
The average relative and average absolute errors for linear regression were calculated by formulas (2) and (3). According to the results, the Random Forests algorithm also performed better than regression in the forecast of the wholesale price.
Conclusions.
This work attempted to analyze one of the machine learning algorithms used to predict the demand for electricity.
Predicting electricity demand is a complex task in which many factors play a key role, and these factors are very difficult to take into account. The Random Forests algorithm is well suited for solving prediction problems. The main difficulty when working with the algorithm is the search for relevant historical data that affect the forecast. The calculated results showed that the standard prediction algorithm based on regression was less accurate than the Random Forests algorithm. However, improving the accuracy of the algorithm requires a large body of historical data on which the algorithm is trained. It should also be noted that all machine learning algorithms share a shortcoming known as overfitting (overtraining): some data may become irrelevant to reality and only worsen the accuracy of the forecast. Such data should be promptly detected and removed from the training set.

Figure 2 - Results of the Random Forests algorithm for a weekly forecast

Figure 4 - Forecast of the Random Forests algorithm to determine the price per day