Data Analysis

hour.png

Data preprocessing

To visualize the solar energy generation data, all null values in the data are eliminated and the remaining data is displayed in the form of a box plot for each hour of the day.
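The preprocessing step above can be sketched as follows. This is a minimal illustration rather than the code used for the figure: the `records` list, its layout, and the helper names `group_by_hour` and `box_stats` are assumptions, and the values are made up. It drops null readings, groups the rest by hour of day, and computes the five numbers that a box plot draws for each hour.

```python
from collections import defaultdict
from datetime import datetime
from statistics import quantiles

# Hypothetical records: (timestamp, generation in W); None marks a null reading.
records = [
    (datetime(2019, 6, 1, 12), 3100.0),
    (datetime(2019, 6, 2, 12), None),
    (datetime(2019, 6, 3, 12), 2900.0),
    (datetime(2019, 6, 4, 12), 3300.0),
    (datetime(2019, 6, 1, 20), 0.0),
    (datetime(2019, 6, 2, 20), 0.0),
    (datetime(2019, 6, 3, 20), None),
    (datetime(2019, 6, 4, 20), 0.0),
]


def group_by_hour(rows):
    """Drop null readings and collect the remaining values per hour of day."""
    by_hour = defaultdict(list)
    for timestamp, value in rows:
        if value is not None:
            by_hour[timestamp.hour].append(value)
    return dict(by_hour)


def box_stats(values):
    """Return (min, Q1, median, Q3, max) for a list of at least two values."""
    q1, med, q3 = quantiles(values, n=4, method="inclusive")
    return min(values), q1, med, q3, max(values)


hourly = group_by_hour(records)
for hour in sorted(hourly):
    print(hour, box_stats(hourly[hour]))
```

A plotting library would normally draw the boxes directly from the grouped values; the summary is computed by hand here only to keep the sketch free of dependencies.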

At first sight there seem to be outliers at 3:00 and from 19:00 to 21:00, hours when no solar power is generated most of the time. Further investigation showed that the data at 3:00 and most of the data at 19:00 are in fact not outliers, as a small amount of solar energy is generated at these times every summer. However, on the 18th of May 2019, the output between 19:00 and 21:00 reached up to 1.716 kWh, which is a definite outlier.

Find outliers

outliner2.png

Between 20:00 and 21:00, every value is zero except on May 28, 2019, when the output was about 1417-1521 W.

These readings are therefore abnormal.

At 3:00, generation reaches a maximum of about 250 W every summer.

The values at this hour are therefore not abnormal.

At 19:00, some power is always generated in the summer, so most values are not outliers. The exception is the data from May 28, 2019.
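The checks above were made by inspecting the data, but the same reasoning could be automated roughly as shown here. This is a sketch under stated assumptions, not the procedure used on this page: the function `find_outliers`, the thresholds `min_days` and `ratio`, and all sample values are hypothetical. A non-zero reading is flagged when its hour has non-zero output on too few distinct days (isolated), or when it is far above the typical non-zero value for that hour (extreme).

```python
from collections import defaultdict
from datetime import date
from statistics import median

# Hypothetical readings: (day, hour of day, generation in W).
readings = [
    (date(2018, 7, 1), 3, 240.0),
    (date(2018, 7, 2), 3, 250.0),
    (date(2019, 7, 1), 3, 230.0),
    (date(2018, 6, 20), 19, 120.0),
    (date(2018, 6, 21), 19, 95.0),
    (date(2019, 6, 20), 19, 130.0),
    (date(2019, 5, 28), 19, 1500.0),
    (date(2018, 6, 20), 20, 0.0),
    (date(2019, 5, 28), 20, 1417.0),
    (date(2019, 6, 20), 20, 0.0),
    (date(2019, 5, 28), 21, 1521.0),
]


def find_outliers(rows, min_days=3, ratio=5.0):
    """Flag non-zero readings that are isolated or extreme for their hour.

    min_days and ratio are illustrative thresholds, not tuned values.
    """
    nonzero = defaultdict(list)
    for day, hour, value in rows:
        if value > 0:
            nonzero[hour].append((day, value))

    flagged = []
    for hour, items in nonzero.items():
        distinct_days = {day for day, _ in items}
        typical = median(value for _, value in items)
        isolated = len(distinct_days) < min_days
        for day, value in items:
            if isolated or value > ratio * typical:
                flagged.append((day, hour, value))
    return sorted(flagged)


outliers = find_outliers(readings)
for day, hour, value in outliers:
    print(day.isoformat(), f"{hour}:00", value)
```

With this sample, the recurring summer values at 3:00 and 19:00 pass, while the three readings from May 28, 2019 are flagged.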

Data Demonstration

This figure displays the daily generation data as a time series, after removing the outliers. The shape of the graph matches intuition: the solar energy generated is strongest in the summer and weakest in the winter. It is also evident that the maximum value of solar energy generated increased every year, with the most obvious jump from 2018 to 2019.
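A daily series of this kind can be built from hourly readings roughly as follows. The data, the `outlier_times` set, and the helper names `daily_totals` and `yearly_peak` are assumptions made for illustration. The sketch skips readings marked as outliers, sums the rest per day, and reports the largest daily total in each year.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical hourly readings: (timestamp, energy in kWh).
hourly_energy = [
    (datetime(2018, 6, 1, 12), 2.0),
    (datetime(2018, 6, 1, 13), 2.5),
    (datetime(2018, 12, 1, 12), 0.5),
    (datetime(2019, 6, 1, 12), 3.0),
    (datetime(2019, 6, 1, 13), 3.5),
    (datetime(2019, 6, 1, 20), 1.5),  # treated as an outlier below
]

# Hypothetical set of timestamps already identified as outliers.
outlier_times = {datetime(2019, 6, 1, 20)}


def daily_totals(rows, excluded=frozenset()):
    """Sum energy per calendar day, skipping excluded timestamps."""
    totals = defaultdict(float)
    for timestamp, kwh in rows:
        if timestamp not in excluded:
            totals[timestamp.date()] += kwh
    return dict(sorted(totals.items()))


def yearly_peak(totals):
    """Return the largest daily total observed in each year."""
    peaks = {}
    for day, kwh in totals.items():
        peaks[day.year] = max(peaks.get(day.year, 0.0), kwh)
    return peaks


daily = daily_totals(hourly_energy, outlier_times)
peaks = yearly_peak(daily)
print(daily)
print(peaks)
```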

solar.png
solar3d.png

June 2019, the month that produced the most solar energy over the whole period, was chosen to visualize how solar generation changes over the day. A portion of the data is shown in this figure. This set is also used as the training set for short-term prediction.
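Selecting the month with the highest total generation could look like the sketch below. The readings and the helper names `best_month` and `select_month` are hypothetical, chosen so that June 2019 comes out on top as in the text.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical readings: (timestamp, energy in kWh).
month_readings = [
    (datetime(2018, 6, 15, 12), 2.5),
    (datetime(2019, 5, 15, 12), 2.8),
    (datetime(2019, 6, 15, 12), 3.4),
    (datetime(2019, 6, 16, 12), 3.1),
    (datetime(2019, 7, 15, 12), 3.0),
]


def best_month(rows):
    """Return the (year, month) pair with the largest total energy."""
    totals = defaultdict(float)
    for timestamp, kwh in rows:
        totals[(timestamp.year, timestamp.month)] += kwh
    return max(totals, key=totals.get)


def select_month(rows, year, month):
    """Keep only the readings that fall in the given month."""
    return [
        (timestamp, kwh)
        for timestamp, kwh in rows
        if (timestamp.year, timestamp.month) == (year, month)
    ]


year, month = best_month(month_readings)
training_set = select_month(month_readings, year, month)
print(year, month, len(training_set))
```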

Correlation coefficient

This figure shows the correlation between all factors, with darker blue colours corresponding to a higher positive correlation. The correlations between solar generation and direct radiation, diffuse radiation and temperature are 0.93, 0.75 and 0.56 respectively, showing that direct radiation is an essential component in predicting solar power output. All three proposed variables will be used as inputs for our models, as each has a correlation higher than 0.5 with solar power.
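The selection rule above can be illustrated with a short sketch, assuming the coefficient is the Pearson correlation (the page does not name the measure). The column values are made up and do not reproduce the coefficients reported above. `wind_speed` is a hypothetical extra variable, included only to show a candidate being rejected.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical sample data; real inputs would be the cleaned measurements.
solar_power = [0.0, 1.0, 3.0, 3.0, 1.0]
candidates = {
    "direct_radiation": [0.0, 300.0, 720.0, 800.0, 250.0],
    "diffuse_radiation": [0.0, 150.0, 200.0, 180.0, 160.0],
    "temperature": [12.0, 18.0, 24.0, 22.0, 20.0],
    "wind_speed": [3.0, 1.0, 2.0, 2.0, 3.0],  # hypothetical, not in the text
}

THRESHOLD = 0.5  # the cut-off stated in the text

correlations = {name: pearson(values, solar_power) for name, values in candidates.items()}
selected = [name for name, r in correlations.items() if r > THRESHOLD]

for name, r in correlations.items():
    print(f"{name}: {r:.2f}")
print("selected:", selected)
```

Python 3.10 and later also provide `statistics.correlation`, which computes the same coefficient.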

correlation.png