
Demystifying Machine Learning, Part II – Learning

Machines learn and make predictions the same way that humans do: by combining historical data, prior knowledge, and statistics to infer the most likely outcome.

In Part II of Demystifying Machine Learning, we explore what these terms mean, and build a simple machine learning model that predicts the temperature of a turbine bearing, given the wind speed.

The benefits of using Machine Learning in predictive maintenance

First, it is worth reiterating the importance of machine learning in wind energy. Machine Learning is a tool which helps ONYX InSight expert engineers sort through huge quantities of wind turbine data, whether that is SCADA, vibration or oil condition data, and draw out trends – which they can then assess to help solve specific problems to deliver effective predictive maintenance.

ONYX InSight’s predictive maintenance is built on combining engineering expertise with Machine Learning and Artificial Intelligence enhanced data analytics, leading to better fault detection and creating a more accurate picture of wind turbine health. This engineering-driven predictive maintenance helps owner-operators understand their assets more clearly and make significant savings from their O&M budgets.

How do humans learn?

– What do you think the temperature is outside right now?

– Honestly, take a moment and make a guess.

– Got a number in your head?

– Good.

We’re not actually interested in whether you were right or not; rather, we’re interested in how you came to that conclusion. Take another moment to think: “How did my brain conclude that this specific number was the best estimate?”

This is not an easy thing to think about, and there is rarely a conclusive reason, but do pause to reflect on how your brain processed the information.

Here’s my thought process:

  • As I write this, it is mid-winter, which I know means it is cold.
  • I live in the UK, and our winters are chilly, but rarely much below -5 degrees Celsius.
  • On my walk to work today, it felt around 2-5 degrees outside.
  • It is now mid-morning, so it will be slightly warmer than my walk to work this morning.

Given all this information, I guesstimated that it was about five degrees outside right now (it was in fact six degrees, so my guess was close!).

Let’s pick apart what my brain just did there.

It first gathered both relevant data and prior knowledge together. In this case my data were: “My walk to work this morning was about 2-5 degrees”, whilst my prior knowledge was: “In winter it is cold”, “The temperatures are higher towards midday”, and “I live in the UK”[1].

It then used statistics to aggregate this information:

  • The minimum temperature is -5 degrees
  • The mean temperature this morning was 3.5 degrees
  • The standard deviation was about 1.5 degrees
  • There is a roughly linear relationship between morning temperatures and time, about +0.5 degrees per hour
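This aggregation can be sketched in a few lines of Python. The numbers below are just the rough figures quoted in the bullets above, not real measurements:

```python
# Illustrative sketch: combining prior knowledge and this morning's data
# to estimate the current outdoor temperature.
morning_mean = 3.5      # mean temperature on the walk to work (deg C)
hours_since_walk = 2.0  # it is now mid-morning, roughly two hours later
warming_rate = 0.5      # prior knowledge: about +0.5 deg C per hour towards midday

# Best guess: this morning's mean, adjusted for the time elapsed since.
estimate = morning_mean + warming_rate * hours_since_walk
print(f"Estimated temperature: {estimate:.1f} deg C")
```

This gives an estimate of 4.5 degrees, close to both my guess of five and the true answer of six.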

Therefore, my brain used statistics, historical data & prior knowledge to draw its conclusion.

Wait a second: “historical data, prior knowledge, and statistics”? That sounds an awful lot like machine learning! Well, that’s because that is precisely what machine learning is.

Even though this snowy landscape with the solitary magpie isn’t real, your brain is able to predict what the temperature is, based on its prior information and the patterns it is seeing in this picture. Thankfully, the UK rarely reaches temperatures this cold. Claude Monet – The Magpie, 1869

How do machines learn?

The above thought experiment is a good example of what the “learning” part in machine learning is: the use of statistics, historical data, and prior knowledge to make a best guess.

It asks the question “What was the correct answer the last time we were in a similar situation?”.

In other words, what happened the last time the pattern looked like this?
(The pattern here including the time of year, time of day, geographical location, recent experiences, etc.)

This may all seem obvious in hindsight, but these concepts are fundamental to learning, whether by machine or human. These concepts apply universally to much of machine learning[2], and if you understand them, you will be well equipped to critically assess the likely usefulness and impact of any machine learning application you come across.

How hot is this bearing?

Let us apply this idea to a more practical concept: predicting the temperature of a wind turbine bearing.


Our prior knowledge can be summarised by the following:

  • Bearings allow parts of a device to rotate with as little friction as possible.
  • We expect this friction to cause a temperature increase.
  • As the part rotates faster, we expect more friction, and therefore a higher temperature.
  • Abnormally high temperatures can be a good indicator that the bearing is failing.

We additionally have historical data, in the form of 3 months of bearing temperatures and wind speeds. Let’s plot both below:

(Top) The bearing temperature as a function of time. (Bottom) The wind speed as a function of time


Even a cursory glance suggests our prior knowledge was correct; the bearing temperature appears to be (roughly) positively correlated with wind speed.

Finally, we can use statistics based on our prior knowledge and historical data to guess what the temperature of the bearing will be, given a certain wind speed.

The simplest thing we can do with statistics is just to take the mean of our bearing temperature.

(Blue) The actual bearing temperature as a function of time. (Orange) Our first machine learning prediction, the mean temperature of the bearing over the timespan
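As a hedged sketch in Python (with made-up temperature values standing in for the real SCADA data), this baseline model might look like:

```python
import numpy as np

# Hypothetical historical data: bearing temperatures (deg C) sampled over
# time. In the article these come from three months of SCADA measurements.
bearing_temp = np.array([40.2, 45.1, 52.3, 48.7, 43.9, 55.0])

# The simplest "model": predict the mean temperature at every time step.
mean_model = np.full_like(bearing_temp, bearing_temp.mean())
print(mean_model)
```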


Congratulations! You’ve just had your machine learn from historical data! You’ve made your first machine learning model! But it feels like we can do better than this by including wind speed.

To include the effects of wind speed, I need to assume a model. For simplicity I’m going to assume that there’s a roughly linear relationship between the bearing temperature and wind speed. That is to say:

T_Bearing = C_0 + C_1 × V_Wind

where T_Bearing is the bearing temperature, C_0 is a constant, V_Wind is the wind speed, and C_1 is a coefficient. We know that the true relationship isn’t linear (for instance, wind turbines don’t keep turning faster at ever higher wind speeds), but it is good to start simple[3].

We know both T_Bearing and V_Wind at all points in time, so we just have to figure out what the best values of C_0 and C_1 are[4]. Any statistics course or a multitude of online tutorials can tell you how to do this[5]. We plot the results of this model below.
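One common way to find those best values is a least-squares fit. Here is a minimal sketch, assuming some illustrative wind-speed and temperature values in place of the real three months of data:

```python
import numpy as np

# Hypothetical data standing in for the article's SCADA measurements:
# wind speed (m/s) and bearing temperature (deg C).
wind_speed = np.array([3.0, 5.0, 7.0, 9.0, 11.0, 13.0])
bearing_temp = np.array([38.0, 42.5, 46.0, 51.0, 54.5, 59.0])

# Least-squares fit of the linear model T_Bearing = C0 + C1 * V_Wind.
c1, c0 = np.polyfit(wind_speed, bearing_temp, deg=1)
predicted = c0 + c1 * wind_speed

# Average absolute error of the linear model.
mae = np.mean(np.abs(bearing_temp - predicted))
print(f"C0 = {c0:.2f}, C1 = {c1:.2f}, mean error = {mae:.2f} deg C")
```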

(Blue) The actual bearing temperature as a function of time. (Orange) Our prediction of the main bearing temperature, as a linear function of wind speed.

This is starting to look pretty good! We’re only off, on average, by 4°C, which is accurate to within 8-10% of the absolute value of the temperature.

With the addition of a few more variables, and some changes in the model assumptions, we can bring this average error down to only 1.2°C (a 2% error):

(Blue) The actual bearing temperature as a function of time. (Orange) A more complex prediction, involving numerous tags and a different model, but applying the same broad techniques of prior knowledge, data, and statistics.

On a fundamental level, we’ve not changed anything we’re trying to do; we’re still just using prior knowledge, historical data, and statistics, to create these models. Whilst the final result is impressive, it’s just been built up layer by layer from the simpler models we started off with.
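Extending the model with more variables is the same least-squares idea applied to several inputs at once. The sketch below uses synthetic data and hypothetical extra tags (ambient temperature, rotor speed) purely for illustration; the real model and tags are not specified here:

```python
import numpy as np

# Synthetic data for illustration only: wind speed, plus two hypothetical
# extra tags one might add (ambient temperature and rotor speed).
rng = np.random.default_rng(0)
n = 100
wind_speed = rng.uniform(3, 14, n)
ambient_temp = rng.uniform(-2, 18, n)
rotor_speed = np.clip(1.5 * wind_speed, None, 16)  # turbines stop speeding up

# A made-up "true" bearing temperature, with some measurement noise.
bearing_temp = 25 + 1.2 * rotor_speed + 0.8 * ambient_temp + rng.normal(0, 1, n)

# Design matrix: a constant column plus each input tag, fit by least squares.
X = np.column_stack([np.ones(n), wind_speed, ambient_temp, rotor_speed])
coeffs, *_ = np.linalg.lstsq(X, bearing_temp, rcond=None)

mae = np.mean(np.abs(bearing_temp - X @ coeffs))
print(f"mean error: {mae:.2f} deg C")
```

Nothing fundamental has changed: it is still just a design matrix, historical data, and a least-squares fit, only with more columns.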

Perhaps none of this feels like machine learning to you. If that’s how you feel, then I have done my job right and demystified machine learning for you! Because this really is all it is: pattern recognition, historical data, statistics, and prior knowledge. There’s nothing complex or magic about it; it’s merely a suite of tools you can use to aid your analysis.

In part III of this blog, we’ll go into some of the practical things you can do with this model, and how it can be used for fault detection.



[1] Of course, one could argue that my prior knowledge was data too. The line between the two isn’t very clearly defined, but I like to think of prior knowledge as “What do I believe right before I see any data?”. In this case, this would be “What would I expect the temperature to be outside before I have left my house in the morning?”

[2] But not all of machine learning! Particular branches of machine learning don’t need data and/or prior knowledge! We’ll get around to these in some later posts.

[3] When we use neural networks, we don’t need to assume any kind of model. This is because neural networks are universal function approximators; it doesn’t matter what the true relationship is between your variables, a neural network can [though not necessarily will] accurately approximate it given enough data, and we therefore don’t need to worry about specifying the specific form of the relationship.

[4] Though for statistical reasons, we actually calculate the least-worst values.

[5] for instance.

