# Demystifying Machine Learning, Part II – Learning

*Machines learn and make predictions the same way that humans do: by combining historical data, prior knowledge and statistics to infer the most likely outcome.*

*In Part II of Demystifying Machine Learning, we explore what these terms mean, and build a simple machine learning model that predicts the temperature of a turbine bearing, given the wind speed.*

### The benefits of using Machine Learning in predictive maintenance

First, it is worth reiterating the importance of machine learning in wind energy. Machine Learning is a tool which helps ONYX InSight expert engineers sort through huge quantities of wind turbine data, whether that is SCADA, vibration or oil condition data, and draw out trends – which they can then assess to help solve specific problems to deliver effective predictive maintenance.

ONYX InSight’s predictive maintenance is built on combining engineering expertise with Machine Learning and Artificial Intelligence enhanced data analytics, leading to better fault detection and creating a more accurate picture of wind turbine health. This engineering-driven predictive maintenance helps owner-operators understand their assets more clearly and make significant savings from their O&M budgets.

### How do humans learn?

– What do you think the temperature is outside right now?

– Honestly, take a moment and make a guess.

– Got a number in your head?

– Good.

We’re not actually interested in whether you were right or not; rather, we’re interested in __how you came to that conclusion.__ Take another moment to think: *“How did my brain decide that this specific number was the best estimate?”*

This is not an easy thing to think about, and there is rarely a conclusive reason, but do pause to reflect on how your brain processed the information.

Here’s my thought process:

- *As I write this, it is mid-winter, which I know means it is cold.*
- *I live in the UK, and our winters are chilly, but rarely much below -5 degrees Celsius.*
- *On my walk to work today, it felt around 2–5 degrees outside.*
- *It is now mid-morning, so it will be slightly warmer than my walk to work this morning.*

Given all this information, __I therefore guesstimated it was about five degrees outside right now__ (in fact it was six degrees, so my guess was close!).

---

Let’s pick apart what my brain just did there.

It first gathered both relevant **data** and **prior knowledge** together. In this case my **data** were: *“My walk to work this morning was about 2–5 degrees”*, whilst my **prior knowledge** was: *“In winter it is cold”*, *“The temperatures are higher towards midday”*, and *“I live in the UK”*[1].

It then used **statistics** to aggregate this information:

- The **minimum** temperature is -5 degrees
- The **mean** temperature this morning was 3.5 degrees
- The **standard deviation** was about 1.5 degrees
- There is a roughly **linear relationship** between morning temperatures and time, about +0.5 degrees per hour
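This kind of aggregation can be sketched in a few lines of Python. The morning readings below are hypothetical, chosen only to mirror the 2–5 degree walk-to-work estimate; the statistics themselves are computed from first principles:

```python
# Hypothetical hourly morning temperature readings (degrees C),
# standing in for what my brain "measured" on the walk to work.
morning_temps = [2.0, 3.0, 4.0, 5.0]  # 07:00, 08:00, 09:00, 10:00

n = len(morning_temps)

# Mean temperature this morning
mean_temp = sum(morning_temps) / n  # 3.5

# Sample standard deviation
variance = sum((t - mean_temp) ** 2 for t in morning_temps) / (n - 1)
std_temp = variance ** 0.5

# Linear trend (degrees per hour) via a simple least-squares slope
hours = list(range(n))
mean_hour = sum(hours) / n
slope = sum((h - mean_hour) * (t - mean_temp)
            for h, t in zip(hours, morning_temps)) \
        / sum((h - mean_hour) ** 2 for h in hours)
```

With these made-up readings the mean comes out at 3.5 degrees and the trend at +1 degree per hour; the brain’s version is fuzzier, but the operations are the same.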

Therefore, my brain used **statistics**, **historical data** & **prior knowledge** to draw its conclusion.

Wait a second: “historical data, prior knowledge, and statistics”? That sounds an awful lot like machine learning! Well, that’s because that’s precisely what machine learning is.

*Even though this snowy landscape with the solitary magpie isn’t real, your brain is able to predict what the temperature is, based on its prior information and the patterns it sees in this picture. Thankfully, the UK rarely reaches temperatures this cold. (Claude Monet – The Magpie, 1869)*

### How do machines learn?

The above thought experiment is a good example of what the “learning” part in machine learning is: the use of **statistics**, **historical data**, and **prior knowledge** to make a best guess.

It asks the question “What was the correct answer the last time we were in a similar situation?”.

In other words, what happened the last time the pattern looked like this?

*(The pattern here includes the time of year, time of day, geographical location, recent experiences, etc.)*
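One minimal way to sketch this “what happened the last time the pattern looked like this?” idea in code is a nearest-neighbour lookup. Everything here is hypothetical: the situations are simply (hour of day, month) pairs and the outcomes are temperatures:

```python
# Toy "most similar past situation" lookup. Each history entry pairs
# a situation (hour_of_day, month) with the outcome observed then.
history = [
    ((8, 1), 2.0),    # 8am in January  -> 2 degrees
    ((12, 1), 5.0),   # midday in January -> 5 degrees
    ((12, 7), 22.0),  # midday in July  -> 22 degrees
]

def predict(situation):
    """Return the outcome of the most similar past situation."""
    def distance(past):
        # Squared distance over the situation's features
        return sum((a - b) ** 2 for a, b in zip(situation, past))
    nearest = min(history, key=lambda item: distance(item[0]))
    return nearest[1]

# 11am in January is closest to the midday-in-January memory
prediction = predict((11, 1))  # 5.0
```

A real model would weigh many more features, but the question it answers is exactly the one above.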

This may all seem obvious in hindsight, but these concepts are fundamental to learning, whether by machine or human. These concepts apply universally to much of machine learning[2], and if you understand them, you will be well equipped to critically assess the likely usefulness and impact of any machine learning application you come across.

### How hot is this bearing?

Let us apply this idea to a more practical concept: predicting the temperature of a wind turbine bearing.

Our **prior knowledge** can be summarised by the following:

- *Bearings allow parts of a device to rotate with as little friction as possible.*
- *We expect this friction to cause a temperature increase.*
- *As the part rotates faster, we expect more friction, and therefore a higher temperature.*
- *Abnormally high temperatures can be a good indicator that the bearing is failing.*

We additionally have **historical data**, in the form of 3 months of bearing temperatures and wind speeds. Let’s plot both below:

*(Top) The bearing temperature as a function of time. (Bottom) The wind speed as a function of time*

Even a cursory glance suggests our prior knowledge was correct; the bearing temperature appears to be (roughly) positively correlated with wind speed.

Finally, we can **use statistics** based on our **prior knowledge** and **historical data** to guess what the temperature of the bearing will be, given a certain wind speed.

The simplest thing we can do with statistics is just to take the mean of our bearing temperature.
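As a sketch, with a handful of hypothetical temperature readings standing in for the three months of data, this “model” is one line of arithmetic:

```python
# Hypothetical bearing temperatures (degrees C), standing in for
# the three months of historical SCADA data described above.
bearing_temps = [42.0, 48.0, 55.0, 51.0, 44.0]

# The simplest possible model: predict the historical mean
# at every point in time, regardless of conditions.
mean_temp = sum(bearing_temps) / len(bearing_temps)
predictions = [mean_temp] * len(bearing_temps)
```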

*(Blue) The actual bearing temperature as a function of time. (Orange) Our first machine learning prediction, the mean temperature of the bearing over the timespan*

Congratulations! You’ve just had your machine learn from historical data! You’ve made your first machine learning model! But it feels like we can do better than this by including wind speed.

To include the effects of wind speed, I need to assume a model. For simplicity I’m going to assume that there’s a roughly linear relationship between the bearing temperature and wind speed. That is to say:

T_{Bearing} = C_{0} + C_{1} × V_{Wind}

where T_{Bearing} is the bearing temperature, C_{0} is a constant, V_{Wind} is the wind speed, and C_{1} is a coefficient. We know that the true relationship isn’t linear (for instance, wind turbines don’t keep turning faster at ever-higher wind speeds), but it is good to start simple[3].

We know both T_{Bearing} and V_{Wind} at all points in time, so we just have to figure out what the best values of C_{0} and C_{1} are[4]. Any statistics course or a multitude of online tutorials can tell you how to do this[5]. We plot the results of this model below.
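A minimal least-squares fit of C_{0} and C_{1} might look like this in Python. The wind speed and temperature values are hypothetical (and deliberately noise-free, so the fit is exact); the real fit runs over the full three months of data:

```python
# Hypothetical paired observations: wind speed (m/s) and
# bearing temperature (degrees C).
wind = [4.0, 6.0, 8.0, 10.0, 12.0]
temp = [40.0, 44.0, 48.0, 52.0, 56.0]

n = len(wind)
mean_v = sum(wind) / n
mean_t = sum(temp) / n

# Least-squares estimates for T_Bearing = C0 + C1 * V_Wind:
# C1 is the slope, C0 the intercept.
c1 = sum((v - mean_v) * (t - mean_t) for v, t in zip(wind, temp)) \
     / sum((v - mean_v) ** 2 for v in wind)
c0 = mean_t - c1 * mean_v

# Predicted bearing temperature at each observed wind speed
predicted = [c0 + c1 * v for v in wind]
```

In practice you would reach for a library routine (least squares is a one-liner in most numerical packages), but the closed-form solution above is all that is happening underneath.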

*(Blue) The actual bearing temperature as a function of time. (Orange) Our prediction of the main bearing temperature, as a linear function of wind speed.*

This is starting to look pretty good! We’re only off, on average, by 4 °C, which is accurate to within 8–10% of the absolute value of the temperature.
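The “average error” here is the mean absolute error. Computing it is straightforward; the actual and predicted values below are hypothetical stand-ins:

```python
# Hypothetical measured and predicted bearing temperatures (degrees C)
actual    = [45.0, 50.0, 55.0, 60.0]
predicted = [43.0, 52.0, 51.0, 58.0]

# Mean absolute error: average size of the prediction miss
errors = [abs(a - p) for a, p in zip(actual, predicted)]
mae = sum(errors) / len(errors)

# The same error expressed relative to the mean measured temperature
relative_error = mae / (sum(actual) / len(actual))
```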

With the addition of a few more variables, and some changes in the model assumptions, we can bring this average error down to only 1.2 °C (a 2% error):

*(Blue) The actual bearing temperature as a function of time. (Orange) A more complex prediction, involving numerous tags and a different model, but applying the same broad techniques of prior knowledge, data, and statistics.*

On a fundamental level, we’ve not changed anything we’re trying to do; we’re still just using **prior knowledge**, **historical data**, and **statistics** to create these models. Whilst the final result is impressive, it’s just been built up layer by layer from the simpler models we started off with.

Perhaps none of this *feels* like machine learning to you. If that’s how you feel, then I have done my job right and demystified machine learning for you! Because this really is all it is: pattern recognition, historical data, statistics and prior knowledge. There’s nothing complex or magic about it; it’s merely a suite of tools you can use to aid your analysis.

In part III of this blog, we’ll go into some of the practical things you can do with this model, and how it can be used for fault detection.

### REFERENCES:

[1] Of course, one could argue that my prior knowledge was data too. The line between the two isn’t very clearly defined, but I like to think of prior knowledge as “What do I believe right before I see any data?”. In this case, this would be “What would I expect the temperature to be outside *before* I have left my house in the morning?”

[2] But not all of machine learning! Particular branches of machine learning don’t need data and/or prior knowledge! We’ll get around to these in some later posts.

[3] When we use neural networks, we don’t need to assume any particular model. This is because neural networks are universal function approximators; it doesn’t matter what the true relationship is between your variables, a neural network *can* (though not necessarily *will*) accurately approximate it given enough data, and we therefore don’t need to worry about specifying the exact form of the relationship.

[4] Though for statistical reasons, we actually calculate the least-worst values.

[5] https://en.wikipedia.org/wiki/Least_squares for instance.