Thu, Sep 3, 2020
Even if machines have done a big part of the heavy lifting for us since the industrial revolution, they still depend on us for their maintenance. As they have that annoying tendency to break from time to time, their conservation becomes essential to keep up with our daily activities.
Now, with the advent of Industry 4.0, the Internet of Things, and artificial intelligence, we are letting a new kind of machine take care of its older counterparts. We make these new transistor-based machines look after their ancestors.
This post will introduce some common strategies on how to do machinery maintenance, with a specific focus on predictive maintenance. We will also cover what role machine learning algorithms play to improve on some of the current solutions.
Let's start with the simplest maintenance strategies and work our way up to the more sophisticated ones.
Most businesses rely on corrective (i.e., reactive) maintenance, where failing parts are replaced once they stop working. This ensures each part is used for its entire useful lifetime, so none of it is wasted. However, this option adds the cost of downtime, labor, and unscheduled maintenance.
It is the most straightforward maintenance strategy, but the most expensive one, since downtime and unplanned maintenance costs can strongly affect productivity and returns.
To exemplify this case, imagine you are driving on the highway and your car breaks down. You will need to call the mechanic, wait for them to arrive and evaluate the problem, and then either have it fixed on the spot (if it is a simple thing) or get the car towed to the shop. You got the most out of that part, but lots of time and money were spent.
A second, more robust approach is preventive maintenance (i.e., periodic), where components are replaced after a given time, regardless of their condition. This approach avoids catastrophic failures and unscheduled downtimes but requires careful consideration when determining a part's useful lifespan and replacements' periodicity.
It is generally a practical approach to avoid failures; however, it can lead to unnecessary replacements. Replacing components that could still last longer increases operating costs.
Let's go to the car example again. In this scenario, you'll take the car to the mechanic shop periodically to have it checked and to change some parts just because they have reached the manufacturer's recommended mileage or a certain number of months have gone by.
Finally, predictive maintenance aims to optimize the balance between corrective and preventive maintenance by enabling just-in-time replacement of components. This approach minimizes the cost of unscheduled maintenance and maximizes the component's lifespan, thus getting more value out of a part.
It is based on continuous monitoring of a machine or process integrity, allowing maintenance to be performed only when necessary.
Moreover, it allows the early detection of failures thanks to predictive tools based on historical data and machine learning techniques, integrity factors such as visual aspects like wear or coloration, statistical inference methods, and other engineering approaches.
For the car example, this is the case when the car's computer indicates that it is time for a specific service.
Predictive maintenance is not the easiest solution to implement, but its benefits are outstanding. If implemented well, these solutions will result in significant cost savings, mainly by maximizing the components' lifespan.
Replacements or machinery service will only take place when absolutely necessary, which will free up part of your maintenance team to focus on more exciting tasks (make sure to keep them around in case of emergency, though).
Since this approach measures components' real behavior, it can anticipate failures even in faulty pieces that will not last as long as we expected, something that preventive maintenance wouldn't do.
As a bonus, you will get lots of data about your equipment, which could be used to compare different providers or further optimize your manufacturing processes. Not to mention that it also reduces the ecological impact of your business.
As we mentioned before, rule-based systems are a good starting point. Still, this approach quickly leads to adjusting and maintaining hundreds of rules. If each rule has to be adapted for every machine, you'll also need to readjust its values over time, since environmental conditions change and the thresholds defined today may be invalid in a couple of months. A simple example is a seasonal change in temperature that shifts the machinery's normal operating values.
In this scenario, machine learning can make your life easier by automatically finding the hidden patterns in your data.
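As a minimal sketch of that idea (a simple statistical baseline, not a full machine learning model), the snippet below contrasts a hand-tuned fixed threshold with a limit derived from recent readings. The function names, the temperature values, and the three-standard-deviation cutoff are all illustrative assumptions.

```python
from statistics import mean, stdev


def static_rule(temp_c, limit_c):
    """Hand-written rule: alert whenever the reading exceeds a fixed limit."""
    return temp_c > limit_c


def adaptive_rule(recent_temps, temp_c, k=3.0):
    """Alert when the reading is more than k standard deviations above
    the mean of recent readings, so the limit follows the data."""
    return temp_c > mean(recent_temps) + k * stdev(recent_temps)


# Hypothetical normal operating temperatures (Celsius) in two seasons.
winter = [60.0, 61.0, 59.5, 60.5, 60.2]
summer = [78.0, 79.0, 77.5, 78.5, 78.2]
WINTER_LIMIT = 70.0  # assumed fixed threshold, tuned during winter

# A normal summer reading trips the winter-tuned rule (false alarm),
# while the data-driven limit accepts it.
false_alarm = static_rule(78.4, WINTER_LIMIT)
adaptive_summer = adaptive_rule(summer, 78.4)

# An abnormal winter reading stays under the fixed limit (missed alert),
# but stands out against the recent winter baseline.
missed = not static_rule(68.0, WINTER_LIMIT)
adaptive_winter = adaptive_rule(winter, 68.0)

print(false_alarm, adaptive_summer, missed, adaptive_winter)
```

The fixed limit has to be retuned every season and for every machine, whereas the data-driven limit moves with whatever the recent readings look like.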
Now we will focus on data collection matters and different ways of using the collected data for various purposes. Finally, we will briefly discuss the use of edge devices for applications of predictive maintenance.
As with any machine learning project, data is a key point. The amount and quality of your data will probably be the main limitation your data scientists will find.
Before we dive into the details of data collection, it is important to remember that you will need to set aside a proper amount of unseen data for testing purposes. This will make it possible to evaluate the predictive models' quality before shipping them to a production setting.
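One way to set that data aside, sketched below, is to hold out the most recent readings so the model is always evaluated on data recorded after everything it was trained on. Splitting by time is our assumption here, as are the `timestamp` field name and the 20% test fraction.

```python
def chronological_split(records, test_fraction=0.2):
    """Sort readings by time and keep the most recent ones as unseen test data."""
    ordered = sorted(records, key=lambda r: r["timestamp"])
    n_test = max(1, round(len(ordered) * test_fraction))
    return ordered[:-n_test], ordered[-n_test:]


# Hypothetical sensor readings, deliberately out of order.
readings = [{"timestamp": t, "vibration": 1.0 + 0.1 * t} for t in (3, 0, 9, 1, 7, 2, 8, 4, 6, 5)]

train, test = chronological_split(readings)
print(len(train), len(test))
print([r["timestamp"] for r in test])
```

A random shuffle would let readings from just before a failure leak into both sets, which tends to make the model look better than it will be in production.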
The first question to answer is what kind of data should be gathered. Luckily, this is an easy one: gather everything you can. Always try to collect everything that seems remotely relevant. You will probably start by analyzing just a subset of your data, but having all of it available will help you make better decisions down the road. As it is often said, "Data is the new oil". This process will raise some crucial questions, and their answers come from the data:
A side note on sensors: If you are lucky, or if you choose the right provider, the equipment will come with all the sensors you need. If not, don't despair: sometimes simple sensors can give you enough information. For instance, a simple microphone can pick up the required information, especially if you combine it with machine learning techniques to find interesting failure patterns.
The amount of data needed varies a lot from case to case, but in general, the most important part is to have a fair amount of failure examples. Usually, malfunctions are rare, and everything will need to be stored until enough examples are recorded.
Continuous monitoring of sensors will generate a lot of data that can be hard to manage, but big data solutions exist for a reason. It is essential to define how often the measurement from a sensor will be read, where the data will be stored, and how to process the values obtained. Storing time series data is a well-known problem, and there are plenty of cost-efficient solutions that take advantage of the fact that the data is never updated and rarely queried.
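To get a feel for how the reading interval drives storage needs, here is a back-of-the-envelope estimate. The sensor count, sampling intervals, and the 16 bytes per stored sample are hypothetical numbers chosen only for illustration.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def daily_storage_bytes(n_sensors, interval_s, bytes_per_sample):
    """Raw bytes written per day when every sensor is read every interval_s seconds."""
    samples_per_sensor = SECONDS_PER_DAY // interval_s
    return n_sensors * samples_per_sensor * bytes_per_sample


# Hypothetical plant: 100 sensors, 16 bytes per sample (timestamp + value).
every_second = daily_storage_bytes(100, 1, 16)
every_ten_seconds = daily_storage_bytes(100, 10, 16)

print(every_second / 1e6, "MB/day")
print(every_ten_seconds / 1e6, "MB/day")
```

Reading ten times less often cuts storage by the same factor, so the interval is worth choosing based on how quickly the failures you care about actually develop.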
Labeling is the main factor that determines the quality of your data. Your data scientists need consistent and thorough labels that classify each exceptional event.
Recording events such as failures (categorized by type), maintenance interventions (including which parts were replaced or fixed), power outages, and any other episode that can affect the measurements will significantly help your data scientists and improve the overall quality of the solution.
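A consistent event log could look something like the sketch below. The `MachineEvent` structure, its field names, and the `label_reading` helper are hypothetical and only meant to show the kind of information worth capturing.

```python
from dataclasses import dataclass, field
from datetime import datetime
from enum import Enum
from typing import List, Optional


class EventType(Enum):
    FAILURE = "failure"
    MAINTENANCE = "maintenance"
    POWER_OUTAGE = "power_outage"


@dataclass
class MachineEvent:
    machine_id: str
    event_type: EventType
    start: datetime
    end: datetime
    failure_kind: Optional[str] = None  # only meaningful for failures
    parts_replaced: List[str] = field(default_factory=list)


def label_reading(timestamp, machine_id, events):
    """Return the type of the event covering this reading, or None if it is normal."""
    for event in events:
        if event.machine_id == machine_id and event.start <= timestamp <= event.end:
            return event.event_type
    return None


# Hypothetical log with a single bearing failure on one pump.
events = [
    MachineEvent(
        machine_id="pump-7",
        event_type=EventType.FAILURE,
        start=datetime(2020, 9, 1, 8, 0),
        end=datetime(2020, 9, 1, 12, 0),
        failure_kind="bearing",
        parts_replaced=["bearing"],
    )
]

during = label_reading(datetime(2020, 9, 1, 9, 30), "pump-7", events)
after = label_reading(datetime(2020, 9, 2, 9, 30), "pump-7", events)
print(during, after)
```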
Machine learning will be easier to implement in small problems, so you should first train a model to detect issues in one of the components, instead of your whole process. Focusing on a single part of a machine, or identifying just one kind of failure, are also good options. The more concrete and straightforward the problem is, the easier it will be for the model to perform.
Without getting into the nitty-gritty of what model to use, feature engineering, and other things that you will need to do, let's focus instead on the big picture: the different ways to apply machine learning to predictive maintenance, what they offer, and what kind of data they need.
We know that AI explainability can be a sensitive topic for some machine learning applications in the industry. Thinking of machine learning systems as black boxes is not ideal and does not provide the answers needed to make business decisions. However, there has been much effort invested lately in improving explainability to alleviate your concerns.
Let's dive into some of the most common examples that show how machine learning can help us.
One of the first things you learn when starting in machine learning is linear regression, a technique for predicting a single real number from a set of inputs. A remaining useful life (RUL) predictor tries to do just that, given the measurements at a given time. In some cases, that might be enough; in others, you will need to provide more context information to your model. The output is an estimate of the time left before a failure is expected to happen; you then decide when it is appropriate to take action. To implement this solution, you will need historical data and labels that indicate when each failure happened.
As we mentioned before, it is better to target only one type of failure at a time; this way, the model only needs to learn the pattern(s) that lead to a particular failure type.
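To make the idea concrete, here is a toy sketch that derives RUL labels from a known failure time and fits a one-feature least-squares line. The run-to-failure history, the choice of vibration as the only input, and the helper names are assumptions; a real predictor would use many more signals and a more capable model.

```python
def fit_line(xs, ys):
    """Ordinary least squares for a single input: returns (slope, intercept)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical run-to-failure history for one component.
hours = [0, 100, 200, 300, 400]       # when each reading was taken
vibration = [1.0, 2.0, 3.0, 4.0, 5.0]  # sensor value at that time
failure_hour = 400                     # label: when the failure happened

# The RUL label for each reading is the time left until the recorded failure.
rul_hours = [failure_hour - h for h in hours]

slope, intercept = fit_line(vibration, rul_hours)


def predict_rul(vibration_value):
    """Estimated hours left before failure for a new vibration reading."""
    return slope * vibration_value + intercept


print(predict_rul(3.5))
```

Note that the labels come entirely from knowing when the failure happened, which is why that timestamp is the one piece of labeling this approach cannot do without.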
You don't necessarily need to predict how far into the future a component will break. Having a predictive model that determines if a part is likely to break in the next X days is almost as useful.
If you have enough data and a powerful model, you can frame it as a multi-class problem to know exactly which type of error to expect. This will let you know which action you should take.
Data requirements for this solution are the same as for RUL prediction, but if you want to do multi-class classification, you will need detailed labels for each failure class.
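The same failure timestamps used for RUL can be turned into classification targets. The sketch below assumes readings are indexed by day and uses a hypothetical 7-day horizon as the "X days" window.

```python
def fails_within(reading_days, failure_day, horizon_days):
    """Label each reading 1 if the failure occurs within horizon_days after it, else 0."""
    return [
        1 if 0 <= failure_day - day <= horizon_days else 0
        for day in reading_days
    ]


# Hypothetical history: readings on these days, failure recorded on day 30.
reading_days = [0, 10, 20, 25, 30]
labels = fails_within(reading_days, failure_day=30, horizon_days=7)
print(labels)
```

Any binary classifier can then be trained on these labels; for the multi-class variant, the 1 would be replaced by the recorded failure type.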
If you have data but no labels on it, you can use different methods to distinguish normal behavior from anomalies, which most likely mean failures or malfunctions.
These methods can be hard to validate, but once you are confident that you have separated the wheat from the chaff, you can use the result as labels for one of the methods we described above, or fire alerts when the measurements deviate too much from the expected behavior.
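One simple unsupervised method, shown here only as an example, flags readings whose modified z-score (based on the median and the median absolute deviation) is unusually large. The 3.5 cutoff is a commonly used rule of thumb, and the pressure values are made up.

```python
from statistics import median


def mad_anomalies(values, threshold=3.5):
    """Return indices of readings whose modified z-score exceeds the threshold."""
    center = median(values)
    mad = median([abs(v - center) for v in values])
    if mad == 0:
        return []  # no spread at all, so nothing can be scored
    return [
        i for i, v in enumerate(values)
        if 0.6745 * abs(v - center) / mad > threshold
    ]


# Hypothetical unlabeled pressure readings with one obvious outlier.
pressure = [10.1, 9.9, 10.0, 10.2, 9.8, 10.0, 14.5, 10.1]
anomalous = mad_anomalies(pressure)
print(anomalous)
```

The flagged indices can serve as provisional labels for the supervised approaches described earlier, or trigger an alert directly.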
Halliburton needed to monitor thousands of oil pumps scattered across the United States. Most of them were in areas with no mobile coverage, and even where coverage existed, prices were prohibitive. We partnered with them to implement a monitoring system for their oil pumps while overcoming a common challenge when implementing predictive maintenance in real life: limited connectivity.
The most traditional way to apply predictive maintenance would have been to use a central server. But under these conditions, it would have been extremely costly, because we would have needed to transfer all the data to the server, run all the diagnostics there, and send the results back to take action. So this was not an option.
We devised a solution using IoT devices attached to the existing equipment to do the heavy processing locally and only send an alert when something was off. This approach brought all the benefits of predictive maintenance while keeping the internet bill low.
Historically, edge devices have lacked the computing power to process large amounts of data. This is no longer true, as edge devices are becoming more powerful and cheaper (check our comparison on multiple devices running bigger models).
Machine learning on the edge is a great cost-effective way to implement real-time predictive maintenance, even in extremely distributed cases. These devices can not only monitor and alert about pump status but also take remote actions to fix or prevent issues.
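The "process locally, transmit only alerts" pattern can be illustrated with the sketch below. It is not the system deployed in this project: the `EdgeMonitor` class, the rolling-average baseline, and the 20% tolerance are assumptions, and a simple deviation check stands in for the actual model.

```python
from collections import deque
from statistics import mean


class EdgeMonitor:
    """Keeps a short local history and emits a message only for abnormal readings."""

    def __init__(self, window=5, tolerance=0.2):
        self.history = deque(maxlen=window)
        self.tolerance = tolerance

    def process(self, value):
        """Return an alert payload to transmit, or None when nothing needs sending."""
        if len(self.history) == self.history.maxlen:
            baseline = mean(self.history)
            if abs(value - baseline) > self.tolerance * abs(baseline):
                return {"value": value, "baseline": baseline}
        # Only normal readings update the baseline, so outliers do not distort it.
        self.history.append(value)
        return None


# Hypothetical stream of pump pressure readings arriving on the device.
stream = [100, 101, 99, 100, 100, 100, 150, 101]
monitor = EdgeMonitor()
alerts = [a for a in (monitor.process(v) for v in stream) if a is not None]

print(len(stream), "readings processed,", len(alerts), "message sent")
```

Eight readings are processed on the device, but only one message would cross the network, which is what keeps connectivity costs down.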
We teamed up with SES to build a Medium Earth Orbit (MEO) monitoring solution for the gateways and their customers' terminals. As a network provider, the most significant risk to SES's business is outages. The main goal of the project was to predict weather-related outages within minutes, since ~70% of their outages were due to weather conditions that interfered with the signal between the ground antenna (terminal or gateway) and the satellite.
At Tryolabs, we developed an offline-on-site solution in this case as well. For SES, the moment their equipment started to malfunction due to weather conditions, it was too late, and the client's internet connection was already lost.
To avoid the outage altogether, the monitoring system can now switch to a geostationary satellite when weather conditions worsen. While this is more expensive and slower, it is not affected by weather fluctuations. With this solution in place, they can now manage their customers' Service Level Agreements (SLAs) proactively, detecting an outage in real time or even anticipating it.
Once the above monitoring solution is fully operational, the next logical step is to extend it to provide predictive maintenance for the customers' terminals as part of the SES service. This will avoid long-term outages, prevent unnecessary replacement of expensive parts, and reduce the manual work of visiting and monitoring these terminals.
In this case, thermal images were processed using machine learning to do real-time detection of anomalies in electrical tools to prevent power-equipment failure for power stations located in Chongqing, China.
The team tackling this project realized that multiple causes could raise the internal temperature of electrical instruments, triggering unexpected problems and potential damage to power systems. They concluded that early detection of these temperature anomalies could prevent larger damage, since it can improve inspection speed and effectiveness. In summary, the problem at hand was to determine whether temperature anomalies can be detected fast enough to act quickly, save costs, and prevent a whole system breakdown.
The solution implemented consisted of deploying thermal infrared cameras across ten power substations to monitor temperatures. Once they were able to capture 150 thermal pictures using 300 hotspots, they implemented a multilayer perceptron (MLP) to classify the thermal conditions into "defect" and "non-defect".
Per the paper's results, after fine-tuning the MLP model, the team was able to achieve 84% accuracy, which was good enough for them to demonstrate the ROI of deploying this predictive maintenance system.
Predictive maintenance has clear advantages over other types of maintenance, and it is widely applicable. The key to establishing a predictive maintenance pipeline is the ability to read, process, and store valuable data. The ultimate benefit is reduced downtime and even reduced maintenance costs. The discipline and technologies, while complex, are applicable to a wide range of enterprises.
We looked at cases where internet connection or physical access may be a major issue and showed how edge devices could help us overcome these difficulties. Hard-to-access places and remote environments are not obstacles to applying predictive maintenance. Even better, these edge devices let us bring the latest machine learning techniques, and all their benefits, to those environments.
It is more important to be able to read, process, and store valuable data than to have the latest and shiniest machine learning models working on worthless data.
Once good data is in place, we can start thinking about the best ways to apply machine learning to the data and get the most out of it.
If you think there could be ways of applying predictive maintenance to your business, please don't hesitate to contact us.