Federated learning and its use with IoT devices
In the world of IoT, AI and machine learning (ML) are often seen as the main topic when discussing advances in the industry. Indeed, the applications for AI and ML in the broader IoT ecosystem are abundant. Yet a topic generating less noise, but with equally broad ramifications for IoT, is federated learning (FL).
This article originally appeared in the Aug'23 magazine issue of Electronic Specifier Design – see ES's Magazine Archives for more featured publications.
Rather than seeking to supersede anything, FL aims to let devices that use ML and AI operate with less dependence on the centralised systems typical of these models. As it is a less explored topic, we’ll take a deep dive into FL: its history, applications and future.
What is FL?
“Federated learning is a new technology, a new approach to machine learning,” says George Hancock, CGO & Business Development Director at T-DAB.AI. “In principle, it is a privacy-preserving machine learning technique used to build and train models without having to move or see the data.”
FL can be traced back to the early development of distributed ML techniques. Researchers seeking to train ML models on distributed systems, where data is decentralised across multiple devices or servers, were led to the idea of individual devices being more independent in their learning despite being connected to a wider overall system.
In 2016, Google researchers published a paper called "Communication-Efficient Learning of Deep Networks from Decentralized Data". The work outlined a privacy-preserving method for training ML models on user devices without transferring raw data to a central server, and coined the term "federated learning" for this process. Models using FL are trained on decentralised devices: the learning occurs on the devices themselves, and only model updates, never raw data, are sent to the central server. The server then combines these updates using techniques such as federated averaging or secure aggregation protocols. The aggregated model is sent back to the devices, and the cycle of local training and aggregation continues. Multiple rounds of communication and model updates allow the model to learn progressively from the data on every device. Once the FL process reaches a desired level of convergence, a final global model can be generated.
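The train-aggregate-redistribute loop described above can be sketched in a few lines of Python. This is a minimal illustration, not the implementation from the Google paper: the one-dimensional linear model, the three simulated devices, the learning rate and the function names (local_update, federated_average, run_rounds) are all assumptions chosen to keep the example small.

```python
import random

def local_update(weights, data, lr=0.1, epochs=5):
    """Train y = w*x + b on one device's private samples; return new weights."""
    w, b = weights
    for _ in range(epochs):
        for x, y in data:
            err = (w * x + b) - y      # prediction error on one sample
            w -= lr * err * x          # gradient step for the slope
            b -= lr * err              # gradient step for the intercept
    return (w, b)

def federated_average(updates, sizes):
    """Server side: average device weights, weighted by local sample count."""
    total = sum(sizes)
    w = sum(u[0] * n for u, n in zip(updates, sizes)) / total
    b = sum(u[1] * n for u, n in zip(updates, sizes)) / total
    return (w, b)

def run_rounds(device_data, rounds=50):
    """Repeat: send global model down, train locally, average the results."""
    global_weights = (0.0, 0.0)
    sizes = [len(d) for d in device_data]
    for _ in range(rounds):
        # Only weights leave each device; the (x, y) samples never do.
        updates = [local_update(global_weights, d) for d in device_data]
        global_weights = federated_average(updates, sizes)
    return global_weights

# Three hypothetical devices, each holding private samples of y = 2x + 1.
rng = random.Random(0)
device_data = [
    [(x, 2 * x + 1) for x in (rng.uniform(-1, 1) for _ in range(n))]
    for n in (5, 10, 20)
]
w, b = run_rounds(device_data)
print(f"global model after training: w={w:.3f}, b={b:.3f}")
```

Weighting each device by its sample count follows the usual description of federated averaging; a real deployment would also have to handle devices that drop out mid-round, which this sketch ignores.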
Yet although each device's contribution is folded into the global model and returned as a broader update for all devices, the nuances each individual device has picked up through its own interactions can still be retained. “They take the device's unique parameters going up, and then take them when the update goes back down to the device,” says Hancock. “This is because, when the global model goes back down to Edge, the model takes into consideration the parameters, makes a comparison on which will perform better, the global model running on the device or the local model and implements the model.”
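One way to picture the comparison Hancock describes is a device scoring both models on its own recent data and keeping the better one. The sketch below is an assumption about how such a check might look, not T-DAB.AI's method: the linear model, the mean-squared-error metric and the names mean_squared_error and choose_model are hypothetical.

```python
def mean_squared_error(model, samples):
    """Average squared prediction error of y = w*x + b over local samples."""
    w, b = model
    return sum((w * x + b - y) ** 2 for x, y in samples) / len(samples)

def choose_model(local_model, global_model, validation_samples):
    """Keep whichever model performs better on this device's own data."""
    local_err = mean_squared_error(local_model, validation_samples)
    global_err = mean_squared_error(global_model, validation_samples)
    if global_err <= local_err:
        return "global", global_model
    return "local", local_model

# A hypothetical device whose readings follow y = 3x, unlike the wider fleet.
validation = [(x, 3.0 * x) for x in (0.5, 1.0, 1.5, 2.0)]
local_model = (3.0, 0.0)    # tuned to this device's own behaviour
global_model = (2.0, 0.5)   # averaged across many different devices

choice, model = choose_model(local_model, global_model, validation)
print(choice, model)
```

Here the device keeps its local model because it fits the device's own readings exactly, while the averaged global model does not.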
FL therefore soon found application in domains that stood to benefit from its decentralised nature: healthcare, finance, Edge computing and, of course, IoT. Although these domains differ, they share issues related to data privacy, limited network connectivity or regulatory constraints, all of which FL is well placed to address.
Why is FL useful?
As previously mentioned, privacy is a big draw. FL is more private because data is processed on the device rather than being sent to a centralised system. In a centralised system, raw data is collected from the devices and transferred to a central server for processing. Large amounts of sensitive information therefore travel from user devices to the server, creating opportunities for interception in transit, and the data accumulated on the central server becomes attractive bait for hackers.
By keeping data on local devices, FL reduces the incentive for hackers to attempt a wide-scale attack, because the data is spread out. As only model updates are sent from devices to the central server, a breach of the central server leaves the raw data on the individual connected devices uncompromised. To obtain it, a hacker would have to breach each device individually, a task too large for many to see through. Equally, because raw data isn’t transmitted, it cannot be intercepted on the network in transit.
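The secure aggregation protocols mentioned earlier take this a step further, so that the server cannot read even an individual device's update, only the sum. The sketch below shows the core idea of pairwise masking under heavy simplifying assumptions: updates are small integers, a single seeded random generator stands in for the secrets that pairs of devices would really agree between themselves, and no device drops out. The names pairwise_masks, mask_update and aggregate are hypothetical.

```python
import random

MODULUS = 2 ** 32  # all arithmetic is done modulo this value (an assumption)

def pairwise_masks(num_devices, length, seed=0):
    """Build one mask per device so that all masks cancel when summed."""
    rng = random.Random(seed)
    masks = [[0] * length for _ in range(num_devices)]
    for i in range(num_devices):
        for j in range(i + 1, num_devices):
            # Devices i and j share a random vector: i adds it, j subtracts it.
            shared = [rng.randrange(MODULUS) for _ in range(length)]
            for k in range(length):
                masks[i][k] = (masks[i][k] + shared[k]) % MODULUS
                masks[j][k] = (masks[j][k] - shared[k]) % MODULUS
    return masks

def mask_update(update, mask):
    """Device side: hide the real update behind the device's mask."""
    return [(u + m) % MODULUS for u, m in zip(update, mask)]

def aggregate(masked_updates):
    """Server side: sum the masked updates; the masks cancel in the total."""
    length = len(masked_updates[0])
    return [sum(m[k] for m in masked_updates) % MODULUS for k in range(length)]

# Three hypothetical devices with integer-encoded model updates.
updates = [[3, 10], [5, 20], [7, 30]]
masks = pairwise_masks(len(updates), 2)
masked = [mask_update(u, m) for u, m in zip(updates, masks)]
total = aggregate(masked)
print(total)
```

Each masked update looks like random noise on its own, yet the server still recovers the correct total, because every shared vector is added by one device and subtracted by another.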
Not transferring this data also means that latency and bandwidth requirements are reduced. This is especially pertinent for IoT, as many devices are placed in remote areas with poor connectivity. Equally, devices with limited power benefit from the energy saved by not sending data over the network.
Diversity in data can also aid in the quality of the model that is eventually aggregated. The inclusion of data from different devices, environments, and user profiles introduces greater heterogeneity, leading to more comprehensive and nuanced models. This collective intelligence from the various devices enhances the model's generalisation and ability to capture a wide range of real-world scenarios.
Applications of FL in IoT
The attributes of FL therefore map readily onto IoT in general, and onto some of its sectors in particular.
“Into this sector of IoT, there's a huge opportunity for this [federated learning] to cause disruption,” says Hancock. “But I think its current, lower use indicates more about the maturity of the technological solution.”
One example of the utility of FL with IoT is the smart home. Keeping each device's information segmented can protect systems that hold very private information, from the placement of devices on entry points and the temperature of the house to occupancy awareness and even smart lock control. Equally, the AI model on each IoT device within the home can enhance smart home automation through context-aware personalisation, learning from how users have interacted with that specific device in order to improve the model as a whole. Examples include optimising energy consumption in the least visited rooms, adjusting ambient conditions when rooms are empty, or turning devices on and off.
This attribute also carries over to the healthcare industry. FL’s capacity for personalisation on individual IoT devices can cater to the unique nature of health. As the health of individuals can differ wildly from patient to patient, allowing each device to be alert to different patterns, such as vital signs, activity levels or health conditions, can result in more nuanced monitoring of patients. This collaboration helps enhance the accuracy and robustness of the final global model, which enables improved healthcare predictions, anomaly detection or disease diagnosis.
“If you think about that example of patients’ wearables, you may have different classes of patients. So, if you had 1000 patients and you put them into 10 classes, from very vulnerable to not vulnerable, you could apply some model parameters to those groups of patterns with things you'd expect to see. You would then be able to use those parameters to get some devices to act specifically for one group as opposed to treating all the people the same,” says Hancock.
These applications show that, although privacy is a big draw of FL, industries with less stringent privacy requirements can still make use of it. FL’s ability to apply the collective knowledge of a range of IoT devices means that agriculture, industry and even concepts like smart cities all stand to benefit from FL-enabled IoT devices.
Drawbacks of FL
“We're not prescribing all use cases take a federated approach; what we're looking at doing is solving issues around privacy and security of that data,” Hancock explains.
Currently there exist some drawbacks to FL that may explain why it is not more widely used among IoT devices.
In FL, local devices need to perform computationally intensive tasks, such as model training and optimisation, which can put a strain on their resources. Devices with limited computational capabilities or energy constraints may struggle to efficiently execute the necessary computations, leading to slower training or decreased device performance.
Equally, what is a pro of FL can also be a con. The heterogeneity of devices and data sources, and of the models created from them, can make it challenging to aggregate them into global model updates. Differences in data quality, device capabilities or data distribution can all make it harder to achieve optimal model performance and convergence.
Even the lack of centralised control isn’t always positive: the central server has limited visibility of, and control over, the training process on individual devices, making it more challenging to monitor and ensure the quality, integrity and compliance of training across all participating devices.
Yet the benefits of such a system in IoT devices are evident despite the potential drawbacks. A report by Gartner estimated that 75% of enterprise-generated data will be created and processed at the Edge, meaning industry is seeing the value of more decisions being made closer to the device and, as Hancock explains, the increase in Edge use cases creates more opportunity for FL to be applied. So as IoT becomes more entwined with the Edge, FL as a means of security may become more necessary and more ubiquitous.