Using deep learning methods to tailor sleep scoring to specific populations

Quality of sleep significantly impacts physical health, mental health, and cognitive performance. A key tool to probe sleep architecture is the scoring of EEG sleep recordings into the five sleep stages defined by the American Academy of Sleep Medicine (AASM). Traditionally, such sleep scoring is a laborious task performed by trained experts using guidelines defined in terms of polysomnography (PSG) recordings.

This has inspired the development of automatic sleep scoring algorithms that have achieved human-level performance by applying deep learning models to large datasets of PSG recordings. However, such models may perform significantly worse on unseen patient groups and on recordings made with different devices. Unsurprisingly, models trained on PSG input benefit from adaptation when scoring ear-EEG recordings, but even a change of recording equipment within the same modality may cause such degradation. More broadly, sleep differs between patients based on health, age, and individual traits.

Figure 1: t-SNE visualization of learned representations of 3 nights of ear-EEG sleep recordings for 3 subjects, color coded by the sleep stages predicted by the model. Note that while there is variation between nights of one subject, the variation appears greater between subjects.

Improving sleep scoring models using transfer learning

Deep learning models learn useful, complex representations of the data through a sequence of simple transformations. The process of jointly learning these simple transformations is called training the network and is done by updating the weights of the network so that its predictions tend to match the labels of the provided examples, i.e., the training data. Thus, the nature and quality of the training data play a central role in how the model generalizes, i.e., how it performs on unseen data. Formally, we describe the domain of the training data as consisting of the input space – characterizing the numerical values the data may take – and the marginal distribution that characterizes which values the data is likely to take. Applying a trained model to data that does not match the training data in these components is likely to degrade performance significantly. This phenomenon is called domain mismatch, and transfer learning methods that deal with this issue are categorized as domain adaptation methods.
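To make the idea of domain mismatch concrete, the following is a minimal sketch on synthetic data, not the project's actual models or recordings. A small logistic regression classifier stands in for a deep network, and the function names, the two-feature data, and the `shift` parameter are all assumptions made for illustration. The classifier is trained on a "source" domain and then applied to a "target" domain whose inputs follow a shifted marginal distribution. The last step shows one very simple, unsupervised form of domain adaptation: matching the mean of the target inputs to the mean of the source inputs.

```python
import numpy as np

def make_domain(rng, n, shift=0.0):
    """Synthetic two-class data; `shift` moves the marginal distribution of the inputs."""
    y = np.arange(n) % 2                    # balanced labels 0/1
    centers = (2.0 * y - 1.0)[:, None]      # class means at -1 and +1 on both features
    x = rng.normal(loc=centers, scale=0.5, size=(n, 2))
    return x + shift, y

def train_logreg(x, y, lr=0.1, steps=500):
    """Plain gradient descent on the logistic loss (stand-in for network training)."""
    w, b = np.zeros(x.shape[1]), 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(x @ w + b)))  # predicted probability of class 1
        err = p - y
        w -= lr * (x.T @ err) / len(y)
        b -= lr * err.mean()
    return w, b

def accuracy(w, b, x, y):
    return float(np.mean((x @ w + b > 0) == (y == 1)))

rng = np.random.default_rng(0)
x_src, y_src = make_domain(rng, 1000)             # domain the model is trained on
x_tgt, y_tgt = make_domain(rng, 1000, shift=1.5)  # same task, shifted inputs

w, b = train_logreg(x_src, y_src)
acc_src = accuracy(w, b, x_src, y_src)
acc_tgt = accuracy(w, b, x_tgt, y_tgt)

# Simple unsupervised alignment: match the target mean to the source mean.
# No target labels are used in this step.
x_aligned = x_tgt - x_tgt.mean(axis=0) + x_src.mean(axis=0)
acc_aligned = accuracy(w, b, x_aligned, y_tgt)

print(f"source: {acc_src:.2f}  target: {acc_tgt:.2f}  aligned target: {acc_aligned:.2f}")
```

In this toy setting the accuracy on the shifted target data should be noticeably lower than on the source data, and the mean alignment should recover most of the loss. Real mismatches between recording modalities or patient groups are far less simple, so this only illustrates the mechanism.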

The domain of the data is not the only component of the problem. In the case of a classification problem, the labels we may assign to a data point and the target mapping from input data points to these labels constitute the task. Transfer learning also encompasses methods that adapt models trained on a different task. One example is adapting a model trained to predict whether two EEG epochs are close together in the signal to instead detect sleep stages. The fundamental idea is that learned representations that are useful for one problem might provide a useful starting point for another problem.
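As an illustration of such a different task, the sketch below shows one way the training pairs for the "are these two epochs close together?" problem could be generated without any sleep stage labels. The function name and the thresholds `tau_pos` and `tau_neg` are hypothetical choices for this example, not values taken from the project. A pair is labeled 1 if the two epochs are at most `tau_pos` epochs apart and 0 if they are more than `tau_neg` epochs apart; distances in between are never sampled.

```python
import numpy as np

def sample_position_pairs(n_epochs, n_pairs, tau_pos, tau_neg, rng):
    """Sample epoch index pairs with label 1 (close in time) or 0 (far apart)."""
    # Guarantees that every anchor has at least one "far" candidate.
    assert 1 <= tau_pos <= tau_neg and n_epochs > 2 * tau_neg + 1
    all_idx = np.arange(n_epochs)
    labels = rng.integers(0, 2, size=n_pairs)   # roughly balanced pretext labels
    pairs = np.empty((n_pairs, 2), dtype=int)
    for k, label in enumerate(labels):
        anchor = rng.integers(0, n_epochs)
        dist = np.abs(all_idx - anchor)
        if label == 1:
            candidates = all_idx[(dist > 0) & (dist <= tau_pos)]
        else:
            candidates = all_idx[dist > tau_neg]
        pairs[k] = (anchor, rng.choice(candidates))
    return pairs, labels

rng = np.random.default_rng(0)
# Assumed example: an 8-hour night split into 30-second epochs gives 960 epochs.
pairs, labels = sample_position_pairs(n_epochs=960, n_pairs=1000,
                                      tau_pos=2, tau_neg=30, rng=rng)
print(pairs[:3], labels[:3])
```

The index pairs would then be used to pick the corresponding epochs from a recording, and a network trained to predict `labels` from each pair could later have its learned representation reused for sleep staging.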

If used appropriately, such methods can increase performance, reduce the need for training data and greatly speed up the process of training a sleep scoring model.

Application to sleep scoring with ear-EEG

Most existing EEG sleep databases consist of PSG recordings. Applying deep learning models trained on such recordings to ear-EEG sleep recordings necessarily introduces a domain mismatch – both the number of electrodes and electrode placement differ. Mismatch of this nature can even occur between different recording equipment of the same modality. Our goal is to learn how to best alleviate such a mismatch with as little labeled data as possible.

Beyond handling equipment-related mismatch, we also aim to discover the best practice for adapting sleep scoring models to patient groups for which only a small amount of labeled data is available. This is an interesting problem in itself, and solving it would also help lower the barrier to entry of such models into clinical settings.

We expect wearable EEG devices like ear-EEG to enable the recording of sleep at a much larger scale and longitudinally. In this context, the problem of adapting a model to a particular individual, i.e., personalization, becomes feasible both economically and technically. We aim to investigate the benefits, if any, of personalizing sleep scoring models and the best approaches to doing so.
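One common way to approach personalization is fine-tuning: starting from a model trained on pooled data and continuing training on a small labeled set from the individual. The sketch below illustrates this on synthetic data with a logistic regression classifier standing in for a deep network. It is an assumption-laden toy example, not the approach chosen in this project; the function names, the `shift` that makes the individual differ from the population, and the choice of 20 labeled examples are all hypothetical.

```python
import numpy as np

def make_data(rng, n, shift=0.0):
    """Synthetic two-class data; `shift` makes an individual differ from the population."""
    y = np.arange(n) % 2                    # balanced labels 0/1
    centers = (2.0 * y - 1.0)[:, None]
    x = rng.normal(loc=centers, scale=0.5, size=(n, 2))
    return x + shift, y

def fit(x, y, w, b, lr=0.1, steps=500):
    """Gradient descent on the logistic loss, starting from the given weights."""
    w, b = np.array(w, dtype=float), float(b)   # copy, so the inputs stay unchanged
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(x @ w + b)))
        err = p - y
        w -= lr * (x.T @ err) / len(y)
        b -= lr * err.mean()
    return w, b

def accuracy(w, b, x, y):
    return float(np.mean((x @ w + b > 0) == (y == 1)))

rng = np.random.default_rng(1)
x_pop, y_pop = make_data(rng, 2000)                # pooled population data
x_few, y_few = make_data(rng, 20, shift=1.5)       # few labeled examples from one individual
x_test, y_test = make_data(rng, 1000, shift=1.5)   # held-out data from the same individual

# Train the population model from scratch, then fine-tune it on the individual.
w_pop, b_pop = fit(x_pop, y_pop, np.zeros(2), 0.0)
w_per, b_per = fit(x_few, y_few, w_pop, b_pop, steps=300)

acc_before = accuracy(w_pop, b_pop, x_test, y_test)
acc_after = accuracy(w_per, b_per, x_test, y_test)
print(f"population model: {acc_before:.2f}  personalized model: {acc_after:.2f}")
```

In this toy setting, a few hundred gradient steps on 20 labeled examples should be enough to improve on the population model. Whether, and by how much, this carries over to real sleep recordings is exactly the kind of question the project sets out to investigate.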