IEEE Transactions on Emerging Topics in Computational Intelligence, volume 8, issue 2, pages 1156-1166

Enhancing Low-Density EEG-Based Brain-Computer Interfacing With Similarity-Keeping Knowledge Distillation

Publication type: Journal Article
Publication date: 2024-04-01
Scimago: Q1
SJR: 1.894
CiteScore: 10.3
Impact factor: 5.3
ISSN: 2471-285X
Computer Science Applications
Computational Mathematics
Artificial Intelligence
Control and Optimization
Yang Z., Li Z., Shao M., Shi D., Yuan Z., Yuan C.
2022-11-02 citations by CoLab: 118 Abstract  
Knowledge distillation has been applied successfully to various tasks. Current distillation algorithms usually improve a student's performance by having it imitate the teacher's output. This paper shows that teachers can also improve a student's representation power by guiding the student's feature recovery. From this point of view, we propose Masked Generative Distillation (MGD), which is simple: we mask random pixels of the student's feature and force it to generate the teacher's full feature through a simple block. MGD is a truly general feature-based distillation method that can be applied to various tasks, including image classification, object detection, semantic segmentation, and instance segmentation. We experiment with different models on extensive datasets, and the results show that all the students achieve excellent improvements. Notably, we boost ResNet-18 from 69.90% to 71.69% ImageNet top-1 accuracy, RetinaNet with a ResNet-50 backbone from 37.4 to 41.0 bounding-box mAP, SOLO based on ResNet-50 from 33.1 to 36.2 mask mAP, and DeepLabV3 based on ResNet-18 from 73.20 to 76.02 mIoU. Our code is available at https://github.com/yzd-v/MGD.
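The masking-and-regeneration recipe in the abstract above can be sketched in a few lines; the identity `generation_block` is a placeholder assumption (the paper uses a small learned module), and the 1-D feature lists are purely illustrative:

```python
import random

def generation_block(feat):
    # Stand-in for the paper's "simple block"; a real implementation
    # would be a small learned module (e.g. a couple of conv layers).
    return feat

def mgd_loss(student_feat, teacher_feat, mask_ratio=0.5, seed=0):
    # Zero out random positions of the student feature.
    rng = random.Random(seed)
    masked = [0.0 if rng.random() < mask_ratio else s for s in student_feat]
    # Ask the generation block to recover the teacher's FULL feature
    # from the masked student feature; penalise with mean squared error.
    generated = generation_block(masked)
    return sum((g - t) ** 2 for g, t in zip(generated, teacher_feat)) / len(teacher_feat)
```

With `mask_ratio=0` and matching features the loss is zero; raising the ratio forces the generator to do real recovery work.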
Li R., Wang L., Sourina O.
Methods scimago Q1 wos Q2
2022-06-01 citations by CoLab: 37 Abstract  
Situation awareness (SA) has received much attention in recent years because of its importance for operators of dynamic systems. Electroencephalography (EEG) can be used to measure mental states of operators related to SA. However, cross-subject EEG-based SA recognition is a critical challenge, as data distributions of different subjects vary significantly. Subject variability is considered a domain shift problem. Several attempts have been made to find domain-invariant features among subjects, but they neglect subject-specific information. In this work, we propose a simple but efficient subject-matching framework that finds a connection between a target (test) subject and source (training) subjects. Specifically, the framework includes two stages: (1) we train the model with multi-source domain alignment layers to collect source-domain statistics; (2) during testing, a distance is computed to perform subject matching in the latent representation space. We use a reciprocal exponential function as a similarity measure to dynamically select similar source subjects. Experimental results show that our framework achieves a state-of-the-art accuracy of 74.32% on the Taiwan driving dataset.
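The reciprocal exponential similarity mentioned above amounts to weighting each source subject by 1/exp(d) of its latent-space distance d to the target; the normalisation step below is an added illustrative choice, not a detail from the paper:

```python
import math

def subject_weights(distances):
    # Reciprocal exponential similarity: weight 1/exp(d) = exp(-d),
    # so source subjects close to the target dominate the match.
    sims = [1.0 / math.exp(d) for d in distances]
    # Normalising to sum to 1 is an added illustrative step.
    total = sum(sims)
    return [s / total for s in sims]
```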
Wang L., Yoon K.
2022-06-01 citations by CoLab: 419 Abstract  
Deep neural models, in recent years, have been successful in almost every field, even solving the most complex problem statements. However, these models are huge in size with millions (and even billions) of parameters, demanding heavy computation power and failing to be deployed on edge devices. Moreover, the performance boost is highly dependent on redundant labeled data. To achieve faster speeds and to handle the problems caused by the lack of labeled data, knowledge distillation (KD) has been proposed to transfer information learned from one model to another. KD is often characterized by the so-called 'Student-Teacher' (S-T) learning framework and has been broadly applied in model compression and knowledge transfer. This paper surveys KD and S-T learning, which have been actively studied in recent years. First, we aim to explain what KD is and how/why it works. Then, we provide a comprehensive survey of the recent progress of KD methods together with S-T frameworks typically used for vision tasks. In general, we investigate some fundamental questions that have been driving this research area and thoroughly generalize the research progress and technical details. Additionally, we systematically analyze the research status of KD in vision applications. Finally, we discuss the potentials and open challenges of existing methods and prospect the future directions of KD and S-T learning.
Aguilar-Herrera A.J., Delgado-Jimenez E.A., Candela-Leal M.O., Olivas-Martinez G., Alvarez-Espinosa G.J., Ramirez-Moreno M.A., Lozoya-Santos J.D., Ramirez-Mendoza R.A.
2021-12-15 citations by CoLab: 4 Abstract  
This work presents a real-time biofeedback tool that employs wearables and the Internet of Things with educational applications to improve students' learning and retention. We aimed to create a web platform using the Internet of Things (IoT) and Machine Learning (ML) architecture to predict students' performance, analyze mental fatigue, and provide real-time quantitative biofeedback to identify the best learning modality. Thus, the main goal was to develop a system that allows students to learn and improve their projects. We integrated the analysis of real-time biometric signals, machine learning algorithms, and web services as we observed their behavior under different learning modalities, seeking to improve cognitive performance. For this, 23 volunteers filled out the ten-question Fatigue Assessment Scale questionnaire about mental fatigue, validated with the P300 waves acquired during auditory-oddball (AO) tests. Synchronized data acquisition was achieved using Enophones and an E4 wristband. To develop predictive models, we collected the biometric data and incorporated it into an ML algorithm to visualize students' performance in real time. The system can accommodate other wearable systems with new features in further experiments. Thus, we believe this current development has the potential to further revolutionize traditional teaching with this methodology and future enhancements.
Zhu Y., Wang Y.
2021-10-01 citations by CoLab: 55 Abstract  
Knowledge distillation (KD) transfers the dark knowledge from cumbersome (teacher) networks to lightweight (student) networks and expects the student to achieve more promising performance than training without the teacher's knowledge. However, a counter-intuitive observation is that better teachers do not make better students, due to the capacity mismatch. To this end, we present a novel adaptive knowledge distillation method to complement traditional approaches. The proposed method, named Student Customized Knowledge Distillation (SCKD), examines the capacity mismatch between teacher and student from the perspective of gradient similarity. We formulate knowledge distillation as a multi-task learning problem so that the teacher transfers knowledge to the student only if the student can benefit from learning such knowledge. We validate our method on multiple datasets with various teacher-student configurations on image classification, object detection, and semantic segmentation.
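The gradient-similarity gating idea can be illustrated as follows; using the sign of the cosine similarity as a hard gate is a simplification of the paper's multi-task formulation, and the flat gradient vectors are illustrative:

```python
import math

def cosine(u, v):
    # Cosine similarity between two flattened gradient vectors.
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def gated_update(task_grad, kd_grad):
    # Apply the distillation gradient only when it points in a
    # direction compatible with the task gradient (positive cosine);
    # otherwise the student trains on the task gradient alone.
    if cosine(task_grad, kd_grad) > 0:
        return [t + k for t, k in zip(task_grad, kd_grad)]
    return list(task_grad)
```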
Zhang G., Etemad A.
2021-09-28 citations by CoLab: 14 Abstract  
EEG-based emotion recognition often requires sufficient labeled training samples to build an effective computational model. Labeling EEG data, on the other hand, is often expensive and time-consuming. To tackle this problem and reduce the need for output labels in the context of EEG-based emotion recognition, we propose a semi-supervised pipeline to jointly exploit both unlabeled and labeled data for learning EEG representations. Our semi-supervised framework consists of both unsupervised and supervised components. The unsupervised part maximizes the consistency between original and reconstructed input data using an autoencoder, while simultaneously the supervised part minimizes the cross-entropy between the input and output labels. We evaluate our framework using both a stacked autoencoder and an attention-based recurrent autoencoder. We test our framework on the large-scale SEED EEG dataset and compare our results with several other popular semi-supervised methods. Our semi-supervised framework with a deep attention-based recurrent autoencoder consistently outperforms the benchmark methods, even when only small subsets (3%, 5% and 10%) of the output labels are available during training, achieving a new state-of-the-art semi-supervised performance.
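The two-part objective described above (a reconstruction term on every sample, a cross-entropy term only where a label exists) can be sketched as follows; the weighting `alpha` is an illustrative knob, not a value from the paper:

```python
import math

def semi_supervised_loss(x, x_hat, probs=None, label=None, alpha=1.0):
    # Unsupervised term: autoencoder reconstruction error, computed
    # for every sample, labelled or not.
    recon = sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)
    if label is None:
        return alpha * recon
    # Supervised term: cross-entropy, added only when a label exists.
    return alpha * recon - math.log(probs[label])
```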
Strypsteen T., Bertrand A.
Journal of Neural Engineering scimago Q1 wos Q2
2021-07-20 citations by CoLab: 38 Abstract  
Objective: To develop an efficient, embedded electroencephalogram (EEG) channel selection approach for deep neural networks, allowing us to match the channel selection to the target model while avoiding the large computational burden of wrapper approaches used in conjunction with neural networks. Approach: We employ a concrete selector layer to jointly optimize the EEG channel selection and the network parameters. This layer uses the Gumbel-softmax trick to build continuous relaxations of the discrete parameters involved in the selection process, allowing them to be learned in an end-to-end manner with traditional backpropagation. As the selection layer was often observed to include the same channel twice in a certain selection, we propose a regularization function to mitigate this behavior. We validate this method on two different EEG tasks: motor execution and auditory attention decoding. For each task, we compare the performance of the Gumbel-softmax method with a baseline EEG channel selection approach tailored to that specific task: mutual information and greedy forward selection with the utility metric, respectively. Main results: Our experiments show that the proposed framework is generally applicable while performing at least as well as (and often better than) these state-of-the-art, task-specific approaches. Significance: The proposed method offers an efficient, task- and model-independent approach to jointly learning the optimal EEG channels along with the neural network weights.
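The Gumbel-softmax relaxation at the heart of the concrete selector layer can be sketched like this; the function returns soft channel weights rather than a hard selection, which is exactly what makes the choice trainable by backpropagation:

```python
import math
import random

def gumbel_softmax(logits, temperature=1.0, seed=0):
    # Sample standard Gumbel noise for each channel logit.
    rng = random.Random(seed)
    gumbels = [-math.log(-math.log(rng.random())) for _ in logits]
    # A temperature-controlled softmax over the perturbed logits gives
    # a differentiable, nearly one-hot channel selection; lowering the
    # temperature sharpens it toward a discrete pick.
    scores = [(l + g) / temperature for l, g in zip(logits, gumbels)]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]
```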
Wang Z., Gu T., Zhu Y., Li D., Yang H., Du W.
2021-07-01 citations by CoLab: 34 Abstract  
Current research on EEG emotion recognition has some limitations, such as hand-engineered features, redundant and meaningless signal frames, and the loss of frame-to-frame correlation. In this paper, a novel deep learning framework is proposed, named the frame-level distilling neural network (FLDNet), for learning distilled features from the correlations of different frames. A layer named the frame gate is designed to integrate weighted semantic information over multiple frames to remove redundant and meaningless signal frames. A triple-net structure is introduced to distill the learned features net by net to replace hand-engineered features that require professional knowledge. Specifically, one neural network is first trained normally for several epochs. Then, a second network of the same structure is initialized to learn the extracted features from the frame gate of the first neural network based on the first net's output. Similarly, the third net improves the features based on the frame gate of the second network. To utilize the representation ability of the triple network, an ensemble layer is employed to integrate the discriminative ability of the proposed framework for final decisions. Consequently, the proposed FLDNet provides an effective method for capturing the correlations between different frames and automatically learning distilled high-level features for emotion recognition. Experiments are carried out on a subject-independent emotion recognition task on the public DEAP and DREAMER emotion benchmarks, demonstrating the effectiveness and robustness of the proposed FLDNet.
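The frame-gate idea, weighting frames so that redundant ones contribute little to the pooled feature, can be sketched as follows; in FLDNet the gate scores come from a learned sub-network, whereas here they are simply inputs:

```python
import math

def frame_gate(frame_feats, gate_logits):
    # Softmax the per-frame gate scores into weights.
    m = max(gate_logits)
    exps = [math.exp(g - m) for g in gate_logits]
    z = sum(exps)
    weights = [e / z for e in exps]
    # Weighted integration across frames: frames with low gate scores
    # (redundant or meaningless ones) contribute little to the result.
    return [sum(w * f[i] for w, f in zip(weights, frame_feats))
            for i in range(len(frame_feats[0]))]
```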
Chen P., Liu S., Zhao H., Jia J.
2021-06-01 citations by CoLab: 300 Abstract  
Knowledge distillation transfers knowledge from the teacher network to the student network, with the goal of greatly improving the student's performance. Previous methods mostly focus on proposing feature transformations and loss functions between same-level features to improve effectiveness. We instead study the connection paths across levels between the teacher and student networks and reveal their great importance. For the first time in knowledge distillation, cross-stage connection paths are proposed. Our new review mechanism is effective and structurally simple. Our final nested and compact framework requires negligible computation overhead and outperforms other methods on a variety of tasks. We apply our method to classification, object detection, and instance segmentation tasks. All of them witness significant student network performance improvements.
Chen D., Mei J., Zhang Y., Wang C., Wang Z., Feng Y., Chen C.
2021-05-18 citations by CoLab: 181 Abstract  
Recently proposed knowledge distillation approaches based on feature-map transfer validate that intermediate layers of a teacher model can serve as effective targets for training a student model to obtain better generalization ability. Existing studies mainly focus on particular representation forms for knowledge transfer between manually specified pairs of teacher-student intermediate layers. However, semantics of intermediate layers may vary in different networks and manual association of layers might lead to negative regularization caused by semantic mismatch between certain teacher-student layer pairs. To address this problem, we propose Semantic Calibration for Cross-layer Knowledge Distillation (SemCKD), which automatically assigns proper target layers of the teacher model for each student layer with an attention mechanism. With a learned attention distribution, each student layer distills knowledge contained in multiple layers rather than a single fixed intermediate layer from the teacher model for appropriate cross-layer supervision in training. Consistent improvements over state-of-the-art approaches are observed in extensive experiments with various network architectures for teacher and student models, demonstrating the effectiveness and flexibility of the proposed attention based soft layer association mechanism for cross-layer distillation.
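The attention-based soft layer association can be illustrated with a dot-product score standing in for the paper's learned attention; equal feature dimensions across layers are assumed for simplicity:

```python
import math

def layer_attention(student_feat, teacher_feats):
    # Score each candidate teacher layer against the student layer
    # (dot product stands in for SemCKD's learned attention).
    scores = [sum(s * t for s, t in zip(student_feat, tf)) for tf in teacher_feats]
    m = max(scores)
    exps = [math.exp(sc - m) for sc in scores]
    z = sum(exps)
    weights = [e / z for e in exps]
    # The distillation target is the attention-weighted mix of teacher
    # layers, so no single manually chosen layer supervises the student.
    target = [sum(w * tf[i] for w, tf in zip(weights, teacher_feats))
              for i in range(len(student_feat))]
    return weights, target
```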
Gou J., Yu B., Maybank S.J., Tao D.
2021-03-22 citations by CoLab: 1810 Abstract  
In recent years, deep neural networks have been successful in both industry and academia, especially for computer vision tasks. The great success of deep learning is mainly due to its scalability to encode large-scale data and to maneuver billions of model parameters. However, it is a challenge to deploy these cumbersome deep models on devices with limited resources, e.g., mobile phones and embedded devices, not only because of the high computational complexity but also the large storage requirements. To this end, a variety of model compression and acceleration techniques have been developed. As a representative type of model compression and acceleration, knowledge distillation effectively learns a small student model from a large teacher model. It has received rapidly increasing attention from the community. This paper provides a comprehensive survey of knowledge distillation from the perspectives of knowledge categories, training schemes, teacher-student architectures, distillation algorithms, performance comparison, and applications. Furthermore, challenges in knowledge distillation are briefly reviewed and directions for future research are discussed.
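As a concrete anchor for the response-based knowledge the survey categorises, here is the canonical distillation objective (KL divergence between temperature-softened teacher and student distributions) in a minimal pure-Python sketch:

```python
import math

def softened(logits, T):
    # Temperature-softened softmax distribution.
    m = max(logits)
    exps = [math.exp((l - m) / T) for l in logits]
    z = sum(exps)
    return [e / z for e in exps]

def kd_loss(student_logits, teacher_logits, T=4.0):
    # KL divergence between the softened teacher and student
    # distributions, scaled by T^2 so gradient magnitudes stay
    # comparable across temperatures.
    p = softened(teacher_logits, T)
    q = softened(student_logits, T)
    return T * T * sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))
```

In practice this term is combined with the ordinary cross-entropy on the hard labels.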
Chen Z., Zheng X., Shen H., Zeng Z., Zhou Y., Zhao R.
2020-11-02 citations by CoLab: 9 Abstract  
Most previous knowledge distillation frameworks train the student to mimic the teacher's output on each sample or transfer cross-sample relations from the teacher to the student. Nevertheless, they neglect structured relations at the category level. In this paper, a novel Category Structure is proposed to transfer category-level structured relations for knowledge distillation. It models two structured relations, intra-category structure and inter-category structure, which are intrinsic to the relations between samples. Intra-category structure penalizes the structured relations among samples from the same category, while inter-category structure focuses on cross-category relations at the category level. Transferring category structure from the teacher to the student supplements category-level structured relations for training a better student. Extensive experiments show that our method groups samples from the same category more tightly in the embedding space, and the superiority of our method over closely related works is validated on different datasets and models.
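One way to picture category-level structure transfer: match the student's pairwise-relation matrix for a category's samples to the teacher's. The Euclidean distance and squared-error matching below are illustrative choices; the paper's exact relation functions may differ:

```python
import math

def pairwise_distances(feats):
    # Intra-category structure: pairwise distances among the samples
    # of one category (Euclidean, as an illustrative choice).
    n = len(feats)
    return [[math.dist(feats[i], feats[j]) for j in range(n)] for i in range(n)]

def structure_loss(student_feats, teacher_feats):
    # Match the student's relation matrix to the teacher's, so
    # category-level structure (not just per-sample outputs) transfers.
    s = pairwise_distances(student_feats)
    t = pairwise_distances(teacher_feats)
    n = len(s)
    return sum((s[i][j] - t[i][j]) ** 2 for i in range(n) for j in range(n)) / (n * n)
```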
Li Y., Yang H., Li J., Chen D., Du M.
Neurocomputing scimago Q1 wos Q1
2020-11-01 citations by CoLab: 73 Abstract  
Electroencephalography (EEG) based Brain-Computer Interfaces (BCIs) enable subjects to communicate with the outside world or control equipment using brain signals, without passing through muscles and nerves. Many researchers in recent years have studied non-invasive BCI systems. However, the efficiency of the intention-decoding algorithm is affected by the random, non-stationary, and low signal-to-noise-ratio characteristics of the EEG signal. Furthermore, channel selection is another important issue in intention recognition for BCI systems: the unnecessary information produced by redundant electrodes lowers the decoding rate and depletes system resources. In this paper, we introduce a recurrent-convolutional neural network model for intention recognition that learns decomposed spatio-temporal representations. We apply the Gradient-weighted Class Activation Mapping (Grad-CAM) visualization technique to channel selection. Grad-CAM uses the gradient of any classification, flowing into the last convolutional layer, to produce a coarse localization map. Since the pixels of the localization map correspond to the spatial regions where the electrodes are placed, we select the channels that are more important for decision-making. We conduct an experiment on the public motor imagery EEG dataset EEGMMIDB. The experimental results demonstrate that our method achieves an accuracy of 97.36% with the full channel set, outperforming many state-of-the-art and baseline models. Although the decoding rate of our model matches that of the best compared model, our model has fewer parameters and a faster training time. After channel selection, our model maintains an intention-decoding performance of 92.31% while reducing the number of channels by nearly half, saving system resources. Our method achieves an optimal trade-off between performance and the number of electrode channels for EEG intention decoding.
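The Grad-CAM computation used above for channel selection can be sketched as follows: gradients are globally average-pooled into per-map weights, the weighted maps are summed, and negative values are clipped. The `[channels][positions]` list layout is purely illustrative:

```python
def grad_cam(activations, gradients):
    # Weight each feature map by its spatially averaged gradient
    # (global average pooling of the gradients).
    n = len(activations[0])
    weights = [sum(g) / n for g in gradients]
    # The weighted sum of the maps, clipped at zero (ReLU), is the
    # localization map; electrodes under high-valued regions are kept.
    return [max(0.0, sum(w * a[i] for w, a in zip(weights, activations)))
            for i in range(n)]
```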
Mane R., Chouhan T., Guan C.
Journal of Neural Engineering scimago Q1 wos Q2
2020-08-01 citations by CoLab: 289 Abstract  
Stroke is one of the leading causes of long-term disability among adults and contributes to major socio-economic burden globally. Stroke frequently results in multifaceted impairments including motor, cognitive and emotion deficits. In recent years, brain-computer interface (BCI)-based therapy has shown promising results for post-stroke motor rehabilitation. In spite of the success received by BCI-based interventions in the motor domain, non-motor impairments are yet to receive similar attention in research and clinical settings. Some preliminary encouraging results in post-stroke cognitive rehabilitation using BCI seem to suggest that it may also hold potential for treating non-motor deficits such as cognitive and emotion impairments. Moreover, past studies have shown an intricate relationship between motor, cognitive and emotion functions which might influence the overall post-stroke rehabilitation outcome. A number of studies highlight the inability of current treatment protocols to account for the implicit interplay between motor, cognitive and emotion functions. This indicates the necessity to explore an all-inclusive treatment plan targeting the synergistic influence of these standalone interventions. This approach may lead to better overall recovery than treating the individual deficits in isolation. In this paper, we review the recent advances in BCI-based post-stroke motor rehabilitation and highlight the potential for the use of BCI systems beyond the motor domain, in particular, in improving cognition and emotion of stroke patients. Building on the current results and findings of studies in individual domains, we next discuss the possibility of a holistic BCI system for motor, cognitive and affect rehabilitation which may synergistically promote restorative neuroplasticity.
Such a system would provide an all-encompassing rehabilitation platform, leading to overarching clinical outcomes and transfer of these outcomes to a better quality of living. This is one of the first works to analyse the possibility of targeting cross-domain influence of post-stroke functional recovery enabled by BCI-based rehabilitation.
Gottlibe M., Rosen O., Weller B., Mahagney A., Omar N., Khuri A., Srugo I., Genizi J.
Neurophysiologie Clinique scimago Q2 wos Q2
2020-02-01 citations by CoLab: 39 Abstract  
Objective: Changes in EEG patterns during stroke are almost immediate; however, a full EEG test takes time and requires highly qualified staff. In this study, we examined whether a short recording using a portable EEG device can differentiate between a stroke group and a control group. Methods: EEG samples were collected from patients with an acute ischemic stroke event. The control group comprised healthy volunteers. EEG recordings were made using a portable brain-wave sensor device. The Revised Brain Symmetry Index (rsBSI) was used to quantify the symmetry of spectral power between the two hemispheres. Results: The investigation group included 33 patients (ages 46–96, mean age 72 years, 66% male) diagnosed with ischemic stroke. The control group included 25 healthy individuals. Scores for the rsBSI of non-stroke patients (M = 0.1686, SD = 0.10) differed significantly from those of ischemic stroke patients (P < …). Conclusions: A statistically significant difference was observed between a group of stroke patients and a matched group of healthy controls with a short recording using a portable EEG device.
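A minimal sketch of a brain symmetry index in the spirit of the study above: the normalised left-right spectral-power difference, averaged over frequency bins. The paper's exact rsBSI definition may differ in detail:

```python
def brain_symmetry_index(left_power, right_power):
    # Normalised left-right spectral-power difference per frequency
    # bin, averaged over bins: 0 = perfect hemispheric symmetry,
    # values approaching 1 = strong asymmetry (as after a stroke).
    ratios = [abs(r - l) / (r + l) for l, r in zip(left_power, right_power)]
    return sum(ratios) / len(ratios)
```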