Open Access
Journal of Engineering Research

A Trust-based Global Expert System for Disease Diagnosis Using Hierarchical Federated Learning

Farah M. Al-Mulla
Mohammed A. Almulla
Publication type: Journal Article
Publication date: 2025-03-15
scimago Q3
wos Q3
SJR: 0.232
CiteScore: 1.6
Impact factor: 0.9
ISSN: 2307-1877, 2307-1885, 2764-1317
Zhang F., Kreuter D., Chen Y., Dittmer S., Tull S., Shadbahr T., Schut M., Asselbergs F., Kar S., Sivapalaratnam S., Williams S., Koh M., Henskens Y., de Wit B., D’Alessandro U., et al.
Patterns scimago Q1 wos Q1 Open Access
2024-06-14 citations by CoLab: 7 Abstract  
For healthcare datasets, it is often impossible to combine data samples from multiple sites due to ethical, privacy, or logistical concerns. Federated learning allows for the utilization of powerful machine learning algorithms without requiring the pooling of data. Healthcare data have many simultaneous challenges, such as highly siloed data, class imbalance, missing data, distribution shifts, and non-standardized variables, that require new methodologies to address. Federated learning adds significant methodological complexity to conventional centralized machine learning, requiring distributed optimization, communication between nodes, aggregation of models, and redistribution of models. In this systematic review, we consider all papers on Scopus published between January 2015 and February 2023 that describe new federated learning methodologies for addressing challenges with healthcare data. We reviewed 89 papers meeting these criteria. Significant systemic issues were identified throughout the literature, compromising many methodologies reviewed. We give detailed recommendations to help improve methodology development for federated learning in healthcare.
Hemalatha J., Sekar M., Kumar C., Gutub A., Sahu A.K.
2023-08-01 citations by CoLab: 30 Abstract  
The success rate for blind or universal steganalysis lies in the ability to extract the statistical footprints of image features. Further, the choice of machine learning (ML) algorithm is crucial to distinguish the stego image more precisely from the untouched clean images. Literature suggests that most steganalysis approaches report less favorable detection accuracy despite considering many features. This study presents a three-step process to accurately identify the clean and stego images to solve this issue. We used the curvelet denoising as an initial phase during the first step to suppress the natural noise residuals (NRs) by producing the stego NRs. Secondly, it extracts the Third-order Markov-chain sample transition probability matrices as features. Finally, the oblique decision tree ensemble using a multisurface proximal support vector machine (SVM) classifier has been utilized to achieve greater detection accuracy than the state-of-the-art classifiers. The experiments are performed on an extensive database comprising clean and stego images generated from nine embedding schemes with varying payloads. The experimental results suggest that an accuracy of 93.12 has been achieved using the proposed Third order subtractive pixel adjacency matrix (SPAM) features with an ensemble classifier.
Arafeh M., Otrok H., Ould-Slimane H., Mourad A., Talhi C., Damiani E.
Internet of Things scimago Q1 wos Q1
2023-07-01 citations by CoLab: 22 Abstract  
Numerous research recently proposed integrating Federated Learning (FL) to address the privacy concerns of using machine learning in privacy-sensitive firms. However, the standards of the available frameworks can no longer sustain the rapid advancement and hinder the integration of FL solutions, which can be prominent in advancing the field. In this paper, we propose ModularFed, a research-focused framework that addresses the complexity of FL implementations and the lack of adaptability and extendability in the available frameworks. We provide a comprehensive architecture that assists FL approaches through well-defined protocols covering three dominant FL paradigms: adaptable workflow, datasets distribution, and third-party application support. Within this architecture, protocols are blueprints that strictly define the framework’s components’ design, contribute to its flexibility, and strengthen its infrastructure. Further, our protocols aim to enable modularity in FL, supporting third-party plug-and-play architecture and dynamic simulators coupled with major built-in data distributors. Additionally, the framework support wrapping multiple approaches in a single environment to enable consistent replication of FL issues such as clients’ deficiency, data distribution, and network latency, which entails a fair comparison of techniques outlying FL technologies. In our evaluation, we examine the applicability of our framework addressing major FL domains, including statistical distribution and modular-based resource monitoring tools and client selection. Moreover, our comparison analysis indicates that our architecture has an inconsiderable impact on performance compared to other approaches.
Khalid N., Qayyum A., Bilal M., Al-Fuqaha A., Qadir J.
2023-05-01 citations by CoLab: 173 Abstract  
There has been an increasing interest in translating artificial intelligence (AI) research into clinically-validated applications to improve the performance, capacity, and efficacy of healthcare services. Despite substantial research worldwide, very few AI-based applications have successfully made it to clinics. Key barriers to the widespread adoption of clinically validated AI applications include non-standardized medical records, limited availability of curated datasets, and stringent legal/ethical requirements to preserve patients' privacy. Therefore, there is a pressing need to improvise new data-sharing methods in the age of AI that preserve patient privacy while developing AI-based healthcare applications. In the literature, significant attention has been devoted to developing privacy-preserving techniques and overcoming the issues hampering AI adoption in an actual clinical environment. To this end, this study summarizes the state-of-the-art approaches for preserving privacy in AI-based healthcare applications. Prominent privacy-preserving techniques such as Federated Learning and Hybrid Techniques are elaborated along with potential privacy attacks, security challenges, and future directions.
Ranchon F., Chanoine S., Lambert-Lacroix S., Bosson J., Moreau-Gaudry A., Bedouch P.
2023-04-01 citations by CoLab: 39 Abstract  
Artificial Intelligence (AI) offers potential opportunities to optimize clinical pharmacy services in community or hospital settings. The objective of this systematic literature review was to identify and analyse quantitative studies using or integrating AI for clinical pharmacy services. A systematic review was conducted using PubMed/Medline and Web of Science databases, including all articles published from 2000 to December 2021. Included studies had to involve pharmacists in the development or use of AI-powered apps and tools. 19 studies using AI for clinical pharmacy services were included in this review. 12 out of 19 articles (63.1%) were published in 2020 or 2021. Various methodologies of AI were used, mainly machine learning techniques and subsets (natural language processing and deep learning). The datasets used to train the models were mainly extracted from electronic medical records (6 studies, 32%). Among clinical pharmacy services, medication order review was the service most targeted by AI-powered apps and tools (9 studies), followed by health product dispensing (4 studies), pharmaceutical interviews and therapeutic education (2 studies). The development of these tools mainly involved hospital pharmacists (12/19 studies). The development of AI-powered apps and tools for clinical pharmacy services is just beginning. Pharmacists need to keep abreast of these developments in order to position themselves optimally while maintaining their human relationships with healthcare teams and patients. Significant efforts have to be made, in collaboration with data scientists, to better assess whether AI-powered apps and tools bring value to clinical pharmacy services in real practice.
Asad M., Aslam M., Jilani S.F., Shaukat S., Tsukada M.
Future Internet scimago Q2 wos Q2 Open Access
2022-11-18 citations by CoLab: 11 PDF Abstract  
Dynamic and smart Internet of Things (IoT) infrastructures allow the development of smart healthcare systems, which are equipped with mobile health and embedded healthcare sensors to enable a broad range of healthcare applications. These IoT applications provide access to the clients’ health information. However, the rapid increase in the number of mobile devices and social networks has generated concerns regarding the secure sharing of a client’s location. In this regard, federated learning (FL) is an emerging paradigm of decentralized machine learning that guarantees the training of a shared global model without compromising the data privacy of the client. To this end, we propose a K-anonymity-based secure hierarchical federated learning (SHFL) framework for smart healthcare systems. In the proposed hierarchical FL approach, a centralized server communicates hierarchically with multiple directly and indirectly connected devices. In particular, the proposed SHFL formulates the hierarchical clusters of location-based services to achieve distributed FL. In addition, the proposed SHFL utilizes the K-anonymity method to hide the location of the cluster devices. Finally, we evaluated the performance of the proposed SHFL by configuring different hierarchical networks with multiple model architectures and datasets. The experiments validated that the proposed SHFL provides adequate generalization to enable network scalability of accurate healthcare systems without compromising the data and location privacy.
Roy P.K., Saumya S., Singh J.P., Banerjee S., Gutub A.
2022-05-04 citations by CoLab: 59 Abstract  
Over the last couple of decades, community question-answering sites (CQAs) have been a topic of much academic interest. Scholars have often leveraged traditional machine learning (ML) and deep learning (DL) to explore the ever-growing volume of content that CQAs engender. To clarify the current state of the CQA literature that has used ML and DL, this paper reports a systematic literature review. The goal is to summarise and synthesise the major themes of CQA research related to (i) questions, (ii) answers and (iii) users. The final review included 133 articles. Dominant research themes include question quality, answer quality, and expert identification. In terms of dataset, some of the most widely studied platforms include Yahoo! Answers, Stack Exchange and Stack Overflow. The scope of most articles was confined to just one platform with few cross-platform investigations. Articles with ML outnumber those with DL. Nonetheless, the use of DL in CQA research is on an upward trajectory. A number of research directions are proposed.
Rai S., Kumari A., Prasad D.K.
AI scimago Q2 wos Q2 Open Access
2022-02-25 citations by CoLab: 20 PDF Abstract  
Federated learning promises an elegant solution for learning global models across distributed and privacy-protected datasets. However, challenges related to skewed data distribution, limited computational and communication resources, data poisoning, and free riding clients affect the performance of federated learning. Selection of the best clients for each round of learning is critical in alleviating these problems. We propose a novel sampling method named the irrelevance sampling technique. Our method is founded on defining a novel irrelevance score that incorporates the client characteristics in a single floating value, which can elegantly classify the client into three numerical sign defined pools for easy sampling. It is a computationally inexpensive, intuitive and privacy preserving sampling technique that selects a subset of clients based on quality and quantity of data on edge devices. It achieves 50–80% faster convergence even in highly skewed data distribution in the presence of free riders based on lack of data and severe class imbalance under both Independent and Identically Distributed (IID) and Non-IID conditions. It shows good performance on practical application datasets.
Liu P., Xu X., Wang W.
Cybersecurity scimago Q1 wos Q1 Open Access
2022-02-02 citations by CoLab: 116 PDF Abstract  
Empirical attacks on Federated Learning (FL) systems indicate that FL is fraught with numerous attack surfaces throughout the FL execution. These attacks can not only cause models to fail in specific tasks, but also infer private information. While previous surveys have identified the risks, listed the attack methods available in the literature or provided a basic taxonomy to classify them, they mainly focused on the risks in the training phase of FL. In this work, we survey the threats, attacks and defenses to FL throughout the whole process of FL in three phases, including Data and Behavior Auditing Phase, Training Phase and Predicting Phase. We further provide a comprehensive analysis of these threats, attacks and defenses, and summarize their issues and taxonomy. Our work considers security and privacy of FL based on the viewpoint of the execution process of FL. We highlight that establishing a trusted FL requires adequate measures to mitigate security and privacy threats at each phase. Finally, we discuss the limitations of current attacks and defense approaches and provide an outlook on promising future research directions in FL.
Abdulrahman S., Tout H., Ould-Slimane H., Mourad A., Talhi C., Guizani M.
IEEE Internet of Things Journal scimago Q1 wos Q1
2021-04-01 citations by CoLab: 456 Abstract  
Driven by privacy concerns and the visions of deep learning, the last four years have witnessed a paradigm shift in the applicability mechanism of machine learning (ML). An emerging model, called federated learning (FL), is rising above both centralized systems and on-site analysis, to be a new fashioned design for ML implementation. It is a privacy-preserving decentralized approach, which keeps raw data on devices and involves local ML training while eliminating data communication overhead. A federation of the learned and shared models is then performed on a central server to aggregate and share the built knowledge among participants. This article starts by examining and comparing different ML-based deployment architectures, followed by in-depth and in-breadth investigation on FL. Compared to the existing reviews in the field, we provide in this survey a new classification of FL topics and research fields based on thorough analysis of the main technical challenges and current related work. In this context, we elaborate comprehensive taxonomies covering various challenging aspects, contributions, and trends in the literature, including core system models and designs, application areas, privacy and security, and resource management. Furthermore, we discuss important challenges and open research directions toward more robust FL systems.
Altalhi S., Gutub A.
2021-01-01 citations by CoLab: 32 Abstract  
In recent years, the number of cyber-attacks increased affecting different application types and targets. Many studies tried to focus on proposing solutions to detect imminent and current attacks. Besides that, they tried to extract useful information expecting these attacks in different ways. This study considered surveying recognizing the popular social online network Twitter data to detect and predict security attacks possibility. In this paper, we review and compare the relevant existing works that make use of Twitter streaming data to extract knowledge about current and imminent security cyber-attacks. The survey comparison is based on different effectiveness factors that are essential in the cyber domain for obtaining useful results. The work considered prediction factors investigating the detection scope, feature extraction technique, algorithm complexity, information summarization level, scalability over time, and performance measurements, all analysed to gain its prediction contribution. The comparison results are utilized for arranging previous work by a suggested unified (figure of merit) degree of achieving the factors. Thus, many improvements are proposed to enhance the top two models, SYNAPSE and DataFreq, to take a further step toward accurate predictions. This survey work is focussing on linking unrelated viewed studies aiming common prediction of cyber-security attacks in an attractive way, opening the door for more precise predictions of cyber-attacks research to come.
Corny J., Rajkumar A., Martin O., Dode X., Lajonchère J., Billuart O., Bézie Y., Buronfosse A.
2020-09-27 citations by CoLab: 92 Abstract  
Objective: To improve patient safety and clinical outcomes by reducing the risk of prescribing errors, we tested the accuracy of a hybrid clinical decision support system in prioritizing prescription checks. Materials and Methods: Data from electronic health records were collated over a period of 18 months. Inferred scores at a patient level (probability of a patient’s set of active orders to require a pharmacist review) were calculated using a hybrid approach (machine learning and a rule-based expert system). A clinical pharmacist analyzed randomly selected prescription orders over a 2-week period to corroborate our findings. Predicted scores were compared with the pharmacist’s review using the area under the receiving-operating characteristic curve and area under the precision-recall curve. These metrics were compared with existing tools: computerized alerts generated by a clinical decision support (CDS) system and a literature-based multicriteria query prioritization technique. Data from 10 716 individual patients (133 179 prescription orders) were used to train the algorithm on the basis of 25 features in a development dataset. Results: While the pharmacist analyzed 412 individual patients (3364 prescription orders) in an independent validation dataset, the areas under the receiving-operating characteristic and precision-recall curves of our digital system were 0.81 and 0.75, respectively, thus demonstrating greater accuracy than the CDS system (0.65 and 0.56, respectively) and multicriteria query techniques (0.68 and 0.56, respectively). Discussion: Our innovative digital tool was notably more accurate than existing techniques (CDS system and multicriteria query) at intercepting potential prescription errors. Conclusions: By primarily targeting high-risk patients, this novel hybrid decision support system improved the accuracy and reliability of prescription checks in a hospital setting.
Chen Y., Qin X., Wang J., Yu C., Gao W.
IEEE Intelligent Systems scimago Q1 wos Q1
2020-07-01 citations by CoLab: 617 Abstract  
With the rapid development of computing technology, wearable devices make it easy to get access to people's health information. Smart healthcare achieves great success by training machine learning models on a large quantity of user personal data. However, there are two critical challenges. First, user data often exist in the form of isolated islands, making it difficult to perform aggregation without compromising privacy security. Second, the models trained on the cloud fail on personalization. In this article, we propose FedHealth, the first federated transfer learning framework for wearable healthcare to tackle these challenges. FedHealth performs data aggregation through federated learning, and then builds relatively personalized models by transfer learning. Wearable activity recognition experiments and real Parkinson's disease auxiliary diagnosis application have evaluated that FedHealth is able to achieve accurate and personalized healthcare without compromising privacy and security. FedHealth is general and extensible in many healthcare applications.
Silva S., Gutman B.A., Romero E., Thompson P.M., Altmann A., Lorenzi M.
2019-04-01 citations by CoLab: 141 Abstract  
At this moment, databanks worldwide contain brain images of previously unimaginable numbers. Combined with developments in data science, these massive data provide the potential to better understand the genetic underpinnings of brain diseases. However, different datasets, which are stored at different institutions, cannot always be shared directly due to privacy and legal concerns, thus limiting the full exploitation of big data in the study of brain disorders. Here we propose a federated learning framework for securely accessing and meta-analyzing any biomedical data without sharing individual information. We illustrate our framework by investigating brain structural relationships across diseases and clinical cohorts. The framework is first tested on synthetic data and then applied to multi-centric, multi-database studies including ADNI, PPMI, MIRIAD and UK Biobank, showing the potential of the approach for further applications in distributed analysis of multi-centric cohorts.
Brisimi T.S., Chen R., Mela T., Olshevsky A., Paschalidis I.C., Shi W.
2018-04-01 citations by CoLab: 609 Abstract  
In an era of "big data," computationally efficient and privacy-aware solutions for large-scale machine learning problems become crucial, especially in the healthcare domain, where large amounts of data are stored in different locations and owned by different entities. Past research has been focused on centralized algorithms, which assume the existence of a central data repository (database) which stores and can process the data from all participants. Such an architecture, however, can be impractical when data are not centrally located, it does not scale well to very large datasets, and introduces single-point of failure risks which could compromise the integrity and privacy of the data. Given scores of data widely spread across hospitals/individuals, a decentralized computationally scalable methodology is very much in need. We aim at solving a binary supervised classification problem to predict hospitalizations for cardiac events using a distributed algorithm. We seek to develop a general decentralized optimization framework enabling multiple data holders to collaborate and converge to a common predictive model, without explicitly exchanging raw data. We focus on the soft-margin l1-regularized sparse Support Vector Machine (sSVM) classifier. We develop an iterative cluster Primal Dual Splitting (cPDS) algorithm for solving the large-scale sSVM problem in a decentralized fashion. Such a distributed learning scheme is relevant for multi-institutional collaborations or peer-to-peer applications, allowing the data holders to collaborate, while keeping every participant's data private. We test cPDS on the problem of predicting hospitalizations due to heart diseases within a calendar year based on information in the patients' Electronic Health Records prior to that year. cPDS converges faster than centralized methods at the cost of some communication between agents. It also converges faster and with less communication overhead compared to an alternative distributed algorithm. In both cases, it achieves similar prediction accuracy measured by the Area Under the Receiver Operating Characteristic Curve (AUC) of the classifier. We extract important features discovered by the algorithm that are predictive of future hospitalizations, thus providing a way to interpret the classification results and inform prevention efforts.
