pages 385-392

Disaggregating Household Water End-Uses: A Comparative Study Between XGBoost and TabNet

Prathik Pradeep 1
Wanqing Zhao 1
Mark Kowalski 2
Claudia Bakeev 2
Publication type: Proceedings Article
Publication date: 2024-08-16
Shabbir N., Vassiljeva K., Nourollahi Hokmabad H., Husev O., Petlenkov E., Belikov J.
Electronics (Switzerland) scimago Q2 wos Q2 Open Access
2024-04-09 citations by CoLab: 8
Non-intrusive load monitoring (NILM) has emerged as a pivotal technology in energy management applications by enabling precise monitoring of individual appliance energy consumption without requiring intrusive sensors or smart meters. In this technique, load disaggregation for each individual device is achieved by recognizing its current signal using machine learning (ML) methods. This research paper conducts a comprehensive comparative analysis of various ML techniques applied to NILM, aiming to identify the most effective methodologies for accurate load disaggregation. The study employs a diverse dataset comprising high-resolution electricity consumption data collected from an Estonian household. The ML algorithms, including deep neural networks based on long short-term memory networks (LSTM), extreme gradient boost (XgBoost), logistic regression (LR), and dynamic time warping with K-nearest neighbor (DTW-KNN), are implemented and evaluated for their performance in load disaggregation. Key evaluation metrics such as accuracy, precision, recall, and F1 score are utilized to assess the effectiveness of each technique in capturing the nuanced energy consumption patterns of diverse appliances. Results indicate that the XgBoost-based model demonstrates superior performance in accurately identifying and disaggregating individual loads from aggregated energy consumption data. Insights derived from this research contribute to the optimization of NILM techniques for real-world applications, facilitating enhanced energy efficiency and informed decision-making in smart grid environments.
Angelis G.F., Timplalexis C., Salamanis A.I., Krinidis S., Ioannidis D., Kehagias D., Tzovaras D.
2023-08-01 citations by CoLab: 19
Mazzoni F., Alvisi S., Blokker M., Buchberger S.G., Castelletti A., Cominola A., Gross M., Jacobs H.E., Mayer P., Steffelbauer D.B., Stewart R.A., Stillwell A.S., Tzatchkov V., Yamanaka V.A., Franchini M.
Water Research scimago Q1 wos Q1
2023-02-01 citations by CoLab: 41
A detailed characterization of residential water consumption is essential for ensuring urban water systems' capability to cope with changing water resources availability and water demands induced by growing population, urbanization, and climate change. Several studies have been conducted in the last decades to investigate the characteristics of residential water consumption with data at a sufficiently fine temporal resolution for grasping individual end uses of water. In this paper, we systematically review 114 studies to provide a comprehensive overview of the state-of-the-art research about water consumption at the end-use level. Specifically, we contribute with: (1) an in-depth discussion of the most relevant findings of each study, highlighting which water end-use characteristics were so far prioritized for investigation in different case studies and water demand modelling and management studies from around the world; and (2) a multi-level analysis to qualitatively and quantitatively compare the most common results available in the literature, i.e. daily per capita end-use water consumption, end-use parameter average values and statistical distributions, end-use daily profiles, end-use determinants, and considerations about efficiency and diffusion of water-saving end uses. Our findings can support water utilities, consumers, and researchers (1) in understanding which key aspects of water end uses were primarily investigated in the last decades; and (2) in exploring their main features considering different geographical, cultural, and socio-economic regions of the world.
Heydari Z., Cominola A., Stillwell A.S.
2022-10-07 citations by CoLab: 14
Water monitoring in households provides occupants and utilities with key information to support water conservation and efficiency in the residential sector. High costs, intrusiveness, and practical complexity limit appliance-level monitoring via sub-meters on every water-consuming end use in households. Non-intrusive machine learning methods have emerged as promising techniques to analyze observed data collected by a single meter at the inlet of the house and estimate the disaggregated contribution of each water end use. While fine temporal resolution data allow for more accurate end-use disaggregation, there is an inevitable increase in the amount of data that needs to be stored and analyzed. To explore this tradeoff and advance previous studies based on synthetic data, we first collected 1 s resolution indoor water use data from a residential single-point smart water metering system installed at a four-person household, as well as ground-truth end-use labels based on a water diary recorded over a 4-week study period. Second, we trained a supervised machine learning model (random forest classifier) to classify six water end-use categories across different temporal resolutions and two different model calibration scenarios. Finally, we evaluated the results based on three different performance metrics (micro, weighted, and macro F1 scores). Our findings show that data collected at 1- to 5-s intervals allow for better end-use classification (weighted F-score higher than 0.85), particularly for toilet events; however, certain water end uses (e.g., shower and washing machine events) can still be predicted with acceptable accuracy even at coarser resolutions, up to 1 min, provided that these end-use categories are well represented in the training dataset. Overall, our study provides insights for further water sustainability research and widespread deployment of smart water meters.
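The three performance metrics named in the abstract above (micro, macro, and weighted F1) differ only in how per-class results are averaged. The following is a minimal Python sketch of those averages for single-label, multi-class predictions. The function name `f1_scores` and the example labels are illustrative assumptions and are not taken from the cited study.

```python
from collections import Counter


def f1_scores(y_true, y_pred):
    """Return (micro, macro, weighted) F1 for single-label multi-class data."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class gains a false positive
            fn[t] += 1  # true class gains a false negative

    def f1(tp_, fp_, fn_):
        denom = 2 * tp_ + fp_ + fn_
        return 2 * tp_ / denom if denom else 0.0

    per_class = {c: f1(tp[c], fp[c], fn[c]) for c in labels}
    support = Counter(y_true)  # number of true samples per class

    # Micro: pool all counts first, then compute a single F1.
    micro = f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    # Macro: unweighted mean of per-class F1, so rare classes count equally.
    macro = sum(per_class.values()) / len(labels)
    # Weighted: mean of per-class F1 weighted by class support.
    weighted = sum(per_class[c] * support[c] for c in labels) / len(y_true)
    return micro, macro, weighted


# Hypothetical end-use labels, for illustration only.
y_true = ["toilet", "toilet", "shower", "faucet"]
y_pred = ["toilet", "shower", "shower", "faucet"]
micro, macro, weighted = f1_scores(y_true, y_pred)
```

Macro F1 treats every end-use category equally, so it is the most sensitive of the three to poorly represented classes, while weighted F1 follows the class distribution of the data.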
Zhou M., Shao S., Wang X., Zhu Z., Hu F.
Sensors scimago Q1 wos Q2 Open Access
2022-07-13 citations by CoLab: 12
Commercial load is an essential demand-side resource. Monitoring commercial loads helps not only commercial customers understand their energy usage to improve energy efficiency but also helps electric utilities develop demand-side management strategies to ensure stable operation of the power system. However, existing non-intrusive methods cannot monitor multiple commercial loads simultaneously and do not consider the high correlation and severe imbalance among commercial loads. Therefore, this paper proposes a deep learning-based non-intrusive commercial load monitoring method to solve these problems. The method takes the total power signal of the commercial building as input and directly determines the state and power consumption of several specific appliances. The key elements of the method are a new neural network structure called TTRNet and a new loss function called MLFL. TTRNet is a multi-label classification model that can autonomously learn correlation information through its unique network structure. MLFL is a loss function specifically designed for multi-label classification tasks, which solves the imbalance problem and improves the monitoring accuracy for challenging loads. To validate the proposed method, experiments are performed separately in seen and unseen scenarios using a public dataset. In the seen scenario, the method achieves an average F1 score of 0.957, which is 7.77% better than existing multi-label classification methods. In the unseen scenario, the average F1 score is 0.904, which is 1.92% better than existing methods. The experimental results show that the method proposed in this paper is both effective and practical.
Sykiotis S., Kaselimi M., Doulamis A., Doulamis N.
Sensors scimago Q1 wos Q2 Open Access
2022-04-11 citations by CoLab: 38
Non-Intrusive Load Monitoring (NILM) describes the process of inferring the consumption pattern of appliances by only having access to the aggregated household signal. Sequence-to-sequence deep learning models have been firmly established as state-of-the-art approaches for NILM, in an attempt to identify the pattern of the appliance power consumption signal into the aggregated power signal. Exceeding the limitations of recurrent models that have been widely used in sequential modeling, this paper proposes a transformer-based architecture for NILM. Our approach, called ELECTRIcity, utilizes transformer layers to accurately estimate the power signal of domestic appliances by relying entirely on attention mechanisms to extract global dependencies between the aggregate and the domestic appliance signals. Another additive value of the proposed model is that ELECTRIcity works with minimal dataset pre-processing and without requiring data balancing. Furthermore, ELECTRIcity introduces an efficient training routine compared to other traditional transformer-based architectures. According to this routine, ELECTRIcity splits model training into unsupervised pre-training and downstream task fine-tuning, which yields performance increases in both predictive accuracy and training time decrease. Experimental results indicate ELECTRIcity’s superiority compared to several state-of-the-art methods.
Arik S.Ö., Pfister T.
2021-05-18 citations by CoLab: 566
We propose a novel high-performance and interpretable canonical deep tabular data learning architecture, TabNet. TabNet uses sequential attention to choose which features to reason from at each decision step, enabling interpretability and more efficient learning as the learning capacity is used for the most salient features. We demonstrate that TabNet outperforms other variants on a wide range of non-performance-saturated tabular datasets and yields interpretable feature attributions plus insights into its global behavior. Finally, we demonstrate self-supervised learning for tabular data, significantly improving performance when unlabeled data is abundant.
Gourmelon N., Bayer S., Mayle M., Bach G., Bebber C., Munck C., Sosna C., Maier A.
Water (Switzerland) scimago Q1 wos Q2 Open Access
2021-01-19 citations by CoLab: 12
With an increasing need for secured water supply, a better understanding of the water consumption behavior is beneficial. This can be achieved through end-use classification, i.e., identifying end-uses such as toilets, showers or dishwashers from water consumption data. Previously, both supervised and unsupervised machine learning (ML) techniques have been employed, demonstrating accurate classification results on particular datasets. However, a comprehensive comparison of ML techniques on a common dataset is still missing. Hence, in this study, we aim at a quantitative evaluation of various ML techniques on a common dataset. For this purpose, a stochastic water consumption simulation tool with high capability to model the real-world water consumption pattern is applied to generate residential data. Subsequently, unsupervised clustering methods, such as dynamic time warping, k-means, DBSCAN, OPTICS and Hough transform, are compared to supervised methods based on SVM. The quantitative results demonstrate that supervised approaches are capable of classifying common residential end-uses (toilet, shower, faucet, dishwasher, washing machine, bathtub and mixed water-uses) with accuracies up to 0.99, whereas unsupervised methods fail to detect those consumption categories. In conclusion, clustering techniques alone are not suitable to separate end-use categories fully automatically. Hence, accurate labels are essential for the end-use classification of water events, where crowdsourcing and citizen science approaches pose feasible solutions for this purpose.
Wang W., Chakraborty G., Chakraborty B.
Applied Sciences (Switzerland) scimago Q2 wos Q2 Open Access
2020-12-28 citations by CoLab: 64
Background: Creatinine is a type of metabolite of blood that is strongly correlated to glomerular filtration rate (GFR). As measuring GFR is difficult, creatinine value is used for indirectly determining GFR and then the stage of chronic kidney disease (CKD). Adding a creatinine test into routine health examination could detect CKD. As more items for comprehensive examination means higher cost, creatinine testing is not included in the routine health examination in many countries. An algorithm based on common test results, without creatinine test, to evaluate the risk of CKD will increase the chance of its early detection and treatment. Methods: In this study, we used open source data containing 1 million samples. These data contain 23 health-related features, including common diagnostic test results provided by National Health Insurance Sharing Service (NHISS). A low GFR indicates possible chronic kidney disease (CKD). As is commonly accepted in the medical community, a GFR of 60 mL/min is used as the threshold, below which is considered to have CKD. In this study, the first step aims to build a regression model to predict the value of creatinine from 23 features, and then combine the predicted value of creatinine with the original 23 features to evaluate the risk of CKD. We will show by simulation that by the proposed method we can achieve better prediction results compared to direct prediction from 23 features. The data is extremely unbalanced for predicting the target variable creatinine. We used an undersampling method and proposed a new cost-sensitive mean-squared error (MSE) loss function to deal with the problem. Regarding model selection, this work used three machine learning models: a bagging tree model named Random Forest, a boosting tree model named XGBoost, and a neural network based model named ResNet. To improve the result of the creatinine predictor, we averaged results from eight predictors, a method known as ensemble learning.
Finally, the predicted creatinine and the original 23 features are used to predict the risk of CKD. Results: We optimized results of R-Squared (R2) value to select the appropriate undersampling strategy and the regression model for the regression stage of creatinine prediction. The ensembled model achieved the best performance, with an R2 of 0.5590. The six factors from 23 are selected from the top of the list of how strongly they affect the creatinine value. They are sex, age, hemoglobin, the level of urine protein, waist circumference, and habit of smoking. Using the predicted value of creatinine, an area under Receiver Operating Characteristic curve (AUC) of 0.76 is achieved while classifying samples for CKD. Conclusions: Using commonly available health parameters, the proposed system can assess the risk of CKD for public health. High-risk subjects can be screened and advised to take a creatinine test for further confirmation. In this way, we can reduce the impact of CKD on public health and facilitate early detection for many, where a blanket test of creatinine is not available for all.
Chen Z., Chen J., Xu X., Peng S., Xiao J., Qiao H.
2020-10-30 citations by CoLab: 9
Refined measurement of electricity consumption information on the residential side is an important foundation for achieving demand-side management and promoting smart grid construction. In order to obtain highly accurate appliance-level energy consumption data, this paper proposes a load identification method based on feature extraction of change-point and XGBoost classifier. Firstly, event detection is performed by using secondary window determination to determine whether an event has occurred based on changes in power statistics; then, according to the detection results, the transient-state and steady-state features of the electrical signal in the window are comprehensively extracted; lastly, they are input into the XGBoost model for load energy decomposition. In the case validation phase, the method is compared with five benchmark algorithms on the PLAID dataset, and the results show the effectiveness and superiority of the method proposed in this paper for resident-side load identification.
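The abstract above describes detecting events from changes in power statistics within a window. As a rough illustration of that general idea, and not the paper's secondary-window procedure (which is not specified here), the Python sketch below flags a change point where the mean power after an index differs most from the mean before it. The function name `detect_events`, the window length, and the threshold are all assumed values chosen for the example.

```python
def detect_events(power, window=5, threshold=30.0):
    """Return indices where mean power shifts by more than `threshold`.

    Generic sliding-window change detector (illustrative assumption):
    for each index i, compare the mean of the `window` samples before i
    with the mean of the `window` samples starting at i.
    """
    n = len(power)
    scores = {}
    for i in range(window, n - window + 1):
        before = sum(power[i - window:i]) / window
        after = sum(power[i:i + window]) / window
        scores[i] = abs(after - before)

    events = []
    for i, s in scores.items():
        if s <= threshold:
            continue
        # Keep only the local peak so one step change is reported once.
        neighbours = [scores[j] for j in range(i - window, i + window + 1)
                      if j in scores]
        if s == max(neighbours) and (not events or i - events[-1] > window):
            events.append(i)
    return events


# Synthetic signal: an appliance switches on at sample 10 and off at sample 20.
signal = [10.0] * 10 + [110.0] * 10 + [10.0] * 10
events = detect_events(signal)
```

In a full pipeline of the kind the abstract outlines, transient-state and steady-state features would then be extracted around each detected index and passed to a classifier.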
Yang A., Zhang H., Stewart R., Nguyen K.
Water (Switzerland) scimago Q1 wos Q2 Open Access
2018-09-10 citations by CoLab: 20
The aim of residential water end-use studies is to disaggregate water consumption into different water end-use categories (i.e., shower, toilet, etc.). The authors previously developed a beta application software (i.e., Autoflow v2.1) that provides an intelligent platform to autonomously categorize residential water consumption data and generate management analysis reports. However, the Autoflow v2.1 software water end use event recognition accuracy achieved was between 75 and 90%, which leaves room for improvement. In the present study, a new module augmented to the existing procedure improved flow disaggregation accuracy, which resulted in Autoflow v3.1. The new module applied self-organizing maps (SOM) and K-means clustering algorithms for undertaking an initial pre-grouping of water end-use events before the existing pattern recognition procedures were applied (i.e., ANN, HMM, etc.). For validation, a dataset consisting of over 100,000 events from 252 homes in Australia was employed to verify accuracy improvements derived from augmenting the new hybrid SOM and K-means algorithm techniques into the existing Autoflow v2.1 software. The water end use event categorization accuracy ranged from 86 to 94.2% for the enhanced model (Autoflow v3.1), which was a 1.7 to 9% improvement on event categorization.
Chen T., Guestrin C.
2016-08-13 citations by CoLab: 24205
Tree boosting is a highly effective and widely used machine learning method. In this paper, we describe a scalable end-to-end tree boosting system called XGBoost, which is used widely by data scientists to achieve state-of-the-art results on many machine learning challenges. We propose a novel sparsity-aware algorithm for sparse data and weighted quantile sketch for approximate tree learning. More importantly, we provide insights on cache access patterns, data compression and sharding to build a scalable tree boosting system. By combining these insights, XGBoost scales beyond billions of examples using far fewer resources than existing systems.
Cominola A., Giuliani M., Piga D., Castelletti A., Rizzoli A.E.
2015-10-01 citations by CoLab: 218
Over the last two decades, water smart metering programs have been launched in a number of medium to large cities worldwide to nearly continuously monitor water consumption at the single household level. The availability of data at such very high spatial and temporal resolution advanced the ability in characterizing, modeling, and, ultimately, designing user-oriented residential water demand management strategies. Research to date has been focusing on one or more of these aspects but with limited integration between the specialized methodologies developed so far. This manuscript is the first comprehensive review of the literature in this quickly evolving water research domain. The paper contributes a general framework for the classification of residential water demand modeling studies, which allows revising consolidated approaches, describing emerging trends, and identifying potential future developments. In particular, the future challenges posed by growing population demands, constrained sources of water supply and climate change impacts are expected to require more and more integrated procedures for effectively supporting residential water demand modeling and management in several countries across the world. We review high resolution residential water demand modeling studies. We provide a classification of existing technologies and methodologies. We identify current trends, challenges and opportunities for future development.
Nguyen K.A., Stewart R.A., Zhang H., Jones C.
Applied Soft Computing Journal scimago Q1 wos Q1
2015-06-01 citations by CoLab: 65
Smart metering technology enables the capture of high resolution water consumption data. Intelligent algorithms autonomously categorise single and combined water end use events. Hybrid combination of HMM, ANN and DTW for pattern recognition problem. Expert system developed to autonomously disaggregate water use into end use categories. Over half of the world's population will live in urban areas in the next decade, which will impose significant pressure on water security. The advanced management of water resources and their consumption is pivotal to maintaining a sustainable water future. To contribute to this goal, the aim of this study was to develop an autonomous and intelligent system for residential water end-use classification that could interface with customers and water business managers via a user-friendly web-based application. Water flow data collected directly from smart water metres connected to dwellings includes both single (e.g., a shower event occurring alone) and combined (i.e., an event that comprises several overlapping single events) water end use events. The authors recently developed an intelligent application called Autoflow which served as a prototype tool to solve the complex problem of autonomously categorising residential water consumption data into a registry of single and combined events. However, this first prototype application achieved overall recognition accuracy of 85%, which is not sufficient for a commercial application. To improve this accuracy level, a larger dataset consisting of over 82,000 events from over 500 homes in Melbourne and South-east Queensland, Australia, was employed to derive a new single event recognition method employing a hybrid combination of Hidden Markov Model (HMM), Artificial Neural Networks (ANN) and the Dynamic Time Warping (DTW) algorithm.
The classified single event registry was then used as the foundations of a sophisticated hybrid ANN-HMM combined event disaggregation module, which was able to strip apart concurrently occurring end use events. The new hybrid model's recognition accuracy ranged from 85.9% to 96.1% for single events and 81.8-91.5% for combined event disaggregation, which was a 4.9% and 8.0% improvement, respectively, when compared to the first prototype model. The developed Autoflow tool has far-reaching implications for enhanced urban water demand planning and management, sustained customer behaviour change through more granular water conservation awareness, and better customer satisfaction with water utility providers.
Makki A.A., Stewart R.A., Beal C.D., Panuwatwanich K.
2015-02-01 citations by CoLab: 78
The purpose of this comprehensive study was to explore the principal determinants of six residential indoor water end-use consumption categories at the household scale (namely clothes washer, shower, toilet, tap, dishwasher, and bath), and to find an overarching research design and approach for building a residential indoor water end-use demand forecasting model. A mixed method research design was followed to collect both quantitative and qualitative data from 210 households with a total of 557 occupants located in SEQ, Australia, utilising high resolution smart water metering technology, questionnaire surveys, diaries, and household water stock inventory audits. The principal determinants, main drivers, and predictors of residential indoor water consumption for each end-use category were revealed, and forecasting models were developed in this study. This was achieved utilising an array of statistical techniques for each of the six end-use consumption categories. Cluster analysis and dummy coding were used to prepare the data for analysis and modelling. Subsequently, independent t-test and independent one-way ANOVA extended into a series of bootstrapped regression models were used to explore the principal determinants of consumption. Successively, a series of Pearson's Chi-Square tests was used to reveal the main drivers of higher water consumption and to determine alternative sets of consumption predictors. Lastly, independent factorial ANOVA extended into a series of bootstrapped multiple regression models was used for the development of alternative forecasting models. Key findings showed that the usage physical characteristics and the demographic and household makeup characteristics are the most significant determinants of all six end-use consumption categories. Further, the appliances/fixtures physical characteristics are significant determinants of all end-use consumption categories except the bath end-use category.
Moreover, the socio-demographic characteristics are significant determinants of all end-use consumption categories except the tap and toilet end-use categories. Results also demonstrated that the main drivers of higher end-use water consumption were households with higher frequency and/or longer end-use events which are most likely to be those larger family households with teenagers and children, with higher income, predominantly working occupants, and/or higher educational level. Moreover, a total of 14 forecasting model alternatives for all six end-use consumption categories, as well as three total indoor bottom-up forecasting model alternatives were developed in this study. All of the developed forecasting model alternatives demonstrated strong statistical power, significance of fit, met the generalisation statistical criteria, and were cross-validated utilising an independent validation data set. The paper concludes with a discussion on the most significant determinants, drivers and predictors of water end-use consumption, and outlines the key implications of the research to enhanced urban water planning and policy design.
