Journal of Chemometrics, volume 39, issue 3

Multimodal Stacked Modeling for Simultaneous Detection of Nutrient Concentrations With Turbidity Correction

Publication type: Journal Article
Publication date: 2025-02-17
Scimago: Q3
SJR: 0.383
CiteScore: 5.2
Impact factor: 1.9
ISSN: 0886-9383, 1099-128X
Abstract

This paper investigates a method for the simultaneous determination of nitrite, nitrate, and chemical oxygen demand (COD) in water when turbidity adds noise to the spectroscopic data. UV–Vis absorption spectrometry is combined with machine learning in a stacking model: several base models (PLS, Lasso, and Ridge regression) feed a meta-regressor (Random Forest regressor), and baseline correction and principal component analysis (PCA) are incorporated to mitigate the effects of turbidity on the spectra. After these corrections were applied, the root mean square error (RMSE) and the mean absolute error (MAE) were significantly reduced, and the correlation coefficient (R²) between predicted and actual values of nitrite, nitrate, COD, and turbidity was greater than 0.96 for all compounds in the test data set. These results demonstrate that the proposed stacking model can accurately predict nutrient concentrations simultaneously, even in complex environments, and that it may provide a valuable alternative to wet chemical methods. Due to its high accuracy and fast response, the proposed model can be used as an algorithm for the construction of nutrient sensors. The paper highlights the importance of integrating advanced modeling and data correction techniques to improve the robustness and accuracy of predictive models in environmental chemistry, thus providing valuable information for environmental monitoring and management.
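
The three figures of merit named in the abstract (RMSE, MAE, and R²) can be illustrated with a short calculation. The sketch below is not the authors' code: it assumes R² is meant as the coefficient of determination, 1 − SS_res/SS_tot, which the abstract does not define, and the reference and predicted concentrations are made-up values chosen only to show the arithmetic.

```python
import math


def rmse(y_true, y_pred):
    """Root mean square error between reference and predicted values."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def mae(y_true, y_pred):
    """Mean absolute error between reference and predicted values."""
    n = len(y_true)
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / n


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot (assumed definition)."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical concentrations in mg/L, for illustration only (not from the paper).
reference = [1.0, 2.0, 3.0, 4.0, 5.0]
predicted = [1.1, 1.9, 3.2, 3.9, 5.1]

print(f"RMSE = {rmse(reference, predicted):.4f}")
print(f"MAE  = {mae(reference, predicted):.4f}")
print(f"R2   = {r2(reference, predicted):.4f}")
```

Lower RMSE and MAE and an R² closer to 1 indicate closer agreement between predicted and reference values. In a multi-analyte setting such as the one described, these metrics would be computed separately for each target (nitrite, nitrate, COD, turbidity) on the held-out test set.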
