Mendeleev Communications, volume 34, issue 6, pages 780-782

Machine learning-enabled prediction of ecotoxicity (EC50) of diverse organic compounds via infrared spectroscopy

Maksim Yu. Sidorov
Mikhail E. Gasanov
Artur A. Dzeranov
Lyubov S. Bondarenko
Anastasiya P. Kiryushina
Vera A. Terekhova
Kamila A. Kydralieva

Publication type: Journal Article
Publication date: 2024-11-29
Scimago: Q3
WoS: Q3
SJR: 0.332
CiteScore: 3.0
Impact factor: 1.8
ISSN: 0959-9436, 1364-551X

References

Koshelev D.S. Applied Spectroscopy, 2024-01-28.
Fourier transform infrared spectroscopy (FT-IR) is a widely used method for the routine analysis of substances and compounds. Interpreting spectra is labor-intensive, but it provides important information about the functional groups or bonds present in compounds and complex substances. In this paper, deep learning models based on convolutional neural networks were developed to determine the presence of 17 classes of functional groups or 72 classes of bond vibrations in FT-IR spectra. A total of 14 361 FT-IR spectra of organic molecules were collected by web scraping. Several variants of the model architecture with different feature-map sizes were tested. Visualization tools based on Shapley additive explanations (SHAP) and gradient-weighted class activation mapping (Grad-CAM) were developed to highlight the regions of the spectrum where the absorption bands of the corresponding functional groups or bonds appear. For the 17-class and 72-class tasks, the weighted F1 score (the harmonic mean of each class's precision and recall, averaged with weights equal to each class's share of the data) reached 93% and 88%, respectively, when the positions of the absorption maxima were supplied as an additional input layer. The resulting model can facilitate routine spectral analysis in areas such as organic chemistry, materials science, and biology, as well as the preparation of experimental data for publication.
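
The weighted F1 metric described above can be sketched for multi-label presence/absence predictions (one 0/1 column per functional-group class). This is an illustrative sketch, not the authors' code. It assumes that "class fraction" means each class's support (the number of spectra in which the class is truly present) divided by the total support, which is how scikit-learn's `f1_score(..., average="weighted")` weights classes. The function name `weighted_f1` is hypothetical.

```python
from typing import Sequence


def weighted_f1(y_true: Sequence[Sequence[int]], y_pred: Sequence[Sequence[int]]) -> float:
    """Support-weighted F1 for multi-label 0/1 indicator rows (one column per class)."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    n_classes = len(y_true[0])
    f1_scores, supports = [], []
    for c in range(n_classes):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t[c] == 1 and p[c] == 1)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t[c] == 0 and p[c] == 1)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t[c] == 1 and p[c] == 0)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        # Harmonic mean of precision and recall for this class.
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        f1_scores.append(f1)
        supports.append(tp + fn)  # spectra in which class c is truly present
    total = sum(supports)
    if total == 0:
        return 0.0
    # Average the per-class F1 scores, weighting each class by its share of the support.
    return sum(f * s for f, s in zip(f1_scores, supports)) / total


# Two classes, three spectra: class 0 has F1 = 2/3, class 1 has F1 = 1, equal support.
print(round(weighted_f1([[1, 0], [1, 1], [0, 1]], [[1, 0], [0, 1], [0, 1]]), 3))  # 0.833
```

Weighting by support keeps rare functional groups from dominating the average, which is one reason the weighted variant is often reported on imbalanced spectral datasets.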

Schür C., Gasser L., Perez-Cruz F., Schirmer K., Baity-Jesi M. Scientific Data, 2023-10-18.
The use of machine learning for predicting ecotoxicological outcomes is promising but underutilized. Curating data with informative features requires expertise in machine learning as well as a strong biological and ecotoxicological background, which we consider a barrier to entry for this kind of research. Additionally, model performances can only be compared across studies when the same dataset, cleaning, and splits were used. Therefore, we provide ADORE, an extensive and well-described dataset on acute aquatic toxicity in three relevant taxonomic groups (fish, crustaceans, and algae). The core dataset describes ecotoxicological experiments and is expanded with phylogenetic and species-specific data on the species, as well as chemical properties and molecular representations. Apart from challenging other researchers to achieve the best model performance across the whole dataset, we propose specific challenges on subsets of the data and include datasets and splits corresponding to each of these challenges, as well as an in-depth characterization and discussion of train-test splitting approaches.
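
One splitting approach relevant to the discussion above is a grouped split, in which all test results for a given chemical go entirely into either the training set or the test set, so a model cannot score well merely by recognizing a chemical it has already seen. The sketch below is a hypothetical illustration, not the ADORE code or one of its published splits. The record fields (`cas`, `value`) and the function name are assumptions, and `test_fraction` is applied to the number of chemicals rather than the number of records.

```python
import random
from collections import defaultdict


def group_train_test_split(records, group_key, test_fraction=0.2, seed=0):
    """Split records so that each group (e.g. one chemical) falls entirely in train or test.

    `test_fraction` is the share of *groups* held out, not the share of records.
    """
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be between 0 and 1")
    groups = defaultdict(list)
    for rec in records:
        groups[rec[group_key]].append(rec)
    keys = sorted(groups)  # sort first so the shuffle is reproducible for a given seed
    random.Random(seed).shuffle(keys)
    n_test = max(1, round(len(keys) * test_fraction))
    test_keys = set(keys[:n_test])
    train = [r for k in keys if k not in test_keys for r in groups[k]]
    test = [r for k in keys if k in test_keys for r in groups[k]]
    return train, test


# Hypothetical records: several toxicity results per chemical identifier.
records = [{"cas": c, "value": i} for i, c in enumerate("AABCCCD")]
train, test = group_train_test_split(records, "cas", test_fraction=0.25, seed=1)
print(sorted({r["cas"] for r in test}), len(train) + len(test))
```

A random split over individual records would usually place some results for the same chemical on both sides, which tends to give optimistic test scores.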

Bondarenko L., Saveliev Y., Chernyaev D., Baimuratova R., Dzhardimalieva G.I., Dzeranov A., Kelbysheva E., Kydralieva K. 2023-05-16.
This study comprehensively investigates the efficiency of the formulation of tetraethoxysilane (TEOS) and 3-aminopropyltriethoxysilane (APTES) copolymer in sol-gel syntheses as part of a multivariate experiment. A methodology-based response surface was...

Olker J.H., Elonen C.M., Pilli A., Anderson A., Kinziger B., Erickson S., Skopinski M., Pomplun A., LaLone C.A., Russom C.L., Hoff D. 2022-04-26.
The need for assembled existing and new toxicity data has accelerated as the number of chemicals introduced into commerce continues to grow and regulatory mandates require safety assessments for a greater number of chemicals. To address this evolving need, the ECOTOXicology Knowledgebase (ECOTOX) was developed starting in the 1980s and is currently the world's largest compilation of curated ecotoxicity data, providing support for assessments of chemical safety and ecological research through systematic and transparent literature review procedures. The recently released version of ECOTOX (Ver 5, www.epa.gov/ecotox) provides single-chemical ecotoxicity data for over 12,000 chemicals and ecological species, with over one million test results from over 50,000 references. Presented is an overview of ECOTOX, detailing the literature review and data curation processes within the context of current systematic review practices and discussing how recent updates improve the accessibility and reusability of data to support the assessment, management, and research of environmental chemicals. Relevant and acceptable toxicity results are identified from studies in the scientific literature; pertinent methodological details and results are extracted following well-established controlled vocabularies, and newly extracted toxicity data are added quarterly to the public website. Release of ECOTOX Ver 5 included an entirely redesigned user interface with enhanced data queries and retrieval options, visualizations to aid in data exploration, customizable outputs for export and use in external applications, and interoperability with chemical and toxicity databases and tools. ECOTOX is a reliable source of curated ecological toxicity data for chemical assessments and research and continues to evolve with accessible and transparent state-of-the-art practices in literature data curation and increased interoperability with other relevant resources. Environ Toxicol Chem 2022;41:1520–1539.

Cunningham P., Delany S.J. ACM Computing Surveys, 2021-07-13.
Perhaps the most straightforward classifier in the arsenal of machine learning techniques is the Nearest Neighbour Classifier: classification is achieved by identifying the nearest neighbours to a query example and using those neighbours to determine the class of the query. This approach to classification is of particular importance because poor runtime performance is much less of a problem given the computational power available today. This article presents an overview of techniques for Nearest Neighbour classification, focusing on mechanisms for assessing similarity (distance), computational issues in identifying nearest neighbours, and mechanisms for reducing the dimension of the data. This article is the second edition of a paper previously published as a technical report [16]. Sections on similarity measures for time series, retrieval speedup, and intrinsic dimensionality have been added. An appendix is included, providing access to Python code for the key methods.
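
The basic nearest-neighbour rule described in the survey can be sketched in a few lines. This minimal version assumes numeric feature vectors (for example, absorbances sampled on a common wavenumber grid, a hypothetical representation), Euclidean distance, and a simple majority vote among the `k` closest training examples. It omits the distance weighting, indexing structures, and dimension reduction the survey covers, and the function name `knn_predict` is hypothetical.

```python
import math
from collections import Counter


def knn_predict(train_x, train_y, query, k=3):
    """Predict the class of `query` by majority vote among its k nearest training examples."""
    if not 1 <= k <= len(train_x):
        raise ValueError("k must be between 1 and the number of training examples")
    # Sort (distance, label) pairs by Euclidean distance only, so labels are never compared.
    scored = sorted(
        ((math.dist(x, query), label) for x, label in zip(train_x, train_y)),
        key=lambda pair: pair[0],
    )
    votes = Counter(label for _, label in scored[:k])
    # most_common orders equal counts by first occurrence, so a vote tie goes to the
    # label that appears first among the nearest neighbours.
    return votes.most_common(1)[0][0]


train_x = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
train_y = ["a", "a", "a", "b", "b", "b"]
print(knn_predict(train_x, train_y, (0.5, 0.5)))  # a
```

The brute-force search here costs one distance computation per training example for each query, which is the runtime issue the survey notes is less pressing with modern hardware.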

Bondarenko L., Illés E., Tombácz E., Dzhardimalieva G., Golubeva N., Tushavina O., Adachi Y., Kydralieva K. Nanomaterials, 2021-05-27.
Numerous studies are currently being performed to formulate nontoxic multifunctional magnetic materials possessing both high colloidal stability and high magnetization, but there is a need to predict chemical and colloidal stability in aqueous solutions. Herein, a series of silica-coated magnetite nanoparticles (MNPs) was synthesized via the sol-gel method with and without an inert atmosphere and then tested for the loading of humic acids (HA), applied as a multifunctional coating agent. The influence of ambient conditions on the microstructure, colloidal stability, and HA loading of the different silica-coated MNPs was established. The XRD patterns show that the content of stoichiometric Fe3O4 decreases from 78.8% for synthesis under an inert atmosphere to 42.4% for synthesis under ambient atmosphere. The most striking observation was the shift of the MNPs' isoelectric point from pH ~7 to ~3 with increasing HA loading, up to reversal of the zeta potential sign once the particles were completely covered by HA molecules. The zeta potential data of MNPs can be used to predict the loading capacity for HA polyanions. The data help to guide the development of materials that combine the complexation ability of humic acids with the insolubility of silica gel, paving the way for a novel, efficient, and magnetically separable adsorbent for contaminant removal.

Vo A.H., Van Vleet T.R., Gupta R.R., Liguori M.J., Rao M.S. Chemical Research in Toxicology, 2019-10-18.
Drug toxicity evaluation is an essential process of drug development as it is reportedly responsible for the attrition of approximately 30% of drug candidates. The rapid increase in the number and types of large toxicology data sets together with the advances in computational methods may be used to improve many steps in drug safety evaluation. The development of in silico models to screen and understand mechanisms of drug toxicity may be particularly beneficial in the early stages of drug development where early toxicity assessment can most reduce expenses and labor time. To facilitate this, machine learning methods have been employed to evaluate drug toxicity but are often limited by small and less diverse data sets. Recent advances in machine learning methods together with the rapid increase in big toxicity data such as molecular descriptors, toxicogenomics, and high-throughput bioactivity data may help alleviate some of the current challenges. In this article, the most common machine learning methods used in toxicity assessment are reviewed together with examples of toxicity studies that have used machine learning methodology. Furthermore, a comprehensive overview of the different types of toxicity tools and data sets available to build in silico toxicity prediction models has been provided to give an overview of the current big toxicity data landscape and highlight opportunities and challenges related to them.

Carvalho D.V., Pereira E.M., Cardoso J.S. Electronics (Switzerland), 2019-07-26.
Machine learning systems are becoming increasingly ubiquitous. The adoption of these systems has been expanding, accelerating the shift towards a more algorithmic society, meaning that algorithmically informed decisions have greater potential for significant social impact. However, most of these accurate decision support systems remain complex black boxes: their internal logic and inner workings are hidden from the user, and even experts cannot fully understand the rationale behind their predictions. Moreover, new regulations and highly regulated domains have made the audit and verifiability of decisions mandatory, increasing the demand for the ability to question, understand, and trust machine learning systems, for which interpretability is indispensable. The research community has recognized this interpretability problem and has focused on developing both interpretable models and explanation methods over the past few years. However, the emergence of these methods shows that there is no consensus on how to assess explanation quality. Which metrics are most suitable for assessing the quality of an explanation? The aim of this article is to review the current state of research on machine learning interpretability, focusing on its societal impact and on the methods and metrics developed. Furthermore, a complete literature review is presented to identify future directions of work in this field.

Pukalchik M.A., Katrutsa A.M., Shadrin D., Terekhova V.A., Oseledets I.V. Journal of Soils and Sediments, 2019-01-23.
A full understanding of the effect of mineral waste-based fertilizers in soil is still lacking because of their extremely complex chemical composition and the plethora of their action pathways. The purpose of this paper is to quantify the input of phosphogypsum (PG) into soil ecosystem processes, considering the direct effects of PG as a whole on the soil environment using a wide range of chemical, toxicological, and biological tests. The greenhouse experiment included different PG doses (0, 1%, 3%, 7.5%, 15%, 25%, and 40%) and two collection time points after treatment (7 and 28 days). For each treatment and each collection time point, we measured (i) soil pH and bioavailable (H2O- and NH4COOH-extractable) element content (S, P, K, Na, Mg, Ca, Fe, Zn, Sr, Ba, F); (ii) soil enzyme activities (dehydrogenase, urease, acid phosphatase, FDA); (iii) soil CO2 respiration activity with and without glucose addition; and (iv) Eisenia fetida, Sinapis alba, and Avena sativa responses. Finally, we combined the conventional chemical, toxicological, and biological measurements of soil properties with state-of-the-art mathematical analysis, namely (i) support vector machines (used for prediction), (ii) a mutual information test (for variable importance), and (iii) t-SNE and LLE algorithms (used for unsupervised classification). The results show similarity between the 0%, 1%, and 3% PG treatments at all collection times based on the toxicological and biological properties. Beyond 7.5% PG, some biological tests were significantly inhibited in response to trace element stress. Among all tested parameters, soil urease activity, soil respiration activity after glucose addition, S. alba root length, and E. fetida survival rate were sensitive to PG addition. Furthermore, the machine learning algorithms revealed that only a few elements (mobile and water-soluble forms of Ca, Ba, Sr, S, and Na; water-soluble F) could be responsible for elevated soil toxicity for those indicators.
SVR models were able to predict soil biological and ecotoxicity properties, and increasing the number of randomly selected training examples from 50% to 90% of the initial experimental data significantly improved model performance. In this study, we demonstrate the benefits of unsupervised machine learning methods for investigating the toxicity of man-made substances in soil, which can be further applied to risk assessments of various toxicants of significant interest for environmental protection.
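
The mutual information test for variable importance mentioned above can be illustrated with a plug-in estimate for discrete (for example, binned) variables. This is a sketch under stated assumptions, not the authors' procedure: it assumes the element concentrations and the toxicity indicator have already been discretized, and the feature names in the example are hypothetical. For continuous measurements, scikit-learn's `mutual_info_regression` offers a nearest-neighbour-based estimator instead.

```python
import math
from collections import Counter


def mutual_information(x, y):
    """Plug-in estimate of I(X;Y) in nats for two equal-length discrete sequences."""
    if len(x) != len(y) or not x:
        raise ValueError("x and y must be non-empty and of equal length")
    n = len(x)
    joint = Counter(zip(x, y))      # counts of each observed (x, y) pair
    px, py = Counter(x), Counter(y)  # marginal counts
    mi = 0.0
    for (a, b), n_ab in joint.items():
        # p(a,b) * log(p(a,b) / (p(a) p(b))), written with raw counts; unseen pairs add 0.
        mi += (n_ab / n) * math.log(n_ab * n / (px[a] * py[b]))
    return mi


def rank_features(features, target):
    """Return (name, MI) pairs sorted from most to least informative about `target`."""
    scores = {name: mutual_information(values, target) for name, values in features.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical binned measurements for four soil samples and a binned toxicity class.
target = ["low", "low", "high", "high"]
features = {"Ca_mobile": [0, 0, 1, 1], "K_water": [0, 1, 0, 1]}
print(rank_features(features, target))
```

Here `Ca_mobile` separates the two toxicity classes perfectly (MI = ln 2), while `K_water` is independent of them (MI = 0), so it ranks last.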
