Accreditation and Quality Assurance

Value assignment and uncertainty evaluation for certified reference gas mixtures

Publication type: Journal Article
Publication date: 2024-09-09
Scimago quartile: Q3
WoS quartile: Q4
SJR: 0.274
CiteScore: 1.8
Impact factor: 0.8
ISSN: 0949-1775, 1432-0517
Abstract

The procedures used to assign values to certified reference gas mixtures and to evaluate their associated uncertainties, which are described in ISO 6143 and which were improved in various ways by Guenther and Possolo (Anal Bioanal Chem 399:489–500, 2011. 10.1007/s00216-010-4379-z), are further enhanced by the following developments: (i) evaluating and propagating uncertainty contributions derived from comparisons with historical reference gas mixtures of similar nominal composition; (ii) recognizing and quantifying mutual inconsistency (dark uncertainty) between primary standard gas mixtures used for calibration; (iii) employing Bayesian procedures for calibration, value assignment, and uncertainty evaluation; and (iv) employing state-of-the-art methods of meta-analysis to combine cylinder-specific measurement results. These developments are illustrated with examples from the certification of two gas mixture Standard Reference Materials developed by the National Institute of Standards and Technology (NIST, USA). These examples serve only to demonstrate the methods described in this contribution and do not replace any official measurement results delivered in the certificates of any reference materials developed by NIST.
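
As a minimal numerical sketch of how the historical-comparison component of development (i) and the dark-uncertainty term of development (ii) might enter a combined uncertainty, the following R snippet adds hypothetical components in quadrature; the component names and values are invented for illustration and do not reflect any actual NIST uncertainty budget.

```r
# Hypothetical uncertainty components (all in nmol/mol); values are invented.
u_cal  <- 0.15  # calibration (analysis-function) uncertainty
u_hom  <- 0.08  # between-cylinder homogeneity
u_hist <- 0.10  # component from comparison with historical mixtures
tau    <- 0.12  # dark uncertainty (mutual inconsistency of standards)

# Combined standard uncertainty, assuming independent components
u_c <- sqrt(u_cal^2 + u_hom^2 + u_hist^2 + tau^2)
U   <- 2 * u_c  # expanded uncertainty with coverage factor k = 2
round(c(u_c = u_c, U = U), 3)
```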

Meija J., Bodnar O., Possolo A.
Metrologia (Scimago Q2, WoS Q3), 2023-09-29. Citations (CoLab): 6.
Bayesian statistical methods are being used increasingly often in measurement science, much as they now pervade all the sciences, from astrophysics to climatology and from genetics to the social sciences. Within metrology, the use of Bayesian methods is documented in peer-reviewed publications that describe the development of certified reference materials or the characterization of CCQM key comparison reference values and the associated degrees of equivalence.

This contribution reviews Bayesian concepts and methods, and provides guidance for how they can be used in measurement science, illustrated with realistic examples of application. In the process, this review also provides compelling evidence that the Bayesian approach offers unparalleled means to exploit all the available information that is relevant to rigorous and reliable measurement. The Bayesian outlook streamlines the interpretation of uncertainty evaluations, aligning their meaning with how they are perceived intuitively: not as promises about performance in the long run, but as expressions of documented and justified degrees of belief about the truth of specific conclusions supported by empirical evidence.

This review also demonstrates that the Bayesian approach is practicable using currently available modeling and computational techniques, and, most importantly, that measurement results obtained using Bayesian methods, and predictions based on Bayesian models, including the establishment of metrological traceability, are amenable to empirical validation, no less than when classical statistical methods are used for the same purposes.

Our goal is not to suggest that everything in metrology should be done in a Bayesian way. Instead, we aim to highlight applications and kinds of metrological problems where Bayesian methods shine brighter than the classical alternatives, and deliver results that any classical approach would be hard-pressed to match.
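
As a generic illustration of the Bayesian outlook described above (not an example taken from the review), the following R sketch computes the posterior distribution for a measurand from replicated indications, under a conjugate normal model with assumed prior parameters:

```r
# Replicated indications of a measurand (invented values)
y <- c(10.12, 10.08, 10.15, 10.11)
sigma <- 0.05            # assumed known measurement standard deviation

# Conjugate normal prior for the measurand (assumed prior knowledge)
mu0  <- 10.0             # prior mean
tau0 <- 0.2              # prior standard deviation

# Posterior is normal, with precision = prior precision + data precision
n <- length(y)
post_prec <- 1 / tau0^2 + n / sigma^2
post_mean <- (mu0 / tau0^2 + sum(y) / sigma^2) / post_prec
post_sd   <- sqrt(1 / post_prec)

# The posterior mean plays the role of the measured value; the posterior sd
# is the standard uncertainty, interpretable as a degree of belief
c(estimate = post_mean, u = post_sd)
```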
Lang B.E., Molloy J.L., Vetter T.W., Kotoski S.P., Possolo A.
2023-02-27. Citations (CoLab): 2.
The National Institute of Standards and Technology, which is the national metrology institute of the USA, assigns certified values to the mass fractions of individual elements in single-element solutions, and to the mass fractions of anions in anion solutions, based on gravimetric preparations and instrumental methods of analysis. The instrumental method currently is high-performance inductively coupled plasma optical emission spectroscopy for the single-element solutions, and ion chromatography for the anion solutions. The uncertainty associated with each certified value comprises method-specific components, a component reflecting potential long-term instability that may affect the certified mass fraction during the useful lifetime of the solutions, and a component from between-method differences. Lately, the latter has been evaluated based only on the measurement results for the reference material being certified. The new procedure described in this contribution blends historical information about between-method differences for similar solutions produced previously with the between-method difference observed when a new material is characterized. This blending is justified because, with only rare exceptions, the same preparation and measurement methods have been used historically: for almost 40 years in the case of the preparation methods, and for 20 years in the case of the instrumental methods. Also, the certified values of mass fraction, and the associated uncertainties, have been very similar, and the chemistry of the solutions is closely comparable within each series of materials. If the new procedure is applied routinely to future SRM lots of single-element or anion solutions, it is expected to yield relative expanded uncertainties that are about 20 % smaller than those produced by the uncertainty evaluation procedure currently in use, and to do so for the large majority of the solutions. However, more consequential than any reduction in uncertainty is the improvement in the quality of the uncertainty evaluations that derives from incorporating the rich historical information about between-method differences and about the stability of the solutions over their expected lifetimes. The particular values listed for several existing SRMs are given merely as retrospective illustrations of the application of the new method, not to suggest that the certified values or their associated uncertainties should be revised.
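
One simple way to realize such blending, sketched here in R with invented numbers and not to be read as the paper's actual procedure, is a precision-weighted (normal-normal) update that pools the historical distribution of between-method differences with the difference observed for the new lot:

```r
# Historical between-method differences for similar solutions (invented, %)
hist_diff <- c(0.02, -0.01, 0.03, 0.00, 0.01)
m0 <- mean(hist_diff)   # historical mean difference
s0 <- sd(hist_diff)     # historical spread of differences

# Between-method difference observed for the new lot, with its uncertainty
d_new <- 0.04
u_new <- 0.03

# Normal-normal update: precision-weighted pooling of the two sources
w0 <- 1 / s0^2; wn <- 1 / u_new^2
d_blend <- (w0 * m0 + wn * d_new) / (w0 + wn)
u_blend <- sqrt(1 / (w0 + wn))
c(blended_difference = d_blend, u = u_blend)
```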
Viallon J., Choteau T., Flores E., Idrees F., Moussay P., Wielgosz R.I., Lim J.S., Lee J., Lee J., Moon D., Wijk J.I., Persijn S., Veen A.M., Efremova O.V., Konopelko L., et al.
Metrologia (Scimago Q2, WoS Q3), 2023-01-01. Citations (CoLab): 9.
The key comparison CCQM-K68.2019 was aimed at evaluating the level of comparability of laboratories' capabilities for preparing nitrous oxide in air primary reference mixtures at ambient amount fractions, in the range 320 nmol mol⁻¹ to 350 nmol mol⁻¹. The comparison was coordinated by the BIPM and the KRISS. It consisted of the simultaneous comparison of a suite of 2n primary gas standards, two prepared by each of the n participating laboratories. Two independent analytical methods were used by the BIPM to analyse the amount fraction of N2O in air: gas chromatography with an electron capture detector (GC-ECD) and quantum cascade laser absorption spectroscopy (QCLAS). Since the circulation of the Draft A report in April 2021, four meetings took place with the participants to discuss the mathematical treatment of the comparison results, and several models were proposed. The model chosen by the participants is the Bayesian errors-in-variables regression with shades of dark uncertainty. In this final report, the key comparison reference values were obtained with this model, with calculations performed by B. Toman and A. Possolo. The key comparison CCQM-K68.2019 is considered to present an analytical challenge and is therefore classified as a Track C comparison in the CCQM nomenclature. The main text appears in Appendix B of the BIPM key comparison database (https://www.bipm.org/kcdb/). The final report has been peer-reviewed and approved for publication by the CCQM, according to the provisions of the CIPM Mutual Recognition Arrangement (CIPM MRA).
Cecelski C.E., Toman B., Liu F., Meija J., Possolo A.
Metrologia (Scimago Q2, WoS Q3), 2022-06-16. Citations (CoLab): 5.
A model for errors-in-variables regression is described that can be used to overcome the challenge posed by mutually inconsistent calibration data. The model and its implementation are illustrated in applications to the measurement of the amount fraction of oxygen in nitrogen from key comparison CCQM-K53, and of carbon isotope delta values in steroids from human urine. These two examples clearly demonstrate that inconsistencies in measurement results can be addressed similarly to how laboratory effects are often invoked to deal with mutually inconsistent results from interlaboratory studies involving scalar measurands. Bayesian versions of errors-in-variables regression, fitted via Markov chain Monte Carlo sampling, are employed, which yield estimates of the key comparison reference function in one example, and of the analysis function in the other. The fitting procedures also characterize the uncertainty associated with these functions, while quantifying and propagating the 'excess' dispersion that was unrecognized in the uncertainty budgets for the individual measurements, and that therefore is missing from the reported uncertainties. We regard this 'excess' dispersion as an expression of dark uncertainty, which we take into account in the context of calibrations that involve regression models. In one variant of the model the estimate of dark uncertainty is the same for all the participants in the comparison, while in another variant different amounts of dark uncertainty are assigned to different participants. We compare these models with the conventional errors-in-variables model underlying the procedure that ISO 6143 recommends for building analysis functions. Applications of this procedure are often preceded by the selection of a subset of the measurement results deemed to be mutually consistent, while the more discrepant ones are set aside. The new model is more inclusive than the conventional one, in that it easily accommodates mutually inconsistent measurement results. It produces results that take into account contributions from all apparent sources of uncertainty, regardless of whether these sources are already understood and their contributions included in the reported uncertainties, or still require investigation once they have been detected and quantified.
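
The paper fits these models by MCMC, but the central idea, stated uncertainties inflated by a common dark-uncertainty term inside an errors-in-variables fit, can be sketched with a maximum-likelihood analogue in base R. The data below are invented and the model is deliberately reduced to a straight line:

```r
# Invented calibration data: amount fractions x (with uncertainties u_x)
# and instrument responses y (with stated uncertainties u_y)
x   <- c(1.0, 2.0, 3.0, 4.0, 5.0)
u_x <- rep(0.02, 5)
y   <- c(1.05, 1.98, 3.10, 3.85, 5.20)
u_y <- rep(0.03, 5)

# Negative log-likelihood for a line y = a + b*x with errors in both
# variables and a common dark-uncertainty term tau; marginally,
# y_i - a - b*x_i ~ N(0, b^2*u_x^2 + u_y^2 + tau^2)
nll <- function(p) {
  a <- p[1]; b <- p[2]; tau <- exp(p[3])   # tau > 0 via log-parametrization
  v <- b^2 * u_x^2 + u_y^2 + tau^2
  0.5 * sum(log(2 * pi * v) + (y - a - b * x)^2 / v)
}

fit <- optim(c(0, 1, log(0.01)), nll, method = "BFGS")
round(c(a = fit$par[1], b = fit$par[2], tau = exp(fit$par[3])), 4)
```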
Possolo A., Koepke A., Newton D., Winchester M.R.
2021-04-27. Citations (CoLab): 14.
This contribution describes a Decision Tree intended to guide the selection of statistical models and data reduction procedures in key comparisons (KCs). The Decision Tree addresses a specific need of the Inorganic Analysis Working Group (IAWG) of the Consultative Committee (CC) for Amount of Substance: Metrology in Chemistry and Biology (CCQM), of the International Committee for Weights and Measures (CIPM), and it is likely to address similar needs of other working groups and consultative committees. Because the portfolio of KCs previously organized by the CCQM-IAWG affords a full range of opportunities to demonstrate the capabilities of the Decision Tree, the majority of the illustrative examples of its application are from this working group. However, the Decision Tree is widely applicable in other areas of metrology, as illustrated in examples of application to measurements of radionuclides and of the efficiency of a thermistor power sensor. The Decision Tree is intended for use after choices have been made about the measurement results that qualify for inclusion in the calculation of the key comparison reference value (KCRV), and about the measurement results for which degrees of equivalence should be produced. Both these choices should be based on substantive considerations, not on purely statistical criteria. However, the Decision Tree does not require that the measurement results selected for either purpose be mutually consistent. The Decision Tree should be used as a guide, not as the sole and autonomous determinant of the model that should be selected for the measurement results obtained in a KC, or of the procedure that should be employed to reduce these results. The scientists running the KCs ultimately have the freedom and responsibility to make the corresponding choices that they deem most appropriate and that best fit the purpose of each KC. The Decision Tree involves three statistical tests and comprises five terminal leaves, which correspond to as many alternative ways in which the KCRV, its associated uncertainty, and the degrees of equivalence (DoEs) may be computed, as sketched below. This contribution does not purport to suggest that any of the KCRVs, associated uncertainties, or DoEs presented in previously approved final reports issued by working groups of the CCs should be modified. Neither do the alternative results call into question existing, demonstrated calibration and measurement capabilities (CMCs), nor do they support any new CMCs.
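
The abstract does not spell out the three tests or the five leaves, so the following base-R sketch is a schematic analogue rather than the published Decision Tree: it uses Cochran's Q as a single gate for mutual consistency and falls back to a random-effects treatment when homogeneity is rejected. The function name, the branching, and the 0.10 significance level are all assumptions for illustration.

```r
# Schematic gate inspired by decision-tree approaches to KC data reduction;
# NOT the published NIST Decision Tree, whose tests and leaves differ.
choose_procedure <- function(x, u, alpha = 0.10) {
  w    <- 1 / u^2
  xbar <- sum(w * x) / sum(w)              # weighted mean
  Q    <- sum(w * (x - xbar)^2)            # Cochran's Q statistic
  p    <- pchisq(Q, df = length(x) - 1, lower.tail = FALSE)
  if (p >= alpha) "weighted mean (results mutually consistent)"
  else            "random-effects consensus (heterogeneity detected)"
}

# Invented measurement results from five laboratories
x <- c(9.9, 10.1, 10.0, 10.6, 10.2)
u <- c(0.10, 0.10, 0.15, 0.10, 0.20)
choose_procedure(x, u)
```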
Beauchamp C.R., Camara J.E., Carney J., Choquette S.J., Cole K.D., DeRose P.C., Duewer D.L., Epstein M.S., Kline M.C., Lippa K.A., Lucon E., Phinney K.W., Polakoski M., Possolo A., Sharpless K.E., et al.
2020-07-15. Citations (CoLab): 14.
Vehtari A., Gelman A., Simpson D., Carpenter B., Bürkner P.
Bayesian Analysis (Scimago Q1, WoS Q1, Open Access), 2020-07-04. Citations (CoLab): 732.
Markov chain Monte Carlo is a key computational tool in Bayesian statistics, but it can be challenging to monitor the convergence of an iterative stochastic algorithm. In this paper we show that the convergence diagnostic $\widehat{R}$ of Gelman and Rubin (1992) has serious flaws. Traditional $\widehat{R}$ will fail to correctly diagnose convergence failures when the chain has a heavy tail or when the variance varies across the chains. In this paper we propose an alternative rank-based diagnostic that fixes these problems. We also introduce a collection of quantile-based local efficiency measures, along with a practical approach for computing Monte Carlo error estimates for quantiles. We suggest that common trace plots should be replaced with rank plots from multiple chains. Finally, we give recommendations for how these methods should be used in practice.
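
Written from the recipe in the paper (rank-normalize the pooled draws with Blom-type fractional ranks, then compute split-Rhat), the following base-R sketch illustrates the diagnostic; it is not the reference implementation distributed with the authors' software:

```r
# Split-Rhat on rank-normalized draws; a sketch for illustration only.
rank_normalize <- function(draws) {
  S <- length(draws)
  z <- qnorm((rank(draws) - 3/8) / (S + 1/4))  # Blom-type fractional ranks
  matrix(z, nrow = nrow(draws))
}

split_rhat <- function(draws) {                # draws: iterations x chains
  n <- nrow(draws) %/% 2                       # split each chain in half
  halves <- cbind(draws[1:n, , drop = FALSE],
                  draws[(n + 1):(2 * n), , drop = FALSE])
  z <- rank_normalize(halves)
  W <- mean(apply(z, 2, var))                  # mean within-chain variance
  B <- n * var(colMeans(z))                    # between-chain variance
  sqrt(((n - 1) / n * W + B / n) / W)
}

# Four well-mixed chains of white noise: Rhat should be close to 1
set.seed(1)
draws <- matrix(rnorm(4000), nrow = 1000, ncol = 4)
split_rhat(draws)
```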
Koepke A., Lafarge T., Possolo A., Toman B.
Metrologia (Scimago Q2, WoS Q3), 2017-05-11. Citations (CoLab): 66.
Interlaboratory studies in measurement science, including key comparisons, and meta-analyses in several fields, including medicine, serve to intercompare measurement results obtained independently, and typically produce a consensus value for the common measurand that blends the values measured by the participants. Since interlaboratory studies and meta-analyses reveal and quantify differences between measured values, regardless of the underlying causes for such differences, they also provide so-called top-down evaluations of measurement uncertainty. Measured values are often substantially over-dispersed by comparison with their individual, stated uncertainties, thus suggesting the existence of yet unrecognized sources of uncertainty (dark uncertainty). We contrast two different approaches to take dark uncertainty into account both in the computation of consensus values and in the evaluation of the associated uncertainty, which have traditionally been preferred by different scientific communities. One inflates the stated uncertainties by a multiplicative factor. The other adds laboratory-specific effects to the value of the measurand. After distinguishing what we call recipe-based and model-based approaches to data reductions in interlaboratory studies, we state six guiding principles that should inform such reductions. These principles favor model-based approaches that expose and facilitate the critical assessment of validating assumptions, and give preeminence to substantive criteria to determine which measurement results to include, and which to exclude, as opposed to purely statistical considerations, and also how to weigh them. Following an overview of maximum likelihood methods, three general purpose procedures for data reduction are described in detail, including explanations of how the consensus value and degrees of equivalence are computed, and the associated uncertainty evaluated: the DerSimonian-Laird procedure; a hierarchical Bayesian procedure; and the Linear Pool. These three procedures have been implemented and made widely accessible in a Web-based application (NIST Consensus Builder). We illustrate principles, statistical models, and data reduction procedures in four examples: (i) the measurement of the Newtonian constant of gravitation; (ii) the measurement of the half-lives of radioactive isotopes of caesium and strontium; (iii) the comparison of two alternative treatments for carotid artery stenosis; and (iv) a key comparison where the measurand was the calibration factor of a radio-frequency power sensor.
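
Of the three procedures, the DerSimonian-Laird estimator is compact enough to state in a few lines of base R. This sketch follows the textbook formulas, with invented inputs, and is not the NIST Consensus Builder's implementation:

```r
# DerSimonian-Laird random-effects consensus value (textbook formulas)
dsl <- function(x, u) {
  w    <- 1 / u^2
  xbar <- sum(w * x) / sum(w)                  # fixed-effect weighted mean
  Q    <- sum(w * (x - xbar)^2)                # Cochran's Q
  k    <- length(x)
  tau2 <- max(0, (Q - (k - 1)) / (sum(w) - sum(w^2) / sum(w)))
  wstar <- 1 / (u^2 + tau2)                    # weights inflated by tau^2
  mu    <- sum(wstar * x) / sum(wstar)         # consensus value
  c(consensus = mu, u = sqrt(1 / sum(wstar)), tau = sqrt(tau2))
}

# Invented interlaboratory measurement results
x <- c(6.672, 6.674, 6.668, 6.683, 6.671)
u <- c(0.003, 0.002, 0.004, 0.002, 0.005)
round(dsl(x, u), 4)
```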
Carpenter B., Gelman A., Hoffman M.D., Lee D., Goodrich B., Betancourt M., Brubaker M., Guo J., Li P., Riddell A.
Journal of Statistical Software (Scimago Q1, WoS Q1, Open Access), 2017-01-11. Citations (CoLab): 4544.
Hoaglin D.C.
Statistics in Medicine (Scimago Q1, WoS Q1), 2015-08-24. Citations (CoLab): 223.
Many meta-analyses report using 'Cochran's Q test' to assess heterogeneity of effect-size estimates from the individual studies. Some authors cite work by W. G. Cochran, without realizing that Cochran deliberately did not use Q itself to test for heterogeneity. Further, when heterogeneity is absent, the actual null distribution of Q is not the chi-squared distribution assumed for 'Cochran's Q test'. This paper reviews work by Cochran related to Q. It then discusses derivations of the asymptotic approximation for the null distribution of Q, as well as work that has derived finite-sample moments and corresponding approximations for the cases of specific measures of effect size. Those results complicate the implementation and interpretation of the popular heterogeneity index I². Also, it turns out that the test-based confidence intervals used with I² are based on a fallacious approach. Software that outputs Q and I² should use the appropriate reference value of Q for the particular measure of effect size and the current meta-analysis. Q is a key element of the popular DerSimonian-Laird procedure for random-effects meta-analysis, but the assumptions of that procedure and related procedures do not reflect the actual behavior of Q and may introduce bias. The DerSimonian-Laird procedure should be regarded as unreliable.
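
For concreteness, Q and the conventional I² derived from it are computed below in base R on invented data; note that referring Q to a chi-squared distribution, as this sketch does, is exactly the approximation the paper cautions about:

```r
# Cochran's Q and the conventional I^2 heterogeneity index (invented data)
x <- c(0.31, 0.45, 0.28, 0.52, 0.40)   # effect-size estimates
u <- c(0.08, 0.10, 0.07, 0.12, 0.09)   # their standard errors

w    <- 1 / u^2
xbar <- sum(w * x) / sum(w)            # weighted mean
Q    <- sum(w * (x - xbar)^2)
df   <- length(x) - 1

# Conventional (approximate) chi-squared p-value and I^2
p  <- pchisq(Q, df, lower.tail = FALSE)
I2 <- max(0, (Q - df) / Q) * 100
round(c(Q = Q, p = p, I2_percent = I2), 3)
```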
Bates D., Mächler M., Bolker B., Walker S.
Journal of Statistical Software (Scimago Q1, WoS Q1, Open Access), 2015-02-27. Citations (CoLab): 58346.
Maximum likelihood or restricted maximum likelihood (REML) estimates of the parameters in linear mixed-effects models can be determined using the lmer function in the lme4 package for R. As for most model-fitting functions in R, the model is described in an lmer call by a formula, in this case including both fixed- and random-effects terms. The formula and data together determine a numerical representation of the model from which the profiled deviance or the profiled REML criterion can be evaluated as a function of some of the model parameters. The appropriate criterion is optimized, using one of the constrained optimization functions in R, to provide the parameter estimates. We describe the structure of the model, the steps in evaluating the profiled deviance or REML criterion, and the structure of classes or types that represents such a model. Sufficient detail is included to allow specialization of these structures by users who wish to write functions to fit specialized linear mixed models, such as models incorporating pedigrees or smoothing splines, that are not easily expressible in the formula language used by lmer.
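
In practice the formula interface is compact; the canonical example that ships with lme4 fits a model with correlated random intercepts and slopes to the package's sleepstudy data:

```r
library(lme4)

# Reaction time vs. days of sleep deprivation, with a random intercept
# and a random slope for each subject (sleepstudy ships with lme4)
fm <- lmer(Reaction ~ Days + (Days | Subject), data = sleepstudy)
summary(fm)   # fixed effects, random-effect variances, REML criterion
```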
Thompson M., Ellison S.L.
2011-07-12. Citations (CoLab): 103.
Standard uncertainties obtained by the GUM approach for a range of analytical methods are compared directly and indirectly with estimates of reproducibility standard deviation for the same methods. Results were obtained from both routine analysis and international key comparisons. A general tendency for the uncertainty to be substantially less than the reproducibility standard deviation was found.
Guenther F.R., Possolo A.
2010-11-12. Citations (CoLab): 19.
The weighted least squares method to build an analysis function described in ISO 6143, Gas analysis—Comparison methods for determining and checking the composition of calibration gas mixtures, is modified to take into account the typically small number of instrumental readings that are obtained for each primary standard gas mixture used in calibration. The theoretical basis for this modification is explained, and its superior performance is illustrated in a simulation study built around a concrete example, using real data. The corresponding uncertainty assessment is obtained by application of a Monte Carlo method consistent with the guidance in Supplement 1 to the Guide to the expression of uncertainty in measurement, which avoids the need for two successive applications of the linearizing approximation of the conventional method for uncertainty propagation. The three main steps that NIST currently uses to certify a reference gas mixture (homogeneity study, calibration, and assignment of value with uncertainty assessment) are described and illustrated using data pertaining to an actual standard reference material.
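
The following base-R sketch compresses the two ideas in this abstract: a straight-line analysis function fitted by maximum likelihood with errors in both variables, followed by a Monte Carlo uncertainty evaluation in the spirit of GUM Supplement 1 when the fitted line is inverted to assign a value. All data are invented, the model is simplified to a line, and this is not NIST's certification code:

```r
# Sketch of ISO 6143-style calibration with Monte Carlo uncertainty
# evaluation. Invented data; not NIST's certification code.
x   <- c(100, 200, 300, 400, 500)            # standard compositions, umol/mol
u_x <- rep(0.5, 5)                           # standard uncertainties of x
u_y <- rep(0.001, 5)                         # standard uncertainties of y
y   <- c(0.101, 0.199, 0.302, 0.398, 0.501)  # mean instrument responses

# Likelihood for y = a + b*x with both variables uncertain; marginally,
# y_i - a - b*x_i ~ N(0, u_y^2 + b^2*u_x^2)
nll <- function(p, xx, yy) {
  v <- u_y^2 + p[2]^2 * u_x^2
  sum((yy - p[1] - p[2] * xx)^2 / v + log(v))
}
fit <- optim(c(0, 0.001), nll, xx = x, yy = y)

# Monte Carlo: perturb the standards within their uncertainties, refit, and
# invert the fitted line at a new response y0 to assign a value to a cylinder
y0 <- 0.250; u_y0 <- 0.001
set.seed(42)
mc <- replicate(1000, {
  p <- optim(fit$par, nll, xx = rnorm(5, x, u_x), yy = rnorm(5, y, u_y))$par
  (rnorm(1, y0, u_y0) - p[1]) / p[2]         # x = (y0 - a) / b
})
round(c(assigned = mean(mc), u = sd(mc)), 2)
```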
