Durum Wheat
Durum wheat (Triticum turgidum subsp. durum) is the second most cultivated species of wheat after common wheat, although it represents only 5% to 8% of global wheat production. 'Durum' is Latin for 'hard', and this species is the hardest of all wheats. This makes it ideal for pasta production, ensuring that the pasta retains its shape after cooking.

Data Collection

For this experiment, we purchased Caputo Semola Rimacinata durum wheat flour and Woolworths Plain flour. From these we filled 23 small plastic bags, each weighing 50 g, with different ratios of durum and plain flour.

Each of these 23 sample bags was scanned three or four times with one of our NIR spectrometers, resulting in a set of 71 NIR spectra.
We also purchased flour from two other Italian Semola Rimacinata brands, a New Zealand durum wheat flour (from a non-amber variety), and samples of 'High Grade' bread flour, spelt flour, and Italian 'doppio zero' soft flour from several brands. Their NIR absorbance spectra are shown here. With such an obvious difference between the spectra of the semola rimacinata brands and the 'bread' wheat flours obtained from local supermarkets, we should be able to build a reasonably accurate model to predict mixtures of the two types of flour.


Model Building Results
We took the 71 spectra and built a model to predict the percentage of durum flour in mixtures of amber durum and plain wheat flour. Since multiple spectra were taken from the same sample bag, we used group-wise cross-validation to generate the following plot. Each red dot is a prediction made for one of the 71 spectra. The model shows good predictive capability, with
- Coefficient of Determination (R2) of 99.6%
- Mean Absolute Error (MAE) of 1.6%, and
- Root Mean Squared Error (RMSE) of 2.1%.

Italian law allows semola rimacinata flour to contain up to 3% non-durum wheat. Therefore any practical application of NIR to detect possible adulteration of durum wheat flour should probably focus on small amounts of adulteration, up to about 10%. For this reason, it's of interest to examine the accuracy of our model in the 85% to 100% range.

With the exception of scans from two bags (90% durum and 98% durum), the group-wise cross-validation results continue to be strong in this range, with a Mean Absolute Error of 1.3% and a Root Mean Squared Error of 1.7%.
What Is Cross Validation?
Cross-Validation is a statistical technique used to evaluate the performance of machine learning models by ensuring they generalise well to unseen data. Our results above used a group-wise form of K-Fold Cross-Validation. Here's how the basic K-Fold procedure works:
- Splitting the Dataset: The dataset is divided into k equal-sized subsets or "folds."
- Training and Testing: In each iteration, k-1 folds are used for training, and the remaining fold is used for testing.
- Repeating the Process: This process is repeated k times, with each fold serving as the test set once.
- Averaging Results: The performance metrics from each iteration are averaged to provide a comprehensive evaluation of the model's performance.
In this example, since the training set is so small, we have set k=23, the number of bags of flour mixtures. That is, at each train/test step we set aside the scans from one bag, and used them to validate a model built on the scans from the other 22 bags.
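The splitting step above can be sketched in plain Python. This is a minimal illustration only: the indices stand in for our 23 bags, and a real calibration would train on the spectra themselves.

```python
# Sketch of K-Fold splitting. With k equal to the number of items,
# each fold holds exactly one item (leave-one-out).

def k_fold_splits(n_items, k):
    """Yield (train_indices, test_indices) pairs for K-Fold cross-validation."""
    indices = list(range(n_items))
    fold_size = n_items // k
    extras = n_items % k  # the first `extras` folds get one extra item
    start = 0
    for fold in range(k):
        size = fold_size + (1 if fold < extras else 0)
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        start += size
        yield train, test

# With k = 23 and 23 bags, each of the 23 folds tests on a single bag.
splits = list(k_fold_splits(23, 23))
```

Every index appears in exactly one test fold, and the corresponding training fold contains the other 22 bags.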
Group-Wise Cross-Validation
Group-wise cross-validation is designed for situations where the data has natural groupings or dependencies. For example, in a medical study, one might have multiple measurements from the same patient. Group-wise cross-validation ensures that all data points from the same group are kept together in the same fold. This prevents data leakage, where information from one sample in a group influences the model's predictions on other samples in the same group.
Sagitto emphasises the importance of group-wise cross-validation to customers who use our free Calibration Benchmarking service. Otherwise it would be very easy to give a falsely optimistic comparison between models built using Sagitto's machine learning techniques, and our customers' existing calibration models built using software provided by instrument manufacturers.
In this flour example, scans of the same bag of flour mixture are clearly going to be related. If we remove the restriction of group-wise cross-validation on our data, we can see that it gives a much more favourable result: MAE goes from 1.6% to 1.2%, and RMSE goes from 2.1% to 1.6%. However, this is likely to be an over-optimistic view of how the model would behave in future use.
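The idea of keeping each group's scans together can be sketched as follows. The `bag_ids` list is hypothetical: it just records which bag each scan came from.

```python
# Group-wise splitting: every scan from the same bag stays in the same fold.

def leave_one_group_out(bag_ids):
    """Yield (train_scan_indices, test_scan_indices), one fold per unique bag."""
    groups = sorted(set(bag_ids))
    for g in groups:
        test = [i for i, b in enumerate(bag_ids) if b == g]
        train = [i for i, b in enumerate(bag_ids) if b != g]
        yield train, test

# e.g. seven scans taken from three bags (invented for illustration)
bag_ids = [0, 0, 1, 1, 1, 2, 2]
folds = list(leave_one_group_out(bag_ids))
```

Note that scans 0 and 1 (both from bag 0) are always tested together, so the model being validated has never seen any scan from that bag during training.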

Coefficient of Determination (R2)
The Coefficient of Determination, commonly signified by R2, tells us how much of the variance in the dependent variable (in this case, % durum flour) is explained by the independent variables (the NIR spectra that we have gathered from scanning each bag).
The value of R2 ranges from 0 to 1:
- An R2 of 1 indicates that the regression predictions perfectly fit the data.
- An R2 of 0 indicates that the model does not explain any of the variance in the dependent variable.
Mean Absolute Error (MAE)
Mean absolute error is exactly what it says: the mean of the absolute error values, regardless of whether the error is above or below the true value.
Root Mean Squared Error (RMSE)
Root Mean Squared Error (RMSE) is another commonly used metric to measure the accuracy of a regression model. It represents the square root of the average of the squared differences between the predicted values and the actual values. Like MAE, it quantifies how much the predicted values deviate from the actual values. However it gives greater weight to larger errors, which can be useful in applications where very large errors - even if uncommon - are very undesirable.
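As a concrete illustration, here is how all three metrics are computed from a set of predictions. The values below are invented for the example; they are not our actual cross-validation results.

```python
import math

def r2_mae_rmse(y_true, y_pred):
    """Return (R2, MAE, RMSE) for a list of actual and predicted values."""
    n = len(y_true)
    errors = [p - t for p, t in zip(y_pred, y_true)]
    mae = sum(abs(e) for e in errors) / n           # mean of |error|
    rmse = math.sqrt(sum(e * e for e in errors) / n)  # sqrt of mean squared error
    mean_true = sum(y_true) / n
    ss_res = sum(e * e for e in errors)               # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    r2 = 1 - ss_res / ss_tot
    return r2, mae, rmse

# illustrative % durum values (hypothetical, not the 71 real spectra)
y_true = [0, 10, 50, 90, 100]
y_pred = [1, 12, 48, 93, 99]
r2, mae, rmse = r2_mae_rmse(y_true, y_pred)  # mae = 1.8, rmse ≈ 1.95
```

Because RMSE squares each error before averaging, it always comes out at or above the MAE for the same predictions.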

Let's take our durum wheat flour example, and look at the three largest errors in our model. If just these three predictions (made for the 90% and 98% durum bags) had been much closer to their true values, we would have seen a considerable reduction in the RMSE value: from 1.70% to 1.20% or less.
Conclusion
We've built a model to illustrate:
- the importance of using group-wise cross-validation when samples are not independent; and
- three of the metrics that we generate each time we build a new regression model, or rebuild an existing one.
We have shown how using regular cross-validation when there is data dependency between samples can lead to falsely optimistic results. And while R2 is the best-known measure of how good a regression model is, and MAE is easy to understand, Sagitto also closely monitors RMSE when it evaluates the performance metrics of new models.
Note: This example has used a model that measures, with remarkably good accuracy, the percentage of durum wheat flour in a mixture of two flours. It should be emphasised that this model is very specific, not just to two types of wheat flour, but to two particular samples from two manufacturers. It does not pretend to be anything more.
References
Cocchi, M.; Durante, C.; Foca, G.; Marchetti, A.; Tassi, L.; Ulrici, A. Durum wheat adulteration detection by NIR spectroscopy multivariate calibration. Talanta 2006, 68 (5), 1505-1511. ISSN 0039-9140.
De Girolamo, A.; Cervellieri, S.; Mancini, E.; Pascale, M.; Logrieco, A.F.; Lippolis, V. Rapid Authentication of 100% Italian Durum Wheat Pasta by FT-NIR Spectroscopy Combined with Chemometric Tools. Foods 2020, 9, 1551. https://doi.org/10.3390/foods9111551