Durum Wheat
Durum wheat (Triticum turgidum subsp. durum) is the second most cultivated species of wheat after common wheat, although it represents only 5% to 8% of global wheat production. 'Durum' is Latin for 'hard', and this species is the hardest of all wheats. This makes it ideal for pasta production, ensuring that the pasta retains its shape after cooking.

Data Collection

For this experiment, we purchased Caputo Semola Rimacinata durum wheat flour and Woolworths Plain flour. From these we filled 23 small plastic bags, each weighing 50 g, with different ratios of durum and plain flour.

Each of these 23 sample bags was scanned three or four times with one of our NIR spectrometers, resulting in a set of 71 NIR spectra.
We also purchased flour from two other Italian Semola Rimacinata brands, a New Zealand durum wheat flour (from a non-amber variety), and samples of 'High Grade' bread flour, spelt flour, and Italian 'doppio zero' soft flour from several brands. Their NIR absorbance spectra are shown here. With such an obvious difference between the spectra of the semola rimacinata brands and the 'bread' wheat flours obtained from local supermarkets, we should be able to build a reasonably accurate model to predict mixtures of the two types of flour.


Model Building Results
We took the 71 spectra and built a model to predict the percentage of durum flour in mixtures of amber durum and plain wheat flour. Since multiple spectra were taken from the same sample bag, we used group-wise cross-validation to generate the following plot. Each red dot is a prediction made for one of the 71 spectra. The model shows good predictive capability, with
- Coefficient of Determination (R2) of 99.6%
- Mean Absolute Error (MAE) of 1.6%, and
- Root Mean Squared Error (RMSE) of 2.1%.

Italian law allows semola rimacinata flour to contain up to 3% non-durum wheat. Therefore any practical application of NIR to detect possible adulteration of durum wheat flour should probably focus on small amounts of adulteration, up to about 10%. For this reason, it's of interest to examine the accuracy of our model in the 85% to 100% range.

With the exception of scans from two bags (90% durum and 98% durum), the group-wise cross-validation results continue to be strong in this range, with a Mean Absolute Error of 1.3% and a Root Mean Squared Error of 1.7%.
What Is Cross Validation?
Cross-Validation is a statistical technique used to evaluate the performance of machine learning models by ensuring they generalise well to unseen data. Our results above used a group-wise form of K-Fold Cross-Validation. Here's how the basic K-Fold procedure works:
- Splitting the Dataset: The dataset is divided into k equal-sized subsets or "folds."
- Training and Testing: In each iteration, k-1 folds are used for training, and the remaining fold is used for testing.
- Repeating the Process: This process is repeated k times, with each fold serving as the test set once.
- Averaging Results: The performance metrics from each iteration are averaged to provide a comprehensive evaluation of the model's performance.
In this example, since the training set is so small, we have set k=23, the number of bags of flour mixtures. That is, at each train/test step we set aside the scans from one bag, and used them to validate a model built on the scans from the other 22 bags.
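The splitting step above can be sketched in plain Python. This is a minimal illustration only: the indices stand in for our 23 bags, and a real calibration would train on the spectra themselves.

```python
# Sketch of K-Fold splitting. With k equal to the number of items,
# each fold holds exactly one item (leave-one-out).

def k_fold_splits(n_items, k):
    """Yield (train_indices, test_indices) pairs for K-Fold cross-validation."""
    indices = list(range(n_items))
    fold_size = n_items // k
    extras = n_items % k  # the first `extras` folds get one extra item
    start = 0
    for fold in range(k):
        size = fold_size + (1 if fold < extras else 0)
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        start += size
        yield train, test

# With k = 23 and 23 bags, each of the 23 folds tests on a single bag.
splits = list(k_fold_splits(23, 23))
```

Every index appears in exactly one test fold, and the corresponding training fold contains the other 22 bags.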
Group-Wise Cross-Validation
Group-wise cross-validation is designed for situations where the data has natural groupings or dependencies. For example, in a medical study, one might have multiple measurements from the same patient. Group-wise cross-validation ensures that all data points from the same group are kept together in the same fold. This prevents data leakage, where information from one sample in a group influences the model's predictions on other samples in the same group.
Sagitto emphasises the importance of group-wise cross-validation to customers who use our free Calibration Benchmarking service. Otherwise it would be very easy to give a falsely optimistic comparison between models built using Sagitto's machine learning techniques, and our customers' existing calibration models built using software provided by instrument manufacturers.
In this flour example, scans of the same bag of flour mixture are clearly going to be related. If we remove the restriction of group-wise cross-validation on our data, we can see that it gives a much more favourable result: MAE goes from 1.6% to 1.2%, and RMSE goes from 2.1% to 1.6%. However, this is likely to be an over-optimistic view of how the model would behave in future use.
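The idea of keeping each group's scans together can be sketched as follows. The `bag_ids` list is hypothetical: it just records which bag each scan came from.

```python
# Group-wise splitting: every scan from the same bag stays in the same fold.

def leave_one_group_out(bag_ids):
    """Yield (train_scan_indices, test_scan_indices), one fold per unique bag."""
    groups = sorted(set(bag_ids))
    for g in groups:
        test = [i for i, b in enumerate(bag_ids) if b == g]
        train = [i for i, b in enumerate(bag_ids) if b != g]
        yield train, test

# e.g. seven scans taken from three bags (invented for illustration)
bag_ids = [0, 0, 1, 1, 1, 2, 2]
folds = list(leave_one_group_out(bag_ids))
```

Note that scans 0 and 1 (both from bag 0) are always tested together, so the model being validated has never seen any scan from that bag during training.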

Coefficient of Determination (R2)
The Coefficient of Determination, commonly signified by R2, tells us how much of the variance in the dependent variable (in this case, % durum flour) is explained by the independent variables (the NIR spectra that we have gathered from scanning each bag).
The value of R2 ranges from 0 to 1:
- An R2 of 1 indicates that the regression predictions perfectly fit the data.
- An R2 of 0 indicates that the model does not explain any of the variance in the dependent variable.
Mean Absolute Error (MAE)
Mean absolute error is exactly what it says: the mean of the absolute error values, regardless of whether the error is above or below the true value.
Root Mean Squared Error (RMSE)
Root Mean Squared Error (RMSE) is another commonly used metric to measure the accuracy of a regression model. It represents the square root of the average of the squared differences between the predicted values and the actual values. Like MAE, it quantifies how much the predicted values deviate from the actual values. However it gives greater weight to larger errors, which can be useful in applications where very large errors - even if uncommon - are very undesirable.
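As a concrete illustration, here is how all three metrics are computed from a set of predictions. The values below are invented for the example; they are not our actual cross-validation results.

```python
import math

def r2_mae_rmse(y_true, y_pred):
    """Return (R2, MAE, RMSE) for a list of actual and predicted values."""
    n = len(y_true)
    errors = [p - t for p, t in zip(y_pred, y_true)]
    mae = sum(abs(e) for e in errors) / n           # mean of |error|
    rmse = math.sqrt(sum(e * e for e in errors) / n)  # sqrt of mean squared error
    mean_true = sum(y_true) / n
    ss_res = sum(e * e for e in errors)               # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    r2 = 1 - ss_res / ss_tot
    return r2, mae, rmse

# illustrative % durum values (hypothetical, not the 71 real spectra)
y_true = [0, 10, 50, 90, 100]
y_pred = [1, 12, 48, 93, 99]
r2, mae, rmse = r2_mae_rmse(y_true, y_pred)  # mae = 1.8, rmse ≈ 1.95
```

Because RMSE squares each error before averaging, it always comes out at or above the MAE for the same predictions.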

Let's take our durum wheat flour example, and look at the three largest errors in our model. If just these three predictions (made for the 90% and 98% durum bags) had been much closer to their true values, we would have seen a considerable reduction in the RMSE value: from 1.70% to 1.20% or less.
Conclusion
We've built a model to illustrate:
- the importance of using group-wise cross-validation when samples are not independent; and
- three of the metrics that we generate each time we build a new regression model, or rebuild an existing one.
We have shown how using regular cross-validation when there is data dependency between samples can lead to falsely optimistic results. And while R2 is the best-known measure of how good a regression model is, and MAE is easy to understand, Sagitto also closely monitors RMSE when it evaluates the performance metrics of new models.
Note: This example has used a model that measures, with remarkably good accuracy, the percentage of durum wheat flour in a mixture of two flours. It should be emphasised that this model is very specific, not just to two types of wheat flour, but to two particular samples from two manufacturers. It does not pretend to be anything more.
References
Cocchi, M.; Durante, C.; Foca, G.; Marchetti, A.; Tassi, L.; Ulrici, A. Durum wheat adulteration detection by NIR spectroscopy multivariate calibration. Talanta 2006, 68 (5), 1505-1511. ISSN 0039-9140.
De Girolamo, A.; Cervellieri, S.; Mancini, E.; Pascale, M.; Logrieco, A.F.; Lippolis, V. Rapid Authentication of 100% Italian Durum Wheat Pasta by FT-NIR Spectroscopy Combined with Chemometric Tools. Foods 2020, 9, 1551. https://doi.org/10.3390/foods9111551