The Chemistry World article Why are computational chemists making up their data? discusses the concept of “synthetic data”, a term that appears to be in use in fields such as economics but that can cause confusion in chemistry:
Synthetic data is not just an input for AI models – it’s an output from them too. Jacobsen, in fact, previously defined synthetic data as data ‘generated by algorithms and for algorithms’, although chemists may not be concerned with having such a strict definition. Some of the techniques commonly used to create it are related to those used in making deepfakes. In the same way that deepfakers might ask their machines to generate realistic-looking faces and speech, chemists might prompt theirs to generate realistic-looking chemical structures.
Note that synthetic data is produced in good faith, not in order to cheat:
For scientists, faking or making up data has obvious connotations and, thanks to some high-profile cases of scientific misconduct, they’re generally not positive ones. Chemists may, for example, be aware of a 2022 case in which a respected journal retracted two papers by a Japanese chemistry group that were found to contain ‘manipulated or fabricated’ data. Or the case of Bengü Sezen, the Columbia University chemist who, during the 2000s, ‘falsified, fabricated and plagiarised’ data to get her work on chemical bonding published – including fixing her NMR spectra with correcting fluid.
‘Synthetic data’, unlike dishonestly made-up data, is created in a systematic way for legitimate reasons, however, usually by a machine – and for a variety of reasons. Synthetic data is familiar to machine learning experts, and increasingly to computational chemists, but relatively unknown to the wider chemistry community, as Keith Butler, a materials researcher who works with machine learning methods at University College London in the UK, acknowledges.
A very interesting topic!