Techniques Thoughtworks Technology Radar 21. Synthetic data for testing models Assess During our discussions for this edition of the Radar, several tools and applications for synthetic data generation came up. As the tools mature, we’ve found that using synthetic data for testing models is a powerful and broadly useful technique. Although not intended as a substitute for real data in validating the discrimination power of machine-learning models, synthetic data can be used in a variety of situations. For example, it can be used to guard against catastrophic model failure in response to rarely occurring events or to test data pipelines without exposing personally identifiable information. Synthetic data is also useful for exploring edge cases that lack real data or for identifying model bias. Some helpful tools for generating data include Faker or Synth, which generate data that conforms to desired statistical properties, and tools like Synthetic Data Vault that can generate data that mimics the properties of an input data set. 22. TinyML Assess We continue to be excited by the TinyML technique and the ability to create machine learning (ML) models designed to run on low-powered and mobile devices. Until recently, executing an ML model was seen as computationally expensive and, in some cases, required special-purpose hardware. While creating the models still broadly sits within this classification, they can now be created in a way that allows them to be run on small, low-cost and low-power consumption devices. If you’ve been considering using ML but thought it unrealistic because of compute or network constraints, then this technique is worth assessing. 23. Verifiable credentials Assess When we first included it in the Radar two years ago, verifiable credentials (VC) was an intriguing standard with some promising potential applications, but it wasn’t widely known or understood outside the community of enthusiasts. This was particularly true when it came to the credential-granting institutions, such as state governments, who would be responsible for implementing the standards. Two years and one pandemic later, the demand for cryptographically secure, privacy-respecting and machine-verifiable electronic credentials has grown and, as a result, governments are starting to wake up to VC’s potential. We’re now starting to see VC crop up in our work for public-sector clients. The W3C standard puts credential holders at the center, which is similar to our experience when using physical credentials: users can put their verifiable credentials in their own digital wallets and show them to anyone at any time without the permission of the credentials’ issuer. This decentralized approach also enables users to better manage and selectively disclose their own information which greatly improves data privacy protection. For example, powered by zero-knowledge proof technology, you can construct a verifiable credential to prove that you’re an adult without revealing your birthday. It’s important to note that although many VC-based decentralized identity solutions rely on blockchain technology, blockchain is not a prerequisite for all VC implementations. © Thoughtworks, Inc. All Rights Reserved. 18
