
Tools

45. DVC
Adopt

DVC continues to be our tool of choice for managing experiments in data science projects. Because it is Git-based, it is familiar territory for developers and helps them bring engineering practices to the data science ecosystem. DVC’s opinionated view of a model checkpoint carefully encapsulates a training data set, a test data set, model hyperparameters and the code. By making reproducibility a first-class concern, it allows the team to time travel across various versions of the model. Our teams have successfully used DVC in production to enable continuous delivery for ML (CD4ML); it can be plugged in with any type of storage (including AWS S3, Google Cloud Storage, MinIO and Google Drive). However, with data sets getting bigger, file system–based snapshotting could become particularly expensive. When the underlying data is changing rapidly, DVC on top of a good versioned storage allows tracking model drift over a period of time. Our teams have effectively used DVC on top of data storage formats like Delta Lake, which optimizes versioning through copy-on-write (COW). A majority of our data science teams set up DVC as a day-zero task while they bootstrap a project; for this reason we’re happy to move it to Adopt.

46. Akeyless
Trial

As more organizations adopt cloud computing, many are starting to integrate multiple cloud providers simultaneously to maximize flexibility and minimize vendor lock-in. However, managing keys and access controls across multiple cloud providers can be a significant challenge, leading to increased complexity and security risks. Akeyless is a centralized, cloud-based platform that provides unified secrets management with a range of advantages for managing secrets and sensitive data. It integrates seamlessly with different providers, simplifying the management of secrets and access controls so that teams can monitor and control who has access to sensitive data. With encryption, access controls, multi-factor authentication and other security mechanisms, it ensures that only authorized users are able to access sensitive data. Additionally, it provides an intuitive interface for administration and monitoring, which makes for a less complex and more scalable developer and administration experience.

47. Apicurio Registry
Trial

Within any organization, API producers and consumers need to stay in sync about the schemas they use to communicate with each other. As the number of APIs and their producers and consumers grows, what may start as simply passing schemas around among teams will begin to hit scaling challenges. Faced with this issue, some of our teams have turned to Apicurio Registry, an open-source, centralized registry for various types of schemas and API artifacts, including OpenAPI specifications and Protobuf and Avro schemas. Apicurio Registry allows users to interact with it through a UI as well as a REST API and a Maven plugin. It also has the option to enforce schema evolution restrictions, such as backward compatibility. Moreover, when it comes to working with Kafka clients, Apicurio Registry is compatible with Confluent Schema Registry. While our teams have found Confluent Schema Registry’s documentation more helpful, Apicurio Registry meets their needs for a source of truth for various schemas.

© Thoughtworks, Inc. All Rights Reserved.
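
To make the backward-compatibility restriction mentioned in the Apicurio Registry entry more concrete, here is a minimal sketch of what such a check might look like for record-style schemas. It is not Apicurio Registry's implementation or API: the function `is_backward_compatible` and the `ORDER_*` sample schemas are hypothetical, and the rule is a simplified version of the usual Avro-style convention (a reader on the new schema must be able to decode data written with the old one). Type promotions, nested records and unions are deliberately ignored.

```python
from typing import Any, Dict


def is_backward_compatible(old_schema: Dict[str, Any], new_schema: Dict[str, Any]) -> bool:
    """Return True if a reader using new_schema could decode data written with old_schema.

    Simplified, hypothetical rule set (an assumption, not a registry's real logic):
      * a field that exists in both schemas must keep the same type;
      * a field that exists only in the new schema must declare a default;
      * a field dropped from the new schema is fine, the reader just ignores it.
    """
    old_fields = {field["name"]: field for field in old_schema["fields"]}

    for field in new_schema["fields"]:
        old_field = old_fields.get(field["name"])
        if old_field is None:
            # Old data never contains this field, so the reader needs a default.
            if "default" not in field:
                return False
        elif old_field["type"] != field["type"]:
            # Real Avro permits some promotions (e.g. int to long); this sketch does not.
            return False
    return True


# Hypothetical sample schemas, for illustration only.
ORDER_V1 = {
    "type": "record",
    "name": "Order",
    "fields": [
        {"name": "id", "type": "string"},
        {"name": "amount", "type": "double"},
    ],
}

# Adds a field with a default: old data can still be read.
ORDER_V2_OK = {
    "type": "record",
    "name": "Order",
    "fields": ORDER_V1["fields"] + [{"name": "currency", "type": "string", "default": "USD"}],
}

# Adds a required field without a default: old data cannot be read.
ORDER_V2_BAD = {
    "type": "record",
    "name": "Order",
    "fields": ORDER_V1["fields"] + [{"name": "currency", "type": "string"}],
}
```

Under these assumptions, `is_backward_compatible(ORDER_V1, ORDER_V2_OK)` accepts the change and `is_backward_compatible(ORDER_V1, ORDER_V2_BAD)` rejects it. A registry that enforces such a rule would refuse to register the second schema as a new version, so consumers are not broken by data already in flight.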

Thoughtworks Technology Radar, Vol 28