Languages and Frameworks

100. LangChain
Assess

LangChain is a framework for building applications with large language models (LLMs). These models have triggered a race to incorporate generative AI into a wide range of use cases. However, using LLMs in isolation may not be enough; to build an impactful product you have to combine them with your differentiated assets. LangChain fills this niche with some neat features, including prompt management, chaining, data-augmented generation and a rich set of agents to determine which actions to take and in what order. We expect more tooling and frameworks to evolve around LLMs, and we recommend assessing LangChain. A minimal sketch of its prompt and chaining API appears after these blips.

101. mljar-supervised
Assess

mljar-supervised is an AutoML Python package that assists with understanding and explaining tabular data. Our data science teams are excited about it and use it to automate exploratory data analysis. It abstracts the common way to preprocess data, construct machine learning (ML) models and perform hyperparameter tuning to find the best model. Explainability and transparency are important tenets, and that's where mljar-supervised shines: it lets you see exactly how the ML pipeline is constructed, with a detailed markdown report for each ML model. It's definitely an interesting AutoML package that's worth assessing for your ML needs; see the usage sketch below.

102. nanoGPT
Assess

nanoGPT is a framework for training and fine-tuning medium-sized generative pretrained transformers (GPTs). The author, Andrej Karpathy, references the "Attention Is All You Need" and OpenAI GPT-3 papers to build a GPT from scratch using PyTorch. With all the hype around generative AI, we want to highlight nanoGPT for its simplicity and its focus on clearly articulating the building blocks of the GPT architecture; a simplified version of its central attention block is sketched below.

103. pandera
Assess

In previous Radars, we've featured data validation and testing platforms like Great Expectations that can be used to validate assumptions and test the quality of incoming data used for training or classification. Sometimes, though, all you need is a simple code library to implement tests and quality checks directly in pipelines. pandera is a Python library for testing and validating data across a wide range of dataframe libraries such as pandas, Dask or PySpark. pandera can implement simple assertions about fields as well as hypothesis tests based on statistical models. The wide range of supported frame libraries means tests can be written once and then applied to a variety of underlying data formats. pandera can also be used to generate synthetic data for testing ML models; a short schema example follows.
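As promised above, a minimal sketch of LangChain's prompt management and chaining, using the PromptTemplate and LLMChain classes from the API current as of this volume (later releases have reorganized these modules). The tagline prompt and temperature are illustrative choices, and an OpenAI API key is assumed to be set in the environment.

```python
from langchain import LLMChain, PromptTemplate
from langchain.llms import OpenAI

# A managed prompt with a named input variable
prompt = PromptTemplate(
    input_variables=["product"],
    template="Write a one-line tagline for a company that makes {product}.",
)

# Chain the prompt to an LLM; assumes OPENAI_API_KEY is set in the environment
chain = LLMChain(llm=OpenAI(temperature=0.7), prompt=prompt)
print(chain.run(product="organic coffee"))
```

Chains like this compose: the output of one LLMChain can feed the input variables of the next, which is where the framework's value over raw API calls shows up.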
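The mljar-supervised workflow fits in a few lines. This sketch runs its AutoML class in Explain mode, which writes the per-model markdown reports mentioned above; the scikit-learn dataset and results_path are placeholder choices.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from supervised.automl import AutoML

# Placeholder dataset; any tabular X, y works
X, y = load_breast_cancer(return_X_y=True, as_frame=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# "Explain" mode favors exploratory analysis and transparency; a markdown
# report is written for every trained model under results_path
automl = AutoML(mode="Explain", results_path="automl_report")
automl.fit(X_train, y_train)
print(automl.predict(X_test)[:5])
```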
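To illustrate the building blocks nanoGPT articulates, here is a simplified, single-head version of the causal self-attention block at the heart of a GPT. This is a paraphrase for exposition, not nanoGPT's own code, which adds multiple heads, projection layers and dropout.

```python
import math
import torch
import torch.nn as nn
import torch.nn.functional as F

class CausalSelfAttention(nn.Module):
    """Single-head causal self-attention: the core block a GPT stacks."""

    def __init__(self, n_embd: int, block_size: int):
        super().__init__()
        self.query = nn.Linear(n_embd, n_embd)
        self.key = nn.Linear(n_embd, n_embd)
        self.value = nn.Linear(n_embd, n_embd)
        # Lower-triangular mask: position t may only attend to positions <= t
        self.register_buffer("mask", torch.tril(torch.ones(block_size, block_size)))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        B, T, C = x.shape  # batch, sequence length, embedding dim
        q, k, v = self.query(x), self.key(x), self.value(x)
        att = (q @ k.transpose(-2, -1)) / math.sqrt(C)         # (B, T, T) scores
        att = att.masked_fill(self.mask[:T, :T] == 0, float("-inf"))
        att = F.softmax(att, dim=-1)                           # attention weights
        return att @ v                                         # weighted values

# Smoke test: a batch of 2 sequences of length 8 with 32-dim embeddings
attn = CausalSelfAttention(n_embd=32, block_size=16)
print(attn(torch.randn(2, 8, 32)).shape)  # torch.Size([2, 8, 32])
```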
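Finally, a sketch of pandera's field-level assertions against a hypothetical orders table; the column names and checks are illustrative. With the optional hypothesis dependency installed, the same schema can also synthesize valid test data via schema.example().

```python
import pandas as pd
import pandera as pa

# Hypothetical orders table schema: one assertion per field
schema = pa.DataFrameSchema({
    "order_id": pa.Column(int, unique=True),
    "amount": pa.Column(float, pa.Check.gt(0)),
    "status": pa.Column(str, pa.Check.isin(["open", "shipped", "cancelled"])),
})

df = pd.DataFrame({
    "order_id": [1, 2],
    "amount": [10.5, 99.0],
    "status": ["open", "shipped"],
})

validated = schema.validate(df)  # raises pa.errors.SchemaError on violations
print(validated)
```

Because the schema is plain Python, the same validate call can sit directly inside a pipeline step, which is the lightweight niche pandera fills relative to a platform like Great Expectations.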