Thoughtworks Technology Radar (44/47)

Languages and Frameworks 98. GGML Assess GGML is a C library for machine learning that allows for CPU inferencing. It defines a binary format for distributing large language models (LLMs). To do that it uses quantization, a technique that allows LLMs to run on consumer hardware with effective CPU inferencing. GGML supports a number of different quantization strategies (e.g., 4-bit, 5-bit, and 8-bit quantization), each of which offers different trade-offs between efficiency and performance. A quick way to test, run and build apps with these quantized models is a Python binding called C Transformers. This is a Python wrapper on top of GGML that takes away the boilerplate code for inferencing by providing a high level API. We’ve leveraged these libraries to build proof of concepts and experiments. If you’re considering self-hosted LLMs, carefully assess these community-supported libraries for your organization. 99. GPTCache Assess GPTCache is a semantic cache library for large language models (LLMs). We see the need for a cache layer in front of LLMs for two main reasons — to improve the overall performance by reducing external API calls and to reduce the cost of operation by caching similar responses. Unlike traditional caching approaches that look for exact matches, LLM-based caching solutions require similar or related matches for the input queries. GPTCache approaches this with the help of embedding algorithms to convert the input queries into embeddings and then use a vector datastore for similarity search on these embeddings. One drawback of such a design is that you may encounter false positives during cache hits or false negatives during cache misses, which is why we recommend you carefully assess GPTCache for your LLM-based applications. 100. Grammatical In昀氀ection API Assess In many languages, gender is more visible than in English, and words change based on gender. Addressing users, for example, can require words to be inflected, but it is common practice to use the masculine form by default. There is evidence that this has a negative impact on performance and attitude — and, of course, it’s impolite. Workarounds using gender-neutral language can often feel clumsy. Addressing users correctly, then, is the preferred option, and with the Grammatical Inflection API, introduced in Android 14, Android developers now have an easier path to do so. 101. htmx Assess htmx is a small, neat HTML UI library that recently became popular seemingly out of nowhere. During our Radar discussion, we found its predecessor intercooler.js existed ten years ago. Unlike other increasingly complex pre-compiled JavaScript/TypeScript frameworks, htmx encourages the direct use of HTML attributes to access operations such as AJAX, CSS transitions, WebSockets and Server- Sent Events. There’s nothing technically sophisticated about htmx, but its popularity recalls the simplicity of hypertext in the early days of the web. The project’s website also features some insightful (and amusing) essays on hypermedia and web development, which suggests the team behind htmx have thought carefully about its purpose and philosophy. © Thoughtworks, Inc. All Rights Reserved. 44

Thoughtworks Technology Radar Page 43 Page 45