AI, particularly generative AI and large language models (LLMs), has made tremendous technical strides and is reaching the inflection point of widespread industry adoption. With McKinsey reporting that AI high-performers are already going “all in on artificial intelligence,” companies know they must embrace the latest AI technologies or be left behind.
However, the field of AI safety is still immature, which poses an enormous risk for companies using the technology. Examples of AI and machine learning (ML) going rogue are not hard to come by. In fields ranging from medicine to law enforcement, algorithms meant to be impartial and unbiased are exposed as having hidden biases that further exacerbate existing societal inequalities with huge reputational risks to their makers.
Microsoft’s Tay Chatbot is perhaps the best-known cautionary tale for corporates: Trained to speak in conversational teenage patois before being retrained by internet trolls to spew unfiltered racist misogynist bile, it was quickly taken down by the embarrassed tech titan — but not before the reputational damage was done. Even the much-vaunted ChatGPT has been called “dumber than you think.”
Corporate leaders and boards understand that their companies must begin leveraging the revolutionary potential of gen AI. But how do they even start to think about identifying initial use cases and prototyping when operating in a minefield of AI safety concerns?
The answer lies in focusing on a class of use cases I call “Needle in a Haystack” problems. Haystack problems are ones where searching for or generating potential solutions is relatively difficult for a human, but verifying possible solutions is relatively easy. Due to their unique nature, these problems are ideally suited for early industry use cases and adoption. And, once we recognize the pattern, we realize that Haystack problems abound.
Here are some examples:
1: Copyediting
Checking a lengthy document for spelling and grammar mistakes is hard. While computers have been able to catch spelling mistakes since the early days of Word, accurately finding grammar mistakes proved elusive until the advent of gen AI, and even gen AI tools often incorrectly flag perfectly valid phrases as ungrammatical.
We can see how copyediting fits within the Haystack paradigm. It may be hard for a human to spot a grammar mistake in a lengthy document, but once an AI identifies a potential error, it is easy for a human to verify whether it is indeed a mistake. This last step is critical, because even modern AI-powered tools are imperfect. Services like Grammarly are already exploiting LLMs to do this.
2: Writing boilerplate code
One of the most time-consuming aspects of writing code is learning the syntax and conventions of a new API or library. The process is heavy in researching documentation and tutorials, and is repeated by millions of software engineers every day. Leveraging gen AI trained on the collective code written by these engineers, services like GitHub Copilot and Tabnine have automated the tedious step of generating boilerplate code on demand.
This problem fits well within the Haystack paradigm. While it is time-consuming for a human to do the research needed to generate working code in an unfamiliar library, verifying that the code works correctly is relatively easy (for example, by running it). Finally, as with other AI-generated content, engineers must further verify that code works as intended before shipping it to production.
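The generate-then-verify pattern described above can be sketched in a few lines of Python. This is a minimal illustration, not any vendor's actual workflow: the `CANDIDATE_SOURCE` snippet stands in for code returned by an AI assistant, and `slugify`, `load_candidate`, `verify` and `CASES` are all hypothetical names invented for this example. The point is only that a handful of human-written checks is cheap compared with researching and writing the code from scratch.

```python
from typing import Callable, List, Tuple

# Hypothetical stand-in for a snippet returned by an AI coding assistant.
CANDIDATE_SOURCE = '''
def slugify(title):
    return "-".join(title.lower().split())
'''


def load_candidate(source: str, name: str) -> Callable:
    """Execute the generated source and return the named function.

    Illustration only: real AI-generated code should run in a sandbox.
    """
    namespace: dict = {}
    exec(source, namespace)
    return namespace[name]


def verify(func: Callable, cases: List[Tuple[tuple, object]]) -> list:
    """Run human-written checks and return a list of failures (empty means pass)."""
    failures = []
    for args, expected in cases:
        try:
            actual = func(*args)
        except Exception as exc:  # a crash counts as a failed check
            failures.append((args, expected, repr(exc)))
            continue
        if actual != expected:
            failures.append((args, expected, actual))
    return failures


# The easy, human-owned part: a few input/output pairs the engineer expects.
CASES = [
    (("Hello World",), "hello-world"),
    (("  Needle in a   Haystack ",), "needle-in-a-haystack"),
]

slugify = load_candidate(CANDIDATE_SOURCE, "slugify")
failures = verify(slugify, CASES)
print("accepted" if not failures else f"rejected: {failures}")
```

Passing checks like these shows only that the code runs and handles the listed cases. The engineer still decides whether it is fit to ship.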
3: Searching scientific literature
Keeping up with scientific literature is a challenge even for trained scientists, as millions of papers are published annually. Yet, these papers offer a gold mine of scientific knowledge, with patents, drugs and inventions ready to be discovered if only their knowledge could be processed, assimilated and combined.
Particularly challenging are interdisciplinary insights, which require expertise in two often unrelated fields that few experts have mastered together. Fortunately, this problem also fits within the Haystack class: It is much easier to sanity-check potential novel AI-generated ideas by reading the papers from which they are drawn than to generate new ideas spread across millions of scientific works.
And, if AI can learn molecular biology roughly as well as it can learn mathematics, it will not be limited by the disciplinary constraints faced by human scientists. Products like Typeset are already a promising step in this direction.
Human verification critical
The critical insight in all the above use cases is that while solutions may be AI-generated, they are always human-verified. Letting AI directly speak to (or take action in) the world on behalf of a major enterprise is frighteningly risky, and history is replete with failures.
Having a human verify the output of AI-generated content is crucial for AI safety. Focusing on Haystack problems improves the cost-benefit analysis of that human verification. This lets the AI focus on solving problems that are hard for humans, while preserving the easy but critical decision-making and double-checking for human operators.
In these nascent days of LLMs, focusing on Haystack use cases can help companies build AI experience while mitigating potentially serious AI safety concerns.