There is more you can do with A.I. models than just automate tasks humans can already do.
June 17, 2024
Chemical structure of lovastatin, via Wikipedia
Much of the discussion about A.I. these days is about automating things that humans can do, but which they do slowly or unwillingly. Major tech platforms are rushing to incorporate task-automating A.I. into their services, with Apple being the latest company to announce a new suite of A.I. features. A.I. will write your emails, manage your calendar, edit your iPhone videos, make your graphics, book your travel, recommend social media comments, and write your reports — including, in some unfortunate cases, your scientific papers.
That’s fine as far as it goes, but with most stories about A.I. focused on automating routine tasks, it’s easy to miss the more transformative possibilities of A.I. in science. That’s why it was nice to see a piece by Steve Lohr in the New York Times on the possibilities of A.I. in drug discovery, in the form of a profile of the Monrovia, CA pharma start-up Terray Therapeutics. What Terray is doing illustrates some important broader themes about the value of A.I. in biomedical research, which go beyond merely automating human tasks.
Probably the single greatest challenge in drug discovery is the high failure rate. Tremendous resources are spent on candidate therapeutics that ultimately fail. According to a 2019 analysis, the probability of success from phase 1 trials to full approval is ~14%. We could bring down the high costs of failure if we could accurately predict which candidates are likely to fail before bringing them to phase 1 trials. That’s the kind of prediction A.I. could be good at. And notably, this is something an A.I. could do that is not simply automating a task that humans are already capable of.
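To see why even a modest predictive filter matters economically, here is a back-of-the-envelope sketch. Only the ~14% phase-1-to-approval rate comes from the analysis cited above; the per-trial cost unit and the filter's accuracy are hypothetical numbers chosen purely for illustration.

```python
# Back-of-the-envelope: expected trial spending per approved drug,
# with and without a predictive filter that screens out likely failures.
# Only the ~14% success rate comes from the cited 2019 analysis; the
# cost units and filter performance below are hypothetical.

def cost_per_approval(p_success, cost_per_trial):
    """Expected trial spending per approved drug."""
    return cost_per_trial / p_success

baseline = cost_per_approval(0.14, cost_per_trial=100.0)  # arbitrary cost units

# Suppose a model screens out half of the would-be failures while keeping
# every eventual success (an optimistic, hypothetical filter). Out of 100
# candidates: 14 successes and 86 failures -> 14 + 43 trials actually run.
filtered_p = 14 / (14 + 43)
filtered = cost_per_approval(filtered_p, cost_per_trial=100.0)

print(f"baseline cost per approval: {baseline:.0f} units")   # ~714 units
print(f"with predictive filter:     {filtered:.0f} units")   # ~407 units
```

Even this crude arithmetic shows the leverage: halving the failures that reach trials cuts the expected cost per approval by roughly 40%, without discovering a single new molecule.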
To bring A.I. to bear on drug discovery, Terray Therapeutics is developing platforms to generate “high quality data at scale” specifically meant to train A.I. models. That is, rather than traditional drug screening platforms designed to generate human-interpretable data (with the assistance of statistical tools, of course), Terray designs platforms that are good for training A.I. Scalability is key here – the NY Times piece highlights the company’s 32 million-microwell chips. The premise behind this approach is that to train A.I. models that are up to the job, you need to generate training datasets composed of large volumes of reproducible, high-quality data.
The power of A.I. models in all of their applications, whether it’s writing an email or picking out promising drugs, lies in their tremendous capacity for recognizing patterns in complex, high-dimensional data. A.I.’s pattern recognition ability is a significant advantage over more conventional statistical methods like p-value-based hypothesis testing. Pattern recognition isn’t just for prediction – it’s also the basis of generative models like chemical design language models, such as ChemCrow. Pattern recognition is of course also one of the main strengths of human intelligence, and humans have used their pattern-recognition skills to perform computer-aided drug design (PDF) for nearly half a century. But we can’t absorb multi-modal, quantitative datasets at the same scale as an A.I.
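As a toy illustration of the kind of pattern matching involved (not Terray’s actual method), here is a sketch that ranks candidate molecules by Tanimoto similarity to known actives over binary fingerprints — a standard measure in chemical screening. The bit vectors below are invented stand-ins; in practice the fingerprints would come from a cheminformatics tool such as RDKit, and a learned model would replace this hand-rolled similarity rule.

```python
# Toy similarity-based screening: rank candidates by Tanimoto similarity
# to known active molecules. The fingerprints here are invented stand-ins
# for real molecular fingerprints (e.g., Morgan fingerprints from RDKit).

def tanimoto(a, b):
    """Tanimoto similarity between two binary fingerprints (sets of on-bits)."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0

# Hypothetical fingerprints: each molecule is a set of "on" bit positions.
known_actives = {
    "active_1": {1, 4, 7, 9, 12},
    "active_2": {1, 4, 8, 9, 15},
}
candidates = {
    "cand_A": {1, 4, 7, 9, 13},    # shares many bits with active_1
    "cand_B": {2, 5, 10, 14, 16},  # shares nothing with either active
}

def best_score(fp, actives):
    """Score a candidate by its closest match among the known actives."""
    return max(tanimoto(fp, ref) for ref in actives.values())

ranked = sorted(candidates,
                key=lambda name: best_score(candidates[name], known_actives),
                reverse=True)
print(ranked)  # cand_A ranks ahead of cand_B
```

The point of the toy example is the shape of the problem: each molecule lives in a high-dimensional bit space, and the signal is a similarity pattern no human could eyeball across millions of wells — exactly the regime where trained models outrun both human inspection and simple hypothesis tests.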
The concept of designing experiments at scale whose purpose is to generate high-quality training data, rather than human-interpretable data, is something that will become more common in fields of science where large-scale experiments are feasible. How to produce optimal training datasets, particularly in genomics, is an active area of discussion, one we can have because we can, in many cases, do experiments at the scale needed.
At the same time, other groups are democratizing access to A.I. by building platforms that can give researchers off-the-shelf tools to create their own models. It can be as easy as ‘pip install torchdrug’. After that, all you need is the data.