Copenhagen Bioscience Snapshot - AI and Large Language Models in Biomedical Research

Dr. Jannis Born: Leveraging scientific language models for molecular discovery - IBM Research Europe - Zurich, Zurich, Switzerland

Abstract: The success of Transformers has extended into scientific domains, giving rise to the concept of “scientific language models” that operate on, for example, small molecules, proteins or polymers. In this talk, we exploit analogies between natural language and organic chemistry to develop language models that may accelerate molecular discovery across various stages.

We begin by developing conditional molecular generative models that leverage semantic context (e.g., protein targets) for flexible molecular design and demonstrate how such models can be integrated into a completely autonomous workflow that spans chemical synthesis planning tools and wet-lab synthesis on robotic hardware. We then address the apparent dichotomy between predictive and generative modeling in computational chemistry through the Regression Transformer (RT). The RT abstracts regression as a conditional sequence modeling problem, thus bridging predictive and generative tasks. The RT excels at property-driven, local chemical space exploration and has enabled the discovery of novel block copolymers for ring-opening polymerisation.

We then turn to natural language and propose a prompt-based multitask model that interfaces textual and molecular representations for various tasks (e.g., molecule captioning, text-based molecule design, reaction prediction or retrosynthesis). All presented methodology is open-sourced in the Generative Toolkit for Scientific Discovery (GT4SD), which provides a harmonized interface for researchers to train, fine-tune and deploy 30+ state-of-the-art molecular generative models.

Blending these elements, we conclude with a vision for the future of computational chemistry that combines a chatbot interface with existing chemistry databases and tools. Such a chatbot harnesses the remarkable capabilities of natural language models and integrates them with decades of research in computational chemistry, through techniques such as retrieval-augmented generation or multi-agent LLMs.

Bio: Jannis is a research scientist at IBM Research in Zurich, Switzerland. His current research interests are in AI4Science, Language Models and Quantum Machine Learning. Jannis obtained his PhD from ETH Zurich (D-BSSE) in 2022 for his work on language models for molecular design, performed at IBM Research. His PhD was advised by Matteo Manica (IBM), Prof. Karsten Borgwardt (ETH) and Prof. Alán Aspuru-Guzik. Before that, he completed an M.Sc. in Neural Systems & Computation at ETH/UZH Zurich (with distinction) in 2019 and a B.Sc. in Cognitive Science. He received the FXH Scientific Excellence Award in 2019 and has studied and researched in Germany, England, Singapore and Switzerland.

 

Prof. Ajasja Ljubetič: Using deep learning methods to design novel proteins - National Institute of Chemistry, Ljubljana, Slovenia

Abstract: Proteins are key molecules for life. Recent advances in artificial intelligence, including methods such as RFDiffusion, ProteinMPNN, and AlphaFold (versions 2 and 3), now enable the design of completely new proteins that do not exist in nature. These de novo proteins are extremely stable and well expressed, making them highly valuable for addressing contemporary challenges. Potential applications range from developing viral inhibitors and vaccines to creating more stable enzymes, biosensors, and molecular machines.

In this presentation, I will outline the fundamental workflow of de novo protein design, covering techniques for backbone generation, side-chain assignment, and design filtering. I will also highlight the work of my research group, focusing on our development of a de novo designed random walker system using these cutting-edge methodologies.

Bio: Research Assistant Prof. Ljubetič has extensive experience in synthetic biology and de novo protein design. He began his career working on protein origami (Ljubetič et al., Nat. Biotechnol., 2017), where he designed novel shapes that can be made from a single chain of connected coiled coils. Following this, he was awarded a prestigious Marie Skłodowska-Curie Global Fellowship and spent three years at the Institute for Protein Design in the Baker lab, one of the world's leading centers for de novo protein design.

Upon his return to Slovenia, Prof. Ljubetič established the Designed Dynamic Proteins Group at the National Institute of Chemistry (NIC), where his research focuses on the de novo design of molecular machines. He is also a passionate science communicator, having been recognized with the "Prometheus of Science Award" for excellence in scientific communication in 2023.

 

The event is free and open to all – register here:

https://www.tilmeld.dk/copenhagenbiosciencesnapshot-nov14

 

Program

15:00-16:00 – Dr. Jannis Born “Leveraging scientific language models for molecular discovery”

16:00-17:00 – Prof. Ajasja Ljubetič “Using deep learning methods to design novel proteins”

17:00-18:00 – Networking, drinks and snacks