← 목록으로 돌아가기

AI for Genomic Science

AI for Genomic Science

Welcome to AI for Genomic Science

Author: Joon-Yong An, Korea University

Last Update: 2025/11/2 (Under Construction - probably weekly update by 2026 if I am not lazy enough..)

Front Cover


About This Book

This textbook introduces how artificial intelligence is revolutionizing biological research — from analyzing genetic variants to modeling entire cells. It is designed specifically for undergraduate biology students (senior level) who want to understand the computational approaches that are transforming genomics, without requiring prior experience in AI or advanced programming.

The field of genomics has been rapidly transformed by machine learning, deep learning, and large-scale computational methods. These advances now allow us to analyze massive genomic datasets, predict functional impacts of variants, and model complex biological systems with unprecedented accuracy. This textbook takes a biology-first approach to cover the essential AI concepts and methods that undergraduate students need to master, integrating computational techniques with genomic applications.

This book is written for biology majors who have a solid foundation in molecular biology and genetics, are curious about computational approaches, want to understand the why and how behind AI methods in genomics, and are comfortable with basic mathematics. Rather than assuming extensive programming background, we start from the basics and build up gradually, emphasizing conceptual understanding alongside practical applications.

Please note that chapters are currently being written and improved. The complete version is expected to be finished by 2026!


What Makes This Book Different?

🧬 Biology-First Approach

Each chapter starts with a real biological challenge—experimental limitations that motivate computational solutions. You’ll never wonder “why do I need to learn this?”

💻 Hands-On Coding Labs

Every chapter includes Google Colab-based coding exercises. No installation needed—just click and start learning! All code is heavily commented and designed for beginners.

🎯 Clear Mathematical Explanations

Math concepts are explained in Math Boxes with biological examples. We won’t shy away from equations, but we’ll make sure you understand what they mean.

📚 Real Case Studies

Learn from actual research papers and real datasets. See how these methods are being used to make biological discoveries right now.


What You’ll Learn

By the end of this book, you will be able to:

✅ Understand the fundamental concepts of machine learning and deep learning
✅ Explain how AI methods predict the effects of genetic variants
✅ Use pre-trained models to analyze genomic sequences
✅ Interpret results from tools like CADD, DeepSEA, Enformer, and DNABERT
✅ Understand how language models are applied to DNA and RNA sequences
✅ Analyze single-cell omics data using foundation models
✅ Critically evaluate AI-based studies in genomics literature
✅ Write basic Python code for bioinformatics analyses


Book Structure

This textbook is organized into five parts:

Part 1: Foundations of Artificial Intelligence

Learn the essential AI concepts every biologist should know—from neural networks to different architectures.

Part 2: Genetic Variants and Early AI Approaches

Understand how traditional and machine learning methods help us prioritize and interpret genetic variants.

Part 3: Deep Learning for Genomic Sequence Analysis

Explore how convolutional neural networks and transformers predict regulatory elements and variant effects from DNA sequences.

Part 4: Language Models Meet DNA

Discover how natural language processing techniques are revolutionizing genomics through DNA language models and foundation models.

Part 5: Single-Cell Omics and Foundation Models

See how AI is helping us understand individual cells and move toward whole-cell computational models.


How to Use This Book

For Self-Study:

  1. Read each chapter sequentially—concepts build on each other
  2. Work through the Coding Labs in Google Colab
  3. Try the Discussion Questions to deepen your understanding
  4. Explore the Further Reading for topics that interest you

For Classroom Use:

  • Each chapter is designed for 1-2 weeks of instruction
  • Coding labs can be used as homework or in-class activities
  • Discussion questions are great for group work
  • Case studies make excellent presentation topics

Prerequisites:

  • Biology: Molecular biology, genetics (sophomore/junior level)
  • Math: Algebra, basic statistics (we’ll review when needed)
  • Programming: None required! We start from scratch
  • Enthusiasm: Essential! 🎉

Getting Started

Setting Up Your Environment

All coding exercises use Google Colab, which runs in your web browser. You’ll need:

  1. A Google account (free)
  2. Internet connection
  3. That’s it!

No software installation required. We’ll walk you through everything in Chapter 1.


Quick Navigation

Part 1: Foundations of AI

Part 2: Genetic Variants and Early AI

  • [Chapter 5: Genetic Variation and Genomic Technologies (Soon!)]
  • [Chapter 6: Evolutionary Conservation and Traditional Tools (Soon!)]
  • [Chapter 7: Machine Learning Ensemble Methods (Soon!)]

Part 3: Deep Learning for Genomic Sequences

  • [Chapter 8: The Rise of Deep Learning in Genomics (Soon!)]
  • [Chapter 9: CNN-Based Regulatory Sequence Analysis (Soon!)]
  • [Chapter 10: Transformer-Based Models (Soon!)]

Part 4: Language Models Meet DNA

  • [Chapter 11: Introduction to Language Models (Soon!)]
  • [Chapter 12: Foundation Models for Genomics (Soon!)]
  • [Chapter 13: DNA Language Models (Soon!)]
  • [Chapter 14: Next-Generation DNA Models (Soon!)]

Part 5: Single-Cell Omics and Foundation Models

  • [Chapter 15: Introduction to Single-Cell Omics (Soon!)]
  • [Chapter 16: Single-Cell Foundation Models (Soon!)]
  • [Chapter 17: Toward Whole-Cell Modeling (Soon!)]

Happy Learning! 🧬🤖


License

License information to be added