Brian Sunter

Best resources for studying machine learning

Overview

I’m very eager to start studying AI, but with so much to learn, I’m not sure where to start. Do I need a lot of math? What kind? Which areas should I focus on? How can I make sense of all the topics? What tools should I use? Additionally, how is the field developing, and what direction is it headed in?

I’m a software engineer with some math under my belt, and my goal is to gain a thorough understanding of AI so I can apply it to my work. I’m particularly interested in generative AI, natural language processing, and building intelligent agents. I also want to gain the necessary math skills to understand the fundamentals, but I don’t want to get too bogged down in some of the advanced mathematical details.

To help me on this journey, I’ve collected and summarized the best courses and books to help me get started on the right foot. I included the topics covered by each resource, so the list got pretty long, but it gives me a better idea of what to prioritize and how things fit together.

These materials are roughly in the order I plan to study them, though some of the materials at the end are pretty advanced. I want to start by getting a practical overview, then go deep on the math and fundamentals.

I plan on starting with some of the most popular courses, like Andrew Ng’s Deeplearning.ai courses and the fast.ai courses.

Next, I plan to take a few other high-quality courses, like the 3blue1brown math YouTube courses, the HuggingFace course, and Andrej Karpathy’s Neural Network courses.

Then I plan to study a variety of O’Reilly books focused on practical topics.

Finally, I plan to study more in depth materials, such as a Berkeley AI course, some math textbooks, and some of the famous AI textbooks.

Deeplearning.ai Intro Course

Introductory course by Andrew Ng covering practical machine learning topics using Python

Time: 2.5 months (5 hours/week)

Topics

Supervised learning

Linear regression

Logistic regression

Neural networks

Decision trees

Tree Ensembles

Unsupervised learning

Clustering

Dimensionality reduction

Recommender systems

Anomaly detection

Tools

Python

numpy

scikit-learn

Tensorflow

XGBoost

Best Practices

Regularization to Avoid Overfitting

Evaluating and tuning models

Improving performance
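
Regularization to avoid overfitting, one of the best practices above, is easy to see in code. Here’s a minimal sketch (plain NumPy, my own toy example rather than anything from the course) using ridge regression, where a penalty term shrinks the weights:

```python
import numpy as np

rng = np.random.default_rng(0)

# Noisy linear data: y = 3x + noise, plus a redundant quadratic feature
x = rng.uniform(-1, 1, size=40)
X = np.column_stack([x, x**2])          # two features; only the first matters
y = 3 * x + rng.normal(0, 0.3, size=40)

def ridge_fit(X, y, lam):
    """Closed-form ridge regression: w = (X^T X + lam*I)^-1 X^T y."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(n_features), X.T @ y)

w_plain = ridge_fit(X, y, lam=0.0)   # ordinary least squares
w_ridge = ridge_fit(X, y, lam=5.0)   # penalty pulls the weights toward zero

print(np.linalg.norm(w_ridge) < np.linalg.norm(w_plain))  # True
```

The same idea underlies the regularization options in scikit-learn and TensorFlow covered by the course; the penalty strength is the knob you tune when evaluating models.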

Deeplearning.ai Deep Learning Course

Course Link

Practical intermediate deep learning course by Andrew Ng

Topics

Tensorflow

Artificial Neural Networks

Convolutional Neural Networks

Recurrent Neural Networks

Transformers

Python Programming

Deep Learning

Backpropagation

Optimization

Hyperparameter Tuning

Machine Learning

Transfer Learning

Multi-Task Learning

Object Detection and Segmentation

Facial Recognition System

Gated Recurrent Unit (GRU)

Long Short Term Memory (LSTM)

Attention Models

Natural Language Processing

Practical Deep Learning Fast.ai

A free course designed for people with some coding experience who want to learn how to apply deep learning and machine learning to practical problems.

Course Link

Topics

Deployment

Neural net foundations

Natural Language (NLP)

From-scratch model

Random forests

Collaborative filtering

Convolutions (CNNs)

Deeplearning.ai Natural Language Course

Course Link

How to design NLP applications that perform question-answering and sentiment analysis, create tools to translate languages, summarize text, and even build chatbots.

Time: 4 months (6 hours/week)

Topics

Sentiment Analysis

Transformers

Attention Models

Machine Translation

Word2vec

Word Embeddings

Locality-Sensitive Hashing

Vector Space Models

Parts-of-Speech Tagging

N-gram Language Models

Autocorrect

Sentiment with Neural Networks

Siamese Networks

Natural Language Generation

Named Entity Recognition (NER)

Reformer Models

Neural Machine Translation

Chatbots

T5 + BERT Models
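
N-gram language models, one of the topics above, fit in a few lines of plain Python. This sketch (a made-up mini-corpus of mine, not course material) shows how a bigram model estimates P(word | previous word) from counts:

```python
from collections import Counter

corpus = "the cat sat on the mat the cat ran".split()

# Count bigram occurrences and the contexts (first words) they condition on
bigrams = Counter(zip(corpus, corpus[1:]))
contexts = Counter(corpus[:-1])

def bigram_prob(w1, w2):
    """Maximum-likelihood estimate of P(w2 | w1) from the corpus."""
    return bigrams[(w1, w2)] / contexts[w1]

print(bigram_prob("the", "cat"))  # 0.6666666666666666
```

Two of the three non-final occurrences of “the” are followed by “cat”, hence 2/3; real models add smoothing so unseen bigrams don’t get probability zero.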

Deeplearning.ai TensorFlow Data and Deployment Course

Course Link

Learn how to get your machine learning models into the hands of real people on all kinds of devices. Start by understanding how to train and run machine learning models in browsers and in mobile applications. Learn how to leverage built-in datasets with just a few lines of code, learn about data pipelines with TensorFlow data services, use APIs to control data splitting, process all types of unstructured data and retrain deployed models with user data while maintaining data privacy.

Time: 4 months (3 hours/week)

Topics

Tensorflow

Object Detection

JavaScript

Convolutional Neural Network

Tensorflow.js

Tensorflow Lite

Mathematical Optimization

Extraction, Transformation And Loading (ETL)

Data Pipelines

Deeplearning.ai Generative Adversarial Networks Course

Course Link

Introduction to image generation with GANs, charting a path from foundational concepts to advanced techniques through an easy-to-understand approach.

Time: 3 months (8 hours/week)

Topics

Generator

Image-to-Image Translation

Glossary of Computer Graphics

Discriminator

Generative Adversarial Networks

Controllable Generation

WGANs

Conditional Generation

Components of GANs

DCGANs

Bias in GANs

StyleGANs

Deeplearning.ai TensorFlow Advanced

Course Link

Expand your knowledge of the Functional API and build exotic non-sequential model types. You will learn how to optimize training in different environments with multiple processors and chip types and get introduced to advanced computer vision scenarios such as object detection, image segmentation, and interpreting convolutions. You will also explore generative deep learning including the ways AIs can create new content from Style Transfer to Auto Encoding, VAEs, and GANs.

Time: 5 months (6 hours/week)

Topics

Model Interpretability

Custom Training Loops

Custom and Exotic Models

Generative Machine Learning

Object Detection

Functional API

Custom Layers

Custom and Exotic Models with Functional API

Custom Loss Functions

Distribution Strategies

Basic Tensor Functionality

GradientTape for Optimization

Deeplearning.ai MLOps Course

Course Link

How to conceptualize, build, and maintain integrated systems that continuously operate in production.

Time: 4 months (5 hours/week)

Topics

Data Pipelines

Model Pipelines

Deploy Pipelines

Managing Machine Learning Production systems

ML Deployment Challenges

Project Scoping and Design

Concept Drift

Model Baseline

Human-level Performance (HLP)

TensorFlow Extended (TFX)

ML Metadata

Data transformation

Data augmentation

Data validation

AutoML

Precomputing predictions

Fairness Indicators

Explainable AI

Model Performance Analysis

TensorFlow Serving

Model Monitoring

General Data Protection Regulation (GDPR)

Model Registries

Deeplearning.ai Data Science on AWS Course

Course Link

Develop the practical skills to effectively deploy your data science projects and overcome challenges at each step of the ML workflow using Amazon SageMaker.

Time: 3 months (5 hours/week)

Topics

Automated Machine Learning (AutoML)

Natural Language Processing with BERT

ML Pipelines and ML Operations (MLOps)

A/B Testing, Model Deployment, and Monitoring

Data Labeling at Scale

Data Ingestion

Exploratory Data Analysis

Statistical Data Bias Detection

Multi-class Classification with FastText and BlazingText

Feature Engineering and Feature Store

Model Training, Tuning, and Deployment with BERT

Model Debugging, Profiling, and Evaluation

ML Pipelines and MLOps

Artifact and Lineage Tracking

Distributed Model Training and Hyperparameter Tuning

Cost Savings and Performance Improvements

Human-in-the-Loop Pipelines

Huggingface Course

Course Link

This course will teach you about natural language processing (NLP) using libraries from the Hugging Face ecosystem — 🤗 Transformers, 🤗 Datasets, 🤗 Tokenizers, and 🤗 Accelerate — as well as the Hugging Face Hub.

Topics

Transformer Models

Fine-tuning a pretrained model

Sharing models and tokenizers

Datasets library

Tokenizers Library

Building and sharing demos

Optimizing for production

Huggingface Diffusion Models Class

Course Link

👩‍🎓 Study the theory behind diffusion models

🧨 Learn how to generate images and audio with the popular 🤗 Diffusers library

🏋️‍♂️ Train your own diffusion models from scratch

📻 Fine-tune existing diffusion models on new datasets

🗺 Explore conditional generation and guidance

🧑‍🔬 Create your own custom diffusion model pipelines

Topics

pytorch

Diffusers and diffusion models

Fine tuning

Stable Diffusion

Huggingface Deep Reinforcement Learning Course

Course Link

📖 Study Deep Reinforcement Learning in theory and practice.

🧑‍💻 Learn to use famous Deep RL libraries such as Stable Baselines3, RL Baselines3 Zoo, Sample Factory, and CleanRL.

🤖 Train agents in unique environments such as SnowballFight, Huggy the Doggo 🐶, MineRL (Minecraft ⛏️), and VizDoom (Doom), as well as classical ones such as Space Invaders and PyBullet.

💾 Share your trained agents with one line of code to the Hub and also download powerful agents from the community.

🏆 Participate in challenges where you will evaluate your agents against other teams. You’ll also get to play against the agents you’ll train.

Topics

Q-Learning

Policy Gradient with PyTorch

Actor Critic Methods

Proximal Policy Optimization

Multi-Agents

Decision Transformers

offline Reinforcement Learning
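
Q-learning, the first topic above, is worth seeing in miniature before the course. Here’s a sketch (plain Python, my own toy corridor environment rather than the course’s) of tabular Q-learning with epsilon-greedy exploration:

```python
import random

random.seed(0)

# A five-cell corridor: start at cell 0, reward +1 for reaching cell 4
N_STATES, GOAL = 5, 4
ACTIONS = [1, -1]  # move right or left (ties break toward "right")

Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
alpha, gamma, eps = 0.5, 0.9, 0.1

for _ in range(500):  # episodes
    s = 0
    while s != GOAL:
        # epsilon-greedy action selection
        if random.random() < eps:
            a = random.choice(ACTIONS)
        else:
            a = max(ACTIONS, key=lambda act: Q[(s, act)])
        s2 = min(max(s + a, 0), N_STATES - 1)
        r = 1.0 if s2 == GOAL else 0.0
        # Q-learning update: bootstrap from the best action in the next state
        best_next = 0.0 if s2 == GOAL else max(Q[(s2, b)] for b in ACTIONS)
        Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
        s = s2

# The learned greedy policy should move right in every non-goal cell
policy = {s: max(ACTIONS, key=lambda act: Q[(s, act)]) for s in range(GOAL)}
print(policy)  # {0: 1, 1: 1, 2: 1, 3: 1}
```

The course replaces this dictionary table with neural networks (Deep Q-Networks and beyond), but the update rule is the same.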

Andrej Karpathy Neural Networks Zero to Hero Course

This is the most step-by-step spelled-out explanation of backpropagation and training of neural networks. It only assumes basic knowledge of Python and a vague recollection of calculus from high school.

Topics

Backpropagation

pytorch

Multi-layer perceptron

Loss function

Gradient descent optimization

Bigrams

Vector normalization

Tensor broadcasting

Model smoothing

One-hot encodings

Vectorized loss

Embeddings

Hidden layers

Negative log likelihood loss

Cross entropy

Overfitting

Learning rate

Character embeddings

Sampling from models

Google colab

TanH activation function

Batch normalization

Forward pass activation statistics

Backward pass gradient

Kaiming init

Parameter activation

Gradient statistics

Batchnorm
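
The course’s spelled-out style can be previewed on a single neuron. This sketch (my own, not from the lectures) applies the chain rule by hand to train one tanh neuron with gradient descent:

```python
import math

# One neuron with a tanh activation: out = tanh(w*x + b)
x, w, b = 0.5, -0.8, 0.2
target = 1.0
lr = 0.1

for step in range(200):
    # forward pass
    z = w * x + b
    out = math.tanh(z)
    loss = (out - target) ** 2

    # backward pass: chain rule, written out term by term
    dloss_dout = 2 * (out - target)
    dout_dz = 1 - out ** 2          # derivative of tanh
    dloss_dw = dloss_dout * dout_dz * x
    dloss_db = dloss_dout * dout_dz * 1.0

    # gradient descent update
    w -= lr * dloss_dw
    b -= lr * dloss_db

print(round(loss, 4))
```

Everything in the course — multi-layer perceptrons, cross entropy, batchnorm — is this same forward/backward pattern applied to bigger computation graphs, with PyTorch automating the backward pass.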

3blue1brown YouTube courses

Neural Networks from the Ground Up

The basics of neural networks, and the math behind how they learn

Topics

Neural Networks

Gradient Descent

Backpropagation

Essence of Linear Algebra

An introduction to visualizing what matrices are really doing

Topics

Vectors

Linear Combinations

Span

Basis Vectors

Linear Transformation

Matrices

Matrix Multiplication

Three dimensional linear transformations

Determinant

Inverse Matrices

Column Space

Null Space

Nonsquare Matrices

Dot Product

Duality

Cross Products

Cramer’s Rule

Change of basis

Eigenvectors and Eigenvalues

Abstract Vector spaces
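
The eigenvector idea from this series translates directly into NumPy. A minimal sketch (my own example matrix) checking that A·v = λ·v holds for each eigenpair:

```python
import numpy as np

# An upper-triangular transformation of the plane (shear plus scale)
A = np.array([[3.0, 1.0],
              [0.0, 2.0]])

# Eigenvectors are the directions the transformation only stretches
eigenvalues, eigenvectors = np.linalg.eig(A)

for lam, v in zip(eigenvalues, eigenvectors.T):
    # A @ v should equal lam * v for every eigenpair
    assert np.allclose(A @ v, lam * v)

print(np.sort(eigenvalues))
```

For a triangular matrix the eigenvalues sit on the diagonal (here 2 and 3), which is a nice way to check intuition against the visual explanations in the videos.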

Essence of Calculus

Visual introductions to the core ideas of derivatives, integrals, limits and more

Topics

Derivative

Chain Rule

Product Rule

Euler’s Number

Implicit Differentiation

Limits

L’Hôpital’s rule

Epsilon Delta

Integration

Fundamental Theorem of Calculus

Higher Order Derivatives

Taylor Series

Probability

An assortment of introductory ideas in probability

Topics

Bayes Theorem

Binomial Distribution

Probability Density Functions
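
Bayes’ theorem from this series can be computed directly. A sketch using the classic medical-test setup (the numbers here are hypothetical) shows why a positive test can still mean a low probability of disease:

```python
# Bayes' theorem: 1% prevalence, 90% sensitivity, 9% false-positive rate
p_disease = 0.01
p_pos_given_disease = 0.9
p_pos_given_healthy = 0.09

# Total probability of a positive test (law of total probability)
p_pos = (p_pos_given_disease * p_disease
         + p_pos_given_healthy * (1 - p_disease))

# P(disease | positive) = P(positive | disease) * P(disease) / P(positive)
p_disease_given_pos = p_pos_given_disease * p_disease / p_pos

print(round(p_disease_given_pos, 3))  # 0.092
```

Even with a fairly accurate test, the low prior drags the posterior down to about 9% — the core intuition the video builds visually.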

Hands-On Machine Learning with Scikit-Learn and TensorFlow

Using concrete examples, minimal theory, and two production-ready Python frameworks—scikit-learn and TensorFlow—you gain an intuitive understanding of the concepts and tools for building intelligent systems. You’ll start with simple linear regression and progress to deep neural networks. With exercises in each chapter to help you apply what you’ve learned, all you need is programming experience to get started.

Book Link

Topics

Types of Machine Learning Systems

Supervised/Unsupervised Learning

Batch and Online Learning

Instance-Based Versus Model-Based Learning

Challenges of Machine Learning

End-to-End Machine Learning Project

Classification

Binary Classifier

Performance Measures

Cross validation

Confusion matrix

Multiclass classification

Training Models

Linear Regressions

Gradient Descent

Polynomial Regression

Learning Curves

Regularized Linear Models

Logistic Regression

Support Vector Machines

Linear SVM Classification

Soft Margin Classification

Nonlinear SVM Classification

Decision Function and Predictions

Training Objective

Quadratic Programming

The Dual Problem

Kernelized SVMs

Online SVMs

Decision Trees

Ensemble Learning and Random Forests

Voting Classifiers

Bagging and Pasting

Bagging and Pasting in Scikit-Learn

Out-of-Bag Evaluation

Random Patches and Random Subspaces

Random Forests

Extra-Trees

Feature Importance

Boosting

AdaBoost

Gradient Boosting

Dimensionality Reduction

PCA

Projection

Manifold Learning

Kernel PCA

LLE

Unsupervised Learning

Clustering

Gaussian Mixtures

Introduction to Artificial Neural Networks with Keras

From Biological to Artificial Neurons

Implementing MLPs with Keras

Fine-Tuning Neural Network Hyperparameters

Training Deep Neural Networks

Vanishing/Exploding Gradients Problems

Reusing Pretrained Layers

Faster Optimizers

Avoiding Overfitting Through Regularization

Custom Models and Training with TensorFlow

Using TensorFlow like NumPy

Customizing Models and Training Algorithms

TensorFlow Functions and Graphs

Loading and Preprocessing Data with TensorFlow

Data API

TFRecord

Preprocessing the Input Features

TF Transform

Deep Computer Vision Using Convolutional Neural Networks

Convolutional Layers

Pooling Layers

CNN Architectures

Implementing a ResNet-34 CNN Using Keras

Object Detection

Semantic Segmentation

Processing Sequences Using RNNs and CNNs

Recurrent Neurons and Layers

Training RNNs

Forecasting a Time Series

Handling Long Sequences

Natural Language Processing with RNNs and Attention

Generating Shakespearean Text Using a Character RNN

Sentiment Analysis

An Encoder–Decoder Network for Neural Machine Translation

Attention Mechanisms

Transformers

Representation Learning and Generative Learning Using Autoencoders and GANs

Stacked Autoencoders

Generative Adversarial Networks

Reinforcement Learning

Policy Search

Neural Network Policies

Policy Gradients

Q-Learning

TF-Agents Library

Training and Deploying TensorFlow Models at Scale

Serving a TensorFlow Model

Deploying a Model to a Mobile or Embedded Device

Training Models Across Multiple Devices
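
PCA, from the book’s dimensionality-reduction chapter, can be done by hand in NumPy. A sketch (synthetic correlated data of my own) computing principal components via the SVD of the centered data:

```python
import numpy as np

rng = np.random.default_rng(0)

# Correlated 2-D data: the second coordinate is mostly a copy of the first
x = rng.normal(0, 1, size=200)
data = np.column_stack([x, x + rng.normal(0, 0.1, size=200)])

# PCA by hand: center the data, then take the SVD
centered = data - data.mean(axis=0)
U, S, Vt = np.linalg.svd(centered, full_matrices=False)

# Fraction of variance explained by each principal component
explained = S ** 2 / np.sum(S ** 2)
print(explained.round(3))  # the first component carries nearly all the variance
```

Scikit-learn’s `PCA` class does essentially this (plus conveniences like whitening and randomized solvers), so the by-hand version is a good sanity check while reading the chapter.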

SQL for Data Analysis

Book Link

You’ll learn how to use both common and exotic SQL functions such as joins, window functions, subqueries, and regular expressions in new, innovative ways—as well as how to combine SQL techniques to accomplish your goals faster, with understandable code.

Topics

Databases

Preparing Data for Analysis

Data cleaning

Deduplication

Nulls

Shaping data

Time Series Data

Dates and time

Trends

Windows

Seasonality

Cohort Analysis

Retention

Related Cohort Analysis

Cross section analysis

Text Analysis

Anomaly Detection

Experiment Analysis

Complex Data Sets

Practical Statistics for Data Scientists

Book Link

Many data science resources incorporate statistical methods but lack a deeper statistical perspective. If you’re familiar with the R or Python programming languages and have some exposure to statistics, this quick reference bridges the gap in an accessible, readable format.

Topics

Rectangular Data

Data Frames

Estimates

Mean, median, mode, variability, percentile

Distribution and sampling

Bias, central limit theorem, standard error, resampling, confidence interval, normal distribution

Normal, long-tail, t, binomial, Poisson, and Weibull distributions

Statistical Experiments and Significance Testing

A/B testing, Hypothesis testing, null hypothesis

p-value, alpha, t-test, ANOVA, chi-square, multi-armed bandit

Regression and Prediction

Simple Linear regression

Multiple linear regression

Confidence and prediction intervals

Classification

Naive bayes

Discriminant analysis

Logistic regression

Imbalanced data

Statistical Machine Learning

KNN

Tree models

Bagging and Random Forest

Boosting

Unsupervised Learning

Principal Components Analysis

k-means clustering

Hierarchical clustering

Model Clustering

Scaling and categorical variables
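
Resampling, one of the book’s central themes, is easy to try immediately. A sketch (hypothetical data, standard library only) of a bootstrap percentile confidence interval for a mean:

```python
import random
import statistics

random.seed(42)

# A small, skewed sample (hypothetical data)
data = [1, 2, 2, 3, 3, 3, 4, 5, 8, 15]

# Bootstrap: resample with replacement many times and recompute the statistic
boot_means = sorted(
    statistics.mean(random.choices(data, k=len(data)))
    for _ in range(5000)
)

# 95% percentile confidence interval for the mean
lo, hi = boot_means[int(0.025 * 5000)], boot_means[int(0.975 * 5000)]
print(round(lo, 1), round(hi, 1))
```

No distributional assumptions are needed, which is why the book leans on resampling before introducing the classical formulas.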

Essential Math for Data Science

Book Link

Master the math needed to excel in data science, machine learning, and statistics. In this book, author Thomas Nield guides you through areas like calculus, probability, linear algebra, and statistics.

Topics

Calculus

Probability

Linear algebra

Vectors

Matrices

Matrix decomposition

statistics

p-values

Statistical significance

Linear regression

Logistic regression

Neural networks

SymPy

NumPy

scikit-learn

Data Science Career

Data Science from Scratch

Book Link

Get comfortable with the math and statistics at the core of data science, and with the hacking skills you need to get started as a data scientist. Packed with new material on deep learning, statistics, and natural language processing.

Topics

Python

Matplotlib

Linear Algebra

Vectors

Matrices

Statistics

Probability

Bayes Theorem

Distribution

Central Limit Theorem

Hypothesis and Inference

p-value

Confidence intervals

p-hacking

Bayesian inference

Gradient Descent

Scraping Data

Working with Data

Dataclasses

Rescaling

Cleaning

Dimensionality reduction

Machine learning

Modeling

Overfitting

Bias-variance

Feature extraction

k-nearest neighbors

Model

Dimensionality

Naive Bayes

Simple Linear Regression

Multiple Regression

Logistic Regression

Decision Tree

Neural Networks

Deep Learning

Clustering

Natural Language Processing

Network Analysis

Eigenvector

Directed graphs

Recommender Systems

Collaborative Filtering

Matrix Factorization

Databases and SQL

Mapreduce

Python

numpy

pandas

scikit-learn

visualization

Practical Natural Language Processing

Book Link

This book gives a comprehensive view of building real-world NLP applications. It covers the complete lifecycle of a typical NLP project, from data collection to deploying and monitoring the model. Some of these steps apply to any ML pipeline, while others are specific to NLP. The book also introduces task-specific case studies and domain-specific guides for building an NLP system from scratch.

Topics

NLP: A Primer

NLP Pipeline

Text Representation

Text Classification

Information Extraction

Chatbots

Topics in Brief

Social Media

E-Commerce and Retail

Healthcare, Finance, and Law

The End-to-End NLP Process

Deep Learning from Scratch

Book Link

Shows you how neural networks work using a first-principles approach. You’ll learn how to apply multilayer neural networks, convolutional neural networks, and recurrent neural networks from the ground up, building a thorough understanding of how neural networks work mathematically, computationally, and conceptually.

Topics

Math Foundations

Fundamentals

Deep Learning from Scratch

Extensions

Convolutional Neural Networks

Recurrent Neural Networks

PyTorch

Generative Deep Learning by David Foster

Book Link

Discover how to re-create some of the most impressive examples of generative deep learning models, such as variational autoencoders, generative adversarial networks (GANs), encoder-decoder models, and world models.

Topics

Generative Versus Discriminative Modeling

Probabilistic Generative Models

Deep Neural Networks

Convolutional Layers

Batch Normalization

Dropout Layers

Autoencoders

Variational Autoencoder

Using VAEs to Generate Faces

Generative Adversarial Networks

Oscillating Loss

Mode Collapse

Uninformative Loss

Hyperparameters

Discriminators

Wasserstein GAN

CycleGAN

Neural Style Transfer

LSTM Network

Stacked Recurrent Networks

Gated Recurrent Units

Bidirectional Cells

Encoder–Decoder Models

Music-Generating RNN

Reinforcement Learning

MDN-RNN

Controller Architecture

In-Dream Training

Transformer

ProGAN

Self-Attention GAN (SAGAN)

BigGAN

StyleGAN

Introducing MLOps

Book Link

Introduces the key concepts of MLOps to help data scientists and application engineers not only operationalize ML models to drive real business change but also maintain and improve those models over time. Through lessons based on numerous MLOps applications around the world, nine experts in machine learning provide insights into the five steps of the model life cycle—Build, Preproduction, Deployment, Monitoring, and Governance.

Topics

People of MLOps

Model Development

Data Sources and Exploratory Data Analysis

Feature Engineering and Selection

Training and Evaluation

Reproducibility

Productionalization and Deployment

Monitoring

Iteration and Life Cycle

Governance

Evaluating and Comparing Models

Adaptation from Development to Production Environments

Quality Assurance for Machine Learning

Reproducibility and Auditability

Machine Learning Security

Building ML Artifacts

Scaling Deployments

Model Degradation

Drift Detection in Practice

The Feedback Loop

Model Governance

Responsible AI

MLOps in Practice

Consumer Credit Risk Management

Marketing Recommendation Engines

Consumption Forecast

Introduction to Statistical Learning

Book link

Site

An accessible overview of the field of statistical learning, an essential toolset for making sense of the vast and complex data sets that have emerged in fields ranging from biology to finance to marketing to astrophysics in the past twenty years. This book presents some of the most important modeling and prediction techniques, along with relevant applications.

This book is targeted at statisticians and non-statisticians alike who wish to use cutting-edge statistical learning techniques to analyze their data. The text assumes only a previous course in linear regression and no knowledge of matrix algebra.

Easier than The Elements of Statistical Learning.

Topics

linear regression

classification

resampling methods

shrinkage approaches

tree-based methods

support vector machines

clustering

deep learning

survival analysis

multiple testing

naïve Bayes

generalized linear models

Bayesian additive regression trees

matrix completion

UC Berkeley CS188 Intro to AI

This introductory Berkeley course accompanies the “Artificial Intelligence: A Modern Approach” book and provides lectures and course materials.

Course Link

Topics

Uninformed Search

A* Search and Heuristics

Constraint Satisfaction Problems

Game Trees

Minimax

Expectimax

Markov Decision Processes

Reinforcement Learning

Probability

Markov Models

Hidden Markov Models

Bayes’ Nets

Decision Diagrams

Naive Bayes

Perceptrons

Kernels and Clustering

Advanced Applications: NLP, Games, Cars, Robotics, and Computer Vision
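
A* search, one of the first topics in the course, maps neatly onto a few lines of Python. A sketch (my own toy grid, not a course assignment) using a Manhattan-distance heuristic:

```python
import heapq

# A small grid: 0 = free, 1 = wall (hypothetical map)
grid = [
    [0, 0, 0, 0],
    [1, 1, 0, 1],
    [0, 0, 0, 0],
    [0, 1, 1, 0],
]
start, goal = (0, 0), (3, 3)

def manhattan(a, b):
    return abs(a[0] - b[0]) + abs(a[1] - b[1])

def a_star(start, goal):
    # Frontier ordered by f = g (cost so far) + h (admissible heuristic)
    frontier = [(manhattan(start, goal), 0, start)]
    best_g = {start: 0}
    while frontier:
        f, g, node = heapq.heappop(frontier)
        if node == goal:
            return g
        r, c = node
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nr < 4 and 0 <= nc < 4 and grid[nr][nc] == 0:
                if g + 1 < best_g.get((nr, nc), float("inf")):
                    best_g[(nr, nc)] = g + 1
                    f_new = g + 1 + manhattan((nr, nc), goal)
                    heapq.heappush(frontier, (f_new, g + 1, (nr, nc)))
    return None  # no path exists

print(a_star(start, goal))  # 6
```

Because Manhattan distance never overestimates the remaining cost on a grid, the first time the goal is popped from the frontier its cost is optimal — the key property the course proves for admissible heuristics.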

Artificial Intelligence: A Modern Approach

Book Link

Site Link

The de facto bible of artificial intelligence. It combines in-depth treatments of introductory and advanced concepts with historical background and accessible explanations. Including algorithms, code, and pseudo-code, the book sits at a level between master’s and PhD study.

It focuses on machine learning, deep learning, probabilistic programming, and multiagent systems, and includes sections on agents whose utility function is uncertain rather than known.

Topics

Problem-solving

Searching

Adversarial Search and Games

Constraint Satisfaction Problems

Knowledge, reasoning, and planning

Logical Agents

First-Order Logic

Knowledge Representation

Automated Planning

Uncertain knowledge and reasoning

Probabilistic Reasoning

Decision Making

Machine Learning

Learning from Example

Learning Probabilistic Models

Deep Learning

Reinforcement Learning

Communicating, perceiving, and acting

Natural Language Processing

Deep Learning for NLP

Computer Vision

Robotics

An Introduction to Probability and Inductive Logic

Book Link

A book focused on probability and logic from a philosophical rather than mathematical perspective.

The book has been designed to offer maximal accessibility to the widest range of students (not only those majoring in philosophy) and assumes no formal training in elementary symbolic logic. It offers a comprehensive course covering all basic definitions of induction and probability, and considers such topics as decision theory, Bayesianism, frequency ideas, and the philosophical problem of induction.

Probability for the Enthusiastic Beginner

This book is a resource for high school and college students learning about probability for the first time. It covers all of the standard introductory topics, such as combinatorics, the rules of probability, Bayes’ theorem, and expectation value, and includes 150 worked-out problems. Calculus is not required, although some problems involve it. It can be used as a main text or supplement in an introductory probability course.

Book Link

Topics

Combinatorics

Bayes Theorem

Stirling’s Formula

Expected values

Variance

Standard deviation

Distributions

Uniform

Bernoulli

Binomial

Exponential

Poisson

Gaussian

Gaussian approximations

Law of large numbers

Central limit theorem

Correlation and regression

Elements of Statistical Learning

Book Link

This book describes the important ideas in areas such as data mining, machine learning, and bioinformatics in a common conceptual framework. While the approach is statistical, the emphasis is on concepts rather than mathematics. Many examples are given, with a liberal use of color graphics. It should be a valuable resource for statisticians and anyone interested in data mining in science or industry.

Topics

Overview of supervised learning

Linear methods for regression

Linear methods for classification

Basis expansions and regularization

Kernel smoothing methods

Model assessment and selection

Model inference and averaging

Additive models, trees, and related methods

Boosting and additive trees

Neural networks

Support vector machines and flexible discriminants

Prototype methods and nearest-neighbors

Unsupervised learning

Random forests

Ensemble learning

Undirected graphical models

High-dimensional problems

Statistical Rethinking: A Bayesian Course

Book Link

A modern text focused on Bayesian statistics, accompanied by a lecture course.

The text presents generalized linear multilevel models from a Bayesian perspective, relying on a simple logical interpretation of Bayesian probability and maximum entropy. It covers everything from the basics of regression to multilevel models. The author also discusses measurement error, missing data, and Gaussian process models for spatial and network autocorrelation.

Topics

Sampling

Linear models

Multivariate linear models

Overfitting, regularization, and information criteria

Interactions

Markov chain Monte Carlo

Big entropy and the generalized linear model

Counting and classification

Multilevel models

covariance

Missing data
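
Markov chain Monte Carlo, a core topic of the book, can be previewed with a bare-bones Metropolis sampler. This sketch (my own) targets a standard normal distribution as a stand-in for a real posterior:

```python
import math
import random

random.seed(0)

def log_post(theta):
    # Unnormalized log-posterior: a standard normal, standing in for a model
    return -0.5 * theta ** 2

theta = 3.0      # deliberately poor starting value
samples = []
for _ in range(20000):
    proposal = theta + random.gauss(0, 1)
    # Metropolis rule: accept with probability min(1, post(prop)/post(cur))
    log_alpha = log_post(proposal) - log_post(theta)
    if random.random() < math.exp(min(0.0, log_alpha)):
        theta = proposal
    samples.append(theta)

burned = samples[2000:]  # discard burn-in
mean = sum(burned) / len(burned)
print(round(mean, 2))
```

The chain only ever needs the *unnormalized* posterior, which is what makes MCMC practical for the multilevel models the book builds up to (where it uses Stan rather than a hand-rolled sampler).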

Pattern Recognition and Machine Learning

Topics

Probability Theory

Model Selection

The Curse of Dimensionality

Decision Theory

Information Theory

Probability Distributions

The Gaussian Distribution

Exponential Family

Nonparametric Methods

Linear Models for Regression

Linear Basis Function Models

Bayesian Linear Regression

Linear Models for Classification

Discriminant Functions

Probabilistic Generative Models

Probabilistic Discriminative Models

The Laplace Approximation

Neural Network

Feed-forward Network Functions

Network Training

Error Backpropagation

Hessian Matrix

Mixture Density Networks

Bayesian Neural Networks

Kernel Methods

Gaussian Processes

Sparse Kernel Machines

Maximum Margin Classifiers

Relevance Vector Machines

Graphical Models

Conditional Independence

Markov Random Fields

Inference in Graphical Models

Mixture Models and EM

K-means Clustering

Mixtures of Gaussians

Approximate Inference

Variational Inference

Variational Linear Regression

Sampling Methods

Basic Sampling Algorithms

Markov Chain Monte Carlo

Hybrid Monte Carlo Algorithm

Continuous Latent Variables

Principal Component Analysis

Probabilistic PCA

Kernel PCA

Nonlinear Latent Variable Models

Sequential Data

Markov Models

Hidden Markov Models

Linear Dynamical Systems

Combining Models

Bayesian Model Averaging

Committees

Boosting

Tree-based Models

Conditional Mixture Models

Deep Learning Goodfellow Book

The text offers mathematical and conceptual background, covering relevant concepts in linear algebra, probability theory and information theory, numerical computation, and machine learning. It describes deep learning techniques used by practitioners in industry, including deep feedforward networks, regularization, optimization algorithms, convolutional networks, sequence modeling, and practical methodology.

Book Link

Topics

Linear Algebra

Probability and Information Theory

Numerical Computation

Deep Feedforward Networks

Regularization for Deep Learning

Optimization for Training Deep Models

Gradient Descent and Structure of Neural Network Cost Functions

Tutorial on Optimization for Deep Networks

Batch Normalization

Convolutional Networks

Sequence Modeling: Recurrent and Recursive Networks

Linear Factors

Autoencoders

Representation Learning

Structured Probabilistic Models for Deep Learning

Monte Carlo Methods

Confronting the Partition Function

Reinforcement Learning: An Introduction

Book Link

Site Link

Part I covers as much of reinforcement learning as possible without going beyond the tabular case for which exact solutions can be found, including UCB, Expected Sarsa, and Double Learning.

Part II extends these ideas to function approximation, such as artificial neural networks and the Fourier basis, and offers expanded treatment of off-policy learning and policy-gradient methods.

Part III has new chapters on reinforcement learning’s relationships to psychology and neuroscience, as well as an updated case-studies chapter including AlphaGo and AlphaGo Zero, Atari game playing, and IBM Watson’s wagering strategy.

The final chapter discusses the future societal impacts of reinforcement learning.

Topics

Tabular Solution Methods

Multi-armed Bandits

Finite Markov Decision Processes

Dynamic Programming

Monte Carlo Methods

Temporal-Difference Learning

Eligibility Traces

Planning and Learning with Tabular Methods

Approximate Solution Methods

On-policy Approximation of Action Values

Off-policy Approximation of Action Values

Policy Approximation

Psychology

Neuroscience

Applications and case studies
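
Multi-armed bandits open the book, and the epsilon-greedy agent from chapter 2 fits in a few lines. A sketch of my own, with hypothetical payout probabilities:

```python
import random

random.seed(1)

# Three slot machines with different (hidden) payout probabilities
true_probs = [0.2, 0.5, 0.8]
estimates = [0.0, 0.0, 0.0]
counts = [0, 0, 0]
eps = 0.1

for _ in range(5000):
    # Explore with probability eps, otherwise exploit the best estimate
    if random.random() < eps:
        arm = random.randrange(3)
    else:
        arm = max(range(3), key=lambda i: estimates[i])
    reward = 1.0 if random.random() < true_probs[arm] else 0.0
    counts[arm] += 1
    # Incremental sample-average update (Sutton & Barto, chapter 2)
    estimates[arm] += (reward - estimates[arm]) / counts[arm]

print(max(range(3), key=lambda i: estimates[i]))  # 2: the best arm is found
```

The exploration/exploitation trade-off in this ten-line agent is the same tension the rest of the book studies in full Markov decision processes.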

Papers

Adam: A Method for Stochastic Optimization

Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift

Faster R-CNN: towards real-time object detection with region proposal networks

Neural Machine Translation by Jointly Learning to Align and Translate

Human-level control through deep reinforcement learning

Mastering the game of Go with deep neural networks and tree search

Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks

Semi-Supervised Classification with Graph Convolutional Networks

Explaining and Harnessing Adversarial Examples

ImageNet Classification with Deep Convolutional Neural Networks (AlexNet)

Deep Residual Learning for Image Recognition (ResNet)

Attention Is All You Need (Transformers)

Mike Jordan Book list

Extremely rigorous books recommended by Mike Jordan from Berkeley, intended for those focused on research. I will probably never read these; it feels impossible to get through them all in one lifetime.

Hackernews Comment

Essentially all of the material in the following intermediate-level statistics book:

1.) Casella, G. and Berger, R.L. (2001). “Statistical Inference” Duxbury Press.

For a slightly more advanced book that’s quite clear on mathematical techniques, the following book is quite good:

2.) Ferguson, T. (1996). “A Course in Large Sample Theory” Chapman & Hall/CRC.

You’ll need to learn something about asymptotics at some point, and a good starting place is:

3.) Lehmann, E. (2004). “Elements of Large-Sample Theory” Springer.

Those are all frequentist books. You should also read something Bayesian:

4.) Gelman, A. et al. (2003). “Bayesian Data Analysis” Chapman & Hall/CRC.

You should also start to read about Bayesian computation:

5.) Robert, C. and Casella, G. (2005). “Monte Carlo Statistical Methods” Springer.

On the probability front, a good intermediate text is:

6.) Grimmett, G. and Stirzaker, D. (2001). “Probability and Random Processes” Oxford.

At a more advanced level, a very good text is the following:

7.) Pollard, D. (2001). “A User’s Guide to Measure Theoretic Probability” Cambridge.

The standard advanced textbook is Durrett, R. (2005). “Probability: Theory and Examples” Duxbury.

Machine learning research also reposes on optimization theory. A good starting book on linear optimization that will prepare you for convex optimization:

8.) Bertsimas, D. and Tsitsiklis, J. (1997). “Introduction to Linear Optimization” Athena.

And then you can graduate to:

9.) Boyd, S. and Vandenberghe, L. (2004). “Convex Optimization” Cambridge.

Getting a full understanding of algorithmic linear algebra is also important. At some point you should feel familiar with most of the material in

10.) Golub, G., and Van Loan, C. (1996). “Matrix Computations” Johns Hopkins.

It’s good to know some information theory. The classic is:

11.) Cover, T. and Thomas, J. “Elements of Information Theory” Wiley.

Finally, if you want to start to learn some more abstract math, you might want to start to learn some functional analysis (if you haven’t already). Functional analysis is essentially linear algebra in infinite dimensions, and it’s necessary for kernel methods, for nonparametric Bayesian methods, and for various other topics. Here’s a book that I find very readable:

12.) Kreyszig, E. (1989). “Introductory Functional Analysis with Applications” Wiley.

Other

Superintelligence

Karpathy’s CS231n: Convolutional Neural Networks for Visual Recognition

A First Course in Probability

Bayesian Reasoning and Machine Learning

Dive into Deep Learning

Principles of Mathematical Analysis by Walter Rudin

Probability and Random Processes

Statistical Inference

Think Stats
