Optimizer
Stochastic Gradient Descent
Let’s say there are three students who each work on the same 100 math questions.
Student A works through all 100 questions at once, then checks the answers.
Student B works on 10 questions at a time, checks those answers, reworks any mistakes, and then moves on to the next 10 questions.
Student C works on one question at a time, checking each answer before moving on to the next, step by step.
Student C stands for Stochastic Gradient Descent (Student A corresponds to full-batch gradient descent, and Student B to mini-batch gradient descent). As you can imagine, each of Student C’s steps takes less time, but because every update is based on a single example, the updates are noisy and no single step is guaranteed to move in the right direction.
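To make the difference concrete, here is a minimal NumPy sketch of Student C’s strategy (one example per update) on a toy one-parameter linear-regression problem; the data, learning rate, and number of epochs are all made up for illustration.

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=100)                       # toy inputs (assumption)
y = 3.0 * X + rng.normal(scale=0.1, size=100)  # toy targets: y ~ 3x + noise

def grad(w, xb, yb):
    # Gradient of the mean squared error 0.5 * mean((w*x - y)^2) w.r.t. w.
    return np.mean((w * xb - yb) * xb)

w = 0.0   # single learnable parameter
lr = 0.1  # learning rate (assumption)
for epoch in range(20):
    # Student C: update after every single question (example).
    for i in rng.permutation(len(X)):
        w -= lr * grad(w, X[i:i+1], y[i:i+1])
    # Student B would instead use slices of 10 examples per update,
    # and Student A would use the full X, y once per epoch.

print(w)  # ends up close to the true slope 3.0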
Layer-wise Adaptive Rate Scaling (LARS)
LARS was proposed to address the difficulty of training models with very large batch sizes. The key observation is that the ratio between the norm of a layer’s weights and the norm of its gradients varies widely from layer to layer, so a single global learning rate is too large for some layers and too small for others; LARS therefore gives each layer its own local learning rate proportional to that ratio.
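Below is a minimal sketch of the layer-wise “trust ratio” update from the LARS paper applied on top of a plain SGD step; the function name lars_step and the hyperparameter values (trust coefficient eta, weight decay beta, epsilon) are illustrative assumptions, and the momentum and learning-rate warm-up used in practice are omitted.

import numpy as np

def lars_step(weights, grads, base_lr=0.1, eta=0.001, beta=1e-4, eps=1e-9):
    # One LARS update over a list of per-layer weight/gradient arrays.
    # Each layer gets a local learning rate (the "trust ratio"):
    #   lambda = eta * ||w|| / (||g|| + beta * ||w||)
    # so the step size adapts to how large the gradients are
    # relative to the weights in that particular layer.
    new_weights = []
    for w, g in zip(weights, grads):
        w_norm = np.linalg.norm(w)
        g_norm = np.linalg.norm(g)
        local_lr = eta * w_norm / (g_norm + beta * w_norm + eps)
        # Apply weight decay inside the update, then step.
        new_weights.append(w - base_lr * local_lr * (g + beta * w))
    return new_weights

# Usage sketch: two "layers" with dummy weights and gradients.
weights = [np.ones((3, 3)), np.ones(3)]
grads = [0.1 * np.ones((3, 3)), 0.1 * np.ones(3)]
weights = lars_step(weights, grads)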