
data mining

A linear regression model assumes that the relationship between the target variable and the input features is linear. The loss function in linear regression is the mean squared error (MSE). The gradient descent algorithm is used to optimize the weights. L1 regularization encourages sparsity - if a specific feature is important, it keeps that feature's weight, but if it is not, it drives the weight of that feature to 0.
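A minimal NumPy sketch of these ideas (the data, learning rate, and L1 strength are made up for illustration): gradient descent on the MSE loss, with an optional L1 subgradient term that pushes the weight of an uninformative feature toward 0.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: y depends on feature 0 only; feature 1 is pure noise.
X = rng.normal(size=(200, 2))
y = 3.0 * X[:, 0] + rng.normal(scale=0.1, size=200)

def fit(X, y, l1=0.0, lr=0.05, steps=2000):
    """Gradient descent on MSE, with an optional L1 (lasso) subgradient term."""
    w = np.zeros(X.shape[1])
    n = len(y)
    for _ in range(steps):
        residual = X @ w - y
        grad = 2 / n * X.T @ residual   # gradient of the mean squared error
        grad += l1 * np.sign(w)         # L1 subgradient encourages sparsity
        w -= lr * grad
    return w

w_plain = fit(X, y)          # recovers the true weight ~3 on feature 0
w_l1 = fit(X, y, l1=0.5)     # shrinks the noise feature's weight toward 0
```

With the L1 term, the noise feature's weight hovers at essentially zero, while the informative feature's weight is only slightly shrunk.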

  1. Train a model
  2. Evaluate baseline performance - use a scoring function (accuracy, R-squared)
  3. Take each feature and randomly shuffle it, breaking its relationship with the target, then re-score. If performance on this permuted data drops compared to the baseline score, that feature is important; if the score barely changes, it is not. Feature importance via permutation - formula is important for final exam! Limitation - doesn't work well with correlated features. Workaround is to run a collinearity test or PCA first and then use a single principal component.
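The three steps above can be sketched in NumPy (the dataset and the choice of OLS as the model are made up for illustration; R-squared is the scoring function):

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy regression data: feature 0 matters, feature 1 is pure noise.
X = rng.normal(size=(300, 2))
y = 2.0 * X[:, 0] + rng.normal(scale=0.1, size=300)

# Step 1. Train a model: ordinary least squares via lstsq.
w, *_ = np.linalg.lstsq(X, y, rcond=None)

def r2(X, y, w):
    """R-squared scoring function."""
    pred = X @ w
    return 1 - np.sum((y - pred) ** 2) / np.sum((y - y.mean()) ** 2)

# Step 2. Evaluate baseline performance.
baseline = r2(X, y, w)

# Step 3. Shuffle one column and measure the drop in score.
def permutation_importance(X, y, w, j, n_repeats=10):
    """Average drop in score after shuffling column j; bigger drop = more important."""
    drops = []
    for _ in range(n_repeats):
        Xp = X.copy()
        Xp[:, j] = rng.permutation(Xp[:, j])  # break the feature-target link
        drops.append(baseline - r2(Xp, y, w))
    return np.mean(drops)

imp0 = permutation_importance(X, y, w, 0)  # large: feature 0 is important
imp1 = permutation_importance(X, y, w, 1)  # near zero: feature 1 is noise
```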
  1. List all possible feature subsets that exclude the feature you want to check - i.e. the feature whose contribution to the model you want to measure.
  2. Evaluate the marginal gain from adding this feature to each of those subsets, and then average the gains.
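A small sketch of this subset-averaging idea, assuming OLS models scored by R-squared on made-up toy data. Note this takes an unweighted average over subsets; exact Shapley values additionally weight subsets by their size.

```python
import itertools
import numpy as np

rng = np.random.default_rng(2)

# Toy data: features 0 and 1 matter, feature 2 is pure noise.
X = rng.normal(size=(200, 3))
y = 2.0 * X[:, 0] + 1.0 * X[:, 1] + rng.normal(scale=0.1, size=200)

def subset_r2(cols):
    """R-squared of an OLS model trained on just the given columns."""
    if not cols:
        return 0.0                       # empty model: predict the mean
    Xs = X[:, list(cols)]
    w, *_ = np.linalg.lstsq(Xs, y, rcond=None)
    pred = Xs @ w
    return 1 - np.sum((y - pred) ** 2) / np.sum((y - y.mean()) ** 2)

def subset_importance(j, n_features=3):
    """Average marginal gain from adding feature j across all other subsets."""
    others = [k for k in range(n_features) if k != j]
    gains = []
    for r in range(len(others) + 1):
        for S in itertools.combinations(others, r):
            gains.append(subset_r2(S + (j,)) - subset_r2(S))
    return np.mean(gains)

i0, i1, i2 = (subset_importance(j) for j in range(3))
```

Enumerating every subset is exponential in the number of features, which is why practical Shapley estimators sample subsets instead.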
  1. Forward pass: Input goes through the network → compute predictions.
  2. Compute loss: Compare predictions to true values using the loss function.
  3. Backward pass (backpropagation): Compute gradients of the loss w.r.t. each parameter.
  4. Gradient descent step: Use those gradients to update weights and reduce loss.
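The four steps above can be written out by hand for a tiny one-hidden-layer network (the data, layer sizes, and learning rate are made up for illustration; the target is nonlinear so the hidden layer has real work to do):

```python
import numpy as np

rng = np.random.default_rng(3)

# Toy data with a nonlinear target.
X = rng.normal(size=(100, 2))
y = (X[:, 0] * X[:, 1]).reshape(-1, 1)

# One hidden layer (tanh) followed by a linear output.
W1 = rng.normal(scale=0.5, size=(2, 8)); b1 = np.zeros(8)
W2 = rng.normal(scale=0.5, size=(8, 1)); b2 = np.zeros(1)
lr = 0.05
losses = []

for step in range(500):
    # 1. Forward pass: input goes through the network -> predictions.
    h = np.tanh(X @ W1 + b1)
    pred = h @ W2 + b2
    # 2. Compute loss: mean squared error against the true values.
    losses.append(np.mean((pred - y) ** 2))
    # 3. Backward pass: gradients of the loss w.r.t. each parameter.
    d_pred = 2 * (pred - y) / len(y)
    dW2 = h.T @ d_pred
    db2 = d_pred.sum(axis=0)
    d_h = (d_pred @ W2.T) * (1 - h ** 2)   # tanh'(z) = 1 - tanh(z)^2
    dW1 = X.T @ d_h
    db1 = d_h.sum(axis=0)
    # 4. Gradient descent step: update weights to reduce the loss.
    W1 -= lr * dW1; b1 -= lr * db1
    W2 -= lr * dW2; b2 -= lr * db2
```

In practice a framework's autodiff (e.g. PyTorch's `backward()`) computes step 3 for you, but the chain-rule structure is exactly what is written out here.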

#study-notes