Philosophical question on logistic regression: why isn't the optimal threshold value trained?








Usually in logistic regression, we fit a model and get some predictions on the training set. We then cross-validate on those training predictions (something like here) and decide the optimal threshold value based on something like the ROC curve.



Why don't we incorporate cross-validation of the threshold INTO the actual model, and train the whole thing end-to-end?







logistic cross-validation optimization roc threshold






edited 20 mins ago by Sycorax

asked 22 mins ago by StatsSorceress












  • Possible duplicate of Classification probability threshold
    – EdM 19 mins ago

  • That thread is certainly related, but I wouldn't call it a duplicate.
    – gung 11 mins ago

























2 Answers
The threshold isn't trained because logistic regression isn't a classifier (cf. Why isn't Logistic Regression called Logistic Classification?). It is a model for estimating the parameter, $p$, that governs the behavior of the Bernoulli distribution. That is, you assume that the response distribution, conditional on the covariates, is Bernoulli, and you want to estimate how the parameter that controls that distribution changes as a function of the covariates. It is a direct probability model only. Of course, it can subsequently be used as a classifier, and sometimes is in certain contexts, but it is still a probability model.
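To make the distinction concrete, here is a minimal Python sketch using only the standard library. The simulated data, the "true" coefficients, and the helper names `fit_logistic` and `classify` are illustrative assumptions, not part of the answer above. The fitting step maximizes the Bernoulli log-likelihood and never sees a threshold; a cutoff only shows up afterwards, as a separate decision applied to the fitted probabilities.

```python
import math
import random

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def simulate(n, b0, b1, seed):
    # Hypothetical data: one standard-normal covariate, Bernoulli response.
    rng = random.Random(seed)
    xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
    ys = [1 if rng.random() < sigmoid(b0 + b1 * x) else 0 for x in xs]
    return xs, ys

def fit_logistic(xs, ys, lr=1.0, steps=500):
    # Gradient ascent on the mean Bernoulli log-likelihood.
    # Note that no threshold appears anywhere in this objective.
    b0, b1 = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        g0 = g1 = 0.0
        for x, y in zip(xs, ys):
            resid = y - sigmoid(b0 + b1 * x)
            g0 += resid
            g1 += resid * x
        b0 += lr * g0 / n
        b1 += lr * g1 / n
    return b0, b1

def classify(probs, threshold):
    # A separate decision rule layered on top of the probability model.
    return [1 if p >= threshold else 0 for p in probs]

xs, ys = simulate(1000, b0=-1.0, b1=2.0, seed=0)
b0_hat, b1_hat = fit_logistic(xs, ys)
p_hat = [sigmoid(b0_hat + b1_hat * x) for x in xs]

# The same fitted probabilities support any cutoff you care to apply.
for t in (0.3, 0.5, 0.7):
    print(t, sum(classify(p_hat, t)))
```

Changing the cutoff changes the labels but leaves `b0_hat`, `b1_hat`, and `p_hat` untouched, which is the sense in which the model itself is "a direct probability model only".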







answered 15 mins ago by gung












  • Okay, I understand that part of the theory (thank you for that eloquent explanation!) but why can't we incorporate the classification aspect into the model? That is, why can't we find p, then find the threshold, and train the whole thing end-to-end to minimize some loss?
    – StatsSorceress 3 mins ago


















Regardless of the underlying model, we can work out the sampling distributions of TPR and FPR at a threshold. This implies that we can characterize the variability in TPR and FPR at some threshold, and we can back into a desired error rate trade-off.



A ROC curve is a little bit deceptive because the only thing that you control is the threshold; however, the plot displays TPR and FPR, which are functions of the threshold. Moreover, the TPR and FPR are both statistics, so they are subject to the vagaries of random sampling. This implies that if you were to repeat the procedure (say by cross-validation), you could come up with a different FPR and TPR at some specific threshold value.
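A small simulation can illustrate that sampling variability. This is only a sketch: the score distributions (Beta(4, 2) for positives, Beta(2, 4) for negatives), the sample sizes, and the helper names are assumptions made for the example. The threshold is held fixed while the data are redrawn, and the observed TPR and FPR move around anyway.

```python
import random

def rates_at(scores, labels, threshold):
    """Return (TPR, FPR) when every score >= threshold is called positive."""
    tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= threshold)
    fp = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= threshold)
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    return tp / n_pos, fp / n_neg

def draw_sample(n_per_class, rng):
    """Hypothetical score model: positives ~ Beta(4, 2), negatives ~ Beta(2, 4)."""
    labels = [1] * n_per_class + [0] * n_per_class
    scores = [rng.betavariate(4, 2) if y == 1 else rng.betavariate(2, 4)
              for y in labels]
    return scores, labels

rng = random.Random(42)
threshold = 0.5  # held fixed across all repetitions

# Redraw the data 20 times and recompute the rates at the same threshold.
results = [rates_at(*draw_sample(100, rng), threshold) for _ in range(20)]
tprs = [tpr for tpr, _ in results]
fprs = [fpr for _, fpr in results]

print(f"TPR range at threshold {threshold}: {min(tprs):.2f} to {max(tprs):.2f}")
print(f"FPR range at threshold {threshold}: {min(fprs):.2f} to {max(fprs):.2f}")
```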



However, if we can estimate the variability in the TPR and FPR, then repeating the ROC procedure is not necessary. We just pick a threshold such that the endpoints of a confidence interval (with some width) are acceptable. That is, pick the model so that the FPR is plausibly below some researcher-specified maximum, and/or the TPR is plausibly above some researcher-specified minimum. If your model can't attain your targets, you'll have to build a better model.
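As a rough sketch of that recipe in Python: the answer does not say which interval to use, so the Wilson score interval here is an assumption, as are the simulated scores, the threshold grid, and the function names. The code scans thresholds from low to high and returns the first one whose FPR upper confidence bound is at or below the stated maximum, along with the TPR lower bound at that threshold.

```python
import math
import random

def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (z = 1.96 is roughly 95%)."""
    p = successes / n
    denom = 1.0 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half, center + half

def pick_threshold(scores, labels, max_fpr, grid):
    """Lowest grid threshold whose FPR upper bound is <= max_fpr, or None."""
    neg = [s for s, y in zip(scores, labels) if y == 0]
    pos = [s for s, y in zip(scores, labels) if y == 1]
    for t in sorted(grid):
        fp = sum(1 for s in neg if s >= t)
        _, fpr_upper = wilson_interval(fp, len(neg))
        if fpr_upper <= max_fpr:
            tp = sum(1 for s in pos if s >= t)
            tpr_lower, _ = wilson_interval(tp, len(pos))
            return t, fpr_upper, tpr_lower
    return None  # the model cannot meet the target on this grid

# Hypothetical scores: positives ~ Beta(4, 2), negatives ~ Beta(2, 4).
rng = random.Random(0)
labels = [1] * 500 + [0] * 500
scores = [rng.betavariate(4, 2) if y == 1 else rng.betavariate(2, 4)
          for y in labels]

grid = [i / 100 for i in range(1, 100)]
choice = pick_threshold(scores, labels, max_fpr=0.10, grid=grid)
print(choice)
```

Because FPR cannot increase as the threshold rises, the first threshold that passes the FPR check is also the one with the highest TPR among those that pass. If the returned TPR lower bound is below your minimum, that is the "build a better model" case.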



Of course, what TPR and FPR values are tolerable in your usage will be context-dependent.



For more information, see ROC Curves for Continuous Data by Wojtek J. Krzanowski and David J. Hand.







edited 8 mins ago

answered 13 mins ago by Sycorax












  • This doesn't really answer my question, but it's a very nice description of ROC curves.
    – StatsSorceress 7 mins ago

  • In what way does this not answer your question? What is your question, if not asking about how to choose a threshold for classification?
    – Sycorax 6 mins ago

  • I was asking why we don't train the threshold instead of choosing it after training the model.
    – StatsSorceress 6 mins ago

  • How would you train a threshold?
    – Sycorax 6 mins ago

  • Couldn't you find the optimal threshold for each minibatch, and take an average or something? I have a related question here if you're curious: stackoverflow.com/questions/55788153/…
    – StatsSorceress 5 mins ago

















