Is it OK to use the testing sample to compare algorithms?
$begingroup$
I'm working on a small project where my dataset has 6k rows and around 300 features, with a simple binary outcome.
Since I'm still learning ML, I want to try all the algorithms I can find and compare the results.
As I've read in tutorials, I split my dataset into a training sample (80%) and a testing sample (20%), and then trained my algorithms on the training sample with cross-validation (5 folds).
My plan is to train all my models this way, and then measure their performance on the testing sample to choose the best algorithm.
Could this cause overfitting? If so, since I cannot compare several models inside `model_selection.GridSearchCV`, how can I prevent overfitting?
machine-learning scikit-learn sampling
$endgroup$
asked 4 hours ago by Dan Chaltiel
$begingroup$
Possible duplicate of "Can I use the test dataset to select a model?"
$endgroup$
– Ben Reiniger
2 hours ago
$begingroup$
@BenReiniger You are right, this is essentially the same question, but I like Simon's answer better.
$endgroup$
– Dan Chaltiel
36 mins ago
2 Answers
$begingroup$
No, that is not the purpose of the test set. The test set is only for the final evaluation, once your model is finished. If you let the test set influence your decisions, your evaluation will no longer be reliable.
To compare algorithms, you instead set aside another chunk of your data, called the validation set.
Here is some info about good splits depending on data size:
Train / Dev / Test sets from Improving Deep Neural Networks: Hyperparameter tuning, Regularization and Optimization by Prof. Andrew Ng.
(Andrew Ng says "dev set" where this answer says "validation set".)
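A minimal sketch of such a three-way split, using only Python's standard library. It assumes the 6,000 rows mentioned in the question and a 70/15/15 ratio, which is just one reasonable choice. The function name `train_val_test_split` and its parameters are hypothetical, written for illustration, and are not part of scikit-learn.

```python
import random


def train_val_test_split(n_samples, val_frac=0.15, test_frac=0.15, seed=0):
    """Return three disjoint lists of row indices: train, validation, test."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # shuffle once, reproducibly
    n_test = round(n_samples * test_frac)
    n_val = round(n_samples * val_frac)
    test_idx = indices[:n_test]
    val_idx = indices[n_test:n_test + n_val]
    train_idx = indices[n_test + n_val:]  # everything that is left
    return train_idx, val_idx, test_idx


# Assumed sizes: 6000 rows, 70/15/15 split.
train_idx, val_idx, test_idx = train_val_test_split(6000)
print(len(train_idx), len(val_idx), len(test_idx))
```

The idea is that every candidate algorithm is fitted on the `train_idx` rows and compared on the `val_idx` rows, while the `test_idx` rows are used only once, for the final evaluation of the chosen model.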
$endgroup$
answered 4 hours ago by Simon Larsson (edited 3 hours ago)
$begingroup$
I thought so, but I couldn't find anything to read about this. Could you point me to some material I could learn from? For example, what would a common split be (70/10/20)?
$endgroup$
– Dan Chaltiel
4 hours ago
$begingroup$
It depends on the size of your dataset, but I would say 70/15/15 would be good in your case.
$endgroup$
– Simon Larsson
4 hours ago
$begingroup$
I added a video on the subject.
$endgroup$
– Simon Larsson
3 hours ago
$begingroup$
Basically, every time you use the results of a train/test split to make decisions about a model, whether that's tuning the hyperparameters of a single model or choosing the most effective of a number of different models, you cannot infer anything about the performance of the model after making those decisions until you have "frozen" your model and evaluated it on a portion of data that has not been touched.
The general concept addressing this issue is called nested cross-validation. If you use a train/test split to choose the best parameters for a model, that's fine. But if you want to estimate the performance of the resulting model, you then need to evaluate it on a second held-out set.
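As a rough illustration, nested cross-validation can be sketched without any library. Everything below is an assumption made for the example: the synthetic 1-D data, the tiny k-nearest-neighbours classifier, the candidate values of `k`, and all function names. The inner loop picks `k` using only the outer-training data, and the outer loop scores that choice on a fold the selection never saw.

```python
import random


def make_folds(n_items, n_folds):
    """Split positions 0..n_items-1 into n_folds roughly equal parts."""
    return [list(range(i, n_items, n_folds)) for i in range(n_folds)]


def knn_predict(train, x, k):
    """Majority vote among the k training points closest to x (1-D feature)."""
    nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    votes = sum(label for _, label in nearest)
    return 1 if 2 * votes > len(nearest) else 0


def accuracy(train, test, k):
    hits = sum(knn_predict(train, x, k) == y for x, y in test)
    return hits / len(test)


def split(data, held_out):
    """Return (training part, held-out part) of data for one fold."""
    held = set(held_out)
    train = [p for i, p in enumerate(data) if i not in held]
    test = [data[i] for i in held_out]
    return train, test


def cv_accuracy(data, k, n_folds):
    """Plain cross-validated accuracy of k-NN on data."""
    scores = [accuracy(*split(data, fold), k)
              for fold in make_folds(len(data), n_folds)]
    return sum(scores) / len(scores)


def nested_cv(data, candidate_ks, outer_folds=5, inner_folds=3):
    outer_scores = []
    for fold in make_folds(len(data), outer_folds):
        outer_train, outer_test = split(data, fold)
        # Inner loop: choose k using the outer-training data only.
        best_k = max(candidate_ks,
                     key=lambda k: cv_accuracy(outer_train, k, inner_folds))
        # Outer loop: score the chosen k on data the selection never saw.
        outer_scores.append(accuracy(outer_train, outer_test, best_k))
    return sum(outer_scores) / len(outer_scores)


# Synthetic data (assumed): the label is 1 when a noisy copy of x is positive.
rng = random.Random(0)
data = []
for _ in range(300):
    x = rng.uniform(-1, 1)
    data.append((x, 1 if x + rng.gauss(0, 0.3) > 0 else 0))

estimate = nested_cv(data, candidate_ks=[1, 5, 15])
print(f"nested CV accuracy estimate: {estimate:.3f}")
```

In scikit-learn, a comparable structure is commonly built by passing a `GridSearchCV` object to `cross_val_score`, so that the grid search plays the role of the inner loop.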
If you then repeat this process for multiple models and choose the best-performing one, again, that's fine, but by choosing the best result, the value of your performance metric is inherently biased, and you need to validate the entire procedure on yet another held-out set to get an unbiased estimate of how your model will perform on unseen data.
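The selection bias can be seen in a small simulation, sketched here under deliberately artificial assumptions: 200 hypothetical "models" that only flip coins, a validation set of 100 rows, and a test set of 2000 rows. None of the models has any skill, yet the one that wins on the validation set is expected to look clearly better than chance there, and close to chance on the untouched test rows.

```python
import random

N_VAL, N_TEST, N_MODELS = 100, 2000, 200  # assumed sizes, for illustration

rng = random.Random(0)
labels = [rng.randint(0, 1) for _ in range(N_VAL + N_TEST)]


def random_model(seed):
    """A 'model' with no skill: its predictions are independent coin flips."""
    model_rng = random.Random(seed)
    return [model_rng.randint(0, 1) for _ in range(N_VAL + N_TEST)]


def accuracy(preds, start, stop):
    """Fraction of correct predictions on rows start..stop-1."""
    hits = sum(preds[i] == labels[i] for i in range(start, stop))
    return hits / (stop - start)


# Seeds start at 1 so that no model reuses the label generator's seed.
models = [random_model(seed) for seed in range(1, N_MODELS + 1)]

# Select the model with the best validation accuracy...
best = max(models, key=lambda preds: accuracy(preds, 0, N_VAL))
best_val_acc = accuracy(best, 0, N_VAL)
# ...then score that frozen choice on rows that played no part in the choice.
best_test_acc = accuracy(best, N_VAL, N_VAL + N_TEST)

print(f"validation accuracy of selected model: {best_val_acc:.3f}")
print(f"test accuracy of selected model:       {best_test_acc:.3f}")
```

The gap between the two printed numbers is the optimism introduced by picking the maximum, which is why the score used for selection should not be reported as the final performance.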
$endgroup$
answered 29 mins ago by Cameron King (new contributor)
$begingroup$
Great answer too! So since this seems like an order 2 cross-validation (a cross-validation of cross-validations), should I pool my samples (70+15 according to Simon's answer) before I evaluate my final algorithm on the test sample?
$endgroup$
– Dan Chaltiel
24 mins ago