Validation vs. test vs. training accuracy: which one to compare for claiming overfitting?


I have read in several answers here and elsewhere on the internet that cross-validation helps indicate whether a model will generalize well and whether it is overfitting.

But I am confused about which two accuracies/errors among test, training, and validation I should compare to see whether the model is overfitting.

For example:

I divide my data into 70% training and 30% test.

When I run 10-fold cross-validation, I get 10 accuracies that I can take the mean of. Should I call this mean the validation accuracy?

Afterward, I test the model on the 30% test data and get the test accuracy.

In this case, what is the training accuracy? And which two accuracies do I compare to see whether the model is overfitting?

This is my first question on this platform, so please ignore errors.







      machine-learning cross-validation accuracy overfitting






asked 2 hours ago by A.B

          2 Answers
"When I run 10-fold cross-validation, I get 10 accuracies that I can take the mean of. Should I call this mean the validation accuracy?"

No. It is an [estimate of the] test accuracy.

The difference between validation and test sets (and their corresponding accuracies) is that a validation set is used to build or select a better model (e.g. to avoid over-fitting), meaning it affects the final model. In your case, however, 10-fold CV tests an already-built model on the 10% hold-out of each fold, so the hold-out is a test set, not a validation set.




"Afterward, I test the model on the 30% test data and get the test accuracy."

If you don't use K-fold CV to select or build a better model, this part is not needed: run K-fold CV on 100% of the data to get the test accuracy. Otherwise, you should keep a final test set, since the K-fold result would then be a validation accuracy.
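As a rough sketch of the second case (K-fold used for model selection, with a final test set kept aside), something like the following could work. The parameter grid, the stratified split, and the random seed are assumptions made for illustration and are not part of the original question.

```python
from sklearn import datasets, svm
from sklearn.model_selection import GridSearchCV, train_test_split

X, y = datasets.load_iris(return_X_y=True)

# Hold out 30% as a final test set; it is not touched during model selection.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)

# Hypothetical search grid, chosen only for illustration.
param_grid = {"C": [0.1, 1, 10]}

# 10-fold CV on the training portion selects C, so the fold scores
# act as validation scores here.
search = GridSearchCV(svm.SVC(kernel="linear"), param_grid, cv=10,
                      return_train_score=True)
search.fit(X_train, y_train)

best = search.best_index_
print("Selected parameters:", search.best_params_)
print("Mean training-fold accuracy:", search.cv_results_["mean_train_score"][best])
print("Mean validation-fold accuracy:", search.best_score_)

# The untouched test set gives the final performance estimate.
test_accuracy = search.score(X_test, y_test)
print("Final test accuracy:", test_accuracy)
```

Here the held-out 30% is scored only once, after the search has finished, so it plays no part in choosing C.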




"In this case, what is the training accuracy?"

From each of the 10 folds you get a test accuracy on 10% of the data and a training accuracy on 90% of the data. In Python, the scikit-learn function cross_val_score returns only the test accuracies. Here is how to get both:

    from sklearn import datasets, model_selection, svm

    # Load a small example dataset and define the classifier.
    iris = datasets.load_iris()
    clf = svm.SVC(kernel='linear', C=1)

    # Train scores are returned only when return_train_score=True.
    # cv=5 is used here; set cv=10 for the 10-fold setup in the question.
    scores = model_selection.cross_validate(
        clf, iris.data, iris.target, cv=5, return_train_score=True)
    print('Train scores:')
    print(scores['train_score'])
    print('Test scores:')
    print(scores['test_score'])



"And which two accuracies do I compare to see whether the model is overfitting?"

You should compare the training and test accuracies to identify over-fitting. A training accuracy that is far higher than the test accuracy indicates over-fitting; how large a gap counts as "far higher" is a subjective judgment.
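To make the comparison concrete, here is a minimal sketch that averages the per-fold training and test accuracies and reports their gap for two models. The synthetic dataset, the 20% label noise, the helper name mean_gap, and the choice of decision trees are all assumptions made for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_validate
from sklearn.tree import DecisionTreeClassifier

# Synthetic data with 20% label noise (an assumption made for illustration).
X, y = make_classification(n_samples=500, n_features=20, n_informative=5,
                           flip_y=0.2, random_state=0)


def mean_gap(model):
    """Return (mean train accuracy, mean test accuracy, gap) over 10 folds."""
    scores = cross_validate(model, X, y, cv=10, return_train_score=True)
    train = scores["train_score"].mean()
    test = scores["test_score"].mean()
    return train, test, train - test


# A fully grown tree can memorize the training folds; a depth-2 tree cannot.
deep = mean_gap(DecisionTreeClassifier(random_state=0))
shallow = mean_gap(DecisionTreeClassifier(max_depth=2, random_state=0))

print("deep tree    train=%.3f test=%.3f gap=%.3f" % deep)
print("shallow tree train=%.3f test=%.3f gap=%.3f" % shallow)
```

The fully grown tree should show the larger train/test gap, which is the pattern described above; the exact numbers depend on the data and the seed.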



I suggest the "Bias and Variance" and "Learning curves" parts of "Machine Learning Yearning" by Andrew Ng. They present plots and interpretations for all the cases with clear narration.
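For readers who want to produce the numbers behind such a learning curve themselves, a small sketch using scikit-learn's learning_curve might look like this. The synthetic dataset, the depth-3 tree, and the chosen training sizes are assumptions for illustration; this is not taken from the book.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import learning_curve
from sklearn.tree import DecisionTreeClassifier

# Synthetic data with label noise (an assumption made for illustration).
X, y = make_classification(n_samples=500, n_features=20, n_informative=5,
                           flip_y=0.2, random_state=0)

# Train on growing subsets and score each on the training and held-out folds.
sizes, train_scores, test_scores = learning_curve(
    DecisionTreeClassifier(max_depth=3, random_state=0), X, y,
    train_sizes=[80, 160, 240, 320, 400], cv=5,
    shuffle=True, random_state=0)

# Average over the 5 folds for each training-set size.
for n, tr, te in zip(sizes, train_scores.mean(axis=1), test_scores.mean(axis=1)):
    print("n_train=%3d  train=%.3f  test=%.3f  gap=%.3f" % (n, tr, te, tr - te))
```

Plotting the two averaged columns against n_train gives the usual learning-curve picture: a gap that stays large as data grows points toward over-fitting.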







edited 1 hour ago
answered 1 hour ago by Esmailian

• I think I disagree with "30% test set not needed." If you are using CV to select a better model, then you are exposing the test folds (which I would call a validation set in this case) and risk overfitting there. The final test set should remain untouched (by both you and your algorithms) until the end, to estimate the final model performance (if that's something you need). But yes, while model-building, the (averaged) training fold score vs. the (averaged) validation fold score is what you look at for an indication of overfitting.
  – Ben Reiniger, 1 hour ago












• @BenReiniger You are right, I should clarify this case.
  – Esmailian, 1 hour ago


















Cross-validation splits your data into K folds. In each of the K rounds, one fold serves as the test data and the remaining K-1 folds serve as the training data. You are correct that you get K different error rates, which you then average. These error rates come from the test fold of each round. To get the training error rate, calculate the error rate on the training part of each of the K rounds and take the average.
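The procedure described above can be sketched as an explicit loop. This is only an illustration: the iris data, the linear SVC, the stratified shuffled folds, and the seed are assumptions, and error rate is taken to mean 1 minus accuracy.

```python
import numpy as np
from sklearn import datasets, svm
from sklearn.model_selection import StratifiedKFold

X, y = datasets.load_iris(return_X_y=True)
K = 10
folds = StratifiedKFold(n_splits=K, shuffle=True, random_state=0)

train_errors, test_errors = [], []
for train_idx, test_idx in folds.split(X, y):
    # Fit a fresh model on the K-1 training folds of this round.
    clf = svm.SVC(kernel="linear", C=1)
    clf.fit(X[train_idx], y[train_idx])
    # Error rate = 1 - accuracy, on the training part and on the test fold.
    train_errors.append(1.0 - clf.score(X[train_idx], y[train_idx]))
    test_errors.append(1.0 - clf.score(X[test_idx], y[test_idx]))

print("Mean training error:", np.mean(train_errors))
print("Mean test error:", np.mean(test_errors))
```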







answered 2 hours ago by astel





















