{"id":4699,"date":"2018-10-31T15:27:02","date_gmt":"2018-10-31T21:27:02","guid":{"rendered":"http:\/\/www.designandexecute.com\/designs\/?p=4699"},"modified":"2023-10-30T11:39:55","modified_gmt":"2023-10-30T17:39:55","slug":"machine-learning-terms-every-data-scientist-should-know","status":"publish","type":"post","link":"https:\/\/www.designandexecute.com\/designs\/machine-learning-terms-every-data-scientist-should-know\/","title":{"rendered":"Machine Learning Terms Every Data Scientist Should Know"},"content":{"rendered":"<h2 class=\"graf graf--p graf--leading\"><strong>Overfitting and Underfitting<\/strong><\/h2>\n<p><strong>Overfitting<\/strong>\u00a0occurs when a\u00a0<a title=\"Machine Learning Lesson of the Day \u2013 Classification and Regression\" href=\"https:\/\/chemicalstatistician.wordpress.com\/2014\/01\/05\/machine-learning-lesson-of-the-day-classification-and-regression\/\">statistical model<\/a>\u00a0or\u00a0<a title=\"Machine Learning Lesson of the Day \u2013 Supervised and Unsupervised Learning\" href=\"https:\/\/chemicalstatistician.wordpress.com\/2014\/01\/04\/machine-learning-lesson-of-the-day-supervised-and-unsupervised-learning\/\">machine learning<\/a>\u00a0algorithm\u00a0<strong>captures the noise<\/strong>\u00a0of the data. Intuitively, overfitting occurs when the model or the algorithm fits the data too well. Specifically, overfitting occurs when the model or algorithm shows\u00a0<strong>low bias<\/strong>\u00a0but\u00a0<strong>high variance<\/strong>. 
Overfitting is often the result of an excessively complicated model, and it can be prevented by fitting multiple models and using\u00a0<a title=\"Machine Learning Lesson of the Day \u2013 Using Validation to Assess Predictive Accuracy in Supervised Learning\" href=\"https:\/\/chemicalstatistician.wordpress.com\/2014\/01\/07\/machine-learning-lesson-of-the-day-using-validation-to-assess-predictive-accuracy-in-supervised-learning\/\">validation<\/a>\u00a0or\u00a0<a title=\"Machine Learning Lesson of the Day \u2013 Cross-Validation\" href=\"https:\/\/chemicalstatistician.wordpress.com\/2014\/01\/17\/machine-learning-lesson-of-the-day-cross-validation\/\">cross-validation<\/a>\u00a0to compare their predictive accuracies on test data.<\/p>\n<p><strong>Underfitting<\/strong>\u00a0occurs when a statistical model or machine learning algorithm\u00a0<strong>cannot capture the underlying trend<\/strong>\u00a0of the data. Intuitively, underfitting occurs when the model or the algorithm does not fit the data well enough. Specifically, underfitting occurs when the model or algorithm shows\u00a0<strong>low variance<\/strong>\u00a0but\u00a0<strong>high bias<\/strong>. 
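As a quick illustrative sketch of the bias-variance trade-off (plain NumPy, toy data invented here, not from the article): fitting polynomials of increasing degree to noisy quadratic data and scoring them on a held-out validation set shows the underfit straight line losing to the well-matched quadratic.

```python
import numpy as np

rng = np.random.default_rng(0)

# Noisy samples drawn around a quadratic trend
x = np.linspace(-3, 3, 40)
y = x**2 + rng.normal(scale=1.0, size=x.size)

# Hold out every fourth point as a simple validation set
val_mask = np.arange(x.size) % 4 == 0
x_tr, y_tr = x[~val_mask], y[~val_mask]
x_val, y_val = x[val_mask], y[val_mask]

def val_error(degree):
    """Fit a polynomial of the given degree and return validation MSE."""
    coeffs = np.polyfit(x_tr, y_tr, degree)
    pred = np.polyval(coeffs, x_val)
    return float(np.mean((pred - y_val) ** 2))

# Degree 1 underfits (high bias); degree 2 matches the underlying trend;
# a high degree such as 9 tends to chase the noise (high variance).
for d in (1, 2, 9):
    print(d, round(val_error(d), 3))
```

Comparing the printed validation errors is exactly the "fit multiple models and compare on held-out data" recipe the paragraph above describes.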
Underfitting is often the result of an excessively simple model.<\/p>\n<h2><strong>Ensemble methods<\/strong><\/h2>\n<p class=\"graf graf--p graf--leading\"><strong>Ensemble methods<\/strong> are meta-algorithms that combine several machine learning techniques into one predictive model to <strong class=\"markup--strong markup--p-strong\">decrease<\/strong>\u00a0<strong class=\"markup--strong markup--p-strong\">variance\u00a0<\/strong>(bagging), decrease\u00a0<strong class=\"markup--strong markup--p-strong\">bias<\/strong> (boosting), or <strong class=\"markup--strong markup--p-strong\">improve predictions<\/strong>\u00a0(stacking).<\/p>\n<p><strong>Bagging<\/strong>: Short for <strong>B<\/strong>ootstrap <strong>Agg<\/strong>regat<strong>ing<\/strong>, a technique that uses bootstrap sampling to draw samples from a data set.\u00a0 Each data point has an equal probability of being picked and can be picked repeatedly, because bagging samples with replacement.<\/p>\n<p><strong>Random Subspace<\/strong>: Randomly picking a subset of features to build each decision tree, so that multiple trees are built on different feature subsets.\u00a0 You could build a single tree with all the features, but the added complexity does not necessarily increase the model&#8217;s accuracy.<\/p>\n<p><strong>Random Forest<\/strong> uses a combination of both Bagging and Random Subspace.\u00a0 You can use either technique independently, but when used together, you get the <em class=\"markup--em markup--li-em\">parallel<\/em> ensemble method known as random forest.\u00a0\u00a0The primary motivation of parallel methods is to <strong class=\"markup--strong markup--li-strong\">exploit independence between the base learners<\/strong>, since the error can be reduced dramatically by averaging. 
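A minimal sketch of the two sampling steps described above (illustrative NumPy only; the data, sizes, and seed are made up): bagging draws points with replacement, and the random-subspace step picks a feature subset without replacement for each tree.

```python
import numpy as np

rng = np.random.default_rng(42)
data = np.arange(10)  # a toy data set of 10 points

# Bagging: a bootstrap sample drawn *with* replacement, so every point
# has equal probability on each draw and can appear more than once.
bootstrap = rng.choice(data, size=data.size, replace=True)

# Random subspace: a feature subset drawn *without* replacement;
# here 2 of 4 hypothetical feature indices for one tree.
features = rng.choice(4, size=2, replace=False)

print(sorted(bootstrap.tolist()))  # duplicates are expected
print(sorted(features.tolist()))   # two distinct feature indices
```

A random forest repeats both draws once per tree and averages the resulting predictions.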
For more detail, see <a href=\"http:\/\/www.designandexecute.com\/designs\/how-does-random-forrest-apply-to-artificial-intelligence-ai\/\">Random Forest<\/a>.<\/p>\n<p><strong>Homogeneous ensembles:\u00a0<\/strong>ensembles that use a single base learning algorithm to create the model.<\/p>\n<p><em class=\"markup--em markup--pullquote-em\">Combining stable models is\u00a0less advantageous, since the ensemble will not help improve generalization performance.<\/em><\/p>\n<p><strong>Boosting<\/strong>: Each tree is built sequentially from a different subset of the training set. <a href=\"http:\/\/www.designandexecute.com\/designs\/what-is-boosting-in-artificial-intelligence-ai\/\">See more details on Boosting.<\/a><\/p>\n<p><strong>Gradient Descent<\/strong>: Each tree learns from the previously built tree, adjusting the probability of a data point appearing in the following training set.<\/p>\n<p><strong>Gradient Boosted<\/strong>: This is the combination of boosting and gradient descent techniques. 
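A toy sketch of the boosting-plus-gradient-descent idea for squared loss (the data, stump learner, and learning rate here are all illustrative, not from the article): each weak learner is fit to the residuals of the current ensemble, and its prediction is added with a small learning rate.

```python
import numpy as np

rng = np.random.default_rng(1)
x = np.linspace(0, 6, 80)
y = np.sin(x) + rng.normal(scale=0.1, size=x.size)

def fit_stump(x, y):
    """Best single-split regression stump: a threshold plus two leaf means."""
    best = None
    for t in x[1:]:
        left, right = y[x < t], y[x >= t]
        if left.size == 0 or right.size == 0:
            continue
        err = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if best is None or err < best[0]:
            best = (err, t, left.mean(), right.mean())
    _, t, lo, hi = best
    return lambda z: np.where(z < t, lo, hi)

# Gradient boosting for squared loss: each new stump is fit to the
# residuals (the negative gradient), then added with a learning rate.
pred = np.zeros_like(y)
learning_rate = 0.5
for _ in range(20):
    stump = fit_stump(x, y - pred)
    pred = pred + learning_rate * stump(x)

print(round(float(np.mean((y - pred) ** 2)), 4))  # training MSE after 20 stumps
```

Each iteration concentrates effort on whatever the current ensemble still gets wrong, which is the sequential behavior the surrounding paragraphs describe.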
This is another popular ensemble technique, in which data points that are classified correctly are down-weighted and the incorrect ones are up-weighted, so each subsequent iteration concentrates on the incorrectly classified data points.<\/p>\n<p><strong>Note<\/strong> for gradient boosted trees: for larger data sets use fewer estimators, and for smaller data sets use more estimators.<\/p>\n<p><strong>Hyperparameter Tuning: <\/strong>an optimization technique that examines a limited set of parameter combinations while still finding the best parameters for building the model or decision tree.\u00a0 Brute force could do it, but there may be too many combinations to traverse to find the best parameters for the algorithm.\u00a0 This series of scored trials over the specified range of combinations is called hyperparameter tuning.<\/p>\n<p><strong>Thanks for reading\u00a0\u2764<\/strong><\/p>\n<div class=\"gv-post-content clearfix\">\n<p class=\"graf graf--h4\"><strong>Follow me:\u00a0<a class=\"markup--anchor markup--h4-anchor\" href=\"https:\/\/www.instagram.com\/taylorchooquan\/\" target=\"_blank\" rel=\"nofollow noopener noreferrer\" data-href=\"https:\/\/www.instagram.com\/garyvee\">Instagram<\/a>\u00a0|\u00a0<a class=\"markup--anchor markup--h4-anchor\" href=\"https:\/\/www.facebook.com\/stephen.chooquan\" target=\"_blank\" rel=\"nofollow noopener noreferrer\" data-href=\"https:\/\/www.facebook.com\/gary\">Facebook\u00a0<\/a>| <a href=\"https:\/\/www.linkedin.com\/in\/stephenchooquan\/\" target=\"_blank\" rel=\"noopener noreferrer\">LinkedIn<\/a><\/strong><\/p>\n<\/div>\n","protected":false},"excerpt":{"rendered":"<p>OverFitting and UnderFitting:\u00a0\u00a0 Overfitting\u00a0occurs when a\u00a0statistical model\u00a0or\u00a0machine learning\u00a0algorithm\u00a0captures the noise\u00a0of the data. \u00a0Intuitively, overfitting occurs when the model or the algorithm fits the data too well. 
\u00a0Specifically, overfitting occurs if the model or algorithm shows\u00a0low bias\u00a0but\u00a0high variance. \u00a0Overfitting is often a result of an excessively complicated model, and it can be prevented by fitting multiple [&hellip;]<\/p>\n","protected":false},"author":2,"featured_media":15795,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[32,31],"tags":[46],"class_list":["post-4699","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-bi-dashboards-analytics","category-bi-data-warehouse","tag-machine-learning"],"jetpack_featured_media_url":"https:\/\/www.designandexecute.com\/designs\/wp-content\/uploads\/2018\/10\/Machine-learning_DevOps-Artisan.jpg","_links":{"self":[{"href":"https:\/\/www.designandexecute.com\/designs\/wp-json\/wp\/v2\/posts\/4699","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.designandexecute.com\/designs\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.designandexecute.com\/designs\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.designandexecute.com\/designs\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/www.designandexecute.com\/designs\/wp-json\/wp\/v2\/comments?post=4699"}],"version-history":[{"count":5,"href":"https:\/\/www.designandexecute.com\/designs\/wp-json\/wp\/v2\/posts\/4699\/revisions"}],"predecessor-version":[{"id":19183,"href":"https:\/\/www.designandexecute.com\/designs\/wp-json\/wp\/v2\/posts\/4699\/revisions\/19183"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.designandexecute.com\/designs\/wp-json\/wp\/v2\/media\/15795"}],"wp:attachment":[{"href":"https:\/\/www.designandexecute.com\/designs\/wp-json\/wp\/v2\/media?parent=4699"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.designandexecute.com\/designs\/wp-json\/wp\/v2\/categories?post=4699"},{"ta
xonomy":"post_tag","embeddable":true,"href":"https:\/\/www.designandexecute.com\/designs\/wp-json\/wp\/v2\/tags?post=4699"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}