If you are a software engineer or a programmer, you have almost certainly used Stack Overflow at least once. But have you ever wondered how Stack Overflow predicts the tags for a given question? In this blog, I will discuss the Stack Overflow tag predictor case study.

- Overview of the Stack Overflow Dataset.
- Exploratory Data Analysis.
- Data Preprocessing.
- Downscaling of data.
- Train-Test split.
- Text Featurization using Tfidf Vectorizer.
- Hyper Parameter Tuning.
- Logistic Regression with OneVsRest Classifier
- OneVsRestClassifier with SVM
- Conclusions.
- Enhancements.
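The core of the pipeline outlined above (TF-IDF features plus a OneVsRest logistic regression, since a question can carry several tags) can be sketched as follows. The questions and tags here are a tiny made-up corpus for illustration, not the real Stack Overflow data:

```python
# Sketch of multi-label tag prediction: TF-IDF + OneVsRest logistic regression.
# The toy questions/tags below are illustrative, not the real dataset.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier
from sklearn.preprocessing import MultiLabelBinarizer

questions = [
    "How do I merge two dicts in Python?",
    "Segmentation fault when freeing a pointer in C",
    "How to center a div with CSS?",
    "Python list comprehension with if condition",
]
tags = [["python"], ["c"], ["css", "html"], ["python"]]

# Multi-label targets: one binary column per tag.
mlb = MultiLabelBinarizer()
y = mlb.fit_transform(tags)

# TF-IDF features over word unigrams.
vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(questions)

# One binary logistic-regression classifier per tag.
clf = OneVsRestClassifier(LogisticRegression())
clf.fit(X, y)

pred = clf.predict(vectorizer.transform(["Python dict question"]))
print(mlb.inverse_transform(pred))
```

On the real data, the hyperparameter tuning step would search over the regularization strength of each per-tag classifier.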

Stack Overflow is the largest, most trusted online community for developers to learn, share their programming knowledge, and build their…

- Overview of Dataset.
- Data Preprocessing.
- Train-Test split.
- Text Featurization using Bag of Words.
- Hyper Parameter Tuning.
- Model Building using the Naive Bayes algorithm.
- Performance Metrics.
- Model deployment into Web app using Flask API.
- Productionizing the model on the Heroku platform.
- Results.
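The featurization and model-building steps above (Bag of Words into a Naive Bayes classifier) can be sketched on a few made-up review snippets; the data here are illustrative, not the real Amazon dataset:

```python
# Minimal sketch of the Bag-of-Words + Naive Bayes pipeline.
# The toy reviews/labels are illustrative stand-ins for the real data.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

reviews = [
    "great taste would buy again",
    "terrible product arrived stale",
    "loved it excellent snack",
    "awful flavor waste of money",
]
labels = [1, 0, 1, 0]  # 1 = positive, 0 = negative

# Bag of Words: each review becomes a vector of word counts.
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(reviews)

# alpha=1.0 is Laplace (additive) smoothing.
model = MultinomialNB(alpha=1.0)
model.fit(X, labels)

print(model.predict(vectorizer.transform(["great excellent taste"])))  # → [1]
```

The fitted `vectorizer` and `model` are what a Flask endpoint would load and apply to incoming review text.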

This dataset consists of reviews of fine foods from Amazon. The data span a period of more than 10 years, including all ~500,000 reviews up to October 2012. Reviews include product and user information, ratings, and a plaintext review. We also have reviews from all other Amazon categories.

Amazon reviews are often the most publicly visible reviews of consumer…

In this blog, we’ll try to understand one of the most important algorithms in machine learning: the Random Forest algorithm. We will look at what makes Random Forest so special and implement it on a real-world dataset.

- What Are Ensembles?
- Types of Ensemble Learning.
- Bagging.
- Random Forest and Construction.
- Best and Worst cases of Random Forest.
- Boosting.
- Types of Boosting.
- Gradient Boosting.
- AdaBoost (Adaptive Boosting).
- XGBoost.
- Stacking Classifier.
- Cascading Classifier.
- Random Forest and XGBoost with Amazon Food Reviews.

Commonly, an individual model suffers from bias or variance, and that’s why we need ensemble…
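The bagging idea from the outline (each tree trained on a bootstrap sample with a random feature subset) can be sketched with scikit-learn's `RandomForestClassifier`; the built-in iris dataset and the hyperparameters here are just for illustration:

```python
# Sketch of a random forest: an ensemble of decision trees where each
# tree sees a bootstrap sample and a random subset of features.
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=42)

# max_features="sqrt" gives each split a random subset of features,
# which decorrelates the trees and reduces the ensemble's variance.
forest = RandomForestClassifier(n_estimators=100, max_features="sqrt",
                                random_state=42)
forest.fit(X_tr, y_tr)
print(round(forest.score(X_te, y_te), 2))
```

Averaging many decorrelated trees is what lets the ensemble keep the low bias of deep trees while cutting their variance.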

Decision trees are a popular supervised learning method for a variety of reasons. Their benefits include that they can be used for both regression and classification, they are easy to interpret, and they don’t require feature scaling. They also have several flaws, including being prone to overfitting.

- What are Decision Trees?
- Geometric Intuition of Decision Trees.
- Entropy.
- Information Gain.
- Gini impurity.
- Play Tennis Dataset Example of Decision Tree.
- Steps to Constructing a Decision Tree.
- Decision Tree Regression.
- Real-world cases of Decision Tree.
- Best and Worst cases of Decision Tree Algorithm.
- Decision Tree with Amazon Food Reviews.

Decision trees…
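The entropy and information-gain calculations named in the outline can be worked through by hand. The parent distribution below is the classic Play Tennis split (9 play / 5 don't play); the child split is a hypothetical candidate split, just for illustration:

```python
# Worked example of entropy and information gain for a decision-tree split.
# Parent: the classic Play Tennis labels (9 yes / 5 no).
# The left/right split is a hypothetical candidate, not a real attribute.
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy: H = -sum(p * log2(p)) over class frequencies."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

parent = ["yes"] * 9 + ["no"] * 5
left   = ["yes"] * 6 + ["no"] * 1   # hypothetical left child
right  = ["yes"] * 3 + ["no"] * 4   # hypothetical right child

n = len(parent)
gain = (entropy(parent)
        - (len(left) / n) * entropy(left)
        - (len(right) / n) * entropy(right))

print(round(entropy(parent), 2))  # → 0.94
print(round(gain, 3))             # → 0.152
```

A decision tree greedily picks, at each node, the split with the highest information gain (or, equivalently in spirit, the largest drop in Gini impurity).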

SVM is a supervised machine learning algorithm used in many classification and regression problems. It remains one of the most robust prediction methods and can be applied to many classification use cases.

- Geometric Intuition Of Support Vector Machines.
- Mathematical Formulation of Support Vector Machines.
- Loss Minimization Interpretation of SVMs.
- Dual Form of Support Vector Machines.
- Kernel Trick in Support Vector Machines.
- Train and Runtime Complexities of SVMs.
- Support Vector Machines — Regression (SVR)
- Best and Worst cases of Support Vector Machines Algorithm.
- Support Vector Machines with Amazon Food Reviews.
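The kernel trick from the outline can be illustrated on data that no straight line can separate. This sketch uses scikit-learn's `make_circles` toy data, which is purely illustrative:

```python
# Sketch of the kernel trick: concentric circles are not linearly
# separable, but an RBF-kernel SVM separates them easily.
from sklearn.datasets import make_circles
from sklearn.svm import SVC

X, y = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)

# A linear kernel can only draw a straight decision boundary.
linear = SVC(kernel="linear").fit(X, y)

# The RBF kernel implicitly maps points into a higher-dimensional space
# where the circles become separable.
rbf = SVC(kernel="rbf", C=1.0, gamma="scale").fit(X, y)

print(round(linear.score(X, y), 2))
print(round(rbf.score(X, y), 2))  # near-perfect on this toy data
```

The kernel never computes the high-dimensional mapping explicitly; it only needs pairwise similarities, which keeps training tractable.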

Logistic Regression doesn’t care whether…

Logistic regression is a classification algorithm used to assign observations to a discrete set of classes. Among the many classification algorithms available, logistic regression is a common and useful method for solving binary classification problems.

1. Geometric Intuition Of Logistic Regression

2. Regularization techniques to avoid Overfitting and Underfitting

3. Probabilistic interpretation of Logistic Regression

4. Loss Minimization Interpretation of Logistic Regression

5. Implementation of Logistic Regression…
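The regularization item above can be demonstrated directly. In scikit-learn, `C` is the inverse regularization strength, so a smaller `C` shrinks the weights harder; the synthetic data here is illustrative:

```python
# Sketch of L2 regularization in logistic regression: smaller C means
# stronger regularization and a smaller learned weight vector.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=500, n_features=20, random_state=0)

strong = LogisticRegression(penalty="l2", C=0.01).fit(X, y)   # strong reg.
weak = LogisticRegression(penalty="l2", C=100.0).fit(X, y)    # weak reg.

# Stronger regularization pulls the coefficients toward zero.
print(np.linalg.norm(strong.coef_) < np.linalg.norm(weak.coef_))  # → True
```

Tuning `C` (typically by cross-validation) is how we trade off underfitting against overfitting.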

An **optimization problem** can be **solved** by several different methods. Moreover, the user can navigate the surface or curve of the plotted function to establish an initial point and then find the optimal or critical point.

1. Single Value Differentiation

2. Minima and Maxima

3. Gradient descent algorithm

4. Steps for Gradient descent algorithm

5. Types of Gradient Descent algorithms

6. Implementation of Stochastic Gradient Descent

For optimization problems, differentiation is very important. Let’s see some maths.

**Differentiation** allows us to find rates of change. …
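The gradient-descent steps listed above can be sketched on a one-dimensional toy function, f(x) = (x − 3)², whose derivative is f′(x) = 2(x − 3); the learning rate and step count here are illustrative:

```python
# Sketch of gradient descent minimizing f(x) = (x - 3)^2.
# The minimum is at x = 3, where the derivative 2*(x - 3) is zero.
def gradient_descent(x0, lr=0.1, steps=100):
    x = x0
    for _ in range(steps):
        grad = 2 * (x - 3)   # derivative of (x - 3)^2 at the current x
        x = x - lr * grad    # move against the gradient
    return x

print(round(gradient_descent(x0=0.0), 4))  # → 3.0
```

Stochastic gradient descent follows the same update rule but estimates the gradient from one sample (or a mini-batch) at a time instead of the full dataset.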

1. Geometric Intuition for Linear Regression

2. Linear Regression using Loss-Minimization

3. Assumptions of Linear Regression

4. Implementation of Linear Regression using Python

Regression analysis is a form of predictive modeling technique that investigates the relationship between dependent and independent variables.

Linear regression is perhaps one of the most well known and well-understood algorithms in statistics and machine learning. …
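The loss-minimization view above (fit the line minimizing squared error) can be sketched on synthetic data generated from a known line, y = 2x + 1 plus noise; all numbers here are illustrative:

```python
# Sketch of ordinary least squares: recover a known slope and intercept
# from noisy synthetic data (y = 2x + 1 + noise).
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(100, 1))
y = 2 * X.ravel() + 1 + rng.normal(0, 0.5, size=100)

# LinearRegression minimizes the sum of squared residuals.
model = LinearRegression().fit(X, y)
print(round(model.coef_[0], 1), round(model.intercept_, 1))  # ≈ 2.0 1.0
```

Recovering the true coefficients from noisy data is exactly what minimizing the squared-error loss buys us when the linearity assumption holds.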

Naive Bayes is a statistical classification technique based on Bayes’ Theorem. It is one of the simplest supervised learning algorithms, and it is a fast, accurate, and reliable classifier even on large datasets.

To understand the Naive Bayes algorithm, we first need to know some basic concepts of probability.

1. Probability

2. Conditional Probability

3. Independent Events

4. Mutually Exclusive Events

5. Bayes Theorem

6. Naive Bayes Algorithm

7. Toy Example using Naive Bayes

8. Naive Bayes Algorithm on Text data

9. Laplace (or) Additive Smoothing

10. Log-Probabilities and Numerical Stability

11. Bias-variance…
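Bayes’ Theorem (item 5 above) can be worked through with a small spam-filter style example; all the probabilities below are made-up numbers chosen for illustration:

```python
# Worked toy example of Bayes' Theorem:
#   P(spam | word) = P(word | spam) * P(spam) / P(word)
# All probabilities are illustrative, not measured from any dataset.
p_spam = 0.3              # prior: 30% of mail is spam
p_word_given_spam = 0.6   # likelihood of the word in spam
p_word_given_ham = 0.1    # likelihood of the word in non-spam

# Law of total probability:
#   P(word) = P(word|spam)P(spam) + P(word|ham)P(ham)
p_word = p_word_given_spam * p_spam + p_word_given_ham * (1 - p_spam)

p_spam_given_word = p_word_given_spam * p_spam / p_word
print(round(p_spam_given_word, 2))  # → 0.72
```

The Naive Bayes classifier applies this same update once per word, multiplying the per-word likelihoods under the "naive" independence assumption.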

**First, we want to know: what is Amazon Fine Food Review Analysis?**

This dataset consists of reviews of fine foods from Amazon. The data span a period of more than 10 years, including all ~500,000 reviews up to October 2012. Reviews include product and user information, ratings, and a plaintext review. We also have reviews from all other Amazon categories.

Amazon reviews are often the most publicly visible reviews of consumer products. …

Trained on Data Science and Machine Learning at @6benches