Ensemble techniques are one of the most effective tools in a data scientist’s toolkit. They involve combining the predictions of multiple machine learning models to produce a single, often more accurate, prediction. Let’s explore the essence of ensemble methods, understand why they work, and break down how they are constructed. What Are Ensemble Techniques? At…
BLOGS
Working with PySpark: A Guide to Scalable Data Processing
PySpark has become an essential tool for processing and analyzing large-scale data, especially when working with distributed computing environments. Over time, I’ve explored various PySpark functionalities to overcome challenges related to memory constraints, merging datasets, handling historical data, and model workflows. This guide documents key insights and techniques for working efficiently with PySpark. Transitioning from…
Logistic Regression in Depth: Unpacking its Concepts, Benefits, and Challenges
Classification is a supervised learning task in which we try to predict a categorical dependent variable. In classification, the outcome falls into one of a set of discrete categories, e.g. Pass/Fail or Yes/No. Whether a student passes or fails is a classification problem, whereas what score the student secures is a regression problem. For example, “How likely a student is to complete a course…
Exploring Linear Regression Fundamentals: A Deep Dive into Covariance and Correlation Concepts
In this blog, I will discuss the basics of one of the most popular supervised learning techniques in data science: linear regression. Linear regression tries to establish a linear relationship between two or more variables. Let’s start with some simple two-variable examples: Diving deep into one specific example, “Does a heavier car have lower mileage?” Intuitively,…
Exploratory data analysis (EDA) – Quick Guide
In this blog, I’ll discuss descriptive statistics: the set of techniques used to describe data. I’ll cover describing data through numbers, measures of location, measures of dispersion, and measures of correlation, as well as how to summarize the data. While EDA may not be the most complex thing in data science, it…
Decision Tree Explained Part-2
In this blog, I will explain the pruning process for decision trees. Ideally, we would like a classification tree that doesn’t overfit the given training data. Underfitting means the tree is too simple and has a high classification error. Overfitting means that the model has fit the training data so closely that it has…
Mental math concepts for everyday math- Part 1
Turn your brain into a calculator with the mental math concepts discussed in this blog series. Mental math stimulates both sides of the brain and is strongly associated with better memory skills. It helps reduce mistakes in problem solving and improves your self-confidence. It also stimulates your interest in math and your ability to concentrate. The first…
Decision Trees Explained
Decision trees are a very powerful and very flexible machine learning technique that allows us to do both classification and regression. In this blog, I will explain the underlying principles on which decision trees are built. In the following blogs, I will also explain how decision trees tend to overfit the data, and how we could trim…
5 steps to stick to the change you decided to make
Are you someone who wants to take some action, for example lose weight or file for GST, but keeps procrastinating? Do you decide to act, such as starting to exercise or changing your diet, only when you hit a pain point? There are people with heart ailments who know they need to change their diet but…