Data engineers are often unsure exactly when to use window functions and when they are more appropriate than aggregate functions. The two main use cases for window functions are the following:

What is a window function

A window function performs a calculation across a set of rows in a table that are somehow related to the current row. This means window functions can be used for calculating running totals that incorporate the current row, or for ranking records across rows, inclusive of the current…
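The difference between a window function and a plain aggregate can be sketched with Python's built-in sqlite3 module. This is a minimal illustration, not the article's own example: the sales table, its columns, and its values are all invented here, and it assumes the bundled SQLite is version 3.25 or newer, which is when window functions were added.

```python
import sqlite3

# Hypothetical sample data: (day, amount) pairs invented for illustration.
rows = [(1, 10), (2, 20), (3, 5), (4, 20)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (day INTEGER, amount INTEGER)")
conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)

# An aggregate function collapses all rows into a single value.
total = conn.execute("SELECT SUM(amount) FROM sales").fetchone()[0]

# A window function keeps every row and adds a value computed over
# the rows related to it: a running total and a rank.
query = """
SELECT day,
       amount,
       SUM(amount) OVER (ORDER BY day) AS running_total,
       RANK() OVER (ORDER BY amount DESC) AS amount_rank
FROM sales
ORDER BY day
"""
result = conn.execute(query).fetchall()
conn.close()

for row in result:
    print(row)
```

The aggregate returns one number for the whole table, while the windowed query returns one row per input row, each carrying its own running total and rank.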

CASE statements in SQL are very helpful when the result depends on a specific condition that can be applied to a column. A CASE statement helps you create a derived column, meaning that you take existing columns and modify them. The CASE statement is SQL's way of performing “IF” “THEN” logic. The most important tips for using CASE statements are the following:

  • The CASE statement most commonly goes in the SELECT clause, although it can appear wherever an expression is allowed.
  • CASE must include the following components: WHEN, THEN, and END. …
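A derived column built with CASE might look like the sketch below, run through Python's built-in sqlite3 module. The players table, its values, and the weight thresholds are hypothetical and chosen only to show the WHEN, THEN, ELSE, and END components together.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table and values, invented for illustration.
conn.execute("CREATE TABLE players (name TEXT, weight INTEGER)")
conn.executemany(
    "INSERT INTO players VALUES (?, ?)",
    [("Ann", 150), ("Bob", 210), ("Cy", 260)],
)

# CASE evaluates the WHEN conditions in order and returns the THEN value
# of the first one that matches; ELSE covers everything that is left.
query = """
SELECT name,
       CASE WHEN weight > 250 THEN 'heavy'
            WHEN weight > 200 THEN 'medium'
            ELSE 'light'
       END AS weight_group
FROM players
ORDER BY name
"""
labels = conn.execute(query).fetchall()
conn.close()

print(labels)
```

Because the conditions are checked in order, the stricter threshold is listed first; swapping the two WHEN lines would label every player above 200 as 'medium'.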

What is Data Cleaning

Data cleaning is a foundational skill for a Data Scientist or Data Analyst. By definition, data cleaning is the process of cleaning up raw data to make it usable and ready for analysis. The following are the most common cases in which data cleaning needs to be performed.

  • All the data is lumped together in a single column, and you need to parse it to extract the necessary information.
  • Data could default to string data types, and you need to cast each column appropriately to run computations.
  • Data could have un-standardized units and you need to normalize the column to ensure equally…
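The three cases above can be sketched in plain Python. The raw records, the "|" delimiter, and the choice of centimetres as the target unit are assumptions made up for this illustration and do not come from the article.

```python
# Hypothetical raw records: name, height, and weight lumped into one string.
raw = ["Alice|170 cm|65.5", "Bob|1.75 m|80", "Cara|165 cm|58.2"]


def to_cm(text):
    """Normalize a height such as '1.75 m' or '170 cm' to centimetres."""
    value, unit = text.split()
    number = float(value)  # cast the string to a number
    return number * 100 if unit == "m" else number


cleaned = []
for record in raw:
    # Parse the single lumped column into separate fields.
    name, height, weight = record.split("|")
    cleaned.append(
        {
            "name": name,
            "height_cm": to_cm(height),  # units standardized
            "weight_kg": float(weight),  # string cast to float
        }
    )

for row in cleaned:
    print(row)
```

Once every height is in the same unit and stored as a number, the values can be compared and aggregated directly.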

Data preprocessing and normalization become very important when implementing different machine learning algorithms. Because data preprocessing can significantly affect the outcome of the learned model, it is very important that all features are on the same scale. Normalization is important in algorithms such as k-NN, support vector machines, neural networks, and principal component analysis. The type of feature preprocessing and normalization needed can depend on the data.
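As a rough illustration of putting features on the same scale, the sketch below implements two widely used rescaling methods, min-max scaling and standardization, in plain Python. The function names and the sample values are invented here, and the code assumes the values are not all identical, since that would cause a division by zero.

```python
from statistics import mean, pstdev


def min_max_scale(values):
    """Rescale values linearly so the smallest becomes 0 and the largest 1."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


def standardize(values):
    """Shift and scale values to zero mean and unit (population) std dev."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]


# Hypothetical feature column, invented for illustration.
heights_cm = [150.0, 160.0, 170.0, 180.0, 190.0]

scaled = min_max_scale(heights_cm)
z_scores = standardize(heights_cm)

print(scaled)
print(z_scores)
```

Min-max scaling bounds every feature to the same range, whereas standardization centers it, which is often the more natural choice for distance-based methods such as k-NN when features have very different spreads.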

Preprocessing types

There are several different methods for data rescaling. The image below shows the four most common ones that can be used in machine learning algorithms.

Different ways to rescale and preprocess a data set.

The first plot under original…

One of the tasks when building a supervised learning model, whether for classification or regression, is to create a model that makes correct predictions by learning from the training data. But the model is useless if it cannot also make correct predictions on an unseen set of data. This ability to perform well on a held-out test set is the algorithm’s ability to generalize. But how do we know whether the trained model will generalize well, that is, whether it will be accurate on previously unseen data?
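One common way to estimate generalization is to hold out part of the data, train on the rest, and measure the error on the held-out part. The sketch below does this in plain Python with a least-squares line as a stand-in model. The helper names, the 25% test fraction, the seed, and the synthetic data are all assumptions made for the illustration.

```python
import random


def train_test_split(rows, test_fraction=0.25, seed=0):
    """Shuffle a copy of the rows and split off a held-out test set."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def fit_line(points):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    slope = sum((x - mx) * (y - my) for x, y in points) / sum(
        (x - mx) ** 2 for x, _ in points
    )
    return slope, my - slope * mx


def mean_abs_error(model, points):
    """Average absolute difference between predictions and true values."""
    slope, intercept = model
    return sum(abs(slope * x + intercept - y) for x, y in points) / len(points)


# Synthetic, noise-free data invented for illustration: y = 2x + 1.
data = [(x, 2 * x + 1) for x in range(20)]

train, test = train_test_split(data)
model = fit_line(train)  # the model only ever sees the training rows
test_error = mean_abs_error(model, test)  # error on unseen rows

print(len(train), len(test), test_error)
```

On real, noisy data the test error is usually higher than the training error, and the gap between the two is a rough indicator of how well the model generalizes.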

In general, ML makes the following assumptions about the data:

  • Future…

Over the past decades we have witnessed the power of visual information. As our lives become busier and more intense, we have less time to process and understand information. People often rely on visual information without realizing how misleading it can be, or how it can be used to shape a different opinion about the information presented.

The well-known writer and visual journalist Alberto Cairo recently published “The Truthful Art”, in which he emphasizes the Five Qualities of Great Visualization, which he encourages everyone who deals with data visualization to keep in mind.

~ Is it truthful? That means it is based on thorough and honest research.

Diving deep into the world of Big Data, we are always looking for better tools to explore, operate on, and modify data. Any tool we would like to use comes with some advantages and disadvantages during the programming process. It is very important to understand when and how it is better to use Python tools.

Lists are one of the main built-in data structures in Python and can contain values of various data types. One of the most common operations on lists is the “for” loop, which can often be easily replaced with a list comprehension. …
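A small sketch of that replacement, using a list of numbers invented for illustration: both versions build the squares of the even numbers, first with an explicit loop and then with a list comprehension.

```python
numbers = [1, 2, 3, 4, 5, 6]

# Explicit for loop: create an empty list and append matching items.
squares_loop = []
for n in numbers:
    if n % 2 == 0:
        squares_loop.append(n * n)

# Equivalent list comprehension: expression, loop, and filter in one line.
squares_comp = [n * n for n in numbers if n % 2 == 0]

print(squares_loop)  # [4, 16, 36]
print(squares_comp)  # [4, 16, 36]
```

The comprehension is usually the more readable choice for simple transformations like this one; loops with several steps or side effects tend to stay clearer as ordinary for loops.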

Ivan Zakharchuk
