Workflow to Build a Machine Learning Model:
Earlier, we discussed Machine Learning and its types. In this document, we will discuss the workflow for building a model and for working with historical data (data preprocessing). Before building a model, we need to transform the data from an unstructured state (incomplete, inconsistent, and lacking clear trends) into a structured format. Most of the time, we gather data from different sources, and these sources come in different formats that are not directly usable for analysis and prediction.
Hence, we preprocess the data before building the model. These are the steps involved in the preprocessing stage:
- Data Cleaning
- Data Integration
- Data Transformation
- Data Reduction
Data Cleaning: Data is cleaned by filling in missing values (several imputation techniques exist for this), smoothing noisy data, and removing unused columns such as ID columns.
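As a minimal sketch, missing numeric values can be imputed with the column mean; the data below is hypothetical, and libraries such as scikit-learn offer ready-made imputers for the same job:

```python
# Fill missing values (None) in a numeric column with the column mean,
# one simple imputation technique. The ages are made-up example data.
ages = [25, None, 31, None, 40]

observed = [a for a in ages if a is not None]
mean_age = sum(observed) / len(observed)

imputed = [a if a is not None else mean_age for a in ages]
print(imputed)  # [25, 32.0, 31, 32.0, 40]
```

For categorical columns, the mode (most frequent value) is a common substitute for the mean.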
Data Integration: Data from different sources is brought together in one place.
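A tiny illustration of integration, assuming two hypothetical sources (a CRM system and a billing system) that share a customer ID as the join key:

```python
# Merge records from two hypothetical sources on a shared customer ID,
# a minimal form of data integration.
crm = {101: {"name": "Asha"}, 102: {"name": "Ravi"}}
billing = {101: {"balance": 250.0}, 102: {"balance": 0.0}}

integrated = {cid: {**crm[cid], **billing.get(cid, {})} for cid in crm}
print(integrated[101])  # {'name': 'Asha', 'balance': 250.0}
```

In practice this is usually done with database joins or a library such as pandas rather than by hand.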
Data Transformation: In this step, the data is normalized, aggregated, and generalized.
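One common transformation is min-max normalization, which rescales a column into the [0, 1] range; the values below are hypothetical:

```python
# Min-max normalization: rescale values to [0, 1] so that features
# measured on different scales become comparable.
values = [10, 20, 30, 40, 50]
lo, hi = min(values), max(values)

normalized = [(v - lo) / (hi - lo) for v in values]
print(normalized)  # [0.0, 0.25, 0.5, 0.75, 1.0]
```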
Data Reduction: This step aims to produce a reduced representation of the data, for example by applying slicing and dicing operations as in a data warehouse.
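A rough sketch of reduction by aggregation: many detail rows are rolled up into a compact per-group summary. The sales rows here are invented for illustration:

```python
# Roll up individual sale amounts into per-region totals,
# reducing many rows to a small summary table.
rows = [
    ("north", 120), ("south", 90),
    ("north", 80),  ("south", 110),
]

totals = {}
for region, amount in rows:
    totals[region] = totals.get(region, 0) + amount
print(totals)  # {'north': 200, 'south': 200}
```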
After applying these preprocessing techniques, the data is ready for model building.
After preparing the final historical data, split it into two parts: a training set (70%) and a testing set (30%). First, we train the model on the training set as per our requirement, and then we evaluate it on the testing set.
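The 70/30 split can be sketched with a shuffle followed by a slice; scikit-learn's `train_test_split` does the same thing in one call. The dataset here is a stand-in for the preprocessed records:

```python
# Shuffle a hypothetical dataset and split it 70/30 into
# training and testing sets.
import random

data = list(range(10))  # stand-in for preprocessed records
random.seed(42)         # fixed seed so the split is reproducible
random.shuffle(data)

split = int(len(data) * 0.7)
train, test = data[:split], data[split:]
print(len(train), len(test))  # 7 3
```

Shuffling before splitting matters: if the data is ordered (say, by date), an unshuffled split can give training and testing sets with different distributions.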
Now, depending on the accuracy and performance of the model, we check whether it is overfitting or underfitting.
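A rough diagnostic for this check is to compare training accuracy with testing accuracy; the thresholds below are illustrative assumptions, not standard values:

```python
# Heuristic over/underfit check: a large train-test gap suggests
# overfitting; low accuracy on both sets suggests underfitting.
def diagnose(train_acc, test_acc, gap=0.10, floor=0.70):
    if train_acc - test_acc > gap:
        return "overfit"   # memorizes training data, fails to generalize
    if train_acc < floor and test_acc < floor:
        return "underfit"  # model too simple to capture the trend
    return "ok"

print(diagnose(0.98, 0.72))  # overfit
print(diagnose(0.62, 0.60))  # underfit
```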
To resolve either problem, we need to take certain measures while building the model. If the issue is low accuracy, the appropriate measures depend on the algorithm being used.
Once the model performs well and achieves good accuracy, we pass new data to it for prediction and build reports from the results.