Following a structured approach to data science helps you to maximize your chances of success in a data science project at the lowest cost. It also makes it possible to take up a project as a team, with each team member focusing on what they do best. Take care, however: this approach may not be suitable for every type of project or be the only way to do good data science.
The typical data science process consists of six steps through which you’ll iterate, as shown in figure
Figure 2.1 summarizes the data science process and shows the main steps and actions you’ll take during a project. The following list is a short introduction
1. The first step of this process is setting a research goal. The main purpose here is making sure all the stakeholders understand the what, how, and why of the project. In every serious project this will result in a project charter.
2. The second phase is data retrieval. You want to have data available for analysis, so this step includes finding suitable data and getting access to the data from the data owner. The result is data in its raw form, which probably needs polishing and transformation before it becomes usable.
3. Now that you have the raw data, it’s time to prepare it. This includes transforming the data from a raw form into data that’s directly usable in your models. To achieve this, you’ll detect and correct different kinds of errors in the data, combine data from different data sources, and transform it. If you have successfully completed this step, you can progress to data visualization and modeling.
4. The fourth step is data exploration. The goal of this step is to gain a deep understanding of the data. You’ll look for patterns, correlations, and deviations based on visual and descriptive techniques. The insights you gain from this phase will enable you to start modeling.
5 Finally, we get to the model building . It is now that you attempt to gain the insights or make the predictions stated in your project charter.
Now is the time to bring out the heavy guns, but remember research has taught us that often (but not always) a combination of simple models tends to outperform one complicated model. If you’ve done this phase right, you’re almost done.
6. The last step of the data science model is presenting your results and automating the analysis, if needed. One goal of a project is to change a process and/or make better decisions. You may still need to convince the business that your findings will indeed change the business process as expected.
This is where you can shine in your influencer role. The importance of this step is more apparent in projects on a strategic and tactical level. Certain projects require you to perform the business process over and over again, so automating the project will save time.
0comments:
Post a Comment
Note: only a member of this blog may post a comment.