Data Scientists vs Statisticians

I think being a data scientist is about extracting useful ideas and making decisions from the complex, high-dimensional data sets of the modern digital world. The major duties of a data scientist include data ingestion, data transformation, exploratory data analysis, model selection, model evaluation, and data storytelling. To carry out these steps, a data scientist needs a strong background in computer science and in mathematics/statistics, plus domain knowledge in the relevant area. Statistics is a crucial component of data science. In my understanding, statisticians often deal with data that have a relatively simple structure and use a single model to fit the data or draw inferences from it, as in the analysis of clinical trials. In contrast, data scientists focus on comparing a number of different methods to find the best machine learning model for prediction. However, since both data science and statistics aim to extract knowledge from data, the reality is that each field is weak without the other. Statisticians need to understand how data are structured and modeled, while data scientists need to understand applied statistics. Ultimately, I expect the boundary between these two disciplines to blur in the near future as real-world data becomes increasingly complicated. For my part, I will do my best to adapt to this trend by building both solid statistical knowledge and proficiency in the “data science process”.

Written on August 18, 2021