The AchieVer Posted March 11, 2019

Who are you, citizen data scientist?

Many aspects of the data science life cycle are being abstracted by automated machine learning solutions, which is good news for data scientists and citizen data scientists.

Ugh. Everyone is talking about the citizen data scientist, but no one can define it (perhaps they know one when they see one). Here goes: the simplest definition of a citizen data scientist is a non-data scientist. That's not a pejorative; it just means that citizen data scientists nobly desire to do data science but are not formally schooled in all the ins and outs of the data science life cycle. For example, a citizen data scientist may be quite savvy about which enterprise data is likely to matter when creating a model but may not know the difference between GBM, random forest, and SVM. Those algorithms are data scientist geek-speak to many of them. The citizen data scientist's job is not data science; rather, they use it as a tool to get their job done. Here is my definition of the enterprise citizen data scientist:

"A businessperson who aspires to use data science techniques such as machine learning to discover new insights and create predictive models to improve business outcomes."

CITIZEN DATA SCIENTISTS ARE A HEARTY LOT

They must be dedicated to their part-time craft, because doing data science is not easy. It requires learning the life cycle: data acquisition, data preparation, feature engineering, algorithm selection, model training, model evaluation, and, finally, insights and/or predictions. They may even have to learn to program in R or Python. If they are lucky (and smart), they will download RapidMiner, KNIME, or a similar tool, because these tools provide nice visual drag-and-drop interfaces instead of harsh coding.
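For readers who want to see what those life cycle steps look like once written down, here is a minimal Python sketch. It is an illustration under loud assumptions, not how RapidMiner, KNIME, or any AutoML product works: the data is synthetic, feature engineering is skipped, and the three candidate models (a majority-class baseline, nearest centroid, and k-nearest neighbors) are simple stand-ins for algorithms like GBM, random forest, and SVM. Every function name is hypothetical.

```python
import random
from collections import Counter

def acquire_data(n=200, seed=0):
    """Data acquisition: synthetic two-class points standing in for enterprise data."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        label = rng.choice([0, 1])
        center = 0.0 if label == 0 else 3.0
        x1 = rng.gauss(center, 1.0)
        x2 = rng.gauss(center, 1.0)
        if rng.random() < 0.05:
            x1 = None  # simulate a missing value
        rows.append(((x1, x2), label))
    return rows

def prepare(rows):
    """Data preparation: drop rows that contain a missing value."""
    return [(x, y) for x, y in rows if None not in x]

def split(rows, valid_fraction=0.25, seed=0):
    """Shuffle, then hold out a validation set for model evaluation."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * (1 - valid_fraction))
    return shuffled[:cut], shuffled[cut:]

def sq_dist(a, b):
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2

def train_majority(train):
    """Baseline: always predict the most common training label."""
    majority = Counter(y for _, y in train).most_common(1)[0][0]
    return lambda x: majority

def train_nearest_centroid(train):
    """Predict the label whose mean training point is closest."""
    grouped = {}
    for x, y in train:
        grouped.setdefault(y, []).append(x)
    centroids = {
        y: (sum(p[0] for p in pts) / len(pts), sum(p[1] for p in pts) / len(pts))
        for y, pts in grouped.items()
    }
    return lambda x: min(centroids, key=lambda y: sq_dist(x, centroids[y]))

def train_knn(train, k=5):
    """Predict by majority vote among the k closest training points."""
    def predict(x):
        nearest = sorted(train, key=lambda row: sq_dist(x, row[0]))[:k]
        return Counter(y for _, y in nearest).most_common(1)[0][0]
    return predict

def accuracy(model, rows):
    """Model evaluation: fraction of held-out rows predicted correctly."""
    return sum(model(x) == y for x, y in rows) / len(rows)

def select_model(train, valid):
    """Algorithm selection: train every candidate and keep the best scorer."""
    candidates = {
        "majority": train_majority,
        "nearest_centroid": train_nearest_centroid,
        "knn": train_knn,
    }
    scores = {name: accuracy(fit(train), valid) for name, fit in candidates.items()}
    best = max(scores, key=scores.get)
    return best, scores

rows = prepare(acquire_data())
train, valid = split(rows)
best, scores = select_model(train, valid)
print(best, scores)
```

The loop inside select_model is the part a citizen data scientist would otherwise have to reason about by hand: try several algorithms, score each on held-out data, and keep the winner.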
GOOD NEWS FOR EVERYONE THAT DEALS WITH THE GNARLY

The best news for citizen data scientists is that many of the gnarliest aspects of the data science life cycle are being abstracted by automated machine learning solutions (AutoML). Automated machine learning solutions such as DataRobot, H2O.ai's Driverless AI, Google Cloud AutoML, and more provide sophisticated tools that abstract the gory details of data science so that citizen data scientists and perhaps mere mortals can analyze data and build robust machine learning models. It's also good news for data scientists, because the same automation of the data science life cycle can make them more productive. And it's good news for business, because the demand for machine learning is reaching voracious levels.

Kjell Carlsson and I will be doing a Wave on automation-focused machine learning solutions, scheduled for publication in Q2 of 2019. We define this category as:

"Software that provides enterprise data scientist teams and/or citizen data scientists with tools to train, deploy, and manage analytical results and models that are principally designed to automate key aspects of the machine learning life cycle, including feature engineering, algorithm selection, model evaluation, and explainability."

Source
mp68terr Posted March 12, 2019

14 hours ago, The AchieVer said: "It requires learning the life cycle"

Where is this coming from? It's all about data; there is no need to know anything about the underlying life that creates the data. Or it should be written as 'life cycle of data'.