TORONTO: What you need to do to become a good data scientist

TORONTO: Despite the proliferation of data science skilling programmes in India, there's still a big supply gap. Coursera's Global Skills Index 2020 ranked India as a laggard in data science (Rank No. 51), even though 20% of all its enrolments in the country are in data science courses and projects.

Data science professionals also often do not realise that the subject requires continuous learning and application. "Things are changing so rapidly in this field that what is state-of-the-art today will not be so a month later," says Parul Pandey, data science evangelist at H2O.ai, an open source AI company.

Platforms like Kaggle and HackerEarth are some of the best places to keep up with the latest developments. Hackathons hosted on Kaggle let data professionals collaborate with peers around the world. "The insights and learnings that come with it are invaluable. We have to look at what is happening in the research world, what is happening in competitions, and which are the latest technologies," says Pandey.

A data scientist's job is a unique combination of domain expertise, analytical capability and programming experience. Finding such candidates has been a challenge for companies.

HackerEarth's data science offerings include a practice component, where individual developers can sign up and access free content to build, test and run models. "Post the training, there are options for self-assessment by attending challenges, where you get to compete with other data scientists," says Vishwastam Shukla, CTO at HackerEarth. More than 10% of HackerEarth's community of over 5 million developers work in data science.

The bar for professionals is rising. The 2020 State of Data Science report by Anaconda, the company behind the open-source Python and R distribution of the same name, predicts that larger organisations will establish data science centres of excellence to get the most business impact out of data science and cross-trained professionals.

People are starting to understand the real skills and real value that a data scientist brings, so the contours of data science jobs are becoming well defined. As a result, both the candidates and the overall system are showing a lot more maturity.

However, the daily grind of a data scientist will continue. The Anaconda report, which surveyed professionals from 15 domains ranging from finance to healthcare, says that data scientists spend the largest share of their time (26%) cleaning data. The first step in any data science pipeline, Pandey says, is to understand the dataset before you start making predictions from it. Since the data is drawn from multiple sources, you don't know everything it contains or whether it is clean, so you need to explore it to ensure there's no bias. Visualisation libraries like Plotly and Bokeh, and tools like Tableau and Power BI, are used to explore data visually. Data scientists spend around 21% of their time on visualisation.
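As a rough illustration of that first "understand the dataset" step, here is a minimal sketch in Python that uses only the standard library rather than the visualisation tools named above. It assumes records arrive as dictionaries merged from two sources; the function name `profile_records` and the sample patient records are hypothetical. It only counts missing values and exact duplicate rows, which are typical cleanliness checks. It does not detect bias, which still calls for exploration and domain judgement.

```python
from collections import Counter

def profile_records(records):
    """Summarise missing values and exact duplicate rows in a list of dict records."""
    # Collect every field name seen in any source, since sources may differ.
    columns = sorted({key for row in records for key in row})
    # A field counts as missing if it is absent, None, or an empty string.
    missing = {
        col: sum(1 for row in records if row.get(col) in (None, ""))
        for col in columns
    }
    # Count each row by a hashable, key-ordered view of its contents.
    row_counts = Counter(tuple(sorted(row.items())) for row in records)
    duplicates = sum(count - 1 for count in row_counts.values() if count > 1)
    return {"rows": len(records), "missing": missing, "duplicate_rows": duplicates}

# Hypothetical records drawn from two sources with different field coverage.
source_a = [
    {"patient_id": 1, "age": 34, "city": "Pune"},
    {"patient_id": 2, "age": None, "city": "Delhi"},
]
source_b = [
    {"patient_id": 2, "age": None, "city": "Delhi"},
    {"patient_id": 3, "age": 51},
]

report = profile_records(source_a + source_b)
print(report)
# {'rows': 4, 'missing': {'age': 2, 'city': 1, 'patient_id': 0}, 'duplicate_rows': 1}
```

A summary like this tells a data scientist where to look before any plotting or modelling begins: here, one patient appears in both sources and one source lacks the city field entirely.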

Such data exploration requires domain expertise. When dealing with a healthcare dataset, only a healthcare professional will be able to tell why there's a particular pattern; a pure data scientist cannot. This is why data science is opening up to people from every field. "Many now are moving from their domain specific jobs to a data analytics sort of job, which has some programming also involved," says Pandey.

After the data is cleaned and visualised, it is fed into libraries like TensorFlow and PyTorch to build predictive models.
