1. Tell us more about yourself, your background, and your journey.
Data science may have been around for a while now, but when I was pursuing my B.Tech in Computer Science, it was not as popular as it is today. My final-year B.Tech project was all about playing around with Big Data on various features of songs to predict the year in which a particular song was likely released. Little did I know that I had only scratched the surface of Data Science.
After my engineering degree, I was lucky enough to be exposed to real-life business problems that could be solved with data at multiple product-based companies. By the end of three years in the corporate world, I was sure that Data Science was what I wanted to pursue, so I decided to explore the depth of what this field has to offer and enrolled in a Master's by Research program in Data Science at IIIT Bangalore. Now, I am looking forward to bridging academia and industry in my next corporate stint and solving purposeful business problems.
2. Do you believe that even people outside this profession should have a fundamental understanding of it?
I was recently reading about how some of the core technical skills we focus on today may become obsolete in the next few years. However, one skill that is here to stay, and that becomes more impactful by the day, is the art of storytelling. Irrespective of the profession, it is important to understand how numbers can help you weave a strong and compelling narrative.
This ability to turn data into actionable insights will become increasingly critical going forward. The best part about the current age is that there is no barrier to entry for learning something new. While advanced data science may require exposure to sophisticated tools and frameworks, anybody, irrespective of their profession, can plunge into the world of data with basic tools (e.g., Excel) and the hunger to learn.
3. What does a data scientist need the most?
Apart from curiosity, a willingness to play around with data, and programming skills, one of the most crucial qualities that sets a data scientist apart is the ability to understand business nuances. One should always remember that data science is a tool that enables businesses to make better decisions and enhance the user experience. Hence, it is imperative that a data scientist clearly understands and articulates the business problem they are trying to solve.
This brings me to the next point: a data scientist's passion and rigour for solving real-life problems, which includes understanding the variety of ways in which a data point can be utilised to solve a problem. Last but not least, a strong mathematical foundation helps in designing experiments thoughtfully and iterating faster in the development cycle.
4. How do you handle missing data? What imputation techniques do you recommend?
I decide based on the type of task and the percentage of values missing in the data. That said, a few techniques that usually work well are imputation based on k-nearest neighbors, treating missing values as a separate category, and training the model on the data augmented in this way.
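To make the first two techniques concrete, here is a minimal pure-Python sketch. It assumes numeric rows stored as lists with `None` marking a missing value; the function names `knn_impute` and `fill_missing_category`, the `k` parameter, and the `"MISSING"` token are illustrative choices, not any library's API. The distance between two rows is computed only over the columns observed in both, a simplified take on the idea behind library imputers such as scikit-learn's `KNNImputer`.

```python
import math
from typing import List, Optional

Row = List[Optional[float]]


def knn_impute(rows: List[Row], k: int = 2) -> List[List[float]]:
    """Hypothetical helper: replace each None with the mean of that column
    over the k nearest rows (donors) in which the column is observed."""
    result: List[List[float]] = []
    for i, row in enumerate(rows):
        filled: List[float] = []
        for j, value in enumerate(row):
            if value is not None:
                filled.append(value)
                continue
            donors = []
            for m, other in enumerate(rows):
                # A donor must be a different row that actually has column j.
                if m == i or other[j] is None:
                    continue
                # Compare only the columns observed in both rows.
                shared = [(a, b) for a, b in zip(row, other)
                          if a is not None and b is not None]
                if not shared:
                    continue
                dist = math.sqrt(sum((a - b) ** 2 for a, b in shared) / len(shared))
                donors.append((dist, other[j]))
            if not donors:
                raise ValueError(f"no donor rows available for row {i}, column {j}")
            # Average the column value over the k closest donors.
            donors.sort(key=lambda d: d[0])
            nearest = donors[:k]
            filled.append(sum(v for _, v in nearest) / len(nearest))
        result.append(filled)
    return result


def fill_missing_category(values: List[Optional[str]], token: str = "MISSING") -> List[str]:
    """Hypothetical helper: treat a missing categorical value as its own category."""
    return [token if v is None else v for v in values]


if __name__ == "__main__":
    data = [[1.0, 2.0], [1.1, None], [5.0, 10.0], [1.2, 2.4]]
    print(knn_impute(data, k=2))
    print(fill_missing_category(["red", None, "blue"]))
```

In the sample data, the row with the missing value is closest to the first and last rows on the observed column, so with `k=2` its gap is filled with the mean of their values rather than being pulled towards the distant outlier row. For real projects, a tested library implementation would usually be the better choice; the sketch only shows the mechanics.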
5. Any words of wisdom for Data Science students or practitioners starting out?
One thing that has worked well for me is not taking shortcuts or rushing to learn things quickly. For anyone getting started in Data Science, it can be very tempting to reach for pre-built packages and libraries for almost everything. However, one of the best ways to learn is to dive deeper into these libraries and understand how the algorithms are implemented. It might not be the fastest route, but it will surely benefit you in the long run.
6. What is the latest data mining book or article you have read, and when?
I try to keep myself updated with the latest developments in this domain by regularly reading relevant research papers. The latest one I read was "HealthPrompt: A Zero-shot Learning Paradigm for Clinical Natural Language Processing". I also enjoy summarising research articles in layman's language on my YouTube channel, and I recently released an explanation of this paper as well. Interested readers can find it on my YouTube channel, TechViz - The Data Science Guy.
My name is Prakhar Mishra. I am currently a Master's student in Data Science at IIIT Bangalore, India, and am also doing my internship with Udaan. I work mainly in NLP, and my thesis project is about automatically generating a short preview video by summarising a given set of text resources. Apart from generative NLP applications such as summarisation and translation, my research interests include unsupervised learning, adversarial training, and conversational AI. I also actively create content through my blogs and YouTube channel, where I touch upon various topics in AI/ML with a major focus on Natural Language Processing, Graph Machine Learning, and general Machine Learning/Deep Learning concepts.
Interviewed By - Sugandha Dhanawade