How to build a Data Project to 10x your chances of landing an interview.
Follow the 6-step framework to save 20+ hours and gain clarity in your approach.
Last week we launched the 30-Day Machine Learning Build Challenge, on how to build and deploy ML models in a production environment.
If you never had an opportunity to work in a production environment with large datasets and industry-approved frameworks, then this challenge is for you.
Every week, we will guide you through each step of the ML lifecycle here on Substack. Today we will focus on the first step of the process: How to choose the right project?
Where shall we begin?
- Anywhere, but Jupyter Notebook.
Data Science jokes aside, let’s dive in.
How do you pick the right Data project and not waste 3 months of your life?
To answer this question we invited Brad Yarbro to our “Break Into Data” community and learned about his 6-step framework. Read this article to learn what industry experts have to say about building your First Data Portfolio.
What are the 6 most common mistakes we should avoid first?
Before we dive into the nitty-gritty details of starting a Data/ML project, let’s first explore the most common pitfalls that beginners fall into all the time, often without realizing it.
#1 Lack of strategy
Following your curiosity is important, but a clear strategy is crucial to your long-term success. Ask yourself:
What am I trying to achieve with this project?
What roles am I targeting? What skills do they require?
How can this project reflect my ability to solve those business problems?
Which industry or domain interests me?
For example, if you are targeting Product Analyst positions, it makes sense to work on A/B testing and NOT on training an image classifier model.
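To make the A/B testing example concrete, here is a minimal sketch of the kind of analysis such a project might include: a two-proportion z-test comparing conversion rates between two variants. The function name and the conversion counts are made up for illustration, and a real project would also need to cover experiment design and sample-size planning.

```python
import math


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Return (z, two_sided_p) for the difference in conversion rates."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled rate under the null hypothesis that both variants convert equally.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / std_err
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Made-up numbers for illustration only: variant A vs. variant B.
z, p_value = two_proportion_z_test(conv_a=200, n_a=2000, conv_b=250, n_b=2000)
print(f"z = {z:.2f}, p = {p_value:.4f}")
```

A small p-value suggests the difference between variants is unlikely to be noise alone, which is exactly the kind of reasoning a Product Analyst interviewer tends to probe.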
#2 Lack of excitement
If you are contributing at least 10-20 hours a week to a Data project, make sure to choose a topic that excites you. You will inevitably face many challenges and roadblocks, and that excitement will keep you motivated enough to see your project through.
For instance, when I started our open-source-driven AI accountability bot, I was mostly excited about how it could change the lives of the neurodivergent members of our community. That excitement pushes me through debugging countless error messages at midnight.
#3 Generic and clean datasets
If you think you are standing out by using Titanic, Iris, State Housing, or any other popular dataset, you are mistaken. The reality is: Data Is Messy.
Companies pay you, the data professional, to clean it, preprocess it, and transform it into a valuable form used by other analysts and scientists.
If you develop Web Scraping, Data Cleaning/Wrangling, and API integration skills, you will be irreplaceable.
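As a rough illustration of what "messy" means in practice, the sketch below cleans a handful of hypothetical scraped rows using only the Python standard library. The field names (`city`, `price`) and the sample rows are invented for this example; a real project would more likely use a library such as pandas on a much larger dataset.

```python
def clean_price(raw):
    """Turn strings like ' $1,200.50 ' into a float; return None if unusable."""
    if raw is None:
        return None
    text = str(raw).strip().replace("$", "").replace(",", "")
    try:
        return float(text)
    except ValueError:
        return None


def clean_records(records):
    """Normalize text fields, parse prices, and drop duplicates and bad rows."""
    seen = set()
    cleaned = []
    for row in records:
        city = (row.get("city") or "").strip().title()
        price = clean_price(row.get("price"))
        if not city or price is None:
            continue  # skip rows missing a required field
        key = (city, price)
        if key in seen:
            continue  # skip duplicates that only differed in formatting
        seen.add(key)
        cleaned.append({"city": city, "price": price})
    return cleaned


# Hypothetical scraped rows, invented for this example.
raw_rows = [
    {"city": "  new york ", "price": "$1,200.50"},
    {"city": "New York", "price": "1200.50"},
    {"city": "boston", "price": "N/A"},
    {"city": None, "price": "900"},
    {"city": "CHICAGO", "price": " 950 "},
]
cleaned_rows = clean_records(raw_rows)
print(cleaned_rows)
```

Five raw rows shrink to two usable ones here. Documenting decisions like these (what you dropped and why) is what turns a cleaning script into a portfolio talking point.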
#4 Unfinished and undeployed projects
The only thing worse than not having a portfolio project is having an unfinished one.
Instead of having 10 GitHub repos with missing README pages, have 3 Solid Portfolio Pieces that showcase the variety and depth of your expertise.
#5 Poor visibility
Exposure. Exposure. Exposure.
Half the job is about showcasing the impact and results of your project. Prepare appealing visuals, charts, and diagrams. Take a look at my friend, Kelly Adam’s workout dashboard for inspiration. I love it.
Afterwards, share your findings on LinkedIn, Reddit, Hacker News, Kaggle, and in Data Science Slack and Discord communities.
If you want to go one step further, check out Jay Feng’s story on how he landed 10 job offers by gaining media exposure through Hacker News and local TV stations. See who can benefit from your project and reach out!
A 6-step Framework for Data Projects
Follow this guide to cover every step of the process for a comprehensive and impactful project.
At the end of the day, ALL of this is preparing you to ace the interview!
By working on your portfolio you are building TALKING POINTS for your interviews.
To make the most out of your portfolio project, make sure to include the following sentences during your interview:
“I noticed that there was ____(problem)____ in my dataset, so I used ____(solution)____ because ____X____.” Or “I used ____(framework)____ BECAUSE I wanted to ____Y____.” And “In the end, I evaluated the model with ____(a metric)____ because ____Z____.”
This will set you apart from 80% of applicants.
Happy coding!
If you like our content join our Discord Community for more learning and career opportunities, including speaker sessions and coding challenges.
Stay tuned for Break Into Data’s activities in April:
Check out our resources and events:
Weekly speaker series: This Thursday, April 18th, we will host Venkata Naga Sai Kumar Bysani. He will share his journey to landing a Data Analyst job right after graduation as an international student. Register here! Spots are limited.
We are launching Daily Silent Deep Work sessions on our Discord Voice Channel. Join us every weekday TO ENTER FLOW STATE and FOCUS for 90 min with our cameras on and audio off. We will start on Thursday!
30-Day ML build and deploy challenge. We are working with our team to prepare the best learning experience for free. Stay tuned for the next articles and resources.
If you have any questions, write to us on our Discord server.
Every time I open my feed, whether on LinkedIn or Substack, I learn new things from you, @Meri Nova.
Thank you for providing the beautiful insights from your Journey.
If I wanted to break into Machine Learning in 2024, these are 3 types of projects I would have in my portfolio:
Forget the cookie-cutter approach of "1 LLM-powered chatbot, 1 Pytorch project, and 1 scikit-learn implementation".
Trust me, no one's scanning portfolios for general implementations of popular libraries.
Instead of following the crowd, focus on demonstrating depth and breadth in key ML competencies:
1. An End-to-End ML Pipeline
Skills to demonstrate: Data preprocessing, feature engineering, model selection, hyperparameter tuning, deployment, and monitoring.
Trust me, you will learn a LOT by doing this.
2. A State-of-the-Art Model Implementation
Skills to demonstrate: Deep understanding of cutting-edge algorithms and the ability to read and implement research papers by translating math equations into working Python code.
3. A Real-World Problem You Care About
Skills to demonstrate: Problem framing, business impact assessment, data acquisition, approach and tech stack selection, ethical considerations, and project documentation.
You could integrate all 3 aspects into one comprehensive project or showcase them separately. The key is demonstrating your ability to tackle real-world ML challenges and make a tangible impact.
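To give a feel for the first project type, here is a deliberately tiny sketch of the core stages of a pipeline: preprocessing, a model, hyperparameter tuning on a validation split, and a final evaluation on held-out data. Everything in it is an assumption made for illustration: the synthetic dataset, the helper names, and the choice of k-nearest neighbors. It uses only the standard library so it runs anywhere; a real project would likely rely on a library such as scikit-learn and would add the deployment and monitoring steps this sketch leaves out.

```python
import math
import random


def make_dataset(n, seed=0):
    """Synthetic two-class data: two noisy clusters in 2D (invented for the demo)."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        label = rng.randint(0, 1)
        center = 2.0 * label
        # The second feature lives on a much larger scale, so scaling matters.
        features = [rng.gauss(center, 1.0), rng.gauss(100 * center, 100.0)]
        rows.append((features, label))
    return rows


def fit_scaler(rows):
    """Preprocessing: learn per-feature mean and standard deviation."""
    cols = list(zip(*(x for x, _ in rows)))
    means = [sum(c) / len(c) for c in cols]
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0
            for c, m in zip(cols, means)]
    return means, stds


def transform(rows, scaler):
    """Apply a previously fitted scaler to every row."""
    means, stds = scaler
    return [([(v - m) / s for v, m, s in zip(x, means, stds)], y) for x, y in rows]


def knn_predict(train, x, k):
    """Model: majority vote among the k nearest training points (k is odd)."""
    nearest = sorted(train, key=lambda row: math.dist(row[0], x))[:k]
    votes = sum(y for _, y in nearest)
    return 1 if votes * 2 > k else 0


def accuracy(train, rows, k):
    return sum(knn_predict(train, x, k) == y for x, y in rows) / len(rows)


data = make_dataset(300)
train, valid, test = data[:180], data[180:240], data[240:]

# Fit the scaler on training data only, to avoid leaking test statistics.
scaler = fit_scaler(train)
train_s, valid_s, test_s = (transform(part, scaler) for part in (train, valid, test))

# Hyperparameter tuning: pick k on the validation split, never on the test split.
candidate_ks = [1, 3, 5, 7, 9]
best_k = max(candidate_ks, key=lambda k: accuracy(train_s, valid_s, k))

test_accuracy = accuracy(train_s, test_s, best_k)
print(f"best k = {best_k}, test accuracy = {test_accuracy:.2f}")
```

The design choice worth explaining in an interview is the three-way split: the scaler and the model see only training data, k is chosen on validation data, and the test split is touched exactly once.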
Stop endlessly consuming courses and start building!
Your future in ML starts with hands-on experience.
If you want to learn more about how to build a portfolio, read this article:
https://lnkd.in/g7sTg8Re
Happy coding!
#machinelearning