10 Kaggle Competitions Every Young Data Scientist Should Check Out
Unlike high school lessons and exams, competitions can offer some much-needed practical experience. Participating in data science competitions not only allows you to hone your coding and analytical skills but also provides hands-on experience in tackling real-world problems. These competitions act as a dynamic playground where you can put your theoretical knowledge to the test — such as implementing algorithms, analyzing datasets, and developing solutions to complex challenges.
What’s more, the work you do during a competition can be used to build a portfolio and showcased in college applications. This can demonstrate your passion for the subject and dedication to learning, setting you apart from other applicants.
Why Kaggle?
Kaggle is a well-known platform for such competitions. It offers a diverse range of challenges that cater to various skill levels. You will be able to explore competitions that align with your expertise as well as other interests like art or sports, examining the intersection between those fields and data science. Further, a lot of the competitions require teamwork and you will leave the experience with new connections within the field.
10 Kaggle competitions for high school students:
Before we dive into the list, please note that the eligibility rules of the competitions include the following statement: “The older of 18 years old or the age of majority in your jurisdiction of residence (unless otherwise agreed to by Competition Sponsor and appropriate parental/guardian consents have been obtained by Competition Sponsor).” If you are below the age of 18, you will likely need your guardian’s written permission to take part in these contests.
Competitions for Beginners:
These competitions fall under Kaggle’s Getting Started category, which is meant for beginners in the field. If you’re looking for a simple competition with low stakes (there is no end date for these contests) and plenty of tutorials, consider one of these. However, do keep in mind that while there is a lot to learn through these contests, there are no monetary prizes or awards available.
Start date: August 30, 2016
End date: Rolling leaderboard
Prize: None
This is an ideal competition for students who have some experience with R or Python and machine learning basics and are looking to apply that knowledge. As a participant, your goal will be to predict the final prices of residential homes in Ames, Iowa, based on a dataset with 79 explanatory variables. You can work in teams of up to 10 people. Through this exercise, you will gain skills in creative feature engineering and advanced regression techniques such as random forest and gradient boosting. You will be provided with a starter notebook to get going quickly, as well as tutorials on Kaggle Learn to guide you.
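To get a feel for what gradient boosting does before you open the starter notebook, here is a minimal sketch in plain Python. It is a toy version under stated assumptions, not the method any particular notebook uses: the data is made up (it is not the Ames dataset), there is only one feature, and the names `fit_stump`, `fit_boosting`, `living_area`, and `sale_price` are hypothetical. Each round fits a one-split "stump" to the errors left by the previous rounds and adds a small fraction of it to the prediction.

```python
# Toy gradient boosting for regression, using one-feature decision stumps.
# All data below is invented for illustration; it is NOT the Ames dataset.


def fit_stump(xs, residuals):
    """Pick the threshold on xs that best splits the residuals (least squares)."""
    best = None
    # Skip the largest value so the right-hand side of the split is never empty.
    for threshold in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        error = sum((r - left_mean) ** 2 for r in left)
        error += sum((r - right_mean) ** 2 for r in right)
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    return best[1:]  # (threshold, left_mean, right_mean)


def predict_stump(stump, x):
    threshold, left_mean, right_mean = stump
    return left_mean if x <= threshold else right_mean


def fit_boosting(xs, ys, n_rounds=50, learning_rate=0.1):
    """Start from the mean, then repeatedly fit a stump to the remaining errors."""
    base = sum(ys) / len(ys)
    preds = [base] * len(ys)
    stumps = []
    for _ in range(n_rounds):
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        preds = [p + learning_rate * predict_stump(stump, x)
                 for p, x in zip(preds, xs)]
    return base, stumps, learning_rate


def predict(model, x):
    base, stumps, learning_rate = model
    return base + learning_rate * sum(predict_stump(s, x) for s in stumps)


def mean_squared_error(ys, preds):
    return sum((y - p) ** 2 for y, p in zip(ys, preds)) / len(ys)


# Hypothetical houses: living area in square feet, price in thousands of dollars.
living_area = [800, 1000, 1200, 1500, 1800, 2200, 2600, 3000]
sale_price = [90, 110, 125, 160, 185, 230, 265, 310]

model = fit_boosting(living_area, sale_price)
mean_price = sum(sale_price) / len(sale_price)
baseline_mse = mean_squared_error(sale_price, [mean_price] * len(sale_price))
boosted_mse = mean_squared_error(sale_price, [predict(model, x) for x in living_area])
print(f"baseline MSE: {baseline_mse:.1f}, boosted MSE: {boosted_mse:.1f}")
```

In a real entry you would likely reach for a library implementation and many more features, and you would measure error on held-out data rather than on the training set as this toy does.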
Start date: February 23, 2022
End date: Rolling leaderboard
Prize: None
Another of Kaggle’s beginner competitions, Spaceship Titanic sets up a futuristic scenario where a spaceship suffers a collision that sends half its passengers to a different dimension. Using records from the spaceship’s damaged computer system, you must predict which passengers were transported in order to rescue them. Your results will be judged on their classification accuracy, that is, the percentage of predicted labels that are correct. Taking part in this competition will hone your programming skills, data preprocessing (data cleaning, handling missing values, and more), and the fundamentals of machine learning (understanding classification tasks and predictive modeling).
Another similar competition geared for beginners is the Titanic – Machine Learning from Disaster competition.
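Classification accuracy, the metric mentioned above, is simple enough to compute by hand. The sketch below is only an illustration: the function name `accuracy` and the five made-up passengers are assumptions, not part of the competition's own code.

```python
def accuracy(y_true, y_pred):
    """Return the fraction of predicted labels that match the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("cannot compute accuracy on an empty list")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


# Made-up example: was each of five passengers transported?
actual = [True, False, True, True, False]
predicted = [True, False, False, True, True]

score = accuracy(actual, predicted)
print(f"Accuracy: {score:.0%}")  # 3 of the 5 predictions match
```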
Start date: August 28, 2020
End date: Rolling leaderboard
Prize: None
If you’re particularly interested in computer vision and the intersection of data science and visual art, consider participating in this competition. The challenge is to build a Generative Adversarial Network (GAN), which consists of a generator model and a discriminator model, to generate 7,000 to 10,000 Monet-style images. Your work will be evaluated using MiFID (Memorization-informed Fréchet Inception Distance): the smaller the score, the better your images are judged to be. Working with GANs will train you in the field of artificial intelligence and its use in image generation while also honing your coding skills in general.
Start date: July 25, 2012
End date: Rolling leaderboard
Prize: None
Another opportunity to learn about computer vision in particular, this competition requires you to correctly identify digits from a dataset of handwritten images. Through the process, you will learn about simple neural networks and classification methods such as SVM and K-nearest neighbors. While working in teams of up to 10 people, you are encouraged to experiment with various algorithms and compare results as you work. Kaggle offers several tutorials to help guide you during the contest.
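If K-nearest neighbors is new to you, the idea fits in a few lines: to label a new image, find the k training images closest to it and take a majority vote. The sketch below is a toy under stated assumptions. It uses tiny made-up 3x3 "images" of the digits 0 and 1 flattened into lists of pixels, whereas the real MNIST images are 28x28, and the names `euclidean` and `knn_predict` are hypothetical.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Straight-line distance between two flattened images."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_predict(train_images, train_labels, image, k=3):
    """Label an image by majority vote among its k closest training images."""
    ranked = sorted(zip(train_images, train_labels),
                    key=lambda pair: euclidean(pair[0], image))
    nearest_labels = [label for _, label in ranked[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]


# Made-up 3x3 images (1 = ink, 0 = blank), flattened row by row.
train_images = [
    [0, 1, 0, 0, 1, 0, 0, 1, 0],  # a vertical stroke: "1"
    [0, 1, 0, 0, 1, 0, 0, 1, 1],  # noisy "1"
    [1, 1, 0, 0, 1, 0, 0, 1, 0],  # noisy "1"
    [1, 1, 1, 1, 0, 1, 1, 1, 1],  # a ring: "0"
    [1, 1, 1, 1, 0, 1, 1, 1, 0],  # noisy "0"
    [0, 1, 1, 1, 0, 1, 1, 1, 1],  # noisy "0"
]
train_labels = [1, 1, 1, 0, 0, 0]

new_one = [0, 1, 0, 0, 1, 0, 1, 1, 0]
new_zero = [1, 1, 1, 1, 0, 1, 0, 1, 1]

print(knn_predict(train_images, train_labels, new_one))
print(knn_predict(train_images, train_labels, new_zero))
```

Trying different values of `k` or a different distance function is a quick way to start the kind of experimenting the competition encourages.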
Start date: July 29, 2020
End date: Rolling leaderboard
Prize: None
In this Kaggle competition, participants are challenged to develop a computer model using Natural Language Processing (NLP) to understand how pairs of sentences relate to each other. The goal is to predict whether one sentence implies, contradicts, or has no connection with the other. You will work with a dataset that includes pairs of sentences in fifteen different languages, and your model must predict the relationship label for each pair, a task known as natural language inference. This is a good opportunity for beginners to explore NLP, machine learning, and coding skills in the context of language and linguistics.
Advanced Competitions:
You will find more complex problems here that require a higher understanding of coding, data science, or related areas. The high stakes come with high rewards, as these competitions offer significant awards and/or monetary prizes.
Start date: November 1, 2023
End date: January 31, 2024
Prize: $15,000 for the 1st place, $10,000 for the 2nd place, $8,000 for the 3rd place, $7,000 for the 4th place, $5,000 for the 5th place, $5,000 for the 6th place
Organized by an energy company named Enefit, this competition aims to address the challenge of energy imbalance caused by the unpredictable behavior of prosumers—consumers who both use and generate energy. Your goal will be to create an accurate energy prediction model for prosumers, thereby minimizing imbalance costs, by using information about prosumers' energy patterns. You can work in teams of up to 5 people. Students participating should have a basic understanding of time series prediction, and through the competition, they can learn to develop models to address real-world challenges in the energy sector.
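If time series prediction is unfamiliar, a common first step is a "seasonal naive" baseline: predict that each future value will repeat what happened one cycle earlier. The sketch below is an illustration under stated assumptions, not the competition's method or metric. The readings are made up, the cycle is shortened to 4 steps for brevity, mean absolute error is used only as an example way to score a forecast, and the function names are hypothetical.

```python
def seasonal_naive_forecast(history, season_length, horizon):
    """Forecast each future step with the value seen one season earlier."""
    if len(history) < season_length:
        raise ValueError("need at least one full season of history")
    last_season = history[-season_length:]
    return [last_season[i % season_length] for i in range(horizon)]


def mean_absolute_error(actual, forecast):
    """Average size of the forecast errors, ignoring their sign."""
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)


# Made-up energy readings over two cycles of 4 steps each.
history = [5, 9, 12, 7, 6, 10, 13, 8]
actual_next = [6, 11, 12, 9]  # what "really" happened in the next cycle

forecast = seasonal_naive_forecast(history, season_length=4, horizon=4)
error = mean_absolute_error(actual_next, forecast)
print(forecast, error)
```

Any model you build later should beat a baseline like this one; if it does not, the extra complexity is not paying off.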
Start date: October 31, 2023
End date: January 22, 2024
Prize: $20,000 for the 1st place, $10,000 for the 2nd place, $8,000 for the 3rd place, $7,000 for the 4th place, $5,000 for the 5th place
Organized by Vanderbilt University and The Learning Agency Lab (a non-profit), this competition focuses on the use of Large Language Models (LLMs) and AI detection techniques in the real world. As a participant, you can work in teams of up to 5 people to develop a machine learning model that discerns whether an essay was written by a student or an LLM. The provided dataset will include student essays and those generated by various LLMs. You can expect to learn more about natural language processing techniques and model efficiency while exploring the influence of AI in the real world. Apart from the leaderboard performance prizes, the contest also offers prizes for the computational efficiency of your model.
Start date: October 31, 2023
End date: January 8, 2024
Prize: The top five submissions will receive $12,500 and an invitation to present at the 2024 NFL Scouting Combine, where the winning team will get an additional $12,500
The National Football League organizes the Big Data Bowl, an analytics contest that looks at statistical innovation in football, every year. The 2024 Big Data Bowl revolves around the players’ tackling performances and you can work in groups of up to 4 people. Your goal will be to assign value to elements of tackling by using a dataset including player tracking data from Weeks 1-9 of the 2022 NFL season.
Participants are encouraged to generate actionable insights, for example predictions of tackle time, tackle range, player evaluation, credit assignment, tackle type, and team/player roles. Aside from the significant monetary prize, you will learn about data wrangling, statistical analysis, feature engineering and more. This is a great opportunity for anyone interested in sports!
Start date: December 5, 2023
End date: January 1, 2024
Prize: Kaggle merchandise
This competition is part of Kaggle’s Playground Series (a set of slightly easier challenges) and revolves around a multi-class prediction challenge: predicting outcomes for patients with cirrhosis. You will be given a dataset with information about the patients and must use it to predict their status, which could be one of three possible outcomes. The competition provides a valuable opportunity for you to learn more about feature engineering, model iteration, and visualization. It’s also a useful opportunity for students with an interest in the intersection of machine learning and medicine/healthcare.
Start date: October 2, 2023
End date: January 9, 2024
Prize: $12,000 for the 1st place, $8,000 for the 2nd place, $5,000 for the 3rd place for the Leaderboard Prizes and $15,000 for the 1st place, $10,000 for the 2nd place, $5,000 for the 3rd place for the Efficiency Prizes
Hosted by Vanderbilt University in collaboration with The Learning Agency Lab, this contest investigates the relationship between learners' writing behaviors and writing performance. The dataset involves about 5000 logs of user inputs, including keystrokes and mouse clicks, recorded while the user was writing an essay. Your challenge is to build a model that will predict the essay’s score based on the data. The result should offer some insight into writing instruction and automated evaluation techniques. This competition provides an opportunity for students to learn about data-driven analysis, predictive model development, and their applications in educational contexts.
Stephen is one of the founders of Lumiere and a Harvard College graduate. He founded Lumiere as a PhD student at Harvard Business School. Lumiere is a selective research program where students work 1-1 with a research mentor to develop an independent research paper.