The Earth Data Challenge (EDC) is a semester-long, team-based program hosted by GRI and EEPS, focused on developing skills in AI-native workflows, cloud computing, and Earth data analysis.
The AI-native workflow used in EDC
EDC is designed as a model for AI-native research training that could be replicated across disciplines. In EDC, AI is used as a routine part of the workflow rather than an optional tool. Teams use LLMs to accelerate technical work while producing artifacts that are reviewable and reproducible.
In practice, teams use AI throughout the workflow, including:
- Framing and scoping projects (defining questions, feasibility checks, success criteria)
- Discovering and accessing cloud-hosted Earth datasets
- Writing and debugging Python notebook code with LLM assistance
- Validating results with sanity checks, baselines, and sensitivity tests
- Producing reproducible outputs (version-controlled notebooks, fixed environments, documented provenance)
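As an illustration of the validation step, here is a minimal sketch of a sanity check, a baseline comparison, and a sensitivity test. The function names and the NDVI-like numbers are hypothetical stand-ins, not part of the EDC materials; real projects would run the same kinds of checks on their own results.

```python
from statistics import mean


def check_range(values, low, high):
    """Sanity check: raise if any value falls outside [low, high]."""
    bad = [v for v in values if not low <= v <= high]
    if bad:
        raise ValueError(f"{len(bad)} value(s) outside [{low}, {high}]")


def mae(observed, predicted):
    """Mean absolute error between two equal-length sequences."""
    return mean(abs(o - p) for o, p in zip(observed, predicted))


def threshold_sensitivity(values, thresholds):
    """Fraction of values above each threshold."""
    return {t: sum(v > t for v in values) / len(values) for t in thresholds}


# Hypothetical NDVI-like values; stand-ins for real results.
observed = [0.72, 0.64, 0.09, 0.11, 0.35]
predicted = [0.70, 0.60, 0.15, 0.10, 0.30]

# Sanity check: NDVI is bounded between -1 and 1.
check_range(observed, -1.0, 1.0)
check_range(predicted, -1.0, 1.0)

# Baseline: a prediction is only useful if it beats "always guess the mean".
baseline = [mean(observed)] * len(observed)
model_error = mae(observed, predicted)
baseline_error = mae(observed, baseline)
print(f"model MAE={model_error:.3f}, baseline MAE={baseline_error:.3f}")

# Sensitivity: does the vegetated fraction change much with the threshold?
fractions = threshold_sensitivity(observed, [0.2, 0.3, 0.4])
print(fractions)
```

If the conclusion flips when the threshold moves slightly, or the model barely beats the mean baseline, that is worth reporting alongside the headline result.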
To keep the work transparent:
- Work is version-controlled in Git/GitHub, so code and analyses are reviewable
- Teams document major AI-assisted decisions and prompts
- Results must be supported by explicit checks and interpretable outputs (not “the model said so”)
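One possible way to document AI-assisted decisions and prompts is a small append-only log kept in the repository. The sketch below assumes a JSON Lines file; the file name `ai_log.jsonl`, the field names, and the example entry are all hypothetical, and EDC does not prescribe this format.

```python
import json
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def log_ai_decision(log_path, prompt, decision, verified_by):
    """Append one AI-assisted decision to a JSON Lines log and return the record."""
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "prompt": prompt,            # what the team asked the LLM
        "decision": decision,        # what the team chose to do with the answer
        "verified_by": verified_by,  # the explicit check that supports it
    }
    with Path(log_path).open("a", encoding="utf-8") as handle:
        handle.write(json.dumps(record) + "\n")
    return record


def read_log(log_path):
    """Load every record from the log, one JSON object per line."""
    with Path(log_path).open(encoding="utf-8") as handle:
        return [json.loads(line) for line in handle if line.strip()]


# Usage: written to a temporary directory here; in a project the log would
# live in the repository so that Git tracks it alongside the notebooks.
with tempfile.TemporaryDirectory() as tmp:
    log_file = Path(tmp) / "ai_log.jsonl"  # hypothetical file name
    log_ai_decision(
        log_file,
        prompt="Suggest a cloud mask for this scene",
        decision="Adopted the suggested mask with a stricter threshold",
        verified_by="Compared masked and unmasked means on a clear tile",
    )
    entries = read_log(log_file)
    print(len(entries), entries[0]["decision"])
```

Because each entry is one line of text, changes to the log show up cleanly in Git diffs and can be reviewed like any other file.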
For a detailed description of how AI is integrated into the workflow, see the AI-native workflow documentation.
What EDC will cover
- Cloud computing with Google Cloud Platform and Jupyter notebooks, including Vertex AI for hosted model endpoints and scalable notebook workflows
- AI-native research workflow
- Practical LLM-assisted coding workflow for teams
- Google Earth Engine and Python programming
- Data analysis and visualization, including hyperspectral data (Planet Tanager)
- Version control with Git/GitHub
- Reproducible, open-science practices
- Collaborative technical project skills that translate directly to research and industry careers
These are practical, career-relevant skills used in geoscience, geospatial analytics, environmental consulting, government agencies, data science, and many other fields.
Schedule
All sessions are at noon in Rudolph 301.
| Date | Event |
|---|---|
| January 30 | Overview and Exploring Cloud Datasets |
| February 13 | Jupyter and Google Earth Engine |
| February 27 | Git/GitHub, Team Formation, and Mentoring |
| March 20 | EDC Team Updates, Mentoring, and Questions |
| April 24 | EDC Team Presentations and Awards |
How It Works
Form a team of 2–5 students, choose a project, develop a reproducible Jupyter notebook, and present your results at the end of the semester. We provide cloud computing access, example notebooks, curated datasets, and mentorship.
The goal is to get acquainted with modern data workflows, not to produce exhaustive research. Projects are intentionally small-scale and skill-focused. Expect a total time commitment of roughly 10–12 hours over the semester, including the scheduled sessions.
We are excited about Planet Tanager hyperspectral imagery as a new data source, but teams are free to work with any publicly available cloud-hosted Earth datasets that fit their interests.
Projects run on cloud infrastructure (Google Cloud and Vertex AI), allowing teams to use hosted datasets, scalable notebooks, and model endpoints similar to those used in research and industry environments.
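One small step toward a reproducible notebook is recording the environment it ran in. The sketch below uses only the Python standard library; the helper name `environment_snapshot` and the package list are illustrative assumptions, and a team would list whatever packages its own notebook imports.

```python
import json
import platform
from importlib import metadata


def environment_snapshot(packages):
    """Record the Python version, platform, and installed package versions."""
    snapshot = {
        "python": platform.python_version(),
        "platform": platform.platform(),
        "packages": {},
    }
    for name in packages:
        try:
            snapshot["packages"][name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # Keep missing packages visible instead of failing silently.
            snapshot["packages"][name] = None
    return snapshot


# Hypothetical package list; replace with the notebook's real dependencies.
snapshot = environment_snapshot(["numpy", "earthengine-api"])
print(json.dumps(snapshot, indent=2))
```

Printing the snapshot in the first cell of a notebook, or saving it next to the outputs, lets a reviewer see which versions produced the results.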
Who Can Participate
Open to WashU undergraduates, graduate students, and postdocs in any field.
Resources
Contact
Questions? Reach out to Alex Bradley, Tom Stein, Roger Michaelides, Tyler Meng, or Alex Nguyen.