Harvard University: Building Spatial Data Infrastructure (SDI) for Future GIS Research and Education


Harvard University’s Center for Geographic Analysis (CGA) has developed a cutting-edge Spatial Data Infrastructure (SDI) to meet the growing demand for high-performance, trusted, and cost-effective geospatial research and education. The initiative, funded by the Office of the Vice Provost for Research (OVPR), combines cloud-native architecture, high-performance computing (HPC), and open-source technologies to enable advanced spatial analytics, AI integration, and big data processing. Designed for scalability, interoperability, and reproducibility, the SDI supports cross-disciplinary research and aligns with OGC modernization goals, emphasizing automation, open standards, and community engagement.

Key Features
• Cloud-native SDI using NERC and FASRC for HPC and long-term cloud storage

• Advanced spatial analytics integrated with AI and data science

• Compliance with FAIR principles for data sharing and reuse

• Workflow-driven computing with KNIME, PostGIS, Jupyter, and containerized systems

• Open-source tools and training resources for community GIS education

Benefits and Impact
• High-performance, scalable infrastructure for big data visualization and analysis

• Reproducible workflows and interoperability through open standards

• Cost-effective and sustainable architecture leveraging shared resources

• Community engagement via open-source repositories, workshops, and tutorials

Use Cases
• Global well-being monitoring: TSGI dataset used for UN SDGs

• Healthcare access analysis: RapidRoute applied in public health studies

• Earth Observation: RINX adopted for large-scale EO data processing

• Political research: K-NN tool used for studying partisan segregation

• Infrastructure planning: Optipath applied to pipeline routing

• Social media analytics: BOP enables real-time analysis of billions of geotagged tweets

Educational Outreach
• Standardized datasets: TSGI, Geotweet Archive (10B geotagged tweets)

• Software infrastructure: KNIME Business Hub, ArcGIS Enterprise, PostGIS, Heavy.ai, Jupyter

• Open-source applications: RINX, RapidRoute, K-NN, Optipath, BOP

• Education and training: Workshops, tutorials, GitHub repositories, and YouTube resources