C.A. Micchelli, W.L. Miranker
Journal of the ACM
To refine unsupervised training of geospatial models, we introduce a method that emphasizes diverse and clean datasets. We extract finer-resolution metrics such as land use, temperature, and precipitation, then cluster samples with similar statistics to characterize the data distribution. Sampling is weighted by cluster size: frequent clusters are down-weighted so that less frequent data is favored, which increases diversity while keeping the sample representative. The result is a balanced dataset representation that significantly improves the accuracy of the geospatial foundation model. Our study underscores the potential of optimized geospatial data sampling to enhance model accuracy and broaden practical applications.
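The cluster-weighted sampling step described in the abstract might look roughly like the following sketch. The abstract does not give the weighting formula, so the inverse-size rule `1 / cluster_size**alpha`, the `alpha` parameter, the function names, and the toy land-use labels are all assumptions made for illustration. The sketch also assumes cluster labels have already been computed.

```python
import random
from collections import Counter


def cluster_balanced_weights(labels, alpha=1.0):
    """Return per-sample weights proportional to 1 / cluster_size**alpha.

    Assumed rule, not taken from the paper: alpha=0 gives uniform sampling,
    alpha=1 gives every cluster the same total probability mass.
    """
    sizes = Counter(labels)
    raw = [sizes[c] ** -alpha for c in labels]
    total = sum(raw)
    return [w / total for w in raw]  # normalized to sum to 1


def sample_indices(labels, k, alpha=1.0, seed=0):
    """Draw k sample indices with replacement using the weights above."""
    rng = random.Random(seed)
    weights = cluster_balanced_weights(labels, alpha)
    return rng.choices(range(len(labels)), weights=weights, k=k)


# Hypothetical cluster labels: one frequent cluster and one rare cluster.
labels = ["forest"] * 8 + ["urban"] * 2
weights = cluster_balanced_weights(labels)
picked = sample_indices(labels, k=5)
```

With `alpha=1.0`, the eight "forest" samples and the two "urban" samples each receive half of the total probability mass, so the rare cluster is no longer swamped. Values of `alpha` between 0 and 1 would interpolate between uniform sampling and full balancing.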
Saurabh Paul, Christos Boutsidis, et al.
JMLR
Joxan Jaffar
Journal of the ACM
Cristina Cornelio, Judy Goldsmith, et al.
JAIR