Prediction Analysis of PM2.5 Concentration Based on Temperature Variables Using XGBoost Algorithm (Case Study: Kemayoran, Central Jakarta)
Abstract
Improving air quality in urban areas such as Central Jakarta is a major challenge because of heavy transport and industrial activity and a dense population. This study aims to predict PM2.5 concentrations with the XGBoost algorithm, using temperature as the main predictor variable. The data were collected in Kemayoran, Central Jakarta, over an observation period from 1 January 2017 to 12 February 2017. XGBoost was chosen because of the non-linear and complex nature of the data. The test results show that model performance remains poor, as indicated by a high Mean Squared Error (MSE) and a low R² score. These limitations are attributed to the small amount of data and the absence of supporting variables such as air humidity, wind speed, and rainfall. The high PM2.5 concentrations are attributed to the study location: Kemayoran is one of the most densely populated areas, with high industrial activity and fossil-fuelled transport. The study provides evidence for adding supporting variables and extending the observation period to improve model accuracy. With these improvements, the XGBoost algorithm could be a promising approach to air quality prediction in heavily polluted cities.
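As a reading aid for the two evaluation metrics named in the abstract, the following minimal sketch computes MSE and R² in plain Python. The function names and the four data points are hypothetical illustrations, not the study's code or data. The toy predictions are set to the mean of the observations, which is the baseline case where R² is zero.

```python
def mse(y_true, y_pred):
    """Mean Squared Error: the average of the squared residuals."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical toy values in µg/m³ (NOT measurements from Kemayoran).
observed = [30.0, 50.0, 40.0, 60.0]
# A model that always predicts the observed mean (45.0) explains no variance.
predicted = [45.0, 45.0, 45.0, 45.0]

print("MSE:", mse(observed, predicted))
print("R2:", r2_score(observed, predicted))
```

An R² near zero means the model does no better than predicting the mean concentration, which is how a high MSE together with a low R² indicates weak predictive performance.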