PM2.5 Concentration Prediction Model in Jakarta Area Using Random Forest Algorithm
Abstract
This study predicts PM2.5 concentrations in Jakarta with the Random Forest algorithm, using historical air quality data from 2015 to 2024. Hyperparameter tuning targeted n_estimators, max_depth, and min_samples_split. The tuned model achieved a Mean Absolute Error (MAE) of 14.44, a Root Mean Square Error (RMSE) of 18.75, and an R² of 0.61. The model captured the general pattern of PM2.5 fluctuations, but deviations at certain points indicate room for improvement. Descriptive analysis showed an average PM2.5 concentration of 94.46 µg/m³, with peaks of up to 209 µg/m³, exceeding healthy air quality thresholds. The model could be integrated into real-time monitoring systems and used to support data-driven policy. Future work could incorporate meteorological variables and evaluate longer-term trends to improve accuracy.
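The workflow summarized in the abstract (tuning n_estimators, max_depth, and min_samples_split, then reporting MAE, RMSE, and R²) can be sketched roughly as follows. This is a minimal illustration using scikit-learn, not the authors' implementation: the synthetic series, the three-day lag features, the grid values, the chronological 80/20 split, and the choice of TimeSeriesSplit are all assumptions made for the example, and the numbers it prints are unrelated to the study's results.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import GridSearchCV, TimeSeriesSplit

# Synthetic stand-in for a daily PM2.5 series (NOT the study's data).
rng = np.random.default_rng(0)
n_days = 400
t = np.arange(n_days)
pm25 = 90 + 30 * np.sin(2 * np.pi * t / 365) + rng.normal(0, 15, n_days)
pm25 = np.clip(pm25, 5, None)

# Assumed features: the previous three days' concentrations.
n_lags = 3
X = np.column_stack([pm25[i : n_days - n_lags + i] for i in range(n_lags)])
y = pm25[n_lags:]

# Chronological split so the test period lies strictly after training.
split = int(0.8 * len(y))
X_train, X_test = X[:split], X[split:]
y_train, y_test = y[:split], y[split:]

# Hypothetical grid over the three hyperparameters named in the abstract.
param_grid = {
    "n_estimators": [50, 100],
    "max_depth": [5, None],
    "min_samples_split": [2, 10],
}

search = GridSearchCV(
    RandomForestRegressor(random_state=0),
    param_grid,
    cv=TimeSeriesSplit(n_splits=3),  # folds respect time order
    scoring="neg_mean_absolute_error",
)
search.fit(X_train, y_train)

# Evaluate the refitted best model on the held-out period.
y_pred = search.predict(X_test)
mae = mean_absolute_error(y_test, y_pred)
rmse = float(np.sqrt(mean_squared_error(y_test, y_pred)))
r2 = r2_score(y_test, y_pred)

print("best parameters:", search.best_params_)
print(f"MAE={mae:.2f}  RMSE={rmse:.2f}  R2={r2:.2f}")
```

A time-ordered split is used here because shuffling a time series can leak future values into training; whether the study split its data this way is not stated in the abstract.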