PM2.5 Concentration Prediction Model in Jakarta Area Using Random Forest Algorithm

Muhammad Naufal Afif Al Arsy
Ahmad Meijlan Yasir

Abstract

This study predicts PM2.5 concentrations in Jakarta with a Random Forest model built on historical air quality data from 2015 to 2024. Hyperparameter tuning was performed to optimize model performance, focusing on parameters such as n_estimators, max_depth, and min_samples_split. The model achieved a Mean Absolute Error (MAE) of 14.44, a Root Mean Square Error (RMSE) of 18.75, and an R² of 0.61, meaning it explains about 61% of the variance in the evaluated PM2.5 values. The model captured the general pattern of PM2.5 fluctuations, but its predictions deviated from observations at certain points, which indicates room for improvement. Descriptive analysis showed an average PM2.5 concentration of 94.46 µg/m³, with peaks of up to 209 µg/m³, exceeding healthy air quality thresholds. The model could be integrated into real-time monitoring systems and used to support data-driven policy. Future work could incorporate meteorological variables and evaluate longer-term trends to improve accuracy.
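
As an illustration of how the three reported metrics relate to one another, the following is a minimal sketch of their standard definitions in plain Python. The function name `regression_metrics` and the sample values are hypothetical and are not taken from the study's dataset or code; the study itself may have used a library implementation.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return (MAE, RMSE, R^2) for paired observed and predicted values."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")

    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]

    # MAE: average magnitude of the errors, in the units of the target.
    mae = sum(abs(e) for e in errors) / n

    # RMSE: square root of the mean squared error; penalizes large misses more.
    ss_res = sum(e * e for e in errors)
    rmse = math.sqrt(ss_res / n)

    # R^2: share of the variance in the observations explained by the model.
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all observations are identical")
    r2 = 1.0 - ss_res / ss_tot

    return mae, rmse, r2


# Hypothetical PM2.5 values for illustration only (not the study's data).
observed = [80.0, 100.0, 120.0, 90.0]
predicted = [85.0, 95.0, 110.0, 100.0]

mae, rmse, r2 = regression_metrics(observed, predicted)
print(f"MAE={mae:.2f}  RMSE={rmse:.2f}  R2={r2:.2f}")
```

Because RMSE squares each error before averaging, it is never smaller than MAE; a gap between the two, as in the reported 18.75 versus 14.44, is consistent with a subset of points where the prediction misses by a wide margin.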
