This paper aims to predict passenger train delays on Iranian Railways using data mining techniques, with the results intended to support train timetable design. The data set is a database of passenger train delays from 2013 to 2018 containing 380,748 records. The independent variables of the prediction model are year, month, day, day of the week, departure time, axis, train type, car type, train origin and destination, and train owner. Two kinds of prediction, numerical and classification, are applied to the entire database. For classification, the TwoStep clustering method divides the delay field into three labels, and neural network and C5.0 methods are used to predict the label. For numerical prediction, regression, CHAID, and neural network methods are used. To evaluate the predictions, the data are divided into a training set (delays from 2013 to 2016) and a test set (delays from 2017). The evaluation shows that the neural network is the most accurate method for numerical prediction and that C5.0 is the most accurate method for classification. These two techniques are therefore used to predict train delays for 2018. Numerical prediction is also carried out after grouping some database fields, and the results show that prediction by grouping is more accurate than prediction on the entire database.
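The evaluation protocol described above (train on 2013 to 2016, test on 2017, then compare prediction on the whole database with prediction by grouping) can be illustrated with a minimal sketch. It rests on several assumptions: the records are invented toy values rather than data from the study, a simple mean predictor stands in for the neural network, CHAID, and regression models, "grouping" is interpreted here as fitting one predictor per train type, and all function names are hypothetical.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical toy records: (year, train_type, delay_minutes).
# These values are invented for illustration; they are not from the paper.
RECORDS = [
    (2013, "express", 12.0), (2014, "express", 10.0),
    (2015, "express", 14.0), (2016, "express", 12.0),
    (2013, "regional", 30.0), (2014, "regional", 34.0),
    (2015, "regional", 26.0), (2016, "regional", 30.0),
    (2017, "express", 11.0), (2017, "regional", 31.0),
]


def split_by_year(records, train_years, test_years):
    """Hold out whole years, so the test set lies after the training period."""
    train = [r for r in records if r[0] in train_years]
    test = [r for r in records if r[0] in test_years]
    return train, test


def fit_global_mean(train):
    """Stand-in model fitted on the entire training set: one overall mean."""
    overall = mean(delay for _, _, delay in train)
    return lambda train_type: overall


def fit_group_mean(train):
    """Stand-in model fitted per group: one mean delay per train type."""
    groups = defaultdict(list)
    for _, train_type, delay in train:
        groups[train_type].append(delay)
    overall = mean(delay for _, _, delay in train)
    means = {key: mean(values) for key, values in groups.items()}
    # Fall back to the overall mean for a train type unseen in training.
    return lambda train_type: means.get(train_type, overall)


def mean_absolute_error(predict, test):
    """Average absolute gap between predicted and observed delay."""
    return mean(abs(predict(t) - delay) for _, t, delay in test)


train, test = split_by_year(RECORDS, range(2013, 2017), {2017})
mae_global = mean_absolute_error(fit_global_mean(train), test)
mae_grouped = mean_absolute_error(fit_group_mean(train), test)
print(f"MAE, entire database: {mae_global:.1f} min")
print(f"MAE, grouped by train type: {mae_grouped:.1f} min")
```

Splitting by year rather than at random keeps the test delays strictly later than the training delays, which matches the intended use of forecasting a future timetable year. On this toy data the grouped predictor has the lower error only because the invented delays differ sharply between train types; it says nothing about the size of the effect reported in the study.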