Posts

Showing posts from July, 2020

taxi demand prediction II

After cleaning and preprocessing, we have data for January 2015 and for January, February, and March 2016. We now use some baseline models to predict the number of pickups in the next 10 minutes. The baselines work on one of two inputs: the ratio Rt = Pt(2016) / Pt(2015), which uses both the 2015 and the 2016 data, or the previous known values of the 2016 data alone.

1. Simple moving average of ratios: the first model is the moving average model, which uses the previous n values to predict the next value. To predict the pickups in the next 10 minutes, we take the ratios of the previous n 10-minute bins, where n is a hyperparameter.
2. Simple moving average of previous values: here we use the previous known values of the 2016 data itself to predict the future values. Again, n is a hyperparameter. We observed that n = 1, i.e. using just the previous value, gives the minimum error.
3. Weighted moving average of ratios: here N is the hyperparameter.
4. Weighted moving average of previous values:
5. ...
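The moving average baselines can be sketched roughly as follows. This is a minimal illustration, not the code from the post: the function names and the toy pickup counts are made up, the first bin is left as NaN because it has no history, and the linearly increasing weights in the weighted version are an assumption, since the excerpt does not say which weights were used.

```python
import numpy as np

def simple_moving_average(values, n):
    """Predict each bin as the mean of the previous n known values."""
    values = np.asarray(values, dtype=float)
    preds = np.full(len(values), np.nan)  # bin 0 has no history, so it stays NaN
    for t in range(1, len(values)):
        window = values[max(0, t - n):t]
        preds[t] = window.mean()
    return preds

def weighted_moving_average(values, n):
    """Like the simple version, but recent bins count more.
    Assumption: weights 1, 2, ..., n with the largest weight on the latest bin."""
    values = np.asarray(values, dtype=float)
    preds = np.full(len(values), np.nan)
    for t in range(1, len(values)):
        window = values[max(0, t - n):t]
        weights = np.arange(1, len(window) + 1)
        preds[t] = np.dot(window, weights) / weights.sum()
    return preds

def ratio_moving_average(p2016, p2015, n):
    """Average the previous n ratios Rt = Pt(2016) / Pt(2015),
    then scale the 2015 value of the current bin by that average.
    Assumes the 2015 counts contain no zeros."""
    p2016 = np.asarray(p2016, dtype=float)
    p2015 = np.asarray(p2015, dtype=float)
    ratios = p2016 / p2015
    return simple_moving_average(ratios, n) * p2015

# Toy pickup counts for four consecutive 10-minute bins (made-up numbers).
p2015 = [100, 120, 110, 130]
p2016 = [110, 126, 121, 150]

print(simple_moving_average(p2016, n=1))
print(weighted_moving_average(p2016, n=2))
print(ratio_moving_average(p2016, p2015, n=2))
```

With n = 1 the simple moving average of previous values reduces to "predict the last observed value", which is the setting the post reports as giving the minimum error.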

Taxi demand prediction

* Problem statement: we want to predict the demand for taxis in the next 10 minutes in each cluster.
* Solution: the data is huge, so we used the dask library (dask divides large data into smaller chunks and later combines the results). The data has these columns: ['VendorID', 'tpep_pickup_datetime', 'tpep_dropoff_datetime', 'passenger_count', 'trip_distance', 'pickup_longitude', 'pickup_latitude', 'RateCodeID', 'store_and_fwd_flag', 'dropoff_longitude', 'dropoff_latitude', 'payment_type', 'fare_amount', 'extra', 'mta_tax', 'tip_amount', 'tolls_amount', 'improvement_surcharge', 'total_amount'].
* Metrics: mean absolute percentage error (MAPE) and mean squared error (MSE).
* Data cleaning:
1. We removed the records whose coordinates fall outside New York City and kept only the data inside New York City.
2. t...
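As a small illustration of the two metrics named above, here is a sketch using the textbook definitions. The function names and the sample numbers are made up, and the post may compute MAPE slightly differently (for example as mean absolute error divided by mean demand), so treat this as an assumption rather than the post's exact formula. It also assumes no actual value is zero.

```python
import numpy as np

def mape(actual, predicted):
    """Mean absolute percentage error, returned as a fraction (0.1 means 10%).
    Assumes no actual value is zero."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.mean(np.abs(actual - predicted) / np.abs(actual)))

def mse(actual, predicted):
    """Mean squared error."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.mean((actual - predicted) ** 2))

# Made-up pickup counts for three 10-minute bins.
actual = [100, 200, 400]
predicted = [110, 180, 400]

print("MAPE:", mape(actual, predicted))
print("MSE:", mse(actual, predicted))
```

MAPE is scale-free, so it can be compared across busy and quiet clusters, while MSE penalises large misses more heavily.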