Vanecus

Running RandomForestClassifier and RandomForestRegressor on a Hadoop Cluster using Apache Spark's PySpark API


PySpark serves as the powerful Python API for Apache Spark, seamlessly bridging the gap between Python's simplicity and Spark's distributed processing prowess. It empowers data scientists and engineers to write robust, large-scale data applications using familiar Python libraries while leveraging the full might of a Spark cluster. Beyond application development, PySpark offers an interactive shell for real-time, exploratory data analysis across distributed datasets, making it an indispensable tool for modern data exploration.

The framework provides comprehensive support for the entire Spark ecosystem, including:

  • Spark SQL & DataFrames: For structured data processing and querying using optimized execution plans.
  • Structured Streaming: For building scalable and fault-tolerant streaming applications.
  • MLlib (Machine Learning): For creating end-to-end, distributed machine learning pipelines.
  • Spark Core: The underlying execution engine that handles scheduling, task dispatching, and I/O operations.

In the following sections, we demonstrate how to use PySpark and the scalable MLlib package to perform supervised machine learning tasks. We run the analysis on a Hadoop cluster, using the Hadoop Distributed File System (HDFS) for reliable data storage, to showcase a complete big data analytics workflow.

This project outlines the development of predictive classification and regression models using a modern big data stack. We will leverage the scalability of Hadoop 3.3.0 for distributed storage and the high-performance processing engine of Spark 3.1.1 for in-memory analytics. Our goal is to build and evaluate both a random forest classification model and a random forest regression model using a classic dataset: the yellow_tripdata_2014-08.csv file, which contains records of New York City yellow taxi trips for August 2014.



If you are interested in the data, you can download it from this link: 2014 Yellow NYC taxi trip Data. For classification, the task is to predict whether a tip will be paid for a given taxi trip. For regression, the task is to predict the expected tip amount for a given taxi trip. Our Hadoop environment is a three-node cluster: one NameNode and two DataNodes.

We use Python 3.8.5 and PyCharm 2020.3 for this module.

Import libraries

We import the Spark, ML, and other libraries we need with the following lines of code:

        import os

        from pyspark.sql import SparkSession
        from pyspark.ml.feature import StringIndexer, OneHotEncoder
        from pyspark.mllib.evaluation import BinaryClassificationMetrics, RegressionMetrics
        from pyspark.mllib.regression import LabeledPoint
        from pyspark.mllib.tree import RandomForest
        from pyspark.sql.types import *
        import matplotlib.pyplot as plt
        import numpy as np

        os.environ['SPARK_HOME'] = '/usr/local/spark'
        SPARK_HOME = os.environ['SPARK_HOME']
    

Data Exploration

First, we ingest the data we want to analyze: it is brought from the external sources or systems where it resides into the data exploration and modeling environment, which here is Spark. We need to make sure the dataset files are present in HDFS, where our Spark jobs expect to read them. To put the files in HDFS, first bring them to the local operating system (Linux in our case), then copy them from Linux to HDFS using the following command:

        hdfs dfs -put <localsrc> ... <HDFS_dest_Path>

Here the -put command copies a local file to HDFS.
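
As a rough sketch, the upload step could also be scripted from Python. The helper below only builds the argument list for `hdfs dfs -put`; the function name `build_hdfs_put` and the local path are illustrative assumptions, and the actual call is left commented out because it needs a configured Hadoop client.

```python
def build_hdfs_put(local_src, hdfs_dest, overwrite=False):
    """Return the argument list for `hdfs dfs -put` (hypothetical helper)."""
    cmd = ["hdfs", "dfs", "-put"]
    if overwrite:
        cmd.append("-f")  # -f overwrites the destination if it already exists
    cmd += [local_src, hdfs_dest]
    return cmd


# The local path is an assumption; /data/data/ matches the HDFS path read later.
cmd = build_hdfs_put("/home/hadoop/yellow_tripdata_2014-08.csv", "/data/data/")
print(" ".join(cmd))

# To actually run it on a machine with the Hadoop client installed:
# import subprocess
# subprocess.run(cmd, check=True)
```

Building the command as a list, rather than a single string, avoids shell quoting problems if a path contains spaces.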

We first load the dataset using Apache Spark and look at the header, the total number of rows, and the first 5 rows of the dataset. The code is shown below:

        spark = SparkSession \
            .builder\
            .master("local[*]")\
            .appName("NycApp")\
            .getOrCreate()

        dataRaw = spark.sparkContext\
            .textFile("hdfs://master:9000/data/data/yellow_tripdata_2014-08.csv")
        header = dataRaw.first()
    

Now we print the header of the dataset.

        print(header)
    

Output:

        vendor_id, pickup_datetime, dropoff_datetime, passenger_count, trip_distance, pickup_longitude, pickup_latitude, rate_code, store_and_fwd_flag, dropoff_longitude, dropoff_latitude, payment_type, fare_amount, surcharge, mta_tax, tip_amount, tolls_amount, total_amount
    

Total number of records in the dataset:

        print("Total Records : " + str(dataRaw.count()))
    

Output:

        Total Records : 12688879
    

The code to print the first 5 rows of the dataset is as follows:

        for x in dataRaw.take(5):
            print(x)
    

Output:

        vendor_id, pickup_datetime, dropoff_datetime, passenger_count, trip_distance, pickup_longitude, pickup_latitude, rate_code, store_and_fwd_flag, dropoff_longitude, dropoff_latitude, payment_type, fare_amount, surcharge, mta_tax, tip_amount, tolls_amount, total_amount

        CMT,2014-08-16 14:58:49,2014-08-16 15:15:59,1,2.7000000000000002,-73.946537000000006,40.776812999999997,1,N,-73.976192999999995,40.755625000000002,CSH,14,0,0.5,0,0,14.5
        CMT,2014-08-16 08:10:48,2014-08-16 08:58:16,3,20.399999999999999,-73.776857000000007,40.645099000000002,1,Y,-73.916248999999993,40.837356999999997,CSH,58.5,0,0.5,0,5.3300000000000001,64.329999999999998
        CMT,2014-08-16 09:44:07,2014-08-16 09:54:37,1,2.1000000000000001,-73.986585000000005,40.725847999999999,1,N,-73.977157000000005,40.751961000000001,CSH,9.5,0,0.5,0,0,10
    

As the rows printed above show, we loaded the dataset from an HDFS location and stored it in an RDD of strings. Fortunately, the dataset is relatively clean and has one row per data item, but it contains an empty row. Next, we remove the header, delete the empty row, and print 5 rows again with the following code:

        # Drop the header line and any empty lines
        dataLines = dataRaw.filter(lambda x: x != header).filter(lambda y: y != "")
        for y in dataLines.take(5):
            print(y)
    

Output:

        CMT,2014-08-16 14:58:49,2014-08-16 15:15:59,1,2.7000000000000002,-73.946537000000006,40.776812999999997,1,N,-73.976192999999995,40.755625000000002,CSH,14,0,0.5,0,0,14.5
        CMT,2014-08-16 08:10:48,2014-08-16 08:58:16,3,20.399999999999999,-73.776857000000007,40.645099000000002,1,Y,-73.916248999999993,40.837356999999997,CSH,58.5,0,0.5,0,5.3300000000000001,64.329999999999998
        CMT,2014-08-16 09:44:07,2014-08-16 09:54:37,1,2.1000000000000001,-73.986585000000005,40.725847999999999,1,N,-73.977157000000005,40.751961000000001,CSH,9.5,0,0.5,0,0,10
        CMT,2014-08-16 10:46:13,2014-08-16 10:51:25,1,1.3,-73.976290000000006,40.765231,1,N,-73.961484999999996,40.777889000000002,CSH,6,0,0.5,0,0,6.5
        CMT,2014-08-16 09:27:23,2014-08-16 09:39:37,2,1.7,-73.995248000000004,40.754646000000001,1,Y,-73.995902999999998,40.769201000000002,CSH,10.5,0,0.5,0,0,11
    

The rows look fine. Now we generate a schema from the column names in the header, cast the variables according to the schema, create an initial DataFrame, and finally display 10 rows of the dataset with the following code:

        fields = [StructField(field_name, StringType(), True) for field_name in header.split(', ')]

        fields[0].dataType = StringType() #vendor_id
        fields[1].dataType = StringType() #pickup_datetime
        fields[2].dataType = StringType() #dropoff_datetime
        fields[3].dataType = FloatType() #passenger_count
        fields[4].dataType = FloatType() # trip_distance
        fields[5].dataType = FloatType() # pickup_longitude
        fields[6].dataType = FloatType() # pickup_latitude
        fields[7].dataType = FloatType() # rate_code
        fields[8].dataType = StringType() # store_and_fwd_flag
        fields[9].dataType = FloatType() # dropoff_longitude
        fields[10].dataType = FloatType() # dropoff_latitude
        fields[11].dataType = StringType() # payment_type
        fields[12].dataType = FloatType() # fare_amount
        fields[13].dataType = FloatType() # surcharge
        fields[14].dataType = FloatType() # mta_tax
        fields[15].dataType = FloatType() # tip_amount
        fields[16].dataType = FloatType() # tolls_amount
        fields[17].dataType = FloatType() # total_amount
        schema = StructType(fields)
        rowRDD = dataLines.map(lambda x: x.split(",")) \
            .map(lambda r: (r[0], r[1], r[2], float(r[3]), float(r[4]), float(r[5]), float(r[6]), float(r[7]), r[8],
                            float(r[9]), float(r[10]), r[11], float(r[12]), float(r[13]), float(r[14]), float(r[15]),
                            float(r[16]), float(r[17])
                            ))
        dataDF = spark.createDataFrame(rowRDD, schema)
        dataDF.show(10)
    

Output:

        +---------+-------------------+-------------------+---------------+-------------+----------------+---------------+---------+------------------+-----------------+----------------+------------+-----------+---------+-------+----------+------------+------------+
        |vendor_id|    pickup_datetime|   dropoff_datetime|passenger_count|trip_distance|pickup_longitude|pickup_latitude|rate_code|store_and_fwd_flag|dropoff_longitude|dropoff_latitude|payment_type|fare_amount|surcharge|mta_tax|tip_amount|tolls_amount|total_amount|
        +---------+-------------------+-------------------+---------------+-------------+----------------+---------------+---------+------------------+-----------------+----------------+------------+-----------+---------+-------+----------+------------+------------+
        |      CMT|2014-08-16 14:58:49|2014-08-16 15:15:59|            1.0|          2.7|      -73.946537|      40.776813|      1.0|                 N|       -73.976193|       40.755625|         CSH|       14.0|      0.0|    0.5|       0.0|         0.0|        14.5|
        |      CMT|2014-08-16 08:10:48|2014-08-16 08:58:16|            3.0|         20.4|      -73.776857|      40.645099|      1.0|                 Y|       -73.916249|       40.837357|         CSH|       58.5|      0.0|    0.5|       0.0|        5.33|       64.33|
        |      CMT|2014-08-16 09:44:07|2014-08-16 09:54:37|            1.0|          2.1|      -73.986585|      40.725848|      1.0|                 N|       -73.977157|       40.751961|         CSH|        9.5|      0.0|    0.5|       0.0|         0.0|        10.0|
        |      CMT|2014-08-16 10:46:13|2014-08-16 10:51:25|            1.0|          1.3|       -73.97629|      40.765231|      1.0|                 N|       -73.961485|       40.777889|         CSH|        6.0|      0.0|    0.5|       0.0|         0.0|         6.5|
        |      CMT|2014-08-16 09:27:23|2014-08-16 09:39:37|            2.0|          1.7|      -73.995248|      40.754646|      1.0|                 Y|       -73.995903|       40.769201|         CSH|       10.5|      0.0|    0.5|       0.0|         0.0|        11.0|
        |      CMT|2014-08-16 14:14:16|2014-08-16 14:25:33|            2.0|          1.7|      -73.991535|      40.759863|      1.0|                 N|       -74.005722|       40.737558|         CSH|       10.0|      0.0|    0.5|       0.0|         0.0|        10.5|
        |      CMT|2014-08-16 15:55:16|2014-08-16 16:00:10|            1.0|          1.0|      -73.972307|      40.794076|      1.0|                 N|       -73.963865|       40.807858|         CSH|        6.0|      0.0|    0.5|       0.0|         0.0|         6.5|
        |      CMT|2014-08-16 14:08:29|2014-08-16 14:32:03|            1.0|          9.2|      -73.967338|      40.766009|      1.0|                 N|       -73.872972|       40.774487|         CSH|       28.5|      0.0|    0.5|       0.0|         0.0|        29.0|
        |      CMT|2014-08-16 11:11:21|2014-08-16 11:23:48|            1.0|          2.6|      -73.973775|      40.794591|      1.0|                 N|       -73.970561|       40.768086|         CSH|       11.5|      0.0|    0.5|       0.0|         0.0|        12.0|
        |      CMT|2014-08-16 07:44:56|2014-08-16 07:49:26|            1.0|          1.4|       -73.98636|      40.737913|      1.0|                 N|       -73.977117|       40.751126|         CSH|        6.0|      0.0|    0.5|       0.0|         0.0|         6.5|
        +---------+-------------------+-------------------+---------------+-------------+----------------+---------------+---------+------------------+-----------------+----------------+------------+-----------+---------+-------+----------+------------+------------+
        only showing top 10 rows
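
To make the per-line parsing concrete, here is a minimal pure-Python sketch of what the two `filter` calls and the `map` lambda do to the raw lines, without needing a Spark session. The function name `parse_trip` and the shortened header string are assumptions for illustration; the data row is the first one printed earlier.

```python
def parse_trip(line):
    """Split one CSV line and cast the numeric columns to float."""
    r = line.split(",")
    # Positions cast to float in the RDD version; the rest stay strings.
    numeric = {3, 4, 5, 6, 7, 9, 10, 12, 13, 14, 15, 16, 17}
    return tuple(float(v) if i in numeric else v for i, v in enumerate(r))


header = "vendor_id, pickup_datetime"  # shortened header, for illustration only
lines = [
    header,
    "",
    "CMT,2014-08-16 14:58:49,2014-08-16 15:15:59,1,2.7000000000000002,"
    "-73.946537000000006,40.776812999999997,1,N,-73.976192999999995,"
    "40.755625000000002,CSH,14,0,0.5,0,0,14.5",
]

# Same two filters as the RDD version: drop the header and empty lines.
rows = [parse_trip(l) for l in lines if l != header and l != ""]
print(rows[0])
```

Note that any malformed row (a missing field or a non-numeric value in a numeric column) would raise an exception here, just as it would fail the Spark job when the RDD is evaluated.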
    

We are not interested in all the columns of the dataset. We create a cleaned DataFrame by dropping unwanted columns and filtering out unwanted values, register it as a temporary view for Spark SQL, and cache and materialize it in memory.

        dataDF_cleaned = dataDF.drop('store_and_fwd_flag').drop('pickup_datetime').drop('dropoff_datetime').drop('pickup_longitude')\
            .drop('pickup_latitude').drop('dropoff_longitude').drop('dropoff_latitude').drop('surcharge')\
            .drop('mta_tax').drop('tolls_amount').drop('total_amount')\
            .filter("passenger_count > 0 AND fare_amount >= 1 AND trip_distance > 0")
        dataDF_cleaned.createOrReplaceTempView("tempView")
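        dataDF_cleaned.cache()   # cache() must be called; the bare attribute does nothing
        dataDF_cleaned.count()   # an action, so the cached data is actually materialized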
    

Data Visualization

In this section, we examine the data using SQL queries and import the results into a pandas DataFrame to plot the target variables and prospective features for visual inspection.

Counts of trips by passenger count

        plotDF1 = spark.sql("Select passenger_count, COUNT(*) AS trip_counts " +
                    "FROM tempView " +
                    "WHERE passenger_count > 0 and passenger_count < 7 " +
                    "GROUP BY passenger_count Order by passenger_count")
        
        plotDF1P = plotDF1.toPandas()

        x_labels = plotDF1P['passenger_count'].values
        fig = plotDF1P['trip_counts'].plot(kind='bar', facecolor='lightblue')
        fig.set_xticklabels(x_labels)
        fig.set_title('Counts of trips by Passenger count')
        fig.set_xlabel('Passenger count in Trips')
        fig.set_ylabel('Trip Counts')
        plt.show()
    

Output:

    [Figure: bar chart "Counts of trips by Passenger count"; x-axis: Passenger count in Trips, y-axis: Trip Counts]

SQL Query and Data frame:

        plotDF2 = spark.sql("SELECT fare_amount, passenger_count, tip_amount " +
                    "FROM tempView " +
                    "WHERE passenger_count > 0 AND passenger_count < 7 AND " +
                    "fare_amount > 0 AND fare_amount < 200 AND payment_type in ('CSH', 'CRD') AND " +
                    "tip_amount > 0 AND tip_amount < 25")

        #plotDF2.show()
        plotDF2P = plotDF2.toPandas()
    

Histogram of tip amount

        ax1 = plotDF2P[['tip_amount']].plot(kind='hist', bins=25, facecolor='lightblue')
        ax1.set_title('Tip amount distribution')
        ax1.set_xlabel('Tip Amount ($)')
        ax1.set_ylabel('Counts')
        plt.suptitle('')
        plt.show()
    

Output:

    [Figure: histogram "Tip amount distribution"; x-axis: Tip Amount ($), y-axis: Counts]

Relationship between tip amount and Passenger Count

        ax2 = plotDF2P.boxplot(column=['tip_amount'], by=['passenger_count'])
        ax2.set_title('Tip amount by Passenger count')
        ax2.set_xlabel('Passenger count')
        ax2.set_ylabel('Tip Amount ($)')
        plt.suptitle('')
        plt.show()
    

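Output:

    [Figure: box plot "Tip amount by Passenger count"; x-axis: Passenger count, y-axis: Tip Amount ($)]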

Relationship between tip amount and Fare Amount

        ax = plotDF2P.plot(kind='scatter', x= 'fare_amount', y = 'tip_amount', c='blue', alpha = 0.01, s=2*(plotDF2P.passenger_count))
        ax.set_title('Tip amount by Fare amount')
        ax.set_xlabel('Fare Amount ($)')
        ax.set_ylabel('Tip Amount ($)')
        plt.axis([-2, 80, -2, 20])
        plt.show()
    

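Output:

    [Figure: scatter plot "Tip amount by Fare amount"; x-axis: Fare Amount ($), y-axis: Tip Amount ($)]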

Feature engineering, transformation and data preparation for modeling

Next, we create a new feature, tipped, which is 1 if tip_amount is greater than zero and 0 otherwise. We build a classifier with this target value later on.

        sqlQuery = "SELECT *, CASE  WHEN tip_amount > 0 THEN CAST(1.0 as Double) ELSE CAST(0.0 as Double) END AS tipped FROM tempView"
        data_NewFeature = spark.sql(sqlQuery)
        data_NewFeature.show()
    

Output:

        +---------+---------------+-------------+---------+------------+-----------+----------+------+
        |vendor_id|passenger_count|trip_distance|rate_code|payment_type|fare_amount|tip_amount|tipped|
        +---------+---------------+-------------+---------+------------+-----------+----------+------+
        |      CMT|            1.0|          2.7|      1.0|         CSH|       14.0|       0.0|   0.0|
        |      CMT|            3.0|         20.4|      1.0|         CSH|       58.5|       0.0|   0.0|
        |      CMT|            1.0|          2.1|      1.0|         CSH|        9.5|       0.0|   0.0|
        |      CMT|            1.0|          1.3|      1.0|         CSH|        6.0|       0.0|   0.0|
        |      CMT|            2.0|          1.7|      1.0|         CSH|       10.5|       0.0|   0.0|
        |      CMT|            2.0|          1.7|      1.0|         CSH|       10.0|       0.0|   0.0|
        |      CMT|            1.0|          1.0|      1.0|         CSH|        6.0|       0.0|   0.0|
        |      CMT|            1.0|          9.2|      1.0|         CSH|       28.5|       0.0|   0.0|
        |      CMT|            1.0|          2.6|      1.0|         CSH|       11.5|       0.0|   0.0|
        |      CMT|            1.0|          1.4|      1.0|         CSH|        6.0|       0.0|   0.0|
        |      CMT|            4.0|          3.2|      1.0|         CSH|       13.0|       0.0|   0.0|
        |      CMT|            1.0|          7.8|      1.0|         CSH|       25.0|       0.0|   0.0|
        |      CMT|            1.0|          1.1|      1.0|         CSH|        5.5|       0.0|   0.0|
        |      CMT|            1.0|          3.3|      1.0|         CSH|       15.5|       0.0|   0.0|
        |      CMT|            1.0|          5.3|      1.0|         CSH|       19.5|       0.0|   0.0|
        |      CMT|            1.0|          6.2|      1.0|         CSH|       19.5|       0.0|   0.0|
        |      CMT|            1.0|         15.6|      2.0|         CSH|       52.0|       0.0|   0.0|
        |      CMT|            2.0|          0.9|      1.0|         CSH|        6.0|       0.0|   0.0|
        |      CMT|            1.0|          1.4|      1.0|         CSH|        9.0|       0.0|   0.0|
        |      CMT|            2.0|          1.2|      1.0|         CSH|        7.0|       0.0|   0.0|
        +---------+---------------+-------------+---------+------------+-----------+----------+------+
        only showing top 20 rows
    

Now we compute the mean, minimum, maximum, and other summary statistics of the columns, as this gives us a general idea of the range of values. Spark SQL provides a handy describe method that calculates these values.

        data_NewFeature.describe("passenger_count","trip_distance","rate_code","fare_amount","tip_amount").show()
    

Output:

        +-------+------------------+------------------+-------------------+------------------+------------------+
        |summary|   passenger_count|     trip_distance|          rate_code|       fare_amount|        tip_amount|
        +-------+------------------+------------------+-------------------+------------------+------------------+
        |  count|          12612051|          12612051|           12612051|          12612051|          12612051|
        |   mean| 1.711963105762893| 3.081084886986297| 1.0316361708337525|12.782634287634899|1.4699661752082924|
        | stddev|1.3616351834672402|3.6140504506213844|0.28838467968053266|10.461028944648847| 2.272277866399201|
        |    min|               1.0|              0.01|                0.0|               2.5|               0.0|
        |    max|               9.0|             100.0|              221.0|             500.0|             200.0|
        +-------+------------------+------------------+-------------------+------------------+------------------+
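
For intuition about what the summary rows mean, the same five statistics can be reproduced on a handful of values with the Python standard library. This is only a sketch on the `fare_amount` values of the first five rows shown earlier, not on the full dataset; the `stddev` reported by `describe` is the sample standard deviation, which is what `statistics.stdev` computes.

```python
import statistics

# fare_amount of the first five rows shown earlier
fares = [14.0, 58.5, 9.5, 6.0, 10.5]

summary = {
    "count": len(fares),
    "mean": statistics.mean(fares),
    "stddev": statistics.stdev(fares),  # sample standard deviation (n - 1)
    "min": min(fares),
    "max": max(fares),
}
for name, value in summary.items():
    print(f"{name:>6}: {value}")
```

In the full table above, maxima such as a rate_code of 221.0 or a fare_amount of 500.0 stand out against their means, which is the kind of outlier this summary is meant to surface.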
    

The modeling functions in ML and MLlib require the target and features to be prepared using a variety of techniques, such as indexing, one-hot encoding, and vectorization. Here are the procedures followed in this section.

The dataset contains categorical fields: vendor_id, rate_code, and payment_type. We need to convert these into indexed fields, because the models work only with numerical values. For indexing, we use StringIndexer. Here is the code to index the categorical features:

        vendor_idIndexer = StringIndexer()\
            .setInputCol("vendor_id")\
            .setOutputCol("vendor_idIndex")
        Indexedvendor_id = vendor_idIndexer.fit(data_NewFeature).transform(data_NewFeature)
        #Indexedvendor_id.show()

        rate_codeIndexer = StringIndexer()\
            .setInputCol("rate_code")\
            .setOutputCol("rate_codeIndex")
        Indexedrate_code = rate_codeIndexer.fit(Indexedvendor_id).transform(Indexedvendor_id)


        payment_typeIndexer = StringIndexer()\
            .setInputCol("payment_type")\
            .setOutputCol("payment_typeIndex")
        IndexedFinal = payment_typeIndexer.fit(Indexedrate_code).transform(Indexedrate_code)
        IndexedFinal.show()
    

Output:

        +---------+---------------+-------------+---------+------------+-----------+----------+------+--------------+--------------+-----------------+
        |vendor_id|passenger_count|trip_distance|rate_code|payment_type|fare_amount|tip_amount|tipped|vendor_idIndex|rate_codeIndex|payment_typeIndex|
        +---------+---------------+-------------+---------+------------+-----------+----------+------+--------------+--------------+-----------------+
        |      CMT|            1.0|          2.7|      1.0|         CSH|       14.0|       0.0|   0.0|           1.0|           0.0|              1.0|
        |      CMT|            3.0|         20.4|      1.0|         CSH|       58.5|       0.0|   0.0|           1.0|           0.0|              1.0|
        |      CMT|            1.0|          2.1|      1.0|         CSH|        9.5|       0.0|   0.0|           1.0|           0.0|              1.0|
        |      CMT|            1.0|          1.3|      1.0|         CSH|        6.0|       0.0|   0.0|           1.0|           0.0|              1.0|
        |      CMT|            2.0|          1.7|      1.0|         CSH|       10.5|       0.0|   0.0|           1.0|           0.0|              1.0|
        |      CMT|            2.0|          1.7|      1.0|         CSH|       10.0|       0.0|   0.0|           1.0|           0.0|              1.0|
        |      CMT|            1.0|          1.0|      1.0|         CSH|        6.0|       0.0|   0.0|           1.0|           0.0|              1.0|
        |      CMT|            1.0|          9.2|      1.0|         CSH|       28.5|       0.0|   0.0|           1.0|           0.0|              1.0|
        |      CMT|            1.0|          2.6|      1.0|         CSH|       11.5|       0.0|   0.0|           1.0|           0.0|              1.0|
        |      CMT|            1.0|          1.4|      1.0|         CSH|        6.0|       0.0|   0.0|           1.0|           0.0|              1.0|
        |      CMT|            4.0|          3.2|      1.0|         CSH|       13.0|       0.0|   0.0|           1.0|           0.0|              1.0|
        |      CMT|            1.0|          7.8|      1.0|         CSH|       25.0|       0.0|   0.0|           1.0|           0.0|              1.0|
        |      CMT|            1.0|          1.1|      1.0|         CSH|        5.5|       0.0|   0.0|           1.0|           0.0|              1.0|
        |      CMT|            1.0|          3.3|      1.0|         CSH|       15.5|       0.0|   0.0|           1.0|           0.0|              1.0|
        |      CMT|            1.0|          5.3|      1.0|         CSH|       19.5|       0.0|   0.0|           1.0|           0.0|              1.0|
        |      CMT|            1.0|          6.2|      1.0|         CSH|       19.5|       0.0|   0.0|           1.0|           0.0|              1.0|
        |      CMT|            1.0|         15.6|      2.0|         CSH|       52.0|       0.0|   0.0|           1.0|           1.0|              1.0|
        |      CMT|            2.0|          0.9|      1.0|         CSH|        6.0|       0.0|   0.0|           1.0|           0.0|              1.0|
        |      CMT|            1.0|          1.4|      1.0|         CSH|        9.0|       0.0|   0.0|           1.0|           0.0|              1.0|
        |      CMT|            2.0|          1.2|      1.0|         CSH|        7.0|       0.0|   0.0|           1.0|           0.0|              1.0|
        +---------+---------------+-------------+---------+------------+-----------+----------+------+--------------+--------------+-----------------+
        only showing top 20 rows
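
By default, StringIndexer orders labels by descending frequency, so the most frequent value receives index 0.0. That would explain why CSH is mapped to 1.0 above rather than 0.0: presumably another payment type occurs more often in the data. The pure-Python sketch below imitates that ordering on a made-up list of values; the function name `fit_string_index` and the sample values are assumptions for illustration.

```python
from collections import Counter


def fit_string_index(values):
    """Map each distinct value to an index, most frequent value first.

    Ties are broken alphabetically here so the result is deterministic.
    """
    counts = Counter(values)
    ordered = sorted(counts, key=lambda v: (-counts[v], v))
    return {value: float(i) for i, value in enumerate(ordered)}


# Made-up sample of payment types, for illustration only.
payment_types = ["CRD", "CSH", "CRD", "NOC", "CRD", "CSH"]
mapping = fit_string_index(payment_types)
print(mapping)
print([mapping[p] for p in payment_types])
```

Because the indices depend on frequencies in the data the indexer was fitted on, the same fitted indexer should be reused whenever new data is transformed.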
    

Functions that build LabeledPoint objects for classification and regression:

        def parseRowIndexingClassification(line):
            # Label: tipped (0.0 / 1.0); features: indexed categoricals, then numeric columns
            features = np.array([line.vendor_idIndex, line.rate_codeIndex, line.payment_typeIndex,
                                 line.passenger_count, line.trip_distance, line.fare_amount])
            labPt = LabeledPoint(line.tipped, features)
            return labPt


        def parseRowIndexingRegression(line):
            # Label: tip_amount; same feature vector as the classification case
            features = np.array([line.vendor_idIndex, line.rate_codeIndex, line.payment_typeIndex,
                                 line.passenger_count, line.trip_distance, line.fare_amount])
            labPt = LabeledPoint(line.tip_amount, features)
            return labPt
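
The order of the feature vector matters, because the printed trees further down refer to features only by position (for example "feature 2"). The sketch below mirrors the two parse functions in plain Python, using a `namedtuple` as a stand-in for a Spark Row; `Trip`, `FEATURE_NAMES`, and `to_label_and_features` are names assumed for illustration.

```python
from collections import namedtuple

# Order of the feature vector built by the two parse functions.
FEATURE_NAMES = [
    "vendor_idIndex", "rate_codeIndex", "payment_typeIndex",
    "passenger_count", "trip_distance", "fare_amount",
]

# Stand-in for a Spark Row, with values taken from the first row shown earlier.
Trip = namedtuple("Trip", FEATURE_NAMES + ["tip_amount", "tipped"])
row = Trip(1.0, 0.0, 1.0, 1.0, 2.7, 14.0, 0.0, 0.0)


def to_label_and_features(row, label_field):
    """Return (label, features) in the same order as the parse functions."""
    features = [getattr(row, name) for name in FEATURE_NAMES]
    return getattr(row, label_field), features


print(to_label_and_features(row, "tipped"))      # classification target
print(to_label_and_features(row, "tip_amount"))  # regression target
print({i: name for i, name in enumerate(FEATURE_NAMES)})
```

With this mapping, "feature 2" is payment_typeIndex, "feature 4" is trip_distance, and "feature 5" is fare_amount.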
    

Now we create a random sample of the data (25% is used here), which can save some time while training models. Then we split the sample into train and test sets and create indexed train/test LabeledPoint objects as input to MLlib for classification and regression modeling.

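        # Assumed definitions (not shown in the original listing); only the 25% sample is stated in the text.
        samplingFraction, trainingFraction, testingFraction, seed = 0.25, 0.75, 0.25, 1234
        FinalSampled = IndexedFinal.sample(False, samplingFraction, seed=seed)
        trainData, testData = FinalSampled.randomSplit([trainingFraction, testingFraction], seed=seed)

        #print("Train : " + str(trainData.count()) + " test : " + str(testData.count()))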
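
Conceptually, sampling without replacement keeps each row independently with the given probability, and a random split assigns each row to one bucket according to normalized weights. The following is a simplified pure-Python sketch of that idea, not Spark's actual implementation; the function names, the 0.75/0.25 split, and the seed value are assumptions for illustration.

```python
import random


def sample_rows(rows, fraction, seed):
    """Keep each row independently with probability `fraction`."""
    rng = random.Random(seed)
    return [r for r in rows if rng.random() < fraction]


def random_split(rows, weights, seed):
    """Assign each row to one bucket, with probability proportional to its weight."""
    rng = random.Random(seed)
    total = sum(weights)
    bounds, acc = [], 0.0
    for w in weights:
        acc += w / total  # cumulative, normalized weights
        bounds.append(acc)
    buckets = [[] for _ in weights]
    for r in rows:
        u = rng.random()
        for i, b in enumerate(bounds):
            # The last bucket catches any floating-point remainder.
            if u < b or i == len(bounds) - 1:
                buckets[i].append(r)
                break
    return buckets


rows = list(range(1000))
sampled = sample_rows(rows, fraction=0.25, seed=1234)
train, test = random_split(sampled, weights=[0.75, 0.25], seed=1234)
print(len(sampled), len(train), len(test))
```

Because both steps are random, the resulting sizes are only approximately 25% and 75/25; fixing the seed makes the result repeatable between runs.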
    

Random Forests Classification:

        indexedTRAINClassification = trainData.rdd.map(parseRowIndexingClassification)
        indexedTESTClassification = testData.rdd.map(parseRowIndexingClassification)
    

Finally, we train the random forest classification model, specifying the number of categories for each categorical feature, and print the trees.
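
Before training, it can help to check that the category counts passed to MLlib are consistent with the indexed data. MLlib expects the values of a categorical feature to lie in 0..k-1, where k is the declared number of categories, and maxBins must be at least as large as the largest k. The helper below is a hypothetical pure-Python check under those two rules; the observed index values are made up for illustration.

```python
def check_categorical_info(categorical_info, observed_indices, max_bins):
    """Validate a categoricalFeaturesInfo-style dict (hypothetical helper).

    categorical_info: {feature position: number of categories}
    observed_indices: {feature position: index values seen in the data}
    """
    problems = []
    for pos, arity in categorical_info.items():
        if arity > max_bins:
            problems.append(f"feature {pos}: {arity} categories > maxBins={max_bins}")
        largest = max(observed_indices.get(pos, [0]))
        if largest >= arity:
            problems.append(f"feature {pos}: index {largest} outside 0..{arity - 1}")
    return problems


info = {0: 2, 1: 12, 2: 5}  # same dict as in the training call
# Made-up index values standing in for the distinct values in the data.
observed = {0: [0.0, 1.0], 1: [0.0, 1.0, 6.0], 2: [0.0, 1.0, 2.0, 3.0, 4.0]}

print(check_categorical_info(info, observed, max_bins=32))    # consistent
print(check_categorical_info({2: 3}, observed, max_bins=32))  # index 4.0 out of range
```

On the real data, the observed values could be collected with a distinct query on each index column before calling the trainer.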

        categoricalFeaturesInfo={0:2, 1:12, 2:5}

        rfModel_Classification = RandomForest.trainClassifier(indexedTRAINClassification, numClasses=2,
                                               categoricalFeaturesInfo=categoricalFeaturesInfo,
                                               numTrees=25, featureSubsetStrategy="auto",
                                               impurity='gini', maxDepth=5, maxBins=32)

        print('Learned classification forest model:')
        print(rfModel_Classification.toDebugString())
    

Output:

        Learned classification forest model:
        TreeEnsembleModel classifier with 25 trees

          Tree 0:
            If (feature 2 in {1.0,3.0,4.0})
             Predict: 0.0
            Else (feature 2 not in {1.0,3.0,4.0})
             Predict: 1.0
          Tree 1:
            If (feature 2 in {1.0,3.0,4.0})
             Predict: 0.0
            Else (feature 2 not in {1.0,3.0,4.0})
             Predict: 1.0
          Tree 2:
            If (feature 2 in {1.0,3.0,4.0})
             Predict: 0.0
            Else (feature 2 not in {1.0,3.0,4.0})
             If (feature 0 in {1.0})
              If (feature 5 <= 20.25)
               If (feature 4 <= 0.48499999940395355)
                If (feature 1 in {1.0})
                 Predict: 0.0
                Else (feature 1 not in {1.0})
                 Predict: 1.0
               Else (feature 4 > 0.48499999940395355)
                Predict: 1.0
              Else (feature 5 > 20.25)
               Predict: 1.0
             Else (feature 0 not in {1.0})
              Predict: 1.0
          Tree 3:
            If (feature 2 in {1.0,3.0,4.0})
             Predict: 0.0
            Else (feature 2 not in {1.0,3.0,4.0})
             If (feature 1 in {6.0,5.0,4.0,3.0})
              If (feature 1 in {6.0,5.0,4.0})
               Predict: 1.0
              Else (feature 1 not in {6.0,5.0,4.0})
               If (feature 4 <= 0.8449999988079071)
                If (feature 4 <= 0.7950000166893005)
                 Predict: 1.0
                Else (feature 4 > 0.7950000166893005)
                 Predict: 0.0
               Else (feature 4 > 0.8449999988079071)
                Predict: 1.0
             Else (feature 1 not in {6.0,5.0,4.0,3.0})
              Predict: 1.0
          Tree 4:
            If (feature 5 <= 5.75)
             If (feature 3 <= 1.5)
              If (feature 2 in {1.0,3.0,4.0})
               Predict: 0.0
              Else (feature 2 not in {1.0,3.0,4.0})
               Predict: 1.0
             Else (feature 3 > 1.5)
              If (feature 4 <= 0.48499999940395355)
               If (feature 2 in {1.0,3.0,4.0})
                Predict: 0.0
               Else (feature 2 not in {1.0,3.0,4.0})
                Predict: 1.0
              Else (feature 4 > 0.48499999940395355)
               If (feature 2 in {1.0,3.0,4.0})
                Predict: 0.0
               Else (feature 2 not in {1.0,3.0,4.0})
                Predict: 1.0
            Else (feature 5 > 5.75)
             If (feature 3 <= 1.5)
              If (feature 4 <= 1.6950000524520874)
               If (feature 0 in {0.0})
                If (feature 2 in {1.0})
                 Predict: 0.0
                Else (feature 2 not in {1.0})
                 Predict: 1.0
               Else (feature 0 not in {0.0})
                If (feature 2 in {1.0,3.0,4.0})
                 Predict: 0.0
                Else (feature 2 not in {1.0,3.0,4.0})
                 Predict: 1.0
              Else (feature 4 > 1.6950000524520874)
               Predict: 1.0
             Else (feature 3 > 1.5)
              If (feature 2 in {1.0,3.0,4.0})
               Predict: 0.0
              Else (feature 2 not in {1.0,3.0,4.0})
               Predict: 1.0
          Tree 5:
            If (feature 4 <= 1.1449999809265137)
             If (feature 2 in {1.0,3.0,4.0})
              Predict: 0.0
             Else (feature 2 not in {1.0,3.0,4.0})
              If (feature 5 <= 20.25)
               Predict: 1.0
              Else (feature 5 > 20.25)
               If (feature 5 <= 59.75)
                If (feature 2 in {2.0})
                 Predict: 0.0
                Else (feature 2 not in {2.0})
                 Predict: 1.0
               Else (feature 5 > 59.75)
                Predict: 1.0
            Else (feature 4 > 1.1449999809265137)
             If (feature 3 <= 1.5)
              If (feature 4 <= 1.6950000524520874)
               If (feature 2 in {1.0,3.0,4.0})
                Predict: 0.0
               Else (feature 2 not in {1.0,3.0,4.0})
                Predict: 1.0
              Else (feature 4 > 1.6950000524520874)
               If (feature 2 in {1.0,3.0,4.0})
                Predict: 0.0
               Else (feature 2 not in {1.0,3.0,4.0})
                Predict: 1.0
             Else (feature 3 > 1.5)
              If (feature 3 <= 4.5)
               If (feature 0 in {1.0})
                If (feature 2 in {1.0,3.0,4.0})
                 Predict: 0.0
                Else (feature 2 not in {1.0,3.0,4.0})
                 Predict: 1.0
               Else (feature 0 not in {1.0})
                If (feature 2 in {1.0})
                 Predict: 0.0
                Else (feature 2 not in {1.0})
                 Predict: 1.0
              Else (feature 3 > 4.5)
               If (feature 4 <= 2.084999918937683)
                If (feature 0 in {1.0})
                 Predict: 0.0
                Else (feature 0 not in {1.0})
                 Predict: 1.0
               Else (feature 4 > 2.084999918937683)
                Predict: 1.0
          Tree 6:
            If (feature 2 in {1.0,3.0,4.0})
             Predict: 0.0
            Else (feature 2 not in {1.0,3.0,4.0})
             If (feature 0 in {1.0})
              If (feature 1 in {6.0,5.0,4.0,3.0})
               If (feature 4 <= 0.5949999988079071)
                If (feature 4 <= 0.48499999940395355)
                 Predict: 1.0
                Else (feature 4 > 0.48499999940395355)
                 Predict: 0.0
               Else (feature 4 > 0.5949999988079071)
                Predict: 1.0
              Else (feature 1 not in {6.0,5.0,4.0,3.0})
               Predict: 1.0
             Else (feature 0 not in {1.0})
              Predict: 1.0
          Tree 7:
            If (feature 2 in {1.0,3.0,4.0})
             Predict: 0.0
            Else (feature 2 not in {1.0,3.0,4.0})
             If (feature 1 in {5.0,4.0,3.0})
              If (feature 4 <= 0.5949999988079071)
               If (feature 3 <= 2.5)
                Predict: 1.0
               Else (feature 3 > 2.5)
                If (feature 5 <= 59.75)
                 Predict: 1.0
                Else (feature 5 > 59.75)
                 Predict: 0.0
              Else (feature 4 > 0.5949999988079071)
               Predict: 1.0
             Else (feature 1 not in {5.0,4.0,3.0})
              Predict: 1.0
          Tree 8:
            If (feature 2 in {1.0,3.0,4.0})
             Predict: 0.0
            Else (feature 2 not in {1.0,3.0,4.0})
             Predict: 1.0
          Tree 9:
            If (feature 2 in {1.0,3.0,4.0})
             If (feature 4 <= 9.224999904632568)
              Predict: 0.0
             Else (feature 4 > 9.224999904632568)
              If (feature 1 in {4.0,5.0,0.0})
               Predict: 0.0
              Else (feature 1 not in {4.0,5.0,0.0})
               If (feature 5 <= 40.75)
                If (feature 1 in {2.0,3.0})
                 Predict: 0.0
                Else (feature 1 not in {2.0,3.0})
                 Predict: 1.0
               Else (feature 5 > 40.75)
                Predict: 0.0
            Else (feature 2 not in {1.0,3.0,4.0})
             If (feature 1 in {5.0,4.0,3.0})
              If (feature 3 <= 4.5)
               Predict: 1.0
              Else (feature 3 > 4.5)
               If (feature 0 in {1.0})
                If (feature 4 <= 0.7950000166893005)
                 Predict: 0.0
                Else (feature 4 > 0.7950000166893005)
                 Predict: 1.0
               Else (feature 0 not in {1.0})
                Predict: 1.0
             Else (feature 1 not in {5.0,4.0,3.0})
              Predict: 1.0
          Tree 10:
            If (feature 2 in {1.0,3.0,4.0})
             Predict: 0.0
            Else (feature 2 not in {1.0,3.0,4.0})
             If (feature 1 in {5.0,4.0,3.0})
              If (feature 4 <= 0.5949999988079071)
               If (feature 3 <= 3.5)
                Predict: 1.0
               Else (feature 3 > 3.5)
                Predict: 0.0
              Else (feature 4 > 0.5949999988079071)
               Predict: 1.0
             Else (feature 1 not in {5.0,4.0,3.0})
              Predict: 1.0
          Tree 11:
            If (feature 2 in {1.0,3.0,4.0})
             If (feature 1 in {5.0,1.0,6.0,9.0,2.0,3.0,4.0})
              If (feature 5 <= 40.75)
               If (feature 3 <= 3.5)
                Predict: 0.0
               Else (feature 3 > 3.5)
                If (feature 1 in {2.0,3.0,4.0})
                 Predict: 0.0
                Else (feature 1 not in {2.0,3.0,4.0})
                 Predict: 1.0
              Else (feature 5 > 40.75)
               Predict: 0.0
             Else (feature 1 not in {5.0,1.0,6.0,9.0,2.0,3.0,4.0})
              Predict: 0.0
            Else (feature 2 not in {1.0,3.0,4.0})
             If (feature 0 in {1.0})
              If (feature 1 in {5.0,4.0,3.0})
               If (feature 4 <= 0.7950000166893005)
                If (feature 3 <= 4.5)
                 Predict: 1.0
                Else (feature 3 > 4.5)
                 Predict: 0.0
               Else (feature 4 > 0.7950000166893005)
                Predict: 1.0
              Else (feature 1 not in {5.0,4.0,3.0})
               Predict: 1.0
             Else (feature 0 not in {1.0})
              Predict: 1.0
          Tree 12:
            If (feature 2 in {1.0,3.0,4.0})
             Predict: 0.0
            Else (feature 2 not in {1.0,3.0,4.0})
             If (feature 1 in {6.0,5.0,4.0,3.0})
              If (feature 4 <= 1.6950000524520874)
               If (feature 1 in {5.0,4.0})
                Predict: 1.0
               Else (feature 1 not in {5.0,4.0})
                If (feature 5 <= 5.25)
                 Predict: 0.0
                Else (feature 5 > 5.25)
                 Predict: 1.0
              Else (feature 4 > 1.6950000524520874)
               Predict: 1.0
             Else (feature 1 not in {6.0,5.0,4.0,3.0})
              Predict: 1.0
          Tree 13:
            If (feature 4 <= 1.1449999809265137)
             If (feature 4 <= 0.5949999988079071)
              If (feature 2 in {1.0,3.0,4.0})
               Predict: 0.0
              Else (feature 2 not in {1.0,3.0,4.0})
               Predict: 1.0
             Else (feature 4 > 0.5949999988079071)
              If (feature 3 <= 1.5)
               If (feature 5 <= 5.25)
                If (feature 2 in {1.0,3.0,4.0})
                 Predict: 0.0
                Else (feature 2 not in {1.0,3.0,4.0})
                 Predict: 1.0
               Else (feature 5 > 5.25)
                If (feature 4 <= 0.8449999988079071)
                 Predict: 0.0
                Else (feature 4 > 0.8449999988079071)
                 Predict: 1.0
              Else (feature 3 > 1.5)
               If (feature 0 in {1.0})
                If (feature 2 in {1.0,3.0,4.0})
                 Predict: 0.0
                Else (feature 2 not in {1.0,3.0,4.0})
                 Predict: 1.0
               Else (feature 0 not in {1.0})
                If (feature 4 <= 0.7950000166893005)
                 Predict: 0.0
                Else (feature 4 > 0.7950000166893005)
                 Predict: 1.0
            Else (feature 4 > 1.1449999809265137)
             If (feature 2 in {1.0,3.0,4.0})
              Predict: 0.0
             Else (feature 2 not in {1.0,3.0,4.0})
              Predict: 1.0
          Tree 14:
            If (feature 2 in {1.0,3.0,4.0})
             Predict: 0.0
            Else (feature 2 not in {1.0,3.0,4.0})
             Predict: 1.0
          Tree 15:
            If (feature 2 in {1.0,3.0,4.0})
             Predict: 0.0
            Else (feature 2 not in {1.0,3.0,4.0})
             Predict: 1.0
          Tree 16:
            If (feature 4 <= 1.1449999809265137)
             If (feature 2 in {1.0,3.0,4.0})
              Predict: 0.0
             Else (feature 2 not in {1.0,3.0,4.0})
              If (feature 0 in {1.0})
               If (feature 5 <= 20.25)
                Predict: 1.0
               Else (feature 5 > 20.25)
                If (feature 5 <= 40.75)
                 Predict: 0.0
                Else (feature 5 > 40.75)
                 Predict: 1.0
              Else (feature 0 not in {1.0})
               Predict: 1.0
            Else (feature 4 > 1.1449999809265137)
             If (feature 2 in {1.0,3.0,4.0})
              Predict: 0.0
             Else (feature 2 not in {1.0,3.0,4.0})
              Predict: 1.0
          Tree 17:
            If (feature 2 in {1.0,3.0,4.0})
             Predict: 0.0
            Else (feature 2 not in {1.0,3.0,4.0})
             Predict: 1.0
          Tree 18:
            If (feature 2 in {1.0,3.0,4.0})
             Predict: 0.0
            Else (feature 2 not in {1.0,3.0,4.0})
             If (feature 0 in {1.0})
              If (feature 1 in {5.0,4.0,3.0})
               If (feature 4 <= 3.5749999284744263)
                If (feature 5 <= 4.75)
                 Predict: 0.0
                Else (feature 5 > 4.75)
                 Predict: 1.0
               Else (feature 4 > 3.5749999284744263)
                Predict: 1.0
              Else (feature 1 not in {5.0,4.0,3.0})
               Predict: 1.0
             Else (feature 0 not in {1.0})
              Predict: 1.0
          Tree 19:
            If (feature 2 in {1.0,3.0,4.0})
             Predict: 0.0
            Else (feature 2 not in {1.0,3.0,4.0})
             If (feature 4 <= 5.454999923706055)
              If (feature 0 in {1.0})
               Predict: 1.0
              Else (feature 0 not in {1.0})
               If (feature 1 in {4.0,2.0,1.0,3.0})
                If (feature 5 <= 59.75)
                 Predict: 1.0
                Else (feature 5 > 59.75)
                 Predict: 0.0
               Else (feature 1 not in {4.0,2.0,1.0,3.0})
                Predict: 1.0
             Else (feature 4 > 5.454999923706055)
              Predict: 1.0
          Tree 20:
            If (feature 4 <= 1.1449999809265137)
             If (feature 2 in {1.0,3.0,4.0})
              Predict: 0.0
             Else (feature 2 not in {1.0,3.0,4.0})
              If (feature 0 in {1.0})
               Predict: 1.0
              Else (feature 0 not in {1.0})
               If (feature 1 in {2.0,4.0,3.0})
                If (feature 1 in {2.0})
                 Predict: 0.0
                Else (feature 1 not in {2.0})
                 Predict: 1.0
               Else (feature 1 not in {2.0,4.0,3.0})
                Predict: 1.0
            Else (feature 4 > 1.1449999809265137)
             If (feature 3 <= 1.5)
              If (feature 2 in {1.0,3.0,4.0})
               Predict: 0.0
              Else (feature 2 not in {1.0,3.0,4.0})
               Predict: 1.0
             Else (feature 3 > 1.5)
              If (feature 0 in {1.0})
               If (feature 4 <= 2.084999918937683)
                Predict: 0.0
               Else (feature 4 > 2.084999918937683)
                If (feature 3 <= 2.5)
                 Predict: 1.0
                Else (feature 3 > 2.5)
                 Predict: 0.0
              Else (feature 0 not in {1.0})
               If (feature 5 <= 9.25)
                If (feature 2 in {1.0})
                 Predict: 0.0
                Else (feature 2 not in {1.0})
                 Predict: 1.0
               Else (feature 5 > 9.25)
                If (feature 2 in {1.0})
                 Predict: 0.0
                Else (feature 2 not in {1.0})
                 Predict: 1.0
          Tree 21:
            If (feature 2 in {1.0,3.0,4.0})
             Predict: 0.0
            Else (feature 2 not in {1.0,3.0,4.0})
             Predict: 1.0
          Tree 22:
            If (feature 2 in {1.0,3.0,4.0})
             Predict: 0.0
            Else (feature 2 not in {1.0,3.0,4.0})
             Predict: 1.0
          Tree 23:
            If (feature 4 <= 1.1449999809265137)
             If (feature 2 in {1.0,3.0,4.0})
              Predict: 0.0
             Else (feature 2 not in {1.0,3.0,4.0})
              Predict: 1.0
            Else (feature 4 > 1.1449999809265137)
             If (feature 2 in {1.0,3.0,4.0})
              Predict: 0.0
             Else (feature 2 not in {1.0,3.0,4.0})
              Predict: 1.0
          Tree 24:
            If (feature 2 in {1.0,3.0,4.0})
             Predict: 0.0
            Else (feature 2 not in {1.0,3.0,4.0})
             If (feature 0 in {1.0})
              Predict: 1.0
             Else (feature 0 not in {1.0})
              If (feature 1 in {4.0,3.0})
               If (feature 5 <= 59.75)
                Predict: 1.0
               Else (feature 5 > 59.75)
                If (feature 4 <= 3.5749999284744263)
                 Predict: 0.0
                Else (feature 4 > 3.5749999284744263)
                 Predict: 1.0
              Else (feature 1 not in {4.0,3.0})
               Predict: 1.0
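
To make the debug string easier to read, the sketch below hand-translates three of the smaller trees printed above (Tree 7, Tree 8 and Tree 10) into plain Python and combines them by majority vote, which is how a voting-based forest arrives at its final class label. It is only an illustration: the function names and the sample feature vector are made up here, and the trained model votes over all 25 trees rather than three.

```python
from collections import Counter

# Hand-written copies of three printed trees. `f` is a feature vector
# indexed the same way as "feature 0" ... "feature 5" in the debug string.
NON_POSITIVE_CODES = {1.0, 3.0, 4.0}  # the set tested on feature 2 in every tree


def tree_7(f):
    if f[2] in NON_POSITIVE_CODES:
        return 0.0
    if f[1] in {5.0, 4.0, 3.0} and f[4] <= 0.5949999988079071:
        if f[3] <= 2.5:
            return 1.0
        return 1.0 if f[5] <= 59.75 else 0.0
    return 1.0


def tree_8(f):
    return 0.0 if f[2] in NON_POSITIVE_CODES else 1.0


def tree_10(f):
    if f[2] in NON_POSITIVE_CODES:
        return 0.0
    if f[1] in {5.0, 4.0, 3.0} and f[4] <= 0.5949999988079071:
        return 1.0 if f[3] <= 3.5 else 0.0
    return 1.0


def forest_vote(features, trees):
    """Return (majority label, individual votes) for one feature vector."""
    votes = [tree(features) for tree in trees]
    label = Counter(votes).most_common(1)[0][0]
    return label, votes


# Hypothetical feature vector, not taken from the taxi data.
sample = [1.0, 4.0, 2.0, 4.0, 0.5, 10.0]
label, votes = forest_vote(sample, [tree_7, tree_8, tree_10])
print(votes, "->", label)
```

For this sample, Tree 10 disagrees with the other two, and the majority label wins. Using an odd number of trees keeps the sketch free of ties.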
    

Prediction and Evaluation on test data:

        # Predict on the feature vectors only; the labels are zipped back in afterwards
        predictions_classification = rfModel_Classification.predict(indexedTESTClassification.map(lambda x: x.features))
        # BinaryClassificationMetrics expects (score, label) pairs, so the prediction goes first
        predictionAndLabels_classification = predictions_classification.zip(indexedTESTClassification.map(lambda lp: lp.label))

        # Area under ROC curve
        metrics_classification = BinaryClassificationMetrics(predictionAndLabels_classification)
        print("Area under ROC = %s" % metrics_classification.areaUnderROC)
    

Output:

        Area under ROC = 0.981367320801355
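
Because the forest returns hard 0/1 labels rather than continuous scores, the ROC curve has a single interior point, and under the trapezoidal rule the area beneath it reduces to the average of the true positive rate and the true negative rate. The sketch below shows that calculation in plain Python. The (prediction, label) pairs are invented for illustration and are not drawn from the taxi data, and the function name is hypothetical.

```python
def auc_from_hard_predictions(pairs):
    """Area under the ROC curve for (prediction, label) pairs of 0.0/1.0 values."""
    tp = fp = tn = fn = 0
    for prediction, label in pairs:
        if label == 1.0:
            if prediction == 1.0:
                tp += 1
            else:
                fn += 1
        else:
            if prediction == 1.0:
                fp += 1
            else:
                tn += 1
    tpr = tp / (tp + fn)  # true positive rate (recall)
    fpr = fp / (fp + tn)  # false positive rate
    # Trapezoids under the polyline (0, 0) -> (fpr, tpr) -> (1, 1)
    return 0.5 * fpr * tpr + 0.5 * (1.0 - fpr) * (tpr + 1.0)


# Made-up example: 3 of 4 positives and 3 of 4 negatives are classified correctly.
toy_pairs = [
    (1.0, 1.0), (1.0, 1.0), (1.0, 1.0), (0.0, 1.0),
    (0.0, 0.0), (0.0, 0.0), (0.0, 0.0), (1.0, 0.0),
]
toy_auc = auc_from_hard_predictions(toy_pairs)
print("Toy area under ROC = %s" % toy_auc)
```

With a true positive rate of 0.75 and a true negative rate of 0.75, the toy area is 0.75. A value close to 1, such as the one reported above, means both rates are high on the test set.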
    

Random Forests Regression:

        indexedTRAINRegression = trainData.rdd.map(parseRowIndexingRegression)
        indexedTESTRegression = testData.rdd.map(parseRowIndexingRegression)
    

Finally, we train the random forest regression model, specifying the number of categories for each categorical feature, and print the resulting trees.

        # Feature index -> number of categories; features not listed are treated as continuous
        categoricalFeaturesInfo = {0: 2, 1: 12, 2: 5}

        rfModel_Regression = RandomForest.trainRegressor(indexedTRAINRegression,
                                       categoricalFeaturesInfo=categoricalFeaturesInfo,
                                       numTrees=25, featureSubsetStrategy="auto",
                                       impurity='variance', maxDepth=10, maxBins=32)

        print('Learned regression forest model:')
        print(rfModel_Regression.toDebugString())
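
The categoricalFeaturesInfo map tells MLlib that features 0, 1 and 2 are categorical with 2, 12 and 5 categories respectively. MLlib expects the values of a categorical feature with k categories to be indexed 0 to k-1, and it requires maxBins to be at least as large as the largest k. The helper below is a hypothetical pre-flight check, not part of PySpark, that verifies both conditions on a couple of made-up feature vectors before any training is attempted.

```python
def check_categorical_info(rows, categorical_info, max_bins):
    """Raise ValueError if the rows or max_bins are inconsistent with the map."""
    largest_arity = max(categorical_info.values())
    if max_bins < largest_arity:
        raise ValueError(
            "maxBins=%d is smaller than the largest category count %d"
            % (max_bins, largest_arity)
        )
    for row in rows:
        for index, arity in categorical_info.items():
            value = row[index]
            # Category codes must be whole numbers in the range 0 .. arity-1
            if value != int(value) or not 0 <= value < arity:
                raise ValueError(
                    "feature %d has value %r outside 0..%d" % (index, value, arity - 1)
                )
    return True


categoricalFeaturesInfo = {0: 2, 1: 12, 2: 5}

# Invented feature vectors with six features each, for illustration only.
sample_rows = [
    [1.0, 4.0, 2.0, 1.0, 0.8, 6.5],
    [0.0, 11.0, 4.0, 3.0, 2.1, 12.0],
]
check_categorical_info(sample_rows, categoricalFeaturesInfo, max_bins=32)
```

The check returns silently for these rows. Lowering max_bins below 12, or passing a row whose feature 2 equals 5.0, would raise a ValueError instead.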
    

Output:

        Learned regression forest model:
        TreeEnsembleModel regressor with 25 trees

          Tree 0:
            If (feature 5 <= 24.25)
             If (feature 2 in {1.0,3.0,4.0})
              If (feature 4 <= 12.700000286102295)
               If (feature 0 in {0.0})
                Predict: 0.0
               Else (feature 0 not in {0.0})
                If (feature 5 <= 12.75)
                 If (feature 3 <= 1.5)
                  If (feature 4 <= 0.5949999988079071)
                   If (feature 1 in {1.0,3.0,5.0,6.0})
                    Predict: 0.0
                   Else (feature 1 not in {1.0,3.0,5.0,6.0})
                    If (feature 4 <= 0.4749999940395355)
                     If (feature 5 <= 5.25)
                      Predict: 0.0
                     Else (feature 5 > 5.25)
                      Predict: 0.001221166860458809
                    Else (feature 4 > 0.4749999940395355)
                     If (feature 5 <= 4.25)
                      Predict: 0.0
                     Else (feature 5 > 4.25)
                      Predict: 8.884501413504843E-4
                  Else (feature 4 > 0.5949999988079071)
                   If (feature 5 <= 8.25)
                    If (feature 5 <= 5.75)
                     Predict: 0.0
                    Else (feature 5 > 5.75)
                     If (feature 4 <= 2.7649999856948853)
                      Predict: 1.6123760576432068E-4
                     Else (feature 4 > 2.7649999856948853)
                      Predict: 0.04186046400735544
                   Else (feature 5 > 8.25)
                    Predict: 0.0
                 Else (feature 3 > 1.5)
                  If (feature 3 <= 2.5)
                   If (feature 4 <= 0.5949999988079071)
                    If (feature 4 <= 0.4749999940395355)
                     Predict: 0.0
                    Else (feature 4 > 0.4749999940395355)
                     If (feature 5 <= 9.25)
                      Predict: 0.0
                     Else (feature 5 > 9.25)
                      Predict: 0.34999998410542804
                   Else (feature 4 > 0.5949999988079071)
                    If (feature 5 <= 8.75)
                     If (feature 4 <= 0.7950000166893005)
                      Predict: 3.9556962025316455E-4
                     Else (feature 4 > 0.7950000166893005)
                      Predict: 7.848320773919378E-5
                    Else (feature 5 > 8.75)
                     Predict: 0.0
                  Else (feature 3 > 2.5)
                   If (feature 5 <= 6.75)
                    If (feature 1 in {1.0,4.0})
                     Predict: 0.0
                    Else (feature 1 not in {1.0,4.0})
                     If (feature 5 <= 5.75)
                      Predict: 0.0
                     Else (feature 5 > 5.75)
                      Predict: 0.001201569361684368
                   Else (feature 5 > 6.75)
                    Predict: 0.0
                Else (feature 5 > 12.75)
                 If (feature 5 <= 20.75)
                  If (feature 5 <= 14.75)
                   If (feature 4 <= 1.8049999475479126)
                    If (feature 5 <= 13.75)
                     If (feature 5 <= 13.25)
                      Predict: 0.0
                     Else (feature 5 > 13.25)
                      Predict: 0.007623888182973317
                    Else (feature 5 > 13.75)
                     Predict: 0.0
                   Else (feature 4 > 1.8049999475479126)
                    If (feature 4 <= 3.625)
                     If (feature 4 <= 2.5749999284744263)
                      Predict: 5.362426211848061E-4
                     Else (feature 4 > 2.5749999284744263)
                      Predict: 0.0
                    Else (feature 4 > 3.625)
                     If (feature 4 <= 4.0950000286102295)
                      Predict: 0.001181102340118705
                     Else (feature 4 > 4.0950000286102295)
                      Predict: 0.0
                  Else (feature 5 > 14.75)
                   If (feature 4 <= 4.0950000286102295)
                    If (feature 3 <= 1.5)
                     If (feature 5 <= 15.75)
                      Predict: 0.0
                     Else (feature 5 > 15.75)
                      Predict: 0.0014655350701109615
                    Else (feature 3 > 1.5)
                     Predict: 0.0
                   Else (feature 4 > 4.0950000286102295)
                    Predict: 0.0
                 Else (feature 5 > 20.75)
                  Predict: 0.0
              Else (feature 4 > 12.700000286102295)
               If (feature 1 in {0.0,2.0})
                Predict: 0.0
               Else (feature 1 not in {0.0,2.0})
                If (feature 3 <= 1.5)
                 Predict: 8.5
                Else (feature 3 > 1.5)
                 Predict: 0.0
             Else (feature 2 not in {1.0,3.0,4.0})
              If (feature 0 in {0.0})
               If (feature 1 in {0.0,3.0})
                If (feature 3 <= 3.5)
                 If (feature 5 <= 12.25)
                  If (feature 5 <= 7.75)
                   If (feature 5 <= 5.75)
                    If (feature 2 in {2.0})
                     If (feature 5 <= 5.25)
                      Predict: 0.8672591599268052
                     Else (feature 5 > 5.25)
                      Predict: 1.070707080039111
                    Else (feature 2 not in {2.0})
                     If (feature 3 <= 1.5)
                      Predict: 1.0942474052490003
                     Else (feature 3 > 1.5)
                      Predict: 1.080579636187869
                   Else (feature 5 > 5.75)
                    If (feature 4 <= 1.3350000381469727)
                     If (feature 4 <= 1.0950000286102295)
                      Predict: 1.330977781414295
                     Else (feature 4 > 1.0950000286102295)
                      Predict: 1.3618516748054432
                    Else (feature 4 > 1.3350000381469727)
                     Predict: 1.4236676546550344
                  Else (feature 5 > 7.75)
                   If (feature 4 <= 2.215000033378601)
                    If (feature 5 <= 9.75)
                     If (feature 1 in {0.0})
                      Predict: 1.684913027016892
                     Else (feature 1 not in {0.0})
                      Predict: 1.899999976158142
                    Else (feature 5 > 9.75)
                     If (feature 4 <= 1.5049999952316284)
                      Predict: 2.029793757472228
                     Else (feature 4 > 1.5049999952316284)
                      Predict: 1.9889401308375827
                   Else (feature 4 > 2.215000033378601)
                    If (feature 4 <= 2.7649999856948853)
                     If (feature 5 <= 10.25)
                      Predict: 1.826720022970644
                     Else (feature 5 > 10.25)
                      Predict: 2.062994081223262
                    Else (feature 4 > 2.7649999856948853)
                     If (feature 4 <= 4.0950000286102295)
                      Predict: 2.1248015292439884
                     Else (feature 4 > 4.0950000286102295)
                      Predict: 9.0
                 Else (feature 5 > 12.25)
                  If (feature 2 in {2.0})
                   If (feature 4 <= 4.704999923706055)
                    If (feature 4 <= 4.0950000286102295)
                     Predict: 2.5793899714773603
                    Else (feature 4 > 4.0950000286102295)
                     Predict: 3.049680856948203
                   Else (feature 4 > 4.704999923706055)
                    If (feature 5 <= 20.75)
                     If (feature 4 <= 5.515000104904175)
                      Predict: 3.3192523428212817
                     Else (feature 4 > 5.515000104904175)
                      Predict: 3.517008506334745
                    Else (feature 5 > 20.75)
                     If (feature 4 <= 6.855000019073486)
                      Predict: 4.001401291531362
                     Else (feature 4 > 6.855000019073486)
                      Predict: 4.612261916909899
                  Else (feature 2 not in {2.0})
                   If (feature 5 <= 18.25)
                    If (feature 3 <= 2.5)
                     If (feature 5 <= 15.25)
                      Predict: 2.4789619753114223
                     Else (feature 5 > 15.25)
                      Predict: 2.9870010042250867
                    Else (feature 3 > 2.5)
                     If (feature 5 <= 15.75)
                      Predict: 2.496179523222373
                     Else (feature 5 > 15.75)
                      Predict: 3.1307795254958966
                   Else (feature 5 > 18.25)
                    If (feature 5 <= 20.75)
                     Predict: 3.4956710842848784
                    Else (feature 5 > 20.75)
                     If (feature 4 <= 6.855000019073486)
                      Predict: 3.914033086576562
                     Else (feature 4 > 6.855000019073486)
                      Predict: 4.245271024534371
                Else (feature 3 > 3.5)
                 If (feature 3 <= 5.5)
                  If (feature 5 <= 12.25)
                   If (feature 2 in {2.0})
                    If (feature 3 <= 4.5)
                     Predict: 1.4892771083367877
                    Else (feature 3 > 4.5)
                     Predict: 1.4200277009829259
                   Else (feature 2 not in {2.0})
                    If (feature 5 <= 7.75)
                     If (feature 3 <= 4.5)
                      Predict: 1.2671493111397083
                     Else (feature 3 > 4.5)
                      Predict: 1.2619366881757208
                    Else (feature 5 > 7.75)
                     Predict: 1.8598017935057558
                  Else (feature 5 > 12.25)
                   If (feature 5 <= 16.75)
                    If (feature 5 <= 14.25)
                     If (feature 5 <= 13.25)
                      Predict: 2.352012153736649
                     Else (feature 5 > 13.25)
                      Predict: 2.4903259697178055
                    Else (feature 5 > 14.25)
                     If (feature 4 <= 4.0950000286102295)
                      Predict: 2.722898017668372
                     Else (feature 4 > 4.0950000286102295)
                      Predict: 2.833566433052361
                   Else (feature 5 > 16.75)
                    If (feature 3 <= 4.5)
                     If (feature 5 <= 20.75)
                      Predict: 3.366045410974813
                     Else (feature 5 > 20.75)
                      Predict: 4.230404047645402
                    Else (feature 3 > 4.5)
                     If (feature 5 <= 20.75)
                      Predict: 3.3705701882795682
                     Else (feature 5 > 20.75)
                      Predict: 4.0339489801256985
                 Else (feature 3 > 5.5)
                  If (feature 4 <= 2.7649999856948853)
                   If (feature 5 <= 8.25)
                    If (feature 5 <= 5.75)
                     If (feature 4 <= 1.3350000381469727)
                      Predict: 1.08850128347418
                     Else (feature 4 > 1.3350000381469727)
                      Predict: 2.0400000333786013
                    Else (feature 5 > 5.75)
                     If (feature 2 in {2.0})
                      Predict: 1.2796721285809585
                     Else (feature 2 not in {2.0})
                      Predict: 1.404713584225799
                   Else (feature 5 > 8.25)
                    If (feature 4 <= 2.09499990940094)
                     Predict: 1.8782565547312495
                    Else (feature 4 > 2.09499990940094)
                     If (feature 5 <= 12.25)
                      Predict: 1.9597463214663833
                     Else (feature 5 > 12.25)
                      Predict: 2.6016171588215102
                  Else (feature 4 > 2.7649999856948853)
                   If (feature 4 <= 4.704999923706055)
                    If (feature 2 in {2.0})
                     If (feature 4 <= 4.0950000286102295)
                      Predict: 2.3561855566870307
                     Else (feature 4 > 4.0950000286102295)
                      Predict: 3.266000008583069
                    Else (feature 2 not in {2.0})
                     If (feature 4 <= 3.625)
                      Predict: 2.4618393616206853
                     Else (feature 4 > 3.625)
                      Predict: 2.899717037529601
                   Else (feature 4 > 4.704999923706055)
                    If (feature 4 <= 6.855000019073486)
                     If (feature 5 <= 18.25)
                      Predict: 2.988068719398919
                     Else (feature 5 > 18.25)
                      Predict: 3.7172788180630585
                    Else (feature 4 > 6.855000019073486)
                     Predict: 4.346206889275847
               Else (feature 1 not in {0.0,3.0})
                If (feature 5 <= 20.75)
                 If (feature 3 <= 1.5)
                  Predict: 1.5
                 Else (feature 3 > 1.5)
                  Predict: 6.300000190734863
                Else (feature 5 > 20.75)
                 If (feature 3 <= 1.5)
                  Predict: 78.0
                 Else (feature 3 > 1.5)
                  If (feature 4 <= 3.625)
                   Predict: 6.800000190734863
                  Else (feature 4 > 3.625)
                   Predict: 7.050000190734863
              Else (feature 0 not in {0.0})
               If (feature 5 <= 12.25)
                If (feature 5 <= 8.25)
                 If (feature 3 <= 3.5)
                  If (feature 4 <= 2.5749999284744263)
                   If (feature 1 in {3.0,2.0,5.0})
                    If (feature 4 <= 0.7950000166893005)
                     If (feature 1 in {3.0})
                      Predict: 0.7333333333333333
                     Else (feature 1 not in {3.0})
                      Predict: 0.9799999952316284
                    Else (feature 4 > 0.7950000166893005)
                     Predict: 1.875
                   Else (feature 1 not in {3.0,2.0,5.0})
                    If (feature 1 in {4.0,1.0})
                     If (feature 4 <= 0.5949999988079071)
                      Predict: 2.0333333015441895
                     Else (feature 4 > 0.5949999988079071)
                      Predict: 1.047142846243722
                    Else (feature 1 not in {4.0,1.0})
                     If (feature 3 <= 1.5)
                      Predict: 1.3781561769193391
                     Else (feature 3 > 1.5)
                      Predict: 1.3995485300195203
                  Else (feature 4 > 2.5749999284744263)
                   Predict: 6.3971428709598825
                 Else (feature 3 > 3.5)
                  If (feature 4 <= 1.0049999952316284)
                   Predict: 1.30311386381571
                  Else (feature 4 > 1.0049999952316284)
                   If (feature 4 <= 1.1950000524520874)
                    If (feature 3 <= 5.5)
                     If (feature 3 <= 4.5)
                      Predict: 1.5036051452415695
                     Else (feature 3 > 4.5)
                      Predict: 1.4000000026490953
                    Else (feature 3 > 5.5)
                     Predict: 1.0
                   Else (feature 4 > 1.1950000524520874)
                    If (feature 4 <= 2.09499990940094)
                     If (feature 4 <= 1.8049999475479126)
                      Predict: 1.6484381126053265
                     Else (feature 4 > 1.8049999475479126)
                      Predict: 1.893199965953827
                    Else (feature 4 > 2.09499990940094)
                     If (feature 4 <= 6.855000019073486)
                      Predict: 1.3423076776357799
                     Else (feature 4 > 6.855000019073486)
                      Predict: 1.600000023841858
                Else (feature 5 > 8.25)
                 Predict: 1.976090599779964
               Else (feature 5 > 12.25)
                If (feature 5 <= 16.75)
                 If (feature 1 in {5.0,3.0})
                  Predict: 1.953333314259847
                 Else (feature 1 not in {5.0,3.0})
                  If (feature 5 <= 14.25)
                   Predict: 2.484395947335571
                  Else (feature 5 > 14.25)
                   If (feature 5 <= 15.25)
                    Predict: 2.732694543128903
                   Else (feature 5 > 15.25)
                    If (feature 1 in {0.0})
                     If (feature 3 <= 5.5)
                      Predict: 2.9364147325522545
                     Else (feature 3 > 5.5)
                      Predict: 4.8125
                    Else (feature 1 not in {0.0})
                     Predict: 3.0
                Else (feature 5 > 16.75)
                 If (feature 1 in {0.0,5.0,1.0,2.0,3.0})
                  Predict: 3.630627151993512
                 Else (feature 1 not in {0.0,5.0,1.0,2.0,3.0})
                  Predict: 6.5
            Else (feature 5 > 24.25)
             If (feature 2 in {1.0,3.0,4.0})
              If (feature 4 <= 12.700000286102295)
               If (feature 4 <= 3.625)
                If (feature 4 <= 3.2649999856948853)
                 Predict: 0.0
                Else (feature 4 > 3.2649999856948853)
                 If (feature 2 in {3.0,4.0})
                  Predict: 0.0
                 Else (feature 2 not in {3.0,4.0})
                  If (feature 5 <= 29.25)
                   If (feature 3 <= 2.5)
                    Predict: 0.0
                   Else (feature 3 > 2.5)
                    Predict: 0.20769231136028582
                  Else (feature 5 > 29.25)
                   Predict: 0.0
               Else (feature 4 > 3.625)
                If (feature 1 in {0.0,5.0,2.0,3.0,4.0})
                 If (feature 3 <= 1.5)
                  If (feature 0 in {0.0})
                   Predict: 0.0
                  Else (feature 0 not in {0.0})
                   If (feature 5 <= 40.25)
                    If (feature 2 in {3.0,4.0})
                     Predict: 0.0
                    Else (feature 2 not in {3.0,4.0})
                     If (feature 5 <= 29.25)
                      Predict: 5.226708480334509E-4
                     Else (feature 5 > 29.25)
                      Predict: 7.766990291262136E-4
                   Else (feature 5 > 40.25)
                    Predict: 0.0
                 Else (feature 3 > 1.5)
                  Predict: 0.0
                Else (feature 1 not in {0.0,5.0,2.0,3.0,4.0})
                 If (feature 5 <= 40.25)
                  Predict: 7.800000190734863
                 Else (feature 5 > 40.25)
                  Predict: 0.0
              Else (feature 4 > 12.700000286102295)
               If (feature 2 in {4.0,1.0})
                If (feature 5 <= 72.75)
                 If (feature 0 in {0.0})
                  Predict: 0.0
                 Else (feature 0 not in {0.0})
                  If (feature 1 in {0.0,3.0,4.0,1.0})
                   If (feature 3 <= 1.5)
                    If (feature 4 <= 22.6850004196167)
                     If (feature 1 in {0.0,3.0,4.0})
                      Predict: 0.0
                     Else (feature 1 not in {0.0,3.0,4.0})
                      Predict: 0.006884295543135039
                    Else (feature 4 > 22.6850004196167)
                     Predict: 0.0
                   Else (feature 3 > 1.5)
                    Predict: 0.0
                  Else (feature 1 not in {0.0,3.0,4.0,1.0})
                   Predict: 0.10271605032461661
                Else (feature 5 > 72.75)
                 If (feature 3 <= 1.5)
                  Predict: 0.0
                 Else (feature 3 > 1.5)
                  If (feature 1 in {0.0,2.0,4.0})
                   Predict: 0.0
                  Else (feature 1 not in {0.0,2.0,4.0})
                   If (feature 0 in {0.0})
                    Predict: 0.0
                   Else (feature 0 not in {0.0})
                    Predict: 0.3225806451612903
               Else (feature 2 not in {4.0,1.0})
                If (feature 1 in {0.0,2.0,3.0})
                 Predict: 0.0
                Else (feature 1 not in {0.0,2.0,3.0})
                 If (feature 3 <= 3.5)
                  Predict: 0.0
                 Else (feature 3 > 3.5)
                  Predict: 6.666666666666667
             Else (feature 2 not in {1.0,3.0,4.0})
              If (feature 1 in {0.0})
               If (feature 5 <= 29.25)
                If (feature 2 in {2.0})
                 Predict: 4.765992650214364
                Else (feature 2 not in {2.0})
                 If (feature 3 <= 1.5)
                  If (feature 4 <= 9.095000267028809)
                   If (feature 4 <= 6.855000019073486)
                    Predict: 4.628108190893623
                   Else (feature 4 > 6.855000019073486)
                    Predict: 5.109736812340437
                  Else (feature 4 > 9.095000267028809)
                   Predict: 5.894786603555816
                 Else (feature 3 > 1.5)
                  Predict: 5.113387489034704
               Else (feature 5 > 29.25)
                If (feature 0 in {1.0})
                 Predict: 7.092535707769465
                Else (feature 0 not in {1.0})
                 If (feature 2 in {2.0})
                  If (feature 5 <= 72.75)
                   If (feature 4 <= 12.700000286102295)
                    Predict: 6.139399127387182
                   Else (feature 4 > 12.700000286102295)
                    If (feature 5 <= 40.25)
                     Predict: 5.728181882338091
                    Else (feature 5 > 40.25)
                     Predict: 8.758620697876502
                  Else (feature 5 > 72.75)
                   Predict: 30.0
                 Else (feature 2 not in {2.0})
                  If (feature 4 <= 12.700000286102295)
                   If (feature 5 <= 40.25)
                    Predict: 6.622233947063481
                   Else (feature 5 > 40.25)
                    Predict: 8.266169436368115
                  Else (feature 4 > 12.700000286102295)
                   If (feature 4 <= 22.6850004196167)
                    Predict: 8.300156427017264
                   Else (feature 4 > 22.6850004196167)
                    Predict: 13.641867407833237
              Else (feature 1 not in {0.0})
               If (feature 5 <= 72.75)
                If (feature 4 <= 12.700000286102295)
                 If (feature 1 in {4.0,3.0})
                  If (feature 5 <= 40.25)
                   If (feature 5 <= 29.25)
                    If (feature 0 in {1.0})
                     If (feature 1 in {4.0})
                      Predict: 3.1000000146719127
                     Else (feature 1 not in {4.0})
                      Predict: 3.5200000047683715
                    Else (feature 0 not in {1.0})
                     If (feature 4 <= 9.095000267028809)
                      Predict: 5.929999987284343
                     Else (feature 4 > 9.095000267028809)
                      Predict: 1.0
                   Else (feature 5 > 29.25)
                    If (feature 1 in {3.0})
                     If (feature 4 <= 1.4049999713897705)
                      Predict: 3.8662790808566783
                     Else (feature 4 > 1.4049999713897705)
                      Predict: 6.578249990940094
                    Else (feature 1 not in {3.0})
                     If (feature 3 <= 1.5)
                      Predict: 5.707916716734569
                     Else (feature 3 > 1.5)
                      Predict: 8.6
                  Else (feature 5 > 40.25)
                   If (feature 1 in {4.0})
                    If (feature 4 <= 6.855000019073486)
                     Predict: 11.5
                    Else (feature 4 > 6.855000019073486)
                     If (feature 3 <= 1.5)
                      Predict: 5.032631472537392
                     Else (feature 3 > 1.5)
                      Predict: 11.434999942779541
                   Else (feature 1 not in {4.0})
                    If (feature 0 in {0.0})
                     If (feature 3 <= 2.5)
                      Predict: 7.534161168196557
                     Else (feature 3 > 2.5)
                      Predict: 7.9190476054237005
                    Else (feature 0 not in {0.0})
                     Predict: 7.860710682205591
                 Else (feature 1 not in {4.0,3.0})
                  If (feature 3 <= 1.5)
                   If (feature 1 in {1.0})
                    If (feature 0 in {0.0})
                     If (feature 2 in {0.0})
                      Predict: 9.485638695122018
                     Else (feature 2 not in {0.0})
                      Predict: 10.399999618530273
                    Else (feature 0 not in {0.0})
                     Predict: 10.209748482835368
                   Else (feature 1 not in {1.0})
                    If (feature 2 in {2.0})
                     Predict: 10.199999809265137
                    Else (feature 2 not in {2.0})
                     If (feature 5 <= 40.25)
                      Predict: 11.458571468080793
                     Else (feature 5 > 40.25)
                      Predict: 10.256923137566982
                  Else (feature 3 > 1.5)
                   If (feature 5 <= 40.25)
                    If (feature 4 <= 5.515000104904175)
                     Predict: 8.172500133514404
                    Else (feature 4 > 5.515000104904175)
                     Predict: 5.0
                   Else (feature 5 > 40.25)
                    If (feature 2 in {2.0})
                     If (feature 4 <= 0.4749999940395355)
                      Predict: 10.399999618530273
                     Else (feature 4 > 0.4749999940395355)
                      Predict: 5.199999809265137
                    Else (feature 2 not in {2.0})
                     Predict: 10.55013706912733
                Else (feature 4 > 12.700000286102295)
                 If (feature 1 in {3.0,5.0,1.0,4.0})
                  If (feature 5 <= 40.25)
                   If (feature 0 in {0.0})
                    Predict: 0.0
                   Else (feature 0 not in {0.0})
                    Predict: 1.4
                  Else (feature 5 > 40.25)
                   If (feature 3 <= 4.5)
                    If (feature 1 in {5.0,1.0,3.0})
                     If (feature 0 in {0.0})
                      Predict: 9.843881849969373
                     Else (feature 0 not in {0.0})
                      Predict: 9.937772512619807
                    Else (feature 1 not in {5.0,1.0,3.0})
                     If (feature 4 <= 22.6850004196167)
                      Predict: 10.81983741124471
                     Else (feature 4 > 22.6850004196167)
                      Predict: 0.0
                   Else (feature 3 > 4.5)
                    If (feature 2 in {2.0})
                     If (feature 3 <= 5.5)
                      Predict: 8.867368246379652
                     Else (feature 3 > 5.5)
                      Predict: 8.728571278708321
                    Else (feature 2 not in {2.0})
                     If (feature 1 in {1.0})
                      Predict: 10.164691484895744
                     Else (feature 1 not in {1.0})
                      Predict: 12.015555487738716
                 Else (feature 1 not in {3.0,5.0,1.0,4.0})
                  If (feature 2 in {2.0})
                   Predict: 8.871428549289703
                  Else (feature 2 not in {2.0})
                   If (feature 0 in {0.0})
                    Predict: 12.782215757239374
                   Else (feature 0 not in {0.0})
                    Predict: 12.95225352370683
               Else (feature 5 > 72.75)
                If (feature 1 in {3.0})
                 If (feature 4 <= 12.700000286102295)
                  If (feature 3 <= 1.5)
                   If (feature 4 <= 2.215000033378601)
                    If (feature 4 <= 1.4049999713897705)
                     If (feature 4 <= 1.2649999856948853)
                      Predict: 7.538640763267007
                     Else (feature 4 > 1.2649999856948853)
                      Predict: 36.619998931884766
                    Else (feature 4 > 1.4049999713897705)
                     Predict: 0.0
                   Else (feature 4 > 2.215000033378601)
                    If (feature 4 <= 2.9950000047683716)
                     Predict: 24.900000381469727
                    Else (feature 4 > 2.9950000047683716)
                     Predict: 9.901343274472366
                  Else (feature 3 > 1.5)
                   If (feature 4 <= 0.925000011920929)
                    If (feature 3 <= 2.5)
                     If (feature 0 in {0.0})
                      Predict: 10.472222222222221
                     Else (feature 0 not in {0.0})
                      Predict: 14.577777650621202
                    Else (feature 3 > 2.5)
                     Predict: 13.0
                   Else (feature 4 > 0.925000011920929)
                    If (feature 0 in {1.0})
                     Predict: 3.9753571837209165
                    Else (feature 0 not in {1.0})
                     Predict: 15.563333511352539
                 Else (feature 4 > 12.700000286102295)
                  If (feature 0 in {1.0})
                   Predict: 14.524624608099103
                  Else (feature 0 not in {1.0})
                   If (feature 3 <= 3.5)
                    If (feature 4 <= 22.6850004196167)
                     If (feature 3 <= 2.5)
                      Predict: 13.6980770276143
                     Else (feature 3 > 2.5)
                      Predict: 19.696666717529297
                    Else (feature 4 > 22.6850004196167)
                     Predict: 16.412744252626286
                   Else (feature 3 > 3.5)
                    Predict: 14.374444749620226
                Else (feature 1 not in {3.0})
                 If (feature 4 <= 22.6850004196167)
                  If (feature 4 <= 12.700000286102295)
                   Predict: 24.700000762939453
                  Else (feature 4 > 12.700000286102295)
                   If (feature 3 <= 1.5)
                    Predict: 15.145885282690797
                   Else (feature 3 > 1.5)
                    If (feature 3 <= 2.5)
                     If (feature 1 in {2.0})
                      Predict: 12.50358021112136
                     Else (feature 1 not in {2.0})
                      Predict: 13.649999965320934
                    Else (feature 3 > 2.5)
                     If (feature 3 <= 3.5)
                      Predict: 17.760606187762637
                     Else (feature 3 > 3.5)
                      Predict: 14.239862990705934
                 Else (feature 4 > 22.6850004196167)
                  If (feature 1 in {2.0})
                   Predict: 13.870686205724875
                  Else (feature 1 not in {2.0})
                   If (feature 0 in {0.0})
                    Predict: 16.782051098652374
                   Else (feature 0 not in {0.0})
                    Predict: 19.368235283038196
          Tree 1:
            If (feature 2 in {1.0,3.0,4.0})
             If (feature 4 <= 12.700000286102295)
              If (feature 1 in {0.0,5.0,6.0,2.0,3.0,4.0})
               If (feature 0 in {0.0})
                Predict: 0.0
               Else (feature 0 not in {0.0})
                If (feature 1 in {5.0,6.0,2.0,3.0,4.0})
                 Predict: 0.0
                Else (feature 1 not in {5.0,6.0,2.0,3.0,4.0})
                 If (feature 5 <= 18.25)
                  If (feature 5 <= 13.25)
                   If (feature 4 <= 0.5949999988079071)
                    If (feature 3 <= 1.5)
                     If (feature 5 <= 4.25)
                      Predict: 0.0
                     Else (feature 5 > 4.25)
                      Predict: 4.0201004625764524E-4
                    Else (feature 3 > 1.5)
                     If (feature 4 <= 0.4749999940395355)
                      Predict: 0.0
                     Else (feature 4 > 0.4749999940395355)
                      Predict: 0.0027325958420723077
                   Else (feature 4 > 0.5949999988079071)
                    If (feature 4 <= 3.625)
                     If (feature 5 <= 8.75)
                      Predict: 6.751395828166836E-5
                     Else (feature 5 > 8.75)
                      Predict: 0.0
                    Else (feature 4 > 3.625)
                     If (feature 3 <= 1.5)
                      Predict: 6.321112515802782E-4
                     Else (feature 3 > 1.5)
                      Predict: 0.0
                  Else (feature 5 > 13.25)
                   If (feature 3 <= 3.5)
                    If (feature 2 in {3.0,4.0})
                     Predict: 0.0
                    Else (feature 2 not in {3.0,4.0})
                     If (feature 3 <= 1.5)
                      Predict: 3.7706090126758663E-4
                     Else (feature 3 > 1.5)
                      Predict: 0.0
                   Else (feature 3 > 3.5)
                    If (feature 5 <= 14.75)
                     Predict: 0.00271453581841731
                    Else (feature 5 > 14.75)
                     Predict: 0.0
                 Else (feature 5 > 18.25)
                  If (feature 4 <= 3.625)
                   If (feature 4 <= 3.2649999856948853)
                    Predict: 0.0
                   Else (feature 4 > 3.2649999856948853)
                    Predict: 0.021399839405848532
                  Else (feature 4 > 3.625)
                   If (feature 3 <= 3.5)
                    If (feature 3 <= 2.5)
                     If (feature 3 <= 1.5)
                      Predict: 3.4662044920589504E-4
                     Else (feature 3 > 1.5)
                      Predict: 3.124349093938763E-4
                    Else (feature 3 > 2.5)
                     Predict: 0.0
                   Else (feature 3 > 3.5)
                    If (feature 2 in {3.0,4.0})
                     Predict: 0.0
                    Else (feature 2 not in {3.0,4.0})
                     If (feature 3 <= 4.5)
                      Predict: 0.003167062549485352
                     Else (feature 3 > 4.5)
                      Predict: 0.0
              Else (feature 1 not in {0.0,5.0,6.0,2.0,3.0,4.0})
               If (feature 0 in {0.0})
                Predict: 0.0
               Else (feature 0 not in {0.0})
                If (feature 2 in {3.0,4.0})
                 Predict: 0.0
                Else (feature 2 not in {3.0,4.0})
                 If (feature 5 <= 40.25)
                  If (feature 5 <= 20.75)
                   Predict: 0.0
                  Else (feature 5 > 20.75)
                   Predict: 7.800000190734863
                 Else (feature 5 > 40.25)
                  Predict: 0.0
             Else (feature 4 > 12.700000286102295)
              If (feature 2 in {4.0,1.0})
               If (feature 1 in {4.0,0.0,1.0})
                If (feature 5 <= 72.75)
                 If (feature 0 in {0.0})
                  Predict: 0.0
                 Else (feature 0 not in {0.0})
                  If (feature 4 <= 22.6850004196167)
                   If (feature 3 <= 3.5)
                    If (feature 1 in {4.0,0.0})
                     If (feature 5 <= 40.25)
                      Predict: 0.0
                     Else (feature 5 > 40.25)
                      Predict: 0.004302747636461827
                    Else (feature 1 not in {4.0,0.0})
                     Predict: 0.006124503533571761
                   Else (feature 3 > 3.5)
                    If (feature 1 in {0.0,4.0})
                     Predict: 0.0
                    Else (feature 1 not in {0.0,4.0})
                     If (feature 2 in {4.0})
                      Predict: 0.0
                     Else (feature 2 not in {4.0})
                      Predict: 0.04109004888489348
                  Else (feature 4 > 22.6850004196167)
                   Predict: 0.0
                Else (feature 5 > 72.75)
                 If (feature 1 in {4.0})
                  Predict: 0.0
                 Else (feature 1 not in {4.0})
                  If (feature 3 <= 1.5)
                   If (feature 4 <= 22.6850004196167)
                    Predict: 0.0
                   Else (feature 4 > 22.6850004196167)
                    If (feature 0 in {0.0})
                     Predict: 0.0
                    Else (feature 0 not in {0.0})
                     Predict: 0.28846153846153844
                  Else (feature 3 > 1.5)
                   Predict: 0.0
               Else (feature 1 not in {4.0,0.0,1.0})
                If (feature 1 in {2.0})
                 If (feature 3 <= 2.5)
                  If (feature 5 <= 72.75)
                   If (feature 3 <= 1.5)
                    If (feature 4 <= 22.6850004196167)
                     If (feature 5 <= 20.75)
                      Predict: 0.0
                     Else (feature 5 > 20.75)
                      Predict: 0.06694915335057146
                    Else (feature 4 > 22.6850004196167)
                     Predict: 0.0
                   Else (feature 3 > 1.5)
                    If (feature 0 in {0.0})
                     Predict: 0.0
                    Else (feature 0 not in {0.0})
                     Predict: 0.23622047244094488
                  Else (feature 5 > 72.75)
                   Predict: 0.0
                 Else (feature 3 > 2.5)
                  Predict: 0.0
                Else (feature 1 not in {2.0})
                 If (feature 3 <= 1.5)
                  Predict: 0.0
                 Else (feature 3 > 1.5)
                  If (feature 0 in {0.0})
                   Predict: 0.0
                  Else (feature 0 not in {0.0})
                   If (feature 4 <= 22.6850004196167)
                    Predict: 0.75
                   Else (feature 4 > 22.6850004196167)
                    Predict: 0.0
              Else (feature 2 not in {4.0,1.0})
               If (feature 5 <= 40.25)
                Predict: 0.0
               Else (feature 5 > 40.25)
                If (feature 1 in {0.0,2.0,3.0,4.0})
                 Predict: 0.0
                Else (feature 1 not in {0.0,2.0,3.0,4.0})
                 Predict: 0.42105263157894735
            Else (feature 2 not in {1.0,3.0,4.0})
             If (feature 4 <= 6.855000019073486)
              If (feature 1 in {0.0,5.0,4.0})
               If (feature 1 in {0.0,5.0})
                If (feature 0 in {0.0})
                 If (feature 4 <= 2.7649999856948853)
                  If (feature 4 <= 1.5049999952316284)
                   If (feature 5 <= 7.25)
                    If (feature 5 <= 5.75)
                     If (feature 5 <= 4.525000095367432)
                      Predict: 1.0020008414731925
                     Else (feature 5 > 4.525000095367432)
                      Predict: 1.15045864901304
                    Else (feature 5 > 5.75)
                     If (feature 5 <= 6.25)
                      Predict: 1.2621283651735642
                     Else (feature 5 > 6.25)
                      Predict: 1.3743548931628613
                   Else (feature 5 > 7.25)
                    If (feature 2 in {2.0})
                     If (feature 5 <= 9.75)
                      Predict: 1.5138285768372672
                     Else (feature 5 > 9.75)
                      Predict: 2.0079861090828977
                    Else (feature 2 not in {2.0})
                     If (feature 5 <= 9.75)
                      Predict: 1.6063360900216812
                     Else (feature 5 > 9.75)
                      Predict: 2.1510405192478754
                  Else (feature 4 > 1.5049999952316284)
                   If (feature 4 <= 2.09499990940094)
                    If (feature 5 <= 10.75)
                     If (feature 5 <= 8.75)
                      Predict: 1.554917197170656
                     Else (feature 5 > 8.75)
                      Predict: 1.8166922351283001
                    Else (feature 5 > 10.75)
                     If (feature 3 <= 5.5)
                      Predict: 2.337158920920737
                     Else (feature 3 > 5.5)
                      Predict: 2.274364187651958
                   Else (feature 4 > 2.09499990940094)
                    If (feature 4 <= 2.3950001001358032)
                     If (feature 4 <= 2.215000033378601)
                      Predict: 1.9529296794468578
                     Else (feature 4 > 2.215000033378601)
                      Predict: 2.021372834482952
                    Else (feature 4 > 2.3950001001358032)
                     If (feature 2 in {2.0})
                      Predict: 2.0623413187324244
                     Else (feature 2 not in {2.0})
                      Predict: 2.1686595384964535
                 Else (feature 4 > 2.7649999856948853)
                  If (feature 5 <= 16.75)
                   If (feature 5 <= 13.75)
                    If (feature 4 <= 3.2649999856948853)
                     If (feature 5 <= 11.75)
                      Predict: 2.079552211359105
                     Else (feature 5 > 11.75)
                      Predict: 2.304190258245848
                    Else (feature 4 > 3.2649999856948853)
                     If (feature 3 <= 1.5)
                      Predict: 2.346835129267193
                     Else (feature 3 > 1.5)
                      Predict: 2.3689208593450006
                   Else (feature 5 > 13.75)
                    If (feature 5 <= 15.25)
                     If (feature 5 <= 14.25)
                      Predict: 2.5437982024645867
                     Else (feature 5 > 14.25)
                      Predict: 2.659631055539128
                    Else (feature 5 > 15.25)
                     If (feature 4 <= 3.625)
                      Predict: 2.846165316059
                     Else (feature 4 > 3.625)
                      Predict: 2.8942424089271652
                  Else (feature 5 > 16.75)
                   If (feature 3 <= 1.5)
                    If (feature 4 <= 5.515000104904175)
                     If (feature 2 in {2.0})
                      Predict: 3.337187506935813
                     Else (feature 2 not in {2.0})
                      Predict: 3.4049743559769055
                    Else (feature 4 > 5.515000104904175)
                     Predict: 3.8622495298074537
                   Else (feature 3 > 1.5)
                    If (feature 2 in {0.0})
                     If (feature 3 <= 5.5)
                      Predict: 3.58823555171601
                     Else (feature 3 > 5.5)
                      Predict: 3.604386948493053
                    Else (feature 2 not in {0.0})
                     Predict: 3.669304546406515
                Else (feature 0 not in {0.0})
                 Predict: 2.038488721629872
               Else (feature 1 not in {0.0,5.0})
                Predict: 4.060000008658359
              Else (feature 1 not in {0.0,5.0,4.0})
               If (feature 0 in {1.0})
                If (feature 1 in {3.0,2.0})
                 Predict: 6.892580268304356
                Else (feature 1 not in {3.0,2.0})
                 If (feature 3 <= 1.5)
                  If (feature 5 <= 40.25)
                   If (feature 4 <= 5.515000104904175)
                    Predict: 4.957272746346214
                   Else (feature 4 > 5.515000104904175)
                    Predict: 0.0
                  Else (feature 5 > 40.25)
                   If (feature 4 <= 1.1950000524520874)
                    Predict: 11.276021567724085
                   Else (feature 4 > 1.1950000524520874)
                    If (feature 4 <= 1.5049999952316284)
                     Predict: 6.041666666666667
                    Else (feature 4 > 1.5049999952316284)
                     If (feature 4 <= 1.6050000190734863)
                      Predict: 16.720000076293946
                     Else (feature 4 > 1.6050000190734863)
                      Predict: 10.075528524755462
                 Else (feature 3 > 1.5)
                  If (feature 3 <= 4.5)
                   If (feature 5 <= 15.25)
                    If (feature 5 <= 12.75)
                     If (feature 5 <= 10.25)
                      Predict: 2.5
                     Else (feature 5 > 10.25)
                      Predict: 2.4333333174387612
                    Else (feature 5 > 12.75)
                     Predict: 2.0
                   Else (feature 5 > 15.25)
                    If (feature 3 <= 2.5)
                     Predict: 8.97983058024261
                    Else (feature 3 > 2.5)
                     Predict: 10.856250047683716
                  Else (feature 3 > 4.5)
                   Predict: 11.550000190734863
               Else (feature 0 not in {1.0})
                If (feature 1 in {3.0})
                 If (feature 4 <= 4.0950000286102295)
                  Predict: 6.063990705060987
                 Else (feature 4 > 4.0950000286102295)
                  Predict: 7.766253213934812
                Else (feature 1 not in {3.0})
                 If (feature 5 <= 24.25)
                  Predict: 0.0
                 Else (feature 5 > 24.25)
                  If (feature 3 <= 5.5)
                   If (feature 4 <= 0.4749999940395355)
                    If (feature 3 <= 2.5)
                     Predict: 9.38347831229153
                    Else (feature 3 > 2.5)
                     If (feature 2 in {2.0})
                      Predict: 10.399999618530273
                     Else (feature 2 not in {2.0})
                      Predict: 10.601538511422964
                   Else (feature 4 > 0.4749999940395355)
                    If (feature 3 <= 4.5)
                     If (feature 4 <= 2.9950000047683716)
                      Predict: 10.686825464642236
                     Else (feature 4 > 2.9950000047683716)
                      Predict: 9.843191583105858
                    Else (feature 3 > 4.5)
                     Predict: 8.870000139872234
                  Else (feature 3 > 5.5)
                   If (feature 4 <= 1.1950000524520874)
                    If (feature 4 <= 0.4749999940395355)
                     Predict: 4.588000106811523
                    Else (feature 4 > 0.4749999940395355)
                     Predict: 5.0
                   Else (feature 4 > 1.1950000524520874)
                    Predict: 8.690000004238552
             Else (feature 4 > 6.855000019073486)
              If (feature 1 in {0.0})
               If (feature 5 <= 29.25)
                If (feature 5 <= 24.25)
                 If (feature 4 <= 12.700000286102295)
                  If (feature 5 <= 20.75)
                   If (feature 3 <= 4.5)
                    If (feature 5 <= 4.25)
                     Predict: 9.502499997615814
                    Else (feature 5 > 4.25)
                     If (feature 4 <= 9.095000267028809)
                      Predict: 3.490053741521733
                     Else (feature 4 > 9.095000267028809)
                      Predict: 2.706315819566187
                   Else (feature 3 > 4.5)
                    If (feature 3 <= 5.5)
                     Predict: 2.90838707647016
                    Else (feature 3 > 5.5)
                     Predict: 3.5709090449593286
                  Else (feature 5 > 20.75)
                   If (feature 3 <= 2.5)
                    If (feature 2 in {0.0})
                     Predict: 4.3256558679938415
                    Else (feature 2 not in {0.0})
                     Predict: 4.639636373519897
                   Else (feature 3 > 2.5)
                    Predict: 4.422383116593705
                 Else (feature 4 > 12.700000286102295)
                  Predict: 15.732222212685478
                Else (feature 5 > 24.25)
                 If (feature 4 <= 9.095000267028809)
                  If (feature 2 in {2.0})
                   Predict: 4.482749985158444
                  Else (feature 2 not in {2.0})
                   Predict: 5.0696129001382895
                 Else (feature 4 > 9.095000267028809)
                  If (feature 2 in {2.0})
                   If (feature 3 <= 3.5)
                    Predict: 5.138113246773774
                   Else (feature 3 > 3.5)
                    If (feature 3 <= 4.5)
                     Predict: 7.125
                    Else (feature 3 > 4.5)
                     If (feature 3 <= 5.5)
                      Predict: 6.146000003814697
                     Else (feature 3 > 5.5)
                      Predict: 5.576666673024495
                  Else (feature 2 not in {2.0})
                   If (feature 3 <= 5.5)
                    If (feature 4 <= 12.700000286102295)
                     If (feature 0 in {0.0})
                      Predict: 5.778268448217999
                     Else (feature 0 not in {0.0})
                      Predict: 5.893438140667234
                    Else (feature 4 > 12.700000286102295)
                     Predict: 5.599999904632568
                   Else (feature 3 > 5.5)
                    Predict: 5.643672916588771
               Else (feature 5 > 29.25)
                If (feature 5 <= 40.25)
                 If (feature 4 <= 9.095000267028809)
                  Predict: 5.962104745970343
                 Else (feature 4 > 9.095000267028809)
                  If (feature 3 <= 2.5)
                   If (feature 2 in {2.0})
                    Predict: 6.095136998450919
                   Else (feature 2 not in {2.0})
                    Predict: 6.8102424300669
                  Else (feature 3 > 2.5)
                   If (feature 2 in {2.0})
                    Predict: 5.9381632512929485
                   Else (feature 2 not in {2.0})
                    If (feature 3 <= 3.5)
                     If (feature 0 in {1.0})
                      Predict: 6.503041234544296
                     Else (feature 0 not in {1.0})
                      Predict: 6.54746464892
                    Else (feature 3 > 3.5)
                     If (feature 3 <= 5.5)
                      Predict: 6.7364598506622535
                     Else (feature 3 > 5.5)
                      Predict: 6.853111104641892
                Else (feature 5 > 40.25)
                 If (feature 4 <= 22.6850004196167)
                  If (feature 0 in {1.0})
                   If (feature 5 <= 72.75)
                    Predict: 8.326659051008436
                   Else (feature 5 > 72.75)
                    Predict: 16.552000045776367
                  Else (feature 0 not in {1.0})
                   If (feature 2 in {0.0})
                    If (feature 4 <= 9.095000267028809)
                     Predict: 7.0981817967963945
                    Else (feature 4 > 9.095000267028809)
                     If (feature 5 <= 72.75)
                      Predict: 8.444857087862369
                     Else (feature 5 > 72.75)
                      Predict: 7.5133334795633955
                   Else (feature 2 not in {0.0})
                    Predict: 8.833559278714455
                 Else (feature 4 > 22.6850004196167)
                  If (feature 3 <= 4.5)
                   If (feature 2 in {0.0})
                    If (feature 0 in {0.0})
                     If (feature 5 <= 72.75)
                      Predict: 10.985598044888825
                     Else (feature 5 > 72.75)
                      Predict: 16.703052651254755
                    Else (feature 0 not in {0.0})
                     Predict: 13.685059782993271
                   Else (feature 2 not in {0.0})
                    If (feature 3 <= 1.5)
                     Predict: 10.697499990463257
                    Else (feature 3 > 1.5)
                     Predict: 30.0
                  Else (feature 3 > 4.5)
                   If (feature 3 <= 5.5)
                    Predict: 23.337755067007883
                   Else (feature 3 > 5.5)
                    If (feature 5 <= 72.75)
                     Predict: 13.247894588269686
                    Else (feature 5 > 72.75)
                     Predict: 14.938461743868315
              Else (feature 1 not in {0.0})
               If (feature 4 <= 22.6850004196167)
                If (feature 5 <= 72.75)
                 If (feature 1 in {4.0,3.0,1.0,5.0})
                  If (feature 5 <= 40.25)
                   If (feature 4 <= 9.095000267028809)
                    If (feature 5 <= 24.25)
                     If (feature 0 in {1.0})
                      Predict: 4.550000190734863
                     Else (feature 0 not in {1.0})
                      Predict: 78.0
                    Else (feature 5 > 24.25)
                     If (feature 2 in {2.0})
                      Predict: 5.360000133514404
                     Else (feature 2 not in {2.0})
                      Predict: 5.378765472383411
                   Else (feature 4 > 9.095000267028809)
                    If (feature 4 <= 12.700000286102295)
                     If (feature 3 <= 2.5)
                      Predict: 4.912666738033295
                     Else (feature 3 > 2.5)
                      Predict: 0.0
                    Else (feature 4 > 12.700000286102295)
                     If (feature 3 <= 2.5)
                      Predict: 3.4209091013128106
                     Else (feature 3 > 2.5)
                      Predict: 0.5714285714285714
                  Else (feature 5 > 40.25)
                   If (feature 2 in {2.0})
                    Predict: 8.67522917756247
                   Else (feature 2 not in {2.0})
                    If (feature 3 <= 4.5)
                     If (feature 4 <= 12.700000286102295)
                      Predict: 9.472063548035091
                     Else (feature 4 > 12.700000286102295)
                      Predict: 9.939695658583425
                    Else (feature 3 > 4.5)
                     If (feature 1 in {3.0,1.0})
                      Predict: 10.098971078714026
                     Else (feature 1 not in {3.0,1.0})
                      Predict: 14.46363639831543
                 Else (feature 1 not in {4.0,3.0,1.0,5.0})
                  If (feature 3 <= 2.5)
                   Predict: 12.738350197213657
                  Else (feature 3 > 2.5)
                   If (feature 2 in {2.0})
                    Predict: 6.189999997615814
                   Else (feature 2 not in {2.0})
                    If (feature 3 <= 5.5)
                     If (feature 3 <= 4.5)
                      Predict: 13.98496065816776
                     Else (feature 3 > 4.5)
                      Predict: 13.266956474470055
                    Else (feature 3 > 5.5)
                     Predict: 12.217469922031265
                Else (feature 5 > 72.75)
                 If (feature 4 <= 12.700000286102295)
                  If (feature 1 in {3.0})
                   Predict: 11.309433982408834
                  Else (feature 1 not in {3.0})
                   Predict: 24.700000762939453
                 Else (feature 4 > 12.700000286102295)
                  If (feature 1 in {4.0,3.0})
                   Predict: 13.504989744494889
                  Else (feature 1 not in {4.0,3.0})
                   If (feature 3 <= 1.5)
                    If (feature 0 in {0.0})
                     Predict: 14.746363601299247
                    Else (feature 0 not in {0.0})
                     Predict: 15.970588252123665
                   Else (feature 3 > 1.5)
                    If (feature 3 <= 2.5)
                     Predict: 12.784246608002546
                    Else (feature 3 > 2.5)
                     Predict: 15.164252906010068
               Else (feature 4 > 22.6850004196167)
                If (feature 3 <= 1.5)
                 If (feature 1 in {1.0})
                  Predict: 10.551529178561056
                 Else (feature 1 not in {1.0})
                  Predict: 17.836814335574896
                Else (feature 3 > 1.5)
                 If (feature 1 in {1.0})
                  Predict: 10.511585485644456
                 Else (feature 1 not in {1.0})
                  If (feature 3 <= 3.5)
                   If (feature 5 <= 72.75)
                    If (feature 1 in {4.0})
                     Predict: 14.5
                    Else (feature 1 not in {4.0})
                     Predict: 18.84999942779541
                   Else (feature 5 > 72.75)
                    If (feature 3 <= 2.5)
                     If (feature 1 in {2.0,3.0})
                      Predict: 14.882812589406967
                     Else (feature 1 not in {2.0,3.0})
                      Predict: 21.15333340962728
                    Else (feature 3 > 2.5)
                     Predict: 15.500294124378877
                  Else (feature 3 > 3.5)
                   If (feature 1 in {3.0})
                    If (feature 0 in {0.0})
                     Predict: 10.0
                    Else (feature 0 not in {0.0})
                     If (feature 3 <= 4.5)
                      Predict: 17.5
                     Else (feature 3 > 4.5)
                      Predict: 15.0
                   Else (feature 1 not in {3.0})
                    Predict: 18.713809512910387
            ..............................................
            ..............................................
            ...............................................
    

Prediction and Evaluation on test data:

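        from pyspark.mllib.evaluation import RegressionMetrics

        # Predict on the feature vectors of the test set (returns an RDD of floats).
        predictions_Regression = rfModel_Regression.predict(indexedTESTRegression.map(lambda x: x.features))
        # RegressionMetrics expects (prediction, observation) pairs, so predictions go first.
        labels_Regression = indexedTESTRegression.map(lambda lp: lp.label)
        predictionAndLabels_Regression = predictions_Regression.zip(labels_Regression)

        # Compute the regression metrics on the test set.
        metrics_Regression = RegressionMetrics(predictionAndLabels_Regression)
        print("RMSE = %s" % metrics_Regression.rootMeanSquaredError)
        print("R-sqr = %s" % metrics_Regression.r2)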
    

Output:

        RMSE = 1.2202493008785438
        R-sqr = 0.5759095465267167
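
To make the two metrics concrete, the following is a minimal plain-Python sketch of the usual RMSE and R-squared definitions, applied to a few made-up (prediction, observation) pairs. The helper name `rmse_and_r2` and the toy values are illustrative assumptions and are not part of the Spark job. Because R-squared is measured against the variance of the observations, the order of each pair matters for it, while RMSE is the same either way.

```python
import math

def rmse_and_r2(prediction_and_observation):
    """Return (RMSE, R^2) for an iterable of (prediction, observation) pairs."""
    pairs = list(prediction_and_observation)
    n = len(pairs)
    mean_obs = sum(obs for _, obs in pairs) / n
    # Sum of squared prediction errors.
    ss_err = sum((obs - pred) ** 2 for pred, obs in pairs)
    # Total variation of the observations around their mean.
    ss_tot = sum((obs - mean_obs) ** 2 for _, obs in pairs)
    return math.sqrt(ss_err / n), 1.0 - ss_err / ss_tot

# Made-up toy values, (prediction, observation).
pairs = [(2.5, 3.0), (5.0, 5.0), (7.5, 7.0), (8.0, 9.0)]
rmse, r2 = rmse_and_r2(pairs)

# Swapping each pair leaves RMSE unchanged but changes R^2.
swapped_rmse, swapped_r2 = rmse_and_r2([(obs, pred) for pred, obs in pairs])
print(rmse, r2, swapped_rmse, swapped_r2)
```

An R-squared near 1 means the predictions explain most of the variation in the observed values, while RMSE is expressed in the same units as the label itself.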
    

The model's accuracy can likely be improved further with cross-validation and hyperparameter tuning.
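
In Spark, this kind of search is typically done with `CrossValidator` and `ParamGridBuilder` from `pyspark.ml.tuning`, which work with the DataFrame-based `pyspark.ml` estimators rather than the RDD-based model used above. The sketch below only illustrates the underlying procedure in plain Python: a k-fold split, one score per hyperparameter value, and selection of the best value. The toy nearest-neighbour model, the helper names, and the data are all hypothetical stand-ins, not the random forest from this post.

```python
import math

def k_fold_indices(n_rows, n_folds):
    """Split row indices 0..n_rows-1 into n_folds interleaved folds."""
    return [list(range(fold, n_rows, n_folds)) for fold in range(n_folds)]

def knn_predict(train, x, k):
    """Toy stand-in model: average the labels of the k nearest training rows."""
    nearest = sorted(train, key=lambda row: abs(row[0] - x))[:k]
    return sum(label for _, label in nearest) / len(nearest)

def cv_rmse(rows, k, n_folds=3):
    """Average held-out RMSE over the folds for one hyperparameter value."""
    fold_scores = []
    for held_out in k_fold_indices(len(rows), n_folds):
        held = set(held_out)
        train = [r for i, r in enumerate(rows) if i not in held]
        test = [rows[i] for i in held_out]
        sq_err = sum((y - knn_predict(train, x, k)) ** 2 for x, y in test)
        fold_scores.append(math.sqrt(sq_err / len(test)))
    return sum(fold_scores) / len(fold_scores)

# Made-up (feature, label) rows: a line with alternating noise.
rows = [(float(x), 2.0 * x + (1.0 if x % 2 else -1.0)) for x in range(12)]

# Hyperparameter grid: number of neighbours to average.
grid = [1, 2, 4]
scores = {k: cv_rmse(rows, k) for k in grid}
best_k = min(scores, key=scores.get)
print(scores, best_k)
```

For a random forest, the same loop would range over settings such as the number of trees and the maximum depth, with each candidate scored only on rows it was not trained on.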


© 2021 - VanellusIndicus