PySpark serves as the powerful Python API for Apache Spark, seamlessly bridging the gap between
Python's simplicity and Spark's distributed processing prowess. It empowers data scientists
and engineers to write robust, large-scale data applications using familiar Python libraries
while leveraging the full might of a Spark cluster.
Beyond application development, PySpark offers an interactive shell for real-time, exploratory
data analysis across distributed datasets, making it an indispensable tool for modern data
exploration.
PySpark supports Spark's main components, including Spark SQL and DataFrames, Structured Streaming, MLlib, and Spark Core.
This project develops predictive classification and regression models on a modern big data stack. We use Hadoop 3.3.0 for distributed storage and Spark 3.1.1 for in-memory processing. Our goal is to build and evaluate a random forest classification model and a random forest regression model on a classic dataset: the yellow_tripdata_2014-08.csv file, which contains records of New York City yellow taxi trips for August 2014.

If you are interested in the data, it is available as the 2014 Yellow NYC taxi trip Data. For classification, the task is to predict whether a tip will be paid for a given taxi trip. For regression, the task is to predict the expected tip amount for a given trip. Our Hadoop environment is a three-node cluster: one namenode and two datanodes.
We use Python 3.8.5 and PyCharm 2020.3 for this module.
Import libraries
We import the Spark, ML, and other libraries we need with the following lines of code:
import os
from pyspark.sql import SparkSession
from pyspark.ml.feature import StringIndexer, OneHotEncoder
from pyspark.mllib.evaluation import BinaryClassificationMetrics, RegressionMetrics
from pyspark.mllib.regression import LabeledPoint
from pyspark.mllib.tree import RandomForest
from pyspark.sql.types import *
import matplotlib.pyplot as plt
import numpy as np
os.environ['SPARK_HOME'] = '/usr/local/spark'
SPARK_HOME = os.environ['SPARK_HOME']
Data Exploration
First we ingest the data we want to analyze, bringing it from the external source where it resides into the data exploration and modeling environment, which here is Spark. Our Spark jobs read the dataset files from HDFS, so the files must be present there. To put them in HDFS, first bring the files onto the operating system (Linux in our case) and then copy them from Linux to HDFS, for example with hdfs dfs -put.
We first load the dataset with Apache Spark and look at the header, the total number of rows, and the first 5 rows. The code is shown below:
spark = SparkSession \
.builder\
.master("local[*]")\
.appName("NycApp")\
.getOrCreate()
dataRaw = spark.sparkContext\
.textFile("hdfs://master:9000/data/data/yellow_tripdata_2014-08.csv")
header = dataRaw.first()
Now we print the header of the dataset.
print(header)
Output:
vendor_id, pickup_datetime, dropoff_datetime, passenger_count, trip_distance, pickup_longitude, pickup_latitude, rate_code, store_and_fwd_flag, dropoff_longitude, dropoff_latitude, payment_type, fare_amount, surcharge, mta_tax, tip_amount, tolls_amount, total_amount
Total number of records in the dataset
print("Total Records : " + str(dataRaw.count()))
Output:
Total Records : 12688879
The code to print the first 5 rows of the dataset is as follows:
for x in dataRaw.take(5):
    print(x)
Output:
vendor_id, pickup_datetime, dropoff_datetime, passenger_count, trip_distance, pickup_longitude, pickup_latitude, rate_code, store_and_fwd_flag, dropoff_longitude, dropoff_latitude, payment_type, fare_amount, surcharge, mta_tax, tip_amount, tolls_amount, total_amount
CMT,2014-08-16 14:58:49,2014-08-16 15:15:59,1,2.7000000000000002,-73.946537000000006,40.776812999999997,1,N,-73.976192999999995,40.755625000000002,CSH,14,0,0.5,0,0,14.5
CMT,2014-08-16 08:10:48,2014-08-16 08:58:16,3,20.399999999999999,-73.776857000000007,40.645099000000002,1,Y,-73.916248999999993,40.837356999999997,CSH,58.5,0,0.5,0,5.3300000000000001,64.329999999999998
CMT,2014-08-16 09:44:07,2014-08-16 09:54:37,1,2.1000000000000001,-73.986585000000005,40.725847999999999,1,N,-73.977157000000005,40.751961000000001,CSH,9.5,0,0.5,0,0,10
As the rows printed above show, we loaded the dataset from an HDFS location and stored it in an RDD of strings. Fortunately, the dataset is relatively clean and has one row per data item, but it contains an empty row. Next, we remove the header, delete the empty row, and print 5 rows from the dataset again with the following code:
dataLines = dataRaw.filter(lambda x: x != header).filter(lambda y: y != "")
for y in dataLines.take(5):
    print(y)
Output:
CMT,2014-08-16 14:58:49,2014-08-16 15:15:59,1,2.7000000000000002,-73.946537000000006,40.776812999999997,1,N,-73.976192999999995,40.755625000000002,CSH,14,0,0.5,0,0,14.5
CMT,2014-08-16 08:10:48,2014-08-16 08:58:16,3,20.399999999999999,-73.776857000000007,40.645099000000002,1,Y,-73.916248999999993,40.837356999999997,CSH,58.5,0,0.5,0,5.3300000000000001,64.329999999999998
CMT,2014-08-16 09:44:07,2014-08-16 09:54:37,1,2.1000000000000001,-73.986585000000005,40.725847999999999,1,N,-73.977157000000005,40.751961000000001,CSH,9.5,0,0.5,0,0,10
CMT,2014-08-16 10:46:13,2014-08-16 10:51:25,1,1.3,-73.976290000000006,40.765231,1,N,-73.961484999999996,40.777889000000002,CSH,6,0,0.5,0,0,6.5
CMT,2014-08-16 09:27:23,2014-08-16 09:39:37,2,1.7,-73.995248000000004,40.754646000000001,1,Y,-73.995902999999998,40.769201000000002,CSH,10.5,0,0.5,0,0,11
The rows look fine. Next, we generate a schema from the column names in the header, cast the values according to the schema, create an initial dataframe, and display 10 rows of it, using the following code:
fields = [StructField(field_name, StringType(), True) for field_name in header.split(', ')]
fields[0].dataType = StringType() #vendor_id
fields[1].dataType = StringType() #pickup_datetime
fields[2].dataType = StringType() #dropoff_datetime
fields[3].dataType = FloatType() #passenger_count
fields[4].dataType = FloatType() # trip_distance
fields[5].dataType = FloatType() # pickup_longitude
fields[6].dataType = FloatType() # pickup_latitude
fields[7].dataType = FloatType() # rate_code
fields[8].dataType = StringType() # store_and_fwd_flag
fields[9].dataType = FloatType() # dropoff_longitude
fields[10].dataType = FloatType() # dropoff_latitude
fields[11].dataType = StringType() # payment_type
fields[12].dataType = FloatType() # fare_amount
fields[13].dataType = FloatType() # surcharge
fields[14].dataType = FloatType() # mta_tax
fields[15].dataType = FloatType() # tip_amount
fields[16].dataType = FloatType() # tolls_amount
fields[17].dataType = FloatType() # total_amount
schema = StructType(fields)
rowRDD = dataLines.map(lambda x: x.split(",")) \
    .map(lambda r: (r[0], r[1], r[2], float(r[3]), float(r[4]), float(r[5]), float(r[6]), float(r[7]), r[8],
                    float(r[9]), float(r[10]), r[11], float(r[12]), float(r[13]), float(r[14]), float(r[15]),
                    float(r[16]), float(r[17])))
dataDF = spark.createDataFrame(rowRDD, schema)
dataDF.show(10)
Output:
+---------+-------------------+-------------------+---------------+-------------+----------------+---------------+---------+------------------+-----------------+----------------+------------+-----------+---------+-------+----------+------------+------------+
|vendor_id| pickup_datetime| dropoff_datetime|passenger_count|trip_distance|pickup_longitude|pickup_latitude|rate_code|store_and_fwd_flag|dropoff_longitude|dropoff_latitude|payment_type|fare_amount|surcharge|mta_tax|tip_amount|tolls_amount|total_amount|
+---------+-------------------+-------------------+---------------+-------------+----------------+---------------+---------+------------------+-----------------+----------------+------------+-----------+---------+-------+----------+------------+------------+
| CMT|2014-08-16 14:58:49|2014-08-16 15:15:59| 1.0| 2.7| -73.946537| 40.776813| 1.0| N| -73.976193| 40.755625| CSH| 14.0| 0.0| 0.5| 0.0| 0.0| 14.5|
| CMT|2014-08-16 08:10:48|2014-08-16 08:58:16| 3.0| 20.4| -73.776857| 40.645099| 1.0| Y| -73.916249| 40.837357| CSH| 58.5| 0.0| 0.5| 0.0| 5.33| 64.33|
| CMT|2014-08-16 09:44:07|2014-08-16 09:54:37| 1.0| 2.1| -73.986585| 40.725848| 1.0| N| -73.977157| 40.751961| CSH| 9.5| 0.0| 0.5| 0.0| 0.0| 10.0|
| CMT|2014-08-16 10:46:13|2014-08-16 10:51:25| 1.0| 1.3| -73.97629| 40.765231| 1.0| N| -73.961485| 40.777889| CSH| 6.0| 0.0| 0.5| 0.0| 0.0| 6.5|
| CMT|2014-08-16 09:27:23|2014-08-16 09:39:37| 2.0| 1.7| -73.995248| 40.754646| 1.0| Y| -73.995903| 40.769201| CSH| 10.5| 0.0| 0.5| 0.0| 0.0| 11.0|
| CMT|2014-08-16 14:14:16|2014-08-16 14:25:33| 2.0| 1.7| -73.991535| 40.759863| 1.0| N| -74.005722| 40.737558| CSH| 10.0| 0.0| 0.5| 0.0| 0.0| 10.5|
| CMT|2014-08-16 15:55:16|2014-08-16 16:00:10| 1.0| 1.0| -73.972307| 40.794076| 1.0| N| -73.963865| 40.807858| CSH| 6.0| 0.0| 0.5| 0.0| 0.0| 6.5|
| CMT|2014-08-16 14:08:29|2014-08-16 14:32:03| 1.0| 9.2| -73.967338| 40.766009| 1.0| N| -73.872972| 40.774487| CSH| 28.5| 0.0| 0.5| 0.0| 0.0| 29.0|
| CMT|2014-08-16 11:11:21|2014-08-16 11:23:48| 1.0| 2.6| -73.973775| 40.794591| 1.0| N| -73.970561| 40.768086| CSH| 11.5| 0.0| 0.5| 0.0| 0.0| 12.0|
| CMT|2014-08-16 07:44:56|2014-08-16 07:49:26| 1.0| 1.4| -73.98636| 40.737913| 1.0| N| -73.977117| 40.751126| CSH| 6.0| 0.0| 0.5| 0.0| 0.0| 6.5|
+---------+-------------------+-------------------+---------------+-------------+----------------+---------------+---------+------------------+-----------------+----------------+------------+-----------+---------+-------+----------+------------+------------+
only showing top 10 rows
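The split-and-cast step above can be tried on a single line without a cluster. The sketch below is plain Python rather than Spark; `parse_trip_line` and `NUMERIC_COLUMNS` are hypothetical names introduced here for illustration, and the sample line is the first data row printed earlier.

```python
# Column positions that hold numeric values in the 18-column header shown earlier.
NUMERIC_COLUMNS = {3, 4, 5, 6, 7, 9, 10, 12, 13, 14, 15, 16, 17}

def parse_trip_line(line):
    """Split one CSV line and cast the numeric columns to float."""
    parts = line.split(",")
    if len(parts) != 18:
        raise ValueError("expected 18 fields, got %d" % len(parts))
    return tuple(float(v) if i in NUMERIC_COLUMNS else v
                 for i, v in enumerate(parts))

sample = ("CMT,2014-08-16 14:58:49,2014-08-16 15:15:59,1,2.7000000000000002,"
          "-73.946537000000006,40.776812999999997,1,N,-73.976192999999995,"
          "40.755625000000002,CSH,14,0,0.5,0,0,14.5")
row = parse_trip_line(sample)
print(row[0], row[3], row[4], row[11], row[17])
```

A field-count check like this one surfaces malformed lines early; whether the real file contains any is not something this sketch can tell.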
We are not interested in all the columns of the dataset. We create a cleaned data frame by dropping unwanted columns and filtering out unwanted values, register it as a temporary view for Spark SQL, and cache and materialize it in memory.
dataDF_cleaned = dataDF.drop('store_and_fwd_flag').drop('pickup_datetime').drop('dropoff_datetime').drop('pickup_longitude')\
.drop('pickup_latitude').drop('dropoff_longitude').drop('dropoff_latitude').drop('surcharge')\
.drop('mta_tax').drop('tolls_amount').drop('total_amount')\
.filter("passenger_count > 0 AND fare_amount >= 1 AND trip_distance > 0")
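dataDF_cleaned.createOrReplaceTempView("tempView")
dataDF_cleaned.cache()  # cache() must be called; it only marks the data frame for caching
dataDF_cleaned.count()  # an action is needed to actually materialize the cache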
Data Visualization
In this section, we examine the data with SQL queries and convert the results to pandas data frames so that we can plot the target variables and prospective features for visual inspection.
Counts of trips by passenger
plotDF1 = spark.sql("Select passenger_count, COUNT(*) AS trip_counts " +
"FROM tempView " +
"WHERE passenger_count > 0 and passenger_count < 7 " +
"GROUP BY passenger_count Order by passenger_count")
plotDF1P = plotDF1.toPandas()
x_labels = plotDF1P['passenger_count'].values
fig = plotDF1P['trip_counts'].plot(kind='bar', facecolor='lightblue')
fig.set_xticklabels(x_labels)
fig.set_title('Counts of trips by Passenger count')
fig.set_xlabel('Passenger count in Trips')
fig.set_ylabel('Trip Counts')
plt.show()
Output:
SQL Query and Data frame:
plotDF2 = spark.sql("SELECT fare_amount, passenger_count, tip_amount " +
"FROM tempView " +
"WHERE passenger_count > 0 AND passenger_count < 7 AND " +
"fare_amount > 0 AND fare_amount < 200 AND payment_type in ('CSH', 'CRD') AND " +
"tip_amount > 0 AND tip_amount < 25")
#plotDF2.show()
plotDF2P = plotDF2.toPandas()
Histogram of tip amount
ax1 = plotDF2P[['tip_amount']].plot(kind='hist', bins=25, facecolor='lightblue')
ax1.set_title('Tip amount distribution')
ax1.set_xlabel('Tip Amount ($)')
ax1.set_ylabel('Counts')
plt.suptitle('')
plt.show()
Output:
Relationship between tip amount and Passenger Count
ax2 = plotDF2P.boxplot(column=['tip_amount'], by=['passenger_count'])
ax2.set_title('Tip amount by Passenger count')
ax2.set_xlabel('Passenger count')
ax2.set_ylabel('Tip Amount ($)')
plt.suptitle('')
plt.show()
Output:
Relationship between tip amount and Fare Amount
ax = plotDF2P.plot(kind='scatter', x= 'fare_amount', y = 'tip_amount', c='blue', alpha = 0.01, s=2*(plotDF2P.passenger_count))
ax.set_title('Tip amount by Fare amount')
ax.set_xlabel('Fare Amount ($)')
ax.set_ylabel('Tip Amount ($)')
plt.axis([-2, 80, -2, 20])
plt.show()
Output:
Feature engineering, transformation and data preparation for modeling
Next, we create a new feature, tipped, which is 1 if tip_amount is greater than zero and 0 otherwise. We build a classifier with this target value later on.
sqlQuery = "SELECT *, CASE WHEN tip_amount > 0 THEN CAST(1.0 as Double) ELSE CAST(0.0 as Double) END AS tipped FROM tempView"
data_NewFeature = spark.sql(sqlQuery)
data_NewFeature.show()
Output:
+---------+---------------+-------------+---------+------------+-----------+----------+------+
|vendor_id|passenger_count|trip_distance|rate_code|payment_type|fare_amount|tip_amount|tipped|
+---------+---------------+-------------+---------+------------+-----------+----------+------+
| CMT| 1.0| 2.7| 1.0| CSH| 14.0| 0.0| 0.0|
| CMT| 3.0| 20.4| 1.0| CSH| 58.5| 0.0| 0.0|
| CMT| 1.0| 2.1| 1.0| CSH| 9.5| 0.0| 0.0|
| CMT| 1.0| 1.3| 1.0| CSH| 6.0| 0.0| 0.0|
| CMT| 2.0| 1.7| 1.0| CSH| 10.5| 0.0| 0.0|
| CMT| 2.0| 1.7| 1.0| CSH| 10.0| 0.0| 0.0|
| CMT| 1.0| 1.0| 1.0| CSH| 6.0| 0.0| 0.0|
| CMT| 1.0| 9.2| 1.0| CSH| 28.5| 0.0| 0.0|
| CMT| 1.0| 2.6| 1.0| CSH| 11.5| 0.0| 0.0|
| CMT| 1.0| 1.4| 1.0| CSH| 6.0| 0.0| 0.0|
| CMT| 4.0| 3.2| 1.0| CSH| 13.0| 0.0| 0.0|
| CMT| 1.0| 7.8| 1.0| CSH| 25.0| 0.0| 0.0|
| CMT| 1.0| 1.1| 1.0| CSH| 5.5| 0.0| 0.0|
| CMT| 1.0| 3.3| 1.0| CSH| 15.5| 0.0| 0.0|
| CMT| 1.0| 5.3| 1.0| CSH| 19.5| 0.0| 0.0|
| CMT| 1.0| 6.2| 1.0| CSH| 19.5| 0.0| 0.0|
| CMT| 1.0| 15.6| 2.0| CSH| 52.0| 0.0| 0.0|
| CMT| 2.0| 0.9| 1.0| CSH| 6.0| 0.0| 0.0|
| CMT| 1.0| 1.4| 1.0| CSH| 9.0| 0.0| 0.0|
| CMT| 2.0| 1.2| 1.0| CSH| 7.0| 0.0| 0.0|
+---------+---------------+-------------+---------+------------+-----------+----------+------+
only showing top 20 rows
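The CASE WHEN rule behind the tipped column can be written as a one-line function. This is a plain-Python sketch, not part of the Spark job; `tipped_label` is a hypothetical name and the tip amounts are made up for illustration.

```python
def tipped_label(tip_amount):
    """Return 1.0 when a tip was paid, else 0.0 (same rule as the CASE WHEN)."""
    return 1.0 if tip_amount > 0 else 0.0

# Made-up tip amounts, for illustration only.
tips = [0.0, 2.5, 0.0, 1.0]
labels = [tipped_label(t) for t in tips]

# Share of trips with a tip; useful as a quick check of class balance.
share_tipped = sum(labels) / len(labels)
print(labels, share_tipped)
```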
Next, we compute the count, mean, standard deviation, minimum, and maximum of the numeric columns, which gives a general idea of the range of values. Spark SQL provides a handy describe method that calculates these statistics.
data_NewFeature.describe("passenger_count","trip_distance","rate_code","fare_amount","tip_amount").show()
Output:
+-------+------------------+------------------+-------------------+------------------+------------------+
|summary| passenger_count| trip_distance| rate_code| fare_amount| tip_amount|
+-------+------------------+------------------+-------------------+------------------+------------------+
| count| 12612051| 12612051| 12612051| 12612051| 12612051|
| mean| 1.711963105762893| 3.081084886986297| 1.0316361708337525|12.782634287634899|1.4699661752082924|
| stddev|1.3616351834672402|3.6140504506213844|0.28838467968053266|10.461028944648847| 2.272277866399201|
| min| 1.0| 0.01| 0.0| 2.5| 0.0|
| max| 9.0| 100.0| 221.0| 500.0| 200.0|
+-------+------------------+------------------+-------------------+------------------+------------------+
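To make the summary rows concrete, the same five statistics can be computed in plain Python for a handful of values. The sketch below uses the fare amounts of the first five rows shown earlier; `describe` here is a hypothetical local helper, not the Spark method. The stddev row reported by Spark is the sample standard deviation, which is what `statistics.stdev` computes.

```python
import statistics

def describe(values):
    """Return count, mean, sample stddev, min and max for a list of numbers."""
    return {
        "count": len(values),
        "mean": statistics.mean(values),
        "stddev": statistics.stdev(values),  # sample standard deviation (n - 1)
        "min": min(values),
        "max": max(values),
    }

# Fare amounts of the first five rows shown earlier.
fares = [14.0, 58.5, 9.5, 6.0, 10.5]
summary = describe(fares)
print(summary)
```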
The modeling functions in ML and MLlib require the target and features to be prepared with techniques such as indexing, one-hot encoding, and vectorization. This section walks through that preparation.
The dataset contains categorical fields: vendor_id, rate_code, and payment_type. Because the models operate only on numerical values, we convert these fields into indexed fields with StringIndexer(). Here is the code to index the categorical features:
vendor_idIndexer = StringIndexer()\
.setInputCol("vendor_id")\
.setOutputCol("vendor_idIndex")
Indexedvendor_id = vendor_idIndexer.fit(data_NewFeature).transform(data_NewFeature)
#Indexedvendor_id.show()
rate_codeIndexer = StringIndexer()\
.setInputCol("rate_code")\
.setOutputCol("rate_codeIndex")
Indexedrate_code = rate_codeIndexer.fit(Indexedvendor_id).transform(Indexedvendor_id)
payment_typeIndexer = StringIndexer()\
.setInputCol("payment_type")\
.setOutputCol("payment_typeIndex")
IndexedFinal = payment_typeIndexer.fit(Indexedrate_code).transform(Indexedrate_code)
IndexedFinal.show()
Output:
+---------+---------------+-------------+---------+------------+-----------+----------+------+--------------+--------------+-----------------+
|vendor_id|passenger_count|trip_distance|rate_code|payment_type|fare_amount|tip_amount|tipped|vendor_idIndex|rate_codeIndex|payment_typeIndex|
+---------+---------------+-------------+---------+------------+-----------+----------+------+--------------+--------------+-----------------+
| CMT| 1.0| 2.7| 1.0| CSH| 14.0| 0.0| 0.0| 1.0| 0.0| 1.0|
| CMT| 3.0| 20.4| 1.0| CSH| 58.5| 0.0| 0.0| 1.0| 0.0| 1.0|
| CMT| 1.0| 2.1| 1.0| CSH| 9.5| 0.0| 0.0| 1.0| 0.0| 1.0|
| CMT| 1.0| 1.3| 1.0| CSH| 6.0| 0.0| 0.0| 1.0| 0.0| 1.0|
| CMT| 2.0| 1.7| 1.0| CSH| 10.5| 0.0| 0.0| 1.0| 0.0| 1.0|
| CMT| 2.0| 1.7| 1.0| CSH| 10.0| 0.0| 0.0| 1.0| 0.0| 1.0|
| CMT| 1.0| 1.0| 1.0| CSH| 6.0| 0.0| 0.0| 1.0| 0.0| 1.0|
| CMT| 1.0| 9.2| 1.0| CSH| 28.5| 0.0| 0.0| 1.0| 0.0| 1.0|
| CMT| 1.0| 2.6| 1.0| CSH| 11.5| 0.0| 0.0| 1.0| 0.0| 1.0|
| CMT| 1.0| 1.4| 1.0| CSH| 6.0| 0.0| 0.0| 1.0| 0.0| 1.0|
| CMT| 4.0| 3.2| 1.0| CSH| 13.0| 0.0| 0.0| 1.0| 0.0| 1.0|
| CMT| 1.0| 7.8| 1.0| CSH| 25.0| 0.0| 0.0| 1.0| 0.0| 1.0|
| CMT| 1.0| 1.1| 1.0| CSH| 5.5| 0.0| 0.0| 1.0| 0.0| 1.0|
| CMT| 1.0| 3.3| 1.0| CSH| 15.5| 0.0| 0.0| 1.0| 0.0| 1.0|
| CMT| 1.0| 5.3| 1.0| CSH| 19.5| 0.0| 0.0| 1.0| 0.0| 1.0|
| CMT| 1.0| 6.2| 1.0| CSH| 19.5| 0.0| 0.0| 1.0| 0.0| 1.0|
| CMT| 1.0| 15.6| 2.0| CSH| 52.0| 0.0| 0.0| 1.0| 1.0| 1.0|
| CMT| 2.0| 0.9| 1.0| CSH| 6.0| 0.0| 0.0| 1.0| 0.0| 1.0|
| CMT| 1.0| 1.4| 1.0| CSH| 9.0| 0.0| 0.0| 1.0| 0.0| 1.0|
| CMT| 2.0| 1.2| 1.0| CSH| 7.0| 0.0| 0.0| 1.0| 0.0| 1.0|
+---------+---------------+-------------+---------+------------+-----------+----------+------+--------------+--------------+-----------------+
only showing top 20 rows
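By default, StringIndexer orders labels by descending frequency, so the most frequent value gets index 0.0. That is why CMT and CSH both map to 1.0 above: each is the second most frequent value in its column. The sketch below imitates that ordering in plain Python; `frequency_index` is a hypothetical name, the data is made up, and the alphabetical tie-break is an assumption of this sketch.

```python
from collections import Counter

def frequency_index(values):
    """Map each distinct label to an index, most frequent label first."""
    counts = Counter(values)
    # Sort by descending count; ties are broken alphabetically (an assumption here).
    ordered = sorted(counts, key=lambda label: (-counts[label], label))
    return {label: float(i) for i, label in enumerate(ordered)}

# Made-up payment types, for illustration only.
payments = ["CRD", "CSH", "CRD", "CRD", "CSH", "NOC"]
mapping = frequency_index(payments)
indexed = [mapping[p] for p in payments]
print(mapping)
print(indexed)
```

Because the index depends on frequencies in the fitted data, the same label can receive a different index if the indexer is fitted on a different sample.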
Functions for classification and regression:
def parseRowIndexingClassification(line):
    features = np.array([line.vendor_idIndex, line.rate_codeIndex, line.payment_typeIndex,
                         line.passenger_count, line.trip_distance, line.fare_amount])
    labPt = LabeledPoint(line.tipped, features)
    return labPt
def parseRowIndexingRegression(line):
    features = np.array([line.vendor_idIndex, line.rate_codeIndex, line.payment_typeIndex,
                         line.passenger_count, line.trip_distance, line.fare_amount])
    labPt = LabeledPoint(line.tip_amount, features)
    return labPt
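Now, we create a random sample of the data (25% is used here), which saves time while training models. Then we split it into train and test sets and create indexed train/test LabeledPoint objects as input for MLlib classification and regression modeling.
samplingFraction = 0.25  # 25% sample, as stated above
trainingFraction, testingFraction = 0.75, 0.25  # assumed split; the original values are not shown
seed = 1234  # assumed; any fixed seed makes the sample and split reproducible
FinalSampled = IndexedFinal.sample(False, samplingFraction, seed=seed)  # assumed to sample IndexedFinal
trainData, testData = FinalSampled.randomSplit([trainingFraction, testingFraction], seed=seed)
#print("Train : " + str(trainData.count()) + " test : " + str(testData.count()))
indexedTRAINClassification = trainData.rdd.map(parseRowIndexingClassification)
indexedTESTClassification = testData.rdd.map(parseRowIndexingClassification)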
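A weighted random split assigns every row to one of the splits by an independent random draw, so the resulting sizes are only approximately proportional to the weights. The plain-Python sketch below illustrates the idea; `random_split` is a hypothetical helper, it is not how Spark implements randomSplit internally, and the weights and seed are illustrative values.

```python
import random

def random_split(rows, weights, seed):
    """Assign each row to one split with probability proportional to weights."""
    rng = random.Random(seed)
    total = float(sum(weights))
    splits = [[] for _ in weights]
    for row in rows:
        draw = rng.random() * total
        cumulative = 0.0
        for i, w in enumerate(weights):
            cumulative += w
            if draw < cumulative:
                splits[i].append(row)
                break
        else:
            splits[-1].append(row)  # guard against floating-point edge cases
    return splits

rows = list(range(1000))
train, test = random_split(rows, [0.75, 0.25], seed=1234)
print(len(train), len(test))
```

Fixing the seed makes the split repeatable, which matters when the classification and regression models are meant to be compared on the same rows.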
Finally, we train the random forest classification model, specifying the number of categories for the categorical features, and print the trees.
categoricalFeaturesInfo={0:2, 1:12, 2:5}
rfModel_Classification = RandomForest.trainClassifier(indexedTRAINClassification, numClasses=2,
categoricalFeaturesInfo=categoricalFeaturesInfo,
numTrees=25, featureSubsetStrategy="auto",
impurity='gini', maxDepth=5, maxBins=32)
print('Learned classification forest model:')
print(rfModel_Classification.toDebugString())
Output:
Learned classification forest model:
TreeEnsembleModel classifier with 25 trees
Tree 0:
If (feature 2 in {1.0,3.0,4.0})
Predict: 0.0
Else (feature 2 not in {1.0,3.0,4.0})
Predict: 1.0
Tree 1:
If (feature 2 in {1.0,3.0,4.0})
Predict: 0.0
Else (feature 2 not in {1.0,3.0,4.0})
Predict: 1.0
Tree 2:
If (feature 2 in {1.0,3.0,4.0})
Predict: 0.0
Else (feature 2 not in {1.0,3.0,4.0})
If (feature 0 in {1.0})
If (feature 5 <= 20.25)
If (feature 4 <= 0.48499999940395355)
If (feature 1 in {1.0})
Predict: 0.0
Else (feature 1 not in {1.0})
Predict: 1.0
Else (feature 4 > 0.48499999940395355)
Predict: 1.0
Else (feature 5 > 20.25)
Predict: 1.0
Else (feature 0 not in {1.0})
Predict: 1.0
Tree 3:
If (feature 2 in {1.0,3.0,4.0})
Predict: 0.0
Else (feature 2 not in {1.0,3.0,4.0})
If (feature 1 in {6.0,5.0,4.0,3.0})
If (feature 1 in {6.0,5.0,4.0})
Predict: 1.0
Else (feature 1 not in {6.0,5.0,4.0})
If (feature 4 <= 0.8449999988079071)
If (feature 4 <= 0.7950000166893005)
Predict: 1.0
Else (feature 4 > 0.7950000166893005)
Predict: 0.0
Else (feature 4 > 0.8449999988079071)
Predict: 1.0
Else (feature 1 not in {6.0,5.0,4.0,3.0})
Predict: 1.0
Tree 4:
If (feature 5 <= 5.75)
If (feature 3 <= 1.5)
If (feature 2 in {1.0,3.0,4.0})
Predict: 0.0
Else (feature 2 not in {1.0,3.0,4.0})
Predict: 1.0
Else (feature 3 > 1.5)
If (feature 4 <= 0.48499999940395355)
If (feature 2 in {1.0,3.0,4.0})
Predict: 0.0
Else (feature 2 not in {1.0,3.0,4.0})
Predict: 1.0
Else (feature 4 > 0.48499999940395355)
If (feature 2 in {1.0,3.0,4.0})
Predict: 0.0
Else (feature 2 not in {1.0,3.0,4.0})
Predict: 1.0
Else (feature 5 > 5.75)
If (feature 3 <= 1.5)
If (feature 4 <= 1.6950000524520874)
If (feature 0 in {0.0})
If (feature 2 in {1.0})
Predict: 0.0
Else (feature 2 not in {1.0})
Predict: 1.0
Else (feature 0 not in {0.0})
If (feature 2 in {1.0,3.0,4.0})
Predict: 0.0
Else (feature 2 not in {1.0,3.0,4.0})
Predict: 1.0
Else (feature 4 > 1.6950000524520874)
Predict: 1.0
Else (feature 3 > 1.5)
If (feature 2 in {1.0,3.0,4.0})
Predict: 0.0
Else (feature 2 not in {1.0,3.0,4.0})
Predict: 1.0
Tree 5:
If (feature 4 <= 1.1449999809265137)
If (feature 2 in {1.0,3.0,4.0})
Predict: 0.0
Else (feature 2 not in {1.0,3.0,4.0})
If (feature 5 <= 20.25)
Predict: 1.0
Else (feature 5 > 20.25)
If (feature 5 <= 59.75)
If (feature 2 in {2.0})
Predict: 0.0
Else (feature 2 not in {2.0})
Predict: 1.0
Else (feature 5 > 59.75)
Predict: 1.0
Else (feature 4 > 1.1449999809265137)
If (feature 3 <= 1.5)
If (feature 4 <= 1.6950000524520874)
If (feature 2 in {1.0,3.0,4.0})
Predict: 0.0
Else (feature 2 not in {1.0,3.0,4.0})
Predict: 1.0
Else (feature 4 > 1.6950000524520874)
If (feature 2 in {1.0,3.0,4.0})
Predict: 0.0
Else (feature 2 not in {1.0,3.0,4.0})
Predict: 1.0
Else (feature 3 > 1.5)
If (feature 3 <= 4.5)
If (feature 0 in {1.0})
If (feature 2 in {1.0,3.0,4.0})
Predict: 0.0
Else (feature 2 not in {1.0,3.0,4.0})
Predict: 1.0
Else (feature 0 not in {1.0})
If (feature 2 in {1.0})
Predict: 0.0
Else (feature 2 not in {1.0})
Predict: 1.0
Else (feature 3 > 4.5)
If (feature 4 <= 2.084999918937683)
If (feature 0 in {1.0})
Predict: 0.0
Else (feature 0 not in {1.0})
Predict: 1.0
Else (feature 4 > 2.084999918937683)
Predict: 1.0
Tree 6:
If (feature 2 in {1.0,3.0,4.0})
Predict: 0.0
Else (feature 2 not in {1.0,3.0,4.0})
If (feature 0 in {1.0})
If (feature 1 in {6.0,5.0,4.0,3.0})
If (feature 4 <= 0.5949999988079071)
If (feature 4 <= 0.48499999940395355)
Predict: 1.0
Else (feature 4 > 0.48499999940395355)
Predict: 0.0
Else (feature 4 > 0.5949999988079071)
Predict: 1.0
Else (feature 1 not in {6.0,5.0,4.0,3.0})
Predict: 1.0
Else (feature 0 not in {1.0})
Predict: 1.0
Tree 7:
If (feature 2 in {1.0,3.0,4.0})
Predict: 0.0
Else (feature 2 not in {1.0,3.0,4.0})
If (feature 1 in {5.0,4.0,3.0})
If (feature 4 <= 0.5949999988079071)
If (feature 3 <= 2.5)
Predict: 1.0
Else (feature 3 > 2.5)
If (feature 5 <= 59.75)
Predict: 1.0
Else (feature 5 > 59.75)
Predict: 0.0
Else (feature 4 > 0.5949999988079071)
Predict: 1.0
Else (feature 1 not in {5.0,4.0,3.0})
Predict: 1.0
Tree 8:
If (feature 2 in {1.0,3.0,4.0})
Predict: 0.0
Else (feature 2 not in {1.0,3.0,4.0})
Predict: 1.0
Tree 9:
If (feature 2 in {1.0,3.0,4.0})
If (feature 4 <= 9.224999904632568)
Predict: 0.0
Else (feature 4 > 9.224999904632568)
If (feature 1 in {4.0,5.0,0.0})
Predict: 0.0
Else (feature 1 not in {4.0,5.0,0.0})
If (feature 5 <= 40.75)
If (feature 1 in {2.0,3.0})
Predict: 0.0
Else (feature 1 not in {2.0,3.0})
Predict: 1.0
Else (feature 5 > 40.75)
Predict: 0.0
Else (feature 2 not in {1.0,3.0,4.0})
If (feature 1 in {5.0,4.0,3.0})
If (feature 3 <= 4.5)
Predict: 1.0
Else (feature 3 > 4.5)
If (feature 0 in {1.0})
If (feature 4 <= 0.7950000166893005)
Predict: 0.0
Else (feature 4 > 0.7950000166893005)
Predict: 1.0
Else (feature 0 not in {1.0})
Predict: 1.0
Else (feature 1 not in {5.0,4.0,3.0})
Predict: 1.0
Tree 10:
If (feature 2 in {1.0,3.0,4.0})
Predict: 0.0
Else (feature 2 not in {1.0,3.0,4.0})
If (feature 1 in {5.0,4.0,3.0})
If (feature 4 <= 0.5949999988079071)
If (feature 3 <= 3.5)
Predict: 1.0
Else (feature 3 > 3.5)
Predict: 0.0
Else (feature 4 > 0.5949999988079071)
Predict: 1.0
Else (feature 1 not in {5.0,4.0,3.0})
Predict: 1.0
Tree 11:
If (feature 2 in {1.0,3.0,4.0})
If (feature 1 in {5.0,1.0,6.0,9.0,2.0,3.0,4.0})
If (feature 5 <= 40.75)
If (feature 3 <= 3.5)
Predict: 0.0
Else (feature 3 > 3.5)
If (feature 1 in {2.0,3.0,4.0})
Predict: 0.0
Else (feature 1 not in {2.0,3.0,4.0})
Predict: 1.0
Else (feature 5 > 40.75)
Predict: 0.0
Else (feature 1 not in {5.0,1.0,6.0,9.0,2.0,3.0,4.0})
Predict: 0.0
Else (feature 2 not in {1.0,3.0,4.0})
If (feature 0 in {1.0})
If (feature 1 in {5.0,4.0,3.0})
If (feature 4 <= 0.7950000166893005)
If (feature 3 <= 4.5)
Predict: 1.0
Else (feature 3 > 4.5)
Predict: 0.0
Else (feature 4 > 0.7950000166893005)
Predict: 1.0
Else (feature 1 not in {5.0,4.0,3.0})
Predict: 1.0
Else (feature 0 not in {1.0})
Predict: 1.0
Tree 12:
If (feature 2 in {1.0,3.0,4.0})
Predict: 0.0
Else (feature 2 not in {1.0,3.0,4.0})
If (feature 1 in {6.0,5.0,4.0,3.0})
If (feature 4 <= 1.6950000524520874)
If (feature 1 in {5.0,4.0})
Predict: 1.0
Else (feature 1 not in {5.0,4.0})
If (feature 5 <= 5.25)
Predict: 0.0
Else (feature 5 > 5.25)
Predict: 1.0
Else (feature 4 > 1.6950000524520874)
Predict: 1.0
Else (feature 1 not in {6.0,5.0,4.0,3.0})
Predict: 1.0
Tree 13:
If (feature 4 <= 1.1449999809265137)
If (feature 4 <= 0.5949999988079071)
If (feature 2 in {1.0,3.0,4.0})
Predict: 0.0
Else (feature 2 not in {1.0,3.0,4.0})
Predict: 1.0
Else (feature 4 > 0.5949999988079071)
If (feature 3 <= 1.5)
If (feature 5 <= 5.25)
If (feature 2 in {1.0,3.0,4.0})
Predict: 0.0
Else (feature 2 not in {1.0,3.0,4.0})
Predict: 1.0
Else (feature 5 > 5.25)
If (feature 4 <= 0.8449999988079071)
Predict: 0.0
Else (feature 4 > 0.8449999988079071)
Predict: 1.0
Else (feature 3 > 1.5)
If (feature 0 in {1.0})
If (feature 2 in {1.0,3.0,4.0})
Predict: 0.0
Else (feature 2 not in {1.0,3.0,4.0})
Predict: 1.0
Else (feature 0 not in {1.0})
If (feature 4 <= 0.7950000166893005)
Predict: 0.0
Else (feature 4 > 0.7950000166893005)
Predict: 1.0
Else (feature 4 > 1.1449999809265137)
If (feature 2 in {1.0,3.0,4.0})
Predict: 0.0
Else (feature 2 not in {1.0,3.0,4.0})
Predict: 1.0
Tree 14:
If (feature 2 in {1.0,3.0,4.0})
Predict: 0.0
Else (feature 2 not in {1.0,3.0,4.0})
Predict: 1.0
Tree 15:
If (feature 2 in {1.0,3.0,4.0})
Predict: 0.0
Else (feature 2 not in {1.0,3.0,4.0})
Predict: 1.0
Tree 16:
If (feature 4 <= 1.1449999809265137)
If (feature 2 in {1.0,3.0,4.0})
Predict: 0.0
Else (feature 2 not in {1.0,3.0,4.0})
If (feature 0 in {1.0})
If (feature 5 <= 20.25)
Predict: 1.0
Else (feature 5 > 20.25)
If (feature 5 <= 40.75)
Predict: 0.0
Else (feature 5 > 40.75)
Predict: 1.0
Else (feature 0 not in {1.0})
Predict: 1.0
Else (feature 4 > 1.1449999809265137)
If (feature 2 in {1.0,3.0,4.0})
Predict: 0.0
Else (feature 2 not in {1.0,3.0,4.0})
Predict: 1.0
Tree 17:
If (feature 2 in {1.0,3.0,4.0})
Predict: 0.0
Else (feature 2 not in {1.0,3.0,4.0})
Predict: 1.0
Tree 18:
If (feature 2 in {1.0,3.0,4.0})
Predict: 0.0
Else (feature 2 not in {1.0,3.0,4.0})
If (feature 0 in {1.0})
If (feature 1 in {5.0,4.0,3.0})
If (feature 4 <= 3.5749999284744263)
If (feature 5 <= 4.75)
Predict: 0.0
Else (feature 5 > 4.75)
Predict: 1.0
Else (feature 4 > 3.5749999284744263)
Predict: 1.0
Else (feature 1 not in {5.0,4.0,3.0})
Predict: 1.0
Else (feature 0 not in {1.0})
Predict: 1.0
Tree 19:
If (feature 2 in {1.0,3.0,4.0})
Predict: 0.0
Else (feature 2 not in {1.0,3.0,4.0})
If (feature 4 <= 5.454999923706055)
If (feature 0 in {1.0})
Predict: 1.0
Else (feature 0 not in {1.0})
If (feature 1 in {4.0,2.0,1.0,3.0})
If (feature 5 <= 59.75)
Predict: 1.0
Else (feature 5 > 59.75)
Predict: 0.0
Else (feature 1 not in {4.0,2.0,1.0,3.0})
Predict: 1.0
Else (feature 4 > 5.454999923706055)
Predict: 1.0
Tree 20:
If (feature 4 <= 1.1449999809265137)
If (feature 2 in {1.0,3.0,4.0})
Predict: 0.0
Else (feature 2 not in {1.0,3.0,4.0})
If (feature 0 in {1.0})
Predict: 1.0
Else (feature 0 not in {1.0})
If (feature 1 in {2.0,4.0,3.0})
If (feature 1 in {2.0})
Predict: 0.0
Else (feature 1 not in {2.0})
Predict: 1.0
Else (feature 1 not in {2.0,4.0,3.0})
Predict: 1.0
Else (feature 4 > 1.1449999809265137)
If (feature 3 <= 1.5)
If (feature 2 in {1.0,3.0,4.0})
Predict: 0.0
Else (feature 2 not in {1.0,3.0,4.0})
Predict: 1.0
Else (feature 3 > 1.5)
If (feature 0 in {1.0})
If (feature 4 <= 2.084999918937683)
Predict: 0.0
Else (feature 4 > 2.084999918937683)
If (feature 3 <= 2.5)
Predict: 1.0
Else (feature 3 > 2.5)
Predict: 0.0
Else (feature 0 not in {1.0})
If (feature 5 <= 9.25)
If (feature 2 in {1.0})
Predict: 0.0
Else (feature 2 not in {1.0})
Predict: 1.0
Else (feature 5 > 9.25)
If (feature 2 in {1.0})
Predict: 0.0
Else (feature 2 not in {1.0})
Predict: 1.0
Tree 21:
If (feature 2 in {1.0,3.0,4.0})
Predict: 0.0
Else (feature 2 not in {1.0,3.0,4.0})
Predict: 1.0
Tree 22:
If (feature 2 in {1.0,3.0,4.0})
Predict: 0.0
Else (feature 2 not in {1.0,3.0,4.0})
Predict: 1.0
Tree 23:
If (feature 4 <= 1.1449999809265137)
If (feature 2 in {1.0,3.0,4.0})
Predict: 0.0
Else (feature 2 not in {1.0,3.0,4.0})
Predict: 1.0
Else (feature 4 > 1.1449999809265137)
If (feature 2 in {1.0,3.0,4.0})
Predict: 0.0
Else (feature 2 not in {1.0,3.0,4.0})
Predict: 1.0
Tree 24:
If (feature 2 in {1.0,3.0,4.0})
Predict: 0.0
Else (feature 2 not in {1.0,3.0,4.0})
If (feature 0 in {1.0})
Predict: 1.0
Else (feature 0 not in {1.0})
If (feature 1 in {4.0,3.0})
If (feature 5 <= 59.75)
Predict: 1.0
Else (feature 5 > 59.75)
If (feature 4 <= 3.5749999284744263)
Predict: 0.0
Else (feature 4 > 3.5749999284744263)
Predict: 1.0
Else (feature 1 not in {4.0,3.0})
Predict: 1.0
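Each tree in the debug string is a nested if/else over the feature vector, in the order used when building the LabeledPoint: 0 vendor_idIndex, 1 rate_codeIndex, 2 payment_typeIndex, 3 passenger_count, 4 trip_distance, 5 fare_amount. The sketch below transcribes Tree 0 and Tree 2 by hand into plain Python and combines them by majority vote, which is how a random forest classifier is generally understood to aggregate its trees. The function names are hypothetical, and the three-tree forest is a toy subset rather than the trained model.

```python
from collections import Counter

def tree_0(features):
    """Tree 0 above: one split on feature 2 (payment_typeIndex)."""
    return 0.0 if features[2] in {1.0, 3.0, 4.0} else 1.0

def tree_2(features):
    """Tree 2 above, transcribed branch by branch."""
    if features[2] in {1.0, 3.0, 4.0}:
        return 0.0
    if features[0] in {1.0}:
        if features[5] <= 20.25:
            if features[4] <= 0.48499999940395355:
                return 0.0 if features[1] in {1.0} else 1.0
            return 1.0
        return 1.0
    return 1.0

tree_8 = tree_0  # Tree 8 has the same single split as Tree 0

def majority_vote(trees, features):
    """Return the class predicted by most trees."""
    votes = Counter(tree(features) for tree in trees)
    return votes.most_common(1)[0][0]

toy_forest = [tree_0, tree_2, tree_8]
# First row of the indexed table: payment_typeIndex is 1.0 (CSH).
cash_row = [1.0, 0.0, 1.0, 1.0, 2.7, 14.0]
# Same trip with payment_typeIndex 0.0 (made up for illustration).
other_row = [1.0, 0.0, 0.0, 1.0, 2.7, 14.0]
print(majority_vote(toy_forest, cash_row), majority_vote(toy_forest, other_row))
```

Most of the printed trees split first on feature 2, which suggests that the payment type dominates the tipped prediction in this dataset.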
Prediction and Evaluation on test data:
predictions_classification = rfModel_Classification.predict(indexedTESTClassification.map(lambda x: x.features))
# Pairs here are (label, prediction); note that BinaryClassificationMetrics documents its input as (score, label)
predictionAndLabels_classification = indexedTESTClassification.map(lambda lp: lp.label).zip(predictions_classification)
# Area under ROC curve
metrics_classification = BinaryClassificationMetrics(predictionAndLabels_classification)
print("Area under ROC = %s" % metrics_classification.areaUnderROC)
Output:
Area under ROC = 0.981367320801355
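The area under the ROC curve can be read as the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one, with ties counting as half. The plain-Python sketch below computes it that way from (score, label) pairs. `area_under_roc` is a hypothetical helper, the pairs are made up, and the quadratic loop is only suitable for small samples; it is not Spark's implementation.

```python
def area_under_roc(score_and_labels):
    """AUC as the chance that a positive example outscores a negative one.

    Ties count as half, which matches the trapezoidal area under the ROC curve.
    """
    positives = [s for s, y in score_and_labels if y == 1.0]
    negatives = [s for s, y in score_and_labels if y == 0.0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))

# Made-up (score, label) pairs with hard 0/1 predictions, for illustration only.
pairs = [(1.0, 1.0), (1.0, 1.0), (0.0, 1.0), (0.0, 0.0), (0.0, 0.0), (1.0, 0.0)]
auc = area_under_roc(pairs)
print(auc)
```

With hard 0/1 predictions the value reduces to the average of the true positive rate and the true negative rate; scores or probabilities would give a finer-grained curve.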
indexedTRAINRegression = trainData.rdd.map(parseRowIndexingRegression)
indexedTESTRegression = testData.rdd.map(parseRowIndexingRegression)
Finally, we train the random forest regression model, specifying the number of categories for the categorical features, and print the trees.
categoricalFeaturesInfo={0:2, 1:12, 2:5}
rfModel_Regression = RandomForest.trainRegressor(indexedTRAINRegression,
categoricalFeaturesInfo=categoricalFeaturesInfo,
numTrees=25, featureSubsetStrategy="auto",
impurity='variance', maxDepth=10, maxBins=32)
print('Learned regression forest model:')
print(rfModel_Regression.toDebugString())
Output:
Learned regression forest model:
TreeEnsembleModel regressor with 25 trees
Tree 0:
If (feature 5 <= 24.25)
If (feature 2 in {1.0,3.0,4.0})
If (feature 4 <= 12.700000286102295)
If (feature 0 in {0.0})
Predict: 0.0
Else (feature 0 not in {0.0})
If (feature 5 <= 12.75)
If (feature 3 <= 1.5)
If (feature 4 <= 0.5949999988079071)
If (feature 1 in {1.0,3.0,5.0,6.0})
Predict: 0.0
Else (feature 1 not in {1.0,3.0,5.0,6.0})
If (feature 4 <= 0.4749999940395355)
If (feature 5 <= 5.25)
Predict: 0.0
Else (feature 5 > 5.25)
Predict: 0.001221166860458809
Else (feature 4 > 0.4749999940395355)
If (feature 5 <= 4.25)
Predict: 0.0
Else (feature 5 > 4.25)
Predict: 8.884501413504843E-4
Else (feature 4 > 0.5949999988079071)
If (feature 5 <= 8.25)
If (feature 5 <= 5.75)
Predict: 0.0
Else (feature 5 > 5.75)
If (feature 4 <= 2.7649999856948853)
Predict: 1.6123760576432068E-4
Else (feature 4 > 2.7649999856948853)
Predict: 0.04186046400735544
Else (feature 5 > 8.25)
Predict: 0.0
Else (feature 3 > 1.5)
If (feature 3 <= 2.5)
If (feature 4 <= 0.5949999988079071)
If (feature 4 <= 0.4749999940395355)
Predict: 0.0
Else (feature 4 > 0.4749999940395355)
If (feature 5 <= 9.25)
Predict: 0.0
Else (feature 5 > 9.25)
Predict: 0.34999998410542804
Else (feature 4 > 0.5949999988079071)
If (feature 5 <= 8.75)
If (feature 4 <= 0.7950000166893005)
Predict: 3.9556962025316455E-4
Else (feature 4 > 0.7950000166893005)
Predict: 7.848320773919378E-5
Else (feature 5 > 8.75)
Predict: 0.0
Else (feature 3 > 2.5)
If (feature 5 <= 6.75)
If (feature 1 in {1.0,4.0})
Predict: 0.0
Else (feature 1 not in {1.0,4.0})
If (feature 5 <= 5.75)
Predict: 0.0
Else (feature 5 > 5.75)
Predict: 0.001201569361684368
Else (feature 5 > 6.75)
Predict: 0.0
Else (feature 5 > 12.75)
If (feature 5 <= 20.75)
If (feature 5 <= 14.75)
If (feature 4 <= 1.8049999475479126)
If (feature 5 <= 13.75)
If (feature 5 <= 13.25)
Predict: 0.0
Else (feature 5 > 13.25)
Predict: 0.007623888182973317
Else (feature 5 > 13.75)
Predict: 0.0
Else (feature 4 > 1.8049999475479126)
If (feature 4 <= 3.625)
If (feature 4 <= 2.5749999284744263)
Predict: 5.362426211848061E-4
Else (feature 4 > 2.5749999284744263)
Predict: 0.0
Else (feature 4 > 3.625)
If (feature 4 <= 4.0950000286102295)
Predict: 0.001181102340118705
Else (feature 4 > 4.0950000286102295)
Predict: 0.0
Else (feature 5 > 14.75)
If (feature 4 <= 4.0950000286102295)
If (feature 3 <= 1.5)
If (feature 5 <= 15.75)
Predict: 0.0
Else (feature 5 > 15.75)
Predict: 0.0014655350701109615
Else (feature 3 > 1.5)
Predict: 0.0
Else (feature 4 > 4.0950000286102295)
Predict: 0.0
Else (feature 5 > 20.75)
Predict: 0.0
Else (feature 4 > 12.700000286102295)
If (feature 1 in {0.0,2.0})
Predict: 0.0
Else (feature 1 not in {0.0,2.0})
If (feature 3 <= 1.5)
Predict: 8.5
Else (feature 3 > 1.5)
Predict: 0.0
Else (feature 2 not in {1.0,3.0,4.0})
If (feature 0 in {0.0})
If (feature 1 in {0.0,3.0})
If (feature 3 <= 3.5)
If (feature 5 <= 12.25)
If (feature 5 <= 7.75)
If (feature 5 <= 5.75)
If (feature 2 in {2.0})
If (feature 5 <= 5.25)
Predict: 0.8672591599268052
Else (feature 5 > 5.25)
Predict: 1.070707080039111
Else (feature 2 not in {2.0})
If (feature 3 <= 1.5)
Predict: 1.0942474052490003
Else (feature 3 > 1.5)
Predict: 1.080579636187869
Else (feature 5 > 5.75)
If (feature 4 <= 1.3350000381469727)
If (feature 4 <= 1.0950000286102295)
Predict: 1.330977781414295
Else (feature 4 > 1.0950000286102295)
Predict: 1.3618516748054432
Else (feature 4 > 1.3350000381469727)
Predict: 1.4236676546550344
Else (feature 5 > 7.75)
If (feature 4 <= 2.215000033378601)
If (feature 5 <= 9.75)
If (feature 1 in {0.0})
Predict: 1.684913027016892
Else (feature 1 not in {0.0})
Predict: 1.899999976158142
Else (feature 5 > 9.75)
If (feature 4 <= 1.5049999952316284)
Predict: 2.029793757472228
Else (feature 4 > 1.5049999952316284)
Predict: 1.9889401308375827
Else (feature 4 > 2.215000033378601)
If (feature 4 <= 2.7649999856948853)
If (feature 5 <= 10.25)
Predict: 1.826720022970644
Else (feature 5 > 10.25)
Predict: 2.062994081223262
Else (feature 4 > 2.7649999856948853)
If (feature 4 <= 4.0950000286102295)
Predict: 2.1248015292439884
Else (feature 4 > 4.0950000286102295)
Predict: 9.0
Else (feature 5 > 12.25)
If (feature 2 in {2.0})
If (feature 4 <= 4.704999923706055)
If (feature 4 <= 4.0950000286102295)
Predict: 2.5793899714773603
Else (feature 4 > 4.0950000286102295)
Predict: 3.049680856948203
Else (feature 4 > 4.704999923706055)
If (feature 5 <= 20.75)
If (feature 4 <= 5.515000104904175)
Predict: 3.3192523428212817
Else (feature 4 > 5.515000104904175)
Predict: 3.517008506334745
Else (feature 5 > 20.75)
If (feature 4 <= 6.855000019073486)
Predict: 4.001401291531362
Else (feature 4 > 6.855000019073486)
Predict: 4.612261916909899
Else (feature 2 not in {2.0})
If (feature 5 <= 18.25)
If (feature 3 <= 2.5)
If (feature 5 <= 15.25)
Predict: 2.4789619753114223
Else (feature 5 > 15.25)
Predict: 2.9870010042250867
Else (feature 3 > 2.5)
If (feature 5 <= 15.75)
Predict: 2.496179523222373
Else (feature 5 > 15.75)
Predict: 3.1307795254958966
Else (feature 5 > 18.25)
If (feature 5 <= 20.75)
Predict: 3.4956710842848784
Else (feature 5 > 20.75)
If (feature 4 <= 6.855000019073486)
Predict: 3.914033086576562
Else (feature 4 > 6.855000019073486)
Predict: 4.245271024534371
Else (feature 3 > 3.5)
If (feature 3 <= 5.5)
If (feature 5 <= 12.25)
If (feature 2 in {2.0})
If (feature 3 <= 4.5)
Predict: 1.4892771083367877
Else (feature 3 > 4.5)
Predict: 1.4200277009829259
Else (feature 2 not in {2.0})
If (feature 5 <= 7.75)
If (feature 3 <= 4.5)
Predict: 1.2671493111397083
Else (feature 3 > 4.5)
Predict: 1.2619366881757208
Else (feature 5 > 7.75)
Predict: 1.8598017935057558
Else (feature 5 > 12.25)
If (feature 5 <= 16.75)
If (feature 5 <= 14.25)
If (feature 5 <= 13.25)
Predict: 2.352012153736649
Else (feature 5 > 13.25)
Predict: 2.4903259697178055
Else (feature 5 > 14.25)
If (feature 4 <= 4.0950000286102295)
Predict: 2.722898017668372
Else (feature 4 > 4.0950000286102295)
Predict: 2.833566433052361
Else (feature 5 > 16.75)
If (feature 3 <= 4.5)
If (feature 5 <= 20.75)
Predict: 3.366045410974813
Else (feature 5 > 20.75)
Predict: 4.230404047645402
Else (feature 3 > 4.5)
If (feature 5 <= 20.75)
Predict: 3.3705701882795682
Else (feature 5 > 20.75)
Predict: 4.0339489801256985
Else (feature 3 > 5.5)
If (feature 4 <= 2.7649999856948853)
If (feature 5 <= 8.25)
If (feature 5 <= 5.75)
If (feature 4 <= 1.3350000381469727)
Predict: 1.08850128347418
Else (feature 4 > 1.3350000381469727)
Predict: 2.0400000333786013
Else (feature 5 > 5.75)
If (feature 2 in {2.0})
Predict: 1.2796721285809585
Else (feature 2 not in {2.0})
Predict: 1.404713584225799
Else (feature 5 > 8.25)
If (feature 4 <= 2.09499990940094)
Predict: 1.8782565547312495
Else (feature 4 > 2.09499990940094)
If (feature 5 <= 12.25)
Predict: 1.9597463214663833
Else (feature 5 > 12.25)
Predict: 2.6016171588215102
Else (feature 4 > 2.7649999856948853)
If (feature 4 <= 4.704999923706055)
If (feature 2 in {2.0})
If (feature 4 <= 4.0950000286102295)
Predict: 2.3561855566870307
Else (feature 4 > 4.0950000286102295)
Predict: 3.266000008583069
Else (feature 2 not in {2.0})
If (feature 4 <= 3.625)
Predict: 2.4618393616206853
Else (feature 4 > 3.625)
Predict: 2.899717037529601
Else (feature 4 > 4.704999923706055)
If (feature 4 <= 6.855000019073486)
If (feature 5 <= 18.25)
Predict: 2.988068719398919
Else (feature 5 > 18.25)
Predict: 3.7172788180630585
Else (feature 4 > 6.855000019073486)
Predict: 4.346206889275847
Else (feature 1 not in {0.0,3.0})
If (feature 5 <= 20.75)
If (feature 3 <= 1.5)
Predict: 1.5
Else (feature 3 > 1.5)
Predict: 6.300000190734863
Else (feature 5 > 20.75)
If (feature 3 <= 1.5)
Predict: 78.0
Else (feature 3 > 1.5)
If (feature 4 <= 3.625)
Predict: 6.800000190734863
Else (feature 4 > 3.625)
Predict: 7.050000190734863
Else (feature 0 not in {0.0})
If (feature 5 <= 12.25)
If (feature 5 <= 8.25)
If (feature 3 <= 3.5)
If (feature 4 <= 2.5749999284744263)
If (feature 1 in {3.0,2.0,5.0})
If (feature 4 <= 0.7950000166893005)
If (feature 1 in {3.0})
Predict: 0.7333333333333333
Else (feature 1 not in {3.0})
Predict: 0.9799999952316284
Else (feature 4 > 0.7950000166893005)
Predict: 1.875
Else (feature 1 not in {3.0,2.0,5.0})
If (feature 1 in {4.0,1.0})
If (feature 4 <= 0.5949999988079071)
Predict: 2.0333333015441895
Else (feature 4 > 0.5949999988079071)
Predict: 1.047142846243722
Else (feature 1 not in {4.0,1.0})
If (feature 3 <= 1.5)
Predict: 1.3781561769193391
Else (feature 3 > 1.5)
Predict: 1.3995485300195203
Else (feature 4 > 2.5749999284744263)
Predict: 6.3971428709598825
Else (feature 3 > 3.5)
If (feature 4 <= 1.0049999952316284)
Predict: 1.30311386381571
Else (feature 4 > 1.0049999952316284)
If (feature 4 <= 1.1950000524520874)
If (feature 3 <= 5.5)
If (feature 3 <= 4.5)
Predict: 1.5036051452415695
Else (feature 3 > 4.5)
Predict: 1.4000000026490953
Else (feature 3 > 5.5)
Predict: 1.0
Else (feature 4 > 1.1950000524520874)
If (feature 4 <= 2.09499990940094)
If (feature 4 <= 1.8049999475479126)
Predict: 1.6484381126053265
Else (feature 4 > 1.8049999475479126)
Predict: 1.893199965953827
Else (feature 4 > 2.09499990940094)
If (feature 4 <= 6.855000019073486)
Predict: 1.3423076776357799
Else (feature 4 > 6.855000019073486)
Predict: 1.600000023841858
Else (feature 5 > 8.25)
Predict: 1.976090599779964
Else (feature 5 > 12.25)
If (feature 5 <= 16.75)
If (feature 1 in {5.0,3.0})
Predict: 1.953333314259847
Else (feature 1 not in {5.0,3.0})
If (feature 5 <= 14.25)
Predict: 2.484395947335571
Else (feature 5 > 14.25)
If (feature 5 <= 15.25)
Predict: 2.732694543128903
Else (feature 5 > 15.25)
If (feature 1 in {0.0})
If (feature 3 <= 5.5)
Predict: 2.9364147325522545
Else (feature 3 > 5.5)
Predict: 4.8125
Else (feature 1 not in {0.0})
Predict: 3.0
Else (feature 5 > 16.75)
If (feature 1 in {0.0,5.0,1.0,2.0,3.0})
Predict: 3.630627151993512
Else (feature 1 not in {0.0,5.0,1.0,2.0,3.0})
Predict: 6.5
Else (feature 5 > 24.25)
If (feature 2 in {1.0,3.0,4.0})
If (feature 4 <= 12.700000286102295)
If (feature 4 <= 3.625)
If (feature 4 <= 3.2649999856948853)
Predict: 0.0
Else (feature 4 > 3.2649999856948853)
If (feature 2 in {3.0,4.0})
Predict: 0.0
Else (feature 2 not in {3.0,4.0})
If (feature 5 <= 29.25)
If (feature 3 <= 2.5)
Predict: 0.0
Else (feature 3 > 2.5)
Predict: 0.20769231136028582
Else (feature 5 > 29.25)
Predict: 0.0
Else (feature 4 > 3.625)
If (feature 1 in {0.0,5.0,2.0,3.0,4.0})
If (feature 3 <= 1.5)
If (feature 0 in {0.0})
Predict: 0.0
Else (feature 0 not in {0.0})
If (feature 5 <= 40.25)
If (feature 2 in {3.0,4.0})
Predict: 0.0
Else (feature 2 not in {3.0,4.0})
If (feature 5 <= 29.25)
Predict: 5.226708480334509E-4
Else (feature 5 > 29.25)
Predict: 7.766990291262136E-4
Else (feature 5 > 40.25)
Predict: 0.0
Else (feature 3 > 1.5)
Predict: 0.0
Else (feature 1 not in {0.0,5.0,2.0,3.0,4.0})
If (feature 5 <= 40.25)
Predict: 7.800000190734863
Else (feature 5 > 40.25)
Predict: 0.0
Else (feature 4 > 12.700000286102295)
If (feature 2 in {4.0,1.0})
If (feature 5 <= 72.75)
If (feature 0 in {0.0})
Predict: 0.0
Else (feature 0 not in {0.0})
If (feature 1 in {0.0,3.0,4.0,1.0})
If (feature 3 <= 1.5)
If (feature 4 <= 22.6850004196167)
If (feature 1 in {0.0,3.0,4.0})
Predict: 0.0
Else (feature 1 not in {0.0,3.0,4.0})
Predict: 0.006884295543135039
Else (feature 4 > 22.6850004196167)
Predict: 0.0
Else (feature 3 > 1.5)
Predict: 0.0
Else (feature 1 not in {0.0,3.0,4.0,1.0})
Predict: 0.10271605032461661
Else (feature 5 > 72.75)
If (feature 3 <= 1.5)
Predict: 0.0
Else (feature 3 > 1.5)
If (feature 1 in {0.0,2.0,4.0})
Predict: 0.0
Else (feature 1 not in {0.0,2.0,4.0})
If (feature 0 in {0.0})
Predict: 0.0
Else (feature 0 not in {0.0})
Predict: 0.3225806451612903
Else (feature 2 not in {4.0,1.0})
If (feature 1 in {0.0,2.0,3.0})
Predict: 0.0
Else (feature 1 not in {0.0,2.0,3.0})
If (feature 3 <= 3.5)
Predict: 0.0
Else (feature 3 > 3.5)
Predict: 6.666666666666667
Else (feature 2 not in {1.0,3.0,4.0})
If (feature 1 in {0.0})
If (feature 5 <= 29.25)
If (feature 2 in {2.0})
Predict: 4.765992650214364
Else (feature 2 not in {2.0})
If (feature 3 <= 1.5)
If (feature 4 <= 9.095000267028809)
If (feature 4 <= 6.855000019073486)
Predict: 4.628108190893623
Else (feature 4 > 6.855000019073486)
Predict: 5.109736812340437
Else (feature 4 > 9.095000267028809)
Predict: 5.894786603555816
Else (feature 3 > 1.5)
Predict: 5.113387489034704
Else (feature 5 > 29.25)
If (feature 0 in {1.0})
Predict: 7.092535707769465
Else (feature 0 not in {1.0})
If (feature 2 in {2.0})
If (feature 5 <= 72.75)
If (feature 4 <= 12.700000286102295)
Predict: 6.139399127387182
Else (feature 4 > 12.700000286102295)
If (feature 5 <= 40.25)
Predict: 5.728181882338091
Else (feature 5 > 40.25)
Predict: 8.758620697876502
Else (feature 5 > 72.75)
Predict: 30.0
Else (feature 2 not in {2.0})
If (feature 4 <= 12.700000286102295)
If (feature 5 <= 40.25)
Predict: 6.622233947063481
Else (feature 5 > 40.25)
Predict: 8.266169436368115
Else (feature 4 > 12.700000286102295)
If (feature 4 <= 22.6850004196167)
Predict: 8.300156427017264
Else (feature 4 > 22.6850004196167)
Predict: 13.641867407833237
Else (feature 1 not in {0.0})
If (feature 5 <= 72.75)
If (feature 4 <= 12.700000286102295)
If (feature 1 in {4.0,3.0})
If (feature 5 <= 40.25)
If (feature 5 <= 29.25)
If (feature 0 in {1.0})
If (feature 1 in {4.0})
Predict: 3.1000000146719127
Else (feature 1 not in {4.0})
Predict: 3.5200000047683715
Else (feature 0 not in {1.0})
If (feature 4 <= 9.095000267028809)
Predict: 5.929999987284343
Else (feature 4 > 9.095000267028809)
Predict: 1.0
Else (feature 5 > 29.25)
If (feature 1 in {3.0})
If (feature 4 <= 1.4049999713897705)
Predict: 3.8662790808566783
Else (feature 4 > 1.4049999713897705)
Predict: 6.578249990940094
Else (feature 1 not in {3.0})
If (feature 3 <= 1.5)
Predict: 5.707916716734569
Else (feature 3 > 1.5)
Predict: 8.6
Else (feature 5 > 40.25)
If (feature 1 in {4.0})
If (feature 4 <= 6.855000019073486)
Predict: 11.5
Else (feature 4 > 6.855000019073486)
If (feature 3 <= 1.5)
Predict: 5.032631472537392
Else (feature 3 > 1.5)
Predict: 11.434999942779541
Else (feature 1 not in {4.0})
If (feature 0 in {0.0})
If (feature 3 <= 2.5)
Predict: 7.534161168196557
Else (feature 3 > 2.5)
Predict: 7.9190476054237005
Else (feature 0 not in {0.0})
Predict: 7.860710682205591
Else (feature 1 not in {4.0,3.0})
If (feature 3 <= 1.5)
If (feature 1 in {1.0})
If (feature 0 in {0.0})
If (feature 2 in {0.0})
Predict: 9.485638695122018
Else (feature 2 not in {0.0})
Predict: 10.399999618530273
Else (feature 0 not in {0.0})
Predict: 10.209748482835368
Else (feature 1 not in {1.0})
If (feature 2 in {2.0})
Predict: 10.199999809265137
Else (feature 2 not in {2.0})
If (feature 5 <= 40.25)
Predict: 11.458571468080793
Else (feature 5 > 40.25)
Predict: 10.256923137566982
Else (feature 3 > 1.5)
If (feature 5 <= 40.25)
If (feature 4 <= 5.515000104904175)
Predict: 8.172500133514404
Else (feature 4 > 5.515000104904175)
Predict: 5.0
Else (feature 5 > 40.25)
If (feature 2 in {2.0})
If (feature 4 <= 0.4749999940395355)
Predict: 10.399999618530273
Else (feature 4 > 0.4749999940395355)
Predict: 5.199999809265137
Else (feature 2 not in {2.0})
Predict: 10.55013706912733
Else (feature 4 > 12.700000286102295)
If (feature 1 in {3.0,5.0,1.0,4.0})
If (feature 5 <= 40.25)
If (feature 0 in {0.0})
Predict: 0.0
Else (feature 0 not in {0.0})
Predict: 1.4
Else (feature 5 > 40.25)
If (feature 3 <= 4.5)
If (feature 1 in {5.0,1.0,3.0})
If (feature 0 in {0.0})
Predict: 9.843881849969373
Else (feature 0 not in {0.0})
Predict: 9.937772512619807
Else (feature 1 not in {5.0,1.0,3.0})
If (feature 4 <= 22.6850004196167)
Predict: 10.81983741124471
Else (feature 4 > 22.6850004196167)
Predict: 0.0
Else (feature 3 > 4.5)
If (feature 2 in {2.0})
If (feature 3 <= 5.5)
Predict: 8.867368246379652
Else (feature 3 > 5.5)
Predict: 8.728571278708321
Else (feature 2 not in {2.0})
If (feature 1 in {1.0})
Predict: 10.164691484895744
Else (feature 1 not in {1.0})
Predict: 12.015555487738716
Else (feature 1 not in {3.0,5.0,1.0,4.0})
If (feature 2 in {2.0})
Predict: 8.871428549289703
Else (feature 2 not in {2.0})
If (feature 0 in {0.0})
Predict: 12.782215757239374
Else (feature 0 not in {0.0})
Predict: 12.95225352370683
Else (feature 5 > 72.75)
If (feature 1 in {3.0})
If (feature 4 <= 12.700000286102295)
If (feature 3 <= 1.5)
If (feature 4 <= 2.215000033378601)
If (feature 4 <= 1.4049999713897705)
If (feature 4 <= 1.2649999856948853)
Predict: 7.538640763267007
Else (feature 4 > 1.2649999856948853)
Predict: 36.619998931884766
Else (feature 4 > 1.4049999713897705)
Predict: 0.0
Else (feature 4 > 2.215000033378601)
If (feature 4 <= 2.9950000047683716)
Predict: 24.900000381469727
Else (feature 4 > 2.9950000047683716)
Predict: 9.901343274472366
Else (feature 3 > 1.5)
If (feature 4 <= 0.925000011920929)
If (feature 3 <= 2.5)
If (feature 0 in {0.0})
Predict: 10.472222222222221
Else (feature 0 not in {0.0})
Predict: 14.577777650621202
Else (feature 3 > 2.5)
Predict: 13.0
Else (feature 4 > 0.925000011920929)
If (feature 0 in {1.0})
Predict: 3.9753571837209165
Else (feature 0 not in {1.0})
Predict: 15.563333511352539
Else (feature 4 > 12.700000286102295)
If (feature 0 in {1.0})
Predict: 14.524624608099103
Else (feature 0 not in {1.0})
If (feature 3 <= 3.5)
If (feature 4 <= 22.6850004196167)
If (feature 3 <= 2.5)
Predict: 13.6980770276143
Else (feature 3 > 2.5)
Predict: 19.696666717529297
Else (feature 4 > 22.6850004196167)
Predict: 16.412744252626286
Else (feature 3 > 3.5)
Predict: 14.374444749620226
Else (feature 1 not in {3.0})
If (feature 4 <= 22.6850004196167)
If (feature 4 <= 12.700000286102295)
Predict: 24.700000762939453
Else (feature 4 > 12.700000286102295)
If (feature 3 <= 1.5)
Predict: 15.145885282690797
Else (feature 3 > 1.5)
If (feature 3 <= 2.5)
If (feature 1 in {2.0})
Predict: 12.50358021112136
Else (feature 1 not in {2.0})
Predict: 13.649999965320934
Else (feature 3 > 2.5)
If (feature 3 <= 3.5)
Predict: 17.760606187762637
Else (feature 3 > 3.5)
Predict: 14.239862990705934
Else (feature 4 > 22.6850004196167)
If (feature 1 in {2.0})
Predict: 13.870686205724875
Else (feature 1 not in {2.0})
If (feature 0 in {0.0})
Predict: 16.782051098652374
Else (feature 0 not in {0.0})
Predict: 19.368235283038196
Tree 1:
If (feature 2 in {1.0,3.0,4.0})
If (feature 4 <= 12.700000286102295)
If (feature 1 in {0.0,5.0,6.0,2.0,3.0,4.0})
If (feature 0 in {0.0})
Predict: 0.0
Else (feature 0 not in {0.0})
If (feature 1 in {5.0,6.0,2.0,3.0,4.0})
Predict: 0.0
Else (feature 1 not in {5.0,6.0,2.0,3.0,4.0})
If (feature 5 <= 18.25)
If (feature 5 <= 13.25)
If (feature 4 <= 0.5949999988079071)
If (feature 3 <= 1.5)
If (feature 5 <= 4.25)
Predict: 0.0
Else (feature 5 > 4.25)
Predict: 4.0201004625764524E-4
Else (feature 3 > 1.5)
If (feature 4 <= 0.4749999940395355)
Predict: 0.0
Else (feature 4 > 0.4749999940395355)
Predict: 0.0027325958420723077
Else (feature 4 > 0.5949999988079071)
If (feature 4 <= 3.625)
If (feature 5 <= 8.75)
Predict: 6.751395828166836E-5
Else (feature 5 > 8.75)
Predict: 0.0
Else (feature 4 > 3.625)
If (feature 3 <= 1.5)
Predict: 6.321112515802782E-4
Else (feature 3 > 1.5)
Predict: 0.0
Else (feature 5 > 13.25)
If (feature 3 <= 3.5)
If (feature 2 in {3.0,4.0})
Predict: 0.0
Else (feature 2 not in {3.0,4.0})
If (feature 3 <= 1.5)
Predict: 3.7706090126758663E-4
Else (feature 3 > 1.5)
Predict: 0.0
Else (feature 3 > 3.5)
If (feature 5 <= 14.75)
Predict: 0.00271453581841731
Else (feature 5 > 14.75)
Predict: 0.0
Else (feature 5 > 18.25)
If (feature 4 <= 3.625)
If (feature 4 <= 3.2649999856948853)
Predict: 0.0
Else (feature 4 > 3.2649999856948853)
Predict: 0.021399839405848532
Else (feature 4 > 3.625)
If (feature 3 <= 3.5)
If (feature 3 <= 2.5)
If (feature 3 <= 1.5)
Predict: 3.4662044920589504E-4
Else (feature 3 > 1.5)
Predict: 3.124349093938763E-4
Else (feature 3 > 2.5)
Predict: 0.0
Else (feature 3 > 3.5)
If (feature 2 in {3.0,4.0})
Predict: 0.0
Else (feature 2 not in {3.0,4.0})
If (feature 3 <= 4.5)
Predict: 0.003167062549485352
Else (feature 3 > 4.5)
Predict: 0.0
Else (feature 1 not in {0.0,5.0,6.0,2.0,3.0,4.0})
If (feature 0 in {0.0})
Predict: 0.0
Else (feature 0 not in {0.0})
If (feature 2 in {3.0,4.0})
Predict: 0.0
Else (feature 2 not in {3.0,4.0})
If (feature 5 <= 40.25)
If (feature 5 <= 20.75)
Predict: 0.0
Else (feature 5 > 20.75)
Predict: 7.800000190734863
Else (feature 5 > 40.25)
Predict: 0.0
Else (feature 4 > 12.700000286102295)
If (feature 2 in {4.0,1.0})
If (feature 1 in {4.0,0.0,1.0})
If (feature 5 <= 72.75)
If (feature 0 in {0.0})
Predict: 0.0
Else (feature 0 not in {0.0})
If (feature 4 <= 22.6850004196167)
If (feature 3 <= 3.5)
If (feature 1 in {4.0,0.0})
If (feature 5 <= 40.25)
Predict: 0.0
Else (feature 5 > 40.25)
Predict: 0.004302747636461827
Else (feature 1 not in {4.0,0.0})
Predict: 0.006124503533571761
Else (feature 3 > 3.5)
If (feature 1 in {0.0,4.0})
Predict: 0.0
Else (feature 1 not in {0.0,4.0})
If (feature 2 in {4.0})
Predict: 0.0
Else (feature 2 not in {4.0})
Predict: 0.04109004888489348
Else (feature 4 > 22.6850004196167)
Predict: 0.0
Else (feature 5 > 72.75)
If (feature 1 in {4.0})
Predict: 0.0
Else (feature 1 not in {4.0})
If (feature 3 <= 1.5)
If (feature 4 <= 22.6850004196167)
Predict: 0.0
Else (feature 4 > 22.6850004196167)
If (feature 0 in {0.0})
Predict: 0.0
Else (feature 0 not in {0.0})
Predict: 0.28846153846153844
Else (feature 3 > 1.5)
Predict: 0.0
Else (feature 1 not in {4.0,0.0,1.0})
If (feature 1 in {2.0})
If (feature 3 <= 2.5)
If (feature 5 <= 72.75)
If (feature 3 <= 1.5)
If (feature 4 <= 22.6850004196167)
If (feature 5 <= 20.75)
Predict: 0.0
Else (feature 5 > 20.75)
Predict: 0.06694915335057146
Else (feature 4 > 22.6850004196167)
Predict: 0.0
Else (feature 3 > 1.5)
If (feature 0 in {0.0})
Predict: 0.0
Else (feature 0 not in {0.0})
Predict: 0.23622047244094488
Else (feature 5 > 72.75)
Predict: 0.0
Else (feature 3 > 2.5)
Predict: 0.0
Else (feature 1 not in {2.0})
If (feature 3 <= 1.5)
Predict: 0.0
Else (feature 3 > 1.5)
If (feature 0 in {0.0})
Predict: 0.0
Else (feature 0 not in {0.0})
If (feature 4 <= 22.6850004196167)
Predict: 0.75
Else (feature 4 > 22.6850004196167)
Predict: 0.0
Else (feature 2 not in {4.0,1.0})
If (feature 5 <= 40.25)
Predict: 0.0
Else (feature 5 > 40.25)
If (feature 1 in {0.0,2.0,3.0,4.0})
Predict: 0.0
Else (feature 1 not in {0.0,2.0,3.0,4.0})
Predict: 0.42105263157894735
Else (feature 2 not in {1.0,3.0,4.0})
If (feature 4 <= 6.855000019073486)
If (feature 1 in {0.0,5.0,4.0})
If (feature 1 in {0.0,5.0})
If (feature 0 in {0.0})
If (feature 4 <= 2.7649999856948853)
If (feature 4 <= 1.5049999952316284)
If (feature 5 <= 7.25)
If (feature 5 <= 5.75)
If (feature 5 <= 4.525000095367432)
Predict: 1.0020008414731925
Else (feature 5 > 4.525000095367432)
Predict: 1.15045864901304
Else (feature 5 > 5.75)
If (feature 5 <= 6.25)
Predict: 1.2621283651735642
Else (feature 5 > 6.25)
Predict: 1.3743548931628613
Else (feature 5 > 7.25)
If (feature 2 in {2.0})
If (feature 5 <= 9.75)
Predict: 1.5138285768372672
Else (feature 5 > 9.75)
Predict: 2.0079861090828977
Else (feature 2 not in {2.0})
If (feature 5 <= 9.75)
Predict: 1.6063360900216812
Else (feature 5 > 9.75)
Predict: 2.1510405192478754
Else (feature 4 > 1.5049999952316284)
If (feature 4 <= 2.09499990940094)
If (feature 5 <= 10.75)
If (feature 5 <= 8.75)
Predict: 1.554917197170656
Else (feature 5 > 8.75)
Predict: 1.8166922351283001
Else (feature 5 > 10.75)
If (feature 3 <= 5.5)
Predict: 2.337158920920737
Else (feature 3 > 5.5)
Predict: 2.274364187651958
Else (feature 4 > 2.09499990940094)
If (feature 4 <= 2.3950001001358032)
If (feature 4 <= 2.215000033378601)
Predict: 1.9529296794468578
Else (feature 4 > 2.215000033378601)
Predict: 2.021372834482952
Else (feature 4 > 2.3950001001358032)
If (feature 2 in {2.0})
Predict: 2.0623413187324244
Else (feature 2 not in {2.0})
Predict: 2.1686595384964535
Else (feature 4 > 2.7649999856948853)
If (feature 5 <= 16.75)
If (feature 5 <= 13.75)
If (feature 4 <= 3.2649999856948853)
If (feature 5 <= 11.75)
Predict: 2.079552211359105
Else (feature 5 > 11.75)
Predict: 2.304190258245848
Else (feature 4 > 3.2649999856948853)
If (feature 3 <= 1.5)
Predict: 2.346835129267193
Else (feature 3 > 1.5)
Predict: 2.3689208593450006
Else (feature 5 > 13.75)
If (feature 5 <= 15.25)
If (feature 5 <= 14.25)
Predict: 2.5437982024645867
Else (feature 5 > 14.25)
Predict: 2.659631055539128
Else (feature 5 > 15.25)
If (feature 4 <= 3.625)
Predict: 2.846165316059
Else (feature 4 > 3.625)
Predict: 2.8942424089271652
Else (feature 5 > 16.75)
If (feature 3 <= 1.5)
If (feature 4 <= 5.515000104904175)
If (feature 2 in {2.0})
Predict: 3.337187506935813
Else (feature 2 not in {2.0})
Predict: 3.4049743559769055
Else (feature 4 > 5.515000104904175)
Predict: 3.8622495298074537
Else (feature 3 > 1.5)
If (feature 2 in {0.0})
If (feature 3 <= 5.5)
Predict: 3.58823555171601
Else (feature 3 > 5.5)
Predict: 3.604386948493053
Else (feature 2 not in {0.0})
Predict: 3.669304546406515
Else (feature 0 not in {0.0})
Predict: 2.038488721629872
Else (feature 1 not in {0.0,5.0})
Predict: 4.060000008658359
Else (feature 1 not in {0.0,5.0,4.0})
If (feature 0 in {1.0})
If (feature 1 in {3.0,2.0})
Predict: 6.892580268304356
Else (feature 1 not in {3.0,2.0})
If (feature 3 <= 1.5)
If (feature 5 <= 40.25)
If (feature 4 <= 5.515000104904175)
Predict: 4.957272746346214
Else (feature 4 > 5.515000104904175)
Predict: 0.0
Else (feature 5 > 40.25)
If (feature 4 <= 1.1950000524520874)
Predict: 11.276021567724085
Else (feature 4 > 1.1950000524520874)
If (feature 4 <= 1.5049999952316284)
Predict: 6.041666666666667
Else (feature 4 > 1.5049999952316284)
If (feature 4 <= 1.6050000190734863)
Predict: 16.720000076293946
Else (feature 4 > 1.6050000190734863)
Predict: 10.075528524755462
Else (feature 3 > 1.5)
If (feature 3 <= 4.5)
If (feature 5 <= 15.25)
If (feature 5 <= 12.75)
If (feature 5 <= 10.25)
Predict: 2.5
Else (feature 5 > 10.25)
Predict: 2.4333333174387612
Else (feature 5 > 12.75)
Predict: 2.0
Else (feature 5 > 15.25)
If (feature 3 <= 2.5)
Predict: 8.97983058024261
Else (feature 3 > 2.5)
Predict: 10.856250047683716
Else (feature 3 > 4.5)
Predict: 11.550000190734863
Else (feature 0 not in {1.0})
If (feature 1 in {3.0})
If (feature 4 <= 4.0950000286102295)
Predict: 6.063990705060987
Else (feature 4 > 4.0950000286102295)
Predict: 7.766253213934812
Else (feature 1 not in {3.0})
If (feature 5 <= 24.25)
Predict: 0.0
Else (feature 5 > 24.25)
If (feature 3 <= 5.5)
If (feature 4 <= 0.4749999940395355)
If (feature 3 <= 2.5)
Predict: 9.38347831229153
Else (feature 3 > 2.5)
If (feature 2 in {2.0})
Predict: 10.399999618530273
Else (feature 2 not in {2.0})
Predict: 10.601538511422964
Else (feature 4 > 0.4749999940395355)
If (feature 3 <= 4.5)
If (feature 4 <= 2.9950000047683716)
Predict: 10.686825464642236
Else (feature 4 > 2.9950000047683716)
Predict: 9.843191583105858
Else (feature 3 > 4.5)
Predict: 8.870000139872234
Else (feature 3 > 5.5)
If (feature 4 <= 1.1950000524520874)
If (feature 4 <= 0.4749999940395355)
Predict: 4.588000106811523
Else (feature 4 > 0.4749999940395355)
Predict: 5.0
Else (feature 4 > 1.1950000524520874)
Predict: 8.690000004238552
Else (feature 4 > 6.855000019073486)
If (feature 1 in {0.0})
If (feature 5 <= 29.25)
If (feature 5 <= 24.25)
If (feature 4 <= 12.700000286102295)
If (feature 5 <= 20.75)
If (feature 3 <= 4.5)
If (feature 5 <= 4.25)
Predict: 9.502499997615814
Else (feature 5 > 4.25)
If (feature 4 <= 9.095000267028809)
Predict: 3.490053741521733
Else (feature 4 > 9.095000267028809)
Predict: 2.706315819566187
Else (feature 3 > 4.5)
If (feature 3 <= 5.5)
Predict: 2.90838707647016
Else (feature 3 > 5.5)
Predict: 3.5709090449593286
Else (feature 5 > 20.75)
If (feature 3 <= 2.5)
If (feature 2 in {0.0})
Predict: 4.3256558679938415
Else (feature 2 not in {0.0})
Predict: 4.639636373519897
Else (feature 3 > 2.5)
Predict: 4.422383116593705
Else (feature 4 > 12.700000286102295)
Predict: 15.732222212685478
Else (feature 5 > 24.25)
If (feature 4 <= 9.095000267028809)
If (feature 2 in {2.0})
Predict: 4.482749985158444
Else (feature 2 not in {2.0})
Predict: 5.0696129001382895
Else (feature 4 > 9.095000267028809)
If (feature 2 in {2.0})
If (feature 3 <= 3.5)
Predict: 5.138113246773774
Else (feature 3 > 3.5)
If (feature 3 <= 4.5)
Predict: 7.125
Else (feature 3 > 4.5)
If (feature 3 <= 5.5)
Predict: 6.146000003814697
Else (feature 3 > 5.5)
Predict: 5.576666673024495
Else (feature 2 not in {2.0})
If (feature 3 <= 5.5)
If (feature 4 <= 12.700000286102295)
If (feature 0 in {0.0})
Predict: 5.778268448217999
Else (feature 0 not in {0.0})
Predict: 5.893438140667234
Else (feature 4 > 12.700000286102295)
Predict: 5.599999904632568
Else (feature 3 > 5.5)
Predict: 5.643672916588771
Else (feature 5 > 29.25)
If (feature 5 <= 40.25)
If (feature 4 <= 9.095000267028809)
Predict: 5.962104745970343
Else (feature 4 > 9.095000267028809)
If (feature 3 <= 2.5)
If (feature 2 in {2.0})
Predict: 6.095136998450919
Else (feature 2 not in {2.0})
Predict: 6.8102424300669
Else (feature 3 > 2.5)
If (feature 2 in {2.0})
Predict: 5.9381632512929485
Else (feature 2 not in {2.0})
If (feature 3 <= 3.5)
If (feature 0 in {1.0})
Predict: 6.503041234544296
Else (feature 0 not in {1.0})
Predict: 6.54746464892
Else (feature 3 > 3.5)
If (feature 3 <= 5.5)
Predict: 6.7364598506622535
Else (feature 3 > 5.5)
Predict: 6.853111104641892
Else (feature 5 > 40.25)
If (feature 4 <= 22.6850004196167)
If (feature 0 in {1.0})
If (feature 5 <= 72.75)
Predict: 8.326659051008436
Else (feature 5 > 72.75)
Predict: 16.552000045776367
Else (feature 0 not in {1.0})
If (feature 2 in {0.0})
If (feature 4 <= 9.095000267028809)
Predict: 7.0981817967963945
Else (feature 4 > 9.095000267028809)
If (feature 5 <= 72.75)
Predict: 8.444857087862369
Else (feature 5 > 72.75)
Predict: 7.5133334795633955
Else (feature 2 not in {0.0})
Predict: 8.833559278714455
Else (feature 4 > 22.6850004196167)
If (feature 3 <= 4.5)
If (feature 2 in {0.0})
If (feature 0 in {0.0})
If (feature 5 <= 72.75)
Predict: 10.985598044888825
Else (feature 5 > 72.75)
Predict: 16.703052651254755
Else (feature 0 not in {0.0})
Predict: 13.685059782993271
Else (feature 2 not in {0.0})
If (feature 3 <= 1.5)
Predict: 10.697499990463257
Else (feature 3 > 1.5)
Predict: 30.0
Else (feature 3 > 4.5)
If (feature 3 <= 5.5)
Predict: 23.337755067007883
Else (feature 3 > 5.5)
If (feature 5 <= 72.75)
Predict: 13.247894588269686
Else (feature 5 > 72.75)
Predict: 14.938461743868315
Else (feature 1 not in {0.0})
If (feature 4 <= 22.6850004196167)
If (feature 5 <= 72.75)
If (feature 1 in {4.0,3.0,1.0,5.0})
If (feature 5 <= 40.25)
If (feature 4 <= 9.095000267028809)
If (feature 5 <= 24.25)
If (feature 0 in {1.0})
Predict: 4.550000190734863
Else (feature 0 not in {1.0})
Predict: 78.0
Else (feature 5 > 24.25)
If (feature 2 in {2.0})
Predict: 5.360000133514404
Else (feature 2 not in {2.0})
Predict: 5.378765472383411
Else (feature 4 > 9.095000267028809)
If (feature 4 <= 12.700000286102295)
If (feature 3 <= 2.5)
Predict: 4.912666738033295
Else (feature 3 > 2.5)
Predict: 0.0
Else (feature 4 > 12.700000286102295)
If (feature 3 <= 2.5)
Predict: 3.4209091013128106
Else (feature 3 > 2.5)
Predict: 0.5714285714285714
Else (feature 5 > 40.25)
If (feature 2 in {2.0})
Predict: 8.67522917756247
Else (feature 2 not in {2.0})
If (feature 3 <= 4.5)
If (feature 4 <= 12.700000286102295)
Predict: 9.472063548035091
Else (feature 4 > 12.700000286102295)
Predict: 9.939695658583425
Else (feature 3 > 4.5)
If (feature 1 in {3.0,1.0})
Predict: 10.098971078714026
Else (feature 1 not in {3.0,1.0})
Predict: 14.46363639831543
Else (feature 1 not in {4.0,3.0,1.0,5.0})
If (feature 3 <= 2.5)
Predict: 12.738350197213657
Else (feature 3 > 2.5)
If (feature 2 in {2.0})
Predict: 6.189999997615814
Else (feature 2 not in {2.0})
If (feature 3 <= 5.5)
If (feature 3 <= 4.5)
Predict: 13.98496065816776
Else (feature 3 > 4.5)
Predict: 13.266956474470055
Else (feature 3 > 5.5)
Predict: 12.217469922031265
Else (feature 5 > 72.75)
If (feature 4 <= 12.700000286102295)
If (feature 1 in {3.0})
Predict: 11.309433982408834
Else (feature 1 not in {3.0})
Predict: 24.700000762939453
Else (feature 4 > 12.700000286102295)
If (feature 1 in {4.0,3.0})
Predict: 13.504989744494889
Else (feature 1 not in {4.0,3.0})
If (feature 3 <= 1.5)
If (feature 0 in {0.0})
Predict: 14.746363601299247
Else (feature 0 not in {0.0})
Predict: 15.970588252123665
Else (feature 3 > 1.5)
If (feature 3 <= 2.5)
Predict: 12.784246608002546
Else (feature 3 > 2.5)
Predict: 15.164252906010068
Else (feature 4 > 22.6850004196167)
If (feature 3 <= 1.5)
If (feature 1 in {1.0})
Predict: 10.551529178561056
Else (feature 1 not in {1.0})
Predict: 17.836814335574896
Else (feature 3 > 1.5)
If (feature 1 in {1.0})
Predict: 10.511585485644456
Else (feature 1 not in {1.0})
If (feature 3 <= 3.5)
If (feature 5 <= 72.75)
If (feature 1 in {4.0})
Predict: 14.5
Else (feature 1 not in {4.0})
Predict: 18.84999942779541
Else (feature 5 > 72.75)
If (feature 3 <= 2.5)
If (feature 1 in {2.0,3.0})
Predict: 14.882812589406967
Else (feature 1 not in {2.0,3.0})
Predict: 21.15333340962728
Else (feature 3 > 2.5)
Predict: 15.500294124378877
Else (feature 3 > 3.5)
If (feature 1 in {3.0})
If (feature 0 in {0.0})
Predict: 10.0
Else (feature 0 not in {0.0})
If (feature 3 <= 4.5)
Predict: 17.5
Else (feature 3 > 4.5)
Predict: 15.0
Else (feature 1 not in {3.0})
Predict: 18.713809512910387
..............................................
..............................................
...............................................
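The debug string above can be read as nested If/Else rules: categorical features (indices 0 to 2 in `categoricalFeaturesInfo`) are split by set membership, continuous features by a threshold, and each `Predict:` line is a leaf. The sketch below walks two small hand-made trees and averages their outputs, which is how a regression forest combines its trees. The tree contents, the feature vector, and the helper names are hypothetical and chosen only for illustration.

```python
# A node is either a leaf {"predict": value} or a split with "left"/"right" children.
# Categorical splits test membership in a set; continuous splits test "<= threshold".
toy_tree_a = {
    "feature": 2, "categories": {1.0, 3.0, 4.0},
    "left": {"predict": 0.0},
    "right": {
        "feature": 5, "threshold": 12.25,
        "left": {"predict": 1.5},
        "right": {"predict": 3.0},
    },
}
toy_tree_b = {
    "feature": 4, "threshold": 2.5,
    "left": {"predict": 1.0},
    "right": {"predict": 4.0},
}


def predict_tree(node, features):
    """Follow the If (left) / Else (right) branches until a leaf is reached."""
    while "predict" not in node:
        value = features[node["feature"]]
        if "categories" in node:
            go_left = value in node["categories"]
        else:
            go_left = value <= node["threshold"]
        node = node["left"] if go_left else node["right"]
    return node["predict"]


def predict_forest_regression(trees, features):
    """A regression forest averages the predictions of its individual trees."""
    return sum(predict_tree(tree, features) for tree in trees) / len(trees)


# Hypothetical feature vector: indices 0-2 categorical, 3-5 continuous.
trip = [1.0, 0.0, 2.0, 1.0, 3.1, 15.0]
print(predict_tree(toy_tree_a, trip))
print(predict_forest_regression([toy_tree_a, toy_tree_b], trip))
```

For the classification forest the traversal is the same, but the trees vote on a class instead of averaging a number.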
Prediction and Evaluation on test data:
predictions_Regression = rfModel_Regression.predict(indexedTESTRegression.map(lambda x: x.features))
# RegressionMetrics expects (prediction, observation) pairs, so the predictions go first
predictionAndLabels_Regression = predictions_Regression.zip(indexedTESTRegression.map(lambda lp: lp.label))
metrics_Regression = RegressionMetrics(predictionAndLabels_Regression)
print("RMSE = %s" % metrics_Regression.rootMeanSquaredError)
print("R-sqr = %s" % metrics_Regression.r2)
Output:
RMSE = 1.2202493008785438
R-sqr = 0.5759095465267167
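For readers who want to see what these two numbers measure, the following is a small pure-Python sketch of RMSE and R² using the standard textbook definitions. It is not Spark's implementation, and the function name `rmse_and_r2` and the toy pairs are assumptions made for this example.

```python
def rmse_and_r2(prediction_and_observations):
    """Return (RMSE, R^2) for a list of (prediction, observation) pairs."""
    n = len(prediction_and_observations)
    if n == 0:
        raise ValueError("need at least one pair")
    mean_obs = sum(obs for _, obs in prediction_and_observations) / n
    # Residual sum of squares: how far predictions are from observations
    ss_res = sum((obs - pred) ** 2 for pred, obs in prediction_and_observations)
    # Total sum of squares: how far observations are from their own mean
    ss_tot = sum((obs - mean_obs) ** 2 for _, obs in prediction_and_observations)
    rmse = (ss_res / n) ** 0.5
    r2 = 1.0 - ss_res / ss_tot  # undefined if all observations are identical
    return rmse, r2


# Made-up (predicted tip, observed tip) pairs
toy_pairs = [(1.0, 1.5), (2.0, 2.0), (0.0, 0.5), (3.0, 4.0)]
toy_rmse, toy_r2 = rmse_and_r2(toy_pairs)
print("Toy RMSE = %s" % toy_rmse)
print("Toy R-sqr = %s" % toy_r2)
```

RMSE is unchanged if the two elements of each pair are swapped, but R² is not, because the total sum of squares is taken over the observations only.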
These results could likely be improved further with cross-validation and hyperparameter tuning.
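As a rough illustration of that idea, the sketch below runs a k-fold cross-validated grid search in plain Python. It is not Spark code: a tiny nearest-neighbour regressor stands in for the random forest, the rows are made up, and every function name is hypothetical. For the forest itself, the grid would instead range over parameters such as `numTrees` and `maxDepth`.

```python
import random


def k_fold_indices(n_rows, n_folds, seed=42):
    """Shuffle the row indices once and deal them into n_folds disjoint folds."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    return [indices[i::n_folds] for i in range(n_folds)]


def cross_validated_rmse(rows, fit, n_folds=3):
    """Average validation RMSE of `fit` across the folds.

    `fit(train_rows)` must return a function mapping a feature value to a prediction.
    """
    fold_rmses = []
    for held_out in k_fold_indices(len(rows), n_folds):
        held_out_set = set(held_out)
        train = [rows[i] for i in range(len(rows)) if i not in held_out_set]
        valid = [rows[i] for i in held_out]
        model = fit(train)
        mse = sum((model(x) - y) ** 2 for x, y in valid) / len(valid)
        fold_rmses.append(mse ** 0.5)
    return sum(fold_rmses) / len(fold_rmses)


def make_knn_fit(k):
    """Stand-in model: average the targets of the k nearest training rows."""
    def fit(train_rows):
        def predict(x):
            nearest = sorted(train_rows, key=lambda row: abs(row[0] - x))[:k]
            return sum(y for _, y in nearest) / len(nearest)
        return predict
    return fit


# Made-up (feature, tip) rows, for illustration only
rows = [(float(f), 0.2 * f) for f in range(5, 35)]
grid = [1, 3, 9]  # candidate values of the stand-in hyperparameter k
scores = {k: cross_validated_rmse(rows, make_knn_fit(k)) for k in grid}
best_k = min(scores, key=scores.get)
print("Cross-validated RMSE per candidate:", scores)
print("Best candidate:", best_k)
```

Each candidate is scored only on rows it was not trained on, so the selected value is less likely to be overfitted to a single train/test split.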
