Module 4: How can programming be used to build intelligence in computers?


Module 4 – High School

This covers the ways different types of programming can build intelligence in computers, including the types of mathematical models used in machine learning. In this module, you will use a notebook with shell (starter) code to run machine learning pipelines for the Mars Rover.

This module will cover the basics of classical machine learning and deep learning in the context of programming intelligent systems in Python. You will learn about the necessary concepts to understand the basics of machine learning, as well as a host of preprocessing steps, data collection steps and frameworks that will aid you in training and building intelligent models whose applications are extremely varied. Finally and most importantly, you will begin to think about how to plan to use machine learning models in the context of rovers and understand how these algorithms can be applied to a wide variety of tasks that will help rovers traverse through Lunar or Martian terrain.

Do keep in mind, however, that machine learning and its associated algorithms are much more complex than presented in this module. At the same time, you only need to understand a few basic concepts before you can freely import an algorithm and train it on a dataset.

Part 1 – What is Machine Learning?

Part 2 – Standard Machine Learning Pipeline: Linear Regression

Part 3 – Deep Learning

Part 1 – What is Machine Learning?

Intelligence and Computers

What is Machine Learning?

Machine learning is a field of study whose definition is still widely disputed. At its core, however, it is a term that denotes the creation of algorithms that are given the ability to learn patterns without being explicitly programmed. It relies heavily on mathematics, namely optimization, statistics and linear algebra.

To understand machine learning better, suppose that there is a task T that needs to be performed, with a performance measure P and an experience E. An algorithm in machine learning is said to learn from the experience E with respect to the task T if its performance on T, as measured by P, improves with experience E.

Machine learning should be implemented under the following conditions:

Condition 1. The problem is appropriate for prediction; that is, there is the possibility of having meaningful data from which patterns can be derived.

Condition 2. You have sufficient data.

Condition 3. There is no simpler way of solving your problem.

Remember: When trying to solve a task, it is always preferred to start with a basic solution. Do not try to solve a problem using the most complicated method if it can be solved with a simpler procedure.

Supervised vs. Unsupervised Learning

Machine learning problems can be split into two broad types. Supervised learning involves the model (algorithm) being given the “right” answers, known as labels, classes or dependent variables.

In a dataset, the labels (commonly denoted by the variable Y) are the values that the model will learn to predict or classify. In supervised learning, having the labels present and paired with the features means that the goal of the problem is to correctly predict or classify the values of these labels based on the features.

Note: The features of a dataset are also known as independent variables (commonly associated with the variable X). These are the input variables that are used to make predictions as they describe the measurable characteristics of the data. You will understand the differences between labels and features later during the data preprocessing and data split steps.

Unsupervised learning algorithms do not have any labels from which to learn. Instead of predicting values based on a vector of labels, an unsupervised learning algorithm must find patterns in the data on its own, for example via clustering, where similar data points are grouped together. Dimensionality reduction techniques, such as principal component analysis (PCA), are also often employed.
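As an illustration, here is a minimal clustering sketch using scikit-learn's KMeans; the toy points and the choice of two clusters are assumptions made purely for demonstration.

import numpy as np
from sklearn.cluster import KMeans

# Four unlabeled 2-D points: two near (1, 1) and two near (8, 8).
points = np.array([[1.0, 1.0], [1.2, 0.9], [8.0, 8.0], [8.1, 7.9]])

# Ask K-means to find two clusters; note that no labels are ever provided.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(points)
print(kmeans.labels_)  # e.g., [0 0 1 1] - similar points end up grouped together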

Note that there do exist other categories of learning algorithms outside of these two, primarily reinforcement learning algorithms and semi-supervised learning algorithms, with the latter sometimes being used for specific types of generative adversarial networks (GANs). You do not need to know any further details regarding these learning algorithms.

Image source: https://commons.wikimedia.org/wiki/File:Machin_learning.png

Machine Learning Algorithms

A machine learning algorithm, also known as an estimator or model, is a set of mathematical and statistical techniques that are used to learn patterns from data and make predictions based on this learning procedure. There exists such a wide assortment of algorithms that each one could be its own course. Examples of machine learning algorithms include the following:

  • Linear regression.
  • Logistic regression.
  • Naive Bayes.
  • Support vector machine.
  • Random forests.
  • K-means clustering.
  • Nearest neighbors.
  • Neural networks.

For the purposes of this module, we will focus on linear regression, logistic regression and neural networks.

Data Normalization and Standardization

Also known as feature scaling, data normalization and standardization are a crucial component of any machine learning pipeline. Normalization involves having all features fall within the same value range, which helps ensure that model training eventually converges. Failure to perform normalization means that there can be a huge discrepancy in feature values, which can make optimization algorithms such as gradient descent (more on this later) converge much more slowly or not at all. Specifically, normalization limits all values to some value between 0 and 1, while standardization scales the data such that each feature has a mean of 0 and a standard deviation of 1, which can be useful when you want the data to be centered around zero.

Image source: https://www.someka.net/blog/how-to-normalize-data-in-excel/

You will not need to manually normalize the features, as scikit-learn has the built-in MinMaxScaler class that will perform this process for you. Normalization is performed using the following formula:

norm = (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0))

where X.min(axis=0) refers to the minimum value found within each feature (column) and X.max(axis=0) corresponds to the maximum value found for that feature.

Scikit-learn also has the built-in StandardScaler class that will perform standardization. We perform standardization using the following formula:

standardized = (Xi - mean) / (standard deviation)

where Xi denotes the value of the feature at some training example i, and the mean and standard deviation are computed per feature.

Ultimately, when referring to these procedures, normalization is mentioned significantly more often, as it preserves the relationships between features while binding them to a fixed range between 0 and 1.
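To make the difference concrete, here is a minimal sketch comparing the two scalers on a tiny, made-up feature matrix (the values are assumptions for illustration only):

import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler

# Two features on very different scales.
X = np.array([[1.0, 200.0],
              [2.0, 400.0],
              [3.0, 600.0]])

print(MinMaxScaler().fit_transform(X))   # each column rescaled to the range [0, 1]
print(StandardScaler().fit_transform(X)) # each column rescaled to mean 0, std 1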

Training vs. Validation vs. Test Set

All machine learning pipelines involve a data split. What is meant by a data split here? First, think about how a machine learning model would learn. Do you think that it would be wise to have the algorithm look at the entire data, train on it and then predict new values based on the data that it just saw?

We will not delve into the details of why the above procedure is completely wrong and should be avoided, but do know that all algorithms need to have some part of the data that they will not see until after training is completed.

To aid in your understanding before delving into the training, validation and test sets, imagine the following scenario:

You are studying for an exam and decided that the best approach is to simply memorize every single concept word by word. You are able to recite the entire material and are confident that your approach reflects a complete understanding of the material. Upon taking the exam, however, you notice that some questions approach certain concepts in a manner that is different from what you memorized. You will likely be confused and unable to properly answer questions that cover these concepts in a way that you have never seen before.

The above scenario exemplifies the necessity of splitting your data. Take note of the following comments:

  • The training set refers to the part of the dataset on which the model will train. This is the data that the model will see during training.
  • The validation set refers to a part of the dataset that is not seen by the model during the training. Rather, the goal is to test the model on this data to see how well it performs with unseen data.
  • The test set functions similarly to the validation set, but it is crucially a part of the dataset that is never actually used during the development of the algorithm but only after it has been optimized to the validation set. The goal here is to have a test set that you did not examine at all during the development of your model, as the algorithm will have already been optimized to the validation set. Please note that we will not be dealing with the test set in this module. As such, you should only focus on the training and validation sets.
NOTE: The test set is an absolutely crucial component of the machine learning pipeline. Later modules will teach you everything you need to know about handling test sets and using them to evaluate the generalization capabilities of your models on purely unseen data.
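As a sketch of how such a three-way split could be performed (assuming a feature matrix X and a label vector y already exist; the 60/20/20 proportions are an illustrative choice):

from sklearn.model_selection import train_test_split

# First hold out 20% of the data as the test set.
X_trainval, X_test, y_trainval, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Then carve a validation set out of the remaining 80% (0.25 * 80% = 20% overall).
X_train, X_val, y_train, y_val = train_test_split(X_trainval, y_trainval, test_size=0.25, random_state=42)

# Result: 60% training, 20% validation, 20% test.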


Linear Regression

Linear regression is often one of the first algorithms to which beginners are exposed. It is a supervised learning algorithm that involves the prediction of continuous (non-discrete) valued outputs; that is, the model will attempt to create a line of best fit that will fit the actual values (labels) as closely as possible.

All learning algorithms, including linear regression, use a hypothesis function to make predictions. The values of these predictions then form part of the cost function (more on this concept later). The algorithm will use parameters (learnable weights) to make these predictions. The goal is to learn these parameters such that the predictions match the label values as closely as possible.

Note that “continuous valued outputs” here refers to the idea that the values that are predicted are not bound to a specific range of values as you will see with logistic regression.

The details of how linear regression employs its hypothesis and cost functions to learn parameters via an optimization algorithm known as gradient descent (or some other type of optimization algorithm) are not something that you need to know for the purposes of this module. Do know, however, that the appropriate selection of optimization algorithms will be an important step when you begin building and training your models. Imagine these as a form of aid for a blindfolded man stranded at the top of a mountain who seeks to descend. Logically, the man would take the path of steepest descent (i.e., the direction in which the slope, or gradient, of the path decreases by the largest amount).

Linear Regression Hypothesis Function:
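The formula image linked below shows the univariate case; for reference, the standard form (with parameters \theta_0 and \theta_1, and m training examples) is:

h_\theta(x) = \theta_0 + \theta_1 x

with the corresponding mean-squared-error cost function

J(\theta_0, \theta_1) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2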

Image source: https://www.humanunsupervised.com/post/regression-univariate-cost-function-hypothesis-gradient-descent

Image source: https://commons.wikimedia.org/wiki/File:Linear_regression_of_airfare_on_distance_to_destination.png

Logistic Regression

Logistic regression is a supervised learning algorithm that outputs discrete values. This algorithm is used for classification problems. For example, if you want to predict whether a house is safe or unsafe based on a matrix of features, then you can have two labels. The label ‘0’ will be the “negative class” and denote “safe.” The label ‘1’ will be the “positive class” and denote “unsafe.” These labels are called classes in a classification problem. Note that in a regression problem, there is no such concept of classes because the predictions are not discrete. Furthermore, a classification problem can have more than two classes, but multiclass classification is not something that you need to know at this moment.

For the example above, if the algorithm predicts “0.7” for a specific example, this means that the house has been predicted to have a 70% chance of being unsafe.
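These probabilities come from the logistic (sigmoid) function, which squashes a linear combination of the features into the range (0, 1):

h_\theta(x) = \frac{1}{1 + e^{-\theta^{T} x}}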

Image source: https://blog.gopenai.com/linear-and-logistic-regression-same-regression-but-different-purpose-f6ff5f93b7ef

Loss and Cost (Loss) Functions

At its core, loss refers to how far apart the predictions and actual values are from each other. It is a measure that varies widely between learning algorithms as each algorithm has a different hypothesis function and approach towards predicting values that may be continuous or discrete.

Cost functions (also known as loss functions) drive the learning process of a model: with each iteration of learning, the algorithm seeks to choose parameter values such that the cost function is minimized. This is achieved through a series of mathematical procedures in which the derivative of the cost function, multiplied by the learning rate, is subtracted from the current parameter values until a minimum is reached. Note that the learning rate controls how big of a step we take in changing the values of the parameters during this process.

The details of how these cost functions work are much more intricate than described above and are beyond the scope of this module.

Remember: The most important goal here is that the loss must be MINIMIZED.
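For the curious, here is a minimal sketch of gradient descent for univariate linear regression; the toy data, learning rate and iteration count are all illustrative assumptions:

import numpy as np

# Toy data that roughly follows y = 2x.
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.0, 4.1, 6.0, 8.1])

theta0, theta1 = 0.0, 0.0   # parameters (learnable weights)
learning_rate = 0.01

for _ in range(5000):
    predictions = theta0 + theta1 * x             # hypothesis function
    errors = predictions - y
    # Step each parameter against its derivative of the cost to reduce the loss.
    theta0 -= learning_rate * errors.mean()
    theta1 -= learning_rate * (errors * x).mean()

print(theta0, theta1)  # theta1 should approach 2, the true slope of the data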

Image source: https://analyticsindiamag.com/how-is-gradient-descent-used-in-unsupervised-learning-problems/

Overfitting and Underfitting

While you will not need to necessarily worry about these two concepts in this module, it is important that you are aware that machine learning performance metrics are extremely intricate and complex. A low loss does not necessarily mean that the model is adept at generalizing the data and making adequate predictions.

Overfitting occurs when the model performs markedly better on the training set than on the unseen (validation set) data. A model that is overfitting is said to have high variance and low bias.

Underfitting occurs when the model performs poorly on both the training set and the unseen data. An underfitting model is said to have high bias and low variance.

Overfitting and underfitting are endemic problems in machine learning algorithms that will always need some form of regularization techniques or modifications to the architecture or dataset. Further details, however, are not necessary for this module.

Image source: https://commons.wikimedia.org/wiki/File:Early-Stopping_Graph.png

Part 2 – Standard Machine Learning Pipeline: Linear Regression

In the next several sections, you will be exposed to a full machine learning pipeline for linear regression, beginning with the dataset importation and ending with an interpretation of the loss values.

Use this workbook and the data sets by loading them into your Jupyter notebook. The Dropbox folder contains a download for the Jupyter notebook with the source code and the two data sets you will need for this exercise – cars.csv and heart.csv. Make sure you download all 3 files.

Step 1: Importing a Dataset

We will begin by importing our dataset and examining it as a DataFrame using the pandas library.

import pandas as pd
import sklearn

df = pd.read_csv('cars.csv')
df


     Make    Colour  Odometer (KM)  Doors  Price
0    Honda   White   35431          4      15323
1    BMW     Blue    192714         5      19943
2    Honda   White   84714          4      28343
3    Toyota  White   154365         4      13434
4    Nissan  Blue    181577         3      14043
...  ...     ...     ...            ...    ...
995  Toyota  Black   35820          4      32042
996  Nissan  White   155144         3      5716
997  Nissan  Blue    66604          4      31570
998  Honda   White   215883         4      4001
999  Toyota  Blue    248360         4      12732

1000 rows × 5 columns

Step 2: Data Preprocessing

For the data preprocessing stage, you will need to first check that there are no NaN values (missing data) in this dataset. It is incredibly important that you perform this step to avoid later issues with training.

df.isna().sum() # Any NaN values?

Make             0
Colour           0
Odometer (KM)    0
Doors            0
Price            0
dtype: int64

df.dtypes

Make             object
Colour           object
Odometer (KM)     int64
Doors             int64
Price             int64
dtype: object

Since there are no NaN values, we can proceed with encoding non-numerical values. Always remember that all data must be numerical so that it can be properly fed into the learning algorithm. Non-numerical data will NOT be accepted by the algorithms. We can one-hot encode the non-numerical columns in this dataset.

df_encoded = pd.get_dummies(df, columns = ['Make', 'Colour']) # One-hot encoding for the two non-numerical columns.
df_encoded


     Odometer (KM)  Doors  Price  Make_BMW  Make_Honda  Make_Nissan  Make_Toyota  Colour_Black  Colour_Blue  Colour_Green  Colour_Red  Colour_White
0    35431          4      15323  0         1           0            0            0             0            0             0           1
1    192714         5      19943  1         0           0            0            0             1            0             0           0
2    84714          4      28343  0         1           0            0            0             0            0             0           1
3    154365         4      13434  0         0           0            1            0             0            0             0           1
4    181577         3      14043  0         0           1            0            0             1            0             0           0
...
995  35820          4      32042  0         0           0            1            1             0            0             0           0
996  155144         3      5716   0         0           1            0            0             0            0             0           1
997  66604          4      31570  0         0           1            0            0             1            0             0           0
998  215883         4      4001   0         1           0            0            0             0            0             0           1
999  248360         4      12732  0         0           0            1            0             1            0             0           0

1000 rows × 12 columns

Now, we can proceed with normalizing the data. Use the MinMaxScaler class to do so. Examine the cell below to see how this class is imported and initialized.

from sklearn.preprocessing import StandardScaler # Available if you prefer standardization instead.

from sklearn.preprocessing import MinMaxScaler

scaler = MinMaxScaler()

Before proceeding with normalization, we must create the features and labels matrices. These two must be separate, as the model will be learning from the features and seeking to predict the labels. Note that in our scenario, we want to predict the price of a car based on its features. As such, the ‘Price’ column is the label column.

X = df_encoded.drop(columns = 'Price') # Feature matrix.

y = df_encoded['Price'] # Label matrix.

X


     Odometer (KM)  Doors  Make_BMW  Make_Honda  Make_Nissan  Make_Toyota  Colour_Black  Colour_Blue  Colour_Green  Colour_Red  Colour_White
0    35431          4      0         1           0            0            0             0            0             0           1
1    192714         5      1         0           0            0            0             1            0             0           0
2    84714          4      0         1           0            0            0             0            0             0           1
3    154365         4      0         0           0            1            0             0            0             0           1
4    181577         3      0         0           1            0            0             1            0             0           0
...
995  35820          4      0         0           0            1            1             0            0             0           0
996  155144         3      0         0           1            0            0             0            0             0           1
997  66604          4      0         0           1            0            0             1            0             0           0
998  215883         4      0         1           0            0            0             0            0             0           1
999  248360         4      0         0           0            1            0             1            0             0           0

1000 rows × 11 columns

y

0      15323
1      19943
2      28343
3      13434
4      14043
       ...
995    32042
996     5716
997    31570
998     4001
999    12732
Name: Price, Length: 1000, dtype: int64


Now, normalize the features matrix by using the MinMaxScaler class that was previously initialized. Use the .fit_transform() method, which will compute the minimum and maximum of each feature and apply the normalization to the features all at once.

X_normalized = scaler.fit_transform(X)

Step 3: Data Split

You are now ready to split the data into the training and validation sets. Use scikit-learn’s train_test_split() method to perform this data split. You will need to input the feature matrix and label matrix and specify the test_size parameter, which denotes the proportion of the data that will be held out. For example, a value of 0.2 means that 80% of the data will be allocated to the training set and 20% will be allocated to the validation set.

Note that X_train and y_train denote the features and the labels for the training set, respectively. These are the features and labels that the model will see during training and from which it will learn. X_test and y_test denote the unseen (validation set) features and labels that will be used to test the model on unseen data.

Specify the seed to have the same results each time using random_state = 42.

IMPORTANT: The train_test_split() function returns the split data in the order seen below. Ensure that you declare the correct variables in the correct order!

from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X_normalized, y, test_size = 0.2, random_state = 42)


Step 4: Algorithm Training

Now, fit (train) the model on the training set. Scikit-learn will automate the learning process for you.

You will need to initialize the linear regression model by first importing it and declaring it as a variable.

X_train.shape
(800, 11)

y_train.shape
(800,)

from sklearn.linear_model import LinearRegression

from sklearn.ensemble import RandomForestRegressor # Optional: another model you could experiment with later.

model = LinearRegression()

model.fit(X_train, y_train)

LinearRegression()



Step 5: Interpreting Results

Congratulations! You have likely trained your very first machine learning model. Now, it is time to test your model’s performance by using it with the validation set, which once again represents unseen data.

This section requires an important note regarding loss functions. The most common cost (loss) function used in regression problems is MSE, or mean squared error. It is the average of the squared differences between the predicted and actual values. You can import the MSE function using scikit-learn. For this problem, however, the wide range of prices means that a scale-independent metric such as R-squared may be more appropriate. As such, we will import the r2_score function.

This metric will denote how well your model performed, as the predicted values will be compared against the target (label) values. You will need to first run the model on the validation set using .predict(), which takes in X_test. Store these predictions in the y_pred variable. Afterwards, use the r2_score() function to compare the y_pred predictions to the actual values in y_test.

from sklearn.metrics import mean_squared_error

y_pred = model.predict(X_test)

from sklearn.metrics import r2_score

# Calculate evaluation metrics

r2 = r2_score(y_test, y_pred)

print("R-squared:", r2)

R-squared: 0.4280780088798347


Note that the above results are rather poor. The R-squared value above says that approximately 42.8% of the variance in the target variable is explained by the independent variables in the regression model. You should not be alarmed, but do know that simple models such as linear regression are often not appropriate for small datasets and may often perform poorly. It is always advised that you try out different models to see which ones are the best fit for your dataset.

Furthermore, model optimization and performance are also tied to the size of the dataset and the overall relationship between the features and the target variable. If there is insufficient data, models such as linear regression will often perform poorly, as in this case.

Question: Do you think a simple model such as linear or logistic regression could be used in the context of a rover on Mars or the Moon? Imagine that you want an algorithm to predict whether the rover should move in a particular direction or not based on a collection of features that describe the landscape (e.g., elevation, slope, obstacle density). Would these models be optimal for this task?

Part 3 – Deep Learning

An indispensable subset of machine learning is deep learning, which involves a host of new algorithms and mathematical functions yet retains the core concepts of classical machine learning such as cost functions, training and data preprocessing.

Up until now, you have been thinking about machine learning primarily in the context of feature selection and extraction, whereby you as the engineer carefully select the most relevant and important features on which to train a particular model for the task at hand.

The above concept continues to apply when considering artificial neural networks (ANNs) or multi-layer perceptrons (MLPs) as described below. At the same time, however, we ultimately will want to make use of convolutional neural networks (CNNs), which will automatically extract the best features from input regions.

We will not be delving into the mathematical details of how these architectures function and train, but you will need to have a basic understanding of their structure and abilities. Your task will be primarily concerned with the use of a deep neural network for training on tasks that could be useful for a rover. This process, however, will be covered in a later module.

 

Neural Networks

Although not entirely accurate, you may think of neural networks as something akin to the neurons found in our brain. When activated, these neurons “fire up” and pass on their information to the neurons in the next layer.

Put simply, a neural network is a mathematical function whose ultimate goal is to take in input and pass the results from one layer of neurons to the next until the output layer provides the corresponding prediction. To update their parameters, which are now called weights, neural networks make use of backpropagation, a mathematical process primarily involving the chain rule of calculus to adjust the internal weights of the network by propagating the gradients backward. This way, the neural network can adjust its weights such that the difference between the targets and predictions is minimized.

As pictured below, neural networks consist of a series of layers. The first layer corresponds to the input layer; these are considered to be the features of the model. Any layer between the input and output layers is called a hidden layer. As implied, the last layer is the output layer and where a unit or multiple units will make a prediction.

A very basic neural network, with two units in the input layer, five units in the hidden layer and one output unit. Based on the given context, what do you think can be modified here to make this architecture more “complex” and adept at helping a rover carry out its tasks?
 

Deep Neural Networks

Deep neural networks (DNNs) are not considered to be separate from “neural networks.” In fact, this term simply refers to neural networks with multiple hidden layers. With multiple layers, these networks will have many more weights with which to work and can therefore capture patterns more effectively than a simple neural network with a single hidden layer. Up until now, we have been discussing fully-connected neural networks, which means that every single neuron (unit) in a given layer has a connection (weight) with every single neuron in the subsequent layer. You can imagine that having a tremendous amount of fully-connected layers would lead to an explosive increase in the number of parameters (weights) in the network. As you have previously learned, increased model complexity can lead to overfitting.

As such, we will focus on a particular type of deep neural network that makes use of sparse connections and a unique way of capturing spatial features from pixel data and other structures. Such neural networks are called convolutional neural networks (CNNs). These will form the backbone of activities with which you will be working in future modules, as well as the final model that you will be developing and training for a rover. While the mathematical details of CNNs are beyond the scope of this activity, do know that CNNs make use of filters to detect edges in a particular region. This ability makes them formidable architectures that can be applied to a wide range of tasks, including but not limited to the following:

  • Image classification, whereby an image is assigned a particular label.
  • Object detection, whereby multiple objects in a given input image can be detected and often highlighted with bounding boxes.
  • Object localization, whereby a single object in a given input image is detected and highlighted with a bounding box.
  • Facial recognition.
  • Gesture recognition.
  • Audio classification.
  • Medical imaging.
Question: Based on the above applications of CNNs, which of the tasks could be most useful for a rover? What would a CNN be doing in a particular scenario where a rover is traversing through difficult terrain?

Take a look at the above image and notice the bounding boxes. Why could deep learning be useful from this perspective? What do you think are some of the biggest challenges that would arise when training a deep learning model for such a task?

Credits: NASA/JPL-Caltech

 

TensorFlow

TensorFlow is an open-source machine learning framework that was developed by Google. With this framework, you have access to an assortment of tools that facilitate the machine learning pipeline. Most importantly, TensorFlow supports both classical machine learning and deep learning models, which makes mastering the basics of this framework of paramount importance to be able to apply your ideas and deploy them. Much of the machine learning process and mathematical calculations are performed by the framework, which means that you could build and train a model rather quickly depending on the dataset and task at hand.

You will be exposed to the basics of this framework in this module. It is important that you understand how to properly make use of TensorFlow for building a basic neural network, for this intuition will be indispensable when you build a more versatile model and handle a custom dataset in the next module.

 

TensorFlow Pipeline

 

The information below provides a general description of the steps that are necessary to successfully build, train and evaluate your first neural network using TensorFlow. Do keep in mind that the procedure is far more nuanced than at first glance. You will be exposed to the intricacies of this library in the context of model training in a later module.

Data Preprocessing

Similarly to the classical machine learning approach, you must first load and preprocess the corresponding dataset. Remember to consider procedures such as normalization as well as the proper data splits into a training, validation and test set.

Model Initialization

In this stage, you will usually make use of Keras, which is a high-level TensorFlow API with which you can construct your neural network. You will need to choose the number of layers, whether the layer is a dense (fully-connected) or convolutional layer, activation functions and general hyperparameters. Details on these layers are to be covered in the next module.

When constructing your neural network, remember that the concepts of overfitting and underfitting still apply.

Model Compilation

In this stage, you must initialize the cost function, optimizer and metrics. You will use model.compile() to subsequently compile your model.

Training

Using the .fit() function, you will train the model on the training data with a specified number of epochs and the batch size.

An epoch is one iteration over the entire training set, while the batch size corresponds to the number of samples per gradient update.

TensorFlow will handle the backpropagation process automatically for you. You will not need to worry about manually computing and updating gradients.

Evaluation

Using the .evaluate() function, you will need to evaluate your trained model on the validation set. You previously specified evaluation metrics. It is your responsibility to assess the model’s generalization capabilities.

Predictions and Deployment

For the last general stage, you will want to make use of a test set to evaluate the model’s generalization on purely unseen data. Although not currently applicable to this module, the model will need to eventually be deployed to its corresponding environment.
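Putting the stages above together, here is a minimal end-to-end sketch using Keras; the synthetic data, layer sizes and hyperparameters are illustrative assumptions rather than recommendations:

import numpy as np
import tensorflow as tf

# Data preprocessing: stand-ins for a normalized feature matrix and continuous labels.
X = np.random.rand(1000, 11).astype("float32")
y = np.random.rand(1000).astype("float32")

# Model initialization: one hidden dense layer and one regression output unit.
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(11,)),
    tf.keras.layers.Dense(32, activation="relu"),
    tf.keras.layers.Dense(1)
])

# Model compilation: cost function, optimizer and metrics.
model.compile(optimizer="adam", loss="mse", metrics=["mae"])

# Training: 5 epochs, batches of 32, with 20% of the data held out for validation.
model.fit(X, y, epochs=5, batch_size=32, validation_split=0.2)

# Evaluation and predictions.
loss, mae = model.evaluate(X, y)
predictions = model.predict(X[:5])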

 

Machine Learning and Rovers

After learning about the great variety of machine learning and deep learning paradigms, it is now time to think of these in the context of rovers.

Suppose that a rover is seeking to traverse the rocky terrain of Mars. As you know, the red planet has a rather rugged landscape, with certain points of elevation accompanied by plains. Autonomous navigation capabilities therefore become extremely important to ensure that the rover can move along an optimal path that avoids physical hazards, potential damage and extreme inefficiency in movement.

Thankfully, machine learning can certainly be applied to a wide range of scenarios on the red planet. You may be wondering how machine learning algorithms could play a role in aiding rovers when they are quite far away from us. The answer lies in gathering sufficient quality, relevant data for the task at hand. With sufficient data and an appropriate algorithm with a clearly defined task and metrics, you can certainly deploy a model with tremendous capabilities for generalization. For example, imagine a rover traversing the rugged terrain of Mars, navigating through rocky landscapes and sandy plains. This autonomous navigation is crucial for ensuring the rover’s safety and efficiency in its exploration mission. Here, machine learning algorithms come into play, enabling the rover to consider its surroundings such that it may effectively optimize its path and avoid major obstacles. Using sufficient and relevant data, the rover would be able to adapt to more scenarios than a rules-based algorithm, which could only handle a narrow set of scenarios.

You may notice that deep learning is not necessarily the exclusive option for the above scenario. In fact, you could employ classical machine learning algorithms as previously described by selecting features by yourself. At the same time, however, it is important to take into account that the rover is also supported by components such as cameras and sensors, which provide different modalities that cannot be effectively tackled through classical machine learning models. Consequently, deep learning models will be indispensable for automatically selecting the best features from input images, video and audio such that both high-level and low-level features can lead to the optimization of weights. In the end, you will be able to experiment with and ultimately train a full deep learning model that could help the rover analyze images, identify geographical obstacles and classify terrain from data gathered by its sensors.

Below are activities that will aid you in mastering the basics of the machine learning pipeline with Python. As you go through these activities and considering the context given above, think about what potential challenges rovers could encounter while navigating through terrain. Can you think of specific tasks that would be feasible given sufficient data and an appropriate model? Consider whether some of these tasks would be regression or classification problems (or perhaps neither).

 

 

Activity 1: Logistic Regression

Using the notebook provided in the Dropbox link, copy and paste the cells into a new notebook. Name it with your team name followed by _Activity 1_LR.

Now that you have successfully examined an entire machine learning pipeline, it is your turn to apply these concepts to a basic model before moving on to the deep learning pipeline. For scikit-learn, most of the steps that were shown above are standard procedure for any machine learning algorithm. As such, your task is to now train a logistic regression classifier following the above steps. Make sure that you refer to the preliminary information regarding logistic regression before splitting the dataset into the features and labels matrices. You are highly encouraged to add additional cells to divide the code such that it is cleaner and readable.

Keep a note of the following things:

  • The cost function for logistic regression is the logarithmic loss function.

  • For the predicted values during testing, use y_pred_proba = clf.predict_proba(X_test), as logistic regression requires that you predict the probabilities for each class. (In this case, we have two classes: 0 and 1).

  • For the logarithmic loss, use logloss = log_loss(y_test, y_pred_proba). Import log_loss first using from sklearn.metrics import log_loss. A minimal sketch of these steps appears below.
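Here is a minimal sketch of the training and testing steps, assuming you have already preprocessed the heart.csv data and split it into X_train, X_test, y_train and y_test:

from sklearn.linear_model import LogisticRegression
from sklearn.metrics import log_loss

clf = LogisticRegression()
clf.fit(X_train, y_train)                 # Train on the training set.

y_pred_proba = clf.predict_proba(X_test)  # Probabilities for classes 0 and 1.
logloss = log_loss(y_test, y_pred_proba)  # Logarithmic loss on the validation set.
print("Log loss:", logloss)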

 

1. Importing the Dataset

Import the heart.csv dataset in the cell below. Add additional cells for yourself to examine the contents of the dataset and determine which column would correspond to the labels.

 

2. Data Preprocessing

Check for any NaN values in the dataset. If there are none, then you can simply proceed by initializing the StandardScaler class and creating the features and labels matrices. Make sure that any non-numerical columns are one-hot encoded before proceeding. Then, fit and transform the features matrix using the StandardScaler class, or normalize it using MinMaxScaler to see the difference between the two.

 
 

3. Data Split, Training and Testing

Split the data and train the model. Then, test the model on the validation set.

 

Activity 2: Training a Neural Network to Recognize Handwritten Digits

In this activity, you will use TensorFlow to train your very own neural network for a multiclass classification task where the model will be tasked with recognizing a numerical digit based on an image input. The purpose of this activity is to introduce you to an extremely basic scenario of image recognition that can serve as a building block for understanding how deep learning techniques could aid a rover in recognizing its own surroundings through image input. In computer vision, image recognition is one of the most fundamental tasks. As you complete this activity, consider how the neural network that you trained could become more complex (i.e., have more layers and consequently more parameters) such that images of the surrounding Martian or Lunar landscape could be “recognized” based on the features of that landscape.

Here is a video to help you understand what to do:

https://www.youtube.com/watch?v=tfMtrlKjMfE

https://www.tensorflow.org/tutorials/quickstart/beginner

  1. As the first step, import the necessary libraries and other dependencies below.

Note: Use "from tensorflow.keras.datasets import mnist" to import the MNIST dataset, which is the dataset of handwritten digit images with which you will be working.

  2. Load the MNIST dataset (including the validation set) and examine the shape of the training examples. What are the dimensions of the images?

  3. Normalize the pixel values of the images. Refer back to previous sections if you need to remember why this step is indispensable.

  4. Initialize the model using TensorFlow with one hidden layer that uses ReLU as the activation function and an output layer that uses softmax. Remember that softmax is needed when performing multiclass classification; that is, we need such an activation function for scenarios where we have more than two labels present.

  5. Compile the model; that is, define a proper cost function (sometimes known as the criterion), optimization algorithm (also known as the optimizer) and the appropriate metrics to measure training and validation performance.

Hint: Think about what you learned regarding regression and classification tasks. What metrics would be appropriate for a task where you are predicting discrete values rather than continuous values?

  6. Train and evaluate the model. What performance do you observe on the training set versus the validation set? Is the model overfitting, underfitting or exhibiting neither of these two behaviors? If overfitting or underfitting, what are some possible reasons in the context of the neural network that you built and trained? A possible solution sketch follows below.
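If you get stuck, here is one possible solution sketch following the six steps above; the hidden-layer size, optimizer and epoch count are illustrative choices, not the only valid ones:

import tensorflow as tf
from tensorflow.keras.datasets import mnist

# Steps 1-2: load the data; each image is a 28x28 grayscale digit.
(x_train, y_train), (x_test, y_test) = mnist.load_data()

# Step 3: normalize pixel values from [0, 255] to [0, 1].
x_train, x_test = x_train / 255.0, x_test / 255.0

# Step 4: one hidden ReLU layer and a softmax output over the 10 digit classes.
model = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dense(10, activation="softmax")
])

# Step 5: cost function, optimizer and metrics.
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# Step 6: train, then evaluate on the unseen validation images.
model.fit(x_train, y_train, epochs=5)
model.evaluate(x_test, y_test)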

Upload your Jupyter notebooks to the Qualtrics link below.