Introduction To Machine Learning using Python
Machine learning is a type of artificial intelligence (AI) that gives computers the ability to learn from data without being explicitly programmed. It focuses on developing computer programs that can change their behavior when exposed to new data. In this article, we’ll cover the basics of machine learning and implement a simple machine learning algorithm in Python.
Python is a popular programming language for machine learning because it has a large number of powerful libraries and frameworks that make it easy to implement machine learning algorithms.
To get started with machine learning using Python, you will need to have a basic understanding of Python programming and some knowledge of mathematical concepts such as probability, statistics, and linear algebra.
There are several libraries and frameworks in Python that can be used for machine learning, including:
- scikit-learn: Provides a wide range of machine learning algorithms for both supervised and unsupervised learning. It is built on top of other libraries such as NumPy and SciPy.
- TensorFlow: An open-source machine learning framework developed by Google. It is widely used for deep learning and other complex machine learning tasks.
- Keras: A high-level neural networks API, written in Python and capable of running on top of TensorFlow, CNTK, or Theano.
- PyTorch: An open-source machine learning library for Python, based on the Torch library. It integrates tensor computation with dynamically built computation graphs.
- Theano: A numerical computation library for Python that allows you to efficiently define, optimize, and evaluate mathematical expressions involving multi-dimensional arrays.
- Pandas: A library that provides fast and flexible data structures and data analysis tools for the Python programming language.
To start with machine learning, a good way i
Setting up the environment
The Python community has developed many modules to help programmers implement machine learning. In this article, we will use the numpy, scipy and scikit-learn modules. We can install them from the command line:
pip install numpy scipy scikit-learn
A better option may be to install the Anaconda distribution, which comes with these packages prebundled, or Miniconda, a minimal installer to which they can be added with conda. Follow the instructions given here to use Anaconda.
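Whichever installation route you choose, a quick check from Python can confirm that the three packages import correctly. The snippet below is a minimal sketch; the version numbers it prints depend on your environment.

```python
# Confirm that the packages can be imported and report their versions.
import numpy
import scipy
import sklearn  # scikit-learn is imported under the name "sklearn"

versions = {
    "numpy": numpy.__version__,
    "scipy": scipy.__version__,
    "scikit-learn": sklearn.__version__,
}
for name, version in versions.items():
    print(name, version)
```

If any of the imports fails with a ModuleNotFoundError, that package is not installed in the interpreter you are running.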
Machine Learning overview
Machine learning involves training a computer on a given data set and then using that training to predict the properties of new data. For example, we can train a computer by feeding it 1000 images of cats and 1000 images that are not of cats, telling it each time whether the picture shows a cat. If we then show the computer a new image, it should be able to tell from this training whether the new image is a cat or not.
The process of training and prediction involves the use of specialized algorithms. We feed the training data to an algorithm, and the algorithm uses this training data to make predictions on new test data. One such algorithm is K-Nearest-Neighbor classification (KNN classification). It takes a test data point and finds the k nearest data values to it in the training data set. Then it selects the neighbor of maximum frequency and gives its properties as the prediction result. For example, if the training set is:
Now we want to predict the flower type for a petal of size 2.5 cm. If we choose the number of neighbors K = 3, the 3 nearest neighbors of 2.5 are 1, 2 and 3, and their frequencies in the training set are 2, 3 and 2 respectively. The neighbor of maximum frequency is therefore 2, and the flower type corresponding to it is b. So for a petal of size 2.5, the prediction will be flower type b.
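The idea can be sketched in a few lines of plain Python. The training pairs below are hypothetical, chosen only to match the frequencies mentioned above (the flower types assigned to sizes 1 and 3 are assumptions), and knn_predict is an illustrative helper, not a library function. The sketch uses the usual formulation of KNN, a majority vote among the K closest training samples, and it queries a size of 2.4 rather than 2.5 so that no candidate neighbors are tied in distance.

```python
from collections import Counter

# Hypothetical (petal size in cm, flower type) pairs; illustrative only.
training_set = [
    (1, "a"), (2, "b"), (1, "a"), (2, "b"),
    (3, "c"), (3, "c"), (2, "b"),
]

def knn_predict(samples, query, k=3):
    """Return the most common label among the k samples nearest to query."""
    # Sort samples by absolute distance between their size and the query.
    nearest = sorted(samples, key=lambda s: abs(s[0] - query))[:k]
    # Majority vote over the labels of those k samples.
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

prediction = knn_predict(training_set, 2.4, k=3)
print(prediction)
```

Here the three closest samples all have size 2, so the vote goes to type b.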
Implementing the KNN classification algorithm using Python on the Iris dataset
Here is a Python script which demonstrates the KNN classification algorithm. We use the famous iris flower dataset to train the computer, and then give it a new value to make a prediction about. The data set consists of 50 samples from each of three species of Iris (Iris setosa, Iris virginica and Iris versicolor). Four features are measured from each sample: the length and width of the sepals and petals, in centimeters.
We train our program using this dataset, and then use this training to predict the species of an iris flower with given measurements.
Note that this program might not run on the GeeksforGeeks IDE, but it can run easily on your local Python interpreter, provided you have installed the required libraries.
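The listing below is a sketch of such a script, reconstructed from the explanation of the program rather than taken from an original listing. The variable names iris_dataset, kn and x_new come from that explanation; the measurements stored in x_new and the random_state=0 seed are assumptions added for illustration and repeatability.

```python
from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split
import numpy as np

# Load the iris data: 150 samples, 4 features each, 3 species.
iris_dataset = load_iris()

# Split features (X) and targets (y) into training and test sets.
# The default split is 75:25; random_state=0 is an assumption that
# makes the shuffle repeatable.
X_train, X_test, y_train, y_test = train_test_split(
    iris_dataset["data"], iris_dataset["target"], random_state=0
)

# Create a K-nearest-neighbors classifier with k=1 and train it.
kn = KNeighborsClassifier(n_neighbors=1)
kn.fit(X_train, y_train)

# Assumed measurements of a new flower, in cm:
# sepal length, sepal width, petal length, petal width.
x_new = np.array([[5.0, 2.9, 1.0, 0.2]])
prediction = kn.predict(x_new)
test_score = kn.score(X_test, y_test)

print("Predicted target name:", prediction)
print("Predicted feature name:", iris_dataset["target_names"][prediction])
print("Test score: {:.2f}".format(test_score))
```

The exact test score depends on how the data happens to be split, so a different seed may give a slightly different figure.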
Predicted target name:
Predicted feature name: ['setosa']
Test score: 0.97
Explanation of the program:
Training the Dataset
- The first line imports the iris data set, which is already predefined in the sklearn module. The iris data set is basically a table which contains information about various varieties of iris flowers.
- We import the KNeighborsClassifier class and the train_test_split function from sklearn, and the numpy module, for use in this program.
- Then we store the result of the load_iris() function in the iris_dataset variable. Next we divide the dataset into training data and test data using the train_test_split function. The X prefix in a variable name denotes the feature values (e.g. petal length) and the y prefix denotes the target values (0 for setosa, 1 for versicolor and 2 for virginica).
- This function divides the dataset into training and test data randomly, in a ratio of 75:25 by default. Then we create a KNeighborsClassifier object in the kn variable with the number of neighbors set to k=1. This class implements the K Nearest Neighbor algorithm.
- In the next line, we fit our training data to this algorithm so that the computer can be trained using this data. Now the training part is complete.
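To see what the split produces, the sizes of the resulting arrays can be inspected. This is a small sketch; random_state=0 is an assumption used only so that the shuffle is repeatable.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

iris_dataset = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris_dataset["data"], iris_dataset["target"], random_state=0
)

# Each row is one flower; each of the 4 columns is one measurement.
print(X_train.shape, X_test.shape)
# The target arrays hold one integer label (0, 1 or 2) per flower.
print(y_train.shape, y_test.shape)
# Species names in label order.
print(iris_dataset["target_names"])
```

With 150 samples, the default split leaves 112 rows for training and 38 for testing.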
Testing the Dataset
- Now we have the dimensions of a new flower in a numpy array called x_new, and we want to predict the species of this flower. We do this using the predict method, which takes this array as input and returns the predicted target value as output.
- The predicted target value comes out to be 0, which stands for setosa. So this flower is most likely of the setosa species.
- Finally we find the test score, which is the ratio of the number of correct predictions to the total number of predictions made. We do this using the score method, which compares the actual values of the test set with the predicted values.
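The ratio described in the last step can also be computed by hand and compared with the score method. The sketch below retrains the same kind of classifier so that it runs on its own; random_state=0 is an assumption for repeatability.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

iris_dataset = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris_dataset["data"], iris_dataset["target"], random_state=0
)
kn = KNeighborsClassifier(n_neighbors=1).fit(X_train, y_train)

# Predict a label for every flower in the test set.
y_pred = kn.predict(X_test)

# Accuracy = correct predictions / total predictions.
manual_score = np.mean(y_pred == y_test)

# The classifier's score method reports the same ratio.
builtin_score = kn.score(X_test, y_test)
print(manual_score, builtin_score)
```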
There are several benefits to using Python for machine learning:
- Wide range of libraries and frameworks: Python has a large number of libraries and frameworks available for machine learning, such as scikit-learn, TensorFlow, and Keras, which make it easy to implement machine learning algorithms and speed up the development process.
- Easy to learn and use: Python has a simple and easy-to-learn syntax, which makes it accessible to a wide range of users, including those without a background in computer science or programming.
- Large community and resources: Python has a large and active community, which means there are many resources available for learning and troubleshooting, including tutorials, forums, and documentation.
- Versatility: Python can be used for a wide range of applications, including data analysis, web development, and scientific computing, making it a versatile tool for machine learning projects.
- High-performance computing: Python is widely used in scientific computing and data science, and libraries such as NumPy run their core numerical routines in compiled code, which makes Python suitable for handling large datasets and complex machine learning tasks.
Thus, we saw how machine learning works and developed a basic program to implement it using the scikit-learn module in Python.
This article is contributed by tkkhhaarree.