Machine learning is about extracting knowledge from data. It is a research field at the intersection of statistics, artificial intelligence, and computer science, and is sometimes referred to as predictive analytics or statistical learning. A machine learning system is trained rather than explicitly programmed: its behavior comes from the examples it is shown, so it can improve as more data becomes available. Machine learning is a subset of AI, which accounts for much of the hype around the topic. The key idea is to create models that learn from data and make predictions on new data.
Machine Learning Paradigm: Data + Answers = Rules (in classical programming, by contrast, Rules + Data = Answers)
Models are mathematical representations of a real-world process. They are built to extract knowledge from data in order to solve a specified problem.
Model = Algorithm(Data)
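As a rough illustration of Model = Algorithm(Data), the sketch below uses ordinary least squares as the algorithm and a small made-up table of study hours and exam scores as the data. The function name `fit_line` and all the numbers are assumptions chosen for this example; they do not come from any particular library or dataset.

```python
def fit_line(xs, ys):
    """Algorithm: ordinary least squares for a single input variable."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance(x, y) / variance(x)
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    intercept = mean_y - slope * mean_x

    def model(x):
        # The learned "rule": a straight line.
        return slope * x + intercept

    return model


# Data: hours studied and the matching exam scores (made-up values).
hours = [1, 2, 3, 4, 5]
scores = [52, 54, 56, 58, 60]

model = fit_line(hours, scores)  # Model = Algorithm(Data)
print(model(6))  # 62.0
```

Calling `fit_line` on different data returns a different model, which is the sense in which the rules are learned from examples rather than written by hand.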
Data is the key ‘ingredient’ in Machine Learning because it is what the model is trained on. Kaggle and Zindi are popular sources of datasets.
Types of Data
Data can be categorized into different types to serve different purposes:
- Numerical Data: Measurable quantities represented by numbers, such as weight or height.
- Time Series Data: Numerical data points recorded in order over time.
- Categorical Data: Data that falls into a limited number of groups, such as race, sex, or country. A categorical variable takes one of a limited set of values, assigning each individual to a group based on a qualitative property.
- Text Data: Consists of non-numerical data, which can be words, sentences, or paragraphs.
- Image Data: Consists of images that are either grayscale or color images.
Whatever its original form, data eventually has to be converted to numbers, because Machine Learning algorithms operate on numerical input.
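One way to picture this conversion is the minimal sketch below, which turns categorical values into one-hot vectors and short texts into word counts. The helper names `one_hot` and `bag_of_words` and the sample values are hypothetical, written only for this example; real projects usually rely on a library for this step.

```python
def one_hot(values):
    """Encode each categorical value as a 0/1 vector, one slot per category."""
    categories = sorted(set(values))
    vectors = [[1 if v == c else 0 for c in categories] for v in values]
    return categories, vectors


def bag_of_words(sentences):
    """Encode each sentence as word counts over a shared vocabulary."""
    tokenized = [s.lower().split() for s in sentences]
    vocab = sorted({word for words in tokenized for word in words})
    counts = [[words.count(w) for w in vocab] for words in tokenized]
    return vocab, counts


# Categorical data -> numbers
categories, encoded = one_hot(["Kenya", "Ghana", "Kenya"])
print(categories)  # ['Ghana', 'Kenya']
print(encoded)     # [[0, 1], [1, 0], [0, 1]]

# Text data -> numbers
vocab, counts = bag_of_words(["good movie", "bad movie", "good good plot"])
print(vocab)   # ['bad', 'good', 'movie', 'plot']
print(counts)  # [[0, 1, 1, 0], [1, 0, 1, 0], [0, 2, 0, 1]]
```

Image data needs less work in this respect, since a grayscale image is already a grid of pixel intensities.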
Seven Steps of Machine Learning
- Gathering Data – This is an important step in the Machine Learning life cycle. The practitioner should make sure that the data collected actually addresses the problem at hand, keeping in mind that data comes in different types.
- Preparing and visualizing the Data – This stage involves cleaning the data, handling anomalies and outliers, standardizing values, and visualizing the data to get insights. It also involves understanding whether the data is categorical, numerical, or text. A common library used for this is Pandas.
- Choosing a model – The selection of a model is important because different machine learning algorithms suit different types of data. For example, the Naive Bayes algorithm is a common choice for text classification.
- Training – This is the “learning” stage. The model tries to map the inputs to the outputs so that it can make accurate predictions.
- Evaluation – The model is checked on data it did not see during training, so that practitioners can tell whether expectations have been met. If not, the previous steps are revisited.
- Hyperparameter tuning – Hyperparameters are settings chosen before training rather than learned from the data. After evaluating the model’s performance on new data, it is often possible to improve accuracy by tuning them.
- Prediction – This is the final stage, where the model is applied to new data and its results are analyzed against the needs of the problem. Machine Learning can solve a wide range of problems, though not literally everything.
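The seven steps can be sketched end to end on a toy problem. The example below is only an illustration under stated assumptions: the (weight, height) measurements are made up, k-nearest neighbours is picked as the model purely because it is short to write, and the names `scale`, `predict`, and `accuracy` are hypothetical helpers. A real project would normally keep a separate test set for the final evaluation instead of reusing the held-out rows for tuning.

```python
from collections import Counter

# 1. Gather data: made-up (weight_kg, height_cm) -> species rows.
rows = [
    ((4.0, 25.0), "cat"), ((3.5, 23.0), "cat"), ((5.0, 26.0), "cat"),
    ((4.5, 24.0), "cat"), ((3.8, 22.0), "cat"),
    ((20.0, 55.0), "dog"), ((25.0, 60.0), "dog"), ((18.0, 50.0), "dog"),
    ((30.0, 65.0), "dog"), ((22.0, 58.0), "dog"),
]

# 2. Prepare: hold out every fifth row, then min-max scale each feature
#    using statistics from the training rows only.
held_out = rows[::5]
train = [row for i, row in enumerate(rows) if i % 5 != 0]
lows = [min(f[j] for f, _ in train) for j in range(2)]
highs = [max(f[j] for f, _ in train) for j in range(2)]


def scale(features):
    return tuple((x - lo) / (hi - lo) for x, lo, hi in zip(features, lows, highs))


# 3. Choose a model: k-nearest neighbours.
# 4. Train: for k-NN, training just means storing the scaled examples.
memory = [(scale(f), label) for f, label in train]


def predict(features, k):
    point = scale(features)
    # Sort stored examples by squared Euclidean distance to the new point.
    nearest = sorted(
        memory, key=lambda m: sum((a - b) ** 2 for a, b in zip(point, m[0]))
    )[:k]
    # Majority vote among the k nearest labels.
    return Counter(label for _, label in nearest).most_common(1)[0][0]


# 5. Evaluate: fraction of held-out rows predicted correctly.
def accuracy(k):
    hits = sum(predict(f, k) == label for f, label in held_out)
    return hits / len(held_out)


# 6. Hyperparameter tuning: k is a hyperparameter; keep the best candidate.
best_k = max((1, 3, 5), key=accuracy)

# 7. Predict: apply the tuned model to a new, unseen animal.
print(best_k, accuracy(best_k), predict((4.2, 24.0), best_k))  # 1 1.0 cat
```

On data this cleanly separated every candidate value of k scores the same, so tuning changes nothing here; on noisier data the candidates would usually differ.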
Types of Machine Learning
Machine learning is commonly divided into three categories:
- Supervised Learning – These algorithms are used whenever you want to predict a certain outcome from given input data and you have examples of input/output pairs. The goal of supervised learning is to approximate the mapping function from these examples, so that when we feed in new, never-before-seen data, the algorithm can make accurate predictions.
- Unsupervised Learning – This includes all kinds of Machine Learning problems where there is no known output. The learning algorithm is shown only the input data and has to extract knowledge from it. The main types of unsupervised learning are unsupervised transformations (for example PCA) and clustering algorithms (for example K-Means).
- Reinforcement Learning – An agent learns to make decisions by interacting with an environment. The setting is modeled mathematically as one where results are partly random and partly under the control of the decision-maker. It is an active process: the agent’s actions influence future states, and therefore the data it observes later.
Supervised and unsupervised learning, by contrast, are passive processes where learning is performed without any actions that could influence the data.
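To make the unsupervised case concrete, here is a minimal one-dimensional sketch of K-Means clustering. Unlike a supervised learner, it receives no labels at all, only the points. The function name `k_means_1d`, the sample points, and the choice to start the centers at the smallest and largest values are assumptions made to keep the example deterministic; K-Means is more commonly started from random centers.

```python
def k_means_1d(points, centers, rounds=10):
    """Alternate between assigning points to centers and re-centering."""
    centers = list(centers)
    clusters = []
    for _ in range(rounds):
        # Assignment step: each point joins its nearest center.
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: abs(p - centers[i]))
            clusters[nearest].append(p)
        # Update step: move each center to the mean of its cluster
        # (an empty cluster keeps its previous center).
        centers = [
            sum(c) / len(c) if c else centers[i] for i, c in enumerate(clusters)
        ]
    return centers, clusters


# Input data only; no answers are supplied.
points = [1.0, 1.5, 2.0, 9.0, 10.0, 11.0]
centers, clusters = k_means_1d(points, centers=[min(points), max(points)])
print(centers)   # [1.5, 10.0]
print(clusters)  # [[1.0, 1.5, 2.0], [9.0, 10.0, 11.0]]
```

The algorithm discovers the two groups on its own, but it cannot say what the groups mean; naming them is left to whoever interprets the result.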
Why is Machine Learning Used?
Data is being used in many areas to make ‘smart’ decisions. Machine learning makes it practical to build models from vast amounts of data, and those models can provide more accurate results. It is also possible to work with more complex data and deliver solutions faster. By building precise models, an organization can identify profitable opportunities or avoid unknown risks.
In general, three forces are driving advances in Machine Learning:
- Hardware – Off-the-shelf CPUs have become faster, and GPUs are widely used to speed up training.
- Datasets – Data is readily available from sites like Kaggle and Zindi. Web scraping is another technique used to collect data.
- Algorithmic advances – Algorithms are being improved for better gradient propagation, which makes deeper neural networks easier to train.
Applications Of Machine Learning
- Image Recognition – It is used to identify objects, people, and places in digital images.
- Speech Recognition – The process of converting voice to text.
- Self-Driving Cars
- Medical diagnosis – Machine learning is used to support medical diagnosis, for example in systems that identify tumors.
- Automatic Language Translation – It is possible to translate one language to another in the form of text or audio.
That’s all for this introduction to Machine Learning and why it is so important these days. It is attracting a new wave of investment and is expected to keep improving in the coming years. Stay tuned to Code Underscored to learn more about Machine Learning.