Monday 12 February 2024

Types of Machine Learning: Supervised, Unsupervised, and Reinforcement

There are three main types of machine learning, each with unique strengths and applications:

1. Supervised Learning: This type learns from labeled data, where each data point has a corresponding label indicating its category or value.

Case Study: Predicting credit card fraud. Historical transaction data is labeled as fraudulent or legitimate. The model learns to identify patterns in transactions that indicate fraud risk, helping banks prevent losses.

2. Unsupervised Learning: This type analyzes unlabeled data, seeking to uncover hidden patterns and structures.

Case Study: Customer segmentation. Unsupervised algorithms group customers based on their purchase history, demographics, or other factors, revealing distinct customer segments for targeted marketing campaigns.

3. Reinforcement Learning: This type learns through trial and error, interacting with an environment and receiving rewards for desired actions.

Case Study: Self-driving cars. Reinforcement learning agents train on simulated environments and real-world data, learning to navigate roads, avoid obstacles, and make safe decisions without explicit instructions.

Here are some additional examples, followed by a short code sketch:

  • Supervised Learning:
    • Image recognition (identifying objects in photos)
    • Spam filtering (classifying emails as spam or not)
    • Sentiment analysis (understanding the emotional tone of text)
  • Unsupervised Learning:
    • Anomaly detection (identifying unusual patterns in data)
    • Recommender systems (suggesting products or content users might like)
    • Market basket analysis (finding products frequently purchased together)
  • Reinforcement Learning:
    • Playing games (learning strategies to win against an opponent)
    • Robot control (learning to perform tasks like walking or manipulating objects)
    • Resource management (optimizing how to allocate resources in a complex system)
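
To make the first two types concrete, here is a minimal Python sketch, assuming scikit-learn is installed; the data is synthetic, generated only for illustration. The same features are handed to a supervised classifier, which learns from labels, and to an unsupervised clustering algorithm, which receives no labels at all.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.cluster import KMeans

# Synthetic data: 200 observations, 4 features, 2 classes.
X, y = make_classification(n_samples=200, n_features=4, random_state=42)

# Supervised: the classifier sees the labels y while training.
clf = LogisticRegression(max_iter=1000).fit(X, y)
print("Supervised predictions:", clf.predict(X[:5]))

# Unsupervised: same features, no labels; the algorithm groups
# the observations into clusters on its own.
km = KMeans(n_clusters=2, n_init=10, random_state=42).fit(X)
print("Cluster assignments:", km.labels_[:5])
```

Reinforcement learning is harder to sketch in a few lines, because it needs an environment to interact with rather than a fixed data set.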

The modeling process in machine learning

 The modeling phase consists of four steps:

  1. Feature engineering and model selection
  2. Training the model
  3. Model validation and selection
  4. Applying the trained model to unseen data

1. Feature engineering and model selection

  • Features are the variables a model learns from; feature engineering is the craft of deriving them from a data set.
  • In practice, the features you need rarely sit in one place; they're often scattered across different data sets and must be gathered first.
  • Raw variables often need to be transformed or combined before they have predictive value.
  • Interaction variables capture the combined effect of two predictors: like vinegar and bleach, two variables can be harmless on their own yet have a significant impact when mixed.
  • Modeling techniques themselves can be used to derive features, a common practice in text mining.
  • One common mistake in model construction is availability bias: using only the features that happen to be easily available.
  • Models built with availability bias often fail when validated, because they represent a one-sided truth.
  • A classic example is plane fortification in WWII: engineers initially studied only the planes that made it back, ignoring the data from planes that were shot down, and so drew conclusions from an incomplete picture.
  • Once the initial features are created, a model can be trained on the data (see the sketch after this list).
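
A minimal pandas sketch of these ideas follows; the data sets and column names (orders, profiles, amount, age) are hypothetical, chosen only to illustrate the steps.

```python
import numpy as np
import pandas as pd

# Features are often scattered across different data sets.
orders = pd.DataFrame({"customer_id": [1, 2, 3], "amount": [120.0, 35.5, 800.0]})
profiles = pd.DataFrame({"customer_id": [1, 2, 3], "age": [34, 51, 27]})

# Gather them into one table before modeling.
df = orders.merge(profiles, on="customer_id")

# Transformed feature: a log scale tames a skewed variable.
df["log_amount"] = np.log1p(df["amount"])

# Interaction variable: the product of two predictors can carry
# signal that neither predictor has on its own.
df["amount_x_age"] = df["amount"] * df["age"]

print(df)
```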

2. Training the model

  • Choose an appropriate modeling technique along with the right predictors.
  • Present the data to the model so it can learn (a minimal sketch follows this list).
  • Common modeling techniques come ready-made in almost every programming language, including Python.
  • State-of-the-art data science techniques may still require you to implement heavy mathematical calculations yourself.
  • Whether the model extrapolates to reality is then tested through model validation.
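
In code, the training step itself is often the shortest part of the process. A minimal sketch with scikit-learn on synthetic data (the library is one common option, not the only one):

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression

# Synthetic predictors X and target y stand in for prepared data.
X, y = make_regression(n_samples=100, n_features=3, noise=10, random_state=0)

# Present the data to the model so it can learn.
model = LinearRegression()
model.fit(X, y)

print("Learned coefficients:", model.coef_)
```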

3. Model validation and selection

  • A good data science model does two things: it has predictive power and it generalizes to data it has never seen.
  • Error measures and validation strategies are crucial for judging model quality.
  • The classification error rate and the mean squared error are two common error measures in machine learning.
  • The classification error rate is the percentage of test observations that the model mislabeled; lower is better.
  • The mean squared error measures the average squared difference between predictions and actual values.
  • Squaring the errors prevents predictions that are wrong in opposite directions from canceling each other out.
  • Squaring also gives bigger errors more weight, while errors smaller than one shrink.
  • Many validation strategies exist, including the following common ones (sketched in code after the list):


■ Dividing your data into a training set with X% of the observations and keeping the rest as a holdout data set (a data set that's never used for model creation)—This is the most common technique.

■ K-folds cross validation—This strategy divides the data set into k parts and uses each part once as a test data set while the others serve as the training data set. Its advantage is that all the data available in the data set gets used.

■ Leave-1 out—This approach is the same as k-folds, but with k equal to the number of observations: you always leave exactly one observation out and train on the rest of the data. It's practical only on small data sets, so it's more valuable to people evaluating laboratory experiments than to big data analysts.
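
A minimal sketch of these error measures and validation strategies, again with scikit-learn on synthetic data (the library and the split sizes are illustrative choices, not prescriptions):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import (KFold, LeaveOneOut, cross_val_score,
                                     train_test_split)

X, y = make_classification(n_samples=100, random_state=0)
model = LogisticRegression(max_iter=1000)

# Holdout: keep 30% of the observations aside, never used for training.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
model.fit(X_tr, y_tr)
# Classification error rate = share of mislabeled holdout observations.
print("Error rate:", 1 - model.score(X_te, y_te))

# K-folds: each of the k parts serves once as the test set.
print("5-fold accuracy:", cross_val_score(model, X, y, cv=KFold(n_splits=5)))

# Leave-1 out: k equals the number of observations.
print("LOO accuracy:", cross_val_score(model, X, y, cv=LeaveOneOut()).mean())
```

For regression models, `sklearn.metrics.mean_squared_error` computes the mean squared error in the same spirit.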

 Machine Learning Regularization and Validation

  • Regularization incurs a penalty for every extra variable used to construct the model, which helps prevent overfitting.
  • L1 regularization asks for a model with as few predictors as possible, which adds robustness: simple solutions tend to hold true in more situations.
  • L2 regularization aims to keep the variance between the predictor coefficients as small as possible, which increases interpretability (both penalties are sketched in code after this list).
  • Validation is crucial because it determines whether the model works under real-life conditions.
  • Models should be tested on data they have never seen, and that data should represent what the model would encounter when applied to fresh observations.
  • For classification models, instruments like the confusion matrix are especially helpful.
  • Once a model is constructed and validated, it can be used to predict the future.
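
To see the difference between the two penalties, here is a minimal sketch using scikit-learn's Lasso (L1) and Ridge (L2) on synthetic data:

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso, Ridge

# Ten features, but only three actually carry signal.
X, y = make_regression(n_samples=100, n_features=10, n_informative=3,
                       noise=5, random_state=0)

# L1 penalizes the absolute size of coefficients, driving many of
# them to exactly zero -> fewer predictors, a simpler model.
lasso = Lasso(alpha=1.0).fit(X, y)
print("L1 nonzero coefficients:", (lasso.coef_ != 0).sum())

# L2 penalizes squared coefficients, shrinking them toward zero
# without eliminating any -> smaller, more stable coefficients.
ridge = Ridge(alpha=1.0).fit(X, y)
print("L2 coefficients:", ridge.coef_.round(1))
```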

4. Applying the trained model to unseen data

  • If the first three steps were carried out successfully, you now have a model that generalizes to unseen data.
  • Applying the model to new data is called model scoring.
  • Scoring starts with preparing a new data set with features exactly as defined by the model.
  • The model is then applied to this new data set, resulting in a prediction for each observation (a minimal sketch follows).
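
A minimal scoring sketch, in which the model and the "new" observations are synthetic stand-ins:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression

# Steps 1-3, compressed: train a model on historical data.
X, y = make_regression(n_samples=100, n_features=3, random_state=0)
model = LinearRegression().fit(X, y)

# Step 4: score fresh observations. The new data set must contain
# the same features, prepared the same way as the training data.
new_data = np.array([[0.5, -1.2, 3.1],
                     [1.8, 0.4, -0.7]])
print(model.predict(new_data))
```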

Where machine learning is used in data science

Although machine learning is mainly linked to the data-modeling step of the data science process, it can be used at almost every step.

The data modeling phase can't start until you have quality raw data you can understand. But prior to that, the data preparation phase can benefit from the use of machine learning. An example would be cleansing a list of text strings: machine learning can group similar strings together so that spelling errors become easier to correct.
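
One simple way to group similar strings is fuzzy matching; the sketch below uses `difflib` from the Python standard library (an illustrative choice, not a technique prescribed above, and the string lists are hypothetical):

```python
from difflib import get_close_matches

# Messy strings with spelling variations, and the clean values
# we want to map them to.
names = ["New York", "new york", "Neww York", "Boston", "Bostom"]
canonical = ["New York", "Boston"]

for name in names:
    # Match each messy string to its closest canonical spelling.
    match = get_close_matches(name, canonical, n=1, cutoff=0.6)
    print(f"{name!r} -> {match[0] if match else 'no match'}")
```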

Machine learning is also useful when exploring data. Algorithms can root out underlying patterns in the data where they’d be difficult to find with only charts.
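
For example, principal component analysis (PCA) can compress many columns into two, so structure that is invisible in raw charts shows up in a single scatter plot. A minimal sketch on synthetic data:

```python
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA

# 300 observations with 10 features: too many axes to chart directly.
X, _ = make_blobs(n_samples=300, n_features=10, centers=3, random_state=0)

# Project down to two dimensions; hidden groups become plottable.
coords = PCA(n_components=2).fit_transform(X)
print(coords[:5])  # two coordinates per observation, ready to scatter-plot
```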

Given that machine learning is useful throughout the data science process, it shouldn’t come as a surprise that a considerable number of Python libraries were developed to make your life a bit easier.
