Course - Feature Engineering
Introduction
Want to know how you can improve the accuracy of your ML models? Or how to find which data columns make the most useful features? Welcome to Feature Engineering, where we discuss good versus bad features and how you can preprocess and transform them for optimal use in your models.
- Introduction to Feature Engineering
- Intro to Qwiklabs
Raw Data to Features
Feature engineering is often the longest and most difficult phase of building your ML project. In the feature engineering process, you start with your raw data and use your domain knowledge to create features that make your machine learning algorithms work. In this module we explore what makes a good feature and how to represent features in your ML model (a short sketch after this module's outline shows one way to do this).
- Raw Data to Features
- Good vs Bad Features
- Quiz: Features are Related to the Objective
- Features Known at Prediction-time
- Quiz: Features are Knowable at Prediction Time
- Features should be Numeric
- Quiz: Features Should be Numeric
- Features Should Have Enough Examples
- Quiz: Features Should Have Enough Examples (p1)
- Quiz: Features Should Have Enough Examples (p2)
- Bringing Human Insight
- Discussion Prompt: Choosing Relevant Features
- Representing Features
- ML vs Statistics
- Lab Solution: Improve model accuracy with new features
Raw Data to Features
Qwiklabs -- Improve model accuracy with new features
Representing Features
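As a companion to the "Representing Features" lesson, here is a minimal sketch, not taken from the course labs, of turning raw columns into model features with the tf.feature_column API. The column names (sq_footage, city) and the vocabulary are made-up assumptions for illustration.

```python
import tensorflow as tf

# Numeric columns can be passed to the model directly.
sq_footage = tf.feature_column.numeric_column('sq_footage')

# Categorical strings must be encoded, e.g. one-hot via an indicator column.
city = tf.feature_column.categorical_column_with_vocabulary_list(
    'city', vocabulary_list=['Seattle', 'Portland', 'San Francisco'])
city_one_hot = tf.feature_column.indicator_column(city)

# These columns would then be handed to an estimator or a DenseFeatures layer.
feature_columns = [sq_footage, city_one_hot]
```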
Preprocessing and Feature Creation
This module covers preprocessing and feature creation: data processing techniques that help you prepare a feature set for a machine learning system. A simple pipeline sketch follows the outline below.
- Preprocessing and Feature Creation
- Beam and Dataflow
- Lab Intro: Simple Dataflow Pipeline
- Lab Solution: Simple Dataflow Pipeline
- Data Pipelines that Scale
- Lab Intro: MapReduce in Dataflow
- Lab Solution: MapReduce in Dataflow
- Preprocessing with Cloud Dataprep
- Lab Intro: Computing Time-Windowed Features in Cloud Dataprep
- Lab Solution: Computing Time-Windowed Features in Cloud Dataprep
- Discussion Prompt: Performing Exploratory Analysis
Preprocessing and Feature Creation
Simple Dataflow Pipeline
MapReduce in Dataflow
Apache Beam and Cloud Dataflow
Computing Time-Windowed Features in Cloud Dataprep
Preprocessing with Cloud Dataprep
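The pipeline below is a minimal sketch in the spirit of the "Simple Dataflow Pipeline" and "MapReduce in Dataflow" labs, written with the Apache Beam Python SDK; the file paths are placeholders and this is not the labs' actual code.

```python
import apache_beam as beam

with beam.Pipeline() as p:
    (p
     | 'Read' >> beam.io.ReadFromText('input.txt')            # placeholder input path
     | 'Split' >> beam.FlatMap(lambda line: line.split())     # "map" step: words
     | 'PairWithOne' >> beam.Map(lambda word: (word, 1))
     | 'CountPerWord' >> beam.CombinePerKey(sum)              # "reduce" step: counts
     | 'Format' >> beam.Map(lambda kv: f'{kv[0]}: {kv[1]}')
     | 'Write' >> beam.io.WriteToText('word_counts'))         # placeholder output prefix
```

The same pipeline runs locally or on Cloud Dataflow; switching runners is a matter of pipeline options (runner, project, region) rather than rewriting the transforms.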
Feature Crosses
In traditional machine learning, feature crosses don't play much of a role, but in modern ML methods they are an invaluable part of your toolkit. In this module, you will learn how to recognize the kinds of problems where feature crosses are a powerful way to help machines learn (see the sketch after this module's outline).
- Introducing Feature Crosses
- What is a Feature Cross?
- Discretization
- Memorization vs. Generalization
- Taxi colors
- Lab Intro: Feature Crosses to create a good classifier
- Lab Solution: Feature Crosses to create a good classifier
- Sparsity + Quiz
- Lab Intro: Too Much of a Good Thing
- Lab Solution: Too Much of a Good Thing
- Implementing Feature Crosses
- Embedding Feature Crosses
- Where to Do Feature Engineering
- Feature Creation in TensorFlow
- Feature Creation in Dataflow
- Lab Intro: Improve ML Model with Feature Engineering
- Lab Solution (p1): ML Fairness Debrief
- Lab Solution (p2): Improve ML Model with Feature Engineering
Feature crosses
Improve ML Model with Feature Engineering
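To make the idea concrete, here is a sketch of a feature cross in the spirit of the taxi example: two continuous coordinates are discretized into buckets, and the buckets are crossed so a linear model can memorize location-specific effects. The feature names, boundaries, and bucket sizes are illustrative assumptions, not the lab's values.

```python
import numpy as np
import tensorflow as tf

# Raw continuous inputs (names are placeholders).
lat = tf.feature_column.numeric_column('pickup_latitude')
lon = tf.feature_column.numeric_column('pickup_longitude')

# Discretization: bin each coordinate into buckets.
lat_buckets = tf.feature_column.bucketized_column(
    lat, boundaries=np.linspace(40.5, 41.0, 20).tolist())
lon_buckets = tf.feature_column.bucketized_column(
    lon, boundaries=np.linspace(-74.3, -73.7, 20).tolist())

# Feature cross: one sparse feature per (lat bucket, lon bucket) cell.
loc_cross = tf.feature_column.crossed_column(
    [lat_buckets, lon_buckets], hash_bucket_size=10000)

# Embed the cross so a DNN can generalize from it.
loc_embedding = tf.feature_column.embedding_column(loc_cross, dimension=8)
```

A wide-and-deep model can take the raw cross on the linear (memorization) side and the embedding on the DNN (generalization) side, which is the trade-off the Memorization vs. Generalization lesson discusses.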
TF Transform
TensorFlow Transform (tf.Transform) is a library for preprocessing data with TensorFlow. tf.Transform is useful for preprocessing that requires a full pass over the data, such as:
- normalizing an input value by its mean and standard deviation
- integerizing a vocabulary by looking at all input examples for values
- bucketizing inputs based on the observed data distribution
In this module we will explore use cases for tf.Transform; a short sketch after the outline below illustrates these use cases.
- Introducing TensorFlow Transform
- TensorFlow Transform
- Analyze phase
- Transform phase
- Supporting serving
- Lab Intro: Exploring tf.transform
- Lab Solution: Exploring tf.transform
Exploring tf.transform
tf.transform
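As an illustration of the use cases above, here is a minimal preprocessing_fn sketch assuming the tensorflow-transform package and made-up feature names (fare_amount, payment_type, trip_distance); it is not the lab's code.

```python
import tensorflow_transform as tft

def preprocessing_fn(inputs):
    """Full-pass preprocessing: each tft analyzer scans the whole dataset."""
    return {
        # Normalize by the mean and standard deviation computed over the dataset.
        'fare_scaled': tft.scale_to_z_score(inputs['fare_amount']),
        # Map each string to an integer id from a vocabulary built over all examples.
        'payment_type_id': tft.compute_and_apply_vocabulary(inputs['payment_type']),
        # Bucketize based on quantiles of the observed distribution.
        'distance_bucket': tft.bucketize(inputs['trip_distance'], num_buckets=10),
    }
```

During the analyze phase the tft analyzers compute the dataset-wide statistics (means, vocabularies, quantile boundaries); the transform phase, and later serving, apply those same statistics, which is what the Analyze phase, Transform phase, and Supporting serving lessons cover.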