MM466 Project - Multi-mode Fault Diagnosis Datasets of Gearbox Under Variable Working Conditions
Group Members: Wilisoni Marayawa (S11196753), Naitinteari Tekamwi (S11126433)
Gearbox Fault Diagnosis Using Machine Learning: Project Blog Summary
Our machine learning journey for gearbox fault diagnosis began in Week 6, but not without hurdles. Initially, we worked with a different dataset, which presented significant compatibility and quality issues. From Weeks 6 to 10, a substantial portion of our time was spent trying to clean, understand, and process this first dataset. However, due to persistent challenges and feedback from our supervisor, we transitioned at the end of Week 10 to a new dataset: the MCC5-THU gearbox fault diagnosis dataset.
Week 10 to 12: Familiarization Phase
Upon receiving the MCC5-THU dataset, we dedicated Weeks 10 to 12 to exploring and understanding its structure. This dataset comprised 240 .csv files, each recording 8 sensor channels (speed, torque, 3-axis motor vibration, and 3-axis gearbox vibration) under different fault conditions and loads. Given the large size and variability of the files (~768,000 rows each), initial attempts to load and merge all 240 files proved impractical within MATLAB due to memory constraints and execution time.
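A rough back-of-the-envelope estimate shows why loading everything at once was a problem. The sketch below is written in Python purely for illustration (the project itself used MATLAB) and assumes every value is held as an 8-byte double, which is MATLAB's default numeric type. Real memory use also depends on how the files are read and on any intermediate copies.

```python
ROWS_PER_FILE = 768_000   # approximate row count quoted above
CHANNELS = 8              # sensor channels per file
BYTES_PER_VALUE = 8       # assumption: double-precision storage


def dataset_bytes(n_files):
    """Raw size of n_files recordings held in memory as doubles."""
    return n_files * ROWS_PER_FILE * CHANNELS * BYTES_PER_VALUE


full_gb = dataset_bytes(240) / 1e9
subset_gb = dataset_bytes(96) / 1e9
print(f"240 files: {full_gb:.1f} GB, 96 files: {subset_gb:.1f} GB")
```

Under these assumptions the full set is about 11.8 GB of raw samples, against about 4.7 GB for a 96-file subset.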
Decision Point:
We reduced our working set to 96 strategically selected files covering the different fault types, fault severities, and operating conditions. This choice balanced dataset diversity against computational feasibility. The selected files were:
teeth_crack_L_speed_circulation_10Nm-2000rpm.csv
teeth_crack_L_torque_circulation_1000rpm_10Nm.csv
teeth_crack_L_torque_circulation_1000rpm_20Nm.csv
teeth_crack_M_speed_circulation_10Nm-1000rpm.csv
teeth_crack_M_speed_circulation_10Nm-2000rpm.csv
teeth_crack_M_torque_circulation_1000rpm_10Nm.csv
teeth_crack_M_torque_circulation_1000rpm_20Nm.csv
gear_pitting_H_speed_circulation_10Nm-1000rpm.csv
gear_pitting_H_speed_circulation_10Nm-2000rpm.csv
gear_pitting_H_torque_circulation_1000rpm_10Nm.csv
gear_pitting_H_torque_circulation_1000rpm_20Nm.csv
gear_pitting_L_speed_circulation_10Nm-1000rpm.csv
gear_pitting_L_speed_circulation_10Nm-2000rpm.csv
gear_pitting_L_torque_circulation_1000rpm_10Nm.csv
gear_pitting_L_torque_circulation_1000rpm_20Nm.csv
gear_pitting_M_speed_circulation_10Nm-1000rpm.csv
gear_pitting_M_speed_circulation_10Nm-2000rpm.csv
gear_pitting_M_torque_circulation_1000rpm_10Nm.csv
gear_pitting_M_torque_circulation_1000rpm_20Nm.csv
gear_wear_H_speed_circulation_10Nm-1000rpm.csv
gear_wear_H_speed_circulation_10Nm-2000rpm.csv
gear_wear_H_torque_circulation_1000rpm_10Nm.csv
gear_wear_H_torque_circulation_1000rpm_20Nm.csv
gear_wear_L_speed_circulation_10Nm-1000rpm.csv
gear_wear_L_speed_circulation_10Nm-2000rpm.csv
gear_wear_L_torque_circulation_1000rpm_10Nm.csv
gear_wear_L_torque_circulation_1000rpm_20Nm.csv
gear_wear_M_speed_circulation_10Nm-1000rpm.csv
gear_wear_M_speed_circulation_10Nm-2000rpm.csv
gear_wear_M_torque_circulation_1000rpm_10Nm.csv
gear_wear_M_torque_circulation_1000rpm_20Nm.csv
health_speed_circulation_10Nm-1000rpm.csv
health_speed_circulation_10Nm-2000rpm.csv
health_speed_circulation_10Nm-3000rpm.csv
health_speed_circulation_20Nm-1000rpm.csv
health_speed_circulation_20Nm-2000rpm.csv
health_speed_circulation_20Nm-3000rpm.csv
health_torque_circulation_1000rpm_10Nm.csv
health_torque_circulation_1000rpm_20Nm.csv
health_torque_circulation_2000rpm_10Nm.csv
health_torque_circulation_2000rpm_20Nm.csv
health_torque_circulation_3000rpm_10Nm.csv
health_torque_circulation_3000rpm_20Nm.csv
miss_teeth_speed_circulation_10Nm-1000rpm.csv
miss_teeth_speed_circulation_10Nm-2000rpm.csv
miss_teeth_speed_circulation_10Nm-3000rpm.csv
miss_teeth_speed_circulation_20Nm-1000rpm.csv
miss_teeth_speed_circulation_20Nm-2000rpm.csv
miss_teeth_speed_circulation_20Nm-3000rpm.csv
miss_teeth_torque_circulation_1000rpm_10Nm.csv
miss_teeth_torque_circulation_1000rpm_20Nm.csv
miss_teeth_torque_circulation_2000rpm_10Nm.csv
miss_teeth_torque_circulation_2000rpm_20Nm.csv
miss_teeth_torque_circulation_3000rpm_10Nm.csv
miss_teeth_torque_circulation_3000rpm_20Nm.csv
teeth_break_and_bearing_inner_H_speed_circulation_10Nm-1000rpm.csv
teeth_break_and_bearing_inner_H_speed_circulation_10Nm-2000rpm.csv
teeth_break_and_bearing_inner_H_torque_circulation_1000rpm_10Nm.csv
teeth_break_and_bearing_inner_H_torque_circulation_1000rpm_20Nm.csv
teeth_break_and_bearing_inner_L_speed_circulation_10Nm-1000rpm.csv
teeth_break_and_bearing_inner_L_speed_circulation_10Nm-2000rpm.csv
teeth_break_and_bearing_inner_L_torque_circulation_1000rpm_10Nm.csv
teeth_break_and_bearing_inner_L_torque_circulation_1000rpm_20Nm.csv
teeth_break_and_bearing_inner_M_speed_circulation_10Nm-1000rpm.csv
teeth_break_and_bearing_inner_M_speed_circulation_10Nm-2000rpm.csv
teeth_break_and_bearing_inner_M_torque_circulation_1000rpm_10Nm.csv
teeth_break_and_bearing_inner_M_torque_circulation_1000rpm_20Nm.csv
teeth_break_and_bearing_outer_H_speed_circulation_10Nm-1000rpm.csv
teeth_break_and_bearing_outer_H_speed_circulation_10Nm-2000rpm.csv
teeth_break_and_bearing_outer_H_torque_circulation_1000rpm_10Nm.csv
teeth_break_and_bearing_outer_H_torque_circulation_1000rpm_20Nm.csv
teeth_break_and_bearing_outer_L_speed_circulation_10Nm-1000rpm.csv
teeth_break_and_bearing_outer_L_speed_circulation_10Nm-2000rpm.csv
teeth_break_and_bearing_outer_L_torque_circulation_1000rpm_10Nm.csv
teeth_break_and_bearing_outer_L_torque_circulation_1000rpm_20Nm.csv
teeth_break_and_bearing_outer_M_speed_circulation_10Nm-1000rpm.csv
teeth_break_and_bearing_outer_M_speed_circulation_10Nm-2000rpm.csv
teeth_break_and_bearing_outer_M_torque_circulation_1000rpm_10Nm.csv
teeth_break_and_bearing_outer_M_torque_circulation_1000rpm_20Nm.csv
teeth_break_H_speed_circulation_10Nm-1000rpm.csv
teeth_break_H_speed_circulation_10Nm-2000rpm.csv
teeth_break_H_torque_circulation_1000rpm_10Nm.csv
teeth_break_H_torque_circulation_1000rpm_20Nm.csv
teeth_break_L_speed_circulation_10Nm-1000rpm.csv
teeth_break_L_speed_circulation_10Nm-2000rpm.csv
teeth_break_L_torque_circulation_1000rpm_10Nm.csv
teeth_break_L_torque_circulation_1000rpm_20Nm.csv
teeth_break_M_speed_circulation_10Nm-1000rpm.csv
teeth_break_M_speed_circulation_10Nm-2000rpm.csv
teeth_break_M_torque_circulation_1000rpm_10Nm.csv
teeth_break_M_torque_circulation_1000rpm_20Nm.csv
teeth_crack_H_speed_circulation_10Nm-1000rpm.csv
teeth_crack_H_speed_circulation_10Nm-2000rpm.csv
teeth_crack_H_torque_circulation_1000rpm_10Nm.csv
teeth_crack_H_torque_circulation_1000rpm_20Nm.csv
teeth_crack_L_speed_circulation_10Nm-1000rpm.csv
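The file names themselves encode the class label and operating condition, so labels can be derived without a separate lookup table. The following Python sketch (illustrative only, since the project was done in MATLAB) parses that naming pattern. The regular expression and the `parse_name` helper are hypothetical, inferred from the names listed above, and reading L, M and H as low, medium and high severity is an interpretation of the names.

```python
import re

# Pattern inferred from the listed names:
#   <fault>[_<L|M|H>]_<speed|torque>_circulation_<condition>.csv
NAME_PATTERN = re.compile(
    r"^(?P<fault>.+?)"
    r"(?:_(?P<severity>[LMH]))?"
    r"_(?P<mode>speed|torque)_circulation_"
    r"(?P<condition>.+)\.csv$"
)


def parse_name(filename):
    """Split a file name into fault, severity, mode and condition."""
    match = NAME_PATTERN.match(filename)
    if match is None:
        raise ValueError(f"unrecognised file name: {filename}")
    return match.groupdict()


examples = [
    "teeth_crack_L_speed_circulation_10Nm-2000rpm.csv",
    "health_speed_circulation_10Nm-1000rpm.csv",
    "teeth_break_and_bearing_inner_H_torque_circulation_1000rpm_20Nm.csv",
]
for name in examples:
    print(parse_name(name))
```

Files without a severity letter, such as the `health` and `miss_teeth` recordings, come back with `severity` set to `None`.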
Week 12 to 13: Loading and EDA
Our focus in Weeks 12 and 13 was to:
- Load the selected files efficiently.
- Develop a strategy to segment signals into 1-second windows with 50% overlap to increase the sample size while preserving fault-related patterns.
- Perform exploratory data analysis (EDA) such as correlation heatmaps, class distribution plots, and basic time-domain plots for verification.
This overlapping window strategy was essential to maximize data utility from lengthy recordings and introduce variability in training samples.
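The windowing step can be sketched as follows, again in Python for illustration instead of the MATLAB used in the project. The sampling rate is not stated above; the sketch assumes 12,800 Hz, which would make a ~768,000-row file about 60 seconds long. `window_starts` is a hypothetical helper name.

```python
SAMPLE_RATE_HZ = 12_800   # assumption: not stated in the post
N_SAMPLES = 768_000       # approximate rows per file


def window_starts(n_samples, window_len, overlap=0.5):
    """Start indices of fixed-length windows with fractional overlap."""
    step = window_len - int(window_len * overlap)
    if step <= 0:
        raise ValueError("overlap must be smaller than 1")
    # Only keep windows that fit entirely inside the signal.
    return list(range(0, n_samples - window_len + 1, step))


window_len = SAMPLE_RATE_HZ * 1  # 1-second windows
starts = window_starts(N_SAMPLES, window_len, overlap=0.5)
print(len(starts), "windows per channel")
```

Under that assumption, each recording yields 119 overlapping windows instead of 60 non-overlapping ones.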
Below are some of the EDA plots that earlier studies suggest are informative for vibration signals such as those in our dataset.
For each fault type and fault severity, we explored the data using time-series vibration plots, FFT spectra, spectrograms, and feature-based signals:
Week 13 to 14: Feature Extraction Strategy
From Week 13 through 14, we implemented a hybrid feature extraction approach, combining:
- Time-domain features: RMS, STD, Kurtosis, Crest Factor, Skewness, Peak-to-Peak.
- Frequency-domain features: Spectral Centroid, Spectral Entropy.
- Wavelet-based energy features: MODWT using Symlet-4 wavelet over 5 levels.
These were applied to all 8 channels per file, giving us a rich, high-dimensional feature set. The multi-domain fusion aimed to capture both transient and frequency-based fault signatures.
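A minimal sketch of the time-domain and frequency-domain features for a single window is given below, in Python for illustration (the project used MATLAB). It is not the project's code: it uses population moments, non-excess kurtosis, an unnormalized Shannon entropy of the power spectrum, and a naive DFT that is only practical for short signals. Other toolboxes use slightly different conventions, and the MODWT energies are left out.

```python
import cmath
import math


def time_features(x):
    """Six time-domain features of one window (population moments)."""
    n = len(x)
    mean = sum(x) / n
    var = sum((v - mean) ** 2 for v in x) / n
    std = math.sqrt(var)
    rms = math.sqrt(sum(v * v for v in x) / n)
    peak = max(abs(v) for v in x)
    return {
        "rms": rms,
        "std": std,
        "kurtosis": sum((v - mean) ** 4 for v in x) / n / var ** 2,
        "skewness": sum((v - mean) ** 3 for v in x) / n / std ** 3,
        "crest_factor": peak / rms,
        "peak_to_peak": max(x) - min(x),
    }


def spectral_features(x, fs):
    """Spectral centroid (Hz) and Shannon entropy (bits) of the power spectrum."""
    n = len(x)
    power = []
    for k in range(n // 2 + 1):  # one-sided spectrum via a naive DFT
        s = sum(x[i] * cmath.exp(-2j * math.pi * k * i / n) for i in range(n))
        power.append(abs(s) ** 2)
    total = sum(power)
    freqs = [k * fs / n for k in range(n // 2 + 1)]
    centroid = sum(f * p for f, p in zip(freqs, power)) / total
    probs = [p / total for p in power if p > 0]
    entropy = -sum(p * math.log2(p) for p in probs)
    return centroid, entropy


# Toy check: a 4 Hz sine sampled at 64 Hz for one second.
fs = 64
sine = [math.sin(2 * math.pi * 4 * i / fs) for i in range(fs)]
feats = time_features(sine)
centroid, entropy = spectral_features(sine, fs)
print(feats["rms"], feats["crest_factor"], centroid, entropy)
```

For a pure sine the RMS comes out at about 0.707, the crest factor at about 1.414, and the spectral centroid at the sine's own frequency. If the 112 features mentioned in the next section are spread evenly over the 8 channels, that is 14 per channel, which would be consistent with 6 time-domain, 2 frequency-domain and 6 wavelet energies (5 detail levels plus the final approximation); this breakdown is an inference, not something the post states.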
All extracted features were standardized using Z-score normalization so that they share a common scale for model training.
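Z-score normalization can be sketched in a few lines of Python (shown for illustration; the project used MATLAB). The helper names are hypothetical. The sketch uses the sample standard deviation (dividing by n - 1), and it separates fitting from applying so that the statistics can be estimated on training data only. That separation is a common precaution, not something the post describes.

```python
import math


def fit_zscore(rows):
    """Per-column mean and sample standard deviation of a feature table."""
    n = len(rows)
    n_cols = len(rows[0])
    means = [sum(r[c] for r in rows) / n for c in range(n_cols)]
    stds = []
    for c in range(n_cols):
        var = sum((r[c] - means[c]) ** 2 for r in rows) / (n - 1)
        std = math.sqrt(var)
        stds.append(std if std > 0 else 1.0)  # leave constant columns unscaled
    return means, stds


def apply_zscore(rows, means, stds):
    """Standardize rows with previously fitted statistics."""
    return [[(v - m) / s for v, m, s in zip(r, means, stds)] for r in rows]


# Toy feature table: 3 windows x 2 features (made-up numbers).
table = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]
means, stds = fit_zscore(table)
scaled = apply_zscore(table, means, stds)
print(scaled)
```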
Week 15 to 16: Dimensionality Reduction and Model Training
With over 100 features, dimensionality became a concern. We used Principal Component Analysis (PCA) to reduce dimensionality while preserving 95% of the variance, which cut the feature count from 112 to 28. This step improved training time and reduced noise and correlation among features.
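The rule for choosing how many components to keep can be illustrated with a short Python sketch (for illustration only; the project used MATLAB). It covers just the selection step: given the variance captured by each principal component, keep the smallest number whose cumulative share reaches the target. The eigenvalues below are made up, and `components_for_variance` is a hypothetical helper name.

```python
def components_for_variance(eigenvalues, target=0.95):
    """Smallest number of components whose cumulative variance share >= target."""
    ordered = sorted(eigenvalues, reverse=True)
    total = sum(ordered)
    cumulative = 0.0
    for k, value in enumerate(ordered, start=1):
        cumulative += value
        if cumulative / total >= target:
            return k
    return len(ordered)


# Made-up variances: shares are 60%, 30%, 6% and 4%.
toy_eigenvalues = [6.0, 3.0, 0.6, 0.4]
k = components_for_variance(toy_eigenvalues, target=0.95)
print(k, "components retained")
```

In this toy case the first two components reach 90% and the third brings the total to 96%, so three components are kept.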
We then performed a stratified train-validation-test split (70/15/15) to preserve class balance.
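A stratified 70/15/15 split can be sketched as below, in Python for illustration (the project used MATLAB). The idea is to shuffle and cut each class separately so that every subset keeps roughly the same class proportions. `stratified_split` is a hypothetical helper, and the labels are made up.

```python
import random
from collections import defaultdict


def stratified_split(labels, fractions=(0.70, 0.15, 0.15), seed=0):
    """Return train/validation/test index lists with per-class proportions kept."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    train, val, test = [], [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_train = round(fractions[0] * len(indices))
        n_val = round(fractions[1] * len(indices))
        train += indices[:n_train]
        val += indices[n_train:n_train + n_val]
        test += indices[n_train + n_val:]  # remainder goes to the test set
    return train, val, test


# Made-up, deliberately imbalanced labels.
labels = ["healthy"] * 100 + ["gear_wear"] * 40
train, val, test = stratified_split(labels)
print(len(train), len(val), len(test))
```

One caveat when windows overlap by 50%: neighbouring windows share half their samples, so a split made at the window level can place near-duplicate segments in different subsets. Splitting by recording is one way to avoid that; the post does not say which level was used.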
We trained a range of models using MATLAB's Classification Learner app:
- SVMs (linear, quadratic, Gaussian)
- Ensemble models (Bagged Trees, Boosted Trees)
- Decision Trees
Models were evaluated using accuracy, precision, recall, F1-score, and confusion matrices. The top-performing model achieved ~92.9% accuracy during training. Final evaluation on an unseen test set gave 93.9% accuracy for that same model, slightly higher than during training. A likely explanation is that the test set contained instances that were more distinguishable or more representative of the patterns learned during training. In addition, the stratified sampling used during dataset splitting may have yielded a test set with less noise or lower intra-class variation, making it easier for the model to generalize to those samples.
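The evaluation metrics listed above all follow from the confusion matrix. The Python sketch below (illustrative; the project used MATLAB) computes overall accuracy and per-class precision, recall and F1-score, assuming rows are true classes and columns are predicted classes. The matrix values are made up and are not results from this project.

```python
def per_class_metrics(cm):
    """Accuracy plus per-class precision, recall and F1 from a confusion matrix."""
    n = len(cm)
    total = sum(sum(row) for row in cm)
    accuracy = sum(cm[i][i] for i in range(n)) / total

    metrics = []
    for i in range(n):
        tp = cm[i][i]
        fp = sum(cm[r][i] for r in range(n)) - tp  # predicted i, actually other
        fn = sum(cm[i]) - tp                       # actually i, predicted other
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        metrics.append({"precision": precision, "recall": recall, "f1": f1})
    return accuracy, metrics


# Made-up 2-class example: rows = true class, columns = predicted class.
toy_cm = [[8, 2],
          [1, 9]]
accuracy, metrics = per_class_metrics(toy_cm)
print(accuracy, metrics)
```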
Justification of Our Approach
Each step in our pipeline was motivated by practical challenges and technical reasoning:
- File reduction addressed resource limitations.
- Overlapping windows enriched training data diversity.
- Multi-domain feature extraction ensured a wide capture of fault characteristics.
- PCA tackled high-dimensional noise and redundancy.
- Stratified splitting guaranteed fair model evaluation.
This methodical, justified approach led to the successful development of a high-performing, robust fault classification model.
Future Work
Looking ahead, several key directions can be taken to enhance the reliability and practicality of the fault diagnosis system:
- Utilizing the Full Dataset: The current study focused on a reduced subset of the original dataset due to processing and memory constraints. Future work will involve leveraging the entire dataset, which includes all 240 files, to better reflect real-world variability and operating conditions. This will provide a more comprehensive training base for the models.
- Expanding Class Definitions Based on Fault Severity: Rather than generalizing fault types into single classes, future iterations will include sub-categorization based on fault severity (e.g., low, medium, high damage levels). This multi-class expansion can improve granularity and allow the model to differentiate not just between fault types, but also their progression stages.
- Hyperparameter Tuning: Further improvement in model performance is expected through systematic hyperparameter optimization. Techniques such as grid search or Bayesian optimization can be applied to fine-tune classifiers like SVMs and ensemble models for better generalization.
- Additional Directions: Once these foundations are addressed, the project can be extended to explore real-time deployment, deep learning architectures (e.g., CNNs), and cross-condition generalization using transfer learning.
Walkthrough process of MATLAB
Video 2: