Applied AI/ML from MIT certificate program
Integrating insights from the 2024 MIT No-Code AI and Machine Learning certificate program, these projects explore how AI agents and ML models can elevate clinician workflows, cybersecurity, and customer feedback analysis, with direct applications in product design.
AI agents enhancing clinician workflows
AI-driven workflows address administrative burdens that detract from patient care, automating scheduling and documentation to reduce clinician fatigue. Disclaimer: All concepts on this page are my personal projects from MIT training; no affiliation with company work.
1
Adaptive AI scheduling for clinicians
Multiple AI assistants dynamically adjust clinician schedules, accommodating personal needs while ensuring efficient clinical prioritization and workflow.
Highlights
Dynamic schedule adjustment
AI instantly adjusts schedules based on real-time data.
Task sequencing and prioritization
Efficiently sequences tasks, prioritizes urgencies, and time-boxes work.
Collaborative AI coordination
Multiple AI agents optimize workflow seamlessly.
Key AI/ML capabilities needed
Reinforcement learning (RL)
Optimizes task sequencing using real-time data.
ML & optimization algorithms
Combines machine learning predictions with optimization techniques (see the sketch below).
Natural language processing (NLP)
Interprets clinician inputs for schedule adjustments.
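To make the prediction-plus-optimization idea concrete, here is a minimal sketch: task durations and urgency scores (which an ML model would supply) feed a greedy, time-boxed sequencer. Task names, numbers, and the heuristic itself are illustrative assumptions, not part of any real system.

```python
from dataclasses import dataclass

@dataclass
class Task:
    name: str
    predicted_minutes: int  # would come from an ML duration model
    urgency: int            # 1 (low) to 5 (critical), e.g. from triage rules

def build_schedule(tasks, start_hour=8.0, timebox_cap=60):
    """Greedy sequencing: most urgent first, shortest first among ties,
    with each task time-boxed to a maximum block length."""
    ordered = sorted(tasks, key=lambda t: (-t.urgency, t.predicted_minutes))
    schedule, clock = [], start_hour
    for task in ordered:
        block = min(task.predicted_minutes, timebox_cap)
        schedule.append((clock, task.name, block))
        clock += block / 60.0
    return schedule

if __name__ == "__main__":
    tasks = [
        Task("Review abnormal lab results", 20, 5),
        Task("Draft follow-up messages", 45, 2),
        Task("Pre-visit chart review", 90, 3),
    ]
    for start, name, minutes in build_schedule(tasks):
        print(f"{start:05.2f}h  {name}  ({minutes} min)")
```

A production scheduler would add constraints such as clinic hours, breaks, and room availability, and could replace the fixed heuristic with a reinforcement-learning policy.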
2
AI-augmented communication and documentation
AI assistants help refine patient messages and clinical notes, streamlining communication and reducing clinician workload.
Highlights
Automated message drafting
AI generates message drafts matching clinician tone.
Efficient documentation
Automatically drafts standardized (e.g., SOAP) clinical notes to reduce administrative burden.
Integrated patient data
Summarizes relevant patient information within the interface.
Key AI/ML capabilities needed
Advanced NLP & generation
Uses transformers to create coherent, contextually aware messages and notes (see the sketch below).
Context-aware summarization
Summarizes relevant patient history and recent interactions for efficient documentation.
Personalized language models
Fine-tunes AI outputs to match individual clinician styles, enhancing authenticity.
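As a rough illustration of the summarize-then-draft step, the sketch below runs a general-purpose Hugging Face summarization model over a made-up visit transcript and drops the result into a toy SOAP template. The model choice, transcript, and template are assumptions for illustration, not a clinical-grade pipeline.

```python
from transformers import pipeline  # pip install transformers

# General-purpose summarizer standing in for a clinically tuned model.
summarizer = pipeline("summarization", model="sshleifer/distilbart-cnn-12-6")

visit_transcript = (
    "Patient reports two weeks of intermittent headaches, worse in the afternoon. "
    "Denies vision changes. Blood pressure today 142/90. Discussed hydration, "
    "sleep habits, and starting a home blood pressure log before the next visit."
)

summary = summarizer(visit_transcript, max_length=60, min_length=15,
                     do_sample=False)[0]["summary_text"]

# Toy SOAP-style scaffold; the clinician reviews and edits every section.
draft_note = (
    f"S: {summary}\n"
    "O: BP 142/90 recorded in clinic.\n"
    "A: (clinician to confirm assessment)\n"
    "P: Start home BP log; follow up at next visit."
)
print(draft_note)
```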
3
Intelligent data verification and decision support
AI assists clinicians in verifying patient data, enhancing decision-making accuracy while incorporating restorative breaks for well-being.
Highlights
Real-time data retrieval
AI surfaces relevant patient data instantly.
Interactive verification
Clinicians confirm and update data through AI collaboration.
Decision insights
Provides insights for informed clinical decisions.
Key AI/ML capabilities needed
Retrieval-augmented generation (RAG)
Combines data retrieval with transformers to improve decision accuracy (see the sketch below).
Explainable AI (XAI)
Offers transparent reasoning behind data suggestions to build clinician trust.
User state modeling
Monitors clinician stress and workload levels to dynamically adjust schedules.
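The retrieval half of a RAG flow can be sketched with nothing more than TF-IDF similarity; the snippets it returns would then be handed to a generative model alongside the clinician's question. The records and question below are invented examples, and a real system would use embeddings, a vector store, and access controls.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Invented chart snippets standing in for retrievable patient data.
patient_records = [
    "2023-11-02: Started lisinopril 10 mg for hypertension.",
    "2024-01-15: A1c 6.9%, counseled on diet and exercise.",
    "2024-03-10: Reports occasional dizziness in the morning.",
]
question = "Is the patient on any blood pressure medication?"

vectorizer = TfidfVectorizer()
matrix = vectorizer.fit_transform(patient_records + [question])
scores = cosine_similarity(matrix[-1], matrix[:-1]).ravel()

top_k = scores.argsort()[::-1][:2]                # two most relevant snippets
context = [patient_records[i] for i in top_k]
print("Context passed to the generator:", context)
```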
4
AI-driven patient care planning
AI creates temporary patient groupings, allowing clinicians to efficiently deliver customized advice to many patients with minimal effort.
Highlights
Ephemeral patient clustering
AI forms temporary groups based on dietary and medical profiles.
Bulk nutrition plan generation
Creates tailored nutrition advice for entire patient groups with a simple action.
Dynamic task creation
Proactively creates and sequences follow-up tasks based on clinician and patient needs.
Key AI/ML capabilities needed
Dynamic clustering
Utilizes algorithms to create and dissolve patient groups based on current needs (see the sketch below).
Batch recommendations
Generates personalized nutrition plans for entire clusters efficiently, factoring in medical history and lifestyle.
Chained task automation
Automatically creates follow-up tasks based on ongoing patient interactions.
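A toy version of the ephemeral clustering step is sketched below: a handful of made-up patient features are grouped with k-means, one bulk action is queued per cluster, and the grouping is then discarded. The features, values, and two-cluster choice are all assumptions for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Hypothetical features per patient: [daily sodium (mg), HbA1c (%), BMI]
profiles = np.array([
    [3400, 7.8, 31.0],
    [1900, 5.4, 24.0],
    [3100, 8.1, 29.5],
    [2000, 5.6, 22.5],
    [2800, 7.2, 33.0],
])

scaled = StandardScaler().fit_transform(profiles)
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(scaled)

for cluster_id in np.unique(labels):
    members = np.where(labels == cluster_id)[0].tolist()
    # One bulk action per ephemeral cluster, e.g. draft a tailored nutrition plan.
    print(f"Cluster {cluster_id}: patients {members} -> queue tailored plan draft")
```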
Applied AI and machine learning case studies
Practical case studies from MIT's AI program showcase solutions in decision systems, unstructured data analysis, and prompt engineering, tackling business challenges with predictive analytics and automation.
1
Predicting hotel booking cancellations with machine learning
EDA - Bivariate analysis
1
Analyze key trends and correlations in booking data.
Lead time: 50% of booking cancellations occur within a 3-month lead time, increasing closer to the arrival date.
Special requests: 71% of cancellations had no special requests, suggesting lower guest investment in the booking.
Screenshot from my Dataiku project
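The bivariate checks above came from Dataiku; the pandas sketch below shows the same idea on a tiny made-up frame. Column names and rows are assumptions, so the toy percentages will not match the reported 50% and 71%.

```python
import pandas as pd

# Tiny illustrative frame; the real analysis ran on the full booking dataset.
df = pd.DataFrame({
    "is_canceled":            [1, 0, 1, 1, 0, 1, 0, 1],
    "lead_time":              [12, 200, 45, 80, 310, 25, 150, 60],  # days before arrival
    "no_of_special_requests": [0, 2, 0, 1, 3, 0, 1, 0],
})

canceled = df[df["is_canceled"] == 1]
print(f"Cancellations with lead time under 90 days: {(canceled['lead_time'] < 90).mean():.0%}")
print(f"Cancellations with no special requests:     {(canceled['no_of_special_requests'] == 0).mean():.0%}")

# Cancellation rate by number of special requests (bivariate view).
print(df.groupby("no_of_special_requests")["is_canceled"].mean())
```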
Model training
2
Screenshot from my Dataiku project
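Model training was done in Dataiku's visual ML and is shown only as a screenshot; outside that tool, an equivalent "pruned" random forest could be set up roughly as below. The synthetic data, feature names, and hyperparameters are assumptions that echo the predictors discussed later, not the actual project configuration.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the booking data (feature names are assumptions).
rng = np.random.default_rng(42)
n = 1_000
bookings = pd.DataFrame({
    "lead_time": rng.integers(0, 400, n),
    "no_of_special_requests": rng.integers(0, 4, n),
    "market_segment_online": rng.integers(0, 2, n),
    "avg_price_per_room": rng.uniform(40, 250, n).round(2),
})
# Synthetic label loosely echoing the observed pattern (short lead times and
# no special requests cancel more often); real labels come from the dataset.
bookings["is_canceled"] = (
    (bookings["lead_time"] < 90).astype(int)
    + (bookings["no_of_special_requests"] == 0).astype(int)
    + rng.integers(0, 2, n)
    >= 2
).astype(int)

X, y = bookings.drop(columns="is_canceled"), bookings["is_canceled"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=42
)

# "Pruned" here means capping depth and leaf size instead of growing full trees.
model = RandomForestClassifier(
    n_estimators=200, max_depth=6, min_samples_leaf=10, random_state=42
)
model.fit(X_train, y_train)
```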
Model evaluation
3
Model selection: Pruned Random Forest model selected for optimal recall, minimizing missed booking cancellations to protect revenue and reduce opportunity costs.
Confusion matrix: Correctly identified 494 cancellations, missed 106.
Performance metrics
Accuracy: 86%
Precision: 77% (-5% from baseline)
Recall: 82% (+4% from baseline)
F1-score: 79%
Screenshot from my Dataiku project
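Continuing the training sketch above, the metrics reported here (confusion matrix, accuracy, precision, recall, F1) are computed on the held-out split like this; on the synthetic data the numbers will of course differ from the figures reported from Dataiku.

```python
from sklearn.metrics import confusion_matrix, classification_report

y_pred = model.predict(X_test)
print(confusion_matrix(y_test, y_pred))  # rows = actual, columns = predicted
print(classification_report(y_test, y_pred, target_names=["not canceled", "canceled"]))
```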
Top predictors
4
Lead time (32%): Short booking windows increase cancellations.
No. of special requests (20%): Fewer requests mean higher cancellations.
Market segment (18%): Online bookings cancel more often.
Avg price per room (12%): Cancellations cluster in the $54-$162 range.
Screenshot from my Dataiku project
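The predictor percentages above correspond to the model's feature importances; continuing the same sketch, they can be read off a fitted scikit-learn forest as follows.

```python
import pandas as pd

importances = pd.Series(model.feature_importances_, index=X_train.columns)
print(importances.sort_values(ascending=False))  # analogue of the ranked predictors above
```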
Recommendations
5
Targeted cancellation policies
Apply tiered policies for short lead times, online bookings, and $54-$162 rooms; offer incentives for special requests.
Optimize online experience
Strengthen data collection, research, and testing to enhance booking and cancellation processes.
Focus on retention
Drive revenue by prioritizing repeat guest loyalty over new customer acquisition.
EDA - Bivariate analysis
1
Analyze key trends and correlations in booking data.
Lead time: 50% of booking cancellations occur within 3 months, increasing closer to the arrival date.
Special requests: 71% of cancellations had no special requests, suggesting lower investment among guests who book these rooms.
Screenshot from my Dataiku project
Model training
2
Screenshot from my Dataiku project
Model evaluation
3
Model selection: Pruned Random Forest model selected for optimal recall, minimizing missed booking cancellations to protect revenue and opportunity costs.
Confusion matrix: Correctly identified 494 cancellations, missed 106.
Performance metrics
Accuracy: 86%
Precision: 77% (-5% from baseline)
Recall: 82% (+4% from baseline)
F1-score: 79%
Screenshot from my Dataiku project
Top predictors
4
Lead time (32%): Short booking windows increase cancellations.
# Special requests (20%): Fewer requests mean higher cancellations.
Market segment (18%): Online bookings cancel more often.
Avg price per room (12%): Cancellations cluster in the $54-$162 range.
Screenshot from my Dataiku project
Recommendations
5
Targeted cancellation policies
Apply tiered policies for short lead times, online bookings, and $54-$162 rooms; offer incentives for special requests.
Optimize online experience
Strengthen data collection, research, and testing to enhance booking and cancellation processes.
Focus on retention
Drive revenue by prioritizing repeat guest loyalty over new customer acquisition.
EDA - Bivariate analysis
1
Analyze key trends and correlations in booking data.
Lead time: 50% of booking cancellations occur within 3 months, increasing closer to the arrival date.
Special requests: 71% of cancellations had no special requests, suggesting lower investment among guests who book these rooms.
Screenshot from my Dataiku project
Model training
2
Screenshot from my Dataiku project
Model evaluation
3
Model selection: Pruned Random Forest model selected for optimal recall, minimizing missed booking cancellations to protect revenue and opportunity costs.
Confusion matrix: Correctly identified 494 cancellations, missed 106.
Performance metrics
Accuracy: 86%
Precision: 77% (-5% from baseline)
Recall: 82% (+4% from baseline)
F1-score: 79%
Screenshot from my Dataiku project
Top predictors
4
Lead time (32%): Short booking windows increase cancellations.
# Special requests (20%): Fewer requests mean higher cancellations.
Market segment (18%): Online bookings cancel more often.
Avg price per room (12%): Cancellations cluster in the $54-$162 range.
Screenshot from my Dataiku project
Recommendations
5
Targeted cancellation policies
Apply tiered policies for short lead times, online bookings, and $54-$162 rooms; offer incentives for special requests.
Optimize online experience
Strengthen data collection, research, and testing to enhance booking and cancellation processes.
Focus on retention
Drive revenue by prioritizing repeat guest loyalty over new customer acquisition.
EDA - Bivariate analysis
1
Analyze key trends and correlations in booking data.
Lead time: 50% of booking cancellations occur within 3 months, increasing closer to the arrival date.
Special requests: 71% of cancellations had no special requests, suggesting lower investment among guests who book these rooms.
Screenshot from my Dataiku project
Model training
2
Screenshot from my Dataiku project
Model evaluation
3
Model selection: Pruned Random Forest model selected for optimal recall, minimizing missed booking cancellations to protect revenue and opportunity costs.
Confusion matrix: Correctly identified 494 cancellations, missed 106.
Performance metrics
Accuracy: 86%
Precision: 77% (-5% from baseline)
Recall: 82% (+4% from baseline)
F1-score: 79%
Screenshot from my Dataiku project
Top predictors
4
Lead time (32%): Short booking windows increase cancellations.
# Special requests (20%): Fewer requests mean higher cancellations.
Market segment (18%): Online bookings cancel more often.
Avg price per room (12%): Cancellations cluster in the $54-$162 range.
Screenshot from my Dataiku project
Recommendations
5
Targeted cancellation policies
Apply tiered policies for short lead times, online bookings, and $54-$162 rooms; offer incentives for special requests.
Optimize online experience
Strengthen data collection, research, and testing to enhance booking and cancellation processes.
Focus on retention
Drive revenue by prioritizing repeat guest loyalty over new customer acquisition.
EDA - Bivariate analysis
1
Analyze key trends and correlations in booking data.
Lead time: 50% of booking cancellations occur within 3 months, increasing closer to the arrival date.
Special requests: 71% of cancellations had no special requests, suggesting lower investment among guests who book these rooms.
Screenshot from my Dataiku project
Model training
2
Screenshot from my Dataiku project
Model evaluation
3
Model selection: Pruned Random Forest model selected for optimal recall, minimizing missed booking cancellations to protect revenue and opportunity costs.
Confusion matrix: Correctly identified 494 cancellations, missed 106.
Performance metrics
Precision: 77% (-5% from baseline)
Recall: 82% (+4% from baseline)
F1-score: 79%
Screenshot from my Dataiku project
Top predictors
4
Lead time (32%): Short booking windows increase cancellations.
# Special requests (20%): Fewer requests mean higher cancellations.
Market segment (18%): Online bookings cancel more often.
Avg price per room (12%): Cancellations cluster in the $54-$162 range.
Screenshot from my Dataiku project
Recommendations
5
Targeted cancellation policies
Apply tiered policies for short lead times, online bookings, and $54-$162 rooms; offer incentives for special requests.
Optimize online experience
Strengthen data collection, research, and testing to enhance booking and cancellation processes.
Focus on retention
Drive revenue by prioritizing repeat guest loyalty over new customer acquisition.
EDA - Bivariate analysis
1
Analyze key trends and correlations in booking data.
Lead time: 50% of booking cancellations occur within 3 months, increasing closer to the arrival date.
Special requests: 71% of cancellations had no special requests, suggesting lower investment among guests who book these rooms.
Screenshot from my Dataiku project
Model training
2
Screenshot from my Dataiku project
Model evaluation
3
Model selection: Pruned Random Forest model selected for optimal recall, minimizing missed booking cancellations to protect revenue and opportunity costs.
Confusion matrix: Correctly identified 494 cancellations, missed 106.
Performance metrics
Precision: 77% (-5% from baseline)
Recall: 82% (+4% from baseline)
F1-score: 79%
Screenshot from my Dataiku project
Top predictors
4
Lead time (32%): Short booking windows increase cancellations.
# Special requests (20%): Fewer requests mean higher cancellations.
Market segment (18%): Online bookings cancel more often.
Avg price per room (12%): Cancellations cluster in the $54-$162 range.
Screenshot from my Dataiku project
Recommendations
5
Targeted cancellation policies
Apply tiered policies for short lead times, online bookings, and $54-$162 rooms; offer incentives for special requests.
Optimize online experience
Strengthen data collection, research, and testing to enhance booking and cancellation processes.
Focus on retention
Drive revenue by prioritizing repeat guest loyalty over new customer acquisition.
EDA - Bivariate analysis
1
Analyze key trends and correlations in booking data.
Lead time: 50% of booking cancellations occur within 3 months, increasing closer to the arrival date.
Special requests: 71% of cancellations had no special requests, suggesting lower investment among guests who book these rooms.
Screenshot from my Dataiku project
Model training
2
Screenshot from my Dataiku project
Model evaluation
3
Model selection: Pruned Random Forest model selected for optimal recall, minimizing missed booking cancellations to protect revenue and opportunity costs.
Confusion matrix: Correctly identified 494 cancellations, missed 106.
Performance metrics
Precision: 77% (-5% from baseline)
Recall: 82% (+4% from baseline)
F1-score: 79%
Screenshot from my Dataiku project
Top predictors
4
Lead time (32%): Short booking windows increase cancellations.
# Special requests (20%): Fewer requests mean higher cancellations.
Market segment (18%): Online bookings cancel more often.
Avg price per room (12%): Cancellations cluster in the $54-$162 range.
Screenshot from my Dataiku project
Recommendations
5
Targeted cancellation policies
Apply tiered policies for short lead times, online bookings, and $54-$162 rooms; offer incentives for special requests.
Optimize online experience
Strengthen data collection, research, and testing to enhance booking and cancellation processes.
Focus on retention
Drive revenue by prioritizing repeat guest loyalty over new customer acquisition.
EDA - Bivariate analysis
1
Analyze key trends and correlations in booking data.
Lead time: 50% of booking cancellations occur within 3 months, increasing closer to the arrival date.
Special requests: 71% of cancellations had no special requests, suggesting lower investment among guests who book these rooms.
Screenshot from my Dataiku project
Model training
2
Screenshot from my Dataiku project
Model evaluation
3
Model selection: Pruned Random Forest model selected for optimal recall, minimizing missed booking cancellations to protect revenue and opportunity costs.
Confusion matrix: Correctly identified 494 cancellations, missed 106.
Performance metrics
Precision: 77% (-5% from baseline)
Recall: 82% (+4% from baseline)
F1-score: 79%
Screenshot from my Dataiku project
Top predictors
4
Lead time (32%): Short booking windows increase cancellations.
# Special requests (20%): Fewer requests mean higher cancellations.
Market segment (18%): Online bookings cancel more often.
Avg price per room (12%): Cancellations cluster in the $54-$162 range.
Screenshot from my Dataiku project
Recommendations
5
Targeted cancellation policies
Apply tiered policies for short lead times, online bookings, and $54-$162 rooms; offer incentives for special requests.
Optimize online experience
Strengthen data collection, research, and testing to enhance booking and cancellation processes.
Focus on retention
Drive revenue by prioritizing repeat guest loyalty over new customer acquisition.
EDA - Bivariate analysis
1
Analyze key trends and correlations in booking data.
Lead time: 50% of booking cancellations occur within 3 months, increasing closer to the arrival date.
Special requests: 71% of cancellations had no special requests, suggesting lower investment among guests who book these rooms.
Screenshot from my Dataiku project
Model training
2
Screenshot from my Dataiku project
Model evaluation
3
Model selection: Pruned Random Forest model selected for optimal recall, minimizing missed booking cancellations to protect revenue and opportunity costs.
Confusion matrix: Correctly identified 494 cancellations, missed 106.
Performance metrics
Precision: 77% (-5% from baseline)
Recall: 82% (+4% from baseline)
F1-score: 79%
Screenshot from my Dataiku project
Top predictors
4
Lead time (32%): Short booking windows increase cancellations.
# Special requests (20%): Fewer requests mean higher cancellations.
Market segment (18%): Online bookings cancel more often.
Avg price per room (12%): Cancellations cluster in the $54-$162 range.
Screenshot from my Dataiku project
Recommendations
5
Targeted cancellation policies
Apply tiered policies for short lead times, online bookings, and $54-$162 rooms; offer incentives for special requests.
Optimize online experience
Strengthen data collection, research, and testing to enhance booking and cancellation processes.
Focus on retention
Drive revenue by prioritizing repeat guest loyalty over new customer acquisition.
EDA - Bivariate analysis
1
Analyze key trends and correlations in booking data.
Lead time: 50% of booking cancellations occur within 3 months, increasing closer to the arrival date.
Special requests: 71% of cancellations had no special requests, suggesting lower investment among guests who book these rooms.
Screenshot from my Dataiku project
Model training
2
Screenshot from my Dataiku project
Model evaluation
3
Model selection: Pruned Random Forest model selected for optimal recall, minimizing missed booking cancellations to protect revenue and opportunity costs.
Confusion matrix: Correctly identified 494 cancellations, missed 106.
Performance metrics
Precision: 77% (-5% from baseline)
Recall: 82% (+4% from baseline)
F1-score: 79%
Screenshot from my Dataiku project
Top predictors
4
Lead time (32%): Short booking windows increase cancellations.
# Special requests (20%): Fewer requests mean higher cancellations.
Market segment (18%): Online bookings cancel more often.
Avg price per room (12%): Cancellations cluster in the $54-$162 range.
Screenshot from my Dataiku project
Recommendations
5
Targeted cancellation policies
Apply tiered policies for short lead times, online bookings, and $54-$162 rooms; offer incentives for special requests.
Optimize online experience
Strengthen data collection, research, and testing to enhance booking and cancellation processes.
Focus on retention
Drive revenue by prioritizing repeat guest loyalty over new customer acquisition.
EDA - Bivariate analysis
1
Analyze key trends and correlations in booking data.
Lead time: 50% of booking cancellations occur within 3 months, increasing closer to the arrival date.
Special requests: 71% of cancellations had no special requests, suggesting lower investment among guests who book these rooms.
Screenshot from my Dataiku project
Model training
2
Screenshot from my Dataiku project
Model evaluation
3
Model selection: Pruned Random Forest model selected for optimal recall, minimizing missed booking cancellations to protect revenue and opportunity costs.
Confusion matrix: Correctly identified 494 cancellations, missed 106.
Performance metrics
Precision: 77% (-5% from baseline)
Recall: 82% (+4% from baseline)
F1-score: 79%
Screenshot from my Dataiku project
Top predictors
4
Lead time (32%): Short booking windows increase cancellations.
# Special requests (20%): Fewer requests mean higher cancellations.
Market segment (18%): Online bookings cancel more often.
Avg price per room (12%): Cancellations cluster in the $54-$162 range.
Screenshot from my Dataiku project
Recommendations
5
Targeted cancellation policies
Apply tiered policies for short lead times, online bookings, and $54-$162 rooms; offer incentives for special requests.
Optimize online experience
Strengthen data collection, research, and testing to enhance booking and cancellation processes.
Focus on retention
Drive revenue by prioritizing repeat guest loyalty over new customer acquisition.
EDA - Bivariate analysis
1
Analyze key trends and correlations in booking data.
Lead time: 50% of booking cancellations occur within 3 months, increasing closer to the arrival date.
Special requests: 71% of cancellations had no special requests, suggesting lower investment among guests who book these rooms.
Screenshot from my Dataiku project
Model training
2
Screenshot from my Dataiku project
Model evaluation
3
Model selection: Pruned Random Forest model selected for optimal recall, minimizing missed booking cancellations to protect revenue and opportunity costs.
Confusion matrix: Correctly identified 494 cancellations, missed 106.
Performance metrics
Precision: 77% (-5% from baseline)
Recall: 82% (+4% from baseline)
F1-score: 79%
Screenshot from my Dataiku project
Top predictors
4
Lead time (32%): Short booking windows increase cancellations.
# Special requests (20%): Fewer requests mean higher cancellations.
Market segment (18%): Online bookings cancel more often.
Avg price per room (12%): Cancellations cluster in the $54-$162 range.
Screenshot from my Dataiku project
Recommendations
5
Targeted cancellation policies
Apply tiered policies for short lead times, online bookings, and $54-$162 rooms; offer incentives for special requests.
Optimize online experience
Strengthen data collection, research, and testing to enhance booking and cancellation processes.
Focus on retention
Drive revenue by prioritizing repeat guest loyalty over new customer acquisition.
EDA - Bivariate analysis
1
Analyze key trends and correlations in booking data.
Lead time: 50% of booking cancellations occur within 3 months, increasing closer to the arrival date.
Special requests: 71% of cancellations had no special requests, suggesting lower investment among guests who book these rooms.
Screenshot from my Dataiku project
Model training
2
Screenshot from my Dataiku project
Model evaluation
3
Model selection: Pruned Random Forest model selected for optimal recall, minimizing missed booking cancellations to protect revenue and opportunity costs.
Confusion matrix: Correctly identified 494 cancellations, missed 106.
Performance metrics
Accuracy: 86%
Precision: 77% (-5% from baseline)
Recall: 82% (+4% from baseline)
F1-score: 79%
Screenshot from my Dataiku project
Top predictors
4
Lead time (32%): Short booking windows increase cancellations.
# Special requests (20%): Fewer requests mean higher cancellations.
Market segment (18%): Online bookings cancel more often.
Avg price per room (12%): Cancellations cluster in the $54-$162 range.
Screenshot from my Dataiku project
Recommendations
5
Targeted cancellation policies
Apply tiered policies for short lead times, online bookings, and $54-$162 rooms; offer incentives for special requests.
Optimize online experience
Strengthen data collection, research, and testing to enhance booking and cancellation processes.
Focus on retention
Drive revenue by prioritizing repeat guest loyalty over new customer acquisition.
EDA - Bivariate analysis
1
Analyze key trends and correlations in booking data.
Lead time: 50% of booking cancellations occur within 3 months, increasing closer to the arrival date.
Special requests: 71% of cancellations had no special requests, suggesting lower investment among guests who book these rooms.
Screenshot from my Dataiku project
Model training
2
Screenshot from my Dataiku project
Model evaluation
3
Model selection: Pruned Random Forest model selected for optimal recall, minimizing missed booking cancellations to protect revenue and opportunity costs.
Confusion matrix: Correctly identified 494 cancellations, missed 106.
Performance metrics
Accuracy: 86%
Precision: 77% (-5% from baseline)
Recall: 82% (+4% from baseline)
F1-score: 79%
Screenshot from my Dataiku project
Top predictors
4
Lead time (32%): Short booking windows increase cancellations.
# Special requests (20%): Fewer requests mean higher cancellations.
Market segment (18%): Online bookings cancel more often.
Avg price per room (12%): Cancellations cluster in the $54-$162 range.
Screenshot from my Dataiku project
Recommendations
5
Targeted cancellation policies
Apply tiered policies for short lead times, online bookings, and $54-$162 rooms; offer incentives for special requests.
Optimize online experience
Strengthen data collection, research, and testing to enhance booking and cancellation processes.
Focus on retention
Drive revenue by prioritizing repeat guest loyalty over new customer acquisition.
EDA - Bivariate analysis
1
Analyze key trends and correlations in booking data.
Lead time: 50% of booking cancellations occur within 3 months, increasing closer to the arrival date.
Special requests: 71% of cancellations had no special requests, suggesting lower investment among guests who book these rooms.
Screenshot from my Dataiku project
Model training
2
Screenshot from my Dataiku project
Model evaluation
3
Model selection: Pruned Random Forest model selected for optimal recall, minimizing missed booking cancellations to protect revenue and opportunity costs.
Confusion matrix: Correctly identified 494 cancellations, missed 106.
Performance metrics
Accuracy: 86%
Precision: 77% (-5% from baseline)
Recall: 82% (+4% from baseline)
F1-score: 79%
Screenshot from my Dataiku project
Top predictors
4
Lead time (32%): Short booking windows increase cancellations.
# Special requests (20%): Fewer requests mean higher cancellations.
Market segment (18%): Online bookings cancel more often.
Avg price per room (12%): Cancellations cluster in the $54-$162 range.
Screenshot from my Dataiku project
Recommendations
5
Targeted cancellation policies
Apply tiered policies for short lead times, online bookings, and $54-$162 rooms; offer incentives for special requests.
Optimize online experience
Strengthen data collection, research, and testing to enhance booking and cancellation processes.
Focus on retention
Drive revenue by prioritizing repeat guest loyalty over new customer acquisition.
EDA - Bivariate analysis
1
Analyze key trends and correlations in booking data.
Lead time: 50% of booking cancellations occur within 3 months, increasing closer to the arrival date.
Special requests: 71% of cancellations had no special requests, suggesting lower investment among guests who book these rooms.
Screenshot from my Dataiku project
Model training
2
Screenshot from my Dataiku project
Model evaluation
3
Model selection: Pruned Random Forest model selected for optimal recall, minimizing missed booking cancellations to protect revenue and opportunity costs.
Confusion matrix: Correctly identified 494 cancellations, missed 106.
Performance metrics
Accuracy: 86%
Precision: 77% (-5% from baseline)
Recall: 82% (+4% from baseline)
F1-score: 79%
Screenshot from my Dataiku project
Top predictors
4
Lead time (32%): Short booking windows increase cancellations.
# Special requests (20%): Fewer requests mean higher cancellations.
Market segment (18%): Online bookings cancel more often.
Avg price per room (12%): Cancellations cluster in the $54-$162 range.
Screenshot from my Dataiku project
Recommendations
5
Targeted cancellation policies
Apply tiered policies for short lead times, online bookings, and $54-$162 rooms; offer incentives for special requests.
Optimize online experience
Strengthen data collection, research, and testing to enhance booking and cancellation processes.
Focus on retention
Drive revenue by prioritizing repeat guest loyalty over new customer acquisition.
EDA - Bivariate analysis
1
Analyze key trends and correlations in booking data.
Lead time: 50% of booking cancellations occur within 3 months, increasing closer to the arrival date.
Special requests: 71% of cancellations had no special requests, suggesting lower investment among guests who book these rooms.
Screenshot from my Dataiku project
Model training
2
Screenshot from my Dataiku project
Model evaluation
3
Model selection: Pruned Random Forest model selected for optimal recall, minimizing missed booking cancellations to protect revenue and opportunity costs.
Confusion matrix: Correctly identified 494 cancellations, missed 106.
Performance metrics
Accuracy: 86%
Precision: 77% (-5% from baseline)
Recall: 82% (+4% from baseline)
F1-score: 79%
Screenshot from my Dataiku project
Top predictors
4
Lead time (32%): Short booking windows increase cancellations.
# Special requests (20%): Fewer requests mean higher cancellations.
Market segment (18%): Online bookings cancel more often.
Avg price per room (12%): Cancellations cluster in the $54-$162 range.
Screenshot from my Dataiku project
Recommendations
5
Targeted cancellation policies
Apply tiered policies for short lead times, online bookings, and $54-$162 rooms; offer incentives for special requests.
Optimize online experience
Strengthen data collection, research, and testing to enhance booking and cancellation processes.
Focus on retention
Drive revenue by prioritizing repeat guest loyalty over new customer acquisition.
EDA - Bivariate analysis
1
Analyze key trends and correlations in booking data.
Lead time: 50% of booking cancellations occur within 3 months, increasing closer to the arrival date.
Special requests: 71% of cancellations had no special requests, suggesting lower investment among guests who book these rooms.
Screenshot from my Dataiku project
Model training
2
Screenshot from my Dataiku project
Model evaluation
3
Model selection: Pruned Random Forest model selected for optimal recall, minimizing missed booking cancellations to protect revenue and opportunity costs.
Confusion matrix: Correctly identified 494 cancellations, missed 106.
Performance metrics
Accuracy: 86%
Precision: 77% (-5% from baseline)
Recall: 82% (+4% from baseline)
F1-score: 79%
Screenshot from my Dataiku project
Top predictors
4
Lead time (32%): Short booking windows increase cancellations.
# Special requests (20%): Fewer requests mean higher cancellations.
Market segment (18%): Online bookings cancel more often.
Avg price per room (12%): Cancellations cluster in the $54-$162 range.
Screenshot from my Dataiku project
Recommendations
5
Targeted cancellation policies
Apply tiered policies for short lead times, online bookings, and $54-$162 rooms; offer incentives for special requests.
Optimize online experience
Strengthen data collection, research, and testing to enhance booking and cancellation processes.
Focus on retention
Drive revenue by prioritizing repeat guest loyalty over new customer acquisition.
EDA - Bivariate analysis
1
Analyze key trends and correlations in booking data.
Lead time: 50% of booking cancellations occur within 3 months, increasing closer to the arrival date.
Special requests: 71% of cancellations had no special requests, suggesting lower investment among guests who book these rooms.
Screenshot from my Dataiku project
Model training
2
Screenshot from my Dataiku project
Model evaluation
3
Model selection: Pruned Random Forest model selected for optimal recall, minimizing missed booking cancellations to protect revenue and opportunity costs.
Confusion matrix: Correctly identified 494 cancellations, missed 106.
Performance metrics
Accuracy: 86%
Precision: 77% (-5% from baseline)
Recall: 82% (+4% from baseline)
F1-score: 79%
Screenshot from my Dataiku project
Top predictors
4
Lead time (32%): Short booking windows increase cancellations.
# Special requests (20%): Fewer requests mean higher cancellations.
Market segment (18%): Online bookings cancel more often.
Avg price per room (12%): Cancellations cluster in the $54-$162 range.
Screenshot from my Dataiku project
Recommendations
5
Targeted cancellation policies
Apply tiered policies for short lead times, online bookings, and $54-$162 rooms; offer incentives for special requests.
Optimize online experience
Strengthen data collection, research, and testing to enhance booking and cancellation processes.
Focus on retention
Drive revenue by prioritizing repeat guest loyalty over new customer acquisition.
EDA - Bivariate analysis
1
Analyze key trends and correlations in booking data.
Lead time: 50% of booking cancellations occur within 3 months, increasing closer to the arrival date.
Special requests: 71% of cancellations had no special requests, suggesting lower investment among guests who book these rooms.
Screenshot from my Dataiku project
Model training
2
Screenshot from my Dataiku project
Model evaluation
3
Model selection: Pruned Random Forest model selected for optimal recall, minimizing missed booking cancellations to protect revenue and opportunity costs.
Confusion matrix: Correctly identified 494 cancellations, missed 106.
Performance metrics
Accuracy: 86%
Precision: 77% (-5% from baseline)
Recall: 82% (+4% from baseline)
F1-score: 79%
Screenshot from my Dataiku project
Top predictors
4
Lead time (32%): Short booking windows increase cancellations.
# Special requests (20%): Fewer requests mean higher cancellations.
Market segment (18%): Online bookings cancel more often.
Avg price per room (12%): Cancellations cluster in the $54-$162 range.
Screenshot from my Dataiku project
Recommendations
5
Targeted cancellation policies
Apply tiered policies for short lead times, online bookings, and $54-$162 rooms; offer incentives for special requests.
Optimize online experience
Strengthen data collection, research, and testing to enhance booking and cancellation processes.
Focus on retention
Drive revenue by prioritizing repeat guest loyalty over new customer acquisition.
Used ML models to predict hotel booking cancellations at INN Hotels, identifying the key factors that drive cancellations to minimize revenue loss and optimize operations.
Goals
Minimize revenue loss
Reduce financial impact from high cancellation rates.
Optimize operations
Improve resource allocation and staffing efficiency.
Problems
Revenue loss from booking cancellations
High cancellation rates at INN Hotels lead to significant financial and operational challenges.
Operational disruptions
Unpredictable cancellations disrupt resource allocation and staffing, affecting service quality.
2
Detecting SMS spam with natural language processing
Sentiment analysis
1
Sentiment analysis scores range from 1 (very negative) to 5 (very positive).
Most SMS are neutral (45.5%) or negative (41.9%).
Spam messages show higher rates of extreme negativity (29%) than non-spam (13%), suggesting that reducing spam can lower negative sentiment.
Screenshot from my Dataiku project
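The 1-5 sentiment scores here come from Dataiku's sentiment tooling; as an approximation outside that environment, the sketch below buckets NLTK VADER compound scores onto a 1-5 scale and compares spam vs. non-spam. The dataset path and the `text`/`label` columns are illustrative assumptions.

```python
import nltk
import pandas as pd
from nltk.sentiment import SentimentIntensityAnalyzer

nltk.download("vader_lexicon", quiet=True)
sia = SentimentIntensityAnalyzer()

# Placeholder SMS dataset; `text` and `label` ("spam"/"ham") columns are assumed.
sms = pd.read_csv("sms_spam.csv")

def to_five_point(text: str) -> int:
    """Map VADER's compound score (-1..1) onto a 1 (very negative) to 5 (very positive) scale."""
    compound = sia.polarity_scores(text)["compound"]
    return 1 + sum(compound > edge for edge in (-0.6, -0.2, 0.2, 0.6))

sms["sentiment"] = sms["text"].apply(to_five_point)

# Share of very negative messages (score 1) in spam vs. non-spam.
print(sms.groupby("label")["sentiment"].apply(lambda s: (s == 1).mean()))
```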
Word cloud
2
Most frequent spam words: “free,” “text,” “txt,” “ur,” “u,” “mobile,” and “claim.”
Caution: Words such as “u” and “ur” also appear in non-spam messages, reducing their reliability as spam predictors.
Screenshot from my Dataiku project
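A word cloud is just a visual layer on top of token frequencies; this sketch computes those frequencies for the spam subset (same assumed `text`/`label` columns as above), with an optional render via the `wordcloud` package.

```python
import re
from collections import Counter

import pandas as pd

# Same assumed dataset and columns as the sentiment sketch above.
sms = pd.read_csv("sms_spam.csv")
spam_text = " ".join(sms.loc[sms["label"] == "spam", "text"]).lower()

# Token frequencies that a word cloud visualizes.
tokens = re.findall(r"[a-z']+", spam_text)
print(Counter(tokens).most_common(10))

# Optional rendering with the `wordcloud` package:
# from wordcloud import WordCloud
# WordCloud(width=800, height=400).generate(spam_text).to_file("spam_wordcloud.png")
```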
Model evaluation
3
Model choice: Chose the baseline Random Forest model for its superior performance metrics and recall, which minimizes costly missed spam detections.
Confusion matrix: Correctly identified 168 spam SMS; missed 25.
Performance metrics
Accuracy: 98%
F1-Score: 91%
Recall: 87%
Screenshot from my Dataiku project
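A minimal end-to-end sketch of this step outside Dataiku: TF-IDF features into a baseline Random Forest, then a confusion matrix and per-class metrics. The file name, columns, and hyperparameters are assumptions, so exact numbers will differ from the figures above.

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import train_test_split

# Placeholder dataset and columns; the Dataiku flow likely differs in detail.
sms = pd.read_csv("sms_spam.csv")
y = (sms["label"] == "spam").astype(int)

vectorizer = TfidfVectorizer(lowercase=True, stop_words="english")
X = vectorizer.fit_transform(sms["text"])

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

clf = RandomForestClassifier(n_estimators=300, random_state=42)
clf.fit(X_train, y_train)

pred = clf.predict(X_test)
print(confusion_matrix(y_test, pred))            # rows: actual ham / spam
print(classification_report(y_test, pred, digits=2))
```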
Top predictors
4
These model-identified words are the strongest indicators of whether an SMS is spam.
free (6%)
txt (5%)
claim (3.2%)
mobile (2.8%)
text (2.6%)
Screenshot from my Dataiku project
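Continuing the evaluation sketch above (it assumes `clf` and `vectorizer` from that snippet), word-level importances like the ones listed here map the forest's feature importances back onto the vectorizer's vocabulary:

```python
import pandas as pd

# Assumes `clf` and `vectorizer` from the evaluation sketch above.
importances = pd.Series(
    clf.feature_importances_, index=vectorizer.get_feature_names_out()
)
print(importances.sort_values(ascending=False).head(5))
```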
Recommendations
5
Auto-block spam
Block messages with top spam words (e.g., free, txt, claim) on company devices.
Employee alerts
Flag suspicious messages with common spam/non-spam words, offering a bypass option.
User feedback
Enable manual spam flagging to improve model accuracy via explicit employee input.
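To make the auto-block and alert recommendations concrete, here is a deliberately simplified keyword triage sketch. A real deployment would layer this on top of the trained classifier and the employee-feedback loop; the word lists below are taken from the predictors above rather than tuned thresholds.

```python
# Simple keyword pre-filter illustrating "auto-block" and "flag with bypass".
BLOCK_WORDS = {"free", "txt", "claim"}       # highest-importance spam terms
FLAG_WORDS = {"mobile", "text", "u", "ur"}   # ambiguous terms: warn, allow bypass

def triage(message: str) -> str:
    words = set(message.lower().split())
    if words & BLOCK_WORDS:
        return "block"
    if words & FLAG_WORDS:
        return "flag"   # shown to the employee with a bypass option
    return "allow"

print(triage("Claim your FREE prize now"))      # block
print(triage("Are u coming to the meeting?"))   # flag
```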
Sentiment analysis
1
Sentiment analysis scores range from 1 (very negative) to 5 (very positive).
Most SMS are neutral or negative (41.9%).
Spam messages show higher rates of extreme negativity (29%) vs. non-spam (13%).
Word cloud
2
Most frequent spam words: “free,” “text,” “txt,” “ur,” “u,” “mobile,” and “claim.”
Caution: Words such as “u” and “ur” also appear in non-spam messages, reducing their reliability to predict spam.
Model evaluation
3
Model choice: Baseline Random Forest for optimal performance and recall, which minimizes costly missed spam detections.
Confusion matrix: 168 spam found, 25 missed
Performance metrics: Accuracy: 98%, F1-Score: 91%, Recall: 87%.
Top predictors
4
These model-identified words significantly determine if an SMS is spam or not:
free (6%)
txt (5%)
claim (3.2%)
mobile (2.8%)
text (2.6%)
Recommendations
5
Auto-block spam: Block messages with top spam words (e.g., free, txt, claim) on company devices.
Employee alerts: Flag suspicious messages with common spam/non-spam words, offering a bypass option.
User feedback: Enable spam flagging to improve model accuracy via explicit employee input.
Developed an NLP pipeline combining text preprocessing with a Random Forest classification model to analyze unstructured SMS data for Cyber Solutions, enhancing cybersecurity and mitigating phishing risks amid rising cybercrime.
Goals
Prevent cyberattacks
Accurately classify spam SMS to reduce cybercrime costs.
Enhance employee security
Protect employees against phishing and financial loss.
Problems
SMS spam prevalence
95% of SMS are read within 3 minutes, making spam a significant threat.
Rising cybercrime costs
Cybercrime rates have increased 40-fold since 2001, affecting 6.9 billion victims in 2021 and costing $2 billion annually.
Key takeaways from MIT's No-Code AI and Machine Learning program
How does AI work?
1
Transformers
An advanced architecture using attention mechanisms and parallel processing to efficiently handle complex data.
How does AI work?
2
Attention mechanism
Enables the model to focus on the most critical parts of the input, refining output relevance (sketched after this card).
Self-attention
Each word considers all others simultaneously for context.
Multi-head attention
Parallel attention layers process different parts of input concurrently.
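A toy NumPy sketch of the scaled dot-product self-attention step described above; the shapes and random weights are placeholders, not a trained model.

# Every token attends to every other token at once, which is the
# "self-attention" idea: queries, keys, and values are compared and mixed.
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    Q, K, V = X @ Wq, X @ Wk, X @ Wv                 # queries, keys, values
    scores = Q @ K.T / np.sqrt(K.shape[-1])          # similarity of each token pair
    weights = np.exp(scores) / np.exp(scores).sum(axis=-1, keepdims=True)  # softmax
    return weights @ V                               # context-weighted mix

rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))                          # 4 tokens, 8-dim embeddings
Wq, Wk, Wv = (rng.normal(size=(8, 8)) for _ in range(3))
print(self_attention(X, Wq, Wk, Wv).shape)           # (4, 8)

Multi-head attention would simply run several such computations in parallel on different learned projections and concatenate the results.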
What does AI do?
3
Artificial intelligence enables problem-solving, learning, and content generation. Common types:
Discriminative AI
Classifies data by recognizing patterns in different categories.
Spam detection
Image recognition
Generative AI
Creates data that mimics patterns from its learned examples.
Hyper-realistic images
Creating new voices
What does AI do?
4
Encoder models
Process the entire input at once to understand context and classify it; ideal for discriminative tasks such as image recognition (e.g., BERT).
Spam detection
Image recognition
Decoder models
Build output token by token from prior context; suited for generative tasks such as text or image creation (e.g., ChatGPT).
Hyper-realistic images
Creating new voices
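A brief sketch contrasting the two model families with the Hugging Face `transformers` pipelines; the specific checkpoints are common public defaults chosen for illustration, not models used in these projects.

from transformers import pipeline

# Encoder-style (BERT family): reads the whole input at once and classifies it.
classify = pipeline("sentiment-analysis",
                    model="distilbert-base-uncased-finetuned-sst-2-english")
print(classify("This SMS looks suspicious")[0])

# Decoder-style (GPT family): generates output token by token from prior context.
generate = pipeline("text-generation", model="gpt2")
print(generate("AI agents can help clinicians by",
               max_new_tokens=20)[0]["generated_text"])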
How does AI work?
5
Neural networks use layers of neurons to process data for complex problem-solving.
1
Input layer
Receives and prepares data for processing.
2
Hidden layers
Neurons sum weighted inputs and apply activation functions to capture complex patterns (see the toy forward pass after this card).
3
Output layer
Generates the final prediction based on processed data.
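A toy forward pass through the three layers just described, with random placeholder weights rather than a trained network.

# Input layer -> hidden layer (weighted sum + activation) -> output layer.
import numpy as np

rng = np.random.default_rng(42)
x = rng.normal(size=3)                          # input layer: 3 features

W1, b1 = rng.normal(size=(3, 5)), np.zeros(5)   # hidden layer: 5 neurons
h = np.maximum(0, x @ W1 + b1)                  # weighted sum + ReLU activation

W2, b2 = rng.normal(size=(5, 1)), np.zeros(1)   # output layer: 1 prediction
y_hat = h @ W2 + b2
print(y_hat)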
How does AI work?
6
1
Multi-head self-attention
Focuses on input parts for context.
2
Hidden layers
Neurons process data for complexity.
3
Input layer
Uses previous outputs for context.
4
Cross-attention
Merges encoder and decoder data.
5
Feed-forward network
Generates final predictions.
Model selection
7
Select the algorithm based on the problem, the goal, and the available data (two of these are sketched after this list):
Supervised learning
Predicts with labeled data, minimizing errors.
Predicting housing prices by size.
Unsupervised learning
Finds patterns in unlabeled data by analyzing links.
Segmenting users based on behavior.
Reinforcement learning
Learns strategies via trial & error for rewards.
Training a chess-playing agent.
Deep learning
Uses multi-layered neural networks to learn from large data sets for precise solutions.
Autonomous cars driving themselves.
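Two of these paradigms sketched with scikit-learn on made-up data, matching the examples above (housing prices and user segmentation); values and features are placeholders.

import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LinearRegression

# Supervised: labeled data (size -> price), minimize prediction error.
sizes = np.array([[50], [80], [120], [200]])            # m^2
prices = np.array([150_000, 230_000, 340_000, 560_000])
reg = LinearRegression().fit(sizes, prices)
print(reg.predict([[100]]))                             # predicted price for 100 m^2

# Unsupervised: no labels, group users by behavior (visits, purchases).
behavior = np.array([[2, 0], [3, 1], [40, 8], [35, 7]])
print(KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(behavior))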
Model training process
8
1
Design and train a baseline model
2
Evaluate the model's performance
3
Adjust the parameters and reevaluate
4
Compare the runs and select the best final version (sketched below)
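A compact sketch of this baseline → evaluate → tune → compare loop with scikit-learn; the synthetic dataset and parameter grid are placeholders.

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, cross_val_score

X, y = make_classification(n_samples=500, random_state=0)

# 1. Design a baseline model, 2. evaluate its performance.
baseline = RandomForestClassifier(random_state=0)
print("baseline recall:", cross_val_score(baseline, X, y, scoring="recall").mean())

# 3. Adjust parameters and re-evaluate, 4. keep the best version.
search = GridSearchCV(baseline,
                      {"n_estimators": [100, 300], "max_depth": [None, 10]},
                      scoring="recall")
search.fit(X, y)
print("best params:", search.best_params_, "tuned recall:", search.best_score_)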
Model evaluation
9
Performance metrics
Screenshot from my Dataiku project
Precision
Ensures accuracy of positive predictions, minimizing false positives, e.g., priority email filter.
Recall
Identifies all positive instances, reducing false negatives, e.g., disease detection.
F1-score
Balances precision and recall when both are important; the sketch below computes all three from raw counts.
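The three metrics computed directly from confusion-matrix counts. The true-positive and false-negative counts reuse the spam project's figures (168 caught, 25 missed); the false-positive count of 9 is only an assumed value chosen so the arithmetic lands near the reported F1.

# Precision, recall, and F1 from raw confusion-matrix counts.
def precision_recall_f1(tp: int, fp: int, fn: int):
    precision = tp / (tp + fp)    # how many flagged items were truly positive
    recall = tp / (tp + fn)       # how many true positives were found
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

print(precision_recall_f1(tp=168, fp=9, fn=25))  # recall ≈ 0.87, F1 ≈ 0.91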
How does AI work?
1
Transformers
An advanced architecture using attention mechanisms and parallel processing to efficiently handle complex data.
How does AI work?
2
Attention mechanism
Enables focus on crucial input parts, refining output relevance.
Self-attention
Each word considers all others simultaneously for context.
Multi-head attention
Parallel attention layers process different parts of input concurrently.
What does AI do?
3
Artificial intelligence enables problem-solving, learning, and content generation. Common types:
Discriminative AI
Classifies data by recognizing patterns in different categories.
Spam detection
Img recognition
Generative AI
Creates data that mimics patterns from its learned examples.
Hyper-realistic img
Creating new voices
What does AI do?
4
Encoder models
Processes input simultaneously to understand context and categorize output, ideal for discriminative tasks like image recognition, e.g., BERT.
Spam detection
Img recognition
Decoder models
Builds output token-by-token from prior context, suited for generative tasks like text or image creation, e.g., Chat GPT.
Hyper-realistic img
Creating new voices
How does AI work?
5
Neural networks use layers of neurons to process data for complex problem-solving.
1
Input layer
Receives and prepares data for processing.
2
Hidden layers
Neurons combine inputs and apply activation functions to detect patterns.
3
Output layer
Generates the final prediction based on processed data.
How does AI work?
6
1
Multi-head self-attention
Focuses on input parts for context.
2
Hidden layers
Neurons process data for complexity.
3
Input layer
Uses previous outputs for context.
4
Cross-attention
Merges encoder and decoder data.
5
Feed-forward network
Generates final predictions.
Model selection
7
Select the algorithm based on problem, goal, and data:
Supervised learning
Predicts with labeled data, minimizing errors.
Predicting housing prices by size.
Unsupervised learning
Finds patterns in unlabeled data by analyzing links.
Segmenting users based on behavior.
Reinforcement learning
Learns strategies via trial & error for rewards.
Training a chess-playing agent.
Deep learning
Uses multi-layered neural networks & large data for precise solutions.
Autonomous cars driving themselves.
Model training process
8
1
Design and train a baseline model
2
Next, evaluate the model's performance
3
Adjust the parameters and reevaluate
4
Compare, select the best final version
Model evaluation
9
Performance metrics
Screenshot from my Dataiku project
Precision
Ensures accuracy of positive predictions, minimizing false positives, e.g., priority email filter.
Recall
Identifies all positive instances, reducing false negatives, e.g., disease detection.
F1-score
Balances precision and recall when both are important.
How does AI work?
1
Transformers
An advanced architecture using attention mechanisms and parallel processing to efficiently handle complex data.
How does AI work?
2
Attention mechanism
Enables focus on crucial input parts, refining output relevance.
Self-attention
Each word considers all others simultaneously for context.
Multi-head attention
Parallel attention layers process different parts of input concurrently.
What does AI do?
3
Artificial intelligence enables problem-solving, learning, and content generation. Common types:
Discriminative AI
Classifies data by recognizing patterns in different categories.
Spam detection
Img recognition
Generative AI
Creates data that mimics patterns from its learned examples.
Hyper-realistic img
Creating new voices
What does AI do?
4
Encoder models
Processes input simultaneously to understand context and categorize output, ideal for discriminative tasks like image recognition, e.g., BERT.
Spam detection
Img recognition
Decoder models
Builds output token-by-token from prior context, suited for generative tasks like text or image creation, e.g., Chat GPT.
Hyper-realistic img
Creating new voices
How does AI work?
5
Neural networks use layers of neurons to process data for complex problem-solving.
1
Input layer
Receives and prepares data for processing.
2
Hidden layers
Neurons combine inputs and apply activation functions to detect patterns.
3
Output layer
Generates the final prediction based on processed data.
How does AI work?
6
1
Multi-head self-attention
Focuses on input parts for context.
2
Hidden layers
Neurons process data for complexity.
3
Input layer
Uses previous outputs for context.
4
Cross-attention
Merges encoder and decoder data.
5
Feed-forward network
Generates final predictions.
Model selection
7
Select the algorithm based on problem, goal, and data:
Supervised learning
Predicts with labeled data, minimizing errors.
Predicting housing prices by size.
Unsupervised learning
Finds patterns in unlabeled data by analyzing links.
Segmenting users based on behavior.
Reinforcement learning
Learns strategies via trial & error for rewards.
Training a chess-playing agent.
Deep learning
Uses multi-layered neural networks & large data for precise solutions.
Autonomous cars driving themselves.
Model training process
8
1
Design and train a baseline model
2
Next, evaluate the model's performance
3
Adjust the parameters and reevaluate
4
Compare, select the best final version
Model evaluation
9
Performance metrics
Screenshot from my Dataiku project
Precision
Ensures accuracy of positive predictions, minimizing false positives, e.g., priority email filter.
Recall
Identifies all positive instances, reducing false negatives, e.g., disease detection.
F1-score
Balances precision and recall when both are important.
How does AI work?
1
Transformers
An advanced architecture using attention mechanisms and parallel processing to efficiently handle complex data.
How does AI work?
2
Attention mechanism
Enables focus on crucial input parts, refining output relevance.
Self-attention
Each word considers all others simultaneously for context.
Multi-head attention
Parallel attention layers process different parts of input concurrently.
What does AI do?
3
Artificial intelligence enables problem-solving, learning, and content generation. Common types:
Discriminative AI
Classifies data by recognizing patterns in different categories.
Spam detection
Img recognition
Generative AI
Creates data that mimics patterns from its learned examples.
Hyper-realistic img
Creating new voices
What does AI do?
4
Encoder models
Processes input simultaneously to understand context and categorize output, ideal for discriminative tasks like image recognition, e.g., BERT.
Spam detection
Img recognition
Decoder models
Builds output token-by-token from prior context, suited for generative tasks like text or image creation, e.g., Chat GPT.
Hyper-realistic img
Creating new voices
How does AI work?
5
Neural networks use layers of neurons to process data for complex problem-solving.
1
Input layer
Receives and prepares data for processing.
2
Hidden layers
Neurons combine inputs and apply activation functions to detect patterns.
3
Output layer
Generates the final prediction based on processed data.
How does AI work?
6
1
Multi-head self-attention
Focuses on input parts for context.
2
Hidden layers
Neurons process data for complexity.
3
Input layer
Uses previous outputs for context.
4
Cross-attention
Merges encoder and decoder data.
5
Feed-forward network
Generates final predictions.
Model selection
7
Select the algorithm based on problem, goal, and data:
Supervised learning
Predicts with labeled data, minimizing errors.
Predicting housing prices by size.
Unsupervised learning
Finds patterns in unlabeled data by analyzing links.
Segmenting users based on behavior.
Reinforcement learning
Learns strategies via trial & error for rewards.
Training a chess-playing agent.
Deep learning
Uses multi-layered neural networks & large data for precise solutions.
Autonomous cars driving themselves.
Model training process
8
1
Design and train a baseline model
2
Next, evaluate the model's performance
3
Adjust the parameters and reevaluate
4
Compare, select the best final version
Model evaluation
9
Performance metrics
Screenshot from my Dataiku project
Precision
Ensures accuracy of positive predictions, minimizing false positives, e.g., priority email filter.
Recall
Identifies all positive instances, reducing false negatives, e.g., disease detection.
F1-score
Balances precision and recall when both are important.
How does AI work?
1
Transformers
An advanced architecture using attention mechanisms and parallel processing to efficiently handle complex data.
How does AI work?
2
Attention mechanism
Enables focus on the most critical parts of the input, refining output relevance.
Self-attention
Each word considers all others simultaneously for context.
Multi-head attention
Parallel attention layers process different parts of input concurrently.
What does AI do?
3
Artificial intelligence enables problem-solving, learning, and content generation. Common types:
Discriminative AI
Classifies data by recognizing patterns in different categories.
Spam detection
Img recognition
Generative AI
Creates data that mimics patterns from its learned examples.
Hyper-realistic img
Creating new voices
What does AI do?
4
Encoder models
Processes input simultaneously to understand context and categorize output, ideal for discriminative tasks such as image recognition, e.g., BERT.
Spam detection
Img recognition
Decoder models
Builds output token-by-token from prior context, suited for generative tasks like text or image creation, e.g., Chat GPT.
Hyper-realistic img
Creating new voices
How does AI work?
5
Neural networks use layers of neurons to process data for complex problem-solving.
1
Input layer
Receives and prepares input data for processing.
2
Hidden layers
Neurons sum weighted inputs and apply activation functions to capture complex patterns.
3
Output layer
Generates the final prediction based on processed data.
How does AI work?
6
1
Multi-head self-attention
Focuses on input parts for context.
2
Hidden layers
Neurons process data for complexity.
3
Input layer
Uses previous outputs for context.
4
Cross-attention
Merges encoder and decoder data.
5
Feed-forward network
Generates final predictions.
Model selection
7
Select the algorithm based on problem, goal, and data:
Supervised learning
Predicts with labeled data, minimizing errors.
Predicting housing prices by size.
Unsupervised learning
Finds patterns in unlabeled data by analyzing links.
Segmenting users based on behavior.
Reinforcement learning
Learns strategies via trial & error for rewards.
Training a chess-playing agent.
Deep learning
Uses multi-layered neural networks to learn from large data sets for precision.
Autonomous cars driving themselves.
Model training process
8
1
Design and train a baseline model
2
Next, evaluate the model's performance
3
Adjust the parameters and reevaluate
4
Compare, select the best final version
Model evaluation
9
Performance metrics
Screenshot from my Dataiku project
Precision
Ensures accuracy of positive predictions, minimizing false positives, e.g., priority email filter.
Recall
Identifies all positive instances, reducing false negatives, e.g., disease detection.
F1-score
Balances precision and recall when both are important.
How does AI work?
1
Transformers
An advanced architecture using attention mechanisms and parallel processing to efficiently handle complex data.
How does AI work?
2
Attention mechanism
Enables focus on the most critical parts of the input, refining output relevance.
Self-attention
Each word considers all others simultaneously for context.
Multi-head attention
Parallel attention layers process different parts of input concurrently.
What does AI do?
3
Artificial intelligence enables problem-solving, learning, and content generation. Common types:
Discriminative AI
Classifies data by recognizing patterns in different categories.
Spam detection
Img recognition
Generative AI
Creates data that mimics patterns from its learned examples.
Hyper-realistic img
Creating new voices
What does AI do?
4
Encoder models
Processes input simultaneously to understand context and categorize output, ideal for discriminative tasks such as image recognition, e.g., BERT.
Spam detection
Img recognition
Decoder models
Builds output token-by-token from prior context, suited for generative tasks like text or image creation, e.g., Chat GPT.
Hyper-realistic img
Creating new voices
How does AI work?
5
Neural networks use layers of neurons to process data for complex problem-solving.
1
Input layer
Receives and prepares input data for processing.
2
Hidden layers
Neurons sum weighted inputs and apply activation functions to capture complex patterns.
3
Output layer
Generates the final prediction based on processed data.
How does AI work?
6
1
Multi-head self-attention
Focuses on input parts for context.
2
Hidden layers
Neurons process data for complexity.
3
Input layer
Uses previous outputs for context.
4
Cross-attention
Merges encoder and decoder data.
5
Feed-forward network
Generates final predictions.
Model selection
7
Select the algorithm based on problem, goal, and data:
Supervised learning
Predicts with labeled data, minimizing errors.
Predicting housing prices by size.
Unsupervised learning
Finds patterns in unlabeled data by analyzing links.
Segmenting users based on behavior.
Reinforcement learning
Learns strategies via trial & error for rewards.
Training a chess-playing agent.
Deep learning
Uses multi-layered neural networks to learn from large data sets for precision.
Autonomous cars driving themselves.
Model training process
8
1
Design and train a baseline model
2
Next, evaluate the model's performance
3
Adjust the parameters and reevaluate
4
Compare, select the best final version
Model evaluation
9
Performance metrics
Screenshot from my Dataiku project
Precision
Ensures accuracy of positive predictions, minimizing false positives, e.g., priority email filter.
Recall
Identifies all positive instances, reducing false negatives, e.g., disease detection.
F1-score
Balances precision and recall when both are important.
How does AI work?
1
Transformers
An advanced architecture using attention mechanisms and parallel processing to efficiently handle complex data.
How does AI work?
2
Attention mechanism
Enables focus on the most critical parts of the input, refining output relevance.
Self-attention
Each word considers all others simultaneously for context.
Multi-head attention
Parallel attention layers process different parts of input concurrently.
What does AI do?
3
Artificial intelligence enables problem-solving, learning, and content generation. Common types:
Discriminative AI
Classifies data by recognizing patterns in different categories.
Spam detection
Img recognition
Generative AI
Creates data that mimics patterns from its learned examples.
Hyper-realistic img
Creating new voices
What does AI do?
4
Encoder models
Processes input simultaneously to understand context and categorize output, ideal for discriminative tasks such as image recognition, e.g., BERT.
Spam detection
Img recognition
Decoder models
Builds output token-by-token from prior context, suited for generative tasks like text or image creation, e.g., Chat GPT.
Hyper-realistic img
Creating new voices
How does AI work?
5
Neural networks use layers of neurons to process data for complex problem-solving.
1
Input layer
Receives and prepares input data for processing.
2
Hidden layers
Neurons sum weighted inputs and apply activation functions to capture complex patterns.
3
Output layer
Generates the final prediction based on processed data.
How does AI work?
6
1
Multi-head self-attention
Focuses on input parts for context.
2
Hidden layers
Neurons process data for complexity.
3
Input layer
Uses previous outputs for context.
4
Cross-attention
Merges encoder and decoder data.
5
Feed-forward network
Generates final predictions.
Model selection
7
Select the algorithm based on problem, goal, and data:
Supervised learning
Predicts with labeled data, minimizing errors.
Predicting housing prices by size.
Unsupervised learning
Finds patterns in unlabeled data by analyzing links.
Segmenting users based on behavior.
Reinforcement learning
Learns strategies via trial & error for rewards.
Training a chess-playing agent.
Deep learning
Uses multi-layered neural networks to learn from large data sets for precision.
Autonomous cars driving themselves.
Model training process
8
1
Design and train a baseline model
2
Next, evaluate the model's performance
3
Adjust the parameters and reevaluate
4
Compare, select the best final version
Model evaluation
9
Performance metrics
Screenshot from my Dataiku project
Precision
Ensures accuracy of positive predictions, minimizing false positives, e.g., priority email filter.
Recall
Identifies all positive instances, reducing false negatives, e.g., disease detection.
F1-score
Balances precision and recall when both are important.
How does AI work?
1
Transformers
An advanced architecture using attention mechanisms and parallel processing to efficiently handle complex data.
How does AI work?
2
Attention mechanism
Enables focus on the most critical parts of the input, refining output relevance.
Self-attention
Each word considers all others simultaneously for context.
Multi-head attention
Parallel attention layers process different parts of input concurrently.
What does AI do?
3
Artificial intelligence enables problem-solving, learning, and content generation. Common types:
Discriminative AI
Classifies data by recognizing patterns in different categories.
Spam detection
Img recognition
Generative AI
Creates data that mimics patterns from its learned examples.
Hyper-realistic img
Creating new voices
What does AI do?
4
Encoder models
Processes input simultaneously to understand context and categorize output, ideal for discriminative tasks such as image recognition, e.g., BERT.
Spam detection
Img recognition
Decoder models
Builds output token-by-token from prior context, suited for generative tasks like text or image creation, e.g., Chat GPT.
Hyper-realistic img
Creating new voices
How does AI work?
5
Neural networks use layers of neurons to process data for complex problem-solving.
1
Input layer
Receives and prepares input data for processing.
2
Hidden layers
Neurons sum weighted inputs and apply activation functions to capture complex patterns.
3
Output layer
Generates the final prediction based on processed data.
How does AI work?
6
1
Multi-head self-attention
Focuses on input parts for context.
2
Hidden layers
Neurons process data for complexity.
3
Input layer
Uses previous outputs for context.
4
Cross-attention
Merges encoder and decoder data.
5
Feed-forward network
Generates final predictions.
Model selection
7
Select the algorithm based on problem, goal, and data:
Supervised learning
Predicts with labeled data, minimizing errors.
Predicting housing prices by size.
Unsupervised learning
Finds patterns in unlabeled data by analyzing links.
Segmenting users based on behavior.
Reinforcement learning
Learns strategies via trial & error for rewards.
Training a chess-playing agent.
Deep learning
Uses multi-layered neural networks to learn from large data sets for precision.
Autonomous cars driving themselves.
Model training process
8
1
Design and train a baseline model
2
Next, evaluate the model's performance
3
Adjust the parameters and reevaluate
4
Compare, select the best final version
Model evaluation
9
Performance metrics
Screenshot from my Dataiku project
Precision
Ensures accuracy of positive predictions, minimizing false positives, e.g., priority email filter.
Recall
Identifies all positive instances, reducing false negatives, e.g., disease detection.
F1-score
Balances precision and recall when both are important.
How does AI work?
1
Transformers
An advanced architecture using attention mechanisms and parallel processing to efficiently handle complex data.
How does AI work?
2
Attention mechanism
Enables focus on the most critical parts of the input, refining output relevance.
Self-attention
Each word considers all others simultaneously for context.
Multi-head attention
Parallel attention layers process different parts of input concurrently.
What does AI do?
3
Artificial intelligence enables problem-solving, learning, and content generation. Common types:
Discriminative AI
Classifies data by recognizing patterns in different categories.
Spam detection
Img recognition
Generative AI
Creates data that mimics patterns from its learned examples.
Hyper-realistic img
Creating new voices
What does AI do?
4
Encoder models
Processes input simultaneously to understand context and categorize output, ideal for discriminative tasks such as image recognition, e.g., BERT.
Spam detection
Img recognition
Decoder models
Builds output token-by-token from prior context, suited for generative tasks like text or image creation, e.g., Chat GPT.
Hyper-realistic img
Creating new voices
How does AI work?
5
Neural networks use layers of neurons to process data for complex problem-solving.
1
Input layer
Receives and prepares input data for processing.
2
Hidden layers
Neurons sum weighted inputs and apply activation functions to capture complex patterns.
3
Output layer
Generates the final prediction based on processed data.
How does AI work?
6
1
Multi-head self-attention
Focuses on input parts for context.
2
Hidden layers
Neurons process data for complexity.
3
Input layer
Uses previous outputs for context.
4
Cross-attention
Merges encoder and decoder data.
5
Feed-forward network
Generates final predictions.
Model selection
7
Select the algorithm based on problem, goal, and data:
Supervised learning
Predicts with labeled data, minimizing errors.
Predicting housing prices by size.
Unsupervised learning
Finds patterns in unlabeled data by analyzing links.
Segmenting users based on behavior.
Reinforcement learning
Learns strategies via trial & error for rewards.
Training a chess-playing agent.
Deep learning
Uses multi-layered neural networks to learn from large data sets for precision.
Autonomous cars driving themselves.
Model training process
8
1
Design and train a baseline model
2
Next, evaluate the model's performance
3
Adjust the parameters and reevaluate
4
Compare, select the best final version
Model evaluation
9
Performance metrics
Screenshot from my Dataiku project
Precision
Ensures accuracy of positive predictions, minimizing false positives, e.g., priority email filter.
Recall
Identifies all positive instances, reducing false negatives, e.g., disease detection.
F1-score
Balances precision and recall when both are important.
How does AI work?
1
Transformers
An advanced architecture using attention mechanisms and parallel processing to efficiently handle complex data.
How does AI work?
2
Attention mechanism
Enables focus on the most critical parts of the input, refining output relevance.
Self-attention
Each word considers all others simultaneously for context.
Multi-head attention
Parallel attention layers process different parts of input concurrently.
What does AI do?
3
Artificial intelligence enables problem-solving, learning, and content generation. Common types:
Discriminative AI
Classifies data by recognizing patterns in different categories.
Spam detection
Img recognition
Generative AI
Creates data that mimics patterns from its learned examples.
Hyper-realistic img
Creating new voices
What does AI do?
4
Encoder models
Processes input simultaneously to understand context and categorize output, ideal for discriminative tasks such as image recognition, e.g., BERT.
Spam detection
Img recognition
Decoder models
Builds output token-by-token from prior context, suited for generative tasks like text or image creation, e.g., Chat GPT.
Hyper-realistic img
Creating new voices
How does AI work?
5
Neural networks use layers of neurons to process data for complex problem-solving.
1
Input layer
Receives and prepares input data for processing.
2
Hidden layers
Neurons sum weighted inputs and apply activation functions to capture complex patterns.
3
Output layer
Generates the final prediction based on processed data.
How does AI work?
6
1
Multi-head self-attention
Focuses on input parts for context.
2
Hidden layers
Neurons process data for complexity.
3
Input layer
Uses previous outputs for context.
4
Cross-attention
Merges encoder and decoder data.
5
Feed-forward network
Generates final predictions.
Model selection
7
Select the algorithm based on problem, goal, and data:
Supervised learning
Predicts with labeled data, minimizing errors.
Predicting housing prices by size.
Unsupervised learning
Finds patterns in unlabeled data by analyzing links.
Segmenting users based on behavior.
Reinforcement learning
Learns strategies via trial & error for rewards.
Training a chess-playing agent.
Deep learning
Uses multi-layered neural networks to learn from large data sets for precision.
Autonomous cars driving themselves.
Model training process
8
1
Design and train a baseline model
2
Next, evaluate the model's performance
3
Adjust the parameters and reevaluate
4
Compare, select the best final version
Model evaluation
9
Performance metrics
Screenshot from my Dataiku project
Precision
Ensures accuracy of positive predictions, minimizing false positives, e.g., priority email filter.
Recall
Identifies all positive instances, reducing false negatives, e.g., disease detection.
F1-score
Balances precision and recall when both are important.
How does AI work?
1
Transformers
An advanced architecture using attention mechanisms and parallel processing to efficiently handle complex data.
How does AI work?
2
Attention mechanism
Enables focus on the most critical parts of the input, refining output relevance.
Self-attention
Each word considers all others simultaneously for context.
Multi-head attention
Parallel attention layers process different parts of input concurrently.
What does AI do?
3
Artificial intelligence enables problem-solving, learning, and content generation. Common types:
Discriminative AI
Classifies data by recognizing patterns in different categories.
Spam detection
Img recognition
Generative AI
Creates data that mimics patterns from its learned examples.
Hyper-realistic img
Creating new voices
What does AI do?
4
Encoder models
Processes input simultaneously to understand context and categorize output, ideal for discriminative tasks such as image recognition, e.g., BERT.
Spam detection
Img recognition
Decoder models
Builds output token-by-token from prior context, suited for generative tasks like text or image creation, e.g., Chat GPT.
Hyper-realistic img
Creating new voices
How does AI work?
5
Neural networks use layers of neurons to process data for complex problem-solving.
1
Input layer
Receives and prepares input data for processing.
2
Hidden layers
Neurons sum weighted inputs and apply activation functions to capture complex patterns.
3
Output layer
Generates the final prediction based on processed data.
How does AI work?
6
1
Multi-head self-attention
Focuses on input parts for context.
2
Hidden layers
Neurons process data for complexity.
3
Input layer
Uses previous outputs for context.
4
Cross-attention
Merges encoder and decoder data.
5
Feed-forward network
Generates final predictions.
Model selection
7
Select the algorithm based on problem, goal, and data:
Supervised learning
Predicts with labeled data, minimizing errors.
Predicting housing prices by size.
Unsupervised learning
Finds patterns in unlabeled data by analyzing links.
Segmenting users based on behavior.
Reinforcement learning
Learns strategies via trial & error for rewards.
Training a chess-playing agent.
Deep learning
Uses multi-layered neural networks to learn from large data sets for precision.
Autonomous cars driving themselves.
Model training process
8
1
Design and train a baseline model
2
Next, evaluate the model's performance
3
Adjust the parameters and reevaluate
4
Compare, select the best final version
Model evaluation
9
Performance metrics
Screenshot from my Dataiku project
Precision
Ensures accuracy of positive predictions, minimizing false positives, e.g., priority email filter.
Recall
Identifies all positive instances, reducing false negatives, e.g., disease detection.
F1-score
Balances precision and recall when both are important.
How does AI work?
1
Transformers
An advanced architecture using attention mechanisms and parallel processing to efficiently handle complex data.
How does AI work?
2
Attention mechanism
Enables focus on the most critical parts of the input, refining output relevance.
Self-attention
Each word considers all others simultaneously for context.
Multi-head attention
Parallel attention layers process different parts of input concurrently.
What does AI do?
3
Artificial intelligence enables problem-solving, learning, and content generation. Common types:
Discriminative AI
Classifies data by recognizing patterns in different categories.
Spam detection
Img recognition
Generative AI
Creates data that mimics patterns from its learned examples.
Hyper-realistic img
Creating new voices
What does AI do?
4
Encoder models
Processes input simultaneously to understand context and categorize output, ideal for discriminative tasks such as image recognition, e.g., BERT.
Spam detection
Img recognition
Decoder models
Builds output token-by-token from prior context, suited for generative tasks like text or image creation, e.g., Chat GPT.
Hyper-realistic img
Creating new voices
How does AI work?
5
Neural networks use layers of neurons to process data for complex problem-solving.
1
Input layer
Receives and prepares input data for processing.
2
Hidden layers
Neurons sum weighted inputs and apply activation functions to capture complex patterns.
3
Output layer
Generates the final prediction based on processed data.
How does AI work?
6
1
Multi-head self-attention
Focuses on input parts for context.
2
Hidden layers
Neurons process data for complexity.
3
Input layer
Uses previous outputs for context.
4
Cross-attention
Merges encoder and decoder data.
5
Feed-forward network
Generates final predictions.
Model selection
7
Select the algorithm based on problem, goal, and data:
Supervised learning
Predicts with labeled data, minimizing errors.
Predicting housing prices by size.
Unsupervised learning
Finds patterns in unlabeled data by analyzing links.
Segmenting users based on behavior.
Reinforcement learning
Learns strategies via trial & error for rewards.
Training a chess-playing agent.
Deep learning
Uses multi-layered neural networks to learn from large data sets for precision.
Autonomous cars driving themselves.
Model training process
8
1
Design and train a baseline model
2
Next, evaluate the model's performance
3
Adjust the parameters and reevaluate
4
Compare, select the best final version
Model evaluation
9
Performance metrics
Screenshot from my Dataiku project
Precision
Ensures accuracy of positive predictions, minimizing false positives, e.g., priority email filter.
Recall
Identifies all positive instances, reducing false negatives, e.g., disease detection.
F1-score
Balances precision and recall when both are important.
How does AI work?
1
Transformers
An advanced architecture using attention mechanisms and parallel processing to efficiently handle complex data.
How does AI work?
2
Attention mechanism
Enables focus on crucial input parts, refining output relevance.
Self-attention
Each word considers all others simultaneously for context.
Multi-head attention
Parallel attention layers process different parts of input concurrently.
What does AI do?
3
Artificial intelligence enables problem-solving, learning, and content generation. Common types:
Discriminative AI
Classifies data by recognizing patterns in different categories.
Spam detection
Img recognition
Generative AI
Creates data that mimics patterns from its learned examples.
Hyper-realistic img
Creating new voices
What does AI do?
4
Encoder models
Processes input simultaneously to understand context and categorize output, ideal for discriminative tasks like image recognition, e.g., BERT.
Spam detection
Img recognition
Decoder models
Builds output token-by-token from prior context, suited for generative tasks like text or image creation, e.g., Chat GPT.
Hyper-realistic img
Creating new voices
How does AI work?
5
Neural networks use layers of neurons to process data for complex problem-solving.
1
Input layer
Receives and prepares data for processing.
2
Hidden layers
Neurons combine inputs and apply activation functions to detect patterns.
3
Output layer
Generates the final prediction based on processed data.
How does AI work?
6
1
Multi-head self-attention
Focuses on input parts for context.
2
Hidden layers
Neurons process data for complexity.
3
Input layer
Uses previous outputs for context.
4
Cross-attention
Merges encoder and decoder data.
5
Feed-forward network
Generates predictions.
Model selection
7
Select the algorithm based on problem, goal, and data:
Supervised learning
Predicts with labeled data, minimizing errors.
Unsupervised learning
Finds patterns in unlabeled data by analyzing links.
Reinforcement learning
Learns strategies via trial & error for rewards.
Deep learning
Uses multi-layered neural networks & large data for precision.
Model training process
8
1
Design and train a baseline model
2
Next, evaluate the model's performance
3
Adjust the parameters and reevaluate
4
Compare, select the best final version
Model evaluation
9
Performance metrics
Screenshot from my Dataiku project
Precision
Ensures accuracy of positive predictions, minimizing false positives, e.g., priority email filter.
Recall
Identifies all positive instances, reducing false negatives, e.g., disease detection.
F1-score
Balances precision and recall when both are important.
How does AI work?
1
Transformers
An advanced architecture using attention mechanisms and parallel processing to efficiently handle complex data.
How does AI work?
2
Attention mechanism
Enables focus on crucial input parts, refining output relevance.
Self-attention
Each word considers all others simultaneously for context.
Multi-head attention
Parallel attention layers process different parts of input concurrently.
What does AI do?
3
Artificial intelligence enables problem-solving, learning, and content generation. Common types:
Discriminative AI
Classifies data by recognizing patterns in different categories.
Spam detection
Img recognition
Generative AI
Creates data that mimics patterns from its learned examples.
Hyper-realistic img
Creating new voices
What does AI do?
4
Encoder models
Processes input simultaneously to understand context and categorize output, ideal for discriminative tasks like image recognition, e.g., BERT.
Spam detection
Img recognition
Decoder models
Builds output token-by-token from prior context, suited for generative tasks like text or image creation, e.g., Chat GPT.
Hyper-realistic img
Creating new voices
How does AI work?
5
Neural networks use layers of neurons to process data for complex problem-solving.
1
Input layer
Receives and prepares data for processing.
2
Hidden layers
Neurons combine inputs and apply activation functions to detect patterns.
3
Output layer
Generates the final prediction based on processed data.
How does AI work?
6
1
Multi-head self-attention
Focuses on input parts for context.
2
Hidden layers
Neurons process data for complexity.
3
Input layer
Uses previous outputs for context.
4
Cross-attention
Merges encoder and decoder data.
5
Feed-forward network
Generates predictions.
Model selection
7
Select the algorithm based on problem, goal, and data:
Supervised learning
Predicts with labeled data, minimizing errors.
Unsupervised learning
Finds patterns in unlabeled data by analyzing links.
Reinforcement learning
Learns strategies via trial & error for rewards.
Deep learning
Uses multi-layered neural networks & large data for precision.
Model training process
8
1
Design and train a baseline model
2
Next, evaluate the model's performance
3
Adjust the parameters and reevaluate
4
Compare, select the best final version
Model evaluation
9
Performance metrics
Screenshot from my Dataiku project
Precision
Ensures accuracy of positive predictions, minimizing false positives, e.g., priority email filter.
Recall
Identifies all positive instances, reducing false negatives, e.g., disease detection.
F1-score
Balances precision and recall when both are important.
How does AI work?
1
Transformers
An advanced architecture using attention mechanisms and parallel processing to efficiently handle complex data.
How does AI work?
2
Attention mechanism
Enables focus on crucial input parts, refining output relevance.
Self-attention
Each word considers all others simultaneously for context.
Multi-head attention
Parallel attention layers process different parts of input concurrently.
What does AI do?
3
Artificial intelligence enables problem-solving, learning, and content generation. Common types:
Discriminative AI
Classifies data by recognizing patterns in different categories.
Spam detection
Img recognition
Generative AI
Creates data that mimics patterns from its learned examples.
Hyper-realistic img
Creating new voices
What does AI do?
4
Encoder models
Processes input simultaneously to understand context and categorize output, ideal for discriminative tasks like image recognition, e.g., BERT.
Spam detection
Img recognition
Decoder models
Builds output token-by-token from prior context, suited for generative tasks like text or image creation, e.g., Chat GPT.
Hyper-realistic img
Creating new voices
How does AI work?
5
Neural networks use layers of neurons to process data for complex problem-solving.
1
Input layer
Receives and prepares data for processing.
2
Hidden layers
Neurons combine inputs and apply activation functions to detect patterns.
3
Output layer
Generates the final prediction based on processed data.
How does AI work?
6
1
Multi-head self-attention
Focuses on input parts for context.
2
Hidden layers
Neurons process data for complexity.
3
Input layer
Uses previous outputs for context.
4
Cross-attention
Merges encoder and decoder data.
5
Feed-forward network
Generates predictions.
Model selection
7
Select the algorithm based on problem, goal, and data:
Supervised learning
Predicts with labeled data, minimizing errors.
Unsupervised learning
Finds patterns in unlabeled data by analyzing links.
Reinforcement learning
Learns strategies via trial & error for rewards.
Deep learning
Uses multi-layered neural networks & large data for precision.
Model training process
8
1
Design and train a baseline model
2
Next, evaluate the model's performance
3
Adjust the parameters and reevaluate
4
Compare, select the best final version
Model evaluation
9
Performance metrics
Screenshot from my Dataiku project
Precision
Ensures accuracy of positive predictions, minimizing false positives, e.g., priority email filter.
Recall
Identifies all positive instances, reducing false negatives, e.g., disease detection.
F1-score
Balances precision and recall when both are important.
How does AI work?
1
Transformers
An advanced architecture using attention mechanisms and parallel processing to efficiently handle complex data.
How does AI work?
2
Attention mechanism
Enables focus on crucial input parts, refining output relevance.
Self-attention
Each word considers all others simultaneously for context.
Multi-head attention
Parallel attention layers process different parts of input concurrently.
What does AI do?
3
Artificial intelligence enables problem-solving, learning, and content generation. Common types:
Discriminative AI
Classifies data by recognizing patterns in different categories.
Spam detection
Img recognition
Generative AI
Creates data that mimics patterns from its learned examples.
Hyper-realistic img
Creating new voices
What does AI do?
4
Encoder models
Processes input simultaneously to understand context and categorize output, ideal for discriminative tasks like image recognition, e.g., BERT.
Spam detection
Img recognition
Decoder models
Builds output token-by-token from prior context, suited for generative tasks like text or image creation, e.g., Chat GPT.
Hyper-realistic img
Creating new voices
How does AI work?
5
Neural networks use layers of neurons to process data for complex problem-solving.
1
Input layer
Receives and prepares data for processing.
2
Hidden layers
Neurons combine inputs and apply activation functions to detect patterns.
3
Output layer
Generates the final prediction based on processed data.
How does AI work?
6
1
Multi-head self-attention
Focuses on input parts for context.
2
Hidden layers
Neurons process data for complexity.
3
Input layer
Uses previous outputs for context.
4
Cross-attention
Merges encoder and decoder data.
5
Feed-forward network
Generates predictions.
Model selection
7
Select the algorithm based on problem, goal, and data:
Supervised learning
Predicts with labeled data, minimizing errors.
Unsupervised learning
Finds patterns in unlabeled data by analyzing links.
Reinforcement learning
Learns strategies via trial & error for rewards.
Deep learning
Uses multi-layered neural networks & large data for precision.
Model training process
8
1
Design and train a baseline model
2
Next, evaluate the model's performance
3
Adjust the parameters and reevaluate
4
Compare, select the best final version
Model evaluation
9
Performance metrics
Screenshot from my Dataiku project
Precision
Ensures accuracy of positive predictions, minimizing false positives, e.g., priority email filter.
Recall
Identifies all positive instances, reducing false negatives, e.g., disease detection.
F1-score
Balances precision and recall when both are important.
How does AI work?
1
Transformers
An advanced architecture using attention mechanisms and parallel processing to efficiently handle complex data.
How does AI work?
2
Attention mechanism
Enables focus on crucial input parts, refining output relevance.
Self-attention
Each word considers all others simultaneously for context.
Multi-head attention
Parallel attention layers process different parts of input concurrently.
What does AI do?
3
Artificial intelligence enables problem-solving, learning, and content generation. Common types:
Discriminative AI
Classifies data by recognizing patterns in different categories.
Spam detection
Img recognition
Generative AI
Creates data that mimics patterns from its learned examples.
Hyper-realistic img
Creating new voices
What does AI do?
4
Encoder models
Processes input simultaneously to understand context and categorize output, ideal for discriminative tasks like image recognition, e.g., BERT.
Spam detection
Img recognition
Decoder models
Builds output token-by-token from prior context, suited for generative tasks like text or image creation, e.g., Chat GPT.
Hyper-realistic img
Creating new voices
How does AI work?
5
Neural networks use layers of neurons to process data for complex problem-solving.
1
Input layer
Receives and prepares data for processing.
2
Hidden layers
Neurons combine inputs and apply activation functions to detect patterns.
3
Output layer
Generates the final prediction based on processed data.
How does AI work?
6
1
Multi-head self-attention
Focuses on input parts for context.
2
Hidden layers
Neurons process data for complexity.
3
Input layer
Uses previous outputs for context.
4
Cross-attention
Merges encoder and decoder data.
5
Feed-forward network
Generates predictions.
Model selection
7
Select the algorithm based on problem, goal, and data:
Supervised learning
Predicts with labeled data, minimizing errors.
Unsupervised learning
Finds patterns in unlabeled data by analyzing links.
Reinforcement learning
Learns strategies via trial & error for rewards.
Deep learning
Uses multi-layered neural networks & large data for precision.
Model training process
8
1
Design and train a baseline model
2
Next, evaluate the model's performance
3
Adjust the parameters and reevaluate
4
Compare, select the best final version
Model evaluation
9
Performance metrics
Screenshot from my Dataiku project
Precision
Ensures accuracy of positive predictions, minimizing false positives, e.g., priority email filter.
Recall
Identifies all positive instances, reducing false negatives, e.g., disease detection.
F1-score
Balances precision and recall when both are important.
A few slides selected from my learnings in MIT's AI/ML certificate program.
An overview of essential AI and ML concepts, model types, how they work, and their significance in product design. Covers AI fundamentals, neural networks, transformers, model selection, training, evaluation, and pitfalls like overfitting, emphasizing their impact on user experience.