Coding Fungus: Python Library Categorisation from a DS and ML Perspective

Python libraries for Data Science (DS) and Machine Learning (ML) can be grouped by the role they play in the end-to-end model pipeline: data ingestion, manipulation, visualization, algorithm training, and production deployment.

1. Data Processing & Manipulation

These libraries handle the heavy lifting of data cleaning, restructuring, and numerical operations.

NumPy: The foundation of scientific computing. Provides support for large, multi-dimensional arrays and high-level mathematical functions.
Pandas: Essential for data wrangling. Offers DataFrame structures to easily manipulate tabular data, handle missing values, and merge datasets.
SciPy: Built on NumPy, it provides modules for optimization, integration, linear algebra, and statistics.
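
As a rough illustration of how these three libraries divide the work, here is a minimal sketch. The store and temperature data, and the variable names, are made up for the example and are not from any real dataset.

```python
import numpy as np
import pandas as pd
from scipy import optimize

# NumPy: vectorized arithmetic on a 2-D array (hypothetical temperatures).
temps_c = np.array([[20.0, 22.5], [19.0, 25.0]])
temps_f = temps_c * 9 / 5 + 32  # element-wise, no explicit loop

# Pandas: tabular data with one missing value, then a merge.
sales = pd.DataFrame({"store": ["a", "b", "c"], "units": [10.0, np.nan, 7.0]})
regions = pd.DataFrame({"store": ["a", "b", "c"],
                        "region": ["north", "south", "north"]})
sales["units"] = sales["units"].fillna(sales["units"].mean())  # impute with column mean
merged = sales.merge(regions, on="store")
units_by_region = merged.groupby("region")["units"].sum()

# SciPy: minimize a simple quadratic whose minimum sits at x = 3.
result = optimize.minimize_scalar(lambda x: (x - 3) ** 2)

print(temps_f)
print(units_by_region)
print(result.x)
```

Mean imputation is used here only because it is short; whether it is appropriate depends on the data.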

2. Exploratory Data Analysis (EDA) & Visualization

These tools help uncover data distributions and correlations, and tell a story with data.

Matplotlib: The foundational plotting library for static, animated, and interactive visualizations.
Seaborn: Built on top of Matplotlib, it provides a high-level interface for drawing attractive and informative statistical graphics.
Plotly: Ideal for interactive and publication-ready graphs that can be embedded in web applications.
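
A quick EDA pass often starts with a histogram and a scatter plot. The sketch below uses Matplotlib only, on synthetic data generated for the example. It selects the non-interactive "Agg" backend and writes the PNG to an in-memory buffer so it runs without a display; in a notebook you would normally skip both and just show the figure.

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import matplotlib.pyplot as plt
import numpy as np

# Synthetic data: y depends linearly on x, plus noise.
rng = np.random.default_rng(0)
x = rng.normal(size=200)
y = 2 * x + rng.normal(scale=0.5, size=200)

fig, (ax_hist, ax_scatter) = plt.subplots(1, 2, figsize=(8, 3))
ax_hist.hist(x, bins=20)            # distribution of a single variable
ax_hist.set_title("Distribution of x")
ax_scatter.scatter(x, y, s=10)      # relationship between two variables
ax_scatter.set_title("x vs. y")
fig.tight_layout()

# Render to PNG bytes; pass a file path to savefig instead to write to disk.
buffer = io.BytesIO()
fig.savefig(buffer, format="png")
```

Seaborn and Plotly offer higher-level calls for the same kinds of plots; this sketch sticks to Matplotlib to keep dependencies small.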

3. Traditional Machine Learning

Libraries focused on classical statistical learning, classification, regression, and clustering.

Scikit-Learn: The gold standard for classical ML. Includes SVMs, Random Forests, K-Means, dimensionality reduction (such as PCA), and preprocessing utilities.
XGBoost: A highly optimized, scalable library for gradient-boosted decision trees, widely used in tabular-data competitions.
LightGBM: A fast, distributed gradient boosting framework by Microsoft, known for its high performance and low memory usage.
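
The typical classical-ML workflow (split, preprocess, fit, evaluate) can be sketched with Scikit-Learn alone. This is a minimal example on the bundled Iris dataset; the choice of model and hyperparameters is illustrative, not a recommendation.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Chain preprocessing and the model so both are fit on training data only.
model = make_pipeline(
    StandardScaler(),
    RandomForestClassifier(n_estimators=100, random_state=0),
)
model.fit(X_train, y_train)

accuracy = model.score(X_test, y_test)  # mean accuracy on the held-out split
print(f"test accuracy: {accuracy:.2f}")
```

XGBoost and LightGBM both ship scikit-learn-compatible estimators (XGBClassifier and LGBMClassifier), so either can usually be dropped into the same pipeline in place of the random forest.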

4. Deep Learning & AI

Frameworks tailored for building, training, and deploying neural networks on GPUs/TPUs.

PyTorch: Originally developed by Meta AI and now governed by the PyTorch Foundation, widely preferred in AI research and production for its dynamic computation graph and intuitive Pythonic feel.
TensorFlow: Developed by Google, a comprehensive ecosystem for scaling deep learning models from research to production.
Keras: A high-level neural network API for fast prototyping. It ships with TensorFlow as tf.keras, and since Keras 3 it can also run on JAX and PyTorch backends.
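
To give a feel for what "building and training" looks like in practice, here is a minimal PyTorch training loop that fits a one-layer model to synthetic data. The data, learning rate, and step count are assumptions chosen to keep the sketch small; it runs on CPU.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Synthetic regression data: y = 3x + 1 plus a little noise.
x = torch.linspace(-1, 1, 64).unsqueeze(1)
y = 3 * x + 1 + 0.05 * torch.randn_like(x)

model = nn.Linear(1, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.MSELoss()

losses = []
for _ in range(200):
    optimizer.zero_grad()        # clear gradients from the previous step
    loss = loss_fn(model(x), y)  # forward pass builds the graph dynamically
    loss.backward()              # backpropagate
    optimizer.step()             # update weight and bias
    losses.append(loss.item())

print(model.weight.item(), model.bias.item())
```

The same zero_grad, forward, backward, step pattern carries over to larger networks; only the model, data loading, and device placement change.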

5. Specialized Libraries

Libraries built to tackle domain-specific DS/ML tasks.

Hugging Face Transformers: A widely used library for Natural Language Processing (NLP) and Large Language Models (LLMs), providing pretrained state-of-the-art models for text, image, and audio tasks.
OpenCV: The premier library for Computer Vision, used for image processing and video analytics.
SciPy (scipy.stats): The statistics submodule of SciPy (see section 1), covering probability distributions, statistical tests, and frequency statistics.
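
Of the three, scipy.stats is the easiest to show without downloading models or images. The sketch below covers only that entry: it evaluates a distribution and runs a two-sample test on synthetic groups generated for the example.

```python
import numpy as np
from scipy import stats

# Two synthetic samples whose true means differ by 1.
rng = np.random.default_rng(42)
group_a = rng.normal(loc=0.0, scale=1.0, size=500)
group_b = rng.normal(loc=1.0, scale=1.0, size=500)

# Probability distribution: CDF of the standard normal at its mean.
p_below_mean = stats.norm.cdf(0.0)

# Statistical test: Welch's t-test (does not assume equal variances).
t_stat, p_value = stats.ttest_ind(group_a, group_b, equal_var=False)

print(p_below_mean, t_stat, p_value)
```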

6. MLOps & Deployment

Libraries to track experiments, package models, and deploy them in production.

MLflow: Manages the ML lifecycle, including experimentation, reproducibility, deployment, and a central model registry.
Streamlit: Turns data scripts into shareable web apps in minutes, perfect for creating quick ML user interfaces.
BentoML: A unified model serving framework to package and deploy ML models into scalable endpoints.

Coding Fungus

Friday, May 22, 2026
