OOP for AI — DeepSkal 2026

DataKit

A Python library for loading, batching, and preprocessing image and audio datasets — built with clean object-oriented design for ML pipelines.

9,390 Images
2,698 Audio clips
46 Tests passing
90% Val accuracy
Python 3.11 · NumPy · Pillow · Librosa · ABC / OOP · TensorFlow
// Interactive Demo
Preprocessing Pipeline
CenterCrop, RandomFlip, and Padding run live in your browser — a direct port of the Python transforms.
Pipeline(CenterCrop, RandomFlip, Padding)

// Model Inference
Fine-tuned MobileNetV2
Each model is trained on 80% of its dataset using the DataKit pipeline. Click any test image (held-out 20%) to run inference directly in your browser — no server.

Oxford-IIIT-Pet — Breed Classifier 37 classes · ~90% val accuracy

UTKFace — Age Estimator regression · MAE ≈ 6 yrs

ESC-50 — Sound Classifier 50 classes · audio → mel spectrogram → MobileNetV2

BallroomData — Genre Classifier 10 genres · audio → mel spectrogram → MobileNetV2

// Library
Structure & Design

Dataset Hierarchy

Dataset (ABC)
├── LabeledDataset (ABC)
│   ├── ImageDataset
│   └── AudioDataset
└── UnlabeledDataset (ABC)
    ├── UnlabeledImageDataset
    └── UnlabeledAudioDataset
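
As a rough illustration of how a hierarchy like this can be laid out with Python's abc module, here is a minimal sketch. It is not DataKit's actual source: the abstract methods (__len__, __getitem__), the constructor arguments, and the ToyImageDataset leaf are all assumptions made for the example.

```python
from abc import ABC, abstractmethod


class Dataset(ABC):
    """Abstract root: subclasses must say how to count and fetch items."""

    def __init__(self, root):
        self._root = root  # private; exposed read-only via the property below

    @property
    def root(self):
        return self._root

    @abstractmethod
    def __len__(self):
        ...

    @abstractmethod
    def __getitem__(self, index):
        ...


class LabeledDataset(Dataset):
    """Still abstract: stores labels but leaves item loading to subclasses."""

    def __init__(self, root, labels):
        super().__init__(root)
        self._labels = list(labels)

    @property
    def labels(self):
        return tuple(self._labels)  # hand out a read-only copy

    def __len__(self):
        return len(self._labels)


class ToyImageDataset(LabeledDataset):
    """Hypothetical concrete leaf; a real one would decode image files."""

    def __getitem__(self, index):
        return f"{self.root}/img_{index}.png", self._labels[index]


ds = ToyImageDataset("data", ["cat", "dog"])
print(len(ds), ds[1])  # 2 ('data/img_1.png', 'dog')

try:
    LabeledDataset("data", ["cat"])  # __getitem__ is still abstract here
except TypeError as err:
    print("refused:", err)
```

Because LabeledDataset leaves one abstract method unimplemented, Python refuses to instantiate it, so only the concrete leaves of the tree can be created.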

Preprocessing Pipeline

Transform (ABC)
├── CenterCrop       (image)
├── RandomCrop       (image)
├── RandomFlip       (image)
├── Padding          (image)
├── MelSpectrogram   (audio)
├── AudioRandomCrop  (audio)
├── Resample         (audio)
├── PitchShift       (audio)
└── Pipeline         (any → any)
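
The tree above suggests that every transform is a callable object and that Pipeline is itself a Transform, so pipelines can nest. The following is a small sketch of that idea, not the library's code: images are modeled as nested lists instead of arrays, and the constructor parameters (height, width, p, rng) are assumptions.

```python
import random
from abc import ABC, abstractmethod


class Transform(ABC):
    """A transform is any object that can be called on a sample."""

    @abstractmethod
    def __call__(self, x):
        ...


class CenterCrop(Transform):
    def __init__(self, height, width):
        self._height = height
        self._width = width

    def __call__(self, img):
        # Offsets that center the crop window; clamp at 0 for small inputs.
        top = max((len(img) - self._height) // 2, 0)
        left = max((len(img[0]) - self._width) // 2, 0)
        rows = img[top:top + self._height]
        return [row[left:left + self._width] for row in rows]


class RandomFlip(Transform):
    def __init__(self, p=0.5, rng=None):
        self._p = p
        self._rng = rng or random.Random()

    def __call__(self, img):
        # Flip horizontally with probability p.
        if self._rng.random() < self._p:
            return [row[::-1] for row in img]
        return img


class Pipeline(Transform):
    """Applies its steps in order; being a Transform, it can be nested."""

    def __init__(self, *steps):
        self._steps = steps

    def __call__(self, x):
        for step in self._steps:
            x = step(x)
        return x


img = [[r * 4 + c for c in range(4)] for r in range(4)]  # 4x4 toy "image"
pipe = Pipeline(CenterCrop(2, 2), RandomFlip(p=1.0))
print(pipe(img))  # [[6, 5], [10, 9]]
```

Storing configuration in the constructor and doing the work in __call__ keeps each step reusable: the same pipe object can be applied to every sample in a batch.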

Quick Start

from src.image_dataset import ImageDataset
from src.batch_loader  import BatchLoader
from src.preprocessing import Pipeline, CenterCrop

ds            = ImageDataset("data/", lazy=True)
train, test   = ds.split(0.8)                      # 80% train, 20% held out
loader        = BatchLoader(train, batch_size=32)
pipe          = Pipeline(CenterCrop(256, 256))

for batch in loader:
    # each batch yields (image, label) pairs; only the image is transformed
    out = [pipe(img) for img, _label in batch]
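
To make the split() and BatchLoader calls in the Quick Start more concrete, here is one way they could work internally. This is a sketch under assumptions, not the library's implementation: ToyDataset, its indices argument, and the seed parameter are hypothetical, and a final short batch is kept instead of dropped.

```python
import random


class ToyDataset:
    """Hypothetical stand-in: shared items plus a list of visible indices."""

    def __init__(self, items, indices=None):
        self._items = items  # shared between splits, never copied or reordered
        self._indices = list(range(len(items))) if indices is None else indices

    def __len__(self):
        return len(self._indices)

    def __getitem__(self, i):
        return self._items[self._indices[i]]

    def split(self, ratio, seed=0):
        order = self._indices[:]
        random.Random(seed).shuffle(order)  # shuffle the indices only
        cut = int(len(order) * ratio)
        first = ToyDataset(self._items, order[:cut])
        second = ToyDataset(self._items, order[cut:])
        return first, second


class BatchLoader:
    def __init__(self, dataset, batch_size):
        self._dataset = dataset
        self._batch_size = batch_size

    def __iter__(self):
        # Generator: each batch is built only when the loop asks for it.
        for start in range(0, len(self._dataset), self._batch_size):
            stop = min(start + self._batch_size, len(self._dataset))
            yield [self._dataset[i] for i in range(start, stop)]


ds = ToyDataset([(f"img_{i}", i % 2) for i in range(10)])
train, test = ds.split(0.8)
sizes = [len(batch) for batch in BatchLoader(train, batch_size=3)]
print(len(train), len(test), sizes)  # 8 2 [3, 3, 2]
```

Both splits point at the same underlying item list, so splitting costs one shuffled index list and no data copies, and the generator keeps at most one batch in memory at a time.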

Key Design Choices

• ABCs prevent instantiating classes that leave abstract methods unimplemented
• All attributes private, exposed via @property
• split() shuffles indices, not data
• BatchLoader uses a generator (yield)
• Callable classes for all transforms
• CSV labels auto-cast: int → float → str
• Pillow for RGB-safe image loading
• librosa.load(sr=None) keeps the file's native sample rate → (y, sr) tuple
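
The "int → float → str" label casting listed above can be sketched with the standard csv module. The helper name cast_label and the two-column (filename, label) layout are assumptions for illustration; the library's real CSV format is not shown here.

```python
import csv
import io


def cast_label(raw):
    """Try int, then float, and fall back to the original string."""
    for convert in (int, float):
        try:
            return convert(raw)
        except ValueError:
            continue
    return raw


# Hypothetical labels file with one (filename, label) row per sample.
csv_text = "a.png,3\nb.png,2.5\nc.png,beagle\n"
rows = csv.reader(io.StringIO(csv_text))
labels = {name: cast_label(value) for name, value in rows}
print(labels)  # {'a.png': 3, 'b.png': 2.5, 'c.png': 'beagle'}
```

Trying the narrowest type first means class indices stay integers, regression targets such as ages become floats, and breed or genre names pass through unchanged. One caveat of this simple approach: strings like "nan" or "inf" are accepted by float().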