Cron Job Monitoring

Catch missed, stuck, or failed cron jobs and scheduled tasks with simple HTTP check‑ins and reliable on‑call alerting.

Cron Job Monitoring:

Cron jobs and scheduled tasks are vital to most applications for powering data pipelines, backups, emails, reporting, and more. When they break, they’re much less visible than a website outage, so cron jobs often fail silently for days or weeks before anyone notices. Cron job monitoring paired with on-call alerting means you’ll know about inevitable issues before they impact users.

Simple Check-Ins:

Your scheduled jobs “check in” on start and/or successful completion by sending a simple HTTP POST request to our API. If the next check‑in is late, we create an incident, page the right people via loud critical alerts to our iOS and Android apps, post the incident to your Slack channel, and keep alerting and escalating until someone on your team acknowledges. Or, if it can wait, we can defer the alerts until your preferred business hours, to avoid waking you up at 3am.

TLDR:

Built for small engineering teams and solo founders, HeyOnCall pairs cron job monitoring with on‑call alerting in one product for superior reliability and simplicity. Get impossible-to-ignore critical alerts that bypass Do Not Disturb, plus Slack, email, webhooks, and more. Defer important-but-not-urgent alerts until the morning so you don’t burn out your team with 3am wakeups. HeyOnCall is designed from the ground up to reduce false positive alerts (getting alerted when your cron job is actually fine) and false negatives (missing an outage due to broken monitoring or alerting).

Screenshots

Screenshot: a cron job check-in shell script sending a heartbeat to HeyOnCall from a terminal.

Screenshot: an inbound liveness (heartbeat) check in HeyOnCall for monitoring a database backup cron job, showing the timeout, status, and check-in timestamp.

Screenshot: Critical Alerts in the HeyOnCall mobile app.

Cron Job Monitoring: HeyOnCall vs. Alternatives

Differentiator	HeyOnCall	Alternatives: Roll Your Own	Alternatives: Enterprise
Monitoring	Simple, flexible HTTP POST check-ins on start and/or successful completion.	Not typically a roll-your-own solution: it requires hosting a separate server (on different infrastructure) to receive the check-ins.	Varies, but similar.
Business hours schedules	Defer important-but-not-urgent alerts (like a backup script failing) until your preferred weekdays and hours.	No differentiation between urgent and non-urgent alerts. Burns out your on-call team waking them up for things that can wait until the morning.	High complexity: requires multiple schedules, overrides, and routing rules.
Alerting	iOS/Android “Critical Alerts” bypass Do Not Disturb and volume/mute settings. Repeat until acknowledged. Will wake you up!	No mobile app. No critical alerts. Emails or Slack alerts are easy to miss for hours.	Varies, but similar if configured correctly.
On‑call	Integrated on-call rotation schedules and escalations.	No on-call schedules or escalations. You probably end up with a noisy Slack channel that everybody ignores.	Separate products for monitoring and on-call: requires integration glue between them, so there are more moving parts to fail during an outage.
Silencing	Quick silence with selectable timeout at the trigger, service, or organization-wide level. Ensures you don’t keep getting alerts while you’re fixing the issue, and ensures you don’t inadvertently stay silenced forever.	Nope. Keep getting bombarded with alerts while fixing the issue.	Mute one monitor at a time. Forget to unmute after the incident is over, so you miss the next incident.
False positives (noisy alerts)	Customer-level: False positives are reduced to your preferences via customizable consecutive-failure thresholds / timeouts. Platform-level: False positives are reduced through continuous control group self-checks, extensively tested codebase, and code paths designed to differentiate our network incidents from yours.	Tracking state history adds complexity and room for error. Can’t easily differentiate between failures of your monitoring system and failures of your application.	Filter rules and thresholds are split between separate monitoring and on-call products.
False negatives (missed alerts / blind spots)	Customer-level: Missed alerts are reduced via: alerts repeat until acknowledged; configurable multiple delivery channels per user; configurable multi-level escalations. Platform-level: False negatives are reduced via extensive CI test suite, continuous production self‑checks, and external monitoring.	Tends to be weakly tested, running on separate infrastructure (hopefully), ignored for months, and silently failure-prone.	Customer-specific glue (webhook integration) between separate monitoring product and on-call product fails silently (webhooks/auth headers/network issues), resulting in missed alerts right when you need them.
Pricing	Simple, flat $/month pricing. Free tier forever.	In-house engineering time ($$$) to build and maintain.	Annoying $$/user/month or $$/monitor/month with big enterprise sales teams and long-term contracts.

Cron Job Monitoring FAQ

Why do cron jobs need monitoring?

Anyone who has operated a production application for long enough will tell you that cron schedulers break, and that jobs fail with exceptions, run out of memory, break when software dependencies change, or otherwise fail to finish successfully.

Scheduled jobs for data pipelines or backups often rely on external APIs for their input or output, making them vulnerable to network issues, authentication issues, upstream outages, and external API changes.

These jobs can run fine for weeks, months, or years, then suddenly fail in ways that often go unnoticed for a while.

Monitoring confirms that your cron jobs are succeeding, and makes sure you’re the first to know when they aren’t.

But I’m not using cron.

We’re using “cron jobs” generically to mean any scheduled job: any piece of code that runs at a regular interval. This includes:

crond: crontab
Kubernetes: CronJob
Ruby on Rails: Sidekiq, Resque, Delayed Job, Active Job
Python / Django: Celery, RQ
PHP: Laravel Task Scheduler
Node.js: node-schedule, node-cron, Agenda, Bree, Bull
Java: Spring @Scheduled
Go: gocron

If you have a piece of code that runs every minute/hour/day/etc., then this page applies to you.

What can go wrong that monitoring catches?

Job never starts (scheduler down, cron misconfigured, worker down)
Job finishes without success (exceptions, external failures)
Job starts but never finishes (deadlocks, hangs, infinite loops)
Environment drift (missing or changed environment variables, secrets, or permissions)
Breaking changes when upgrading software dependencies
Upstream issues with external or internal API providers

Who needs cron job monitoring?

Imagine the intern accidentally drops the production database, and then you discover that your backup script has been broken and failing for the past year.
Imagine your data ingest cron job doesn’t run for a week, and then you get an email from a customer asking you why your app isn’t updating.
Imagine your marketing emails aren’t being sent, and the head of marketing asks you why conversion rates are way down.

Would these issues affect your revenue or reputation? If so, you probably need monitoring.

This includes just about any team where scheduled jobs power your product’s data pipelines, emails, backups, etc.

Who doesn’t need cron job monitoring?

If you don’t have any scheduled jobs, you don’t need any scheduled job monitoring.

If you’re okay with your cron jobs failing silently for weeks, months, or years, you probably don’t need monitoring.

How does HeyOnCall monitor cron jobs?

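Add an HTTP POST check‑in “heartbeat” from your script or worker to our API, sent on start and/or successful completion. If the next check-in doesn’t arrive within the timeout you set, we create an incident and alert whoever is on call.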
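As a minimal sketch of what a check-in on successful completion might look like from a Python job: the URL below is a hypothetical placeholder, not the real HeyOnCall endpoint, and any authentication the real API requires is omitted. The function names are made up for illustration.

```python
import urllib.request

# Hypothetical placeholder URL: use the real check-in URL for your monitor.
CHECKIN_URL = "https://example.invalid/checkins/YOUR-MONITOR-ID"


def build_checkin_request(url: str) -> urllib.request.Request:
    """Build an HTTP POST request with an empty body for a check-in."""
    return urllib.request.Request(url, data=b"", method="POST")


def send_checkin(url: str = CHECKIN_URL, timeout_seconds: float = 10.0) -> None:
    """Send the check-in. Raises on network errors or HTTP error responses."""
    with urllib.request.urlopen(build_checkin_request(url), timeout=timeout_seconds):
        pass


def run_with_check_in(job, check_in=send_checkin):
    """Run the job, then check in only if it finished without raising."""
    result = job()  # an exception here propagates, so no check-in is sent
    check_in()
    return result
```

Because the check-in happens only after the job returns, a crash, hang, or job that never starts all look the same from the outside: the expected check-in never arrives.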

When do I get alerted?

You choose how long we wait without a check-in before we alert. Your timeout should leave room for retries, plus a grace window for runtime variation.

You probably don’t want to get woken up by a loud alert at 3am if your once-an-hour data ingest cron job is running slow and is 5 minutes late to check in, but then finishes successfully a few minutes later. (It’ll already be resolved by the time you open your laptop.)

On the other hand, if your cron job is production-critical and has been down for a longer period (several chances to retry), you might want to get woken up and start debugging.

There are real tradeoffs in setting your timeout threshold, but we typically recommend setting a timeout that is long enough that the issue is unlikely to fix itself.
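One way to turn that advice into a number is a simple rule of thumb: the job's interval, plus time for each retry, plus a grace window. This is an illustrative assumption, not a formula HeyOnCall prescribes, and the function and parameter names are hypothetical.

```python
from datetime import timedelta


def suggest_timeout(
    interval: timedelta,
    typical_runtime: timedelta,
    retries: int,
    retry_delay: timedelta,
    grace: timedelta,
) -> timedelta:
    """Rule-of-thumb timeout: interval + room for retries + grace window."""
    # Each retry costs the wait before retrying plus one more run of the job.
    retry_allowance = retries * (retry_delay + typical_runtime)
    return interval + retry_allowance + grace


# Hourly job, ~10 minute runtime, 2 retries spaced 5 minutes apart,
# and a 10 minute grace window for runtime variation.
timeout = suggest_timeout(
    interval=timedelta(minutes=60),
    typical_runtime=timedelta(minutes=10),
    retries=2,
    retry_delay=timedelta(minutes=5),
    grace=timedelta(minutes=10),
)
print(timeout)  # 1:40:00
```

With these example numbers, the job gets 100 minutes between check-ins before anyone is alerted, which leaves time for two retries to fix a transient failure on their own.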

But I don’t want to get woken up at 3am!

Same here! :) Many cron job issues are important to fix, but can realistically wait until the morning, or until Monday morning. We call these “important-but-not-urgent”.

That’s why we built Business Hours Schedules, which you can enable on a per-monitor basis, to defer any alerts until your preferred business hours. So you can fix the backup script at 10am, not 3am.
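To illustrate the idea of deferral, here is a rough sketch of how an alert time could be pushed to the next business-hours window. It assumes a 9am to 5pm, Monday to Friday schedule and naive local datetimes. The function name and defaults are hypothetical and say nothing about how HeyOnCall implements this internally.

```python
from datetime import datetime, time, timedelta

MONDAY_TO_FRIDAY = frozenset({0, 1, 2, 3, 4})  # datetime.weekday() values


def next_alert_time(
    now: datetime,
    start: time = time(9, 0),
    end: time = time(17, 0),
    weekdays: frozenset = MONDAY_TO_FRIDAY,
) -> datetime:
    """Return `now` if inside business hours, else the next window's start."""
    if not weekdays:
        raise ValueError("at least one business weekday is required")
    if now.weekday() in weekdays and start <= now.time() < end:
        return now  # already inside business hours: alert immediately
    candidate = datetime.combine(now.date(), start)
    if candidate <= now:
        candidate += timedelta(days=1)  # today's window has already started or passed
    while candidate.weekday() not in weekdays:
        candidate += timedelta(days=1)  # skip non-business days
    return candidate


# A failure at 3am on Saturday, 6 January 2024 is deferred to Monday 9am.
print(next_alert_time(datetime(2024, 1, 6, 3, 0)))  # 2024-01-08 09:00:00
```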

Who gets alerted?

Alerts route to the on‑call person in your on-call rotation schedule.

This works for solo founders too: until you add teammates, the default on-call schedule is just you, 24/7!

How do I get alerted?

Alerts are sent via the free HeyOnCall mobile apps for iOS and Android.

We have special permissions to deliver “Critical Alerts” on both iOS and Android. Critical alerts bypass the phone’s Do Not Disturb and silent/vibrate modes. They are LOUD and impossible to ignore.

Alerts repeat at a configurable interval until you acknowledge the incident.

That sounds incredibly annoying! How do I turn it off?

Acknowledging the incident stops the alerts.

You can also configure non-critical alerts, which will be delivered as normal push notifications. These respect your phone’s normal Do Not Disturb and silent/vibrate modes.

We also have a vibrate-only mode, which will still buzz in your pocket, but won’t make any sound.

During an incident, you can also silence all alerts at multiple levels up to organization-wide, so you don’t keep getting paged while you’re in the middle of trying to debug things.

What if I don’t notice the alert?

We have escalation rules built in. You can configure HeyOnCall to, for example, page the next person after 15 minutes, then page the team’s manager after 30 minutes, then page the CTO after 60 minutes. Any one of those people acknowledging the incident will stop the escalation.
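The example policy above can be modeled as a list of steps. This is only a sketch: it assumes each delay is measured from the start of the incident and that the primary on-call person is paged at minute 0. The class and function names are hypothetical, not part of any HeyOnCall API.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class EscalationStep:
    after_minutes: int  # minutes since the incident started (assumption)
    target: str


EXAMPLE_POLICY = (
    EscalationStep(0, "primary on-call"),
    EscalationStep(15, "next person"),
    EscalationStep(30, "team manager"),
    EscalationStep(60, "CTO"),
)


def paged_so_far(
    policy: Sequence[EscalationStep],
    minutes_elapsed: int,
    acknowledged_at: Optional[int] = None,
) -> list:
    """List everyone paged so far; acknowledgment stops later steps."""
    return [
        step.target
        for step in policy
        if step.after_minutes <= minutes_elapsed
        and (acknowledged_at is None or step.after_minutes < acknowledged_at)
    ]


print(paged_so_far(EXAMPLE_POLICY, 20))
# ['primary on-call', 'next person']
print(paged_so_far(EXAMPLE_POLICY, 90, acknowledged_at=20))
# ['primary on-call', 'next person']
```

In the second call, the incident was acknowledged at minute 20, so the manager and the CTO are never paged even though 90 minutes have passed.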

Can I get alerts in Slack?

Yes. Connect Slack and our bot will post messages into your desired Slack channel for team visibility and collaboration. You can use this Slack channel for team awareness, while you rely on the iOS/Android critical mobile alerts for wake-you‑up-at-3am paging.

What makes HeyOnCall better than alternatives?

HeyOnCall helps you avoid burning out your engineering team by sharing the responsibility with an on-call rotation schedule, while deferring any important-but-not-urgent alerts until your regular business hours.

Alternative solutions often require you to stitch together separate monitoring and on-call alerting tools using brittle webhooks between them. For example: would you really trust your monitoring cron job to be able to send a POST request to your third-party alerting service exactly when you’re in the middle of a flaky network outage? Not exactly a recipe for reliability.

HeyOnCall integrates cron job monitoring, website monitoring, on‑call schedules, and alert delivery in one product, which removes fragile links and makes the overall system more reliable.

Critical alerts. On-call schedules. Business hours schedules. Escalations. Monitoring for websites, APIs, cron jobs, and SSL certificates. All built-in and battle-tested. Designed for developers, by developers. Simple, flat pricing. Free tier forever.