About HeyOnCall

We’ve built a reliable, simple on‑call and monitoring solution for small teams and solo founders, so you can rest easy knowing that your app is up.

Who we are:

Humberto Evans

I'm a lifelong entrepreneur who gets excited about early-stage startups where the goal is to build useful things. Give me a good team and a big idea – that's where I want to work. I have a deep appreciation for the outdoors that I indulge with a fair bit of trail running in the mountains.

Mike Robbins

I'm an avid DIYer, electronics hobbyist, and open source contributor. I've been programming since age 7, and built my first database-backed public website at age 12: a fairly popular directory of Java applet games! I relax with biking, hiking, and long soaks in the hot tub.

We’re two engineers who met at MIT in the Department of Electrical Engineering and Computer Science. We took a previous company through Y Combinator. We each took a few years off and worked at other startups in the Bay Area. With the lessons learned from operating production software systems, we built HeyOnCall and launched it in 2022.

What we’ve built:

HeyOnCall combines website/API monitoring, heartbeat/cron monitoring, SSL checks, incident management, on‑call schedules, escalations, alert delivery, and mobile app alerting into one integrated product.

Who it’s for:

Small engineering teams and solo founders who want reliable paging, simple setup, and fewer moving parts.

Why we built it:

Since the late 1990s, we’ve been responsible for running busy user-facing web apps, from a server in the basement, to a VPS, to a colocated server, to a PaaS, to a Kubernetes cluster.

Like it or not, with any of these hosting platforms, there’s always downtime. Misconfigurations happen. Networks fail. Disks fill. Backups break. Certificates expire. And so on.

We couldn’t find a tool that gave us confidence that our app was up and our cron jobs were running, and that paged us reliably when they weren’t. At previous companies, we ended up building in-house tools or stitching together several other tools to handle these needs because the right solution didn’t exist. So we built it: HeyOnCall.

Our hard-learned, opinionated design choices:

Don’t wake me up at 3am for a transient blip that will fix itself.
- Configure reasonable timeouts (≥ 5 minutes) to reduce non-actionable alerts.
Don’t wake me up at 3am for a problem that can wait until the morning.
- Configure business hours schedules to defer any alerts that can wait.
Don’t add fragile moving parts between monitoring and alerting.
- Integrate monitoring and on-call alerting into one well-tested product to increase reliability.
Don’t burn out your team with on-call responsibilities.
- Share the responsibility with an easy-to-edit on-call schedule.
- Make it easy to assign responsibility for different services to different on-call schedules.
- Defer all non-critical alerts until regular business hours.
- Reduce false positives by configuring reasonable timeouts.
When something really is broken, don’t let alerts get missed.
- Critical alerts (optional, special push notifications) break through the phone’s Do Not Disturb and volume/mute settings.
- Alerts repeat until acknowledged.
- Configure escalations to ensure other teammates are alerted if the on-call person misses the alerts.
Don’t rely on your hosting provider’s monitoring (if any).
- Monitoring from inside the same network misses many layers of the stack.
- External monitoring aligns with what your users/customers experience.
Periodic heartbeats are the only reliable way to monitor distributed systems.
- All web systems are distributed systems.
- Heartbeats can be initiated from your code (e.g. a cron job check-in) or from ours (polling, e.g. an HTTP probe).
Your custom internal monitoring scripts need external meta-monitoring.
- You may have a custom internal cron job that checks your data pipeline once a minute, but that cron job should itself be monitored externally.
Make monitors flexible enough to give confidence that everything is running.
- For example: assertions for HTTP status codes, custom headers, redirects, etc. to make sure your app is behaving as expected.
Catch ticking time-bombs before they explode.
- For example: catch SSL certificates with failed automatic renewal before they actually expire.
Don’t set pricing in $/monitor or $/user.
- You shouldn’t have to ask finance for budget just to add a new monitor or teammate.
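
To make the timeout and heartbeat ideas in the list above concrete, here is a minimal Python sketch of how a heartbeat check with a grace period might work. The `HeartbeatMonitor` class, its field names, and the 5-minute default are hypothetical, chosen only for illustration; this is not HeyOnCall's actual implementation or API.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass
class HeartbeatMonitor:
    """Hypothetical heartbeat monitor; names and defaults are illustrative only."""

    name: str
    # Assumed grace period: stay quiet unless a heartbeat is this late.
    timeout: timedelta = timedelta(minutes=5)
    last_check_in: Optional[datetime] = None

    def check_in(self, at: datetime) -> None:
        """Record a heartbeat, e.g. sent by a cron job after it finishes."""
        self.last_check_in = at

    def is_overdue(self, now: datetime) -> bool:
        """Return True once no heartbeat has arrived within the timeout."""
        if self.last_check_in is None:
            # Assumption: a monitor that has never checked in counts as overdue.
            return True
        return now - self.last_check_in > self.timeout


start = datetime(2024, 1, 1, 3, 0, tzinfo=timezone.utc)
monitor = HeartbeatMonitor(name="nightly-backup")
monitor.check_in(start)

# Two minutes of silence is within the grace period: no alert.
print(monitor.is_overdue(start + timedelta(minutes=2)))  # False
# Six minutes of silence exceeds the timeout: time to alert.
print(monitor.is_overdue(start + timedelta(minutes=6)))  # True
```

The point of the grace period is that a short gap never triggers a page, while a heartbeat that stays missing past the timeout does. A real system would evaluate `is_overdue` from outside the monitored infrastructure, so that a failure of the host also stops the heartbeats from arriving.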

Questions or ideas? We’d love to hear your feedback. Your input helps us make HeyOnCall better every week.