July 29 2023 • Episode 015
Jason van der Merwe: Strava - How To Use Negative Experiments To Create More Customer Value
“Experimentation is the tool that we use to measure changes in our product for users, user journeys and user experience. Experimentation isn’t an end game or goal. We rely heavily on experimentation to understand how the changes we’re making on the product impact outcome metrics like retention and subscriptions”.
Jason van der Merwe is the Director of Engineering, Growth at Strava. Strava is the popular exercise tracking and social networking service, with more than 110 million users and 8 billion activities uploaded.
With a background in iOS development, he manages multiple engineering teams at Strava. Jason is responsible for the Growth, Insights and Optimisation teams, ensuring operational efficiency and cross-functional collaboration across Strava teams.
Undertaking studies in Computer Science Engineering at Stanford University, Jason majored in Artificial Intelligence and Machine Learning, with sub-major studies in Mathematics and Computational Sciences.
When not working, he’s cycling, running, cooking, or building something. Jason is currently building furniture for his house and learning various woodworking techniques.
Episode 015 - Jason van der Merwe - Strava - How To Use Negative Experiments To Create More Customer Value
Gavin Bryant 00:03
Hello and welcome to the Experimentation Masters Podcast. Today I would like to welcome Jason van der Merwe to the show. Jason is the Director of Engineering, Growth at Strava, the popular exercise tracking and social networking service with more than 110 million users and 8 billion activities uploaded. He graduated in Computer Science Engineering from Stanford University, majoring in Artificial Intelligence and Machine Learning with studies in Mathematics and Computational Sciences. Welcome to the show, Jason.
Jason Van Der Merwe 00:44
Thanks, Gavin. It's good to be here.
Gavin Bryant 00:46
Jason, let's get started by giving our audience a little bit of your background and experience, please.
Jason Van Der Merwe 00:45
Yeah, so my background is mainly in mobile engineering, specifically iOS. I started learning how to code to really unblock myself, to be able to create apps and products, try things out and ship them. So that's what I did for a while, and I ended up at Strava over eight years ago. That's when I was placed onto the Growth team; we formed the growth team about a month after I arrived, so I was a fresh recruit, just put on the team. And that's where I started to learn about the world of growth. In the early days I was coding and building things for quite a few years, until I transitioned to the leadership function, and now I'm responsible for multiple engineering teams at Strava.
Gavin Bryant 01:48
Excellent. So thinking back to your early days in mobile, is that where you first got your interest in experimentation, trying and testing a lot of different things?
Jason Van Der Merwe 02:01
Yeah, definitely. I remember, even in the early days at school, I would make a version of an app, or two versions of something, and then I would just give my phone to people and see what happened. I remember experimenting with heat map SDKs, where you could see where people were tapping on the screen and what was useful. And there were definitely classes at Stanford that were always promoting going and testing with users, trying different things: build something, build an MVP as fast as possible, get it into the hands of users, learn, iterate, and so on. So I guess it was always something that was instilled in me, this idea that the outcome is not the first iteration of the code or the software you're delivering. You're on a journey with your user; there's always some better version or something better out there that you're trying to uncover, and it's your goal to figure out what that is. And I think that's what I love about experimentation: there's no necessary end or best state, there's always something better, and you're trying to understand your user better, trying to develop empathy. Experimentation is a tool to get you there.
Gavin Bryant 03:24
One of the things that I wanted to quickly ask you about was your remit at Strava as Director of Engineering, Growth. At a lot of companies I've seen, those two functions are typically split out. Have you found that co-locating those teams drives greater efficiencies and more effective outcomes for the customer?
Jason Van Der Merwe 03:46
Yeah, that's a great question. It's actually a question I get a lot, which is very eye-opening for me, because I've been doing it for so long at Strava that it feels like home. For us, the way our product org is organized is into two main organizations: there's core product, and then there's growth product. The org that I'm responsible for the engineering in is growth product engineering, which includes product management, design, engineering, product marketing, growth marketing and analytics. We are organized around cross-functional teams that are highly collaborative, with many functions organized around different team remits. A lot of our teams are built around lifecycle moments, so we have a new user acquisition team, a new user activation team, a subscriber acquisition team, and those teams have folks from every function to allow them to tackle the business problem: how do you get more new users? How do you retain those new users? It allows them to tackle those problems very holistically, versus growth being purely a marketing function or purely a product function.
Gavin Bryant 05:04
So effectively like a Tiger team or a SWAT team where those teams are anchored around the customer journeys and they are able to work and act autonomously to solve all of those customer problems at hand?
Jason Van Der Merwe 05:21
Yeah, exactly. And you know, it's not a unique thing to Strava; I've definitely taken inspiration from other great organizations that do growth at a larger scale. Growth product teams are pretty much organized horizontally. If you think about it, they don't own just a surface in the product like you might have for a typical vertical product team, right? Take one of the teams I mentioned before, like new user activation. That's very much a horizontal concern: what does the user do in their first seven to 28 days, and where can we apply some work to reduce friction and improve their experience? So a lot of the growth teams are very horizontal by nature, working across many different spaces, and you want that remit to be pretty wide to allow them to go and solve problems. I think that's one of the coolest parts about working in growth as an engineer: I've worked on almost every piece of code across the app, because we've worked on the recording experience, the activity view experience, the signup flow, or increasing the performance of certain APIs so that when you click the sign up button, you get into the app as soon as possible. You get to work on all kinds of problems, because there are no specific constraints or walls around what you're supposed to care about. You're supposed to care about a user and business metric, and there are a lot of potential solutions to get to experiences that move that metric.
Gavin Bryant 07:02
Yeah, the remit is quite broad for growth, isn't it, working through that whole user lifecycle from acquisition right through to winback, I guess.
Jason Van Der Merwe 07:11
Yeah, you know, we have a very tight partnership with our growth marketing partners. And what's pretty cool is that a user doesn't see the difference between an email or push notification or an in-app message versus the product; everything's part of the product. There are a lot of tools, either bought or created internally, and I've written about them, that give more power to our growth marketing partners to do things without engineering. But at the end of the day, there's always a crossover between the channels they're working on and the product itself. To a user, they don't see the difference between a coachmark that pops up somewhere in the app and a push notification or an email, right? It's all just the product experience. So we work very, very collaboratively on that kind of stuff and don't see it as a marketing concern or a product concern. It's just a growth concern.
Gavin Bryant 08:13
One of the things that I wanted to ask you about is the Strava experimentation journey. You’ve been working around growth and experimentation for some time now. What's that journey look like over time?
Jason Van Der Merwe 08:26
Yeah, do you want to talk about exactly like how we go from an idea to, you know, an experiment that ships?
Gavin Bryant 08:34
Yeah, let's zoom out a little bit. From what I can see on the web, Strava has been at experimentation for a few years now. So thinking back to those earlier days, how have things evolved over time?
Jason Van Der Merwe 08:50
Yeah. So, it's a great question. When we started, all we knew was that we should run some experiments. I said this in a podcast I did a few months ago with Sub Club, the RevenueCat podcast: when you start out, you kind of just need to learn how to run an experiment, and what that means. And unfortunately, it's not as simple as we would like. There are a lot of parts to it: setting up the experiment, getting your cohorting, the ability to understand what's happening in each cohort, and then actually evaluating the success of your experiment. And then what happens to most folks, right, you run your experiment and, okay, we got users to do this action more. And then you start questioning, hey, is that action actually useful? So in the early days, we were doing simple experiments around the new user experience. Things like getting users to add a profile photo, or to set up more of their profile information. As we were doing experiments, we were getting a better sense of what a growth team does and what it is that we care about. We had a lot of advice from growth experts at companies like Facebook, and one of the big unlocks for us was understanding what this activation moment meant. The idea of an activation moment is: how do you get a user to do a certain action or get to a certain milestone in their experience, such that they stick around after being a new user? A lot of people call it the aha moment: what's that moment in your product where the user goes, oh, I get it, I'm going to stick around? So, about six months in, we identified our activation moment as one upload and one follow in the first seven days. We did that a couple of ways. We did that through user research, just looking at what users are trying to do, what they're trying to accomplish in the app; most of the time, we were just trying to help users do what they wanted to do.
Jason Van Der Merwe 11:00
The second part was looking at a lot of correlation analyses on what actions were most correlated with user retention at the one week, second week, second month and six month marks, and then combining that with just common sense: our product is trying to get people to be active, and people want to come and be active on Strava. For us, the aha moment is the first time you upload and see that upload and the stats, and then the social network part of it is a great reinforcing motivation machine. Once we defined the activation moment, we did hundreds of tests on it. We really got into a flow of trying to reduce friction for users to get to those two steps, and we could divide them into two separate streams in a way, because there was one upload and one follow.
Jason Van Der Merwe 11:58
And so we were able to go after those two verticals for a while, and that was the roadmap for a long time. Through that, we just started getting better and better at experimentation, faster at going from idea and hypothesis to okay, what are we actually going to change? How do we design this thing? How do we run this thing? How do we analyze this thing? Over time, you just get better when you do it a lot. I remember one summer, we always did brainstorms as a team, which I think is really fun and important: you take the problem you're trying to solve and you generate a ton of ideas around it. We had a spreadsheet of like 50 things we should do, divided into things that needed design and things that didn't, and I'd just crank through those. I was like, how many of these tests can I run? And I remember running into the problem of not having enough space for more experiments. You need an audience, a large enough volume of users, to run simultaneous experiments, and I was running out of users to run experiments on. It was really fun, though. You're waking up every Monday waiting to see if an experiment has finished yet, and then on to the next one. To this day we still run experiments for about two weeks, because Strava is so cyclical in how our users use the product, right? Most people exercise on the weekends. So we give every user a full seven days, meaning we evaluate their actions over a seven-day period, because if you signed up on a Monday, there's a very good chance that you're only going to record exercise activity the next Sunday.
Jason Van Der Merwe 13:46
So we give everyone seven days, and we want to see seven days of user cohorts. As a result, you need two weeks for an experiment. So we're always waking up on Monday and being like, okay, are we done? What did we learn? What happened? Are we ready for the next one?
Jason Van Der Merwe 14:05
So we did that for a long time. And as you go, you develop a better way of doing the process of ideation: okay, we have a test brief, how do we make sure we're documenting our ideas? So we came up with a test brief that every experiment had to go through. We were experimenting on the process of experimentation as we went, to try to make it more efficient. Then, how do we communicate out the learnings of our experiments? We went through various formats. Was it an email at the end? Was it a Google slide deck? Eventually, we ended up on this Google Slides format, where every experiment has one slide at the end.
Jason Van Der Merwe 14:45
So, experiment results: for about six years, we just put a slide into these slide decks, and eventually had to make multiple slide decks for each year because there were so many. But it's awesome because I can just go through and be like, oh, we ran an experiment on this four years ago, here's exactly the one slide, and it has the design, the result, everything on it. We recently changed that to be in Confluence, which is a lot better architected: better written up, more information, easier to search. That was a process change the team really wanted to do, and I was resistant to it because I was so used to doing it one way. So you learn different things. And I think one of the coolest things we built along the way was our internal experimentation reporting framework, which is basically our way of evaluating our experiments. One of our data scientists found himself writing the same SQL query over and over to evaluate an experiment, and eventually he was like, let's just automate this. So we built this thing we call One Ring, because he was, and still is, a big Lord of the Rings fan: one ring to rule them all. He wanted one SQL query to rule them all. Now, for any experiment you have, you just go to the web page, put in your experiment name, and it pulls up every single metric that we care about in the company, about 250 of them. So again, along the way, you start to get better and better at these things. And now we're running hundreds of experiments in a much more automated, sophisticated, fast way, but there's still a lot that we can improve, a lot we need to build and get better at.
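To make the idea concrete, here is a minimal sketch of what an automated, report-every-metric evaluation can look like. This is not Strava's actual One Ring tooling; the metric names, column names and data are illustrative assumptions.

```python
# Sketch: given per-user metric values and cohort assignments for one experiment,
# compute each metric's mean per cohort and the relative lift of variant vs control.
import pandas as pd

def experiment_report(df: pd.DataFrame, metrics: list[str]) -> pd.DataFrame:
    """df has one row per user, a 'cohort' column ('control'/'variant'),
    and one column per metric; returns per-cohort means and relative lift."""
    rows = []
    for metric in metrics:
        means = df.groupby("cohort")[metric].mean()
        control, variant = means.get("control"), means.get("variant")
        lift = (variant - control) / control if control else float("nan")
        rows.append({"metric": metric, "control": control,
                     "variant": variant, "relative_lift": lift})
    return pd.DataFrame(rows)

# Toy usage with made-up data
users = pd.DataFrame({
    "cohort":      ["control", "control", "variant", "variant"],
    "uploaded_7d": [1, 0, 1, 1],   # uploaded an activity in first 7 days
    "followed_7d": [0, 1, 1, 0],   # followed someone in first 7 days
    "retained_w2": [1, 0, 1, 1],   # still active in week two
})
print(experiment_report(users, ["uploaded_7d", "followed_7d", "retained_w2"]))
```

In practice the per-user metric table would be produced by a parameterised warehouse query rather than built inline, but the report loop stays the same regardless of how many metrics are tracked.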
Gavin Bryant 16:37
Yeah, it seems like quite a natural and organic process: when you encountered friction points in your experimentation flywheel, you addressed them through a process change or automation. It's an interesting way that it evolved. One of the things that really stuck out to me was that you mentioned early on that experimentation is hard. It's difficult, it's challenging, but it comes down to doing sets and reps, sets and reps, sets and reps, to build that experimentation muscle and become more effective and more efficient at it, which is a really good message for the audience. One of the things you've mentioned a couple of times is that while you've figured out how to develop a culture of experimentation and scale it out within Strava, you've also consulted others within industry, other high-performing growth teams. Who are some of the organizations that you have learned a lot from?
Jason Van Der Merwe 17:41
Yeah, you know, it's always so great to learn from people who have gone before you, especially larger companies, where concerns around how you actually set up a lot of people to do this can be interesting. Dropbox is a company where I know several folks, and I would consult with them and just ask them questions all the time. During interview processes, when we were interviewing engineers or product managers from other companies, I would make sure I had enough time just to ask them questions. I remember asking someone from Twitter who was interviewing for a role, hey, how is your localization team set up? When you're adding a new language, what does that look like at your company? And you always have to be very cognizant of, okay, you only have two minutes to ask them a question, you're supposed to be interviewing them.
Jason Van Der Merwe 18:30
It was always so interesting to hear from these big companies. Pinterest as well: I spent quite a bit of time with John Egan at Pinterest, who has a great growth blog. He comes from an engineering background and is now at Lyft. He really helped shape our communications platform and our strategy there, and now we actually have a whole team that just focuses on communications and personalization. So, spending time talking to him. And obviously Facebook being a big one.
Jason Van Der Merwe 19:03
So a lot of those big companies. But the difference is that Strava is smaller than all of them, by a lot. For most of my time at Strava we've been around 100 to 150 people; we're not 500. So that process of scaling from the start, at the size we are now, is a bit more unique; there aren't a lot of companies that stick around that size. Duolingo has done some great work. I can't remember exactly what size they are, I think they're a similar size to us, but they've written a lot about their growth experimentation journeys recently, so I'd encourage everyone to read what Duolingo has put out. That's one of the reasons I write about the things that we do now, because I wish I'd been able to read that kind of stuff when I was making decisions, trying to organize the team and figure out the right next step. It took me eight years to get to where I am right now in terms of knowledge around growth, and it probably could have happened a lot faster if I'd known a lot of these things upfront. But, you know, the journey was fun for me. And in a world where things are moving way faster than they were eight years ago, new companies don't need to go through an eight-year journey to figure this stuff out. It's not rocket science; it's pretty cut and dried in many regards.
Gavin Bryant 20:36
Let's talk a little bit about Strava's experimentation culture. You mentioned that in the early days experimentation was somewhat challenging, but over that journey you're now performing hundreds and hundreds of experiments. How would you broadly describe the experimentation culture at Strava?
Jason Van Der Merwe 21:01
Yeah, you know, experimentation is a tool that we use to measure changes in our product for users, right? It's not the goal, it's not the plan. It's a tool that we use to figure out how we change things. We rely heavily on experimentation to really understand the impact on our output metrics, things like retention or subscriptions, and how the changes we're making in our product impact those metrics. So many of the things we do, we wrap in an experiment, whether it's a new comms or email campaign, a new checkout page, or reducing friction. If we can test a performance change, like an API performance change, we will; it doesn't always make sense, but if we can do it, we do. We're always trying to understand how much something really impacted the user, the user journey and the user experience. At the same time, we're also trying to make sure that it's really easy to do those things, because no one wants to spend a bunch of time trying to set up an A/B test or go through the mechanics of an A/B test. It shouldn't be harder to do the A/B test than it is to do the feature or the change itself. So a lot of the experimentation culture we have, I think, comes from trying to improve the platform so that we can run tests without even thinking about it.
Jason Van Der Merwe 22:31
And for the most part, we try to A/B test everything. There are things at Strava that are very difficult to A/B test because of the network effects, right? You have a social network experience. So there are things we run into where I'm like, oh, that should be so easy to test, and then someone explains it and I'm like, oh, that's really challenging. For instance, video: when we came out with video in our feed, you have both the uploader experience and the viewer experience. You could A/B test the viewing experience or the uploading experience, but if you upload something you can also view it, so there are all these complications around how you A/B test something with network effects. Other than that, we try to A/B test as much as possible, again, to measure the impact of the changes we're making.
Gavin Bryant 23:16
Yeah, so a key message there: make experimentation as easy as possible so people can use experimentation to measure the impact of their work and the impact on the user. An interesting one for you now that I want to ask: what are you obsessing about with growth and experimentation right now?
Jason Van Der Merwe 23:40
Oh, that's a really good question.
Jason Van Der Merwe 23:48
The first thing that comes to mind with that question is this input-to-output metric relationship, which is quite puzzling and challenging. What I mean by this is: your output metrics are the big, lofty metrics for your company, like subscriber numbers, churn, those kinds of big numbers. And input metrics are the ones that you've defined along the way that you hope lead to changes in those output metrics, right? That might be the number of times you visit the feed. For us it's things like upload, or it could be a particular step in the journey, like accepting contacts sync or accepting the push notification ask.
Jason Van Der Merwe 24:49
So input metrics are defined around specific steps that you want the user to take at different moments, and ideally your input metrics map to your output metrics in a causal way, right? Let's say for subscriber retention we care about: did they view heart rate data? And we think viewing heart rate data is a great input metric; if that goes up, then subscriber retention will go up. You start to identify these potential causal relationships through correlative analysis, right? There's a lot of correlative analysis that your data scientists can do. Or you can simply look at the numbers and say, okay, this one's really heavily used, and when we see this one go up, retention goes up.
Jason Van Der Merwe 25:37
So maybe there's a closer relationship, and then what you try to do is prove the causal relationship between those two metrics. That is very difficult to do. Something like subscriber retention is an output metric that is very difficult to observe a change in within an experiment, because it's just very difficult to move. You probably have a high baseline already, which means the sensitivity of that metric is pretty small, or at least the sensitivity in a positive direction is very small. And you might need your input metric to move heavily in order to see your output metric move.
Jason Van Der Merwe 26:17
You might need heart rate usage to double in order to see a 2% change in subscriber retention, but you don't know; that's often how it works. And I think what's so tough then is: how do you make decisions when a lot of your input metrics are correlated with output metrics, but you haven't proved causation? And when do you almost admit defeat on finding a causal relationship and rely on the correlative relationship instead? That's really challenging to do, because products are not simple, right? There's a lot of complex user behavior in your product, and just because something is correlated doesn't mean it's necessarily the reason why retention is going up. So something we're working a lot on is how we prove causal relationships, or how we get better at correlation analyses in a way that allows us to assume there's maybe some level of causal relationship there. Because you don't want to be in the middle ground where you're just spinning your wheels, not sure, not convicted on your input metrics and how they affect your output metrics. But at the same time, you don't want to do a bunch of work in a general area that isn't going to have an impact.
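The correlative analysis Jason describes can be as simple as measuring how strongly a per-user input metric tracks a per-user outcome. Here is a rough sketch with entirely synthetic data; the metric names are illustrative, and as the conversation stresses, a correlation like this says nothing by itself about causation.

```python
# Sketch: how strongly is an input metric (heart-rate views) associated with an
# output metric (retained at month two)? Data below is simulated for illustration.
import numpy as np

rng = np.random.default_rng(0)
n_users = 10_000
heart_rate_views = rng.poisson(3, n_users)  # input metric: views per user

# Fake outcome: retention loosely related to usage, plus noise
retain_prob = 0.3 + 0.05 * np.minimum(heart_rate_views, 6)
retained_month_2 = (rng.random(n_users) < retain_prob).astype(int)

corr = np.corrcoef(heart_rate_views, retained_month_2)[0, 1]
print(f"Correlation between heart-rate views and month-2 retention: {corr:.3f}")
```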
Gavin Bryant 27:48
Or on the other extreme, just completely relying on maybe an outdated product intuition.
Jason Van Der Merwe 27:56
Exactly, and that might change over time. I think a really good example is something we've discovered in the last few months, and this will feed into your question on negative tests in a second. As I said, the activation metric we defined a long time ago has always been one upload and one follow in the first seven days. The upload piece has never changed; Strava is a place where you add your activities. The follow piece has changed slightly, in that people are a little bit more skeptical about following someone when they sign up for a new app nowadays. I think people have moved away from the Facebook model.
Jason Van Der Merwe 28:38
They're much more concerned about privacy, much more concerned about their data. Where's this going? Who can see this? Why can they see this? Why do I need to follow someone? You have networks like TikTok and Slack, where you don't need to follow people, right? There's content out there that could be from someone I know or someone completely random; it doesn't matter. Strava, in some ways, is this more old-school social network, where we do want you to have personal relationships reflected on the network. We do want you to find your best friend that you ride with every week and follow them, because we value that close-knit community as well as broader communities. But as a result, people have been a little more skeptical about following someone in their first seven days, and that metric's importance has gone down quite a lot over the last few months. We've been trying to figure out how much to deprioritize it compared to the upload moment. This is where, if I was giving advice to another company, I would say you should never have an activation metric with two prongs to it like we did; I think Strava is unique in that it makes sense. So one of the things we've been doing is negative tests to try to prove the causal relationship. If you're trying to figure out whether an input metric matters to your output metric, negative tests are very helpful. What I mean by that is we added friction, or removed the ease of finding friends on Strava, for new users, to see how that impacts retention. We removed the ability to find friends at several steps in onboarding, and we saw, okay, follow rate went down; how much did retention go down? Because sometimes it's really hard to make a metric go up and see your output metric go up, right? You don't know if you're going to be successful in that. But you can always make a metric go down; you have control over that. It's pretty easy to make an experience worse, or just remove the ability to do something. And we did see retention go down, just not as much as we expected. So there was a lot less sensitivity between follow rate and retention than we were expecting, and it was actually slightly different on Android and iOS; iOS has a lot less of a relationship there. That was an interesting insight, because it has changed from several years ago, and it was a bit of an unlock for the team that focuses on new user activation. The unlock for them was: okay, we're going to focus on the upload moment more for users in their first seven days, and we'll focus on community building at a later stage in the user journey.
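For readers who want to see what evaluating a negative test like this can look like, here is a minimal sketch: remove friend-finding for the treatment cohort, then compare week-two retention between cohorts with a two-proportion z-test. The counts are made up, and this is only one common way to do the comparison, not a description of Strava's internal analysis.

```python
# Sketch: did removing friend-finding (treatment) change retention vs control?
from math import sqrt
from statistics import NormalDist

def two_proportion_z(success_a, n_a, success_b, n_b):
    """Two-sided z-test comparing two retention rates."""
    p_a, p_b = success_a / n_a, success_b / n_b
    pooled = (success_a + success_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return p_a, p_b, z, p_value

# Control keeps friend-finding in onboarding; treatment has it removed
p_ctrl, p_trt, z, p = two_proportion_z(success_a=4_300, n_a=10_000,
                                       success_b=4_150, n_b=10_000)
print(f"retention control={p_ctrl:.1%} treatment={p_trt:.1%} z={z:.2f} p={p:.3f}")
```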
Gavin Bryant 31:40
Okay, let's talk a little bit more specifically about negative tests, and that's a nice segue. What are some of the circumstances or scenarios you've experienced where it's good to perform a negative test?
Jason Van Der Merwe 31:57
Right now, we're using them a lot. Actually, there are always tests popping up in our experiment review meetings where I'm like, oh, you guys are removing another thing I built. So it's kind of on brand right now to remove stuff that I did in a negative test. I don't think it's about me, but you know, I'll take offense anyway. I think for us, once you get to a point where your user experience is getting complex, and there's a lot to it, and you're not sure what works, the negative test is a great first step to say, alright, let's just remove some things and see what happens.
Jason Van Der Merwe 32:35
So one of the growth marketing teammates on the new user activation team was wondering, alright, how well does our email series actually work? It was funny, because I was writing my article at the same time she was contemplating doing this, and in it I referenced an early test we did six or seven years ago where we removed our new user onboarding emails and saw no impact. She did the same thing and was seeing something similar recently: hey, this doesn't have as much impact on the metrics we care about. So, with her teammates, she redesigned that experience, and the new emails we have now are performing incredibly well. It's really impressive; I'm surprised how well they're doing. But she first had to say, alright, let's potentially throw away what we have and see, does it actually make a difference? Alright, it doesn't. Let's go back to the drawing board and try something completely different. And I think there are so many cases where people just assume what they have works. It's really easy for us to do, right? We see it, we might like it, it might look beautiful, and we go, okay, this works, and this is how it should be.
Gavin Bryant 33:22
Yeah.
Jason Van Der Merwe 33:51
So that's a great opportunity to remove stuff. And the folks on the team right now have been doing a really good job of questioning everything. This is where having new teammates is really awesome, because they come in with a fresh mindset, and especially if you find people who want to challenge the status quo, they'll challenge everything. So a bunch of folks challenged our feature education. Again, I wrote an article about how we built this feature education platform at Strava; it's this nice coordination platform, and as a result, people started doing more feature education.
Jason Van Der Merwe 34:23
Each of those coachmarks and different feature education moments had a big impact, which was awesome. And then recently they came in and were like, after all this feature education, we're going to delete it all and see what happens. They removed it all, and user retention went up. It was like, dang, and I had felt so proud of that stuff. But it was so cool, because that was something I would not have done. I probably wouldn't have said, alright, let's just remove everything; I had been working for several years to get us to do more. But they were right, in the sense that piece by piece we were adding feature education, and piece by piece it was having an impact and being useful, but at a certain point you overshoot the runway and do too much. And because you build a product piecemeal over time, you never have the opportunity to test everything versus nothing. So a negative test is a great tool to test everything versus nothing, kind of in retrospect. Right now, there are a lot of negative tests happening before we do additive tests. It's like, okay, let's strip this down, see where we're at, and then build it back up. Because if you strip it down and things are better, then you have a simpler starting point to build from than if you try to build complexity onto complexity.
Gavin Bryant 35:46
Yeah, I liked the example you gave with the email onboarding flow. There was an opportunity there to further amplify the experience and increase retention, even though the existing process was sound and it worked. It allows you to re-baseline, take stock, and then provide a further enhanced experience.
Jason Van Der Merwe 36:15
You know, things change, right? The new emails we have now will probably cease to be as impactful in a few years, and we'll have to do the same thing again. The entropy in a system will only ever increase, right? Things decay over time. And I think one harsh reality about experimentation, or growth work as a whole, is that you're doing a lot of work just to keep things at the same level of impact they have right now. You can't just build an onboarding experience that results in, let's say, 50% new user retention in the second week and expect that to stay the same. In a year's time, that metric is going to go down on its own; your metrics will decay, your user experience will decay, if you don't touch them. So a lot of the work you're doing is simply to keep metrics level, and ideally keep metrics going up. There are so many macro factors that go into the performance of your product, whether it's the mix shift of your audience over time as you go from high product-market-fit users to people who are a little more exploratory, or you go from organic to paid, or right now, where folks are a little more skeptical about spending money because they're unsure of the economy, or they have more choice with their time because the COVID pandemic is over, so they may want to spend less time in your product. There are so many factors that change that you have to constantly be evaluating the performance of your product and its experiences. Sometimes just keeping a metric level is actually impressive, and a really tough thing to do.
Gavin Bryant 38:08
Thinking about negative tests, was there any resistance in the business, within the product and growth teams, when this was first floated?
Jason Van Der Merwe 38:20
No, I think it came very naturally and organically, trying to actually give ourselves data to make decisions on. Strava, in many senses, has always had a lot of people who really want to be data-driven. I think there was a little bit of resistance on the first email negative tests, because it was kind of like, hey, these things are cool, they always look beautiful, Strava's design and branding have always been awesome, they look great, why, what's your problem with them? And it also seemed not that important, like, why would you want to turn off the emails? So there can be a little bit of resistance about that. But for the most part, not really.
Jason Van Der Merwe 39:06
One of the most impactful negative tests we ran, which is in my articles, was around the performance of our APIs and just general loading and start-time performance. And I don't think there was a lot of resistance there, because again, we really wanted to understand the impact of performance on retention, and this was the only way to do it; there was no other way to really easily do it. If we started to do negative tests around subscriber revenue, which I have done by mistake in the past, there would probably be a lot more resistance. You have to be very thoughtful about how you do experiments that result in a loss of revenue. For instance, we did this whole new onboarding flow four or five years ago where we took away the subscriber upsell in the onboarding flow, with the idea that it would come later, somewhere else.
Jason Van Der Merwe 40:03
So it wasn't really a true negative test, but in effect it kind of was, because we removed that screen entirely and saw our trial accepts go down. I hadn't exactly told the revenue folks who cared about that, and I hadn't really predicted how much it would go down, so they weren't the happiest, understandably. So there are some areas where you're probably going to get a little resistance. But I think we have a culture now where, if we really did want to understand the true friction of something like a trial experience, we would try that. And actually, we have a great example where there's a part of our product, and I'm going to be kind of high level here, where we've always, we thought, done a really good job of selling subscriptions. We decided to do a negative test and basically turn that off: turn off a lot of different kinds of upsells and general feature education in one spot, including trials, and we actually saw subscriptions go up as a result of turning them off entirely. Basically, we were probably annoying users in a certain area of the product, and we decided we would just turn it off. There were a lot of reasons we did that.
Jason Van Der Merwe 41:21
Some of it was technical complexity, technical cost, etc. And the result wasn't just neutral. We were hoping, okay, this won't be that bad, and we'll get to rethink the system. Not only was it not neutral, it was positive: we actually sold more subscriptions because we took away a subscription upsell. That was an amazing learning, very counter to a lot of what we had thought. But again, it was really cool that the culture was there; folks from our comms and product team, as well as our CRM team, were really down to try this. So, you know, props to all of them for being open to it.
Gavin Bryant 41:59
As far as the execution of those experiments, how the negative tests are performed, are you still applying the same sample sizes, durations and statistical rigor, given that it is, you know, an A/B test of sorts?
Jason Van Der Merwe 42:14
Yeah, so these typical tests are all evaluated automatically through One Ring. You just run it, and it tells you, basically using p-values, is this stat sig or not stat sig, and how many days of stat sig has it been? We typically wait for two days at a 95% confidence interval. So that's automatic, it's great, you don't have to do much. And you can see whether it's been going in and out of stat sig over time, or whether it has basically leveled out. So that's cool and really easy to do. How long we run a test for depends on the part of the product. If it's something in the feed, with so many users in the feed, we could probably run it for a day and get enough volume, but we still typically run things for at least a week, so that we get the usage from Monday all the way through Sunday. But we don't always have to do that.
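A decision rule like the one described ("stat sig for two consecutive daily reads at 95% confidence") is easy to express in code. The sketch below assumes daily p-values have already been computed elsewhere; the threshold and streak length are parameters, and this is an illustration of the idea rather than Strava's actual implementation.

```python
# Sketch: treat an experiment as decided only once its primary metric has been
# statistically significant (p < alpha) on two consecutive daily reads.
def decided(daily_p_values: list[float], alpha: float = 0.05,
            streak_needed: int = 2) -> bool:
    streak = 0
    for p in daily_p_values:
        streak = streak + 1 if p < alpha else 0
        if streak >= streak_needed:
            return True
    return False

print(decided([0.40, 0.12, 0.04, 0.03]))  # True: significant two days in a row
print(decided([0.20, 0.04, 0.09, 0.03]))  # False: never two consecutive days
```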
Gavin Bryant 43:15
My next question was: can a test be too negative? You mentioned that people need to tread carefully around revenue-oriented experiments, but do you think there are some areas where people need to be wary, or that are maybe off limits?
Jason Van Der Merwe 43:35
Yeah, for the most part, I believe really heavily that teams and individuals can make decisions using their own judgment and discernment, and they'll figure things out. We make mistakes as humans; it's part of our DNA, right? And people learn from doing. It's not like I read some book and went, oh, now I know experimentation. Everything I know is because I did it wrong the first time, or multiple times. With negative tests, I've run the wrong test multiple times. People have to go through those journeys themselves, and I think that makes people smarter; if they have more stories under their belt, they have better judgment and discernment.
Jason Van Der Merwe 44:23
So for the most part, I believe in a lot fewer rules and procedures around what people do, because I didn't have that and I figured it out eventually. The people we hire now are much smarter than me, so they learn these things a lot faster than I did. It's kind of the give a person a fish versus teach them to fish analogy. Even on the revenue side, people will learn; the best thing I can do is create a framework and a safety net. We have really great observability, right? If we run a test that hurts revenue metrics, we'll know within a few hours. We've run tests where we tried something and were like, oh, this is not looking good, turn it off, and the impact of that is small. And then you learn. To go back to the meta point: the point of experimentation is that it's a tool to learn. So you have to be really focused on what you're trying to learn, and whether that learning is going to be interesting, novel and useful in the future. If you're running an experiment where there is no learning, or you don't really care about the learning, then you shouldn't be running the experiment. But if you're trying to change something and do a negative test with a very clear learning, like, I really wonder what's going to happen to the user experience if we remove X or Y, and that learning will help you make more decisions on more projects in the future, then there's value to that. And that value has some monetary worth; it's hard to put a dollar value on learnings. When I did a negative test on the trial screen for new users a few years ago, it may have cost the company money, significant money, but we learned how important it is to sell at a certain moment in the user journey, which we didn't really know before. Ever since then, that's the one rule: you don't move that screen, you don't remove that screen, because we know how important it is where it is and how it works. It was a great learning, and it was worth it in the long run, even if in the short run it hurt. So for the most part, let people try stuff, make sure they're focused on learning and value, and it's okay if you hurt metrics. But there will be some guardrails. For the most part, you're not supposed to roll out an experiment that hurts revenue; you can run an experiment that hurts revenue, but you can't roll it out. And if it's really hurting revenue, you need to turn it off soon.
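The "safety net" Jason describes is essentially guardrail monitoring: watch a sensitive metric shortly after launch and flag the experiment for shutoff if the treatment arm falls too far below control. Here is a minimal sketch of that idea; the threshold, metric and values are illustrative assumptions, not Strava's actual guardrails.

```python
# Sketch: flag an experiment if a revenue-type guardrail metric drops more than
# an agreed relative threshold in the treatment arm versus control.
def guardrail_breached(control_value: float, treatment_value: float,
                       max_relative_drop: float = 0.05) -> bool:
    """Return True if treatment is more than max_relative_drop below control."""
    if control_value <= 0:
        return False  # nothing meaningful to compare against yet
    relative_change = (treatment_value - control_value) / control_value
    return relative_change < -max_relative_drop

# e.g. trial starts per 1,000 new users in the first few hours of the experiment
if guardrail_breached(control_value=32.0, treatment_value=27.5):
    print("Guardrail breached: pause the experiment and review")
```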
Gavin Bryant 47:05
To your point, having the right guardrail metrics in place so that an experiment can be paused or stopped if needed. Okay, let's wrap up with our fast four closing questions. Number one: what's next for experimentation at Strava?
Jason Van Der Merwe 47:13
Yep, exactly.
Jason Van Der Merwe 47:25
I think, to the question you asked about what I'm obsessing over, I've talked about correlation and causation. That is something the data science team is working hard on: how do we get better at understanding that relationship? There are a lot of ways you can do that, through modeling, through negative tests, through historical data, so we're looking at all kinds of avenues to give us a better sense of that relationship. I think that for us would be a huge unlock, because then we'd have a great framework in place to allow us to move rapidly on input metrics without worrying about the output metric relationship every time. That just speeds you up, because you're not questioning your metric, you're not questioning the value of your work. So I think that is probably the next big step for us. And maybe a minor step that we're constantly taking is improving our experimentation platform itself. We have a team that gets to work on it, so it's just improving the tooling around it to reduce friction for our own internal users.
Gavin Bryant 48:33
Number two: what's the most important lesson you've learned about experimentation?
Jason Van Der Merwe 48:46
It's a great question. I would say it's that you are probably wrong. And when I say you, I'm talking to myself there: I am probably wrong in what I think is going to happen here. We're wrong more often than we're right, in many cases. Over time, you develop intuition and an understanding of context based on repetition, so you start to get much better at it. But even then, the one or two times you're wrong, you can often be very wrong. So trying to constantly check your assumptions at the door, and allowing for as many ideas as possible from your team, is really important. I think an experimentation culture is a really great, inclusive culture for people to work in, because the idea is: we all have assumptions that are probably wrong about this, and everyone's ideas are welcome because we want to think about the problem from many different angles. So whether you're straight out of college, whether you're a designer giving feedback on performance, or an engineer giving feedback on an email, we may not be able to test all of those ideas, but we want to hear them, and we'd ideally test as many different ideas from different angles as we can. I don't mean anarchy, I mean thoughtful brainstorming, but some of our most successful experiments come from places you wouldn't expect them to come from. Right now, it's almost a bit of a running joke in growth at Strava, where I'll announce whether I'd bet yes or no on a test, whether it will work or not, and like half the time I'm wrong. I'll be like, alright, I would bet against this experiment working, and then it works; or I bet this is going to work, and then it doesn't. So check your assumptions at the door, accept that you're probably wrong, and try things. Try things and be willing to do something new.
Gavin Bryant 50:53
Working around growth, product and experimentation requires a great deal of humility, doesn't it?
Gavin Bryant 50:57
Okay, number three: what are you learning about right now that we should know about?
Jason Van Der Merwe 50:59
And if you don't have it, you will; it will be created for you.
Jason Van Der Merwe 51:18
I've been really enjoying a lot of the recent guests on Lenny's newsletter and Lenny's podcast: leaders of these big organizations and big products, and how they think about their products over time. One I found really interesting to hear from, his name is Scott, is the head of product and strategy at Adobe. He talks about general Adobe product strategy, and then he goes into the onboarding experience and how important it is, and how over time you have to adapt your onboarding, because the product-market fit of your user, or how interested your user is in your product, or how skilled they are, is going to change. For Adobe, to use Photoshop 10 years ago you had to have a PhD, right? It was a complicated product. Nowadays there's Photoshop on your phone and it's super cheap: before, you'd pay a lot of money, 500 bucks; now you pay 10 bucks a month to use Photoshop, and your Photoshop user is a lot more casual. So they had to adapt their onboarding for much more casual users rather than the pro designers or photographers. Your new user experience has to change.
Jason Van Der Merwe 52:35
So hearing from folks like that, who are thinking so big in terms of step-function changes in the product strategy of their business, but are still thinking about these kinds of experiences and the optimizations along the way (people tend to think of optimizations as small tweaks), is really interesting, because at the end of the day the point of growth is to connect users with the value of your product. Growth people have to care about the value being created in the product, right? You have to care about the core product experience, that it's getting better and that you're delivering delightful experiences to users. Even if you're not the one building those, you still have to care about them, have an opinion and understand them, because the whole point is you're selling that; a growth person is literally a salesperson for the product. So hearing from those folks is really interesting, and I would encourage anyone to go listen to Lenny's podcast and read his newsletter.
Gavin Bryant 53:35
Question four: what are the top three books or resources that you'd recommend to our audience? You've just touched on Lenny's newsletter and Lenny's podcast. Is there anything else that you're reading or listening to at the moment?
Jason Van Der Merwe 53:52
No, I mean, I think there are a lot of people who just write about growth. Lenny's newsletter is a great overall one. John Egan has a website where he blogs about similar kinds of growth things to what I write about, but he's done it at a really high level and a little more on the engineering side. He'll talk about the importance of certain types of metrics, or he'll go into how to set up your email servers to make sure you're not getting blocked, things like that. He goes across a very wide span of topics, which is really cool. And then I think behavioral psychology is really important in the growth world, so I love Freakonomics, the podcast, just to listen to, because learning about how humans think and work will apply to your product at some point. You'll learn about different topics, so it's not specifically growth- or product-focused, but I think it's a great one.
Gavin Bryant 55:02
Excellent! Let's leave it there. Jason, thank you so much for your time today. Really appreciate having you on the pod.
Jason Van Der Merwe 55:08
Yeah, thanks for having me.
“At Strava we’re constantly experimenting on the process of experimentation. How do we make experimentation more efficient, and how do we communicate the learnings of our experiments better?”.
Highlights
Learning mobile development at Stanford University - “something that was instilled in me was that the outcome is not the first iteration of code or the software you're building. You're on a journey with your user. There's always a better version of the product that you’re trying to uncover. Experimentation is the tool to get you there”
Organising for growth - Strava splits product into Core Product and Growth Product. Jason’s growth product engineering org includes Product Management, Design, Engineering, Product Marketing, Growth Marketing and Analytics. Teams organise around key lifecycle moments - New User Acquisition, New User Activation, Subscriber Acquisition etc.
You want the remit of Product and Growth teams to be purposefully broad so that they are empowered to solve all related customer problems. There should be no constraints placed on teams around what they should care about. Teams should care about users and business metrics
Product experience - a user doesn’t differentiate between a push notification, an email or an in app message. It’s all part of the product experience. It’s all a growth concern
Getting started with experimentation - when you’re starting out with experimentation you need to learn how to experiment. Strava were performing simple experiments around the new user experience, trying to understand what was happening in different cohorts and evaluating the performance of each experiment
Strava’s Activation (A Ha) Moment - the Activation moment was identified as one upload and one follow in the first seven days. Strava identified the Activation moment in two ways (1). Conducting user research to understand what users were trying to achieve (2). Performing correlation analysis to understand what actions were most correlated with user Retention - week one, week two, month two, month six. User Activation was a high-priority experimentation focus for a long time
Increasing experimentation velocity - comes from performing lots of sets and reps. The more sets and reps that are performed, the faster teams can move from Idea > Hypothesis > Experiment. Over time the business became more effective at designing, executing and analysing experiments
Strava were constantly experimenting on the process of experimentation. How do we make experimentation more efficient, and how do we communicate the results of experiments better?
Early in the experimentation journey, progress is better than perfection. For six years, Strava documented experimentation results in a simple Google Slides format. Only recently have experimentation teams switched to Confluence, documenting experiments in more detail, in a searchable format
Automate repeatable tasks - Strava built an internal Experimentation Reporting Framework to analyse and evaluate experiments. Data Scientists were writing the same SQL queries over and over to evaluate experiment performance. These queries were eventually automated
Experimentation shouldn’t be an end game or goal. Experimentation is how you measure the impact of product changes on users, user journeys and user experience. Strava rely heavily on experimentation to understand how the changes they’re making to the product impact outcome metrics like retention and subscriptions
Strava’s biggest experimentation challenge - being able to link short-term experimentation input metrics to long-term business output metrics in a causal way. Data Scientists perform a lot of correlative analysis to identify potential causal relationships between the two (e.g., users who regularly view heart rate data as a proxy for increased user retention). This is very difficult to do. Human behaviour is very complex
WHAT IS A NEGATIVE TEST - a negative test is a type of A/B test where you purposefully remove an element or component of a user experience and measure the impact on user behaviour
WHY PERFORM NEGATIVE TESTS? - a product, user experience or feature can become complex and bloated over time due to the addition of many layers of product change. Products are built in piecemeal fashion and can easily become over-engineered. Product experiences and product ecosystems decay over time. What worked at one point, may not be effective anymore. It can be difficult to understand what is truly driving value for users in your product
BENEFITS OF NEGATIVE TESTS - negative A/B tests enable product teams to strip away and remove product complexity to understand what works. Firstly, it’s a great way to understand what elements of the product are actually driving value for users. Secondly, it helps teams to challenge the status quo and dispel strongly held organisational myths and assumptions about the product. Thirdly, it can highlight opportunities for the product experience to be enhanced or improved. Fourthly, it’s easier to build the product back up from a simplified starting point
HOW TO PERFORM NEGATIVE TESTS - you need to have clear experimentation guardrail metrics in place to monitor and measure the impacts of Negative Tests. Be prepared to pause or stop experiments if guardrail metrics are negatively impacted
Strava experimentation culture - allow teams and individuals to make decisions based on their own judgement. As humans, we all make mistakes. It’s part of our DNA. Making mistakes is how we learn. Ensure that you have experimentation safety nets and guardrails in place to provide people with a soft landing
In this episode we discuss:
How mobile development was a gateway into experimentation
How Strava organise for experimentation and growth
An overview of Strava’s experimentation journey
Learning about experimentation from Dropbox, Twitter and Duolingo
An overview of Strava’s experimentation culture
What Jason’s obsessing about with experimentation right now
Strava’s biggest challenge with growth and experimentation
What is a Negative Test?
Why you should be performing Negative Tests on your product
How to use Negative Tests to create more value for users