So I've given this talk, not this exact talk but a version of it, in quality discussions with customers, where we talked about how things have changed with the cloud cadence, Combined Engineering, and shift left. At the end of the presentation they say, "Wow, that seems pretty drastic; I don't know if we are ready to do that." The reality is that our testing approach at Microsoft has been evolving for quite some time, I would say for more than a decade. So before I go into what happened during the cloud cadence, I'll give you a quick history of what happened before that, because the customers you talk to may be at different phases in this transformation. I won't spend too much time on the history, but hopefully it will give you a sense of how to guide customers through this journey in case they are somewhere prior to where the cloud cadence happened in VSTS.
So I'm going to take you back to the 90s. For as long as Microsoft shipped product, we always had three distinct disciplines in a product team: PM, Dev, and Test. PMs gathered customer requirements and wrote specs, Dev designed and wrote code, and Test wrote tests. Roughly, we would have a 1:1:1.5 ratio; it varied by team, but that's the general ratio we used.

Now, within Test there were two distinct disciplines. Many people may not know this, but it was a unique setup at Microsoft: within Test we had Software Design Engineers in Test, the SDETs, who developed the automation, the test infrastructure, and so on; and Software Test Engineers, the STEs, who ran the automation or ran manual tests. And this is a key point: the Software Design Engineers in Test were hired with very similar qualifications to the Software Design Engineers, the developers. They went to the same colleges, and if you hired from industry you would pretty much hire developers and then convert them into SDETs. Remember this point for when I come back and talk about Combined Engineering, because it's important, particularly for the way the test discipline was set up at Microsoft.
So how did it work? Well, it worked reasonably well back in the day. We achieved commercial success with big products like Windows and Office. One of the benefits of this model was that when we were ready to do a product sign-off, the test discipline would bring in very formal sign-off criteria and formal quality measurements, and that gave us pretty good confidence in declaring a product ready to release. It also developed deep expertise in testing, because the test discipline was solely focused on testing; they were thinking about it day in, day out. So that was the great thing.

But did it really work? And the answer is no, it did not. There were problems; the problems were simply masked by the fact that (a) we had commercial success with our big products and (b) there was a long product cycle. But there were numerous problems. The developers just threw the code over the wall to the testers, the SDETs; the SDETs wrote automation and then threw that over the wall to the STEs, the Software Test Engineers. And the way the STEs coped was by just adding more and more STEs, particularly vendors. There was really no growth opportunity for the STEs, because they didn't have any mobility; they couldn't go anywhere. It was very expensive to maintain this setup, and testing became a bottleneck and caused product delays. But again, we didn't feel it as much because our product cycle was long; we shipped Windows every two or three years.
So this sort of worked. But by around 2000, the late 90s, it became very clear in the company that this wasn't working, and we had to change something. So a company-wide decision was made to get rid of the STEs from the test discipline. We had SDETs and STEs; now, no more STEs, they were gone. And we did that. It was actually very painful, because the STEs, remember, didn't have the same qualifications as the SDETs, and so we tried to find a lot of them new roles in the company; some of them did find new roles, but many of them didn't.

This improved the model a little bit, in the sense that now you have SDETs who are responsible not only for writing the automation but for operating it; they owned the whole thing. So they were naturally incentivized to (a) write good automation and (b) write more automation, instead of just throwing tests over to another team that only ran them. They were now responsible for all of it. But the core problem still remained: the developers would throw the code over the wall to the SDETs, and the SDETs were constantly trying to catch up.

So we got clever, and we said we're going to introduce a thing called a quality milestone, or MQ. These are milestones we would have after a product is released and before the next product is about to start; we'd block off a certain period of time and say, whatever quality debt or test debt we've accumulated in the previous release, we'll just catch up and fix it there. A clever idea, but it didn't work in practice, for a couple of reasons. One is that now people knew there was a milestone coming up called "quality," so they would just defer the quality work to that milestone. The other issue is that with a milestone dedicated to quality, people would conjure up all kinds of quality initiatives, things they thought were creative inside the quality realm, and try to schedule that work, which caused priority inversions; sometimes the real work didn't get done and we just accumulated more debt. So, a clever idea that didn't really work. Test was still a bottleneck, but again we survived because we were in this waterfall world.
Then came the cloud cadence. The arrival of the cloud cadence was around the 2008 to 2010 timeframe, and it brought new pressure on the system. Now there is an expectation that we are running a much faster cycle, and the expectation just continued to increase: faster, faster, faster. Gone are those long stabilization phases; we no longer have the opportunity to create a beta, give it to customers, and do dogfooding. Those kinds of validation phases, which were crutches in the past, are now gone. You're living in the world of microservices, and these microservices are deployed independently, so there is pretty significant complexity in getting the quality of those services right on an independent cadence. We've talked about how we had to support no-downtime deployments; these services need to stay up all the time.

So what did we do? Well, we knew how to ship software; we had done it for the last 25 years. So we said, let's just use the same approach and try to do it faster. We went from a two-year cycle to a six-month cycle to a three-week cycle and just tried to figure out how to do whatever we already knew, faster. Our initial approach was: same model, run faster. We pushed to get the automation more streamlined, and we got clever again and said, one of the ways we can deal with this is that we don't need to run all the tests; we can be very smart about which tests to run. We'll pick some tests here and pick some tests there, and that's how we'll survive. But it was just a matter of survival.
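To make that "pick some tests here and there" idea concrete, here is a minimal sketch of change-based test selection. The area map, suite names, and fallback policy are my own illustrative assumptions, not the actual system we used:

```python
# Minimal sketch of change-based test selection (hypothetical names,
# not the actual VSTS tooling). Idea: run only the suites whose owning
# area overlaps the files changed since the last green run.

# Hypothetical mapping from source area to the test suites that cover it.
AREA_TO_SUITES = {
    "src/workitems/": ["workitems_unit", "workitems_integration"],
    "src/versioncontrol/": ["vc_unit", "vc_integration"],
    "src/build/": ["build_unit"],
}

def select_suites(changed_files: list[str]) -> set[str]:
    """Return the test suites impacted by a set of changed files."""
    selected: set[str] = set()
    for path in changed_files:
        for area, suites in AREA_TO_SUITES.items():
            if path.startswith(area):
                selected.update(suites)
    if not selected:
        # An unmapped change could affect anything: fall back to all suites.
        selected = {s for suites in AREA_TO_SUITES.values() for s in suites}
    return selected

if __name__ == "__main__":
    print(select_suites(["src/workitems/query.cs"]))
    # -> {'workitems_unit', 'workitems_integration'}
```

The obvious weakness, and part of why this was only survival, is visible in the sketch: any cross-cutting or unmapped change still forces the full test pass, and the map itself rots as the code moves.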
It became very clear to us that the model wasn't working, and we started seeing all kinds of issues. Testing was a major bottleneck by this time, particularly in VSTS. I think Bill or somebody mentioned that we had sprints where we would do a three-week sprint cycle, finish it, then go through another three weeks of stabilization; by the time the sprint got deployed it would take yet another three weeks, with the team running around trying to stabilize the system and deploy it. In the meantime the work on the next sprint was already completed, so they were just trying to catch up, and the cycle would continue. Same issues: lack of accountability on the dev side. The short version is that we recognized this model wasn't working, and in fact we were not the first to recognize it; there were services before us, like Bing, one of the major services at Microsoft, that saw this. And we were observing it based on the practices that some of the companies born in the cloud were following in the industry. So we knew we needed a new model for the cloud cadence.

That's how we got to this point. The rest of the talk is about what happened in the cloud cadence; I just wanted to give you a flavor of what happened before that, because you might run into customers who are probably still in the world where STEs are running around writing and running manual tests and there is not a lot of emphasis on automation. You have to bring them along on the journey before you talk about some of the other stuff I'll walk you through.

So what happened in the cloud cadence? This pretty much sums up the three big things that we changed. We changed the quality ownership; we fixed the quality accountability. That's number one.
The second thing is that we understood that in order to ship frequently out of a release branch, you need a master that is also in pretty good shape, in an always-shippable state. You saw Bill talk about how you work in master and then release from the release branch; quality is not just about getting the release branch right, quality actually starts in the master branch and keeping it in a shippable state. Now, that statement touches a lot of things, the code flow and the branch mechanics, that we'll talk about later, but from a testing perspective we focused on two things. One is this concept of shift-left testing, and I'll talk about that in a second; the second was aggressively getting rid of all the test flakiness in the system.

The other thing we understood is that there is no place like production. This is what I would call the shift-right part of the strategy. So on shift left: run tests close to the code, run more unit tests. On shift right: run tests close to production, because there is no place like production; it's a set of practices about both safeguarding production and ensuring quality in production. And in a sense we got rid of the testing that was happening in the middle, the integration-style testing, the functional testing that used to happen in the lab. That was the big departure here.
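On the flakiness point, here is one minimal way to think about separating a flaky test from a real regression. This is a sketch under my own assumptions, not our actual tooling: rerun the test in isolation and see whether its verdict is stable.

```python
# Sketch of flakiness classification (illustrative only): rerun a test
# several times in isolation and treat a mixed pass/fail record as
# "flaky" rather than as a product bug.

import random
from typing import Callable

def classify(test: Callable[[], bool], reruns: int = 10) -> str:
    """Classify a test by how stable its verdict is across reruns."""
    results = {test() for _ in range(reruns)}
    if results == {True}:
        return "pass"
    if results == {False}:
        return "fail"   # consistent failure: likely a real regression
    return "flaky"      # nondeterministic: quarantine it and fix the test

# A deliberately nondeterministic example "test".
def sometimes_passes() -> bool:
    return random.random() < 0.7

if __name__ == "__main__":
    print(classify(sometimes_passes))  # usually prints "flaky"
```

The point of giving "flaky" its own verdict is that a flaky test gets quarantined and fixed instead of eroding trust in the whole run.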
All right, so I'm going to walk through each of these concepts in a little bit more detail. Quality ownership.
So we did Combined Engineering. You've heard this term before; we've talked about it. Combined Engineering, in a nutshell, is taking those two disciplines, Dev and Test, those two roles, and merging them into a single discipline, a single role called Engineer. We got rid of the SDE and SDET roles; just one role, Engineer. The key thing is that when we did this, that individual has a combined responsibility for both dev and test. So it's not just an organizational change where you bring the dev and test teams together; it's an actual discipline merge. If you think about the set of qualifications required for an SDE and the set of qualifications required for an SDET, you merge them into a single set. That's what this was, and so everyone had to learn new skills.

A lot of times when I talk about this, the first question I get is: so what happened to those SDETs, did they have to learn how to write code? Well, the reality is that, remember the qualifications I mentioned earlier, they knew how to write code; they had just gotten a little rusty in terms of their design skills. But this was also a learning experience for the developers, because developers now had to learn how to write tests, write automation, do manual testing, do exploratory testing, things like that. So this required learning on both sides, and I think that's a key point. A lot of times when people talk about Combined Engineering they say, oh, OK, that means I need to train my testers to be more like devs. No, it actually goes both ways.
The other key concept here is that you want to reduce handoffs. In a short cadence you don't have the opportunity to start somewhere, write your code, give it to another team to test, then give it to maybe another team to do performance testing, and give it to another team to do deployments. The basic idea was that we wanted to reduce handoffs in the team and give end-to-end accountability to a feature team, or to an engineer inside a feature team.

This was a big cultural shift across the company. The change happened in one team, but then over a few years every team across Microsoft changed. Now, different divisions took slightly different approaches to doing this. In some cases, like us in VSTS, when we did Combined Engineering it was pure: we just merged the two disciplines; there is no other team responsible for quality; every feature team owns its own feature area and feature quality. In some other orgs they still left a small team to look after things like live-site telemetry or live-site instrumentation. But ultimately, if you fast-forward to now, just about all teams at Microsoft follow this model.
How did we make this transition? I think this is an important thing to talk about, because like I said, it's a pretty drastic change. Our first transition, the one I talked about where we got rid of the STE roles, was very painful, and we had learned a lot from that. So when we rolled out this change, particularly in VSTS, fortunately for us there were a couple of other teams at Microsoft that had already done this, so we went and talked to them and learned from them, and we had a lot of discussions in the org getting the team ready to do this.

One of the things we were very concerned about is that there are all these things a test team does, some of which I described at the time as dark matter: nobody understands what they do, but they do it, and somehow magic happens and the right quality comes out at the end. So we meticulously went and inventoried everything the test team does. It was a giant spreadsheet; I forget how many rows, but it wasn't just "we write automation." It was all the little things the test team did to keep track of quality in the org, and we made sure that all those responsibilities were reassigned to somebody in the org, basically to these new roles. That was very key.
The second thing we were very clear about is that this is not just changing roles and responsibilities; we are going to have to change the way we test, period. If we continued to test the way we were testing before, it was not going to work in this new world. This is where the shift-left testing and testing-in-production concepts were not only internalized but practiced in the org.

And we gave ourselves about twelve months to go through this transition. Now remember, when we did this, six months later we were shipping TFS 2015, so the litmus test was getting the quality right for TFS 2015. We said we would give ourselves about twelve months, meaning that during that time the SDEs and SDETs would start off basically in their old roles but slowly evolve into the combined responsibility. So within a feature team, the people who were formerly SDETs would continue to do more of the SDET work and the former SDEs would continue to do more of the SDE work, but sprint by sprint the ratio kept changing, and eventually, after six months, you could no longer recognize who had come from Test in the org. That's how we managed it. Now, at the end of the transition there were some people who didn't quite make it, and that was the sad reality, but we supported the transition through training, through the development of the new skills, letting people practice sprint after sprint. Giving yourself a practical timeframe to go do this is key.

Feel free to ask me questions. Yeah, go ahead.
So in this new model, where everyone on the team is an engineer, how do you take the responsibilities that were previously spread across the quality organization and delineate responsibilities on a team where theoretically everyone has the same skill set and responsibilities? I'm wondering how the division of labor actually occurs. Right now, on the team I'm on, for instance, our test automation is done by dedicated engineers; they're automating the tests, and they're writing code, but they're not writing features, and they're on the same team. So theoretically we're doing this too, but it seems like it's different from what you're describing.
Yes, it is different, and I think it's important to clarify: in the beginning it looked like what you just said. Let's take a particular feature team. In the old setup we had five developers and maybe five testers, and that constituted a feature team. We bring them together under a single engineering manager, so now that engineering manager has ten engineers working for him, responsible for the same areas. Sprint one after Combined Engineering happened, it probably looked very similar to what you just described: the testers were still spending most of their time developing tests, and the developers were spending most of their time on design and code. But the North Star was clear. The North Star was that an engineer who owns a feature owns it end to end. They can take a lot of help; they can get a lot of peer review of their test plans, of their design, of their telemetry; in fact they were encouraged to get a lot of help in terms of peer reviews. But the expectation was that next sprint, guess what, they will be the ones writing the test automation for the feature they own. Maybe they start with a small feature where they do that. And the same is true for the tester: the testers started picking up small features off the backlog and owning those features end to end, all the way from the design phase to deploying to production, and then monitoring in production. So it started off like that, and over time we expected devs to pick up more and more of the test responsibility and vice versa; you flip roles at times, too. And that's what I mean by allowing the team the 12-month duration to transition into this new world.
Can you speak a little bit about what happened to your team's velocity, particularly the development velocity in producing business value? Did that suffer during this transition period, and do you feel like you're back to where it was?
Yeah, so on the velocity: I don't know if Aaron showed you the chart of one of the ways we measure it. Velocity was just the number of features delivered on average per sprint, and if you look at that chart it has been constantly going up since 2012, I believe, when we started tracking, through 2017 now. So the short answer to your question is no, the feature velocity did not drop, because remember, you still have the same number of engineers in the feature team; you just took the two separate teams and put them together. Yes, you're spending a little more time on learning and development and training new skills, but there was also an efficiency gain through this process, and that is key. The efficiency gain is that you are not handing things off to another team. When you hand things off to another team, guess what happens: there's a context switch. It's like one thread waiting for the other thread to complete, and then it has to pick up the context and run again. That constant back and forth that used to happen between dev and test is gone, and so you gain quite a lot. In the longer run you absolutely gain velocity, you absolutely gain more capacity; in the short run you could argue that there is some period of training and learning, but it's a good investment in rounding out the skills.

And you'll see, as I'll talk about in a second, the change was pretty profound across the org; it's not just per feature team. In fact, I can talk about that now: we basically got rid of this notion of specialized teams; there are no handoffs.
You don't take a feature, write it, design it, give it to another person to test, then give it to another person to deploy. Maybe there's another person, like we'll talk about, a branch mechanic, whose job is to push code around; maybe there's another person whose job is to make sure the product is ready and has the right performance metrics; another person who's testing the deployment, testing the configuration, things like that. We took the core principle that there are no central teams, no specialized teams that do certain tasks. But at the same time we understood the importance of specialization. Specialization is important; creating a central team that you hand things off to in a fast cadence is the problem. So we didn't want to lose specialization, and we formed a bunch of V teams. I deliberately call them V teams because these are not dedicated teams. Right now in the org there is only one team you could call a dedicated team, and that's the EPS team, the team that runs our central engineering system. It's a small feature team, but again, they do core engineering work; they're contributing to the engineering system; nobody is handing things off to that team.

So, the set of V teams we formed: one of them
was the Test Architecture V team. Now, this was a new thing; we didn't have this before. We had an Architecture V team that looked after the product architecture, but we didn't have anybody looking after the test architecture. And remember, we knew we had to change the way we test, which means we had to rebuild our test infrastructure, rebuild the way we write our tests from the ground up. So we picked our senior-most engineer, in fact Bill, a Partner IC in the org, the senior-most IC, and said, you're going to lead this team; and he had a set of other engineers from across the org as part of the V team. This team's job was, like I said, not only to build the test architecture but to champion the set of practices we were talking about.

Yes, a question: can you please explain the concept of a V team? V team means virtual team, so these are members from different parts of the organization. It's not a dedicated team reporting to a single manager; that's what I mean.
We also had a Tenet Champs V team. So what are Tenet Champs? These are your subject matter experts, people who look after some specialized activity that we do, whether it's making sure the product is accessible, making sure the product has good performance and reliability, that it's globalization-ready, things like that. This, again, used to largely be done by the test team in the past. In the new world we refactored these responsibilities; again, every feature team is responsible for making sure their feature is accessible, is performant, is globalization-ready, but we would have a V team of experts from throughout the organization whose job is to build deep expertise in these types of activities, this type of work. So the subject matter expertise is still valued in the org; specialization is still valued in the org. The main difference is that it's not consolidated into a dedicated team.
I mentioned the Performance V team. This is important, because when you're looking at service performance and product performance, oftentimes you find bottlenecks elsewhere. Let's say you own one of the top-level features and you're doing performance testing for work item tracking, and you find a bottleneck somewhere deeper in the system that is owned by somebody else. So we formed a Performance V team whose job was to identify common bottlenecks across the entire product, come up with the right design solutions, and drive them. This kind of work you cannot just farm out to an individual feature team, because performance is an end-to-end problem; it's not isolated to a particular layer of the product.
A couple of other V teams: the BMO V team. Don't ask me what BMO stands for, because right here at the moment I may not be able to figure it out, but the BMOs are the people who look after our daily build health and the CI health. These folks are constantly watching the builds and the runs, and if there are any failures they do a quick triage and assign them to the appropriate owners. So we formed a BMO V team, and you'll see that over time the size of the BMO team shrank, as did what we expected BMOs to do, as the engineering system got better.
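As an illustration of the kind of triage the BMOs do, and that the engineering system gradually absorbed, here is a small sketch. The failure records, area names, and owner routing are hypothetical; a real version would query the CI system's API rather than hardcoded data:

```python
# Hypothetical sketch of a build-failure triage loop: scan the latest
# CI failures and route each one to an owner based on which area the
# failing test belongs to.

# Hypothetical data; a real version would pull this from the CI system.
FAILURES = [
    {"test": "vc_integration.test_clone", "area": "version_control"},
    {"test": "workitems_unit.test_query", "area": "workitems"},
]

AREA_OWNERS = {
    "version_control": "vc-feature-team",
    "workitems": "workitems-feature-team",
}

def triage(failures, owners, default="bmo-on-duty"):
    """Assign each failure to the owning feature team (or the BMO on duty)."""
    return [
        {"test": f["test"], "assigned_to": owners.get(f["area"], default)}
        for f in failures
    ]

if __name__ == "__main__":
    for item in triage(FAILURES, AREA_OWNERS):
        print(f"{item['test']} -> {item['assigned_to']}")
```

Routing failures straight to the owning feature team, rather than leaving them with a central crew, is what let the BMO rotation keep shrinking.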
Finally, we retained a small vendor V team that owns the really hard-to-automate kinds of tests, like config tests: TFS being deployed on-prem in many different configuration environments. But again, over the last three years this team has constantly shrunk, because every year we ask the question: why do we need so many vendors to do this manual testing? Let's go automate it, or let's figure out a different way of running those tests. So that's the end of what happened in terms of changing the quality ownership and accountability.
So, good question about the Tenet Champs V team: what do we mean by the word tenet? Yeah, so performance is a tenet, accessibility is a tenet. Like an aspect? Yes, an aspect of the product, there you go; an attribute of the product, a quality.
The other question was: I heard from Scott Guthrie in a presentation some time ago about the move to everything being done through the command line; did that help in automating some of those things the vendor team was having to do manually? No; the vendor team today is doing things like this: we have TFS on-prem, and it can be deployed in so many different configurations that, while we could automate that in theory, it would require a significant amount of investment. For things like that, where the cost of automating is significantly more than the cost of just running the tests manually, it's a matter of trade-off.
That's right; initially it was a matter of survival, because remember, we came from a world where half the team was basically writing and running tests to a world where suddenly that responsibility is distributed out to the org. So initially, as we inventoried the whole list of things the test team was doing, we knew there was a good chunk of manual testing, and even then the test team used to retain a vendor team that would run these kinds of hard-to-automate tests. We didn't want that to drop on the floor. The key criterion going through this was: we are shipping TFS 2015 in six months, and it needs to be of as good quality, if not better, than in the previous model. Not only that, we would continue to ship to the cloud every three weeks. So in going through this, the key criterion was that nothing should fall on the floor, nothing should slip through the cracks. Even if we were doing something that was not optimally designed or efficient, we just continued to run it in the new world until we figured out a way to do it better. Does that make sense? The initial approach was: take whatever we have, refactor it, give it to a different set of people, meaning give it to the feature team, but don't drop it on the floor. Even if it looks questionable, like, why are we running this test, it's not adding any value, just keep running it for now until you figure out that there is a different way to do it.
Yeah, question: if I understand well, these are like virtual teams, so does this mean these people have other assignments? And if so, what was the ratio between their capacity for these assignments and the other things they do?
they do yeah so these are the same
people that we have in the feature teams
these in and so let's take a specific
example let's take accessibility for
accessibility we have subject matter
expert by location so in Redmond we have
two people who are some accessibility
except export and we have another couple
people in North Carolina our locations
and maybe another couple people in India
they they are deep expert in
accessibility techniques philosophy etc
but they are engineers inside a
particular feature team they just happen
to have this secondary responsibility
accessibility happens to be one of those
tenets that requires quite a bit of it's
not work but there's a quite a bit of
responsibility on that subject matter
expert so the person who's the
accessibility champ probably spends half
their time doing accessibility and half
the time doing feature work but the idea
is that that responsibility also over
time rotates so it's not the same person
every single sprint we pitched it to the
same person for a given release so TFS
2018 there's 1% maybe TF is 2019 we'll
try to give that respond you to somebody
else so nobody is doing this for for
life if you will
in the number of experts vary by by the
tenant that we are talking about I
performance via team I think it has
about dozen people yeah the question so
Yeah, the question: in this model I see that an individual is now probably taking care of many things. One who did testing had to take care of so many tenets and things, and now as a developer he has other responsibilities, so I imagine you manage that by maybe making smaller features and things like that. Did you have any challenges with end-to-end coverage, because now my responsibility as an individual is even smaller? How did you manage that end to end, and were there any gaps unveiled by this?

So, if I understand your question correctly: yes, on the surface it looks like your engineer is now doing twice the amount of work in the feature team, because previously you had a dev and a tester paired up, each doing half, the feature work or the testing work, and now one engineer is responsible for the whole feature. But remember, if I am the manager of the feature team, I now have twice as many engineers as I had before, so the net capacity hasn't changed; net capacity is still the same, or it may even give you more, one of the two.