Distributed Design and Delivery

In modern software, I like to work towards a general set of assumptions I call “distributed design and delivery”.

It’s a little buzzword-ish, but it clearly states what I think are safe global assumptions to work towards. I’m going to use “D3” to refer to both the process and the SRE/DevOps teams currently involved in or managing these things.

The general idea is to build groups that not only fully own and understand their own domains, but also stay in communication with other groups about the overall design and expectations for delivery.

If we want our developers to be good practitioners of distributed design and delivery, then we need to lead by example and provide solutions that are usable, measurable, and valuable for them.

I’d like to take a moment and clarify what each part of that name means, and the baseline assumptions for those things.

Distributed

Any group in the organization can do a thing, and advertise it as a service to any other group.

Decision making and knowledge need to be loosely held, while all groups are kept aware of the overall strategy and goals for their product.

Decentralized does not mean disorganized, so best practices and shared services should be widely socialized and agreed upon by every autonomous group.

Design

There is a clear vision and communicative customer for the final product we deliver, and all groups involved can adjust and notify others of alterations to the design.

The customer is directly available to answer questions, and decisions should be documented in a way that’s easy to query and visible for every stakeholder.

Delivery

Making the product available in the manner we promised, when we promised, by providing clear control of the levers needed to bring a blessed product to market in the smallest and most manageable increments.

General Philosophy and Reasoning

A lot of teams engaged in digital transformation try to take on too much, and then see the effort fail to make an impact or, worse, become more overhead in a process that’s already overloaded.

This can lead to negative feedback loops around the very behaviors we want to encourage. If the new process and culture don’t make things easier right away, we need to acknowledge that, measure it, and adjust based on the feedback.

I tend to find most technical problems are trivial, and I tend to find most process and culture problems novel.

What this means in practice is that the technical implementations of things can be well understood only when the process and organizational communication beneath them foster sustainable and reliable output.

I liken this to Anna Karenina, in that all happy organizations are providing the fundamental foundation for software delivery and will likely have an easy time doing technical things, but unhappy organizations are unhappy in different ways.

Each unhappy organization will face different issues and concerns than other organizations in making those foundational things reliable, and should therefore start changing its process and culture incrementally, with clear priorities and goals.

Avoiding Constraints

If a DevOps or SRE team is the key holder for specific pieces of technology or process, then that team is now a constraint on scale for the organization.

Generally, there are many more developers than D3 people, so scaling developers is much more granular than scaling D3 staff.

This will lead towards skill gaps, inefficiency, and eventually underperformance of teams that are now constrained.

Decentralizing access and basic use of a thing is generally good, but that doesn’t mean that everyone is on their own. Having Subject Matter Experts in different domains is great, so long as the normal process doesn’t require their direct contribution for other teams to do their jobs.

Many times, I find that “expertise” becomes more about organizational and legacy knowledge than about mastery of the D3 principles, and that introducing incremental changes towards that goal helps make things move faster with less human intervention.

Focus on Shared Objectives, Own Your Part

Instead of being customer focused and goal oriented, teams should be customer oriented and goal focused.

This sounds trite, but if teams are focusing on the external inputs rather than their part of the internal output, then forecasting and feedback attenuate quickly when abstraction of tasks doesn’t give an equally simplified abstraction of work.

By having the person making the request for the thing directly involved with those making it, we remain focused on doing the immediate thing, while understanding the overall goal and general guidelines in case something unexpected happens.

In practice, this tends to push organizations to limit breaking up of units of work across enormous domains, in essence making one large development team that builds half the platform, but several ops teams focused on small functional slices of the operational work.

Real Life Is Messy

Unexpected work and bad process will always be an issue — if existing processes were working well we would not be so obsessed with digital transformations.

So when planning team structure and communication lines to start defining inputs, outputs, and functional domains for work, try and encourage teams to make baby steps from where they are rather than define exactly how they’re going to get to the strategic goal.

To start with, just try and remain consistent in every new promise or assumption advertised to other teams, and if that becomes too much, take a step back, or even stop making that advertisement.

The whole underpinning of a functional production model should be trust. By acknowledging problems and unexpected new discoveries, we show others that we tolerate risk taking and rolling back as needed.

Don’t Do Work You Don’t Have To

Too many DevOps and SRE practices are overly focused on specific things which don’t actually impact the delivered product, like building pipelines that allow for excessive variability without waiting to be asked for those cases.

Getting a big request and committing to trying just one small spike of that should be encouraged, and if it doesn’t work we adjust around to solve it as a cohesive system.

Many times, we’ll find that a use case with basic assumptions and limited configurability is sufficient for most work, and we can then iterate, using live feedback to measure the impact and delta of new features.

Always Measure Everything

A major part of most unhappy organizations’ problems stems from a lack of objective measurements, and a lack of agreement on what those should look like.

A team which is sprinting towards a short term goal needs to be aware of what other teams are doing, and how their work impacts those other teams.

To do so, we need to make sure that part of the deliverable from development to ops includes relevant metrics and monitoring, thresholds and alert settings, and that these measurements are advertised in a way that is naturally accessible to those who need them.

Too often, unhappy organizations will have developers make the thing, throw it to QA who tests the thing, who throws it to Release who deploys the thing, and then to Ops who manages the thing.
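As a rough illustration of what shipping measurements with the deliverable could look like, here is a minimal Python sketch. The schema, the class names (MetricSpec, Deliverable), the "checkout" service, and the threshold values are all hypothetical, invented for this example rather than taken from any particular tool.

```python
from dataclasses import dataclass, field


@dataclass
class MetricSpec:
    """One measurement a service promises to expose (hypothetical schema)."""
    name: str
    unit: str
    alert_above: float  # alert when the observed value exceeds this threshold


@dataclass
class Deliverable:
    """A release handed from development to ops, with its measurements attached."""
    service: str
    version: str
    metrics: list[MetricSpec] = field(default_factory=list)

    def breached(self, observed: dict[str, float]) -> list[str]:
        """Return names of metrics that are missing or over their threshold."""
        problems = []
        for spec in self.metrics:
            value = observed.get(spec.name)
            # A metric nobody is reporting is treated as a problem too.
            if value is None or value > spec.alert_above:
                problems.append(spec.name)
        return problems


# Illustrative values only; real thresholds would be agreed between teams.
release = Deliverable(
    service="checkout",
    version="1.4.0",
    metrics=[
        MetricSpec("p95_latency_ms", "ms", alert_above=300.0),
        MetricSpec("error_rate", "ratio", alert_above=0.01),
    ],
)
```

Because the thresholds travel with the release, ops can call release.breached(...) on live readings without having to ask the development team what "healthy" means for this service.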

Segmenting these domains isn’t necessarily a bad thing, but if the teams involved don’t have a clear expectation of the inputs and outputs, and have ways to measure the success of their part of the process, then risk is introduced at every handoff.

Ambiguity is the enemy of quality, so having clearly defined measurements which we can use to ensure that our work is not impacting reliability is essential.
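One way to make handoff expectations explicit is to have each team declare what it needs and what it produces, then check the chain for gaps. The sketch below is only an assumption about how that might be modeled; the Stage type, the handoff_gaps function, and the artifact names are made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    """A team in the delivery chain with its declared inputs and outputs."""
    name: str
    inputs: frozenset[str]
    outputs: frozenset[str]


def handoff_gaps(stages: list[Stage]) -> dict[str, set[str]]:
    """For each stage, list required inputs that no earlier stage produces."""
    available: set[str] = set()
    gaps: dict[str, set[str]] = {}
    for stage in stages:
        missing = set(stage.inputs) - available
        if missing:
            gaps[stage.name] = missing
        # Everything this stage produces is available to later stages.
        available |= stage.outputs
    return gaps


# Hypothetical chain mirroring the dev -> QA -> Release -> Ops handoffs.
pipeline = [
    Stage("dev", frozenset(), frozenset({"build", "unit_test_report"})),
    Stage("qa", frozenset({"build"}), frozenset({"test_report"})),
    Stage("release", frozenset({"build", "test_report"}), frozenset({"deployment"})),
    Stage("ops", frozenset({"deployment", "alert_thresholds"}), frozenset()),
]
```

Here handoff_gaps(pipeline) reports that ops expects alert_thresholds but nobody upstream has promised to deliver them, which is exactly the kind of ambiguity that otherwise surfaces only after a handoff fails.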

Final Thoughts

Doing good work when the foundation is in place is the norm, but until the services and knowledge are at a level where teams can easily consume and improve them, it will require heroic efforts to maintain or scale delivery for unhappy organizations.

By keeping the focus on doing smaller incremental changes before undertaking major transformative projects, organizations can get themselves in a better place to have the bandwidth and reliability to make substantial changes.

In many cases, the biggest barriers to digital transformation are legacy processes, particularly the problems and burnout that emanate from those processes in everyday use.

Solving those issues requires that teams be given the agency and autonomy to do whatever they think is best to do their jobs, while ensuring communication and visibility to allow external stakeholders to know what they need to do their jobs.
