Sunday, October 26, 2008

The Ultimate Continuous Integration Tool

Warning: this entry is out there. I cannot be blamed for wasting your brain cycles. However, by taking things to the extreme, some value may be gained.

I'd like to continuously integrate all the code I write, and I want this to happen very continuously. That would happen if my code were integrated on every keystroke.
My IDE, the build machine (CI server), and the source control system will be very smart. Every time I press a key on the keyboard, they will mark a revision of the full code base and start running the build. Let's call this code revision a candidate revision (CR). The build, of course, includes compiling the code, running all unit tests, perhaps even functional tests, and whatever else we want to include. If the build passes, the CR is automatically checked into source control. If the build doesn't pass, the candidate revision is quietly discarded. Obviously, since I won't stop pounding on the keyboard, the very powerful build machine will be running multiple builds at the same time. After all, computers are not yet so advanced that we should expect the build machine to run builds instantly. Oh well.
At the same time, the system will be doing the reverse: it will be continuously merging all successful revisions from source control (SC) into my local code base, since everyone else is productively writing code at the same time as I am.
How could this work? Well, my IDE will be tightly integrated with the super-powerful CI server. The CI server will be the single authority deciding whether a build passes, and thus whether its revision is checked into source control. The CI server will be running many builds at the same time, but will act as if all the changes occurred sequentially.
The CI server will be receiving a stream of candidate revisions (ok, just deltas) from multiple developer machines. It always starts from a successful source code revision (CR0). Based on the order in which a candidate revision (CR1) is received, it will run the build and determine whether it should be checked into SC. At the same time, the CI server is also processing other candidate revisions (CR2, CR3, ...) from myself and other developers. The CI server will be running multiple builds at the same time: a build for CR1, another for CR2 on top of CR1, another for CR2 on top of CR0 in case CR1 fails, and so on.
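
To make the idea concrete, here is a toy sketch of the check-in semantics the server guarantees. Everything in it is hypothetical: the revision representation, the delta format, and the run_build stand-in for compiling and testing. It processes candidates one at a time; a real server would run the speculative builds (CR2 on top of CR1, and CR2 on top of CR0 in case CR1 fails) in parallel, and simply discard whichever branch turns out to be irrelevant.

# Toy sketch of the CI server's check-in semantics. All names are
# hypothetical; a revision is modeled as the list of deltas applied
# on top of CR0, and run_build() stands in for compiling + all tests.

def apply_delta(base, delta):
    """Produce a full candidate revision from a base plus one delta."""
    return base + [delta]

def process_candidates(cr0, deltas, run_build):
    """Consider each candidate delta, in arrival order, on top of the
    latest revision known to be good; check in the ones that pass."""
    head = cr0                      # last successful SC revision
    history = [cr0]
    for delta in deltas:
        candidate = apply_delta(head, delta)
        if run_build(candidate):
            head = candidate        # becomes the base for what follows
            history.append(candidate)
        # else: the candidate revision is quietly discarded
    return history

# Example: the build fails whenever delta "d2" is present, so "d3"
# ends up being built, and checked in, on top of "d1" alone.
history = process_candidates([], ["d1", "d2", "d3"],
                             run_build=lambda rev: "d2" not in rev)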

This would be really nice. Don't you think?

Sunday, October 19, 2008

ssh connection with public/private key pair not working on Leopard?

Solution: try passing the private key file to the ssh command using -i:

ssh -i identity_file user@server

The problem
I tried setting up a DSA public/private key pair on Leopard, but I didn't accept the default private key file name. Because the key wasn't stored under one of the names ssh looks for by default (such as ~/.ssh/id_dsa), ssh never tried it: I wasn't prompted for the key's passphrase, and was instead prompted for the normal username/password.
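
To make the fix permanent, you can also point ssh at the key in ~/.ssh/config, so you don't have to pass -i on every connection. A minimal sketch, reusing the placeholder names from the command above (adjust the path to wherever you saved the key):

Host server
    User user
    IdentityFile ~/.ssh/identity_file

With that in place, a plain "ssh server" should try the key and prompt for its passphrase.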

Project Development Knowledge: Sharing and Enduring

The previous entry introduced the problem. The following is a discussion of practices that can help address it.

Standups

Daily standup meetings [1], aka daily scrums, are a great way for the team to share information about the current tasks being implemented by different team members. It's a quick forum, at the end of which each team member will have a basic knowledge of what the other team members are up to. This can be very effective in eliminating duplicate effort, as well as helping team members relate to other work affecting their own. Team members who notice the potential for cooperation and knowledge sharing during the standup should get together after the meeting for a further, more detailed examination of their work.

Pair programming

Controversial as it may be, this practice is very effective in achieving higher team productivity through continuous knowledge sharing. The risk of knowledge leaving the team is greatly reduced. And if we enhance this practice with frequent pair rotation, it's even more effective, as it spreads more knowledge throughout the development team and encourages cross-pollination of ideas.

Colocation

Having to communicate information to other team members highlights a hand-off transition that is better avoided. It's much better if the whole team is there, witnessing and participating firsthand in the effort underway, rather than being told afterwards what happened. Colocation of the team, or of as much of the team as possible, is very effective at eliminating that need. And when information is communicated within a colocated team, it's at least an order of magnitude more effective, efficient, and complete, compared to other means.

Code reviews

When a pair spends a day modeling a piece of functionality, or refactoring a key part of the system, they are likely to want to tell other team members about it. Invariably, everyone on the team is interested in knowing what others are doing, and how certain problems are being addressed. While pair rotation helps here, getting the whole team to participate in a code review session extends some of this benefit to a wider audience. It's also a great forum for seeking guidance, sharing opinions, and exploring novel ways to address issues.

Automation

Automating a task is an excellent way of sharing how it's done. Automation enables other team members to accomplish the task, and also serves as documentation of how to accomplish it. For example, instead of asking you how to query for the balance sheet, I can either use an automated script to get the information, or learn from the script how it's done.
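
As an illustration, here is a minimal sketch of such a script. The database file and the ledger table are entirely hypothetical; the point is that the script both performs the task and records how the task is done:

# balance_sheet.py: a hypothetical sketch of an automated query.
import sqlite3

def balance_sheet(db_path="finance.db"):
    """Return (account, balance) rows; schema and file are made up."""
    conn = sqlite3.connect(db_path)
    try:
        return conn.execute(
            "SELECT account, SUM(amount) AS balance "
            "FROM ledger GROUP BY account ORDER BY account"
        ).fetchall()
    finally:
        conn.close()

if __name__ == "__main__":
    for account, balance in balance_sheet():
        print("%s: %.2f" % (account, balance))

Anyone on the team can now run the script to get the answer, or read it to learn where the numbers come from.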

Self documenting, readable code

The word "code" can be considered a misnomer in this regard. We'd like to have code that does not require deciphering. We'd like to be able to know what the code is doing, and how it's doing it, clearly and with minimal effort. Herein lies an argument against clever programming techniques that make it harder to see intent and side effects. Use code reviews to highlight less-than-obvious techniques, and have the team work out what it's most comfortable with. For example, if multiple team members are having trouble with a piece of code that uses reflection, start by showing more members how the piece works, and if necessary, consider an alternative implementation. Enabling higher team effectiveness is worth more than programming cleverness.
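
To make the reflection example concrete, here is a contrived sketch; the event kinds and handler names are hypothetical. Both functions do the same thing, but in the second, a reader can see every possible path, and the failure mode, at a glance:

class OrderHandlers(object):
    def on_created(self, order):
        print("created %s" % order)
    def on_shipped(self, order):
        print("shipped %s" % order)

# Clever: the method name is assembled at runtime, so a reader searching
# the code base for callers of on_shipped() will find none.
def dispatch_reflective(handlers, kind, order):
    getattr(handlers, "on_" + kind)(order)

# Obvious: explicit dispatch; intent and side effects are visible.
def dispatch_explicit(handlers, kind, order):
    if kind == "created":
        handlers.on_created(order)
    elif kind == "shipped":
        handlers.on_shipped(order)
    else:
        raise ValueError("unknown event kind: %s" % kind)

dispatch_explicit(OrderHandlers(), "created", "order #42")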

Good check-in comments

These can go a long way in telling the story of project development. A check-in delta tells you what changed; the comment tells you why. When a developer has to go through source code history to understand the rationale behind a certain change or design decision, these comments can be very helpful. Try to be helpful to the consumer of the comment. For example, instead of typing "implemented story #123", provide more information: "Story #123: added capability to classes x and y to access context z to achieve a and b."

Wikis

Project wikis have been around for a while. Dare I say that they are the norm these days? Wikis are an excellent source of the day-to-day knowledge needed by the team: for example, database connection information, URLs to local servers and useful documentation, pointers to tools, project- and domain-specific acronyms, etc. They are also useful for documenting repetitive development tasks that have yet to be automated, or are difficult to automate.
One particular use that I wish to highlight is documenting errors encountered during development, whether or not they have been corrected. For example, if we notice an exception while starting a server, we should start by documenting that fact, then add the solution once we figure it out. Certain problems have a tendency to recur, while others are unlikely to occur again once fixed. Nonetheless, a similar issue might arise later, and the fix may prove useful beyond its original use.

A searchable, archived email list

I have not seen this one on many projects. However, a sizable portion of project information is exchanged over email: feature discussions between developers and BAs, the resolution of design issues, technical announcements such as the introduction of new build tools or code libraries, and project course-changing decisions, to name a few. An automatically archived, searchable, project-specific email list can be very valuable for future development. Using this knowledge base can aid in understanding why things came to be the way they are.


This list isn't exhaustive, and each item deserves its own discussion; I meant to hint and introduce. None of these practices precludes another, and you can attempt all of them.
A word of warning though: do not overdo any of these, and keep the practices light. After all, we are after agility. Don't let these, or any other practices, slow down the gemba.

As usual, I welcome, and appreciate, the reader's feedback.


[1] http://martinfowler.com/articles/itsNotJustStandingUp.html

Saturday, October 11, 2008

Keeping Project Development Knowledge: The Problem

This is a two-part entry. This entry introduces the problems that project teams face when development knowledge is lost. The second part will examine various practices and techniques that can be used to tackle these problems.

By project development knowledge, I'm referring to the multitude of information required, and acquired, by the development team that pertains to the project. This includes technical knowledge concerning languages and tools, as well as development methodology, processes, the business domain, etc.

Is there a problem? To help answer this question, consider the following events:
  • A long time team member is leaving the team.
  • An issue is identified with the software that the team is building. We know we've encountered this problem before. If only we could remember how we solved it.
  • A new team member is joining the team, and he's started asking questions about the project, or is encountering issues setting up his environment, and we have to rely on memory, or on veteran team members, to answer these questions.
  • A BA notices that he is being asked the same question more than once by different team members.
  • The tech lead is noticing that the team members are not following coding standards, time and again.
  • Two team members are working separately on fixing the same problem.
  • A new team member is brought in to take care of a code module that no other team member knows about. For example, when code is inherited from another software company after their contract has ended, when a team member replaces another who was working alone on a piece of code, or when a certain area of the code has been dormant for a long time and now requires changes.

These events represent times when project information is missed. This can be attributed to the following reasons:
  • Information leaves the team, along with parting team members.
  • Information is forgotten, and thus needs to be reproduced.
  • The information exists, but is not readily accessible; the information is not easily communicated.
  • The information exists, but we don't know that.

How do we tackle these problems? The next entry will discuss approaches that can help address some of them.

Sunday, September 28, 2008

Building and Maintaining Outstanding Systems

A quote from "The Leader's Handbook", by Peter Scholtes, under 'Outstanding Systems versus Outstanding People', subtitle 'What do leaders do?':

Seek to create and maintain outstanding systems. The ideal is: outstanding systems, achieving excellent results, with the ordinary efforts of average people.

Pretty insightful, challenging, and most notably, sustainable.

Friday, August 22, 2008

Incremental Release Always Advisable? Not so fast!

The benefits of the Agile practice of incrementally releasing software are well established. However, there are certain situations where it may not be the best fit for your client. This post considers one such scenario.

In our example, the client is replacing an existing old system, which is currently being used to run the client's business. The client is launching a new project to adapt a new software package to address many of the shortcomings of the current system. However, the package will require a non-trivial amount of customization. At this point, there are two options:
1- Implement all the customizations necessary to the new package, migrate the data, then deploy the new package into production, and start using it instead of the old application.
2- Implement a minimal set of customizations to the new package, and release that into production. Use the new customized functionality in the new system, while completing the business process in the old system. Once the functionality has stabilized, customize the next step in the business process, deploy it to production, and so on.

The benefits of the second approach are well established, and there is no need to reiterate them. I'd like to concentrate on some of the challenges of this option.

To aid the discussion, assume that the business process for the client includes the general steps A, B, and C, and that we start by customizing the new package to fulfill the needs of step A in the process, continuing with B and C in the old system. To be able to do this, we have to perform the following:
  • Transfer the data captured in the new system to the old system, after completing step A.
  • Figure out how to trigger the data transfer from the new system to the old: at which event should the data be transferred to the old system?
  • Decide what to do if the data changes in the new system after it has been transferred to the old system: should we push the data to the old system every time it changes? If not, at which events should this happen? Is it acceptable to prevent further data changes in the new system after the transfer?
  • Work out the changes to the reporting needs of the client: Which system is able to provide which pieces of the data?
  • Figure out the implications of having the data unavailable in the old system until it has been transferred. For example, data capturing the requirements of step A might be entered in the new system, but will not be available in the old system until it has been transferred. Will this have any implications for the business?
  • Work out the changes to the client's business process. For example, is there double data entry required to maintain the consistency of the configuration data? How and when should users use the new system as opposed to the old system?

We also may have to answer the following questions:
  • If a data validation step within step A requires knowledge of previous transactions on related data from the old system, how will it be performed?
  • Is there a need to sync data changes from the old system back to the new system?

There are also requirements on data maintenance. The data in the old and the new systems must be kept in sync; otherwise, the syncing operations will fail to map the data to the correct business entities.

It should also be noted that the new system may not be amenable to syncing out of the box, and that certain changes to the way it functions may be necessary to allow the integration to work properly. For example, we may need to introduce a new status to mark whether or not the data has been, or is available to be, transferred to the old system.
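
As a sketch of that last point, consider the following. Everything here is hypothetical: the record shape, the status values, the choice of "step A completes" as the trigger event, and the policy of locking a record after transfer rather than re-pushing it on every change:

class OldSystemGateway(object):
    """Stand-in for whatever import mechanism the old system offers,
    such as an import API or a staging table."""
    def import_record(self, record_id, data):
        print("transferred %s to the old system" % record_id)

TRANSFER_PENDING = "pending"
TRANSFER_DONE = "done"

class StepARecord(object):
    def __init__(self, record_id, data):
        self.record_id = record_id
        self.data = data
        self.transfer_status = TRANSFER_PENDING  # the new status field

def complete_step_a(record, old_system):
    """The event we chose to trigger the transfer: step A finishing."""
    if record.transfer_status == TRANSFER_DONE:
        # Policy decision: lock edits after transfer. Re-pushing on
        # every change is the alternative discussed above.
        raise RuntimeError("record already transferred; edits are locked")
    old_system.import_record(record.record_id, record.data)
    record.transfer_status = TRANSFER_DONE

complete_step_a(StepARecord("A-17", {"amount": 100}), OldSystemGateway())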

Implications of the phased release approach
Given the above, we can see that the second approach has the following implications, as opposed to the first approach:
  • The client can only do limited data restructuring or data cleanup in the new system, because extensive changes may break, or make very difficult, the process of mapping data between the new and the old systems.
  • It also follows that some of the benefits of moving to the new system will be delayed until all the steps in the process have been successfully moved over.
  • In a similar vein, the business benefits and process improvements that could be gained by moving to the new system will be delayed, because the new system will be constrained by the need to synchronize data with, and conform to the business process imposed by, the old system.
  • The old system will need to be up and running for a longer period of time, because building the customizations to the new system plus coding the synchronization logic will take longer than simply implementing the customizations and then going live. I am assuming here that the team size remains the same.
  • A careful consideration of the client's budget is due. If the budget is tight, there may not be enough slack to accommodate the extra effort required to add the integration functionality between the new and the old systems.

And you should also consider:
  • The cost to the business, in terms of delayed ROI and time-to-market benefits, resulting from delaying the release until all the necessary functionality is developed in the new system. Your client may not have sufficient incentive to warrant that cost.
  • The risk of deploying all the functionality of the new system in one step. It may be the case that your client has ways of reducing the risk to the business that make the risk/reward balance weigh in favor of this option. For example, the client may have acceptable manual processes, or may be able to revert to the old system, if part of the functionality in the new system encounters issues.

Conclusion
There are certain situations where an incremental release to production may not be advisable. Although there is evident value associated with releasing fast, the cost and risks associated with this option may outweigh the benefits. Other approaches may weigh better on this scale. This entry introduced a situation where there are factors at play that should be considered before choosing an approach. Such factors include the client's budget, the benefits of going live faster, the cost of developing the integration code, the cost of delaying the release, and the risk in releasing all the functionality of the new system in one step.