Articles & research papers

Usability Engineering

Pitting Usability Testing Against Heuristic Review

Consider this scenario: You are managing the Intranet applications for a large company. You've spent the last year championing data-driven (re-)design approaches with some success. Now there is an opportunity to revamp a widely used application with significant room for improvement. You need to do the whole project on a limited dollar and time budget. It's critical that the method you choose models a user-centered approach that prioritizes the fixes in a systematic and repeatable way. It is also critical that the approach you choose be cost-effective and convincing. What do you do?

Independent of the method you pick, your tasks are essentially to:

  • Identify the problems
  • Prioritize them based on their impact on use
  • Prioritize them based on time/cost benefits of fixing the problems
  • Design and implement the fixes
    In this situation, most people think of usability testing and heuristic (or expert) review. Empirical evaluations of the relative merits of these approaches reveal strengths and drawbacks for each. Usability testing is touted as the optimal methodology because its results are derived directly from the experiences of representative users… The tradeoff is that coordination, testing, and data reduction add time to the process and increase the overall labor and time cost of usability testing… Accordingly, proponents of heuristic review point to its fast turnaround and cost-effectiveness… On the downside, there is broad concern that the heuristic criteria do not focus evaluators on the right problems (Bailey, Allan and Raiello, 1992). That is, simply evaluating an interface against a set of heuristics generates a long list of false-alarm problems, but it doesn't effectively highlight the real problems that undermine the user experience.

    There are many, many more studies that have explored this question. Overall, the findings of studies pitting usability testing against expert review lead to the same ambivalent (lack of) conclusions.

    Pitting Usability Testing Against Heuristic Review (The link leads to a cached Google page since the original is dead; it is a good piece of content nonetheless.)

    Severity Ratings for Usability Problems

    Severity ratings can be used to allocate the most resources to fix the most serious problems and can also provide a rough estimate of the need for additional usability efforts. If the severity ratings indicate that several disastrous usability problems remain in an interface, it will probably be inadvisable to release it. But one might decide to go ahead with the release of a system with several usability problems if they are all judged to be cosmetic in nature.

    The severity of a usability problem is a combination of three factors:

    • The frequency with which the problem occurs: Is it common or rare?
    • The impact of the problem if it occurs: Will it be easy or difficult for the users to overcome?
    • The persistence of the problem: Is it a one-time problem that users can overcome once they know about it or will users repeatedly be bothered by the problem?

    Severity Ratings for Usability Problems
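The three factors above can be combined into a single rating in many ways. A minimal sketch is below; the 0–4 output scale matches the common severity-rating convention, but the averaging scheme itself is an assumption for illustration, not a formula from the article:

```python
# Sketch: combine the three severity factors into one 0-4 rating.
# Each factor is rated 0 (not a problem) to 4 (severe); the simple
# averaging scheme here is an illustrative assumption.

def severity(frequency: int, impact: int, persistence: int) -> int:
    """Return a 0-4 severity rating from three 0-4 factor ratings."""
    for factor in (frequency, impact, persistence):
        if not 0 <= factor <= 4:
            raise ValueError("each factor must be rated 0-4")
    return round((frequency + impact + persistence) / 3)

# A common problem that is hard to overcome but learnable:
print(severity(frequency=4, impact=3, persistence=1))  # → 3
```

In practice a team would calibrate such a score against expert judgment rather than trust the arithmetic alone.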

    The User-Reported Critical Incident Method for Remote Usability Evaluation

    Because of this vital importance of critical incident data and the opportunity for users to capture it, the over-arching goal of this work is to develop and evaluate a remote usability evaluation method for capturing critical incident data and satisfying the following criteria:

    • tasks are performed by real users
    • users are located in normal working environments
    • users self-report own critical incidents
    • data are captured in day-to-day task situations
    • no direct interaction is needed between user and evaluator during an evaluation session
    • data capture is cost-effective
    • data are high quality and therefore relatively easy to convert into usability problems

    Several methods have been developed for conducting usability evaluation without direct observation of a user by an evaluator. However, none of these existing remote evaluation methods (nor even traditional laboratory-based evaluation) meets all the above criteria. The result of working toward this goal is the user-reported critical incident method, described in this thesis.

    The User-Reported Critical Incident Method for Remote Usability Evaluation (PDF, 1.8 MB)
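A self-reported critical incident could be captured as a simple structured record. The field names below are a hypothetical sketch inferred from the criteria listed above, not the actual schema used in the thesis:

```python
# Hypothetical sketch of a user-reported critical incident record.
# Field names are assumptions inferred from the method's criteria,
# not the exact schema from the thesis.
from dataclasses import dataclass, field
from datetime import datetime

@dataclass
class CriticalIncident:
    task: str              # the day-to-day task the user was performing
    description: str       # the user's own account of the incident
    severity: int          # the user's rating, e.g. 1 (minor) to 5 (critical)
    timestamp: datetime = field(default_factory=datetime.now)

incident = CriticalIncident(
    task="submitting an expense report",
    description="The Save button was disabled and gave no explanation.",
    severity=4,
)
print(incident.task)  # → submitting an expense report
```

Because the user fills in the record alone, no evaluator needs to be present during the session, which is what makes the method remote and cost-effective.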

    Preference and Desirability Testing: Measuring Emotional Response to Guide Design

    An important role of visual design is to lead users through the hierarchy of a design as we intend. For interactive applications, a sense of organization can affect perceived usability and, ultimately, users' overall satisfaction with the product.

    What stakeholders should be able to say is, "We should go with design C over A and B, because I feel it evokes the right kind of emotional response in our audience that is closer to our most important brand attributes."

    Opinion: There Is No Mobile Internet

    It’s time to stop thinking about the Internet and online communication in the context of a device, be it desktop, tablet or mobile. Advances by Google and Apple have heightened consumer expectations, which now require stricter focus from us to create seamless online communications — communications that work everywhere and that get their point across. We need to embrace a device-agnostic approach to communicating with connected consumers and forget the idea of a "mobile Internet". There is only One Web to experience.

    There Is No Mobile Internet

    The Mobile Playbook from Google

    Mobile is more central to business success than ever before. Most executives know this, but they get hung up on exactly what to do and how to do it.

    The second edition of Google's The Mobile Playbook offers the latest best practices and strategies for winning in mobile, such as how to address the price-transparency challenge and face showrooming head-on, the age-old question of when to build a mobile website versus a mobile app, and what it really means to build multi-screen marketing campaigns.

    The Mobile Playbook

    Usability and User Experience Surveys

    According to Perlman (2009), "Questionnaires have long been used to evaluate user interfaces (Root & Draper, 1983). Questionnaires have also long been used in electronic form (Perlman, 1985). For a handful of questionnaires specifically designed to assess aspects of usability, the validity and/or reliability have been established, including some in the [table below]."

    This wiki has a list of generic usability survey instruments that can be adapted to specific websites. Often, it is good enough to replace the word "system" with "website". More than 15 questionnaires are listed here.

    Usability and user experience surveys
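The word substitution the wiki suggests is mechanical enough to script. As an illustration, the sketch below adapts the first two System Usability Scale (SUS) items; the item wording is standard SUS, while the helper function is just an example:

```python
# Adapt generic usability questionnaire items to a specific website
# by replacing "system" with "website", as the wiki suggests.
# The two items below are standard SUS wording; the helper is illustrative.
sus_items = [
    "I think that I would like to use this system frequently.",
    "I found the system unnecessarily complex.",
]

def adapt(items, old="system", new="website"):
    """Return a copy of the items with the generic term swapped out."""
    return [item.replace(old, new) for item in items]

for item in adapt(sus_items):
    print(item)
# → I think that I would like to use this website frequently.
# → I found the website unnecessarily complex.
```

Note that rewording a validated instrument can, in principle, affect its established reliability, which is one of the FAQ topics in the next entry.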

    How Pocket Built a Research Lab for Mobile App Testing in Just a Few Hours

    You’re ready to run a user study for your product. You’ve learned how to recruit participants, write an interview guide, interview people, and summarize results. But there’s just one problem: you don’t have access to a research lab. Learn how Pocket built a lightweight research lab for mobile app testing in their office.

    How Pocket Built a Research Lab for Mobile App Testing in Just a Few Hours

    Questionnaires in Usability Engineering: A List of Frequently Asked Questions

    The list on this page is a compilation of the questions the author has gotten on the use of questionnaires in usability engineering. Questions include:

    • What is a questionnaire?
    • Are there different kinds of questions?
    • What are the advantages of using questionnaires in usability research?
    • What are the disadvantages?
    • How do questionnaires fit in with other HCI evaluation methods?
    • What is meant by reliability?
    • What is meant by validity?
    • Should I develop my own questionnaire?
    • What's wrong with putting a quick-and-dirty questionnaire together?
    • Factual-type questionnaires are easy to do, though, aren't they?
    • What's the difference between a questionnaire which gives you numbers and one that gives you free text comments?
    • Can you mix factual and opinion questions, closed and open-ended questions?
    • How do you analyse open-ended questionnaires?
    • What is a Likert-style questionnaire? One with five response choices to each statement, right?
    • How can I tell if a question belongs to a Likert scale or not?
    • How many response options should there be in a numeric questionnaire?
    • How many anchors should a questionnaire have?
    • My respondents are continually complaining about my questionnaire items. What can I do?
    • What other kinds of questionnaires are there?
    • Should favourable responses always be checked on the left (or right) hand side of the scale?
    • Is a long questionnaire better than a short one? How short can a questionnaire be?
    • Is high statistical reliability the 'gold standard' to aim for?
    • What's the minimum and maximum figure for reliability?
    • Can you tell if a respondent is lying?
    • Why do some questionnaires have sub-scales?
    • How do you go about identifying component sub-scales?
    • How much can I change the wording of a standardised opinion questionnaire?
    • What's the difference between a questionnaire and a checklist?
    • Where can I find out more about questionnaires?

    Questionnaires in Usability Engineering: A List of Frequently Asked Questions

    Five Critical Quantitative UX Concepts

    As UX continues to mature, it's becoming harder to avoid using statistics to quantify design improvements... Here are five of the more critical but challenging concepts. The author didn't just pick arbitrary geeky stuff to stump math geeks (or get you an interview at Google); these are fundamental concepts that take practice and patience but are worth the effort to understand.

    1. Using statistics on small sample sizes: You do not need a sample size in the hundreds or thousands, or even above 30, to use statistics. The author regularly computes statistics on small sample sizes (fewer than 15) and finds statistical differences.
    2. Power: Power is the counterpart of the confidence level for detecting a difference: the probability that a study will detect a difference (for example, one design having a higher completion rate than another) when that difference really exists.
    3. The p-value: The p-value stands for probability value. It is the probability of observing a difference at least as large as the one in your study if there were, in fact, no real difference.
    4. Sample size: Sample size calculation remains a dark art for many practitioners, involving counterintuitive concepts such as power, confidence, and effect size. One complication is that there are different ways to compute sample size; there are basically three ways to find the right size for just about any study in user research: problem detection, comparison, and precision.
    5. Confidence intervals get wider as you increase your confidence level: The "95%" in the 95% confidence interval you see on the author's site and in publications is called the confidence level. A confidence interval is the most plausible range for the unknown population mean, but you can't be sure an interval contains the true average. Increasing the confidence level to 99% makes the interval wider: the price of being more confident is casting a wider net.

    Five Critical Quantitative UX Concepts
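Concept 5 above, that intervals widen with the confidence level, is easy to verify numerically. The sketch below uses the simple normal (Wald) approximation for a task-completion rate; methods better suited to small samples, such as the adjusted-Wald interval, would give slightly different bounds:

```python
import math

def wald_ci(successes: int, n: int, z: float):
    """Normal-approximation (Wald) confidence interval for a proportion,
    clipped to [0, 1]. z is the critical value for the confidence level."""
    p = successes / n
    margin = z * math.sqrt(p * (1 - p) / n)
    return max(0.0, p - margin), min(1.0, p + margin)

# 8 of 10 users completed the task.
lo95, hi95 = wald_ci(8, 10, z=1.960)   # 95% confidence level
lo99, hi99 = wald_ci(8, 10, z=2.576)   # 99% confidence level

print(f"95%: ({lo95:.2f}, {hi95:.2f})")
print(f"99%: ({lo99:.2f}, {hi99:.2f})")

# The 99% interval is wider: more confidence costs precision.
assert (hi99 - lo99) > (hi95 - lo95)
```

The same small-sample caveat from concept 1 applies: with n = 10 the interval is wide under any method, which is exactly why reporting the interval, not just the point estimate, matters.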