The MAP evaluation relied on a quasi-experimental, matched comparison design. This approach is less rigorous than a true experimental design such as a randomized controlled trial (RCT), but it is often the strongest design available for evaluating comprehensive community improvement efforts. New York City designed the MAP initiative using insights from a wide array of social science and public policy research. Its core concepts came from several fields of knowledge, and no single theory of change guided it. The initiative drew on principles from the social and behavioral sciences and incorporated strategies inspired by research in economics, public policy, social welfare, urban planning, architectural design, healthcare, and criminal justice. This breadth made MAP a robust intervention, but it also led to ongoing adjustments and innovations.

The variety of strategies involved in MAP presented a challenge for the evaluation team. Researchers can never measure everything about an initiative, so choices must be made. Those choices are informed not only by social science theory and previous research findings but also by the priorities of the individuals and organizations involved in an initiative, and those priorities evolve. By the third or fourth year of an extended evaluation, the priorities articulated by officials in the first or second year may no longer be viewed as core components. An evaluation designed at the beginning of an initiative may therefore end up measuring the wrong things. Including administrative data generated before, during, and after the initiative partly offsets this risk, but variables created from administrative data are never perfect. Often, they are merely proxies for the more precise outcomes targeted by an intervention.

Establishing rigorous evidence requires careful measurement of resources and inputs, activities, short-term outputs, and long-term outcomes (Patton, 1982; Rossi, Lipsey, and Freeman, 2004). Much of this information must be collected by observing interventions in action and interviewing key participants. To measure the impact of public safety supports, researchers often need data on community context, the perceptions and attitudes of individuals, and mediating effects on families and on individuals’ social networks. Relying exclusively on official data can yield incomplete and biased information about participants and their responses to policies and programs.

Because the MAP evaluation relied on a matched comparison design, it could not establish MAP as an “evidence-based” approach. The term evidence-based is typically reserved for interventions and policies tested with rigorous evaluation designs, often multiple random assignment studies. Few community-level interventions are evaluated with randomized studies. This is an unavoidable reality given the inherent limits of large-scale interventions, such as small sample sizes (few communities are available to randomize), challenges to program fidelity, and time demands.

The MAP evaluation could not measure every possible mechanism underlying MAP’s impact on communities. Instead, it estimated their collective effect by comparing MAP communities with similar communities not involved in MAP. Although this evidence is limited in strength, it allowed City officials to assess the value of MAP and to deem it successful.