Why Digitizing Harvard’s Law Library May Not Improve Access to Justice

Editor’s Note: The author of this post is a law professor whose research focuses on decision-making in the context of emerging legal technology.

By Brian Sheppard, Associate Professor at Seton Hall University School of Law

Harvard Law School and Ravel Law, a legal research and analytics startup, have partnered in an effort to make the law school’s massive collection of U.S. law and cases publicly available for free on Ravel’s website. The project, known as “Free the Law,” made waves because Harvard’s collection is second only to the Library of Congress in its breadth. Since most of this material was either unavailable or only available through a paywall, Free the Law has tremendous potential. But whom will it help the most?

According to its founders, the project’s goal is to increase access to justice. The intended beneficiaries, it seems, are those who cannot afford professional legal services. Thank goodness: The United States ranks in the bottom half of developed nations when it comes to accessibility and affordability of civil justice. Those who cannot afford lawyers must either get help from a legal aid clinic or go it alone. Unfortunately, clinics are not plentiful; they continue to suffer from funding cuts and shrinking bar association contributions, and Free the Law is unlikely to have a significant effect on their budgets. The question, then, is whether non-lawyers are likely to benefit from Free the Law when handling their own legal matters.

For that to be the case, the digitized collection must be the sort of thing that enhances non-lawyers’ ability to reach desired legal outcomes. In press releases, Free the Law touts that its materials comprise over 42,000 volumes and span all the way to the period before this country was founded. Digitizing such a massive corpus will take two years, so Harvard and Ravel must be confident that all of their labor will pay off. This raises a new question: To what extent is case law a “more is better” resource?

One problem is that the vast majority of cases in the collection will have virtually no persuasive value in a legal proceeding. Most will be irrelevant to the issues that a litigant will face. Even when a case might be relevant to the issues, it will probably be too old to help much. A 2013 empirical study by Black & Spriggs in the Journal of Empirical Legal Studies found that precedents at the U.S. Supreme Court and federal courts of appeals have about a twenty-year life span; the rate at which courts cite them falls by about 85 percent over that period.

So Ravel didn’t need to rely on Harvard’s endless stacks to provide access to the highly relevant cases; numerous libraries could have done the job. Besides, free websites like FindLaw, Justia, and the Public Library of Law already cover cases in the U.S. Supreme Court, federal courts of appeals, and many states’ appellate courts for that period, and Google Scholar has an extensive database of federal and state cases as well. Without more, the inclusion of law and cases from bygone eras might have the effect of obscuring useful precedent rather than enhancing it. Increasing the size of the haystack doesn’t make finding the needle any easier.

Free the Law aims to provide more than access to cases, however; it also provides tools for analyzing them. Case law can be quite difficult to comprehend, particularly for those who are not trained to perform legal research. Even with the benefit of teachers and time, it can take years for law students to become proficient at identifying relevant cases, extracting rules and principles therefrom, and using them to craft arguments or draft legal documents. The obscurity of the common law is the reason that the game in legal research is not access to cases; it is the manner in which users interface with them. The more that software can reliably handle the hard tasks just described, the more valuable an increase in access can be.

The true value of the partnership between Harvard and Ravel is this: The former can provide unmatched quantities of data, and the latter can provide the latest tools in search and analytics. The issue, then, is whether Ravel’s tools can narrow the lawyer/non-lawyer performance gap to the extent that the size of Harvard’s collection becomes an asset that improves access to justice.

Ravel’s calling card is its ability to provide visualizations of legal concepts. Entering a term into Ravel’s search engine reveals a network analysis in which cases using the term are linked to each other based on citation. It looks like a complex web, with frequently cited cases serving as hubs, and citations serving as the strings that bind them together. The most cited hub cases are in larger bubbles, and the colors of the strings that expand outward from the hubs indicate the level of the citing court. While this might sound convoluted, Ravel manages to convey a great deal of information with surprising clarity. One need not have a law degree to know that these hub cases are a good place to start one’s research.

That said, it might take a law degree to know which search term to enter in the first place and, thereafter, to understand the significance of each thread that leads to the hub.

As to search terms, the margin of error appears to be rather slim. Take, for instance, a search concerning the well-known element of actual cause in negligence. In most states, such as New York, the term “actual cause” is used interchangeably with the terms “cause-in-fact” and “but-for cause.” Yet, the choice of which term to enter into the Ravel search field yields quite different results for New York, as does the choice of whether to enclose the terms in quotes (see, e.g., “actual cause” in quotes versus actual cause without quotes versus but-for cause versus cause-in-fact). This suggests that Ravel’s search program is more receptive to the sequence of characters that its users type (the syntactics of their expression) than the concepts to which they refer (the semantics of their expression).

Syntactical search is often less user-friendly because it does a poor job of filling in gaps in a user’s understanding. Small typographical differences can lead to large disparities in results. For our search, a user might need to know all of the terms for actual cause to reach the full range of relevant cases; otherwise, the user is at the mercy of the term he or she happened to use. I have written elsewhere that it is important for machines to become better at semantic and pragmatic processing of language if legal tech products are to narrow the performance gap in a satisfactory way.
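To make the distinction concrete, here is a minimal sketch in Python. It is not a description of how Ravel's search works; the three opinion snippets are invented, and the synonym set is a hand-built assumption standing in for real semantic processing, which would be far harder.

```python
# Hypothetical mini-corpus; the opinion snippets are invented for illustration.
opinions = {
    "Opinion 1": "Plaintiff failed to prove actual cause.",
    "Opinion 2": "The but-for cause of the injury was the defect.",
    "Opinion 3": "Cause-in-fact is a question for the jury.",
}

# Assumed, hand-built set of interchangeable labels for one legal concept.
ACTUAL_CAUSE_TERMS = {"actual cause", "cause-in-fact", "but-for cause"}


def literal_search(query, corpus):
    """Syntactic search: match only the exact characters the user typed."""
    q = query.lower()
    return sorted(name for name, text in corpus.items() if q in text.lower())


def concept_search(query, corpus, synonyms):
    """Crude concept search: expand the query to every known synonym."""
    q = query.lower()
    terms = synonyms if q in synonyms else {q}
    return sorted(
        name
        for name, text in corpus.items()
        if any(term in text.lower() for term in terms)
    )


literal_hits = literal_search("actual cause", opinions)
concept_hits = concept_search("actual cause", opinions, ACTUAL_CAUSE_TERMS)
```

In this toy corpus, the literal search for "actual cause" finds only the one opinion that uses those exact words, while the expanded search reaches all three. A non-lawyer who knows only one of the labels gets the first result, not the second.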

Understanding the significance of the strings in the visualization is just as challenging. Just because courts have frequently cited a case does not necessarily indicate that the case lends a lot of support for the positions taken within it. The character of citation usually matters more than frequency, and discerning that character is not always easy.
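A small sketch shows why frequency alone can mislead. The cases and the treatment labels ("followed," "criticized," "distinguished") are hypothetical, and the sketch assumes that someone has already classified each citation, which is precisely the hard part.

```python
from collections import Counter

# Hypothetical citation records: (citing case, cited case, treatment).
# The treatment labels are assumed to have been assigned by a trained reader.
citations = [
    ("Case B", "Case A", "criticized"),
    ("Case C", "Case A", "distinguished"),
    ("Case D", "Case A", "criticized"),
    ("Case D", "Case E", "followed"),
    ("Case F", "Case E", "followed"),
]


def citation_counts(records):
    """Raw frequency: how many times each case is cited, for any reason."""
    return Counter(cited for _, cited, _ in records)


def supportive_counts(records, supportive=("followed",)):
    """Count only citations whose treatment supports the cited case."""
    return Counter(
        cited for _, cited, treatment in records if treatment in supportive
    )


raw = citation_counts(citations)
support = supportive_counts(citations)
```

Here Case A is the most-cited case and would be drawn as the largest hub, yet none of its citations is supportive. Case E is cited less often but is followed every time.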

At a deeper level, we might wonder whether the complex manner in which the common law augments our legal rules can ever be boiled down to visualizations or other user-friendly interfaces. According to many legal theorists, not only can the content of our law change, so too can the manner of proper argumentation and adjudication of cases. It is difficult to imagine how such complexity could be made comprehensible to the untrained eye through visualization or other analytic tools. Unless, of course, we someday make our law simpler and more computer-friendly.

Even if Ravel makes significant progress in this regard, there is a good chance that Free the Law users will not reap the full benefit. Ravel has already said that the users will not have free access to its most powerful analytic and research tools. Those will exist behind a paywall.

But there is an interesting twist here: Our ability to understand the dynamics of law, itself, will undoubtedly be helped by the Free the Law project. Importantly, that sort of topic is the province of scholars — like historians, empiricists, and philosophers — rather than litigants or lawyers. The Free the Law database could make it easier to trace the genealogy of a precedent from the pre-Revolutionary period to the twentieth century. Or to discover new parallels and differences between tribal and non-tribal adjudication. Or even to gain insight into the forces that converged to create this country.

Perhaps, then, it is the scholars who stand to gain the most from Free the Law. If we are lucky, it will increase access to justice too.