Thursday, June 28, 2012

Policy Statement About Compilations

I haven't posted about copyright in a while, and opportunity knocked via this morning's email newsletter from the Copyright Office. The office issued a policy statement about its examination of compilations, particularly claims of authorship in selection and arrangement of uncopyrightable material. I found it interesting for a number of reasons, primarily because of its potential deterrent effect on spurious copyright registrations in compilations of non-copyrightable facts or ideas. Below is mostly a summary with a few comments tossed in. The statement begins with a textual analysis of the Copyright Act, beginning with the definition of "compilation" found in §101:
A "compilation" is a work formed by the collection and assembling of preexisting materials or of data that are selected, coordinated, or arranged in such a way that the resulting work as a whole constitutes an original work of authorship.
After a brief look at Feist, they cut right to the chase:
However, a question that was not present in the facts of Feist and therefore not considered by the Court, is whether the selection, coordination, or arrangement of preexisting materials must relate to the section 102 categories of copyrightable subject matter.
I'll ruin the surprise and tell you now that the Office answered in the affirmative: the compilation must result in a §102(a) category of authorship. The textual analysis continues with a look at §103:
The subject matter of copyright as specified by section 102 includes compilations and derivative works...
Thus, "Section 103 makes it clear that compilation authorship is a subset of the section 102(a) categories, not a separate and distinct category." Section 102(a) lists eight categories of protectable works, such as literary works, musical works and audiovisual works. For example, the Feist phonebook would have been protectable as a literary work had it been original. The list of eight is meant to be illustrative, though, as confirmed by explicit legislative history and the textual use of "includes" rather than something like "is limited to."

After marching through some of the history of the Act, its gradual expansion of protectable works over time, and the legislative history of the current Act, the Office concludes that Congress intended the categories to be flexible (e.g. software is considered a literary work) to account for changes in creative expression, yet wanted to retain control to designate entirely new categories. This leads to the next section, where the Office discusses the hypothetical registration of a compilation of yoga poses and the unreported decision Open Source Yoga Unity v. Choudhury, which found the copyrightability of the selection and arrangement of a series of poses to be a triable question.

Since a yoga pose is not included in the §102(a) categories, the Copyright Office would refuse to register a compilation of yoga poses. It might, however, register a compilation of photographs or drawings (of the poses). The Office also concludes that "section 102(b) precludes certain compilations that amount to an idea, procedure, process, system, method of operation, concept, principle or discovery, regardless of the form in which it is described, explained, illustrated, or embodied in such work." For example, a set of exercises meant to be done in order probably wouldn't be protected because it is functional, not purely expressive, although a film or compilation of photographs of the exercises might be.

The Office regrets having issued past registrations for "compilations of exercises" or "selection and arrangement of exercises." Although this will typically just be a matter of rephrasing claims (e.g. registering a "compilation of photographs" or "compilation of literary works"), it's great to see the Office offering clear guidance on the subject matter of copyright to keep the courts from having to sort it out.

Monday, June 25, 2012

Summer Code Party at CfA

On Saturday, we hosted an event at CfA as part of Mozilla's Webmaker campaign. It began as a continuation of the #OAHack that PLoS hosted (see post below), but the dates aligned and we were pleased to bring together the related ideas of the open web, open science, and open access into one event.

The Presentation
John Wilbanks started our day with a presentation explaining that if government is a platform, science is a wiki. In its current state, however, it's a terribly inefficient one. He shared some statistics on references to traditional versus open-published papers, such as the number and variety of resulting citations. So why is science such a terribly inefficient wiki, and what can we do to improve sharing, reuse, collaboration, and ultimately progress?

John Wilbanks Presenting
First, open content. John noted the NIH's open access policy and how it has changed the playing field for scientists, and in many ways for publishers too; no publisher has presented data showing adverse effects, suggesting that fears of open publishing destroying traditional business models may be overblown. A few months ago, the OSTP requested information on public access to digital data and scientific publications, and a number of the authors of replies were in attendance (example). We discussed FRPAA, the Federal Research Public Access Act, and some of the political theater surrounding it, including red herrings such as the acquisition of US taxpayer-funded research by foreign governments, which could certainly pay for it if they wanted. I began to think of the arguments over the EU's database directive, which protects database producers, based on the quality and quantity of information collected and arranged, by preventing its extraction; this right is separate from copyright, though copyright may still exist in the non-factual aspects of the underlying data. Fortunately, attempts to create a sui generis database right went nowhere in the U.S., and things have worked out just fine without one for years. Indeed, even the EU's 2006 report on the effects of the sui generis right casts a skeptical gaze, noting its effects are "unproven." But that's another story for another day.

Another important point was that there can be no change in outcome without change in stakeholders. Noting the frequency of lobbying activity by organizations opposed to open access, John explored other ways of getting legislators' attention. For example, he launched a campaign on the White House's We the People petition site that reached the threshold of 25,000 signatures in about two weeks, a quick and clear indication that this issue needs attention from policymakers and lawmakers. We all look forward to the White House reply.

The next step in making science more efficient is open data. Using an example of climate data, John pointed to the variety of data collected independently, for different reasons, and how making it openly available is crucial for context and understanding, especially by lay people. For example, there's research on runoff, ocean temperature, land surfaces, clouds and precipitation, solar energy, and much more. At the same time, raw data alone is only a small step toward making use of it: metadata and standards processes, document submission standards, and archives are necessary too. There are also existential questions about data, which led to the final requirement for making science efficient: open consent.

Here, John spoke of the time and expense of organizing participation in clinical and other studies, and the narrow scope of that consent. Yet we're constantly producing a stream of data in every activity we do. Why the disconnect? Surely part of it is legal: not having clear legislation or guidance (or conservatively interpreting current law, especially where there's no explicit guidance and little case law). John used the example of The Eatery, an app that let users rate how healthy their meals were, and let other users rate them too. The millions of ratings demonstrated not only that we overestimate the health value of our food intake, but that in only five months, with no grant and no academics involved, 7.68 million data points were collected; that data set is now in high demand from researchers.

After seeing this example, services like 23andMe, interviews, and more case studies, it became clear to John that there is a critical mass of people who prefer sharing as a form of control. However, one of the unintended consequences of informed consent is that data remains limited to a specific purpose, rather than subject to the portable consent John is working on, which would allow one to simply donate data to science. In other words, donating data to be used in any study by anyone. The Consent to Research project teaches users the core ideas of informed consent, allows review of a consent agreement, and prompts participation by letting users upload data while selecting the permissions granted to researchers: the right to research, to redistribute, to publish the results, and to commercialize products derived from research. Along the way, users are required to watch a video explaining the potential for harm from sharing. What if, for example, your shared data is used to connect you to a crime? The potential social, legal and economic issues are limited only by your imagination. Consider things like paternity suits, analysis by employers or insurance companies, etc.
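
The tiered permissions described above lend themselves to a simple data model. Here's a minimal sketch in Python; the class and field names are hypothetical illustrations, not the actual Consent to Research implementation:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class SharingPermissions:
    """Hypothetical model of the tiered grants a data donor might select."""
    research: bool = False       # right to use the data in research
    redistribute: bool = False   # right to share the data with other researchers
    publish: bool = False        # right to publish results derived from the data
    commercialize: bool = False  # right to commercialize derived products

    def allows(self, action: str) -> bool:
        """Check whether a named action was granted by the donor."""
        return getattr(self, action, False)

# A donor who permits research and publication, but not redistribution or resale:
grant = SharingPermissions(research=True, publish=True)
```

The point of a structure like this is that each grant is explicit and opt-in, mirroring the video-plus-checklist flow the project uses to make the consequences of each tier clear before data is uploaded.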

With open content, permission via informed consent, and the participation of people (who want sharing as a means of control), science can become at least a modestly effective wiki.

The Projects

Wherein I note that, "At CfA, we infiltrate the civil-bureaucratic stack."
The projects spanned a number of open access and open web topics, beginning with the Adopt an Institution project. The team, which included participants from CfA, PLoS and Creative Commons (not necessarily in their official capacities), developed an outreach strategy for the app, began adding more universities and institutions to the app, and modified the database structure to allow multiple participants to "adopt" a single institution and indicate their affiliation (professor, student, administrator) and subject area of primary interest. This project is built in Ruby on Rails, the code is available on GitHub, and it's deployed using Heroku.

Open Science Hub Team
Another team continued hacking on the Open Science Hub, a web site to collect and display info on the open access movement in a way that's more broadly accessible. They re-themed the site, built in Joomla, expanded administrator capabilities for adding new content, updated some of the articles, added a Twitter feed to the site, and discussed strategies for bringing the results of open science to a wider audience, including journalists.

Following up on their work scraping and displaying information about the Open Access petition mentioned above, and a subsequent conversation on Twitter about furthering the work, one team started work on a specification for a whitehouse.gov We The People API. There's a Python scraper here, a proof-of-concept map, and more detailed specs and next steps here. One goal of the project is an easier way to see the time, actual location (not just what people entered; there can be ambiguity, such as "Ontario, CA" being California or Canada), and other info about petition signers.
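
The "Ontario, CA" problem boils down to a two-letter suffix that can be read as either a US state abbreviation or an ISO country code. Here's a minimal stdlib sketch of the ambiguity check; the function and the (abbreviated) code tables are hypothetical illustrations, not part of the actual scraper or the We The People data:

```python
# Abbreviated lookup tables for the example; a real tool would use the full
# US state list and the full ISO 3166-1 alpha-2 country code list.
US_STATES = {"CA", "NY", "TX", "WA"}
COUNTRY_CODES = {"CA", "DE", "FR", "GB", "US"}

def possible_readings(location: str) -> list[str]:
    """Return the plausible interpretations of a signer-entered 'City, XX' string."""
    try:
        city, code = [part.strip() for part in location.rsplit(",", 1)]
    except ValueError:
        return [location]  # no suffix to disambiguate
    code = code.upper()
    readings = []
    if code in US_STATES:
        readings.append(f"{city}, {code}, United States")
    if code in COUNTRY_CODES:
        readings.append(f"{city}, {code} (country)")
    return readings or [location]

# possible_readings("Ontario, CA") yields both a California and a Canada reading,
# which is exactly why the raw signer-entered strings can't be mapped naively.
```

Resolving which reading is correct would take extra signals (signer timezone, petition locale, population-weighted priors), which is part of what a proper API could expose instead of leaving each consumer to guess.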

Finally, a group set out to survey the landscape of open music notation tools, prompted by the recent success of the Open Goldberg Variations project, which released public domain versions of the scores and recordings of the pieces. We wondered how replicable the project would be, and sought first to identify musicians' needs. Conveniently, we had two classical musicians (and one jazz guy: me). There's a surprising amount of variation in publishers' adaptations, arrangements, editions and other derivative works of public domain classical music (adding tempo, dynamics, etc.), and there's no easy way of comparing versions. We set out to explore a "GitHub for music" and started by reviewing MusicXML, an XML-based format for music notation, then looked at music21, a Python-based "set of tools for helping scholars and other active listeners answer questions about music quickly and simply." We also looked at some of the JavaScript and HTML5 goodies that allow for entry and manipulation of music in the browser, either to or from MusicXML. For example, the open source web-based music notation rendering API VexFlow is used in this HTML5 Cloud Composer project. For the "sheet music GitHub," the idea would be to input the public domain score and convert it to MusicXML, display it as notation that could be modified in the browser, save the modifications and convert back to MusicXML, then save that result in a user's profile or as a separate version from the original public domain score.
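
To give a taste of what that pipeline would be working with: MusicXML is plain XML, so even the standard library can pull note data out of it. This is only a sketch; a real tool would use music21 or similar, and the score fragment below is invented, not from the Open Goldberg Variations:

```python
import xml.etree.ElementTree as ET

# A minimal, hand-written MusicXML fragment. Real MusicXML files carry much
# more structure (part-list, clef/key attributes, divisions, etc.).
SCORE = """<score-partwise version="3.0">
  <part id="P1">
    <measure number="1">
      <note><pitch><step>G</step><octave>4</octave></pitch><duration>4</duration></note>
      <note><pitch><step>A</step><octave>4</octave></pitch><duration>4</duration></note>
      <note><pitch><step>B</step><octave>4</octave></pitch><duration>4</duration></note>
    </measure>
  </part>
</score-partwise>"""

def pitches(musicxml: str) -> list[str]:
    """Extract 'step+octave' names from every <note> in a MusicXML string."""
    root = ET.fromstring(musicxml)
    return [
        pitch.findtext("step") + pitch.findtext("octave")
        for pitch in root.iter("pitch")
    ]

# pitches(SCORE) → ["G4", "A4", "B4"]
```

Because the format is this regular, diffing two editions of the same public domain score (the tempo and dynamics markings publishers add, for instance) becomes a structured comparison rather than eyeballing two PDFs, which is the core of the "GitHub for music" idea.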

A number of side conversations I heard were about front-end responsive design, Unglue.it (a site that collects donations for purchasing literary works or reaching agreement with publishers to release existing works under Creative Commons licenses), and the economic implications of open access. All in all, we had a great time, many new connections were made, and lots of follow-up was scheduled. Thanks to all who attended, and thanks to Mozilla, PLoS, and CfA for their support.