PAIR
About PAIR
In 2009, ARTFL celebrated an open source software release of PAIR (Pairwise Alignment for Intertextual Relations) with an alpha version of PhiloLine available for download at Google Code. PAIR is designed as a powerful search tool to help scholars tackle and better understand the widespread problem of literary text reuse.
While PAIR was developed in response to the fairly specific phenomenon of similar passages across literary works, the sequence analysis techniques employed in PAIR were developed in widely disparate fields, such as bioinformatics and computer science, with applications ranging from genome sequencing to plagiarism detection. PAIR generates a set of overlapping word sequence shingles for every text in a corpus, then stores and indexes that information to be analyzed against shingles from other texts. For example, the opening declaration from Rousseau's Du Contrat Social,
"L'homme est né libre, et partout il est dans les fers. Tel se croit le maître des autres, qui ne laisse pas d'être plus esclave qu'eux,"
would be rendered in trigram shingles (with accents flattened and function words removed) as:
homme_libre_partout
libre_partout_fers
partout_fers_croit
fers_croit_maitre
croit_maitre_laisse
maitre_laisse_esclave.
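The shingling step can be illustrated with a minimal Python sketch. This is not PAIR's own code (the release is described as including a Perl module); the function names `flatten` and `shingles` are hypothetical, and the function-word list is an assumption chosen only so that the output reproduces the shingles listed above. A real function-word list would be much larger.

```python
import re
import unicodedata

# Hypothetical function-word list (already accent-flattened), chosen only to
# reproduce the example above; it is not PAIR's actual list.
STOPWORDS = frozenset({
    "l", "est", "ne", "et", "il", "dans", "les", "tel", "se", "le", "des",
    "autres", "qui", "pas", "d", "etre", "plus", "qu", "eux",
})

def flatten(text):
    """Lowercase the text and strip accents (e.g. 'maître' -> 'maitre')."""
    decomposed = unicodedata.normalize("NFD", text.lower())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

def shingles(text, n=3, stopwords=STOPWORDS):
    """Return overlapping n-word shingles after dropping function words."""
    words = [w for w in re.findall(r"[a-z]+", flatten(text)) if w not in stopwords]
    return ["_".join(words[i:i + n]) for i in range(len(words) - n + 1)]

passage = ("L'homme est né libre, et partout il est dans les fers. "
           "Tel se croit le maître des autres, qui ne laisse pas d'être "
           "plus esclave qu'eux,")

for shingle in shingles(passage):
    print(shingle)
```

Because accents are flattened before the function-word filter is applied, "né" becomes "ne" and is dropped along with the negation particle, which is consistent with its absence from the shingles shown above.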
Common shingles across texts indicate many different types of textual borrowings, from direct citations to more ambiguous and unattributed usages of a passage. Using a simple search form, the user can quickly identify similar passages shared between different texts in one database, or even across databases. By selecting a shingle match size parameter, the user can further narrow the search results to look for shared passages of specific lengths. Click on the FRANTEXT link below to search all of the aligned passages in the main ARTFL database.
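One way to picture the comparison is as an inverted index from shingles to the places they occur, with shared passages found as runs of consecutive matching shingles. The Python sketch below is only an illustration under that assumption: the names `build_index` and `shared_runs` are hypothetical, the second text is invented, and reading the match size parameter as a minimum run length (`min_run`) is a guess, not a description of PAIR's interface.

```python
from collections import defaultdict

def build_index(shingled_docs):
    """Map each shingle to the (doc_id, position) pairs where it occurs."""
    index = defaultdict(list)
    for doc_id, shingle_list in shingled_docs.items():
        for pos, shingle in enumerate(shingle_list):
            index[shingle].append((doc_id, pos))
    return index

def shared_runs(index, doc_a, doc_b, min_run=1):
    """Return (start_a, start_b, length) for runs of consecutive shared shingles."""
    pairs = set()
    for postings in index.values():
        a_pos = [p for d, p in postings if d == doc_a]
        b_pos = [p for d, p in postings if d == doc_b]
        pairs.update((i, j) for i in a_pos for j in b_pos)
    runs = []
    for i, j in sorted(pairs):
        if (i - 1, j - 1) in pairs:
            continue  # this pair continues an earlier run
        length = 1
        while (i + length, j + length) in pairs:
            length += 1
        if length >= min_run:
            runs.append((i, j, length))
    return runs

# Shingles from the example above, plus an invented text that reuses three of them.
docs = {
    "contrat_social": [
        "homme_libre_partout", "libre_partout_fers", "partout_fers_croit",
        "fers_croit_maitre", "croit_maitre_laisse", "maitre_laisse_esclave",
    ],
    "hypothetical_text": [
        "auteur_celebre_ecrit", "homme_libre_partout", "libre_partout_fers",
        "partout_fers_croit", "fers_peuple_gemit",
    ],
}

index = build_index(docs)
print(shared_runs(index, "contrat_social", "hypothetical_text", min_run=2))
```

Raising `min_run` discards short, possibly coincidental overlaps and keeps only longer shared passages, which mirrors the narrowing effect described for the match size parameter.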
Interested parties are encouraged to consult the release site for more documentation, including technical details, PhiloLine source downloads, and a freestanding Perl module.