Legal documents have characteristics that set them apart from documents in other domains. On the surface, they have their own structure, style and vocabulary; below the surface, they are highly interlinked and interdependent. Certain areas of law are commonly dominated by one or two central laws that serve as the foundation for all the other laws in that area. Because these characteristics are unfamiliar to most people, legal documents are difficult for them to understand. To help such readers, we designed a legal document navigator, LawNavi, which can search and navigate legal documents based on their structure, relations and semantics. We also used a thesaurus to classify legal document fragments by their conceptual meaning, because conceptual classification helps users understand the meaning of the documents and the conceptual relations between them.
LawNavi is a tool that makes complex relationships between legal document fragments visible and accessible, together with a conceptual classification of the fragments (i.e., topics). Its goal is to help users analyze and understand highly interlinked and interdependent legal documents effectively and precisely. Document fragments are basically defined by the hierarchy of document components, such as chapters and sections. Sections at the bottom of this hierarchy may contain one or more textual paragraphs. For these textual paragraphs, the following four rules can be applied to define the granularity of document fragments:
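The component hierarchy can be pictured as a simple tree whose leaves are the textual paragraphs. The Python sketch below is illustrative only: the `Fragment` class, the `walk` and `textual_paragraphs` helpers, and path-style identifiers such as `dirA/ch1/s1/p1` are hypothetical, and the granularity rules themselves are not modeled. It only exposes the paragraphs to which such rules would be applied.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Tuple


@dataclass
class Fragment:
    """A node in the document-component hierarchy (hypothetical model)."""
    kind: str    # e.g. "document", "chapter", "section", "paragraph"
    label: str   # local label such as "ch1" or "p2"
    text: str = ""  # only textual paragraphs carry text
    children: List["Fragment"] = field(default_factory=list)


def walk(node: Fragment, prefix: str = "") -> Iterator[Tuple[str, Fragment]]:
    """Yield (path_id, fragment) pairs in depth-first order."""
    path = f"{prefix}/{node.label}" if prefix else node.label
    yield path, node
    for child in node.children:
        yield from walk(child, path)


def textual_paragraphs(root: Fragment) -> List[str]:
    """Return the path ids of the paragraphs below the bottom-level sections."""
    return [path for path, frag in walk(root) if frag.kind == "paragraph"]


# Usage with a tiny hypothetical document.
doc = Fragment("document", "dirA", children=[
    Fragment("chapter", "ch1", children=[
        Fragment("section", "s1", children=[
            Fragment("paragraph", "p1", "First paragraph."),
            Fragment("paragraph", "p2", "Second paragraph."),
        ]),
    ]),
])
print(textual_paragraphs(doc))  # ['dirA/ch1/s1/p1', 'dirA/ch1/s1/p2']
```

Path-style identifiers keep each fragment's position in the hierarchy visible in its id, which is convenient when the same fragments later appear as subjects and objects of links.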
We also defined the following four types of relationships (links) between legal document fragments: hierarchical, contradicting, supporting, and explaining. The first is a hierarchical (syntactic) link between fragments within a single document, while the other three are semantic links between fragments of two different documents. We did not consider intra-document relationships for the three semantic types, in order to focus on navigation between legal documents.
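The constraint on where each link type may occur can be expressed as a small validity check. In this sketch, the `LinkType` values reuse the property names of the ontology schema (contain, contradict, support, explain), while the `Link` record and the `is_allowed` function are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class LinkType(Enum):
    HIERARCHICAL = "contain"      # syntactic, within one document
    CONTRADICTING = "contradict"  # semantic, between documents
    SUPPORTING = "support"        # semantic, between documents
    EXPLAINING = "explain"        # semantic, between documents


@dataclass(frozen=True)
class Link:
    source_doc: str   # document that contains the source fragment
    source_frag: str
    target_doc: str   # document that contains the target fragment
    target_frag: str
    kind: LinkType


def is_allowed(link: Link) -> bool:
    """Hierarchical links must stay inside one document;
    semantic links must connect two different documents."""
    same_doc = link.source_doc == link.target_doc
    if link.kind is LinkType.HIERARCHICAL:
        return same_doc
    return not same_doc


# Usage with hypothetical fragment ids.
ok = Link("dirA", "dirA/ch1/s1/p1", "regB", "regB/art3/p2", LinkType.SUPPORTING)
bad = Link("dirA", "dirA/ch1/s1/p1", "dirA", "dirA/ch2/s1/p1", LinkType.EXPLAINING)
print(is_allowed(ok), is_allowed(bad))  # True False
```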
Legal documents can be visualized from two different points of view: the document view and the thesaurus view. In the document view, a legal document is visualized as the hierarchical structure of its fragments; in the thesaurus view, navigation starts from the topic hierarchy of the thesaurus. The two views are complementary rather than separate, and users can switch between them dynamically.
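Switching between the two views can be sketched as moving a navigation focus between a fragment and a topic. This is a minimal model under stated assumptions: the `NavState` class, the helper functions, and the sample topic assignments are hypothetical, and the topic labels are placeholders rather than actual EuroVoc descriptors.

```python
from dataclasses import dataclass
from typing import Dict, List

# Hypothetical classification results: fragment id -> assigned topics.
FRAGMENT_TOPICS: Dict[str, List[str]] = {
    "dirA/ch1/s1/p1": ["waste management"],
    "regB/art3/p2": ["waste management", "chemical product"],
}


@dataclass
class NavState:
    view: str   # "document" or "thesaurus"
    focus: str  # a fragment id in the document view, a topic in the thesaurus view


def fragments_for(topic: str) -> List[str]:
    """Fragments classified into a topic, sorted for stable display."""
    return sorted(f for f, topics in FRAGMENT_TOPICS.items() if topic in topics)


def to_thesaurus(state: NavState, topic: str) -> NavState:
    """Switch from a fragment to one of the topics assigned to it."""
    if state.view != "document":
        raise ValueError("already in the thesaurus view")
    if topic not in FRAGMENT_TOPICS.get(state.focus, []):
        raise ValueError(f"{topic!r} is not assigned to {state.focus!r}")
    return NavState("thesaurus", topic)


def to_document(state: NavState, fragment: str) -> NavState:
    """Switch from a topic to one of the fragments classified into it."""
    if state.view != "thesaurus":
        raise ValueError("already in the document view")
    if fragment not in fragments_for(state.focus):
        raise ValueError(f"{fragment!r} is not classified into {state.focus!r}")
    return NavState("document", fragment)


# Usage: fragment -> topic -> another fragment under the same topic.
s = NavState("document", "dirA/ch1/s1/p1")
s = to_thesaurus(s, "waste management")
print(fragments_for(s.focus))  # ['dirA/ch1/s1/p1', 'regB/art3/p2']
s = to_document(s, "regB/art3/p2")
print(s)  # NavState(view='document', focus='regB/art3/p2')
```

Because the topic assignment is the only bridge between the views, a round trip through the thesaurus reaches fragments of other documents that share a topic even when no explicit link connects them.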
Our approach to implementing LawNavi combines Semantic Web technologies with Natural Language Processing (NLP) technologies such as text mining and classification. The Semantic Web provides a framework for assigning well-defined meaning to legal document fragments and for representing and accessing the relations between them. We therefore used OntoFrame, a Semantic Web-based information service platform, as the framework for LawNavi. Text mining and classification are required to identify the links contained in legal text, to classify those links by type, and to classify legal document fragments by topic.
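As a rough illustration of the link-identification step, explicit references to other EU acts can often be spotted with a pattern match. The regular expression below is a simplified assumption, not the text-mining method used in LawNavi: it recognizes only a few citation forms (for example "Directive 2008/98/EC" or "Regulation (EC) No 1907/2006"), and classifying each detected link as supporting, contradicting or explaining would still require a separate classifier, which is not shown.

```python
import re
from typing import List, Tuple

# Simplified, assumed pattern for a few EU act citation forms:
#   "Directive 2008/98/EC", "Regulation (EC) No 1907/2006", "Regulation (EU) 2016/679".
# Many real citation styles (e.g. references to articles or annexes) are not covered.
CITATION = re.compile(
    r"\b(Directive|Regulation|Decision)\s+"
    r"(?:\((?:EC|EU|EEC|Euratom)\)\s+)?"
    r"(?:No\s+)?"
    r"(\d{1,4}/\d{1,4}(?:/(?:EC|EU|EEC|Euratom))?)"
)


def find_links(sentence: str) -> List[Tuple[str, str]]:
    """Return (act type, act number) pairs for every citation found in a sentence."""
    return [(m.group(1), m.group(2)) for m in CITATION.finditer(sentence)]


# Usage on an invented sentence.
text = ("Member States shall apply Directive 2008/98/EC without prejudice to "
        "Regulation (EC) No 1907/2006.")
print(find_links(text))
# [('Directive', '2008/98/EC'), ('Regulation', '1907/2006')]
```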
Figure 1 shows the overall process of LawNavi. Legal documents are converted into RDF triples based on an ontology schema that models document fragments, including sections, paragraphs and the document itself, with hierarchical relations (contain) and linking relations (support, contradict, and explain) between the fragments. This conversion has two steps: structure-based population and semantics-based population. The former converts the hierarchical structure of legal document fragments into RDF triples; the latter converts the links and topics of document fragments into RDF triples. Semantics-based population first segments legal text into sentences and classifies each sentence based on a thesaurus. It also identifies links in each sentence and classifies them into one of the three semantic link types. Finally, it groups sentences by their topics and links in order to apply the granularity rules. The thesaurus, like the legal documents, is modeled in the ontology and converted into RDF triples. We used EuroVoc as the thesaurus because we target EU law documents and EuroVoc covers a broad range of EU-related domains, although its classifications are not as deep as we need. The populated triples are stored and also fed into a reasoning engine to obtain inferred triples. When a user asks to navigate, the request is expressed as a SPARQL query and executed, and the query results are presented to the user through a Flex-based visualization.
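The population, reasoning and query steps can be imitated in memory to show how the pieces fit together. This sketch does not use OntoFrame or an RDF store: triples are plain Python tuples, a single transitivity rule on `contain` stands in for the reasoning engine (the actual rules are not specified here), and a one-pattern matcher stands in for SPARQL. The identifiers and the `hasTopic` property name are hypothetical.

```python
from typing import Iterable, Optional, Set, Tuple

Triple = Tuple[str, str, str]

# Hypothetical output of structure-based (contain) and
# semantics-based (hasTopic, support) population.
BASE: Set[Triple] = {
    ("dirA", "contain", "dirA/ch1"),
    ("dirA/ch1", "contain", "dirA/ch1/s1"),
    ("dirA/ch1/s1", "contain", "dirA/ch1/s1/p1"),
    ("dirA/ch1/s1/p1", "hasTopic", "waste management"),
    ("regB/art3/p2", "support", "dirA/ch1/s1/p1"),
}


def infer_contain_closure(triples: Set[Triple]) -> Set[Triple]:
    """Add (a contain c) whenever (a contain b) and (b contain c) hold,
    repeating until no new triple appears (an assumed example rule)."""
    result = set(triples)
    while True:
        contain = [(s, o) for s, p, o in result if p == "contain"]
        new = {(a, "contain", c) for a, b in contain for b2, c in contain if b == b2}
        if new <= result:
            return result
        result |= new


def match(triples: Iterable[Triple], s: Optional[str],
          p: Optional[str], o: Optional[str]) -> Set[Triple]:
    """Match one triple pattern; None plays the role of a SPARQL variable."""
    return {t for t in triples
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)}


# Roughly: SELECT ?frag ?src WHERE { :dirA :contain ?frag . ?src :support ?frag }
kb = infer_contain_closure(BASE)
frags = {o for _, _, o in match(kb, "dirA", "contain", None)}
answer = sorted((t[2], t[0]) for t in match(kb, None, "support", None) if t[2] in frags)
print(answer)  # [('dirA/ch1/s1/p1', 'regB/art3/p2')]
```

Running the closure before querying is what lets the query find a paragraph nested three levels below `dirA`; matching the same pattern against the base triples alone would only reach `dirA/ch1`.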
To visualize the structure of a document and the links between document fragments effectively, we used a star-like network rather than a tree-like one, since a tree-like layout does not use space effectively. In the document view (see Figure 2), users can navigate the hierarchical structure of documents. While navigating a document, users can continue to navigate other documents linked from a fragment of the current document, or switch to the thesaurus view to navigate the thesaurus starting from a topic assigned to a fragment of the document. Conversely, in the thesaurus view (see Figure 3), users can navigate the topic hierarchy of the thesaurus. While navigating a topic, users can switch to the document view to navigate the document fragments classified into that topic. All navigation operates in a ‘click and expand’ style. Metadata and textual contents are modeled as datatype properties and their values, and are listed below the network view when the corresponding fragment is selected.
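The text does not specify how the star-like layout is computed. One simple possibility, assumed here, is a radial placement that puts the neighbors revealed by a click evenly on a circle around the clicked node, so that expanded nodes spread in all directions instead of growing a tree in one direction. The `NEIGHBORS` data and the `star_positions` and `expand` functions are hypothetical.

```python
import math
from typing import Dict, List, Tuple

Point = Tuple[float, float]

# Hypothetical document-view adjacency: node -> nodes revealed when it is clicked.
NEIGHBORS: Dict[str, List[str]] = {
    "dirA": ["dirA/ch1", "dirA/ch2", "dirA/ch3", "dirA/ch4"],
}


def star_positions(center: Point, nodes: List[str],
                   radius: float = 100.0) -> Dict[str, Point]:
    """Place nodes at equal angles on a circle around the center (radial layout)."""
    cx, cy = center
    n = len(nodes)
    return {
        node: (cx + radius * math.cos(2 * math.pi * i / n),
               cy + radius * math.sin(2 * math.pi * i / n))
        for i, node in enumerate(nodes)  # empty list -> empty dict, no division by zero
    }


def expand(node: str, center: Point, radius: float = 100.0) -> Dict[str, Point]:
    """Click-and-expand: reveal a node's neighbors arranged as a star around it."""
    return star_positions(center, NEIGHBORS.get(node, []), radius)


# Usage: expand the document node placed at the origin.
pos = expand("dirA", (0.0, 0.0))
for name, (x, y) in pos.items():
    print(f"{name}: ({x:.1f}, {y:.1f})")
```

Expanding a child would call `expand` again with that child's position as the new center, so each click grows a new star around the node that was clicked.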
We designed a legal document navigator to help people in the legal domain understand the complex relations between legal documents, and implemented it by combining Semantic Web technologies with NLP technologies and a thesaurus. It makes these relations more visible, accessible and presentable to users through two complementary views: users can navigate the thesaurus starting from a topic assigned to a document fragment, and navigate the document fragments classified under a topic. We plan to expand the current prototype into a real service after collecting sufficient feedback from legal experts.
© Copyright 2012, Korea Institute of Science and Technology Information (KISTI)