Semantic Web Use Cases and Case Studies
Use Case: VSB – Virtual Science Brain
Hong-Woo Chun, Chang-Hoo Jeong, Sa-Kwang Song, Yun-Soo Choi, Sung-Pil Choi
The volume of published articles, patents, and reports has grown explosively, and scientific researchers struggle to analyze these documents and obtain the information they need. In particular, researchers want to find evidence that supports the confidence, originality, and justification of their hypotheses. Although analyzing scientific data is a necessary step in research and development, it is also a bottleneck.
This is because hundreds of thousands of texts need to be analyzed. Such work has mostly been done manually by people relying on their background knowledge. The manual method is time-consuming and very expensive, and its results are difficult to reuse and share.
To solve the problems of the manual method, many automatic Information Extraction (IE) approaches using Natural Language Processing (NLP) and text mining technologies have been proposed. Biomedical named entity recognition and relation extraction, for example for disease-gene associations and protein-protein interactions, are typical IE research. An IE system recognizes and extracts knowledge from massive literature, and the extracted knowledge is accumulated in a knowledge base. To decide on research topics or analyze technical trends, not only IE techniques but also effective methods for searching the extracted information are very important. While information extraction has been a popular research topic, methods for searching the extracted information have attracted relatively little attention. They have been overlooked even though various specialized searching and browsing methods are needed to present the extracted information appropriately.
Virtual Science Brain (VSB) is a knowledge base that contains information extracted from a massive body of scientific literature, together with a smart delivery system that makes the knowledge base valuable (Figure 1).
Figure 1 Virtual Science Brain
The VSB contains three kinds of knowledge: Relational knowledge, Structural knowledge, and Procedural knowledge.
Relational knowledge is about the semantic relations between entities in text.
Disease-gene associations and protein-protein interactions are examples in the biomedical domain. A named entity recognizer and a relation extractor are used to construct the relational knowledge. To make the results more useful, identifiers from external databases and links to the original data are provided.
Figure 2 Relational knowledge
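As a rough illustration of what one unit of relational knowledge might carry, the following minimal sketch models a relation with its two entities, their external database identifiers, and a link back to the evidence. The class names, field names, identifiers, and URL are all hypothetical placeholders chosen for this example; they are not taken from the VSB implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    """A named entity normalized to an external database record."""
    text: str         # surface form found in the sentence
    entity_type: str  # e.g. "Gene" or "Disease"
    db: str           # name of the external database
    db_id: str        # identifier in that database (placeholder values below)


@dataclass(frozen=True)
class Relation:
    """One extracted relation plus a pointer back to its evidence."""
    subject: Entity
    verb: str
    obj: Entity
    sentence: str     # evidence sentence
    source_url: str   # link to the original document


def external_ids(relation):
    """Collect the external identifiers attached to both arguments."""
    return {
        relation.subject.text: f"{relation.subject.db}:{relation.subject.db_id}",
        relation.obj.text: f"{relation.obj.db}:{relation.obj.db_id}",
    }


# Illustrative record only: names, identifiers, and URL are made up.
gene = Entity("GENE_A", "Gene", "UniProt", "EXAMPLE-ID-1")
disease = Entity("pancreatic cancer", "Disease", "UMLS", "EXAMPLE-ID-2")
rel = Relation(gene, "associate", disease,
               "GENE_A is associated with pancreatic cancer.",
               "https://example.org/article/1")
print(external_ids(rel))
```

Keeping the evidence sentence and source link on the relation itself is one simple way to let a search interface show where each extracted fact came from.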
Structural knowledge concerns structuring and organizing text so that the discourse role of each sentence in an article can be easily identified. For each sentence, we recognize section information such as Background or Conclusion. This knowledge can be used as a framework for identifying further knowledge, in our case procedural knowledge.
Figure 3 Structural knowledge
Procedural knowledge has been considered knowledge of how to do something, or knowledge of skills. It is a more sophisticated version of structural knowledge. Procedural knowledge indicates activities and processes related to R&D. In Figure 4, targets and actions are extracted to describe the process of a motion compensation method.
Figure 4 Procedural knowledge
Table 1 gives the amount of knowledge currently extracted, and Table 2 gives the statistics for the relational knowledge.
Table 1 Statistics on VSB
A smart search system is needed to deliver the knowledge extracted from the literature. Of the three kinds of knowledge, the relational knowledge is used to develop the search system.
Four searching methods are explained below. They include various browsing methods that help users analyze multi-faceted knowledge.
Slide navigation search with automatic query generator
The proposed search system has a familiar user interface (Figure 5). A search starts with a query, and the auto-completion function recommends candidate queries.
Figure 5 First page view of slide navigation search
Figure 6 Slide navigation search view
The search results contain ranked documents, similar to those of common search systems (Figure 6). However, the proposed search system includes three distinguishing functions:
· First, six biomedical named entities are highlighted in the results, and information about a named entity is shown when the mouse pointer is positioned over it. This information contains the corresponding concepts.
· Second, other services are easy to reach by selecting titles and highlighted named entities. Titles link to the web pages of the original PubMed articles. When a named entity is selected, a popup window opens another service, semantic network browsing.
· Third, a query is easily constructed by selecting a named entity or a verb from candidates. The candidates are all entities or event verbs related to the previously selected entity or verb. For the first input query, candidate verbs are listed in the next pane. Once a verb is selected, the next candidate entities are listed in the following pane.
This service can help users search more specifically with more specific queries, and the search results are displayed immediately whenever the query changes, as in the Google Instant search service.
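The pane-by-pane query construction described above can be sketched as a lookup over extracted relations. This is a minimal sketch under stated assumptions: the triples are toy data, the function names are hypothetical, and candidates are matched regardless of whether the selected entity was the subject or the object of the relation, which the original description does not specify.

```python
from collections import defaultdict

# Toy (subject, verb, object) triples standing in for extracted relations.
TRIPLES = [
    ("GENE_A", "associate", "pancreatic cancer"),
    ("GENE_A", "regulate", "GENE_B"),
    ("GENE_B", "associate", "pancreatic cancer"),
    ("DRUG_X", "inhibit", "GENE_A"),
]


def build_index(triples):
    """Index verbs by entity and partner entities by (entity, verb) pair."""
    verbs_by_entity = defaultdict(set)
    entities_by_pair = defaultdict(set)
    for subj, verb, obj in triples:
        # Direction is ignored here: both arguments see the verb and each other.
        verbs_by_entity[subj].add(verb)
        verbs_by_entity[obj].add(verb)
        entities_by_pair[(subj, verb)].add(obj)
        entities_by_pair[(obj, verb)].add(subj)
    return verbs_by_entity, entities_by_pair


def candidate_verbs(index, entity):
    """Verbs to list in the next pane after an entity is selected."""
    return sorted(index[0].get(entity, ()))


def candidate_entities(index, entity, verb):
    """Entities to list in the next pane after a verb is selected."""
    return sorted(index[1].get((entity, verb), ()))


index = build_index(TRIPLES)
print(candidate_verbs(index, "GENE_A"))
print(candidate_entities(index, "pancreatic cancer", "associate"))
```

Precomputing both indexes keeps each pane update to a single dictionary lookup, which suits an interface that refreshes results on every change to the query.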
Semantic network browsing describes relations among named entities. The degree of the network can be extended from one to three, and the output is zoomable. A vertex indicates a named entity and an edge indicates a verb (relation). All named entities in the network carry identifiers from external public databases such as UMLS, UniProt, BioThesaurus, KEGG and DrugBank. Thus, the network involves not only information extracted from texts but also information from external databases. If a vertex is selected, its synonyms are listed by frequency; if an edge is selected, all relations between the two named entities are shown with their evidence sentences. Links to the web pages of the original documents are also provided (Figure 7).
Figure 7 Semantic network browsing view
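Extending the network from degree one to degree three can be read as collecting every entity within that many hops of the selected entity. The sketch below shows this with a breadth-first search over toy relations; the edge data and function names are hypothetical, and treating edges as undirected is an assumption made for the example.

```python
from collections import defaultdict, deque

# Toy (entity, verb, entity) edges; a vertex is an entity, an edge is a verb.
EDGES = [
    ("GENE_A", "associate", "pancreatic cancer"),
    ("GENE_A", "regulate", "GENE_B"),
    ("GENE_B", "bind", "PROTEIN_C"),
    ("PROTEIN_C", "activate", "GENE_D"),
    ("GENE_D", "regulate", "GENE_E"),
]


def build_adjacency(edges):
    """Build an undirected adjacency list that keeps the verb on each edge."""
    adj = defaultdict(list)
    for left, verb, right in edges:
        adj[left].append((verb, right))
        adj[right].append((verb, left))
    return adj


def neighborhood(adj, start, degree):
    """Return {entity: hop distance} for entities within `degree` hops."""
    if not 1 <= degree <= 3:
        raise ValueError("degree must be between 1 and 3")
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if dist[node] == degree:
            continue  # do not expand beyond the requested degree
        for _verb, other in adj.get(node, ()):
            if other not in dist:
                dist[other] = dist[node] + 1
                queue.append(other)
    return dist


adj = build_adjacency(EDGES)
print(neighborhood(adj, "pancreatic cancer", 1))
print(neighborhood(adj, "pancreatic cancer", 3))
```

With this toy data, GENE_D and GENE_E are four and five hops away from the start vertex, so they stay outside even the degree-three view.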
Top 5 is designed to answer questions such as the following: “What is associated with pancreatic cancer?” Such a question is expressed with three template slots: Subject(?), Verb(associate), Object(Pancreatic cancer). For a given query, the results list named entities or relational verbs ranked by co-occurrence frequency (Figure 8). A co-occurrence is a sentence that contains both named entities, or both a named entity and a verb. Results can be shown as a pie chart or a bar chart; the result in Figure 8 uses the pie chart. For the relational verbs, Korean translations are provided for Korean users.
Figure 8 Top5 view
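The frequency ranking behind Top 5 can be approximated by counting sentences in which the query terms co-occur. The following is a minimal sketch, assuming each sentence has already been annotated with its named entities and verbs; the sentence data and function names are hypothetical, and subject and object roles are ignored because the description defines co-occurrence at the sentence level.

```python
from collections import Counter

# Toy annotated sentences: the entities and verbs recognized in each one.
SENTENCES = [
    {"entities": {"GENE_A", "pancreatic cancer"}, "verbs": {"associate"}},
    {"entities": {"GENE_A", "pancreatic cancer"}, "verbs": {"associate"}},
    {"entities": {"GENE_B", "pancreatic cancer"}, "verbs": {"associate", "regulate"}},
    {"entities": {"DRUG_X", "GENE_A"}, "verbs": {"inhibit"}},
]


def top_entities(sentences, entity, verb=None, k=5):
    """Rank entities that co-occur with `entity` (and `verb`, if given)."""
    counts = Counter()
    for sentence in sentences:
        if entity not in sentence["entities"]:
            continue
        if verb is not None and verb not in sentence["verbs"]:
            continue
        counts.update(sentence["entities"] - {entity})
    return counts.most_common(k)


def top_verbs(sentences, entity, k=5):
    """Rank verbs that co-occur with `entity` in the same sentence."""
    counts = Counter()
    for sentence in sentences:
        if entity in sentence["entities"]:
            counts.update(sentence["verbs"])
    return counts.most_common(k)


# Subject(?), Verb(associate), Object(pancreatic cancer)
print(top_entities(SENTENCES, "pancreatic cancer", "associate"))
# Subject(?), Verb(?), Object(pancreatic cancer): which verbs occur with it
print(top_verbs(SENTENCES, "pancreatic cancer"))
```

The resulting (term, count) pairs are in the shape a pie or bar chart needs, so the same ranking can feed either display.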
Find It (Dynamic Search Table) can suggest selected attributes for the selected diseases. A user can fill the rows of the dynamic search table with entities and the columns with verbs. We regard the verbs as attributes of the entities. This service can search for latent relations between two named entities (Figure 9). Currently, the service has four steps:
(1) Input two named entities as a query and click the Search button.
✓ Co-occurring named entities and verbs are retrieved. In other words, the named entities and verbs in the search results appear in a sentence together with the two query entities.
(2) Construct a dynamic table by dragging and dropping named entities and verbs from the results of step (1). Newly selected named entities and verbs are inserted into the column and the row, respectively.
✓ For each newly selected named entity and verb, new co-occurring named entities are searched automatically in the dynamic table.
(3) Select a named entity in the dynamic table generated in step (2); evidence sentences are shown in separate windows.
(4) Select a sentence; the original document is shown.
Figure 9 FindIt view
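The dynamic table can be pictured as a grid whose cells hold the entities that co-occur with a given entity and verb, each with its evidence sentences. The sketch below is one possible reading, not the actual service: the sentence data and function names are hypothetical, entities index the rows and verbs the columns (following the introductory description of the table), and a "latent relation" is taken to mean an entity shared by two rows under the same verb.

```python
# Toy annotated sentences with identifiers that point back to the evidence.
SENTENCES = [
    {"id": 1, "entities": {"DISEASE_X", "GENE_A"}, "verbs": {"associate"}},
    {"id": 2, "entities": {"DISEASE_X", "DRUG_Y"}, "verbs": {"treat"}},
    {"id": 3, "entities": {"DISEASE_Z", "GENE_A"}, "verbs": {"associate"}},
    {"id": 4, "entities": {"DISEASE_Z", "DRUG_Y", "GENE_A"}, "verbs": {"inhibit"}},
]


def build_table(sentences, row_entities, column_verbs):
    """Map each (entity, verb) cell to {co-occurring entity: [sentence ids]}."""
    table = {}
    for entity in row_entities:
        for verb in column_verbs:
            cell = {}
            for sentence in sentences:
                if entity in sentence["entities"] and verb in sentence["verbs"]:
                    for other in sentence["entities"] - {entity}:
                        cell.setdefault(other, []).append(sentence["id"])
            table[(entity, verb)] = cell
    return table


def shared_entities(table, entity_a, entity_b, verb):
    """Entities found in both rows under one verb: a candidate latent link."""
    return sorted(set(table[(entity_a, verb)]) & set(table[(entity_b, verb)]))


table = build_table(SENTENCES, ["DISEASE_X", "DISEASE_Z"], ["associate", "treat"])
print(table[("DISEASE_X", "associate")])
print(shared_entities(table, "DISEASE_X", "DISEASE_Z", "associate"))
```

Storing sentence identifiers in each cell mirrors steps (3) and (4): selecting a cell entry leads to its evidence sentences, and from there to the original document.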
Conclusion and further work
Research on information extraction from literature and on constructing knowledge bases such as Virtual Science Brain has been actively conducted. Information extraction techniques can play an important role in obtaining useful knowledge in almost real time. To increase the value of the knowledge, (1) links to ontologies or other databases and (2) appropriate knowledge delivery systems are needed.
In the proposed approach, an entity ontology and a relation ontology have been constructed, and all entities and their relations have been linked to the ontologies and to identifiers in other databases: UMLS, UniProt, BioThesaurus, KEGG and DrugBank. In addition, various searching and browsing methods for the extracted knowledge are presented.
The proposed approach introduces three types of knowledge and four smart search services, which make effective use of multi-faceted scientific knowledge. We expect that the proposed search system will help researchers choose promising topics and analyze technical trends accurately.
Key benefits of Semantic Web technology
· The value of knowledge from literature can be increased if the knowledge has links to an ontology or other databases.
· Ontology-based data can be easily integrated with other data that is also constructed based on an ontology.
© Copyright 2010-2012, Korea Institute of Science and Technology Information (KISTI)