Semantic Web Use Cases and Case Studies
Use Case: VSB – Virtual Science Brain
Hong-Woo Chun, Chang-Hoo Jeong, Sa-Kwang Song, Yun-Soo Choi, Sung-Pil Choi
KISTI, Korea
Oct. 2012
General Description
Introduction
As the number of published articles, patents, and reports has grown explosively, scientific researchers worry about how to analyze these documents and obtain the information they need. In other words, researchers want to find evidence that establishes the credibility, originality, and justification of their hypotheses. Although the analysis of scientific data is a necessary part of research and development as a whole, it is also a bottleneck.
The reason is that hundreds of thousands of texts need to be analyzed. Such work has mostly been carried out manually by people relying on their background knowledge. The manual method is time-consuming and very expensive, and its results are hard to reuse and share.
To solve the problems of the manual method, many automatic Information Extraction (IE) approaches using Natural Language Processing (NLP) and Text Mining technologies have been proposed. Biomedical named entity recognition and relation extraction for disease-gene associations and protein-protein interactions are examples of IE research. Such IE systems recognize and extract knowledge from a massive body of literature, and the extracted knowledge is accumulated in a knowledge base. To decide on research topics or analyze technical trends, not only IE techniques but also effective search methods for the extracted information are very important. While information extraction has been a popular research topic, search methods for the extracted information have attracted comparatively little attention. They have been overlooked even though various specialized search and browsing methods are needed to present the extracted information appropriately.
The solution
Virtual Science Brain (VSB) is a knowledge base that contains information extracted from a massive body of scientific literature; a smart delivery system is what makes the VSB valuable (Figure 1).
Figure 1 Virtual Science Brain
The VSB contains three kinds of knowledge: relational knowledge, structural knowledge, and procedural knowledge.
Relational knowledge is about the semantic relations between entities in text. Disease-gene associations and protein-protein interactions are examples from the biomedical domain. A named entity recognizer and a relation extractor are used to construct the relational knowledge. To make the results more useful, identifiers from external databases and links to the original data are provided.
Figure 2 Relational knowledge
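As an illustration only, a relational knowledge record of the kind described above could look like the following. This is a minimal sketch, not the VSB implementation: the class names `Entity` and `Relation`, the toy lexicon, the dictionary-lookup recognizer, the relation phrases, and the placeholder identifiers and URL are all assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    text: str
    entity_type: str
    external_ids: tuple = ()  # identifiers in external databases (placeholders here)


@dataclass(frozen=True)
class Relation:
    subject: Entity
    verb: str
    obj: Entity
    evidence: str      # the sentence the relation was found in
    source_url: str    # link back to the original document


# Hypothetical toy lexicon standing in for a real named entity recognizer.
LEXICON = {
    "gene a": Entity("gene A", "Gene", ("EXTERNAL-DB:PLACEHOLDER-1",)),
    "disease x": Entity("disease X", "Disease", ("EXTERNAL-DB:PLACEHOLDER-2",)),
}
# Hypothetical relation phrases standing in for a real relation extractor.
RELATION_PHRASES = ("is associated with", "interacts with")


def extract_relations(sentence, source_url):
    """Return relations whose subject is left and object is right of a phrase."""
    lowered = sentence.lower()
    relations = []
    for phrase in RELATION_PHRASES:
        position = lowered.find(phrase)
        if position < 0:
            continue
        left = lowered[:position]
        right = lowered[position + len(phrase):]
        subjects = [e for key, e in LEXICON.items() if key in left]
        objects = [e for key, e in LEXICON.items() if key in right]
        relations.extend(
            Relation(s, phrase, o, sentence, source_url)
            for s in subjects
            for o in objects
        )
    return relations


relations = extract_relations(
    "Gene A is associated with disease X.", "https://example.org/articles/1"
)
```

A real recognizer and extractor would replace the dictionary lookup, but the resulting record would still carry the pieces the text mentions: two entities, the relation between them, external database identifiers, and a link to the original data.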
Structural knowledge is about structuring and organizing text so that the discourse role of each sentence in an article can be identified easily. For each sentence, we recognize the section it belongs to, such as Background or Conclusion. This knowledge can be used as a framework for identifying further knowledge, which in our case is procedural knowledge.
Figure 3 Structural knowledge
Procedural knowledge is generally considered to be knowledge of how to do something, or knowledge of skills. It is a more sophisticated version of structural knowledge and describes activities and processes related to R&D. In Figure 4, targets and actions are extracted to describe the process of a motion compensation method.
Figure 4 Procedural knowledge
Table 1 shows the amount of knowledge extracted so far, and Table 2 gives statistics for the relational knowledge.
Table 1 Statistics on VSB
| Relational Knowledge | 5,329,235 relations |
| Structural Knowledge | 935 abstracts |
| Procedural Knowledge | 1,309 articles |
A smart search system is needed to deliver the knowledge extracted from the literature. Of the three kinds of knowledge, the relational knowledge is used to develop the search system.
Four search methods are explained here. They include various browsing methods that help users analyze multi-faceted knowledge.
Slide navigation search with automatic query generator: The proposed search system has a familiar user interface (Figure 5). A search starts with a query, and the auto-completion function recommends candidate queries.
Figure 5 First page view of slide navigation search
Figure 6 Slide navigation search view
Search results contain ranked documents, similar to those of common search systems (Figure 6). However, the proposed search system adds three distinguishing functions:
· First, six biomedical named entities are highlighted in the results, and information about a named entity is shown when the mouse pointer hovers over it. This information includes the corresponding concepts.
· Second, other services are easy to reach by selecting titles and highlighted named entities. Titles link to the web pages of the original PubMed articles. When a named entity is selected, a popup window opens another service, semantic network browsing.
· Third, a query is easily constructed by selecting a named entity or a verb from a list of candidates. The candidates are all entities or event verbs related to the previously selected entity or verb. For the first input query, candidate verbs are listed in the next pane. Once a verb is selected, the next candidate entities are listed in the following pane.
This service can help with more specific searches using more specific queries, and the search results are updated immediately whenever the query changes, much like Google's instant search.
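The pane-by-pane query construction described above can be sketched as a lookup over extracted relations. This is a hedged illustration under stated assumptions: the triples, the function names `build_index`, `candidate_verbs`, and `candidate_entities`, and the in-memory index are hypothetical and do not describe how the VSB service is built.

```python
from collections import defaultdict

# Hypothetical (subject, verb, object) triples standing in for the knowledge base.
TRIPLES = [
    ("gene A", "activate", "protein B"),
    ("gene A", "inhibit", "protein C"),
    ("gene A", "activate", "protein D"),
    ("protein B", "bind", "protein C"),
]


def build_index(triples):
    """Index verbs by entity, and objects by (entity, verb) pair."""
    verbs_by_entity = defaultdict(set)
    objects_by_pair = defaultdict(set)
    for subject, verb, obj in triples:
        verbs_by_entity[subject].add(verb)
        objects_by_pair[(subject, verb)].add(obj)
    return verbs_by_entity, objects_by_pair


def candidate_verbs(index, entity):
    """Verbs to list in the next pane after an entity is selected."""
    verbs_by_entity, _ = index
    return sorted(verbs_by_entity.get(entity, ()))


def candidate_entities(index, entity, verb):
    """Entities to list in the next pane after a verb is selected."""
    _, objects_by_pair = index
    return sorted(objects_by_pair.get((entity, verb), ()))


index = build_index(TRIPLES)
next_verbs = candidate_verbs(index, "gene A")
next_entities = candidate_entities(index, "gene A", "activate")
```

Each selection narrows the query by one step, so the result list can be refreshed after every click with a single dictionary lookup.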
Semantic network browsing describes relations among named entities. The degree of the network can be extended from one to three, and the output is zoomable. A vertex represents a named entity and an edge represents a verb (relation). All named entities in the network carry the identifiers of external public databases such as UMLS, UniProt, BioThesaurus, KEGG and DrugBank. Thus, the network includes not only information extracted from texts but also information from external databases. If a vertex is selected, its synonyms are listed in order of frequency, and if an edge is selected, all relations between the two named entities are shown together with their evidence sentences. Links to the web pages of the original documents are also provided (Figure 7).
Figure 7 Semantic network browsing view
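Extending the network from degree one to three can be pictured as a breadth-first expansion around a selected entity. The sketch below is an assumption-laden illustration, not the VSB code: the function name `expand_network`, the toy edges, and the choice to treat relations as undirected for browsing are all hypothetical.

```python
from collections import deque


def expand_network(edges, start, degree):
    """Return the vertices and edges reachable within `degree` hops of `start`.

    `edges` is a list of (entity, verb, entity) triples. Relations are treated
    as undirected here, which is an assumption made for this example.
    """
    if not 1 <= degree <= 3:
        raise ValueError("degree must be between 1 and 3")

    adjacency = {}
    for left, _verb, right in edges:
        adjacency.setdefault(left, []).append(right)
        adjacency.setdefault(right, []).append(left)

    # Breadth-first search that stops expanding at the requested degree.
    distance = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if distance[node] == degree:
            continue
        for neighbour in adjacency.get(node, []):
            if neighbour not in distance:
                distance[neighbour] = distance[node] + 1
                queue.append(neighbour)

    # Keep an edge only if it lies on a path of at most `degree` hops.
    kept = [
        (left, verb, right)
        for left, verb, right in edges
        if left in distance
        and right in distance
        and min(distance[left], distance[right]) < degree
    ]
    return set(distance), kept


EDGES = [
    ("A", "bind", "B"),
    ("B", "activate", "C"),
    ("C", "inhibit", "D"),
    ("D", "bind", "E"),
]
vertices_2, edges_2 = expand_network(EDGES, "A", 2)
vertices_3, edges_3 = expand_network(EDGES, "A", 3)
```

Capping the degree keeps the displayed graph small enough to browse, which is presumably why the service limits expansion to three hops.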
Top 5 is developed to answer questions such as the following: "What is associated with Pancreatic cancer?" This question can be expressed with the following three template slots: Subject(?), Verb(associate), Object(Pancreatic cancer). For a given query, the results list named entities or relational verbs ranked by co-occurrence frequency (Figure 8). A co-occurrence is a sentence that contains either two named entities, or a named entity and a verb. The results can be shown as a pie chart or a bar chart; Figure 8 uses the pie chart. For the relational verbs, Korean translations are provided for Korean users.
Figure 8 Top5 view
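The template query and frequency ranking behind Top 5 can be sketched with sentence-level co-occurrence counts. This is a simplified illustration under assumptions: the annotated sentences, the function names `top_subjects` and `top_verbs`, and the alphabetical tie-breaking are hypothetical, and the sketch ignores which entity is the grammatical subject or object.

```python
from collections import Counter

# Hypothetical annotated sentences: the entities and verbs found in each one.
SENTENCES = [
    {"entities": ["gene A", "disease X"], "verbs": ["associate"]},
    {"entities": ["gene A", "disease X", "gene B"], "verbs": ["associate"]},
    {"entities": ["gene C", "disease X"], "verbs": ["inhibit"]},
    {"entities": ["gene B", "disease Y"], "verbs": ["associate"]},
]


def rank(counts, n):
    """Sort by descending count, then by name so that ties are deterministic."""
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))[:n]


def top_subjects(sentences, verb, obj, n=5):
    """Fill the Subject(?) slot for a fixed verb and object."""
    counts = Counter()
    for sentence in sentences:
        if obj in sentence["entities"] and verb in sentence["verbs"]:
            counts.update(e for e in set(sentence["entities"]) if e != obj)
    return rank(counts, n)


def top_verbs(sentences, entity, n=5):
    """Rank the verbs that co-occur with an entity in the same sentence."""
    counts = Counter()
    for sentence in sentences:
        if entity in sentence["entities"]:
            counts.update(set(sentence["verbs"]))
    return rank(counts, n)


subjects = top_subjects(SENTENCES, verb="associate", obj="disease X")
verbs = top_verbs(SENTENCES, "disease X")
```

The ranked pairs map directly onto the slices of a pie chart or the bars of a bar chart.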
Find It (Dynamic Search Table) can suggest selected attributes for the selected diseases. A user can fill the rows of the dynamic search table with entities and the columns with verbs. We regard the verbs as attributes of the entities. This service can search for latent relations between two named entities (Figure 9). Currently, the service has four steps:
(1) Input two named entities as a query and click the Search button.
    ✓ Co-occurring named entities and verbs are retrieved. In other words, the named entities and verbs in the search results appear in a sentence together with the two query entities.
(2) Construct a dynamic table by dragging and dropping named entities and verbs from the results of step (1). Newly selected named entities and verbs would be inserted to the column and the row, respectively.
    ✓ For each newly selected named entity and verb, new co-occurring named entities are searched automatically in the dynamic table.
(3) Select a named entity in the dynamic table generated in step (2); evidence sentences are shown in separate windows.
(4) Select a sentence to display the original document.
Figure 9 FindIt view
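The four Find It steps can be approximated with a small in-memory example. This is a sketch under stated assumptions rather than the service itself: the sample sentences, the document labels, the function names `cooccurring` and `build_table`, and the layout with entities as rows and verbs as columns are all hypothetical choices made for illustration.

```python
# Hypothetical annotated sentences with a label for the source document.
SENTENCES = [
    {"text": "Gene A activates protein B in disease X.",
     "entities": ["gene A", "protein B", "disease X"],
     "verbs": ["activate"], "doc": "doc-1"},
    {"text": "Gene A inhibits protein C.",
     "entities": ["gene A", "protein C"],
     "verbs": ["inhibit"], "doc": "doc-2"},
    {"text": "Protein B binds protein C in disease X.",
     "entities": ["protein B", "protein C", "disease X"],
     "verbs": ["bind"], "doc": "doc-3"},
]


def cooccurring(sentences, first, second):
    """Step 1: entities and verbs found in sentences containing both inputs."""
    entities, verbs = set(), set()
    for sentence in sentences:
        if first in sentence["entities"] and second in sentence["entities"]:
            entities.update(
                e for e in sentence["entities"] if e not in (first, second)
            )
            verbs.update(sentence["verbs"])
    return sorted(entities), sorted(verbs)


def build_table(sentences, row_entities, column_verbs):
    """Steps 2 to 4: one cell per (entity, verb) with partners and evidence."""
    table = {}
    for entity in row_entities:
        for verb in column_verbs:
            evidence = [
                s for s in sentences
                if entity in s["entities"] and verb in s["verbs"]
            ]
            partners = sorted(
                {e for s in evidence for e in s["entities"] if e != entity}
            )
            table[(entity, verb)] = {
                "entities": partners,                         # cell content
                "evidence": [s["text"] for s in evidence],    # step 3
                "documents": [s["doc"] for s in evidence],    # step 4
            }
    return table


found_entities, found_verbs = cooccurring(SENTENCES, "gene A", "disease X")
table = build_table(SENTENCES, ["gene A", "protein B"], ["activate", "bind"])
```

Keeping the evidence sentences and document labels in each cell is what lets a user move from a table entry to a sentence and then to the original document.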
Conclusion and further work
Research on information extraction from literature and on the construction of knowledge bases such as the Virtual Science Brain has been actively conducted. Information extraction techniques can play an important role in obtaining useful knowledge in almost real time. To increase the value of the knowledge, (1) links to ontologies or other databases and (2) suitable knowledge delivery systems are needed.
In the proposed approach, an entity ontology and a relation ontology have been constructed, and all entities and their relations have been linked to the ontologies and to identifiers in other databases: UMLS, UniProt, BioThesaurus, KEGG and DrugBank. In addition, various search and browsing methods for the extracted knowledge are presented.
The proposed approach introduces three types of knowledge and four smart search services, which together make effective use of multi-faceted scientific knowledge. We expect that the proposed search system will help researchers choose promising topics and analyze technical trends accurately.
Key benefits of Semantic Web technology
· The value of knowledge from literature can be increased if the knowledge has links to an ontology or other databases.
· Ontology-based data can be easily integrated with other data that is also built on an ontology.
© Copyright 2010-2012, Korea Institute of Science and Technology Information (KISTI)