Semantic Web Use Cases and Case Studies
Use Case: InSciTe – Technology Intelligence Service Supporting Decision-Making and Strategic Planning in R&D
Mikyoung Lee, Hanmin Jung, Pyung Kim, Seungwoo Lee, Dongmin Seo, Won-Kyung Sung
To identify emerging technologies or establish R&D strategies, researchers need either to get advice from professional analysts or to analyze a wide range of literature such as academic papers, patents, and reports. Among the commercial tools available today for analyzing technical literature, few offer diverse analytical functions, especially when dealing with large volumes of resources. Therefore, we have developed InSciTe (Intelligence Science & Technology), a technology intelligence service based on Semantic Web and text mining technologies, designed to maximize researchers’ productivity and to support decision-making in R&D. The technology intelligence service collects information on a specific technology, generates the diverse analytical information needed for decision-making, and provides that information to decision makers.
Projects comparable to InSciTe include FUSE (Foresight and Understanding from Scientific Exposition) by IARPA (the Intelligence Advanced Research Projects Activity) of the U.S., and CUBIST (Combining and Uniting Business Intelligence with Semantic Technology) by Sheffield Hallam University of the UK. Their common approach is to combine the explicit metadata used by most services with implicit metadata hidden within text documents, in order to find emerging technologies effectively. These projects are scheduled to start in 2011. FUSE aims to enhance inter-disciplinary, converged competitiveness by managing vast amounts and kinds of literature with a single, established system, while also developing an automated method to support systematic and successive evaluation of technological potential based on information identified in the literature. Meanwhile, CUBIST aims to develop an enhanced Semantic Web search platform that allows business users to better understand large and heterogeneous data.
InSciTe, developed by KISTI somewhat earlier than those projects, is a service system to which Semantic Web and text-mining technologies are applied. The goals of this system are to enhance the value of information use by extracting significant entities and the relations between them from research results, and to support decision-making by converging the extracted results with metadata on a semantic service platform. In addition to an analytical service supporting decision-making, the system features services linked to Semantic Web open sources such as Linked Data, a reasoning verification service that explains semantically inferred results, and a service that automatically generates summary reports on technologies and research agents.
InSciTe is a technology intelligence service to support the establishment of R&D strategy. It analyzes technology, research agents, and research results by using a multi-faceted viewpoint on their relations of competition and cooperation. We designed an ontology for this technology intelligence service so that technology, research agents, and research results can be expressed well in the form of knowledge (Figure 1).
This ontology models a Topic class representing technology; Person, Institution, and Nation classes denoting research agents; and Patent and Article classes representing research results. It consists of 17 classes, 57 datatype properties, and 37 object properties. Currently, the data of InSciTe include 670,000 research results (320,000 papers and 350,000 patents), 490,000 researchers, 90,000 institutions, 237 nations, and 70,000 technologies in the green technology field, which are expressed in some 30 million RDF triples. The ontology, once organized in this manner, is stored, expanded by reasoning, and used in various technology intelligence services.
Figure 1 Ontology Model for Technology Intelligence Service
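As a minimal sketch of how instances of the classes in Figure 1 might be linked, the following Python snippet stores a few statements as subject-predicate-object tuples and follows them from a technology to the nations working on it. The class names follow the text; the `ex:` prefix, the property names (`hasTopic`, `createdBy`, `affiliatedWith`, `locatedIn`), and all instance identifiers are hypothetical and are not taken from the actual InSciTe ontology.

```python
# Illustrative only: class names come from the text (Topic, Person,
# Institution, Nation, Article); property names and instances are hypothetical.
triples = {
    ("ex:article1", "rdf:type", "ex:Article"),
    ("ex:article1", "ex:hasTopic", "ex:fuel_cell"),
    ("ex:article1", "ex:createdBy", "ex:person1"),
    ("ex:fuel_cell", "rdf:type", "ex:Topic"),
    ("ex:person1", "rdf:type", "ex:Person"),
    ("ex:person1", "ex:affiliatedWith", "ex:inst1"),
    ("ex:inst1", "rdf:type", "ex:Institution"),
    ("ex:inst1", "ex:locatedIn", "ex:nation1"),
    ("ex:nation1", "rdf:type", "ex:Nation"),
}


def objects(subject, predicate):
    """Return every o such that (subject, predicate, o) is stated."""
    return {o for s, p, o in triples if s == subject and p == predicate}


def nations_for_topic(topic):
    """Follow Article -> Person -> Institution -> Nation links for a topic."""
    nations = set()
    articles = {s for s, p, o in triples if p == "ex:hasTopic" and o == topic}
    for article in articles:
        for person in objects(article, "ex:createdBy"):
            for inst in objects(person, "ex:affiliatedWith"):
                nations |= objects(inst, "ex:locatedIn")
    return nations
```

Calling `nations_for_topic("ex:fuel_cell")` walks the same kind of path that a multi-faceted analysis would: from a technology, through research results and research agents, to a nation.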
InSciTe utilizes Semantic Web and text mining technologies in order to provide users with not only metadata but also new information obtained by combining the metadata with implicit data hidden within texts. InSciTe’s process is as follows: it extracts information from research results by using text-mining technologies, converts this information into Semantic Web data, loads it onto the system, and offers various information analysis services from multiple viewpoints. We extracted relations between different technologies within texts using text mining and applied this information to the service. We also utilize Semantic Web technologies, which allow easy linkage between different pieces of information as well as reasoning, so that services can be created from various viewpoints.
The system architecture of InSciTe is shown in Figure 2. Metadata, including bibliographies, are converted into semantic data through OntoURI, a tool for semantic knowledge management, and loaded onto OntoFrame, a Semantic Web platform. Meanwhile, the original text is processed by Scientific INtelligent DIscovery (SINDI) to extract the technologies mentioned in the text and the relations between them; these results are also converted into semantic data and loaded onto OntoFrame. The semantic data converted in this manner are utilized in InSciTe’s analysis service.
Figure 2 System Architecture of Technology Intelligence Service
OntoReasoner is responsible for data loading and reasoning in OntoFrame. Its performance is shown in Figure 3. OntoReasoner’s query performance was evaluated using LUBM(16000), which has more than 2 billion RDF triples. The results were sound and complete, so we focused on query response time, which was measured by two criteria: elapsed time and query time per results (QTPR). The latter is calculated by dividing the elapsed time by the number of results. Figure 3 shows the detailed evaluation results for each query.
Figure 3 Query Time Per Results (QTPR) for the 14 queries on LUBM(16000)
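The QTPR measure defined above is a simple ratio. The sketch below computes it for two queries; the query names and measurements are made up for illustration and are not values from Figure 3.

```python
def qtpr(elapsed_seconds, result_count):
    """Query time per results: elapsed time divided by the number of results."""
    if result_count <= 0:
        raise ValueError("QTPR is undefined for a query with no results")
    return elapsed_seconds / result_count


# Made-up measurements: query name -> (elapsed seconds, number of results).
measurements = {"Q_a": (2.0, 4), "Q_b": (0.5, 1000)}
per_result = {name: qtpr(t, n) for name, (t, n) in measurements.items()}
```

Normalizing by the number of results makes queries with very different result set sizes comparable: here `Q_a` takes longer per result than `Q_b`, even though both finish within a few seconds.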
Focusing on core entities such as research results, technologies, and research agents, InSciTe provides information on trends of technologies and research agents analyzed from various viewpoints, and on relationships of competition and cooperation determined by whether or not the research agents have collaborated with one another. Centering on a specific technology searched by a user, related technologies, research agents (researchers, institutions, nations), and research results (papers and patents) are inter-linked and converged with external information linked to them by URI. This generates various analytical information, which in turn is shown to the user through an appropriate form of visualization.
As shown in Figure 4, the results are displayed on a page consisting largely of a technology/research agent map service, a technology trends service, a research agent cooperation network, and a technology/research agent summary report, centering on the technology entered by a user. If a user clicks on a technology name or research agent shown in the representative service on the left, the basic analysis service (research results trend, new entrant research agents, etc.) for the selected entity is displayed on the right. The basic analytical service for research agents additionally offers various information using external links to Semantic Web open sources, including Linked Data (Figure 5). Meanwhile, the service that generates analytical results acquired by reasoning provides a verification function so that the user can be confident in the results of reasoning (Figure 6).
Figure 4 displays the main page of the InSciTe service when the searched technology is “fuel cell.” By using the tabs at the bottom of the page, the user can move to one of the four main service pages. The currently active service is the Agent/Technology Map service, which is shown on the left, while the right side displays the basic analytical service for the chosen technology or research agent. On this page, “fuel cell” is selected and the right side displays research results trends and new entrant research agents as the basic service for “fuel cell.” The user can reach other services, including the technology network service and the research results list, by using the up/down arrows.
Figure 5 The basic information provided by the research agents service of InSciTe is linked to Linked Data sources such as Geonames’ country information and DBpedia’s institution information.
Figure 6 A service to verify the results of reasoning involving analytical results of major research areas and collaborative research nations of “Japan,” a major research nation of “fuel cell”
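The idea behind such a verification service can be sketched as an inference step that keeps the facts supporting each conclusion. The snippet below is only an illustration under assumed data: the rule ("two nations collaborate if they share a paper"), the paper identifiers, and the partner nations are hypothetical and do not describe InSciTe's actual rules or results.

```python
# Hypothetical facts: each paper lists the nations of its authors.
paper_nations = {
    "paper1": {"Japan", "Korea"},
    "paper2": {"Japan", "Korea", "Germany"},
    "paper3": {"Japan"},
}


def collaborating_nations(nation, paper_nations):
    """Infer collaborators of `nation`, keeping the papers that justify each one."""
    evidence = {}
    for paper, nations in sorted(paper_nations.items()):
        if nation in nations:
            for other in nations - {nation}:
                evidence.setdefault(other, []).append(paper)
    return evidence


explanation = collaborating_nations("Japan", paper_nations)
```

Each inferred collaborator in `explanation` is paired with the list of papers it was derived from, which is the kind of trace a user would need in order to check an inferred result.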
We now look at a few representative services of InSciTe with their screenshots below.
• Agent-Technology Map Service (Figure 7): The technology entered by a user and its related technologies are retrieved, and the major agents for those technologies are obtained. The results are then plotted with related technologies on the X axis and research agents on the Y axis, as circles of varying sizes and colors. The more results there are, the larger the circle, and circles of the same color represent research agents who have performed collaborative research. Circles have different saturation depending on the ratio of research to projects comprising research results. This service allows users to understand which related technologies the main research agents of the searched technology focus on, as well as the competition/cooperation relations and research/project dependency between the research agents.
Figure 7 Agent/Technology Map Service (The screen on the left is for relevant technologies and competition/cooperation map of major research institutions for the “fuel cell” technology, whereas the screen on the right is for relevant technologies and papers/patents map of major research nations for the “global warming” technology)
• Technology Trends Service (Figure 8): The research results of the searched technology are divided into papers and patents, and a technology trend graph is generated using the annual number of results. When the graph is close to the X axis, which represents the number of papers, the technology is considered more research-centric, whereas when it is close to the Y axis, which represents the number of patents, the technology is considered more business-centric. Years are shown as nodes, so observing how the graph moves from year to year helps predict the given technology’s level of development. Through this service, a user can also learn about the technology’s characteristics and its research maturity from the shape of the annual graph.
Figure 8 Technology Trends Service (The “fuel cell” technology trends graph on the left is business-centric with more patents, whereas the “global warming” technology on the right is research-centric with more papers.)
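The reading of the trend graph described above can be sketched as follows. Each year becomes a node with the number of papers as its X coordinate and the number of patents as its Y coordinate. Comparing the two counts against the diagonal is an assumption of this sketch, and the yearly counts are made up; the text does not state the exact rule InSciTe uses.

```python
def classify_year(papers, patents):
    """Label a yearly node by which axis it lies closer to.

    Papers are on the X axis and patents on the Y axis, as in the text.
    Using the diagonal as the dividing line is an assumption of this sketch.
    """
    if papers > patents:
        return "research-centric"
    if patents > papers:
        return "business-centric"
    return "balanced"


def trend(yearly_counts):
    """Turn {year: (papers, patents)} into a year-ordered list of labelled nodes."""
    return [
        (year, papers, patents, classify_year(papers, patents))
        for year, (papers, patents) in sorted(yearly_counts.items())
    ]


# Made-up counts for illustration only.
nodes = trend({2008: (10, 4), 2009: (12, 12), 2010: (9, 20)})
```

In this made-up series the nodes drift from the paper axis toward the patent axis over three years, the kind of movement the service lets a user observe.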
• Agent Network Service (Figure 9): The relations of competition/cooperation among research agents are identified through collaborative groups of the research agents. The collaborative relations of all research agents involved in the research of the given technology are grouped and expressed with circles. The greater the number of research agents comprising a group, the larger the circle grows, and the less joint research is performed, the greater the number of circles becomes. The size and number of circles indicate whether a single, representative research agent is dominating the market for the given technology, or multiple research agents are cooperating for balanced development of the technology. The network on the right in Figure 9 is a cooperation network within a group when a single research group is selected. A user can see the relevance between different research agents comprising a group through their cooperation network.
Figure 9 Agent Network Service (Left: Agent Research Group, Right: Agent Research Network)
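One way to form such collaborative groups is to treat research agents as nodes, collaborations as edges, and take the connected components of the resulting graph. The sketch below does this with a small union-find structure; it is an assumed approach rather than the documented InSciTe algorithm, and the agent names and collaborations are made up.

```python
def collaboration_groups(agents, collaborations):
    """Group agents into connected components of the collaboration graph."""
    parent = {a: a for a in agents}

    def find(a):
        # Walk up to the representative, shortening the path as we go.
        while parent[a] != a:
            parent[a] = parent[parent[a]]
            a = parent[a]
        return a

    for a, b in collaborations:
        parent[find(a)] = find(b)

    groups = {}
    for a in agents:
        groups.setdefault(find(a), set()).add(a)
    # Largest group first; group size corresponds to circle size.
    return sorted(groups.values(), key=len, reverse=True)


# Made-up agents and collaborations for illustration only.
agents = ["A", "B", "C", "D", "E"]
collaborations = [("A", "B"), ("B", "C")]
groups = collaboration_groups(agents, collaborations)
```

Here three agents form one large group and two agents stand alone, giving three circles. With no collaborations at all, every agent would be its own circle, matching the observation that less joint research produces more circles.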
• Technology/Research Agent Summary Report (Figure 10): Based on the results related to a technology, three levels of technological maturity (initial, growth, and maturation stages) are defined, and patterns for each stage are automatically produced in a report format using an algorithm. In other words, the aforementioned analytical services are expressed as simple sentences and graphs in a report. The report describes the technology’s trends by determining its level of maturity from the yearly growth trend and the types of research outcomes, while also offering information on its related technologies and their main research agents. Reports on research agents, meanwhile, are provided through links. All reports can be downloaded in MS Word format.
Figure 10 Technology/Agent Summary Reports (Left: Technology Report, Right: Institution Report)
Conclusion and further work
Unlike existing services, InSciTe, a technology intelligence service, provides multidimensional, diverse analytical information through the information linking and convergence functions of Semantic Web technology. In addition, it features enhanced information accessibility through external connection with Linked Data, a verification service for rule-based reasoning, and a function to automatically generate summary reports. KISTI plans to expand this service to all types of research results (various forms of research results in addition to papers and patents) in all fields, while adding new analytical services in various combinations through new connection and convergence of data, and creating prediction-based services. Along with this, InSciTe will evolve into a true Semantic Web-based service through connection with external open Semantic Web data such as LOD (Linked Open Data).
Key benefits of Semantic Web technology
• Academic literature data such as papers and patents are processed as an ontology, allowing free logical convergence and separation of two types of literature with heterogeneous data structures, as well as connection of services between heterogeneous data.
• Easy connection between individual services based on Semantic Web technologies allows comparison not only in a 2D matrix but also multidimensional comparison centering on three core entities (research results, research agents, and technologies), enabling the development of advanced analysis services.
• A “reasoning verification service” is provided to allow users to check stages of reasoning and actual analysis results, thereby enhancing the reliability of rule-based reasoning services.
• Information accessibility is enhanced through connection with Semantic Web open sources using URI references. If the service cannot be provided within the existing data, the scope of information provided by the service is expanded by using external open sources.
© Copyright 2005-2011, Korea Institute of Science and Technology Information (KISTI)