1. BioIE: Extraction of Biological Interactions

BioIE is a novel system that extracts biological interactions, such as protein-protein interactions, from the rapidly growing body of biomedical literature in online resources such as MEDLINE, and annotates the extracted information with terms from biomedical ontologies such as the Gene Ontology. It achieves both quality and diversity in the extracted information by examining the grammatical functions of each interaction's arguments with Combinatory Categorial Grammar (CCG), and by accepting a wide variety of interaction terms as keywords.
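
The idea of recovering an interaction's arguments from their grammatical functions can be illustrated with a toy CCG chart parser. This is a minimal sketch, not BioIE's grammar: the keyword set, the single `(S\NP)/NP` category given to every keyword, and the rule that every other token is a protein-name NP are simplifying assumptions, and the protein names are placeholders. Forward application consumes the object (the target) and backward application consumes the subject (the agent), so the roles are read off the derivation.

```python
from dataclasses import dataclass
from typing import Optional, Tuple, Union


@dataclass(frozen=True)
class Functor:
    """A CCG functor category: '/' takes its argument from the right, '\\' from the left."""
    result: "Cat"
    slash: str
    arg: "Cat"


Cat = Union[str, Functor]
NP, S = "NP", "S"
TRANSITIVE = Functor(Functor(S, "\\", NP), "/", NP)  # (S\NP)/NP

# Hypothetical toy lexicon: interaction keywords, all treated as transitive verbs.
KEYWORDS = {"activates", "inhibits", "phosphorylates", "binds"}


@dataclass(frozen=True)
class Item:
    cat: Cat
    head: str                       # the keyword (or the protein name for an NP)
    args: Tuple[str, ...] = ()      # arguments consumed so far, in order


def forward(left: Item, right: Item) -> Optional[Item]:
    """Forward application: X/Y  Y  =>  X."""
    c = left.cat
    if isinstance(c, Functor) and c.slash == "/" and c.arg == right.cat:
        return Item(c.result, left.head, left.args + (right.head,))
    return None


def backward(left: Item, right: Item) -> Optional[Item]:
    """Backward application: Y  X\\Y  =>  X."""
    c = right.cat
    if isinstance(c, Functor) and c.slash == "\\" and c.arg == left.cat:
        return Item(c.result, right.head, right.args + (left.head,))
    return None


def parse(tokens):
    """CKY-style chart parse; returns whole-sentence items with category S."""
    n = len(tokens)
    chart = [[[] for _ in range(n + 1)] for _ in range(n + 1)]
    for i, tok in enumerate(tokens):
        # Toy simplification: every non-keyword token is a protein-name NP.
        chart[i][i + 1].append(Item(TRANSITIVE if tok in KEYWORDS else NP, tok))
    for width in range(2, n + 1):
        for i in range(n - width + 1):
            j = i + width
            for k in range(i + 1, j):
                for left in chart[i][k]:
                    for right in chart[k][j]:
                        for item in (forward(left, right), backward(left, right)):
                            if item is not None:
                                chart[i][j].append(item)
    return [item for item in chart[0][n] if item.cat == S]


def extract(sentence):
    """Return (agent, keyword, target) triples found in one sentence."""
    triples = []
    for item in parse(sentence.rstrip(".").split()):
        target, agent = item.args  # the object is consumed first, the subject second
        triples.append((agent, item.head, target))
    return triples


print(extract("ProteinA phosphorylates ProteinB."))
# [('ProteinA', 'phosphorylates', 'ProteinB')]
```

A real grammar would also assign categories to passives, modifiers, and coordination, which is where grammatical functions, rather than word order, become decisive.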

2. AutoGO: Automatic Construction of Gene and Protein Ontologies

Our research group is interested in automatically extending the Gene Ontology (GO) with hierarchy and pathway information extracted from biomedical literature by information extraction systems such as BioIE. AutoGO was developed to validate resources that have been extended automatically by such systems and to integrate existing bio-resources consistently.
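
One concrete form of validation is a structural check: because GO's is_a hierarchy is a directed acyclic graph, an automatically extracted relation that would introduce a cycle must be wrong, and one already implied by transitivity adds nothing new. The sketch below applies only that check; AutoGO's actual validation criteria are not described here, and the class name and term names are hypothetical.

```python
class IsAHierarchy:
    """Toy is_a hierarchy (child -> set of parents); GO-style hierarchies must stay acyclic."""

    def __init__(self):
        self.parents = {}

    def is_ancestor(self, ancestor, term):
        """True if `ancestor` equals `term` or is reachable from it by following is_a links upward."""
        stack, seen = [term], set()
        while stack:
            node = stack.pop()
            if node == ancestor:
                return True
            if node not in seen:
                seen.add(node)
                stack.extend(self.parents.get(node, ()))
        return False

    def add_is_a(self, child, parent):
        """Validate one extracted 'child is_a parent' relation before adding it."""
        if self.is_ancestor(child, parent):
            return "rejected: cycle"       # the parent already lies below the child
        if self.is_ancestor(parent, child):
            return "skipped: implied"      # already entailed by transitivity
        self.parents.setdefault(child, set()).add(parent)
        return "added"


hierarchy = IsAHierarchy()
for child, parent in [("termB", "termA"), ("termC", "termB"), ("termC", "termA"), ("termA", "termC")]:
    print(child, "is_a", parent, "->", hierarchy.add_is_a(child, parent))
```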

3. BiopathwayBuilder: Visualization of Inferred Gene and Protein Networks

To gain a full understanding of a biological process, we must be able to augment the known molecular interactions with newly discovered knowledge. We believe that a visualization system is an effective means of accomplishing this since, among other benefits, it gives an intuitive view of the necessary information. However, previously reported implementations face further problems: (1) the amount of information is not only enormous but also growing rapidly, which makes scalability and elision essential properties; (2) the available information is not only incomplete but also unreliable; and (3) typical information in this field, such as protein modification, is inherently complex, which makes it very difficult to make the resulting visualization intuitive enough for end users as well as field experts. We address all of these problems with a 3D visualization system.
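
Problems (1) and (2) can be handled together by drawing only a filtered neighborhood of the protein the user is looking at. This is a minimal sketch under our own assumptions, not BiopathwayBuilder's method: each interaction carries a numeric confidence score, low-confidence edges are hidden, and the view is limited to proteins within a fixed number of interaction steps of a focus protein. The function name, parameters, and protein IDs are hypothetical, and 3D layout is not shown.

```python
from collections import deque


def elide(edges, focus, max_depth=1, min_conf=0.5):
    """Select the sub-network to draw around `focus`.

    edges: iterable of (protein_a, protein_b, confidence) with confidence in [0, 1].
    Interactions below `min_conf` are hidden, and so are proteins more than
    `max_depth` interaction steps away from the focus protein.
    """
    reliable = [(a, b, c) for a, b, c in edges if c >= min_conf]
    neighbors = {}
    for a, b, _ in reliable:
        neighbors.setdefault(a, []).append(b)
        neighbors.setdefault(b, []).append(a)

    # Breadth-first search records each protein's distance from the focus.
    depth = {focus: 0}
    queue = deque([focus])
    while queue:
        node = queue.popleft()
        if depth[node] == max_depth:
            continue
        for nb in neighbors.get(node, []):
            if nb not in depth:
                depth[nb] = depth[node] + 1
                queue.append(nb)

    visible = [(a, b, c) for a, b, c in reliable if a in depth and b in depth]
    return set(depth), visible


network = [("P1", "P2", 0.9), ("P2", "P3", 0.8), ("P1", "P4", 0.2)]
print(elide(network, "P1", max_depth=1))
print(elide(network, "P1", max_depth=2))
```

Raising `max_depth` expands the view on demand, so the cost of drawing grows with what the user asks to see rather than with the size of the whole network.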

4. BioNLQ: Natural Language Query for Heterogeneous Database Access

Our research group studies natural language database interfaces, in which natural language expressions are translated into corresponding expressions in formal database languages using Combinatory Categorial Grammar. In addition to the usual levels of natural language information (syntax, semantics, and discourse), we use an extra level of representation for formal language queries. Supported formal database languages include SQL, OOQL, and CPL. We are particularly interested in providing a unified natural language interface for heterogeneous database access, which is essential in the biomedical domain.
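
The benefit of a separate query-level representation is that one language-neutral structure can be rendered into several formal languages. The sketch below shows only the final step, rendering into SQL; the CCG parse that would produce the structure and the renderers for OOQL and CPL are omitted. The `FormalQuery` class, the `to_sql` function, and the `interaction` table schema are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class FormalQuery:
    """Hypothetical intermediate query representation, independent of the target language."""
    source: str                                   # entity set (table) being queried
    fields: List[str]                             # attributes to return
    conditions: List[Tuple[str, str, object]] = field(default_factory=list)


def to_sql(q: FormalQuery):
    """Render the representation as a parameterized SQL string plus its parameters."""
    sql = f"SELECT {', '.join(q.fields)} FROM {q.source}"
    params = []
    if q.conditions:
        clauses = []
        for attr, op, value in q.conditions:
            clauses.append(f"{attr} {op} ?")
            params.append(value)
        sql += " WHERE " + " AND ".join(clauses)
    return sql, params


# What a parser might produce for "Which proteins interact with ProteinA?"
q = FormalQuery("interaction", ["partner"], [("protein", "=", "ProteinA")])
print(to_sql(q))
# ('SELECT partner FROM interaction WHERE protein = ?', ['ProteinA'])
```

Keeping values as parameters rather than splicing them into the string means user-supplied protein names never alter the query's structure.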

5. BioContrasts: Knowledge Discovery with Protein-Protein Contrasts

Contrasts are effective conceptual vehicles for learning processes such as correcting, highlighting, contrasting, and grouping central concepts. They are therefore useful for exploring the unknown and can provide invaluable insights into, and explanations of, observed phenomena. For example, contrasting proteins in terms of their biological interactions can reveal the similarities, divergences, and relations among them, leading to further insight into their underlying functions. The BioContrasts Database stores protein-protein contrastive information. It currently contains 41,471 protein-protein contrasts automatically extracted from MEDLINE abstracts. Through the web interface on this homepage, users can search for contrastive information about proteins of interest by Swiss-Prot ID or protein name. Users can also explore knowledge discovery with protein-protein contrasts through several user-interface templates.
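
A name-or-ID search of the kind the web interface offers can be sketched as an inverted index from each protein's Swiss-Prot ID to the contrasts that mention it, plus a synonym table that maps names to IDs. The record layout and class names are hypothetical, and the IDs and sentence in the usage example are placeholders rather than entries from the database.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Contrast:
    protein_a: str   # Swiss-Prot ID of one contrasted protein
    protein_b: str   # Swiss-Prot ID of the other
    sentence: str    # sentence the contrast was extracted from


class ContrastIndex:
    """Look up contrasts by Swiss-Prot ID or by protein name."""

    def __init__(self, contrasts, names):
        # names: protein name or synonym -> Swiss-Prot ID (matched case-insensitively)
        self.names = {name.lower(): pid for name, pid in names.items()}
        self.by_id = {}
        for c in contrasts:
            for pid in {c.protein_a, c.protein_b}:
                self.by_id.setdefault(pid, []).append(c)

    def search(self, query):
        # Resolve a name to its ID; otherwise treat the query itself as an ID.
        key = query.strip()
        pid = self.names.get(key.lower(), key)
        return self.by_id.get(pid, [])


contrasts = [Contrast("ID_A", "ID_B", "ProteinA, but not ProteinB, binds ProteinC.")]
index = ContrastIndex(contrasts, {"ProteinA": "ID_A", "ProteinB": "ID_B"})
print(len(index.search("proteina")), len(index.search("ID_B")), len(index.search("ID_Z")))
# 1 1 0
```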

6. Automatic Generation of Gene Summaries

An effective way to grasp a new biological concept is to start with a summary of it. An informative summary gives readers sufficient information, and a coherent summary improves both understanding and memorability. If automatically generated gene summaries achieve both informativeness and coherence, researchers in the biomedical domain will be able to use them actively to acquire specialist knowledge with much greater ease.

7. E3DB: Database for Ubiquitin-Protein Ligases

The ubiquitin-proteasome system plays an important role in a number of diseases. Ubiquitin-protein ligases (E3s) are of particular interest because they determine the targeting specificity of the system. Substrate specificity normally depends on the unique interaction among a particular combination of a ubiquitin-conjugating enzyme (E2), an E3, and a target substrate. Consequently, just as many substrate proteins are targeted by ubiquitination, many corresponding E3s have been discovered in eukaryotes. To help researchers investigate E3 proteins and ubiquitination, we provide an efficient method for identifying proteins involved in ubiquitin-protein ligase activity, and we construct a database that organizes E3-related information, including E3s, substrate proteins, associated proteins, related diseases, and publications. To collect E3-related protein data, we first generate 52 combinations of 13 underlying databases and use these combinations to retrieve and integrate E3 and related data. From these data, we identify 917 distinct proteins, comprising single-component E3s and subunits of multicomponent E3 complexes.
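
The final step, counting distinct proteins across several retrieved result sets, amounts to merging records on a protein identifier. The sketch below assumes each retrieved record has already been reduced to a (protein ID, role) pair; how the 52 database combinations are formed and queried is not described here and is not modeled. The function name, source names, IDs, and complex name are placeholders.

```python
def integrate(sources):
    """Merge E3 records retrieved from several databases into one entry per protein.

    sources: source name -> iterable of (protein_id, role) records, where role is
    "single" for a single-component E3, or otherwise the name of the
    multicomponent E3 complex the protein is a subunit of.
    """
    merged = {}
    for source, records in sources.items():
        for pid, role in records:
            entry = merged.setdefault(pid, {"sources": set(), "single": False, "complexes": set()})
            entry["sources"].add(source)          # keep provenance for every source
            if role == "single":
                entry["single"] = True
            else:
                entry["complexes"].add(role)
    return merged


retrieved = {
    "db1": [("P1", "single"), ("P2", "ComplexX")],
    "db2": [("P1", "single"), ("P3", "ComplexX")],
}
e3s = integrate(retrieved)
print(len(e3s), sorted(e3s["P1"]["sources"]), e3s["P2"]["complexes"])
# 3 ['db1', 'db2'] {'ComplexX'}
```

Keying on the identifier means a protein reported by several databases is counted once, while the set of sources it came from is kept as provenance.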