Digital Intelligence Research Cluster (incorporating
Digital Library, Information Retrieval, Natural Language Processing
& Human Computer Interaction research) |
This research cluster focuses
on developing intelligent text processing and retrieval
technologies, and integrating them into an advanced
search agent system and text mining tool bench. The
search agent system being developed will function as
a meta-search engine to perform intelligent retrieval
in Web search engines, digital libraries and textual
databases, and perform intelligent analysis and presentation
of the information retrieved from these sources. The
text mining tool bench will be developed to be used
by social science researchers to perform computer-assisted
content analysis of text.
Most search engines merely retrieve potentially relevant
documents and display them in rank order, without performing
sophisticated query processing, search strategy development,
analysis of search results, or user modeling. The search
agent to be developed by this research cluster will
have the following capabilities:
1. Query processing and formulation
- Collaborative querying: this technology
seeks to improve a user’s search query by mining
queries submitted by previous users and the search
results retrieved by those queries.
- Knowledge-based query expansion and query categorization:
a knowledge base of keyword-subject heading association
and keyword-subject classification association has
been developed by mining 16 years of Library of Congress
book records. This can now be used to expand a user’s
query with related Library of Congress subject headings
and related keywords, as well as identify the subject
area of a query.
- Expert system for Boolean search strategy formulation:
an expert system for formulating a search strategy
for Boolean retrieval systems was developed earlier.
This can be integrated into a search agent so that it applies
to a larger number of Boolean retrieval systems.
2. Search result processing and mining
- Collaborative filtering: further filtering,
clustering and re-ranking of the documents retrieved, by
mining previous queries submitted by other users and
their search results.
- Multi-document summarization: this summarizes
a set of research abstracts retrieved by a search
engine into a single summary, highlighting common
and unique research concepts, methods and findings
across the abstracts retrieved.
- Information extraction: whereas search
engines retrieve whole documents, information extraction
technology analyzes the text to identify and extract
particular facts, e.g. names of terrorists, treatments
for a particular disease, etc. A technology is being
developed to help users to develop linguistic patterns
for extracting different types of facts.
- Automatic text clustering and categorization:
search engines mainly display a ranked list of documents
retrieved, though more advanced search engines and
search agents cluster and categorize documents into
subject groups. Several types of text categorization
technology are being developed for topical categorization,
genre categorization, sentiment categorization and
document clustering, which can usefully be incorporated
into the search agent to help users to zoom into documents
of interest.
- Link and network analysis: Link and network
analysis technology maps out networks of researchers,
documents and concepts, and can be used to identify
important documents as well as related documents.
3. Search result presentation and interface
design
Several types of human-computer interaction and interface
design studies are being carried out. Specific types
of interfaces being developed include:
- Mobile interfaces for retrieving information on
small-screen mobile devices
- Multi-level interactive interface for displaying
multi-document summaries
- Visualization interface for displaying document/concept
networks and social networks
- Children’s interfaces
- Virtual reality interface
- Information retrieval interfaces for subjective
relevance judgment and processing.
4. User modeling and profiling
Current research in user mental models, subjective
relevance, and children’s information processing
can be used to develop user profiling and personalization
technologies for more personalized information retrieval
and processing in the search agent.
5. Personal information management
Advanced search agents should provide facilities
for the user to archive and manage the documents and
information retrieved. Current research in Web annotation
and digital archiving will develop technologies to
allow users to annotate and archive documents. These
techniques can also help to automatically find and
retrieve related documents on the Internet to
augment the document sets. In addition, text categorization
and link analysis technologies mentioned earlier will
help users to organize the documents archived.
The text mining and digital intelligence technologies
developed in this research cluster are potentially useful
for computer-assisted content analysis in social science
research. We propose to integrate the technologies into
a text mining tool bench with a unified Web interface
tailored for social science researchers. |
| |
|
| Staff
Members |
| • |
A/P Christopher Khoo |
| • |
A/P Dion
Goh |
| • |
Prof Schubert Foo |
| • |
A/P Theng Yin Leng
|
| • |
Ast/P Na Jin Cheon |
| • |
Dr Paul Wu Horng Jyh |
| • |
Ast/P Chang Yun Ke |
| • |
A/P Ravi Sharma |
|
| |
|
|
| Research
Projects and Grants |
|
|
Title
of Project: G-Portal: A Digital Library Infrastructure
for Distributed Geospatial Information
Investigators: A/P Lim Ee
Peng (School of Computer Engineering), A/P Dion Goh, A/P
Theng Yin Leng
Funding: SingAREN
Description: G-Portal is an ongoing
digital library project involving the School of Computer Engineering
at Nanyang Technological University and staff at the Division
of Information Studies. The aims of the project include the
identification, classification and organisation of geospatial
and georeferenced content on the Web, and the provision
of digital services such as searching and visualisation.
In addition, authorised users may also contribute resources
so that G-Portal becomes a common environment for knowledge
sharing.
Research areas that this project addresses include: |
| • |
the development
of a reusable software architecture for building
geospatial digital library applications |
| • |
usability issues related to
designing interfaces for access to geospatial information |
| • |
querying of geospatial data |
| • |
classification of geospatial
information |
| • |
knowledge sharing and community
building |
|
|
|
Title
of Project: GeogDL: A Digital Library
for Geography Examination Resources
Investigators:
A/P Dion Goh, A/P Theng Yin Leng, A/P Lim Ee Peng (School
of Computer Engineering)
Funding: SingAREN
Description:
GeogDL is a digital library application built
on top of G-Portal. The aim of this project is to assist
students in revising for the GCE 'O' Level Geography
Examination - an annual national examination conducted
by the Ministry of Education in Singapore. The digital
library contains past-year examination questions and
solutions supplemented with additional geographical
content for students to explore.
Research issues being addressed include: |
| • |
metadata
models for describing educational content |
| • |
user
interface design |
| • |
collaborative
environments for authoring and sharing of information |
|
|
|
Title
of Project: MobiTOP: A System for the
Mobile Tagging of Objects and People.
Investigators: A/P Dion
Goh, A/P Theng Yin Leng, A/P Lim Ee Peng (SCE), Ast/P Sun
Aixin (SCE), A/P Kalyani Chatterjea (NIE), Ast/P Chang Chew
Hung (NIE)
Funding: A*STAR
Description: An
A*STAR funded project to develop techniques for the creation,
management, analysis and discovery of mobile tags, which are
media-rich information applied to real-world objects and people.
Research areas include user profiling, tag modeling and recommendation,
and user interface design. These deliverables will culminate
in the implementation of a mobile tagging system known as
MobiTOP (Mobile Tagging of Objects and People). Working with
pedagogy experts, the system will be deployed and tested in
the context of geography education. The project draws upon
earlier G-Portal work on geospatial data management and visualization.
|
|
|
Title
of Project: Collaborative Querying in Web-based
and Mobile Environments
Investigators: A/P
Dion Goh & Prof Schubert Foo
Funding: NTU AcRF funding
The objectives of the project are to design
and implement tools and techniques to support collaboration
in information retrieval environments. Known also as
collaborative querying, this approach aims to assist
users in formulating queries to meet their information
needs by harnessing other users’ expert knowledge
or search experience. The project will: 1. Develop and
evaluate algorithms for collaborative querying and mining
of query logs using supervised and unsupervised machine
learning techniques; 2. Identify information needs from
query logs; 3. Design and evaluate user interfaces for
collaborative querying. A collaborative querying system
for Web and mobile environments will be implemented,
including a suite of tools for automatic preprocessing
of query logs, mining of queries for collaborative querying,
information retrieval functions, and user interfaces.
User evaluation will also be conducted to ensure that
the system is both useful and usable. |
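As a rough illustration of the query-log mining idea described above, the sketch below recommends past queries whose result sets overlap with those of the current query. The flat (query, result URL) log format, the function names and the Jaccard overlap threshold are assumptions made for this example; they are not details of the project's algorithms.

```python
from collections import defaultdict


def jaccard(a, b):
    """Jaccard overlap between two sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def build_query_index(log):
    """Map each distinct query string to the set of result URLs seen for it.

    `log` is an iterable of (query, result_url) pairs; this flat format is
    a hypothetical stand-in for a real search engine query log.
    """
    index = defaultdict(set)
    for query, url in log:
        index[query.strip().lower()].add(url)
    return index


def recommend_queries(query, index, min_overlap=0.2, top_n=5):
    """Return past queries whose result sets overlap with those of `query`."""
    key = query.strip().lower()
    results = index.get(key, set())
    scored = []
    for other, other_results in index.items():
        if other == key:
            continue
        score = jaccard(results, other_results)
        if score >= min_overlap:
            scored.append((score, other))
    # Highest overlap first; ties broken alphabetically for stable output.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [other for _, other in scored[:top_n]]
```

A user who types "digital library" could then be shown queries that other users submitted and that led to similar documents, which is one simple way of reusing other users' search experience.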
|
|
Title
of Project: An Information Retrieval
Portal
Investigators:
Prof Schubert Foo and A/P Dion Goh
Description:
Currently, information retrieval resources are scattered
across various web sites, making it difficult for researchers
to access them efficiently. In addition, while Java
is fast becoming a popular language among developers,
there are very few web sites offering Java-based information
retrieval source code. This project thus aims to develop
a portal devoted to information retrieval, with emphasis
on source code in the Java programming language.
The project uses a Java-based open source portal solution
named JetSpeed that is part of the Apache project. Consequently,
while the creation of the portal is a major goal, this
project also aims to build a comprehensive, extensible
portal infrastructure based on JetSpeed that is reusable
across various domains. |
| • |
Identifying areas
of improvement in JetSpeed |
| • |
Implementation of a document
publication and review system |
| • |
Development of annotation and
rating/voting systems |
| • |
Development of an extensible
architecture for interfacing with different full
text retrieval engines |
|
|
|
Title
of Project: A Digital Library of
Historical Resources
Investigators:
The Division of Information Studies and National Archives
of Singapore
Description:
This is a Division-wide project that is being conducted
in collaboration with the National Archives of Singapore.
The project seeks to build a Web-based digital library
of Singapore's history, containing historical multimedia
resources obtained from the NAS. Such resources are
broadly categorised into textual documents, images,
audio and video.
In addition to delivering a system for public use, this
project will also utilise these multimedia resources
as a test-bed for conducting exploratory research and
building advanced systems in a variety of areas. These
areas are intentionally broad to leverage on the strengths
of the Division, and include: |
| • |
Digital library
architectures |
| • |
Information organisation and
metadata |
| • |
Information retrieval algorithms
and engines |
| • |
Information exploration environments
|
| • |
Authoring and publishing systems
for user-contributed resources |
| • |
Online exhibitions |
| • |
E-learning systems |
| • |
Usability studies |
|
|
|
Title
of Project: Generating Executable Cognitive User Models.
Investigators: A/P
Theng Yin Leng
To reduce the need for extensive and time-consuming
testing of ubiquitous learning systems with real users, a tool is
being developed to automatically generate executable cognitive
user models that simulate a real user’s behaviour, as
a cost-effective means to rapidly iterate and test system
design and detect usability problems in web-based systems.
Executable cognitive user models are software agents that
simulate real end-users’ behaviour, as well as predict
end-users’ performance. The objectives of the project
are: 1. To investigate the potential of embedding theories
and models of human cognition and artificial intelligence
in a tool for constructing executable cognitive user models;
2. To specify the requirements of such a tool for an effective
and practical evaluation of web-based systems; 3. To determine
how executable cognitive user models can be investigated using
software agent technologies throughout the design process;
and 4. To investigate how executable cognitive user models
can be effectively combined with user testing to achieve the
best results. |
|
|
| Title
of Project: Design and Development of a Suite of Usability
Engineering Tools for Digital Libraries on Mobile Environment
and the Web
Investigators:
A/P Theng Yin Leng & A/P Dion Goh
Funding: NTU AcRF
Description: Institutions are spending millions
of dollars implementing digital libraries (DLs) and Web portals.
However, many studies have found the usability and effectiveness
of current DLs and portals to be poor. Although there has
been some research conducted over the last few years in understanding
user needs of text-based and geospatial DLs, there is little
work done in helping to make the usability evaluation process
of DLs less cumbersome and tedious. Better tools and techniques
are needed to help DL designers evaluate their systems in
ways that will improve usability to enhance users' experience
of DL collections and products. This project investigates
usability engineering techniques, a combination of qualitative
and quantitative techniques, applicable not only to text-based
DLs but also to geospatial DLs, on the Web as well as in
mobile environments. DLs of universities, public libraries
and national libraries have large user populations, in the tens
and hundreds of thousands of users. Improvements in DL design
can have a major organisational, national and international
impact.
Collaborators: Recognising its importance, this proposal
has the support of the NTU library and the National Library
Board (NLB). Two research centres at NTU, Centre for Human
Factors and Ergonomics (CHFE, MPE) and Centre for Advanced
Computer Information Systems (CAIS, SCE), and the University
of Waikato (New Zealand), are internal and external collaborators
working with the project team to exploit the potential of
applying this research to the mobile environment, which is
fast becoming the popular platform for systems delivering
"on-demand" use.
|
|
|
Title
of Project: Bootstrapping a Machine
Translation Dictionary for Cross-Language Information
Retrieval Using A Comparable Corpus
Investigators:
A/P Christopher Khoo & A/P Chan Syin (School of
Computer Engineering)
Description:
In a multilingual information retrieval system (e.g.
multilingual Web search engines), cross-language searching
capability which permits the user to specify queries
in the user's native language but retrieve documents
in other languages is essential. Other researchers have
developed translation dictionaries for cross-language
retrieval by performing statistical analyses of parallel
corpora -- document collections in which each document
in one language has a sentence-by-sentence translation
in a second language. This study aims to develop a method
for constructing a translation dictionary in a situation
where there is no parallel corpus, but there are nevertheless
documents in both languages reporting the same event,
e.g. news articles in different language newspapers
reporting the same event.
This study seeks to develop a method for bootstrapping
an English-Chinese and an English-Malay translation dictionary
using a training sample of manually paired English-Chinese
and English-Malay documents. The system first analyses
the set of manually paired English-Other Language articles
to construct a preliminary translation dictionary, and
then uses this preliminary dictionary to identify other
pairs of English-Other Language articles. It then performs
"self-learning" by analysing these new pairs of articles
to improve its dictionary. Cross-language retrieval
experiments will be carried out to test the effectiveness
of such a dictionary. |
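The bootstrapping loop described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the study's method: documents are assumed to be pre-tokenised lists of words, the Dice coefficient over document-level co-occurrence is a stand-in for whatever statistical analysis the study uses, and the function names, threshold and number of rounds are hypothetical.

```python
from collections import Counter
from itertools import product


def build_dictionary(pairs):
    """Estimate word translations from paired documents.

    `pairs` is a list of (english_tokens, other_tokens). Each English word
    is mapped to the other-language word with the highest Dice coefficient
    over document-level co-occurrence (an assumed scoring choice).
    """
    en_df, other_df, co_df = Counter(), Counter(), Counter()
    for en_tokens, other_tokens in pairs:
        en_set, other_set = set(en_tokens), set(other_tokens)
        en_df.update(en_set)
        other_df.update(other_set)
        co_df.update(product(en_set, other_set))
    best = {}
    # Sorted iteration keeps tie-breaking deterministic.
    for (en, other), co in sorted(co_df.items()):
        dice = 2 * co / (en_df[en] + other_df[other])
        if dice > best.get(en, (0.0, None))[0]:
            best[en] = (dice, other)
    return {en: other for en, (_, other) in best.items()}


def pair_score(en_tokens, other_tokens, dictionary):
    """Fraction of distinct English words whose translation is in the other document."""
    en_set, other_set = set(en_tokens), set(other_tokens)
    if not en_set:
        return 0.0
    hits = sum(1 for word in en_set if dictionary.get(word) in other_set)
    return hits / len(en_set)


def bootstrap(seed_pairs, en_docs, other_docs, threshold=0.5, rounds=2):
    """Grow the dictionary by pairing unpaired documents with the current dictionary."""
    dictionary = build_dictionary(seed_pairs)
    for _ in range(rounds):
        new_pairs = []
        for en_doc in en_docs:
            if not other_docs:
                break
            best = max(other_docs, key=lambda d: pair_score(en_doc, d, dictionary))
            if pair_score(en_doc, best, dictionary) >= threshold:
                new_pairs.append((en_doc, best))
        # "Self-learning" step: rebuild from the seed pairs plus the newly found pairs.
        dictionary = build_dictionary(list(seed_pairs) + new_pairs)
    return dictionary
```

In this sketch a word that never appears in the manually paired articles can still acquire a translation once the preliminary dictionary links its article to a comparable article in the other language.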
|
|
Title
of Project: ACRC Digital Library
Investigators: A/P
Dion Goh & Prof Schubert Foo
Funding: WSCI and NTU Library
The ACRC plans to transform itself into an important
regional hub and one-stop center housing quality resources
in the specialized areas of Media, Communication and Information.
The purpose of the project is to develop a digital library for
the ACRC to host its electronic collection that includes grey
literature, published literature in media, communication and
information, and to use the digital library as a platform
for conducting research in areas such as information retrieval,
information extraction, information organization, data/text
mining, collaborative systems, knowledge sharing, etc. The
digital library will also provide a platform to support knowledge
sharing and publishing by capturing, preserving and communicating
the intellectual output of SCI’s faculty staff and researchers.
Such a DL system can be further exploited to distribute SCI’s
digital works over the Web through a sophisticated search
and retrieval system. Availability and easy accessibility
of ACRC sources for local, regional and international users
would certainly enhance the image of NTU in general, and SCI
in particular.
|
|
Title
of Project: Intelligent Search Agent for Information Extraction
and Synthesis on the Web
Investigators: A/P
Chris Khoo, A/P Dion Goh & A/P Chan Syin (School of Computer
Engineering)
Funding: NTU AcRF
A project to develop a prototype intelligent
search agent that performs information extraction and synthesis
on the Web. Most Web search engines and intelligent search
agents merely identify potentially relevant documents on the
Web without actually extracting the relevant information from
the text of the documents. Information extraction systems
developed so far require large training sets, are usable only
by experts and take a long time to train. The study seeks
to develop an intelligent information extraction system that
can be trained by ordinary users using a small number of examples
to extract relevant information from multiple Web sites and
integrate the information into a multi-document summary to
aid in knowledge discovery and knowledge acquisition.
|
|
Title
of Project: Automatic Multi-document Summarization of
Research Abstracts
Investigators: A/P
Chris Khoo, A/P Dion Goh & Dr Paul Wu
Funding: NTU/SCI RCC
The objective of this study is to develop a
method for automatic summarization of sets of sociology abstracts
that might be retrieved by a digital library system or search
engine in response to a user’s query. The purpose of
the multi-document summarizer is to present an overall summary
of the set of documents, highlighting the important concepts
and relations found in them. The method includes an automatic
analysis of the discourse structure of sociology abstracts,
both at the macro-level (between sentences and sections) and
the micro-level (within sentences). The automatic summarizer
focuses on the extraction of variables and semantic relationships
between variables expressed in the text, and the integration
of the extracted information into a coherent summary.
|
|
Title
of Project: Mining of Disease-Treatment Information in
a Medical Database to Support Evidence-Based Medicine
Investigators: A/P
Chris Khoo, Ast/P Na Jin Cheon & A/P Chan Syin (School
of Computer Engineering)
Funding:
This project seeks to extend automatic information
extraction technology and apply it to the medical domain to
extract disease-treatment information from medical abstracts
to support evidence-based medicine and knowledge discovery.
Current information extraction systems make use of linguistic
patterns and pattern matching to identify the pieces of information
to extract from unstructured text. The extraction patterns
are often constructed automatically by applying a supervised
learning technique on a set of manually annotated training
text. This project seeks to develop a technique to construct
the information extraction patterns without manual annotation
of text by performing text mining, automatic text annotation
and pseudo-supervised learning. The objectives of the project
are:
-
To develop an effective method to mine
information extraction patterns in a medical database
-
To develop a method to construct information
extraction patterns using pseudo-supervised learning and
automated annotation of training text
-
To develop a disease-treatment ontology
to model and represent treatment information found in
medical abstracts, and to summarize the information to
support evidence-based medicine.
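To make the notion of a linguistic extraction pattern concrete, the sketch below applies two hand-written surface patterns to sentences and returns disease-treatment pairs. The patterns, the `extract_pairs` function and the example sentences are hypothetical; the project aims to construct such patterns automatically rather than write them by hand.

```python
import re

# Two hypothetical surface patterns. Entities are limited to at most three
# words, which is a simplification made for this sketch.
PATTERNS = [
    re.compile(
        r"\b(?P<treatment>[\w-]+) (?:is|was) effective in treating "
        r"(?P<disease>[\w-]+(?: [\w-]+){0,2}?)(?=[.,;]| in | among |$)",
        re.IGNORECASE,
    ),
    re.compile(
        r"\bpatients with (?P<disease>[\w-]+(?: [\w-]+){0,2}?) "
        r"(?:were|was) treated with (?P<treatment>[\w-]+)",
        re.IGNORECASE,
    ),
]


def extract_pairs(text):
    """Return (disease, treatment) pairs matched by any pattern, in pattern order."""
    pairs = []
    for pattern in PATTERNS:
        for match in pattern.finditer(text):
            pairs.append((match.group("disease"), match.group("treatment")))
    return pairs
```

Each pattern fixes the wording around two slots, so a match directly yields the facts to extract instead of a whole document.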
|
|
Title of Project:
Automatic Identification of News Frames Using Machine-Learning
Investigators: Ast/P Na
Jin Cheon & A/P Chris Khoo
Funding: NTU/SCI RCC
This project will develop techniques and a software tool
for automatic news frames analysis – automatically
analyzing news articles and categorizing them into one of
several pre-defined news frames. News framing analysis is
a kind of content analysis of news articles to identify
how the news is framed, including the perspective in which
the events are reported, how information is selected and
organized in the news article, and how the information is
expressed. News frames analysis is intellectual work usually
performed by human analysts. The tremendous number of news
articles to be analyzed makes manual news frame categorization
a difficult and tedious task. This project thus seeks to
develop a method for automatic news frame categorization
using machine-learning and text mining techniques.
|
|
Title
of Project: Automatic Sentiment Analysis & Categorization
Investigators: A/P
Chris Khoo & Ast/P Na Jin Cheon
The objective of the project is to develop techniques
for automated or computer-assisted sentiment analysis of various
genres of text. Sentiment refers to a person’s feeling,
emotion or attitude toward a subject, and can cover a variety
of emotional dispositions (e.g. anger, admiration, dislike,
eagerness, etc.). The appraisal theory (Rothery, 1997; Martin,
1995), which is based on the principles of Systemic Functional
Linguistics, is adopted as a framework for the study for its
clear explication of how sentiment is expressed in language.
It divides appraisal into Attitude, Engagement and Graduation,
with Attitude further divided into Affect (emotion), Judgment
(ethical/social evaluation) and Appreciation (aesthetic assessment).
Current work is focused on:
-
automatic categorization of product reviews
into positive (favorable/recommended) versus negative
(unfavorable/not recommended) sentiment
-
development of a sentiment meta-search
engine to identify documents and document snippets reporting
product reviews and categorizing them into positive and
negative reviews
-
automatic sentiment analysis of political
news articles using a framework based on the appraisal
theory.
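For the first item above, a very simple baseline for positive versus negative categorization counts opinion words from a small lexicon. The sketch below is only such a baseline, with a made-up lexicon and a one-word negation rule; it is not the technique developed in the project, and the "neutral" outcome for tied scores is an addition of this example.

```python
import re

# Tiny hypothetical lexicons, for illustration only.
POSITIVE = {"good", "great", "excellent", "recommend", "love", "reliable"}
NEGATIVE = {"bad", "poor", "terrible", "broken", "disappointing", "avoid"}
NEGATORS = {"not", "never", "no"}


def classify_review(text):
    """Label a review 'positive', 'negative' or 'neutral' by counting lexicon hits."""
    tokens = re.findall(r"[a-z']+", text.lower())
    score = 0
    for i, token in enumerate(tokens):
        polarity = (token in POSITIVE) - (token in NEGATIVE)
        # Flip the polarity when the previous word is a negator ("not good").
        if polarity and i > 0 and tokens[i - 1] in NEGATORS:
            polarity = -polarity
        score += polarity
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"
```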
|
|
Title
of Project: Text Annotation and Encoding Tool for Content
and Linguistic Analysis
Investigators: Dr
Paul Wu
Funding: NTU/SCI RCC
The purpose of the project is to develop a Web-based
software tool and graphical interface to enable researchers
to mark-up and annotate text, encode the annotation in an
XML format, store the annotation for further processing, and
display the annotation in a number of visual formats. The
text annotation tool will be designed to be general and powerful
enough to handle most types of content analysis and linguistic
analysis. The tool will handle several independent layers
of annotation, hierarchical annotation (where primitive units
are grouped to form more complex units), and overlapping annotations.
Such an annotation tool will be useful for many types of research
– content analysis, linguistic analysis, text analysis,
creating training documents for text mining and information
extraction, etc. A powerful text annotation tool, grounded
on a good representation formalism, is needed because a deeper
level of content analysis involves a deeper level of linguistic
coding. The validation of content analysis results also requires
evidence presented in linguistic coding.
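One common way to support independent layers and overlapping annotations in XML is stand-off markup, where each annotation records character offsets into the text instead of wrapping it in nested tags. The sketch below illustrates that idea only; the element and attribute names (`document`, `layer`, `span`, `start`, `end`) are assumptions and not the tool's actual format, and hierarchical annotation is not covered.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    layer: str   # e.g. "syntax" or "sentiment"
    label: str   # category assigned by the annotator
    start: int   # character offset, inclusive
    end: int     # character offset, exclusive


def overlaps(a, b):
    """True when two annotations share at least one character."""
    return a.start < b.end and b.start < a.end


def to_standoff_xml(text, annotations):
    """Serialise text plus annotations as stand-off XML, one element per layer."""
    root = ET.Element("document")
    ET.SubElement(root, "text").text = text
    layers = {}
    for ann in sorted(annotations, key=lambda a: (a.layer, a.start, a.end)):
        if not 0 <= ann.start < ann.end <= len(text):
            raise ValueError(f"offsets out of range: {ann}")
        layer_el = layers.get(ann.layer)
        if layer_el is None:
            layer_el = ET.SubElement(root, "layer", name=ann.layer)
            layers[ann.layer] = layer_el
        ET.SubElement(
            layer_el, "span",
            label=ann.label, start=str(ann.start), end=str(ann.end),
        )
    return ET.tostring(root, encoding="unicode")
```

Because spans refer to offsets, two annotations from different layers can cover partly the same characters without producing ill-formed XML.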
|
|
Title
of Project: Web Archiving
Investigators: Dr
Paul Wu
Funding: National Library Board
The Internet has increased the proliferation
of online publications and communities worldwide. Due to the
fragility of the digital medium, new approaches need to be developed
for preserving this material for future generations, a task that is imperative
for capturing a record of contemporary digital culture and
heritage. This project develops a digital repository and a Web
annotation and cataloguing system for Web archives. By applying
an intuitive mechanism, evidence of the subject matter and contextual
information will be captured as metadata. The metadata further
serves as evidence to monitor substantial changes to websites.
In sum, the objectives of the project are twofold:
• Evidence-based cataloguing: Allow users to effectively
catalogue website collections for archival and preservation
purposes, reducing the turn-around time and producing verifiable
catalogue data/metadata,
thus increasing the quality of the catalogue.
• Dynamic web content monitoring: Minimize the manual
effort required to maintain the catalogue data/metadata of
the web archives by automatically verifying and monitoring
the dynamic changes of websites, filtering out
minor changes and raising alerts only for substantive
ones that need to be attended to.
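The second objective can be illustrated with a simple change detector that compares two snapshots of a page and raises an alert only when the difference exceeds a threshold. This is a minimal sketch assuming plain-text snapshots and a word-level similarity from Python's standard `difflib` module; the function names and the 0.2 threshold are hypothetical and not taken from the project.

```python
import difflib
import re


def normalise(page_text):
    """Lower-case and collapse whitespace so cosmetic edits are ignored."""
    return re.sub(r"\s+", " ", page_text).strip().lower()


def change_ratio(old, new):
    """Return a value in [0, 1]: 0 means identical, 1 means nothing in common."""
    old_words = normalise(old).split()
    new_words = normalise(new).split()
    matcher = difflib.SequenceMatcher(None, old_words, new_words, autojunk=False)
    return 1.0 - matcher.ratio()


def is_substantive(old, new, threshold=0.2):
    """Flag a change for the cataloguer only when enough of the text differs."""
    return change_ratio(old, new) >= threshold
```

With this kind of filter, a change in letter case or spacing would not trigger an alert, while a rewritten page would.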
|
|
Postgraduate
Student Projects M.A.Sc.
and Ph.D. Projects |
| • |
Automatic Sentiment
Analysis of News Articles
Student: Armineh Nourbaksh (M.A.Sc. student)
Supervisor: A/P Christopher Khoo & Ast/P Na
Jin Cheon |
| • |
Automatic Information Extraction
and Text Mining in Medical Abstracts
Student: Wang Wei (M.A.Sc. student)
Supervisor: A/P Christopher Khoo & Ast/P Na
Jin Cheon |
| • |
Concept-based Information Retrieval
Student: Yin Ming
Supervisor: A/P Dion Goh and A/P Lim Ee Peng |
|
| |
| Completed
Theses |
| • |
Collaborative Querying through
the Mining of Query Logs
Student: Fu Lin (PhD, 2006)
Supervisor: A/P Dion Goh and Prof Schubert Foo |
| • |
Automatic Multi-Document Summarization
Using a Variable-Based Framework
Student: Ou Shiyan (PhD, June 2006)
Supervisor: A/P Christopher Khoo and A/P Dion Goh |
| • |
An Intelligent Monitoring Service
for Web Monitoring
Student: Tan Bing (M.A.Sc., 2001)
Supervisor: Prof Schubert Foo |
| • |
Chinese Text Segmentation for
Information Retrieval
Student: Li Hui (M.A.Sc., 2000)
Supervisor: Prof Schubert Foo |
| • |
Developing a New Statistical
Method for Chinese Text Segmentation
Student: Dai Yubin (M.A.Sc., 2000)
Supervisor: A/P Christopher Khoo |
| • |
Automatic Extraction of Cause-Effect
Information from Medical Abstracts
Student: Niu Yun (M.A.Sc., 2000)
Supervisor: A/P Chan Syin & A/P Christopher
Khoo |
| • |
Combining Multiple Sources of
Evidence for Information Retrieval
Student: Xi Wensi (M.A.Sc., 2000)
Supervisor: A/P Lim Ee Peng & A/P Christopher
Khoo |
| • |
Enhancing Play-out
Performance for Internet Video Communications.
Student: Yip See Wai (M.Phil., May 1999)
Supervisor: Prof Schubert Foo |
| • |
Chinese Text Retrieval System
Student: Lim Hong Koon (M.Phil., May 1999)
Supervisor: Prof Schubert Foo |
| • |
An Intelligent Web-based Helpdesk
for Customer Service Support
Student: Liu Shigong (M.A.Sc., May 1999)
Supervisor: Prof Schubert Foo |
| • |
Evaluation of Web-Based
Online Catalogue Interfaces : A Cognitive Approach
Student: Cheng Lu (M.A.Sc., May 1999)
Supervisor: A/P Christopher Khoo |
|
|
|