Some Empirical Observations of Semantic Technology

1. Applications validate the importance of ontology in current
semantic approaches. An ontology represents a part of the domain or
real world and captures the shared knowledge around which a semantic
application revolves. It is the “ontological commitment”, reflecting
agreement among the experts who define and use the ontology, that forms
the basis for the “semantic normalization” necessary for semantic
integration.
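The idea of an ontology as a shared vocabulary that applications commit to can be sketched with a handful of triples. This is a minimal illustration; the class and relation names are invented for the example, not drawn from any real ontology:

```python
# A minimal ontology sketch: shared class and relation definitions that
# independent applications can commit to. All names are illustrative.

ontology = {
    # class hierarchy: (subclass, "subClassOf", superclass)
    ("Company", "subClassOf", "Organization"),
    ("Analyst", "subClassOf", "Person"),
    # relation signatures: (relation, domain, range)
    ("worksFor", "Person", "Organization"),
    ("headquarteredIn", "Organization", "City"),
}

def relations_with_domain(onto, cls):
    """Relations whose domain is the given class -- the shared
    'commitment' every consuming application interprets the same way."""
    return sorted(r for (r, d, _rng) in onto
                  if d == cls and r[0].islower())

print(relations_with_domain(ontology, "Person"))  # ['worksFor']
```

Because both the definitions and their intended interpretation are agreed in advance, two applications exchanging `worksFor` facts mean the same thing by them, which is the essence of the normalization described above.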

2. Ontology population is critical. Among the ontologies developed by
Semagix or built using its technology, the median ontology size is over
one million facts. Knowledge capture at this level makes the system very
powerful. Since this is the sort of scale Semantic Web applications will
have to deal with, the means of populating ontologies with instance data
must be automated.
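At million-fact scale, instance data has to be harvested from sources rather than entered by hand. A minimal sketch of automated population, assuming hypothetical semi-structured source records and invented predicate names:

```python
# Sketch of automated ontology population: map semi-structured source
# records onto instance-level facts (subject, predicate, object).
# The record layout and predicate names are hypothetical.

records = [
    {"name": "Acme Corp", "type": "Company", "city": "Boston"},
    {"name": "Jane Doe", "type": "Analyst", "employer": "Acme Corp"},
]

def populate(records):
    facts = set()
    for rec in records:
        subject = rec["name"]
        facts.add((subject, "instanceOf", rec["type"]))
        if "city" in rec:
            facts.add((subject, "headquarteredIn", rec["city"]))
        if "employer" in rec:
            facts.add((subject, "worksFor", rec["employer"]))
    return facts

facts = populate(records)
print(len(facts))  # 4 facts extracted with no manual entry
```

In practice the extractors run against feeds, databases, and crawled pages, but the shape of the pipeline is the same: source record in, ontology facts out.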

3. Two of the most fundamental “semantic” techniques are named entity
recognition and semantic ambiguity resolution (both closely tied to the
data quality problem), and any semantic technology and its applications
depend on them. Without good solutions to these, none of the applications
listed will be of any practical use. For example, an annotation tool is
of little value if it does not support ambiguity resolution. Both require
highly multidisciplinary approaches, borrowing from NLP/lexical analysis,
statistical and IR techniques, and possibly machine learning techniques.
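The simplest statistical flavor of ambiguity resolution is context overlap: among candidate entities sharing a surface name, pick the one whose known context terms best match the mention's surroundings. A toy sketch with made-up entities and context terms:

```python
# Sketch of named-entity disambiguation by context overlap. Candidate
# entities and their context terms are invented for illustration; real
# systems combine lexical, statistical, and IR evidence.

candidates = {
    "Jordan (country)": {"amman", "river", "east", "kingdom"},
    "Jordan (basketball player)": {"bulls", "nba", "chicago", "basketball"},
}

def disambiguate(mention_context, candidates):
    """Return the candidate whose context terms best overlap the mention's."""
    context = set(mention_context.lower().split())
    return max(candidates, key=lambda c: len(candidates[c] & context))

doc = "Jordan scored 40 points for the Bulls in Chicago"
print(disambiguate(doc, candidates))  # Jordan (basketball player)
```

Production-quality resolution layers many more signals (entity popularity, relation evidence from the ontology, co-occurring entities), but they all reduce to scoring candidates against context like this.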

4. Semi-formal ontologies based on limited expressive power are the most
practical and useful. Formal or semi-formal ontologies represented in
very expressive languages (compared to moderately expressive ones) have,
in practice, yielded little value in real-world applications. One reason
for this may be that it is often very difficult to capture the knowledge
that uses the more expressive constructs of a representation language.
This difficulty is especially apparent when trying to populate an ontology
that uses a very expressive language to model a domain. Hence the
additional effort of modeling these constructs for a particular domain is
often not justifiable in terms of the gain in performance. There is also
a widely accepted trade-off between expressive power and the computational
complexity of the inference mechanisms for such languages. Practical
applications often end up using languages that lie closer to the less
expressive end of the “expressiveness vs. computational complexity”
continuum. This resonates with the so-called Hendler hypothesis (“a
little semantics goes a long way”).
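Even the least expressive construct of the RDF Schema layer, subClassOf, already licenses useful inference. A tiny transitive-closure reasoner makes the point without any heavyweight logic machinery (class names are illustrative):

```python
# "A little semantics goes a long way": deriving all superclasses from
# subClassOf alone via transitive closure. This assumes a simple chain
# (one direct superclass per class); class names are made up.

subclass_of = {
    "Company": "Organization",
    "Organization": "LegalEntity",
    "LegalEntity": "Thing",
}

def superclasses(cls, subclass_of):
    """All classes reachable from cls via subClassOf, in order."""
    result = []
    while cls in subclass_of:
        cls = subclass_of[cls]
        result.append(cls)
    return result

print(superclasses("Company", subclass_of))
# ['Organization', 'LegalEntity', 'Thing']
```

This inference runs in linear time over the hierarchy, whereas reasoning over highly expressive constructs can be exponential or worse, which is exactly the trade-off described above.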

5. Large-scale metadata extraction and semantic annotation is possible.
Storing and manipulating metadata for millions to hundreds of millions of
content items requires the best application of known database techniques,
along with the challenge of improving on them for performance and scale
in the presence of more complex structures.
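"Best application of known database techniques" concretely means things like indexing the metadata for the access paths queries actually take. A minimal sketch using SQLite (the schema, attribute names, and data are hypothetical):

```python
# Sketch: item metadata as (item, attr, value) rows, indexed both by
# item and by value so lookups in either direction stay fast at scale.
# Schema and data are illustrative.

import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE metadata (item TEXT, attr TEXT, value TEXT)")
conn.execute("CREATE INDEX idx_item ON metadata(item, attr)")
conn.execute("CREATE INDEX idx_value ON metadata(attr, value)")

conn.executemany(
    "INSERT INTO metadata VALUES (?, ?, ?)",
    [("doc1", "topic", "mergers"), ("doc1", "source", "NewsML"),
     ("doc2", "topic", "mergers"), ("doc2", "source", "PDF")],
)

rows = conn.execute(
    "SELECT item FROM metadata WHERE attr = ? AND value = ? ORDER BY item",
    ("topic", "mergers"),
).fetchall()
print([r[0] for r in rows])  # ['doc1', 'doc2']
```

The open challenge noted above is doing this when the "values" are not flat strings but graph-structured annotations linked into the ontology.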

6. Support for heterogeneous data is key – it is too hard to deploy
separate products within a single enterprise to deal with structured and
unstructured data/content management. New applications involve extensive
heterogeneity in format, media, and access/delivery mechanisms (e.g., a
news feed in NewsML, a Web article posted in HTML or served up dynamically
through a database query and XSLT transformation, an analyst report in
PDF or Word, a subscription service with API-based access to Lexis/Nexis,
etc.). Database researchers have long studied the integration of
heterogeneous data, and many of those results come in handy here.
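One standard integration idiom is a per-format adapter that maps each source's native fields onto a common record shape, so everything downstream sees one representation. A sketch with invented formats and field names:

```python
# Sketch of absorbing heterogeneous sources into one metadata record via
# per-format adapters. Source formats and field names are hypothetical.

def from_newsml(item):
    return {"title": item["headline"], "body": item["text"], "format": "NewsML"}

def from_html(item):
    return {"title": item["title_tag"], "body": item["content"], "format": "HTML"}

ADAPTERS = {"newsml": from_newsml, "html": from_html}

def normalize(source_type, item):
    """Dispatch to the adapter for the source's native format."""
    return ADAPTERS[source_type](item)

rec = normalize("newsml", {"headline": "Merger announced", "text": "..."})
print(rec["title"], rec["format"])  # Merger announced NewsML
```

Adding a new source (PDF, a subscription API) then means writing one adapter rather than touching the annotation and query layers.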

7. Semantic query processing, with the ability to query both the ontology
and the metadata to retrieve heterogeneous content, is highly valuable.
Analytical applications can require sub-second response times for tens of
concurrent complex queries over a large metadata base and ontology, and
can benefit from further database research. High-performance, highly
scalable query processing that deals with representations more complex
than database schemas, and with a more explicit role for relationships,
is important. Database researchers can also contribute strategies for
dealing with large RDF stores.
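What "querying both ontology and metadata" means in miniature: a request like "all content about a Company" should also return content annotated with subclasses of Company, which requires consulting the class hierarchy during metadata lookup. A sketch with made-up classes and documents:

```python
# Sketch of a semantic query combining ontology (class hierarchy) with
# metadata (annotations on content). All names are illustrative.

subclass_of = {"Startup": "Company", "Company": "Organization"}

# metadata: document -> class of the entity it is annotated with
metadata = {"doc1": "Startup", "doc2": "Company", "doc3": "City"}

def subsumed_by(cls, target, subclass_of):
    """True if cls is target or a (transitive) subclass of it."""
    while cls is not None:
        if cls == target:
            return True
        cls = subclass_of.get(cls)
    return False

def semantic_query(target_class, metadata, subclass_of):
    return sorted(doc for doc, cls in metadata.items()
                  if subsumed_by(cls, target_class, subclass_of))

print(semantic_query("Company", metadata, subclass_of))
# ['doc1', 'doc2'] -- doc1 matches because Startup is a Company
```

A plain database lookup on the annotation value would miss doc1; the explicit role of relationships in the representation is what makes the extra answer recoverable, and what makes the query processing harder to do at sub-second latencies.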

8. A vast majority of the Semantic (Web) applications that have been
developed or envisioned rely on three crucial capabilities: ontology
creation, semantic annotation, and querying/inferencing. Enterprise-scale
applications share many requirements in these three respects with pan-Web
applications. All these capabilities must scale to millions of documents
and concepts (rather than hundreds to thousands). The main differences
lie in the number of content sources and the corresponding size of the
metadata.
