Saturday, March 20, 2010
   

GeoTemporal Reasoning in a Web 3.0 World

The Semantic Web envisages software agents that know how to reason over activities, events, locations, people, companies, and their inter-relationships. Learning more about customers through behavioral analysis and Activity Recognition is possible today with currently available Semantic Technologies, and it showcases how these technologies will evolve. This article describes real-world examples of Activity Recognition that combine industry-standard RDF and OWL, reasoning with basic geotemporal primitives, and some well-known Social Network Analytics.

Defining the Semantic Web

This article describes the design and use of a unifying query framework for geospatial reasoning, temporal logic, social network analytics, RDFS and OWL in event-based systems. The goal is to show how to express queries that involve RDF/RDFS reasoning, geospatial primitives, temporal primitives and social network concepts. We assume the reader has some introductory knowledge of RDF(S), geospatial concepts and social network analysis.

The need for such a framework becomes clear when we look at the vision of the Semantic Web and at how companies use Semantic Technologies. Tim Berners-Lee, James Hendler and Ora Lassila's Scientific American article (May 2001) [1] provides a compelling vision of the Semantic Web, along with some interesting use cases for what it will bring. These use cases assume that software agents know how to roam the web and reason over things, people, companies, the relationships between people and companies, and places and events. Clearly these agents need a query capability that supports a combination of description logic, geospatial reasoning, temporal reasoning, and knowledge about the social relationships between people.

The commercial vendors of Semantic Technologies also see a number of use cases that all center on events and require the query capabilities just mentioned. We currently see companies using large data warehouses with very disparate RDF-based triple stores describing various types of events, where each event has at least two actors, usually a begin and end time, and very often a geospatial component. These events are literally everywhere. In Health Care applications we see hospital visits, drugstore visits, and medical procedures. In the Communications Industry we see telephone call detail records, now with location. The e-mail and calendar database of a large company is nothing more than a social network database filled with events in time and, in many cases, space. In the Financial Industry every transaction is essentially an event. In the Insurance Industry claims are important events, and insurers badly need better activity recognition for them. In the Homeland Security Industry everything revolves around events and actors.

The Semantic Web community has made great strides in ontologies and description logic, and has done some initial work on geospatial reasoning [2], temporal reasoning [3], social network analysis [4], and event ontologies [5]. All of this work uses RDF as the data representation. Building on this W3C standard, combining all these reasoning capabilities in one unified framework will propel further industry adoption of Semantic Technology.

The Query Components

Here, we show how a user can combine geospatial reasoning, temporal logic, social network analytics, and RDFS reasoning in one query language. We first focus on the individual components. We then combine them to find the group of friends and friends of friends of a person, find the most important person in that group, and determine whether that person attended a meeting held at a given time and place.

Temporal Reasoning
The temporal reasoning is based on James Allen's Interval Logic [6]. This logic considers all 13 ways in which two temporal intervals can relate to each other, and we provide a predicate for each of Allen's 13 interval relations.

 Figure 1. Allen's interval primitives.
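The following minimal Python sketch makes the 13 relations concrete by classifying a pair of intervals. It assumes each interval is a numeric (start, end) pair with start < end. The function name `allen_relation` is hypothetical, and the labels are Allen's standard relation names, not necessarily the query language's own predicate names.

```python
from typing import NamedTuple


class Interval(NamedTuple):
    start: float
    end: float  # assumed: start < end


def allen_relation(a: Interval, b: Interval) -> str:
    """Name of the Allen relation that holds from interval a to interval b."""
    if a.end < b.start:
        return "before"
    if b.end < a.start:
        return "after"
    if a.end == b.start:
        return "meets"
    if b.end == a.start:
        return "met-by"
    if a.start == b.start and a.end == b.end:
        return "equals"
    if a.start == b.start:
        return "starts" if a.end < b.end else "started-by"
    if a.end == b.end:
        return "finishes" if a.start > b.start else "finished-by"
    if b.start < a.start and a.end < b.end:
        return "during"
    if a.start < b.start and b.end < a.end:
        return "contains"
    # Only partial overlap is left at this point.
    return "overlaps" if a.start < b.start else "overlapped-by"


print(allen_relation(Interval(2, 3), Interval(1, 5)))  # during
```

Because the tests are ordered from disjoint to nested to partially overlapping, each pair of intervals falls into exactly one of the 13 branches.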


Note that our temporal reasoning is purely quantitative. Given a set of events, each with a start time and either an end time or a duration, we can run queries like the following, which returns every interval ?i2 that falls during interval ?i1.

(select ?i2 (interval-during ?i2 ?i1))

Temporal reasoning makes full use of range queries. To find all the events that happened between Jan. 1, 2009 and Jan. 2, 2009, the triple store performs a straight triple query with a single cursor scan. It is still possible to blow up query time spectacularly with a query such as

(select (?x ?y) (point-before ?x ?y))

because it generates every before/after pair. We consider avoiding this to be the user's responsibility, although in many cases a query optimizer can warn about it or reorder the clauses so that ?x or ?y is bound first.
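The following Python sketch contrasts a bound range query with an unbound before/after query. A sorted list stands in for a time-ordered triple index. This is an assumption for illustration, not the store's actual index layout. `bisect` seeks to the start of the range, so the answer is a single contiguous slice. The unconstrained pair enumeration, by contrast, grows quadratically with the number of time points.

```python
import bisect
from datetime import datetime

# Hypothetical events as (start_time, event_id), kept sorted by start time
# so the list behaves like a time-ordered index.
events = sorted([
    (datetime(2008, 12, 31, 23, 0), "ev1"),
    (datetime(2009, 1, 1, 9, 30), "ev2"),
    (datetime(2009, 1, 1, 17, 0), "ev3"),
    (datetime(2009, 1, 2, 8, 0), "ev4"),
])
starts = [t for t, _ in events]


def events_between(lo, hi):
    """Ids of events with lo <= start < hi, read as one contiguous slice."""
    i = bisect.bisect_left(starts, lo)  # seek to the first start >= lo
    j = bisect.bisect_left(starts, hi)  # first position with start >= hi
    return [eid for _, eid in events[i:j]]


# An unbound (point-before ?x ?y) must enumerate every ordered pair:
# n distinct time points yield n * (n - 1) / 2 results.
before_pairs = [(a, b) for a in starts for b in starts if a < b]

print(events_between(datetime(2009, 1, 1), datetime(2009, 1, 2)))  # ['ev2', 'ev3']
print(len(before_pairs))  # 6
```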

Geospatial Primitives

Our original intention in adding geospatial capabilities was not so much to compete with existing spatial products as to make it very easy for RDF users to handle the locations of objects efficiently. To make this fast, we implemented a variation of an R-tree that encodes two-dimensional data directly in the triple indices [7]. We currently support a number of predicates that can be used in the query language. Some examples of the predicates:

 (geo-distance ?x ?y ?dist) -> given ?x and ?y, return the distance ?dist

 (geo-within-radius ?x ?y 10.0) -> find ?y within 10 miles of ?x

 (geo-inside-polygon ?polygon ?place ?lon ?lat)
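The three predicates can be approximated in plain Python. This is a sketch under stated assumptions, not the store's implementation:

- Distance uses the haversine great-circle formula with a mean Earth radius of 3958.8 miles.
- Points are (lat, lon) pairs in degrees.
- Polygon containment uses the standard ray-casting test, with vertices given as (lon, lat).

The function names are hypothetical stand-ins for geo-distance, geo-within-radius and geo-inside-polygon.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius


def geo_distance(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two (lat, lon) points (haversine)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * math.asin(min(1.0, math.sqrt(a)))


def within_radius(center, places, miles):
    """Names of places (name -> (lat, lon)) within `miles` of center, by linear scan."""
    lat, lon = center
    return [name for name, (plat, plon) in places.items()
            if geo_distance(lat, lon, plat, plon) <= miles]


def inside_polygon(polygon, lon, lat):
    """Ray casting: toggle on each polygon edge crossed by a ray heading east."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > lat) != (y2 > lat):  # edge straddles the ray's latitude
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside
```

The linear scan in `within_radius` is exactly what a spatial index avoids. With an index, only points near the center are examined before the exact distance test.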

For our benchmarking we use the open source GeoNames database, which can be freely downloaded from GeoNames.org [8]. The database contains nearly 7 million points of interest on Earth, ranging from natural features to populated places, schools, and churches. Each point has 12 features, such as the ASCII name, the local name, elevation, longitude, latitude, and population. Strictly speaking, it is not a database but a tab-delimited text file that programmers can turn into whatever they like; we turn it into RDF triples. We can retrieve all 459 geo-points less than 4 miles from Berkeley in under 5 milliseconds, and we would argue that this basic retrieval speed is comparable to or better than that of full-scale spatial databases. Here are some typical example queries that you can run against the GeoNames database:
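The following sketch turns a GeoNames row into triples. The column positions follow the tab-delimited layout of the GeoNames "geoname" table: geonameid at index 0, name at 1, asciiname at 2, latitude and longitude at 4 and 5, admin1 code at 10, and population at 14. Several details are hypothetical:

- the `GEO` namespace URI;
- the subject naming scheme;
- the selection of columns, chosen to echo the geo:name, geo:asciiname and geo:admin1_code predicates used in the queries below.

A real loader would also type the literals and add the geospatial encoding.

```python
# Hypothetical namespace standing in for the geo: prefix used in the queries.
GEO = "http://example.org/geonames#"

# Column positions in the tab-delimited GeoNames "geoname" dump.
COLUMNS = {
    "name": 1,
    "asciiname": 2,
    "latitude": 4,
    "longitude": 5,
    "admin1_code": 10,
    "population": 14,
}


def row_to_triples(line):
    """Turn one dump line into (subject, predicate, object) string triples."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) <= max(COLUMNS.values()):
        return []  # malformed or truncated line: skip it
    subject = GEO + "place/" + fields[0]  # fields[0] is the geonameid
    return [(subject, GEO + pred, fields[idx])
            for pred, idx in COLUMNS.items()
            if fields[idx] != ""]  # leave out empty columns
```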

Find the distance between Oakland and the one and only Berkeley in California.

(select (?dist)
    (q ?x geo:name "Oakland")
    (q ?y geo:name "Berkeley")
    (q ?y geo:admin1_code "CA")
    (geo-distance ?x ?y ?dist))

Plot on a Google map all the places within 10 miles of Oakland

(google-map (select (?name ?lat ?lon)
             (q ?x geo:asciiname "Oakland")
             (geo-within-radius ?x ?y 10)
             (q ?y geo:asciiname ?name)
             (q ?y geo:isAt5 ?pos)
             (pos->lon/lat ?pos ?lon ?lat)))

 
 Figure 2. Geospatial Event Plot


Social Network Analysis (SNA)
Many RDF resources are about people and the relationships between people, between people and companies, or between companies. We added some Social Network Analysis methods to make it easier to reason about relationships and groups. The functions we provide address the five basic questions of Social Network Analysis: (1) how far is person A from person B; (2) if there is a link between A and B, how strong is that relationship; (3) given a particular actor A, in what group does this actor 'live'; (4) given an actor in a group, how important is this actor in the group; and finally (5) given a group, how dense are the relationships within it, and does the group have a leader or a set of leaders?

The SNA library we provide is fairly traditional. We provide a set of general functions and the concept of a generator: a function that takes one node as input and produces a set of output nodes. Our search and SNA functions take these generators as first-class arguments. For example, suppose we have a database of relationships between people. The generator 'knows' takes a person as input and returns a set of persons by following fr:went-to-dinner and fr:went-to-movies in both directions.

(defgenerator knows ()
  (bidirectional fr:went-to-dinner fr:went-to-movies))

We can use this generator to find, for example, the shortest path between two people; in this case the query returns a list of persons.

(select ?x
   (shortest-path knows fr:Person1 fr:Person2 ?x))
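The following Python sketch shows a generator and a shortest-path search over it, using a tiny in-memory list of (subject, predicate, object) triples about made-up people. `bidirectional` mirrors the idea of the defgenerator above: follow the listed predicates in both directions. Breadth-first search stands in for whatever shortest-path algorithm the library actually uses. All names here are hypothetical.

```python
from collections import deque

# Hypothetical triples: (subject, predicate, object).
TRIPLES = [
    ("Person1", "went-to-dinner", "Person3"),
    ("Person4", "went-to-movies", "Person3"),
    ("Person4", "went-to-dinner", "Person2"),
    ("Person1", "works-with", "Person2"),  # not followed by `knows`
]


def bidirectional(*predicates):
    """Build a generator: node -> neighbours linked by `predicates` in either direction."""
    wanted = set(predicates)

    def generator(node):
        out = set()
        for s, p, o in TRIPLES:
            if p not in wanted:
                continue
            if s == node:
                out.add(o)
            if o == node:
                out.add(s)
        return out

    return generator


knows = bidirectional("went-to-dinner", "went-to-movies")


def shortest_path(generator, start, goal):
    """Breadth-first search over the generator; returns a list of nodes or None."""
    parent = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        for nxt in sorted(generator(node)):  # sorted only for deterministic output
            if nxt not in parent:
                parent[nxt] = node
                queue.append(nxt)
    return None


print(shortest_path(knows, "Person1", "Person2"))
# ['Person1', 'Person3', 'Person4', 'Person2']
```

Because the search takes the generator as an argument, the same function works for any relationship definition.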

Alternatively, we can use the generator to first build a group of friends and friends of friends with the ego-group predicate, and then rank the importance of each member with the actor-centrality measure. The actor-centrality-members predicate returns the most important member first.
 
(select ?x
  (ego-group fr:Person1 knows 2 ?group)
  (actor-centrality-members ?group knows ?x))
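The ego-group step can be sketched as a depth-limited breadth-first search, and ranking the members then requires some centrality measure. The sketch below uses degree centrality restricted to the group purely as an assumption, since the text does not say which measure actor-centrality uses. A hard-coded, made-up adjacency map stands in for the `knows` generator.

```python
from collections import deque

# Hypothetical undirected "knows" relation, standing in for the generator.
KNOWS = {
    "jans": {"ann", "bob"},
    "ann": {"jans", "bob", "cy"},
    "bob": {"jans", "ann"},
    "cy": {"ann", "dee"},
    "dee": {"cy"},
}


def knows(node):
    return KNOWS.get(node, set())


def ego_group(ego, generator, depth):
    """Every node within `depth` hops of `ego`, ego included."""
    hops = {ego: 0}
    queue = deque([ego])
    while queue:
        node = queue.popleft()
        if hops[node] == depth:
            continue  # do not expand past the depth limit
        for nxt in generator(node):
            if nxt not in hops:
                hops[nxt] = hops[node] + 1
                queue.append(nxt)
    return set(hops)


def degree_ranked_members(group, generator):
    """Members ordered by number of ties inside the group, highest first."""
    degree = {m: len(generator(m) & group) for m in group}
    return sorted(group, key=lambda m: (-degree[m], m))  # ties broken by name


group = ego_group("jans", knows, 2)
print(degree_ranked_members(group, knows))  # ['ann', 'bob', 'jans', 'cy']
```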

Many of the centrality measures used to compute the importance of an actor in a known group need the shortest path between every pair of actors in the group. We created special constructors that cache these groups transparently, so most computations can be done in memory. We are satisfied with the current performance, but there are still many places where we can improve it.
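To see why all-pairs shortest paths dominate the cost, here is a sketch of closeness centrality, one common measure built on them. This is an assumption: the text does not name the measure. `functools.lru_cache` memoizes the per-source breadth-first distances for a given group, a simple stand-in for the transparent caching constructors described above. The group is passed as a `frozenset` so it can serve as a cache key. The graph is made up.

```python
from collections import deque
from functools import lru_cache

# Hypothetical undirected "knows" relation.
KNOWS = {
    "jans": {"ann", "bob"},
    "ann": {"jans", "bob", "cy"},
    "bob": {"jans", "ann"},
    "cy": {"ann", "dee"},
    "dee": {"cy"},
}


def knows(node):
    return KNOWS.get(node, set())


@lru_cache(maxsize=None)
def distances_from(source, group):
    """BFS hop counts from `source` to members of `group` (a frozenset).

    Cached per (source, group) pair; callers must not mutate the returned dict.
    """
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nxt in knows(node):
            if nxt in group and nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist


def closeness_ranked(members):
    """Rank members by closeness centrality within the group, highest first."""
    group = frozenset(members)
    scores = {}
    for m in group:
        dist = distances_from(m, group)
        total = sum(dist.values())
        scores[m] = (len(dist) - 1) / total if total else 0.0
    return sorted(group, key=lambda m: (-scores[m], m)), scores


ranked, scores = closeness_ranked({"jans", "ann", "bob", "cy"})
print(ranked)  # ['ann', 'bob', 'jans', 'cy']
```

A second ranking over the same group reuses every cached distance table instead of re-running the searches.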

A Comprehensive Example

To give an impression of the breadth of the query language, here is a typical example that combines geospatial, temporal, SNA and RDFS reasoning.

(select (?x)
  (ego-group person:jans knows 2 ?group)
  (actor-centrality-members ?group knows ?x ?num)
  (q ?event fr:actor ?x)
  (qs ?event !rdf:type fr:Meeting)
  (interval-during ?event "2007-12-01" "2007-12-31")
  (geo-box-around geoname:Berkeley ?event 5 miles)
  !)

In English this translates into: find the group of friends and friends of friends around the person "jans", and within this group take the most important person first. Then check whether this person was an actor in an event of type Meeting that happened in December 2007 within 5 miles of Berkeley. The final cut (!) stops the search at the first solution. Note that we seamlessly mix Social Network Analysis in the first two clauses, a simple database lookup in the third, an RDFS inference about the type of the event in the fourth, and then a temporal and a geospatial constraint.
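The same pipeline can be mimicked over plain Python records to show how the constraints compose. Everything in this sketch is hypothetical:

- a subclass table stands in for RDFS inference;
- inclusive date bounds stand in for interval-during;
- a latitude/longitude box using roughly 69 miles per degree stands in for geo-box-around;
- the Berkeley coordinates are approximate;
- the events are made up.

Returning on the first match plays the role of the final cut.

```python
import math
from datetime import date

# Hypothetical rdfs:subClassOf table: class -> parent class.
SUBCLASS_OF = {"BoardMeeting": "Meeting", "Meeting": "Event"}


def is_a(cls, target):
    """RDFS-style type inference by walking rdfs:subClassOf upward."""
    while cls is not None:
        if cls == target:
            return True
        cls = SUBCLASS_OF.get(cls)
    return False


def within_dates(start, end, lo, hi):
    """Inclusive containment; Allen's strict 'during' would use < instead of <=."""
    return lo <= start and end <= hi


def in_box(lat, lon, c_lat, c_lon, miles):
    """Rough bounding box: about 69 miles per degree of latitude (an approximation)."""
    dlat = miles / 69.0
    dlon = miles / (69.0 * math.cos(math.radians(c_lat)))
    return abs(lat - c_lat) <= dlat and abs(lon - c_lon) <= dlon


BERKELEY = (37.87, -122.27)  # approximate coordinates

# Made-up events: (id, actor, type, start, end, lat, lon).
EVENTS = [
    ("e1", "ann", "Lunch", date(2007, 12, 5), date(2007, 12, 5), 37.87, -122.27),
    ("e2", "ann", "BoardMeeting", date(2007, 12, 10), date(2007, 12, 10), 37.88, -122.26),
    ("e3", "bob", "Meeting", date(2007, 12, 12), date(2007, 12, 12), 37.87, -122.27),
]


def first_match(ranked_people):
    """Most important person first; stop at the first qualifying meeting."""
    for person in ranked_people:
        for ev_id, actor, ev_type, start, end, lat, lon in EVENTS:
            if (actor == person
                    and is_a(ev_type, "Meeting")
                    and within_dates(start, end, date(2007, 12, 1), date(2007, 12, 31))
                    and in_box(lat, lon, *BERKELEY, 5.0)):
                return person, ev_id
    return None


print(first_match(["ann", "bob", "jans", "cy"]))  # ('ann', 'e2')
```

Event e2 is only recognized as a Meeting through the subclass step, which is the role the RDFS inference plays in the original query.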

 Figure 3. Social Network Graph Viewer

The example above is written in Prolog. We expect to have a SPARQL extension in early 2009 that allows this identical query. The SPARQL syntax is slightly more contrived, because SPARQL normally allows only patterns that map directly onto triples. Note that we introduced a non-standard '=' (assignment) construct.

select ?x where {
  ?group = ego-group(person:jans knows 2) .       
  ?x = actor-centrality-members(?group knows) .
  ?event fr:actor ?x ;
       rdf:type fr:Meeting .
  FILTER (interval-during ?event '2007-12-01' '2007-12-31')
  FILTER (geo-box-around geoname:Berkeley ?event 5 miles)   
}

Query Optimization
The primary research effort for the current version of the query framework is improving query optimization. Notice that in the example above, most clauses are not direct matches against the database but functors that perform computations. Some of these functors can act both as generators and as filters (as is common in Prolog). When a functor acts as a generator, we need better statistical predictions of how many solutions to expect, so that we can reorder clauses more effectively.
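One simple way to use such predictions is greedy reordering: repeatedly run the cheapest clause whose required inputs are already bound. The expected solution count depends on which of the clause's variables are bound (filter-like) or unbound (generator-like). This sketch is an illustration under stated assumptions, not the framework's optimizer. The clause names loosely follow the comprehensive example, every statistic is hypothetical, and clauses without statistics are scheduled as late as possible.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Clause:
    name: str
    vars: frozenset
    needs: frozenset = frozenset()  # variables that must be bound before it can run
    # Hypothetical statistics: sorted tuple of already-bound vars -> expected solutions.
    estimates: dict = field(default_factory=dict)


def estimate(clause, bound):
    key = tuple(sorted(clause.vars & bound))
    return clause.estimates.get(key, math.inf)  # unknown: treat as very expensive


def greedy_order(clauses):
    bound, order, remaining = set(), [], list(clauses)
    while remaining:
        runnable = [c for c in remaining if c.needs <= bound]
        if not runnable:
            raise ValueError("remaining clauses need variables nothing binds")
        best = min(runnable, key=lambda c: estimate(c, bound))
        order.append(best.name)
        bound |= best.vars
        remaining.remove(best)
    return order


QUERY = [
    Clause("actor", frozenset({"event", "x"}),
           estimates={(): 1e6, ("x",): 50, ("event",): 1, ("event", "x"): 0.01}),
    Clause("type", frozenset({"event"}), estimates={(): 1e5, ("event",): 0.2}),
    Clause("during", frozenset({"event"}), estimates={(): 5e4, ("event",): 0.1}),
    Clause("geo-box", frozenset({"event"}), estimates={(): 2e3, ("event",): 0.05}),
    Clause("centrality", frozenset({"group", "x"}), needs=frozenset({"group"}),
           estimates={("group",): 30, ("group", "x"): 1}),
    Clause("ego-group", frozenset({"group"}), estimates={(): 1, ("group",): 1}),
]

print(greedy_order(QUERY))
# ['ego-group', 'centrality', 'actor', 'geo-box', 'during', 'type']
```

The quality of the resulting order depends entirely on the estimates, which is why better statistics for generator-mode functors matter.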

Jans Aasman, Ph.D., is president and CEO of Franz Inc., Oakland, Calif.

References

[1] Berners-Lee, T., Hendler, J., and Lassila, O.: The Semantic Web. Scientific American (May 2001)

[2] W3C Geospatial Incubator Group (GeoXG)

[3] Gutierrez, C., Hurtado, C., and Vaisman, A.: Temporal RDF. In: European Semantic Web Conference (ESWC 2005), pp. 93–107 (2005). Best paper award.

[4] Mika, P.: Social Networks and the Semantic Web. Springer (2007)

[5] Raimond, Y., and Abdallah, S.: The Event Ontology (2007)

[6] Allen, J.F.: Time and Time Again: The Many Ways to Represent Time. International Journal of Intelligent Systems, Vol. 6, No. 4 (1991)

[7] Wikipedia: R-tree

[8] GeoNames: Data Access
