OceanWorks / Apache Science Data Analytics Platform (NASA funded)
In typical investigations, oceanographers follow a traditional workflow for using datasets: search, evaluate, download, and apply tools and algorithms to look for trends. While this workflow has historically served the oceanographic community well, it cannot scale when the research involves massive amounts of data. SDAP establishes an integrated data analytics center for Big Science problems, with a focus on technology integration, advancement, and maturity. It brings a number of big data technologies together under a single umbrella, including the NASA-funded OceanXtremes (anomaly detection and ocean science), NEXUS (deep data analytics platform), DOMS (distributed in-situ to satellite matchup), MUDROD (search relevancy and discovery), and VQSS (Virtualized Quality Screening Service).
Mining and Utilizing Dataset Relevancy from Data Access logs, Metadata and User Metrics to Improve Data Discovery (MUDROD) (NASA AIST funded)
MUDROD focuses on mining oceanographic knowledge from the PO.DAAC user log files to improve the end-user data discovery experience at PO.DAAC. The research has three steps: a) extracting oceanographic semantics from three resources: SWEET, the GCMD ontology, and the keywords end users enter when searching PO.DAAC datasets; b) mining the linkage among the different vocabularies from user data discovery sessions; and c) building the linkage among vocabularies with a comprehensive approach that considers both de facto domain standards (e.g., SWEET and GCMD) and the knowledge mined from the log files. The resulting semantics are used to improve data discovery by ranking results, navigating among vocabularies, and recommending data based on user searches.
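Step b), linking vocabulary terms through user sessions, can be illustrated with a minimal sketch. It assumes that terms searched within the same session are related and simply counts term co-occurrence; the function name `term_cooccurrence` and the sample sessions are hypothetical and are not taken from the MUDROD code base, whose actual mining method is not described here.

```python
from collections import Counter
from itertools import combinations


def term_cooccurrence(sessions):
    """Count how often two search terms appear in the same user session.

    `sessions` is a list of sessions, each a list of search terms.
    Terms are lowercased so that "SST" and "sst" count as one term.
    """
    counts = Counter()
    for terms in sessions:
        unique_terms = sorted({t.lower() for t in terms})
        # Every unordered pair of distinct terms in a session is one link.
        for a, b in combinations(unique_terms, 2):
            counts[(a, b)] += 1
    return counts


# Hypothetical sessions, for illustration only.
sessions = [
    ["sea surface temperature", "SST", "GHRSST"],
    ["SST", "sea surface temperature"],
    ["ocean wind", "QuikSCAT"],
]
links = term_cooccurrence(sessions)
strongest = links.most_common(1)[0]
```

Pairs with higher counts would be candidates for a stronger link between vocabularies; a real system would likely normalize the counts and combine them with the ontology-based links from step c).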
A Knowledge Gateway for Smart Management and Discovery of Planetary Defense (PD) Information (NASA funded)
PD aims to develop a planetary defense knowledge discovery engine to better assist the development and integration of a near-Earth object (NEO) response system. This knowledge discovery engine will serve as a cyberinfrastructure building block for bringing together scattered patches of existing knowledge (e.g., data, services, and models). By integrating, extracting, analyzing, and providing knowledge dispersed across different organizations and scientists, this planetary defense web portal is expected to advance discovery, innovation, and education across government agencies and scientific communities.
An Automatic Approach to Building Earth Science Knowledge Graph (ESKG) to Improve Data Discovery (ESIP Testbed project)
ESKG proposes an automatic approach to building a dynamic knowledge graph for Earth science (ES) that improves data discovery by leveraging the implicit, latent knowledge already present in the web pages of NASA DAAC websites. This project will strengthen ties between observations and user communities by: 1) developing a knowledge graph derived from web pages via natural language processing and knowledge extraction techniques; 2) allowing users to traverse, explore, query, reason over, and navigate ES data through knowledge graph interaction.
- Role: Co-PI and developer
Improve Earth Data Discovery through Deep Query Understanding (ESIP Incubator project)
One longstanding problem in Earth data discovery is how to use existing user queries to interpret a user’s search intent. While Google has a “did you mean this…” feature, other search engines lack such technology, especially with regard to techniques such as fuzzy logic. To fill this gap, we propose to develop a query understanding tool that better interprets users’ search intents for Earth data search engines by mining metadata and user query logs.
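A “did you mean” suggestion of the kind mentioned above can be sketched with approximate string matching from the Python standard library. This is only an illustration under stated assumptions: it uses `difflib` similarity as a simple stand-in for the fuzzy techniques the project refers to, and the function `suggest_queries` and the list of known queries are hypothetical, standing in for terms mined from metadata and query logs.

```python
import difflib


def suggest_queries(query, known_queries, n=3, cutoff=0.6):
    """Return up to `n` known queries that closely resemble `query`.

    Matching is case-insensitive; the original spelling of each known
    query is returned. `cutoff` is the minimum similarity ratio (0 to 1).
    """
    by_lowercase = {q.lower(): q for q in known_queries}
    matches = difflib.get_close_matches(
        query.lower(), list(by_lowercase), n=n, cutoff=cutoff
    )
    return [by_lowercase[m] for m in matches]


# Hypothetical vocabulary, e.g. frequent queries mined from search logs.
known = ["sea surface temperature", "ocean wind", "sea level", "salinity"]
suggestions = suggest_queries("sea surfce temperatur", known)
```

String similarity only catches misspellings and near-duplicates; interpreting intent, as the project proposes, would also need the semantic signals mined from metadata and logs.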
Polar CI: A Cloud based Polar Resource Discovery Portal (NSF and Microsoft funded)
The Polar CI Portal is a one-stop portal that makes it easy for users to discover and access polar-related geospatial resources available across different online environments. Specifically, (1) a lightweight web engine framework is developed to search geospatial resources; (2) a data warehouse is built to harvest, store, search, and distribute geospatial information; (3) a harvesting middleware is established to harvest data from different online environments and convert them into formats that the data warehouse supports; (4) a semantic-based query statement refinement is proposed to improve recall and precision; (5) sophisticated functionalities (e.g., service quality monitoring and a polar viewer) are used to improve user experience and assist decision-making; and (6) a quality of service engine is used to provide users with service quality information.
Ecological Service Modeler (ESM) of IDRISI Geospatial Software
The Earth Trends Modeler (ETM) is an integrated suite of tools within TerrSet for the analysis of image time series data associated with Earth Observation remotely sensed imagery. With Earth Trends Modeler, users can rapidly assess long term climate trends, measure seasonal trends in phenology, and decompose image time series to seek recurrent patterns in space and time.
- Role: Technical lead
- ESM website
Global Diffusion Pattern and Hot Spot Analysis of Vaccine-Preventable Diseases
Spatial characteristics reveal that vaccine-preventable diseases are concentrated in Africa and the Near East and that dispersion varies by disease. The exception is whooping cough, whose center of concentration varies widely from year to year. Measles was the only disease under investigation to exhibit statistically significant spatial autocorrelation. The hottest spots of measles are in Africa and the coldest spots are in the United States, while warm spots are in the Near East and cool spots are in Western Europe. Finally, cases of measles could not be explained by the independent variables, including the Gini index, health expenditure, and rate of immunization.
- Role: Project lead
- Related paper
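The spatial autocorrelation mentioned in the vaccine-preventable disease study can be illustrated with global Moran's I, a common measure of whether neighboring regions have similar values. The summary does not name the statistic that was used, so Moran's I is an assumption here, and the region values and adjacency weights below are made up for illustration.

```python
def morans_i(values, weights):
    """Global Moran's I for `values` given a spatial weights matrix.

    `weights[i][j]` is the spatial weight between regions i and j
    (for example 1 if they are neighbors, else 0).
    Values near +1 indicate clustering, values near -1 dispersion.
    """
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    total_weight = sum(sum(row) for row in weights)
    cross = sum(
        weights[i][j] * dev[i] * dev[j] for i in range(n) for j in range(n)
    )
    variance_sum = sum(d * d for d in dev)
    return (n / total_weight) * (cross / variance_sum)


# Four hypothetical regions in a row; each borders its immediate neighbors.
adjacency = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
clustered = morans_i([10, 9, 2, 1], adjacency)     # high values sit together
alternating = morans_i([10, 1, 10, 1], adjacency)  # high and low alternate
```

Judging statistical significance, as reported for measles, additionally requires comparing the statistic against its distribution under spatial randomness, for example with a permutation test.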
Automated Mobility Mode Detection based on GPS Tracking Data
New developments in global positioning systems (GPS) have facilitated the continuous collection of highly accurate locational data for moving objects, including humans. The mobility mode implied in volunteered raw GPS tracking data can provide valuable information for understanding the user. In this paper, we propose an approach that detects mobility mode by mining and analyzing the geographic location, duration, speed, and spatial context information in the data. Ancillary GIS layers such as building footprints are also included to help classify the data. Our approach has been tested on three datasets. Five mobility modes (“transporting”, “parking”, “walking”, “roaming” and “indoor”) are detected in our results. A quantitative comparison with human interpretation results suggested an overall accuracy of 95%. The proposed approach is expected to facilitate the research of many geographic communities and the development of location-based services.
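The speed feature used in the mobility mode detection can be sketched as follows. This is a simplified illustration, not the method from the paper: it derives speed from consecutive GPS fixes with the haversine distance and applies threshold values that are assumptions chosen for the example. It separates only "walking" and "transporting" and uses a hypothetical catch-all "stationary" label, since telling "parking", "roaming" and "indoor" apart requires the duration and spatial context information (such as building footprints) described above.

```python
import math

EARTH_RADIUS_M = 6371000.0  # mean Earth radius in meters


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def segment_speeds(track):
    """Speeds in m/s between consecutive (timestamp_s, lat, lon) fixes."""
    speeds = []
    for (t0, lat0, lon0), (t1, lat1, lon1) in zip(track, track[1:]):
        elapsed = t1 - t0
        if elapsed <= 0:
            continue  # skip duplicate or out-of-order fixes
        speeds.append(haversine_m(lat0, lon0, lat1, lon1) / elapsed)
    return speeds


def label_speed(speed_mps, still_max=0.2, walk_max=2.5):
    """Assign a coarse label from speed alone (thresholds are assumptions)."""
    if speed_mps < still_max:
        return "stationary"
    if speed_mps <= walk_max:
        return "walking"
    return "transporting"


# Two hypothetical fixes one minute apart, about 111 m east along the equator.
track = [(0, 0.0, 0.0), (60, 0.0, 0.001)]
labels = [label_speed(s) for s in segment_speeds(track)]
```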