

 
Schaefer School of Engineering & Science News & Events  

        


May 13, 2009

Streaming Techniques for Statistical Modeling

Speaker:
Dr. Yihua Wu
Google, Inc.

Time: Wednesday 05/13/2009 3-4PM
Location: Babbio 110

Biography:

Dr. Yihua Wu received her PhD in Computer Science from Rutgers, the State University of New Jersey, in 2007 and has worked at Google Inc. in New York since then. Her research interests are streaming techniques for statistical modeling of massive data, with applications to databases and networking. During her PhD, she extensively studied i) parametric modeling of skewed data sets; ii) graph modeling of individuals' communication patterns; and iii) sequential change detection on data streams. She spent years of her PhD collaborating with researchers from AT&T Shannon Labs, Telcordia Applied Research, and Narus Inc. to develop space- and time-efficient streaming algorithms on real-world data sets, and she holds two patents on this work. At Google, she designs and develops features and models to improve search quality.

Abstract:

Streaming is an important paradigm for handling high-speed data sets that are too large to fit in main memory. Prior work has shown how to estimate simple statistical summaries of data streams, such as histograms, heavy hitters, and frequency moments. This talk focuses on a number of more sophisticated statistical analyses that are performed in near real time using limited resources.
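As an illustration of the kind of simple streaming summary mentioned above, the following is a minimal sketch of the Misra-Gries heavy-hitters algorithm. It is a standard textbook technique chosen here for illustration, not something the abstract attributes to the speaker; the function name `misra_gries` and the parameter `k` are assumptions of this example.

```python
from typing import Dict, Hashable, Iterable


def misra_gries(stream: Iterable[Hashable], k: int) -> Dict[Hashable, int]:
    """Return candidate heavy hitters using at most k - 1 counters.

    Any item that occurs more than n / k times in a stream of length n is
    guaranteed to be among the returned keys. Each stored count
    underestimates the true count by at most n / k.
    """
    if k < 2:
        raise ValueError("k must be at least 2")
    counters: Dict[Hashable, int] = {}
    for item in stream:
        if item in counters:
            counters[item] += 1
        elif len(counters) < k - 1:
            counters[item] = 1
        else:
            # No free counter: decrement every counter and drop those at zero.
            for key in list(counters):
                counters[key] -= 1
                if counters[key] == 0:
                    del counters[key]
    return counters
```

The summary uses memory proportional to k regardless of stream length, which is the property that makes it usable when the data cannot be stored. A second pass over the data, where available, would be needed to confirm exact counts for the returned candidates.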

I will first present how to model stream data parametrically; in particular, we fit hierarchical (binomial multifractal) and non-hierarchical (Pareto) power-law models on a data stream. This yields algorithms that are fast and space-efficient and that provide accuracy guarantees. I have also designed fast methods for online model validation at streaming speeds. I will then turn to detecting changes in models on data with unknown distributions. I adapt the statistically sound sequential probability ratio test to the online streaming case, without an independence assumption. The resulting algorithm works seamlessly without the window limitations inherent in prior work and is highly effective at detecting changes quickly. Furthermore, I formulate the local change detection problem, which has not been addressed previously, and extend our streaming solution to it.
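For readers unfamiliar with the sequential probability ratio test, the following minimal sketch shows Wald's classical version for a shift in the mean of Gaussian data. It assumes independent observations and fully known distributions, which are exactly the assumptions the talk's method relaxes, so it should be read as background rather than as the speaker's algorithm. The function name `sprt_gaussian_mean` and all parameter names are hypothetical.

```python
import math
from typing import Iterable, Optional, Tuple


def sprt_gaussian_mean(
    samples: Iterable[float],
    mu0: float,
    mu1: float,
    sigma: float,
    alpha: float = 0.01,
    beta: float = 0.01,
) -> Tuple[Optional[str], int]:
    """Classical SPRT of H0: mean = mu0 against H1: mean = mu1.

    Assumes i.i.d. Gaussian samples with known standard deviation sigma.
    Returns the decision ("H0", "H1", or None if the samples run out
    first) and the number of samples consumed.
    """
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    if not (0 < alpha < 1 and 0 < beta < 1):
        raise ValueError("alpha and beta must lie in (0, 1)")
    upper = math.log((1 - beta) / alpha)  # accept H1 at or above this value
    lower = math.log(beta / (1 - alpha))  # accept H0 at or below this value
    llr = 0.0  # running log-likelihood ratio
    n = 0
    for x in samples:
        n += 1
        # Log of N(x; mu1, sigma^2) / N(x; mu0, sigma^2) for one sample.
        llr += (mu1 - mu0) * (x - (mu0 + mu1) / 2) / sigma ** 2
        if llr >= upper:
            return "H1", n
        if llr <= lower:
            return "H0", n
    return None, n
```

The test keeps only a single running statistic, so it processes each observation in constant time and space, and it stops as soon as the evidence crosses either threshold instead of waiting for a fixed window to fill.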

To demonstrate the practicality of our methods at line speeds, and the potential of streaming techniques for statistical modeling in data mining, we complement our analytic and algorithmic results with experiments on network traffic data.

For more information, please contact:

Yingying Chen
Assistant Professor & NIS Graduate Program Director
Burchard
Room 210
Phone: 201.216.8066
Fax: 201.216.8246
yingying.chen@stevens.edu

Stevens Institute of Technology | 1 Castle Point on Hudson, Hoboken, NJ 07030 | Phone: 201.216.5263 | Fax: 201.216.8909