Yuanyuan is currently a Research Staff Member at IBM Almaden Research Center. She received her PhD degree in 2008 and her MS degree in 2005, both in Computer Science & Engineering from the University of Michigan, and her BS degree in Computer Science & Technology with honors in 2003 from Peking University.
SQL-on-Hadoop, Big Data Federation, and HTAP (Hybrid Transactional and Analytics Processing): Yuanyuan has been working on Big Data since joining IBM Research. She collaborates with the IBM Software Group, and her research is closely tied to the IBM InfoSphere BigInsights product. Her studies in this area include efficient join algorithms on Hadoop, co-location of data on HDFS to speed up joins and group-by operations, efficient joins in the Hybrid Warehouse environment (joining data from Hadoop with data from an Enterprise Data Warehouse), and integrating SQL with analytics for Big Data. Her most recent project focuses on supporting HTAP on top of the Spark platform.
Selected Papers: HTAP on Spark (CIDR'17, SIGMOD'16 Demo), Joins for Hybrid Warehouses (TODS'16, EDBT'15), Integration of SQL and Analytics (EDBT'15), CoHadoop (PVLDB'11), Hadoop Joins (SIGMOD'10)
Graph Analytics: Yuanyuan has a long-standing interest in graph analytics. Her current research includes building distributed graph-processing systems, designing distributed graph algorithms, and social network analysis. Her PhD thesis is on querying graph databases. The tools produced from her research have been widely applied at the National Center for Integrative Biomedical Informatics (NCIBI).
Graph Processing/Databases Papers: Dynamic Graph Analysis (ICDE'15), Giraph++ (PVLDB'13), Graph Summarization (CIKM'14, ICDE'10, SIGMOD'08), Graph Matching (ICDE'08, Bioinformatics'07)
Social Network Analysis Papers: Topic-Specific Influence Analysis (WSDM'14), Event-Based Social Network (SIGKDD'12).
Large-Scale Systems for Machine Learning: Yuanyuan is a co-inventor and was a lead developer of SystemML, a large-scale machine learning system that is now open source. In SystemML, machine learning algorithms are expressed in a high-level language and then automatically compiled and optimized into a set of efficient MapReduce jobs on a cluster of machines.
Selected Papers: SystemML on YARN (SIGMOD'15), SystemML Optimizer (IEEE DE Bulletin'14), ParFor in SystemML (PVLDB'14), Numerical Stability in SystemML (ICDE'12), SystemML Architecture (ICDE'11).
2016 Outstanding Technical Achievement Award for the work in join algorithms for big data, IBM Research - Almaden
2016 Eminence & Excellence Award, IBM Research - Almaden
2015 IBM A-Level Accomplishment for the work in join algorithms for big data, IBM Research - Almaden
2015 IBM A-Level Accomplishment for the contributions to the SystemML project, IBM Research - Almaden
2013 High Value Patent Application Award, IBM Research - Almaden
2012 Eminence & Excellence Award, IBM Research - Almaden
2011 Eminence & Excellence Award, IBM Research - Almaden
2008 Distinguished Achievement Award, University of Michigan
2007 2nd Place, CSE Honor Competition, University of Michigan
2007 Rackham Predoctoral Fellowship, University of Michigan
2003 Rackham Graduate Fellowship, University of Michigan
Journal Editor: Associate Editor for VLDB 2018.
Workshop Chair: 3rd Workshop on Large Scale Network Analysis (LSNA 2014), 5th Workshop on Graph Data Management (GDM 2014), 2nd Workshop on Large Scale Network Analysis (LSNA 2013), 4th Workshop on Graph Data Management (GDM 2013), 1st Workshop on Large Scale Network Analysis (LSNA 2012)
- NSF Advisory Panel, 2013 & 2016.
- NSF Career Mentoring Panel, ICDE 2012.
"My 2 Cents on How to Be Competitive for Industrial Research Jobs" was presented at this career panel.
PC Member: VLDB 2017, VLDB 2016 Industrial Track, TKDE 2016 Poster Track, VLDB 2015, ICDE 2014, WISE 2013, SIGMOD 2012, GDM 2012, VLDB 2011 Industrial Track, DBSocial 2011, GDM 2011, ICDE 2011, GDM 2010, VLDB 2009.
Reviewer for Journals: VLDB Journal (2014, 2017), TODS (2013, 2015), Statistical Analysis and Data Mining (2009), Information Systems (2010, 2011, 2013), ACM Transactions on Intelligent Systems and Technology (2010), Distributed and Parallel Databases (2012).
Reviewer for Books: Data Processing Techniques in the Era of Big Data.
Reviewer for Research Grants: Research Grants Council (RGC) of Hong Kong (2010, 2011).
Reviewer for Awards: The NCWIT Award for Aspirations in Computing.
Giraph++: From "Think Like a Vertex" to "Think Like a Graph", Facebook, Nov 2013.
Large Scale Topic-specific Influence Analysis on Microblogs, UC Santa Barbara, May 2013.
Large Scale Topic-specific Influence Analysis on Microblogs, UC Santa Cruz, May 2013.
SystemML: Large Scale Machine Learning on MapReduce, Peking University, Beijing, China, Aug 2012.
SystemML: Large Scale Machine Learning on MapReduce, IBM China Research Lab, Beijing, China, Aug 2012.
SystemML: Large Scale Machine Learning on MapReduce, University of Maryland, College Park, Maryland, Apr 2012.