| Title | Searching Massive Epigenome Data for Evolutionarily Conserved Sequence Motifs |
|---|---|
| Abstract | The epigenome, including nucleosome structure and DNA methylation, regulates the expression of genes. Searching for evolutionarily conserved sequence motifs essential for controlling the epigenome is a fundamental problem in biology. Collecting massive epigenome data has become increasingly feasible because of the widespread availability of next-generation sequencing technology, and there has thus been growing interest in genome-wide analysis of the epigenome. Several issues remain to be resolved. Care has to be taken in selecting samples so as to reduce false-positive findings. Processing enormous epigenome data is a computationally intensive task and needs a suite of software techniques such as suffix arrays, error correction, customizable data visualization, machine learning, and efficient database management. In this talk, I will overview these issues and their solutions, and discuss remaining bioinformatics problems. |
| CV | Shinichi Morishita is a professor in the Department of Computational Biology, University of Tokyo. He has studied mathematical logic, relational algebra, deductive databases, data mining, genome evolution, RNAi, chromatin structure, DNA methylation, and genome sequence processing. He now leads a group analyzing personal genomes at the University of Tokyo and has recently published several papers relevant to the subject of this talk. |

| Title | The Emergence of Genomics as a Data Intensive Science |
|---|---|
| Abstract | Next-generation sequencing, microarrays, and other functional genomics technologies are changing the way biological and biomedical research is carried out by providing genome-wide data on various cellular phenomena. However, archiving, managing, and analyzing the large datasets produced can be challenging with the current generation of database technologies; the new generation of sequencing platforms produces terabytes of data per run. In short, genomics, systems biology, and related areas are becoming a data intensive science. In this talk, we discuss the emergence of biology as a data intensive science. We introduce some of the infrastructure being developed to support data intensive science, such as the Open Cloud Consortium's Open Science Data Cloud. We also describe the design and implementation of the Bionimbus system, which integrates cloud-based computing platforms with databases and provides a simplified framework for analyzing large biological datasets. Bionimbus (www.bionimbus.org) is a comprehensive bioinformatics system for archiving, managing, analyzing, re-analyzing, and sharing genome-wide datasets. |
| CV | Robert Grossman is a faculty member at the University of Chicago, where he is the Director of Informatics at the Institute for Genomics and Systems Biology, a Senior Fellow at the Computation Institute, and a Professor of Medicine in the Section of Genetic Medicine. His research group focuses on bioinformatics, data mining, cloud computing, data intensive computing, and related areas. He is also the Founder and a Partner of Open Data Group, which provides strategic consulting and outsourced services in analytics, and he is involved in several open source projects. He is the Chair of the Open Cloud Consortium, which hosts the Open Science Data Cloud and the Open Cloud Testbed. From 1998 to 2010, he was Chair of the Data Mining Group, which developed the Predictive Model Markup Language (PMML). He has written over 150 research publications. Additional information can be found at rgrossman.com. |

| Title | The Universe as a Data Engine: Modeling and Observations |
|---|---|
| Abstract | The search for understanding the ultimate nature of the universe and our place within it is older than recorded history, but the emergence of a compelling, scientifically valid picture of the universe and its evolution dates to less than a century ago. Observations performed within the past two decades have set cosmology on a tantalizing course. They reveal a mysterious universe, remarkable -- paradoxically -- both for the extent to which it can be understood and the extent to which it cannot. The leap in our ability to carry out wide and deep observations of the sky rests on advances in solid-state technology and the ability to deal with large datasets, both static and real-time. The interpretation of the huge datasets from cosmological surveys will rely heavily on high-fidelity simulations of the observable universe, which in turn generate datasets of similar size. I will discuss the current status of computational cosmology and the directions in which it is headed, with an emphasis on large dataset-related problems and connections to observations. |
| CV | Salman Habib is a technical staff member at Los Alamos National Laboratory, where he has been since 1991, following a postdoc at the University of British Columbia, a Ph.D. in physics from the University of Maryland, and an undergraduate degree from I.I.T. Delhi. Habib's research interests span a wide variety of topics, mostly concerned with the dynamics of complex systems, both classical and quantum. He has worked on extending the reach of parallel supercomputing in new application directions such as beam physics, nonequilibrium quantum field theory, open quantum systems and quantum control, and stochastic partial differential equations. Habib's interests in computational cosmology focus on precision structure formation probes of the "Dark Universe" -- the dark energy and dark matter that dominate the mass-energy budget of the universe, but whose ultimate nature remains to be understood. Recently, Habib led the Roadrunner Universe project at Los Alamos, which resulted in the development of a hybrid petascale cosmology code for tracking the formation of structure in the universe. |

| Title | Architecture and Data in Global Earth Observations |
|---|---|
| Abstract | The Group on Earth Observations (GEO) was launched in response to calls for action by the 2002 World Summit on Sustainable Development and by the G8 to meet the urgent need for coordinated observations regarding the state of the Earth. GEO addresses not only concerns related to mitigation of and adaptation to climate variability and change, but also reduction of loss of life and property from natural and human-induced disasters; improvement in the management of energy and water resources; improvement in weather forecasting and warning; better management and protection of terrestrial, coastal, and marine ecosystems; and conservation of biodiversity. GEO is coordinating efforts to build a Global Earth Observation System of Systems (GEOSS). Through implementation of GEOSS, a wide variety and large amount of globally distributed Earth observation data (both satellite and in-situ) may be virtually shared or federated and converted into decision-support information for stakeholders. Since 2005, the "GEO Grid" project, conducted by AIST, has aimed primarily to provide an e-Science infrastructure for the worldwide Earth sciences community; it may also be counted as part of the implementation activities of GEOSS. In this presentation, the current status of global Earth observation architecture and data under the GEO framework will be reviewed, and then the activities of GEO Grid will be outlined as they relate to GEOSS implementation. An example of using Grid computing and satellite archives to develop a global human settlement map will be introduced. |
| CV | Koki Iwao currently serves as a Scientific Officer for the GEO (Group on Earth Observations) Secretariat, seconded by the Japanese government. He supports Architecture and Data as well as Weather-related intergovernmental activities related to global Earth observations. He is originally from the GEO Grid Group, Information Technology Research Institute, National Institute of Advanced Industrial Science and Technology (AIST), Japan. His research interests include the development of global datasets based on large amounts of satellite data and their quality assurance. |

| Title | Data-Aware Distributed Computing: Enabling Large-Scale Collaborative Science |
|---|---|
| Abstract | Applications and experiments in all areas of science are becoming increasingly complex and more demanding in terms of their computational and data requirements. Some applications generate data volumes reaching hundreds of terabytes, and sharing, disseminating, and analyzing these large data sets becomes a major challenge, especially when distributed resources are used. Traditional distributed computing systems closely couple data handling and computation. They consider data resources second-class entities and access to data a side effect of computation. This makes the remote access and retrieval of data the main bottleneck in the end-to-end performance, reliability, and automation of large-scale data-intensive and dynamic data-driven applications. The inadequacy of traditional distributed computing systems in dealing with the complex data-handling problems of our new data-rich world has motivated us to create a new computing paradigm called data-aware distributed computing. In my talk, I will present this new computing paradigm and examples of our work in this area so far. |
| CV | Dr. Tevfik Kosar is an Associate Professor of Computer Science and Engineering at the State University of New York (SUNY) at Buffalo. He holds a B.S. degree in Computer Engineering from Bogazici University, Istanbul, Turkey, an M.S. degree in Computer Science from Rensselaer Polytechnic Institute, Troy, NY, and a Ph.D. degree in Computer Science from the University of Wisconsin-Madison. Dr. Kosar's main research interests lie at the intersection of petascale distributed systems, eScience, Grids, Clouds, and collaborative computing, with a focus on large-scale data-intensive distributed applications. He is the primary designer and developer of the Stork distributed data scheduling system, which has been adopted by many national and international institutions, and the lead investigator of the state-wide PetaShare distributed storage network in Louisiana. He has published more than fifty academic papers in leading journals and conferences. Some of the awards received by Dr. Kosar include the NSF CAREER Award, LSU Rainmaker Award, LSU Flagship Faculty Award, Baton Rouge Business Report's Top 40 Under 40 Award, 1012 Corridor's Young Scientist Award, College of Basic Science's Research Award, and CCT Faculty of the Year Award. |

| Title | Obtaining Actionable Information without Drowning in Data: From a Technology Perspective |
|---|---|
| Abstract | Although we are drowning in data, it is quite difficult to find useful information. This has come about due to various technological advances that have tremendously increased our ability to generate, collect, and store very large amounts of data, whether it is data on the web, personal data, or data collected by enterprises. In this talk, we first identify the causes that have helped us generate and accumulate large amounts of raw data. Then we overview earlier approaches for managing very large amounts of data and obtaining actionable information. Finally, we explore potential current approaches for dealing with very large amounts of data that will allow us to filter, fuse, and reduce it to obtain actionable knowledge. We present stream and complex event processing (CEP), mining, and information retrieval and ranking as examples of potential approaches that need to be synergistically mixed and matched to achieve the desired outcome. Other aspects, such as parallel processing and cloud computing, will also be discussed briefly for dealing with very large amounts of data. This talk is based on the presenter's research and projects over the last 25 years, in which a number of students and collaborators have participated. Sharma Chakravarthy is a co-author of the book "Stream Data Processing: A Quality of Service Perspective", Springer-Verlag, April 2009 (ISBN: 978-0-387-71002-0). |
| CV | Sharma Chakravarthy is a Professor in the Computer Science and Engineering Department at The University of Texas at Arlington, Texas. He established the Information Technology Laboratory at UT Arlington in January 2000 and currently heads it, and he also established the NSF-funded Distributed and Parallel Computing Cluster at UT Arlington in 2003. He is the recipient of the university-level "Creative Outstanding Researcher" award for 2003 and the department-level senior outstanding researcher award in 2002. His book, Stream Data Processing: A Quality of Service Perspective, was published by Springer in 2009. He is well known for his work on stream and event processing, semantic query optimization, multiple query optimization, active databases (the HiPAC project at CCA and the Sentinel project at the University of Florida, Gainesville), and more recently web database ranking, graph mining, and identification of experts. His current research includes web technologies, stream data processing, complex event processing, mining and knowledge discovery, and information integration. He has published over 150 papers in refereed international journals and conference proceedings. He has given tutorials on a number of database topics, such as stream processing, graph mining, database mining, and active, real-time, distributed, object-oriented, and heterogeneous databases, in North America, Europe, and Asia. He is an associate editor of TKDE. He is listed in Who's Who Among South Asian Americans and Who's Who Among America's Teachers. Prior to joining UTA, he was with the University of Florida, Gainesville. Prior to that, he worked as a Computer Scientist at the Computer Corporation of America (CCA) and as a Member of Technical Staff at Xerox Advanced Information Technology, Cambridge, MA. Sharma Chakravarthy received the B.E. degree in Electrical Engineering from the Indian Institute of Science, Bangalore, and the M.Tech. from IIT Bombay, India. He worked at TIFR (Tata Institute of Fundamental Research), Bombay, India, for a few years. He received M.S. and Ph.D. degrees from the University of Maryland, College Park, in 1981 and 1985, respectively. |

| Title | Data Stream Processing for Real-Time Analysis |
|---|---|
| Abstract | Streaming information sources have been increasing. To manage queries over data streams, data stream processing engines have been studied. In this talk, we introduce our work on stream processing, which includes distributed stream processing and efficient stream data archiving. |
| CV | Hideyuki Kawashima received his Ph.D. from Keio University in 2004. From 2004 to 2007, he was an assistant professor in the Department of Information and Computer Science, Keio University. Since 2007, he has been an assistant professor in both the Graduate School of Systems and Information Engineering and the Center for Computational Sciences, University of Tsukuba. His research interest lies in real-time data processing. |