Invited speakers

Rob Simmonds

Bio
Rob Simmonds is a Professor in the Department of Computer Science at the University of Cape Town. He is also an Associate Director of the Inter-University Institute for Data Intensive Astronomy, in charge of Technology Initiatives, and part of the management team of the SKA Science Data Processor design consortium, where he leads the data delivery design. He holds a PhD in Mathematical Sciences from the University of Bath in the UK. He previously worked at the University of Calgary in Canada, where he managed the Grid Research Centre and served as Chief Technology Officer for WestGrid, part of the Compute Canada organization.

Title of talk
The IDIA Data Intensive Cloud

Abstract
The Inter-University Institute for Data Intensive Astronomy has established a cloud computing system built on the OpenStack Infrastructure-as-a-Service framework. It uses virtual machines to provide isolation between environments, and Singularity containers to load application codes and provide a means of securely accessing shared POSIX file-system volumes. CEPH storage provides the block and object storage found in other cloud environments, while BeeGFS provides large, high-performance POSIX file-system volumes. This talk describes the cluster and how it is being used to support the development of MeerKAT processing pipelines.

Edwin Valentijn

Bio
Prof. Edwin A. Valentijn is a Professor in Astronomical Information Technology at the Kapteyn Astronomical Institute of the University of Groningen, The Netherlands. He leads the OmegaCEN datacenter, which develops and operates large astronomical surveys, such as OmegaCAM@VST and MUSE@VLT. Valentijn also leads the Target project, a multi-disciplinary public-private collaboration on Big Data projects. Valentijn’s present research focuses on extreme data lineage in information systems and dark matter studies involving the entanglement of entropy.

Title of talk
Entanglement in Information Systems

Abstract
Researchers collect, federate, distribute and link Big Data in their computers and networks for data-intensive astronomy projects, but also for the life sciences and many other domains. The Universe itself is also full of data. Are there fundamental connections? These are the questions addressed in The Information Universe. I will discuss current topics of The Information Universe, such as entropy, entanglement, and data federations built from metadata and joins: the Universe as a spreadsheet.

Peppe Longo

Bio
Giuseppe Longo is Professor of Astrophysics at the University of Napoli Federico II, Associate to the California Institute of Technology (Caltech) and to the Italian National Institutes of Astrophysics and of Nuclear Physics. He is among the early pioneers of astroinformatics and has co-authored more than 200 scientific papers and many books and proceedings. He has been chairperson of the Interest Group on Knowledge Discovery in Databases of the International Virtual Observatory Alliance, co-proposer of the International Astronomical Union Working Group on Astrostatistics and Astroinformatics and of the IEEE task force “astrominers”. In collaboration with S.G. Djorgovski at Caltech, he established and co-chairs the annual international workshops on Astroinformatics. He also leads the machine learning efforts on photometric redshifts within the Euclid collaboration and in several other survey projects.

Title of talk
Machine Learning in Astronomy: Status and Perspectives

Abstract
In the last few years, due to the huge amount of data produced by modern multiband, multi-epoch surveys, machine learning methods have found wide application in many different areas of astrophysics and cosmology. Star/galaxy separation, morphological and physical classification of galaxies, globular cluster detection in external galaxies, photometric redshifts and strong gravitational lensing are only a few among the many topics addressed with machine learning methods. In spite of the huge variety of topics, however, ML methods still pose problems that are far from solved: from the selection of the optimal set of features for a specific problem, to the accurate evaluation of errors. In the talk, the template case of photometric redshift evaluation will be used to illustrate some of the main problems and to analyse some of the solutions that have been proposed.

Michael Biehl

Bio
Michael Biehl is Associate Professor with tenure at the Johann Bernoulli Institute for Mathematics and Computer Science, University of Groningen, The Netherlands. He received a PhD in Physics from the University of Gießen, Germany, in 1992 and the Habilitation in Theoretical Physics at the University of Würzburg, Germany, in 1996. His main research interests in computational intelligence lie in the theory of machine learning processes, algorithm development and interdisciplinary applications. He has co-authored 88 refereed journal publications and 96 conference contributions and book chapters.

Title of talk
Prototype-based Machine Learning: Unsupervised and Supervised Data Analysis

Abstract
An overview is given of prototype-based systems in machine learning. Basic concepts and algorithms will be presented and discussed in terms of illustrative application examples. In this framework, observations, i.e., data, are stored in terms of typical representatives. Together with a suitable measure of (dis-)similarity, such systems can be employed in the context of unsupervised and supervised analysis of potentially high dimensional, complex data sets. We discuss unsupervised schemes such as Competitive Vector Quantization (VQ) and Kohonen’s topology-preserving Self Organizing Map (SOM). Supervised learning in prototype systems is exemplified in terms of Learning Vector Quantization (LVQ). Most frequently, the familiar Euclidean distance serves as a dis-similarity measure. We present extensions of the framework to non-standard measures and give an introduction to the use of adaptive distances in so-called relevance learning schemes.
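The LVQ1 update rule underlying these supervised schemes is compact enough to sketch. The following pure-Python toy (the data points, prototype positions, learning rate and class labels are invented purely for illustration) attracts the winning prototype toward a correctly labelled sample and repels it from a wrongly labelled one, using the familiar Euclidean distance as the dis-similarity measure:

```python
def sq_dist(a, b):
    # squared Euclidean distance, the standard dis-similarity measure
    return sum((x - y) ** 2 for x, y in zip(a, b))

def lvq1_train(data, labels, prototypes, proto_labels, lr=0.1, epochs=30):
    """Basic LVQ1: the winning (closest) prototype is attracted by a
    correctly labelled sample and repelled by a wrongly labelled one."""
    for _ in range(epochs):
        for x, y in zip(data, labels):
            # find the winner, i.e. the closest prototype
            k = min(range(len(prototypes)),
                    key=lambda i: sq_dist(x, prototypes[i]))
            sign = 1.0 if proto_labels[k] == y else -1.0
            prototypes[k] = [w + sign * lr * (xi - w)
                             for w, xi in zip(prototypes[k], x)]
    return prototypes

def classify(x, prototypes, proto_labels):
    # nearest-prototype classification
    k = min(range(len(prototypes)),
            key=lambda i: sq_dist(x, prototypes[i]))
    return proto_labels[k]

# two toy classes in the plane, one prototype per class
data = [(0.0, 0.0), (0.2, 0.1), (1.0, 1.0), (0.9, 1.1)]
labels = [0, 0, 1, 1]
protos = lvq1_train(data, labels, [[0.5, 0.4], [0.6, 0.7]], [0, 1])
```

Relevance learning, mentioned above, would replace `sq_dist` with an adaptive, parameterized distance whose weights are trained alongside the prototypes.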

Kai Polsterer

Bio
Kai L. Polsterer received his Diploma in Computer Science at the Technical University of Dortmund, before he switched to physics and astronomy at the University of Bochum, where he received his PhD in Physics and Astronomy. During that time he was responsible for realizing the control software for one of the main instruments (LUCI) at the world’s largest optical telescope, the Large Binocular Telescope. Besides working on control software, he started developing and applying machine learning techniques to analyse complex and large data sets. He is involved in the International Virtual Observatory Alliance’s efforts to ensure uniform access to astronomical data, and in the IEEE task force on mining complex astronomical data. Since 2013 he has been head of the Astroinformatics group at the Heidelberg Institute for Theoretical Studies.

Title of talk
Reproducibility in the Era of Data-Driven Science

Abstract
Reproducibility of scientific research results is of tremendous importance: it enables other researchers to validate, check and build on published results. In data-driven research, this requires more than publishing results as a plain paper. We have to start sharing and publishing code, as well as referencing the software packages that were used. Data sets used to train and/or derive models have to be published alongside the code. The provenance of the data is as important as providing uncertainties. The use of proper scores to evaluate performance and the publication of reference data sets have to become standard in astronomy. When using deep learning schemes, the derived weights, biases and hyper-parameters have to be published, too. This talk will focus on some of these important aspects.
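One lightweight way to approach such requirements is a machine-readable provenance record published next to the model. The sketch below (the field names, hyper-parameters and package versions are invented for illustration, not a standard the talk defines) fingerprints the training data with a content hash and captures the exact configuration:

```python
import hashlib
import json

def provenance_record(data_bytes, hyperparams, software_versions):
    """Minimal provenance record: a content hash of the training data
    plus the exact hyper-parameters and software versions used."""
    return {
        "data_sha256": hashlib.sha256(data_bytes).hexdigest(),
        "hyperparameters": hyperparams,
        "software": software_versions,
    }

record = provenance_record(
    b"ra,dec,flux\n10.1,-30.2,0.5\n",        # stand-in for a published training set
    {"learning_rate": 0.01, "epochs": 100},  # hypothetical hyper-parameters
    {"python": "3.11", "mymodel": "1.2.0"},  # hypothetical package versions
)
print(json.dumps(record, indent=2, sort_keys=True))
```

Anyone re-running the pipeline can recompute the hash of the published data set and confirm they are training on exactly the bytes the record describes.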

Jeffrey Kern

Bio
Dr Jeffrey Kern received a PhD in Astrophysics from the New Mexico Institute of Mining and Technology in 2004. After graduating he joined the ALMA construction project at the National Radio Astronomy Observatory (NRAO) as a software engineer working on the Monitor and Control System. Throughout the construction and early commissioning period of ALMA, Kern worked within the North American ALMA Computing Team, leading the Monitor and Control subsystem and serving as the deputy North American ICT lead. From 2010 to 2017, he led the international Common Astronomical Software Applications (CASA) development team. Recently Kern has left the CASA project to serve as Project Director for the NRAO’s Science Ready Data Products project.

Title of talk
Surveying the Radio Sky: Challenges and Opportunities

Abstract
The latest generation of radio interferometers are beginning to unlock new frontiers in survey science. Execution of these surveys, from the acquisition of the data to the scientific understanding of the rich end products, challenges the traditional practices of radio astronomy and pushes us to a new automated paradigm. I discuss the challenges posed by these surveys and how the NRAO is addressing them in the VLA Sky Survey. Finally, I will discuss the challenges of analyzing these surveys, focusing on the unique challenges of radio data and the opportunities for the techniques of astroinformatics to extract information hidden in these vast data sets.

Erik Rosolowsky

Bio
Erik Rosolowsky is an Associate Professor of Physics and Astronomy at the University of Alberta. Prior to coming to Edmonton, he completed his PhD at Berkeley and postdoctoral work at Harvard, where he was a Postdoctoral Fellow of the US National Science Foundation. His first appointment in Canada was at UBC’s Okanagan campus in Kelowna, where he worked for five years before joining the University of Alberta faculty. Rosolowsky’s research focuses on the connections between generations of stars, primarily using observations from radio and optical telescopes. Working with students and several international collaborations, he has made the definitive studies of how molecular gas forms stars in nearby galaxies.

Title of talk
Visual Analytics Challenges for Astronomically Big Data

Abstract
Like many other fields, astronomy faces an exponentially increasing data challenge. Hidden in this data flow are the insights that will drive the next generation of discoveries. Historically, such insight has been gathered through physics-guided data visualization but, in facing our current data challenge, we need new visualization strategies to make these discoveries. In this talk, I will highlight a few representative data sets – the sparse images from the VLA Sky Survey, feature-rich spectral-line data cubes, and multi-epoch imaging – and summarize the challenges these present. To meet these challenges, I will highlight the opportunities for using the tools of machine vision and machine learning developed in other domains to shape insightful visualizations of next-generation astronomical data.

Nicola Mulder

Bio
Prof. Nicola Mulder heads the Computational Biology Division at the University of Cape Town (UCT), and leads H3ABioNet, a large Pan-African Bioinformatics Network of over 30 institutions in 14 African countries. H3ABioNet aims to develop bioinformatics capacity to enable genomic data analysis on the continent by developing, and providing access to, skills and computing infrastructure for data analysis. Prior to her position at UCT, Prof. Mulder worked at the European Bioinformatics Institute in Cambridge, leading a bioinformatics service project. At UCT, her group’s research applies bioinformatics technologies to the study of infectious and human genetic diseases. She also does bioinformatics training, and her group provides bioinformatics services for local researchers, through which they develop new algorithms, visualization tools and analysis pipelines for high-throughput biology.

Title of talk
Big Data Challenges in Life Sciences

Abstract
Bioinformatics, or computational biology, is the application of computing to the analysis and interpretation of biological data. With the emergence of new laboratory technologies for high-throughput data generation, the amount of data life scientists have to manage is increasing in both size and complexity. High-throughput biological data can be noisy and heterogeneous, and algorithms for data analysis require constant improvement and customization as technologies evolve. As the data generation technologies become cheaper and more accessible, African scientists are increasingly generating bigger and more complex data sets, most notably from next-generation sequencing. This has led to significant challenges in all aspects of data management, including transfer, storage, analysis and interpretation. It has also led to a serious need for training in Big Data analysis in the biomedical field. One of the initiatives that aims to address these challenges is H3ABioNet, a Pan-African bioinformatics network established to build the capacity for large-scale genomics research on the continent. This talk will outline some of the major bioinformatics research in the biomedical sciences in Africa, including its Big Data challenges and some of the solutions implemented through H3ABioNet.

Dalya Baron

Bio
I am a PhD student at Tel Aviv University, under the supervision of Prof. Hagai Netzer. I am interested in galaxy evolution, AGN and the role of AGN-driven winds in regulating star formation and gas supply in galaxies. I am also interested in machine learning and the usage of various statistical tools for analysis of large astronomical data sets.

Title of talk
Searching for Unknown Structures and Objects in Large Spectroscopic Data Sets

Abstract
How can we discover in large astronomical surveys objects we did not know existed? I will describe the challenge in finding unknown unknowns, which can be systems that we did not know we should be looking for, or structures in the data set we are not aware of. I will present our recent works, where we develop and test unsupervised algorithms to find such novelties in large data sets. I will show their success in finding rare and interesting objects and detecting high-dimensional structure in complex data sets.
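A generic flavour of the unsupervised approach (not the specific algorithms developed in the works above) is distance-based novelty scoring: points far from all of their neighbours are candidate "unknown unknowns". A minimal pure-Python sketch, with an invented toy data set:

```python
import math

def knn_outlier_scores(points, k=2):
    """Score each point by the mean distance to its k nearest
    neighbours; isolated points receive high scores."""
    scores = []
    for i, p in enumerate(points):
        # distances from p to every other point, smallest first
        dists = sorted(math.dist(p, q)
                       for j, q in enumerate(points) if j != i)
        scores.append(sum(dists[:k]) / k)
    return scores

# a tight cluster plus one isolated novelty
pts = [(0, 0), (0.1, 0), (0, 0.1), (0.1, 0.1), (5, 5)]
scores = knn_outlier_scores(pts)
```

In practice such scores are only a starting point; real surveys demand methods that scale to millions of objects and cope with high-dimensional, noisy features.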

Matthew Graham

Bio
Matthew J. Graham is a Research Professor of Astronomy at the California Institute of Technology and the Project Scientist for the Zwicky Transient Facility, the first of the next generation of LSST-scale sky surveys. His research interests are the application of advanced analysis methodologies from computer science and statistics to astronomical time series, particularly to understand quasar variability. He is also Chair of the Technical Coordination Group of the International Virtual Observatory Alliance.

Title of talk
Beyond the Virtual Observatory: New Emerging Global Data Environments for Astronomy

Abstract
Data-intensive science in the late 1990s was dominated by a single computational concern: how to deal with the LHC. The grid computing paradigm that emerged as a solution was highly influential outside of the high energy physics community and inspired the Virtual Observatory (VO) concept in astronomy. However, the challenges faced by data-intensive astronomy in the late 2010s, particularly from the burgeoning fields of time domain astronomy and astroinformatics, are different and new methodologies are arising to deal with them. In this talk, I will review the new landscape of science platforms, multinational data projects, and smart networks that is emerging to deal with the LSST (and SKA) and what role the VO might play in it.

Ashish Mahabal

Bio
Ashish Mahabal is a Computational Data Scientist with the Center for Data Driven Discovery, Caltech. He holds a PhD in Astronomy. He works on Big Data projects in astronomy (CRTS, ZTF, LSST), biosciences (EDRN, MCL) and Earth science (EarthCube). He has been involved in all aspects related to data: procurement, processing, databases, visualization, scientific analysis, distribution, and archives. His interest lies in combining diverse datasets and presenting them to maximize scientific output, using workflows that employ statistical and mathematical techniques and machine learning and, where possible, citizen science. Ongoing work includes machine learning and transient classification for ZTF and methodology transfer from astronomy and space science to health care and Earth science.

Title of talk
From Classifying Transients to Personalizing Medicine

Abstract
Astronomers have long been classifying objects into various families. At the broadest level these are obvious: asteroids, planets, stars, quasars, galaxies, etc. Each then has subclasses, such as SN Ia, SN Ib and SN IIp among several other supernova subclasses. Even within a subclass there is enough variation, driven by physical processes that depend on environment, that researchers can point to differences between any two events. Studies at that level of detail have become possible due to more diverse observations (e.g. wavebands), frequent observations, and early observations (e.g. pre-peak). In addition, techniques that allow us to combine these diverse observations more meaningfully are key. There are many parallels with the relatively nascent field of personalized medicine. We are now beginning to understand how the same agents (chemical or biological) can evoke different responses in different individuals (equivalent to different environments, in terms of their biome and genes). At a finer level, just as the study of an individual transient benefits from specialized treatment, so would individual patients. In the case of astronomy, better metadata have helped create a better big picture from which to drill down to individual cases. We discuss the possible advantages of such an approach to personalizing medicine.