Wednesday, July 22, 2015

Ian Czekala and Starfish

This blog post is a collaborative effort with fellow Banneker student Justin Myles, who demonstrates brilliance in all aspects of his life, even though he goes to Yale.


Last week, our advisor John Johnson assigned us to find a graduate student at the Center for Astrophysics (CfA) and talk with them about their research. His reasons for the assignment were manifold:
  1. It provides us with a (mandatory) opportunity to get to know one of the grad students in the department.
  2. It gives us a chance to learn about a topic of research that might not necessarily be super close to our own.
  3. The grad student we choose to “interview” gets free publicity.
It’s a win-win-win.  


That night, Justin and I both went home and scrolled through the list of CfA graduate students.  We both found Ian Czekala, and didn’t realize our overlapping intentions until the next morning, when we decided to do this project together.  (Who said Yalies and Harvard students couldn’t work together, huh?)


Ian, who had his first research experience as a summer student, primarily studies young stars and circumstellar disks. One of his recent achievements is Starfish, a package that fits an entire spectrum at once. This is a novel approach to spectroscopy, which is often limited to a narrow range of wavelengths (even when data covering a much wider range were collected) and to a small number of species (e.g. Fe and Na). Starfish is written in Python, is available on GitHub, and uses statistical methods: all subjects we’ve been learning about in our morning classes. So we were both interested in learning more about it.


Where many other spectral fitting packages focus on fitting individual spectral lines, Starfish fits the whole spectrum by minimizing the residuals (the difference between the observed spectrum and the model), while accounting for the covariance introduced by systematic discrepancies in the models.
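To make "accounting for the covariance" concrete, here is a minimal sketch of a Gaussian log-likelihood for the residuals with a full (non-diagonal) covariance matrix. This is the standard multivariate-normal formula, written with NumPy; the function name is our own, and this is not Starfish's actual code or API.

```python
import numpy as np


def gaussian_log_likelihood(data, model, cov):
    """Log-likelihood of the residuals R = data - model, assuming they are
    drawn from a zero-mean multivariate Gaussian with covariance matrix C:

        ln L = -1/2 * (R^T C^{-1} R + ln det C + N ln(2 pi))
    """
    resid = np.asarray(data, dtype=float) - np.asarray(model, dtype=float)
    n = resid.size
    # Cholesky factor: C = L L^T (requires C to be symmetric positive definite)
    chol = np.linalg.cholesky(cov)
    # Solve L y = R, so that R^T C^{-1} R = y . y without ever forming C^{-1}
    y = np.linalg.solve(chol, resid)
    # ln det C = 2 * sum(ln diag(L))
    log_det = 2.0 * np.sum(np.log(np.diag(chol)))
    return -0.5 * (y @ y + log_det + n * np.log(2.0 * np.pi))
```

With a diagonal covariance matrix this reduces (up to constants) to the familiar chi-square fit; the off-diagonal terms are what let correlated residuals be treated as expected structure rather than as pure misfit.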
[Figure: model fit to an observed spectrum]
In the plot above, the synthetic spectrum is shown in red, the data in blue, and the residuals in black. Zooming in on the gray region reveals a stretch of residuals where the noise is clearly not just Poisson noise:

[Figure: zoom on the gray region of the residuals]
An autocorrelation of the residuals reveals significant correlation on scales roughly the width of a spectral line:

[Figure]
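A quick way to see what such an autocorrelation test detects is to compare uncorrelated noise with noise smoothed over several pixels. This is a minimal NumPy sketch on made-up data; the 10-pixel boxcar is a hypothetical stand-in for structure about the width of a spectral line, not real residuals.

```python
import numpy as np


def autocorrelation(resid, max_lag):
    """Normalized sample autocorrelation of a 1-D residual array for lags 0..max_lag."""
    r = np.asarray(resid, dtype=float)
    r = r - r.mean()
    norm = np.dot(r, r)
    return np.array([np.dot(r[: r.size - k], r[k:]) / norm for k in range(max_lag + 1)])


rng = np.random.default_rng(42)
# Uncorrelated (white) noise: autocorrelation drops to about 0 after lag 0
white = rng.normal(size=5000)
# Noise averaged over 10 pixels: neighbouring pixels now share information
smooth = np.convolve(white, np.ones(10) / 10.0, mode="valid")

acf_white = autocorrelation(white, 20)
acf_smooth = autocorrelation(smooth, 20)
print(np.round(acf_white[:4], 2))
print(np.round(acf_smooth[:4], 2))
```

For the white noise the autocorrelation falls to nearly zero at lag 1, while the smoothed noise stays strongly correlated out to lags comparable to the smoothing width; residuals that behave like the second case are a sign that a diagonal (independent-pixel) noise model is not enough.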


In each row of this figure, the left panel shows the covariance matrix, which describes how strongly pairs of pixels (especially adjacent ones) covary, and the right panel shows the residuals of the synthetic spectrum fit to the data. Because interpolating spectra is computationally expensive, it helps to handle the mismatch statistically instead: by identifying a region of relatively large residuals and increasing the covariance there, the covariance matrix comes to describe those residuals much more faithfully. This is shown by the progression from the first row to the third.


In particular, once first a global kernel and then a local kernel are added to the covariance matrix, random draws from that covariance matrix closely resemble the residuals; in this way, the covariance matrix models the residual noise.
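Here is a minimal sketch of that progression: build a covariance matrix from per-pixel noise, add a global kernel, then add a local kernel, and draw random "residuals" from each. We are assuming a Matérn-3/2 form for the global kernel and a Gaussian-envelope form for the local one; these are illustrative choices, not necessarily the exact kernels Starfish uses, and the function names, wavelength grid, and parameter values are all hypothetical.

```python
import numpy as np


def matern32(dist, amp, length):
    """Stationary Matérn-3/2 kernel: amp^2 (1 + sqrt(3) d/l) exp(-sqrt(3) d/l)."""
    x = np.sqrt(3.0) * dist / length
    return amp**2 * (1.0 + x) * np.exp(-x)


def local_kernel(wl, amp, center, width):
    """Hypothetical local kernel: extra covariance confined near `center`.

    A Gaussian envelope (rank-1 outer product) times a squared-exponential in
    pixel separation; both factors are positive semidefinite, so their
    elementwise product is too (Schur product theorem).
    """
    envelope = np.exp(-0.5 * ((wl - center) / width) ** 2)
    dist = np.abs(wl[:, None] - wl[None, :])
    return amp**2 * np.outer(envelope, envelope) * np.exp(-0.5 * (dist / width) ** 2)


# Hypothetical wavelength grid (Angstrom) and parameter values, for illustration only
wl = np.linspace(5100.0, 5200.0, 400)
dist = np.abs(wl[:, None] - wl[None, :])
noise = 0.01  # per-pixel (Poisson-like) noise level

cov_noise = np.diag(np.full(wl.size, noise**2))
cov_global = cov_noise + matern32(dist, amp=0.02, length=1.5)
cov_full = cov_global + local_kernel(wl, amp=0.05, center=5150.0, width=1.0)

# Random draws of the residual noise implied by each covariance matrix
rng = np.random.default_rng(1)
zero = np.zeros(wl.size)
draws_white = rng.multivariate_normal(zero, cov_noise, size=3)
draws_global = rng.multivariate_normal(zero, cov_global, size=3)
draws_full = rng.multivariate_normal(zero, cov_full, size=3)
```

Plotting the rows of `draws_white`, `draws_global`, and `draws_full` against `wl` mirrors the three rows described above: white-noise draws are featureless, the global kernel adds wiggles correlated over several pixels everywhere, and the local kernel adds a larger excursion confined near 5150 Å.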


At this point, you might be wondering how Starfish could be used by the wider astronomical community.  Well, stellar astronomers aren’t the only people who deal with spectra.  Spectroscopy is a tool used in every sub-field of astronomy.  


Let’s say, for example, that you are an astronomer who studies the formation and evolution of galaxies.  (Though there are definitely some people who claim that studying galaxies is nothing more than studying large groups of stars at once.)  You’re working with several spectral lines from a single galaxy, trying to use them to determine the galaxy’s physical characteristics. How do you do that?


The short answer is: make a bunch of model galaxies and compare the line fluxes from those models to the actual line fluxes you observed.


That takes So. Much. Time. Modeling a galaxy is hard work for a computer. Modeling a few thousand slightly different galaxies? Starfish could relieve some of that strain by first identifying the model spectra that best match the observed lines. Using a flexible likelihood function like the one advocated by Czekala et al. '14 would deliver realistic parameter estimates and uncertainties, while also potentially identifying any particular lines that the models treat incorrectly.
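The brute-force "compare line fluxes against a bunch of models" approach can be sketched as a chi-square ranking of a model grid. Everything here (the function name, the three model galaxies, the fluxes and errors) is hypothetical, and it uses independent Gaussian errors per line rather than Starfish's covariance-aware likelihood.

```python
import numpy as np


def rank_models(obs_flux, obs_err, model_fluxes):
    """Chi-square of each model against observed line fluxes.

    obs_flux, obs_err : arrays of shape (n_lines,)
    model_fluxes      : array of shape (n_models, n_lines)
    Returns the index of the best model, the chi-square of every model, and
    normalized weights proportional to exp(-chi2 / 2).
    """
    chi2 = np.sum(((model_fluxes - obs_flux) / obs_err) ** 2, axis=1)
    # Subtract the minimum before exponentiating to avoid underflow
    weights = np.exp(-0.5 * (chi2 - chi2.min()))
    return int(np.argmin(chi2)), chi2, weights / weights.sum()


# Hypothetical line fluxes (arbitrary units) for three model galaxies and one observation
models = np.array([
    [1.00, 0.50, 0.30],
    [1.20, 0.40, 0.35],
    [0.90, 0.60, 0.20],
])
observed = np.array([1.19, 0.41, 0.34])
errors = np.full(3, 0.05)

best, chi2, weights = rank_models(observed, errors, models)
print(best, np.round(chi2, 2))
```

Treating each line as independent is exactly the assumption a flexible likelihood relaxes: if the models get one line systematically wrong, a covariance-aware fit can absorb that mismatch instead of letting it drag the best-fit parameters.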
