
Sunday, November 6, 2016

Advances in Variational Inference: Working Towards Large-Scale Probabilistic Machine Learning at NIPS 2014



At Google, we continually explore and develop large-scale machine learning systems to improve our users' experience, such as providing better video recommendations, deciding on the best language translation in a given context, or improving the accuracy of image search results. The data used to train these systems often contains many inconsistencies and missing elements, making progress towards large-scale probabilistic models designed to address these problems an important and ongoing part of our research. One principled and efficient framework for developing such models is known as variational inference.

Renewed interest and several recent advances in variational inference1,2,3,4,5,6 have motivated us to support and co-organise this year's workshop on Advances in Variational Inference as part of the Neural Information Processing Systems (NIPS) conference in Montreal. These advances include new methods for scalability using stochastic gradient methods, the ability to handle data that arrives continuously as a stream, inference in non-linear time-series models, principled regularisation in deep neural networks, and inference-based decision making in reinforcement learning, amongst others.
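
To make the stochastic-gradient flavour of these advances concrete, below is a minimal sketch of a Monte Carlo estimate of the evidence lower bound (ELBO) using the reparameterisation trick of references 1 and 4, for a one-dimensional toy model with a Gaussian approximate posterior. The toy model and all names below are our own illustration, not code from any of the cited papers.

```python
import numpy as np

rng = np.random.default_rng(0)

def elbo_estimate(x, mu, log_sigma, n_samples=10):
    """Monte Carlo ELBO for the toy model p(z) = N(0, 1),
    p(x | z) = N(z, 1), with approximate posterior q(z) = N(mu, sigma^2).

    Reparameterising z = mu + sigma * eps with eps ~ N(0, 1) makes the
    estimate a smooth function of (mu, log_sigma), so its gradient can
    drive stochastic optimisation of the variational parameters.
    """
    sigma = np.exp(log_sigma)
    eps = rng.standard_normal(n_samples)
    z = mu + sigma * eps                                      # reparameterised samples
    log_lik = -0.5 * (x - z) ** 2 - 0.5 * np.log(2 * np.pi)   # log p(x | z)
    # KL(q || p) between two Gaussians has a closed form here.
    kl = 0.5 * (mu ** 2 + sigma ** 2 - 2 * log_sigma - 1)
    return log_lik.mean() - kl

print(elbo_estimate(x=1.3, mu=0.5, log_sigma=-0.2))
```

In practice the same estimator is applied to minibatches of data and optimised with stochastic gradient ascent, which is what makes these methods scale to large datasets.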

Whilst variational methods have clearly emerged as a leading approach for tractable, large-scale probabilistic inference, there remain important trade-offs in speed, accuracy, simplicity and applicability between variational and other approximate schemes. The goal of the workshop will be to contextualise these developments and address some of the many unanswered questions through:

  • Contributed talks from 6 speakers who are leading the resurgence of variational inference and shaping the debate on topics of stochastic optimisation, deep learning, Bayesian non-parametrics, and theory.
  • 34 contributed papers covering significant advances in methodology, theory and applications including efficient optimisation, streaming data analysis, submodularity, non-parametric modelling and message passing.
  • A panel discussion with leading researchers in the field that will further interrogate these ideas. Our panelists are David Blei, Neil Lawrence, Shinichi Nakajima and Matthias Seeger.

The workshop presents a fantastic opportunity to discuss the prospects and obstacles facing the wider adoption of variational methods. It will be held on 13 December 2014 at the Montreal Convention and Exhibition Centre. For more details, see www.variationalinference.org.

References:

1. Rezende, D. J., Mohamed, S., and Wierstra, D., Stochastic Backpropagation and Approximate Inference in Deep Generative Models, Proceedings of the 31st International Conference on Machine Learning (ICML-14), 2014.

2. Gregor, K., Danihelka, I., Mnih, A., Blundell, C., and Wierstra, D., Deep AutoRegressive Networks, Proceedings of the 31st International Conference on Machine Learning (ICML-14), 2014.

3. Mnih, A., and Gregor, K., Neural Variational Inference and Learning in Belief Networks, Proceedings of the 31st International Conference on Machine Learning (ICML-14), 2014.

4. Kingma, D. P., and Welling, M., Auto-Encoding Variational Bayes, Proceedings of the International Conference on Learning Representations (ICLR), 2014.

5. Broderick, T., Boyd, N., Wibisono, A., Wilson, A. C., and Jordan, M., Streaming Variational Bayes, Advances in Neural Information Processing Systems (pp. 1727–1735), 2013.

6. Hoffman, M., Blei, D. M., Wang, C., and Paisley, J., Stochastic Variational Inference, Journal of Machine Learning Research, 14:1303–1347, 2013.

Sunday, October 23, 2016

Large Scale Machine Learning for Drug Discovery



Discovering new treatments for human diseases is an immensely complicated challenge. Even after extensive research to develop a biological understanding of a disease, an effective therapeutic that can improve the quality of life must still be found. This process often takes years of research, requiring the creation and testing of millions of drug-like compounds in an effort to find just a few viable drug candidates. These high-throughput screens are often automated in sophisticated labs and are expensive to perform.

Recently, deep learning with neural networks has been applied in virtual drug screening1,2,3, which attempts to replace or augment the high-throughput screening process with computational methods in order to improve its speed and success rate.4 Traditionally, virtual drug screening has used only the experimental data from the particular disease being studied. However, as the volume of experimental drug screening data across many diseases continues to grow, several research groups have demonstrated that data from multiple diseases can be leveraged with multitask neural networks to improve virtual screening effectiveness.
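
As a rough illustration of the multitask idea, here is our own sketch in PyTorch, not the architecture from the papers cited above: a trunk of layers shared across every screening task extracts features from a molecular fingerprint, while each disease gets only a small task-specific output head, so every task benefits from all of the data. Layer sizes and names are assumptions made for illustration.

```python
import torch
import torch.nn as nn

class MultitaskScreeningNet(nn.Module):
    """Illustrative multitask network: shared trunk + one small
    binary head per screening task (disease)."""

    def __init__(self, n_features=1024, n_tasks=200, hidden=512):
        super().__init__()
        # Trunk shared by all tasks: this is where multitask transfer happens.
        self.trunk = nn.Sequential(
            nn.Linear(n_features, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        # One "active vs. inactive" logit per biological process.
        self.heads = nn.ModuleList(
            [nn.Linear(hidden, 1) for _ in range(n_tasks)]
        )

    def forward(self, x, task_id):
        return self.heads[task_id](self.trunk(x))

net = MultitaskScreeningNet()
fingerprints = torch.randn(32, 1024)      # stand-in for molecular fingerprints
logits = net(fingerprints, task_id=7)     # activity predictions for one task
```

The design choice worth noting is that the small per-task heads let tasks with sparse screening data borrow statistical strength from data-rich ones through the shared trunk.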

In collaboration with the Pande Lab at Stanford University, we've released a paper titled "Massively Multitask Networks for Drug Discovery", investigating how data from a variety of sources can be used to improve the accuracy of determining which chemical compounds would be effective drug treatments for many different diseases. In particular, we carefully quantified how the amount and diversity of screening data from diseases with very different biological processes can be used to improve the virtual drug screening predictions.

Using our large-scale neural network training system, we trained at a scale 18x larger than previous work, with a total of 37.8M data points across more than 200 distinct biological processes. Because of this large scale, we were able to carefully probe the sensitivity of these models to a variety of changes in model structure and input data. In the paper, we examine not just the performance of the model but why it performs well and what we can expect of similar models in the future. The experiments in the paper represent more than 50M total CPU hours.

[Figure: a measure of prediction accuracy (ROC AUC, the area under the receiver operating characteristic curve) for virtual screening on a fixed set of 10 biological processes, as more datasets are added.]
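
For readers unfamiliar with the metric, ROC AUC can be read as the probability that a randomly chosen active compound is ranked above a randomly chosen inactive one, so 0.5 is chance and 1.0 is a perfect ranking. A quick illustration with made-up labels and scores (not data from the paper), assuming scikit-learn is available:

```python
from sklearn.metrics import roc_auc_score

# Toy screen: 1 = active compound, 0 = inactive; scores are a model's
# predicted probability of activity.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_score = [0.9, 0.3, 0.8, 0.4, 0.2, 0.5, 0.7, 0.1]

# 15 of the 16 (active, inactive) pairs are ranked correctly -> 0.9375.
print(roc_auc_score(y_true, y_score))
```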

One encouraging conclusion from this work is that our models are able to utilize data from many different experiments to increase prediction accuracy across many diseases. To our knowledge, this is the first time the effect of adding additional data has been quantified in this domain, and our results suggest that even more data could improve performance further.

Machine learning at scale has significant potential to accelerate drug discovery and improve human health. We look forward to continued improvement in virtual drug screening and its increasing impact on the discovery process for future drugs.

Thank you to our other collaborators David Konerding (Google), Steven Kearnes (Stanford), and Vijay Pande (Stanford).

References:

1. Thomas Unterthiner, Andreas Mayr, Günter Klambauer, Marvin Steijaert, Jörg Kurt Wegner, Hugo Ceulemans, and Sepp Hochreiter. Deep Learning as an Opportunity in Virtual Screening. Deep Learning and Representation Learning Workshop, NIPS 2014.

2. George E. Dahl, Navdeep Jaitly, and Ruslan Salakhutdinov. Multi-task Neural Networks for QSAR Predictions. arXiv preprint arXiv:1406.1231, 2014.

3. Junshui Ma, Robert P. Sheridan, Andy Liaw, George Dahl, and Vladimir Svetnik. Deep Neural Nets as a Method for Quantitative Structure-Activity Relationships. Journal of Chemical Information and Modeling, 2015.

4. Peter Ripphausen, Britta Nisius, Lisa Peltason, and Jürgen Bajorath. Quo Vadis, Virtual Screening? A Comprehensive Survey of Prospective Applications. Journal of Medicinal Chemistry, 2010, 53 (24), 8461–8467.

Saturday, September 17, 2016

Computational Complexity: The Burden of Large CS Enrollments


The Computational Complexity blog recently posted a piece on The Burden of Large Enrollments in CS departments. It seems that this trend is global, but interestingly they point out that if you average the growth over the booms and busts in CS enrolment since the 1970s, the growth rate is a steady 10%.
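
To put that steady rate in perspective, here is a quick back-of-the-envelope compounding calculation (the starting number is our own, purely illustrative):

```python
# A steady 10% annual growth rate compounds to roughly a 45x increase
# over four decades: 1.1 ** 40 ~= 45.26.
enrolment = 100.0          # hypothetical intake in the mid-1970s
for _ in range(40):
    enrolment *= 1.10
print(round(enrolment))    # -> 4526
```
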
Thanks to my colleague Mark for bringing this to my attention.




from The Universal Machine http://universal-machine.blogspot.com/
