Pages

Showing posts with label scale. Show all posts
Showing posts with label scale. Show all posts

Sunday, November 6, 2016

Advances in Variational Inference Working Towards Large scale Probabilistic Machine Learning at NIPS 2014



At Google, we continually explore and develop large-scale machine learning systems to improve our user’s experience, such as providing better video recommendations, deciding on the best language translation in a given context, or improving the accuracy of image search results. The data used to train these systems often contains many inconsistencies and missing elements, making progress towards large-scale probabilistic models designed to address these problems an important and ongoing part of our research. One principled and efficient approach for developing such models relies on an approach known as Variational Inference.

A renewed interest and several recent advances in variational inference1,2,3,4,5,6 has motivated us to support and co-organise this year’s workshop on Advances in Variational Inference as part of the Neural Information Processing Systems (NIPS) conference in Montreal. These advances include new methods for scalability using stochastic gradient methods, the ability to handle data that arrives continuously as a stream, inference in non-linear time-series models, principled regularisation in deep neural networks, and inference-based decision making in reinforcement learning, amongst others.

Whilst variational methods have clearly emerged as a leading approach for tractable, large-scale probabilistic inference, there remain important trade-offs in speed, accuracy, simplicity and applicability between variational and other approximative schemes. The goal of the workshop will be to contextualise these developments and address some of the many unanswered questions through:

  • Contributed talks from 6 speakers who are leading the resurgence of variational inference, and shaping the debate on topics of stochastic optimisation, deep learning, Bayesian non-parametrics, and theory.
  • 34 contributed papers covering significant advances in methodology, theory and applications including efficient optimisation, streaming data analysis, submodularity, non-parametric modelling and message passing.
  • A panel discussion with leading researchers in the field that will further interrogate these ideas. Our panelists are David Blei, Neil Lawrence, Shinichi Nakajima and Matthias Seeger.

The workshop presents a fantastic opportunity to discuss the opportunities and obstacles facing the wider adoption of variational methods. The workshop will be held on the 13th December 2014 at the Montreal Convention and Exhibition Centre. For more details see: www.variationalinference.org.

References:

1. Rezende, Danilo J., Shakir Mohamed, and Daan Wierstra, Stochastic Backpropagation and Approximate Inference in Deep Generative Models, Proceedings of the 31st International Conference on Machine Learning (ICML-14), 2014.

2. Gregor, Karol, Ivo Danihelka, Andriy Mnih, Charles Blundell and Daan Wierstra, Deep AutoRegressive Networks, Proceedings of the 31st International Conference on Machine Learning (ICML-14), 2014.

3. Mnih, Andriy, and Karol Gregor, Neural Variational Inference and Learning in Belief Networks, Proceedings of the 31st International Conference on Machine Learning (ICML-14), 2014.

4. Kingma, D. P. and Welling, M., Auto-Encoding Variational Bayes, Proceedings of the International Conference on Learning Representations (ICLR), 2014.

5. Broderick, T., Boyd, N., Wibisono, A., Wilson, A. C., & Jordan, M., Streaming Variational Bayes, Advances in Neural Information Processing Systems (pp. 1727-1735), 2013.

6. Hoffman, M., Blei, D. M., Wang, C., and Paisley, J., Stochastic Variational Inference, Journal of Machine Learning Research, 14:1303–1347, 2013.
    Read More..

    Sunday, October 23, 2016

    Large Scale Machine Learning for Drug Discovery



    Discovering new treatments for human diseases is an immensely complicated challenge; Even after extensive research to develop a biological understanding of a disease, an effective therapeutic that can improve the quality of life must still be found. This process often takes years of research, requiring the creation and testing of millions of drug-like compounds in an effort to find a just a few viable drug treatment candidates. These high-throughput screens are often automated in sophisticated labs and are expensive to perform.

    Recently, deep learning with neural networks has been applied in virtual drug screening1,2,3, which attempts to replace or augment the high-throughput screening process with the use of computational methods in order to improve its speed and success rate.4 Traditionally, virtual drug screening has used only the experimental data from the particular disease being studied. However, as the volume of experimental drug screening data across many diseases continues to grow, several research groups have demonstrated that data from multiple diseases can be leveraged with multitask neural networks to improve the virtual screening effectiveness.

    In collaboration with the Pande Lab at Stanford University, we’ve released a paper titled "Massively Multitask Networks for Drug Discovery", investigating how data from a variety of sources can be used to improve the accuracy of determining which chemical compounds would be effective drug treatments for a variety of diseases. In particular, we carefully quantified how the amount and diversity of screening data from a variety of diseases with very different biological processes can be used to improve the virtual drug screening predictions.

    Using our large-scale neural network training system, we trained at a scale 18x larger than previous work with a total of 37.8M data points across more than 200 distinct biological processes. Because of our large scale, we were able to carefully probe the sensitivity of these models to a variety of changes in model structure and input data. In the paper, we examine not just the performance of the model but why it performs well and what we can expect for similar models in the future. The data in the paper represents more than 50M total CPU hours.
    This graph shows a measure of prediction accuracy (ROC AUC is the area under the receiver operating characteristic curve) for virtual screening on a fixed set of 10 biological processes as more datasets are added.

    One encouraging conclusion from this work is that our models are able to utilize data from many different experiments to increase prediction accuracy across many diseases. To our knowledge, this is the first time the effect of adding additional data has been quantified in this domain, and our results suggest that even more data could improve performance even further.

    Machine learning at scale has significant potential to accelerate drug discovery and improve human health. We look forward to continued improvement in virtual drug screening and its increasing impact in the discovery process for future drugs.

    Thank you to our other collaborators David Konerding (Google), Steven Kearnes (Stanford), and Vijay Pande (Stanford).

    References:

    1. Thomas Unterthiner, Andreas Mayr, Günter Klambauer, Marvin Steijaert, Jörg Kurt Wegner, Hugo Ceulemans, Sepp Hochreiter. Deep Learning as an Opportunity in Virtual Screening. Deep Learning and Representation Learning Workshop: NIPS 2014

    2. Dahl, George E, Jaitly, Navdeep, and Salakhutdinov, Ruslan. Multi-task neural networks for QSAR predictions. arXiv preprint arXiv:1406.1231, 2014.

    3. Ma, Junshui, Sheridan, Robert P, Liaw, Andy, Dahl, George, and Svetnik, Vladimir. Deep neural nets as a method for quantitative structure-activity relationships. Journal of Chemical Information and Modeling, 2015.

    4. Peter Ripphausen, Britta Nisius, Lisa Peltason, and Jürgen Bajorath. Quo Vadis, Virtual Screening? A Comprehensive Survey of Prospective Applications. Journal of Medicinal Chemistry 2010 53 (24), 8461-8467
    Read More..

    Sunday, March 27, 2016

    High Quality Object Detection at Scale



    Update - 26/02/2015
    We recently discovered a bug in the evaluation methodology of our object detector. Consequently, the large numbers we initially reported below are not realistic, due to the fact that our separately trained context extractor was contaminated with half of the validation set images. Therefore, our initial results were overly optimistic and were not attainable by the methodology described in the paper. Re-evaluating our initial results, we have restricted ourselves to reporting only the single-model results on the other half of the dedicated validation set without retraining the models. With the updated evaluation, we are still able to report the best single-model result on the ILSVRC 2014 detection challenge data set, with 0.43 mAP when combining both Selective Search and MultiBox proposals with our post-classification model. The original draft of our paper "Scalable, High Quality Object Detection" has been updated to reflect this information. We are deeply sorry if our initial reported results caused any confusion in the community. Original post follows below. 
    -C. Szegedy, S. Reed, D. Erhan, and D. Anguelov

    The ILSVRC detection challenge is an influential academic benchmark for measuring the quality of object detection. This summer, the GoogLeNet team reported top results in the 2014 edition of the challenge, with ~2X improvement over the previous year’s best results. However, the quality of our results came at a high computational cost: processing each image took about two minutes on a state-of-the-art workstation.

    Naturally, we began to think of how we could both improve the accuracy and reduce the computation time needed. Given the already high quality of previous results like those of GoogLeNet[6], we expected that further improvements to detection quality would be increasingly hard to achieve. In our recent paper Scalable, High Quality Object Detection[7], we detail advances that instead have resulted in an accelerated rate of progress in object detection:
    Evolution of detection quality over time. On the y axis is the mean average precision of the best published results at any given time. The blue line shows result using individual models, the red line is multi-model ensembles. Overfeat[8] was the state-of-the-art at end of last year, followed by R-CNN[1] published in May. The later measurement points are the results of our team.[6,7]
    As seen in the plot above, the mean average precision has been improved since August from 0.45 to 0.56: a 23% relative gain. The new approach can also match the quality of the former best solution with 140X reduced computational resources.

    Most current approaches for object detection employ two phases[1]: in the first phase, some hand-engineered algorithm proposes regions of interest in the image. In the second phase, each proposed region is run through a deep neural network, identifying which proposed patches correspond to an object (and what that object is).

    For the first phase, the common wisdom[1,2,3,4] was that it took skillfully crafted code to produce high quality region proposals. This has come with a drawback though: these methods don’t produce reliable scoring for the proposed regions. This forces the second phase to evaluate most of the proposed patches in order to achieve good results.

    So we revisited our prior “MultiBox” work[5], in which we let the computer learn to pick the proposals to see whether we could avoid relying on any of the hand-crafted methods above. Although the MultiBox method, using previous generation vision network architectures, could not compete with hand-engineered proposal approaches, there were several advantages of fully relying on machine learning only. First, the quality of proposals increases with each new improved network architecture or training methodology without additional programming effort. Second, the regions come with confidence scores which are used for trading off running time versus quality. Additionally, the implementation is simplified.

    Once we used new variants of the network architecture introduced in [6], MultiBox also started to perform much better; Now, we could match the coverage of alternative methods with half as many proposal patches. Also, we changed our networks to take the context of objects into account, fueling additional quality gains for the second phase. Furthermore, we came up with a new way to train deep networks to learn more robustly even when some objects are not annotated in the training set, which improved both phases.

    Besides the significant gains in mean average precision, we can now cut the number of evaluated patches dramatically at a modest loss of quality: the task that used to take 2 minutes of processing time for a single image on a workstation by the GoogLeNet ensemble (of 6 networks), is now performed under a second using a single network without using GPUs. If we constrain ourselves to a single category like “dog”, we can now process 50 images/second on the same machine by a more streamlined approach[7] that skips the proposal generation step altogether.

    As a core area of research in computer vision, object detection is used for providing strong signals for photo and video search, while high quality detection could prove useful for self-driving cars and automatically generated image captions. We look forward to the continuing research in this field.

    References:

    [1]  Rich feature hierarchies for accurate object detection and semantic segmentation
    by Ross Girshick and Jeff Donahue and Trevor Darrell and Jitendra Malik (CVPR, 2014)

    [2]  Prime Object Proposals with Randomized Prim’s Algorithm
    by Santiago Manen, Matthieu Guillaumin and Luc Van Gool

    [3]  Edge boxes: Locating object proposals from edges
    by Lawrence C Zitnick, and Piotr Dollàr (ECCV 2014)

    [4]  BING: Binarized normed gradients for objectness estimation at 300fps
    by Ming-Ming Cheng, Ziming Zhang, Wen-Yan Lin and Philip Torr (CVPR 2014)

    [5]  Scalable Object Detection using Deep Neural Networks
    by Dumitru Erhan, Christian Szegedy, Alexander Toshev, and Dragomir Anguelov

    [6]  Going deeper with convolutions
    by Christian Szegedy, Wei Liu, Yangqing Jia, Pierre Sermanet, Scott Reed, Dragomir Anguelov, Dumitru Erhan, Vincent Vanhoucke and Andrew Rabinovich

    [7]  Scalable, high quality object detection
    by Christian Szegedy, Scott Reed, Dumitru Erhan and Dragomir Anguelov

    [8]  OverFeat: Integrated Recognition, Localization and Detection using Convolutional Network by Pierre Sermanet, David Eigen, Xiang Zhang, Michael Mathieu, Rob Fergus and Yann LeCun


    * A PhD student at University of Michigan -- Ann Arbor and Software Engineering Intern at Google?
    Read More..