Roman H.
Bellevue, Washington, United States
790 followers
500+ connections
Other similar profiles
- Arghavan Bahadori, San Francisco Bay Area
- Yutong P., Cupertino, CA
- Muhammad Ahmed Riaz, San Francisco Bay Area
- Poorva Rane, San Jose, CA
- Koundinya Nidumolu, San Francisco Bay Area
- Jay Franck, San Francisco, CA
- Xingxing Huang, Cupertino, CA
- Alex Shvets, Cupertino, CA
- Maryna Longnickel 🇺🇦, Austin, TX
- Aaksha Meghawat, San Francisco, CA
- Qingbo Hu, San Francisco Bay Area
- Pararth Shah, San Francisco Bay Area
- Yanning Chen, San Francisco Bay Area
- Emily (Xiang) Zuo, PhD (Large Language Model (LLM) | Vector Database | NLP | Stanford Ignite), Santa Clara, CA
- Philip Y. Yang, San Mateo, CA
- Sida Wang, Cupertino, CA
- Avik Basu, Sunnyvale, CA
- Abhinav Arora, San Francisco Bay Area
- Lei N., San Francisco Bay Area
- Anusha Balakrishnan, San Francisco Bay Area
Explore more posts
-
Juan David Gil López
Professor Lawrence is spot on here. Most of the ML solutions that have served us well until now owe less to the current hype around generative AI and more to specialized domain knowledge combined with classical modeling techniques and a great deal of software engineering. Certainly, what we call generative AI today will have an important role to play, as it introduces a new way to interact with computers and get them to do things for you. However, thinking it will solve every industry and societal problem is a very big stretch. https://lnkd.in/e_UX_YUd
-
Raul Salles de Padua
🔍 Quantum AI Series: Modeling Sequences with Quantum States

Picking up where I left off in my #QuantumAI series, let’s talk about Modeling Sequences with Quantum States, with insights from a fascinating paper on the topic.

Classical probability distributions can be modeled using quantum states—particularly entangled states that retain information from subsystems. This approach allows quantum models to efficiently represent sequences without losing essential details. Tensor networks play a key role in optimizing the process, providing a compact, low-rank factorization of data.

As I've shared previously, Singular Value Decomposition (SVD) is foundational to these models, as it helps represent the complex correlations present in sequential data. Matrix Product States (MPS) capture these rich interactions by using reduced densities to efficiently compress sequence representations while preserving their structure. This is the essence of keeping representational power while building efficient, lightweight models.

In summary, quantum probability models allow us to go beyond classical limits in sequential tasks, contributing to information retention. Sharing the paper here for a deeper dive if you'd like one: https://lnkd.in/d5aqXyyS

#QuantumAI #TensorNetworks #SVD #QuantumComputing #AI
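To make the factorization step concrete, here is a toy numpy sketch (mine, not the paper's code) of building a Matrix Product State from a joint probability distribution by sequential SVD; all shapes and names are illustrative, and real tensor-network libraries (e.g. quimb or TensorNetwork) do this at scale:

```python
import numpy as np

rng = np.random.default_rng(0)

# Joint distribution over 4 binary variables (a rank-4 probability tensor).
p = rng.random((2, 2, 2, 2))
p /= p.sum()

# Split off one site at a time with SVD, keeping all singular values
# (truncating small ones would give the lossy, low-rank compression).
cores, rest, bond = [], p.reshape(1, -1), 1
for _ in range(3):
    u, s, vt = np.linalg.svd(rest.reshape(bond * 2, -1), full_matrices=False)
    rank = s.size
    cores.append(u.reshape(bond, 2, rank))  # MPS core for this site
    rest = s[:, None] * vt                  # carry the remainder to the right
    bond = rank
cores.append(rest.reshape(bond, 2, 1))      # last core

# Contract the cores back together and check we recover the tensor.
out = cores[0]
for c in cores[1:]:
    out = np.tensordot(out, c, axes=1)      # contract shared bond index
out = out.reshape(2, 2, 2, 2)
print(np.allclose(out, p))  # True: exact when no singular values are cut
```

Truncating the singular-value spectrum at each split is what turns this into the compact sequence representation the post describes.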
-
Gregory Mermoud, PhD
Very insightful work by Anthropic’s interpretability team, and an amazing paper, with outstanding writing and figures. The idea is very simple: interpret LLMs by leveraging sparse autoencoders as surrogate models of the MLPs in transformer blocks, which allows one to disambiguate the superposition of features captured by a single neuron. A simple idea, but a very careful and complex execution, as is often the case in our line of work.

The paper goes into many details and provides a large array of insights, although the gist of the implementation remains obfuscated due to the closed-source nature of Claude. Too bad, because this is the kind of work we need to better understand and eventually trust LLMs. This is demonstrated by the authors in the section ‘Influence on Behavior’, where they show that clamping some features to either high or low values during inference is “remarkably effective at modifying model outputs in specific, interpretable ways”.

Hopefully this kind of work will be replicated and generalized to open-weights models, so that we have new ways to steer their behavior. https://lnkd.in/eVym7f_f #interpretability #xai #explainableai #steerableai #anthropic #claude
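A minimal sketch of the surrogate-model idea: a sparse autoencoder re-expresses an activation vector as a (hopefully sparse) combination of feature directions, and clamping one feature shifts the reconstruction. Everything here is illustrative; the weights are random stand-ins, whereas a real SAE is trained to reconstruct MLP activations under a sparsity penalty:

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, d_features = 8, 32  # more features than dims: an overcomplete dictionary

W_enc = rng.normal(size=(d_features, d_model))
b_enc = rng.normal(size=d_features)
W_dec = rng.normal(size=(d_model, d_features))

def sae(x, clamp=None):
    """Encode x into feature activations, optionally clamp one, then decode."""
    f = np.maximum(0.0, W_enc @ x + b_enc)  # ReLU encoder
    if clamp is not None:
        idx, value = clamp
        f[idx] = value                       # pin a feature, as in 'Influence on Behavior'
    return W_dec @ f

x = rng.normal(size=d_model)
baseline = sae(x)
steered = sae(x, clamp=(3, 10.0))            # force feature 3 to be very active
print(np.linalg.norm(steered - baseline))    # clamping shifts the output
```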
-
Nicholas Bellerophon
I just watched a summary of the new paper out of Meta entitled “Large Concept Models”. https://lnkd.in/e43n6HZg I’m late to the party in posting this reaction, but I was immediately intrigued and have musings I wish to share.

Does anyone remember reading Babel-17 by Samuel R. Delany? It centres around the link between language and behaviour. Minds were reprogrammed using an engineered language that could force itself into the subconscious. It makes me wonder how many concepts humans presently possess, collectively, and where they originate. It seems to me that some don’t rely on language to exist, e.g. objects in the world, simple emotions, physical actions. But then there are others which seem like they can’t exist without language to describe them, e.g. “justice”, or, from para 1 of the paper, “explicit higher-level semantic representation”. I understand these concepts could be constructed as combinations of simpler concepts, but is there any research out there investigating how this is done in humans and animals?

I ask because I wonder if an LCM will be able to imagine new concepts, or will it be limited by our own languages to concepts that humans have already discovered and specified? Can we apply some of the combinatorial imagination we see demonstrated in recent Large Image/Video Models to these new LCMs?
-
Anshu Avinash
What an end to the year 2024! OpenAI o3 has achieved state-of-the-art results on the ARC-AGI benchmark. From ARC-AGI's blog (https://lnkd.in/gKw3bU4U): "To sum up – o3 represents a significant leap forward. Its performance on ARC-AGI highlights a genuine breakthrough in adaptability and generalization, in a way that no other benchmark could have made as explicit. o3 fixes the fundamental limitation of the LLM paradigm – the inability to recombine knowledge at test time – and it does so via a form of LLM-guided natural language program search. This is not just incremental progress; it is new territory, and it demands serious scientific attention."

This is indeed a significant leap, and in 2025 I expect a few things:
* OpenAI, Google, and others will keep pushing the boundaries of what we can do with reasoning models; models will keep getting better - we have not yet reached the ceiling.
* We will see open-weights models comparable to o3 as well (we already have a few heading in that direction, like QwQ and DeepSeek, both from China).
* Products will make an accelerated effort to catch up with both of these: https://lnkd.in/gRBuFMGi
-
Yoga Wigardo
I remember when this all started. Faris - invited me to catch up at our favorite spot around Blok M. I had no idea he’d just wrapped up a procedure at Pondok Indah Hospital! (He really knows how to add drama to our catch-ups.)

We’d been on separate paths and projects since "the winding down project", but there he was, reminding me of our dream to ride this AI wave together again. It was a plan we’d hatched with Sahal Zain & Gurnoor Dhillon when the world of AI was still mostly theoretical and jargon-heavy. Somehow, the conversation continued with Gurnoor Dhillon while he was waiting for his meeting with someone who would eventually become one of CalvinBall’s key early adopters. It felt like everything was coming together, like a series of seemingly random dots connecting to form a bigger picture.

Seeing how far we’ve come as a team at CalvinBall Technologies is truly exciting. What started as theoretical discussions has now evolved into practical Gen AI solutions that are solving everyday problems and hold endless possibilities to shape the industry. 👨‍🚀 👩‍🚀

Here’s to pushing boundaries and achieving new milestones together! Gurnoor Dhillon, Sandeep Ramesh, Faris -, Sahal Zain, Eva Zuliya, Ratu Nadiah Khairunnisa 🚀 #GenAI #AIInnovation #TechJourney #ArtificialIntelligence #TechTransformation #CalvinballTech #Calvinballers
-
Christopher Foster-McBride
Thanks Hamdi Amroun, PhD - this is a great paper to share. In real-world business contexts we often need AI to reason over large pieces of text, and often you need or want multi-document summarization (MDS).

'When testing five LLMs with benchmarks using news and conversation datasets, they found up to 75% of content in MDS summaries was hallucinated, with notable increases towards summaries' end. Alarmingly, even for non-existent topics, models like GPT-3.5-turbo and GPT-4 generated fabricated content 79% and 44% of the time, respectively. Analysis of 700+ generated insights showed that hallucinations often arose from failures to follow instructions or overly generic content.'

I agree with Hamdi's sentiment that, despite improvements in post-processing techniques, robust solutions are urgently needed to reduce hallucinations, as inaccurate summaries can lead to business risks (notably misinformation and misrepresentation). I would add that there is still a lot of utility in LLMs and multimodal models, but we need to be vigilant - this is not about AI malevolence or intentional deception. LLMs do not possess consciousness or intent; they generate content based on patterns in the data they were trained on.
-
Dr. Aditya Raj
A recent breakthrough titled "Matrix Multiplication-Free LLMs" demonstrates a major advance for Large Language Models (LLMs) by reducing computational costs. The authors eliminate MatMul operations from LLMs, claiming up to a 10× reduction in memory usage and a 25.6% increase in training speed, all while maintaining strong performance at billion-parameter scales. Paper link: https://lnkd.in/ggph8qXc #AI #machinelearning #deeplearning #LLMs
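For intuition, here is a toy illustration of how constraining weights to {-1, 0, +1} makes a linear layer multiplication-free: each output is accumulated with additions and subtractions only. This is my sketch of the general ternary-weight idea, not the authors' implementation:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=16)                         # input activations
w_ternary = rng.integers(-1, 2, size=(4, 16))   # weights in {-1, 0, 1}

# "MatMul-free" layer: per weight, add x[j], subtract x[j], or skip it.
out = np.zeros(4)
for i in range(4):
    for j in range(16):
        if w_ternary[i, j] == 1:
            out[i] += x[j]
        elif w_ternary[i, j] == -1:
            out[i] -= x[j]

# Result matches a dense matmul, but no multiplications were needed.
print(np.allclose(out, w_ternary @ x))  # True
```

Hardware that only needs adders (no multiplier units) is where the claimed memory and speed savings come from.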
-
Mohamed Abdelhady
BERT is back in a new, optimized look (ModernBERT). I still remember the old days of 2018 - yes, 6 years in the AI realm is considered old - when the BERT model was considered a "large" language model with only 340M parameters. Nowadays the Phi-4 model, with 14B parameters, is considered a "small" language model. I think the AI world still needs encoder architectures like BERT despite the dominance of GPT-like decoders. Even when they are not the main model for a use case, encoders remain an important component of the retrieval stage, alongside the decoders used for the generation stage, in any RAG system design. Anyway, welcome back BERT, whether you are called large or small!! 🎊
-
Ed Henry
Well, my day ended up taking a different path than I’d thought yesterday! Instead of my primary research, I ended up down a literature-review path. I’m sure by now you’ve heard that OpenAI announced its recent model release, called o1, so I thought I’d share some interesting papers that I think might align with what is implemented in the reasoning module outlined in the o1 system card. If not, it’s at least a foray into some newer methods being explored today. 😊

📄 o1 System Card: https://lnkd.in/gwSRs46w
📄 Large Language Monkeys: Scaling Inference Compute with Repeated Sampling: https://lnkd.in/gSTe43Bq
📄 ReFT: Reasoning with Reinforced Fine-Tuning: https://lnkd.in/gKPzfhfN
📄 Training Large Language Models for Reasoning through Reverse Curriculum Reinforcement Learning: https://lnkd.in/gqMy_dKu
📄 Reinforced Self-Training (ReST) for Language Modeling: https://lnkd.in/gTeaCqrM
-
Pierre de Lacaze
LLM Research Papers: The 2024 List (Sebastian Raschka, December 2024) "It’s been a very eventful and exciting year in AI research. This is especially true if you are interested in LLMs. I had big plans for this December edition and was planning to publish a new article with a discussion of all my research highlights from 2024. I still plan to do so, but due to an accident and serious injury, I am currently unable to work at a computer and finish the draft. But I hope to recover in the upcoming weeks and be back on my feet soon. In the meantime, I want to share my running bookmark list of many fascinating (mostly LLM-related) papers I stumbled upon in 2024. It’s just a list, but maybe it will come in handy for those who are interested in finding some gems to read for the holidays. And if you are interested in more code-heavy reading and tinkering, My Build A Large Language Model (From Scratch) book is out on Amazon as of last month. In addition, I added a lot of bonus materials to the GitHub repository." https://lnkd.in/e7d6z4Vj
-
Rohit Agarwal
🚀 Exploring Extrinsic Hallucinations in LLMs! 🚀

The article delves into the causes and solutions for hallucinations in large language models (LLMs).

🌟 Key highlights:
- Causes of Hallucinations: Addressing issues in pre-training data and the challenges of fine-tuning with new knowledge.
- Detection Methods: Utilizing techniques like retrieval-augmented evaluation and sampling-based detection to identify hallucinations.
- Anti-Hallucination Techniques: Implementing strategies such as Retrieval-Augmented Generation (RAG), fine-tuning for factuality, and employing retrieval methods to enhance accuracy.

https://lnkd.in/g3j5kU5b

For more updates, follow Rohit Agarwal! #AI #LLMs #MachineLearning #TechInnovation #DataScience #ModelAccuracy #FactualAI #HallucinationDetection
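A rough sketch of what sampling-based detection can look like, in the spirit of methods like SelfCheckGPT: sample several answers to the same prompt and flag claims the other samples do not support. The samples, claims, and overlap scoring below are invented for illustration; real systems use an actual LLM and stronger entailment checks:

```python
samples = [
    "Marie Curie won two Nobel Prizes, in physics and chemistry.",
    "Marie Curie won two Nobel Prizes.",
    "Marie Curie won two Nobel Prizes and a Fields Medal.",  # inconsistent detail
]

def support(claim: str, others: list) -> float:
    """Crude consistency score: average fraction of claim tokens found in each sample."""
    def toks(s):
        return set(s.lower().replace(",", " ").replace(".", " ").split())
    claim_tokens = toks(claim)
    scores = [len(claim_tokens & toks(o)) / len(claim_tokens) for o in others]
    return sum(scores) / len(scores)

# Content that few samples reproduce gets a low score -> likely hallucinated.
for claim in ["won two Nobel Prizes", "won a Fields Medal"]:
    print(claim, "->", round(support(claim, samples), 2))
```

The consistent claim scores 1.0 here while the fabricated one scores 0.5; in practice the threshold and the similarity measure are the hard parts.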
-
Hussain Nadim
Chaos Report: Iran's Response Matrix: A Game Theory Analysis Iran is in a classic decision-making bind: it has to retaliate but it must also be able to survive. However, there is already a problem: it is thinking too much and acting too late. More here on my Substack: https://lnkd.in/gQzQnAKj
-
Joel Rorseth
Ever wonder how exactly your knowledge sources are being used during RAG? We did too. I'm excited to share our early work in explaining LLMs, "RAGE Against the Machine: Retrieval-Augmented LLM Explanations"! 🎸✨

📄 Paper: https://lnkd.in/eV4qF8YV
📺 Video: https://lnkd.in/e6mNV8tt

Last week at ICDE 2024, I introduced RAGE, an interactive tool designed to explain the outputs of LLMs augmented with retrieval capabilities (RAG). RAG needs explaining, since it is unclear how the presence and order of knowledge sources (among other things) affect the LLM's answer! 🤖💡

RAGE helps users understand how LLMs generate answers by identifying parts of the input context that, when moved or removed, change the LLM's response. This counterfactual approach makes AI decision-making more transparent and allows us to derive a form of citation. 🔄📊

Key features include:
- Combination Tests: Which sources lead to which answers? 🔍
- Permutation Tests: How does the answer change if sources are reordered? 🔀
- Interactive Demo: Explore behaviors of real LLMs using real data. 📚

Whether it's determining the greatest tennis player or answering your own unique questions, RAGE supports various use cases. Our code (coming soon) is designed around interfaces, making it highly adaptable to any LLM, retriever, or text data source! 🎾🏆

For technical details and the implications of our findings, please check out our paper. Along with my co-authors at the University of Waterloo, York University, and AT&T, we are working on several extensions to explain many unexplained aspects of LLMs, RAG, and beyond. Stay tuned for more updates and an open-source RAGE code release in the near future! 🚀💡

#ExplainableAI #LLM #RAG #Research
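The combination and permutation tests described above can be illustrated with a stubbed stand-in for the LLM. The stub, its recency bias, and the source passages are all invented for this sketch; the real tool queries an actual model:

```python
from itertools import combinations, permutations

sources = {
    "doc_a": "Federer has 20 Grand Slam titles.",
    "doc_b": "Djokovic has 24 Grand Slam titles.",
}

def fake_llm(context: list) -> str:
    """Toy model with recency bias: believes the last passage it reads."""
    answer = "unknown"
    for passage in context:
        answer = passage.split()[0]  # the player named in the passage
    return answer

# Combination test: which subsets of sources lead to which answers?
for r in range(len(sources) + 1):
    for subset in combinations(sources, r):
        print(subset, "->", fake_llm([sources[k] for k in subset]))

# Permutation test: does reordering the same sources change the answer?
for order in permutations(sources):
    print(order, "->", fake_llm([sources[k] for k in order]))
```

Subsets or orderings whose answers differ from the full-context answer act as counterfactual evidence for which sources the model actually relied on, which is the citation-like signal the post mentions.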
-
Hai Huang
Stole a few points from Prof. Christopher Manning, who talked about LLMs and language modeling in general on the latest TWIML AI Podcast:

📌 Humans acquire language skills in a way very different from LLMs: we need millions of words, compared to billions or even trillions of tokens for LLMs. LLM researchers may want to investigate and learn from how humans acquire language skills.

📌 LLMs cannot reason. However, other deep learning models, such as AlphaGo, can. LLM researchers may want to look into how to integrate that type of reasoning/searching/planning capability into LLMs.

📌 LLMs' world models should enable search and discovery. Although Prof. Manning didn’t call this out explicitly, my understanding is that this is closer to a knowledge-graph type of structure.

📌 Next-gen LLM idea: a soft form of locality and hierarchy. Transformers attend every token to every other token, which is very inefficient, while human language can be modeled by n-grams most of the time.

#artificialintelligence #machinelearning #deeplearning https://lnkd.in/euzwMQ6p
-
Anoop Kunchukuttan
Glad to share that we have a tutorial accepted at EMNLP 2025 on "Data and Model Centric Approaches for Expansion of Large Language Models to New Languages", with Raj Dabre, Mohammed Safi Ur Rahman Khan, Thanmay Jayakumar, and Rudramurthy V. You can see an early version of it here: https://lnkd.in/eS42rvHr But there is so much happening that this is already behind the current literature... so there will be a lot more to discuss by next year!
-
Jeremy Kedziora, Ph.D.
One of the most rewarding things I get to do is to collaborate on the creation of new knowledge. And every now and then something pays off! So, this morning I'll share a preprint manuscript (currently on arxiv here: https://lnkd.in/gQTu4S46, soon to be submitted) that my super-talented former students and coauthors Jonathan Keane, Samuel Keyser, and I just finished. The TL;DR is that we studied methods for constructing guardrails for AI agents that use reward functions to learn decision making. We introduced an approach (which we call strategy masking) to explicitly learn and then suppress undesirable AI agent behavior. We applied our method to study lying in AI agents and showed that it can be used to effectively modify agent behavior by suppressing lying post-training without compromising agent ability to perform effectively. Fun stuff!
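A minimal sketch of the masking mechanism as the post describes it: suppress an undesirable action (here, "lie") by masking its logit before sampling. The action names and logits are invented, and the paper learns which strategy to suppress rather than hand-coding the mask as done here:

```python
import math

actions = ["tell_truth", "lie", "deflect"]
logits = [1.0, 3.0, 0.5]  # a trained policy that currently prefers lying

def policy(logits, masked=()):
    """Softmax over logits, with masked actions forced to probability 0."""
    z = [l if actions[i] not in masked else -math.inf
         for i, l in enumerate(logits)]
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

before = policy(logits)
after = policy(logits, masked=("lie",))
print(actions[before.index(max(before))])  # lie
print(actions[after.index(max(after))])    # tell_truth
```

The appeal of masking post-training is that the rest of the learned policy is untouched, which matches the paper's claim that suppression need not compromise overall agent performance.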
-
César Beltrán Miralles
Exciting news from UC Berkeley and Anyscale! 🚀 Introducing RouteLLM: an open-source framework optimizing LLM deployment by balancing cost and performance, reducing expenses by up to 48% compared to random routing.

- 🤖 Developed with Canva, RouteLLM uses preference data to train routers, ensuring queries are directed to the most cost-effective LLMs without sacrificing quality.
- 📊 Benchmarked against commercial systems, RouteLLM consistently achieves performance similar to high-cost models like GPT-4 Turbo while being over 40% cheaper.
- 🌍 Demonstrating robustness, RouteLLM adapts seamlessly to different LLM pairs without requiring retraining, enhancing scalability and applicability.
- RouteLLM's open-source release includes datasets and code, promoting accessibility and further development in AI deployment strategies.
- The framework's use of preference data allows for nuanced decision-making in LLM routing, optimizing both cost efficiency and performance.
- RouteLLM's success in reducing costs while maintaining quality positions it as a pivotal tool for businesses integrating large language models into their operations.

Researchers from UC Berkeley and Anyscale Introduce RouteLLM: An Open-Source Framework for Cost-Effective LLM Routing https://lnkd.in/gucFugz7

#AI #MachineLearning #CostEffective
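A toy sketch of the routing idea: score each query, and send only the ones above a threshold to the expensive model. The difficulty scorer, prices, model names, and queries below are invented stand-ins; RouteLLM learns its routers from preference data rather than using a length heuristic:

```python
CHEAP, EXPENSIVE = "small-model", "big-model"
COST = {CHEAP: 0.001, EXPENSIVE: 0.03}  # hypothetical $ per query

def difficulty(query: str) -> float:
    """Stand-in router: longer queries count as harder (a real router is learned)."""
    return min(1.0, len(query.split()) / 20)

def route(query: str, threshold: float = 0.5) -> str:
    return EXPENSIVE if difficulty(query) > threshold else CHEAP

queries = [
    "capital of France?",
    "Prove that every bounded monotone sequence of real numbers converges, "
    "and explain where completeness of the reals is used in the argument.",
]
routed = [route(q) for q in queries]
cost = sum(COST[m] for m in routed)
all_big = len(queries) * COST[EXPENSIVE]
print(routed, f"saving {100 * (1 - cost / all_big):.0f}% vs always using {EXPENSIVE}")
```

Tuning the threshold trades cost against quality, which is exactly the cost/performance curve the benchmarks in the post measure.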
-
Hai Huang
You don’t normally see LLM researchers and engineers talking about their day-to-day work. Big shoutout to this YouTube video from the Anthropic LLM interpretability team, the team behind Golden Gate Claude. It’s amazing to see that most of the team members have been on the team for only a year, and yet they have made such a big breakthrough in interpretability research. Another thing I want to highlight is the unique set of challenges facing the team and how they approach them: through fast trial-and-error iteration and constant evaluation of what may work and what to prioritize next. #artificialintelligence #machinelearning #deeplearning https://lnkd.in/eUaPAYch
Others named Roman H. in United States
- Roman Sterling H., Los Angeles Metropolitan Area
- Roman H, Account Coordinator, Hilliard, OH
- Roman H, owner at ae construction, Justice, IL
- Roman H, Springfield, IL
- Roman H, Willowbrook, IL
11 others named Roman H. in United States are on LinkedIn